Python Machine Learning (by Sebastian Raschka) study notes: Chapter 6, Model Evaluation and Parameter Tuning in Practice (Windows, Spyder, Python 3.6)...
Published: 2019-06-12


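import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Load the Wisconsin breast cancer (WDBC) dataset; the file has no header row
df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data', header=None)

# Columns 2 onward hold the 30 features; column 1 holds the diagnosis label (M or B)
X = df.loc[:, 2:].values
y = df.loc[:, 1].values

# Encode the string labels as integers
le = LabelEncoder()
y = le.fit_transform(y)
print(le.transform(['M', 'B']))
# Output: [1 0]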
# sklearn.cross_validation was removed in scikit-learn 0.20; use sklearn.model_selection
from sklearn.model_selection import train_test_split

# Split the dataset into a training set (80%) and a test set (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=1)

# Combine the data transformations and the estimator in a pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

pipe_lr = Pipeline([('scl', StandardScaler()),
                    ('pca', PCA(n_components=2)),
                    ('clf', LogisticRegression(random_state=1))])
pipe_lr.fit(X_train, y_train)
print('Test Accuracy: %.3f' % pipe_lr.score(X_test, y_test))
# Output reported in the original post: Test Accuracy: 0.947
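To make it clearer what the pipeline does on `fit` and `predict`, here is a minimal sketch that compares it with the same steps written out by hand. It uses a synthetic dataset from `make_classification` as a stand-in for the WDBC data, and the names `X_demo`, `y_demo`, `Xtr`, `Xte` and so on are made up for this illustration; they are not from the original notes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the WDBC data (assumption: 30 features, 2 classes)
X_demo, y_demo = make_classification(n_samples=300, n_features=30,
                                     n_informative=5, random_state=1)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.20,
                                      random_state=1)

# Pipeline version: a single fit call runs every step in order
pipe = Pipeline([('scl', StandardScaler()),
                 ('pca', PCA(n_components=2)),
                 ('clf', LogisticRegression(random_state=1))])
pipe.fit(Xtr, ytr)
pipe_pred = pipe.predict(Xte)

# Manual version: fit each transformer on the training data only,
# then reuse the fitted transformers on the test data
scl = StandardScaler()
pca = PCA(n_components=2)
clf = LogisticRegression(random_state=1)
Xtr_pca = pca.fit_transform(scl.fit_transform(Xtr))
clf.fit(Xtr_pca, ytr)
manual_pred = clf.predict(pca.transform(scl.transform(Xte)))

same_predictions = np.array_equal(pipe_pred, manual_pred)
print('Pipeline and manual steps agree:', same_predictions)
```

The point of the comparison is that the test data only ever goes through `transform`, never `fit`, so no information from the test set leaks into the scaler or the PCA projection.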

Stratified k-fold cross-validation with scikit-learn's StratifiedKFold iterator

import numpy as np
# sklearn.cross_validation was removed in scikit-learn 0.20; use sklearn.model_selection
from sklearn.model_selection import StratifiedKFold

# n_splits replaces the old n_folds argument, and the folds now come from split().
# random_state is dropped because it only applies when shuffle=True.
kfold = StratifiedKFold(n_splits=10)
scores = []
for k, (train, test) in enumerate(kfold.split(X_train, y_train)):
    pipe_lr.fit(X_train[train], y_train[train])
    score = pipe_lr.score(X_train[test], y_train[test])
    scores.append(score)
    print('Fold: %s, Class dist.: %s, Acc: %.3f' % (k+1,
                                                    np.bincount(y_train[train]), score))
print('CV accuracy: %.3f +/- %.3f' % (np.mean(scores), np.std(scores)))
# Output reported in the original post:
# Fold: 1, Class dist.: [256 153], Acc: 0.891
# Fold: 2, Class dist.: [256 153], Acc: 0.978
# Fold: 3, Class dist.: [256 153], Acc: 0.978
# Fold: 4, Class dist.: [256 153], Acc: 0.913
# Fold: 5, Class dist.: [256 153], Acc: 0.935
# Fold: 6, Class dist.: [257 153], Acc: 0.978
# Fold: 7, Class dist.: [257 153], Acc: 0.933
# Fold: 8, Class dist.: [257 153], Acc: 0.956
# Fold: 9, Class dist.: [257 153], Acc: 0.978
# Fold: 10, Class dist.: [257 153], Acc: 0.956
# CV accuracy: 0.950 +/- 0.029
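The near-constant class distribution in the fold output above is what stratification is meant to deliver. As a small sketch of that property on its own, the snippet below splits a made-up label vector (60 samples of class 0 and 40 of class 1, an assumption for illustration only) and counts the classes in every fold. It uses the `sklearn.model_selection` API.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced labels: 60 samples of class 0, 40 of class 1
y_demo = np.array([0] * 60 + [1] * 40)
X_demo = np.zeros((len(y_demo), 1))  # feature values do not affect the split

skf = StratifiedKFold(n_splits=10)
fold_counts = []
for k, (train_idx, test_idx) in enumerate(skf.split(X_demo, y_demo)):
    train_counts = np.bincount(y_demo[train_idx])
    test_counts = np.bincount(y_demo[test_idx])
    fold_counts.append((train_counts, test_counts))
    print('Fold %2d: train %s, test %s' % (k + 1, train_counts, test_counts))
```

With these labels every test fold holds 6 samples of class 0 and 4 of class 1, so each fold keeps the 60/40 ratio of the full label vector.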

k-fold cross-validation with scikit-learn's cross_val_score

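# sklearn.cross_validation was removed in scikit-learn 0.20; use sklearn.model_selection
from sklearn.model_selection import cross_val_score

# cross_val_score runs the whole fit/score loop over the 10 folds in one call
scores = cross_val_score(estimator=pipe_lr,
                         X=X_train,
                         y=y_train,
                         cv=10,
                         n_jobs=1)
print('CV accuracy scores: %s' % scores)
print('CV accuracy: %.3f +/- %.3f' % (np.mean(scores), np.std(scores)))
# Output reported in the original post:
# CV accuracy scores: [0.89130435 0.97826087 0.97826087 0.91304348 0.93478261 0.97777778 0.93333333 0.95555556 0.97777778 0.95555556]
# CV accuracy: 0.950 +/- 0.029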
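The scores from `cross_val_score` match the manual loop because, for a classifier and an integer `cv`, scikit-learn uses stratified folds internally. The sketch below checks this on synthetic data; `X_demo`, `y_demo`, `auto_scores` and `manual_scores` are illustrative names, and the dataset is an assumption rather than the WDBC data.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (assumption, for illustration only)
X_demo, y_demo = make_classification(n_samples=200, n_features=10,
                                     random_state=1)

pipe = Pipeline([('scl', StandardScaler()),
                 ('clf', LogisticRegression(random_state=1))])

# One call: fit and score on each of the 10 folds
auto_scores = cross_val_score(estimator=pipe, X=X_demo, y=y_demo, cv=10)

# The same procedure written as an explicit loop over stratified folds
manual_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=10).split(X_demo, y_demo):
    model = clone(pipe)  # fresh, unfitted copy for every fold
    model.fit(X_demo[train_idx], y_demo[train_idx])
    manual_scores.append(model.score(X_demo[test_idx], y_demo[test_idx]))
manual_scores = np.array(manual_scores)

print('Scores match:', np.allclose(auto_scores, manual_scores))
```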

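Evaluating the model with scikit-learn's learning curve function: how the number of training samples relates to training accuracy and validation accuracy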

import matplotlib.pyplot as plt
# sklearn.learning_curve was removed in scikit-learn 0.20; use sklearn.model_selection
from sklearn.model_selection import learning_curve

pipe_lr = Pipeline([
        ('scl', StandardScaler()),
        ('clf', LogisticRegression(
                penalty='l2', random_state=0))])

# Evaluate the pipeline on 10 training-set sizes, from 10% to 100% of each training fold
train_sizes, train_scores, test_scores = learning_curve(estimator=pipe_lr,
                                                        X=X_train,
                                                        y=y_train,
                                                        train_sizes=np.linspace(0.1, 1.0, 10),
                                                        cv=10,
                                                        n_jobs=1)

# Mean and standard deviation across the 10 folds for each training-set size
train_mean = np.mean(train_scores, axis=1)
train_std = np.std(train_scores, axis=1)
test_mean = np.mean(test_scores, axis=1)
test_std = np.std(test_scores, axis=1)

plt.plot(train_sizes, train_mean,
         color='blue', marker='o',
         markersize=5,
         label='training accuracy')
plt.fill_between(train_sizes,
                 train_mean + train_std,
                 train_mean - train_std,
                 alpha=0.15, color='blue')
plt.plot(train_sizes, test_mean,
         color='green', linestyle='--',
         marker='s', markersize=5,
         label='validation accuracy')
plt.fill_between(train_sizes,
                 test_mean + test_std,
                 test_mean - test_std,
                 alpha=0.15, color='green')
plt.grid()
plt.xlabel('Number of training samples')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.ylim([0.8, 1.0])
plt.show()

[Figure: learning curve, training and validation accuracy versus number of training samples (1161599-20190616164000664-1085350096.png)]
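The same information can be read without a plot. The following sketch prints the mean training accuracy, the mean validation accuracy and the gap between them for each training-set size. It runs on synthetic data from `make_classification` (an assumption, not the WDBC data), and the variable names are chosen for this illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (assumption, for illustration only)
X_demo, y_demo = make_classification(n_samples=400, n_features=20,
                                     n_informative=5, random_state=0)

pipe = Pipeline([('scl', StandardScaler()),
                 ('clf', LogisticRegression(penalty='l2', random_state=0))])

# 5 training-set sizes, each evaluated with 5-fold cross-validation
sizes, train_scores, val_scores = learning_curve(
    estimator=pipe, X=X_demo, y=y_demo,
    train_sizes=np.linspace(0.2, 1.0, 5), cv=5)

# Rows are training-set sizes, columns are folds
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)
gap = train_mean - val_mean

for n, tr, va, g in zip(sizes, train_mean, val_mean, gap):
    print('n=%3d  train=%.3f  validation=%.3f  gap=%.3f' % (n, tr, va, g))
```

As a rough reading guide: a large gap that shrinks as samples are added suggests high variance (overfitting), while low accuracy on both curves suggests high bias (underfitting).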

Diagnosing overfitting and underfitting with validation curves

# Plot a validation curve with scikit-learn: accuracy as a function of a model parameter
import numpy as np
# sklearn.learning_curve was removed in scikit-learn 0.20; use sklearn.model_selection
from sklearn.model_selection import validation_curve

param_range = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
# Pipeline parameters are addressed as <step name>__<parameter>, with a double underscore
train_scores, test_scores = validation_curve(estimator=pipe_lr,
                                             X=X_train,
                                             y=y_train,
                                             param_name='clf__C',
                                             param_range=param_range,
                                             cv=10)

train_mean = np.mean(train_scores, axis=1)
train_std = np.std(train_scores, axis=1)
test_mean = np.mean(test_scores, axis=1)
test_std = np.std(test_scores, axis=1)

plt.plot(param_range, train_mean,
         color='blue', marker='o',
         markersize=5,
         label='training accuracy')
plt.fill_between(param_range, train_mean + train_std,
                 train_mean - train_std, alpha=0.15,
                 color='blue')
plt.plot(param_range, test_mean,
         color='green', linestyle='--',
         marker='s', markersize=5,
         label='validation accuracy')
plt.fill_between(param_range,
                 test_mean + test_std,
                 test_mean - test_std,
                 alpha=0.15, color='green')
plt.grid()
plt.xscale('log')
plt.legend(loc='lower right')
plt.xlabel('Parameter C')
plt.ylabel('Accuracy')
plt.ylim([0.8, 1.0])
plt.show()

[Figure: validation curve, training and validation accuracy versus parameter C (1161599-20190616163808113-929233295.png)]
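A validation curve can also be summarized numerically, for example by picking the value of C with the highest mean validation accuracy. The sketch below does this on synthetic data (an assumption, not the WDBC data) and first checks that `clf__C` is a valid parameter name of the pipeline; `best_idx` and `best_C` are names introduced for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import validation_curve
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (assumption, for illustration only)
X_demo, y_demo = make_classification(n_samples=300, n_features=20,
                                     n_informative=5, random_state=0)

pipe = Pipeline([('scl', StandardScaler()),
                 ('clf', LogisticRegression(penalty='l2', random_state=0))])

# Parameters of a pipeline step are named <step name>__<parameter>
print('clf__C' in pipe.get_params())

param_range = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
train_scores, val_scores = validation_curve(
    estimator=pipe, X=X_demo, y=y_demo,
    param_name='clf__C', param_range=param_range, cv=5)

# Rows are parameter values, columns are folds
val_mean = val_scores.mean(axis=1)
best_idx = int(np.argmax(val_mean))
best_C = param_range[best_idx]
print('Best C by mean validation accuracy: %s (%.3f)' % (best_C, val_mean[best_idx]))
```

Picking the maximum this way is only a quick reading of the curve; when several values of C score within one standard deviation of each other, the difference between them is probably not meaningful.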

Reposted from: https://www.cnblogs.com/eugene0/p/11029174.html
