
Main Modules and Basic Usage of scikit-learn

2016-03-22


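For beginners who want to start working with machine learning algorithms but are hesitant to dive in, figuring out how to get up to speed quickly can be a real struggle.
Among data science practitioners, the most commonly used tools are R and Python. Each has its strengths and weaknesses, but Python comes out slightly ahead in many respects, in large part because the scikit-learn library implements a wide range of machine learning algorithms.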

Data Loading

We assume the input is a feature matrix or a CSV file.
First, the data must be loaded into memory.
scikit-learn works with NumPy arrays, so we use NumPy to load the CSV file.
The following example uses the Pima Indians Diabetes dataset from the UCI Machine Learning Repository.

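import urllib.request
import numpy as np
# URL of the dataset (this location may no longer be reachable; a local copy of the file works the same way)
url = "http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
# download the file and decode it into a list of text lines (Python 3)
raw_data = urllib.request.urlopen(url).read().decode("utf-8").splitlines()
# load the CSV data as a NumPy matrix
dataset = np.loadtxt(raw_data, delimiter=",")
# separate the 8 features (columns 0-7; a slice excludes its end index) from the target (column 8)
X = dataset[:, 0:8]
y = dataset[:, 8]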

We will use this dataset throughout the examples, with the feature matrix stored in X and the target variable in y.
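Because a Python slice excludes its end index, the eight feature columns of this 9-column file are selected with 0:8, while 0:7 would silently drop the last feature. The sketch below checks the slicing offline on two rows laid out like the diabetes file (eight numeric features followed by the 0/1 label); the inline CSV text is only an illustrative stand-in for the downloaded file.

```python
import io

import numpy as np

# Two illustrative rows in the same layout: 8 features, then the class label.
csv_text = "6,148,72,35,0,33.6,0.627,50,1\n1,85,66,29,0,26.6,0.351,31,0\n"

dataset = np.loadtxt(io.StringIO(csv_text), delimiter=",")
X = dataset[:, 0:8]  # columns 0..7: the feature matrix
y = dataset[:, 8]    # column 8: the target variable

print(dataset.shape, X.shape, y.shape)  # (2, 9) (2, 8) (2,)
```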

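Data Normalization

Many machine learning algorithms, especially those trained with gradient-based methods, are sensitive to the scale of the data, so before running an algorithm we should normalize or standardize it. scikit-learn provides functions for both: normalize rescales each sample (row) to unit length, while scale standardizes each feature (column) to zero mean and unit variance. (To rescale each feature into the 0-1 range instead, scikit-learn provides MinMaxScaler.)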

from sklearn import preprocessing
# normalize: rescale each sample (row) to unit L2 norm
normalized_X = preprocessing.normalize(X)
# standardize: give each feature (column) zero mean and unit variance
standardized_X = preprocessing.scale(X)
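The two functions work along different axes, which is easy to confuse. The sketch below, on a small made-up matrix, checks that normalize gives every row unit L2 norm, scale gives every column zero mean and unit variance, and MinMaxScaler maps every column into [0, 1].

```python
import numpy as np
from sklearn import preprocessing

# A small made-up matrix whose two columns have very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

row_normalized = preprocessing.normalize(X)  # each row rescaled to unit L2 norm
standardized = preprocessing.scale(X)        # each column: mean 0, standard deviation 1
min_max = preprocessing.MinMaxScaler().fit_transform(X)  # each column mapped into [0, 1]

print(np.linalg.norm(row_normalized, axis=1))  # [1. 1. 1.]
print(min_max[:, 0])                           # [0.  0.5 1. ]
```

In a full workflow, a scaler such as StandardScaler or MinMaxScaler is usually fitted on the training split only and then applied to the test split, so no information from the test data leaks into preprocessing.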

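Feature Selection

When solving a real problem, the ability to choose appropriate features, or to construct new ones, is especially important. This is known as feature selection or feature engineering.
Feature selection is a highly creative process that relies more on intuition and domain expertise, although many ready-made algorithms for selecting features also exist.
The following tree-based algorithm (Extra Trees) estimates how informative each feature is: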

from sklearn.ensemble import ExtraTreesClassifier
# fit an ensemble of extremely randomized trees
model = ExtraTreesClassifier()
model.fit(X, y)
# display the relative importance of each attribute
print(model.feature_importances_)
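feature_importances_ only ranks the features; to actually keep a subset, scikit-learn's SelectFromModel can wrap the fitted model. The sketch below uses synthetic data from make_classification (an assumption, standing in for the diabetes data) and SelectFromModel's default threshold, the mean importance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in data: 8 features, of which only the first 3 are informative.
X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)

model = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = np.argsort(model.feature_importances_)[::-1]  # most important first
print("features ranked by importance:", ranking)

# Keep only the features whose importance is at least the mean importance.
selector = SelectFromModel(model, prefit=True)
X_selected = selector.transform(X)
print("kept columns:", np.flatnonzero(selector.get_support()))
```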

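Using the Algorithms

scikit-learn implements most of the basic machine learning algorithms. Let's take a quick tour.

Logistic Regression

Many problems can be reduced to binary classification. An advantage of logistic regression is that it estimates the probability that a sample belongs to each class, not just the predicted label.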

from sklearn import metrics
from sklearn.linear_model import LogisticRegression
# fit a logistic regression model to the data
model = LogisticRegression()
model.fit(X, y)
print(model)
# make predictions (on the same data the model was trained on)
expected = y
predicted = model.predict(X)
# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))

Result (printed by an older scikit-learn release, so the estimator's parameter list differs from current versions):

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, penalty='l2', random_state=None, tol=0.0001)
             precision    recall  f1-score   support

        0.0       0.79      0.89      0.84       500
        1.0       0.74      0.55      0.63       268

avg / total       0.77      0.77      0.77       768

[[447  53]
 [120 148]]
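The report above uses only the predicted labels. The class probabilities that make logistic regression attractive come from predict_proba; the sketch below, on synthetic data (an assumption), checks that each row of probabilities sums to 1 and that predict picks the most probable class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)

# One row per sample, one column per class, in the order of model.classes_.
proba = model.predict_proba(X[:3])
labels = model.predict(X[:3])
print(model.classes_)
print(proba.round(3))
print(labels)
```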

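Naive Bayes

Naive Bayes is another well-known machine learning algorithm. It models the distribution of the training data within each class (GaussianNB assumes each feature is independently normally distributed within a class), and it works well for multi-class classification.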

from sklearn import metrics
from sklearn.naive_bayes import GaussianNB
# fit a Gaussian Naive Bayes model to the data
model = GaussianNB()
model.fit(X, y)
print(model)
# make predictions (on the same data the model was trained on)
expected = y
predicted = model.predict(X)
# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))

Result:

GaussianNB()
             precision    recall  f1-score   support

        0.0       0.80      0.86      0.83       500
        1.0       0.69      0.60      0.64       268

avg / total       0.76      0.77      0.76       768

[[429  71]
 [108 160]]
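The distribution GaussianNB learns is simple: for each class it stores a prior plus a per-feature mean and variance. The sketch below fits it to the iris data bundled with scikit-learn (a three-class problem, used here instead of the diabetes data) and checks that the learned means are the per-class feature averages.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes
model = GaussianNB().fit(X, y)

print(model.class_prior_)   # fraction of training samples in each class
print(model.theta_.shape)   # (n_classes, n_features): per-class feature means
print(model.predict_proba(X[:1]).round(3))
```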

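k-Nearest Neighbors

The k-nearest neighbors (kNN) algorithm is often used as one component of a larger classification method; for example, it can be used to evaluate features, which makes it useful for feature selection.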

from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier
# fit a k-nearest neighbor model to the data
model = KNeighborsClassifier()
model.fit(X, y)
print(model)
# make predictions (on the same data the model was trained on)
expected = y
predicted = model.predict(X)
# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))

Result:

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           n_neighbors=5, p=2, weights='uniform')
             precision    recall  f1-score   support

        0.0       0.82      0.90      0.86       500
        1.0       0.77      0.63      0.69       268

avg / total       0.80      0.80      0.80       768

[[448  52]
 [ 98 170]]
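To show what the classifier does internally, the sketch below (synthetic data and a hypothetical query point, both assumptions) retrieves the five nearest training samples with kneighbors and takes a majority vote of their labels, which matches predict under the default uniform weights.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=100, n_features=4, random_state=1)
query = X[:1] + 0.01  # hypothetical new point close to the first sample

model = KNeighborsClassifier(n_neighbors=5).fit(X, y)
distances, indices = model.kneighbors(query)  # the 5 nearest training samples

# Majority vote over the neighbours' labels (k=5 with two classes, so no ties).
manual_vote = np.bincount(y[indices[0]]).argmax()
print(indices[0], y[indices[0]], manual_vote, model.predict(query)[0])
```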

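Decision Trees

Classification and Regression Trees (CART) are often used for classification or regression problems in which the features carry categorical information, and they handle multi-class problems well.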

from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier
# fit a CART model to the data
model = DecisionTreeClassifier()
model.fit(X, y)
print(model)
# make predictions (on the same data the model was trained on)
expected = y
predicted = model.predict(X)
# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))

Result:

DecisionTreeClassifier(compute_importances=None, criterion='gini',
            max_depth=None, max_features=None, min_density=None,
            min_samples_leaf=1, min_samples_split=2, random_state=None,
            splitter='best')
             precision    recall  f1-score   support

        0.0       1.00      1.00      1.00       500
        1.0       1.00      1.00      1.00       268

avg / total       1.00      1.00      1.00       768

[[500   0]
 [  0 268]]
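The perfect scores above come from predicting the same rows the tree was trained on: a fully grown tree can memorize its training set. Holding out a test set exposes this. The sketch below uses synthetic data with roughly 10% label noise (an assumption, via make_classification's flip_y) and train_test_split.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=768, n_features=8, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
train_acc = metrics.accuracy_score(y_train, model.predict(X_train))
test_acc = metrics.accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```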

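Support Vector Machines

SVM is a very popular machine learning algorithm, used mainly for classification. Like logistic regression, it can be extended to multi-class classification by combining several binary classifiers; scikit-learn's SVC does this with a one-vs-one scheme internally.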

from sklearn import metrics
from sklearn.svm import SVC
# fit an SVM model to the data
model = SVC()
model.fit(X, y)
print(model)
# make predictions (on the same data the model was trained on)
expected = y
predicted = model.predict(X)
# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))

Result:

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
             precision    recall  f1-score   support

        0.0       1.00      1.00      1.00       500
        1.0       1.00      1.00      1.00       268

avg / total       1.00      1.00      1.00       768

[[500   0]
 [  0 268]]
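The RBF-kernel SVC is sensitive to feature scale, and, as with the decision tree, the scores above are measured on training data. A common pattern, sketched below, chains StandardScaler and SVC in a Pipeline and scores it with 5-fold cross-validation; the breast-cancer dataset bundled with scikit-learn stands in for the diabetes data (an assumption).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# The scaler is refitted inside each training fold, so the held-out fold
# never influences the scaling parameters.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(pipeline, X, y, cv=5)
print(scores.round(3), scores.mean().round(3))
```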

Beyond classification and regression, scikit-learn provides more sophisticated algorithms such as clustering, and it also implements ensemble techniques such as Bagging and Boosting.
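These estimators share the same fit/predict interface as the classifiers above. The sketch below, on synthetic blob data (an assumption), runs KMeans clustering, which ignores the labels, alongside bagged decision trees and AdaBoost.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier

X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# Clustering is unsupervised, so only X is passed to fit.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
# Bagging combines decision trees fitted on bootstrap samples;
# boosting fits weak learners one after another, reweighting misclassified samples.
bagging = BaggingClassifier(n_estimators=10, random_state=0).fit(X, y)
boosting = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

print(np.bincount(kmeans.labels_))
print(bagging.score(X, y), boosting.score(X, y))
```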

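How to Optimize Algorithm Parameters

A harder task is finding an effective way to choose the right parameters; for this we need to search over candidate values. scikit-learn provides tools for exactly this.
The following example selects the regularization parameter alpha of a Ridge regression model: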

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search was removed in scikit-learn 0.20
# prepare a range of alpha values to test
alphas = np.array([1, 0.1, 0.01, 0.001, 0.0001, 0])
# create and fit a ridge regression model, testing each alpha
model = Ridge()
grid = GridSearchCV(estimator=model, param_grid=dict(alpha=alphas))
grid.fit(X, y)
print(grid)
# summarize the results of the grid search
print(grid.best_score_)
print(grid.best_estimator_.alpha)

Result:

GridSearchCV(cv=None,
       estimator=Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, solver='auto', tol=0.001),
       estimator__alpha=1.0, estimator__copy_X=True,
       estimator__fit_intercept=True, estimator__max_iter=None,
       estimator__normalize=False, estimator__solver='auto',
       estimator__tol=0.001, fit_params={}, iid=True, loss_func=None,
       n_jobs=1,
       param_grid={'alpha': array([  1.00000e+00,   1.00000e-01,   1.00000e-02,   1.00000e-03,
         1.00000e-04,   0.00000e+00])},
       pre_dispatch='2*n_jobs', refit=True, score_func=None, scoring=None,
       verbose=0)
0.282118955686
1.0
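Under the hood, GridSearchCV cross-validates one model per candidate value and keeps the candidate with the best mean score. The sketch below reproduces that by hand with cross_val_score on synthetic regression data (an assumption) and checks that it agrees with GridSearchCV when both use the same folds.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
alphas = [1.0, 0.1, 0.01, 0.001, 0.0001]
cv = KFold(n_splits=5)  # identical, unshuffled folds for both approaches

# Manual grid search: mean cross-validated R^2 (Ridge's default score) per alpha.
manual_scores = {a: cross_val_score(Ridge(alpha=a), X, y, cv=cv).mean() for a in alphas}
best_alpha_manual = max(manual_scores, key=manual_scores.get)

grid = GridSearchCV(Ridge(), param_grid={"alpha": alphas}, cv=cv).fit(X, y)
print(best_alpha_manual, grid.best_params_["alpha"], grid.best_score_)
```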

Sometimes it is effective to sample parameter values at random from a given range, evaluate the algorithm with each sampled value, and keep the best one.

from scipy.stats import uniform as sp_rand
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV  # sklearn.grid_search was removed in scikit-learn 0.20
# prepare a uniform distribution on [0, 1] to sample the alpha parameter from
param_grid = {'alpha': sp_rand()}
# create and fit a ridge regression model, testing random alpha values
model = Ridge()
rsearch = RandomizedSearchCV(estimator=model, param_distributions=param_grid, n_iter=100)
rsearch.fit(X, y)
print(rsearch)
# summarize the results of the random parameter search
print(rsearch.best_score_)
print(rsearch.best_estimator_.alpha)

Result:

RandomizedSearchCV(cv=None,
          estimator=Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, solver='auto', tol=0.001),
          estimator__alpha=1.0, estimator__copy_X=True,
          estimator__fit_intercept=True, estimator__max_iter=None,
          estimator__normalize=False, estimator__solver='auto',
          estimator__tol=0.001, fit_params={}, iid=True, n_iter=100,
          n_jobs=1,
          param_distributions={'alpha':
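Fixing random_state makes a randomized search reproducible. The sketch below, a minimal example on synthetic regression data (an assumption), repeats the search with 20 draws of alpha from uniform(0, 1) and lists the sampled candidates from cv_results_.

```python
from scipy.stats import uniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# uniform(loc=0, scale=1) draws alpha from the interval [0, 1].
search = RandomizedSearchCV(Ridge(), param_distributions={"alpha": uniform(loc=0, scale=1)},
                            n_iter=20, cv=5, random_state=0)
search.fit(X, y)

sampled = [float(a) for a in search.cv_results_["param_alpha"]]
print(sorted(round(a, 3) for a in sampled))
print(search.best_params_, search.best_score_)
```

When a parameter spans several orders of magnitude, as regularization strengths often do, sampling from scipy.stats.loguniform usually covers the range better than a uniform distribution.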

Summary

This article gave an overview of the general workflow for using scikit-learn. We hope it encourages beginners to settle in and learn, step by step, how to tackle concrete machine learning problems.
