Machine Learning with Python in Practice: Decision Trees

2018-02-10


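How a decision tree works: find the decisive features in the dataset and use them to split the dataset iteratively. The algorithm stops when all the data under a branch belongs to the same class, or when every feature available for splitting has been used.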

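At each split there are many candidate features, so how do we choose which feature to split the dataset on? To answer this we need the concepts of information gain and information entropy.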

1. Information gain

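The principle behind splitting a dataset is to make disordered data more ordered. The change in information before and after a split is called the information gain. Once we know how to compute it, we can compute the information gain obtained by splitting on each feature, and the feature with the highest information gain is the best choice. First, the definition of information: the information of a symbol xi is defined as l(xi) = -log2 p(xi), where p(xi) is the probability of choosing that class. The entropy of the information source is then H = -∑ p(xi)·log2 p(xi). Based on this formula, the code below computes the Shannon entropy.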

from math import log

def calcShannonEnt(dataSet):
    # Shannon entropy of the class labels (last column of every row)
    NumEntries = len(dataSet)
    labelsCount = {}
    for i in dataSet:
        currentlabel = i[-1]
        if currentlabel not in labelsCount.keys():
            labelsCount[currentlabel] = 0
        labelsCount[currentlabel] += 1
    ShannonEnt = 0.0
    for key in labelsCount:
        prob = labelsCount[key] / NumEntries  # true division (Python 3)
        ShannonEnt -= prob * log(prob, 2)
    return ShannonEnt

The function above needs the log function to be imported beforehand: from math import log. We can first test it with a simple example.

def createdataSet():
    # dataSet = [['1','1','yes'],['1','0','no'],['0','1','no'],['0','0','no']]
    dataSet = [[1,1,'yes'],[1,0,'no'],[0,1,'no'],[0,0,'no']]
    labels = ['no surfacing','flippers']
    return dataSet, labels

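The entropy here is 0.811. When we add more classes to the data, the entropy increases. For example, after changing the dataset so that it contains three classes, 'yes', 'no' and 'maybe', the entropy is larger. In other words, the more mixed the data, the higher the entropy.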
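The effect of adding a class can be checked with a short sketch. The function name shannon_entropy and the three-class relabelling below are illustrative assumptions, not the article's own test data; the two-class rows are the sample dataset from createdataSet.

```python
from collections import Counter
from math import log2

def shannon_entropy(rows):
    """Entropy (in bits) of the class labels stored in the last column."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum((n / total) * log2(n / total) for n in counts.values())

# The article's sample data: one 'yes' and three 'no'.
two_classes = [[1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 0, 'no']]
# Hypothetical relabelling that introduces a third class, 'maybe'.
three_classes = [[1, 1, 'maybe'], [1, 0, 'yes'], [0, 1, 'no'], [0, 0, 'no']]

print(round(shannon_entropy(two_classes), 3))    # 0.811
print(round(shannon_entropy(three_classes), 3))  # 1.5
```

With label probabilities 1/4 and 3/4 the entropy is about 0.811 bits; with 1/4, 1/4 and 1/2 it rises to 1.5 bits.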

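Besides computing the information entropy, a classification algorithm also needs to split the dataset. In the decision tree algorithm we compute the entropy once for the subsets produced by splitting on each feature, and then decide which feature gives the best split.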

def splitDataSet(dataSet, axis, value):
    retDataSet = []
    for featVec in dataSet:
        if featVec[axis] == value:
            reducedfeatVec = featVec[:axis]
            reducedfeatVec.extend(featVec[axis+1:])
            retDataSet.append(reducedfeatVec)
    return retDataSet

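axis is the index of the feature used to split the dataset, and value is the feature value that a row must have to be returned. Note the difference between the extend method and the append method here: extend adds every element of its argument to the list one by one, whereas append adds the argument itself as a single element.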
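A minimal illustration of the two list methods, using made-up lists a and b that do not come from the article:

```python
a = [1, 2, 3]
b = [4, 5, 6]

appended = a[:]        # copy, so that `a` itself is left unchanged
appended.append(b)     # b is added as ONE element (a nested list)
print(appended)        # [1, 2, 3, [4, 5, 6]]

extended = a[:]
extended.extend(b)     # each element of b is added individually
print(extended)        # [1, 2, 3, 4, 5, 6]
```

This is why splitDataSet uses extend to rebuild each row without the split column, and append to add the finished row to the result list.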

Now let us test the result of the dataset-splitting function:

With axis=0 and value=1, the myDat dataset is split according to whether feature 0 of each row equals 1.
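The call can be reproduced with the sketch below. To keep it self-contained it uses a condensed stand-in, split_rows (a hypothetical name), which is assumed to behave like splitDataSet; myDat is the sample dataset from createdataSet.

```python
def split_rows(rows, axis, value):
    """Rows whose feature `axis` equals `value`, with that column removed."""
    return [row[:axis] + row[axis + 1:] for row in rows if row[axis] == value]

myDat = [[1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 0, 'no']]

print(split_rows(myDat, 0, 1))  # [[1, 'yes'], [0, 'no']]
print(split_rows(myDat, 0, 0))  # [[1, 'no'], [0, 'no']]
```

Each returned row has lost the column it was split on, so the remaining columns are the other feature and the class label.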

Next we iterate over the whole dataset, compute the Shannon entropy for each split, and find the best feature to split on.

def choosebestfeatureToSplit(dataSet):
    # number of feature columns; the last column of each row is the class label
    Numfeatures = len(dataSet[0]) - 1
    BaseShannonEnt = calcShannonEnt(dataSet)
    bestInfoGain = 0.0
    bestfeature = -1
    for i in range(Numfeatures):
        featlist = [example[i] for example in dataSet]
        featSet = set(featlist)
        newEntropy = 0.0
        for value in featSet:
            subDataSet = splitDataSet(dataSet, i, value)
            prob = len(subDataSet) / len(dataSet)
            newEntropy += prob * calcShannonEnt(subDataSet)
        infoGain = BaseShannonEnt - newEntropy
        if infoGain > bestInfoGain:
            bestInfoGain = infoGain
            bestfeature = i
    return bestfeature

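Information gain is the reduction in entropy, that is, the reduction in the disorder of the data. The function compares the information gain of every feature and returns the index of the best feature to split on. On the sample dataset it returns 0: both features give the same gain, and the first one is kept.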
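To see the numbers behind that choice, the sketch below prints the information gain of each feature on the sample dataset. The helper names entropy and info_gain are assumptions made for this illustration, and the code is a condensed variant rather than the article's functions.

```python
from collections import Counter
from math import log2

def entropy(rows):
    """Shannon entropy of the class labels in the last column."""
    counts = Counter(row[-1] for row in rows)
    return -sum(n / len(rows) * log2(n / len(rows)) for n in counts.values())

def info_gain(rows, axis):
    """Entropy before the split minus the weighted entropy after it."""
    after = 0.0
    for value in set(row[axis] for row in rows):
        subset = [row for row in rows if row[axis] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - after

myDat = [[1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 0, 'no']]
gains = [info_gain(myDat, i) for i in range(len(myDat[0]) - 1)]
print([round(g, 3) for g in gains])  # [0.311, 0.311]
```

Both features reduce the entropy from about 0.811 to 0.5, so a selection rule that only replaces the current best on a strictly larger gain keeps feature 0.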

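Next we build the decision tree recursively. Before building, we need to count the columns to check whether the algorithm has already used all of the attributes. If it has and the rows in a branch still carry different class labels, the branch is labelled with the most frequent class. The function for this uses the same approach as classify0 in Chapter 2.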

import operator

def majorityCnt(classlist):
    # return the class label that occurs most often
    ClassCount = {}
    for vote in classlist:
        if vote not in ClassCount.keys():
            ClassCount[vote] = 0
        ClassCount[vote] += 1
    sortedClassCount = sorted(ClassCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

def createTrees(dataSet, labels):
    classList = [example[-1] for example in dataSet]
    # stop when every row in this branch has the same class
    if classList.count(classList[0]) == len(classList):
        return classList[0]
    # stop when only the class column is left: all features have been used
    if len(dataSet[0]) == 1:
        return majorityCnt(classList)
    bestfeature = choosebestfeatureToSplit(dataSet)
    bestfeatureLabel = labels[bestfeature]
    myTree = {bestfeatureLabel: {}}
    del(labels[bestfeature])  # note: this modifies the list passed in by the caller
    featValue = [example[bestfeature] for example in dataSet]
    uniqueValue = set(featValue)
    for value in uniqueValue:
        subLabels = labels[:]
        myTree[bestfeatureLabel][value] = createTrees(splitDataSet(dataSet, bestfeature, value), subLabels)
    return myTree

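For the sample dataset, the resulting decision tree is: {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}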
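The whole construction can be run end to end with the condensed sketch below. It is an illustrative variant, not the article's code: the names entropy, split, best_feature and build_tree are assumptions, collections.Counter replaces the hand-written counting, and the label list is copied instead of modified in place.

```python
from collections import Counter
from math import log2

def entropy(rows):
    counts = Counter(row[-1] for row in rows)
    return -sum(n / len(rows) * log2(n / len(rows)) for n in counts.values())

def split(rows, axis, value):
    return [row[:axis] + row[axis + 1:] for row in rows if row[axis] == value]

def best_feature(rows):
    def gain(axis):
        values = set(row[axis] for row in rows)
        after = sum(len(s) / len(rows) * entropy(s)
                    for s in (split(rows, axis, v) for v in values))
        return entropy(rows) - after
    # max() keeps the first index when several features have the same gain
    return max(range(len(rows[0]) - 1), key=gain)

def build_tree(rows, labels):
    classes = [row[-1] for row in rows]
    if classes.count(classes[0]) == len(classes):
        return classes[0]                             # pure branch
    if len(rows[0]) == 1:
        return Counter(classes).most_common(1)[0][0]  # no features left
    axis = best_feature(rows)
    rest = labels[:axis] + labels[axis + 1:]          # new list; caller's labels untouched
    return {labels[axis]: {v: build_tree(split(rows, axis, v), rest)
                           for v in set(row[axis] for row in rows)}}

myDat = [[1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 0, 'no']]
labels = ['no surfacing', 'flippers']
tree = build_tree(myDat, labels)
print(tree)  # {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
```

Building a new label list at each level is a deliberate design choice in this sketch: the labels list can be reused after the call, whereas a version that deletes entries in place leaves the caller with a shortened list.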

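The result above is not very intuitive to read, so next we draw the tree using matplotlib annotations. matplotlib provides an annotation tool, annotations, which can add text notes to a plot. Let us first try out this tool.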

import matplotlib.pyplot as plt
decisionNode = dict(boxstyle = 'sawtooth',fc = '0.8')
leafNode = dict(boxstyle = 'sawtooth',fc = '0.8')
arrow_args = dict(arrowstyle = '<-')

def plotNode(nodeTxt,centerPt,parentPt,nodeType):
    createPlot.ax1.annotate(nodeTxt,xy = parentPt,xycoords = 'axes fraction',
                            xytext = centerPt,textcoords = 'axes fraction',
                            va = 'center',ha = 'center',bbox = nodeType,
                            arrowprops = arrow_args)

def createPlot():
    fig = plt.figure(1,facecolor = 'white')
    fig.clf()
    createPlot.ax1 = plt.subplot(111,frameon = False)
    plotNode('test1',(0.5,0.1),(0.1,0.5),decisionNode)
    plotNode('test2',(0.8,0.1),(0.3,0.8),leafNode)
    plt.show()

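After this small example we can start building the annotated tree. Although we have x and y coordinates, deciding where to place the tree nodes is still awkward, so we need to know how many leaf nodes there are and how many levels deep the tree is. The two functions below compute the number of leaf nodes and the depth of the tree. They share the same structure: starting from the first key, they iterate over all child nodes and use type() to check whether a child is a dictionary. If it is, the child is a decision node, and getNumleafs() is called recursively on it, so that the whole tree is traversed and the number of leaf nodes is returned. The second function, getTreeDepth(), counts the decision nodes encountered during the traversal. Its termination condition is a leaf node: once a leaf is reached, it returns from the recursive call and the depth counter is incremented by one.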

def getNumleafs(myTree):
    numLeafs=0
    key_sorted= sorted(myTree.keys())
    firstStr = key_sorted[0]
    secondDict = myTree[firstStr]
    for key in secondDict.keys():
        if type(secondDict[key]).__name__=='dict':
            numLeafs+=getNumleafs(secondDict[key])
        else:
            numLeafs+=1
    return numLeafs

def getTreeDepth(myTree):
    maxdepth=0
    key_sorted= sorted(myTree.keys())
    firstStr = key_sorted[0]
    secondDict = myTree[firstStr]
    for key in secondDict.keys():
        if type(secondDict[key]).__name__ == 'dict':
            thedepth=1+getTreeDepth(secondDict[key])
        else:
            thedepth=1
        if thedepth>maxdepth:
            maxdepth=thedepth
    return maxdepth

Testing both functions on the tree built earlier gives 3 leaf nodes and a depth of 2.
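A compact way to reproduce these two numbers is sketched below. The names count_leaves and tree_depth are hypothetical, and the sketch checks for dictionaries with isinstance rather than the type() comparison described above; the nested dictionary is the tree for the sample dataset.

```python
def count_leaves(tree):
    """Number of leaf (non-dict) nodes in a nested-dict decision tree."""
    root = next(iter(tree))  # the single feature label at this level
    return sum(count_leaves(child) if isinstance(child, dict) else 1
               for child in tree[root].values())

def tree_depth(tree):
    """Number of decision nodes on the longest path from root to leaf."""
    root = next(iter(tree))
    return max(1 + tree_depth(child) if isinstance(child, dict) else 1
               for child in tree[root].values())

tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
print(count_leaves(tree), tree_depth(tree))  # 3 2
```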

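We first show the final decision tree plot to verify these results.

The plot confirms that the tree is indeed two levels deep and has 3 leaf nodes. Next we give the key functions for drawing the decision tree plot; they produce the decision tree shown in the figure above.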

def plotMidText(cntrPt,parentPt,txtString):
    xMid = (parentPt[0]-cntrPt[0])/2.0+cntrPt[0]
    yMid = (parentPt[1]-cntrPt[1])/2.0+cntrPt[1]
    createPlot.ax1.text(xMid,yMid,txtString)

def plotTree(myTree,parentPt,nodeTxt):
    numLeafs = getNumleafs(myTree)
    depth = getTreeDepth(myTree)
    key_sorted= sorted(myTree.keys())
    firstStr = key_sorted[0]
    cntrPt = (plotTree.xOff+(1.0+float(numLeafs))/2.0/plotTree.totalW,plotTree.yOff)
    plotMidText(cntrPt,parentPt,nodeTxt)
    plotNode(firstStr,cntrPt,parentPt,decisionNode)
    secondDict = myTree[firstStr]
    plotTree.yOff -= 1.0/plotTree.totalD
    for key in secondDict.keys():
        if type(secondDict[key]).__name__ == 'dict':
            plotTree(secondDict[key],cntrPt,str(key))
        else:
            plotTree.xOff+=1.0/plotTree.totalW
            plotNode(secondDict[key],(plotTree.xOff,plotTree.yOff),cntrPt,leafNode)
            plotMidText((plotTree.xOff,plotTree.yOff),cntrPt,str(key))
    plotTree.yOff+=1.0/plotTree.totalD

def createPlot(inTree):
    fig = plt.figure(1,facecolor = 'white')
    fig.clf()
    axprops = dict(xticks = [],yticks = [])
    createPlot.ax1 = plt.subplot(111,frameon = False,**axprops)
    plotTree.totalW = float(getNumleafs(inTree))
    plotTree.totalD = float(getTreeDepth(inTree))
    plotTree.xOff = -0.5/ plotTree.totalW; plotTree.yOff = 1.0
    plotTree(inTree,(0.5,1.0),'')
    plt.show()
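The horizontal layout used by plotTree can be checked without drawing anything. The sketch below is an illustration under the assumption that only the x-offset bookkeeping matters; leaf_x_positions is a hypothetical helper that repeats the plotTree.xOff arithmetic for a tree with a given number of leaves.

```python
def leaf_x_positions(num_leaves):
    """x coordinate of each leaf, left to right, in axes-fraction units."""
    total_w = float(num_leaves)
    x_off = -0.5 / total_w        # same starting offset as plotTree.xOff
    positions = []
    for _ in range(num_leaves):
        x_off += 1.0 / total_w    # advance one slot per leaf
        positions.append(x_off)
    return positions

print([round(x, 3) for x in leaf_x_positions(3)])  # [0.167, 0.5, 0.833]
```

Starting half a slot to the left of the axes means every leaf lands in the centre of its own slot of width 1/totalW, so the leaves are spread evenly across the plot.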

That is all for this article. We hope it is helpful for your studies.
