Three Ways to Implement the PCA Algorithm in Python

2018-01-23

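Principal Component Analysis (PCA) is an important topic in multivariate statistics and is widely used in machine learning and other fields. Its main purpose is to reduce the dimensionality of high-dimensional data. PCA replaces the original n features with a smaller number k of new features. The new features are linear combinations of the old ones, chosen to maximize the sample variance and to be mutually uncorrelated.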
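To make the claim about uncorrelated linear combinations concrete, here is a minimal sketch. It projects a small synthetic dataset onto the eigenvectors of its covariance matrix and checks that the covariance matrix of the projected data is diagonal. The random data and all variable names are assumptions made up for this illustration; they are not part of the original article.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data (assumed for illustration): 200 samples, 3 features,
# the first two of which are strongly correlated
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(200, 1)),
               2 * base + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])

Xc = X - X.mean(axis=0)                  # center each feature
cov = np.cov(Xc, rowvar=False)           # 3x3 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]        # indices for descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Z = Xc @ eigvecs                         # new features: linear combinations of the old ones
cov_Z = np.cov(Z, rowvar=False)          # covariance of the new features

# Off-diagonal entries vanish (up to rounding): the new features are uncorrelated
off_diag = cov_Z - np.diag(np.diag(cov_Z))
print(np.allclose(off_diag, 0.0, atol=1e-10))
# Diagonal entries equal the eigenvalues, i.e. the variance along each component
print(np.allclose(np.diag(cov_Z), eigvals))
```

Keeping only the first k columns of `eigvecs` before projecting would give the k-dimensional representation.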

Principal Component Analysis (PCA) vs. Multiple Discriminant Analysis (MDA)

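PCA and MDA are both linear transformation methods and are closely related. In PCA we look for the components that maximize the variance of the dataset; in MDA we are more interested in the directions that maximize the scatter between classes.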

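In short, PCA maps the whole dataset (without class labels) onto a subspace, whereas MDA tries to find the subspace that best separates the classes. Roughly speaking, PCA finds the axes of maximum variance, treating the whole dataset as a single class, while MDA additionally maximizes the between-class scatter.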

In typical pattern recognition problems, MDA is often applied after PCA.
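The difference between the two objectives can be seen on a toy example. The sketch below uses scikit-learn's `LinearDiscriminantAnalysis` as a stand-in for MDA (an assumption: the article does not name an implementation) and made-up two-class data whose largest variance lies along one axis while the classes are separated along the other.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Toy data (assumed): both classes spread widely along x, separated along y
n = 100
class0 = rng.normal(loc=[0.0, 0.0], scale=[5.0, 0.5], size=(n, 2))
class1 = rng.normal(loc=[0.0, 3.0], scale=[5.0, 0.5], size=(n, 2))
X = np.vstack([class0, class1])
y = np.array([0] * n + [1] * n)

# PCA ignores the labels and finds the direction of maximum overall variance
pca_dir = PCA(n_components=1).fit(X).components_[0]

# LDA uses the labels and finds the direction that best separates the classes
lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
lda_dir = lda.scalings_[:, 0] / np.linalg.norm(lda.scalings_[:, 0])

print('PCA direction:', pca_dir)  # dominated by the x axis (high variance)
print('LDA direction:', lda_dir)  # dominated by the y axis (class separation)
```

On data like this, projecting onto the PCA direction alone would mix the two classes, which is why the two methods are treated as complementary rather than interchangeable.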

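The main steps of the PCA algorithm are as follows:

1. Arrange the data in a form suitable for the model;
2. Compute the mean of each feature over the samples;
3. Subtract the mean of the corresponding feature from every sample (centering the data);
4. Compute the covariance matrix;
5. Find the eigenvalues and eigenvectors of the covariance matrix;
6. Reorder the eigenvalues and eigenvectors so that the eigenvalues are in descending order;
7. Compute the cumulative contribution rate of the eigenvalues;
8. Select a subset of the eigenvectors according to a chosen threshold on the cumulative contribution rate;
9. Transform the original data (as centered in step 3) by projecting it onto the selected eigenvectors.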
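The selection by cumulative contribution rate in the steps above can be sketched in a few lines of NumPy. The helper name `choose_k` and the eigenvalues passed to it are hypothetical, chosen only to illustrate the idea; the article's own listing uses a loop-based helper instead.

```python
import numpy as np

def choose_k(eigvals, rate):
    """Return the smallest k whose leading eigenvalues reach the given
    cumulative contribution rate, together with the cumulative rates."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending order
    cum_rate = np.cumsum(eigvals) / eigvals.sum()               # cumulative contribution
    # First position where cum_rate >= rate; +1 turns the index into a count
    k = int(np.searchsorted(cum_rate, rate, side='left')) + 1
    # Guard against rounding leaving the last cumulative rate just below 1
    return min(k, len(eigvals)), cum_rate

# Hypothetical eigenvalues, for illustration only
k, cum_rate = choose_k([4.0, 3.0, 2.0, 1.0], rate=0.85)
print(cum_rate)  # cumulative shares 0.4, 0.7, 0.9, 1.0
print(k)         # 3, because 0.9 is the first share that reaches 0.85
```

The first `k` eigenvectors (in the same descending order) would then form the projection matrix used in the final step.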

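The decomposition of the covariance matrix can be carried out either through the eigendecomposition of a symmetric matrix or through a singular value decomposition (SVD). Scikit-learn also uses SVD to implement PCA.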
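The relationship between the two routes can be checked numerically. The sketch below uses the same small test matrix as the full listing and, as an assumption for illustration, applies the SVD to the centered data matrix directly (the listing itself applies it to the covariance matrix). With p samples, the squared singular values divided by p - 1 should match the eigenvalues of the covariance matrix, and the right singular vectors should match the eigenvectors up to sign.

```python
import numpy as np

# Same test matrix as in the full listing: 3 samples, 5 features
X = np.array([[-1, -1, 0,  2,  1],
              [ 2,  0, 0, -1, -1],
              [ 2,  0, 1,  1,  0]], dtype=float)
p = X.shape[0]
Xc = X - X.mean(axis=0)                    # center each feature

# Route 1: eigendecomposition of the symmetric covariance matrix
cov = Xc.T @ Xc / (p - 1)
eigvals, eigvecs = np.linalg.eigh(cov)     # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # reorder to descending

# Route 2: SVD of the centered data matrix
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var_from_svd = s**2 / (p - 1)              # squared singular values -> variances

# The leading eigenvalues agree with the variances derived from the SVD
print(np.allclose(eigvals[:len(s)], var_from_svd))

# The first two directions agree up to sign (|dot product| equals 1)
dots = np.abs(np.sum(Vt[:2] * eigvecs[:, :2].T, axis=1))
print(np.allclose(dots, 1.0))
```

Only the first two directions are compared because, after centering, three samples span at most a two-dimensional subspace; the remaining eigenvalues are zero and their eigenvectors are not uniquely determined.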

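This article implements PCA in three ways. The first is the original algorithm, that is, the procedure described above; for the details of the computation, see: A tutorial on Principal Components Analysis, Lindsay I Smith. The second is the original algorithm with SVD: Python's NumPy module already implements SVD and returns the singular values in descending order, which removes the step of reordering the eigenvalues and eigenvectors. The third uses the PCA class from Python's Scikit-learn module directly, to verify that the first two methods are correct.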

The complete Python code implementing PCA with these three methods is as follows:

import sys

import numpy as np
from sklearn.decomposition import PCA


def index_lst(lst, component=0, rate=0):
    """Return how many principal components to keep.

    component: number of principal components to keep
    rate: target ratio sum(kept eigenvalues) / sum(all eigenvalues);
          suggested range is (0.8, 1)
    Exactly one of component and rate must be given. When choosing by rate,
    the result is 0 (rate not reached) or a value less than len(lst).
    """
    if component and rate:
        print('Choose only one of component and rate!')
        sys.exit(1)
    if not component and not rate:
        print('Invalid parameter for number of components!')
        sys.exit(1)
    elif component:
        print('Choosing by component, components are %s......' % component)
        return component
    else:
        print('Choosing by rate, rate is %s ......' % rate)
        for i in range(1, len(lst)):
            if sum(lst[:i]) / sum(lst) >= rate:
                return i
        return 0


def main():
    # test data: 3 samples (rows), 5 features (columns)
    mat = [[-1, -1, 0, 2, 1], [2, 0, 0, -1, -1], [2, 0, 1, 1, 0]]

    # convert the test data to a float array
    Mat = np.array(mat, dtype='float64')
    print('Before PCA transformation, data is:\n', Mat)
    print('\nMethod 1: PCA by original algorithm:')
    p, n = np.shape(Mat)  # p samples, n features
    t = np.mean(Mat, 0)   # mean of each column

    # subtract the mean of each column (broadcast over rows)
    Mat = Mat - t

    # covariance matrix
    cov_Mat = np.dot(Mat.T, Mat) / (p - 1)

    # PCA by original algorithm
    # eigh returns the eigenvalues in ascending order, eigenvectors in columns
    U, V = np.linalg.eigh(cov_Mat)
    # rearrange the eigenvalues and eigenvectors into descending order
    U = U[::-1]
    V = V[:, ::-1]
    # choose by component or by rate; exactly one of them must be nonzero
    Index = index_lst(U, component=2)  # how many principal components to keep
    if Index:
        v = V[:, :Index]  # the leading Index eigenvectors
    else:  # an improper rate choice may return Index=0
        print('Invalid rate choice.\nPlease adjust the rate.')
        print('Rate distribution follows:')
        print([sum(U[:i]) / sum(U) for i in range(1, len(U) + 1)])
        sys.exit(1)
    # data transformation
    T1 = np.dot(Mat, v)
    # print the transformed data
    print('We choose %d main factors.' % Index)
    print('After PCA transformation, data becomes:\n', T1)

    # PCA by original algorithm using SVD
    print('\nMethod 2: PCA by original algorithm using SVD:')
    # u: unitary matrix, eigenvectors in columns
    # d: singular values, sorted in descending order
    u, d, v = np.linalg.svd(cov_Mat)
    Index = index_lst(d, rate=0.95)  # how many principal components to keep
    T2 = np.dot(Mat, u[:, :Index])   # transformed data
    print('We choose %d main factors.' % Index)
    print('After PCA transformation, data becomes:\n', T2)

    # PCA by Scikit-learn
    pca = PCA(n_components=2)  # n_components can be an integer or a float in (0,1)
    print('\nMethod 3: PCA by Scikit-learn:')
    print('After PCA transformation, data becomes:')
    print(pca.fit_transform(mat))  # fit the model and return the transformed data


if __name__ == '__main__':
    main()

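Running the code above produces the following output:

[Output screenshot not preserved]

This shows that all three ways of implementing PCA are workable, and working through them makes the concrete steps of PCA easier to understand. Interested readers may want to try implementing it in another language.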
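One way to check the agreement programmatically is sketched below. It recomputes the three projections on the same test data and compares them entry by entry in absolute value, because eigenvectors and singular vectors are determined only up to sign, so whole columns may be flipped between methods. The variable names are assumptions for this illustration and the sketch fixes two components for all three methods.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same test data as in the listing: 3 samples, 5 features
X = np.array([[-1, -1, 0,  2,  1],
              [ 2,  0, 0, -1, -1],
              [ 2,  0, 1,  1,  0]], dtype=float)
Xc = X - X.mean(axis=0)                 # center each feature
cov = Xc.T @ Xc / (X.shape[0] - 1)      # covariance matrix

# Method 1: eigendecomposition, columns reordered to descending eigenvalues
w, V = np.linalg.eigh(cov)
T1 = Xc @ V[:, ::-1][:, :2]

# Method 2: SVD of the covariance matrix
u, d, vt = np.linalg.svd(cov)
T2 = Xc @ u[:, :2]

# Method 3: Scikit-learn
T3 = PCA(n_components=2).fit_transform(X)

# Compare absolute values to ignore per-column sign flips
print(np.allclose(np.abs(T1), np.abs(T2)))
print(np.allclose(np.abs(T1), np.abs(T3)))
```

Comparing absolute values is a shortcut that works here because the two leading eigenvalues are distinct; with repeated eigenvalues the projections could differ by more than a sign.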
