
How to Implement 7 Machine Learning Algorithms in Python (with Code)
Python is often described as the language closest to AI. Anna-Lena Popkes recently shared on GitHub her notes on implementing 7 machine learning algorithms in Python (3.6 and above), together with complete code. None of the implementations rely on other machine learning libraries. The notes are meant to give a basic understanding of the algorithms and their underlying structure, not to provide the most efficient implementations.
Popkes is a graduate student in computer science at the University of Bonn in Germany, focusing on machine learning and neural networks.
The seven algorithms are the following:
▌1. Linear Regression
In linear regression we want to build a model that captures the relationship between a dependent variable y and one or more independent (predictor) variables x.
Given: a training set of $m$ examples $\{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{m}$, where each $\mathbf{x}^{(i)}$ is a $d$-dimensional feature vector and each $y^{(i)}$ is a real-valued target.
A linear regression model can be understood as a very simple neural network: a single output unit with an identity (linear) activation function and no hidden layer.
A linear regression model can be trained using either
a) gradient descent, or
b) the normal equation (closed-form solution): $\mathbf{w} = (X^T X)^{-1} X^T \mathbf{y}$,
where $X$ is the $m \times d$ matrix holding all training examples, one example per row.
The normal equation requires computing the inverse of $X^T X$. Depending on the implementation, the computational complexity of this operation lies roughly between $O(d^{2.4})$ and $O(d^{3})$ in the number of features $d$. Consequently, if the training set has a large number of features, training with the normal equation becomes very slow.
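As an aside (a standard derivation, not part of the original notes), the closed-form solution comes from setting the gradient of the mean squared error to zero:

$$\nabla_{\mathbf{w}} J = \frac{2}{m} X^T (X\mathbf{w} - \mathbf{y}) = 0 \;\;\Rightarrow\;\; X^T X\,\mathbf{w} = X^T \mathbf{y} \;\;\Rightarrow\;\; \mathbf{w} = (X^T X)^{-1} X^T \mathbf{y},$$

which is exactly what the train_normal_equation method below computes.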
Training a linear regression model involves several steps. First (step 0), the model parameters are initialized. The other steps are then repeated until a specified number of iterations is reached or the parameters converge.
Step 0:
Initialize the weight vector and bias with zeros (or small random values), or compute the model parameters directly with the normal equation.
Step 1 (only needed when training with gradient descent):
Compute the linear combination of the input features and the weights. This can be done for all training examples at once using vectorization and broadcasting:
$\hat{\mathbf{y}} = X \cdot \mathbf{w} + b$
where $X$ is the $m \times d$ matrix holding all training examples and $\cdot$ denotes the dot product.
Step 2 (only needed when training with gradient descent):
Compute the loss over the training set using the mean squared error:
$J(\mathbf{w}, b) = \frac{1}{m} \sum_{i=1}^{m} \big(\hat{y}^{(i)} - y^{(i)}\big)^2$
Step 3 (only needed when training with gradient descent):
Compute the partial derivative of the loss function with respect to each parameter. Over all training examples the gradients are:
$\frac{\partial J}{\partial \mathbf{w}} = \frac{2}{m} X^T (\hat{\mathbf{y}} - \mathbf{y}), \qquad \frac{\partial J}{\partial b} = \frac{2}{m} \sum_{i=1}^{m} \big(\hat{y}^{(i)} - y^{(i)}\big)$
Step 4 (only needed when training with gradient descent):
Update the weight vector and bias:
$\mathbf{w} := \mathbf{w} - \eta\, \frac{\partial J}{\partial \mathbf{w}}, \qquad b := b - \eta\, \frac{\partial J}{\partial b}$
where $\eta$ denotes the learning rate.
In [4]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
np.random.seed(123)
Dataset
In [5]:
# We will use a simple training set
X = 2 * np.random.rand(500, 1)
y = 5 + 3 * X + np.random.randn(500, 1)

fig = plt.figure(figsize=(8,6))
plt.scatter(X, y)
plt.title("Dataset")
plt.xlabel("First feature")
plt.ylabel("Second feature")
plt.show()
In [6]:
# Split the data into a training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(f'Shape X_train: {X_train.shape}')
print(f'Shape y_train: {y_train.shape}')
print(f'Shape X_test: {X_test.shape}')
print(f'Shape y_test: {y_test.shape}')
Shape X_train: (375, 1)
Shape y_train: (375, 1)
Shape X_test: (125, 1)
Shape y_test: (125, 1)
Linear Regression Class
In [23]:
class LinearRegression:

    def __init__(self):
        pass

    def train_gradient_descent(self, X, y, learning_rate=0.01, n_iters=100):
        """
        Trains a linear regression model using gradient descent
        """
        # Step 0: Initialize the parameters
        n_samples, n_features = X.shape
        self.weights = np.zeros(shape=(n_features, 1))
        self.bias = 0
        costs = []

        for i in range(n_iters):
            # Step 1: Compute a linear combination of the input features and weights
            y_predict = np.dot(X, self.weights) + self.bias

            # Step 2: Compute cost over training set
            cost = (1 / n_samples) * np.sum((y_predict - y) ** 2)
            costs.append(cost)

            if i % 100 == 0:
                print(f"Cost at iteration {i}: {cost}")

            # Step 3: Compute the gradients
            dJ_dw = (2 / n_samples) * np.dot(X.T, (y_predict - y))
            dJ_db = (2 / n_samples) * np.sum((y_predict - y))

            # Step 4: Update the parameters
            self.weights = self.weights - learning_rate * dJ_dw
            self.bias = self.bias - learning_rate * dJ_db

        return self.weights, self.bias, costs

    def train_normal_equation(self, X, y):
        """
        Trains a linear regression model using the normal equation
        """
        self.weights = np.dot(np.dot(np.linalg.inv(np.dot(X.T, X)), X.T), y)
        self.bias = 0

        return self.weights, self.bias

    def predict(self, X):
        return np.dot(X, self.weights) + self.bias
Training with gradient descent
In [24]:
regressor = LinearRegression()
w_trained, b_trained, costs = regressor.train_gradient_descent(X_train, y_train, learning_rate=0.005, n_iters=600)

fig = plt.figure(figsize=(8,6))
plt.plot(np.arange(600), costs)
plt.title("Development of cost during training")
plt.xlabel("Number of iterations")
plt.ylabel("Cost")
plt.show()
Cost at iteration 0: 66.45256981003433
Cost at iteration 100: 2.2084346146095934
Cost at iteration 200: 1.2797812854182806
Cost at iteration 300: 1.2042189195356685
Cost at iteration 400: 1.1564867816573
Cost at iteration 500: 1.121391041394467
Testing (gradient descent model)
In [28]:
n_samples, _ = X_train.shape
n_samples_test, _ = X_test.shape

y_p_train = regressor.predict(X_train)
y_p_test = regressor.predict(X_test)

error_train = (1 / n_samples) * np.sum((y_p_train - y_train) ** 2)
error_test = (1 / n_samples_test) * np.sum((y_p_test - y_test) ** 2)

print(f"Error on training set: {np.round(error_train, 4)}")
print(f"Error on test set: {np.round(error_test)}")
Error on training set: 1.0955
Error on test set: 1.0
Training with the normal equation
# To compute the parameters using the normal equation, we add a bias value of 1 to each input example
X_b_train = np.c_[np.ones((n_samples)), X_train]
X_b_test = np.c_[np.ones((n_samples_test)), X_test]

reg_normal = LinearRegression()
w_trained = reg_normal.train_normal_equation(X_b_train, y_train)
Testing (normal equation model)
y_p_train = reg_normal.predict(X_b_train)
y_p_test = reg_normal.predict(X_b_test)

error_train = (1 / n_samples) * np.sum((y_p_train - y_train) ** 2)
error_test = (1 / n_samples_test) * np.sum((y_p_test - y_test) ** 2)

print(f"Error on training set: {np.round(error_train, 4)}")
print(f"Error on test set: {np.round(error_test, 4)}")
Error on training set: 1.0228
Error on test set: 1.0432
Visualizing the test predictions
# Plot the test predictions
fig = plt.figure(figsize=(8,6))
plt.scatter(X_train, y_train)
plt.scatter(X_test, y_p_test)
plt.xlabel("First feature")
plt.ylabel("Second feature")
plt.show()
▌2. Logistic Regression
In logistic regression we model a linear combination of the given input features to obtain a binary output. For example, we might try to predict the outcome of an election (win or lose) from the money and time a candidate spent campaigning. The algorithm works as follows.
Given: a training set of $m$ examples $\{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{m}$, where each $\mathbf{x}^{(i)}$ is a $d$-dimensional feature vector and each target $y^{(i)} \in \{0, 1\}$.
A logistic regression model can be understood as a very simple neural network: a single output unit with a sigmoid activation function and no hidden layer.
Unlike linear regression, logistic regression has no closed-form solution. However, because the loss function is convex, gradient descent (or any other optimization algorithm) is guaranteed to find the global minimum, provided the learning rate is small enough and enough training iterations are used.
Training a logistic regression model involves several steps. First (step 0), the model parameters are initialized. The other steps are then repeated until a specified number of iterations is reached or the parameters converge.
Step 0: Initialize the weight vector and bias with zeros (or small random values).
Step 1: Compute the linear combination of the input features and the weights. This can be done for all training examples at once using vectorization and broadcasting:
$\mathbf{a} = X \cdot \mathbf{w} + b$
where $X$ is the $m \times d$ matrix holding all training examples and $\cdot$ denotes the dot product.
Step 2: Apply the sigmoid function as the activation function, which returns values between 0 and 1:
$\hat{y} = \sigma(a) = \frac{1}{1 + e^{-a}}$
Step 3: Compute the loss over the whole training set.
We want the probabilities produced by the model to lie between 0 and 1. During training we therefore adjust the parameters so that large model outputs correspond to positive examples (true label 1) and small outputs to negative examples (true label 0). This is expressed by the following loss function (binary cross-entropy):
$J(\mathbf{w}, b) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log \hat{y}^{(i)} + \big(1 - y^{(i)}\big) \log\big(1 - \hat{y}^{(i)}\big) \Big]$
Step 4: Compute the gradient of the loss function with respect to the weight vector and bias.
A detailed explanation of this derivative can be found here (https://stats.stackexchange.com/questions/278771/how-is-the-cost-function-from-logistic-regression-derivated).
The general form is:
$\frac{\partial J}{\partial \mathbf{w}} = \frac{1}{m} X^T (\hat{\mathbf{y}} - \mathbf{y})$
For the derivative with respect to the bias, the corresponding input is simply 1, so $\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \big(\hat{y}^{(i)} - y^{(i)}\big)$.
Step 5: Update the weights and bias:
$\mathbf{w} := \mathbf{w} - \eta\, \frac{\partial J}{\partial \mathbf{w}}, \qquad b := b - \eta\, \frac{\partial J}{\partial b}$
where $\eta$ denotes the learning rate.
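For reference, the key step behind the gradients in step 4 (a standard result, not spelled out in the original notes) is that, for the sigmoid activation combined with the cross-entropy loss, the derivative of the per-example loss with respect to the pre-activation collapses to

$$\frac{\partial J^{(i)}}{\partial a^{(i)}} = \hat{y}^{(i)} - y^{(i)},$$

and the expressions above then follow directly from the chain rule.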
In [24]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
np.random.seed(123)

%matplotlib inline
Dataset
In [25]:
# We will perform logistic regression using a simple toy dataset of two classes
X, y_true = make_blobs(n_samples=1000, centers=2)

fig = plt.figure(figsize=(8,6))
plt.scatter(X[:,0], X[:,1], c=y_true)
plt.title("Dataset")
plt.xlabel("First feature")
plt.ylabel("Second feature")
plt.show()
In [26]:
# Reshape targets to get column vector with shape (n_samples, 1)
y_true = y_true[:, np.newaxis]

# Split the data into a training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y_true)
print(f'Shape X_train: {X_train.shape}')
print(f'Shape y_train: {y_train.shape}')
print(f'Shape X_test: {X_test.shape}')
print(f'Shape y_test: {y_test.shape}')
Shape X_train: (750, 2)
Shape y_train: (750, 1)
Shape X_test: (250, 2)
Shape y_test: (250, 1)
Logistic Regression Class
In [27]:
class LogisticRegression:

    def __init__(self):
        pass

    def sigmoid(self, a):
        return 1 / (1 + np.exp(-a))

    def train(self, X, y_true, n_iters, learning_rate):
        """
        Trains the logistic regression model on given data X and targets y
        """
        # Step 0: Initialize the parameters
        n_samples, n_features = X.shape
        self.weights = np.zeros((n_features, 1))
        self.bias = 0
        costs = []

        for i in range(n_iters):
            # Step 1 and 2: Compute a linear combination of the input features and weights,
            # apply the sigmoid activation function
            y_predict = self.sigmoid(np.dot(X, self.weights) + self.bias)

            # Step 3: Compute the cost over the whole training set.
            cost = (- 1 / n_samples) * np.sum(y_true * np.log(y_predict) + (1 - y_true) * (np.log(1 - y_predict)))

            # Step 4: Compute the gradients
            dw = (1 / n_samples) * np.dot(X.T, (y_predict - y_true))
            db = (1 / n_samples) * np.sum(y_predict - y_true)

            # Step 5: Update the parameters
            self.weights = self.weights - learning_rate * dw
            self.bias = self.bias - learning_rate * db

            costs.append(cost)
            if i % 100 == 0:
                print(f"Cost after iteration {i}: {cost}")

        return self.weights, self.bias, costs

    def predict(self, X):
        """
        Predicts binary labels for a set of examples X.
        """
        y_predict = self.sigmoid(np.dot(X, self.weights) + self.bias)
        y_predict_labels = [1 if elem > 0.5 else 0 for elem in y_predict]

        return np.array(y_predict_labels)[:, np.newaxis]
Initializing and training the model
In [29]:
regressor = LogisticRegression()
w_trained, b_trained, costs = regressor.train(X_train, y_train, n_iters=600, learning_rate=0.009)

fig = plt.figure(figsize=(8,6))
plt.plot(np.arange(600), costs)
plt.title("Development of cost over training")
plt.xlabel("Number of iterations")
plt.ylabel("Cost")
plt.show()
Cost after iteration 0: 0.6931471805599453
Cost after iteration 100: 0.046514002935609956
Cost after iteration 200: 0.02405337743999163
Cost after iteration 300: 0.016354408151412207
Cost after iteration 400: 0.012445770521974634
Cost after iteration 500: 0.010073981792906512
Testing the model
In [31]:
y_p_train = regressor.predict(X_train)
y_p_test = regressor.predict(X_test)

print(f"train accuracy: {100 - np.mean(np.abs(y_p_train - y_train)) * 100}%")
print(f"test accuracy: {100 - np.mean(np.abs(y_p_test - y_test)) * 100}%")
train accuracy: 100.0%
test accuracy: 100.0%
▌3. Perceptron
The perceptron is a simple supervised machine learning algorithm and one of the earliest neural network architectures. It was introduced by Rosenblatt in the late 1950s. The perceptron is a binary linear classifier that maps a set of training examples (d-dimensional input vectors) to binary output values using a separating hyperplane. It works as follows.
Given: a training set of $m$ examples $\{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{m}$, where each $\mathbf{x}^{(i)}$ is a $d$-dimensional feature vector and each target $y^{(i)} \in \{0, 1\}$.
The perceptron can be understood as a very simple neural network: a single output unit with a step-function activation and no hidden layer.
The perceptron can be trained with gradient descent. The training algorithm has several steps. First (step 0), the model parameters are initialized. The other steps are then repeated until a specified number of iterations is reached or the parameters converge.
Step 0: Initialize the weight vector and bias with zeros (or small random values).
Step 1: Compute the linear combination of the input features and the weights. This can be done for all training examples at once using vectorization and broadcasting:
$\mathbf{a} = X \cdot \mathbf{w} + b$
where $X$ is the $m \times d$ matrix holding all training examples and $\cdot$ denotes the dot product.
Step 2: Apply the Heaviside step function as the activation function, which returns a binary value:
$\hat{y} = 1$ if $a \geq 0$, and $\hat{y} = 0$ otherwise.
Step 3: Compute the weight and bias updates using the perceptron learning rule:
$\Delta \mathbf{w} = \eta\, X^T (\mathbf{y} - \hat{\mathbf{y}}), \qquad \Delta b = \eta \sum_{i=1}^{m} \big(y^{(i)} - \hat{y}^{(i)}\big)$
where $\eta$ denotes the learning rate.
Step 4: Update the weight vector and bias: $\mathbf{w} := \mathbf{w} + \Delta\mathbf{w}$, $b := b + \Delta b$.
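As a small illustration (a sketch with assumed variable names, not code from the original notes), here is the perceptron rule applied to a single example. Note that the parameters only change when the example is misclassified, because y - y_hat is zero otherwise:

import numpy as np

def perceptron_update(w, b, x, y, learning_rate=0.05):
    # Predict with the Heaviside step function
    y_hat = 1 if (np.dot(x, w) + b) >= 0 else 0
    # Perceptron learning rule: nothing changes if the prediction was correct
    w = w + learning_rate * (y - y_hat) * x
    b = b + learning_rate * (y - y_hat)
    return w, b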
In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
np.random.seed(123)

%matplotlib inline
Dataset
In [2]:
X, y = make_blobs(n_samples=1000, centers=2)

fig = plt.figure(figsize=(8,6))
plt.scatter(X[:,0], X[:,1], c=y)
plt.title("Dataset")
plt.xlabel("First feature")
plt.ylabel("Second feature")
plt.show()
In [3]:
y_true = y[:, np.newaxis]

X_train, X_test, y_train, y_test = train_test_split(X, y_true)
print(f'Shape X_train: {X_train.shape}')
print(f'Shape y_train: {y_train.shape}')
print(f'Shape X_test: {X_test.shape}')
print(f'Shape y_test: {y_test.shape}')
Shape X_train: (750, 2)
Shape y_train: (750, 1)
Shape X_test: (250, 2)
Shape y_test: (250, 1)
Perceptron Class
In [6]:
class Perceptron():

    def __init__(self):
        pass

    def train(self, X, y, learning_rate=0.05, n_iters=100):
        n_samples, n_features = X.shape

        # Step 0: Initialize the parameters
        self.weights = np.zeros((n_features, 1))
        self.bias = 0

        for i in range(n_iters):
            # Step 1: Compute the activation
            a = np.dot(X, self.weights) + self.bias

            # Step 2: Compute the output
            y_predict = self.step_function(a)

            # Step 3: Compute weight updates
            delta_w = learning_rate * np.dot(X.T, (y - y_predict))
            delta_b = learning_rate * np.sum(y - y_predict)

            # Step 4: Update the parameters
            self.weights += delta_w
            self.bias += delta_b

        return self.weights, self.bias

    def step_function(self, x):
        return np.array([1 if elem >= 0 else 0 for elem in x])[:, np.newaxis]

    def predict(self, X):
        a = np.dot(X, self.weights) + self.bias
        return self.step_function(a)
Initializing and training the model
In [7]:
p = Perceptron()
w_trained, b_trained = p.train(X_train, y_train, learning_rate=0.05, n_iters=500)
Testing
In [10]:
y_p_train = p.predict(X_train)
y_p_test = p.predict(X_test)

print(f"training accuracy: {100 - np.mean(np.abs(y_p_train - y_train)) * 100}%")
print(f"test accuracy: {100 - np.mean(np.abs(y_p_test - y_test)) * 100}%")
training accuracy: 100.0%
test accuracy: 100.0%
Visualizing the decision boundary
In [13]:
def plot_hyperplane(X, y, weights, bias):
    """
    Plots the dataset and the estimated decision hyperplane
    """
    slope = - weights[0]/weights[1]
    intercept = - bias/weights[1]
    x_hyperplane = np.linspace(-10, 10, 10)
    y_hyperplane = slope * x_hyperplane + intercept

    fig = plt.figure(figsize=(8,6))
    plt.scatter(X[:,0], X[:,1], c=y)
    plt.plot(x_hyperplane, y_hyperplane, '-')
    plt.title("Dataset and fitted decision hyperplane")
    plt.xlabel("First feature")
    plt.ylabel("Second feature")
    plt.show()
In [14]:
plot_hyperplane(X, y, w_trained, b_trained)
▌4. k-Nearest Neighbors
The k-NN algorithm is a simple supervised machine learning algorithm that can be used to solve both classification and regression problems. It is an instance-based algorithm: instead of estimating a model, it stores all training examples in memory and makes predictions using a similarity measure.
Given an input example, the k-NN algorithm retrieves the k most similar instances from memory. Similarity is defined in terms of distance; that is, the training examples with the smallest (Euclidean) distance to the input example are considered the most similar.
The target value of the input example is then computed as follows:
Classification:
a) unweighted: output the most common class among the k nearest neighbors
b) weighted: sum up the weights of the k nearest neighbors for each class and output the class with the highest total weight
Regression:
a) unweighted: output the average of the target values of the k nearest neighbors
b) weighted: compute the weighted sum of the k nearest neighbors' target values and divide the result by the sum of all weights
The weighted version of k-NN is a refined variant in which the contribution of each neighbor is weighted according to its distance to the query point. Below, we use the plain (unweighted) version of k-NN to classify the digits dataset from sklearn; a small sketch of the weighted idea follows this paragraph.
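As a rough sketch of the weighted variant described above (the helper below is hypothetical, uses inverse-distance weights, and is not part of the original notes):

import numpy as np

def weighted_knn_predict(X_train, y_train, x, k=5, eps=1e-8):
    # Euclidean distances between the query point and all training examples
    dists = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
    knn = np.argsort(dists)[:k]
    # Weight each neighbor by the inverse of its distance to the query point
    weights = 1.0 / (dists[knn] + eps)
    votes = {}
    for label, weight in zip(y_train[knn], weights):
        votes[label] = votes.get(label, 0.0) + weight
    # Return the class with the largest accumulated weight
    return max(votes, key=votes.get)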
In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
np.random.seed(123)

%matplotlib inline
Dataset
In [2]:
# We will use the digits dataset as an example. It consists of the 1797 images of hand-written digits.
# Each digit is represented by a 64-dimensional vector of pixel values.
digits = load_digits()
X, y = digits.data, digits.target

X_train, X_test, y_train, y_test = train_test_split(X, y)
print(f'X_train shape: {X_train.shape}')
print(f'y_train shape: {y_train.shape}')
print(f'X_test shape: {X_test.shape}')
print(f'y_test shape: {y_test.shape}')

# Example digits
fig = plt.figure(figsize=(10,8))
for i in range(10):
    ax = fig.add_subplot(2, 5, i+1)
    plt.imshow(X[i].reshape((8,8)), cmap='gray')
X_train shape: (1347, 64)
y_train shape: (1347,)
X_test shape: (450, 64)
y_test shape: (450,)
kNN Class
In [3]:
class kNN():

    def __init__(self):
        pass

    def fit(self, X, y):
        self.data = X
        self.targets = y

    def euclidean_distance(self, X):
        """
        Computes the euclidean distance between the training data and
        a new input example or matrix of input examples X
        """
        # input: single data point
        if X.ndim == 1:
            l2 = np.sqrt(np.sum((self.data - X)**2, axis=1))

        # input: matrix of data points
        if X.ndim == 2:
            n_samples, _ = X.shape
            l2 = [np.sqrt(np.sum((self.data - X[i])**2, axis=1)) for i in range(n_samples)]

        return np.array(l2)

    def predict(self, X, k=1):
        """
        Predicts the classification for an input example or matrix of input examples X
        """
        # step 1: compute distance between input and training data
        dists = self.euclidean_distance(X)

        # step 2: find the k nearest neighbors and their classifications
        if X.ndim == 1:
            if k == 1:
                nn = np.argmin(dists)
                return self.targets[nn]
            else:
                knn = np.argsort(dists)[:k]
                y_knn = self.targets[knn]
                max_vote = max(y_knn, key=list(y_knn).count)
                return max_vote

        if X.ndim == 2:
            knn = np.argsort(dists)[:, :k]
            y_knn = self.targets[knn]
            if k == 1:
                return y_knn.T
            else:
                n_samples, _ = X.shape
                max_votes = [max(y_knn[i], key=list(y_knn[i]).count) for i in range(n_samples)]
                return max_votes
Initializing and training the model
In [11]:
knn = kNN()
knn.fit(X_train, y_train)

print("Testing one datapoint, k=1")
print(f"Predicted label: {knn.predict(X_test[0], k=1)}")
print(f"True label: {y_test[0]}")
print()
print("Testing one datapoint, k=5")
print(f"Predicted label: {knn.predict(X_test[20], k=5)}")
print(f"True label: {y_test[20]}")
print()
print("Testing 10 datapoint, k=1")
print(f"Predicted labels: {knn.predict(X_test[5:15], k=1)}")
print(f"True labels: {y_test[5:15]}")
print()
print("Testing 10 datapoint, k=4")
print(f"Predicted labels: {knn.predict(X_test[5:15], k=4)}")
print(f"True labels: {y_test[5:15]}")
print()
Testing one datapoint, k=1
Predicted label: 3
True label: 3
Testing one datapoint, k=5
Predicted label: 9
True label: 9
Testing 10 datapoint, k=1
Predicted labels: [[3 1 0 7 4 0 0 5 1 6]]
True labels: [3 1 0 7 4 0 0 5 1 6]
Testing 10 datapoint, k=4
Predicted labels: [3, 1, 0, 7, 4, 0, 0, 5, 1, 6]
True labels: [3 1 0 7 4 0 0 5 1 6]
Accuracy on the test set
In [12]:
# Compute accuracy on test set
y_p_test1 = knn.predict(X_test, k=1)
test_acc1 = np.sum(y_p_test1[0] == y_test)/len(y_p_test1[0]) * 100
print(f"Test accuracy with k = 1: {format(test_acc1)}")

y_p_test5 = knn.predict(X_test, k=5)
test_acc5 = np.sum(y_p_test5 == y_test)/len(y_p_test5) * 100
print(f"Test accuracy with k = 5: {format(test_acc5)}")
Test accuracy with k = 1: 97.77777777777777
Test accuracy with k = 5: 97.55555555555556
▌5. K-Means Clustering
K-Means is a very simple clustering algorithm (clustering belongs to unsupervised learning). Given a fixed number of clusters and an input dataset, the algorithm tries to partition the data into clusters such that each cluster has high intra-cluster similarity and low similarity to the other clusters.
How the algorithm works
1. Initialize the cluster centers, either randomly within the range of the input data or (recommended) using some of the existing training examples
2. Repeat until convergence: assign each data point to its closest cluster center, then recompute each center as the mean of the data points assigned to it
Objective function
The objective of the clustering algorithm is to find cluster centers such that, when each data point is assigned to its cluster, the distance between the data point and its closest cluster center is as small as possible.
Given a set of data points $x_1, \dots, x_n$ and a positive integer $k$, find the $k$ cluster centers $c_1, \dots, c_k$ that minimize the objective function:
$J = \sum_{i=1}^{n} \min_{j \in \{1, \dots, k\}} \lVert x_i - c_j \rVert^2$
where $\lVert \cdot \rVert$ denotes the Euclidean norm.
Drawbacks of the K-Means algorithm: the number of clusters k has to be chosen in advance, the result depends on the (random) initialization of the cluster centers, so the algorithm may converge to a poor local optimum, and it implicitly favors roughly spherical clusters of similar size.
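To illustrate the sensitivity to initialization (an illustration using scikit-learn, not part of the original notes), one can run K-Means several times, each with a single random initialization, and compare the final values of the objective:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X_demo, _ = make_blobs(centers=4, n_samples=1000, random_state=0)
for seed in range(3):
    km = KMeans(n_clusters=4, init='random', n_init=1, random_state=seed)
    km.fit(X_demo)
    # inertia_ is the sum of squared distances of points to their closest center
    print(f"seed {seed}: final objective (inertia) = {km.inertia_:.1f}")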
In [21]:
import numpy as np
import matplotlib.pyplot as plt
import random
from sklearn.datasets import make_blobs
np.random.seed(123)

%matplotlib inline
Dataset
In [22]:
X, y = make_blobs(centers=4, n_samples=1000)
print(f'Shape of dataset: {X.shape}')

fig = plt.figure(figsize=(8,6))
plt.scatter(X[:,0], X[:,1], c=y)
plt.title("Dataset with 4 clusters")
plt.xlabel("First feature")
plt.ylabel("Second feature")
plt.show()
Shape of dataset: (1000, 2)
KMeans Class
In [23]:
class KMeans():

    def __init__(self, n_clusters=4):
        self.k = n_clusters

    def fit(self, data):
        """
        Fits the k-means model to the given dataset
        """
        n_samples, _ = data.shape
        # initialize cluster centers
        self.centers = np.array(random.sample(list(data), self.k))
        self.initial_centers = np.copy(self.centers)

        # We will keep track of whether the assignment of data points
        # to the clusters has changed. If it stops changing, we are
        # done fitting the model
        old_assigns = None
        n_iters = 0

        while True:
            new_assigns = [self.classify(datapoint) for datapoint in data]

            if new_assigns == old_assigns:
                print(f"Training finished after {n_iters} iterations!")
                return

            old_assigns = new_assigns
            n_iters += 1

            # recalculate centers
            for id_ in range(self.k):
                points_idx = np.where(np.array(new_assigns) == id_)
                datapoints = data[points_idx]
                self.centers[id_] = datapoints.mean(axis=0)

    def l2_distance(self, datapoint):
        dists = np.sqrt(np.sum((self.centers - datapoint)**2, axis=1))
        return dists

    def classify(self, datapoint):
        """
        Given a datapoint, compute the cluster closest to the
        datapoint. Return the cluster ID of that cluster.
        """
        dists = self.l2_distance(datapoint)
        return np.argmin(dists)

    def plot_clusters(self, data):
        plt.figure(figsize=(12,10))
        plt.title("Initial centers in black, final centers in red")
        plt.scatter(data[:, 0], data[:, 1], marker='.', c=y)
        plt.scatter(self.centers[:, 0], self.centers[:,1], c='r')
        plt.scatter(self.initial_centers[:, 0], self.initial_centers[:,1], c='k')
        plt.show()
Initializing and fitting the model
kmeans = KMeans(n_clusters=4)
kmeans.fit(X)
Training finished after 4 iterations!
Plotting the initial and final cluster centers
kmeans.plot_clusters(X)
▌6. A Simple Neural Network
In this section we implement a simple neural network architecture that maps 2-dimensional input vectors to binary output values. Our network has 2 input neurons, one hidden layer with 6 hidden neurons, and 1 output neuron.
We represent the network architecture through the weight matrices between the layers. In the example below, the weight matrix between the input and hidden layer is denoted $W_h$, and the weight matrix between the hidden and output layer is denoted $W_o$. In addition to the weights connecting the neurons, every hidden and output neuron has a bias unit with input value 1.
Our training set consists of m = 750 examples, so our matrices have the following dimensions: X: (750, 2), Y: (750, 1), $W_h$: (2, 6), $b_h$: (1, 6), $W_o$: (6, 1), $b_o$: (1, 1).
We use the same loss function as in logistic regression:
$J(\mathbf{W}, \mathbf{b}) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log \hat{y}^{(i)} + \big(1 - y^{(i)}\big) \log\big(1 - \hat{y}^{(i)}\big) \Big]$
For multi-class classification tasks we would use a generalization of this function as the loss, called the categorical cross-entropy.
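For completeness (the standard form of that generalization, not spelled out in the original notes), the categorical cross-entropy for $K$ classes with one-hot targets is

$$J = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} y_k^{(i)} \log\big(\hat{y}_k^{(i)}\big),$$

which reduces to the binary form above for $K = 2$; the same loss reappears in the softmax regression section below.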
Training
We train the neural network with gradient descent and use backpropagation to compute the required partial derivatives. The training procedure has the following steps:
1. Initialize the parameters (i.e. the weights and biases)
2. Repeat until convergence: propagate the input forward through the network, compute the cost, propagate the errors backward through the network to obtain the gradients, and update the weights and biases
Forward pass
First, we compute the activation and output of every unit in the network. To speed up the implementation, we do not do this for each input example individually but for all examples at once, using vectorization.
The hidden neurons use the tanh function as their activation function:
$\tanh(a) = \frac{e^{a} - e^{-a}}{e^{a} + e^{-a}}$
The output neuron uses the sigmoid function as its activation function:
$\sigma(a) = \frac{1}{1 + e^{-a}}$
The activations and outputs are computed as follows ($\cdot$ denotes the dot product):
$A_h = X \cdot W_h + b_h, \quad O_h = \tanh(A_h), \quad A_o = O_h \cdot W_o + b_o, \quad O_o = \sigma(A_o)$
Backward pass
To compute the weight updates we need the partial derivatives of the loss function with respect to each unit. We will not derive these formulas here; much better explanations can be found elsewhere (https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/).
For the output neuron, the gradients are (in matrix notation):
$dA_o = O_o - Y, \quad dW_o = \frac{1}{m}\, O_h^T \cdot dA_o, \quad db_o = \frac{1}{m} \sum dA_o$
For the weight matrix between input and hidden layer, the gradients are:
$dA_h = (dA_o \cdot W_o^T) \odot (1 - O_h^2), \quad dW_h = \frac{1}{m}\, X^T \cdot dA_h, \quad db_h = \frac{1}{m} \sum dA_h$
Weight updates:
$W_o := W_o - \eta\, dW_o, \quad b_o := b_o - \eta\, db_o, \quad W_h := W_h - \eta\, dW_h, \quad b_h := b_h - \eta\, db_h$
where $\eta$ denotes the learning rate.
In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
np.random.seed(123)

%matplotlib inline
Dataset
In [4]:
X, y = make_circles(n_samples=1000, factor=0.5, noise=.1)

fig = plt.figure(figsize=(8,6))
plt.scatter(X[:,0], X[:,1], c=y)
plt.xlim([-1.5, 1.5])
plt.ylim([-1.5, 1.5])
plt.title("Dataset")
plt.xlabel("First feature")
plt.ylabel("Second feature")
plt.show()
In [5]:
# reshape targets to get column vector with shape (n_samples, 1)
y_true = y[:, np.newaxis]

# Split the data into a training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y_true)
print(f'Shape X_train: {X_train.shape}')
print(f'Shape y_train: {y_train.shape}')
print(f'Shape X_test: {X_test.shape}')
print(f'Shape y_test: {y_test.shape}')
Shape X_train: (750, 2)
Shape y_train: (750, 1)
Shape X_test: (250, 2)
Shape y_test: (250, 1)
Neural Network Class
Parts of the following implementation are inspired by Andrew Ng's course:
https://www.coursera.org/learn/neural-networks-deep-learning
class NeuralNet():

    def __init__(self, n_inputs, n_outputs, n_hidden):
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs
        self.hidden = n_hidden

        # Initialize weight matrices and bias vectors
        self.W_h = np.random.randn(self.n_inputs, self.hidden)
        self.b_h = np.zeros((1, self.hidden))
        self.W_o = np.random.randn(self.hidden, self.n_outputs)
        self.b_o = np.zeros((1, self.n_outputs))

    def sigmoid(self, a):
        return 1 / (1 + np.exp(-a))

    def forward_pass(self, X):
        """
        Propagates the given input X forward through the net.

        Returns:
            A_h: matrix with activations of all hidden neurons for all input examples
            O_h: matrix with outputs of all hidden neurons for all input examples
            A_o: matrix with activations of all output neurons for all input examples
            O_o: matrix with outputs of all output neurons for all input examples
        """
        # Compute activations and outputs of hidden units
        A_h = np.dot(X, self.W_h) + self.b_h
        O_h = np.tanh(A_h)

        # Compute activations and outputs of output units
        A_o = np.dot(O_h, self.W_o) + self.b_o
        O_o = self.sigmoid(A_o)

        outputs = {
            "A_h": A_h,
            "A_o": A_o,
            "O_h": O_h,
            "O_o": O_o,
        }

        return outputs

    def cost(self, y_true, y_predict, n_samples):
        """
        Computes and returns the cost over all examples
        """
        # same cost function as in logistic regression
        cost = (- 1 / n_samples) * np.sum(y_true * np.log(y_predict) + (1 - y_true) * (np.log(1 - y_predict)))
        cost = np.squeeze(cost)
        assert isinstance(cost, float)

        return cost

    def backward_pass(self, X, Y, n_samples, outputs):
        """
        Propagates the errors backward through the net.

        Returns:
            dW_h: partial derivatives of loss function w.r.t hidden weights
            db_h: partial derivatives of loss function w.r.t hidden bias
            dW_o: partial derivatives of loss function w.r.t output weights
            db_o: partial derivatives of loss function w.r.t output bias
        """
        dA_o = (outputs["O_o"] - Y)
        dW_o = (1 / n_samples) * np.dot(outputs["O_h"].T, dA_o)
        db_o = (1 / n_samples) * np.sum(dA_o)

        dA_h = (np.dot(dA_o, self.W_o.T)) * (1 - np.power(outputs["O_h"], 2))
        dW_h = (1 / n_samples) * np.dot(X.T, dA_h)
        db_h = (1 / n_samples) * np.sum(dA_h)

        gradients = {
            "dW_o": dW_o,
            "db_o": db_o,
            "dW_h": dW_h,
            "db_h": db_h,
        }

        return gradients

    def update_weights(self, gradients, eta):
        """
        Updates the model parameters using a fixed learning rate
        """
        self.W_o = self.W_o - eta * gradients["dW_o"]
        self.W_h = self.W_h - eta * gradients["dW_h"]
        self.b_o = self.b_o - eta * gradients["db_o"]
        self.b_h = self.b_h - eta * gradients["db_h"]

    def train(self, X, y, n_iters=500, eta=0.3):
        """
        Trains the neural net on the given input data
        """
        n_samples, _ = X.shape

        for i in range(n_iters):
            outputs = self.forward_pass(X)
            cost = self.cost(y, outputs["O_o"], n_samples=n_samples)
            gradients = self.backward_pass(X, y, n_samples, outputs)

            if i % 100 == 0:
                print(f'Cost at iteration {i}: {np.round(cost, 4)}')

            self.update_weights(gradients, eta)

    def predict(self, X):
        """
        Computes and returns network predictions for given dataset
        """
        outputs = self.forward_pass(X)
        y_pred = [1 if elem >= 0.5 else 0 for elem in outputs["O_o"]]

        return np.array(y_pred)[:, np.newaxis]
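A common sanity check for a hand-written backward pass (not part of the original notebook; the helper below is hypothetical and assumes the NeuralNet class defined above) is to compare one analytic gradient against a numerical finite-difference estimate:

def numerical_gradient_check(net, X, y, eps=1e-5):
    """Compares the analytic gradient of one hidden weight with a finite-difference estimate."""
    n_samples, _ = X.shape
    grads = net.backward_pass(X, y, n_samples, net.forward_pass(X))

    i, j = 0, 0                       # check a single entry of W_h
    original = net.W_h[i, j]
    net.W_h[i, j] = original + eps
    cost_plus = net.cost(y, net.forward_pass(X)["O_o"], n_samples)
    net.W_h[i, j] = original - eps
    cost_minus = net.cost(y, net.forward_pass(X)["O_o"], n_samples)
    net.W_h[i, j] = original          # restore the original weight

    numeric = (cost_plus - cost_minus) / (2 * eps)
    analytic = grads["dW_h"][i, j]
    print(f"numeric: {numeric:.8f}, analytic: {analytic:.8f}")

The two values should agree to several decimal places; a large discrepancy usually indicates a bug in the backward pass.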
Initializing and training the neural network
nn = NeuralNet(n_inputs=2, n_hidden=6, n_outputs=1)
print("Shape of weight matrices and bias vectors:")
print(f'W_h shape: {nn.W_h.shape}')
print(f'b_h shape: {nn.b_h.shape}')
print(f'W_o shape: {nn.W_o.shape}')
print(f'b_o shape: {nn.b_o.shape}')
print()

print("Training:")
nn.train(X_train, y_train, n_iters=2000, eta=0.7)
Shape of weight matrices and bias vectors:
W_h shape: (2, 6)
b_h shape: (1, 6)
W_o shape: (6, 1)
b_o shape: (1, 1)
Training:
Cost at iteration 0: 1.0872
Cost at iteration 100: 0.2723
Cost at iteration 200: 0.1712
Cost at iteration 300: 0.1386
Cost at iteration 400: 0.1208
Cost at iteration 500: 0.1084
Cost at iteration 600: 0.0986
Cost at iteration 700: 0.0907
Cost at iteration 800: 0.0841
Cost at iteration 900: 0.0785
Cost at iteration 1000: 0.0739
Cost at iteration 1100: 0.0699
Cost at iteration 1200: 0.0665
Cost at iteration 1300: 0.0635
Cost at iteration 1400: 0.061
Cost at iteration 1500: 0.0587
Cost at iteration 1600: 0.0566
Cost at iteration 1700: 0.0547
Cost at iteration 1800: 0.0531
Cost at iteration 1900: 0.0515
Testing the neural network
n_test_samples, _ = X_test.shape
y_predict = nn.predict(X_test)

print(f"Classification accuracy on test set: {(np.sum(y_predict == y_test)/n_test_samples)*100} %")
Classification accuracy on test set: 98.4 %
Visualizing the decision boundary
X_temp, y_temp = make_circles(n_samples=60000, noise=.5)
y_predict_temp = nn.predict(X_temp)
y_predict_temp = np.ravel(y_predict_temp)
fig = plt.figure(figsize=(8,12))

ax = fig.add_subplot(2,1,1)
plt.scatter(X[:,0], X[:,1], c=y)
plt.xlim([-1.5, 1.5])
plt.ylim([-1.5, 1.5])
plt.xlabel("First feature")
plt.ylabel("Second feature")
plt.title("Training and test set")

ax = fig.add_subplot(2,1,2)
plt.scatter(X_temp[:,0], X_temp[:,1], c=y_predict_temp)
plt.xlim([-1.5, 1.5])
plt.ylim([-1.5, 1.5])
plt.xlabel("First feature")
plt.ylabel("Second feature")
plt.title("Decision boundary")
Out[11]: Text(0.5,1,'Decision boundary')
▌7. Softmax Regression
Softmax regression is also known as multinomial or multi-class logistic regression.
Given: a training set of $m$ examples $\{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{m}$, where each target $y^{(i)}$ belongs to one of $K$ classes.
A softmax regression model outputs, for every input example, a probability distribution over the $K$ classes; for $K = 2$ it reduces to ordinary logistic regression.
Training a softmax regression model involves several steps. First (step 0), the model parameters are initialized. The other steps are then repeated until a specified number of iterations is reached or the parameters converge.
Step 0: Initialize the weight vectors and biases with zeros (or small random values).
Step 1: For each class k, compute the linear combination of the input features and the weight vector of that class; in other words, compute a score for every training example and every class. For class k and the input vector $\mathbf{x}^{(i)}$, the score is:
$a_k^{(i)} = \mathbf{w}_k^T \mathbf{x}^{(i)} + b_k$
where $\mathbf{w}_k$ denotes the weight vector of class k and $\mathbf{w}_k^T \mathbf{x}^{(i)}$ is their dot product.
We can compute the scores for all classes and all training examples at once using vectorization and broadcasting:
$A = X \cdot W^T + \mathbf{b}$
where $X$ is the $m \times d$ matrix holding all training examples and $W$ is the $K \times d$ matrix holding one weight vector per class.
Step 2: Apply the softmax function as the activation function to turn the scores into probabilities. The probability that an input vector belongs to class k is:
$\hat{p}_k = \frac{\exp(a_k)}{\sum_{j=1}^{K} \exp(a_j)}$
Again, we can use vectorization to compute the probabilities for all classes and all training examples at once. The class predicted by the model is the class with the highest probability.
Step 3: Compute the loss over the whole training set.
We want the model to predict a high probability for the target class and low probabilities for the other classes. This is achieved with the following cross-entropy loss:
$J(W, \mathbf{b}) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} y_k^{(i)} \log\big(\hat{p}_k^{(i)}\big)$
In this formula the target class labels are one-hot encoded: $y_k^{(i)} = 1$ if the target class of example i is k, and 0 otherwise.
Step 4: Compute the gradient of the loss function with respect to the weight vectors and biases.
A detailed explanation of this derivative can be found here (http://ufldl.stanford.edu/tutorial/supervised/SoftmaxRegression/).
The general form is:
$\nabla_{\mathbf{w}_k} J = \frac{1}{m} \sum_{i=1}^{m} \mathbf{x}^{(i)} \big(\hat{p}_k^{(i)} - y_k^{(i)}\big)$
For the derivative with respect to the bias, the corresponding input is simply 1.
Step 5: For every class k, update the weights and bias:
$\mathbf{w}_k := \mathbf{w}_k - \eta\, \nabla_{\mathbf{w}_k} J, \qquad b_k := b_k - \eta\, \frac{\partial J}{\partial b_k}$
where $\eta$ denotes the learning rate.
In [1]:
from sklearn.datasets import load_iris
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
np.random.seed(13)
Dataset
In [2]:
X, y_true = make_blobs(centers=4, n_samples=5000)

fig = plt.figure(figsize=(8,6))
plt.scatter(X[:,0], X[:,1], c=y_true)
plt.title("Dataset")
plt.xlabel("First feature")
plt.ylabel("Second feature")
plt.show()
In [3]:
# reshape targets to get column vector with shape (n_samples, 1)
y_true = y_true[:, np.newaxis]

# Split the data into a training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y_true)
print(f'Shape X_train: {X_train.shape}')
print(f'Shape y_train: {y_train.shape}')
print(f'Shape X_test: {X_test.shape}')
print(f'Shape y_test: {y_test.shape}')
Shape X_train: (3750, 2)
Shape y_train: (3750, 1)
Shape X_test: (1250, 2)
Shape y_test: (1250, 1)
Softmax Regression Class
class SoftmaxRegressor:

    def __init__(self):
        pass

    def train(self, X, y_true, n_classes, n_iters=10, learning_rate=0.1):
        """
        Trains a multinomial logistic regression model on given set of training data
        """
        self.n_samples, n_features = X.shape
        self.n_classes = n_classes

        self.weights = np.random.rand(self.n_classes, n_features)
        self.bias = np.zeros((1, self.n_classes))
        all_losses = []

        for i in range(n_iters):
            scores = self.compute_scores(X)
            probs = self.softmax(scores)
            y_predict = np.argmax(probs, axis=1)[:, np.newaxis]
            y_one_hot = self.one_hot(y_true)

            loss = self.cross_entropy(y_one_hot, probs)
            all_losses.append(loss)

            dw = (1 / self.n_samples) * np.dot(X.T, (probs - y_one_hot))
            db = (1 / self.n_samples) * np.sum(probs - y_one_hot, axis=0)

            self.weights = self.weights - learning_rate * dw.T
            self.bias = self.bias - learning_rate * db

            if i % 100 == 0:
                print(f'Iteration number: {i}, loss: {np.round(loss, 4)}')

        return self.weights, self.bias, all_losses

    def predict(self, X):
        """
        Predict class labels for samples in X.

        Args:
            X: numpy array of shape (n_samples, n_features)
        Returns:
            numpy array of shape (n_samples, 1) with predicted classes
        """
        scores = self.compute_scores(X)
        probs = self.softmax(scores)
        return np.argmax(probs, axis=1)[:, np.newaxis]

    def softmax(self, scores):
        """
        Transforms matrix of predicted scores to matrix of probabilities

        Args:
            scores: numpy array of shape (n_samples, n_classes) with unnormalized scores
        Returns:
            softmax: numpy array of shape (n_samples, n_classes) with probabilities
        """
        exp = np.exp(scores)
        sum_exp = np.sum(np.exp(scores), axis=1, keepdims=True)
        softmax = exp / sum_exp

        return softmax

    def compute_scores(self, X):
        """
        Computes class-scores for samples in X

        Args:
            X: numpy array of shape (n_samples, n_features)
        Returns:
            scores: numpy array of shape (n_samples, n_classes)
        """
        return np.dot(X, self.weights.T) + self.bias

    def cross_entropy(self, y_true, scores):
        loss = - (1 / self.n_samples) * np.sum(y_true * np.log(scores))
        return loss

    def one_hot(self, y):
        """
        Transforms vector y of labels to one-hot encoded matrix
        """
        one_hot = np.zeros((self.n_samples, self.n_classes))
        one_hot[np.arange(self.n_samples), y.T] = 1
        return one_hot
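A common numerical-stability refinement (not in the original implementation): subtracting the row-wise maximum score before exponentiating leaves the softmax output mathematically unchanged but avoids overflow when scores are large. A minimal sketch:

import numpy as np

def stable_softmax(scores):
    # Shift scores so the largest value per row is 0; np.exp can then never overflow
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)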
Initializing and training the model
regressor = SoftmaxRegressor()
w_trained, b_trained, loss = regressor.train(X_train, y_train, learning_rate=0.1, n_iters=800, n_classes=4)

fig = plt.figure(figsize=(8,6))
plt.plot(np.arange(800), loss)
plt.title("Development of loss during training")
plt.xlabel("Number of iterations")
plt.ylabel("Loss")
plt.show()

Iteration number: 0, loss: 1.393
Iteration number: 100, loss: 0.2051
Iteration number: 200, loss: 0.1605
Iteration number: 300, loss: 0.1371
Iteration number: 400, loss: 0.121