和KNN一样,使用SVM之前要做数据标准化处理,因为SVM算法涉及距离。
尺度不平衡的例子:

数据标准化之后:

准备数据

import numpy as np
import matplotlib.pyplot as plt

from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target

X = X[y<2,:2]
y = y[y<2]

plt.scatter(X[y==0,0],X[y==0,1], color='red')
plt.scatter(X[y==1,0],X[y==1,1], color='blue')
plt.show()

数据标准化

from sklearn.preprocessing import StandardScaler

standardScaler = StandardScaler()
standardScaler.fit(X)
X_standard = standardScaler.transform(X)

训练hard SVN模型

from sklearn.svm import LinearSVC

# Support Vector Classifier
svc = LinearSVC(C=1e9)  # C 越大,越hard
svc.fit(X_standard, y)

分类效果

def plot_decision_boundary(model, axis):
    x0, x1 = np.meshgrid(
        np.linspace(axis[0], axis[1], int((axis[1]-axis[0])*100)).reshape(-1,1),
        np.linspace(axis[2], axis[3], int((axis[3]-axis[2])*100)).reshape(-1,1)
    )
    X_new = np.c_[x0.ravel(), x1.ravel()]

    y_predict = model.predict(X_new)
    zz = y_predict.reshape(x0.shape)

    from matplotlib.colors import ListedColormap
    custom_cmap = ListedColormap(['#EF9A9A','#FFF59D','#90CAF9'])

    plt.contourf(x0, x1, zz, cmap=custom_cmap)

plot_decision_boundary(svc, axis=[-3,3,-3,3])
plt.scatter(X_standard[y==0,0],X_standard[y==0,1], color='red')
plt.scatter(X_standard[y==1,0],X_standard[y==1,1], color='blue')
plt.show()

训练soft SVN 模型

svc2 = LinearSVC(C=0.01)
svc2.fit(X_standard, y)

plot_decision_boundary(svc2, axis=[-3,3,-3,3])
plt.scatter(X_standard[y==0,0],X_standard[y==0,1], color='red')
plt.scatter(X_standard[y==1,0],X_standard[y==1,1], color='blue')
plt.show()

图中有一个点被错误地分类了,这是soft的效果

绘制margin

输入:svc.coef_
输出:array([[ 4.03240038, -2.50701084]])
样本中有两个特征,所以有2个系数,每个特征对应一个
输出是一个二维数组,因为sklearn提供的SVM算法可以处理多分类问题

输入:svc.intercept_
输出:array([0.92736326])

def plot_svc_decision_boundary(model, axis):
    x0, x1 = np.meshgrid(
        np.linspace(axis[0], axis[1], int((axis[1]-axis[0])*100)).reshape(-1,1),
        np.linspace(axis[2], axis[3], int((axis[3]-axis[2])*100)).reshape(-1,1)
    )
    X_new = np.c_[x0.ravel(), x1.ravel()]

    y_predict = model.predict(X_new)
    zz = y_predict.reshape(x0.shape)

    from matplotlib.colors import ListedColormap
    custom_cmap = ListedColormap(['#EF9A9A','#FFF59D','#90CAF9'])

    plt.contourf(x0, x1, zz, cmap=custom_cmap)

    # 取出model的系数,只取第0个决策边界
    w = model.coef_[0]
    b = model.intercept_[0]

    # 决策边界的直线方程:w0 * x0 + x1 * x1 + b = 0
    # 决策边界的斜率和截距 => x1 = -w0/w1 * x0 - b/w1
    plot_x = np.linspace(axis[0], axis[1], 200) # 绘制用的x
    up_y = -w[0]/w[1] * plot_x - b/w[1] + 1/w[1] # w0 * x0 + x1 * x1 + b = 1
    down_y = -w[0]/w[1] * plot_x - b/w[1] - 1/w[1] # w0 * x0 + x1 * x1 + b = -1
    # 过滤,防止y超出图像边界
    up_index = (up_y >= axis[2]) & (up_y <= axis[3])
    down_index = (down_y >= axis[2]) & (down_y <= axis[3])
    plt.plot(plot_x[up_index], up_y[up_index], color='black')
    plt.plot(plot_x[down_index], down_y[down_index], color='black')

svc的margin

plot_svc_decision_boundary(svc, axis=[-3,3,-3,3])
plt.scatter(X_standard[y==0,0],X_standard[y==0,1], color='red')
plt.scatter(X_standard[y==1,0],X_standard[y==1,1], color='blue')
plt.show()

svc2的margin

plot_svc_decision_boundary(svc2, axis=[-3,3,-3,3])
plt.scatter(X_standard[y==0,0],X_standard[y==0,1], color='red')
plt.scatter(X_standard[y==1,0],X_standard[y==1,1], color='blue')
plt.show()

Note 1:sklarn提供的SVM算法支持多分类,默认使用ovr算法 Note 2:sklarn提供的SVM算法支持正则化,默认使用L2范式