/**
* This class implements a simple perceptron (i.e., a single layer neural
* network). It converges if the supplied training dataset is linearly
* separable.
*
* @tparam LearnPolicy Options of SimpleWeightUpdate and GradientDescent.
* @tparam WeightInitializationPolicy Option of ZeroInitialization and
* RandomInitialization.
*/
template<typename LearnPolicy = SimpleWeightUpdate,
typename WeightInitializationPolicy = ZeroInitialization,
typename MatType = arma::mat>
class Perceptron
{
public:
/**
* Constructor: create the perceptron with the given number of classes and
* initialize the weight matrix, but do not perform any training. (Call the
* Train() function to perform training.)
*
* @param numClasses Number of classes in the dataset.
* @param dimensionality Dimensionality of the dataset.
* @param maxIterations Maximum number of iterations for the perceptron
* learning algorithm.
*/
Perceptron(const size_t numClasses = 0,
const size_t dimensionality = 0,
const size_t maxIterations = 1000);
First come the parameters of the perceptron as a whole: for example, the learning policy defaults to SimpleWeightUpdate and the weights are initialized with ZeroInitialization.
Then the first constructor: it takes three numbers, namely the number of classes in the dataset, the dimensionality of the dataset, and the maximum number of iterations.
Next, let's look at its concrete implementation:
/**
* Construct the perceptron with the given number of classes and maximum number
* of iterations.
*/
template<
typename LearnPolicy,
typename WeightInitializationPolicy,
typename MatType
>
Perceptron<LearnPolicy, WeightInitializationPolicy, MatType>::Perceptron(
const size_t numClasses,
const size_t dimensionality,
const size_t maxIterations) :
maxIterations(maxIterations)
{
WeightInitializationPolicy wip;
wip.Initialize(weights, biases, dimensionality, numClasses);
}
As you can see, it constructs an instance of WeightInitializationPolicy and calls that instance's Initialize method, passing it the weight matrix, the bias vector, the data dimensionality, and the number of classes.
According to the header, WeightInitializationPolicy defaults to ZeroInitialization, so let's take a look at how ZeroInitialization is implemented:
/**
* This class is used to initialize the matrix weightVectors to zero.
*/
class ZeroInitialization
{
public:
ZeroInitialization() { }
inline static void Initialize(arma::mat& weights,
arma::vec& biases,
const size_t numFeatures,
const size_t numClasses)
{
weights.zeros(numFeatures, numClasses);
biases.zeros(numClasses);
}
}; // class ZeroInitialization
Following the Armadillo documentation, this initializes weights as a (dimensionality x number of classes) matrix and biases as a column vector with one element per class, all filled with zeros. Why they have exactly these shapes is tied to the SimpleWeightUpdate policy used by this implementation.
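To make the shapes concrete, here is a minimal sketch (assuming the mlpack 3.x header layout used throughout this post; the 4-dimensional/3-class numbers are made up) that calls Initialize directly and prints the results:

#include <mlpack/core.hpp>
#include <mlpack/methods/perceptron/initialization_methods/zero_init.hpp>

int main()
{
  arma::mat weights;
  arma::vec biases;
  // 4-dimensional data with 3 classes: weights becomes a 4 x 3 matrix
  // (one column of weights per class) and biases a 3-element column vector,
  // all filled with zeros.
  mlpack::perceptron::ZeroInitialization::Initialize(weights, biases, 4, 3);
  weights.print("weights:");
  biases.print("biases:");
}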
Let's go through the remaining two constructors:
/**
* Constructor: constructs the perceptron by building the weights matrix,
* which is later used in classification. The number of classes should be
* specified separately, and the labels vector should contain values in the
* range [0, numClasses - 1]. The data::NormalizeLabels() function can be
* used if the labels vector does not contain values in the required range.
*
* @param data Input, training data.
* @param labels Labels of dataset.
* @param numClasses Number of classes in the dataset.
* @param maxIterations Maximum number of iterations for the perceptron
* learning algorithm.
*/
Perceptron(const MatType& data,
const arma::Row<size_t>& labels,
const size_t numClasses,
const size_t maxIterations = 1000);
Implementation:
/**
* Constructor - constructs the perceptron. Or rather, builds the weights
* matrix, which is later used in classification. It adds a bias input vector
* of 1 to the input data to take care of the bias weights.
*
* @param data Input, training data.
* @param labels Labels of dataset.
* @param maxIterations Maximum number of iterations for the perceptron learning
* algorithm.
*/
template<
typename LearnPolicy,
typename WeightInitializationPolicy,
typename MatType
>
Perceptron<LearnPolicy, WeightInitializationPolicy, MatType>::Perceptron(
const MatType& data,
const arma::Row<size_t>& labels,
const size_t numClasses,
const size_t maxIterations) :
maxIterations(maxIterations)
{
// Start training.
Train(data, labels, numClasses);
}
With this constructor you can pass in the training data, the labels, and the number of classes directly, and training starts right away on that data.
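As a quick, hedged usage sketch (the two points and their labels are made up purely for illustration; the headers and namespace follow mlpack 3.x as in the full example at the end of this post):

#include <mlpack/core.hpp>
#include <mlpack/methods/perceptron/perceptron.hpp>

using namespace mlpack::perceptron;

int main()
{
  // Two made-up 2-D points, one per column: (1, 1) is class 0, (3, 3) is class 1.
  arma::mat data = { { 1.0, 3.0 },
                     { 1.0, 3.0 } };
  arma::Row<size_t> labels = { 0, 1 };

  // Construct and train in one step: 2 classes, at most 1000 iterations.
  Perceptron<> p(data, labels, 2, 1000);
}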
There is also an alternate constructor that copies its parameters from an already-constructed perceptron:
/**
* Alternate constructor which copies parameters from an already initiated
* perceptron.
*
* @param other The other initiated Perceptron object from which we copy the
* values from.
* @param data The data on which to train this Perceptron object on.
* @param labels The labels of data.
* @param numClasses Number of classes in the data.
* @param instanceWeights Weight vector to use while training. For boosting
* purposes.
*/
Perceptron(const Perceptron& other,
const MatType& data,
const arma::Row<size_t>& labels,
const size_t numClasses,
const arma::rowvec& instanceWeights);
Implementation:
/**
* Alternate constructor which copies parameters from an already initiated
* perceptron.
*
* @param other The other initiated Perceptron object from which we copy the
* values from.
* @param data The data on which to train this Perceptron object on.
* @param instanceWeights Weight vector to use while training. For boosting
* purposes.
* @param labels The labels of data.
*/
template<
typename LearnPolicy,
typename WeightInitializationPolicy,
typename MatType
>
Perceptron<LearnPolicy, WeightInitializationPolicy, MatType>::Perceptron(
const Perceptron& other,
const MatType& data,
const arma::Row<size_t>& labels,
const size_t numClasses,
const arma::rowvec& instanceWeights) :
maxIterations(other.maxIterations)
{
Train(data, labels, numClasses, instanceWeights);
}
The idea is much the same as the previous constructor, except that an extra instanceWeights vector is passed to Train(); as the comment says, this vector is useful for boosting. A hedged sketch of how a boosting-style caller might invoke this constructor follows; after that, we'll look at the training function itself.
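The sketch below reuses the tiny made-up data from earlier and supplies uniform instance weights; the names (base, weak) and the uniform weighting are illustrative only, not something mlpack prescribes:

#include <mlpack/core.hpp>
#include <mlpack/methods/perceptron/perceptron.hpp>

using namespace mlpack::perceptron;

int main()
{
  arma::mat data = { { 1.0, 3.0 },
                     { 1.0, 3.0 } };
  arma::Row<size_t> labels = { 0, 1 };

  // An already-constructed perceptron whose maxIterations will be copied.
  Perceptron<> base(2, 2, 100);

  // Uniform instance weights, as a booster such as AdaBoost might supply.
  arma::rowvec instanceWeights(data.n_cols);
  instanceWeights.fill(1.0 / data.n_cols);

  Perceptron<> weak(base, data, labels, 2, instanceWeights);
}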
/**
* Training function. It trains on trainData using the cost matrix
* instanceWeights.
*
* @param data Data to train on.
* @param labels Labels of data.
* @param instanceWeights Cost matrix. Stores the cost of mispredicting
* instances. This is useful for boosting.
*/
template<
typename LearnPolicy,
typename WeightInitializationPolicy,
typename MatType
>
void Perceptron<LearnPolicy, WeightInitializationPolicy, MatType>::Train(
const MatType& data,
const arma::Row<size_t>& labels,
const size_t numClasses,
const arma::rowvec& instanceWeights)
{
// Do we need to resize the weights?
if (weights.n_elem != numClasses)
{
WeightInitializationPolicy wip;
wip.Initialize(weights, biases, data.n_rows, numClasses);
}
size_t j, i = 0;
bool converged = false;
size_t tempLabel;
arma::uword maxIndexRow = 0, maxIndexCol = 0;
arma::mat tempLabelMat;
LearnPolicy LP;
const bool hasWeights = (instanceWeights.n_elem > 0);
while ((i < maxIterations) && (!converged))
{
// This outer loop is for each iteration, and we use the 'converged'
// variable for noting whether or not convergence has been reached.
++i;
converged = true;
// Now this inner loop is for going through the dataset in each iteration.
for (j = 0; j < data.n_cols; ++j)
{
// Multiply for each variable and check whether the current weight vector
// correctly classifies this.
tempLabelMat = weights.t() * data.col(j) + biases;
tempLabelMat.max(maxIndexRow, maxIndexCol);
// Check whether prediction is correct.
if (maxIndexRow != labels(0, j))
{
// Due to incorrect prediction, convergence set to false.
converged = false;
tempLabel = labels(0, j);
// Send maxIndexRow for knowing which weight to update, send j to know
// the value of the vector to update it with. Send tempLabel to know
// the correct class.
if (hasWeights)
LP.UpdateWeights(data.col(j), weights, biases, maxIndexRow, tempLabel,
instanceWeights(j));
else
LP.UpdateWeights(data.col(j), weights, biases, maxIndexRow,
tempLabel);
}
}
}
}
It starts with some setup. Then, for each column of the data (note that mlpack's Load function automatically transposes the matrix it reads in, so each column is one data point), the current weights and biases are used to predict that point's class; if the point is misclassified, the data point, the weights, the biases, the index of the wrongly predicted class, and the index of the correct class are handed to the weight-update function.
That update function is SimpleWeightUpdate:
/**
* This class is used to update the weightVectors matrix according to the simple
* update rule as discussed by Rosenblatt:
*
* if a vector x has been incorrectly classified by a weight w,
* then w = w - x
* and w'= w'+ x
*
* where w' is the weight vector which correctly classifies x.
*/
namespace mlpack {
namespace perceptron {
class SimpleWeightUpdate
{
public:
/**
* This function is called to update the weightVectors matrix. It decreases
* the weights of the incorrectly classified class while increasing the weight
* of the correct class it should have been classified to.
*
* @tparam Type of vector (should be an Armadillo vector like arma::vec or
* arma::sp_vec or something similar).
* @param trainingPoint Point that was misclassified.
* @param weights Matrix of weights.
* @param biases Vector of biases.
* @param incorrectClass Index of class that the point was incorrectly
* classified as.
* @param correctClass Index of the true class of the point.
* @param instanceWeight Weight to be given to this particular point during
* training (this is useful for boosting).
*/
template<typename VecType>
void UpdateWeights(const VecType& trainingPoint,
arma::mat& weights,
arma::vec& biases,
const size_t incorrectClass,
const size_t correctClass,
const double instanceWeight = 1.0)
{
weights.col(incorrectClass) -= instanceWeight * trainingPoint;
biases(incorrectClass) -= instanceWeight;
weights.col(correctClass) += instanceWeight * trainingPoint;
biases(correctClass) += instanceWeight;
}
};
} // namespace perceptron
} // namespace mlpack
The update is exactly the rule stated at the top: for a misclassified vector $x$, the weight vector of the wrongly predicted class becomes $w = w - x$, while the weight vector $w'$ of the correct class becomes $w' = w' + x$.
The instanceWeights mentioned earlier serve as per-instance learning rates inside the update function; the default instanceWeight is 1.0, and both the weight columns and the biases are scaled by it when they are updated.
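As a small worked example (my own numbers, just to make the rule concrete): if the point $x = (3, 3)^{\mathsf{T}}$ has true class 1 but is predicted as class 0, then with the default instanceWeight of 1.0 the update is $w_0 \leftarrow w_0 - x$, $b_0 \leftarrow b_0 - 1$, $w_1 \leftarrow w_1 + x$, $b_1 \leftarrow b_1 + 1$; with an instanceWeight of 0.5, every one of those changes would be halved.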
Finally, the classification function:
/**
* Classification function. After training, use the weights matrix to classify
* test, and put the predicted classes in predictedLabels.
*
* @param test Testing data or data to classify.
* @param predictedLabels Vector to store the predicted classes after
* classifying test.
*/
template<
typename LearnPolicy,
typename WeightInitializationPolicy,
typename MatType
>
void Perceptron<LearnPolicy, WeightInitializationPolicy, MatType>::Classify(
const MatType& test,
arma::Row<size_t>& predictedLabels)
{
arma::vec tempLabelMat;
arma::uword maxIndex = 0;
// Could probably be faster if done in batch.
for (size_t i = 0; i < test.n_cols; ++i)
{
tempLabelMat = weights.t() * test.col(i) + biases;
tempLabelMat.max(maxIndex);
predictedLabels(0, i) = maxIndex;
}
}
Much like during training: multiply the transposed weight matrix by each column (each data point) of the test set, add the biases, and take the index of the largest element as the predicted class.
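The comment in the code notes that this could probably be faster if done in batch. A hedged sketch of what such a batched version might look like in plain Armadillo (this is not how mlpack implements it, and it assumes the same weights, biases, test, and predictedLabels variables as in Classify above):

// Score every test point at once: scores is numClasses x numPoints.
arma::mat scores = weights.t() * test;
scores.each_col() += biases;                 // add the bias to every column
// Column-wise argmax gives the predicted class for each point.
predictedLabels = arma::conv_to<arma::Row<size_t>>::from(arma::index_max(scores, 0));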
Let's demonstrate with Example 2.1 from 《统计学习方法》 (Statistical Learning Methods): the positive points are $x_1 = (3, 3)^{\mathsf{T}}$ and $x_2 = (4, 3)^{\mathsf{T}}$, and the negative point is $x_3 = (1, 1)^{\mathsf{T}}$.
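For reference, a my_data.csv that would reproduce the output below might look like this (my reconstruction from the printed dataset: since Load() transposes, each CSV row is one point, with its label in the last column):

3,3,1
4,3,1
1,1,0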
The dataset is tiny, so about 10 iterations is plenty:
#include <iostream>
#include <mlpack/core.hpp>
#include <mlpack/methods/perceptron/perceptron.hpp>
using namespace mlpack;
using namespace mlpack::perceptron;
using namespace arma;
using namespace std;
int main()
{
mat dataset;
mlpack::data::Load("../ml_test/data/my_data.csv", dataset);
Row<size_t> labels;
labels = conv_to<decltype (labels)>::from(dataset.row(dataset.n_rows-1));
dataset.shed_row(dataset.n_rows-1);
cout << "dataset:\n" << dataset << "labels:\n" << labels << endl;
Perceptron<> p(2, 2, 10);
p.Train(dataset, labels, 2);
cout << "weights:\n" << p.Weights() << endl;
cout << "bias:\n" << p.Biases() << endl;
}
Output:
dataset:
3.0000 4.0000 1.0000
3.0000 3.0000 1.0000
labels:
1 1 0
weights:
-1.0000 1.0000
-1.0000 1.0000
bias:
3.0000
-3.0000
Per the earlier explanation, for a point $(x_1,\ x_2)$ the learned class scores are $s_0 = -x_1 - x_2 + 3$ and $s_1 = x_1 + x_2 - 3$, so the prediction is

$$\begin{cases} 0\ , & x_1 + x_2 - 3 \leqslant 0 \\ 1\ , & \text{otherwise} \end{cases}$$
For example, continuing the program above:
mat test = { { 5.0 }, { 2.0 } };  // the point (5, 2) as a single column
Row<size_t> pred_labels(1);
p.Classify(test, pred_labels);
cout << "predicted labels: " << pred_labels[0] << endl;
Output:
predicted labels: 1
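This agrees with the decision rule derived above: $5 + 2 - 3 = 4 > 0$, so the point is assigned class 1.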