/**
 * This class implements a simple perceptron (i.e., a single layer neural
 * network).  It converges if the supplied training dataset is linearly
 * separable.
 *
 * @tparam LearnPolicy Options of SimpleWeightUpdate and GradientDescent.
 * @tparam WeightInitializationPolicy Option of ZeroInitialization and
 *      RandomInitialization.
 */
template<typename LearnPolicy = SimpleWeightUpdate,
         typename WeightInitializationPolicy = ZeroInitialization,
         typename MatType = arma::mat>
class Perceptron
{
 public:
  /**
   * Constructor: create the perceptron with the given number of classes and
   * initialize the weight matrix, but do not perform any training.  (Call the
   * Train() function to perform training.)
   *
   * @param numClasses Number of classes in the dataset.
   * @param dimensionality Dimensionality of the dataset.
   * @param maxIterations Maximum number of iterations for the perceptron
   *      learning algorithm.
   */
  Perceptron(const size_t numClasses = 0,
             const size_t dimensionality = 0,
             const size_t maxIterations = 1000);

  // ...
};

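The class template first fixes a few perceptron-wide policies: by default the learning policy is SimpleWeightUpdate and the weight initialization policy is ZeroInitialization.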



/**
 * Construct the perceptron with the given number of classes and maximum number
 * of iterations.
 */
template<
    typename LearnPolicy,
    typename WeightInitializationPolicy,
    typename MatType
>
Perceptron<LearnPolicy, WeightInitializationPolicy, MatType>::Perceptron(
    const size_t numClasses,
    const size_t dimensionality,
    const size_t maxIterations) :
    maxIterations(maxIterations)
{
  WeightInitializationPolicy wip;
  wip.Initialize(weights, biases, dimensionality, numClasses);
}

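As you can see, the constructor creates an instance of type WeightInitializationPolicy and then calls its Initialize method, passing in the weight matrix, the bias vector, the data dimensionality, and the number of classes.

According to the header, WeightInitializationPolicy defaults to ZeroInitialization, so let's look at how ZeroInitialization is implemented: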

/**
 * This class is used to initialize the matrix weightVectors to zero.
 */
class ZeroInitialization
{
 public:
  ZeroInitialization() { }

  inline static void Initialize(arma::mat& weights,
                                arma::vec& biases,
                                const size_t numFeatures,
                                const size_t numClasses)
  {
    weights.zeros(numFeatures, numClasses);
    biases.zeros(numClasses);
  }
}; // class ZeroInitialization

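Per the Armadillo documentation, zeros() resizes its object and fills it with zeros: weights becomes a (dimensionality × number of classes) matrix, and biases becomes a column vector with one element per class. Every element starts at zero. Why they have these shapes comes down to how this implementation's SimpleWeightUpdate works.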
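To make the policy idea concrete, here is a minimal sketch of the same pattern using only the standard library. The names ZeroInit and TinyModel and the std::vector aliases are hypothetical stand-ins for illustration; they are not part of mlpack or Armadillo.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for arma::mat / arma::vec, so the sketch needs only
// the standard library.
using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;  // Matrix[feature][class]

// Policy that mirrors the idea of ZeroInitialization: resize both outputs
// and fill them with zeros.
struct ZeroInit
{
  static void Initialize(Matrix& weights, Vector& biases,
                         const std::size_t numFeatures,
                         const std::size_t numClasses)
  {
    weights.assign(numFeatures, Vector(numClasses, 0.0));
    biases.assign(numClasses, 0.0);
  }
};

// Host class: the initialization strategy is chosen at compile time through
// a template parameter, the way Perceptron takes WeightInitializationPolicy.
template<typename InitPolicy = ZeroInit>
class TinyModel
{
 public:
  TinyModel(const std::size_t numClasses, const std::size_t dimensionality)
  {
    assert(numClasses > 0 && dimensionality > 0);
    InitPolicy policy;
    policy.Initialize(weights, biases, dimensionality, numClasses);
  }

  Matrix weights;  // dimensionality x numClasses
  Vector biases;   // one entry per class
};
```

With this sketch, TinyModel<> m(2, 3) would hold a 3 × 2 weight table and a 2-element bias vector, all zero. Swapping in a different policy type changes the starting values without touching the host class.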


  /**
   * Constructor: constructs the perceptron by building the weights matrix,
   * which is later used in classification.  The number of classes should be
   * specified separately, and the labels vector should contain values in the
   * range [0, numClasses - 1].  The data::NormalizeLabels() function can be
   * used if the labels vector does not contain values in the required range.
   *
   * @param data Input, training data.
   * @param labels Labels of dataset.
   * @param numClasses Number of classes in the dataset.
   * @param maxIterations Maximum number of iterations for the perceptron
   *      learning algorithm.
   */
  Perceptron(const MatType& data,
             const arma::Row<size_t>& labels,
             const size_t numClasses,
             const size_t maxIterations = 1000);


/**
 * Constructor - constructs the perceptron. Or rather, builds the weights
 * matrix, which is later used in classification.  It adds a bias input vector
 * of 1 to the input data to take care of the bias weights.
 *
 * @param data Input, training data.
 * @param labels Labels of dataset.
 * @param maxIterations Maximum number of iterations for the perceptron learning
 *      algorithm.
 */
template<
    typename LearnPolicy,
    typename WeightInitializationPolicy,
    typename MatType
>
Perceptron<LearnPolicy, WeightInitializationPolicy, MatType>::Perceptron(
    const MatType& data,
    const arma::Row<size_t>& labels,
    const size_t numClasses,
    const size_t maxIterations) :
    maxIterations(maxIterations)
{
  // Start training.
  Train(data, labels, numClasses);
}



  /**
   * Alternate constructor which copies parameters from an already initiated
   * perceptron.
   *
   * @param other The other initiated Perceptron object from which we copy the
   *       values from.
   * @param data The data on which to train this Perceptron object on.
   * @param labels The labels of data.
   * @param numClasses Number of classes in the data.
   * @param instanceWeights Weight vector to use while training. For boosting
   *      purposes.
   */
  Perceptron(const Perceptron& other,
             const MatType& data,
             const arma::Row<size_t>& labels,
             const size_t numClasses,
             const arma::rowvec& instanceWeights);


/**
 * Alternate constructor which copies parameters from an already initiated
 * perceptron.
 *
 * @param other The other initiated Perceptron object from which we copy the
 *      values from.
 * @param data The data on which to train this Perceptron object on.
 * @param instanceWeights Weight vector to use while training. For boosting
 *      purposes.
 * @param labels The labels of data.
 */
template<
    typename LearnPolicy,
    typename WeightInitializationPolicy,
    typename MatType
>
Perceptron<LearnPolicy, WeightInitializationPolicy, MatType>::Perceptron(
    const Perceptron& other,
    const MatType& data,
    const arma::Row<size_t>& labels,
    const size_t numClasses,
    const arma::rowvec& instanceWeights) :
    maxIterations(other.maxIterations)
{
  Train(data, labels, numClasses, instanceWeights);
}

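This works much like the previous constructor, except that training takes an extra instanceWeights argument. As the comment notes, this vector is used during training, so let's look at the training function next: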

/**
 * Training function.  It trains on trainData using the cost matrix
 * instanceWeights.
 *
 * @param data Data to train on.
 * @param labels Labels of data.
 * @param instanceWeights Cost matrix. Stores the cost of mispredicting
 *      instances.  This is useful for boosting.
 */
template<
    typename LearnPolicy,
    typename WeightInitializationPolicy,
    typename MatType
>
void Perceptron<LearnPolicy, WeightInitializationPolicy, MatType>::Train(
    const MatType& data,
    const arma::Row<size_t>& labels,
    const size_t numClasses,
    const arma::rowvec& instanceWeights)
{
  // Do we need to resize the weights?
  if (weights.n_elem != numClasses)
  {
    WeightInitializationPolicy wip;
    wip.Initialize(weights, biases, data.n_rows, numClasses);
  }

  size_t j, i = 0;
  bool converged = false;
  size_t tempLabel;
  arma::uword maxIndexRow = 0, maxIndexCol = 0;
  arma::mat tempLabelMat;

  LearnPolicy LP;

  const bool hasWeights = (instanceWeights.n_elem > 0);

  while ((i < maxIterations) && (!converged))
  {
    // This outer loop is for each iteration, and we use the 'converged'
    // variable for noting whether or not convergence has been reached.
    ++i;
    converged = true;

    // Now this inner loop is for going through the dataset in each iteration.
    for (j = 0; j < data.n_cols; ++j)
    {
      // Multiply for each variable and check whether the current weight vector
      // correctly classifies this.
      tempLabelMat = weights.t() * data.col(j) + biases;

      tempLabelMat.max(maxIndexRow, maxIndexCol);

      // Check whether prediction is correct.
      if (maxIndexRow != labels(0, j))
      {
        // Due to incorrect prediction, convergence set to false.
        converged = false;
        tempLabel = labels(0, j);

        // Send maxIndexRow for knowing which weight to update, send j to know
        // the value of the vector to update it with.  Send tempLabel to know
        // the correct class.
        if (hasWeights)
          LP.UpdateWeights(data.col(j), weights, biases, maxIndexRow, tempLabel,
              instanceWeights(j));
        else
          LP.UpdateWeights(data.col(j), weights, biases, maxIndexRow,
              tempLabel);
      }
    }
  }
}


The LearnPolicy used here is SimpleWeightUpdate:

/**
 * This class is used to update the weightVectors matrix according to the simple
 * update rule as discussed by Rosenblatt:
 *
 *  if a vector x has been incorrectly classified by a weight w,
 *  then w = w - x
 *  and  w'= w'+ x
 *
 *  where w' is the weight vector which correctly classifies x.
 */
namespace mlpack {
namespace perceptron {

class SimpleWeightUpdate
{
 public:
  /**
   * This function is called to update the weightVectors matrix.  It decreases
   * the weights of the incorrectly classified class while increasing the weight
   * of the correct class it should have been classified to.
   *
   * @tparam Type of vector (should be an Armadillo vector like arma::vec or
   *      arma::sp_vec or something similar).
   * @param trainingPoint Point that was misclassified.
   * @param weights Matrix of weights.
   * @param biases Vector of biases.
   * @param incorrectClass Index of class that the point was incorrectly
   *      classified as.
   * @param correctClass Index of the true class of the point.
   * @param instanceWeight Weight to be given to this particular point during
   *      training (this is useful for boosting).
   */
  template<typename VecType>
  void UpdateWeights(const VecType& trainingPoint,
                     arma::mat& weights,
                     arma::vec& biases,
                     const size_t incorrectClass,
                     const size_t correctClass,
                     const double instanceWeight = 1.0)
  {
    weights.col(incorrectClass) -= instanceWeight * trainingPoint;
    biases(incorrectClass) -= instanceWeight;

    weights.col(correctClass) += instanceWeight * trainingPoint;
    biases(correctClass) += instanceWeight;
  }
};

} // namespace perceptron
} // namespace mlpack

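The update follows the rule stated in the class comment: for a misclassified vector $x$, the weight vector of the wrongly predicted class becomes $w = w - x$, and the weight vector of the correct class becomes $w' = w' + x$.
The instanceWeights vector mentioned earlier acts as a per-point learning rate inside the update function; the default rate is 1.0.
Both the weight columns and the biases are updated in steps scaled by this instanceWeight.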
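The rule can be tried out in isolation with a small standard-library sketch. UpdateWeightsSketch and the std::vector aliases below are hypothetical names for illustration; the arithmetic follows the body of UpdateWeights shown above, with weights stored as weights[feature][class].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;  // weights[feature][class]

// Move the wrongly predicted class away from the point and the correct class
// towards it, both scaled by instanceWeight (1.0 when no weight is given).
void UpdateWeightsSketch(const Vector& point,
                         Matrix& weights,
                         Vector& biases,
                         const std::size_t incorrectClass,
                         const std::size_t correctClass,
                         const double instanceWeight = 1.0)
{
  assert(weights.size() == point.size());
  assert(incorrectClass < biases.size() && correctClass < biases.size());

  for (std::size_t f = 0; f < point.size(); ++f)
  {
    weights[f][incorrectClass] -= instanceWeight * point[f];
    weights[f][correctClass] += instanceWeight * point[f];
  }

  // The bias behaves like a weight attached to a constant input of 1.
  biases[incorrectClass] -= instanceWeight;
  biases[correctClass] += instanceWeight;
}
```

Starting from all zeros, misclassifying the point (3, 3) as class 0 when it belongs to class 1 leaves class 0 with weights (-3, -3) and bias -1, and class 1 with weights (3, 3) and bias 1. An instanceWeight of 0.5 would halve each of those steps.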


/**
 * Classification function. After training, use the weights matrix to classify
 * test, and put the predicted classes in predictedLabels.
 *
 * @param test Testing data or data to classify.
 * @param predictedLabels Vector to store the predicted classes after
 *      classifying test.
 */
template<
    typename LearnPolicy,
    typename WeightInitializationPolicy,
    typename MatType
>
void Perceptron<LearnPolicy, WeightInitializationPolicy, MatType>::Classify(
    const MatType& test,
    arma::Row<size_t>& predictedLabels)
{
  arma::vec tempLabelMat;
  arma::uword maxIndex = 0;

  // Could probably be faster if done in batch.
  for (size_t i = 0; i < test.n_cols; ++i)
  {
    tempLabelMat = weights.t() * test.col(i) + biases;
    tempLabelMat.max(maxIndex);
    predictedLabels(0, i) = maxIndex;
  }
}



As a demonstration, take Example 2.1 from 《统计学习方法》 (Statistical Learning Methods): the positive instances are $x_1=(3,3)^{\mathsf{T}}$ and $x_2=(4,3)^{\mathsf{T}}$, and the negative instance is $x_3=(1,1)^{\mathsf{T}}$.


#include <iostream>
#include <mlpack/core.hpp>
#include <mlpack/methods/perceptron/perceptron.hpp>

using namespace mlpack;
using namespace mlpack::perceptron;
using namespace arma;
using namespace std;

int main()
{
    // After loading, each column is one point and the last row holds the labels.
    mat dataset;
    mlpack::data::Load("../ml_test/data/my_data.csv", dataset);

    // Copy the label row out, then drop it so only the two features remain.
    Row<size_t> labels;
    labels = conv_to<decltype (labels)>::from(dataset.row(dataset.n_rows-1));
    dataset.shed_row(dataset.n_rows - 1);

    cout << "dataset:\n" << dataset << "labels:\n" << labels << endl;

    // 2 classes, 2 features, at most 10 passes over the data.
    Perceptron<> p(2, 2, 10);
    p.Train(dataset, labels, 2);
    cout << "weights:\n" << p.Weights() << endl;
    cout << "bias:\n" <<  p.Biases() << endl;

    return 0;
}

dataset:
3.0000 4.0000 1.0000
3.0000 3.0000 1.0000
labels:
1 1 0

weights:
-1.0000 1.0000
-1.0000 1.0000


Following the earlier explanation, the prediction for a point $(x_1, \ x_2)$ is:
$$
\begin{cases} 0, & x_1 + x_2 - 3 \leqslant 0 \\ 1, & \text{otherwise} \end{cases}
$$
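For readers who want the intermediate step, the rule can be derived from the per-class scores $s_c(x) = w_c^{\mathsf{T}} x + b_c$, where $w_c$ is column $c$ of the printed weights, so $w_0 = (-1,-1)^{\mathsf{T}}$ and $w_1 = (1,1)^{\mathsf{T}}$. The bias values do not appear in the output shown above; $b_0 = 3$ and $b_1 = -3$ are assumed here because they are what a hand trace of SimpleWeightUpdate on these three points gives. The sketch also assumes that a tie between the two scores goes to class 0.

```latex
\begin{aligned}
s_0(x) &= w_0^{\mathsf{T}} x + b_0 = -x_1 - x_2 + 3, \\
s_1(x) &= w_1^{\mathsf{T}} x + b_1 = x_1 + x_2 - 3, \\
s_1(x) > s_0(x) &\iff 2\,(x_1 + x_2 - 3) > 0 \iff x_1 + x_2 - 3 > 0 .
\end{aligned}
```

Class 1 is therefore predicted exactly when $x_1 + x_2 - 3 > 0$, and class 0 otherwise.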


mat test;
test << 5 << endr << 2 << endr;
Row<size_t> pred_labels(1);
p.Classify(test, pred_labels);
cout << "predicted labels: " << pred_labels[0] << endl;

predicted labels: 1
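To check the whole walk-through without installing mlpack, the training loop and the classifier can be re-implemented in a few lines of standard C++. This is only a sketch under stated assumptions: TinyPerceptron and TrainExample21 are hypothetical names, weights are stored per class (weights[c]) instead of as Armadillo columns, every instance weight is 1.0, and ties between scores go to the lowest class index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;

struct TinyPerceptron
{
  std::vector<Vector> weights;  // weights[c] is the weight vector of class c
  Vector biases;                // biases[c] is the bias of class c

  TinyPerceptron(const std::size_t numClasses, const std::size_t dim)
      : weights(numClasses, Vector(dim, 0.0)), biases(numClasses, 0.0) { }

  // Return the class with the highest score w_c . x + b_c.
  std::size_t Classify(const Vector& x) const
  {
    std::size_t best = 0;
    double bestScore = 0.0;
    for (std::size_t c = 0; c < weights.size(); ++c)
    {
      assert(weights[c].size() == x.size());
      double score = biases[c];
      for (std::size_t f = 0; f < x.size(); ++f)
        score += weights[c][f] * x[f];
      // Strict '>' keeps the lowest index on ties (assumption, see lead-in).
      if (c == 0 || score > bestScore)
      {
        best = c;
        bestScore = score;
      }
    }
    return best;
  }

  // Sweep the data until a full pass has no mistakes or maxIterations is hit.
  void Train(const std::vector<Vector>& points,
             const std::vector<std::size_t>& labels,
             const std::size_t maxIterations)
  {
    assert(points.size() == labels.size());
    std::size_t i = 0;
    bool converged = false;
    while (i < maxIterations && !converged)
    {
      ++i;
      converged = true;
      for (std::size_t j = 0; j < points.size(); ++j)
      {
        const std::size_t predicted = Classify(points[j]);
        if (predicted == labels[j])
          continue;
        converged = false;
        // Simple update rule: w -= x for the wrong class, w' += x for the
        // correct one, and the same unit step on the biases.
        for (std::size_t f = 0; f < points[j].size(); ++f)
        {
          weights[predicted][f] -= points[j][f];
          weights[labels[j]][f] += points[j][f];
        }
        biases[predicted] -= 1.0;
        biases[labels[j]] += 1.0;
      }
    }
  }
};

// The three points of Example 2.1: two positive (label 1), one negative (0).
TinyPerceptron TrainExample21()
{
  const std::vector<Vector> points = { {3.0, 3.0}, {4.0, 3.0}, {1.0, 1.0} };
  const std::vector<std::size_t> labels = { 1, 1, 0 };
  TinyPerceptron p(2, 2);
  p.Train(points, labels, 10);
  return p;
}
```

Under these assumptions, TrainExample21() ends with class 0 weights (-1, -1), class 1 weights (1, 1), and biases (3, -3), which matches the weights printed above, and Classify on the point (5, 2) returns 1.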





