Notes on pylearn2:
pylearn2 consists of three main parts: models, training algorithms, and datasets.
Model: stores the parameters. Many established models are implemented, such as RBMs, CNNs, and autoencoders; in particular, the models from the LISA lab's papers are all implemented.
Training algorithm: adjusts the parameters in the Model. It also provides other functionality, such as setting up a Monitor to track quantities during training (for example an accuracy curve). Several algorithms are already implemented as classes, such as SGD and BGD.
Dataset: the data used to train the algorithm. It is simply an interface between the raw data and the model, making the data format transparent to the model; since implementations differ and data come in many types, in principle any input format is supported. If the data is a matrix, use the DenseDesignMatrix class directly; data in NumPy or pickle format can be used as-is, and larger datasets are supported via HDF5. This module can also apply ZCA or PCA preprocessing to the data.
In practice, everything is driven by a configuration file whose main sections correspond to the three parts above; each section takes a number of parameters that are set according to the actual application.
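To make the three-part split concrete, here is a minimal plain-NumPy sketch of the same idea — a dataset as a dense design matrix, a "model" that is just a parameter container, and a "training algorithm" that updates it. This is an illustration of the concepts only, not the pylearn2 API; all names here are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Dataset": a dense design matrix, one example per row, one feature per column.
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])      # targets from a known linear map

# "Model": just a container for parameters.
w = np.zeros(3)

# "Training algorithm": gradient descent on mean squared error.
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(X)   # gradient of the squared-error cost
    w -= 0.1 * grad                     # update with learning rate 0.1

print(np.round(w, 3))                   # recovers roughly [1.0, -2.0, 0.5]
```

In pylearn2 these three roles are filled by configurable classes instead, wired together in the configuration file.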
import theano.tensor as T  # Theano's symbolic tensor module; many libraries build their data handling on it
import theano
###### pylearn2: a machine learning library; its development has since been discontinued.
import pylearn2.train  # the Train class
import pylearn2.models.mlp as p2_md_mlp  # multilayer perceptron
import pylearn2.datasets.dense_design_matrix
""" The DenseDesignMatrix class and related code. Functionality for representing data that can be described as a dense matrix (rather than a sparse matrix) with each row containing an example and each column corresponding to a different feature. DenseDesignMatrix also supports other "views" of the data, for example a dataset of images can be viewed either as a matrix of flattened images or as a stack of 2D multi-channel images. However, the images must all be the same size, so that each image may be mapped to a matrix row by the same transformation. """
import pylearn2.training_algorithms.sgd as p2_alg_sgd  # training_algorithms: the training algorithms
""" SGD = (Minibatch) Stochastic Gradient Descent. A TrainingAlgorithm that does stochastic gradient descent on minibatches of training examples.
"""
import pylearn2.training_algorithms.learning_rule
""" A module containing different learning rules for use with the SGD training algorithm.
A pylearn2 learning rule is an object which computes new parameter values given (1) a learning rate (2) current parameter values and (3) the current estimated gradient."""
import pylearn2.costs.mlp.dropout as p2_ct_mlp_dropout
""" Functionality for training with dropout.
Implements the dropout training technique described in "Improving neural networks by preventing co-adaptation of feature detectors" Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan R. Salakhutdinov arXiv 2012 This paper suggests including each unit with probability p during training, then multiplying the outgoing weights by p at the end of training. We instead include each unit with probability p and divide its state by p during training. Note that this means the initial weights should be multiplied by p relative to Hinton's. The SGD learning rate on the weights should also be scaled by p^2 (use W_lr_scale rather than adjusting the global learning rate, because the learning rate on the biases should not be adjusted). During training, each input to each layer is randomly included or excluded for each example. The probability of inclusion is independent for each input and each example. Each layer uses "default_input_include_prob" unless that layer's name appears as a key in input_include_probs, in which case the input inclusion probability is given by the corresponding value. Each feature is also multiplied by a scale factor. The scale factor for each layer's input scale is determined by the same scheme as the input probabilities.
"""
import pylearn2.termination_criteria as p2_termcri
""" Termination criteria used to determine when to stop running a training algorithm. """