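A core goal of machine learning is to classify input data. For example, a trained classifier can take an image as input and predict whether it shows a dog or a cat.
Many methods can be used for classification, including support vector machines, logistic regression, and deep learning.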
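As a rough illustration of one of these methods, here is a minimal logistic regression sketch in plain Python. The toy data, the single centered feature, and the helper names (sigmoid, train_logistic, predict) are all assumptions made for this example. A real image classifier would use many features and a library implementation.

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=200):
    """Fit a weight w and bias b by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log loss: (prediction - label) * input
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return class 1 if the predicted probability is at least 0.5, else class 0."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

# Hypothetical toy data: one feature centered around zero,
# label 0 standing for "cat" and label 1 for "dog".
xs = [-3.0, -2.5, -2.0, -1.5, 1.5, 2.0, 2.5, 3.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(xs, ys)
print([predict(w, b, x) for x in xs])  # reproduces ys on this separable toy data
```

The two classes here are cleanly separated, so the learned boundary sits near zero and every training example is classified correctly. Real data is rarely this tidy.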
Suppose we have an SFrame dataset with 1024 rows and want to split it randomly into two parts of roughly 90% and 10%.
>>> import graphlab
>>> sf = graphlab.SFrame({'id': range(1024)})
>>> # Split the rows roughly 90%/10%; a fixed seed makes the split repeatable
>>> sf_train, sf_test = sf.random_split(.9, seed=5)
>>> print len(sf_train), len(sf_test)
922 102
What is a seed?
The seed is a number that controls whether a random number generator produces a new sequence of random numbers or repeats a particular one. If no seed is given, the generator produces a different sequence each time it is run. If a seed is given, the sequence is determined by the seed's value, so every run with the same seed produces the same sequence until the seed is changed.
Note: a seed repeats a random sequence only if the other settings (for example, how many numbers are drawn and from what range) stay constant across runs. For random_split, that means the same data and the same fraction.
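The effect of a seed can be sketched with Python's standard random module. The random_split function below is a hypothetical stand-in written for this note, not GraphLab's implementation. It assumes each row is assigned independently with the given probability, which is why the split is only approximately 90%/10%.

```python
import random

def random_split(rows, fraction, seed=None):
    """Send each row to the first list with probability `fraction`, else to the second."""
    rng = random.Random(seed)  # private generator; seed=None gives a different sequence per run
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second

rows = list(range(1024))
train_a, test_a = random_split(rows, 0.9, seed=5)
train_b, test_b = random_split(rows, 0.9, seed=5)  # same seed as above
train_c, test_c = random_split(rows, 0.9, seed=6)  # different seed

print(len(train_a), len(test_a))  # sizes are close to 90% / 10% of 1024
print(train_a == train_b)         # True: the same seed repeats the same split
print(train_a == train_c)         # a different seed almost certainly gives a different split
```

The exact sizes depend on the generator, so they will not necessarily match the 922 and 102 reported by GraphLab for the same seed.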
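Training set: sf_train, about 90% of the rows, used to fit the classifier.
Test set: sf_test, the remaining rows, held out to evaluate the classifier on data it has not seen.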
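How the two sets are used can be sketched as follows. The data, the threshold "classifier", and the accuracy helper are all made up for illustration. The point is only that the model is fitted on the training set and scored on the test set.

```python
import random

def accuracy(predict, examples):
    """Fraction of (feature, label) pairs that predict() labels correctly."""
    correct = sum(1 for x, y in examples if predict(x) == y)
    return correct / len(examples)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

rng = random.Random(5)

# Hypothetical data: one feature in [0, 1); the label is 1 when it exceeds 0.5.
examples = [(x, int(x > 0.5)) for x in (rng.random() for _ in range(1024))]

# Shuffle, then cut at 90%: this gives an exact split rather than an approximate one.
rng.shuffle(examples)
cut = int(0.9 * len(examples))
train, test = examples[:cut], examples[cut:]

# "Training": place the threshold midway between the two class means,
# using only the training set.
mean0 = mean(x for x, y in train if y == 0)
mean1 = mean(x for x, y in train if y == 1)
threshold = (mean0 + mean1) / 2

def predict(x):
    return int(x > threshold)

train_acc = accuracy(predict, train)
test_acc = accuracy(predict, test)
print(len(train), len(test))
print(train_acc, test_acc)
```

Reporting accuracy on the held-out test set gives a fairer estimate of how the classifier behaves on new data than the training accuracy does.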
http://suanfazu.com/t/graphlab-create-geng-jian-dan-geng-qiang-da-de-shen-du-xue-xi/275
http://bugra.github.io/work/notes/2014-04-06/graphs-databases-and-graphlab/
http://ju.outofmemory.cn/entry/85316