machine-learning-ex1-answer

岑明辉
2023-12-01


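I recently started Andrew Ng's Machine Learning course on Coursera. I am now in the second week and have finished the first programming assignment, so I am writing down my understanding of it here to consolidate what I have learned. I plan to keep writing these up, as a way of holding myself to it.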

Warm up exercise

Description

function A = warmUpExercise()
%WARMUPEXERCISE Example function in octave
%   A = WARMUPEXERCISE() is an example function that returns the 5x5 identity matrix
A = [];
% ============= YOUR CODE HERE ==============

Solution

Simply return the 5x5 identity matrix.

Code

A = eye(5);

Computing Cost (for One Variable)

Description

function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
% ====================== YOUR CODE HERE ======================

Solution

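From the cost function formula:
$$J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$$
translate it directly into code. Because X already contains the column of ones for the intercept, X * theta gives the prediction $h_\theta(x^{(i)})$ for every example at once.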

Code

J = sum((X * theta - y) .^ 2)/(2 * m);
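As a quick sanity check of the formula, here is a minimal sketch on a tiny data set. The data and the names theta_fit, theta_zero, J_fit and J_zero are made up for illustration and are not part of the assignment.

```octave
% Toy data (made up for illustration): three points on the line y = x.
X = [1 1; 1 2; 1 3];   % first column is the intercept term x0 = 1
y = [1; 2; 3];
m = length(y);         % number of training examples

theta_fit  = [0; 1];   % fits the toy data exactly
theta_zero = [0; 0];   % predicts 0 for every example

% Same expression as in the solution, evaluated for two parameter vectors.
J_fit  = sum((X * theta_fit  - y) .^ 2) / (2 * m);   % 0
J_zero = sum((X * theta_zero - y) .^ 2) / (2 * m);   % (1 + 4 + 9) / 6 = 14/6
```

A perfect fit gives a cost of zero, and the all-zero parameters give 14/6, which matches the formula worked out by hand.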

Gradient Descent (for One Variable)

Description

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha
% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
% ====================== YOUR CODE HERE ======================

Solution

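The single-variable linear regression model is:
$$h_\theta(x)=\theta_0+\theta_1x$$
The corresponding gradient descent algorithm is:

repeat until convergence {
$$\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)$$
}

Evaluating the partial derivative gives $\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$, with $x_0^{(i)}=1$. Translate this directly into code.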

Code

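% Prediction error for every example, computed once from the old theta
err = X * theta - y;
% Both updates read the same err, so theta(1) and theta(2) change simultaneously
theta(1) = theta(1) - alpha * (1/m) * sum(err .* X(:,1));
theta(2) = theta(2) - alpha * (1/m) * sum(err .* X(:,2));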
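To see the two update lines above at work inside the full loop, here is a minimal sketch on made-up data. The data, the learning rate, the iteration count and the variable name err are assumptions chosen for illustration, not values from the assignment.

```octave
% Toy data (made up): four points on the line y = 1 + 2x.
X = [ones(4, 1), [0; 1; 2; 3]];   % intercept column plus one feature
y = [1; 3; 5; 7];
m = length(y);

theta = zeros(2, 1);
alpha = 0.1;          % assumed learning rate
num_iters = 2000;     % assumed number of iterations
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
  % Error is computed once, from the theta of the previous iteration,
  % so both components are updated simultaneously.
  err = X * theta - y;
  theta(1) = theta(1) - alpha * (1/m) * sum(err .* X(:,1));
  theta(2) = theta(2) - alpha * (1/m) * sum(err .* X(:,2));

  % Record the cost after this step.
  J_history(iter) = sum((X * theta - y) .^ 2) / (2 * m);
end
```

On this data theta should approach [1; 2] and J_history should fall toward zero. Plotting J_history is a common way to check that the chosen alpha is not too large.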

Feature Normalization

Description

function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X
%   FEATURENORMALIZE(X) returns a normalized version of X where
%   the mean value of each feature is 0 and the standard deviation
%   is 1. This is often a good preprocessing step to do when
%   working with learning algorithms.
% You need to set these values correctly
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));
% ====================== YOUR CODE HERE ======================

Solution

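The function must implement feature scaling, so that each feature has mean 0 and standard deviation 1.

To do this, first compute the mean and the standard deviation of each column of the original matrix [1].

The mean can be computed with the mean() function:

mean (X) = SUM_i X(i) / N

The standard deviation can be computed with the std() function:

std (X) = sqrt ( 1/(N-1) SUM_i (X(i) - mean(X))^2 )

With the mean and standard deviation known, the scaled features follow: subtract the mean from each column, then divide by the standard deviation.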

Code

mu = mean(X);
sigma = std(X);
X_norm = (X - mu) ./ sigma;
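The following sketch checks the result on made-up data and shows one plausible reason the function returns mu and sigma: a new example would presumably be scaled with the same values before prediction. The data and the names x_new and x_new_norm are assumptions for illustration.

```octave
% Toy data (made up): 3 examples, 2 features on very different scales.
X = [1000 1; 2000 2; 3000 6];

mu = mean(X);      % column means: [2000 3]
sigma = std(X);    % column standard deviations, N-1 in the denominator

% Relies on automatic broadcasting of the 1x2 row vectors over the rows
% of X (available in current Octave; older versions may need bsxfun).
X_norm = (X - mu) ./ sigma;

% A new example is scaled with the mu and sigma of the training data,
% not with statistics of its own.
x_new = [1500 4];
x_new_norm = (x_new - mu) ./ sigma;
```

After scaling, mean(X_norm) should be zero and std(X_norm) should be one in every column, up to rounding.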

Computing Cost (for Multiple Variables)

Description

function J = computeCostMulti(X, y, theta)
%COMPUTECOSTMULTI Compute cost for linear regression with multiple variables
%   J = COMPUTECOSTMULTI(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
% ====================== YOUR CODE HERE ======================

Solution

Same as Computing Cost (for One Variable).
From the multi-variable cost function
$$J(\theta_0,\theta_1,\dots,\theta_n)=\frac{1}{2m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$$
the same vectorized expression follows.

Code

J = sum((X * theta - y) .^ 2)/(2 * m);
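The code is identical to the single-variable case because the sum of squares is just the inner product of the error vector with itself, whatever the number of columns in X. A short restatement, where the symbol $e$ for the error vector is introduced here for illustration only:

```latex
e = X\theta - y, \qquad
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m} e_i^2
          = \frac{1}{2m}\, e^T e
          = \frac{1}{2m}\,(X\theta - y)^T (X\theta - y)
```

In Octave this would correspond to (X * theta - y)' * (X * theta - y) / (2 * m), which should give the same value as the sum(... .^ 2) form.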

Gradient Descent (for Multiple Variables)

Description

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
%GRADIENTDESCENTMULTI Performs gradient descent to learn theta
%   theta = GRADIENTDESCENTMULTI(x, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha
% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
% ====================== YOUR CODE HERE ======================

Solution

With multiple variables, the hypothesis is
$$h_\theta(x)=\theta_0x_0+\theta_1x_1+\dots+\theta_nx_n \quad (x_0=1)$$
or, in vector form,
$$h_\theta(x)=\theta^Tx$$
The cost function is
$$J(\theta_0,\theta_1,\dots,\theta_n)=\frac{1}{2m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$$
In multi-variable gradient descent we take the partial derivative with respect to every $\theta_j$. It has the following form:
Repeat until convergence: {
$$\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)$$
} [2]

Code

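err = X * theta - y;          % m x 1 vector of prediction errors
grad = (1/m) * (X' * err);    % gradient of J with respect to theta
theta = theta - alpha * grad; % updates every component at once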
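The vectorized update above is the per-component rule $\theta_j:=\theta_j-\frac{\alpha}{m}\sum_i\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$ applied to all j at once. The sketch below compares it with an explicit loop and with a deliberately wrong sequential variant. The data and all variable names are made up for illustration.

```octave
% Toy data (made up): 3 examples, intercept column plus 2 features.
X = [1 2 1; 1 0 3; 1 1 1];
y = [2; 0; 3];
m = length(y);
alpha = 0.1;                 % assumed learning rate
theta = [0.5; -0.2; 0.1];    % arbitrary starting point

% Vectorized step: one matrix product covers every component.
theta_vec = theta - alpha * (1/m) * (X' * (X * theta - y));

% Per-component step, simultaneous: every component reads the old theta.
err = X * theta - y;
theta_loop = theta;
for k = 1:numel(theta)
  theta_loop(k) = theta(k) - alpha * (1/m) * sum(err .* X(:, k));
end

% Sequential variant (incorrect): the error is recomputed after each
% component changes, so later components see already-updated values.
theta_seq = theta;
for k = 1:numel(theta)
  err_seq = X * theta_seq - y;
  theta_seq(k) = theta_seq(k) - alpha * (1/m) * sum(err_seq .* X(:, k));
end
```

theta_vec and theta_loop should agree up to rounding, while theta_seq differs from both, which is the effect the simultaneous-update rule is meant to avoid.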

Normal Equations

Description

function [theta] = normalEqn(X, y)
%NORMALEQN Computes the closed-form solution to linear regression
%   NORMALEQN(X,y) computes the closed-form solution to linear
%   regression using the normal equations.
theta = zeros(size(X, 2), 1);
% ====================== YOUR CODE HERE ======================

Solution

From the normal equation:
$$\theta=(X^TX)^{-1}X^Ty$$
the answer follows directly.

Code

theta = pinv(X' * X) * X' * y;
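A minimal check of the closed-form solution on made-up data follows. The comparison with the backslash operator is an addition for illustration, not something the assignment asks for, and the names theta_pinv and theta_ls are hypothetical.

```octave
% Toy data (made up): four points on the line y = 1 + 2x.
X = [ones(4, 1), [0; 1; 2; 3]];
y = [1; 3; 5; 7];

% Normal equation as written in this section.
theta_pinv = pinv(X' * X) * X' * y;

% Octave's left division solves the same least-squares problem
% for an overdetermined system.
theta_ls = X \ y;
```

Both results should be close to [1; 2]. Using pinv rather than inv means the expression still returns a result when X' * X is singular, for example when two features are linearly dependent.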

  1. When the data cover the entire population, the standard deviation is $\sqrt{\frac{\sum_{i=1}^N(x_i-\overline{x})^2}{N}}$; when the data are a sample, it is $\sqrt{\frac{\sum_{i=1}^N(x_i-\overline{x})^2}{N-1}}$. The std() formula quoted above is the sample version.

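  2. Within one iteration, every θ must be updated simultaneously. For example, after updating θ1 you must not use the new θ1 to update θ2; use the θ1 produced by the previous iteration to update θ2 in the current one.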
