function A = warmUpExercise()
%WARMUPEXERCISE Example function in octave
% A = WARMUPEXERCISE() is an example function that returns the 5x5 identity matrix
A = [];
% ============= YOUR CODE HERE ==============
Just return the 5x5 identity matrix:
A = eye(5);
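As a quick sanity check from the Octave prompt (a minimal sketch, assuming the function is saved as warmUpExercise.m on the path):

% Minimal check: the returned matrix should equal eye(5).
A = warmUpExercise();
assert(isequal(A, eye(5)));
disp(A);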
function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
% J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
% parameter for linear regression to fit the data points in X and y
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
% ====================== YOUR CODE HERE ======================
From the cost-function formula:

J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2

translating it directly into code gives:
J = sum((X * theta - y) .^ 2)/(2 * m);
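As a quick check with a tiny made-up dataset (the numbers below are illustrative, not from the exercise data): with theta set to zeros, the cost is just the sum of the squared targets divided by 2m.

% Illustrative check of computeCost on a 3-example dataset.
X = [1 1; 1 2; 1 3];            % first column is the intercept term x0 = 1
y = [1; 2; 3];
theta = [0; 0];
J = computeCost(X, y, theta);   % (1^2 + 2^2 + 3^2) / (2*3) = 2.3333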
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
% theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
% taking num_iters gradient steps with learning rate alpha
% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
% ====================== YOUR CODE HERE ======================
The single-variable linear regression hypothesis is:

h_\theta(x)=\theta_0+\theta_1x

and the corresponding gradient-descent update is:

repeat until convergence {

\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)

}

Translating this into code:
H = X * theta - y;                                       % prediction errors, m x 1
theta(1) = theta(1) - alpha * (1/m) * sum(H .* X(:,1));  % simultaneous update: both steps
theta(2) = theta(2) - alpha * (1/m) * sum(H .* X(:,2));  % use the same H from the old theta
J_history(iter) = computeCost(X, y, theta);              % record the cost each iteration
end
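The two explicit updates above only cover the single-variable case (theta(1) and theta(2)). An equivalent vectorized update, sketched below, performs the same simultaneous step for any number of parameters; it is a variation, not the required implementation:

% Inside the loop body: vectorized equivalent of the two updates above.
H = X * theta - y;                         % prediction errors, m x 1
theta = theta - alpha * (1/m) * (X' * H);  % update every theta(j) at once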
function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X
% FEATURENORMALIZE(X) returns a normalized version of X where
% the mean value of each feature is 0 and the standard deviation
% is 1. This is often a good preprocessing step to do when
% working with learning algorithms.
% You need to set these values correctly
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));
% ====================== YOUR CODE HERE ======================
The function must implement feature scaling (mean 0, standard deviation 1)[^1].
To do this, first compute the mean and the standard deviation of the original matrix.
The mean can be computed with the mean() function:

mean (X) = SUM_i X(i) / N

and the standard deviation with the std() function:

std (X) = sqrt ( 1/(N-1) SUM_i (X(i) - mean(X))^2 )

With the mean and standard deviation known, the scaled features follow directly:
mu = mean(X);
sigma = std(X);
X_norm = (X - mu) ./ sigma;
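A short usage sketch (the feature values are made up for illustration): after normalization each column should have mean roughly 0 and standard deviation roughly 1.

% Illustrative call to featureNormalize on a 3x2 feature matrix.
X = [2104 3; 1600 3; 2400 4];
[X_norm, mu, sigma] = featureNormalize(X);
mean(X_norm)   % close to [0 0]
std(X_norm)    % close to [1 1]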
function J = computeCostMulti(X, y, theta)
%COMPUTECOSTMULTI Compute cost for linear regression with multiple variables
% J = COMPUTECOSTMULTI(X, y, theta) computes the cost of using theta as the
% parameter for linear regression to fit the data points in X and y
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
% ====================== YOUR CODE HERE ======================
Same as Computing Cost (for One Variable). From the multi-variable cost-function formula:

J(\theta_0,\theta_1,\dots,\theta_n)=\frac{1}{2m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2

we get:
J = sum((X * theta - y) .^ 2)/(2 * m);
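The same cost can also be written as an inner product of the error vector, avoiding sum entirely; this is an equivalent alternative rather than the required form:

% Equivalent vectorized cost using an inner product of the error vector.
E = X * theta - y;
J = (E' * E) / (2 * m);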
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
%GRADIENTDESCENTMULTI Performs gradient descent to learn theta
% theta = GRADIENTDESCENTMULTI(x, y, theta, alpha, num_iters) updates theta by
% taking num_iters gradient steps with learning rate alpha
% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
% ====================== YOUR CODE HERE ======================
With multiple variables, the hypothesis is:

h_\theta(x)=\theta_0x_0+\theta_1x_1+\dots+\theta_nx_n \quad (x_0=1)

or, in vectorized form:

h_\theta(x)=\theta^Tx

The cost function is:

J(\theta_0,\theta_1,\dots,\theta_n)=\frac{1}{2m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2

In multi-variable gradient descent we take the partial derivative with respect to every θ. The update has the form:

Repeat until convergence: {

\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)

}[^2]
H = X * theta;                                    % hypothesis values, m x 1
E = H - y;                                        % prediction errors
theta = theta - alpha * (1/m) * (X' * E);         % vectorized simultaneous update of all theta
J_history(iter) = computeCostMulti(X, y, theta);  % record the cost each iteration
end
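A typical driver for the multi-variable case might look like the sketch below; the file name ex1data2.txt, the learning rate, and the iteration count follow the exercise defaults, but treat them as assumptions here.

% Hypothetical driver: load data, normalize features, add the intercept, run descent.
data = load('ex1data2.txt');           % assumed data file from the exercise
X = data(:, 1:2);  y = data(:, 3);  m = length(y);
[X, mu, sigma] = featureNormalize(X);
X = [ones(m, 1) X];                    % prepend x0 = 1
alpha = 0.01;  num_iters = 400;
theta = zeros(3, 1);
[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);
plot(1:num_iters, J_history);          % the cost should decrease every iteration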
function [theta] = normalEqn(X, y)
%NORMALEQN Computes the closed-form solution to linear regression
% NORMALEQN(X,y) computes the closed-form solution to linear
% regression using the normal equations.
theta = zeros(size(X, 2), 1);
% ====================== YOUR CODE HERE ======================
From the normal equation:

\theta=(X^TX)^{-1}X^Ty

the answer follows directly:
theta = pinv(X' * X) * X' * y;
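The closed-form solution needs no feature scaling and no learning rate, so it can be run directly on the raw data. The sketch below, with an assumed data file and an illustrative query point, shows one way to use it.

% Hypothetical check: solve for theta directly and predict for one new example.
data = load('ex1data2.txt');           % assumed data file from the exercise
X = [ones(size(data, 1), 1) data(:, 1:2)];
y = data(:, 3);
theta = normalEqn(X, y);
price = [1 1650 3] * theta;            % e.g. predicted price for a 1650 sq-ft, 3-bedroom house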
[^1]: When the data cover the whole population, the standard deviation is \sqrt{\frac{\sum_{i=1}^N(x_i-\overline{x})^2}{N}}; when the data are a sample, it is \sqrt{\frac{\sum_{i=1}^N(x_i-\overline{x})^2}{N-1}}.
[^2]: Within one iteration, every θ must be updated simultaneously. For example, do not use the freshly updated θ1 to update θ2 in the same iteration; θ2 should be updated with the θ1 value from the previous iteration.