function A = warmUpExercise()
%WARMUPEXERCISE Example function in octave
% A = WARMUPEXERCISE() is an example function that returns the 5x5 identity matrix
A = [];
% ============= YOUR CODE HERE ==============
Just return the 5x5 identity matrix:
A = eye(5);
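As a quick sanity check from the Octave prompt (a minimal sketch, assuming the function is saved as warmUpExercise.m on the path):

% Minimal check: the returned matrix should equal eye(5).
A = warmUpExercise();
assert(isequal(A, eye(5)));
disp(A);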
function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
% J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
% parameter for linear regression to fit the data points in X and y
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
% ====================== YOUR CODE HERE ======================
From the cost-function formula:

J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2

translating it directly into code gives:
J = sum((X * theta - y) .^ 2)/(2 * m);
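As a quick check with a tiny made-up dataset (the numbers below are illustrative, not from the exercise data): with theta set to zeros, the cost is just the sum of the squared targets divided by 2m.

% Illustrative check of computeCost on a 3-example dataset.
X = [1 1; 1 2; 1 3];            % first column is the intercept term x0 = 1
y = [1; 2; 3];
theta = [0; 0];
J = computeCost(X, y, theta);   % (1^2 + 2^2 + 3^2) / (2*3) = 2.3333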
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
% theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
% taking num_iters gradient steps with learning rate alpha
% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
% ====================== YOUR CODE HERE ======================
The single-variable linear regression hypothesis is:

h_\theta(x)=\theta_0+\theta_1x

and the corresponding gradient-descent update is:

repeat until convergence {

\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)

}

Translating this into code:
H = X * theta - y;                                       % prediction errors, m x 1
theta(1) = theta(1) - alpha * (1/m) * sum(H .* X(:,1));  % simultaneous update: both steps
theta(2) = theta(2) - alpha * (1/m) * sum(H .* X(:,2));  % use the same H from the old theta
J_history(iter) = computeCost(X, y, theta);              % record the cost each iteration
end
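The two explicit updates above only cover the single-variable case (theta(1) and theta(2)). An equivalent vectorized update, sketched below, performs the same simultaneous step for any number of parameters; it is a variation, not the required implementation:

% Inside the loop body: vectorized equivalent of the two updates above.
H = X * theta - y;                         % prediction errors, m x 1
theta = theta - alpha * (1/m) * (X' * H);  % update every theta(j) at once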
function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X
% FEATURENORMALIZE(X) returns a normalized version of X where
% the mean value of each feature is 0 and the standard deviation
% is 1. This is often a good preprocessing step to do when
% working with learning algorithms.
% You need to set these values correctly
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));
% ====================== YOUR CODE HERE ======================
The function must implement feature scaling (mean 0, standard deviation 1)[^1].
To do this, first compute the mean and the standard deviation of the original matrix.
The mean can be computed with the mean() function:

mean (X) = SUM_i X(i) / N

and the standard deviation with the std() function:

std (X) = sqrt ( 1/(N-1) SUM_i (X(i) - mean(X))^2 )

With the mean and standard deviation known, the scaled features follow directly:
mu = mean(X);
sigma = std(X);
X_norm = (X - mu) ./ sigma;
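A short usage sketch (the feature values are made up for illustration): after normalization each column should have mean roughly 0 and standard deviation roughly 1.

% Illustrative call to featureNormalize on a 3x2 feature matrix.
X = [2104 3; 1600 3; 2400 4];
[X_norm, mu, sigma] = featureNormalize(X);
mean(X_norm)   % close to [0 0]
std(X_norm)    % close to [1 1]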
function J = computeCostMulti(X, y, theta)
%COMPUTECOSTMULTI Compute cost for linear regression with multiple variables
% J = COMPUTECOSTMULTI(X, y, theta) computes the cost of using theta as the
% parameter for linear regression to fit the data points in X and y
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
% ====================== YOUR CODE HERE ======================
Same as Computing Cost (for One Variable). From the multi-variable cost-function formula:

J(\theta_0,\theta_1,\dots,\theta_n)=\frac{1}{2m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2

we get:
J = sum((X * theta - y) .^ 2)/(2 * m);
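The same cost can also be written as an inner product of the error vector, avoiding sum entirely; this is an equivalent alternative rather than the required form:

% Equivalent vectorized cost using an inner product of the error vector.
E = X * theta - y;
J = (E' * E) / (2 * m);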
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
%GRADIENTDESCENTMULTI Performs gradient descent to learn theta
% theta = GRADIENTDESCENTMULTI(x, y, theta, alpha, num_iters) updates theta by
% taking num_iters gradient steps with learning rate alpha
% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
% ====================== YOUR CODE HERE ======================
With multiple variables, the hypothesis is:

h_\theta(x)=\theta_0x_0+\theta_1x_1+\dots+\theta_nx_n \quad (x_0=1)

or, in vectorized form:

h_\theta(x)=\theta^Tx

The cost function is:

J(\theta_0,\theta_1,\dots,\theta_n)=\frac{1}{2m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2

In multi-variable gradient descent we take the partial derivative with respect to every θ. The update has the form:

Repeat until convergence: {

\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)

}[^2]
H = X * theta;                                    % hypothesis values, m x 1
E = H - y;                                        % prediction errors
theta = theta - alpha * (1/m) * (X' * E);         % vectorized simultaneous update of all theta
J_history(iter) = computeCostMulti(X, y, theta);  % record the cost each iteration
end
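A typical driver for the multi-variable case might look like the sketch below; the file name ex1data2.txt, the learning rate, and the iteration count follow the exercise defaults, but treat them as assumptions here.

% Hypothetical driver: load data, normalize features, add the intercept, run descent.
data = load('ex1data2.txt');           % assumed data file from the exercise
X = data(:, 1:2);  y = data(:, 3);  m = length(y);
[X, mu, sigma] = featureNormalize(X);
X = [ones(m, 1) X];                    % prepend x0 = 1
alpha = 0.01;  num_iters = 400;
theta = zeros(3, 1);
[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);
plot(1:num_iters, J_history);          % the cost should decrease every iteration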
function [theta] = normalEqn(X, y)
%NORMALEQN Computes the closed-form solution to linear regression
% NORMALEQN(X,y) computes the closed-form solution to linear
% regression using the normal equations.
theta = zeros(size(X, 2), 1);
% ====================== YOUR CODE HERE ======================
From the normal equation:

\theta=(X^TX)^{-1}X^Ty

the answer follows directly:
theta = pinv(X' * X) * X' * y;
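The closed-form solution needs no feature scaling and no learning rate, so it can be run directly on the raw data. The sketch below, with an assumed data file and an illustrative query point, shows one way to use it.

% Hypothetical check: solve for theta directly and predict for one new example.
data = load('ex1data2.txt');           % assumed data file from the exercise
X = [ones(size(data, 1), 1) data(:, 1:2)];
y = data(:, 3);
theta = normalEqn(X, y);
price = [1 1650 3] * theta;            % e.g. predicted price for a 1650 sq-ft, 3-bedroom house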
[^1]: When the data cover the whole population, the standard deviation is \sqrt{\frac{\sum_{i=1}^N(x_i-\overline{x})^2}{N}}; when the data are a sample, it is \sqrt{\frac{\sum_{i=1}^N(x_i-\overline{x})^2}{N-1}}.
[^2]: Within one iteration, every θ must be updated simultaneously. For example, do not use the freshly updated θ1 to update θ2 in the same iteration; θ2 should be updated with the θ1 value from the previous iteration.