We seek to build a simple linear regression model: given some training data, we want to fit a model that describes it.
Definition
$$\hat{y} = h_\theta(x) = \theta_0 + \theta_1 x$$

Where:
- $\hat{y}$ is the predicted value
- $h_\theta$ is the hypothesis function
- $\theta_0, \theta_1$ are the model parameters (weights): the bias term $\theta_0$ and the feature weight $\theta_1$
- $x$ is the feature value (high temp in this case)
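For example, with hypothetical parameters $\theta_0 = 20$ and $\theta_1 = 1.5$, a high temp of $x = 30$ would give a prediction of $\hat{y} = 20 + 1.5 \cdot 30 = 65$.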
We also have to choose a loss function; in this case we typically use Mean Squared Error (MSE). Our objective when fitting a model is to minimize this cost function.
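As a minimal sketch of computing this cost (the toy temperatures and targets below are hypothetical, and `mse_cost` is just an illustrative helper, not part of the notes):

```python
import numpy as np

# Hypothetical training data: high temps (feature) and observed targets
x = np.array([20.0, 25.0, 30.0, 35.0])
y = np.array([50.0, 58.0, 64.0, 73.0])

def mse_cost(theta0, theta1, x, y):
    """Mean squared error of the hypothesis h(x) = theta0 + theta1 * x."""
    predictions = theta0 + theta1 * x   # h_theta(x) at every training point
    errors = predictions - y            # residuals
    return np.mean(errors ** 2)         # mean of the squared residuals

print(mse_cost(20.0, 1.5, x, y))        # cost for one candidate parameter pair
```

Fitting the model amounts to searching for the `theta0`, `theta1` pair that makes this value as small as possible.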
We then seek to make a prediction using a learned model:

$$\hat{y}_{\text{test}} = h_\theta(x_{\text{test}}) = \theta_0 + \theta_1 x_{\text{test}}$$
And then measure the test error using squared loss, $(\hat{y}_{\text{test}} - y_{\text{test}})^2$. Why squared loss? It is often not explained well, but we want to heavily penalize large errors, and squaring the error achieves this. There is a good video by Artem Kirsanov called "What Textbooks Don’t Tell You About Curve Fitting" on this topic.
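For example, a single residual of 10 contributes $10^2 = 100$ to the sum of squared errors, as much as one hundred residuals of 1, so a fit under squared loss is pulled strongly toward shrinking its largest errors; absolute loss would weight that same residual only ten times as heavily.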
In terms of our linear function equation, we are seeking to find $\theta_0$ and $\theta_1$ by minimizing MSE.
Linear Regression with Multiple Variables
We can extend this to generalize for any number of features $n$:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$
Where $x_j$ is the $j$-th feature. Our objective is to minimize the cost function, which is defined as:

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

where $m$ is the number of training examples and $(x^{(i)}, y^{(i)})$ is the $i$-th training pair.
Simplifying our definition (with the convention $x_0 = 1$, so the bias term folds into the sum):

$$h_\theta(x) = \sum_{j=0}^{n} \theta_j x_j = \theta^T x$$
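A minimal sketch of this vectorized hypothesis (the parameter and feature values below are hypothetical):

```python
import numpy as np

theta = np.array([20.0, 1.5, -0.3])    # [theta_0, theta_1, theta_2]
features = np.array([30.0, 40.0])      # raw features x_1, x_2 (hypothetical)
x = np.concatenate(([1.0], features))  # prepend x_0 = 1 for the bias term

prediction = theta @ x                 # h_theta(x) = theta^T x
print(prediction)                      # 20 + 1.5*30 - 0.3*40 = 53.0
```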
We usually use Gradient Descent for higher-dimensional problems, but there is a closed-form solution, which we derive here for the two-parameter case:
Write the $x$s, $\theta$s and $y$s in vector form:

$$X = \begin{bmatrix} 1 & x^{(1)} \\ 1 & x^{(2)} \\ \vdots & \vdots \\ 1 & x^{(m)} \end{bmatrix}, \qquad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix}, \qquad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$
Then $h_\theta$ becomes a matrix-vector product, and we achieve $\hat{y}$:

$$\hat{y} = h_\theta(X) = X\theta$$
And our loss can finally be defined as:

$$J(\theta) = \frac{1}{m}(X\theta - y)^T(X\theta - y) = \frac{1}{m}\|X\theta - y\|^2$$
And our closed form solution for $\theta$:

$$\theta = (X^T X)^{-1} X^T y$$
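A minimal NumPy sketch of this closed-form fit (the toy data is hypothetical; we solve the linear system $X^T X \, \theta = X^T y$ rather than forming the inverse explicitly, which is the usual numerically safer route):

```python
import numpy as np

# Hypothetical training data: high temps and targets
x = np.array([20.0, 25.0, 30.0, 35.0])
y = np.array([50.0, 58.0, 64.0, 73.0])

# Design matrix: a column of ones (bias term) next to the feature column
X = np.column_stack([np.ones_like(x), x])

# Normal equation: theta = (X^T X)^{-1} X^T y,
# computed by solving (X^T X) theta = X^T y instead of inverting.
theta = np.linalg.solve(X.T @ X, X.T @ y)

print(theta)       # learned [theta_0, theta_1]
print(X @ theta)   # fitted values y_hat = X theta
```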
This is great, but there are a few things that make this a costly operation at large $n$ (many features). For one, taking the inverse of a matrix is a costly operation: at very high dimension this closed form solution is somewhere between $O(n^{2.4})$ and $O(n^3)$ (although, to be honest, it will probably still run essentially instantly here, since this is a two-parameter operation). Additionally, if $X^T X$ is not invertible (i.e. singular), then this method fails. In these cases, we usually use Gradient Descent, which we will discuss next class.