Model and Cost Function
Model Representation
In supervised learning, we have a dataset called a training set. We feed our training set to a learning algorithm, which then outputs a function \(h\), our hypothesis function. \(h\) can then take in \(x\) and output a predicted value \(y\).
Some Notation
For supervised learning, our goal is to learn a function \(h:X \rightarrow Y\) so that \(h(x)\) is a good predictor for the corresponding value of \(y\); \(h\) is called the hypothesis. We write \(m\) for the number of training examples and \((x^{(i)}, y^{(i)})\) for the \(i\)-th training example.
How do we represent \(h\)? For linear regression with a single input variable, we use
\[h_\theta(x) = \theta_0 + \theta_1 x.\]
This model is called linear regression with one variable, or univariate linear regression.
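As a minimal sketch (not from the original notes; the function name `hypothesis` and the use of NumPy are my own choices), the univariate hypothesis can be written as:

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    """Univariate linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * np.asarray(x, dtype=float)

# Example: with theta0 = 1 and theta1 = 2, h(3) = 1 + 2 * 3 = 7.
print(hypothesis(1.0, 2.0, 3.0))        # 7.0
print(hypothesis(1.0, 2.0, [0, 1, 2]))  # [1. 3. 5.]
```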
Cost Function
Let's say we have a training set and a hypothesis of the form \(h_\theta(x) = \theta_0+\theta_1x\). The \(\theta_i\)'s are parameters. Different values of the \(\theta_i\)'s give different hypotheses.
Reminder - this is a regression problem; this setup does not apply to classification problems.
We want to choose values for these parameters so that \(h_\theta(x)\) is as close as possible to \(y\) for our training examples \((x,y)\). This is a minimization problem: we want to find \(\theta_0\) and \(\theta_1\) to minimize
\[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2.\]
By convention, we redefine our problem as minimizing a cost function:
\[J(\theta_0,\theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2.\]
To break this down:
- \(h_\theta(x^{(i)})\) is the hypothesis' prediction for the \(i\)-th training example.
- \(y^{(i)}\) is the actual value for that example.
- \(m\) is the number of training examples, so the sum of squared errors is averaged over \(m\) (the extra \(\frac{1}{2}\) is a convenience factor explained below).
This cost function is called the squared error function or mean squared error. The \(\frac{1}{2}\) is there to simplify the calculus: when we differentiate the squared term, the factor of 2 from the square cancels the \(\frac{1}{2}\). It is probably the most commonly used cost function, and it is a reasonable choice for linear regression.
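As a hedged illustration (my own sketch, not part of the original notes; `compute_cost` is an assumed name), the cost function above maps directly to a few lines of NumPy:

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """Squared error cost J(theta0, theta1) with the 1/(2m) factor."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = len(y)                          # number of training examples
    predictions = theta0 + theta1 * x   # h_theta(x^(i)) for every example
    errors = predictions - y
    return np.sum(errors ** 2) / (2 * m)

# Example: the points (1, 2), (2, 4), (3, 6) lie on y = 2x,
# so theta0 = 0, theta1 = 2 gives zero cost.
print(compute_cost(0.0, 2.0, [1, 2, 3], [2, 4, 6]))  # 0.0
print(compute_cost(0.0, 1.0, [1, 2, 3], [2, 4, 6]))  # ~2.33
```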
To summarize:
Let's take a closer look at the cost function \(J(\theta_0,\theta_1)\) and the hypothesis function \(h_\theta(x)\).
- \(h_\theta(x)\) - For fixed \(\theta_0,\theta_1\), this is a function of \(x\).
- \(J(\theta_0,\theta_1)\) - For a fixed training set, this is a function of the parameters \(\theta_0,\theta_1\).
We want to minimize this cost function. It is important to note that because this is linear regression, \(h_\theta(x)\) is a linear function of \(x\) and \(J(\theta_0,\theta_1)\) is a continuous, bowl-shaped (convex) function of the parameters.
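To make this distinction concrete, here is a small sketch (my own illustration, under the same assumptions as the snippets above) that treats \(h_\theta(x)\) as a function of \(x\) for fixed parameters, and \(J(\theta_0,\theta_1)\) as a function of the parameters for a fixed training set:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

def cost(theta0, theta1):
    """J(theta0, theta1) for the fixed training set (x, y) above."""
    errors = theta0 + theta1 * x - y
    return np.sum(errors ** 2) / (2 * len(y))

# h_theta(x) for fixed theta0 = 0, theta1 = 2: a function of x.
print(0.0 + 2.0 * x)  # predictions [2. 4. 6.]

# J(theta0, theta1) as theta1 varies (theta0 held at 0):
for t1 in [0.0, 1.0, 2.0, 3.0]:
    print(t1, cost(0.0, t1))  # cost drops to 0 at t1 = 2, then rises again
```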