
Model and Cost Function

Model Representation

In supervised learning, we have a dataset called the training set.

We feed our training set to a learning algorithm, which outputs a function h, our hypothesis function. h can then take in an x and output a predicted y.

Some Notation

\[ \begin{align*} x^{(i)} &&& \text{Input variable/feature, where }i\text{ is the index} \\ y^{(i)} &&& \text{Output/target variable, where }i\text{ is the index} \\ (x^{(i)}, y^{(i)}) &&& \text{A training example}\\ (x^{(i)}, y^{(i)});\ i = 1, \ldots, m &&& \text{Training set of }m\text{ examples}\\ X &&& \text{Space of input values}\\ Y &&& \text{Space of output values}\\ \end{align*} \]

For supervised learning, our goal is to learn a function \(h:X \rightarrow Y\) so that \(h(x)\) is a good predictor for the corresponding value of \(y\). h is the hypothesis.

How do we represent h?

\[ \begin{align*} h_\theta(x) &= \theta_0 + \theta_1x\\ h(x) && \text{Shorthand}\\ \end{align*} \]

The above is linear regression with one variable, or univariate linear regression.
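
As a minimal sketch, here is this hypothesis in Python with NumPy; the function and variable names are chosen just for illustration:

```python
import numpy as np

def h(x, theta0, theta1):
    """Univariate linear regression hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * np.asarray(x)

print(h([0.0, 1.0, 2.0], 1.0, 0.5))  # -> [1.  1.5 2. ]
```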

Cost Function

Let's say we have a training set and a hypothesis of \(h_\theta(x) = \theta_0+\theta_1x\). The \(\theta_i\)'s are parameters. Different values of the \(\theta_i\)'s give different hypotheses, for example:

Reminder - this is a regression problem; this approach is not for classification problems.

\[ \begin{align*} \theta_0&=1.5 & \theta_1&=0\\ \theta_0&=0 & \theta_1&=0.5\\ \theta_0&=1 & \theta_1&=0.5\\ \end{align*} \]
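
Plugging these parameter settings into the hypothesis (with made-up inputs, purely for illustration) gives three different lines:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])  # made-up inputs

# Each (theta0, theta1) pair above defines a different line:
for theta0, theta1 in [(1.5, 0.0), (0.0, 0.5), (1.0, 0.5)]:
    print(theta0, theta1, theta0 + theta1 * x)
# (1.5, 0)   -> a horizontal line at 1.5
# (0, 0.5)   -> a line through the origin with slope 0.5
# (1, 0.5)   -> slope 0.5 with intercept 1
```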

We want to choose values for these parameters so that \(h_\theta(x)\) is as close to \(y\) as possible for our training examples \((x,y)\). This is a minimization problem: we want to find \(\theta_0\) and \(\theta_1\) to minimize:

\[ \frac{1}{2m}\sum^m_{i=1}(h_\theta(x^{(i)}) - y^{(i)})^2\\ \]

By convention, we redefine our problem to minimize a cost function:

\[ J(\theta_0,\theta_1) = \frac{1}{2m}\sum^m_{i=1}(h_\theta(x^{(i)}) - y^{(i)})^2\\ \min_{\theta_0,\theta_1} J(\theta_0,\theta_1)\\ \]

To break this down:

\[ \begin{align*} J(\theta_0,\theta_1) &= \frac{1}{2m}\sum^m_{i=1}(h_\theta(x^{(i)}) - y^{(i)})^2\\ &= \frac{1}{2}\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)}) - y^{(i)})^2\\ &= \frac{1}{2}\frac{1}{m}\sum^m_{i=1}(\text{difference between predicted and actual})^2\\ &= \frac{1}{2}\frac{1}{m}\sum^m_{i=1}(\text{error})^2\\ &= \frac{1}{2}\frac{1}{m}(\text{sum of error squared})\\ &= \frac{1}{2}(\text{mean of error squared})\\ &= \frac{1}{2}\bar{x} & \bar{x}\text{ is the mean of the squares of errors}\\ \end{align*} \]

This cost function is called the squared error function, or mean squared error. The \(\frac{1}{2}\) is there to simplify the calculus: when we differentiate, the 2 from the square cancels the \(\frac{1}{2}\). It is probably the most commonly used cost function, and a reasonable one to try for linear regression.
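
Here is a minimal sketch of this cost function in Python, assuming the training set is given as two arrays x and y (names chosen for illustration):

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Squared error cost: J(theta0, theta1) = 1/(2m) * sum((h_theta(x_i) - y_i)^2)."""
    x, y = np.asarray(x), np.asarray(y)
    m = len(x)
    errors = theta0 + theta1 * x - y          # h_theta(x_i) - y_i for each example
    return np.sum(errors ** 2) / (2 * m)

print(cost(0.0, 1.0, [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0: this line fits perfectly
```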

To summarize:

\[ \begin{align*} \text{Hypothesis} &&& h_\theta(x)=\theta_0+\theta_1x\\ \text{Parameters} &&& \theta_0,\theta_1\\ \text{Cost Function} &&& J(\theta_0,\theta_1) = \frac{1}{2m}\sum^m_{i=1}(h_\theta(x^{(i)}) - y^{(i)})^2\\ \text{Goal} &&& \min_{\theta_0,\theta_1} J(\theta_0,\theta_1) \end{align*} \]
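
One crude way to pursue this goal is to sweep a grid of candidate parameters and keep the pair with the lowest cost; gradient descent (covered later) does this far more efficiently. The data and grid below are made up for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])  # made-up training inputs
y = np.array([1.0, 2.0, 3.0])  # made-up targets (here y = x, so the best line has slope 1)

def cost(theta0, theta1):
    errors = theta0 + theta1 * x - y
    return np.sum(errors ** 2) / (2 * len(x))

# Evaluate J over a coarse grid of candidate parameters and keep the best pair.
candidates = [(t0, t1) for t0 in np.linspace(-1, 2, 31) for t1 in np.linspace(-1, 2, 31)]
best = min(candidates, key=lambda t: cost(*t))
print(best, cost(*best))  # close to (0, 1) with cost near 0
```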

Let's take a closer look at the cost function \(J(\theta_0,\theta_1)\) and the hypothesis function \(h_\theta(x)\).

  • \(h_\theta(x)\) - For fixed \(\theta_0,\theta_1\), this is a function of \(x\).
  • \(J(\theta_0,\theta_1)\) - For a fixed training set, this is a function of the parameters \(\theta_0,\theta_1\).

We want to minimize the cost function. It is important to note that this is linear regression, so \(h_\theta(x)\) and \(J(\theta_0,\theta_1)\) are continuous functions.
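
To make the distinction concrete, the sketch below (with a made-up training set where \(y = x\)) holds \(\theta_0 = 0\) fixed and evaluates \(J\) at a few values of \(\theta_1\):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])  # made-up training set where y = x
y = np.array([1.0, 2.0, 3.0])

# Hold theta0 = 0 fixed and look at J as a function of theta1 alone.
for theta1 in [0.0, 0.5, 1.0, 1.5, 2.0]:
    errors = theta1 * x - y
    J = np.sum(errors ** 2) / (2 * len(x))
    print(theta1, round(J, 3))
# J is smallest at theta1 = 1, where the hypothesis passes through every training example.
```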