Source: View original notebook on GitHub
Category: Machine Learning / Learn ML
An example that results in underfitting
import numpy as np
import matplotlib.pyplot as plt
X = np.loadtxt('Datasets/weightedX.txt')
Y = np.loadtxt('Datasets/weightedY.txt')
plt.scatter(X,Y)
Output:
<matplotlib.collections.PathCollection at 0x10561830>
from sklearn.linear_model import LinearRegression
lr = LinearRegression(normalize=True)  # note: the normalize parameter was removed in scikit-learn 1.2
lr.fit(X.reshape(-1,1),Y.reshape(-1,1)) # the model expects at least 2D input
Output:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=True)
slope = lr.coef_
intercept = lr.intercept_
plt.scatter(X,Y)
Y_pred = (slope*X+intercept).flatten()
plt.plot(X,Y_pred,'k')
# the model is underfitting the data
Output:
[<matplotlib.lines.Line2D at 0x119d8170>]
Looking at the data, a model that produces a non-linear curve would fit it much better.
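The underfit can also be seen numerically through the R² score. A minimal sketch on synthetic quadratic data (the weightedX/weightedY files are not assumed to be available here, so the data below is generated for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_syn = rng.uniform(-3, 3, 200)              # synthetic stand-in for the dataset
Y_syn = X_syn**2 + rng.normal(0, 0.3, 200)   # clearly non-linear target

lin = LinearRegression()
lin.fit(X_syn.reshape(-1, 1), Y_syn)
# a straight line cannot capture the curvature, so R^2 stays low
print(lin.score(X_syn.reshape(-1, 1), Y_syn))
```

A low R² like this is one symptom of underfitting: the model is too simple for the shape of the data.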
Overcoming underfitting by increasing model complexity with extra features
- we will add an extra feature x2, which is the square of the feature already in X (call it x1).
- our hypothesis becomes h(x) = theta0 + theta1 * x1 + theta2 * x2
- which is effectively h(x) = theta0 + theta1 * x1 + theta2 * x1**2
- so we can fit a more complex curve while still using the Linear Regression model.
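The hypothesis above can also be sketched with plain NumPy least squares, which makes the design matrix explicit. The data and coefficient values below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.uniform(-3, 3, 200)
# true relation: 2.0 + 0.5*x1 + 1.5*x1**2, plus a little noise
y = 2.0 + 0.5 * x1 + 1.5 * x1**2 + rng.normal(0, 0.1, 200)

# design matrix for h(x) = theta0 + theta1*x1 + theta2*x1**2
A = np.column_stack((np.ones_like(x1), x1, x1**2))
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(theta)  # approximately [2.0, 0.5, 1.5]
```

The intercept column of ones plays the role of theta0, so a single `lstsq` call recovers all three parameters at once.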
X.shape
Output:
(100,)
X1 = X**2
X = np.column_stack((X,X1))
X.shape
Output:
(100, 2)
lr2 = LinearRegression(normalize=True)
lr2.fit(X,Y)
Output:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=True)
slope = lr2.coef_
intercept = lr2.intercept_
plt.scatter(X[:,0],Y)
Y_pred = slope[0]*X[:,0] + slope[1]*X[:,1] + intercept
plt.scatter(X[:,0] , Y_pred, c='r',label = 'predicted Boundary')
plt.legend()
Output:
<matplotlib.legend.Legend at 0x132b1f70>
# we can fit the model further by adding more features, e.g. a cubic term, and so on
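One way to add those higher-order terms without building the columns by hand is scikit-learn's PolynomialFeatures inside a Pipeline. A sketch on synthetic cubic data (the degree and data here are illustrative, not from the notebook):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, 200).reshape(-1, 1)
y = 1.0 - x[:, 0] + 0.5 * x[:, 0]**3 + rng.normal(0, 0.2, 200)

# degree=3 generates the x, x^2 and x^3 columns automatically
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(x, y)
print(model.score(x, y))  # high R^2 on this cubic data
```

Keep in mind that raising the degree too far swings the model the other way, toward overfitting, so the degree is usually chosen with a validation set or cross-validation.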
