Source: View original notebook on GitHub
Category: Machine Learning / Learn ML
An example that results in underfitting
import numpy as np
import matplotlib.pyplot as plt
X = np.loadtxt('Datasets/weightedX.txt')
Y = np.loadtxt('Datasets/weightedY.txt')
plt.scatter(X,Y)
Output:
<matplotlib.collections.PathCollection at 0x10561830>
from sklearn.linear_model import LinearRegression
lr = LinearRegression(normalize=True)  # note: the normalize parameter was removed in scikit-learn 1.2
lr.fit(X.reshape(-1,1),Y.reshape(-1,1)) # the model expects at least 2D input
Output:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=True)
slope = lr.coef_
intercept = lr.intercept_
plt.scatter(X,Y)
Y_pred = (slope*X+intercept).flatten()
plt.plot(X,Y_pred,'k')
# the model is underfitting the data
Output:
[<matplotlib.lines.Line2D at 0x119d8170>]
Looking at the data, a model that produces a non-linear curve would fit it much better.
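The underfit can also be seen numerically through the R² score. A minimal sketch on synthetic quadratic data (the weightedX/weightedY files are not assumed to be available here, so the data below is generated for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_syn = rng.uniform(-3, 3, 200)              # synthetic stand-in for the dataset
Y_syn = X_syn**2 + rng.normal(0, 0.3, 200)   # clearly non-linear target

lin = LinearRegression()
lin.fit(X_syn.reshape(-1, 1), Y_syn)
# a straight line cannot capture the curvature, so R^2 stays low
print(lin.score(X_syn.reshape(-1, 1), Y_syn))
```

A low R² like this is one symptom of underfitting: the model is too simple for the shape of the data.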
Overcoming underfitting by increasing model complexity with extra features
- we will add an extra feature x2, which is the square of the feature already in X (call it x1).
- our hypothesis becomes h(x) = theta0 + theta1 * x1 + theta2 * x2
- which is effectively h(x) = theta0 + theta1 * x1 + theta2 * x1**2
- so we can fit a more complex curve while still using the Linear Regression model.
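The hypothesis above can also be sketched with plain NumPy least squares, which makes the design matrix explicit. The data and coefficient values below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.uniform(-3, 3, 200)
# true relation: 2.0 + 0.5*x1 + 1.5*x1**2, plus a little noise
y = 2.0 + 0.5 * x1 + 1.5 * x1**2 + rng.normal(0, 0.1, 200)

# design matrix for h(x) = theta0 + theta1*x1 + theta2*x1**2
A = np.column_stack((np.ones_like(x1), x1, x1**2))
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(theta)  # approximately [2.0, 0.5, 1.5]
```

The intercept column of ones plays the role of theta0, so a single `lstsq` call recovers all three parameters at once.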
X.shape
Output:
(100,)
X1 = X**2
X = np.column_stack((X,X1))
X.shape
Output:
(100, 2)
lr2 = LinearRegression(normalize=True)
lr2.fit(X,Y)
Output:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=True)
slope = lr2.coef_
intercept = lr2.intercept_
plt.scatter(X[:,0],Y)
Y_pred = slope[0]*X[:,0] + slope[1]*X[:,1] + intercept
plt.scatter(X[:,0] , Y_pred, c='r',label = 'predicted Boundary')
plt.legend()
Output:
<matplotlib.legend.Legend at 0x132b1f70>
# we can fit the model further by adding more features, e.g. a cubic term, and so on
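One way to add those higher-order terms without building the columns by hand is scikit-learn's PolynomialFeatures inside a Pipeline. A sketch on synthetic cubic data (the degree and data here are illustrative, not from the notebook):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, 200).reshape(-1, 1)
y = 1.0 - x[:, 0] + 0.5 * x[:, 0]**3 + rng.normal(0, 0.2, 200)

# degree=3 generates the x, x^2 and x^3 columns automatically
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(x, y)
print(model.score(x, y))  # high R^2 on this cubic data
```

Keep in mind that raising the degree too far swings the model the other way, toward overfitting, so the degree is usually chosen with a validation set or cross-validation.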
