ML Series: Simple Linear Regression in Python

Machine Learning 1.01
In this series we are going to look at the basics of machine learning, from the underlying statistics to running a few models and applying them to real data.

Let's start with simple linear regression in Python:

Case: We will build a regression model that predicts a student's GPA from their SAT score.

Sample data (an excerpt of the CSV file used below):

SAT	GPA
1714	2.4
1664	2.52
1760	2.54
1685	2.74
1693	2.83
1670	2.91
1764	3
1764	3
1792	3.01
1850	3.01
1735	3.02
1775	3.07
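If you want to follow along without downloading the file, one option is to build the excerpt above directly as a pandas DataFrame. This is a minimal sketch that contains only the 12 rows shown, so any scores or predictions computed from it will not match the results reported later in this post (such as the 40% R-squared), which come from the full CSV. The file name `sat_gpa_sample.csv` is a hypothetical placeholder.

```python
import pandas as pd

# The 12-row excerpt shown above: SAT score and GPA per student
sample = pd.DataFrame({
    "SAT": [1714, 1664, 1760, 1685, 1693, 1670, 1764, 1764, 1792, 1850, 1735, 1775],
    "GPA": [2.4, 2.52, 2.54, 2.74, 2.83, 2.91, 3.0, 3.0, 3.01, 3.01, 3.02, 3.07],
})

# Save it so pd.read_csv() can load it the same way as the full file
# (hypothetical file name; use your own path)
sample.to_csv("sat_gpa_sample.csv", index=False)

print(sample.head())
```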

1. Importing the libraries (these are the most common libraries, which you will import most of the time: pandas, numpy, matplotlib, seaborn, and LinearRegression from the sklearn package)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression
import seaborn as sns
sns.set()  # apply seaborn's default plot style

2. Loading the data

data = pd.read_csv('C:/Users/abc/Downloads/1.01. Simple linear regression.csv')

data.head()

The data has two variables, GPA and SAT. Our hypothesis is that SAT score can predict a student's GPA. We can test this hypothesis and, if SAT turns out to be a good predictor, use it to predict GPA.
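A quick first check of this hypothesis, before fitting anything, is the correlation between SAT and GPA. The sketch below uses `numpy.corrcoef` on the 12-row excerpt only. For simple linear regression with an intercept, the R-squared reported in step 6 equals the square of this correlation.

```python
import numpy as np

# 12-row excerpt from the sample data above
sat = np.array([1714, 1664, 1760, 1685, 1693, 1670, 1764, 1764, 1792, 1850, 1735, 1775], dtype=float)
gpa = np.array([2.4, 2.52, 2.54, 2.74, 2.83, 2.91, 3.0, 3.0, 3.01, 3.01, 3.02, 3.07])

# Pearson correlation: the off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(sat, gpa)[0, 1]

# A positive r supports the idea that higher SAT goes with higher GPA
print(f"correlation r = {r:.3f}, r^2 = {r ** 2:.3f}")
```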

3. Setting up the model: we define the dependent variable GPA as y and the independent variable SAT as x. The regression line is then y = b0 + b1*x, where b0 is the intercept and b1 is the slope (the two coefficients the model estimates).
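To see what "fitting" means for y = b0 + b1*x, here is a sketch of the ordinary least-squares closed form: b1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2) and b0 = mean(y) - b1 * mean(x). The helper name `fit_simple_ols` is hypothetical; it is not part of scikit-learn, and it is run here on the 12-row excerpt only.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Return (b0, b1) for y = b0 + b1*x using the closed-form least-squares solution."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point (mean x, mean y)
    b0 = y_mean - b1 * x_mean
    return b0, b1

# 12-row excerpt from the sample data above
sat = [1714, 1664, 1760, 1685, 1693, 1670, 1764, 1764, 1792, 1850, 1735, 1775]
gpa = [2.4, 2.52, 2.54, 2.74, 2.83, 2.91, 3.0, 3.0, 3.01, 3.01, 3.02, 3.07]

b0, b1 = fit_simple_ols(sat, gpa)
print(f"GPA = {b0:.4f} + {b1:.6f} * SAT")
```

LinearRegression in step 5 solves the same least-squares problem, so on the same data its intercept_ and coef_ should agree with these values.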

x = data['SAT']
y = data['GPA']

x.shape  # 1-D: (number of rows,)

y.shape  # also 1-D

4. scikit-learn expects the inputs (features) as a 2-D array of shape (n_samples, n_features), but x is a 1-D Series. Since we have a single feature, we reshape x into a one-column matrix 'x_matrix' and check its shape:

x_matrix = x.values.reshape(-1,1)
x_matrix.shape
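The sketch below, run on the 12-row excerpt, shows why the reshape is needed. Passing the 1-D Series straight to fit raises a ValueError, while either reshaping or selecting the column with double brackets gives the 2-D (12, 1) input scikit-learn expects.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# 12-row excerpt from the sample data above
data = pd.DataFrame({
    "SAT": [1714, 1664, 1760, 1685, 1693, 1670, 1764, 1764, 1792, 1850, 1735, 1775],
    "GPA": [2.4, 2.52, 2.54, 2.74, 2.83, 2.91, 3.0, 3.0, 3.01, 3.01, 3.02, 3.07],
})
x = data["SAT"]  # 1-D Series, shape (12,)
y = data["GPA"]

# Fitting on the 1-D Series is rejected by scikit-learn
one_d_error = None
try:
    LinearRegression().fit(x, y)
except ValueError as err:
    one_d_error = err
    print("1-D input rejected:", str(err).splitlines()[0])

# Two ways to get a 2-D (n_samples, 1) feature matrix
x_matrix = x.values.reshape(-1, 1)  # NumPy array, shape (12, 1)
x_frame = data[["SAT"]]             # one-column DataFrame, shape (12, 1)
print(x_matrix.shape, x_frame.shape)
```

If you fit on the DataFrame version instead, recent scikit-learn versions remember the column name and warn when you later call predict with a plain list such as [[1740]]; passing a DataFrame with the same column name avoids that warning.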

5. The regression: we create a LinearRegression object and fit it to x_matrix and y:

reg = LinearRegression()

reg.fit(x_matrix,y)

Out: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
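After fitting, the estimated b0 and b1 from step 3 are stored on the model as intercept_ and coef_. This minimal sketch fits the 12-row excerpt (not the full file) and reads them back.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# 12-row excerpt from the sample data above, SAT reshaped to (n_samples, 1)
sat = np.array([1714, 1664, 1760, 1685, 1693, 1670, 1764, 1764, 1792, 1850, 1735, 1775], dtype=float).reshape(-1, 1)
gpa = np.array([2.4, 2.52, 2.54, 2.74, 2.83, 2.91, 3.0, 3.0, 3.01, 3.01, 3.02, 3.07])

reg = LinearRegression()
reg.fit(sat, gpa)

# b0 (intercept) and b1 (slope) of y = b0 + b1*x
b0 = reg.intercept_
b1 = reg.coef_[0]  # one coefficient per feature; we have a single feature
print(f"intercept b0 = {b0:.4f}, slope b1 = {b1:.6f}")
```

The Out line above comes from an older scikit-learn. Recent versions print only non-default parameters (so just LinearRegression()), and the normalize parameter was removed in version 1.2.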

6. We can check how well the model fits the data using R-squared:

reg.score(x_matrix,y)

This returns an R-squared of about 0.40, which is not bad for a single predictor: about 40% of the variability in GPA is explained by SAT score.
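To make the meaning of R-squared concrete, this sketch computes it by hand as 1 - SS_res / SS_tot (residual variation over total variation around the mean GPA) and compares it with reg.score. It uses the 12-row excerpt, so the value will differ from the 0.40 obtained on the full file.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# 12-row excerpt from the sample data above
sat = np.array([1714, 1664, 1760, 1685, 1693, 1670, 1764, 1764, 1792, 1850, 1735, 1775], dtype=float).reshape(-1, 1)
gpa = np.array([2.4, 2.52, 2.54, 2.74, 2.83, 2.91, 3.0, 3.0, 3.01, 3.01, 3.02, 3.07])

reg = LinearRegression().fit(sat, gpa)
pred = reg.predict(sat)

ss_res = np.sum((gpa - pred) ** 2)        # variation the model leaves unexplained
ss_tot = np.sum((gpa - gpa.mean()) ** 2)  # total variation around the mean GPA
r_squared = 1 - ss_res / ss_tot

print(f"manual R^2 = {r_squared:.4f}, reg.score = {reg.score(sat, gpa):.4f}")
```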

7. All done. Now we can predict the GPA for any SAT score:

reg.predict([[1740]])

Out: array([3.15593751])
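predict also accepts several rows at once, one row per student. This sketch, fitted on the 12-row excerpt, scores three example SAT values (chosen here for illustration) and checks that each prediction is simply b0 + b1 * SAT.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# 12-row excerpt from the sample data above
sat = np.array([1714, 1664, 1760, 1685, 1693, 1670, 1764, 1764, 1792, 1850, 1735, 1775], dtype=float).reshape(-1, 1)
gpa = np.array([2.4, 2.52, 2.54, 2.74, 2.83, 2.91, 3.0, 3.0, 3.01, 3.01, 3.02, 3.07])

reg = LinearRegression().fit(sat, gpa)

# New SAT scores: one row per student, one column per feature
new_sat = np.array([[1650], [1740], [1850]])
predicted_gpa = reg.predict(new_sat)

# Same result by hand from the fitted line y = b0 + b1*x
manual = reg.intercept_ + reg.coef_[0] * new_sat.ravel()

for s, p in zip(new_sat.ravel(), predicted_gpa):
    print(f"SAT {s} -> predicted GPA {p:.2f}")
```

SAT values far outside the range seen in the training data are extrapolations, so predictions there deserve less trust.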

We will discuss multiple linear regression in the next post.
Thanks

Massive Inputs | eLearn

Tuesday, July 14, 2020
