Regression in Python (Still under construction)
Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. This tutorial will guide you through performing some basic linear regression using the open source Python package Pingouin 🐧.
In this tutorial, we will cover:
- The type of data needed for regression
- Performing simple linear regression
- How to interpret the results
Download the data & Jupyter notebook for this tutorial
Type of Data Needed for Regression
For regression analysis, we need at least two variables:
- Independent Variable(s) (X): The predictor variable(s), used to predict the dependent variable.
- Dependent Variable (Y): The outcome, that is, the variable we are trying to predict.
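Before fitting anything, it can help to confirm that both variables are numeric and to see whether any values are missing. The snippet below is only a sketch: the small DataFrame is a made-up stand-in (not the tutorial's regression.csv), and dropping incomplete rows is one possible way of handling missing values, not the only one.

```python
import pandas as pd

# Made-up stand-in data; the real tutorial data comes from regression.csv
data = pd.DataFrame({
    "Experience": [1, 2, 3, 4],
    "Time": [9.5, 8.1, None, 5.2],  # one missing value on purpose
})

cols = ["Experience", "Time"]

# Check that each column holds numeric values
numeric = {c: pd.api.types.is_numeric_dtype(data[c]) for c in cols}

# Count missing values per column
n_missing = data[cols].isna().sum()

# Keep only rows where both variables are present
clean = data.dropna(subset=cols)

print(numeric)
print(n_missing)
print(len(clean), "complete rows")
```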
Importing Our Data & Packages
import pandas as pd
import pingouin as pg
data = pd.read_csv("regression.csv")
The data that we will be using is … Let’s take a look at it.
data.head()
Simple linear regression involves one independent variable and one dependent variable. We aim to find the best-fitting line through the data points.
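To make "best-fitting line" concrete: with one predictor, ordinary least squares picks the intercept and slope that minimise the sum of squared vertical distances between the points and the line. The sketch below computes them by hand with the textbook formulas. The function name `fit_line` and the numbers are made up for illustration and are not part of pingouin or the tutorial data.

```python
from statistics import mean

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up values that fall exactly on the line y = 12 - 2x
experience = [1, 2, 3, 4, 5]
task_time = [10.0, 8.0, 6.0, 4.0, 2.0]

intercept, slope = fit_line(experience, task_time)
print(intercept, slope)
```

Because these illustrative points lie exactly on a line, the fit recovers that line; real data will scatter around it.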
Performing Simple Linear Regression
Using the pingouin package, we can perform a simple linear regression as follows:
Defining our independent and dependent variables
x = data["Experience"] # Independent variable (predictor)
y = data["Time"] # Dependent variable (outcome)
Regression
reg = pg.linear_regression(x, y)
reg.round(2) # Rounding to two decimal places to make the results more readable