Regression in Python (Still under construction)

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. This tutorial will guide you through performing a basic linear regression using the open-source Python package Pingouin 🐧.

In this tutorial, we will cover:

  1. The type of data needed for regression
  2. Performing simple linear regression
  3. How to interpret the results

Download the data & Jupyter notebook for this tutorial

Type of Data Needed for Regression

For regression analysis, we need at least two variables:

  1. Independent Variable(s) (X): The predictor variable(s) or the variable(s) used to predict the dependent variable.
  2. Dependent Variable (Y): The outcome or the variable we are trying to predict.

Importing Our Data & Packages

import pandas as pd
import pingouin as pg

data = pd.read_csv("regression.csv")

The data that we will be using is … Let’s take a look at it.

data.head()
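
Before fitting anything, it can help to confirm that the columns you plan to use are numeric and complete. The snippet below is a minimal sketch of such a check. The helper `check_regression_columns` is hypothetical (it is not part of pandas or Pingouin), and the values in `toy` are made up for illustration; only the column names `Experience` and `Time` come from this tutorial.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Made-up values for illustration only; they are not the tutorial's data.
toy = pd.DataFrame({
    "Experience": [1, 2, 3, 4, 5],
    "Time": [29.5, 28.1, 27.9, 26.0, 25.2],
})


def check_regression_columns(df, columns):
    """Hypothetical helper: map each problematic column to a list of issues."""
    problems = {}
    for col in columns:
        issues = []
        # Regression needs numeric predictors and outcomes.
        if not is_numeric_dtype(df[col]):
            issues.append("not numeric")
        # Count missing values so they can be handled before fitting.
        n_missing = int(df[col].isna().sum())
        if n_missing:
            issues.append(f"{n_missing} missing value(s)")
        if issues:
            problems[col] = issues
    return problems


problems = check_regression_columns(toy, ["Experience", "Time"])
print(problems)  # an empty dict means no issues were found
```

With the real data, the same call would be `check_regression_columns(data, ["Experience", "Time"])`.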

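Simple linear regression involves one independent variable and one dependent variable. We aim to find the best-fitting line through the data points, that is, the line that minimizes the sum of squared vertical distances between the points and the line.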
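
To make "best-fitting line" concrete, here is a minimal sketch of the ordinary least squares formulas for a single predictor, written in plain Python. The function name `fit_line` and the numbers are assumptions made for illustration; they are not the tutorial's data, and in practice a package such as Pingouin does this work for you.

```python
# Made-up values for illustration only; they are not the tutorial's data.
experience = [1.0, 2.0, 3.0, 4.0, 5.0]
time_values = [29.0, 28.0, 26.5, 26.0, 24.5]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(experience, time_values)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
# prints: intercept = 30.10, slope = -1.10
```

The slope is the predicted change in the dependent variable for a one-unit increase in the independent variable, and the intercept is the predicted value when the independent variable is zero.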

Performing Simple Linear Regression

Using Pingouin, we can perform a simple linear regression as follows:

Defining our independent and dependent variables

x = data["Experience"]  # Independent variable (predictor)
y = data["Time"]  # Dependent variable (outcome)

Regression

reg = pg.linear_regression(x, y)  # Predictor(s) first, then the outcome
reg.round(2)  # Round to two decimal places to make the results more readable