Single Linear Regression – Part 1 – Python ML – OOP Basics
This article is originally published at https://www.stoltzmaniac.com
Data scientists who come to the career without a software background (myself included) tend to use a procedural style of programming rather than taking an object-oriented approach. Changing styles is a paradigm shift, and it takes some time to wrap your mind around. Many of us who have been doing this for years still have trouble envisioning how objects can improve things. There are a lot of resources out there to help you understand this subject in more detail, but I am going to take a “learn by doing” approach. The code used for this can be found on my GitHub.
Goal of this post:
- Build a very basic object to house our linear regression model
- Create a command line interface (CLI) to pass in different datasets
- Print the object to the screen in a user-friendly format
What we are leaving for the next post:
- Fitting a model
- Predicting values from the model
- Printing and saving our model results
Here we go!
Definition: OOP = Object Oriented Programming
Here is what our object needs to house:
- Data – Obviously… **note** in this case, the data needs to be in a specific format
- Fit – Utilize the y = mx + b format that we all grew up with. We’ll write the code for this in the next post
- Fit Results – After fitting the model, we typically need to be able to see the fit rather than just predicting results
- Predictions – Make the model useful by being able to predict values provided by the user
What do we need as inputs?
- Independent variable values (typically ‘x’)
- Dependent variable values (typically ‘y’)
- Numeric value at which we want a prediction (similar to ‘x’)
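To make the y = mx + b item and the three inputs a little more concrete, here is a minimal sketch of an ordinary least-squares fit for a single variable. This is not the code from this series (the real fit is left for the next post); the function names `fit_line` and `predict_value` are hypothetical, and the sample values are the ones that show up for fake_data.csv later in this post.

```python
def fit_line(x: list, y: list) -> tuple:
    """Return (m, b) for the least-squares line y = m*x + b.

    Hypothetical helper for illustration only; not part of the series' code.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")

    x_mean = sum(x) / n
    y_mean = sum(y) / n

    # Slope = covariance of x and y divided by the variance of x
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")

    m = sxy / sxx
    b = y_mean - m * x_mean  # the fitted line passes through (x_mean, y_mean)
    return m, b


def predict_value(m: float, b: float, x_new: float) -> float:
    """Evaluate the fitted line at a new independent value."""
    return m * x_new + b


# Independent values, dependent values, and the value to predict at
m, b = fit_line([1.0, 2.0, 3.0], [5.0, 6.0, 8.0])
result = predict_value(m, b, 2.5)
print(m, b, result)
```

For these three points the slope works out to 1.5 and the intercept to roughly 3.33, so the prediction at 2.5 is roughly 7.08.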
We start by making a class and then we define what it takes as input within the __init__ method. In our case, we are asking for a list of independent_var and dependent_var with a single numeric value as predict.
```python
class SingleLinearRegression:

    def __init__(self, independent_var: list, dependent_var: list, predict: float):
        """
        Completes either a single or multiple linear regression. We will pass a single value to predict.
        :param independent_var: list
        :param dependent_var: list
        :param predict: float
        """
        self.independent_var = independent_var
        self.dependent_var = dependent_var
        self.predict = predict
```
Next, we know that we will be fitting a model and predicting results. This will utilize fit and predictions methods. We will hold off on adding the math until the next post. Finally, we add the __str__ method, which is called when you print(your_object), in order to make the output legible. You will find that there is another method called __repr__ available, but it is typically utilized for a different purpose. We will save this class by itself in a file called linear_regression.py.
```python
class SingleLinearRegression:

    def __init__(self, independent_var: list, dependent_var: list, predict: float):
        """
        Completes either a single or multiple linear regression. We will pass a single value to predict.
        :param independent_var: list
        :param dependent_var: list
        :param predict: float
        """
        self.independent_var = independent_var
        self.dependent_var = dependent_var
        self.predict = predict

    def fit(self) -> dict:
        pass

    def predictions(self) -> dict:
        pass

    def __str__(self):
        return f"""
            This class returns a dictionary of results from your on your linear regression:
            {{
                'independent_var': {self.independent_var},
                'dependent_var': {self.dependent_var},
                'fit': {{
                    'coefficient': coefficient,
                    'constant': constant,
                    'r_squared': r_squared,
                    'p_values': 'p_values'
                }},
                'predictions': {{
                    'predict': {self.predict},
                    'result': result_of_predictions.
                }}
            }}
            :return: dict
            """
```
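Since __repr__ was mentioned only in passing, here is a small sketch of how the two methods differ. The `Point` class is a hypothetical toy example, unrelated to the regression code, and the strings it returns are arbitrary choices for illustration.

```python
class Point:
    """Hypothetical toy class, used only to contrast __str__ and __repr__."""

    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

    def __str__(self) -> str:
        # Used by print() and str(): written for the person reading the output
        return f"Point at x={self.x}, y={self.y}"

    def __repr__(self) -> str:
        # Used by repr() and the interactive prompt: written for the developer,
        # conventionally looking like the code that would recreate the object
        return f"Point(x={self.x!r}, y={self.y!r})"


point = Point(1.0, 5.0)
print(point)        # goes through __str__
print(repr(point))  # goes through __repr__
print([point])      # containers display their items with __repr__
```

If a class defines only __repr__, print() falls back to it, which is one reason the two are usually kept for different audiences.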
There we have it, our first class. By itself, this doesn’t do a whole lot for us. We have to convert our class into an instance with all of our inputs. Before we go too far, let’s take a look at our folder structure.
We have a data directory with 2 csv files to use as “data”. We also have a linear_regression.py file which holds the SingleLinearRegression class that we just created, and a run_me.py file which will be used to run everything. You will also notice the requirements.txt file, which houses all of the required packages.
What should our run_me.py contain? It needs to import our SingleLinearRegression class, take data and print out results. Looking at my_function() below shows us that we will need to provide a dataset (filename and location) and the predict value. Note that the code for reading in the csv data is quite long; we will trim this down in the next post. We instantiate our object with our data utilizing the dependent_data and independent_data that were read from the dataset.
```python
import csv
import click
from linear_regression import SingleLinearRegression


def my_function(dataset: str, predict: float):
    print('Starting run_me.py')

    # Read in csv data (values stay as strings at this stage)
    independent_data = []
    dependent_data = []
    with open(dataset, 'r') as csvfile:
        reader = csv.reader(csvfile)
        next(reader, None)  # Removes header row
        for row in reader:
            independent_data.append(row[0])
            dependent_data.append(row[1])

    # Create instance of SingleLinearRegression model
    single_linear_regression = SingleLinearRegression(
        independent_var=independent_data,
        dependent_var=dependent_data,
        predict=predict
    )
    print(single_linear_regression)
```
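One thing to keep in mind is that csv.reader hands back every value as a string, so the lists passed to the class hold text rather than numbers. Below is a minimal sketch of one way the reading step could be pulled into a helper that also converts to float. The helper name `read_two_columns`, the temporary demo file, and its `x,y` header are all assumptions made for illustration; this is not necessarily how the next post trims the code down.

```python
import csv
import os
import tempfile


def read_two_columns(path: str) -> tuple:
    """Hypothetical helper: read a two-column csv with a header row as floats."""
    independent_data = []
    dependent_data = []
    with open(path, "r", newline="") as csvfile:
        reader = csv.reader(csvfile)
        next(reader, None)  # skip the header row
        for row in reader:
            if not row:  # ignore completely blank lines
                continue
            independent_data.append(float(row[0]))
            dependent_data.append(float(row[1]))
    return independent_data, dependent_data


# Demo on a throwaway file so the sketch runs on its own
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.csv")
    with open(demo_path, "w", newline="") as demo_file:
        demo_file.write("x,y\n1,5\n2,6\n3,8\n")
    x_values, y_values = read_two_columns(demo_path)

print(x_values)
print(y_values)
```

Swapping a helper like this in would change the printed lists from strings such as '1' to floats such as 1.0, so the terminal output shown later in this post assumes the original string-based version.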
We aren’t quite done: nothing calls my_function() yet, so running the run_me.py file will not do anything. We need to set this up to take an arbitrary dataset in and run. This is where the click library comes in handy. There are a lot of different ways to pass arguments in from the CLI, but I prefer click for its simplicity.
Each @click.option should be self-explanatory. You provide the dataset location and the value at which you want a prediction. The rest is handled in the program. We have also set default values for each. Utilizing __name__ and main() is pretty typical in Python and you will see it all over the place; it’s a good way to set up your projects.
```python
import csv
import click
from linear_regression import SingleLinearRegression


@click.command()
@click.option('-d', '--dataset', default='./data/fake_data.csv',
              help='Dataset with independent variable in first column and dependent variable in second. '
                   'Dataset has a header row.')
@click.option('-p', '--predict', default=2.5,
              help='Independent variable value at which you would like the fit to predict.')
def main(dataset: str, predict: float):
    print('Starting run_me.py')

    # Read in csv data (values stay as strings at this stage)
    independent_data = []
    dependent_data = []
    with open(dataset, 'r') as csvfile:
        reader = csv.reader(csvfile)
        next(reader, None)  # Removes header row
        for row in reader:
            independent_data.append(row[0])
            dependent_data.append(row[1])

    # Create instance of SingleLinearRegression model
    single_linear_regression = SingleLinearRegression(
        independent_var=independent_data,
        dependent_var=dependent_data,
        predict=predict
    )
    print(single_linear_regression)


if __name__ == '__main__':
    main()
```
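For comparison, one of the other ways to pass arguments in from the CLI is argparse from the standard library. The sketch below only mirrors the same two options and defaults; it is not part of this project, the `build_parser` name and the description text are hypothetical, and the argument list is passed in by hand so the snippet runs without a terminal.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse equivalent of the two click options."""
    parser = argparse.ArgumentParser(
        description="Run a single linear regression on a two-column csv."
    )
    parser.add_argument(
        "-d", "--dataset",
        default="./data/fake_data.csv",
        help="Dataset with independent variable in first column and "
             "dependent variable in second. Dataset has a header row.",
    )
    parser.add_argument(
        "-p", "--predict",
        type=float,  # argparse needs the type spelled out to convert the text
        default=2.5,
        help="Independent variable value at which to predict.",
    )
    return parser


# Equivalent of: python run_me.py -d data/fake_data2.csv -p 312
args = build_parser().parse_args(["-d", "data/fake_data2.csv", "-p", "312"])
print(args.dataset, args.predict)
```

The decorator style in click keeps the options right next to the function that uses them, which is a large part of the simplicity mentioned above.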
Finally, we can run this! Since we have default values (utilizing the dataset fake_data.csv), we can simply run:
> python run_me.py
Terminal Output:
```
Starting run_me.py

This class returns a dictionary of results from your on your linear regression:
{
    'independent_var': ['1', '2', '3'],
    'dependent_var': ['5', '6', '8'],
    'fit': {
        'coefficient': coefficient,
        'constant': constant,
        'r_squared': r_squared,
        'p_values': 'p_values'
    },
    'predictions': {
        'predict': 2.5,
        'result': result_of_predictions.
    }
}
:return: dict
```
We can see that we have a nice description of our output, including dynamically populated values for independent_var, dependent_var, and predict. Passing in a different dataset or predict value is simple:
> python run_me.py -d data/fake_data2.csv -p 312
Terminal Output:
```
Starting run_me.py

This class returns a dictionary of results from your on your linear regression:
{
    'independent_var': ['100', '200', '300'],
    'dependent_var': ['500', '600', '800'],
    'fit': {
        'coefficient': coefficient,
        'constant': constant,
        'r_squared': r_squared,
        'p_values': 'p_values'
    },
    'predictions': {
        'predict': 312.0,
        'result': result_of_predictions.
    }
}
:return: dict
```
You’ll notice that the variables have changed in the output! In the next post we will dive into making something a bit more useful.
I need to state this explicitly: I am not an expert in object-oriented design. These types of patterns are very specific, and experts in the field have been doing this for many years with a lot of mentoring. If you are taking anything into a production environment that people depend on, please take the time to have someone with lots of experience take a look at your code to help you gain confidence and grow your skills.