Prediction using simple linear regression in R – part 2

Prologue: I have been practicing various skills and algorithms as part of my road-map to becoming a mature data scientist. As part of this expedition I have decided to document everything I go through. So whatever you read under this column will be either a summary of my understanding or a post explaining the details of an experiment.

The source code and data-set used in this post can be found in my GitHub repository.

In part 1 of Prediction using simple linear regression we covered univariate linear analysis. Now let's try regression analysis on slightly more complicated problems.

The objective of this post is to show how to perform linear regression analysis with multiple variables in R.

Multiple linear regression

Let's assume we are given a problem and a sample data-set, from which we have to come up with a learning algorithm and a regression model to predict the unknown variable Y using the known variables Xi.

The given data-set contains 44 rows and 4 columns, namely index, age of the fish, temperature of the water in degrees Celsius, and length of the fish.


[Table: sample rows of the data-set, with columns including age of fish and length of fish; the values were not recovered]

Using the above sample data-set, we have to predict the unknown variable, the length of a fish whose age and water temperature are known (e.g. Age = 32, temp = 34).


Xi is an independent variable.

Y is a dependent variable.

Here the variables are

X1 -> Age

X2 -> Temperature

Y -> Length of fish

Therefore the hypothesis function, i.e. the model representation for the regression analysis, will be

h(x) = θ0 + θ1x1 + θ2x2
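
The model equation above can be written as a small R function. This is only a minimal sketch: the function name hypothesis and the theta values are made up for illustration and are not estimated from the fish data.

```r
# Hypothesis for two predictors: h(x) = theta0 + theta1*x1 + theta2*x2
# `theta` is a numeric vector c(theta0, theta1, theta2); the name is illustrative.
hypothesis <- function(theta, x1, x2) {
  theta[1] + theta[2] * x1 + theta[3] * x2
}

# Placeholder coefficients, NOT estimated from the fish data
theta_example <- c(1, 2, 3)
h_value <- hypothesis(theta_example, x1 = 10, x2 = 20)
print(h_value)  # 1 + 2*10 + 3*20 = 81
```

Once real coefficients are available from a fitted model, they can be passed in place of theta_example.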

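This type of problem is called multiple linear regression analysis (also loosely called multivariable regression), because Y is determined by multiple variables Xi.

Similar to part 1 of linear regression analysis, the implementation stage remains the same for multiple variables, except for the model representation step.

In the model representation step, the lm function call needs to be adjusted for multiple variables. The formula Y ~ . tells lm to regress Y on all the other columns of the data-set, so any column that is not a predictor (such as the index) should be removed first.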

Model <- lm(Y ~ ., data = dataset)
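
Here is a sketch of the fitting step. It assumes the column names age, temp and length (the post does not show the real ones) and uses a small synthetic data frame with invented values in place of the real data-set. The index column is dropped before using the . formula; naming the predictors explicitly is an equivalent alternative.

```r
# Synthetic stand-in for the fish data; all values are invented for illustration.
set.seed(1)
fish <- data.frame(
  index = 1:20,
  age   = rep(c(14, 28, 41, 55, 69), times = 4),
  temp  = rep(c(25, 27, 29, 31), each = 5)
)
# Length generated from an arbitrary linear rule plus noise
fish$length <- 100 + 25 * fish$age - 10 * fish$temp + rnorm(20, sd = 5)

# Drop the index column so that `length ~ .` uses only age and temp
model_dot <- lm(length ~ ., data = fish[, c("age", "temp", "length")])

# Equivalent fit with the predictors named explicitly
model_explicit <- lm(length ~ age + temp, data = fish)

print(coef(model_dot))
```

Both calls should give the same coefficients; the explicit form is a little safer when the data frame carries extra columns.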

After obtaining the model, substitute the values of theta and X into the model equation h(x).

The model object has a vector called coefficients. Its first value is the intercept (theta-zero), and the following values correspond to theta-one, theta-two and so on. Substituting these values, together with the known X values, into the model equation h(x) gives the unknown variable Y.
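
The substitution can be sketched as below, again on synthetic stand-in data with assumed column names (age, temp, length), so the printed numbers say nothing about the real fish data-set. The built-in predict function is shown alongside the manual calculation as a cross-check.

```r
# Synthetic stand-in data (invented values); column names are assumptions.
set.seed(42)
fish <- data.frame(
  age  = rep(c(14, 28, 41, 55, 69), times = 4),
  temp = rep(c(25, 27, 29, 31), each = 5)
)
fish$length <- 100 + 25 * fish$age - 10 * fish$temp + rnorm(20, sd = 5)

model <- lm(length ~ age + temp, data = fish)

# coefficients: (Intercept) = theta0, age = theta1, temp = theta2
theta <- coef(model)

# Manual substitution for the example input from the post: Age = 32, temp = 34
manual <- theta["(Intercept)"] + theta["age"] * 32 + theta["temp"] * 34

# The same prediction using predict() on a one-row data frame
new_fish <- data.frame(age = 32, temp = 34)
predicted <- predict(model, newdata = new_fish)

print(unname(manual))
print(unname(predicted))
```

The two printed values should agree, since predict performs the same substitution internally for a linear model.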
