Prologue: I have been working on and practicing various skills and algorithms as progress along my road map to becoming a mature data scientist. As part of this expedition, I have decided to document everything I go through. So whatever you read under this column will be either a summary of my understanding or a post explaining the details of one of my experiments.

The source code and data set used in this post can be found in my GitHub repository: https://github.com/shakthydoss/simple-liner-regression-in-r

In Part 1 of Prediction using Simple Linear Regression, we covered univariate linear regression. Now let's try regression analysis on a slightly more complicated problem.

The objective of this post is to show how to perform linear regression analysis with multiple variables in R.

Let's assume we are given a problem and a sample data set, from which we have to come up with a learning algorithm and a regression model that predicts the unknown variable Y from the known variables $X_i$.

The given data set contains 44 rows and 4 columns: the index, the age of the fish, the water temperature in degrees Celsius, and the length of the fish.

| Index | Age of fish | Temperature (°C) | Length of fish |
|---|---|---|---|
| 1 | 14 | 25 | 620 |
| 2 | 28 | 25 | 1315 |
| 3 | 55 | 25 | 2600 |
| .. | .. | .. | .. |
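As a small sketch, the three rows shown above could be entered as an R data frame like this. The column names Index, Age, Temperature and Length are assumptions for illustration only; the headers in the repository's data file may differ. Since the index is just a row label and not a predictor, the sketch drops it before any modelling.

```r
# Hypothetical column names; the data file in the repository may use different headers.
fish <- data.frame(
  Index       = c(1, 2, 3),
  Age         = c(14, 28, 55),
  Temperature = c(25, 25, 25),   # water temperature in degrees Celsius
  Length      = c(620, 1315, 2600)
)

# The index only labels rows, so keep just the variables used by the model.
fish_model_data <- fish[, c("Age", "Temperature", "Length")]

str(fish_model_data)
```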

Using the sample data set above, we have to predict the unknown variable, the length of a fish whose age and temperature are known (e.g. Age = 32, Temperature = 34).

Here, each $X_i$ is an independent variable and Y is the dependent variable.

In this problem, the variables are:

$X_1$ → Age

$X_2$ → Temperature

Y → Length of fish

Therefore the hypothesis, i.e. the model representation of the regression analysis, will be

$$h(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$
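The model h(x) can be written as a small R function. This is an illustrative sketch: the function name hypothesis and the theta values are made up, not taken from a fitted model. Prepending 1 to x lets theta-zero act as the intercept, so the same function works for any number of variables.

```r
# h(x) = theta0 + theta1 * x1 + theta2 * x2 (+ ... for more variables)
# theta: numeric vector c(theta0, theta1, ..., thetan)
# x:     numeric vector c(x1, ..., xn) for a single observation
hypothesis <- function(theta, x) {
  stopifnot(length(theta) == length(x) + 1)
  # Prepend 1 so that theta0 is the intercept, then take the dot product.
  sum(theta * c(1, x))
}

# Illustrative values only: 1 + 2*4 + 3*5
hypothesis(theta = c(1, 2, 3), x = c(4, 5))  # returns 24
```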

This type of problem is called multivariable regression, or multiple linear regression (because Y is determined by multiple variables $X_i$).
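One way to see how all the thetas are learned from several $X_i$ at once is the least-squares normal equations, $X^\top X \theta = X^\top y$, where the design matrix X has a leading column of ones for theta-zero. The sketch below solves them on synthetic data (44 rows, with made-up "true" thetas), because the fish data set itself is not reproduced in this post. lm itself uses a QR decomposition rather than forming $X^\top X$, but it reaches the same least-squares estimates.

```r
set.seed(42)
n    <- 44                               # same number of rows as the fish data set
age  <- runif(n, min = 10, max = 160)    # synthetic ages
temp <- runif(n, min = 25, max = 35)     # synthetic water temperatures (degrees Celsius)
true_theta <- c(5, 40, -10)              # made-up theta0, theta1, theta2
y <- true_theta[1] + true_theta[2] * age + true_theta[3] * temp + rnorm(n, sd = 20)

# Design matrix: a column of ones for theta0, then one column per variable X_i.
X <- cbind(1, age, temp)

# Solve (X^T X) theta = X^T y without explicitly inverting X^T X.
theta_hat <- solve(crossprod(X), crossprod(X, y))
print(round(theta_hat, 3))

# lm() should agree with the normal-equation solution up to rounding error.
fit <- lm(y ~ age + temp)
all.equal(as.vector(theta_hat), unname(coef(fit)), tolerance = 1e-6)
```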

As in Part 1, the implementation steps remain the same for the multivariable case, except for the model representation step.

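In the model representation step, the call to the lm function needs to be tweaked for multiple-variable analysis. In the formula below, the "." stands for every column of dataset other than Y, so a column that is not a predictor, such as the index, should be dropped before fitting.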

Model <- lm(Y ~ ., data = dataset)
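To see what the "." does, here is a sketch on synthetic data, with hypothetical column names (Index, Age, Temperature, Length) standing in for the real file. If the index column is left in the data frame, lm treats it as an extra predictor; dropping it first, or writing the formula out explicitly, gives the intended two-variable model.

```r
set.seed(1)
n <- 44
# Synthetic stand-in for the fish data set; column names are assumptions.
fish <- data.frame(
  Index       = seq_len(n),
  Age         = runif(n, 10, 160),
  Temperature = runif(n, 25, 35)
)
fish$Length <- 100 + 30 * fish$Age - 20 * fish$Temperature + rnorm(n, sd = 50)

# With the index still present, "." treats it as a third predictor.
names(coef(lm(Length ~ ., data = fish)))
# returns "(Intercept)" "Index" "Age" "Temperature"

# Drop the index, and "." then means exactly Age + Temperature.
fish_model_data <- fish[, c("Age", "Temperature", "Length")]
Model <- lm(Length ~ ., data = fish_model_data)

# Equivalent, more explicit formula that ignores the index without dropping it.
Model_explicit <- lm(Length ~ Age + Temperature, data = fish)
```

Writing the predictors out explicitly is a little more verbose, but it guards against extra columns in the file silently becoming predictors.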

After obtaining the model, substitute the values of theta and X into the model equation h(x).

The model object has a vector called coefficients (also returned by coef(Model)). Its values correspond to theta-zero (the intercept), theta-one, theta-two and so on. Substituting these values, together with the known values of X, into h(x) therefore gives the predicted value of the unknown variable Y.
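The substitution step can be sketched as follows, again on synthetic data with hypothetical column names rather than the author's data set. It computes the prediction for Age = 32 and Temperature = 34 by hand from the coefficients, then checks it against R's predict function, which performs the same substitution.

```r
set.seed(7)
n <- 44
# Synthetic data with hypothetical column names (not the author's data set).
fish <- data.frame(Age = runif(n, 10, 160), Temperature = runif(n, 25, 35))
fish$Length <- 100 + 30 * fish$Age - 20 * fish$Temperature + rnorm(n, sd = 50)

Model <- lm(Length ~ Age + Temperature, data = fish)

# coefficients holds theta0 (intercept), theta1 (Age) and theta2 (Temperature).
theta <- coef(Model)          # same values as Model$coefficients
print(theta)

# Substitute the known X values into h(x) = theta0 + theta1*x1 + theta2*x2.
new_fish <- data.frame(Age = 32, Temperature = 34)
manual <- theta[["(Intercept)"]] +
  theta[["Age"]] * new_fish$Age +
  theta[["Temperature"]] * new_fish$Temperature

# predict() performs the same substitution for us.
via_predict <- predict(Model, newdata = new_fish)
all.equal(manual, unname(via_predict))
```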

