Getting Started with Your First Machine Learning Algorithm - Linear Regression || First Step Towards ML

Linear Regression is a Supervised Learning algorithm, and it is probably the first algorithm taught when you start learning ML. As a beginner you must be wondering, "What is Supervised Learning?". Machine learning can be divided mainly into 3 types:
  1. Supervised Learning
  2. Unsupervised Learning
  3. Reinforcement Learning
Supervised Learning is a technique in which we use a labelled dataset to train a model. In short, you can say there is a supervisor who tells us whether the predictions the model makes are correct or not. This supervisor decides based on the labels provided in the dataset. Labels are the correct values for a particular training example; for example, in a dataset of house sizes and their sale prices, the prices are the labels. We will deal with the other two techniques later.

In supervised learning, we can classify problems into 2 different categories:
  1. Regression - problems in which we need to predict a continuous value as the output.
  2. Classification - problems in which we need to classify the training examples into particular categories.
We plot all the training examples in the dataset and try to find a line that best fits them (as shown in the figure). Based on that line, we make predictions on unseen examples. So basically, linear regression is all about finding the best-fit line.
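To make this concrete, here is a minimal sketch of the idea. The tiny dataset below is made up for illustration, and NumPy's polyfit is used only as a shortcut to get the best-fit line; later in the post we will see how to find this line ourselves.

# A made-up dataset: hours studied (x) -> exam score (y).
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)   # feature values
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])     # continuous labels

# np.polyfit with degree 1 returns the slope and intercept of the
# least-squares best-fit line.
W, b = np.polyfit(X, y, 1)
print(f"best-fit line: y = {W:.2f}x + {b:.2f}")

# Predicting on an unseen example is just evaluating that line.
print("prediction for x = 6:", W * 6 + b)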


Let's work through some of the math behind the Linear Regression algorithm. First of all, we will look at simple linear regression, in which each example in the dataset has only 2 values: one for x and one for y.

Math Behind Linear Regression

Before diving in, let's check out some of the notation.

We need to find the equation of a straight line that best fits our dataset. Let us assume the equation of a straight line to be -

y = Wx + b    --- (1)

Based on this equation, we will construct a hypothesis -

h(x) = Wx + b    ---(2)

Here, W and b are the 2 parameters of the straight line. Our objective is to find values of W and b for which our hypothesis gives the most accurate predictions, i.e. the minimum error.
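As a tiny illustration, the hypothesis is just a one-line function. The W and b values below are placeholders, not learned ones:

# The hypothesis from equation (2): h(x) = W*x + b.
def hypothesis(x, W, b):
    return W * x + b

print(hypothesis(3.0, W=2.0, b=0.5))  # -> 6.5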

Cost Function

To check how good our hypothesis is, we will use a Cost Function. The cost function helps us decide which values of our parameters are better for making predictions. It is denoted by J.

J(W, b) = (1/2m) · Σᵢ₌₁ᵐ (h(xᵢ) − yᵢ)²

Here,
  • xᵢ represents the x value of the ith training example.
  • Similarly, yᵢ represents the y value of the ith training example.
  • h(xᵢ) is the value estimated by our hypothesis for the ith example.
  • m is the number of training examples in our dataset.
We are checking whether the value estimated by the hypothesis is close to the actual value. The difference between the two gives us the error for a particular example; we square it so that positive and negative errors don't cancel out, and then take the average over all examples. (The extra factor of 1/2 simply makes the math cleaner when we take derivatives later.)
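Here is a minimal sketch of this cost function in Python, assuming the dataset is stored in NumPy arrays x and y of equal length:

import numpy as np

def cost(W, b, x, y):
    m = len(x)                    # number of training examples
    predictions = W * x + b      # h(x_i) for every example at once
    errors = predictions - y      # h(x_i) - y_i
    return np.sum(errors ** 2) / (2 * m)

# A perfect fit (y = 2x, W = 2, b = 0) gives a cost of 0.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(cost(2.0, 0.0, x, y))  # -> 0.0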

So, this was how we find out whether our parameters have optimal values or not. Now, we will try to improve those values.

Gradient Descent

Gradient Descent is an algorithm we use to get better values for our parameters. Building a successful model that gives good predictions is an iterative task: we need to update the parameters repeatedly to find values that give better results.
It is slightly difficult to discuss the whole process behind Gradient Descent here; I will give a proper intuition for it in another blog. So, for now, we will just look at the formulas.

W = W − α · ∂J(W, b)/∂W

b = b − α · ∂J(W, b)/∂b


The α is called the learning rate. We will discuss it in more detail later.
We need to apply these two equations repeatedly to find better and better values; the cost function helps us decide which values of our parameters are best. These two formulas are used to update the values of our parameters. The generalised form of the update is as follows -

θⱼ = θⱼ − α · ∂J/∂θⱼ

Right now, you must be wondering where this formula came from, why it gives us better values of the parameters, and so on. Once you read my blog on Gradient Descent, you will get a clear picture of it. I will share the link here as soon as I publish that blog. Till then, just go with the flow.
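Still, to give a feel for what these updates look like in practice, here is a small sketch for linear regression. The gradient expressions inside the loop are the standard partial derivatives of J (their derivation is exactly what the upcoming Gradient Descent post will cover); the data and the learning rate below are illustrative:

import numpy as np

def gradient_descent(x, y, alpha=0.05, iterations=1000):
    m = len(x)
    W, b = 0.0, 0.0                   # start from arbitrary values
    for _ in range(iterations):
        errors = (W * x + b) - y      # h(x_i) - y_i for all examples
        dW = np.sum(errors * x) / m   # partial derivative of J w.r.t. W
        db = np.sum(errors) / m       # partial derivative of J w.r.t. b
        W = W - alpha * dW            # the two update equations above
        b = b - alpha * db
    return W, b

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])    # generated from y = 2x + 1
print(gradient_descent(x, y))         # converges close to (2.0, 1.0)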

So, now we have parameters that give the best-fit line for our dataset, and our model can make accurate predictions. To measure the model's accuracy, we can use the Mean Squared Error (MSE).
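For instance, assuming scikit-learn is available, MSE can be computed in a couple of lines (the arrays below are illustrative):

from sklearn.metrics import mean_squared_error

y_true = [3.0, 5.0, 7.0, 9.0]   # actual values
y_pred = [2.8, 5.1, 7.2, 8.9]   # the model's predictions
print(mean_squared_error(y_true, y_pred))  # average of squared errors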
In the next blog, I will try to cover Gradient Descent in more depth, along with an implementation of Linear Regression in Python. Stay tuned for it. If you have any doubts, you can leave a comment here; your feedback is always appreciated. Thanks for reading.
