Essentials of Machine Learning Algorithms
In this article, I will explain the types of machine learning algorithms and when you should use each of them. Getting to know the types of Machine Learning algorithms is like seeing the big picture of AI: it shows what all the work in the field is aiming for, and it puts you in a better position to break down a real problem and design a machine learning system.
Let’s first understand what Machine Learning is. Machine Learning is the study of training machines on historical data to build predictive models for unseen data. Many companies these days are adopting ML in their architecture to speed up their workflows and automate tasks that once needed repetitive human intervention. There are several well-established algorithms used to build such predictive models and solve classification, regression, or clustering problems.
Machine Learning algorithms can be broadly classified into –
- Supervised Machine Learning – The dataset used in Supervised Learning is labelled, which means that for each row a target variable is given. The model is trained on the labelled training set and then tested on unseen data.
List of Common Supervised Machine Learning Algorithms
- Decision Trees
- Linear Regression
- Logistic Regression
- Support Vector Machines (SVM)
- Neural Networks
- Unsupervised Learning – Unlike supervised learning, in Unsupervised Machine Learning the dataset is unlabelled and needs to be grouped based on the similarity among the data points. Clustering algorithms are used to group the data points into different clusters.
- Reinforcement Learning – A special type of Machine Learning where the model learns from its past actions: it is rewarded for every correct move and penalized for every wrong one. Google’s AlphaGo is an example of a Reinforcement Learning application.
In a supervised learning problem, the target variable can be numeric or discrete. Linear regression is one of the first algorithms one should master; it models the linear relationship between the independent variables and a continuous dependent variable.
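As a minimal sketch of that idea, the snippet below fits a straight line to a tiny made-up dataset (the feature values and targets are invented for illustration) using NumPy's closed-form least-squares fit:

```python
import numpy as np

# Toy data: a single feature (e.g., years of experience) and a
# continuous target (e.g., salary in $1000s). Values are made up.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([30.0, 35.0, 41.0, 44.0, 50.0])

# Fit y = w * x + b by ordinary least squares (degree-1 polynomial).
w, b = np.polyfit(X, y, deg=1)

# Predict the target for a new, unseen input x = 6.
prediction = w * 6.0 + b
print(round(w, 2), round(b, 2), round(prediction, 1))  # 4.9 25.3 54.7
```

The fitted slope and intercept minimize the squared distance between the line and the training points, which is exactly the "linear relationship" the paragraph describes.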
Linear regression predictions are continuous values (e.g., rainfall in cm), while logistic regression predictions are discrete classes (e.g., whether a student passed or failed), obtained after applying a transformation function.
Logistic regression is best suited for binary classification: data sets where y = 0 or 1, where 1 denotes the default class. For example, in predicting whether an event will occur or not, there are only two possibilities: that it occurs (which we denote as 1) or that it does not (0). So if we were predicting whether a patient was sick, we would label sick patients using the value of 1 in our data set.
Logistic regression is named after the transformation function it uses, which is called the logistic function h(x) = 1 / (1 + e^-x). This forms an S-shaped curve.
In logistic regression, the output takes the form of probabilities of the default class (unlike linear regression, where the output is directly produced). As it is a probability, the output lies in the range of 0-1.
So, for example, if we’re trying to predict whether patients are sick, we already know that sick patients are denoted as 1, so if our algorithm assigns the score of 0.98 to a patient, it thinks that patient is quite likely to be sick.
This output (y-value) is generated by passing the x-value through the logistic function h(x) = 1 / (1 + e^-x). A threshold is then applied to force this probability into a binary classification.
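The two steps above — squashing a score into a probability and then thresholding it — can be sketched in a few lines of plain Python (the threshold of 0.5 is the common default, used here as an assumption):

```python
import math

def sigmoid(z):
    """Logistic function h(z) = 1 / (1 + e^-z); maps any real z to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    """Apply a threshold to force the probability into a 0/1 label."""
    return 1 if sigmoid(z) >= threshold else 0

# A large positive score gives a probability near 1 (the default class,
# e.g. "sick"); a large negative score gives a probability near 0.
print(round(sigmoid(4.0), 3))   # 0.982
print(round(sigmoid(-4.0), 3))  # 0.018
print(classify(4.0), classify(-4.0))  # 1 0
```

This mirrors the patient example: a score mapping to 0.98 crosses the threshold and is labelled 1 (sick).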
Decision Tree, one of the simplest CART algorithms, is interpretable and is not strongly affected by outliers or missing values in the data. The root node is chosen based on the feature that carries the maximum information, and this iterative process continues in the child nodes as well.
The splitting is stopped when the tree has reached its maximum depth or all instances have been classified. Decision Tree machine learning is prone to overfitting and hence it’s required to set constraints at each step or prune the tree.
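"Maximum information" is usually measured as the reduction in entropy a split achieves. A minimal sketch of that criterion, using a tiny hand-made label list for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting `parent` into two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = [0, 0, 1, 1]
# A split that separates the classes perfectly has maximal gain;
# a split that leaves both children mixed gains nothing.
print(information_gain(parent, [0, 0], [1, 1]))  # 1.0
print(information_gain(parent, [0, 1], [0, 1]))  # 0.0
```

At every node, the tree-growing algorithm evaluates candidate splits with a criterion like this one and picks the split with the highest gain, which is what makes an unconstrained tree fit the training data so aggressively that pruning is needed.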
Random Forest is a bagging model that reduces the variance of a model. In Random Forest, the data is sampled into many small datasets, the number of which can be set as a parameter. A Decision Tree is then fitted on each sampled dataset, and the final output is either the mean of all the outputs (for regression) or the most frequent class (for classification).
Random Forest reduces overfitting and could be used as a dimensionality reduction technique as well. However, it is not interpretable.
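The aggregation step described above — taking the mode of the individual trees' predictions — can be sketched as follows (the per-tree predictions here are invented for illustration, standing in for the outputs of real fitted trees):

```python
from statistics import mode

# Hypothetical predictions of five trees for three test points.
tree_predictions = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [1, 0, 1],
    [1, 0, 0],
]

# For classification, the forest's output per test point is the
# majority vote (mode) across the trees.
forest_output = [mode(column) for column in zip(*tree_predictions)]
print(forest_output)  # [1, 0, 1]
```

Because each tree sees a different bootstrap sample, their individual errors tend to disagree, and the vote averages those errors out, which is where the variance reduction comes from.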
K-Means is an unsupervised learning algorithm in which the data is clustered into k groups in such a way that the distance within a cluster is minimized and the distance between clusters is maximized.
The elbow method is used to choose the number of clusters: k is increased until adding further clusters no longer substantially reduces the within-cluster variance. Once k is defined, the centroids are initialized and adjusted repeatedly until every point is assigned to its closest centroid and the centroids stop moving.
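That assign-then-recompute loop can be written as a minimal NumPy sketch (the two "blobs" of points below are made up so the clusters are easy to see; a fixed seed and iteration count keep the example deterministic):

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: assign points to the nearest centroid, then
    move each centroid to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct points from the data.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Distance of every point to every centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centroids = np.array([points[labels == i].mean(axis=0) for i in range(k)])
    return labels, centroids

# Two well-separated toy blobs.
pts = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                [5.0, 5.0], [5.1, 5.2], [4.9, 5.1]])
labels, centroids = kmeans(pts, k=2)
print(labels)  # the first three points share one label, the last three the other
```

Note this sketch omits the safeguards a production implementation needs (handling empty clusters, multiple restarts, a convergence check instead of a fixed iteration count).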
The field of Machine Learning keeps advancing, and new techniques and algorithms come out regularly to simplify predictive modelling tasks. This article covered the intuition behind some of the basic ML algorithms.
Linear regression and linear classifiers: despite their apparent simplicity, they are very useful on datasets with a huge number of features, where more complex algorithms suffer from overfitting.
Logistic regression applies a nonlinear function (the sigmoid) to a linear combination of parameters, making it the simplest step beyond linear models for binary classification.
Decision trees are often similar to people’s decision processes and are easy to interpret. But they are most often used in compositions such as Random forest or gradient boosting.
K-means is a more basic but very easy-to-understand algorithm that can serve as a good baseline in a variety of problems.