Most Popular Algorithms in Machine Learning
Machine Learning has wide business applications across many domains. Its most popular uses include recommendation engines, fraud detection, supply chain and inventory planning, image recognition, voice assistants such as Amazon's Alexa, and much more. While the algorithms stem from traditional data analytics, it is the approach that makes Machine Learning palatable in the data age. Machine Learning focuses on prediction and can make data analysis efficient by processing enormous amounts of data at once. It prioritizes predictive accuracy over statistical significance.
Let's look at an overview of the various Machine Learning algorithms. They are broadly categorized as follows:
- Supervised Learning – What traditional analytics calls a target variable is referred to as a label in Machine Learning. In Supervised Learning, inductive inference is used to infer a predictive relationship between data points and labels. Examples: Linear Regression, Logistic Regression, Decision Trees, Naïve Bayes Classification.
- Unsupervised Learning – This is typically used as a data mining technique to discern patterns or structure in the data. It is not directed by any label. Examples: Clustering, certain neural networks such as autoencoders.
- Reinforcement Learning – This branch of Artificial Intelligence allows systems to automatically determine, through trial and error, the ideal behavior within a specific context in order to maximize their performance and make decisions that are as accurate as possible. A simple reward feedback, also known as the reinforcement signal, drives the learning behavior and hence the control of the system. Example: problems modeled as a Markov Decision Process.
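To make the Markov Decision Process example more concrete, here is a minimal value-iteration sketch. Everything in it is an assumption for illustration: the two states (low, high), the actions, the transition probabilities, the rewards and the discount factor GAMMA are made up. Value iteration solves an MDP whose transitions are already known; reinforcement learning methods such as Q-learning aim for the same optimal policy but estimate it from trial-and-error experience instead.

```python
# Hypothetical two-state MDP (all numbers are made up for illustration):
# transitions[state][action] = list of (probability, next_state, reward)
transitions = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "invest": [(0.8, "high", -1.0), (0.2, "low", -1.0)],
    },
    "high": {
        "wait": [(0.9, "high", 2.0), (0.1, "low", 2.0)],
        "invest": [(1.0, "high", 1.0)],
    },
}
GAMMA = 0.9  # assumed discount factor for future rewards


def value_iteration(transitions, gamma, tol=1e-8):
    """Apply Bellman optimality updates until the state values stop changing."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            # Value of a state = best expected (reward + discounted next value)
            best = max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Greedy policy: in each state pick the action with the highest expected return
    policy = {
        s: max(
            actions,
            key=lambda a: sum(p * (r + gamma * values[s2]) for p, s2, r in actions[a]),
        )
        for s, actions in transitions.items()
    }
    return values, policy


values, policy = value_iteration(transitions, GAMMA)
print(values)
print(policy)
```

The resulting policy maps each state to the action with the highest expected discounted reward under these made-up numbers.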
There are many algorithms used in Machine Learning, but here we will look at only some of the most popular ones.
- Linear Regression: Linear Regression is used in problems where the label is continuous, e.g. the sales of a retail chain. It typically uses the ordinary least squares (OLS) method, which fits the line that minimizes the sum of squared errors between the predicted and actual data points. Such models are mostly used in decision-making processes to answer questions like: What is the best marketing mix to optimize sales given the investment in various marketing channels? How can sales be maximized through store layouts, pricing, or promotional offers?
- Logistic Regression: When the label is categorical or discrete (most commonly binary), Logistic Regression models the log odds of the outcome as a linear function of the input variables. It is used in business problems such as scoring customers to predict those most likely to default on loan payments, or predicting which customers are most likely to respond to a certain marketing strategy.
- Clustering/K-Means: This is an undirected, or unsupervised, data mining activity typically used in problems that involve market segmentation, fraud detection, recommendation engines, and clustering web pages by similarity. K-Means partitions the data into k groups by repeatedly assigning each point to its nearest cluster center and recomputing each center as the mean of its assigned points.
- Support Vector Machines (SVM): An SVM finds the boundary (hyperplane) that separates the classes with the widest possible margin. SVMs are popularly used for problems in image segmentation, stock market analysis, text categorization, and the biological sciences.
- Decision Trees: Decision tree methods construct a tree of predictive decisions, each based on the values of attributes in the data. Decision trees are used for both classification and regression problems.
- Naïve Bayes: This is mostly used in text mining, sentiment analysis, document categorization, spam filtering, and disease prediction. The Naïve Bayes classifier is based on Bayes' theorem and assumes that the attributes are independent of one another given the class, which is known as the conditional independence assumption.
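The ordinary least squares fit from the Linear Regression item can be sketched in a few lines for a single input variable. The spend and sales figures are hypothetical toy data, and fit_ols and sum_squared_errors are illustrative helper names; in practice a library such as scikit-learn or statsmodels would handle several marketing channels at once.

```python
def fit_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form OLS solution: slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """The quantity OLS minimizes: the sum of squared residuals."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data: marketing spend ($k) vs. sales ($k)
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

intercept, slope = fit_ols(spend, sales)
print(f"sales ~ {intercept:.2f} + {slope:.2f} * spend")
print("SSE:", sum_squared_errors(spend, sales, intercept, slope))
```

Any other line, for example one with a slightly different slope, gives a larger sum of squared errors on the same data.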
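The log-odds idea behind Logistic Regression can be sketched as follows, assuming a single hypothetical feature (a debt-to-income ratio) and made-up default labels. The model is fitted by plain batch gradient descent on the log loss; this is one simple way to fit it, not the only one, and the learning rate and number of epochs are arbitrary choices.

```python
import math


def sigmoid(z):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Fit log-odds(y = 1) = b + w * x by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            # For the log loss, the gradient w.r.t. the log-odds is (predicted prob - label)
            error = sigmoid(b + w * x) - y
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical data: debt-to-income ratio and whether the customer defaulted (1) or not (0)
debt_ratio = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
defaulted = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(debt_ratio, defaulted)
for x in (0.15, 0.45, 0.75):
    log_odds = b + w * x
    print(f"ratio={x:.2f}  log-odds={log_odds:+.2f}  P(default)={sigmoid(log_odds):.2f}")
```

Customers can then be scored by their predicted probability of default, and a cut-off chosen to suit the business case.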
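A minimal sketch of K-Means (Lloyd's algorithm) for the market segmentation use case, on hypothetical two-dimensional customer data. The farthest-first initialization used here is a simple deterministic stand-in for the random or k-means++ initialization that libraries usually use, and the helper names are illustrative.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def init_farthest_first(points, k):
    """Start from the first point, then repeatedly add the point
    farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    centroids = init_farthest_first(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (annual spend in $k, store visits per month)
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

Because K-Means only finds a local optimum, real implementations usually run several initializations and keep the best result.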
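The margin-maximizing idea behind SVMs can be sketched with a linear soft-margin SVM trained by subgradient descent on the regularized hinge loss. This training procedure is a simplified choice made for readability (production libraries use dedicated solvers and often kernels), and the six labeled points and the hyperparameters lam, lr and epochs are made up.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=1000):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b)))
    by subgradient descent. Labels must be -1 or +1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:
                # Point is inside the margin or misclassified: hinge loss is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    """Classify by which side of the hyperplane w . x + b = 0 the point falls on."""
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical, linearly separable toy data
X = [(-2.0, -1.0), (-1.0, -2.0), (-3.0, -2.0), (2.0, 1.0), (1.0, 2.0), (3.0, 2.0)]
y = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(X, y)
print("w =", w, "b =", b)
print([svm_predict(w, b, x) for x in X])
```

The regularization term keeps w small, which is what widens the margin; lam trades margin width against training errors.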
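A small classification tree can be grown greedily by trying every feature/threshold split and keeping the one with the lowest weighted Gini impurity. The sketch below assumes hypothetical loan data (income and existing debt, both in $k) and a maximum depth of 2; the function names are illustrative. A regression tree follows the same recursion but scores splits by squared error and predicts the mean value in each leaf.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: chance that two random draws from the node have different labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Try every (feature, threshold) pair; return the one with the lowest weighted Gini."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= threshold]
            right = [lab for r, lab in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best


def build_tree(rows, labels, depth=0, max_depth=2):
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or gini(labels) == 0.0:
        return majority  # leaf: predict the most common label
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, threshold = split
    left = [(r, lab) for r, lab in zip(rows, labels) if r[f] <= threshold]
    right = [(r, lab) for r, lab in zip(rows, labels) if r[f] > threshold]
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [lab for _, lab in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [lab for _, lab in right], depth + 1, max_depth),
    }


def tree_predict(tree, row):
    """Follow the decisions from the root down to a leaf label."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical applicants: (income, existing debt), both in $k
applicants = [(20, 5), (30, 8), (40, 2), (55, 4), (65, 6), (75, 3), (60, 30), (70, 25)]
outcome = ["default", "default", "default", "repaid", "repaid", "repaid", "default", "default"]

tree = build_tree(applicants, outcome)
print(tree)
print(tree_predict(tree, (50, 3)))
```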
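For the spam-filtering use case, a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing can be sketched as below. The six training messages are made up, the function names are illustrative, and words never seen during training are simply ignored at prediction time, which is one of several reasonable choices.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # word_counts[label][word] -> occurrences
    vocab = set()
    for doc, label in zip(docs, labels):
        words = doc.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def classify(model, doc):
    """Return the class with the highest log P(class) + sum of log P(word | class)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in doc.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Adding per-word log probabilities is where the conditional
            # independence assumption comes in; the +1 is Laplace smoothing.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training messages
messages = [
    "win money now",
    "claim your free prize now",
    "free money offer",
    "meeting at noon tomorrow",
    "project update attached",
    "lunch tomorrow with the team",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = train_naive_bayes(messages, labels)
print(classify(model, "free prize money"))
print(classify(model, "team meeting tomorrow"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.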
There are many others, such as Artificial Neural Networks, Principal Component Analysis (PCA), Gradient Boosting, Apriori, and Random Forest. Compared with traditional analytics, many Machine Learning algorithms act as black boxes whose predictions are hard to interpret.