
Cluster analysis for business

Gaurav Vohra
January 07, 2011

Clustering is the process of grouping similar observations into smaller groups within a larger population. It has widespread application in business analytics. One of the questions facing businesses is how to organize the huge amounts of available data into meaningful structures, or how to break a large heterogeneous population into smaller homogeneous groups. Cluster analysis is an exploratory data analysis tool that sorts objects into groups so that the degree of association between two objects is maximal if they belong to the same group and minimal otherwise.


Business application of clustering

A grocery retailer used clustering to segment its 1.3 million loyalty card customers into 5 different groups based on their buying behavior. It then adopted customized marketing strategies for each of these segments in order to target them more effectively.

One of the groups was called ‘Fresh food lovers’. It comprised customers who purchased a high proportion of organic food, fresh vegetables, salads etc. A marketing campaign that emphasized the freshness of the fruits and vegetables and the year-round availability of organic produce in the stores appealed to this customer group.

Another cluster was called ‘Convenience junkies’. It comprised people who shopped for cooked or semi-cooked, easy-to-prepare meals. A marketing campaign focusing on the retailer’s in-house line of frozen meals as well as the speed of the check-out counters at the store worked well with this audience.

In this way the retailer was able to deliver the right message to the right customer and maximize the effectiveness of its marketing.

Features of clustering

Clustering is an undirected data mining technique. This means it can be used to identify hidden patterns and structures in the data without formulating a specific hypothesis. There is no target variable in clustering. In the above case, the grocery retailer was not actively trying to identify fresh food lovers at the start of the analysis. It was just attempting to understand the different buying behaviors of its customer base.

Clustering is performed to identify similarities with respect to specific behaviors or dimensions. In our example, the objective was to identify customer segments with similar buying behavior. Hence, clustering was performed using variables that represent the customer buying patterns.

Cluster analysis can be used to discover structures in data without providing an explanation or interpretation. In other words, cluster analysis simply discovers patterns in data without explaining why they exist. The resulting clusters are meaningless by themselves. They need to be profiled extensively to build their identity i.e. to understand what they represent and how they are different from the parent population.

In the retailer’s case, each cluster was profiled on its buying behavior. Customers in cluster 1 spent a quarter of their total spend on fresh, organic produce. This was significantly higher than other customers who spent less than 5% on this category. This segment of customers was called ‘Fresh food lovers’ as this is what distinguished them from the rest of the customers.
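The profiling step described above can be sketched in a few lines of Python. This is only an illustration: the customer rows, the spend figures and the function name fresh_share_by_cluster are all made up for the example, and the retailer's actual data and tooling are not described in this article. The sketch computes the share of spend that goes to fresh produce in each cluster and compares it with the share across the whole customer base.

```python
from collections import defaultdict

# Hypothetical customers: (cluster label, fresh-produce spend, total spend).
# These numbers are invented for illustration only.
customers = [
    (1, 30.0, 100.0),
    (1, 20.0, 100.0),
    (2, 4.0, 100.0),
    (2, 2.0, 100.0),
    (3, 3.0, 100.0),
    (3, 5.0, 100.0),
]


def fresh_share_by_cluster(rows):
    """Return each cluster's fresh-produce spend as a share of its total spend."""
    fresh = defaultdict(float)
    total = defaultdict(float)
    for cluster, fresh_spend, total_spend in rows:
        fresh[cluster] += fresh_spend
        total[cluster] += total_spend
    return {cluster: fresh[cluster] / total[cluster] for cluster in total}


shares = fresh_share_by_cluster(customers)

# Share of fresh-produce spend across the whole (parent) population.
overall = sum(row[1] for row in customers) / sum(row[2] for row in customers)

for cluster in sorted(shares):
    # An index above 1 means the cluster spends proportionally more on
    # fresh produce than the customer base as a whole.
    index = shares[cluster] / overall
    print(cluster, round(shares[cluster], 3), round(index, 2))
```

A cluster whose index sits well above 1 on a category is a candidate for a name built around that category, which is how a label such as ‘Fresh food lovers’ might be arrived at.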

Types of clustering

There are different algorithms available for clustering, and each of them may give a different set of clusters. The choice of a particular method will depend on the objective of clustering, the type of output desired, the hardware and software facilities available and the size of the dataset. In general, clustering techniques may be divided into two categories based on the cluster structure they produce: non-hierarchical and hierarchical.

The non-hierarchical methods divide a dataset of N objects into M clusters. K-means, a non-hierarchical technique, is the most commonly used one in business analytics.
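To make the idea concrete, here is a minimal sketch of K-means written with only the Python standard library. It assumes the usual formulation (often called Lloyd's algorithm): pick k starting centroids, assign every point to its nearest centroid, recompute each centroid as the mean of its members, and repeat until the assignments stop changing. The function names, the sample points and the choice of random starting centroids are assumptions made for this example; here k plays the role of M above. In practice an analyst would normally rely on a tested library rather than hand-written code.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, iterations=100, seed=0):
    """Group points into k clusters; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: start from k randomly chosen data points.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_point(members)
    return labels, centroids


# Hypothetical two-variable observations forming two obvious groups.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = k_means(data, k=2)
print(labels)
print(centroids)
```

The number of clusters has to be chosen in advance, and different starting centroids can lead to different results on less clearly separated data, which is one reason the resulting clusters still need to be profiled.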

The hierarchical methods produce a set of nested clusters in which each pair of objects or clusters is progressively nested in a larger cluster until only one cluster remains.
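The nesting can be illustrated with a small bottom-up (agglomerative) sketch. Every object starts as its own cluster, and the two closest clusters are merged repeatedly until only one remains. Measuring the distance between two clusters by their closest pair of members (single linkage) is an assumption made for this example, since the article does not name a linkage rule; the one-dimensional values and the function name single_linkage are likewise hypothetical.

```python
def single_linkage(values):
    """Merge the two closest clusters repeatedly until one cluster remains.

    Returns the merge history as (left_members, right_members, distance)
    tuples, where members are indexes into `values`.
    """
    clusters = [[i] for i in range(len(values))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(
                    abs(values[i] - values[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = sorted(clusters[a] + clusters[b])
        # Replace the two merged clusters with their union.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges


# Hypothetical one-dimensional attribute, e.g. a size measure for five stores.
sizes = [1.0, 1.2, 5.0, 5.1, 9.0]
merges = single_linkage(sizes)
for left, right, distance in merges:
    print(left, right, round(distance, 2))
```

Reading the merge history from top to bottom gives the nested structure: the most similar objects join first, and the final merge contains every object. Cutting the history at a chosen distance yields a flat set of clusters.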

When to use clustering?

Clustering is primarily used to perform segmentation, be it of customers, products or stores. We have already talked about customer segmentation using cluster analysis in the example above. Similarly, products can be clustered into hierarchical groups based on attributes such as use, size, brand and flavor, and stores with similar characteristics (sales, size, customer base etc.) can be clustered together.

Clustering can also be used for anomaly detection, for example, identifying fraudulent transactions. Cluster detection methods can be used on a sample containing only good transactions to determine the shape and size of the “normal” cluster. When a transaction comes along that falls outside the cluster for any reason, it is suspect. This approach has been used in medicine to detect the presence of abnormal cells in tissue samples and in telecommunications to detect calling patterns indicative of fraud.
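A deliberately simple sketch of this idea follows. It assumes the “normal” cluster can be summarized by its centroid and by the largest distance from that centroid seen among the good transactions, and it flags a new transaction when it lies well beyond that radius. Real fraud systems are far more elaborate; the transaction features, the tolerance factor and the function names here are all hypothetical.

```python
import math


def fit_normal_cluster(good_transactions):
    """Summarize known-good transactions as a (centroid, radius) pair."""
    n = len(good_transactions)
    centroid = tuple(sum(coords) / n for coords in zip(*good_transactions))
    # Radius: the farthest any good transaction lies from the centroid.
    radius = max(math.dist(t, centroid) for t in good_transactions)
    return centroid, radius


def is_suspect(transaction, centroid, radius, tolerance=1.5):
    """Flag a transaction lying more than tolerance * radius from the centroid."""
    return math.dist(transaction, centroid) > tolerance * radius


# Hypothetical good transactions: (amount, number of items).
good = [(20.0, 1.0), (25.0, 2.0), (22.0, 1.0), (24.0, 2.0)]
centroid, radius = fit_normal_cluster(good)

print(is_suspect((23.0, 2.0), centroid, radius))   # close to the normal cluster
print(is_suspect((500.0, 1.0), centroid, radius))  # far outside it
```

Because the variables are on different scales (an amount versus an item count), a real analysis would usually standardize them before measuring distance; that step is left out to keep the sketch short.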

Clustering is often used to break a large set of data into smaller groups that are more amenable to other techniques. For example, logistic regression results can be improved by performing the regression separately on smaller clusters that behave differently and may follow slightly different distributions.

In summary, clustering is a powerful technique for exploring patterns and structures within data and has wide applications in business analytics. There are various methods for clustering. An analyst should be familiar with multiple clustering algorithms and should be able to apply the most relevant technique for the business need.
