Visualizing Data Using Chernoff Faces


It makes sense to start the New Year on a lighter note. In this post, I will talk about a fun way to visualize multivariate data using a statistical device called Chernoff faces. The goal of Chernoff faces is to show several variables at once through facial features such as the curve of the smile, the size of the eyes and the shape of the face. Most of the time there are better ways to present the same data, but the faces can be interesting to work with.

We will analyse cricket data and try to compare 5 Indian batsmen: Sachin, Gambhir, Sehwag, Dhoni and Yuvraj Singh. A natural question is how to compare these 5 players. One can look at several metrics, such as strike rate and batting average, and form an opinion. But looking at these metrics in a table is not very easy on the eyes! Instead, one can create Chernoff faces and compare many metrics across the players at a glance.

In this post, the players are compared on the following metrics:

1. Batting average

2. Strike rate

3. Number of fours per match

4. Number of sixers per match

5. Ratio of Innings to total matches played
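The metrics above can be derived from career totals. Here is a minimal sketch, assuming a data frame with one row per player; the column names and the numbers are placeholders I made up for illustration, not the data used in this post. Batting average is taken as runs per dismissal and strike rate as runs per 100 balls faced.

```r
# Hypothetical career totals; the figures and column names are placeholders,
# not the data used in the post.
totals <- data.frame(
  player  = c("Player A", "Player B"),
  matches = c(200, 150),
  innings = c(190, 120),
  runs    = c(8000, 4500),
  balls   = c(9000, 5000),
  outs    = c(160, 100),
  fours   = c(800, 300),
  sixes   = c(100, 150)
)

# Turn raw totals into the five metrics compared in the post.
derive_metrics <- function(d) {
  data.frame(
    player          = d$player,
    batting_avg     = d$runs / d$outs,         # runs per dismissal
    strike_rate     = 100 * d$runs / d$balls,  # runs per 100 balls faced
    fours_per_match = d$fours / d$matches,
    sixes_per_match = d$sixes / d$matches,
    innings_ratio   = d$innings / d$matches
  )
}

metrics <- derive_metrics(totals)
print(metrics)
```

Keeping the derivation in one function makes it easy to rerun when the totals are refreshed.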

To create the Chernoff faces, these metrics were mapped to facial features as given in the table below:

| Metric | Facial feature |
| --- | --- |
| Batting average | Height of face |
| Strike rate | Curve of smile |
| Number of fours per match | Width of eyes |
| Number of sixers per match | Height of eyes |
| Ratio of innings to total matches played | Width of face |
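One way to apply this mapping in R is sketched below. It assumes the `faces()` function from the `aplpack` package, which takes a matrix whose columns correspond to 15 facial features in a fixed order (1 = height of face, 2 = width of face, 6 = curve of smile, 7 = height of eyes, 8 = width of eyes). The helper `to_face_matrix()`, the column names and the numbers are hypothetical and only there for illustration; this is not necessarily the code behind the graphic in this post.

```r
# Column positions of facial features in aplpack::faces(), as listed in that
# package's documentation.
feature_col <- c(
  batting_avg     = 1,  # height of face
  innings_ratio   = 2,  # width of face
  strike_rate     = 6,  # curve of smile
  sixes_per_match = 7,  # height of eyes
  fours_per_match = 8   # width of eyes
)

# Place each metric in its feature column; all other features stay constant
# so that they look the same on every face.
to_face_matrix <- function(metrics, players) {
  m <- matrix(0, nrow = nrow(metrics), ncol = 15,
              dimnames = list(players, NULL))
  for (name in names(feature_col)) {
    m[, feature_col[[name]]] <- metrics[[name]]
  }
  m
}

# Placeholder values, NOT the players' real statistics.
metrics <- data.frame(
  batting_avg     = c(40, 50, 35),
  strike_rate     = c(85, 90, 100),
  fours_per_match = c(4, 3, 5),
  sixes_per_match = c(0.5, 1.0, 0.8),
  innings_ratio   = c(0.95, 0.85, 0.98)
)
face_data <- to_face_matrix(metrics, c("Player A", "Player B", "Player C"))

# Draw the faces only if aplpack is installed.
if (requireNamespace("aplpack", quietly = TRUE)) {
  aplpack::faces(face_data, face.type = 1, print.info = FALSE)
}
```

Filling the unused columns with a constant avoids the default behaviour of reusing the supplied variables for the remaining features, which would make the faces harder to read.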

The data was collected from [source link not preserved]. The following R code was used to create the graphic:

setwd("F:\\Work\\Jigsaw Academy\\Blogs\\January 16")

Here is the result:

[Figure: Chernoff faces for the five batsmen]

As can be seen, the happiest face seems to be Sehwag's. No surprise there, since he has the highest strike rate, the variable mapped to the curve of the smile. Also, notice that Dhoni has a very long face. This is because batting average is mapped to the height of the face, and Dhoni has a very good batting average.

Another thing to notice is how narrow the eyes are for both Dhoni and Yuvraj. Eye width is mapped to the number of fours per match, so this is a testament to the fact that both players, although very good stroke makers, made a lot of their runs by running between the wickets.
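Keep in mind that these readings are relative: a face looks happy or long only in comparison with the other faces on the same plot. Here is a minimal sketch of why, assuming the plotting routine rescales every metric to a common range before drawing (to my understanding, `aplpack::faces()` does this when `scale = TRUE`). The function name and the strike rates are placeholders, not real figures.

```r
# Rescale a numeric vector linearly to [-1, 1]; constant input maps to 0.
rescale_feature <- function(x) {
  rng <- range(x)
  if (rng[2] == rng[1]) return(rep(0, length(x)))
  2 * (x - rng[1]) / (rng[2] - rng[1]) - 1
}

# Placeholder strike rates, not real figures.
strike_rate <- c(A = 85, B = 104, C = 88)
smile <- rescale_feature(strike_rate)

print(round(smile, 2))
# The player with the highest strike rate gets the widest smile.
print(names(which.max(smile)))
```

Under this assumption, adding or removing a player changes the range, and with it every other face on the plot.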

Hope you enjoyed this post. Indeed analytics can be fun!
