What Are the Various Measures of Central Tendency Using R?


Measures of central tendency are among the most popular techniques for summarizing a data series. When a data set is large, we manage it by computing an average: a single value that is most representative of the data items. We can then work with that one value instead of a long series of observations, which helps when comparing series and when trying to understand the characteristics of a series. Averages are also computed during the data exploration stage, before building statistical models to solve your problem.

A measure of central tendency is meaningful only when it is applied to a suitable type of data, so we need to study the limitations of the data before choosing an appropriate measure. The following are the most popular measures of central tendency.

  • Arithmetic Mean
  • Median
  • Mode
  • Geometric Mean
  • Harmonic Mean

The arithmetic mean depends on every observation in the series, so it is influenced by extreme observations when the series contains outliers. Because the arithmetic mean stands in for the whole series, it should only be used when the observations are homogeneous; when the series varies widely, it may not be a good measure of central tendency. In R, the mean can be obtained easily with the mean function, and it is also reported by the summary function.
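The effect of an outlier on the arithmetic mean can be seen in a small sketch. The `sales` vector is made-up data used only for illustration; `mean` and `summary` are base R functions.

```r
# Hypothetical sample: six similar values and one extreme value (95)
sales <- c(12, 15, 14, 13, 16, 15, 95)

# Arithmetic mean of all seven observations (180 / 7)
mean_all <- mean(sales)

# Arithmetic mean after dropping the extreme value (85 / 6)
mean_without_outlier <- mean(sales[sales < 90])

print(mean_all)
print(mean_without_outlier)

# summary() reports the mean together with the minimum, quartiles,
# median and maximum
print(summary(sales))
```

A single extreme value pulls the mean from roughly 14 up to roughly 26, which no longer resembles any typical observation in the series.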

The median is referred to as a positional average. It is the value that divides the distribution of the data into two equal halves, so 50% of the data lies above the median and 50% lies below it. Because it is a positional average, the median does not depend on the extreme observations and is therefore not influenced by outliers. In R, the median can be found with the median function or the summary function. The randomForest library (its na.roughfix function) can be used to impute missing values of numeric variables with the median.
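The following sketch shows the median on a series with an outlier, and a simple median imputation. Note the assumptions: the data are made up, and the imputation uses plain base R (`median` with `na.rm = TRUE`) as a stand-in for illustration rather than the randomForest library.

```r
# Hypothetical sample with one extreme value
sales <- c(12, 15, 14, 13, 16, 15, 95)

# Sorted: 12 13 14 15 15 16 95, so the middle (4th) value is 15
med_sales <- median(sales)
print(med_sales)

# Hypothetical numeric variable with missing values
x <- c(4, NA, 8, 6, NA, 10)

# Median of the observed values 4, 6, 8, 10 is (6 + 8) / 2 = 7
med_x <- median(x, na.rm = TRUE)

# Replace each missing value with the median (base R stand-in,
# not the randomForest implementation)
x_imputed <- ifelse(is.na(x), med_x, x)
print(x_imputed)
```

Unlike the mean, the median of `sales` stays at a typical value even though 95 is present.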

The mode is the value that occurs most frequently in a data series. It is easy to compute and is not influenced by extreme observations. Its main advantage is with categorical variables, where a modal value can be determined even though an arithmetic mean or median cannot. The randomForest library in R uses the mode to impute missing values of categorical variables. The mode can also be located easily on a graph. You may be surprised that R's mode function (mode()) does not return a modal value: it returns the storage type of the variable, which is not what the name suggests. So how do we find the mode in R? We use the table function, which gives the frequency distribution of a variable; the value with the highest frequency is the modal value.
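The table-based approach can be sketched as follows. The `colours` vector is made-up data, and `modal_value` is just an illustrative variable name; only `mode`, `table`, `names` and `max` are real base R functions.

```r
# Hypothetical categorical variable
colours <- c("red", "blue", "red", "green", "blue", "red")

# mode() reports how the vector is stored, not the most frequent value
storage_type <- mode(colours)
print(storage_type)

# Frequency distribution of the variable
freq <- table(colours)
print(freq)

# Keep every value whose frequency equals the highest frequency,
# so ties (several modes) are all returned
modal_value <- names(freq)[freq == max(freq)]
print(modal_value)
```

Returning all tied values is a design choice: a series can have more than one mode, and picking only the first would hide that.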

The geometric mean is the average recommended for finding average growth (or decline) rates. It is defined as the nth root of the product of n terms. Because it is defined through a product, the observations must not include zero or negative values. The geometric mean is harder to understand than the arithmetic mean. A few extreme values affect it less than they affect the arithmetic mean. It is widely used in the banking and insurance sectors for finding rates of interest, rates of depreciation, and so on. Base R has no built-in function for it, but it can be computed by applying the formula directly.
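One way to apply the formula is sketched below. `geometric_mean` is a hypothetical helper written for this illustration, not a base R function, and the growth factors are made-up data. It uses the identity that the nth root of a product equals the exponential of the mean of the logarithms, which avoids overflow when many values are multiplied.

```r
# Hypothetical helper: geometric mean of strictly positive values
geometric_mean <- function(x) {
  stopifnot(is.numeric(x), all(x > 0))
  # exp(mean(log(x))) is equivalent to prod(x)^(1 / length(x))
  exp(mean(log(x)))
}

# The cube root of 1 * 3 * 9 = 27 is 3 (up to floating-point error)
gm_simple <- geometric_mean(c(1, 3, 9))
print(gm_simple)

# Made-up yearly growth factors: +5%, +10%, +20%
growth_factors <- c(1.05, 1.10, 1.20)

# Average growth rate per year is the geometric mean of the factors minus 1
avg_growth_rate <- geometric_mean(growth_factors) - 1
print(avg_growth_rate)
```

Note that growth rates are first converted to growth factors (1 + rate) so that every value is positive before the geometric mean is taken.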

The harmonic mean, like the arithmetic mean and the geometric mean, is based on a mathematical computation and can only be used with quantitative data. It is defined as the number of observations divided by the sum of the reciprocals of the values, so none of the values may be zero. It is harder to understand and is not a popular measure of central tendency. Because it depends on every observation in the series, it is capable of further mathematical treatment. The harmonic mean is commonly used for averaging rates, such as the average speed of a moving body that covers equal distances at different speeds. Base R has no function for the harmonic mean, so we need to apply its formula directly.
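The formula can be applied directly as in this sketch. `harmonic_mean` is a hypothetical helper written for this illustration, the speeds are made-up data, and restricting the input to positive values is an assumption made here to keep the example simple.

```r
# Hypothetical helper: number of observations over the sum of reciprocals
harmonic_mean <- function(x) {
  stopifnot(is.numeric(x), all(x > 0))
  length(x) / sum(1 / x)
}

# Made-up example: the same distance covered at 40 km/h and then at 60 km/h
speeds <- c(40, 60)

# 2 / (1/40 + 1/60) = 2 / (1/24) = 48 km/h
avg_speed <- harmonic_mean(speeds)
print(avg_speed)

# The arithmetic mean (50 km/h) overstates the average speed here
print(mean(speeds))
```

The harmonic mean gives the correct average speed because more time is spent on the slower leg, which the arithmetic mean ignores.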
