• SoniaSamipillai

Descriptive Statistics - Types of Measures

Updated: Jan 25, 2021

"Data really powers everything that we do." – Jeff Weiner

Before we jump into descriptive statistics and its various measures, let me set the stage. You are a business consultant, and your manager assigns you three projects.

Project 1:

Perform a market overview for a leading beverage client. You are given SKU-level sales data.

Project 2:

Measure the performance of an email campaign that was sent to a targeted group of customers. The dataset is at the customer transaction level.

Project 3:

The KPI from last week's report shows that sales have declined. What caused the decline? Analyze the variables and report the results to management. You will have to access multiple datasets to address the business problem.

The problem may be simple or complicated. The dataset may be big or small. The client may be high-profile or just starting out. Either way, to provide a solution you have to understand the problem, and to understand the problem you have to know your data. The starting point of any analysis is understanding your data. Always lay the groundwork with an exploratory data analysis (EDA) of the dataset, and get to know every nook and corner of it. How do you do that? The stage is set, and now you know why what you are about to read matters. Here comes the foundation of all giants: descriptive statistics.

Given a dataset, statistics helps us achieve two main goals: one is to describe the data, and the other is to draw conclusions from it. Based on these two goals, statistics branches into descriptive statistics and inferential statistics. Descriptive statistics answers questions about the general characteristics of the data and lets you characterize the data by its properties. Inferential statistics takes you to the next level, where you draw conclusions, recommendations, and insights from the data.

Know your jargon! Measures computed from the data of an entire population are called population parameters, and measures computed from a sample are called sample statistics. In statistical inference, a sample statistic is referred to as the point estimator of the corresponding population parameter.
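To make the distinction concrete, here is a minimal sketch using Python's standard `statistics` module. The sales figures are made up for illustration, and treating the first four weeks as "the sample" is only an assumption for the example.

```python
import statistics

# Hypothetical weekly unit sales for one SKU (illustrative numbers only).
population = [12, 15, 11, 18, 14, 16, 13, 17]
sample = population[:4]  # pretend only the first four weeks were observed

# Population parameters: computed from every member of the population.
mu = statistics.mean(population)
sigma2 = statistics.pvariance(population)  # divides by N

# Sample statistics: computed from the sample and used as point estimators.
x_bar = statistics.mean(sample)
s2 = statistics.variance(sample)  # divides by n - 1

print("population mean:", mu, "population variance:", sigma2)
print("sample mean:", x_bar, "sample variance:", s2)
```

The sample mean estimates the population mean, and the sample variance estimates the population variance. The two will rarely match exactly, which is where inferential statistics comes in.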

There are five major types of descriptive statistics. They are as follows:

  1. Measures of Location

  2. Measures of Variability

  3. Measures of Distribution Shape

  4. Measures of Relative Locations

  5. Measures of Association

In addition to the above five, the 'five-number summary' and the 'seven-number summary' are widely used.

Measures of Location or Central Tendency

Measures of location help answer the question "where is the center of the data?". A measure of central tendency is a central or typical value for a probability distribution. It may also be called the center or location of the distribution.

The most common measures of location are:

  1. Mean

  2. Median

  3. Mode

  4. Geometric Mean

  5. Minimum

  6. Maximum
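As a quick illustration, the measures listed above can be computed with Python's standard library. This is only a sketch on a small made-up dataset, not a recommendation of one tool over another.

```python
import statistics

data = [2, 4, 4, 8, 16]  # hypothetical observations

mean = statistics.mean(data)              # arithmetic average
median = statistics.median(data)          # middle value of the sorted data
mode = statistics.mode(data)              # most frequent value
g_mean = statistics.geometric_mean(data)  # n-th root of the product
smallest = min(data)
largest = max(data)

print(mean, median, mode, g_mean, smallest, largest)
```

Notice that the mean (6.8) is pulled upward by the single large value 16, while the median and mode both stay at 4.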

Measures of Variability (or Dispersion or Scatter or Spread)

Measures of variability help determine the extent to which a distribution is stretched or squeezed.

The most common measures of variability are:

  1. Range

  2. Interquartile Range

  3. Standard Deviation

  4. Variance

  5. Mean absolute difference

  6. Coefficient of Variation
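The sketch below computes each of these on the same kind of small made-up dataset. Two choices here are assumptions rather than the only options: the quartiles use the `inclusive` interpolation method of `statistics.quantiles`, and the mean absolute difference is averaged over all distinct pairs of observations.

```python
import statistics
from itertools import combinations

data = [2, 4, 4, 8, 16]  # hypothetical observations

mean = statistics.mean(data)
data_range = max(data) - min(data)

# Quartiles via linear interpolation; other quartile conventions exist.
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

variance = statistics.variance(data)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)      # square root of the sample variance

# Mean absolute difference: average |x_i - x_j| over all distinct pairs.
pairs = list(combinations(data, 2))
mean_abs_diff = sum(abs(a - b) for a, b in pairs) / len(pairs)

# Coefficient of variation: spread relative to the mean (unitless).
cv = std_dev / mean

print(data_range, iqr, variance, std_dev, mean_abs_diff, cv)
```

The coefficient of variation is handy when comparing the spread of variables measured in different units, since it has no unit of its own.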

Measures of Distribution Shape

A histogram can give a general idea of the shape, but two numerical measures of shape give a more precise evaluation:

1. Skewness tells you the amount and direction of skew (departure from horizontal symmetry), and

2. kurtosis tells you how heavy the tails are relative to a standard bell curve (it is often loosely described as how tall and sharp the central peak is).
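Both measures can be computed from central moments. The sketch below uses the simple moment-based (population) formulas and reports excess kurtosis, so a normal distribution scores 0. Statistical packages often apply small-sample corrections, so their numbers may differ slightly from these. The helper names and datasets are made up for illustration.

```python
def central_moment(values, k):
    """k-th central moment, dividing by n (population form)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n

def skewness(values):
    # Third central moment scaled by the standard deviation cubed.
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    # Fourth central moment scaled by the variance squared, minus 3
    # so that a normal distribution has excess kurtosis 0.
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]        # hypothetical, perfectly symmetric
right_skewed = [1, 1, 2, 2, 3, 10]  # hypothetical, long right tail

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

A skewness of zero indicates symmetry, a positive value indicates a longer right tail, and a negative value indicates a longer left tail.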

Measures of Relative Locations

Measures of relative standing, or relative location, can be used to compare values from different datasets or values within the same dataset. The most common ones are:

  1. Quartile

  2. Percentile

  3. Z-Score

  4. Minimum

  5. Maximum
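Here is a small sketch of quartiles, a percentile, and a z-score using the standard `statistics` module. The scores are hypothetical, the `inclusive` interpolation method is one of several percentile conventions, and the z-score here uses the sample standard deviation, which is an assumption on my part.

```python
import statistics

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95]  # hypothetical test scores

# Quartiles: the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

# Percentiles: n=100 returns 99 cut points, so index 89 is the 90th.
percentiles = statistics.quantiles(scores, n=100, method="inclusive")
p90 = percentiles[89]

def z_score(value, values):
    """Number of sample standard deviations between value and the mean."""
    return (value - statistics.mean(values)) / statistics.stdev(values)

print(q1, q2, q3, p90)
print(z_score(95, scores), z_score(55, scores))
```

Because z-scores are unitless, they let you compare a value from one dataset with a value from another, for example a score on two different tests.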

Measures of Association

Measures of association quantify the relationship between variables. Association exists if the distribution of one variable is related to the distribution of another variable. Measures of association help determine the extent to which a change in the value of one variable is related to a change in the value of another. The most common ones are:

  1. Correlation coefficient

  2. Covariance
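The two are closely linked: the (Pearson) correlation coefficient is the covariance rescaled by the standard deviations of both variables, so it always falls between -1 and 1. A minimal sketch, with made-up variable names and data, written out by hand so the formulas are visible:

```python
import math

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data: sales rise in step with ad spend.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 6, 8, 10]
sales_reversed = list(reversed(sales))

print(covariance(ad_spend, sales), correlation(ad_spend, sales))
print(correlation(ad_spend, sales_reversed))
```

Covariance depends on the units of the variables, while correlation does not, which is why correlation is usually easier to interpret.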

Five Number Summary

The five-number summary is a set of descriptive statistics that provides information about a dataset. It consists of the five most important sample percentiles:

  1. the sample minimum (smallest observation)

  2. the lower quartile or first quartile

  3. the median (the middle value)

  4. the upper quartile or third quartile

  5. the sample maximum (largest observation)
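The five values can be collected in one small helper. This is a sketch with a hypothetical function name and dataset. It assumes the `inclusive` quartile method, and other conventions can give slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]  # hypothetical observations
print(five_number_summary(data))
```

These are the same five values that a box plot draws.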

Seven Number Summary

The seven-number summary contains the following percentiles, which are (approximately) evenly spaced for a normally distributed variable:

  1. the 2nd percentile

  2. the 9th percentile

  3. the 25th percentile or lower quartile or first quartile

  4. the 50th percentile or median (middle value, or second quartile)

  5. the 75th percentile or upper quartile or third quartile

  6. the 91st percentile

  7. the 98th percentile
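The "evenly spaced" claim can be checked with the standard normal distribution: the z-values of these seven percentiles sit roughly 0.67 to 0.71 apart. The sketch below verifies that and also computes the summary for a dataset. The function name and the `inclusive` percentile method are assumptions made for the example.

```python
from statistics import NormalDist, quantiles

PERCENTILES = (2, 9, 25, 50, 75, 91, 98)

def seven_number_summary(values):
    """Return the 2nd, 9th, 25th, 50th, 75th, 91st and 98th percentiles."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    return [cuts[p - 1] for p in PERCENTILES]

# Positions of the seven percentiles on a standard normal curve.
z_values = [NormalDist().inv_cdf(p / 100) for p in PERCENTILES]
gaps = [b - a for a, b in zip(z_values, z_values[1:])]

print([round(z, 2) for z in z_values])
print([round(g, 2) for g in gaps])
print(seven_number_summary(list(range(101))))
```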

Everyone who works with data should be well versed in the above metrics and their applications. These measures are crucial to understanding the dataset you handle. Knowing them will not only be a feather in your cap but will also lay the foundation for comprehending complex models. Glad that you made it to the end of this post. Well done! To learn more about the above, please read the individual blog posts on each measure.

Hope this post was helpful! If you're interested in reading more, you can subscribe and be notified when the next article is published.
