Descriptive Statistics - Measures of Relative Location - Part 2
Updated: Jan 26, 2021
A person who is gifted sees the essential point and leaves the rest as surplus.
In addition to the measures of relative location discussed in Part 1, there are two important results that speak about the relative location of values within a dataset: Chebyshev's theorem and the empirical rule.
Chebyshev's theorem and the empirical rule both describe the proportion of data values that lie within a specified number of standard deviations of the mean.
Chebyshev's theorem states that:
At least (1 - 1/z**2) of the data values must be within z standard deviations of the mean, where z is any value greater than 1. Chebyshev's theorem requires that z > 1, but z need not be an integer (for example, it can be a decimal value such as 1.5).
For z = 2, 3 and 4 standard deviations:
At least 75% (0.75) of the values must be within z = 2 standard deviations of the mean.
At least 89% (0.89) of the values must be within z = 3 standard deviations of the mean.
At least 94% (0.94) of the values must be within z = 4 standard deviations of the mean.
Chebyshev's theorem can be applied to any dataset regardless of the shape of its distribution.
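To make the bound concrete, here is a minimal Python sketch that computes (1 - 1/z**2) and compares it with the proportion actually observed in a small skewed sample. The function names and the sample data are hypothetical and only for illustration; the population standard deviation is assumed.

```python
from statistics import mean, pstdev


def chebyshev_bound(z):
    """Minimum proportion of values within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z**2


def proportion_within(values, z):
    """Observed proportion of values within z standard deviations of the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    inside = [x for x in values if abs(x - mu) <= z * sigma]
    return len(inside) / len(values)


# Hypothetical, deliberately skewed sample (not taken from the post).
skewed = [1, 1, 1, 2, 2, 3, 4, 5, 8, 40]

for z in (1.5, 2, 3, 4):
    bound = chebyshev_bound(z)
    observed = proportion_within(skewed, z)
    print(f"z = {z}: at least {bound:.2%} guaranteed, {observed:.2%} observed")
```

For z = 2, 3 and 4 the bound works out to 75%, 88.89% and 93.75%, which round to the figures quoted above. The observed proportion can be much higher than the bound, because Chebyshev's theorem only guarantees a minimum.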
The Empirical rule is based on the Normal distribution. It applies only when the data follows a normal distribution and exhibits a symmetrical bell-shaped curve.
For data that follow a normal distribution:
Approximately 68% of the data values will be within one standard deviation of the mean.
Approximately 95% of the data values will be within two standard deviations of the mean.
Approximately 99.7% (almost all) of the data values will be within three standard deviations of the mean.
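These percentages can be checked directly from the normal distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `normal_proportion_within` is hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def normal_proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {normal_proportion_within(k):.2%}")
```

This prints roughly 68.27%, 95.45% and 99.73%, which is where the rounded figures of the empirical rule come from.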
Outliers are values that are extremely small or extremely large compared to the other values in the distribution, and they tend to skew the data. Analysts always keep an eye out for outliers, and treating them is an expansive topic in itself. Two of the most common methods for identifying outliers are:
Lower and upper limits calculated using the interquartile range (IQR)
Z-score: in a given dataset, values that lie more than 3 standard deviations from the mean are considered to be outliers.
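As a rough illustration of the 3 standard deviations rule, the sketch below flags values whose distance from the mean exceeds 3 standard deviations. The function name, the sample data, and the use of the population standard deviation are assumptions made for this example.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return []  # all values identical, so nothing can be an outlier
    return [x for x in values if abs((x - mu) / sigma) > threshold]


# Hypothetical sample: 19 values close to 50 and one extreme value.
values = [48, 50, 52, 49, 51, 50, 47, 53, 50, 49,
          51, 50, 52, 48, 50, 49, 51, 50, 52, 120]

print(zscore_outliers(values))  # [120]
```

Note that the extreme value itself inflates both the mean and the standard deviation, so this method tends to be less sensitive on small samples.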
Using the IQR (where IQR = Q3 - Q1), the lower and upper limits are calculated as follows:
Lower_limit = Q1 - 1.5 * IQR
Upper_limit = Q3 + 1.5 * IQR
In a given dataset, values that fall outside these lower and upper limits are considered outliers. The measures Z-score, Q1, Q3 and IQR are discussed in detail in Part 1.
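The IQR limits can be sketched in a few lines of Python. This example uses `statistics.quantiles` with its default method to get Q1 and Q3; other quartile conventions can give slightly different limits. The function names and the sample data are hypothetical.

```python
from statistics import quantiles


def iqr_limits(values):
    """Return (lower_limit, upper_limit) based on 1.5 times the IQR."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def iqr_outliers(values):
    """Return the values that fall outside the IQR-based limits."""
    lower, upper = iqr_limits(values)
    return [x for x in values if x < lower or x > upper]


# Hypothetical sample with one unusually large value.
values = [2, 4, 4, 5, 6, 7, 8, 9, 30]

lower, upper = iqr_limits(values)
print(f"Lower_limit = {lower}, Upper_limit = {upper}")
print(iqr_outliers(values))  # [30]
```

Here Q1 = 4 and Q3 = 8.5, so IQR = 4.5 and the limits are -2.75 and 15.25. Because quartiles are barely affected by a single extreme value, this approach also works on small samples.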