Descriptive Statistics - Measures of Location
Updated: Jan 26
"I think that a particle must have a separate reality independent of the measurements. That is an electron has spin, location and so forth even when it is not being measured. I like to think that the moon is there even if I am not looking at it."
As we discussed the importance of descriptive statistics in the previous blog, let us start our EDA with the measures of location, also known as measures of central tendency. Measures of location help answer the question "where is the center of the data?".
A central tendency (or measure of central tendency) is a central or typical value for a probability distribution. It may also be called a center or location of the distribution. The most common measures of location are the mean, the median and the mode.
Perhaps the most important measure of location is the mean, or average value, of a variable. The mean is calculated by summing all the entries in a list or array and dividing the sum by the number of elements in the array. When the mean is calculated for a sample, it is called the sample mean and is denoted by x̄ (x-bar). When the mean is calculated for the population, it is called the population mean and is denoted by µ.
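The formula for the sample mean is given by

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where n is the number of observations in the sample.
The formula for the population mean is given by

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

where N is the number of observations in the population.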
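As a minimal sketch of the calculation described above (sum the entries, then divide by the number of entries), the snippet below computes a sample mean by hand and compares it with `statistics.fmean` from the Python standard library. The list `sample` is a made-up example, not data from this post.

```python
from statistics import fmean

# Hypothetical sample values, chosen only for illustration
sample = [12, 15, 11, 18, 14]

# Sum all entries and divide by the number of elements
manual_mean = sum(sample) / len(sample)

# The standard library computes the same quantity
library_mean = fmean(sample)

print(manual_mean, library_mean)
```

The arithmetic is the same for the sample mean and the population mean; only the interpretation of the data (a sample versus the whole population) differs.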
The median is the measure of location most often reported when the data contain outliers that inflate the mean. When the data are skewed by outliers, the median is the preferred measure of location. It is not affected as much by a small proportion of extremely large or small values, so it may give a better idea of a "typical" value.
In statistics and probability theory, a median is a value separating the higher half from the lower half of a data sample, a population or a probability distribution. For a data set, it may be thought of as "the middle" value.
The median is the value in the middle when the data are arranged in ascending order (smallest to largest).
With an odd number of observations, the median is the middle value.
For an even number of observations, the median is the average of two middle values.
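The odd and even cases can be sketched in a few lines of Python. The helper `median_sorted` and the sample lists are hypothetical names and values introduced only for this illustration; the last list shows how a single outlier pulls the mean far more than the median.

```python
from statistics import mean, median

def median_sorted(values):
    """Return the median by sorting and picking the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: the single middle value
        return ordered[mid]
    # Even number of observations: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [7, 3, 9, 1, 5]        # sorted: 1, 3, 5, 7, 9
even_data = [7, 3, 9, 1, 5, 11]   # sorted: 1, 3, 5, 7, 9, 11
with_outlier = [1, 3, 5, 7, 900]  # one extreme value

print(median_sorted(odd_data))    # middle value of the sorted list
print(median_sorted(even_data))   # average of the two middle values
print(mean(with_outlier), median(with_outlier))
```

In practice, `statistics.median` already handles both cases; the hand-written version is shown only to make the two rules explicit.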
The mode is the value that appears most often in a set of data values. If X is a discrete random variable, the mode is the value x (i.e., X = x) at which the probability mass function takes its maximum value. In other words, it is the value that is most likely to be sampled.
The mode is the value that occurs with the greatest frequency.
If the data contain exactly two modes, we say that the data are bimodal.
If the data contain more than two modes, we say that the data are multimodal. In multimodal cases, the mode is almost never reported, because listing three or more modes is not particularly helpful in describing the central location of the data (multiple local maxima).
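To make the unimodal and bimodal cases concrete, here is a small sketch that counts frequencies and returns every value tied for the highest count. The function name `modes` and the example lists are assumptions made for illustration; `statistics.multimode` (Python 3.8+) gives the same result.

```python
from collections import Counter
from statistics import multimode

def modes(values):
    """Return all values that occur with the greatest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    # Keep every value whose count ties for the maximum
    return [value for value, count in counts.items() if count == top]

unimodal = [2, 3, 3, 4, 5]   # 3 occurs most often
bimodal = [1, 1, 2, 3, 3]    # 1 and 3 tie for most frequent

print(modes(unimodal))       # one mode
print(modes(bimodal))        # two modes, so the data are bimodal
print(multimode(bimodal))    # standard-library equivalent
```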
The mean, median and mode of the above distribution are 20.0, 20.0 and 16.9, respectively.
The code used for the above plot is given below.
When we calculated the sample mean and the population mean, we gave equal importance to all the observations. In the weighted mean, each observation is assigned a weight that reflects its importance.
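The formula for the weighted mean is given by

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

where w_i is the weight assigned to observation x_i.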
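In code, the weighted mean divides the weighted sum of the observations by the sum of the weights. The sketch below assumes the weights are non-negative and not all zero; `weighted_mean`, `scores` and `weights` are hypothetical names and values used only for illustration.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

scores = [80, 90, 70]   # hypothetical observations
weights = [2, 3, 5]     # hypothetical importance of each observation

print(weighted_mean(scores, weights))

# With equal weights, the weighted mean reduces to the ordinary mean
print(weighted_mean(scores, [1, 1, 1]))
```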
The geometric mean is a measure of location that is calculated by finding the nth root of the product of n values. The geometric mean is often used in analyzing the growth rates in financial data.
The general formula for the geometric mean is as follows:

$$\bar{x}_g = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$$
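As a rough sketch of the nth-root-of-the-product definition, the snippet below computes a geometric mean by hand and compares it with `statistics.geometric_mean` (Python 3.8+). The growth factors are invented for illustration, where 1.10 stands for a 10% increase and 0.90 for a 10% decrease; the helper name `geo_mean` is likewise an assumption.

```python
import math
from statistics import geometric_mean

def geo_mean(values):
    """Geometric mean: the nth root of the product of n positive values."""
    n = len(values)
    return math.prod(values) ** (1 / n)

# Hypothetical yearly growth factors
growth_factors = [1.10, 1.20, 0.90]

manual = geo_mean(growth_factors)
library = geometric_mean(growth_factors)

# Both values approximate the average growth factor per period
print(manual, library)
```

For long lists, multiplying many values directly can overflow or underflow, which is one reason to prefer the library function outside of small examples like this one.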
Hope this post was helpful! If you're interested in reading more, you can subscribe to be notified when the next article is published.