Stats

NexusRich

Member
Joined
Jul 6, 2019
Messages
87
Gender
Male
HSC
2021
In stats, why is the median sometimes a better representation of the middle value than the mean? Also, when comparing two sets of data, how does the median suggest anything about the skewness of a data set?
 

username_2

Active Member
Joined
Aug 1, 2020
Messages
116
Gender
Male
HSC
2020
A good question. When we are analysing a large data set with a symmetric distribution (for example a normal distribution, also called a Gaussian curve), the mean is generally approximately equal to the median. With a skewed distribution, however, the mean is dragged towards the tail, away from where most of the points sit. This follows from the formula for the mean, (sum of all x)/(number of x): every value contributes in proportion to its size, so a few large values move the result noticeably. For example, if there are a lot of values near 1 and some values near 4, the median will be near 1, but the mean will sit somewhat above 1, pulled towards 4. The median is largely independent of how far out the extreme points lie, because we find it by crossing off points from both ends towards the center, so it gives a more "central" value.
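To put numbers on the "lots of values near 1, a few near 4" case, here is a small Python sketch using the standard `statistics` module. The data set is made up purely for illustration.

```python
from statistics import mean, median

# Hypothetical data: eight values at 1 and two values at 4.
values = [1, 1, 1, 1, 1, 1, 1, 1, 4, 4]

centre_mean = mean(values)      # (sum of all x) / (number of x)
centre_median = median(values)  # middle of the sorted values

print("mean:", centre_mean)
print("median:", centre_median)
```

The two values at 4 lift the mean above 1, while the median stays with the bulk of the data at 1.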

When we are comparing two data sets with different skews but a similar range of x-values, I would think that the side of the midpoint of the range on which the median falls indicates the direction of the skew. Suppose we have two data sets confined between x=1 and x=5, so the midpoint of the range is 3. One is concentrated near 1 (its median will lie below 3, on the side of 1) and the other is concentrated near 4.5 (its median will lie above 3, on the side of 4.5). Hence, from the two medians, we can tell that the first distribution has most of its points near 1 with a tail stretching towards 5, whereas it is the opposite for the second distribution.
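A rough sketch of this comparison in Python, assuming both samples really are confined to the same range of 1 to 5. The helper name `median_offset` and both samples are invented for illustration, and this is only a quick heuristic rather than a formal skewness measure.

```python
from statistics import median

def median_offset(data, low, high):
    """Return the median minus the midpoint of the range [low, high]."""
    midpoint = (low + high) / 2
    return median(data) - midpoint

# Hypothetical samples, both confined to the range 1..5.
bunched_low = [1, 1.2, 1.5, 1.5, 2, 2.5, 3.5, 5]
bunched_high = [1, 2.5, 3.5, 4, 4.5, 4.5, 4.8, 5]

print(median_offset(bunched_low, 1, 5))   # negative: median is below 3
print(median_offset(bunched_high, 1, 5))  # positive: median is above 3
```

A negative offset means most points sit in the lower half of the range, with the tail stretching towards the upper end. A positive offset means the reverse.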

I might not have been very clear, but I tried my best. Hope it helps. (I have forgotten which is positively skewed and which is negatively skewed, hence the loose wording in the discussion. Sorry.)
 

Trebla

Administrator
Joined
Feb 16, 2005
Messages
8,403
Gender
Male
HSC
2006
The mean is based on the values of the data points. The median is based on the rank of the data points, rather than their value. The former is therefore sensitive to big outliers or extreme values which can distort the measure of the “centre”. The median does not care how big/small the extreme values are (it treats them the same way) so it could be a better measure.
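The rank-versus-value point can be checked directly. In this Python sketch (made-up data), only the size of the largest value changes between the two lists, so the ranks of all the points stay the same.

```python
from statistics import mean, median

# Hypothetical data: identical except for how extreme the largest value is.
base = [2, 3, 3, 4, 5, 6, 50]
stretched = [2, 3, 3, 4, 5, 6, 5000]

# The middle-ranked point is the same in both lists.
print("medians:", median(base), median(stretched))

# The mean uses the actual values, so it moves with the outlier.
print("means:", mean(base), mean(stretched))
```

The median is identical for both lists, while the mean grows with the outlier.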

The median may tell you how skewed the data is (assuming the dataset is large enough and “well behaved”) if you compare it to points such as the 1st and 3rd quartiles or the max/min (a boxplot is a nice way to visualise this). If the median is very close to, say, the maximum value but much further away from the minimum value, then this suggests skewness (in that case a long tail towards the low end, i.e. negative skew).
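One way to turn the quartile comparison into a single number is the quartile skewness coefficient (often called Bowley's skewness), which compares the gap between the median and each quartile. This particular measure is not named in the post above, so treat it as one possible sketch. It uses `statistics.quantiles` (Python 3.8+) with its default method, and the data is invented.

```python
from statistics import quantiles

def quartile_skew(data):
    """Quartile skewness: ((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1).

    Positive when the median sits closer to Q1 (longer upper tail),
    negative when it sits closer to Q3 (longer lower tail).
    Assumes Q3 != Q1, otherwise the ratio is undefined.
    """
    q1, q2, q3 = quantiles(data, n=4)
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)

# Hypothetical samples.
long_upper_tail = [1, 1, 2, 2, 3, 3, 4, 6, 9, 15]
long_lower_tail = [-x for x in long_upper_tail]  # mirror image
symmetric = [1, 2, 3, 4, 5]

print(quartile_skew(long_upper_tail))
print(quartile_skew(long_lower_tail))
print(quartile_skew(symmetric))
```

Different quartile conventions give slightly different values on small samples, so the sign is more trustworthy than the exact figure.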
 
