Understanding Skewness by CHIRAG
What is skewness?
Skewness measures how asymmetric a distribution is around its mean. So far, we’ve looked at the shape of a distribution, such as the normal distribution, through its probability or frequency distribution.
Now, let’s look at it in terms of a boxplot, because that’s one of the most common ways of looking at a distribution in data science.
The image above shows a boxplot of a symmetric distribution. Notice that the distance between Q1 and Q2 equals the distance between Q2 and Q3 (Q3 - Q2 = Q2 - Q1). Let’s jump to the formulas for skewness now:
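1) SKEWNESS = (MEAN - MODE) / STANDARD DEVIATION (Pearson's first coefficient)
2) MODE ≈ 3 MEDIAN - 2 MEAN (empirical relation for moderately skewed distributions)
3) SKEWNESS = 3(MEAN - MEDIAN) / STANDARD DEVIATION (Pearson's second coefficient, obtained by substituting 2 into 1)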
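Both Pearson coefficients can be computed directly from the mean, median, mode and standard deviation. The sketch below uses Python's standard statistics module. Two assumptions: it uses the sample standard deviation (statistics.stdev, n - 1 denominator), and the data is a hypothetical right-skewed sample. Formula 2 is only an approximation, so the two coefficients will generally not be equal. What matters here is that both are positive.

```python
import statistics

def pearson_first_skewness(data):
    """Pearson's first (mode) coefficient: (mean - mode) / standard deviation."""
    sd = statistics.stdev(data)  # assumption: sample SD with an n - 1 denominator
    return (statistics.mean(data) - statistics.mode(data)) / sd

def pearson_second_skewness(data):
    """Pearson's second (median) coefficient: 3 * (mean - median) / standard deviation."""
    sd = statistics.stdev(data)
    return 3 * (statistics.mean(data) - statistics.median(data)) / sd

# Hypothetical right-skewed sample: a few large values stretch the right tail
data = [2, 3, 3, 3, 4, 5, 6, 9]
print(round(pearson_first_skewness(data), 3))
print(round(pearson_second_skewness(data), 3))
```

For a symmetric sample such as [1, 2, 2, 3], the mean, median and mode coincide, so both coefficients are 0.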
Common characteristics of all normal distributions
- They are all symmetric, so their skewness is 0.
- Mean=Median=Mode
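You can check "Mean = Median = Mode" with statistics.NormalDist from Python's standard library (Python 3.8+). The parameters below (mean 100, SD 15) are hypothetical. The theoretical mean, median and mode are identical. For a large sample drawn from the distribution, the sample mean and median land close to each other.

```python
from statistics import NormalDist, mean, median

nd = NormalDist(mu=100, sigma=15)  # hypothetical parameters
print(nd.mean, nd.median, nd.mode)  # all three are 100.0 for a normal distribution

# A seeded sample, so the run is repeatable
sample = nd.samples(10_000, seed=42)
print(round(mean(sample), 2), round(median(sample), 2))  # close to each other, near 100
```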
Empirical Rule
The empirical rule, also called the three-sigma or 68-95-99.7 rule, states that for normally distributed data about 68% of observations fall within one standard deviation of the mean, about 95% within two, and about 99.7% (almost all) within three.
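The three percentages follow from the normal cumulative distribution function (CDF). The sketch below computes them with statistics.NormalDist. The helper within_k_sd is a hypothetical name used only for illustration. The result does not depend on the mean or the SD, only on k.

```python
from statistics import NormalDist

def within_k_sd(k, mu=0.0, sigma=1.0):
    """Probability that a normal value lies within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```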
- The standard normal distribution is a special case of the normal distribution with mean = 0 and SD = 1.
- This distribution is also known as the Z-distribution.
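Any normally distributed value x can be mapped onto the Z-distribution by standardizing it: z = (x - mean) / SD. Here is a small sketch under a hypothetical assumption: exam scores are normal with mean 50 and SD 10. A score of 65 becomes z = 1.5. The probability of scoring below 65 equals the probability of a standard normal value falling below 1.5.

```python
from statistics import NormalDist

scores = NormalDist(mu=50, sigma=10)  # hypothetical exam-score distribution
x = 65
z = (x - scores.mean) / scores.stdev  # standardize onto the Z-distribution
print(z)  # 1.5

# Same probability on the original scale and on the standard normal scale
print(round(scores.cdf(x), 4), round(NormalDist().cdf(z), 4))
```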
Positively Skewed or Right-Skewed (Positive Skewness)
- Right skewed distributions occur when the long tail is on the right side of the distribution.
- This condition occurs because probabilities taper off more slowly for higher values.
- In a positively skewed distribution, the mean is typically greater than the median.
- MEAN > MEDIAN > MODE
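A small hypothetical sample shows the typical ordering. Most values are small, and a couple of large values form the long right tail. Those tail values pull the mean above the median, and the median sits above the most frequent value.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, a few are large
data = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]
mean = statistics.mean(data)      # 4, pulled up by the long right tail
median = statistics.median(data)  # 3
mode = statistics.mode(data)      # 2, the most frequent value
print(mean > median > mode)       # True
```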
Right-Skewed Boxplot
If a boxplot is skewed to the right, the box shifts to the left and the right whisker is longer than the left one. In such distributions, the mean is typically greater than the median.
In the boxplot below, Q2 (the median) lies closer to Q1 than to Q3, which indicates a positively skewed distribution.
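You can check the boxplot reading numerically by comparing the two halves of the box. The sketch uses statistics.quantiles on a hypothetical right-skewed sample. Note that quartile values depend on the interpolation method; the code uses the module's default "exclusive" method. As an extra measure not covered in this post, it also computes Bowley's quartile coefficient of skewness, (Q3 + Q1 - 2·Q2) / (Q3 - Q1), which is positive when Q2 sits closer to Q1.

```python
import statistics

data = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]  # hypothetical right-skewed sample
q1, q2, q3 = statistics.quantiles(data, n=4)  # default method="exclusive"
print(q1, q2, q3)  # 2.0 3.0 5.75

lower_half = q2 - q1  # width of the box below the median
upper_half = q3 - q2  # width of the box above the median
print(upper_half > lower_half)  # True: the median lies closer to Q1

# Bowley (quartile) skewness: positive for right skew, negative for left skew
bowley = (q3 + q1 - 2 * q2) / (q3 - q1)
print(round(bowley, 3))  # 0.467
```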
Right-Skewed Histogram
A histogram is right-skewed if its peak sits toward the left and its long tail stretches to the right, toward higher values.
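To see this shape without a plotting library, the sketch below prints a text histogram. It uses 1,000 values drawn from an exponential distribution, a standard example of right-skewed data chosen here only for illustration. The seed and the bin width of 0.5 are arbitrary assumptions. The tallest bar is the first one, and the bars shrink gradually toward the right.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
sample = [rng.expovariate(1.0) for _ in range(1000)]  # right-skewed data

bin_width = 0.5
n_bins = math.floor(max(sample) / bin_width) + 1
counts = [0] * n_bins
for value in sample:
    counts[math.floor(value / bin_width)] += 1  # bin index for this value

for i, count in enumerate(counts):
    low = i * bin_width
    print(f"{low:4.1f}-{low + bin_width:4.1f} | {'#' * (count // 10)}")  # one '#' per 10 values

print("mean:", round(statistics.mean(sample), 3), "median:", round(statistics.median(sample), 3))
```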
Negatively Skewed or Left-Skewed (Negative Skewness)
- Left skewed distributions occur when the long tail is on the left side of the distribution.
- This condition occurs because probabilities taper off more slowly for lower values.
- In a negatively skewed distribution, the mean is typically less than the median.
- MODE > MEDIAN > MEAN
Left-Skewed Boxplot
If the bulk of the observations are at the high end of the scale, the boxplot is left-skewed: the median (Q2) lies closer to Q3 than to Q1, and the left whisker is longer than the right whisker.
Left-Skewed Histogram
A histogram is left-skewed if its peak sits toward the right and its long tail stretches to the left, toward lower values.
Rule of thumb:
- If the skewness is between -0.5 and 0.5, the data are nearly symmetrical.
- If the skewness is between -1 and -0.5 (negatively skewed) or between 0.5 and 1 (positively skewed), the data are moderately skewed.
- If the skewness is lower than -1 (negatively skewed) or greater than 1 (positively skewed), the data are highly skewed.
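The rule of thumb translates directly into a small labeling function. A few assumptions apply. The function name is hypothetical. It uses the common labels "moderately" and "highly" skewed for the two outer bands. It also treats the boundary values (±0.5, ±1) as belonging to the less-skewed band, because the rule of thumb does not say which side they fall on.

```python
def describe_skewness(s: float) -> str:
    """Label a skewness value using the rule-of-thumb bands.

    Assumption: the boundaries +/-0.5 and +/-1 count toward the less-skewed band.
    """
    if -0.5 <= s <= 0.5:
        return "nearly symmetrical"
    direction = "positive" if s > 0 else "negative"
    if -1 <= s <= 1:
        return f"moderately skewed ({direction})"
    return f"highly skewed ({direction})"

for value in (0.2, -0.7, 0.9, 1.6, -2.3):
    print(value, "->", describe_skewness(value))
```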
How Do We Transform Skewed Data?
Skewed data can hurt the predictive performance of some machine learning models, especially those that work best with roughly normally distributed inputs, so it often helps to transform skewed data to bring it closer to a normal distribution. Here are some of the ways you can transform your skewed data:
- Power Transformation
- Log Transformation
- Exponential Transformation
Note: The selection of transformation depends on the statistical characteristics of the data.
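The effect of a transformation is easy to measure: compute the skewness before and after applying it. The sketch below makes several assumptions. It computes the moment-based (Fisher-Pearson) skewness by hand so it needs only the standard library, and the data is a hypothetical strictly positive, right-skewed sample. It applies a square root (a power transformation with exponent 0.5) and a log transformation. On this data the skewness shrinks from the original to the square root and shrinks further after the log. Both transformations require non-negative values, and the log requires strictly positive ones. For data containing zeros, math.log1p is a common substitute.

```python
import math

def sample_skewness(values):
    """Fisher-Pearson moment coefficient g1 = m3 / m2**1.5 (divide-by-n moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

data = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]  # hypothetical right-skewed, strictly positive sample

transforms = {
    "original": data,
    "square root (power 0.5)": [math.sqrt(v) for v in data],
    "log": [math.log(v) for v in data],  # requires v > 0
}
skews = {name: sample_skewness(values) for name, values in transforms.items()}
for name, s in skews.items():
    print(f"{name:>24}: skewness = {s:.3f}")
```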
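How to calculate Skewness in Python?
By default, the skew() function from scipy.stats returns the (biased) Fisher-Pearson coefficient of skewness. For the sample x used in this example:
from scipy.stats import skew
print(skew(x))
And we should get: 0.647511295006068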
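It helps to see what the two outputs of skew() correspond to. SciPy's documentation defines the default (bias=True) result as g1 = m3 / m2^1.5, where m2 and m3 are the divide-by-n central moments. With bias=False it applies the correction G1 = g1 · sqrt(n(n - 1)) / (n - 2). The stdlib-only sketch below implements both formulas. Its data is hypothetical, not the x used above. The last lines show that the two results quoted in this post are consistent with a sample of n = 10 values, because their ratio equals sqrt(10 · 9) / 8.

```python
import math

def fisher_pearson_skewness(values):
    """Biased coefficient g1 = m3 / m2**1.5, using divide-by-n central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def adjusted_skewness(values):
    """Bias-adjusted coefficient G1 = g1 * sqrt(n * (n - 1)) / (n - 2); needs n > 2."""
    n = len(values)
    if n < 3:
        raise ValueError("adjusted skewness needs at least 3 values")
    return fisher_pearson_skewness(values) * math.sqrt(n * (n - 1)) / (n - 2)

data = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]  # hypothetical sample, not the x above
print(round(fisher_pearson_skewness(data), 4), round(adjusted_skewness(data), 4))

# Correction factor for n = 10, applied to the biased value quoted in this post
factor = math.sqrt(10 * 9) / 8
print(round(0.647511295006068 * factor, 6))  # 0.767854
```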
To calculate the adjusted (bias-corrected) skewness in Python, pass bias=False as an argument to the skew() function from scipy.stats:
print(skew(x, bias=False))
And we should get: 0.7678539385891452
-------------------------------------------------------------------
Now let's go through some important questions & answers.
1) If a positively skewed distribution has a median of 50, which of the following statements is true?
A) Mean is greater than 50
B) Mean is less than 50
C) Mode is less than 50
D) Mode is greater than 50
E) Both A and C
F) Both B and D
2) Which of the following measures of central tendency will always change if a single value in the data changes?
A) Mean
B) Median
C) Mode
D) All of these
3) Which is the best measure of central tendency: mean, median, or mode?
4) What is the empirical rule?
5) What are the different measures of skewness?
=============================================
Thank you
Stay connected for more articles on DATA SCIENCE...