class: center, middle, inverse, title-slide .title[ # BANL 6100: Business Analytics ] .subtitle[ ## Describing Data – II ] .author[ ### Mehmet Balcilar
mbalcilar@newhaven.edu
] .institute[ ### University of New Haven ] .date[ ### 2023-09-28 (updated: 2023-10-25) ] --- class: center, middle, sydney-blue # Skewness & Kurtosis --- ## Introduction - A normal distribution is symmetrical in shape - Real-world data often have asymmetric distributions - Asymmetry in a distribution is measured by skewness - Kurtosis (peakedness) defines whether a distribution is truly **normal** - or whether it may have so-called **fatter** or **thinner** tails --- ## Symmetric Distributions – I - A distribution is symmetric if values at equal distances from the point of symmetry have equal relative frequency or probability. - The point of symmetry of a normal distribution is the mean (which is also the median and the mode!) - The most common symmetric distribution is the normal distribution. - However, there are a number of other distributions that are symmetric. .red[A symmetric distribution has the following property:] `$$\color{blue}{Q_3-Q_2=Q_2-Q_1}$$` where `\(Q_1\)`, `\(Q_2\)`, and `\(Q_3\)` are the 1st, 2nd, and 3rd quartiles. Thus the ratio `\(\color{blue}{(Q_3-Q_2)/(Q_2-Q_1)}\)` can be used as a measure of asymmetry. --- ## Symmetric Distributions – II Have a look at the following histogram: <img src="images/lecture8/retirement.png" width="650"> This distribution meets all of the conditions of being symmetrical. --- ## Skewness - Skewness is the degree of distortion or deviation from the symmetrical normal distribution. - Skewness can be seen as a measure of the lack of symmetry in the data distribution. - Skewness helps you identify extreme values in one of the tails. Symmetrical distributions have a skewness of 0. .pull-left[ #### Positive Skewness - A distribution is **positively (right) skewed** when the tail on the right side of the distribution is longer (also often called "fatter"). - When there is positive skewness, the mean and median are bigger than the mode. 
] .pull-right[ #### Negative Skewness - Distributions are **negatively (left) skewed** when the tail on the left side of the distribution is longer or fatter than the tail on the right side. - When there is negative skewness, the mean and median are smaller than the mode. ] --- ## Types of Skewness <img src = "images/lecture8/skewness.png" width = "1000"> --- ## Fisher-Pearson coefficient of skewness For univariate data `\(x_1, x_2, ..., x_n\)` the formula for skewness is: `$$g_1=\dfrac{\dfrac{1}{n}{\displaystyle\sum^n_{i=1}(x_i-\bar{x})^3}}{s^3}$$` where `\(\bar{x}\)` is the mean, `\(s\)` is the standard deviation, and `\(n\)` is the number of data points. The **Fisher-Pearson coefficient of skewness** is the .red[most commonly] used measure of skewness. --- ## Interpreting the Fisher-Pearson coefficient of skewness The rule of thumb: * A skewness between -0.5 and 0.5 means that the data are pretty symmetrical. * A skewness between -1 and -0.5 (negatively skewed) or between 0.5 and 1 (positively skewed) means that the data are moderately skewed. * A skewness smaller than -1 (negatively skewed) or bigger than 1 (positively skewed) means that the data are highly skewed. --- ## Pearson Mode Skewness The Pearson mode skewness is used when the sample data exhibit a strong mode. For univariate data `\(x_1, x_2, ..., x_n\)` the formula for Pearson mode skewness is: `$$\mathit{Sk}_1=\dfrac{\bar{x}-m_o}{s}$$` where `\(\bar{x}\)` is the mean, `\(s\)` is the standard deviation, and `\(m_o\)` is the mode of the data points. #### Interpretation: - The direction of skewness is given by the sign. - The coefficient compares the sample distribution with a normal distribution. The larger the absolute value, the more the distribution differs from a normal distribution. - A value of zero means no skewness at all. - A large negative value means the distribution is negatively skewed. - A large positive value means the distribution is positively skewed. 
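---

## Example: Computing Skewness

The Fisher-Pearson coefficient and the Pearson mode skewness can be computed directly from their formulas. A minimal sketch in Python with NumPy; the sample data are invented for illustration:

```python
import numpy as np
from collections import Counter

def fisher_pearson_skewness(x):
    """g1 = (1/n) * sum((x_i - xbar)^3) / s^3, with s the population SD."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** 3) / x.std() ** 3

def pearson_mode_skewness(x):
    """Sk1 = (xbar - mode) / s."""
    x = np.asarray(x, dtype=float)
    mode = Counter(x.tolist()).most_common(1)[0][0]  # most frequent value
    return (x.mean() - mode) / x.std()

symmetric = [1, 2, 3, 4, 5]
right_skewed = [2, 2, 2, 3, 3, 4, 5, 9]  # strong mode at 2, long right tail

print(fisher_pearson_skewness(symmetric))     # 0.0: symmetric
print(fisher_pearson_skewness(right_skewed))  # > 1: highly right-skewed
print(pearson_mode_skewness(right_skewed))    # positive: same direction
```

Both statistics agree on the sign of the skew; since `\(g_1\)` for the second data set exceeds 1, the rule of thumb classifies it as highly skewed.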
--- ## Pearson's Second Coefficient (Pearson Median Skewness) Pearson's second coefficient is used when the data include multiple modes or a weak mode. For univariate data `\(x_1, x_2, ..., x_n\)` the formula for Pearson median skewness is: `$$\mathit{Sk}_2=\dfrac{3(\bar{x}-m_d)}{s}$$` where `\(\bar{x}\)` is the mean, `\(s\)` is the standard deviation, and `\(m_d\)` is the median of the data points. It has the same interpretation as the Pearson mode skewness. --- ## Remedies for Skewness <!-- One reason you might check if a distribution is skewed is to verify whether your data is appropriate for a certain statistical procedure. Many statistical procedures assume that variables or residuals are normally distributed. Skew is a common way that a distribution can differ from a normal distribution. --> You generally have three choices if your statistical procedure requires a normal distribution and your data are skewed: .bold[Do nothing.] Many statistical tests, including t tests, ANOVAs, and linear regressions, aren’t very sensitive to skewed data. Especially if the skew is mild or moderate, it may be best to ignore it. .bold[Use a different model.] You may want to choose a model that doesn’t assume a normal distribution. Non-parametric tests or generalized linear models could be more appropriate for your data. .bold[Transform the variable.] Another option is to transform a skewed variable so that it’s less skewed. “Transform” means to apply the same function to all the observations of a variable. 
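---

## Example: A Log Transform Reducing Skew

To make the "transform the variable" remedy concrete, here is a minimal sketch in Python with NumPy. The lognormal "income" sample is simulated purely for illustration; taking the natural log turns a strongly right-skewed variable into a roughly symmetric one:

```python
import numpy as np

def skewness(x):
    """Fisher-Pearson coefficient: mean cubed deviation over s^3."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** 3) / x.std() ** 3

rng = np.random.default_rng(42)
income = rng.lognormal(mean=10, sigma=0.8, size=5_000)  # strongly right-skewed

print(round(skewness(income), 2))          # well above 1: highly skewed
print(round(skewness(np.log(income)), 2))  # close to 0: roughly symmetric
```

Square root and log-base-10 transforms work the same way, differing only in how aggressively they pull in the long tail.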
--- ## Transformations Based on the Type of Skewness <table> <tbody> <tr> <td><b>Type of skew</b></td> <td><b>Intensity of skew</b></td> <td><b>Transformation</b></td> </tr> <tr> <td rowspan="4"><span style="font-weight: 400;">Right</span></td> <td><span style="font-weight: 400;">Mild</span></td> <td><span style="font-weight: 400;">Do not transform</span></td> </tr> <tr> <td><span style="font-weight: 400;">Moderate</span></td> <td><span style="font-weight: 400;">Square root</span></td> </tr> <tr> <td><span style="font-weight: 400;">Strong</span></td> <td><span style="font-weight: 400;">Natural log</span></td> </tr> <tr> <td><span style="font-weight: 400;">Very strong</span></td> <td><span style="font-weight: 400;">Log base 10</span></td> </tr> <tr> <td rowspan="4"><span style="font-weight: 400;">Left</span></td> <td><span style="font-weight: 400;">Mild</span></td> <td><span style="font-weight: 400;">Do not transform</span></td> </tr> <tr> <td><span style="font-weight: 400;">Moderate</span></td> <td><span style="font-weight: 400;">Reflect* then square root</span></td> </tr> <tr> <td><span style="font-weight: 400;">Strong</span></td> <td><span style="font-weight: 400;">Reflect* then natural log</span></td> </tr> <tr> <td><span style="font-weight: 400;">Very strong</span></td> <td><span style="font-weight: 400;">Reflect* then log base 10</span></td> </tr> </tbody> </table> *In this context, “reflect” means to take the largest observation, `\(x_l\)`, then subtract each observation from `\(x_l + 1\)`. Keep in mind that the reflection reverses the direction of the variable and its relationships with other variables (i.e., positive relationships become negative). --- ## Kurtosis .pull-left[ - Kurtosis deals with the lengths of tails in the distribution. 
- It is a measure of the peakedness (or tailedness) of a distribution relative to the normal distribution. > **Where skewness talks about extreme values in one tail versus the other, kurtosis aims at identifying extreme values in both tails at the same time!** - You can think of kurtosis as a **measure of outliers** present in the distribution. ] .pull-right[ <img src = "images/lecture8/kurtosis.png" width ="550"> The distribution in the image above has relatively more observations around the mean, then a steep decline, and longer tails compared to the normal distribution. ] --- ## Measuring Kurtosis For univariate data `\(x_1, x_2, \dots, x_n\)` the formula for kurtosis is: `$$k=\dfrac{\dfrac{1}{n}{\displaystyle\sum^n_{i=1}(x_i-\bar{x})^4}}{s^4}$$` If there is high kurtosis, you may want to investigate why there are so many outliers. <!-- The presence of outliers could be indications of errors on the one hand, but they could also be some interesting observations that may need to be explored further. For banking transactions, for example, an outlier may signify fraudulent activity. How we deal with outliers mainly depends on the domain. --> Low kurtosis in a data set indicates that the data have light tails and lack outliers. Unusually low kurtosis is also worth investigating, since it may mean the data have been trimmed or truncated. #### Excess kurtosis In practice, **excess kurtosis**, defined as Pearson's kurtosis minus 3, is used to provide a simple comparison to the normal distribution: `$$k_e=k-3=\dfrac{\dfrac{1}{n}{\displaystyle\sum^n_{i=1}(x_i-\bar{x})^4}}{s^4}-3$$` --- ## Types of Kurtosis .pull-left[ <img src = "images/lecture8/mesokurtosis.png" width ="550"> .red[Mesokurtic] `\(\color{red}{(k \approx 3)}\)` A mesokurtic distribution has kurtosis close to that of a normal distribution, i.e., around 3. By this definition, the standard normal distribution has a kurtosis of 3. 
] .pull-right[ .red[Platykurtic] `\(\color{red}{(k < 3)}\)` When a distribution is platykurtic, the distribution is shorter and its tails are thinner than those of the normal distribution. The peak is lower and broader than that of a mesokurtic distribution, which means that the tails are light and that there are fewer outliers than in a normal distribution. .red[Leptokurtic] `\(\color{red}{(k > 3)}\)` A leptokurtic distribution has longer and fatter tails. The peak is higher and sharper than the peak of a normal distribution, which means that the data have heavy tails and that there are more outliers. <!-- Outliers stretch your horizontal axis of the distribution, which means that the majority of the data appear in a narrower vertical range. This is why the leptokurtic distribution looks "skinny". --> ] --- ## Types of Kurtosis <table> <thead> <tr> <th rowspan="2"></th> <th colspan="3" style="text-align: center;">Category</th> </tr> <tr> <th>Mesokurtic </th> <th> Platykurtic </th> <th>Leptokurtic </th> </tr> </thead> <tbody> <tr> <th style="text-align: left;">Tailedness</th> <td>Medium-tailed</td> <td>Thin-tailed</td> <td>Fat-tailed</td> </tr> <tr> <th style="text-align: left;">Outlier frequency</th> <td>Medium</td> <td>Low</td> <td>High</td> </tr> <tr> <th style="text-align: left;">Kurtosis</th> <td>Moderate (3)</td> <td>Low (< 3)</td> <td>High (> 3)</td> </tr> <tr> <th style="text-align: left;">Excess kurtosis</th> <td>0</td> <td>Negative</td> <td>Positive</td> </tr> <tr> <th style="text-align: left;">Example distribution</th> <td>Normal</td> <td>Uniform</td> <td>Laplace</td> </tr> </tbody> </table>
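---

## Example: Computing Kurtosis

The kurtosis formula and the classification in the table can be checked by simulation. A minimal sketch in Python with NumPy (the sample sizes and seed are arbitrary choices):

```python
import numpy as np

def kurtosis(x):
    """k = (1/n) * sum((x_i - xbar)^4) / s^4; excess kurtosis is k - 3."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** 4) / x.std() ** 4

rng = np.random.default_rng(0)
samples = {
    "normal (mesokurtic)":   rng.normal(size=100_000),
    "uniform (platykurtic)": rng.uniform(size=100_000),
    "laplace (leptokurtic)": rng.laplace(size=100_000),
}
for name, x in samples.items():
    print(f"{name}: k = {kurtosis(x):.2f}, excess = {kurtosis(x) - 3:.2f}")
```

The normal sample comes out near 3 (excess near 0), the uniform well below 3, and the Laplace well above 3, matching the mesokurtic, platykurtic, and leptokurtic rows of the table.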