Title: MEASURES OF CENTRAL TENDENCY AND DISPERSION
Description: THESE ARE NOTES FOR INTRODUCTION TO STATISTICS AND THEY ARE VERY SIMPLIFIED
CHINHOYI UNIVERSITY OF TECHNOLOGY
SCHOOL OF NATURAL SCIENCES AND MATHEMATICS
DEPARTMENT OF MATHEMATICS
MEASURES OF CENTRAL TENDENCY AND DISPERSION
INTRODUCTION
As seen in the previous unit, graphical displays of statistical data are useful for
communicating broad overviews of the behaviour of a random variable
...
The behaviour pattern of any random
variable can be described by a measure of central tendency and a measure of the spread (dispersion) of
observations about this central value
...
A central tendency statistic represents a typical value or middle data point of a set of
observations and is useful for comparing data sets
...
Each measure will be computed for Ungrouped data (raw data) and Grouped data (data
summarised into a frequency distribution)
...
$\sum_{i=1}^{n} x_i$ is the shorthand notation for the sum of the n individual observations:
$\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$
Grouped Data
Grouped data is represented by a frequency distribution
...
Thus the sum of all the observations
cannot be determined exactly
...
The computed mean is an approximation of the actual arithmetic mean
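As a sketch of why the grouped mean is only an approximation: the usual approach (assumed here, since the formula itself is not reproduced in this extract) treats every observation in a class as if it sat at the class midpoint, and computes $\bar{x} = \frac{\sum fx}{\sum f}$. The example uses the marks frequency table from the median example later in these notes; `grouped_mean` is a hypothetical helper name.

```python
def grouped_mean(classes, freqs):
    """Approximate mean of grouped data, using class midpoints as x."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(freqs)                                          # sum of f
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))   # sum of f*x
    return sum_fx / n


# Marks frequency table (n = 50)
marks_classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
marks_freqs = [2, 12, 22, 8, 6]

print(grouped_mean(marks_classes, marks_freqs))  # 25.8
```

The result is exact only if the observations in every class happen to average to the class midpoint, which is why the grouped mean is treated as an approximation.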
...
For practice, attempt the examples in the Tutorial Work Sheet
...
The sum of the deviations of the observations from the mean is equal to zero, i.e. $\sum (x_i - \bar{x}) = 0$
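A quick numerical check of this property, using the car-age data from the variance example later in these notes (mean 12 years). The helper name `deviations_from_mean` is illustrative only.

```python
def deviations_from_mean(data):
    """Return the deviation x_i - mean for each observation."""
    mean = sum(data) / len(data)
    return [x - mean for x in data]


car_ages = [13, 7, 10, 15, 12, 18, 9]
devs = deviations_from_mean(car_ages)
print(devs)       # [1.0, -5.0, -2.0, 3.0, 0.0, 6.0, -3.0]
print(sum(devs))  # 0.0
```

When the mean is not a whole number, floating-point rounding may leave a tiny residue (of the order of 1e-15) instead of an exact zero.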
...
This makes the mean an unbiased statistical measure of central location
Drawbacks of the mean
It is affected or distorted by extreme values (Outliers) in the data
It is not valid to compute the mean for nominal- or ordinal-scaled data
...
In other words, it
is the most frequently occurring value in a data set
...
However, if the number of observations is large, the mode can be found by arranging
the data in ascending order and identifying by inspection the value that occurs most frequently
...
What is the modal colour?
Solution: The modal colour is Blue, because it appears most often, with a frequency of 4
...
The mode lies in this class, and we then calculate the mode using the formula
$\text{Mode} = L + \frac{c(f_1 - f_0)}{2f_1 - f_0 - f_2}$
where $L$ is the lower limit of the modal class, $f_1$ is the frequency of the modal class, $f_0$ is the
frequency of the class preceding the modal class, $f_2$ is the frequency of the class succeeding the
modal class and $c$ is the width of the modal class
...
Test Mark, x | 5-10 | 10-15 | 15-20 | 20-25
Frequency    | 3    | 5     | 7     | 2
Solution:
We apply the formula
$\text{Mode} = L + \frac{c(f_1 - f_0)}{2f_1 - f_0 - f_2}$
where 15-20 is the modal class, $L$ = 15, $f_1$ = 7, $f_0$ = 5, $f_2$ = 2 and $c$ = 5
Substituting yields
$\text{Mode} = 15 + \frac{5(7 - 5)}{2(7) - 5 - 2} = 15 + \frac{10}{7} \approx 16.43$
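The grouped-mode formula can be sketched in a few lines, including the step of picking out the modal class. This is a minimal sketch under stated assumptions (not from the notes): classes are contiguous and of equal width, the first class with the highest frequency is taken as the modal class, and a missing neighbouring class counts as frequency 0. `grouped_mode` is a hypothetical name.

```python
def grouped_mode(classes, freqs):
    """Mode of grouped data: L + c(f1 - f0) / (2f1 - f0 - f2).

    Assumes contiguous equal-width classes; ties for the highest
    frequency resolve to the first such class; a missing neighbour
    class is treated as having frequency 0.
    """
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # modal class index
    lower, upper = classes[k]
    c = upper - lower                                    # class width
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                    # class before
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0       # class after
    return lower + c * (f1 - f0) / (2 * f1 - f0 - f2)


test_classes = [(5, 10), (10, 15), (15, 20), (20, 25)]
test_freqs = [3, 5, 7, 2]
print(round(grouped_mode(test_classes, test_freqs), 2))  # 16.43
```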
...
Half of the observations will fall below this
median value and the other half above it
...
If the number of observations is even, then the median is the mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2} + 1\right)$th observations
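The ungrouped median rule can be sketched as follows; the odd-n case (the middle, $\frac{n+1}{2}$th observation) is the standard counterpart of the even-n rule. The car ages come from the variance example later in these notes; taking only the first six ages to get an even-sized set is purely illustrative. The result is cross-checked against Python's `statistics.median`.

```python
import statistics


def median(data):
    """Median of ungrouped data (sorts a copy first)."""
    xs = sorted(data)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                    # the ((n + 1)/2)th observation
    return (xs[mid - 1] + xs[mid]) / 2    # mean of the (n/2)th and (n/2 + 1)th


car_ages = [13, 7, 10, 15, 12, 18, 9]     # n = 7 (odd)
print(median(car_ages))                   # 12
six_ages = car_ages[:6]                   # illustrative even-sized subset
print(median(six_ages))                   # 12.5
print(statistics.median(six_ages))        # 12.5 (standard-library cross-check)
```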
...
Median for ungrouped data
Example 3
Given the following data in a frequency table, find the median
...
To find these observations, we first find the cumulative frequencies (in bold)
...
Thus
$\text{Median} = \frac{4400 + 4900}{2} = 4650$
Interpretation:
This means 50% of the workers earn less than $4650 and the other 50% earn more
than $4650
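The same cumulative-frequency walk can be written as a short function. This is a sketch assuming the values are listed in ascending order. The income table used here is the one given in the range example later in these notes, and it reproduces the median of $4650 found above. `median_from_frequencies` is a hypothetical name.

```python
def median_from_frequencies(values, freqs):
    """Median of ungrouped data given as sorted distinct values with frequencies."""
    n = sum(freqs)

    def value_at(position):
        # Walk the cumulative frequencies until they reach the 1-based position.
        cumulative = 0
        for v, f in zip(values, freqs):
            cumulative += f
            if cumulative >= position:
                return v
        raise ValueError("position exceeds number of observations")

    if n % 2 == 1:
        return value_at((n + 1) // 2)
    return (value_at(n // 2) + value_at(n // 2 + 1)) / 2


incomes = [3800, 4100, 4400, 4900, 5200, 5500, 6000]
workers = [12, 13, 25, 17, 15, 12, 6]
print(median_from_frequencies(incomes, workers))  # 4650.0
```

Here n = 100, so the 50th observation (cumulative frequency 50, income $4400) and the 51st (income $4900) are averaged.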
...
To use this formula, we calculate the cumulative frequencies and then identify the median class,
which is the class containing the $\frac{n}{2}$th observation
...
Marks, x  | 0-10 | 10-20 | 20-30 | 30-40 | 40-50
Frequency | 2    | 12    | 22    | 8     | 6
Solution
First and foremost, remember to order the data set (in this case the data are already ordered)
...
Marks, x             | 0-10 | 10-20 | 20-30 | 30-40 | 40-50
Frequency            | 2    | 12    | 22    | 8     | 6
Cumulative Frequency | 2    | 14    | 36    | 44    | 50
We use the formula
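$\text{Median} = L + \left(\frac{\frac{n}{2} - F(<)}{f}\right) \times c$
where 20-30 is the median class, $L$ = 20, $c$ = 10, $n$ = 50, $F(<)$ = 14 and $f$ = 22
Substituting these values we have
$\text{Median} = 20 + \left(\frac{25 - 14}{22}\right) \times 10 = 20 + 5 = 25$ marks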
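The grouped-median formula translates almost line for line into code. A minimal sketch, assuming ordered, contiguous classes given as (lower, upper) pairs; the median class is located by walking the cumulative frequencies until they reach $\frac{n}{2}$, as in the table above. `grouped_median` is a hypothetical name.

```python
def grouped_median(classes, freqs):
    """Median of grouped data: L + ((n/2 - F(<)) / f) * c."""
    n = sum(freqs)
    position = n / 2
    cumulative = 0                       # F(<): frequency before current class
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= position:   # this is the median class
            c = upper - lower
            return lower + (position - cumulative) / f * c
        cumulative += f
    raise ValueError("empty frequency distribution")


marks_classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
marks_freqs = [2, 12, 22, 8, 6]
print(grouped_median(marks_classes, marks_freqs))  # 25.0
```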
...
The advantage of the median is that it is unaffected by outliers and is a useful measure of
central tendency when the distribution of a random variable is severely skewed
...
It is best
suited as a central location measure for interval-scaled data such as rating scales
...
It is that observation which
separates the lower 25 percent of the observations from the top 75 percent of ordered
observations
...
The middle quartile, Q2, is the second quartile (50th percentile), i.e. the median
...
It is the observation that separates the lower 75 percent of the observations from the top 25 percent of ordered
observations
...
The only difference lies in
(i) the identification of the quartile position, and (ii) the choice of the appropriate quartile
interval
...
The appropriate quartile interval is that interval
into which the quartile position falls
...
Quartiles for ungrouped data
Consider the data from Example 3:
Exercise 1: Find Q1, Q2 and Q3
...
5th position
...
5th observation falls within this class interval
...
Thus:
$Q_1 = L + \left(\frac{\frac{n}{4} - F(<)}{f}\right) \times c = 10 + \left(\frac{12.5 - 2}{12}\right) \times 10 = 18.75$ marks
The Second Quartile, Q2 (Median)
For Q2, use position $\frac{n}{2} = \frac{50}{2}$ = 25th position
...
The formula for Q2 is
$Q_2 = L + \left(\frac{\frac{n}{2} - F(<)}{f}\right) \times c = 20 + \left(\frac{25 - 14}{22}\right) \times 10 = 25$ marks
The Upper Quartile, Q3:
Q3 position = $\frac{3n}{4} = \frac{3 \times 50}{4}$ = 37.5
...
Q3 interval = [30 - 40] because the 37.5th observation falls within this class interval
...
The formula for Q3 is
$Q_3 = L + \left(\frac{\frac{3n}{4} - F(<)}{f}\right) \times c$
where $Q_3$ is the upper quartile, $L$ is the lower limit of the Q3 interval (class), $n$ is the sample
size (total number of observations), $F(<)$ is the cumulative frequency of the interval before the
Q3 interval, $f$ is the frequency of the Q3 interval and $c$ is the width of the Q3 interval
...
$Q_3 = 30 + \left(\frac{37.5 - 36}{8}\right) \times 10 = 31.875$ marks
Interpretation:
75% of the students got below 31.875 marks
...
Alternatively, 25% of the students got above 31.875 marks
...
Percentiles
In general, any percentile value can be found by adjusting the median formula as follows:
(i) find the required percentile’s position and, from this,
(ii) establish the percentile interval
...
9 × n
35th percentile position = 0
...
25 × n
Uses of percentiles: to identify various non-central values
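Because quartiles and percentiles differ from the median only in the position used, one function can cover all of them. A sketch, assuming the position for a fraction p of the data is p × n (n/4, n/2 and 3n/4 for the quartiles, and 0.9 × n for the 90th percentile); `grouped_quantile` is a hypothetical name, and the marks table is the one from the grouped median example.

```python
def grouped_quantile(classes, freqs, p):
    """Value below which a fraction p of grouped observations lies.

    Uses position = p * n and L + ((position - F(<)) / f) * c.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    n = sum(freqs)
    position = p * n
    cumulative = 0                          # F(<)
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= position:      # the quantile interval
            return lower + (position - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("empty frequency distribution")


marks_classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
marks_freqs = [2, 12, 22, 8, 6]
for label, p in [("Q1", 0.25), ("Q2", 0.5), ("Q3", 0.75), ("P90", 0.9)]:
    print(label, round(grouped_quantile(marks_classes, marks_freqs, p), 3))
# Q1 18.75
# Q2 25.0
# Q3 31.875
# P90 41.667
```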
...
SKEWNESS
Skewness is a departure from symmetry
...
If mean < median < mode, the frequency distribution is negatively skewed (skewed to the
left)
If a distribution is distorted by extreme values (i.e. skewed), then the median or the mode is
more representative than the mean
...
If the frequency distribution is skewed, the median may be the best measure of central
location as it is not pulled by extreme values, nor is it as highly influenced by the frequency
of occurrence
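The ordering rule can be checked in code. This sketch assumes the standard counterparts of the negative-skew rule (mean > median > mode for positive skew, and mean = median = mode for a symmetric distribution). It uses the grouped marks data: mean 25.8 marks (from class midpoints), median 25 marks, and a mode from the grouped-mode formula with modal class 20-30, i.e. 20 + 10(22 − 12)/(2×22 − 12 − 8) ≈ 24.17 marks. `classify_skew` is a hypothetical helper.

```python
def classify_skew(mean, median, mode, tol=1e-9):
    """Classify skewness from the ordering of mean, median and mode."""
    if abs(mean - median) <= tol and abs(median - mode) <= tol:
        return "symmetric"
    if mean < median < mode:
        return "negatively skewed (to the left)"
    if mean > median > mode:
        return "positively skewed (to the right)"
    return "no clear ordering"


# Grouped marks data (n = 50)
mean = 25.8
median = 25.0
mode = 20 + 10 * (22 - 12) / (2 * 22 - 12 - 8)   # modal class 20-30
print(round(mode, 2))                      # 24.17
print(classify_skew(mean, median, mode))   # positively skewed (to the right)
```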
...
Frequency distributions
can be described as: leptokurtic, mesokurtic and platykurtic
...
Mesokurtic – moderately peaked distribution
Platykurtic – flat distribution (i.e. the observations are widely spread about the central
location)
...
EXERCISES
Exercise 2: The number of days in a year that employees in a certain company were away
from work due to illness is given in the following table:
Sick days           | 5-6 | 7-8 | 9-10 | 11-12
Number of employees | 67  | 91  | 67   | 5
Find the modal class and the modal number of sick days, and interpret the result
...
Their seniority (in years
of service) and sex are listed below:
Sex               | F | M  | F | M | F | M  | M | F | F | F | F | M
Seniority (years) | 8 | 15 | 6 | 2 | 9 | 21 | 9 | 3 | 4 | 7 | 2 | 10
a) Find the seniority mean, the seniority median and the seniority mode for the above data
...
c) Find the mode for the sex data
...
Measures of dispersion provide useful information with
which the reliability of the central value may be judged
...
Conversely, a high
concentration of observations about the central value increases confidence in the reliability and
representativeness of the central value
...
It
is calculated as:
Range = $x_{max} - x_{min}$ for ungrouped data, or
Range = upper limit (highest class) - lower limit (lowest class) for grouped data
The range is a crude estimate of spread
...
An outlier, if present, would be $x_{max}$ or $x_{min}$
...
It also provides no information on the clustering of observations within the dataset
about a central value as it uses only two observations in its computation
...
Income/$          | 3800 | 4100 | 4400 | 4900 | 5200 | 5500 | 6000
Number of Workers | 12   | 13   | 25   | 17   | 15   | 12   | 6
Solution: Range = Xmax – Xmin = 6000 – 3800 = 2200
...
This modified range is the difference between the upper and lower
quartiles
...
This measure of
dispersion, like the range, also provides no information on the clustering of observations within
the dataset as it uses only two observations
...
It is found by dividing the interquartile range in half
...
The quartile deviation is an appropriate measure of spread for the median
...
It
is a useful measure of spread if the sample of observations contains excessive outliers as it
ignores the top 25 percent and bottom 25 percent of the ranked observations
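The three spread measures above reduce to one-line calculations. A sketch using the income data from the range example and the marks quartiles worked out earlier (Q1 = 18.75 and Q3 = 31.875 marks); the function names are illustrative.

```python
def data_range(values):
    """Range = x_max - x_min (ungrouped data)."""
    return max(values) - min(values)


def interquartile_range(q1, q3):
    """IQR = Q3 - Q1."""
    return q3 - q1


def quartile_deviation(q1, q3):
    """Quartile deviation (semi-interquartile range) = (Q3 - Q1) / 2."""
    return (q3 - q1) / 2


incomes = [3800, 4100, 4400, 4900, 5200, 5500, 6000]
print(data_range(incomes))              # 2200

q1, q3 = 18.75, 31.875                  # marks quartiles
print(interquartile_range(q1, q3))      # 13.125
print(quartile_deviation(q1, q3))       # 6.5625
```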
...
The variance is such a measure of dispersion
...
Step 1: Find the sample mean, $\bar{x} = \frac{\sum x_i}{n} = \frac{84}{7} = 12$ years
...
Car Ages, xi | Mean, x̄ | Deviation (xi - x̄) | Squared Deviation (xi - x̄)²
13           | 12      | +1                 | 1
7            | 12      | -5                 | 25
10           | 12      | -2                 | 4
15           | 12      | +3                 | 9
12           | 12      | 0                  | 0
18           | 12      | +6                 | 36
9            | 12      | -3                 | 9
Total        |         | ∑(xi - x̄) = 0      | ∑(xi - x̄)² = 84
Step 3: Find the average squared deviation, that is, the variance
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{84}{6} = 14$ years²
Note: Division by n would appear logical, but the variance statistic would then be a biased
measure of dispersion
...
For large samples (i.e. n > 30), however, this distinction becomes less important
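The car-age calculation, and the effect of dividing by n - 1 rather than n, can be sketched as follows. `sample_variance` is a hypothetical helper; the cross-checks use Python's `statistics.variance` (divisor n - 1) and `statistics.pvariance` (divisor n).

```python
import statistics


def sample_variance(data):
    """Sample variance s^2: sum of squared deviations divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)   # sum of squared deviations
    return ss / (n - 1)


car_ages = [13, 7, 10, 15, 12, 18, 9]
print(sample_variance(car_ages))        # 14.0
print(statistics.variance(car_ages))    # n - 1 divisor, same value
print(statistics.pvariance(car_ages))   # n divisor: 84 / 7 = 12
```

The gap between 14 and 12 shrinks as n grows, which is why the choice of divisor matters less for large samples.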
...
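$\bar{x} = 25.8$ and $\sum fx^2 = 38450$, so
$s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n - 1} = \frac{38450 - 50(25.8)^2}{49} = \frac{5168}{49} \approx 105.47$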
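A sketch of the grouped variance, using the shortcut form $s^2 = \frac{\sum fx^2 - (\sum fx)^2/n}{n - 1}$, which is algebraically equivalent to $\frac{\sum fx^2 - n\bar{x}^2}{n - 1}$ with $\bar{x} = \sum fx / n$. Class midpoints stand in for x, an assumption consistent with $\sum fx^2 = 38450$ for the marks table. `grouped_variance` is a hypothetical name.

```python
def grouped_variance(classes, freqs):
    """Sample variance of grouped data using class midpoints.

    s^2 = (sum(f*x^2) - (sum(f*x))^2 / n) / (n - 1)
    """
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


marks_classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
marks_freqs = [2, 12, 22, 8, 6]
print(round(grouped_variance(marks_classes, marks_freqs), 2))  # 105.47
```

Subtracting $(\sum fx)^2/n$ rather than $n\bar{x}^2$ avoids rounding $\bar{x}$ before squaring it.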
...
It is
expressed in squared units
...
To
provide meaning, the measure should be expressed in the original units of the random variable
...
The standard deviation is the square root of the
variance, denoted $s$ or $s_x$
...
$s = \sqrt{105.47} \approx 10.27$
The standard deviation is a relatively stable measure of dispersion across different samples of
the same random variable
...
It describes how the
observations are spread about the mean
...
A coefficient of variation value close to zero indicates low variability and a tight
clustering of observations about the mean
...
From our example above, $CV = \frac{s}{\bar{x}} \times 100\% = \frac{10.27}{25.8} \times 100\% \approx 39.8\%$
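A short sketch of the last two steps, assuming the grouped marks figures (variance 5168/49 ≈ 105.47 marks², from $\sum fx^2 = 38450$, and mean 25.8 marks). Taking the square root returns the spread to the original units (marks), and the CV expresses it relative to the mean. `coefficient_of_variation` is a hypothetical helper.

```python
import math


def coefficient_of_variation(std_dev, mean):
    """CV as a percentage: (s / mean) * 100."""
    return std_dev / mean * 100


# Grouped marks data: variance 5168/49 (about 105.47) and mean 25.8
variance = 5168 / 49
mean = 25.8
s = math.sqrt(variance)                                # standard deviation
print(round(s, 2))                                     # 10.27
print(round(coefficient_of_variation(s, mean), 1))     # 39.8
```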
...
Exercise 4: Find the mean and the standard deviation for the following data which records the
duration of 20 telephone hotline calls on the 0772 line for advice on ‘car repairs’
...
60 per minute, what was the average cost of a call, and what was the total cost
paid by the 20 telephone callers
...
Exercise 5: Employee bonuses earned by workers at a furniture factory in a recent month
(US$) were:
47 25 51 29 31 28 30 39 42 62 43 53
33 29 72 61 58 65 73 52 51 46 37 35
Find the:
a) Mean and standard deviation of bonuses
...
c) Coefficient of variation and comment
...
Exercise 7:
Discuss briefly:
a) Which measure of dispersion would you use if the mean is used as the measure of central
location? Why?
b) Which measure of dispersion would you use if the median is used as a measure of central
location? Why?
c) The limitation of the range as a measure of dispersion
...
a) Outliers
b) Skewness
c) Kurtosis