Title: Descriptive statistics explained_Problems and Solutions
Description: A summary of descriptive statistics, covering: numerical or quantitative data, categorical or qualitative data, the graphical presentation of qualitative data, skewness, frequency polygon, geometric mean, arithmetic mean, mode, median, quantiles and standard deviation.
Document Preview
Extracts from the notes are below.
WKS114
DESCRIPTIVE STATISTICS
1
1
...
Data sets are often so large that it is impossible to make sense of them, and they must therefore be summarized
...
1
...
The following table represents the lives of 20 similar car batteries, recorded to the nearest tenth of a year
...
2
4
...
5
4
...
2
3
...
6
3
...
3
3
...
5
4
...
4
3
...
9
3
...
1
3
...
4
3
...
Suppose 20 students were asked to evaluate their lecturer
...
The data in the previous two tables are also examples of ungrouped data
...
3 QUALITATIVE DATA
1
...
1 Organizing qualitative data:
Guidelines for constructing a frequency distribution table:
! The categories or classes of the variable are given in the first column
...
! The score of each class is given in the second column
...
These frequencies
represent the frequency distribution of the variable
...
! The relative frequencies $f_i / n$ are given in the fourth column
...
! The percentage $\frac{f_i}{n} \times 100$ for each class is given in the fifth column
...
Category              Score      f_i       f_i / n
Excellent (E)         |||| ||    7         0.35
Above average (AA)    ||||       5         0.25
Average (A)           ||||       4         0.20
Below average (BA)    |||        3         0.15
Poor (P)              |          1         0.05
                                 n = 20
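The relative frequencies and percentages in such a table follow directly from the frequencies. A minimal Python sketch, assuming the frequencies 7, 5, 4, 3 and 1 of the lecturer evaluation (the names `counts` and `table` are illustrative only):

```python
# Frequencies f_i of the lecturer evaluation (n = 20 students).
counts = {"E": 7, "AA": 5, "A": 4, "BA": 3, "P": 1}

n = sum(counts.values())

# For each category: (f_i, relative frequency f_i / n, percentage).
table = {
    category: (f, f / n, 100 * f / n)
    for category, f in counts.items()
}

for category, (f, rel, pct) in table.items():
    print(f"{category:<3} {f:>2}  {rel:.2f}  {pct:.0f}%")
```

The relative frequencies always add up to 1, which is a quick check on the table.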
1
...
2 Graphical presentation of qualitative data:
Bar chart:
A widely used form of graphic presentation of data is the bar chart
...
How to draw a bar chart:
! Mark the categories on the x-axis;
! Mark the frequencies on the y-axis;
! Draw rectangles with heights equal to the frequencies f;
! Leave spaces between rectangles
...
The results are given in the frequency distribution below
...
Number of times per week    Frequency
0                           8
1                           16
2                           40
3                           60
4                           56
5                           44
6                           36
7                           140
[Figure: Bar chart of vegetable consumption; x-axis: Number of times per week, y-axis: Frequency]
▄
Pie chart:
A pie chart is a circle divided into portions according to the relative frequency or percentage
of each category
...
Give a visual representation of the
data by using a Pie chart
...
[Figure: Pie chart with the portions E 35%, AA 25%, A 20%, BA 15%, P 5%]
▄
1
...
4
...
! Let n denote the total number of observations
...
$k = 1 + 3.3 \log_{10} n$
The answer must be rounded up to the next integer
...
! Determine the intervals
...
Take the lower limit of the first interval as the first integer ≤ $x_{\min}$
...
! Find the frequency ($f_i$) for each interval
...
! Find the cumulative frequency ($F_i$) for each interval
...
n
Example:
Consider the following marks of students:
68 42 62 52 55 54 47 60 64 57
83 61 65 90 74 77 58 50 40 69
64 59 53 76 79 45 63 80 61 97
! $r = 97 - 40 = 57$
! $k = 1 + 3.3 \log_{10} 30 = 5.87$, so $k$ is taken as 6
! Interval width: $\frac{57}{6} = 9.5$
...
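Interval      x_i    f_i    F_i    f_i / n
[40 ; 50)     45     4      4      0.1333
[50 ; 60)     55     8      12     0.2667
[60 ; 70)     65     10     22     0.3333
[70 ; 80)     75     4      26     0.1333
[80 ; 90)     85     2      28     0.0667
[90 ; 100)    95     2      30     0.0667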
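The grouping of the 30 marks can be reproduced with a short Python sketch. It assumes an interval width of 10 (9.5 rounded up) and a first lower limit of 40, which agrees with the interval [60 ; 70) used later in the notes; the variable names are illustrative only.

```python
import math

marks = [68, 42, 62, 52, 55, 54, 47, 60, 64, 57,
         83, 61, 65, 90, 74, 77, 58, 50, 40, 69,
         64, 59, 53, 76, 79, 45, 63, 80, 61, 97]

n = len(marks)
r = max(marks) - min(marks)              # range: 97 - 40 = 57
k = math.ceil(1 + 3.3 * math.log10(n))   # 5.87 rounded up to 6
width = 10                               # assumption: 57 / 6 = 9.5 rounded up
lower = 40                               # assumption: first integer <= x_min

# Count the observations in each interval [lower + j*width ; lower + (j+1)*width).
freq = [0] * k
for x in marks:
    freq[(x - lower) // width] += 1

cumulative = 0
for j, f in enumerate(freq):
    cumulative += f
    start = lower + j * width
    print(f"[{start} ; {start + width})  f={f}  F={cumulative}  f/n={f / n:.4f}")
```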
...
4
...
•
Plot the intervals as specified by the lower and upper limits on the x-axis
...
•
Use the intervals (on the x-axis) as basis and draw rectangles with heights equal to
corresponding frequencies of the intervals
...
Example:
Consider the frequency table of the marks of the students and draw a histogram
...
Example:
[Figure: histogram]
▄
Positively skewed:
A histogram or distribution is said to be skewed to the right, or positively skewed, if it has a
long right tail compared to a much shorter left tail
...
Example:
[Figure: histogram of a positively skewed distribution]
▄
! Frequency polygon:
• Plot the class midpoint on the x-axis and the frequencies on the y-axis
...
Do the same on the right
...
• Join points by straight lines
...
Example:
[Figure: Frequency polygon of the marks of 30 students, shown in two alternative versions; x-axis: Marks, y-axis: Frequency]
▄
! Cumulative frequency polygon:
• Mark the intervals as specified by the lower and upper limits on the x-axis
...
• Mark the points (upper limit, $F_i$) for each interval and join points with straight lines
...
• Give a full caption and indicate what the x-axis and y-axis represent
...
▄
! Relative frequency polygon:
• Mark the class midpoint on the x-axis and the relative frequencies ($f_i / n$) on the y-axis
...
• Plot the corresponding relative frequency ($f_i / n$) above each midpoint
...
• Give a full caption and indicate what the x- and y-axes represent
...
[Figure: relative frequency polygon of the marks; x-axis: Marks]
▄
Relative frequency distributions can be used to compare data sets that have different numbers of observations
...
Example:
The following frequency distributions give the monthly income of a sample of full-time employees by gender
...
Income           x_i     f_F          f_M           f_F / n_F    f_M / n_M
[0 ; 500)        250     125          332           0.1651       0.2325
[500 ; 1000)     750     326          476           0.4306       0.3333
[1000 ; 1500)    1250    185          354           0.2444       0.2479
[1500 ; 2000)    1750    81           143           0.1070       0.1002
[2000 ; 2500)    2250    27           72            0.0357       0.0504
[2500 ; 3000)    2750    13           51            0.0172       0.0357
                         n_F = 757    n_M = 1428    1            1
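The relative frequencies of the two groups can be computed in a few lines of Python, which makes the comparison between samples of different sizes explicit. This is a sketch only; the helper `relative_frequencies` is a hypothetical name.

```python
# Frequencies per income interval, from [0 ; 500) up to [2500 ; 3000).
f_female = [125, 326, 185, 81, 27, 13]
f_male = [332, 476, 354, 143, 72, 51]

def relative_frequencies(freqs):
    """Divide each frequency by the total number of observations."""
    n = sum(freqs)
    return [f / n for f in freqs]

rel_female = relative_frequencies(f_female)
rel_male = relative_frequencies(f_male)

for rf, rm in zip(rel_female, rel_male):
    print(f"{rf:.4f}  {rm:.4f}")
```

Although there are more males than females in every interval, the relative frequencies show that a larger share of the females falls in the interval [500 ; 1000).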
[Figure: Relative frequency polygon of monthly income of employees by gender; series: Female and Male; x-axis: Income, y-axis: Relative frequency]
▄
1
...
However, it gives only an approximate indication of specific properties such as the midpoint
and spread of the data
...
Each of these properties of shape can be described by a single numerical value known as a descriptive measure
...
5
...
5
...
1 Measures of location for ungrouped data:
! Arithmetic mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Example:
The wing lengths (in cm) of five deadly bees are as follows:
8
...
51
8
...
56
8
...
1
(42
...
524
5
x=
▄
Example:
The six figures below reflect the maximum monthly temperatures for October to March in
Upington:
39
...
9
42
...
7
41
...
7
Calculate the mean maximum temperature for these months
...
2) = 40
...
The mean will then be equal to the mean of the new
data set plus the constant a
...
The mean of data set $i$ is denoted by $\bar{x}_i$ and the size of data set $i$ by $n_i$, $i = 1, \ldots, k$
...
These students wrote a test and the averages (means) of the respective groups
were 66%, 52% and 70%
...
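$$
\begin{aligned}
\bar{x}_\omega &= \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + n_3 \bar{x}_3}{n_1 + n_2 + n_3} \\
&= \frac{n_1}{n_1 + n_2 + n_3}\,\bar{x}_1 + \frac{n_2}{n_1 + n_2 + n_3}\,\bar{x}_2 + \frac{n_3}{n_1 + n_2 + n_3}\,\bar{x}_3 \\
&= 0{,}25\,\bar{x}_1 + 0{,}4\,\bar{x}_2 + 0{,}35\,\bar{x}_3 \\
&= (0{,}25)(66) + (0{,}4)(52) + (0{,}35)(70) \\
&= 61{,}8
\end{aligned}
$$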
▄
Let $Y_1, Y_2, \ldots, Y_n$ denote $n$ numbers with the relative importance of $w_1, w_2, \ldots, w_n$
...
At the end of 1992 the net full supply capacity
(FSC) in millions of cubic metres and the percentage content of these dams were as follows:
DAM                 Net FSC    %-CONTENT
Vaal Dam            2529       20
Bloemhof Dam        1269       20
Sterkfontein Dam    2617       99
What is the average percentage content of these dams?
$$
\begin{aligned}
\bar{y}_\omega &= \frac{(2529)(20) + (1269)(20) + (2617)(99)}{2529 + 1269 + 2617} \\
&= \frac{2529}{6415}(20) + \frac{1269}{6415}(20) + \frac{2617}{6415}(99) \\
&= 0{,}3942(20) + 0{,}1978(20) + 0{,}408(99) \\
&= 52{,}232
\end{aligned}
$$
▄
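Both weighted means above can be checked with a small Python sketch. The function name `weighted_mean` is illustrative only; the weights are the group proportions for the test averages and the net full supply capacities for the dams.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * y_i) / sum(w_i)."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)

# Test averages of the three groups, weighted by 0.25, 0.4 and 0.35.
test_mean = weighted_mean([66, 52, 70], [0.25, 0.4, 0.35])

# Percentage content of the three dams, weighted by net FSC.
dam_mean = weighted_mean([20, 20, 99], [2529, 1269, 2617])

print(round(test_mean, 1))  # 61.8
print(round(dam_mean, 2))   # 52.23
```

With unrounded weights the dam average is about 52.23; the value 52,232 in the worked example arises because the weights were first rounded to four decimals.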
n
! Geometric mean: Geo
...
09%
8
...
54%
8
...
62%
Calculate the geometric average annual interest rate
...
mean = 5 0
...
0831 * 0
...
0886 * 0
...
0884 = 8
...
Equivalent Fixed Rate = 5 1
...
0831 * 1
...
0886 * 1
...
0888 = 8
...
•
Arrange the observations from smallest to largest and determine the mode by inspection
...
Then
10 10 11 12 15 15 15
Thus the mode is 15
▄
20
19
19
20
Example:
Suppose that in a random sample of fifty female clients from a particular shoe shop, the
following distribution of shoe sizes is found
...
Shoe size    Number of clients
5            2
6            8
7            10
8            16
9            8
10           6
The modal shoe size is 8
▄
! The median (me )
The median is the middle value of a number of observations arranged from smallest to largest
...
• arrange the data in ascending order
• determine the position of the median: $\frac{n+1}{2}$
• median = value of the $\frac{n+1}{2}$-th observation
...
7 77
...
2 68
...
6 111
...
5
Determine the median of the data set
...
6 68
...
7 77
...
5 99
...
0
Position of $m_e$: $\frac{n+1}{2} = 4$
me = 77
...
• 400 425 445 450 475 498 500 540 550 650 725 825
• Position of $m_e$: $\frac{n+1}{2} = \frac{13}{2} = 6.5$
• $m_e = \frac{498 + 500}{2} = 499$
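The position rule for the median can be sketched in Python as follows. The function below is an illustration written for these notes, not a library routine; for a position that is not a whole number it averages the two neighbouring observations.

```python
def median(values):
    """Median via the position (n + 1) / 2 in the sorted data."""
    data = sorted(values)
    n = len(data)
    position = (n + 1) / 2            # 1-based position of the median
    lower = data[int(position) - 1]   # observation at the whole part of the position
    if position.is_integer():
        return lower
    upper = data[int(position)]       # next observation
    return (lower + upper) / 2

values = [400, 425, 445, 450, 475, 498, 500, 540, 550, 650, 725, 825]
print(median(values))  # 499.0
```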
To answer the following question we have to look at another measure:
Is a particular person in the top 25% of the economically active population?
QUANTILES
quartiles      4 equal parts      ($q_1, q_2, q_3$)
deciles        10 equal parts     ($d_1, d_2, \ldots, d_9$)
percentiles    100 equal parts    ($p_1, p_2, \ldots, p_{99}$)
Relation between quantiles:
$m_e = q_2 = d_5 = p_{50}$
$q_1 = p_{25}$ and $q_3 = p_{75}$
$d_j = p_{10j}$ for $j = 1, \ldots, 9$
The calculation of quantiles:
• Arrange the data in ascending order
...
• If $1 \le \frac{i(n+1)}{100} \le n$, the $i$-th percentile can be determined as the $\frac{i(n+1)}{100}$-th value in the data array
...
• 3; 4; 4; 5; 5; 7; 9; 11; 13; 15; 16; 17
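i)
• $q_1 = p_{25}$
• Position: $\frac{i(n+1)}{100} = \frac{25(13)}{100} = 3.25$
• $p_{25} = \text{obs}(3) + 0.25\,(\text{obs}(4) - \text{obs}(3)) = 4 + 0.25 = 4.25$
ii)
• Position of $p_{80}$: $\frac{80(13)}{100} = 10.4$
• $p_{80} = \text{obs}(10) + 0.4\,(16 - 15) = 15 + 0.4 = 15.4$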
▄
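The percentile rule used in this example, position $i(n+1)/100$ with linear interpolation between neighbouring observations, can be sketched in Python. The function `percentile` is written for illustration and is not a library routine.

```python
def percentile(values, i):
    """i-th percentile via the position i(n + 1)/100 with linear interpolation."""
    data = sorted(values)
    n = len(data)
    position = i * (n + 1) / 100
    if not 1 <= position <= n:
        raise ValueError("position i(n + 1)/100 must lie between 1 and n")
    whole = int(position)             # obs(whole) is data[whole - 1]
    fraction = position - whole
    if whole == n:
        return data[-1]
    return data[whole - 1] + fraction * (data[whole] - data[whole - 1])

data = [3, 4, 4, 5, 5, 7, 9, 11, 13, 15, 16, 17]
q1 = percentile(data, 25)    # position 3.25
p80 = percentile(data, 80)   # position 10.4
print(q1, round(p80, 1))
```

Other software may use a different position rule, so results for small data sets can differ slightly between packages.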
1
...
1
...
Once the data
has been grouped it is usually assumed that the original set of ungrouped data (raw data) is
no longer available and that all values within an interval (class) can be represented by the
class midpoint
...
Calculation of $\bar{x}$:
i) Calculate the class midpoints $x_i$;
ii) Calculate the products $f_i x_i$;
iii) Calculate the sum of the $f_i x_i$'s;
iv) Divide by $n$
...
$$\bar{x} = \frac{\sum_i f_i x_i}{n} = \frac{1930}{30} = 64.33$$
...
b) What percentage of the families spend less than R270 on insurance?
c) Calculate the mean amount per month spent on insurance
...
This interval is called the modal interval
...
The modal interval is [60 ; 70)
Thus
$$
\begin{aligned}
m_o &= k_l + \frac{(k_u - k_l)(f_m - f_{m-1})}{2 f_m - f_{m-1} - f_{m+1}} \\
&= 60 + \frac{(70 - 60)(10 - 8)}{2(10) - 8 - 4} \\
&= 60 + \frac{(10)(2)}{8} \\
&= 60 + 2.5 = 62.5
\end{aligned}
$$
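The mode formula for grouped data translates directly into code. A minimal Python sketch, in which the function name and parameter names are illustrative and mirror the symbols $k_l$, $k_u$, $f_m$, $f_{m-1}$ and $f_{m+1}$:

```python
def grouped_mode(k_l, k_u, f_m, f_prev, f_next):
    """Mode of grouped data, interpolated within the modal interval [k_l ; k_u).

    f_m is the frequency of the modal interval; f_prev and f_next are the
    frequencies of the intervals just before and just after it.
    """
    return k_l + (k_u - k_l) * (f_m - f_prev) / (2 * f_m - f_prev - f_next)

# Modal interval [60 ; 70) with frequencies 8, 10 and 4 around it.
print(grouped_mode(60, 70, 10, 8, 4))  # 62.5
```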
The mode can also be determined graphically by using a histogram
...
Costs          frequency ($f_i$)
[160 ; 215)    5
[215 ; 270)    23
[270 ; 325)    14
[325 ; 380)    12
[380 ; 435)    3
[435 ; 490)    7
[490 ; 545)    2
               n = 66
Calculate the modal cost (mode) per month
...
$$m_o = 215 + \frac{(270 - 215)(23 - 5)}{2(23) - 5 - 14} = 215 + \frac{(55)(18)}{27} = 215 + 36.6667 = 251.67$$
...
Vertical lines through the quantiles divide the area of the histogram of a grouped data set into 2, 4, 10 or 100 equal parts
...
100
Then
$$p_i = k_l + \frac{(k_u - k_l)\left(\frac{in}{100} - F_{k_l}\right)}{f_{p_i}}$$
where
$k_l$ = Lower limit of interval containing $p_i$
$k_u$ = Upper limit of interval containing $p_i$
$f_{p_i}$ = Frequency of interval containing $p_i$
$F_{k_l}$ = Cumulative frequency of interval before interval containing $p_i$
...
i) $p_{50}$
ii) Position of $p_{50}$: $\frac{in}{100} = \frac{(50)(30)}{100} = 15$
iii) Interval in which $p_{50}$ falls: [60 ; 70)
iv) $p_i = k_l + \frac{(k_u - k_l)\left(\frac{in}{100} - F_{k_l}\right)}{f_{p_i}}$
$$p_{50} = 60 + \frac{(70 - 60)(15 - 12)}{10} = 60 + \frac{(10)(3)}{10} = 63$$
▄
Example:
Consider the frequency distribution given below, showing the monthly expenditure
(R/month) on insurance of 66 families
...
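ii) Position of $p_{20}$: $\frac{in}{100} = \frac{(20)(66)}{100} = 13.2$
iii) Interval in which $p_{20}$ falls: [215 ; 270)
iv) $p_i = k_l + \frac{(k_u - k_l)\left(\frac{in}{100} - F_{k_l}\right)}{f_{p_i}}$
$$p_{20} = 215 + \frac{(270 - 215)(13.2 - 5)}{23} = 215 + \frac{(55)(8.2)}{23} = 215 + 19.6087 = 234.6087$$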
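The percentile calculation for grouped data can be sketched in Python, including the search for the interval that contains the position $in/100$. The function name is illustrative; the frequencies of the marks (4, 8, 10, 4, 2, 2 on intervals of width 10 from 40 to 100) are an assumption derived from the raw marks, while the insurance frequencies are those of the table above.

```python
def grouped_percentile(limits, freqs, i):
    """i-th percentile of grouped data.

    limits holds the interval boundaries (one more entry than freqs);
    freqs holds the frequency of each interval.
    """
    n = sum(freqs)
    position = i * n / 100
    cumulative = 0                    # cumulative frequency before the current interval
    for j, f in enumerate(freqs):
        if cumulative + f >= position:
            k_l, k_u = limits[j], limits[j + 1]
            return k_l + (k_u - k_l) * (position - cumulative) / f
        cumulative += f
    raise ValueError("i must lie between 0 and 100")

marks_limits = [40, 50, 60, 70, 80, 90, 100]
marks_freqs = [4, 8, 10, 4, 2, 2]
cost_limits = [160, 215, 270, 325, 380, 435, 490, 545]
cost_freqs = [5, 23, 14, 12, 3, 7, 2]

p50_marks = grouped_percentile(marks_limits, marks_freqs, 50)
p20_costs = grouped_percentile(cost_limits, cost_freqs, 20)
print(p50_marks, round(p20_costs, 4))
```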
The median can also be determined graphically by using a cumulative frequency polygon
...
The value on the horizontal axis corresponding to this point is the median
...
[Figure: cumulative frequency polygon of the marks with the median indicated on the horizontal axis; x-axis: Marks, y-axis: Cumulative frequency]
▄
1
...
2 Measures of spread:
If only the mean of a data set is known, we do not know how the observations are spread
about the mean
...
Consider rainfall figures for Pretoria and Durban:
Month Nov
Pretoria 119
...
9
Dec
119
...
6
Jan
134
...
1
Feb
113
...
7
March
94
...
7
According to the means there is apparently a strong similarity:
Pretoria: x P = 116
...
6
The next graph, however, illustrates the difference between the rainfall figures by taking the
spread into account
...
5
...
1 Measures of spread for ungrouped data:
! The range (r )
• It is the simplest measure of spread
• It is based on only two observations
$r = x_{\max} - x_{\min}$
Example:
On 2 June the highest temperature in SA was 23°C (in Durban) and the lowest 2°C (in
Bloemfontein)
...
10
7
...
65
8
...
20
6
...
75
Calculate the range
...
6 − 6
...
4
! The standard deviation (s):
The numbers 10, 20, 30, 40 and 50 have a mean of 30
...
In each case the difference is their deviation, in other words we have
10 – 30 = -20  →  the first value is 20 below $\bar{x}$
20 – 30 = -10  →  the second value is 10 below $\bar{x}$
30 – 30 = 0    →  the third value is $\bar{x}$
40 – 30 = 10   →  the fourth value is 10 above $\bar{x}$
50 – 30 = 20   →  the fifth value is 20 above $\bar{x}$
If the deviations are combined by means of a special mean, we obtain a traditional measure
of spread referred to as the standard deviation
...
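The standard deviation can also be written as
$$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}}$$
Proof:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$
$$
\begin{aligned}
s^2 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \\
&= \frac{\sum_{i=1}^{n} (x_i^2 - 2\bar{x}x_i + \bar{x}^2)}{n-1} \\
&= \frac{\sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2}{n-1} \\
&= \frac{\sum_{i=1}^{n} x_i^2 - \frac{2}{n}\left(\sum_{i=1}^{n} x_i\right)^2 + \frac{n}{n^2}\left(\sum_{i=1}^{n} x_i\right)^2}{n-1} \\
&= \frac{\sum_{i=1}^{n} x_i^2 - \frac{2}{n}\left(\sum_{i=1}^{n} x_i\right)^2 + \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n-1} \\
&= \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}
\end{aligned}
$$
$$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}}$$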
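The equivalence of the two forms of the standard deviation can be checked numerically. A small Python sketch, using the numbers 10, 20, 30, 40 and 50 from the start of this section (the function names are illustrative only):

```python
import math

def sd_definition(values):
    """s from the squared deviations about the mean, with divisor n - 1."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

def sd_computational(values):
    """s from the sum of squares and the squared sum, with divisor n - 1."""
    n = len(values)
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return math.sqrt((total_sq - total ** 2 / n) / (n - 1))

values = [10, 20, 30, 40, 50]
print(round(sd_definition(values), 4), round(sd_computational(values), 4))
```

Both functions give the square root of 1000 / 4 = 250 for these values. The second form needs only the two sums, which is convenient for hand calculation.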
Properties of the standard deviation:
• The standard deviation is based on all values in the data set and is the most important measure of spread
...
• The larger the value of $s$, the further the observations are from $\bar{x}$
...
Example:
Consider the scores obtained by seven gymnasts:
8
...
10
6
...
60
6
...
55
7
...
2
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}$$
(50
...
6075 −
7
=
6
= 0
...
8911
1
...
2
...
range:
r = 100 − 40 = 60
standard deviation:
$$s = \sqrt{\frac{\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{n}}{n-1}} = \sqrt{\frac{129550 - \frac{(1930)^2}{30}}{29}} = 13.63$$
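The sums 1930 and 129550 used above can be reproduced from class midpoints and frequencies. A Python sketch, assuming the marks are grouped in intervals of width 10 from 40 to 100 with frequencies 4, 8, 10, 4, 2 and 2 (derived from the raw marks, since the frequency table itself is not part of this extract):

```python
import math

midpoints = [45, 55, 65, 75, 85, 95]   # class midpoints x_i
freqs = [4, 8, 10, 4, 2, 2]            # frequencies f_i (assumed grouping)

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))        # sum of f_i * x_i
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))   # sum of f_i * x_i^2

mean = sum_fx / n
s = math.sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))

print(sum_fx, sum_fx2, round(mean, 2), round(s, 2))
```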