
Title: Descriptive statistics explained_Problems and Solutions
Description: A summary of descriptive statistics, covering: numerical or quantitative data, categorical or qualitative data, the graphical presentation of qualitative data, skewness, frequency polygon, geometric mean, arithmetic mean, mode, median, quantiles and standard deviation.

Extracts from the notes are below.


WKS114
DESCRIPTIVE STATISTICS

1

1
...
Data sets are often so large that it is impossible to make sense of them, and they must therefore be summarized
...

1
...

The following table represents the lives of 20 similar car batteries recorded to the nearest tenth of a
year
...
2

4
...
5

4
...
2

3
...
6

3
...
3

3
...
5

4
...
4

3
...
9

3
...
1

3
...
4

3
...

Suppose 20 students were asked to evaluate their lecturer
...

The data in the previous two tables are also examples of ungrouped data
...
3 QUALITATIVE DATA
1
...
1 Organizing qualitative data:
Guidelines for constructing a frequency distribution table:
! The categories or classes of the variable are given in the first column
...

! The score of each class is given in the second column
...
These frequencies
represent the frequency distribution of the variable
...

! The relative frequencies $\frac{f_i}{n}$ are given in the fourth column
...


! The percentage $\frac{f_i}{n} \times 100$ for each class is given in the fifth column
...


Category              Score      fi
Excellent (E)         |||| ||    7
Above average (AA)    ||||       5
Average (A)           ||||       4
Below average (BA)    |||        3
Poor (P)              |          1
                                 n = 20

fi
n
0
...
25
0
...
15
0
...
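The fourth and fifth columns of the table follow directly from the frequencies. A minimal sketch, assuming the frequencies 7, 5, 4, 3 and 1 from the table above (the names `counts`, `relative` and `percent` are illustrative only):

```python
# Frequencies from the lecturer-evaluation table (n = 20).
counts = {"E": 7, "AA": 5, "A": 4, "BA": 3, "P": 1}

n = sum(counts.values())                                   # total number of observations
relative = {cat: f / n for cat, f in counts.items()}       # fourth column: f_i / n
percent = {cat: 100 * f / n for cat, f in counts.items()}  # fifth column: (f_i / n) x 100

for cat in counts:
    print(f"{cat:>2}  f={counts[cat]}  f/n={relative[cat]:.2f}  {percent[cat]:.0f}%")
```

The relative frequencies should add up to 1 and the percentages to 100, which is a quick check on the tallying.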


1
...
2 Graphical presentation of qualitative data:
Bar chart:
A widely used form of graphic presentation of data is the bar chart
...


How to draw a bar chart:
! Mark the categories on the x-axis;
! Mark the frequencies on the y-axis;
! Draw rectangles with heights equal to f;
! Leave spaces between rectangles
...
The results are
given in the frequency distribution below
...

Number of times per week    0    1    2    3    4    5    6    7
Frequency                   8   16   40   60   56   44   36  140

[Figure: Bar Chart of Vegetable consumption. x-axis: Number of times per week (0 to 7); y-axis: Frequency (0 to 140)]


Pie chart:
A pie chart is a circle divided into portions according to the relative frequency or percentage
of each category
...
Give a visual representation of the
data by using a Pie chart
...


[Figure: pie chart with portions E 35%, AA 25%, A 20%, BA 15%, P 5%]


1
...
4
...

! Let n denote the total number of observations
...
3 (log10 n)
The answer must be rounded to the next integer
...

! Determine the intervals
...


Take the lower limit of the first interval as the first integer ≤ x min
...

! Find the frequency ( f i ) for each interval
...

! Find the cumulative frequency (Fi ) for each interval
...

n
Example:
Consider the following marks of students:
68  52  47  57  65  77  40  59  79  80
42  55  60  83  90  58  69  53  45  61
62  54  64  61  74  50  64  76  63  97

! r = 97 − 40 = 57
! k = 1 + 3
...
87
k is taken as 6

57
= 9
...
1333
0
...
3333
0
...
0667
0
...
4
...
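The construction of the frequency table for the 30 marks can be sketched as follows. Two values are assumptions, because they are only partly legible in this extract: the constant 3.3 in the rule for the number of classes (it reproduces "k is taken as 6"), and a class width rounded up to 10.

```python
import math

marks = [68, 42, 62, 52, 55, 54, 47, 60, 64, 57, 83, 61, 65, 90, 74,
         77, 58, 50, 40, 69, 64, 59, 53, 76, 79, 45, 63, 80, 61, 97]

n = len(marks)
r = max(marks) - min(marks)              # range: 97 - 40 = 57
# Assumption: k = 1 + 3.3 log10(n), rounded up to the next integer.
k = math.ceil(1 + 3.3 * math.log10(n))
# Assumption: the class width r / k is rounded up to a whole number.
width = math.ceil(r / k)
lower = min(marks)                       # lower limit of the first interval

# Frequency f_i of each interval [lower + j*width ; lower + (j+1)*width)
freq = [0] * k
for x in marks:
    freq[(x - lower) // width] += 1

# Cumulative frequency F_i and relative frequency f_i / n
cum = []
total = 0
for f in freq:
    total += f
    cum.append(total)
rel = [f / n for f in freq]

for j in range(k):
    lo = lower + j * width
    print(f"[{lo} ; {lo + width})  f={freq[j]}  F={cum[j]}  f/n={rel[j]:.4f}")
```

The last cumulative frequency must equal n, which confirms that every mark was placed in exactly one interval.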



Plot the intervals as specified by the lower and upper limits on the x-axis
...




Use the intervals (on the x-axis) as basis and draw rectangles with heights equal to
corresponding frequencies of the intervals
...


Example:
Consider the frequency table of the marks of the students and draw a histogram
...


Example:
[Figure: histogram]


Positively skewed:
A histogram or distribution is said to be skewed to the right, or positively skewed, if it has a
long right tail compared to a much shorter left tail
...

Example:
[Figure: positively skewed histogram]

! Frequency polygon:


Plot the class midpoint on the x-axis and the frequencies on the y-axis
...

Do the same on the right
...




Join points by straight lines
...


Example:
Frequency polygon of the marks of 30 students
[Figure: two alternative frequency polygons, separated by "OR". x-axis: Marks; y-axis: Frequency]

! Cumulative frequency polygon:






Mark the intervals as specified by the lower and upper limits on the x-axis
...

Mark the points (upper limit, Fi ) for each interval and join points with straight lines
...

Full caption and indicate the x-axis and y-axis
...


! Relative frequency polygon:


Mark the class midpoint on the x-axis and the relative frequencies ( f i / n ) on the y-axis
...




Plot the corresponding relative frequency ( f i / n ) above each midpoint
...




Full caption and indicate what the x- and y-axes represent
...
[Figure: relative frequency polygon. x-axis: Marks; y-axis: relative frequency]



Relative frequency distributions can be used to compare data sets with different numbers of observations
...

Example:
The following frequency distributions give the monthly income of a sample of full-time employees by gender
...

Income           xi      fF      fM
[0 ; 500)        250     125     332
[500 ; 1000)     750     326     476
[1000 ; 1500)    1250    185     354
[1500 ; 2000)    1750    81      143
[2000 ; 2500)    2250    27      72
[2500 ; 3000)    2750    13      51
                         nF = 757    nM = 1428

f F / nF
0
...
4306
0
...
1070
0
...
0172
1

f M / nM
0
...
3333
0
...
1002
0
...
0357
1
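Because the two samples differ in size (757 women and 1428 men), the frequencies are divided by the respective sample sizes before comparison. A short sketch using the frequencies from the table (the function name `relative_frequencies` is illustrative):

```python
intervals = ["[0;500)", "[500;1000)", "[1000;1500)",
             "[1500;2000)", "[2000;2500)", "[2500;3000)"]
f_female = [125, 326, 185, 81, 27, 13]   # n_F = 757
f_male = [332, 476, 354, 143, 72, 51]    # n_M = 1428


def relative_frequencies(freqs):
    """Divide each class frequency by the total number of observations."""
    n = sum(freqs)
    return [f / n for f in freqs]


rel_f = relative_frequencies(f_female)
rel_m = relative_frequencies(f_male)

for label, rf, rm in zip(intervals, rel_f, rel_m):
    print(f"{label:<12} female={rf:.4f}  male={rm:.4f}")
```

Each column of relative frequencies sums to 1, so the two polygons can be drawn on the same axes.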

Relative frequency polygon of monthly income of employees by gender
[Figure: two relative frequency polygons, Female and Male. x-axis: Income; y-axis: Relative frequency]


1
...
However, it gives only an approximate indication of specific properties such as the midpoint
and spread of the data
...

Each of these properties of shape can be described by a single numerical value known as a descriptive measure
...
5
...
5
...
1 Measures of location for ungrouped data:
! Arithmetic mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Example:
The wing lengths (in cm) of five deadly bees are as follows:
8
...
51

8
...
56

8
...

1
(42
...
524
5

x=


Example:
The six figures below reflect the maximum monthly temperatures for October to March in
Upington:
39
...
9

42
...
7

41
...
7

Calculate the mean maximum temperature for these months
...
2) = 40
...
The mean will then be equal to the mean of the new
data set plus the constant a
...
The mean of data set $i$ is denoted by $\bar{x}_i$ and the size of data set $i$ by $n_i$, $i = 1, \ldots, k$
...
These students wrote a test and the averages (means) of the respective groups
were 66%, 52% and 70%
...


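$\bar{x}_w = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3}{n_1 + n_2 + n_3}$
$= \frac{n_1}{n_1 + n_2 + n_3}\bar{x}_1 + \frac{n_2}{n_1 + n_2 + n_3}\bar{x}_2 + \frac{n_3}{n_1 + n_2 + n_3}\bar{x}_3$
$= 0{,}25\,\bar{x}_1 + 0{,}4\,\bar{x}_2 + 0{,}35\,\bar{x}_3$
$= (0{,}25)(66) + (0{,}4)(52) + (0{,}35)(70)$
$= 61{,}8$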


Let $Y_1, Y_2, \ldots, Y_n$ denote $n$ numbers with the relative importance of $w_1, w_2, \ldots, w_n$
...
At the end of 1992 the net full supply capacity
(FSC) in millions of cubic metres and the percentage content of these dams were as follows:

DAM                 Net FSC    %-CONTENT
Vaal Dam            2529       20
Bloemhof Dam        1269       20
Sterkfontein Dam    2617       99

What is the average percentage content of these dams?


$= \frac{(2529)(20) + (1269)(20) + (2617)(99)}{2529 + 1269 + 2617}$
$= \frac{2529}{6415}(20) + \frac{1269}{6415}(20) + \frac{2617}{6415}(99)$
$= 0{,}3942(20) + 0{,}1978(20) + 0{,}408(99)$
$= 52{,}232$
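Both weighted-mean examples can be checked with one small helper. This is only a sketch: `weighted_mean` is an illustrative name, and the function simply divides the weighted sum by the sum of the weights.

```python
def weighted_mean(values, weights):
    """Sum of w_i * Y_i divided by the sum of the weights w_i."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


# Dams: percentage content weighted by net full supply capacity (FSC).
dam_content = weighted_mean([20, 20, 99], [2529, 1269, 2617])

# Test groups: means 66, 52 and 70 with weights 0.25, 0.4 and 0.35.
group_mean = weighted_mean([66, 52, 70], [0.25, 0.4, 0.35])

print(round(dam_content, 3), round(group_mean, 1))
```

Working with unrounded weights gives roughly 52,23 for the dams. The notes arrive at 52,232 because the weights are rounded to 0,3942, 0,1978 and 0,408 before multiplying.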


n

! Geometric mean: Geo
...
09%

8
...
54%

8
...
62%

Calculate the geometric average annual interest rate
...
mean = 5 0
...
0831 * 0
...
0886 * 0
...
0884 = 8
...

Equivalent Fixed Rate = 5 1
...
0831 * 1
...
0886 * 1
...
0888 = 8
...




Arrange the observations from smallest to largest and determine the mode by inspection
...
Then
10 10 11 12 15 15 15
Thus the mode is 15


16

20

19

19

20

Example:
Suppose that in a random sample of fifty female clients from a particular shoe shop, the
following distribution of shoe sizes is found
...

Shoe size            5    6    7    8    9    10
Number of clients    2    8    10   16   8    6

The modal shoe size is 8
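When the data are already given as a frequency table, the mode is the value with the largest frequency. A minimal sketch using the shoe-size table (the name `shoe_counts` is illustrative):

```python
# Shoe size -> number of clients, from the table above (50 clients in total).
shoe_counts = {5: 2, 6: 8, 7: 10, 8: 16, 9: 8, 10: 6}

# The mode is the size whose frequency is largest.
modal_size = max(shoe_counts, key=shoe_counts.get)

print(modal_size, shoe_counts[modal_size])
```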

! The median (me )
The median is the middle value of a number of observations arranged from smallest to largest
...

i) Arrange the data in ascending order
ii) Determine the position of the median: $\frac{n+1}{2}$
iii) median = value of the $\left(\frac{n+1}{2}\right)$-th observation
...
7 77
...
2 68
...
6 111
...
5
Determine the median of the data set
...
6 68
...
7 77
...
5 99
...
0
Position of me: $\frac{n+1}{2} = 4$
me = 77
...



400  425  445  450  475  498  500  540  550  650  725  825



Position of me: $\frac{n+1}{2} = \frac{13}{2}$ = 6
...
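The median procedure can be sketched in a few lines. One step is an assumption here, since the result is not legible in this extract: when the position $\frac{n+1}{2}$ falls halfway between two observations, the sketch takes the average of those two neighbours, which is the usual convention.

```python
def median(values):
    """Median via the position (n + 1) / 2 in the sorted data."""
    data = sorted(values)
    n = len(data)
    pos = (n + 1) / 2                 # position, counted from 1
    lower = data[int(pos) - 1]        # observation at the whole-number position
    if pos == int(pos):
        return lower                  # odd n: the middle observation itself
    upper = data[int(pos)]            # even n: the next observation
    return (lower + upper) / 2        # assumption: average the two neighbours


amounts = [400, 425, 445, 450, 475, 498, 500, 540, 550, 650, 725, 825]
print(median(amounts))
```

For the twelve values above the position lies between the 6th and 7th observations (498 and 500), so the sketch returns 499.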
To answer the following question we have to look at another measure:
Is a particular person in the top 25% of the economically active population?

QUANTILES
quartiles: 4 equal parts ($q_1, q_2, q_3$)
deciles: 10 equal parts ($d_1, d_2, \ldots, d_9$)
percentiles: 100 equal parts ($p_1, p_2, \ldots, p_{99}$)

Relation between quantiles
$m_e = q_2 = d_5 = p_{50}$
$q_1 = p_{25}$, $q_3 = p_{75}$
$d_1 = p_{10}$, $d_2 = p_{20}$, $\ldots$, $d_9 = p_{90}$

The calculation of quantiles:




Arrange the data in ascending order
...

If $1 \le \frac{i(n+1)}{100} \le n$, the $i$-th percentile can be determined as the $\frac{i(n+1)}{100}$-th value in the data array
...


• 3; 4; 4; 5; 5; 7; 9; 11; 13; 15; 16; 17
i)
• q1 = p 25


i (n + 1) 25(13)
=
= 3
...
25(obs (4) − obs (3))
= 4 + 0
...
25
= 4
...
4
100
100

p80 = obs (10) + 0
...
4(16 − 15)
= 15 + 0
...
4
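The percentile rule for ungrouped data can be sketched as below. The interpolation step (whole-number observation plus the fractional part of the position times the gap to the next observation) follows the pattern of the worked example; the function name `percentile` is illustrative.

```python
def percentile(values, i):
    """i-th percentile using the position i(n + 1)/100 with linear interpolation."""
    data = sorted(values)
    n = len(data)
    pos = i * (n + 1) / 100                  # position, counted from 1
    if not 1 <= pos <= n:
        raise ValueError("position i(n + 1)/100 must lie between 1 and n")
    whole = int(pos)                         # whole-number part of the position
    frac = pos - whole                       # fractional part of the position
    if frac == 0:
        return data[whole - 1]
    # obs(whole) + frac * (obs(whole + 1) - obs(whole))
    return data[whole - 1] + frac * (data[whole] - data[whole - 1])


data = [3, 4, 4, 5, 5, 7, 9, 11, 13, 15, 16, 17]
q1 = percentile(data, 25)     # first quartile, position 3.25
p80 = percentile(data, 80)    # 80th percentile, position 10.4
print(round(q1, 2), round(p80, 2))
```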



1
...
1
...
Once the data
has been grouped it is usually assumed that the original set of ungrouped data (raw data) is
no longer available and that all values within an interval (class) can be represented by the
class midpoint
...

Calculation of $\bar{x}$:
i) Calculate the class midpoints;
ii) Calculate the product $f_i x_i$;
iii) Calculate the sum of the $f_i x_i$'s;
iv) Divide by n
...


$\bar{x} = \frac{\sum_i f_i x_i}{n} = \frac{1930}{30}$
= 64
...
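The four steps for the mean of grouped data can be sketched as follows. The class limits and frequencies are assumptions in the sense that the full table is not visible in this extract: they are tallied from the 30 marks using intervals of width 10 starting at 40, and they reproduce the total of 1930 used above.

```python
# Assumed frequency table of the 30 marks: [40;50), [50;60), ..., [90;100)
lower_limits = [40, 50, 60, 70, 80, 90]
width = 10
freqs = [4, 8, 10, 4, 2, 2]

midpoints = [lo + width / 2 for lo in lower_limits]      # i) class midpoints x_i
products = [f * x for f, x in zip(freqs, midpoints)]     # ii) products f_i * x_i
total = sum(products)                                    # iii) sum of the f_i * x_i
n = sum(freqs)
mean = total / n                                         # iv) divide by n

print(total, n, round(mean, 2))
```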

b) What percentage of the families spend less than R270 on insurance?
c) Calculate the mean amount per month spent on insurance
...
This interval is called the modal interval
...

The modal interval is [60 ; 70)
Thus
$m_o = k_l + \frac{(k_u - k_l)(f_m - f_{m-1})}{2f_m - f_{m-1} - f_{m+1}}$
$= 60 + \frac{(70 - 60)(10 - 8)}{2(10) - 8 - 4}$
$= 60 + \frac{(10)(2)}{8}$
= 60 + 2
...
5

The mode can also be determined graphically by using a histogram
...

Costs            frequency (fi)
[160 ; 215)      5
[215 ; 270)      23
[270 ; 325)      14
[325 ; 380)      12
[380 ; 435)      3
[435 ; 490)      7
[490 ; 545)      2
                 n = 66

Calculate the modal cost (mode) per month
...
6667
27
= 251
...
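The mode formula for grouped data can be sketched as a small function and applied to both examples. The name `grouped_mode` is illustrative, and treating a missing neighbouring interval as having frequency 0 is an assumption that the notes do not state. The marks frequencies are the ones tallied from the 30 marks.

```python
def grouped_mode(limits, freqs):
    """Mode of grouped data; limits has one more entry than freqs."""
    m = max(range(len(freqs)), key=freqs.__getitem__)   # index of the modal interval
    k_l, k_u = limits[m], limits[m + 1]                 # lower and upper limit
    f_m = freqs[m]
    # Assumption: a missing neighbouring interval counts as frequency 0.
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m + 1 < len(freqs) else 0
    return k_l + (k_u - k_l) * (f_m - f_prev) / (2 * f_m - f_prev - f_next)


marks_mode = grouped_mode([40, 50, 60, 70, 80, 90, 100], [4, 8, 10, 4, 2, 2])
cost_mode = grouped_mode([160, 215, 270, 325, 380, 435, 490, 545],
                         [5, 23, 14, 12, 3, 7, 2])
print(marks_mode, round(cost_mode, 4))
```

For the insurance data the modal interval is [215 ; 270), and the correction term is 55 × 18 / 27.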
Vertical lines through the quantiles divide the area of the histogram of a grouped data set into 2, 4, 10 or 100 equal parts
...

100


Then
$p_i = k_l + \frac{(k_u - k_l)\left(\frac{in}{100} - F_{kl}\right)}{f_{pi}}$
where
$k_l$ = Lower limit of interval containing $p_i$
$k_u$ = Upper limit of interval containing $p_i$
$f_{pi}$ = Frequency of interval containing $p_i$
$F_{kl}$ = Cumulative frequency of interval before interval containing $p_i$
...

i) p50
ii) Position of p50: $\frac{in}{100} = \frac{(50)(30)}{100} = 15$
iii) Interval in which p50 falls: [60 ; 70)
iv) $p_i = k_l + \frac{(k_u - k_l)\left(\frac{in}{100} - F_{kl}\right)}{f_{pi}}$
$p_{50} = 60 + \frac{(70 - 60)(15 - 12)}{10} = 60 + \frac{(10)(3)}{10} = 63$
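The percentile formula for grouped data can be sketched as follows. The function name `grouped_percentile` is illustrative; it walks through the intervals until the cumulative frequency reaches the position in/100 and then interpolates inside that interval. The marks frequencies are the ones tallied from the 30 marks.

```python
def grouped_percentile(limits, freqs, i):
    """i-th percentile of grouped data; limits has one more entry than freqs."""
    n = sum(freqs)
    pos = i * n / 100                     # position in/100
    cum_before = 0                        # F_kl: cumulative frequency before the interval
    for j, f in enumerate(freqs):
        if f > 0 and cum_before + f >= pos:
            k_l, k_u = limits[j], limits[j + 1]
            return k_l + (k_u - k_l) * (pos - cum_before) / f
        cum_before += f
    raise ValueError("i must lie between 0 and 100")


p50_marks = grouped_percentile([40, 50, 60, 70, 80, 90, 100],
                               [4, 8, 10, 4, 2, 2], 50)
p20_costs = grouped_percentile([160, 215, 270, 325, 380, 435, 490, 545],
                               [5, 23, 14, 12, 3, 7, 2], 20)
print(p50_marks, round(p20_costs, 4))
```

The same function applied to the insurance data of 66 families gives the 20th percentile, with position (20)(66)/100 inside the interval [215 ; 270).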



Example:
Consider the frequency distribution given below, showing the monthly expenditure
(R/month) on insurance of 66 families
...
2
100
100

iii) Interval in which p20 falls: [215 ; 270)
iv) $p_i = k_l + \frac{(k_u - k_l)\left(\frac{in}{100} - F_{kl}\right)}{f_{pi}}$
p 20 = 215 +
= 215 +

(270 − 215)(13
...
2)
23

= 215 + 19
...
6087
The median can also be determined graphically by using a cumulative frequency polygon
...
The value on the horizontal axis corresponding to this point is the median
...

[Figure: cumulative frequency polygon with the median marked on the horizontal axis. x-axis: Marks; y-axis: Cumulative frequency]


1
...
2 Measures of spread:
If only the mean of a data set is known, we do not know how the observations are spread
about the mean
...


Consider rainfall figures for Pretoria and Durban:
Month Nov
Pretoria 119
...
9

Dec
119
...
6

Jan
134
...
1

Feb
113
...
7

March
94
...
7

According to the means there is apparently a strong similarity:
Pretoria: x P = 116
...
6

The next graph, however, illustrates the difference between the rainfall figures by taking the
spread into account
...
5
...
1 Measures of spread for ungrouped data:
! The range (r )



It is the simplest measure of spread
It is based on only two observations

$r = x_{\max} - x_{\min}$
Example:
On 2 June the highest temperature in SA was 23°C (in Durban) and the lowest 2°C (in
Bloemfontein)
...
10

7
...
65

8
...
20

6
...
75

Calculate the range
...
6 − 6
...
4


! The standard deviation (s):
The numbers 10, 20, 30, 40 and 50 have a mean of 30
...
In each case the difference is their deviation, in other words we have
10 − 30 = −20    the first value is 20 below $\bar{x}$
20 − 30 = −10    the second value is 10 below $\bar{x}$
30 − 30 = 0      the third value is equal to $\bar{x}$
40 − 30 = 10     the fourth value is 10 above $\bar{x}$
50 − 30 = 20     the fifth value is 20 above $\bar{x}$

If the deviations are combined by means of a special mean, we obtain a traditional measure
of spread referred to as the standard deviation
...

The standard deviation can also be written as

$$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}}$$

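Proof:

$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$$

$$\begin{aligned}
s^2 &= \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} \\
&= \frac{\sum_{i=1}^{n}\left(x_i^2 - 2\bar{x}x_i + \bar{x}^2\right)}{n-1} \\
&= \frac{\sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2}{n-1} \\
&= \frac{\sum_{i=1}^{n} x_i^2 - \frac{2}{n}\left(\sum_{i=1}^{n} x_i\right)^2 + \frac{n}{n^2}\left(\sum_{i=1}^{n} x_i\right)^2}{n-1} \\
&= \frac{\sum_{i=1}^{n} x_i^2 - \frac{2}{n}\left(\sum_{i=1}^{n} x_i\right)^2 + \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n-1} \\
&= \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}
\end{aligned}$$

$$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}}$$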
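The equivalence of the two forms can be checked numerically on the values 10, 20, 30, 40 and 50 used earlier. This is only an illustration, not part of the proof; the function names are illustrative.

```python
import math


def sd_definition(xs):
    """s from the squared deviations about the mean, with divisor n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


def sd_shortcut(xs):
    """s from the sum of squares and the squared sum, with divisor n - 1."""
    n = len(xs)
    return math.sqrt((sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1))


values = [10, 20, 30, 40, 50]
print(round(sd_definition(values), 3), round(sd_shortcut(values), 3))
```

The squared deviations are 400, 100, 0, 100 and 400, so both forms give the square root of 1000/4 = 250.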

Properties of the standard deviation:


The standard deviation is based on all values in the data set and is the most important
measure of spread
...




The larger the value of s , the further the observations are from x
...


Example:
Consider the scores obtained by seven gymnasts:
8
...
10

6
...
60

6
...
55

7
...

$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}$
(50
...
6075 −
7
=
6
= 0
...
8911


1
...
2
...

range:
r = 100 − 40 = 60
standard deviation:
$s = \sqrt{\frac{\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{n}}{n-1}} = \sqrt{\frac{129550 - \frac{(1930)^2}{30}}{29}}$
= 13
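The grouped standard deviation can be reproduced from the class midpoints and frequencies. As before, the frequency table is an assumption tallied from the 30 marks (intervals of width 10 from 40 to 100); it reproduces the sums 1930 and 129550 that appear in the formula.

```python
import math

# Assumed frequency table of the 30 marks: midpoints of [40;50), ..., [90;100)
midpoints = [45, 55, 65, 75, 85, 95]
freqs = [4, 8, 10, 4, 2, 2]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))        # sum of f_i * x_i
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))   # sum of f_i * x_i^2

variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
s = math.sqrt(variance)

print(sum_fx, sum_fx2, round(s, 2))
```

With these inputs the sketch gives a standard deviation of roughly 13.63.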