Title: Descriptive Statistical tools
Description: Statistical tools

Descriptive Statistical Tools

Outline of Discussion


Lesson Proper
◦ Descriptive Statistics
 Data Organization
 Data Analysis
 Statistical Measures



Case Study

Descriptive Statistics


What is Descriptive Statistics?
◦ Also known as deductive statistics
◦ Deals with gathering, classification, and
presentation of data
◦ Summarizes values to describe group
characteristics of data

Descriptive Statistics


Data Presentation
◦ Presenting data in tabular or graphical form is
not enough to extract all the relevant information
◦ Data must be organized so that analysis can be
readily made
 Data organization tools
 Frequency distribution table, histogram, ogive

 Data analysis tools
 Stem-and-leaf diagram, boxplot, time-series plot, probability
plot, scatter plot

Descriptive Statistics


Data Presentation
◦ The commonly reported statistics may also not
be adequate to describe the data
◦ There are many more measures used
 Measures of central tendency
 Mean, trimmed mean, median, mode

 Measures of dispersion
 Standard deviation, variance, range, interquartile range, mean
absolute deviation, coefficient of dispersion

 Measures on individual data points
 Standard deviation unit, standard score

 Measures of location
 Percentile

 Measures of skewness and kurtosis
 Coefficient of skewness, coefficient of peakedness

 Measure of linear relationship
 Correlation coefficient, Pearson’s r coefficient

Descriptive Statistics


Data Organization
◦ Frequency Distribution Table
 How to group data
 Original data presentation
 Number of cells
 Cell width

 Cell boundaries

[Table: Long Exam 1 Scores, original data and frequency distribution table]

Descriptive Statistics


Data Organization
◦ Cumulative Frequency Distribution Table
 Can answer the following
...
[Table: Cumulative Frequency Distribution Table]

Descriptive Statistics


Data Organization
◦ Ogive
 Line graph of CFD





For ≤ CFD, x-axis is upper cell boundaries
For ≥ CFD, x-axis is lower cell boundaries
Y-axis is cumulative frequency
Y-axis can also be relative frequency

Cumulative Frequency Distribution Table
Cell Boundaries   fi   xi     rel fi   ≤ CFD   ≥ CFD   rel CFD
[42 - 47)         5    44.5   0.0943   5       53      0.0943
[47 - 52)         6    49.5   0.1132   11      48      0.2075
[52 - 57)         8    54.5   0.1509   19      42      0.3585
[57 - 62)         11   59.5   0.2075   30      34      0.566
[62 - 67)         10   64.5   0.1887   40      23      0.7547
[67 - 72)         7    69.5   0.1321   47      13      0.8868
[72 - 77)         4    74.5   0.0755   51      6       0.9623
[77 - 82]         2    79.5   0.0377   53      2       1
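
The relative and cumulative columns can be recomputed from the cell frequencies alone. The following is a minimal Python sketch, assuming the fi values 5, 6, 8, 11, 10, 7, 4, 2 for the eight cells from [42 - 47) to [77 - 82]; the variable names are illustrative and not part of the notes.

```python
from itertools import accumulate

# Cell frequencies (fi), one per cell from [42 - 47) to [77 - 82].
frequencies = [5, 6, 8, 11, 10, 7, 4, 2]
n = sum(frequencies)  # total number of scores

# Relative frequency of each cell.
rel_freq = [f / n for f in frequencies]

# "<= CFD": running total from the first cell up to and including each cell.
le_cfd = list(accumulate(frequencies))

# ">= CFD": running total from each cell through the last cell.
ge_cfd = list(accumulate(reversed(frequencies)))[::-1]

# Relative "<= CFD", the column plotted in a relative ogive.
rel_cfd = [c / n for c in le_cfd]

for f, rf, le, ge, rc in zip(frequencies, rel_freq, le_cfd, ge_cfd, rel_cfd):
    print(f"{f:>3} {rf:.4f} {le:>3} {ge:>3} {rc:.4f}")
```

Plotting `le_cfd` against the upper cell boundaries, or `ge_cfd` against the lower cell boundaries, gives the two ogives described above.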

[Figure: Ogive for relative CFD; y-axis: relative cumulative frequency]
[Figure: Ogive for ≥ CFD; x-axis: lower cell boundaries (42 to 77); y-axis: cumulative frequency (0 to 60)]

 Get quartile 3 (75th percentile)
 V Quartile 1 = 68, W Quartile 1 = 68
...
5
V Quartile 3 = 68, W Quartile 3 = 68
...
g
...
)
 Y-axis: value of variable

Descriptive Statistics
Data Analysis
◦ Time-series Plot
 How to interpret a time-series plot
 Example: Average weight per day of manufactured 100g potato chip
packs
 Mean is around 100g
 No visible pattern, so the variation looks random
 With acceptance limits of 97g to 103g, the few values below 97g or
above 103g are rejected
[Figure: Time-series Plot; x-axis: Day (1 to 29); y-axis: Average Weight (92 to 106)]
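
The acceptance-limit check in this example can be sketched in a few lines of Python. The daily values below are hypothetical (the plotted data are not available in these notes); only the 97g and 103g limits come from the example.

```python
# Hypothetical daily average weights in grams (illustrative values, not from the notes).
daily_avg_weight = [100.2, 99.1, 101.5, 96.4, 100.8, 103.6, 99.7, 98.9, 100.4, 101.1]

LOWER_LIMIT = 97.0   # acceptance limits quoted in the example
UPPER_LIMIT = 103.0

mean_weight = sum(daily_avg_weight) / len(daily_avg_weight)

# Days (1-based) whose average weight falls outside the acceptance limits.
rejected_days = [
    day
    for day, w in enumerate(daily_avg_weight, start=1)
    if w < LOWER_LIMIT or w > UPPER_LIMIT
]

print(round(mean_weight, 2), rejected_days)
```

A trend or cycle would still have to be judged from the plot itself; this sketch only flags the out-of-limit days.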

Descriptive Statistics
Data Analysis
◦ Time-series Plot
 How to interpret a time-series plot
 Example: Average weight of manufactured 100g potato chip
packs per day of another company
 Mean is around 100g (at first)
 Downward trend (something might be wrong with manufacturing)
 Considering acceptance limits, many are being rejected

[Figure: Time-series Plot; x-axis: Day (1 to 29); y-axis: Average Weight (88 to 104)]

Descriptive Statistics
Data Analysis
◦ Time-series Plot
 How to interpret a time-series plot
 Example: Monthly sales of jackets of an apparel store
 Mean is around 80 units
 Cyclic pattern (there might be a reason for this cycle)
 Monthly sales of jackets increase when nearing the 12th and 24th months
(because people buy more jackets during the Ber months, September to December)

[Figure: Time-series Plot; x-axis: Month (1 to 29); y-axis: Sales (0 to 120)]

Descriptive Statistics


Data Analysis
◦ Time-series Plot
 How to interpret a time-series plot
 Example: Price of ABS-CBN shares over 20 years

Descriptive Statistics


Data Analysis
◦ Time-series Plot
 How to interpret a time-series plot
 Example: Price of TEL (PLDT) shares over 5 years

Descriptive Statistics


Data Analysis
◦ Probability Plot
 How to construct a probability plot





 Normal probability plot is most commonly used
 Original data in tabular form
 Data is sorted in increasing order
 Each data point is plotted against:

Long Exam 1 Scores (rounded off)
71 49 70 46 65 61 70 61 53 49
59 51 46 63 63 46 60 55 43 68
57 50 76 65 64 63 68 76 52 76
67 58 55 67 66 52 57 73 62 59
72 60 43 55 60 58 71 56 79 53
63 78 56
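
The construction steps can be sketched in Python using only the standard library. The quantity each sorted value is plotted against did not survive in these notes, so the plotting position (i - 0.5)/n used below is an assumption (one common convention), as are the variable names.

```python
from statistics import NormalDist

# Long Exam 1 scores (rounded off), as listed in the notes.
scores = [
    71, 49, 70, 46, 65, 61, 70, 61, 53, 49, 59, 51, 46, 63, 63, 46, 60, 55,
    43, 68, 57, 50, 76, 65, 64, 63, 68, 76, 52, 76, 67, 58, 55, 67, 66, 52,
    57, 73, 62, 59, 72, 60, 43, 55, 60, 58, 71, 56, 79, 53, 63, 78, 56,
]

# Step 1: sort the data in increasing order.
sorted_scores = sorted(scores)
n = len(sorted_scores)

# Step 2 (assumed convention): plotting position of the i-th smallest value.
positions = [(i - 0.5) / n for i in range(1, n + 1)]

# Step 3: standard normal quantile (z-score) matching each plotting position.
z_scores = [NormalDist().inv_cdf(p) for p in positions]

# Each pair (score, z) is one point of the normal probability plot.
points = list(zip(sorted_scores, z_scores))
print(points[0], points[-1])
```

If the points fall close to a straight line when plotted, the normal distribution is a reasonable description of the scores.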

Descriptive Statistics


Data Analysis
◦ Probability Plot
 How to interpret a probability plot
 If the points lie close to a straight line, the assumed distribution fits the data
 Since this is a normal probability plot, the data are approximately normal

Descriptive Statistics


Data Analysis
◦ Probability Plot
 How to interpret a probability plot
 If the points are curved downwards, distribution is positively
skewed because small and large points are larger than
expected

Descriptive Statistics


Data Analysis
◦ Scatter Plot
 How to construct a scatter plot
 The values of two variables are plotted against each other

Descriptive Statistics


Data Analysis
◦ Scatter Plot
 How to interpret a scatter plot
 If the points form a diagonal line, variables are correlated
 Perfect diagonal line means a correlation of one in magnitude (+1 if rising, -1 if falling)

Descriptive Statistics


Data Analysis
◦ Scatter Plot
 How to interpret a scatter plot
 If the points form a diagonal line, variables are correlated
 Perfect horizontal line means correlation is zero

Descriptive Statistics


Data Analysis
◦ Scatter Plot
 How to interpret a scatter plot
 Later, correlation values such as Pearson’s r coefficient will
be discussed

Descriptive Statistics


Statistical Measures
◦ Measures of Central Tendency
 Describes the tendency of sample data to cluster
around a particular value
 Mean
 Median
 Mode

Descriptive Statistics


Statistical Measures
◦ Measures of Central Tendency
 Mean
 First moment about the origin
 Average value of data

 k% trimmed mean
 Mean after eliminating the (k/2)% highest and (k/2)% lowest
data points
 Less affected by extreme values
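
A minimal Python sketch of the k% trimmed mean follows. The data set is hypothetical, and rounding the number of dropped points down is an assumption, since the notes do not say how to handle fractional counts.

```python
def trimmed_mean(data, k):
    """Mean after dropping the (k/2)% lowest and (k/2)% highest data points."""
    ordered = sorted(data)
    n = len(ordered)
    # Number of points to drop from each end (rounded down; an assumption).
    cut = int(n * (k / 2) / 100)
    kept = ordered[cut:n - cut]
    return sum(kept) / len(kept)

# Hypothetical data with one extreme value (40).
data = [4, 5, 6, 6, 6, 7, 8, 9, 9, 40]

plain_mean = trimmed_mean(data, 0)     # ordinary mean, nothing trimmed
trimmed_20 = trimmed_mean(data, 20)    # drops the lowest 10% and highest 10%
print(plain_mean, trimmed_20)
```

Dropping the single extreme value on each side moves the result much closer to the bulk of the data, which is the sense in which the trimmed mean is less affected by extreme values.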

Descriptive Statistics


Statistical Measures
◦ Measures of Central Tendency
 Median
 Divides the data set into two equal halves
 Less affected by extreme values (it depends on the order of
the values, not on their “weight”)
 50th percentile (Quartile 2)

Descriptive Statistics


Statistical Measures
◦ Measures of Central Tendency
 Mode
 Most frequently occurring data point

 Unimodal distribution: one mode/peak
 Bimodal distribution: two modes/peaks
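
As a short illustration, Python's standard `statistics` module computes both the median and the mode(s). The two data sets below are illustrative only.

```python
from statistics import median, multimode

# Illustrative data set with a single most frequent value.
unimodal = [4, 5, 6, 6, 6, 7, 8, 9, 9, 11]
# Illustrative data set with two equally frequent values.
bimodal = [1, 2, 2, 3, 4, 4, 5]

# Median: middle of the sorted data (average of the two middle values when n is even).
middle = median(unimodal)

# multimode returns every value that occurs most often.
modes_one = multimode(unimodal)
modes_two = multimode(bimodal)

print(middle, modes_one, modes_two)
```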

Descriptive Statistics


Statistical Measures
◦ Measures of Variability/Dispersion
 Describes the variability or scattering of data
 Used to gauge the reliability or accuracy of averages
(e.g. lower variability means data are closer to the average)
 Range
 Interquartile range
 Standard deviation
 Variance
 Mean absolute deviation
 Coefficient of dispersion

Descriptive Statistics


Statistical Measures
◦ Measures of Variability/Dispersion
 Range
 Difference between largest and smallest value

 Interquartile Range
 Difference between Quartile 3 (75th percentile) and
Quartile 1 (25th percentile)

Descriptive Statistics


Statistical Measures
◦ Measures of Variability/Dispersion
 Example:
 Determine the 64th percentile, range, and interquartile range
of the following data set
 Quiz Question
Number of hours of sleep per day: 9, 7, 8, 6, 11, 6, 5, 6, 9, 4

 Hint: Arrange first in increasing order: 4, 5, 6, 6, 6, 7, 8, 9, 9, 11
 Hint: position of the Xth percentile in the sorted data = (X/100)*(n+1),
range = max – min, interquartile range = 75th percentile – 25th percentile
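
The hints can be followed step by step in Python. This is a sketch under one assumption not stated in the notes: when the position (X/100)(n+1) is fractional, the value is linearly interpolated between the two neighbouring sorted data points.

```python
def percentile(data, p):
    """p-th percentile using the position rule (p/100) * (n + 1) on sorted data."""
    ordered = sorted(data)
    n = len(ordered)
    pos = (p / 100) * (n + 1)      # 1-based position, may be fractional
    if pos <= 1:
        return float(ordered[0])
    if pos >= n:
        return float(ordered[-1])
    lower = int(pos)               # whole part of the position
    frac = pos - lower             # fractional part of the position
    # Linear interpolation between neighbours (an assumption, see lead-in).
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

sleep_hours = [9, 7, 8, 6, 11, 6, 5, 6, 9, 4]

p64 = percentile(sleep_hours, 64)
data_range = max(sleep_hours) - min(sleep_hours)
iqr = percentile(sleep_hours, 75) - percentile(sleep_hours, 25)

print(round(p64, 2), data_range, iqr)
```

Other software may use a different percentile convention and return slightly different values for small data sets.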

Descriptive Statistics


Statistical Measures
◦ Measures of Variability/Dispersion
 Variance
 Second moment about the mean
 Average squared deviation from the mean

 Never negative (zero only when all data points are equal)
 Sum the squared differences of each data point from the mean, then divide
by the total number of data points minus one
 Unit is the square of the variable's unit (unit²)

Descriptive Statistics


Statistical Measures
◦ Measures of Variability/Dispersion
 Standard deviation
 Most commonly used measure of variability/dispersion
 Typical deviation of data from the mean

 Never negative
 Square root of variance
 Unit is same as that of the variable
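
The verbal recipe for the sample variance and standard deviation translates directly into Python. The sketch below reuses the hours-of-sleep data from the earlier example; variable names are illustrative.

```python
import math
import statistics

data = [9, 7, 8, 6, 11, 6, 5, 6, 9, 4]  # hours of sleep per day
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in data) / (n - 1)

# Standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(round(variance, 4), round(std_dev, 4))

# The standard library gives the same sample statistics.
library_variance = statistics.variance(data)
library_std_dev = statistics.stdev(data)
```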

Descriptive Statistics


Statistical Measures
◦ Relative measure of Variability/Dispersion
 Coefficient of dispersion
 Used to compare different populations
 The lesser the value, the more consistent the data

 s is the sample standard deviation
 xbar is the sample mean
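
The formula itself is missing from these notes; the sketch below assumes the usual form s / xbar (sometimes reported as a percentage). Both samples are hypothetical.

```python
import statistics

def coefficient_of_dispersion(data):
    """Sample standard deviation divided by the sample mean (assumed form: s / xbar)."""
    return statistics.stdev(data) / statistics.mean(data)

# Two hypothetical samples with the same spread (s = 2) but different means.
sample_a = [48, 50, 52]
sample_b = [98, 100, 102]

cv_a = coefficient_of_dispersion(sample_a)
cv_b = coefficient_of_dispersion(sample_b)
print(cv_a, cv_b)
```

Sample B has the smaller coefficient, so relative to its mean it is the more consistent of the two, even though both samples have the same standard deviation.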

Descriptive Statistics


Statistical Measures
◦ Relative measure of Variability/Dispersion
 Example:
 Determine which of two friends, A or B, has more
consistent sleeping hours
 Quiz Question
Sleeping hours    Friend A    Friend B
xbar              9           6.5

 Hint: Solve for each friend’s coefficient of dispersion

Descriptive Statistics


Statistical Measures
◦ Measures on Individual Data Points
 Standard deviation unit
 Distance of a point from the mean, in units of standard deviation
 The smaller the absolute value, the closer the point is to the mean

 s is the sample standard deviation
 xi is the data point of interest
 xbar is the sample mean (or any point of reference)
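
The formula is not reproduced in these notes; the sketch below assumes the standard form (xi - xbar) / s, using the symbols defined above. The data are hypothetical.

```python
import statistics

def standard_score(x_i, data):
    """Distance of x_i from the sample mean in standard deviation units (assumed form)."""
    xbar = statistics.mean(data)
    s = statistics.stdev(data)
    return (x_i - xbar) / s

data = [48, 50, 52]  # hypothetical sample: xbar = 50, s = 2

z_high = standard_score(52, data)   # one standard deviation above the mean
z_mean = standard_score(50, data)   # exactly at the mean
z_low = standard_score(48, data)    # one standard deviation below the mean
print(z_high, z_mean, z_low)
```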

Descriptive Statistics


Statistical Measures
◦ Measure of Symmetry
 Coefficient of skewness
 Determines the symmetry of a distribution
 Third moment about the mean

 xbar is the sample mean
 n is the total number of data points
 s is the sample standard deviation
 a3 = 0; symmetric data set
 a3 < 0; data set skewed to the left
 a3 > 0; data set skewed to the right
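
The formula for a3 is missing from these notes. The sketch below assumes a common form, the average cubed deviation from the mean divided by s cubed; the exact divisor used in the original slide is unknown, but the sign interpretation is the same. All data sets are hypothetical.

```python
import statistics

def coefficient_of_skewness(data):
    """a3 = [sum of (xi - xbar)^3 / n] / s^3, an assumed form of the missing formula."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    third_moment = sum((x - xbar) ** 3 for x in data) / n
    return third_moment / s ** 3

symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 1, 2, 10]       # long tail toward large values
left_tail = [-10, -2, -1, -1, -1]   # long tail toward small values

a3_symmetric = coefficient_of_skewness(symmetric)
a3_right = coefficient_of_skewness(right_tail)
a3_left = coefficient_of_skewness(left_tail)
print(a3_symmetric, a3_right, a3_left)
```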

Descriptive Statistics


Statistical Measures
◦ Measure of Kurtosis
 Coefficient of peakedness
 Describes how peaked a unimodal distribution is

 xbar is the sample mean
 n is the total number of data points
 s is the sample standard deviation
 a4 = 3; data is mesokurtic (normal)
 a4 > 3; data is leptokurtic (high peakedness)
 a4 < 3; data is platykurtic (low peakedness)
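
As with skewness, the formula for a4 is missing from these notes. The sketch below assumes the average fourth-power deviation from the mean divided by s to the fourth power; the original slide may use a slightly different divisor. Both data sets are hypothetical.

```python
import statistics

def coefficient_of_peakedness(data):
    """a4 = [sum of (xi - xbar)^4 / n] / s^4, an assumed form of the missing formula."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    fourth_moment = sum((x - xbar) ** 4 for x in data) / n
    return fourth_moment / s ** 4

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]     # evenly spread values
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]   # most values at the centre, two far out

a4_flat = coefficient_of_peakedness(flat)
a4_peaked = coefficient_of_peakedness(peaked)
print(round(a4_flat, 2), round(a4_peaked, 2))
```

Under this assumed form the flat data set falls below 3 (platykurtic) and the peaked one rises above 3 (leptokurtic).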

Descriptive Statistics


Statistical Measures
◦ Measure of Linear Relationship
 Correlation coefficient
 Measures the strength of the linear relationship between variables of a population

 Pearson’s r coefficient
 Measures the strength of the linear relationship between variables of a sample

 Nonzero correlation coefficient means there is some linear relationship
 Zero correlation coefficient means either the variables are independent of each
other or their relationship is nonlinear
 Excel scatter plot trendlines report r², the square of r, which gives the proportion of variation explained by the fitted line
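
A minimal Python sketch of Pearson's r follows, using the standard product-moment formula (the notes do not show the formula, so this is the textbook definition rather than a transcription). The three hypothetical data sets show a rising line, a falling line, and a curved relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
rising = [2 * x + 1 for x in xs]    # perfect rising line
falling = [-3 * x for x in xs]      # perfect falling line
curved = [x ** 2 for x in xs]       # exact but nonlinear relationship

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_curved = pearson_r(xs, curved)
print(r_rising, r_falling, r_curved)
```

The curved case gives r = 0 even though y is fully determined by x, which illustrates why a zero coefficient does not by itself imply independence.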

Summary


Data Organization






Frequency Distribution Table
Histogram
Cumulative Frequency Distribution Table
Ogive
Pareto Chart

Summary


Data Analysis






Stem and Leaf Diagram
Boxplot
Time-series Plot
Probability Plot
Scatter Plot

CASE STUDY


For Each Data Set, Get the following:









Mean
Median
Mode
25th & 75th Percentile
Skewness
Kurtosis
Histogram
Ogives

Which data sets are skewed?
Which data sets are leptokurtic?
For each data set, which measure of
central tendency is more appropriate to
use when drawing conclusions?
 Suppose that Data Set 2 is data on
arm’s reach for Filipinos
...
95% of Filipinos must be able
to reach this emergency button. How far
should the emergency button be?

A hazardous chemical shelf is to be
installed
...
You wish that only 5% of
Filipinos will be able to reach the
hazardous chemical easily. What is the best
height to use?
 Data Set 4 are lengths of steel rods
...
24 and anything
greater than 122