Title: Graphical And Numerical Summaries
Description: Introductory level statistics describing several forms of graphical and numerical summaries of data, including histograms and boxplots.

STAT 146

Introduction to Statistics II

Day One Notes—Graphical and Numerical Summaries

Text: 3
...
2, 3
...
5, 7
...

Mean > Median Skewed Positively

If the mean value is 50, then we
would describe the shape of the
distribution as Symmetric since the
mean is approximately equal to the
median
...
= Median Symmetric

If the mean value is 12, then
we would describe the
distribution as Skewed
Negatively (Left) since the
mean is less than the median
...
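
The mean-versus-median comparison above can be sketched in Python. This is only an illustration: the function name `describe_shape` and the `tolerance` cutoff for "approximately equal" are assumptions, since the notes do not give a numerical rule for how close the two values must be.

```python
from statistics import mean, median

def describe_shape(values, tolerance=0.05):
    """Classify shape by comparing the mean with the median.

    `tolerance` is an assumed cutoff (a fraction of the data's range) for
    treating the mean and median as "approximately equal".
    """
    m, med = mean(values), median(values)
    data_range = max(values) - min(values)
    if data_range == 0 or abs(m - med) <= tolerance * data_range:
        return "symmetric"
    if m > med:
        return "skewed positively (right)"
    return "skewed negatively (left)"

print(describe_shape([1, 2, 3, 4, 5]))       # mean equals median
print(describe_shape([1, 2, 3, 4, 30]))      # mean pulled above the median
print(describe_shape([1, 27, 28, 29, 30]))   # mean pulled below the median
```

A graphical display should still be checked, because a single cutoff cannot replace looking at the histogram or boxplot.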


List the following values, in this order:
Min, Q1, Median, Q3, Max

The range spanned by the middle 50% of the
values is Q3 – Q1, called the Interquartile
Range (IQR)
...
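
As a rough check on hand calculations, the five-number summary and the IQR can be computed with Python's standard `statistics` module. The data values below are made up for illustration, and quartile conventions differ between software packages, so the results may not match Minitab for every data set.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (Min, Q1, Median, Q3, Max) in that order."""
    data = sorted(values)
    # n=4 splits the data into quarters and returns the three cut points.
    q1, med, q3 = quantiles(data, n=4)
    return min(data), q1, med, q3, max(data)

data = [2, 4, 4, 5, 7, 9, 10, 12, 15]   # hypothetical values
low, q1, med, q3, high = five_number_summary(data)
iqr = q3 - q1                            # spread of the middle 50%
print(low, q1, med, q3, high, iqr)
```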


75th percentile (Q3): The data point such
that 75% of data points are smaller than
it and 25% are larger
...

25th percentile (Q1): The data point such
that 25% of data points are smaller than
it and 75% are larger
...
5(IQR)
Low Value (Min): The smallest #
...
5(IQR)
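
A minimal sketch of flagging outliers with fences, assuming the conventional rule that a value is an outlier if it lies more than 1.5 × IQR below Q1 or above Q3. The multiplier 1.5 is an assumption here, because the fence formulas are truncated in this extract; the data values are hypothetical.

```python
from statistics import quantiles

def find_outliers(values, k=1.5):
    """Return (lower fence, upper fence, outliers).

    k=1.5 is the conventional multiplier and is assumed, not taken
    from the notes.
    """
    q1, _, q3 = quantiles(sorted(values), n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, outliers

print(find_outliers([2, 4, 4, 5, 7, 9, 10, 12, 40]))
```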

Describe data using Shape, Center, and Spread
If asked to describe SHAPE, determine whether the data set is uniform, skewed left/right, or
symmetric by studying a graphical display (dotplot, histogram, or boxplot) or by studying the
descriptive statistics (to compare the mean and median)
...
If the mean is greater than
the median, then we say “skewed positively”
...
NOTE: If there are outliers present, mention them and only report statistics
resistant to outliers when comparing center and spread
...
If outliers are present, analyze the median (since it is resistant to outliers)
...
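
To see why the median is called resistant, compare how the mean and the median react when one extreme value is added. The numbers below are hypothetical and chosen only to make the effect easy to see.

```python
from statistics import mean, median

typical = [10, 12, 13, 15, 16]      # hypothetical data, no outliers
with_outlier = typical + [90]       # same data plus one extreme value

# The mean moves a long way toward the outlier; the median barely changes.
print("mean:  ", mean(typical), "->", mean(with_outlier))
print("median:", median(typical), "->", median(with_outlier))
```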
I am not looking for you to state the numerical value; I
would like you to compare the center values across the categories that you are studying and
report which category has the highest average value, or the lowest average value
...

Make a conclusion:
“According to the boxplot (median)/descriptive statistics (mean), on average, ________ has
the higher/same as__ /lower ____
...
If
outliers are present, analyze the interquartile range (IQR: Q3 – Q1; the middle 50%)
...
I am not looking for you to state a value
that represents spread; I would like you to compare the spread across the categories that you
are studying and report which category has data with the least spread; in other words, report
which category has the least variability
...
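
The comparisons of center and spread described above can be sketched as follows, using the resistant statistics (median and IQR). The group names, the data, and the helper names `iqr` and `compare_groups` are hypothetical, and the quartile method may differ slightly from Minitab's.

```python
from statistics import median, quantiles

def iqr(values):
    """Interquartile range: Q3 - Q1."""
    q1, _, q3 = quantiles(sorted(values), n=4)
    return q3 - q1

def compare_groups(groups):
    """Return (category with the highest median, category with the smallest IQR)."""
    medians = {name: median(vals) for name, vals in groups.items()}
    iqrs = {name: iqr(vals) for name, vals in groups.items()}
    highest_center = max(medians, key=medians.get)
    least_spread = min(iqrs, key=iqrs.get)
    return highest_center, least_spread

groups = {
    "Group A": [30, 34, 38, 40, 44, 47, 52],   # hypothetical data
    "Group B": [20, 21, 22, 22, 23, 24, 25],
}
print(compare_groups(groups))
```

The result names the categories rather than reporting the values, which matches the kind of conclusion the notes ask for.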


Example 1: Given the boxplots, compare the distributions by describing shape, center
(location) and spread
...


Shape: (Use comparison of mean and median to determine skew
...

Flow 160 appears somewhat symmetric since the mean and median values appear the same
...

Center: (Compare means or median values
...
Flow 125 has the lowest uniformity value, on average
...
Which category has the smallest IQR (smallest
rectangle?)
According to the IQR from the boxplots, the flow 160 data has the least variability (spread)
...

NOTE: If there are outliers, mention them and only report statistics resistant to outliers
...
)
According to the boxplots, the 1978 distribution appears to be skewed negatively since the
mean is less than the median
...

There are 2 outliers present
...
Which category has the highest “center” value?)
According to the median values on the boxplots, on average, the higher interruption times
occurred in 2003
...
Which category has the smallest IQR (smallest
rectangle?)
According to the IQR on the boxplots, the 2003 data have the least variability (spread)
...
If you need
more than 2 decimal places in your results (because your MML homework asks for
more), find the descriptive stats another way (see next page)
...
This means that all of the sodium levels are in one column (C9) and the names of
the restaurants are in another column (C1-T)
...
But, if you want the mean of sodium levels for each restaurant, then you need to use the ‘By
variables’
...
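
The idea behind the 'By variables' option can be illustrated outside Minitab. In this sketch the stacked data are two parallel lists, one of group labels and one of values; the restaurant labels and sodium numbers are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical stacked data: one column of labels, one column of values.
restaurant = ["A", "A", "B", "B", "B"]
sodium = [900, 1100, 700, 800, 900]

# Without a grouping variable: one mean for the whole column.
overall_mean = mean(sodium)

# With a grouping variable: split the values by label, then average each group.
by_group = defaultdict(list)
for name, value in zip(restaurant, sodium):
    by_group[name].append(value)
group_means = {name: mean(vals) for name, vals in by_group.items()}

print(overall_mean, group_means)
```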


Build a boxplot if data is STACKED
GraphBoxplotWith Groups
Always select ‘Data View’ and check the ‘Mean symbol’

Build a boxplot if data is UNSTACKED
GraphBoxplotSimple
Always select ‘Data View’ and check the ‘Mean symbol’

Build a histogram if data is STACKED
GraphHistogramWith Outline and Groups

Build a histogram if data is UNSTACKED
GraphHistogramSimple

Converting STACKED data to UNSTACKED data
DataUnstack columns…
You can store the unstacked data in a new worksheet or in the last column used in your current worksheet
...


Converting UNSTACKED data to STACKED data
DataStack Columns…
You can give headings to your two new columns, if you choose
...
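
The difference between the two layouts can be sketched in Python. This is not how Minitab does it internally; the function names `unstack` and `stack`, the labels, and the values are all hypothetical.

```python
def unstack(labels, values):
    """STACKED (label column + value column) -> UNSTACKED (one column per group)."""
    columns = {}
    for label, value in zip(labels, values):
        columns.setdefault(label, []).append(value)
    return columns

def stack(columns):
    """UNSTACKED (one column per group) -> STACKED (label column + value column)."""
    labels, values = [], []
    for label, column in columns.items():
        labels.extend([label] * len(column))
        values.extend(column)
    return labels, values

labels = ["new", "old", "new", "old"]   # hypothetical group labels
values = [42, 44, 41, 43]               # hypothetical measurements

unstacked = unstack(labels, values)
print(unstacked)
print(stack(unstacked))
```

Stacking again groups the rows by label, so the original row order is not necessarily preserved.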
Open the data in Statcrunch

2
...
Open Minitab, highlight
C1 entirely
...
Paste (CTRL V) the Statcrunch
data into Minitab
...
It is supposed that a new
machine will pack faster on the average than the machine currently used
...
Open data set: Day 1
Machine
...
Describe the data set (shape, center, and spread)
...

(NOTE to student: Since the mean values are slightly less than the respective median values,
you could say ‘slightly skewed negatively/left’—I just felt the values were so close
...
You can choose to use numerical summaries,
boxplots and/or histograms
...
Use whichever one(s) you
need and always tell me which you used
...

Spread: According to the IQR from the boxplots, the data for the ‘new machine’ have the
least variability
...
Open data set: Day 1 Singers
...
Describe the
data set (shape, center, and spread)
...

(NOTE to student: Of course, you could have used the numerical summaries or boxplots as well and
reached different conclusions: Alto, bass and soprano height data are slightly skewed left since the
mean values are slightly smaller than the respective median values
...
)
Center: According to the median values on the boxplots, on average, bass singers are taller
(have the greatest height)
...

Spread: According to the IQR from the boxplots, soprano data have the least variability
...

