
Title: Methods for Describing Sets of Data
Description: These notes differentiate different types of data sets and how to describe them in a business analysis. These are college level notes but they can also be used for high school classes and AP courses.



Methods for Describing Sets of Data
Thursday, February 11, 2016

11:58 AM

Describing Qualitative Data

• Qualitative data are nonnumerical
○ The value of a qualitative variable can be classified only into categories
called classes
○ Class frequency is the number of observations in the data set that fall into
each class
○ Class relative frequency is the proportion of the total number of
observations falling into each class
§ Class relative frequency = (class frequency)/n
○ Class percentage is the class relative frequency multiplied by 100
§ Class percentage = (class relative frequency)(100)
• Bar Graph: the categories (classes) of the qualitative variable are represented by
bars, where the height of each bar is either the class frequency, class relative
frequency, or class percentage
• Pie Chart: the categories (classes) of the qualitative variable are represented by
the slices of a pie (circle)
○ The size of each slice is proportional to the class relative frequency
• Pareto Diagram: a bar graph with the categories (classes) of the qualitative
variable (i.e., the bars) arranged by height in descending order from left to right
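
The class frequency, class relative frequency, and class percentage defined above can be computed directly from a list of observations. The following is a minimal sketch, not part of the original notes: the function name `class_summary` and the payment data are hypothetical and chosen only for illustration. Sorting the classes by descending frequency gives the left-to-right bar order of a Pareto diagram.

```python
from collections import Counter

def class_summary(observations):
    """Return (class, frequency, relative frequency, percentage) rows,
    sorted by descending frequency (the Pareto ordering)."""
    n = len(observations)
    counts = Counter(observations)
    rows = []
    # most_common() yields the classes from highest to lowest frequency
    for label, freq in counts.most_common():
        rel = freq / n                       # class relative frequency = (class frequency)/n
        rows.append((label, freq, rel, rel * 100))  # class percentage = relative frequency * 100
    return rows

# Hypothetical data: payment method recorded for 10 purchases
payments = ["card", "cash", "card", "check", "card",
            "cash", "card", "card", "cash", "card"]

for label, freq, rel, pct in class_summary(payments):
    print(f"{label:<6} frequency={freq} relative={rel:.2f} percentage={pct:.0f}%")
```

The relative frequencies always sum to 1, which is why the slices of a pie chart built from them fill the whole circle.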

Graphical Methods for Describing Quantitative Data

• Quantitative data sets consist of data that are recorded on a meaningful
numerical scale
○ The main graphical methods used are dot plots, stem-and-leaf plots, and
histograms
• Dot Plot: the numerical value of each quantitative measurement in the data set
is represented by a dot on a horizontal scale
○ When data values repeat, the dots are placed above one another
vertically
• Stem-and-Leaf Display: the numerical value of the quantitative variable is
partitioned into a "stem" and a "leaf"
○ The possible stems are listed in order in a column
○ The leaf for each quantitative measurement in the data set is placed in
the corresponding stem row
○ Leaves for observations with the same stem value are listed in increasing
order horizontally
• Histogram: the possible numerical values of the quantitative variable are
partitioned into class intervals, where each interval has the same width
○ These intervals form the scale of the horizontal axis
○ The frequency or relative frequency of observations in each class interval
is determined
○ A vertical bar is placed over each class interval, with height equal to
either the class frequency or class relative frequency
○ Histograms provide good visual descriptions of data sets, particularly very large ones
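
As a rough illustration of the stem-and-leaf display and the histogram described above, the sketch below builds both from a small data set. It is not from the original notes. The function names, the test scores, and two conventions are assumptions: the stem is taken as the tens digit (and above) with the units digit as the leaf, and each class interval includes its lower endpoint but not its upper one.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem to its leaves in increasing order.
    Assumes non-negative integers: stem = value // 10, leaf = value % 10."""
    rows = defaultdict(list)
    for v in sorted(values):          # sorting first keeps each leaf row in increasing order
        rows[v // 10].append(v % 10)
    # List every possible stem between the smallest and largest, even if it has no leaves
    return {stem: rows.get(stem, []) for stem in range(min(rows), max(rows) + 1)}

def histogram_counts(values, start, width, num_classes):
    """Class frequencies for equal-width intervals
    [start, start + width), [start + width, start + 2 * width), ..."""
    counts = [0] * num_classes
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < num_classes:  # values outside the covered range are ignored
            counts[index] += 1
    return counts

# Hypothetical data: nine test scores
scores = [62, 75, 78, 81, 84, 84, 89, 93, 97]

for stem, leaves in stem_and_leaf(scores).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))

frequencies = histogram_counts(scores, start=60, width=10, num_classes=4)
relative = [f / len(scores) for f in frequencies]
print(frequencies)   # bar heights if the histogram shows class frequency
print(relative)      # bar heights if it shows class relative frequency
```

Unlike the histogram, the stem-and-leaf display keeps every original measurement recoverable from the plot.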

Numerical Measures of Central Tendency

• If statistical inference is our goal, we'll wish ultimately to use sample numerical
descriptive measures to make inferences about the corresponding measures for
the population
○ The central tendency of the set of measurements is the tendency of the
data to cluster, or center, about certain numerical values
○ The variability of the set of measurements is the spread of the data
• The mean of a set of quantitative data is the sum of the measurements divided
by the number of measurements contained in the data set
○ Mean of a Sample:
§ x bar = (x1 + x2 + … + xn)/n
• We use the Greek letter µ for the population mean
• We need to know something about the reliability of our inference -- we need to
know how accurately we might expect "x bar" to estimate µ
○ Depends on the size of the sample: the larger the sample, the more
accurate the estimate will tend to be
○ Depends on the variability, or spread, of the data: all other factors
remaining constant, the more variable the data, the less accurate the
estimate
• The median of a quantitative data set is the middle number when the
measurements are arranged in ascending (or descending) order
○ Of most value in describing large data sets
○ We denote the median of a sample by m
○ We denote the median of a population by η (eta)
○ Arrange the n measurements from smallest to largest
§ If n is odd, m is the middle number
§ If n is even, m is the mean of the middle two numbers

• A data set is said to be skewed if one tail of the distribution has more extreme
observations than the other tail
○ With rightward skew, the right tail of the distribution has more extreme
observations
§ Median is less than the mean
○ With leftward skew, the left tail of the distribution has more extreme
observations
§ Mean is less than the median
○ If the data set is symmetric, the mean equals the median
• The mode is the measurement that occurs most frequently in the data set
○ The class interval containing the largest relative frequency is called the
modal class
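
The three measures of central tendency above can be sketched in a few lines. This example is not from the original notes: the function names and the salary figures are hypothetical. The `median` function follows the odd/even rule stated in the notes, and `modes` returns a list because more than one value can tie for most frequent.

```python
from collections import Counter

def mean(xs):
    """Sum of the measurements divided by the number of measurements."""
    return sum(xs) / len(xs)

def median(xs):
    """Middle number of the ordered measurements (mean of the middle two if n is even)."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(xs):
    """All measurements that occur with the highest frequency."""
    counts = Counter(xs)
    top = max(counts.values())
    return [x for x, c in counts.items() if c == top]

# Hypothetical salaries (in thousands); the single large value creates rightward skew
salaries = [30, 32, 35, 35, 40, 120]

print(round(mean(salaries), 2))
print(median(salaries))
print(modes(salaries))
print(median(salaries) < mean(salaries))   # True here, consistent with rightward skew
```

The one extreme salary pulls the mean well above the median, which is the pattern the notes describe for rightward skew.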





Numerical Measures of Variability

• The range of a quantitative data set is equal to the largest measurement minus the
smallest measurement
• Measures of variability describe the spread of the data
• A deviation is the distance and direction between a measurement and the mean;
if the deviations are large, the data are spread out (highly variable)
• The sample variance for a sample of n measurements is equal to the sum of the
squared deviations from the mean divided by (n-1)
○ The symbol s^2 is used to represent the sample variance

• The sample standard deviation, s, is defined as the positive square root of the
sample variance, s^2
○ s = sqrt(s^2)
○ The population variance is denoted by sigma squared, and is the average
of the squared distances of the measurements of all units in the
population from the mean µ, and sigma is the square root of this quantity
○ Standard deviation is expressed in the original units of measurement
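
The range, sample variance, and sample standard deviation defined above can be computed as follows. This is a minimal sketch, not part of the original notes: the function names and the five data values are hypothetical. The only difference between the sample and population versions is the divisor, (n-1) versus the number of units in the population.

```python
import math

def data_range(xs):
    """Largest measurement minus the smallest measurement."""
    return max(xs) - min(xs)

def sample_variance(xs):
    """s^2: sum of squared deviations from the sample mean, divided by (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

def sample_std(xs):
    """s: positive square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

def population_variance(xs):
    """sigma squared: average squared distance from the population mean,
    treating xs as the entire population."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

# Hypothetical measurements
data = [1, 2, 3, 4, 5]

print(data_range(data))                # 4
print(sample_variance(data))           # 2.5
print(round(sample_std(data), 4))      # 1.5811
print(population_variance(data))       # 2.0
```

The variance is in squared units, while the standard deviation is back in the original units of measurement, which is why the standard deviation is usually the easier of the two to interpret.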

Using the Mean and Standard Deviation to Describe Data

• Chebyshev's Rule applies to any data set, regardless of the shape of the
frequency distribution of the data
○ No useful information is provided on the fraction of measurements that
fall within 1 standard deviation of the mean
○ At least 3/4 will fall within 2 standard deviations of the mean
○ At least 8/9 will fall within 3 standard deviations of the mean
○ Generally, for any number k greater than 1, at least (1-1/k^2) of the
measurements will fall within k standard deviations of the mean
• The Empirical Rule applies to data sets with frequency distributions
that are mound shaped and symmetrical
○ Approximately 68% of the measurements will fall within 1 standard
deviation of the mean
○ Approximately 95% of the measurements will fall within 2 standard
deviations of the mean
○ Approximately 99.7% of the measurements will fall within 3 standard
deviations of the mean, that is, will have a z-score between -3 and 3
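
To see how these rules relate to actual data, the sketch below compares the Chebyshev lower bound (1-1/k^2) with the observed fraction of measurements within k standard deviations of the mean. It is not from the original notes: the function names and the data set are hypothetical, and the sample standard deviation (divisor n-1) is assumed.

```python
import math

def chebyshev_lower_bound(k):
    """Minimum fraction of measurements within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule gives no useful information unless k > 1")
    return 1 - 1 / k ** 2

def fraction_within(xs, k):
    """Observed fraction of measurements within k sample standard deviations of the mean."""
    n = len(xs)
    x_bar = sum(xs) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    return sum(1 for x in xs if abs(x - x_bar) <= k * s) / n

# Hypothetical, skewed data set: the Empirical Rule is not expected to hold,
# but Chebyshev's Rule must
data = [2, 4, 4, 5, 5, 5, 6, 6, 8, 15]

for k in (2, 3):
    bound = chebyshev_lower_bound(k)
    observed = fraction_within(data, k)
    print(f"k={k}: at least {bound:.3f} guaranteed, {observed:.3f} observed")
```

For mound-shaped, symmetrical data the observed fractions would be expected to land near the Empirical Rule values of roughly 0.95 and 0.997 instead of merely exceeding the Chebyshev bounds.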

