Title: S1 notes for Maths
Description: Notes for Maths, Statistics



Statistics 1
Mathematical Models in Probability and Statistics





A mathematical model is a simplification of a real world situation
Some advantages of mathematical models are:
o They are quick and easy to produce
o They can simplify a more complex situation
o They can help us improve our understanding of the real world as certain variables
can readily be changed
o They can enable predictions to be made about the future
o They can help provide control – as in aircraft scheduling
Some disadvantages are:
o They only give a partial description of the real situation
o They only work for a restricted range of values

Representation and Summary of Data – Location




A variable that can take any value in a given range is a continuous variable
A variable that can take only specific values in a given range is a discrete variable
A grouped frequency distribution consists of classes and their related class frequencies
o Classes – 30-31, 32-33, 34-35
▪ For the class 32-33
▪ Lower class boundary is 31.5
▪ Class width is $33.5 - 31.5 = 2$
▪ Class mid-point is $\frac{31.5 + 33.5}{2} = 32.5$
Median ($Q_2$)
o When $\frac{n}{2}$ is a whole number, find the mid-point of the corresponding term and the term above; if it's not a whole number, round the number up and find the corresponding term
...
o Median $Q_2 = b + \frac{\frac{n}{2} - f}{f_c} \times c$ – not in formula book
▪ $b$ is the lower class boundary of the median class
▪ $n$ is the number of pieces of data
▪ $f$ is the sum of the frequencies below $b$
▪ $f_c$ is the frequency of the median class
▪ $c$ is the width of the median class
The mean is given by
o $\bar{x} = \frac{\sum x}{n}$
o $\bar{x} = \frac{\sum fx}{\sum f}$ – for grouped data
▪ With $\sum x$ being the sum of all the $x$ values
▪ $n$ being the number of pieces of data
▪ $\sum fx$ being the sum of the frequency of each class times $x$ ($x$ being the mid-point of the class)
▪ $\sum f$ the sum of all the frequencies
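As a rough illustration of the grouped mean and the interpolated median, here is a minimal Python sketch. The classes 30-31, 32-33, 34-35 come from the notes, but the frequencies and the function names `grouped_mean` and `grouped_median` are made up for the example.

```python
def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum(f * x) / sum(f), with x the class mid-points."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)


def grouped_median(boundaries, freqs):
    """Median by linear interpolation: b + ((n / 2 - f) / fc) * c."""
    n = sum(freqs)
    below = 0  # f: sum of the frequencies below the current class
    for (lower, upper), fc in zip(boundaries, freqs):
        if fc > 0 and below + fc >= n / 2:
            # lower is b, fc is the class frequency, upper - lower is c
            return lower + (n / 2 - below) / fc * (upper - lower)
        below += fc
    raise ValueError("frequency table is empty")


# Class boundaries for 30-31, 32-33, 34-35; the frequencies are hypothetical.
boundaries = [(29.5, 31.5), (31.5, 33.5), (33.5, 35.5)]
midpoints = [(lower + upper) / 2 for lower, upper in boundaries]
freqs = [4, 10, 6]

mean = grouped_mean(midpoints, freqs)
median = grouped_median(boundaries, freqs)
print(midpoints, mean, median)
```

With these made-up frequencies the median class is 32-33, because the cumulative frequency first reaches n/2 = 10 there.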
The mean of a combined set of data is
o $\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$
▪ $n_1$ being the number of pieces of data for the first set and $n_2$ being the number for the second set
▪ $\bar{x}_1$ being the mean for the first set of data and $\bar{x}_2$ being the mean for the second set
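A small Python sketch of the combined mean, checked against pooling the raw values. The two data sets and the name `combined_mean` are hypothetical.

```python
def combined_mean(n1, mean1, n2, mean2):
    """Weighted mean of two sets: (n1 * mean1 + n2 * mean2) / (n1 + n2)."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


# Hypothetical data sets
set1 = [2, 4, 6]    # n1 = 3, mean 4
set2 = [10, 20]     # n2 = 2, mean 15

mean1 = sum(set1) / len(set1)
mean2 = sum(set2) / len(set2)

overall = combined_mean(len(set1), mean1, len(set2), mean2)
pooled = sum(set1 + set2) / len(set1 + set2)  # mean of all five values together
print(overall, pooled)
```

The two results agree, which is the point of the formula: only the sizes and means of the two sets are needed, not the raw data.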




The coded mean of $x$ is $z$, using the code $y = \frac{x - a}{b}$, the original mean is $\bar{x} = (z \times b) + a$

Representation and Summary of Data – Measures of Dispersion


Range = highest value – lowest value



Lower quartile ($Q_1$) is the $\frac{n}{4}$th term (for discrete data) or $Q_1 = b + \frac{\frac{n}{4} - f}{f_c} \times c$ (for continuous data) – 25% of the data lies between the lowest value and the lower quartile
Upper quartile ($Q_3$) is the $\frac{3n}{4}$th term (for discrete data) or $Q_3 = b + \frac{\frac{3n}{4} - f}{f_c} \times c$ (for continuous data) – 25% of the data lies between the upper quartile and the highest value
Interquartile range (IQR) is $Q_3 - Q_1$
To calculate the $x$th percentile you find the $\frac{xn}{100}$th term
The $n$% to the $m$% interpercentile range $= P_m - P_n$
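The quartile and percentile formulas all follow the same interpolation pattern, so one function can cover them. The sketch below assumes a grouped table given as class boundaries and frequencies; the data and the name `grouped_percentile` are hypothetical.

```python
def grouped_percentile(boundaries, freqs, p):
    """p-th percentile by linear interpolation: b + ((p * n / 100 - f) / fc) * c."""
    n = sum(freqs)
    target = p * n / 100  # position of the required term
    below = 0             # f: sum of the frequencies below the current class
    for (lower, upper), fc in zip(boundaries, freqs):
        if fc > 0 and below + fc >= target:
            return lower + (target - below) / fc * (upper - lower)
        below += fc
    raise ValueError("p must be at most 100 and the table must not be empty")


# Hypothetical grouped data: class boundaries and frequencies
boundaries = [(29.5, 31.5), (31.5, 33.5), (33.5, 35.5)]
freqs = [4, 10, 6]

q1 = grouped_percentile(boundaries, freqs, 25)   # lower quartile, n/4 th term
q3 = grouped_percentile(boundaries, freqs, 75)   # upper quartile, 3n/4 th term
iqr = q3 - q1
p10_to_p90 = (grouped_percentile(boundaries, freqs, 90)
              - grouped_percentile(boundaries, freqs, 10))
print(q1, q3, iqr, p10_to_p90)
```

Setting `p` to 50 gives the median formula, so the three quartiles are just the 25th, 50th and 75th percentiles.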
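Variance ($\sigma^2$) $= \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$ or $\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$
Standard deviation ($\sigma$) $= \sqrt{\text{Variance}}$
The coded variance of $x$ is $z$, using the code $y = \frac{x - a}{b}$, the original variance is $\sigma^2 = z \times b^2$
The coded standard deviation of $x$ is $z$, using the code $y = \frac{x - a}{b}$, the original standard deviation is $\sigma = z \times b$
For the code $y = \frac{x - a}{b}$
o $\bar{y} = \frac{\bar{x} - a}{b}$
o $\sigma(y) = \frac{\sigma(x)}{b}$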
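To see how coding affects the mean, variance and standard deviation, the following Python sketch codes a small data set with y = (x - a) / b and then undoes the coding. The data and the values of a and b are hypothetical, and b is assumed positive.

```python
from math import isclose, sqrt


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Variance as sum(x^2) / n - (sum(x) / n)^2."""
    n = len(values)
    return sum(v * v for v in values) / n - (sum(values) / n) ** 2


xs = [110, 120, 130, 140, 150]   # hypothetical data
a, b = 100, 10                   # hypothetical code y = (x - a) / b
ys = [(x - a) / b for x in xs]   # coded values

mean_back = mean(ys) * b + a         # undo the coding on the mean
var_back = variance(ys) * b ** 2     # variance scales by b squared
sd_back = sqrt(variance(ys)) * b     # standard deviation scales by b

print(isclose(mean_back, mean(xs)),
      isclose(var_back, variance(xs)),
      isclose(sd_back, sqrt(variance(xs))))
```

Subtracting a shifts every value by the same amount, so it changes the mean but not the spread; only the division by b affects the variance and standard deviation.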

Representation of Data





Stem and leaf diagram:
o 1 | 2 4 7 9 - Key: 1 | 4 = 14
An outlier is an extreme value
o Outliers are values that are less than $Q_1 - (1.5 \times IQR)$ or greater than $Q_3 + (1.5 \times IQR)$
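A short Python sketch of the outlier test. The lower fence is the rule stated above; the matching upper fence, Q3 + 1.5 × IQR, is the usual companion rule and is used here as well. The data, quartiles and function names are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) using 1.5 x IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """Values strictly outside the fences are treated as outliers."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]


# Hypothetical data with quartiles assumed to be already known
data = [2, 12, 14, 15, 16, 18, 19, 21, 35]
q1, q3 = 14, 19

fences = outlier_fences(q1, q3)
outliers = find_outliers(data, q1, q3)
print(fences, outliers)
```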






A histogram is similar to a bar chart but there are 2 major differences
o There are no gaps between the bars
o The area of the bar is proportional to the frequency
▪ frequency = frequency density × class width
Skewness
o If 𝑄2 − 𝑄1 = 𝑄3 − 𝑄2 then the distribution is symmetrical
o If 𝑄2 − 𝑄1 < 𝑄3 − 𝑄2 then the distribution is positively skewed
o If 𝑄2 − 𝑄1 > 𝑄3 − 𝑄2 then the distribution is negatively skewed
o mode = median = mean describes a distribution which is symmetrical
o mode < median < mean describes a distribution which is positively skewed
o mode > median > mean describes a distribution which is negatively skewed
o $\frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$
▪ > 0 → positive skewness, the larger the number the greater the skewness
▪ 0 → symmetrical
▪ < 0 → negative skewness, the more negative the number the greater the skewness
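Both skewness checks can be tried on a small data set. This is a minimal sketch using Python's `statistics` module: `pstdev` is the population standard deviation, matching the variance formula in these notes. The data and function names are hypothetical.

```python
from statistics import mean, median, pstdev


def quartile_skew(q1, q2, q3):
    """Compare the quartile gaps Q2 - Q1 and Q3 - Q2."""
    lower_gap, upper_gap = q2 - q1, q3 - q2
    if lower_gap < upper_gap:
        return "positive"
    if lower_gap > upper_gap:
        return "negative"
    return "symmetrical"


def skew_coefficient(data):
    """3 * (mean - median) / standard deviation; the sign gives the direction of skew."""
    return 3 * (mean(data) - median(data)) / pstdev(data)


# Hypothetical data with a long right-hand tail
data = [1, 2, 2, 3, 3, 3, 4, 10]
coefficient = skew_coefficient(data)
print(coefficient, quartile_skew(2, 3, 5))
```

Here the single large value pulls the mean above the median, so the coefficient comes out positive.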

Probability








𝑃(𝐴) probability of 𝐴 happening
𝑃(𝐴 ∪ 𝐵) probability of 𝐴 or 𝐵 or Both happening
𝑃(𝐴 ∩ 𝐵) probability of 𝐴 and 𝐵 happening
𝑃(𝐴′) probability of 𝐴 not happening
𝑃(𝐴 ǀ 𝐵) probability of 𝐴 happening given 𝐵 has already happened
Complementary probability → 𝑃(𝐴′ ) = 1 − 𝑃(𝐴)
Addition rule → 𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵)



Conditional probability → $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
Multiplication rule → $P(A \cap B) = P(A \mid B) \times P(B)$ or $P(B \mid A) \times P(A)$
$A$ and $B$ are independent if $P(A \mid B) = P(A)$ or $P(B \mid A) = P(B)$ or $P(A \cap B) = P(A) \times P(B)$
$A$ and $B$ are mutually exclusive if $P(A \cap B) = 0$
Probability grid

          P(B)    P(B')   Total
P(A)      0.4     0.1     0.5
P(A')     ...     ...     ...
Total     0.8     ...     1

o $P(A \cup B) = 1 - P(A' \cap B')$
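The probability rules can be checked on a small sample space. The sketch below uses a fair six-sided die with two made-up events, and exact fractions so that the equalities hold exactly; the events and the helper `prob` are illustrative only.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # fair six-sided die (illustrative sample space)
A = {2, 4, 6}                # hypothetical event: the score is even
B = {4, 5, 6}                # hypothetical event: the score is greater than 3


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


p_and = prob(A & B)                        # P(A and B)
p_or = prob(A) + prob(B) - p_and           # addition rule
p_a_given_b = p_and / prob(B)              # conditional probability
independent = p_and == prob(A) * prob(B)   # independence test
mutually_exclusive = p_and == 0

# P(A or B) = 1 - P(A' and B')
complement_check = 1 - prob((outcomes - A) & (outcomes - B))
print(p_or, p_a_given_b, independent, mutually_exclusive, complement_check)
```

For these events P(A ∩ B) is 1/3 while P(A) × P(B) is 1/4, so they are neither independent nor mutually exclusive.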
Venn diagram

o When drawing a Venn diagram:
▪ Always draw a box around the Venn diagram
▪ Start in the middle, i.e. where they all overlap
▪ If $P(A \cap B)$ is 0.2, on the Venn diagram you will write 0.7 - 0.5
Probability tree

Correlation




For a positive correlation the points on the scatter diagram increase as you go from left to right
...
For a negative correlation most points lie in the second and fourth quadrants
For no correlation the points lie in all four quadrants

$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$
$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$
$S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$
$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
$r$ (PMCC – Product Moment Correlation Coefficient) is a measure of linear relationship – between -1 and 1
o $r = 1$ → perfect positive linear relationship
o $r = -1$ → perfect negative linear relationship
o $r = 0$ → no linear relationship
Coding has no effect on $r$
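A minimal Python sketch of the PMCC calculation from the summary statistics, including a check that coding leaves r unchanged. The data, the coding constants and the name `pmcc` are hypothetical.

```python
from math import isclose, sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sxy / sqrt(sxx * syy)


# Hypothetical paired data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pmcc(xs, ys)

# Coding both variables (hypothetical codes with positive divisors) leaves r unchanged
coded_xs = [(x - 3) / 2 for x in xs]
coded_ys = [(y - 4) / 10 for y in ys]
r_coded = pmcc(coded_xs, coded_ys)

# Points lying exactly on a line with positive gradient give r = 1
r_line = pmcc(xs, [2 * x + 1 for x in xs])
print(r, r_coded, r_line)
```

For this data Sxx = 10, Syy = 6 and Sxy = 6, so r = 6 / √60.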

Regression




An independent (or explanatory) variable is one that is set independently of the other
variable
A dependent (or response) variable is one whose values are decided by the values of the
independent variable
...
Values estimated by extrapolation can be unreliable

Discrete Random Variables






A variable is represented by a symbol (usually a capital letter)
The output must be numerical (a lower case letter)
Probability distribution function (PDF)
Cumulative distribution function (CDF)
The sum of the probabilities must add up to 1
o PDF = 𝑃(𝑋 = 𝑥)
o CDF = 𝐹(𝑥) = 𝑃(𝑋 ≤ 𝑥)
o ∑𝑃(𝑋 = 𝑥) = ∑𝑝(𝑥) = 1
o $Var(X) = E(X^2) - [E(X)]^2$
▪ $E(aX + b) = aE(X) + b$
▪ $Var(aX + b) = a^2 Var(X)$
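EXAMPLE: The discrete random variable $X$ has probability function
$P(X = x) = \begin{cases} k(6 - x), & x = 0 \text{ to } 6 \\ k(x - 6), & x = 7 \end{cases}$
where $k$ is a positive constant.
a) Show that $k = 0.045$
The probabilities must add up to 1: $k(6 + 5 + 4 + 3 + 2 + 1 + 0) + k(1) = 22k = 1$, so $k = \frac{1}{22} \approx 0.045$
b)
x           0      1      2      3      4      5      6      7
P(X = x)    6/22   5/22   4/22   3/22   2/22   1/22   0/22   1/22

$E(X) = \frac{0}{22} + \frac{5}{22} + \frac{8}{22} + \frac{9}{22} + \frac{8}{22} + \frac{5}{22} + \frac{0}{22} + \frac{7}{22} = \frac{42}{22} \approx 1.91$
$E(X^2) = 0 \times \frac{6}{22} + 1 \times \frac{5}{22} + 4 \times \frac{4}{22} + 9 \times \frac{3}{22} + 16 \times \frac{2}{22} + 25 \times \frac{1}{22} + 36 \times \frac{0}{22} + 49 \times \frac{1}{22} = \frac{154}{22} = 7$
$Var(X) = E(X^2) - [E(X)]^2 = 7 - \left(\frac{42}{22}\right)^2 \approx 3.36$
$Var(2X - 3) = 2^2 Var(X) \approx 13.4$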
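The example can be checked with exact fractions, which avoids rounding E(X) before squaring it. This is a small Python sketch of that check; the variable names are illustrative only.

```python
from fractions import Fraction

# Weights from the probability function: (6 - x) for x = 0 to 6 and (x - 6) for x = 7
weights = {x: (6 - x) if x <= 6 else (x - 6) for x in range(8)}

# The probabilities must add up to 1, so k is 1 over the total weight
k = Fraction(1, sum(weights.values()))
pmf = {x: k * w for x, w in weights.items()}

e_x = sum(x * p for x, p in pmf.items())        # E(X)
e_x2 = sum(x * x * p for x, p in pmf.items())   # E(X^2)
var_x = e_x2 - e_x ** 2                         # Var(X) = E(X^2) - [E(X)]^2
var_2x_minus_3 = 2 ** 2 * var_x                 # Var(aX + b) = a^2 Var(X)

print(k, e_x, e_x2, var_x, var_2x_minus_3)
print(float(k), float(e_x), float(var_x), float(var_2x_minus_3))
```

Keeping the values as fractions until the last step shows how much of the final answer depends on rounding: using E(X) = 1.9 instead of 42/22 shifts Var(X) in the second decimal place.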