Title: Correlation, hypothesis testing statistics
Description: Lecture notes and detailed solved examples on correlation in statistics. For students who are studying statistics, mathematics and physics. (lecture notes on statistics 1)


Notes on Statistics
Lecture notes and solved examples 1
LINEAR CORRELATION
§ 1
...
For example, Ohm's law expresses the linear dependence of the current in a
section of a circuit on the voltage drop in this section
...
These dependences are generalizations of experimental data or are obtained by the methods of theoretical physics
...

However, often at the initial stage of solving a physical problem, the type of
functional relationship between certain physical quantities is unknown
...
Some physical quantities may be independent
...
Based on the results
obtained, it is concluded that the physical quantities X and Y are related or independent
...


When studying real processes, it turns out that there are physical quantities
that, on the one hand, are not connected by an explicit one-to-one functional dependence but, on the other hand, are not absolutely independent
...
At the same time, the average values of Y regularly increase or decrease
as X increases
...

Stochastic relationships are associated with random variables that are found
in almost all areas of physics: thermodynamics, quantum physics, many-body mechanics, and the theory of nonlinear oscillations
...

Examples of different types of relationships between variables X and Y are
shown as graphs in Figure 1
...
Graphical representation of measurement results for typical
cases of the relationship between the quantities X and Y:
a - functionally related quantities X and Y, b - stochastically related quantities X and Y, c - independent quantities X and Y
...
This is why we need a mathematical method
for analyzing the relationship between variables X and Y, which is described in
this chapter
...
The spread of values of one variable while the value of the other is constant
can be described using probability distributions
...
Linear correlation coefficient
For dependent random variables, quantitative characteristics of the degree of
relationship are introduced; their values are determined from the probability
distributions of these random variables
...

$$\rho_{XY} = \frac{M\big[(X - M(X))\,(Y - M(Y))\big]}{\sqrt{D(X)\,D(Y)}} \qquad (1)$$

The symbols M and D represent the expected value and variance of the random variables shown in parentheses
...

1
...
e
...

2
...

3
...
In other
words, if ρXY = 1, then
Y = K
...

4
...
If ρXY < 0, then the quantity Y decreases as X increases, or
Y increases as X decreases
...
The closer the value ρXY is to zero, the greater the
irregular spread of the values of Y for any fixed value of X
...

In addition, the sign of the correlation coefficient uniquely determines the
nature of the Y(X) dependence – whether it is increasing or decreasing
...
The absolute value XY will decrease when the dependence of Y(X) deviates from the linear one, even if the variables are not stochastically related, but
strictly functionally
...
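A small numerical check makes the last remark concrete. The sketch below is in Python; the helper name `sample_correlation` and the test data are illustrative assumptions, not part of the notes. It applies the sample estimate of the correlation coefficient (formula (4) of these notes) to two exactly linear dependences and to an exactly quadratic one.

```python
import math

def sample_correlation(xs, ys):
    """Sample estimate of the linear correlation coefficient."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data: x values placed symmetrically around zero
xs = [-3, -2, -1, 0, 1, 2, 3]

r_rising = sample_correlation(xs, [2 * x + 1 for x in xs])       # linear, increasing
r_falling = sample_correlation(xs, [-0.5 * x + 4 for x in xs])   # linear, decreasing
r_parabola = sample_correlation(xs, [x * x for x in xs])         # functional but nonlinear

print(r_rising, r_falling, r_parabola)
```

For the two straight lines the estimate has absolute value 1, and its sign follows the direction of the dependence. For the parabola the estimate vanishes although Y is a strict function of X, so a small coefficient by itself does not prove independence.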

The expediency of using the correlation coefficient is also due to the fact
that, to a first approximation, many complex dependences are assumed to be linear
...
Calculation of the correlation coefficient
The probability-theoretic definition of the correlation coefficient is not suitable for practical calculations
...
Therefore, instead of the exact value of the correlation coefficient ρXY, it is necessary to calculate its approximate value (estimate) using the
results of measurements of X and Y
...


(3)

The best approximate value of the correlation coefficient that can be calculated from the measurement results is the value R, given by the following
formula:

$$R = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \, \sum_{i=1}^{N} (y_i - \bar{y})^2}}, \qquad (4)$$

where $\bar{x}$ and $\bar{y}$ are the average values of the measurement results $x_i$ and $y_i$, respectively:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$$
...
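Formula (4) translates directly into a few lines of code. The following is a minimal Python sketch; the function name `correlation_estimate` and the sample measurements are hypothetical and serve only to illustrate the computation.

```python
import math

def correlation_estimate(xs, ys):
    """Estimate R of the correlation coefficient according to formula (4)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    n = len(xs)
    x_mean = sum(xs) / n                      # average of the x_i
    y_mean = sum(ys) / n                      # average of the y_i
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_yy = sum((y - y_mean) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("a constant sample makes R undefined")
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical measurement pairs (not taken from the notes)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation_estimate(xs, ys)
print(f"R = {r:.4f}")
```

The explicit check for a constant sample is a design choice of this sketch: in that case the denominator of (4) is zero and the estimate is not defined.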
To demonstrate how to estimate the correlation coefficient (4),
let's first consider a simple example
...
The theory gives the following form of
this dependence:

$$T = 2\pi\sqrt{\frac{L}{g}}$$


...
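The linear form tested in this example follows from squaring the period formula. The step below assumes that the formula in question is the usual small-oscillation period of a mathematical pendulum, $T = 2\pi\sqrt{L/g}$, where g denotes the acceleration of gravity (a symbol introduced here for illustration):

```latex
T^2 = \left( 2\pi\sqrt{\frac{L}{g}} \right)^{2} = \frac{4\pi^2}{g}\,L
```

Under this assumption T² is proportional to L, which is why the correlation coefficient is estimated for the pairs (L, T²) rather than (L, T).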
Let us verify this statement using the results of the experiment
by calculating the value of R according to formula (4)
...
The results are summarized in Table 1
...

Table 1

Experiment number         1     2     3     4     5
Thread length L (cm)      10    17    24    31    38
Period squared T² (s²)

0
...
6954 0
...
2618 1
...
First, we calculate the average values of the length and of the squared
period:
$\bar{L}$ = 24 (cm), $\overline{T^2}$ = 0.9793 (s²)
...
3):

$\sum_{i=1}^{5} (L_i - \bar{L})(T_i^2 - \overline{T^2}) = 19.824$ (cm·s²)

$\sum_{i=1}^{5} (L_i - \bar{L})^2$
...


Now, substituting the calculated sums in (6
...

An estimate of the correlation coefficient this close to unity is convincing evidence of a linear relationship between the values of L and
T², and an experimental confirmation of the theoretical formula (6
...
A slight difference between the obtained value R and unity is due to errors in measuring the
length of the thread and the oscillation period
...


Example 2
...

The experiment examines the collision of two steel balls suspended on
threads of the same length
...

Let's try to answer the question: does the collision time depend on the initial
deflection angle? Collision time refers to the time interval during which the balls
are elastically deformed and remain in contact
...




Fig
...
Elastic ball collision scheme
Let the collision time T be measured in the experiment for
initial angles φ in the interval from 5 to 15 degrees
...
1)
...

Table 2

Initial deflection angle φ (deg)    Collision time T (microseconds)
15                                  133.2
14                                  134
13                                  143.7
12                                  135.8
11                                  134.2
10                                  135.4
9                                   139.3
8                                   125
7                                   110.7
6                                   130.7
5                                   100

At first glance, as the angle φ decreases, the collision time changes quite randomly
...

First, according to the data in Table
...
3):

$\sum_{i=1}^{11} (\varphi_i - \bar{\varphi})(T_i - \bar{T}) = 294.7$ (deg
...


Substituting the calculated sums into formula (4), we obtain the approximate
value of the correlation coefficient:

R = 0.687
...
Consequently, we conclude that a relationship between the studied quantities exists
...
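The arithmetic of this example can be reproduced from the data in Table 2. The sketch below is in Python; the variable names are illustrative assumptions, and the two sums of squared deviations are computed here rather than quoted from the notes.

```python
import math

# Table 2: initial deflection angle (deg) and collision time (microseconds)
phi = [15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5]
t = [133.2, 134.0, 143.7, 135.8, 134.2, 135.4, 139.3, 125.0, 110.7, 130.7, 100.0]

n = len(phi)
phi_mean = sum(phi) / n
t_mean = sum(t) / n

# Sums that enter formula (4)
s_phi_t = sum((p - phi_mean) * (ti - t_mean) for p, ti in zip(phi, t))
s_phi_phi = sum((p - phi_mean) ** 2 for p in phi)
s_t_t = sum((ti - t_mean) ** 2 for ti in t)

r = s_phi_t / math.sqrt(s_phi_phi * s_t_t)
print(f"cross sum = {s_phi_t:.1f}, R = {r:.3f}")
```

Run on the tabulated values, this gives the cross sum 294.7 and R = 0.687, in agreement with the numbers stated in the text.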


§ 3
...
The value of R determined by
the formula (6
...
If
we repeat the series of measurements, we will generally get different experimental
results (6
...
4)
...

To obtain reliable information about the relationship between the studied
values X and Y, it is necessary to construct a confidence interval for the unknown
value of the correlation coefficient ρXY
...
Mathematical statistics shows that
this distribution generally depends on the true value of the correlation coefficient
ρXY, which is unknown to the experimenter
...

An algorithm that gives reliable results even for a small number of measured pairs of values of the quantities X and Y under study is described below
...

The first step is to determine the number ε, which is the root of the following
nonlinear equation

$$\alpha = 2\Phi(\varepsilon), \qquad (7)$$

where Φ(x) is the Laplace function

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{0}^{x} \exp\left(-\frac{t^2}{2}\right) dt$$
...
According to the given value of the Laplace function α/2, the corresponding
value of the argument ε is extracted from the table
...
)
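Instead of a table lookup, equation (7) can also be solved numerically. The sketch below assumes that the Laplace function is the integral of the standard normal density from 0 to x, so that it can be written through the error function as Φ(x) = erf(x/√2)/2. The function names, the bisection bounds, and the example value α = 0.95 are illustrative choices.

```python
import math

def laplace(x):
    """Laplace function: integral of the standard normal density from 0 to x."""
    return 0.5 * math.erf(x / math.sqrt(2.0))

def solve_epsilon(alpha, tol=1e-10):
    """Root of alpha = 2 * laplace(eps), found by bisection on [0, 10]."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 2.0 * laplace(mid) < alpha:
            lo = mid      # root lies to the right of mid
        else:
            hi = mid      # root lies to the left of mid (or at mid)
    return 0.5 * (lo + hi)

eps = solve_epsilon(0.95)
print(f"epsilon = {eps:.2f}")
```

Bisection is used because 2Φ(ε) increases monotonically with ε, so the root on the interval is unique. For α = 0.95 the result rounds to 1.96, the value that a table of the Laplace function gives for Φ(ε) = 0.475.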
The confidence interval for the correlation coefficient ρXY is represented as:

$$\mathrm{th}\left(U - \frac{\varepsilon}{\sqrt{N-3}}\right) \le \rho_{XY} \le \mathrm{th}\left(U + \frac{\varepsilon}{\sqrt{N-3}}\right), \qquad (9)$$

where U is a value that is related to the approximate value of the correlation coefficient R as follows:

$$U = \frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) \qquad (10)$$

The function th(z) is the hyperbolic tangent, which can be expressed in terms of
the exponential function of the doubled argument:

$$\mathrm{th}(z) = \frac{\exp(2z) - 1}{\exp(2z) + 1}$$

...

Modern computer equipment and software make it easy to perform calculations necessary for constructing the boundaries of the confidence interval (9)
...
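As an illustration, formulas (9) and (10) can be evaluated with the Python standard library alone. The sketch below uses the values that appear in the ball collision example of these notes (R = 0.687 from N = 11 pairs, ε = 1.96); the function name `correlation_confidence_interval` is hypothetical.

```python
import math

def correlation_confidence_interval(r, n, eps):
    """Confidence interval (9) for the correlation coefficient."""
    if n <= 3:
        raise ValueError("formula (9) needs more than 3 measurement pairs")
    if not -1.0 < r < 1.0:
        raise ValueError("R must lie strictly between -1 and 1")
    u = 0.5 * math.log((1.0 + r) / (1.0 - r))   # formula (10); equals math.atanh(r)
    shift = eps / math.sqrt(n - 3)
    return math.tanh(u - shift), math.tanh(u + shift)

low, high = correlation_confidence_interval(r=0.687, n=11, eps=1.96)
print(f"{low:.3f} <= rho_XY <= {high:.3f}")
```

With these inputs the bounds come out as 0.148 and 0.911. The interval is not symmetric about R because the hyperbolic tangent compresses values near ±1.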

The proposed method for constructing a confidence interval for the correlation coefficient gives reliable results for any value of ρXY, if the number of pairs N of
measured values xi, yi (i = 1,
...

If the number of measurements is large (several hundred) or it is known that
the spread of values of both the studied quantities X and Y obeys the normal law
of probability distribution, then the boundaries of the confidence interval for ρXY
are expressed by a simpler formula:
$$R - \varepsilon\,\frac{1 - R^2}{\sqrt{N}} \le \rho_{XY} \le R + \varepsilon\,\frac{1 - R^2}{\sqrt{N}} \qquad (12)$$

where the parameter ε is still determined from equation (7)
...


4
...

It makes sense to consider the following special case, which is important for
practice
...
This means that the correlation coefficient ρXY must be zero
...
This results in an approximate value of the correlation coefficient, which can significantly differ from zero
due to random errors in the measurement results xi, yi (i = 1,
...

It is proved that, under the condition ρXY = 0, the random variable

$$T_R = R\,\sqrt{\frac{N-2}{1-R^2}} \qquad (13)$$

has a Student distribution with N - 2 degrees of freedom
...

The null statistical hypothesis H0 is formulated: the physical quantities X
and Y under study are independent
...
The critical region is two-sided and consists of sufficiently large absolute values
...

Having set the confidence probability α, based on the known number of
measurement pairs N, we find from the tables the value of the Student's coefficient
tα, N-2, corresponding to the probability α and the number of degrees of freedom
(N - 2)
...
Now we should compare the two obtained numbers:
t α, N-2 and TR
...

Let TR < t α, N-2
...
The difference between the approximate value of the
correlation coefficient R and zero is explained by the presence of random errors
...
We assume that the investigated values X and Y
are independent
...
In the case of TR > t α, N–2, the nonzero value of R can
no longer be explained only by the presence of random errors in the experimental
method
...
It is unlikely that such a significant difference in the estimate of R from zero can be attributed to large random errors in the measured values (3)
...
The null hypothesis is rejected
...
Note that this statement is not absolutely accurate, but is made with
probability α
...

To illustrate the above, we will perform practical calculations for the example of a ball collision from the previous section
...
687 is very far from zero
...
To do this, we calculate from the data in
Table
...

Let's set the confidence probability α = 0
...
Tab
...
Therefore, the number of degrees of freedom is 9
...
262
...
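The comparison can be sketched in Python for the ball collision data. The sketch assumes that the statistic has the form T_R = R·√(N − 2)/√(1 − R²) and that the confidence probability is 0.95. The critical value 2.262 is the standard two-sided Student coefficient for 9 degrees of freedom and is hard-coded rather than looked up; the function name `student_statistic` is hypothetical.

```python
import math

def student_statistic(r, n):
    """Statistic T_R of formula (13) for testing the hypothesis rho_XY = 0."""
    if n <= 2:
        raise ValueError("at least 3 measurement pairs are required")
    if not -1.0 < r < 1.0:
        raise ValueError("R must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))

r, n = 0.687, 11       # estimate and number of pairs from the ball collision example
t_critical = 2.262     # assumed table value: probability 0.95, n - 2 = 9 degrees of freedom

t_r = student_statistic(r, n)
reject_independence = abs(t_r) > t_critical   # two-sided critical region
print(f"T_R = {t_r:.3f}, reject H0: {reject_independence}")
```

Under these assumptions T_R is about 2.84, which exceeds the critical value, so the hypothesis of independence is rejected at this confidence level.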

We construct a confidence interval for the correlation coefficient for the selected confidence probability α = 0
...

First, we calculate the value U by formula (10), using the previously calculated value R = 0
...


Using the table of values of the Laplace function, we find the solution of
equation (7):

ε = 1.96
...
8):

0.148 ≤ ρXY ≤ 0.911
...
According to (7), the interval width increases sharply with decreasing number of measurement pairs
...
In addition, the relationship between the studied quantities is probably
nonlinear
...
In such cases, the analysis of the nonlinear relationship should be carried out using so-called correlation ratios
...
More commonly, the dependence is converted to a linear form