VIETNAM NATIONAL UNIVERSITY OF HO CHI MINH CITY
INTERNATIONAL UNIVERSITY
PROJECT REPORT
Course: Statistics
Lecturer: Dr Nguyen Minh Quan
Semester: 2020 - 2021
Group:
Members:
Lê Huỳnh Tuấn Kiệt_MAMAIU18066
Nguyễn Trần Duy Tân_MAMAIU18031
I. Overall information of the topic:
Our team chose this project to analyze global warming in Vietnam. Specifically, the purpose of our project is to investigate statistically whether there is a significant change in temperature over time in Vietnam, especially in April. From our sources, we collected two distinct data sets of temperatures over the years, from 1931 to 1960 and from 1991 to 2016 respectively:
Data set 1: From 1931 to 1960
Data set 2: From 1991 to 2016
From these raw data sets, we aim to analyze only the April temperatures over the years. Therefore, we extracted two samples from them:
Sample 1: Temperature in April from 1931 to 1960
Sample 2: Temperature in April from 1991 to 2016
II. Analysis of the temperature:
1) Historical analysis of the two samples:
Using Excel's Analysis ToolPak and what we learned in the Statistics course, we calculated several descriptive statistics that summarize the distributions of sample 1 and sample 2.
Overall, these numbers show that the average temperature changed only slightly between the two periods. The median temperature of the second period is slightly higher than that of the first period.
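For readers who want to reproduce a ToolPak-style summary outside Excel, here is a minimal Python sketch using only the standard library. It assumes the ToolPak's skewness is the adjusted coefficient used by Excel's `SKEW` function; the function name `describe` and the toy data are illustrative only, not our actual samples.

```python
import math
import statistics


def describe(sample):
    """Summary statistics similar to Excel's "Descriptive Statistics" output."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    # Adjusted skewness, as in Excel's SKEW():
    # n / ((n - 1)(n - 2)) * sum(((x - mean) / sd) ** 3)
    skewness = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in sample)
    return {
        "mean": mean,
        "standard_error": sd / math.sqrt(n),
        "median": statistics.median(sample),
        "standard_deviation": sd,
        "sample_variance": statistics.variance(sample),
        "skewness": skewness,
        "range": max(sample) - min(sample),
        "minimum": min(sample),
        "maximum": max(sample),
        "count": n,
    }


# Toy data (not from our data sets): a long right tail gives positive skewness.
print(describe([1, 2, 3, 4, 10])["skewness"] > 0)
```

A positive skewness indicates a right-skewed sample and a negative one a left-skewed sample, so the shapes of the two samples can also be compared numerically.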
[Figure: April temperature by year, First Sample (1931–1960)]
[Figure: April temperature by year, Second Sample (1991–2016)]
The graphs above indicate that the April temperature fluctuated considerably from year to year, roughly between 24 and 27 degrees.
Visually, the maximum and minimum temperatures of the first sample are greater than those of the second sample. Furthermore, the charts suggest that the variances of the two samples are approximately equal.
These charts suggest that neither sample 1 nor sample 2 is normally distributed. Moreover, sample 1 is right-skewed, while sample 2 is left-skewed.
2) Hypothesis Test:
a) Hypothesis test of equality of variances:
In the previous part, we found that the two samples cannot be assumed to be normally distributed. At this point, we do not know whether the variances of the two populations are equal, so we test this with Levene's test, which is less sensitive to departures from normality than the classical F-test for the equality of two variances.
Theory of Levene's test:
Definition: Levene's test is used to test whether k samples have equal variances. Equal variances across samples is called homogeneity of variance. Some statistical tests, for example the analysis of variance, assume that variances are equal across groups or samples; the Levene test can be used to verify that assumption.
$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$
$H_a: \sigma_i^2 \neq \sigma_j^2$ for at least one pair $(i, j)$.
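Test statistic:
Given a variable $Y$ with a sample of size $N$ divided into $k$ subgroups, where $N_i$ is the sample size of the $i$-th subgroup, Levene's test statistic is defined as:
$$W = \frac{N-k}{k-1} \cdot \frac{\sum_{i=1}^{k} N_i \left(\bar{Z}_{i\cdot} - \bar{Z}_{\cdot\cdot}\right)^2}{\sum_{i=1}^{k} \sum_{j=1}^{N_i} \left(Z_{ij} - \bar{Z}_{i\cdot}\right)^2}$$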
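Where:
- $k$ is the number of different groups to which the sampled cases belong
- $N_i$ is the number of cases in the $i$-th group
- $N$ is the total number of cases in all groups
- $Y_{ij}$ is the value of the measured variable for the $j$-th case from the $i$-th group
- $Z_{ij} = |Y_{ij} - \bar{Y}_{i\cdot}|$
- $\bar{Y}_{i\cdot}$ is the mean of the $i$-th group
- $\bar{Z}_{i\cdot} = \frac{1}{N_i} \sum_{j=1}^{N_i} Z_{ij}$ is the mean of the $Z_{ij}$ for the $i$-th group
- $\bar{Z}_{\cdot\cdot} = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{N_i} Z_{ij}$ is the mean of all $Z_{ij}$

Critical region: Levene's test rejects the null hypothesis that the variances are equal if $W > F_{\alpha,\, k-1,\, N-k}$, the upper critical value of the F distribution with $k-1$ and $N-k$ degrees of freedom at significance level $\alpha$.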
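The statistic $W$ can be computed directly from the definitions of $Z_{ij}$, $\bar{Z}_{i\cdot}$ and $\bar{Z}_{\cdot\cdot}$. Below is a minimal standard-library Python sketch of the mean-centred (original) Levene statistic; `levene_w` is an illustrative name and the two small groups are toy data, not our samples. It returns only $W$; the p-value would come from the F distribution with $(k-1, N-k)$ degrees of freedom, for example via SciPy's `scipy.stats.f.sf(W, k - 1, N - k)`, and `scipy.stats.levene(*groups, center="mean")` should report the same $W$.

```python
def levene_w(groups):
    """Levene's W statistic, mean-centred version (Z_ij = |Y_ij - group mean|)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Z_ij = |Y_ij - mean of group i|
    z = [[abs(y - sum(g) / len(g)) for y in g] for g in groups]
    z_group_means = [sum(zi) / len(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / n_total
    # Numerator: sum_i N_i * (Zbar_i. - Zbar_..)^2
    between = sum(len(zi) * (zm - z_grand_mean) ** 2 for zi, zm in zip(z, z_group_means))
    # Denominator: sum_i sum_j (Z_ij - Zbar_i.)^2
    within = sum((zij - zm) ** 2 for zi, zm in zip(z, z_group_means) for zij in zi)
    return (n_total - k) / (k - 1) * between / within


# Toy example (not our data): the second group is twice as spread out as the first.
w = levene_w([[1, 2, 3], [2, 4, 6]])
print(w)  # ≈ 0.8
```

Two groups with the same spread but different locations, such as `[1, 2, 3]` and `[11, 12, 13]`, give $W = 0$, because Levene's test looks only at absolute deviations from each group's own mean.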
Applying the test to our problem at the 0.05 significance level:
$H_0: \sigma_1^2 = \sigma_2^2 \qquad H_a: \sigma_1^2 \neq \sigma_2^2$
We calculated the components of W:
The p-value, $P(F_{1,54} > W)$, with $k - 1 = 1$ and $N - k = 56 - 2 = 54$ degrees of freedom ($N = 30 + 26 = 56$ observations), is greater than $\alpha = 0.05$. Therefore, we fail to reject $H_0$ and assume that the variances of the two populations are equal.
b) Hypothesis test of the change in average temperature between the two populations:
Using Excel's t-Test: Two-Sample Assuming Equal Variances
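Let $\mu_1$ and $\mu_2$ denote the mean April temperatures of the first (1931–1960) and second (1991–2016) populations:
$H_0: \mu_2 \ge \mu_1 \qquad H_a: \mu_2 < \mu_1$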
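With a significance level of 0.02:
The one-tailed critical value is 2.104, which is smaller than |t Stat|, so the test statistic falls in the rejection region and it is reasonable to reject $H_0$. That is, at the 0.02 significance level, the data support $H_a$: the mean April temperature of 1991–2016 is lower than that of 1931–1960, so the average April temperature did not increase over the years.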
With a significance level of 0.05:
The one-tailed critical value is 1.6736, which is also smaller than |t Stat|, so we again reject $H_0$. At the 0.05 significance level, the average April temperature likewise did not increase over the years.
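The pooled-variance t statistic behind Excel's tool can be sketched in a few lines of standard-library Python. This is an illustrative reimplementation, not the ToolPak itself; `pooled_t_test` and the toy samples are hypothetical. Excel reports t Stat as (mean of Variable 1 − mean of Variable 2)/SE, so its sign depends on which range is entered first; the sketch fixes the order so that a negative $t$ points toward $H_a: \mu_2 < \mu_1$. With $n_1 = 30$ and $n_2 = 26$ it gives $df = 54$, consistent with the one-tailed critical value 1.6736 quoted above.

```python
import math
import statistics


def pooled_t_test(first, second):
    """Two-sample t statistic assuming equal variances.

    Returns (t, df) with t = (mean(second) - mean(first)) / SE, so that
    H0: mu2 >= mu1 is rejected in favour of Ha: mu2 < mu1 when t < -t_critical.
    """
    n1, n2 = len(first), len(second)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * statistics.variance(first)
                  + (n2 - 1) * statistics.variance(second)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(second) - statistics.mean(first)) / se
    return t, df


# Toy data (not our samples)
t, df = pooled_t_test([4, 5, 6], [1, 2, 3])
print(t < 0, df)
```

The decision rule is then the one used above: reject $H_0$ when $t$ is below the negative one-tailed critical value for $df$ degrees of freedom.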
III. Regression:
1) Linear regression:
- From the regression summary output for sample 2:
The R Square value, which for a simple linear regression equals the square of the correlation between the two factors (year and temperature), is very small. In other words, a linear regression of temperature on year does not fit the data accurately.
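R Square is $1 - SS_{res}/SS_{tot}$: the share of the variation in temperature that the fitted line explains. A minimal standard-library sketch of the least-squares line and its R Square follows; `linear_fit` is an illustrative name, and the example data are toy values rather than sample 2.

```python
def linear_fit(x, y):
    """Least-squares line y = intercept + slope * x and its R Square."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R Square = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Toy data (not our sample): correlation 0.6, so R Square is 0.36.
print(linear_fit([1, 2, 3, 4], [2, 1, 4, 3]))
```

An R Square close to 0 means the line explains almost none of the year-to-year variation, which is the situation described for sample 2.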
2) Quadratic regression:
a) The idea:
- A quadratic regression is the process of finding the equation of the parabola that best fits a set of data. As a result, we get an equation of the form:
$$f(x) = ax^2 + bx + c, \quad a \neq 0$$
- The standard way to find this equation manually is the least squares method. The purpose is to minimize the sum of the squares of the residuals between the measured $y$ and the $y$ calculated with the quadratic model.
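The quantity that least squares minimizes can be written as a one-line Python function; the name `sum_squared_residuals` and the toy points are ours, for illustration only.

```python
def sum_squared_residuals(a, b, c, xs, ys):
    """Sum of squared residuals of the quadratic model f(x) = a*x**2 + b*x + c."""
    return sum((y - (a * x**2 + b * x + c)) ** 2 for x, y in zip(xs, ys))


# Toy points lying exactly on y = 2x^2 - 3x + 1: the true coefficients give 0.
xs, ys = [0, 1, 2, 3], [1, 0, 3, 10]
print(sum_squared_residuals(2, -3, 1, xs, ys))  # 0
print(sum_squared_residuals(2, -3, 2, xs, ys))  # 4 (each residual is -1)
```

Least squares picks the $(a, b, c)$ that make this sum as small as possible; for data not lying exactly on a parabola, the minimum is positive.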
b) Introduction:
- We present a technique for determining the values of a, b and c that minimize the sum of the squares of the residuals.
- Given the data set $(x_i, y_i)$ with $i = 1, 2, \ldots, n$ and the model $f(x_i) = a x_i^2 + b x_i + c$
- Let $g(a, b, c) = \sum_{i=1}^{n} \left(y_i - a x_i^2 - b x_i - c\right)^2$, with $a, b, c \in \mathbb{R}$
Note:
a) f(xi): the quadratic model
b) g(a,b,c): the function of the sum of the squares of the residuals
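- Take the partial derivative of $g(a, b, c)$ with respect to each coefficient $a$, $b$, $c$ and set it equal to zero:
$$\frac{\partial g}{\partial a} = -2 \sum_{i=1}^{n} \left(y_i - a x_i^2 - b x_i - c\right) x_i^2 = 0 \iff \sum y_i x_i^2 - a \sum x_i^4 - b \sum x_i^3 - c \sum x_i^2 = 0$$
$$\frac{\partial g}{\partial b} = -2 \sum_{i=1}^{n} \left(y_i - a x_i^2 - b x_i - c\right) x_i = 0 \iff \sum y_i x_i - a \sum x_i^3 - b \sum x_i^2 - c \sum x_i = 0$$
$$\frac{\partial g}{\partial c} = -2 \sum_{i=1}^{n} \left(y_i - a x_i^2 - b x_i - c\right) = 0 \iff \sum y_i - a \sum x_i^2 - b \sum x_i - nc = 0$$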
- Equivalently, we obtain the set of equations called the normal equations:
$$\begin{aligned}
\sum y_i x_i^2 &= a \sum x_i^4 + b \sum x_i^3 + c \sum x_i^2 \\
\sum y_i x_i &= a \sum x_i^3 + b \sum x_i^2 + c \sum x_i \\
\sum y_i &= a \sum x_i^2 + b \sum x_i + nc
\end{aligned}$$
- Solving this system gives the values of $a$, $b$ and $c$.
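The normal equations form a 3 × 3 linear system, so ordinary Gaussian elimination solves them. The standard-library sketch below builds the system from the data and solves it with partial pivoting. `fit_quadratic` is an illustrative name, and the optional `shift` argument is our addition: with raw years around 2000, $\sum x_i^4$ is of order $10^{14}$ and the system is badly conditioned, so fitting in $u = x - \text{shift}$ (for example, years minus 2003.5) is a common remedy. The returned coefficients then refer to $u$, not to $x$.

```python
def fit_quadratic(xs, ys, shift=0.0):
    """Least-squares parabola y = a*u**2 + b*u + c with u = x - shift.

    Builds the normal equations and solves them by Gaussian elimination.
    """
    us = [x - shift for x in xs]
    s = [sum(u**k for u in us) for k in range(5)]                  # s[k] = sum u_i^k, s[0] = n
    t = [sum(y * u**k for u, y in zip(us, ys)) for k in range(3)]  # t[k] = sum y_i u_i^k
    # Augmented matrix [A | rhs]; unknowns ordered (a, b, c)
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    for col in range(3):
        # Partial pivoting: move the row with the largest entry in this column up
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for j in range(col, 4):
                m[r][j] -= factor * m[col][j]
    coef = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back substitution
        coef[r] = (m[r][3] - sum(m[r][j] * coef[j] for j in range(r + 1, 3))) / m[r][r]
    return tuple(coef)  # (a, b, c)


# Toy data on y = 2x^2 - 3x + 1 (not our sample)
print(fit_quadratic([0, 1, 2, 3, 4], [1, 0, 3, 10, 21]))
```

For sample 2, a hypothetical call would be `fit_quadratic(years, temperatures, shift=2003.5)`, with the resulting parabola expressed in terms of $x - 2003.5$.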
c) Applications:
We now apply this method to sample 2 in order to estimate future values:
Year (x)    Temperature (y)
1991 26.1502
1992 26.6464
1993 25.4547
1994 26.7231
1996 24.1817
1998 26.7605
1999 25.6341
2000 25.3206
2001 26.7044
2003 26.8283
2004 26.0383
2005 26.0422
2006 25.8245
2007 24.7316
2009 25.2222
2010 25.9229
2011 24.6103
2012 26.2427
2013 25.8306
2016 27.3298
Let $x_i$ be the year and $y_i$ the temperature, for $i = 1, 2, \ldots, 26$.
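The sums entering the normal equations can be tabulated directly from the data. The helper below (an illustrative name) returns $\sum x_i^k$ for $k = 0, \ldots, 4$ and, when temperatures are supplied, $\sum y_i x_i^k$ for $k = 0, 1, 2$. Sums involving only $x$ depend on the years alone; for the 26 years 1991–2016, for example, $\sum x_i = 52091$ and $\sum x_i^2 = 104365781$.

```python
def normal_equation_sums(xs, ys=None):
    """Return (sum x^k for k = 0..4, sum y*x^k for k = 0..2 or None if ys is None)."""
    sum_x = [sum(x**k for x in xs) for k in range(5)]
    sum_yx = None if ys is None else [sum(y * x**k for x, y in zip(xs, ys)) for k in range(3)]
    return sum_x, sum_yx


years = list(range(1991, 2017))  # the 26 years of sample 2
sum_x, _ = normal_equation_sums(years)  # x-only sums need no temperatures
print(sum_x[0], sum_x[1], sum_x[2])  # 26 52091 104365781
```

Python integers are exact, so the year-only sums carry no rounding error even though $\sum x_i^4$ exceeds $10^{14}$.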
- Then, we compute the necessary sums:
$\sum x_i^3 = 200909162375$
$\sum x_i^4 = 402436699193645$
$\sum y_i x_i^2 = 2591684871$
- The normal equations:
$$\begin{aligned}
402436699193645a + 200909162375b + 100301525c &= 2591684871 \\
200909162375a + 100301525b + 52091c &= 1293904 \\
100301525a + 52091b + 26c &= 673.322
\end{aligned}$$
- Hence, $(a, b, c) = (-3.3645 \times 10^{-7},\ 0.01357,\ 0.01724)$.
- The fitted temperature function is:
$$f(x) = -3.3645 \times 10^{-7} x^2 + 0.01357 x + 0.01724$$
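Since $a < 0$, the parabola opens downward, so under this model the temperature eventually decreases. Its vertex, however, lies at $x = -b/(2a) \approx 20166$, far beyond the observed years, so over 1991–2016 the fitted curve is still increasing.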
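Whether a downward-opening parabola is actually decreasing over a given period depends on where its vertex lies. A small sketch using the coefficients reported above computes the vertex $x = -b/(2a)$ and the slope $f'(x) = 2ax + b$ at the end of the data range; the helper names are illustrative.

```python
def vertex_x(a, b):
    """x-coordinate of the turning point of f(x) = a*x**2 + b*x + c."""
    return -b / (2 * a)


def slope(a, b, x):
    """Derivative f'(x) = 2*a*x + b; positive means f is increasing at x."""
    return 2 * a * x + b


a, b = -3.3645e-7, 0.01357  # coefficients of the fitted model above
print(round(vertex_x(a, b)))  # year at which the fitted curve turns
print(slope(a, b, 2016) > 0)  # is the fitted curve still rising in 2016?
```

The constant term $c$ does not affect either quantity, so it is omitted.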