Nguyễn Thị Thu Trang 2004050055
Tiphaine Verbist 22FBA0001
Hanoi, November 4th, 2022
Table of contents
Question 1: Produce descriptive statistics to summarize the data. You are expected to generate as many relevant descriptive statistics as possible using ALL the relevant tools introduced in the labs of this course. Remember to provide appropriate interpretations for the descriptive statistics. Try not to include unnecessary or irrelevant descriptive statistics.
Question 2: Use analysis of variance to test for any significant differences due to province. Use a .05 level of significance, and for now, ignore the effect of types of ownership. Check all the assumptions of the inference technique you use. Are the assumptions satisfied? Explain.
Question 3: Use analysis of variance to test for any significant differences due to types of ownership. Use a .05 level of significance, and for now, ignore the effect of province. Check all the assumptions of the inference technique you use. Are the assumptions satisfied? Explain.
Question 4: At the .05 level of significance, test for any significant differences due to province, types of ownership, and interaction. Check all the assumptions of the inference technique you use. Are the assumptions satisfied? Explain.
Question 5: Draw an interaction plot and interpret the plot. Is the plot consistent with the conclusions made in Question 4?
Question 6: Discuss the credibility of the interpretations and conclusions of these tests. Is there ...
A Scenario
The database of the annual Vietnamese Enterprise Surveys (VESs) is an important source of data for any scholar doing research on the Vietnamese economy and its micro dynamics. In 2004, the survey was carried out with a sample size of more than 2 million businesses in all provinces across the country. The household questionnaire contained many sections, each of which covered a separate aspect of business activities, and profitability was one important indicator. In the survey, businesses were asked to specify their site of operation (province), type of ownership (own) and profitability (roa). The objective of our study is to test for any significant interaction between provinces and types of ownership and to test for any significant differences in the profitability of businesses due to these two variables. A portion of the VES data is to be given to each group by your tutor. In the given dataset, 1 represents firms from Hanoi, 2 represents firms from Danang and 3 represents firms from Ho Chi Minh City.
B Answering the questions
Question 1: Produce descriptive statistics to summarize the data. You are expected to generate as many relevant descriptive statistics as possible using ALL the relevant tools introduced in the labs of this course. Remember to provide appropriate interpretations for the descriptive statistics. Try not to include unnecessary or irrelevant descriptive statistics.
RStudio was used to produce the descriptive statistics in this report. First, we set the working directory and import our data file “Dataset1.csv” with the following code:
setwd("~/Desktop/Case study BES")
data1 <- read.table("Dataset1.csv", header = TRUE, sep = ";", quote = "\"", stringsAsFactors = FALSE)
Then, we convert the variables “own” and “province” into factors:
data1$province <- factor(data1$province, levels = c("1", "2", "3"), labels = c("Ha Noi", "Da Nang", "Ho Chi Minh City"))
The following table shows the first 6 subjects of the data set 1:
We use “SO” for “State-owned” and “PO” for “Private-owned”.
We can access the internal structure of the data frame with the following code:
> str(data1)
 $ province: Factor w/ 3 levels "Ha Noi","Da Nang",..: 1 1 1 1 1 1 1 1 1 1 ...
We have 180 observations and 3 variables: roa, own and province. We converted own and province into factors because they are categorical variables.
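The factor conversion described above can be sketched on a small made-up data frame. The province coding (1, 2, 3) comes from the scenario; the numeric coding of `own` (1 = state-owned, 2 = private-owned) is an assumption made only for this illustration, since the report does not show the raw codes.

```r
# Toy data standing in for the survey extract (all values are made up).
toy <- data.frame(
  roa      = c(0.02, -0.01, 0.05, 0.00, 0.03, -0.02),
  own      = c(1, 2, 1, 2, 1, 2),   # ASSUMED coding: 1 = state-owned, 2 = private-owned
  province = c(1, 1, 2, 2, 3, 3)    # coding given in the scenario
)

# Turn the numeric codes into labelled factors.
toy$own <- factor(toy$own, levels = c(1, 2), labels = c("SO", "PO"))
toy$province <- factor(toy$province, levels = c(1, 2, 3),
                       labels = c("Ha Noi", "Da Nang", "Ho Chi Minh City"))

str(toy)                       # both grouping variables are now factors
table(toy$province, toy$own)   # cross tabulation of the two factors
```

Labelling the levels at this stage means every later table and plot shows readable group names instead of numeric codes.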
In order to form a tabular summary of the data for 2 variables, we form a cross tabulation table. This table lets us analyse the relationship between the two variables: the site of operation “province” and the type of ownership “own”. We obtained the table with the following code:
> table(data1$province, data1$own)
                   SO PO
  Ha Noi           30 30
  Da Nang          30 30
  Ho Chi Minh City 30 30
Furthermore, we computed the means, the standard deviations and a numerical summary of our data set for each group with the by() function. We can observe that the mean of Ha Noi state-owned is the largest and the mean of Ho Chi Minh City state-owned is the smallest. In terms of standard deviation, Ho Chi Minh City state-owned has the largest value and Da Nang state-owned has the smallest. The third by() call shows the minimum and maximum values, the quartiles and the means of our data.
: Ha Noi
: SO
[1] -0.5864526
: Ha Noi
: PO
[1] -0.01194629
: Da Nang
: PO
[1] -0.01175367
[1] 0.1722996
: Ho Chi Minh City
: PO
[1] 0.5308589
> by(data1$roa, list(data1$province, data1$own), summary)
: Ha Noi
: SO
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-0.098277  0.004444  0.019747  0.032610  0.041715  0.242908
: Da Nang
-0.302521 -0.005953  0.002195 -0.011946  0.007814  0.030714
: Da Nang
: PO
To provide a graphical description of our data, we drew a boxplot with the following code:
> boxplot(roa ~ province + own, data = data1, xlab = "Type of Provinces and Ownership", ylab = "roa", ylim = c(-0.1, 0.2), col = c("red", "blue", "yellow", "pink", "green", "orange"), main = "Box Plots")
This boxplot shows the distribution of profitability for the six groups (state-owned from Ha Noi, state-owned from Da Nang, state-owned from Ho Chi Minh City, private-owned from Ha Noi, private-owned from Da Nang and private-owned from Ho Chi Minh City). For each group we can observe the minimum and maximum profitability and the lower quartile, median and upper quartile.
Looking at the plots, we notice that almost all of them are symmetric, with some outliers. Ho Chi Minh City private-owned reaches the maximum value and Ha Noi state-owned reaches the minimum value.
The box for Ha Noi private-owned is the shortest, which means that this group has the least variation. The group with the most variation is Ho Chi Minh City private-owned.
[Figure: "Box Plots", roa by province and ownership; x-axis "Type of Provinces and Ownership", y-axis "roa"]
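The comparison of box heights can also be made numerically. The sketch below, on simulated data (the data frame `sim` is hypothetical), calls boxplot() with plot = FALSE to obtain the statistics behind each box without drawing anything, and then ranks the groups by box height.

```r
set.seed(2)
# Simulated balanced design with the same six groups as the report.
sim <- expand.grid(rep = 1:30,
                   province = c("Ha Noi", "Da Nang", "Ho Chi Minh City"),
                   own = c("SO", "PO"))
sim$roa <- rnorm(nrow(sim), mean = 0.02, sd = 0.05)

# plot = FALSE returns the box statistics instead of drawing the plot.
bp <- boxplot(roa ~ province + own, data = sim, plot = FALSE)

# Rows of bp$stats: lower whisker, lower hinge, median, upper hinge, upper whisker.
box_height <- bp$stats[4, ] - bp$stats[2, ]
names(box_height) <- bp$names

sort(box_height)   # shortest box first, i.e. least spread in the middle 50%
bp$out             # values drawn as individual outlier points
```

A short box only describes the middle half of a group, so it is worth reading it together with the outliers listed in `bp$out`.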
Question 2: Use analysis of variance to test for any significant differences due to province. Use a .05 level of significance, and for now, ignore the effect of types of ownership. Check all the assumptions of the inference technique you use. Are the assumptions satisfied? Explain.
Ho: All population means of profitability are the same across provinces
Ha: At least 2 population means of profitability differ across provinces
• Checking assumptions:
- Samples are independent, simple random samples
The three provinces are the groups being compared. Each province contributes 60 firms (30 state-owned and 30 private-owned), so the sample sizes are all equal. Each firm operates in a single province, so we treat the samples as independent.
- All populations in question are normally distributed
In the Q-Q plot, almost all of the points lie close to the straight line. Since there are no glaring outliers, we conclude that all populations are approximately normally distributed.
- All population standard deviations are equal
We divide the largest standard deviation by the smallest standard deviation (= 36.11881).
This value is bigger than 2, so we conduct a Levene test. The p-value of the Levene test is equal to 0.2128. This value is larger than alpha (0.05), so we do not reject the hypothesis that all standard deviations are equal.
data1$province: Ha Noi
[1] 0.06776013
data1$province: Da Nang
[1] 0.1240284
data1$province: Ho Chi Minh City
[1] 2.447415
> 2.447415/0.06776013
[1] 36.11881
> leveneTest(data1$roa, data1$province, center = median)
Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   2   1.561 0.2128
      177
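To make the two checks above concrete, here is a minimal base-R sketch on simulated data (the data frame `sim` and its standard deviations are hypothetical). It first applies the rule of thumb on the ratio of standard deviations and then carries out the median-centred Levene test by hand, as a one-way ANOVA on the absolute deviations from each group median, which is the same computation that leveneTest() in the car package performs with center = median.

```r
set.seed(3)
# Simulated data: three provinces, 60 firms each, with unequal spread on purpose.
sim <- data.frame(
  province = factor(rep(c("Ha Noi", "Da Nang", "Ho Chi Minh City"), each = 60),
                    levels = c("Ha Noi", "Da Nang", "Ho Chi Minh City")),
  roa = c(rnorm(60, 0.02, 0.07), rnorm(60, 0.02, 0.12), rnorm(60, 0.02, 0.30))
)

# Rule of thumb: compare the largest and smallest group standard deviations.
group_sd <- tapply(sim$roa, sim$province, sd)
sd_ratio <- max(group_sd) / min(group_sd)

# The report's own arithmetic for this ratio.
report_ratio <- 2.447415 / 0.06776013

# Median-centred Levene test: ANOVA on |x - group median|.
abs_dev  <- abs(sim$roa - ave(sim$roa, sim$province, FUN = median))
levene   <- anova(lm(abs_dev ~ sim$province))
levene_p <- levene[["Pr(>F)"]][1]

levene
levene_p > 0.05   # TRUE would mean: do not reject equal standard deviations
```

The degrees of freedom (2 and 177) match the Levene output in the report because the design is the same: 3 groups and 180 observations.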
We use the one-way ANOVA test:
> data1.aov <- aov(roa ~ province, data = data1)
> summary(data1.aov)
             Df Sum Sq Mean Sq F value Pr(>F)
province      2    5.2   2.586   1.291  0.278
Residuals   177  354.6   2.003
• Significance level: alpha = 0.05
Reject Ho if p-value < alpha
P-value = 0.278 > 0.05, so we do not reject Ho
• Conclusion:
There is not enough evidence to conclude that mean profitability differs significantly between provinces.
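The decision above can be retraced from the numbers in the ANOVA table. The sketch below recomputes the F statistic from the two mean squares and obtains the p-value from the F distribution with 2 and 177 degrees of freedom; the variable names are only illustrative.

```r
# Values taken from the one-way ANOVA table for province.
ms_between <- 2.586   # Mean Sq for province
ms_within  <- 2.003   # Mean Sq for residuals
df_between <- 2
df_within  <- 177

# F statistic: ratio of between-group to within-group mean square.
f_value <- ms_between / ms_within

# p-value: upper-tail probability of the F distribution.
p_value <- pf(f_value, df1 = df_between, df2 = df_within, lower.tail = FALSE)

alpha <- 0.05
reject_h0 <- p_value < alpha

c(F = f_value, p = p_value)
reject_h0
```

Because the p-value is well above 0.05, `reject_h0` is FALSE, which corresponds to "do not reject Ho" in the text.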
Question 3: Use analysis of variance to test for any significant differences due to types of ownership. Use a .05 level of significance, and for now, ignore the effect of province. Check all the assumptions of the inference technique you use. Are the assumptions satisfied? Explain.
In this question, we use one-way ANOVA to check for any significant differences due to types of ownership. We need to check the following assumptions:
- Samples are independent, simple random samples
- Populations are normally distributed
- All population standard deviations are equal
Assumption 1: Samples are independent, simple random samples.
Private-owned and state-owned are the two categories of ownership, as mentioned above. There are 90 observations for each type of ownership, so the sample sizes are equal. Each firm has a single type of ownership, so we treat the samples as independent.
Assumption 2: Populations are normally distributed
To determine whether the populations are normally distributed, we use the qqPlot() function from the car package:
> business1 <- read.table("Business.csv", header = TRUE, sep = ",", quote = "\"", stringsAsFactors = FALSE)
> install.packages("car")
> library(car)
> qqPlot(lm(roa ~ own, data = business1), simulate = TRUE, main = "Q-Q Plot", labels = FALSE)
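For readers without the car package, the same kind of check can be sketched in base R. The example below uses simulated data (the data frame `sim` is hypothetical, not the survey data), fits the same type of model, and compares the standardized residuals with normal quantiles. The correlation between the two sets of quantiles is only an informal summary of how straight the Q-Q plot is, not a formal test.

```r
set.seed(4)
# Simulated data: two ownership types, 90 firms each.
sim <- data.frame(
  own = factor(rep(c("SO", "PO"), each = 90), levels = c("SO", "PO")),
  roa = rnorm(180, mean = 0.02, sd = 0.1)
)

fit <- lm(roa ~ own, data = sim)
res <- rstandard(fit)            # standardized residuals of the model

# Coordinates of the normal Q-Q plot, without drawing it.
qq <- qqnorm(res, plot.it = FALSE)

# A value close to 1 means the points lie close to a straight line.
qq_cor <- cor(qq$x, qq$y)
qq_cor

# To draw the plot itself: qqnorm(res); qqline(res)
```

Points that bend away from the line at both ends would suggest heavy tails, which is worth checking when the data contain extreme profitability values.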
Assumption 3: All population standard deviations are equal
We use the function “by” to check if this assumption is true or not.
business$own: State
[1] 0.3241811
business$own: private
178
We have p-value = 0.448 and alpha = 0.05. The p-value is greater than alpha, so we do not reject the hypothesis that all population standard deviations are equal.
Test procedure:
Ho: All population means of profitability are the same across types of ownership
Ha: At least 2 population means of profitability differ across types of ownership
• Check assumptions:
- Samples are independent, simple random samples
- Populations are normally distributed
- All population standard deviations are equal
We use the one-way ANOVA test for this:
> ovown <- aov(roa ~ own, data = business)
> summary(ovown)
             Df Sum Sq Mean Sq F value Pr(>F)
own           1    0.8  0.7633   0.378  0.539
Residuals   178  359.0  2.0168
• Significance level: alpha = 0.05
Reject Ho if p-value < alpha. We have p-value = 0.539 > 0.05, so we do not reject Ho.
There is not enough evidence to conclude that mean profitability differs significantly between types of ownership.
Question 4: At the .05 level of significance, test for any significant differences due to province, types of ownership, and interaction. Check all the assumptions of the inference technique you use. Are the assumptions satisfied? Explain.
There are three sets of hypotheses for two-way ANOVA in this case study:
1. Ho1: The means of all levels of “Province” are equal
Ha1: The mean of at least one level of “Province” is different
2. Ho2: The means of all levels of “Ownership” are equal
Ha2: The mean of at least one level of “Ownership” is different
3. Ho3: There is no interaction between “Province” and “Ownership”
Ha3: There is interaction between “Province” and “Ownership”
a, Check assumptions:
- Samples are independent, simple random samples from each of 6 populations (“Ownership” has 2 levels; “Province” has 3 levels)
- All populations are normally distributed (Q-Q plot, Figure 2)
- All populations have the same standard deviation (the ratio of the largest to the smallest standard deviation, about 3.43 to about 0.03, is 108.46 > 2, so we use other tests; see Figure 3)
b, Test statistic:
The test was conducted using RStudio:
> data1.result <- aov(roa ~ province*own, data = data1)
> summary(data1.result)
              Df Sum Sq Mean Sq F value Pr(>F)
province       2    5.2  2.5857   1.281  0.280
own            1    0.8  0.7633   0.378  0.539
province:own   2    2.6  1.2955   0.642  0.528
Residuals    174  351.2  2.0185
Figure 1: Two-way ANOVA output
- Test statistic and p-values
F (province) = 1.281, p-value (province) = 0.280
F (own) = 0.378, p-value (own) = 0.539
F (province:own) = 0.642, p-value (province:own) = 0.528
• Decision rule: We reject Ho if p-value < α = 0.05
- p-value (province) > α (0.280 > 0.05), so we do not reject Ho1
- p-value (own) > α (0.539 > 0.05), so we do not reject Ho2
- p-value (province:own) > α (0.528 > 0.05), so we do not reject Ho3
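The two-way procedure can be sketched end to end on simulated data with the same balanced layout (30 firms in each of the 6 cells). The data frame `sim` is hypothetical and generated without any real effect, so the F values will differ from the report. What the sketch shows is how the three tests (province, own, interaction) come out of one aov() call and how the decision rule is applied to each p-value.

```r
set.seed(5)
# Simulated balanced 3 x 2 design, 30 observations per cell.
sim <- expand.grid(rep = 1:30,
                   province = c("Ha Noi", "Da Nang", "Ho Chi Minh City"),
                   own = c("SO", "PO"))
sim$roa <- rnorm(nrow(sim), mean = 0.02, sd = 0.1)

# province * own expands to province + own + province:own.
fit <- aov(roa ~ province * own, data = sim)
tab <- summary(fit)[[1]]   # the ANOVA table as a data frame
tab

# One p-value per hypothesis (the residual row has none).
p_values <- tab[["Pr(>F)"]][1:3]
names(p_values) <- trimws(rownames(tab))[1:3]

alpha <- 0.05
p_values < alpha   # TRUE means: reject the corresponding Ho
```

The degrees of freedom follow from the design: 3 - 1 = 2 for province, 2 - 1 = 1 for own, 2 x 1 = 2 for the interaction, and 180 - 6 = 174 for the residuals.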
Figure 2: Q-Q plot
> by(data1$roa, list(data1$province, data1$own), sd)
: Ha Noi
: private-owned
[1] 0.05845379
: Da Nang
: private-owned
[1] 0.1722996
: Ho Chi Minh City
: private-owned
[1] 0.5308589
: Ha Noi
: state-owned
[1] 0.06998216
Figure 3: Standard deviation output
Question 5: Draw an interaction plot and interpret the plot. Is the plot consistent with the conclusions made in Question 4?
In order to visualize the interaction between province and type of ownership with respect to the outcome variable roa, and to recheck the conclusion in Question 4, we run the following code:
interaction.plot(data1$province, data1$own, data1$roa, type = "b", col = c("red", "blue"), pch = c(16, 18), main = "Interaction between Province and Ownerships")
The resulting interaction plot is then examined.
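An interaction plot draws one line per ownership type through the cell means, so the same information can be inspected numerically. The sketch below uses simulated data (the data frame `sim` is hypothetical) to compute those cell means and the gap between the two ownership types within each province. Similar gaps across provinces correspond to roughly parallel lines, which is the pattern expected when there is no interaction.

```r
set.seed(6)
# Simulated balanced 3 x 2 design, 30 observations per cell.
sim <- expand.grid(rep = 1:30,
                   province = c("Ha Noi", "Da Nang", "Ho Chi Minh City"),
                   own = c("SO", "PO"))
sim$roa <- rnorm(nrow(sim), mean = 0.02, sd = 0.1)

# Cell means: these are the points that interaction.plot() connects.
cell_means <- tapply(sim$roa, list(sim$province, sim$own), mean)
cell_means

# Gap between state-owned and private-owned firms within each province.
gap <- cell_means[, "SO"] - cell_means[, "PO"]
gap

# The plot itself would be drawn with:
# interaction.plot(sim$province, sim$own, sim$roa, type = "b",
#                  col = c("red", "blue"), pch = c(16, 18))
```

Lines that cross on the plot do not by themselves establish an interaction; the gaps have to be judged against the within-cell variation, which is what the interaction F test in Question 4 does.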