Hanoi, 19th October, 2021
TABLE OF CONTENTS
A Scenario
B Questions
A Scenario
The database of the annual Vietnamese Enterprise Surveys (VESs) is an important source of data for scholars doing research on the Vietnamese economy and its micro dynamics. In 2004, the survey was carried out with a sample size of more than 2 million businesses in all provinces across the country. The household questionnaire contained many sections, each of which covered a separate aspect of business activities, and profitability was one important indicator. In the survey, businesses were asked to specify their site of operation (province), type of ownership (own) and profitability (roa). The objective of our study is to test for any significant interaction between provinces and types of ownership and to test for any significant differences in the profitability of businesses due to these two variables. In the given dataset, 1 represents firms from Hanoi, 2 represents firms from Danang and 3 represents firms from Ho Chi Minh City.
B Questions
Question 1: Produce descriptive statistics for the dataset. You are expected to generate as many relevant descriptive statistics as possible using ALL the relevant tools introduced in the labs of this course. Remember to provide appropriate interpretations for the descriptive statistics. Try not to include unnecessary or irrelevant descriptive statistics.
We use RStudio to compute the descriptive statistics. First, we import the CSV file "Datasets.csv" into R for further calculation:
> VESdata <- read.table("Datasets.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
There are 180 observations in this case study, so we use head(VESdata) to view the first few observations and get a better sense of the data:
> head(VESdata)
        roa         own province
0.017205203 state-owned    HaNoi
0.037425261 state-owned    HaNoi
0.010682668 state-owned    HaNoi
0.045622803 state-owned    HaNoi
0.005230857 state-owned    HaNoi
0.011417229 state-owned    HaNoi
Figure 2: Structure of data set
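Because the ANOVA functions in R treat grouping variables as factors, it can help to convert own and province explicitly after the import. The following is only a minimal sketch, not the code used in this study: the data frame demo and its values are made up, and the level order is an assumption chosen to match the Hanoi, Danang, Ho Chi Minh City coding described in the scenario.

```r
# Hypothetical stand-in for VESdata: same column names, made-up values.
demo <- data.frame(
  roa      = c(0.017, 0.037, 0.001, -0.014, 0.006, 0.039),
  own      = c("state-owned", "state-owned", "private-owned",
               "private-owned", "state-owned", "state-owned"),
  province = c("HaNoi", "HaNoi", "DaNang",
               "HoChiMinhCity", "DaNang", "HoChiMinhCity"),
  stringsAsFactors = FALSE
)

# Convert the grouping columns to factors with an explicit level order,
# so tables and plots list the provinces as HaNoi, DaNang, HoChiMinhCity.
demo$province <- factor(demo$province,
                        levels = c("HaNoi", "DaNang", "HoChiMinhCity"))
demo$own <- factor(demo$own, levels = c("private-owned", "state-owned"))

str(demo)  # one numeric column and two factor columns
province_levels <- levels(demo$province)
```

With an explicit level order, later tables and plots keep the provinces in the same order as the coding used in the scenario.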
A frequency table can be created to see the sample size of each treatment group with the following R code:
> table(VESdata$province, VESdata$own)
                private-owned state-owned
HaNoi                      30          30
DaNang                     30          30
HoChiMinhCity              30          30
Figure 3: Frequency table of sample size
All 6 treatment groups have the same sample size of 30, so the design is balanced and a two-way ANOVA test is a suitable choice.
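The balance check can also be done programmatically. The sketch below is an illustration under stated assumptions: the data frame design is a hypothetical layout of 3 provinces by 2 ownership types by 30 firms, built with expand.grid() rather than read from the survey file.

```r
# Hypothetical balanced layout: 3 provinces x 2 ownership types x 30 firms.
design <- expand.grid(
  firm     = 1:30,
  province = c("HaNoi", "DaNang", "HoChiMinhCity"),
  own      = c("private-owned", "state-owned"),
  stringsAsFactors = FALSE
)

# Number of firms in each treatment group (province x ownership cell).
cell_counts <- table(design$province, design$own)

# The design is balanced when every cell holds the same number of firms.
is_balanced <- length(unique(as.vector(cell_counts))) == 1

print(cell_counts)
print(is_balanced)
```

Applied to the real data, the same two lines on VESdata$province and VESdata$own would confirm what the frequency table shows by eye.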
Next, we use the by() function to compute several descriptive statistics (mean, median, standard deviation and summary) for each group defined by the two factors. The commands and their output are listed below:
> by(VESdata$roa, list(VESdata$province, VESdata$own), mean)
: HaNoi : private-owned
[1] 0.01803454
: DaNang : private-owned
[1] 0.00112933
: HoChiMinhCity : private-owned
[1] -0.01423397
: HaNoi : state-owned
[1] 0.01764922
: DaNang : state-owned
[1] 0.006058169
: HoChiMinhCity : state-owned
[1] 0.03873235
Figure 4: Mean of dataset
> by(VESdata$roa, list(VESdata$province, VESdata$own), median)
: HaNoi : private-owned
[1] 0.0009765625
: DaNang : private-owned
[1] 0.003712735
: HoChiMinhCity : private-owned
[1] 0.004456994
: HaNoi : state-owned
[1] 0.01058501
: DaNang : state-owned
[1] 0.004234574
: HoChiMinhCity : state-owned
[1] 0.02688294
Figure 5: Median of dataset
> by(VESdata$roa, list(VESdata$province, VESdata$own), sd)
: HaNoi : private-owned
[1] 0.06516742
: DaNang : private-owned
[1] 0.04189785
: HoChiMinhCity : private-owned
[1] 0.09088937
: HaNoi : state-owned
[1] 0.02668994
: DaNang : state-owned
[1] 0.0492133
: HoChiMinhCity : state-owned
[1] 0.06603698
Figure 6: Standard deviation
> by(VESdata$roa, list(VESdata$province, VESdata$own), summary)
: HaNoi : private-owned
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
-0.0755968 -0.0077573  0.0009766  0.0180345  0.0109979  0.2681564
: DaNang : private-owned
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-0.100803 -0.002577  0.003713  0.001129  0.010688  0.142857
: HoChiMinhCity : private-owned
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-0.246154 -0.015633  0.004457 -0.014234  0.023085  0.108000
: HaNoi : state-owned
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-0.040651  0.003409  0.010585  0.017649  0.027338  0.116372
: HoChiMinhCity : state-owned
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-0.13838  0.01177  0.02688  0.03873  0.06195  0.20805
Figure 7: Summary
Each command gives the specified descriptive statistic of the outcome variable (roa) for each treatment group, defined by province and type of ownership (own). The final command, summary, reports six basic statistics of roa: the minimum, the first quartile, the median, the mean, the third quartile and the maximum.
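The separate by() calls can also be combined into one table. The sketch below is a hedged illustration rather than the code used in the study: it uses base R's aggregate() on made-up data (the data frame demo and its normally distributed roa values are assumptions), so the numbers it prints are not the survey results.

```r
set.seed(1)  # reproducible made-up data
demo <- expand.grid(
  firm     = 1:30,
  province = c("HaNoi", "DaNang", "HoChiMinhCity"),
  own      = c("private-owned", "state-owned")
)
demo$roa <- rnorm(nrow(demo), mean = 0.01, sd = 0.05)  # hypothetical roa values

# One row per treatment group, with mean, median and standard deviation.
group_stats <- aggregate(
  roa ~ province + own, data = demo,
  FUN = function(x) c(mean = mean(x), median = median(x), sd = sd(x))
)

# aggregate() stores the three statistics in a matrix column;
# flatten it into ordinary columns roa.mean, roa.median and roa.sd.
group_stats <- do.call(data.frame, group_stats)
print(group_stats)
```

A single table like this makes it easier to compare the groups side by side than scrolling through four separate outputs.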
Lastly, we draw a box plot and a mean plot to get further information:
> boxplot(roa ~ province + own, data = VESdata, xlab = "Province along with types of ownership", ylab = "Profitability", main = "Box Plot", col = c("orange", "red", "purple", "yellow", "black", "green"))
Figure 8: Box plot of profitability by province and type of ownership (groups, left to right: HaNoi private-owned, DaNang private-owned, HoChiMinhCity private-owned, HaNoi state-owned, DaNang state-owned, HoChiMinhCity state-owned)
The box plot above displays the median, quartiles, minimum, maximum and outliers of each group.
First, the boxes are compressed, indicating that most values in each group lie within a narrow range. DaNang private-owned firms have the most stable profitability: this group has the smallest interquartile range. HoChiMinhCity state-owned firms rank highest on most measures, including the median; the main exception is the maximum, which is lower than that of HaNoi private-owned firms. Most private-owned and state-owned businesses have relatively low values; however, HaNoi private-owned has an outlier with a profitability of about 0.27, which is the highest value in the dataset.
The box plot also shows the shape of each distribution. While the profitability (roa) of state-owned and private-owned enterprises in Da Nang has a roughly symmetrical distribution, the distributions in Ho Chi Minh City are positively skewed (skewed to the right).
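The reading of the box plot can be cross-checked against the quartiles reported in Figure 7. The sketch below is a rough check under stated assumptions: it uses the 1.5 x IQR rule that boxplot() applies by default, and it uses the summary() quartiles in place of the hinges that boxplot() actually computes, which can differ slightly. DaNang state-owned is left out because its quartiles are not listed above, and the short group labels are our own.

```r
# Quartiles copied from the summary() output in Figure 7.
q1 <- c(HaNoi.private = -0.0077573, DaNang.private = -0.002577,
        HCMC.private  = -0.015633,  HaNoi.state    =  0.003409,
        HCMC.state    =  0.01177)
q3 <- c(HaNoi.private =  0.0109979, DaNang.private =  0.010688,
        HCMC.private  =  0.023085,  HaNoi.state    =  0.027338,
        HCMC.state    =  0.06195)

iqr <- q3 - q1                    # interquartile range of each group
upper_fence <- q3 + 1.5 * iqr     # points above this are drawn as outliers

print(round(iqr, 4))
print(names(which.min(iqr)))      # group with the narrowest box

# Maximum of HaNoi private-owned (from Figure 7) against its upper fence.
hanoi_private_max <- 0.2681564
print(hanoi_private_max > upper_fence["HaNoi.private"])
```

Among the listed groups, DaNang private-owned has the narrowest box, and the HaNoi private-owned maximum lies far above its upper fence, which is why it appears as an outlier.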
Next, we use a mean plot to identify the mean value of each group and compare means between groups:
> install.packages("gplots")
> library(gplots)
> plotmeans(roa ~ interaction(province, own), data = VESdata, xlab = "Province and types of ownership", ylab = "Profitability", main = "Mean Plot with 95% CI")
Figure 9: Mean Plot
The mean plot shows the six groups with their 95% confidence intervals. The group means are the same as those produced by the by() function above. There is a distinct pattern between private-owned and state-owned enterprises in Hanoi and Ho Chi Minh City. In Hanoi, the interval for private-owned firms extends above 0.04, whereas that for state-owned firms reaches only about 0.03, even though the two group means are almost equal (about 0.018). In Ho Chi Minh City, state-owned firms have the highest mean in the dataset (about 0.039, with an interval extending above 0.06), while private-owned firms have the lowest (about -0.014, with an interval reaching only about 0.02). The gap between these two groups is therefore rather large. The six sample means differ, and the ANOVA tests in the following questions check whether these differences are statistically significant.
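The interval limits read from the mean plot can be approximated from the group means and standard deviations in Figures 4 and 6. This is a sketch under the assumption that the plotted intervals are t-based with n = 30 firms per group; the short group labels are our own, and only the Hanoi and Ho Chi Minh City groups discussed above are included.

```r
# Means (Figure 4) and standard deviations (Figure 6); n = 30 per group.
grp <- c("HaNoi.private", "HaNoi.state", "HCMC.private", "HCMC.state")
m   <- c(0.01803454, 0.01764922, -0.01423397, 0.03873235)
s   <- c(0.06516742, 0.02668994,  0.09088937, 0.06603698)
n   <- 30

# t-based 95% margin of error: t(0.975, n - 1) * sd / sqrt(n)
half_width <- qt(0.975, df = n - 1) * s / sqrt(n)

ci <- data.frame(group = grp,
                 lower = m - half_width,
                 mean  = m,
                 upper = m + half_width)
print(ci)
```

Under this assumption the upper limits come out close to the values read from the plot, while the two Hanoi means themselves are nearly identical.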
Question 2: Use analysis of variance to test for any significant differences due to province. Use a .05 level of significance, and for now, ignore the effect of types of ownership. Check all the assumptions of the inference technique you use. Are the assumptions satisfied?
Step 1: Hypotheses
• H0: All population means are equal
• Ha: At least two population means are different
Step 2: Checking assumptions
Assumption 1: Independent, simple random sample
The factor province has three levels: Hanoi, Danang and Ho Chi Minh City. In total there are 180 observations, 60 in each province. The firms sampled in one province are not affected by those sampled in another, so we treat the three samples as independent and assume that all observations were randomly selected.
Assumption 2: All populations have the same standard deviation
First, we use the by() function to obtain the standard deviation of roa for Hanoi, Danang and Ho Chi Minh City. The code and its output are shown below:
> by(VESdata$roa, VESdata$province, sd)
VESdata$province: HaNoi
[1] 0.04937189
VESdata$province: DaNang
[1] 0.04538132
VESdata$province: HoChiMinhCity
[1] 0.08316948
From this output, the standard deviations are 0.04937189 (Hanoi), 0.04538132 (Danang) and 0.08316948 (Ho Chi Minh City). The ratio of the largest standard deviation to the smallest is 0.08316948 / 0.04538132 = 1.8327, which is smaller than 2. We therefore conclude that the second assumption is satisfied. Because the ratio is below 2, the Levene test is not needed; we use that test only when the ratio of the largest to the smallest standard deviation is between 2 and 3.
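The ratio check is simple arithmetic and can be reproduced in a few lines. The sketch below uses the three standard deviations from the by() output above; the variable names and the threshold of 2 follow the rule of thumb stated in this report.

```r
# Standard deviations of roa by province, copied from the by() output.
sds <- c(HaNoi = 0.04937189, DaNang = 0.04538132, HoChiMinhCity = 0.08316948)

sd_ratio <- max(sds) / min(sds)  # largest SD divided by smallest SD
rule_ok  <- sd_ratio < 2         # equal-spread rule of thumb used here

print(round(sd_ratio, 4))
print(rule_ok)
```

The same check can be repeated for the ownership groups by replacing sds with the output of by(VESdata$roa, VESdata$own, sd).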
Assumption 3: All populations are normally distributed
To check whether all populations are normally distributed, we can use the Q-Q plot approach in R.
Step 3: The ANOVA test
We use RStudio to calculate the test statistic and the p-value. Here is the R output for the one-way ANOVA test:
> VESdata.result <- aov(roa ~ province, data = VESdata)
> summary(VESdata.result)
             Df Sum Sq  Mean Sq F value Pr(>F)
province      2 0.0062 0.003092   0.813  0.445
Residuals   177 0.6734 0.003805
Step 4: F statistic
F = 0.813
Step 5: Level of significance
The level of significance is α = 0.05.
Step 6: Decision rule and conclusion
We use the p-value approach: we reject H0 if the p-value ≤ α. From the R output, the p-value is 0.445. Since 0.445 > α = 0.05, we do not reject H0.
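The F statistic and p-value in the ANOVA table can be recomputed from the mean squares as a sanity check. The sketch below takes the mean squares and degrees of freedom from the R output above; the variable names are our own, and small differences from the table come from rounding in the printed mean squares.

```r
ms_province <- 0.003092   # Mean Sq for province
ms_residual <- 0.003805   # Mean Sq for Residuals
alpha       <- 0.05

f_value <- ms_province / ms_residual   # F = MS between / MS within

# Upper-tail probability of the F distribution with 2 and 177 df.
p_value <- pf(f_value, df1 = 2, df2 = 177, lower.tail = FALSE)

reject_h0 <- p_value <= alpha          # decision rule: reject H0 if p <= alpha

print(round(f_value, 3))
print(round(p_value, 3))
print(reject_h0)
```

The recomputed values agree with the table to three decimal places, and the decision rule gives the same answer: H0 is not rejected.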
Step 7: Conclusion
We do not have enough evidence to conclude that there is a significant difference in profitability due to province.
Question 3: Use analysis of variance to test for any significant differences due to types of ownership. Use a .05 level of significance, and for now, ignore the effect of province. Check all the assumptions of the inference technique you use. Are the assumptions satisfied?
Step 1: Hypotheses
• H0: All population means are equal (μ1 = μ2 = ... = μk)
• Ha: At least two population means are different
Step 2: Checking assumptions
Based on the knowledge gained during the BES course, one-way analysis of variance is the most logical approach to this question in our case study. Before conducting a one-way analysis of variance, we must first check whether its three assumptions are met.
Assumption 1: Independent, simple random samples
For assumption 1, firms are grouped by type of ownership only, without regard to province, so firms from Hanoi, Danang and Ho Chi Minh City all have the same chance of being selected into each group. We therefore treat the samples as independent and randomly selected.
Assumption 2: All populations have the same standard deviation
First, we calculate the standard deviation of roa for each type of ownership:
> by(VESdata$roa, VESdata$own, sd)
Assumption 3: All populations are normally distributed
We use Q-Q plots as an approach. First, we import the CSV data file into RStudio and assign it to VESdata:
> VESdata <- read.table("Datasets.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
> str(VESdata)
The normality of the data can be checked graphically using a Q-Q plot. A separate Q-Q plot can be created for each sample, allowing us to assess whether they are all normally distributed. Alternatively, we can use a single plot to assess the normality of the residuals. To access the Q-Q plot function, install the {car} package.
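A residual Q-Q plot can be sketched as follows. This is a minimal illustration, not the plot produced in this study: it uses base R's qqnorm() and qqline() instead of the {car} function mentioned above, and the data frame demo holds made-up, normally distributed roa values for two ownership groups of 90 firms each.

```r
set.seed(42)  # made-up data for illustration only
demo <- data.frame(
  own = factor(rep(c("private-owned", "state-owned"), each = 90)),
  roa = rnorm(180, mean = 0.01, sd = 0.05)
)

fit <- aov(roa ~ own, data = demo)  # one-way ANOVA on the made-up data
res <- residuals(fit)               # observed value minus its group mean

# Plot sample quantiles of the residuals against normal quantiles,
# then add the reference line through the first and third quartiles.
qq <- qqnorm(res, main = "Normal Q-Q plot of residuals")
qqline(res)
```

Points that follow the reference line closely support the normality assumption; systematic curvature or heavy tails would argue against it.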
In a Q-Q plot, if the data points line up on a straight diagonal line, the data set has a normal distribution. Here the points lie mostly along a straight diagonal line, with some small deviations at each tail. It is therefore reasonable to treat the data as approximately normally distributed.
Step 3: The ANOVA test
Since all three assumptions are satisfied, we now run the ANOVA test. The ANOVA command and its output are as follows:
> VESdata.result <- aov(roa ~ province*own, data = VESdata)
              Df Sum Sq  Mean Sq F value Pr(>F)
province       2 0.0062 0.003092   0.853 0.4281
own            1 0.0165 0.016537   4.560 0.0341 *
province:own   2 0.0259 0.012956   3.573 0.0302 *
Residuals    174 0.6310 0.003626
Step 4: F statistic
F = 4.560
Step 5: Level of significance
The level of significance is α = 0.05.
Step 6: Decision rule and conclusion
The decision rule for one-way ANOVA is to reject the null hypothesis if the p-value is smaller than α. For types of ownership we have F(1, 174) = 4.560, with a p-value of 0.0341. Because 0.0341 is smaller than α = 0.05, we reject the null hypothesis.
Step 7: Conclusion
To compare the effect of types of ownership on business profitability, a one-way analysis of variance was performed. With the result in Step 4, we have enough evidence to conclude that there are significant differences in the profitability of businesses due to types of ownership.
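The entries of the ANOVA output above are linked by simple formulas, so the degrees of freedom, F values and p-values can be recovered from the Sum Sq and Mean Sq columns. The sketch below is a consistency check on the printed numbers, not a rerun of the analysis; the effect labels follow the formula roa ~ province*own, and the rounding steps are our own assumption about how the printed values were rounded.

```r
# Sum Sq and Mean Sq columns copied from the ANOVA output above.
effect  <- c("province", "own", "province:own")
sum_sq  <- c(0.0062, 0.0165, 0.0259)
mean_sq <- c(0.003092, 0.016537, 0.012956)
ss_res  <- 0.6310
ms_res  <- 0.003626

df_eff <- round(sum_sq / mean_sq)  # Df = Sum Sq / Mean Sq
df_res <- round(ss_res / ms_res)   # residual degrees of freedom

f_value <- mean_sq / ms_res        # F = Mean Sq / residual Mean Sq
p_value <- pf(f_value, df_eff, df_res, lower.tail = FALSE)

print(data.frame(effect, df = df_eff,
                 f_value = round(f_value, 3),
                 p_value = round(p_value, 4)))
```

The recovered values match the printed table up to rounding: at α = 0.05 the effects of own and of the province:own interaction are significant, while the effect of province is not.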
Question 4: At the .05 level of significance, test for any significant differences due to province, types of ownership, and interaction. Check all the assumptions of the inference technique you use. Are the assumptions satisfied? Explain.
Step 1: Hypotheses
There are 3 sets of hypotheses for two-way ANOVA in this case study:
Step 2: Checking assumptions
Based on the case study, we conclude that two-way analysis of variance is the most appropriate technique for this problem. Three assumptions are needed for two-way ANOVA: