Business Economics Statistic Case Study Report


Hanoi, 19th October, 2021


A. Scenario

The database of the annual Vietnamese Enterprise Surveys (VESs) is an important source of data for scholars doing research on the Vietnamese economy and its micro dynamics. In 2004, the survey was carried out with a sample size of more than 2 million businesses in all provinces across the country. The household questionnaire contained many sections, each of which covered a separate aspect of business activities, and profitability was one important indicator. In the survey, businesses were asked to specify their site of operation (province), type of ownership (own) and profitability (roa). The objective of our study is to test for any significant interaction between province and type of ownership, and to test for any significant differences in the profitability of businesses due to these two variables. In the given dataset, 1 represents firms from Hanoi, 2 represents firms from Danang and 3 represents firms from Ho Chi Minh City.

B. Questions

Question 1: Produce descriptive statistics for the dataset. You are expected to generate as many relevant descriptive statistics as possible using ALL the relevant tools introduced in the labs of this course. Remember to provide appropriate interpretations for the descriptive statistics. Try not to include unnecessary or irrelevant descriptive statistics.

We use RStudio to compute the descriptive statistics. First, we import the CSV file “Datasets.csv” into R:

> VESdata <- read.table("Datasets.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)

There are 180 observations in this case study, so we use head() to inspect the first few rows and get a better feel for the data:

> head(VESdata)

          roa         own province
1 0.017205203 state-owned    HaNoi
2 0.037425261 state-owned    HaNoi
3 0.010682668 state-owned    HaNoi
4 0.045622803 state-owned    HaNoi
5 0.005230857 state-owned    HaNoi
6 0.011417229 state-owned    HaNoi

Figure 2: Structure of data set

A frequency table can be created to see the sample size of each treatment group with the following R code:

> table(VESdata$province, VESdata$own)

                private-owned state-owned
  HaNoi                    30          30
  DaNang                   30          30
  HoChiMinhCity            30          30

Figure 3: Frequency table of sample size

All 6 treatment groups have the same sample size of 30, so the design is balanced and a two-way ANOVA test is a suitable choice.
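The balance check described above can also be done programmatically. The following is a minimal sketch, assuming a simulated stand-in data frame (`toy`, a hypothetical name with made-up roa values) that mirrors the column names and group labels of the report's data; it does not read the real Datasets.csv.

```r
# Hypothetical stand-in for VESdata: same columns and group labels,
# but simulated roa values (the real Datasets.csv is not reproduced here).
set.seed(1)
toy <- expand.grid(
  rep      = 1:30,
  province = c("HaNoi", "DaNang", "HoChiMinhCity"),
  own      = c("private-owned", "state-owned"),
  stringsAsFactors = FALSE
)
toy$roa <- rnorm(nrow(toy), mean = 0.01, sd = 0.05)

# Cross-tabulate the two factors to get the size of each treatment group.
cell_counts <- table(toy$province, toy$own)
print(cell_counts)

# The design is balanced when every cell holds the same number of observations.
is_balanced <- length(unique(as.vector(cell_counts))) == 1
is_balanced
```

With the real data, the same two lines applied to VESdata$province and VESdata$own would confirm the equal group sizes shown in the frequency table.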

Next, we use the by() function to compute several descriptive statistics (mean, median, standard deviation and summary) for each group defined by the two factors. Each command is followed by its output:

> by(VESdata$roa, list(VESdata$province, VESdata$own), mean)

: HaNoi : private-owned
[1] 0.01803454

: DaNang : private-owned
[1] 0.00112933

: HoChiMinhCity : private-owned
[1] -0.01423397

: HaNoi : state-owned
[1] 0.01764922

: DaNang : state-owned
[1] 0.006058169

: HoChiMinhCity : state-owned
[1] 0.03873235

Figure 4: Mean of dataset

> by(VESdata$roa, list(VESdata$province, VESdata$own), median)

: HaNoi : private-owned
[1] 0.0009765625

: DaNang : private-owned
[1] 0.003712735

: HoChiMinhCity : private-owned
[1] 0.004456994

: HaNoi : state-owned
[1] 0.01058501

: DaNang : state-owned
[1] 0.004234574

: HoChiMinhCity : state-owned
[1] 0.02688294

Figure 5: Median of dataset

> by(VESdata$roa, list(VESdata$province, VESdata$own), sd)

: HaNoi : private-owned
[1] 0.06516742

: DaNang : private-owned
[1] 0.04189785

: HoChiMinhCity : private-owned
[1] 0.09088937

: HaNoi : state-owned
[1] 0.02668994

: DaNang : state-owned
[1] 0.0492133

: HoChiMinhCity : state-owned
[1] 0.06603698

Figure 6: Standard deviation

> by(VESdata$roa, list(VESdata$province, VESdata$own), summary)

: HaNoi : private-owned
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
-0.0755968 -0.0077573  0.0009766  0.0180345  0.0109979  0.2681564

: DaNang : private-owned
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-0.100803 -0.002577  0.003713  0.001129  0.010688  0.142857

: HoChiMinhCity : private-owned
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-0.246154 -0.015633  0.004457 -0.014234  0.023085  0.108000

: HaNoi : state-owned
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-0.040651  0.003409  0.010585  0.017649  0.027338  0.116372

: HoChiMinhCity : state-owned
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-0.13838  0.01177  0.02688  0.03873  0.06195  0.20805

Figure 7: Summary

Each command gives a descriptive statistic of the outcome variable (roa) for each treatment group defined by province and type of ownership (own). The final command, summary, reports six basic statistics for roa: the minimum, the first quartile, the median, the mean, the third quartile and the maximum.
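The separate by() calls can also be condensed into a single table of group statistics. The sketch below is one possible way to do this with base R's aggregate(); it runs on a simulated stand-in data frame (`toy`, hypothetical values), so the printed numbers will not match the report's output.

```r
# Simulated stand-in for VESdata (hypothetical roa values, same layout).
set.seed(2)
toy <- expand.grid(
  rep      = 1:30,
  province = c("HaNoi", "DaNang", "HoChiMinhCity"),
  own      = c("private-owned", "state-owned"),
  stringsAsFactors = FALSE
)
toy$roa <- rnorm(nrow(toy), mean = 0.01, sd = 0.05)

# One row per province x ownership group, with mean, median and sd of roa.
stats <- aggregate(
  roa ~ province + own,
  data = toy,
  FUN  = function(x) c(mean = mean(x), median = median(x), sd = sd(x))
)

# aggregate() stores the three statistics in a matrix column;
# flatten it into ordinary columns roa.mean, roa.median, roa.sd.
stats <- do.call(data.frame, stats)
print(stats)
```

A single table like this makes it easier to compare groups side by side than scrolling through four separate outputs.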

Lastly, we draw a box plot and a mean plot to get further information:

> boxplot(roa ~ province + own, data = VESdata, xlab = "Province along with types of ownership", ylab = "Profitability", main = "Box Plot", col = c("orange", "red", "purple", "yellow", "black", "green"))

Figure 8: Box plot of profitability by province and type of ownership

The box plot above displays the median, quartiles, maximum, minimum and outliers of each group.

First, the boxes are narrow, indicating that most values in each group vary little around the median. DaNang private-owned is the most stable group in its central values, with the smallest interquartile range. HoChiMinhCity state-owned has the highest median and mean of the six groups, although its largest value (about 0.21) is below the HaNoi private-owned maximum. Most private-owned and state-owned firms have profitability close to zero; however, HaNoi private-owned has an outlier above 0.2 (about 0.27), which is the highest value in the dataset.

The box plot also shows the shape of each distribution. While the profitability (roa) of state-owned and private-owned enterprises in Danang is roughly symmetrical, the Ho Chi Minh City groups are skewed: state-owned firms to the right (mean above median) and private-owned firms to the left (mean below median).

Next, we use a mean plot to identify the mean value of each group and to compare means between groups:

> install.packages("gplots")
> library(gplots)
> plotmeans(roa ~ interaction(province, own), data = VESdata, xlab = "Province and types of ownership", ylab = "Profitability", main = "Mean Plot with 95% CI")

Figure 9: Mean Plot

The mean plot shows the six groups with 95% confidence intervals. The plotted means match those produced by the by() function above. The graph shows a distinct pattern between private-owned and state-owned enterprises in Hanoi and Ho Chi Minh City. In Hanoi, the upper limit of the interval for private-owned firms exceeds 0.04, while that for state-owned firms reaches only about 0.03, even though the two means are almost equal (about 0.018). In Ho Chi Minh City, the interval for state-owned firms has the highest upper limit in the dataset, above 0.06, around the highest mean (about 0.039), while the interval for private-owned firms tops out near 0.02 around a negative mean (about -0.014). The gap between these two groups is therefore rather large. The six group means also differ from one another, and the ANOVA tests in the following questions assess whether these differences are significant.
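The interval limits read off the mean plot can be cross-checked from the group means and standard deviations reported earlier. The sketch below assumes a t-based 95% interval, mean ± t(0.975, n − 1) · sd / √n, with n = 30 per group; whether the plotting function uses exactly this formula is an assumption, and the vector names are illustrative.

```r
# Means and standard deviations copied from the by() output in this report.
grp <- c("HaNoi private-owned", "HaNoi state-owned",
         "HoChiMinhCity private-owned", "HoChiMinhCity state-owned")
m   <- c(0.01803454, 0.01764922, -0.01423397, 0.03873235)
s   <- c(0.06516742, 0.02668994,  0.09088937, 0.06603698)
n   <- 30  # group size from the frequency table

# Half-width of a t-based 95% confidence interval (assumed interval type).
half <- qt(0.975, df = n - 1) * s / sqrt(n)

ci <- data.frame(group = grp, lower = m - half, mean = m, upper = m + half)
print(ci, digits = 3)
```

Under this assumption the upper limits come out close to the values quoted in the discussion of the mean plot, which suggests those figures describe the tops of the intervals rather than the means themselves.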

Question 2: Use analysis of variance to test for any significant differences due to province. Use a .05 level of significance, and for now, ignore the effect of types of ownership. Check all the assumptions of the inference technique you use. Are the assumptions satisfied?

Step 1: Hypotheses

• H0: All population means are equal (μ1 = μ2 = μ3)
• Ha: At least two population means are different

Step 2: Checking assumptions

Assumption 1: Independent, simple random sample

The province factor has three categories: Hanoi, Danang and Ho Chi Minh City. In total there are 180 observations, 60 from each province. Firms in one province are sampled separately from firms in the others, so the three samples do not influence one another. As a result, we treat the samples as independent and the observations as randomly selected.

Assumption 2: All populations have the same standard deviation

First, we use the by() function to obtain the standard deviation of roa for Hanoi, Danang and Ho Chi Minh City. The code and its output are shown below:

> by(VESdata$roa, VESdata$province, sd)

VESdata$province: HaNoi
[1] 0.04937189

VESdata$province: DaNang
[1] 0.04538132

VESdata$province: HoChiMinhCity
[1] 0.08316948

From this output, the standard deviations are 0.04937189 (Hanoi), 0.04538132 (Danang) and 0.08316948 (Ho Chi Minh City). The ratio of the largest standard deviation to the smallest is 0.08316948 / 0.04538132 ≈ 1.83, which is smaller than 2. We therefore conclude that the second assumption is satisfied. Because the ratio is below 2, we do not need the Levene test, which we apply only when this ratio is between 2 and 3.
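The rule of thumb used here (largest standard deviation divided by the smallest is below 2) is simple to verify in R. This short sketch uses the three standard deviations reported above; the variable names are illustrative.

```r
# Standard deviations of roa by province, copied from the output above.
sds <- c(HaNoi = 0.04937189, DaNang = 0.04538132, HoChiMinhCity = 0.08316948)

# Ratio of the largest to the smallest standard deviation.
ratio <- max(sds) / min(sds)
ratio

# Rule of thumb: the equal-spread assumption is treated as met when ratio < 2.
ratio < 2
```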

Assumption 3: All populations are normally distributed

To check whether all populations are normally distributed, we use the Q-Q plot approach in R.
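As an illustration of the Q-Q plot approach, the following is a minimal sketch using base R only. It is not the code from the original report: it fits a one-way model to a simulated stand-in data frame (`toy`, hypothetical values) and plots the residuals against the normal quantiles.

```r
# Simulated stand-in for VESdata (hypothetical roa values, 60 firms per province).
set.seed(42)
toy <- data.frame(
  province = rep(c("HaNoi", "DaNang", "HoChiMinhCity"), each = 60),
  roa      = rnorm(180, mean = 0.01, sd = 0.05)
)

# Fit the one-way ANOVA model and extract its residuals.
fit <- aov(roa ~ province, data = toy)
res <- residuals(fit)

# Normal Q-Q plot of the residuals with a reference line:
# points close to the line support the normality assumption.
qqnorm(res, main = "Normal Q-Q plot of residuals")
qqline(res, col = "red")
```

Checking the residuals in one plot is equivalent to checking each province separately when the groups share a common spread, and it uses all 180 observations at once.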

Step 3: The ANOVA test

We use RStudio to calculate the test statistic and the p-value. Here is the R output for the one-way ANOVA test:

> VESdata.result <- aov(roa ~ province, data = VESdata)
> summary(VESdata.result)

             Df Sum Sq  Mean Sq F value Pr(>F)
province      2 0.0062 0.003092   0.813  0.445
Residuals   177 0.6734 0.003805

Step 4: F statistic

F = 0.813
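The F statistic is the ratio of the mean square for province to the residual mean square, and the p-value is the upper tail of the F distribution with 2 and 177 degrees of freedom. The sketch below reproduces both from the rounded values in the ANOVA table; the variable names are illustrative.

```r
# Mean squares and degrees of freedom taken from the ANOVA table above.
ms_province <- 0.003092
ms_resid    <- 0.003805
df_province <- 2
df_resid    <- 177

# F statistic: between-group mean square over within-group mean square.
f_stat <- ms_province / ms_resid

# p-value: probability of an F value at least this large under H0.
p_value <- pf(f_stat, df1 = df_province, df2 = df_resid, lower.tail = FALSE)

round(c(F = f_stat, p = p_value), 3)
```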

Step 5: Level of significance

The level of significance is α = 0.05.

Step 6: Decision rule and conclusion

We use the p-value approach: we reject H0 if the p-value ≤ α. From the R output, the p-value is 0.445. Since 0.445 > α = 0.05, we do not reject H0.

Step 7: Conclusion

We do not have enough evidence to conclude that there is a significant difference in profitability due to province.

Question 3: Use analysis of variance to test for any significant differences due to types of ownership. Use a .05 level of significance, and for now, ignore the effect of province. Check all the assumptions of the inference technique you use. Are the assumptions satisfied?

Step 1: Hypotheses

• H0: All population means are equal (μ1 = μ2 = ... = μk)
• Ha: At least two population means are different

Step 2: Checking assumptions

Based on the knowledge gained during the BES course, the one-way analysis of variance is the most logical approach to this question in our case study. Before conducting a one-way analysis of variance, we must first check whether its three assumptions are met.

Assumption 1: Independent, simple random samples

To check assumption 1, we selected observations by type of ownership without regard to province, so firms from Hanoi, Danang and Ho Chi Minh City all had the same chance of being selected. Thus, the samples are independent and randomly selected.

Assumption 2: All populations have the same standard deviation

First, we calculate the standard deviation of roa for each type of ownership:

> by(VESdata$roa, VESdata$own, sd)

Assumption 3: All populations are normally distributed

We use Q-Q plots as our approach. First, we import the Datasets.csv file into RStudio and assign it to VESdata:

> VESdata <- read.table("Datasets.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
> str(VESdata)

The normality of the data can be checked graphically using a Q-Q plot. A separate Q-Q plot can be created for each sample, allowing us to assess whether they are all normally distributed. Alternatively, we can plot the residuals to analyze their normality. To access the Q-Q plot function, install the {car} package.

In a Q-Q plot, if the data points line up on a straight diagonal line, the data set has a normal distribution. The points lie mostly along a straight diagonal line, with some small deviations at each tail. We can therefore reasonably assume that the data are approximately normally distributed.

Step 3: The ANOVA test

Now that all three assumptions are satisfied, we run the ANOVA test. The ANOVA command is as follows:

> VESdata.result<-aov(roa ~ province*own, data = VESdata)

              Df Sum Sq  Mean Sq F value Pr(>F)
province       2 0.0062 0.003092   0.853 0.4281
own            1 0.0165 0.016537   4.560 0.0341 *
province:own   2 0.0259 0.012956   3.573 0.0302 *
Residuals    174 0.6310 0.003626

Step 5: Level of significance

The level of significance is α = 0.05.

Step 6: Decision rule and conclusion

The decision rule for one-way ANOVA is that if the p-value is smaller than alpha (α), the null hypothesis is rejected. We have F(1, 174) = 4.560, so the p-value is 0.0341. Because the p-value of 0.0341 is smaller than α = 0.05, we reject the null hypothesis.
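The p-value rule is equivalent to comparing the F statistic with the critical value of the F distribution at level α. The following sketch shows both routes for F(1, 174) = 4.560; the variable names are illustrative and the inputs are the rounded figures quoted above.

```r
# Test statistic and degrees of freedom for the ownership effect.
f_stat <- 4.560
df1    <- 1
df2    <- 174
alpha  <- 0.05

# Route 1: p-value from the upper tail of the F distribution.
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)

# Route 2: critical value at the chosen significance level.
f_crit <- qf(1 - alpha, df1, df2)

round(c(p_value = p_value, f_crit = f_crit), 4)

# Both routes lead to the same decision about H0.
reject_by_p    <- p_value < alpha
reject_by_crit <- f_stat > f_crit
c(reject_by_p, reject_by_crit)
```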

Step 7: Conclusion

To compare the effect of types of ownership on business profitability, a one-way analysis of variance was performed. With the result in step 4, we have enough evidence to conclude that there are significant differences in the profitability of businesses due to types of ownership.

Question 4: At the .05 level of significance, test for any significant differences due to province, types of ownership, and interaction. Check all the assumptions of the inference technique you use. Are the assumptions satisfied? Explain.

Step 1: Hypotheses

There are 3 sets of hypotheses for two-way ANOVA in this case study:

Step 2: Checking assumptions

Based on the case study, we conclude that the two-way analysis of variance is the most appropriate technique for this problem. Three assumptions are needed for two-way ANOVA:

Title	Business Economics Statistic Case Study Report
Authors	Nguyen Thuy Duong, Ngo Dieu Hien, Tran Ngoc Lan, Be Thi Nguyet Le, Tran Cong Minh, Nguyen Truong Thuong, Yang Haryeong
Supervisor	Lai Hoai Phuong
University	Hanoi University
Major	Business Economics
Type	Case Study Report
Year of publication	2021
City	Hanoi
