Business and Economics Statistics Case Study Report


CASE STUDY REPORT

Tutor: Mrs Tran Thi Thu Hién
Tutorial 4 - Group 5

Nguyễn Thị Hải Anh 2004040009
Nguyễn Minh Hằng 1904040038
Nguyễn Nghiêm Hưng 1904000122
Nguyễn Thị Bích Phượng 2004040090
Ngô Hồng Giang 2004050015

Hanoi, November 7th, 2023


TABLE OF CONTENTS

A Scenario
B Answering the questions
Question 1
Question 2
Question 3
Question 4
Question 5
REFERENCE

LIST OF FIGURES

Figure 1: Mean plot of total assets with 95% CI
Figure 2: Box plot of total assets
Figure 3: Histograms of firms’ total assets based on province
Figure 4: Box plot of firms’ total assets based on province and type of ownership
Figure 5: Interaction plot between the effects of province and ownership on total assets
Figure 6: Added-Variable Plots of quantity sold, quantity product vs total assets


A Scenario

The Viet Nam Small and Medium Enterprises (SME) database is an important source of data for scholars researching the Vietnamese economy and its micro dynamics. In 2015, the survey was carried out with a sample of over 2,500 enterprises from nine provinces across the country (Viet Nam SME database, 2015).

The survey instrument consists of three modules: (i) a main enterprise questionnaire for owners or managers; (ii) an employee questionnaire administered to a random subset of employees in a quarter of randomly selected enterprises; and (iii) an economic accounts module.

In the survey, businesses were asked for the following:

• Address of the firm: Hanoi, Haiphong, TP HCM (province)
• Ownership status: One owner, Multiple owners (own)
• Quantity produced of the most important product (in revenue terms) (quantityproduct)
• Quantity sold, based on the quantity produced, of the most important product (quantitysold)
• Total assets at the end of 2014, in market value (million VND) (totalass)

B Answering the questions

Question 1: Produce descriptive statistics to summarize the data. You are expected to generate as many relevant descriptive statistics as possible using ALL the relevant tools introduced in the labs of this course. Remember to provide appropriate interpretations for the descriptive statistics. Try not to include unnecessary or irrelevant descriptive statistics.

We produce the descriptive statistics using RStudio. First, we import the data file "Dataset5.csv" into R for further analysis:

> dataset5 <- read.table("Dataset5.csv", header = TRUE, sep = ",", quote = "\"",
+                        stringsAsFactors = TRUE)


The dataset contains 300 observations, so we first look at the first few rows with the head() function to get familiar with the data:

From the previous output, we can see that there are 300 observations of 5 variables: province, own, quantityproduct, quantitysold and totalass. After that, we use the str() function to obtain the structure of the data:

> str(dataset5)

Next, we use the summary() function to obtain the length, class, mode, maximum, minimum, median and mean of the variables in dataset5:

[Output of summary(): columns province, own, quantityproduct, quantitysold]

Following that, we draw a mean plot to determine the mean value of each group and to compare the means between groups:

> install.packages("gplots")
> library(gplots)
> plotmeans(totalass ~ interaction(own, province), data = dataset5,
+           xlab = "Ownership and Province", ylab = "Totalass",
+           main = "Mean Plot with 95% CI")

Figure 1: Mean plot of total assets with 95% CI

The mean plot is used to identify the mean value of each group and to compare means between groups. The plot shows six groups with six different means, each with a 95 percent confidence interval. In both the one-owner and multi-owner categories, TP HCM consistently has the highest mean total assets compared with Haiphong and Hanoi.

Next, we draw a box plot to examine the findings more closely:

> boxplot(totalass ~ interaction(own, province), data = dataset5, xlab = "Province and Type of ownership", ylab = "Totalass", col = c("red",

Figure 2: Box plot of total assets

The figure displays the minimum and maximum values, medians, quartiles and outliers of total assets, categorized by province and ownership. The multi-owner companies of Ho Chi Minh City are among those with the largest total assets, whereas the one-owner firms of Hai Phong have the smallest. The box plot also indicates that total assets are right-skewed in all groups. The plot also shows outliers: the one-owner Hai Phong and one-owner Ha Noi groups have the most outliers of all the groups.

Question 2: Use analysis of variance to test for any significant differences due to province. Use a .05 level of significance, and for now, ignore the effect of types of ownership, quantity produced and quantity sold. Check all the assumptions of the inference technique you use.

In this question, we conduct an analysis of variance for a single factor, province. Therefore, we use a one-way ANOVA test. Before running the test, we need to check all the assumptions of this inference technique to ensure that our results are valid. The one-way ANOVA test has three assumptions:

• The samples are independent, simple random samples (1)
• The populations are normally distributed (2)
• All population standard deviations are equal (3)

Firstly, we check assumption 1: whether the samples are independent, simple random samples. The data were collected from randomly selected enterprises in three provinces (Hanoi, Haiphong and TP HCM) that have no relation to one another. Therefore, the samples are independent.

Secondly, we check assumption 2: whether the populations are normally distributed. We use histograms:

Figure 3: Histograms of firms’ total assets based on province

We can see that the distributions are not symmetrical, but rather right-skewed. Therefore, we infer that the assumption of normally distributed populations is not satisfied.

Finally, we check assumption 3 by calculating the ratio of the largest standard deviation to the smallest standard deviation and comparing it to 2:

> by(dataset5$totalass, dataset5$province, sd)
dataset5$province: Haiphong
[1] 20036.26
dataset5$province: Hanoi
[1] 7819.229
dataset5$province: TP HCM
[1] 56633.3

largest standard deviation / smallest standard deviation = 56633.3 / 7819.229 = 7.242824

> 56633.3/7819.229
[1] 7.242824

The ratio is approximately 7.24, which is far larger than 2. The populations therefore have different standard deviations (and different variances), so assumption 3 is not satisfied.
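The rule-of-thumb check described above can be sketched in R as follows. This is only a minimal illustration on simulated data, not the report's dataset: the data frame `sim` and the helper `sd_ratio` are hypothetical names introduced here, and the simulated values are assumptions chosen to give unequal spreads.

```r
# Hypothetical stand-in for the survey data (simulated, right-skewed values).
set.seed(1)
sim <- data.frame(
  province = factor(rep(c("Haiphong", "Hanoi", "TP HCM"), each = 100)),
  totalass = c(rlnorm(100, 8, 1.0), rlnorm(100, 8, 0.6), rlnorm(100, 8, 1.4))
)

# Ratio of the largest to the smallest group standard deviation.
sd_ratio <- function(y, g) {
  s <- tapply(y, g, sd)  # one standard deviation per group
  max(s) / min(s)
}

ratio <- sd_ratio(sim$totalass, sim$province)
ratio      # compare with the threshold of 2
ratio > 2  # TRUE would flag unequal standard deviations

# The same arithmetic applied to the standard deviations reported above:
56633.3 / 7819.229
```

Wrapping the ratio in a small function makes it easy to reuse the same check for other groupings, such as the six province-by-ownership groups.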

As 2 out of 3 ANOVA assumptions are not met, we run the Kruskal-Wallis test instead of the ANOVA test. The Kruskal-Wallis test is a way of testing for differences among populations that are not normally distributed.

1 Hypotheses:
H0: All population distributions are identical
Ha: Some populations are significantly different from others

2 Assumptions:
• The objective is to compare 3 populations based on province
• The samples are independent, simple random samples
• The data are quantitative but not normally distributed

These assumptions were verified above, so we proceed with the Kruskal-Wallis test.

> kruskal.test(totalass ~ province, data = dataset5)

Kruskal-Wallis rank sum test

data: totalass by province
Kruskal-Wallis chi-squared = 15.983, df = 2, p-value = 0.0003384

3 Test statistic: H = 15.983 (rounded to 3 decimal places)
4 Level of significance: α = 0.05
5 Decision rule: Reject H0 if p < 0.05

As seen in the R output of the Kruskal-Wallis test, p-value = 0.0003 < 0.05, so we reject the null hypothesis H0.

6 Conclusion:

At the 5% level of significance, there is enough evidence to support Ha: total assets are systematically higher in some populations than in others. This means there is a significant difference due to province.
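To make the test statistic less of a black box, the sketch below runs kruskal.test() on simulated data and recomputes H by hand from the rank sums. The data frame `sim` and its values are assumptions made for illustration only; the hand formula shown omits the tie correction, which is harmless here because simulated continuous data contain no ties.

```r
# Simulated stand-in for the survey data (hypothetical values).
set.seed(2)
sim <- data.frame(
  province = factor(rep(c("Haiphong", "Hanoi", "TP HCM"), each = 50)),
  totalass = c(rlnorm(50, 8.0), rlnorm(50, 8.0), rlnorm(50, 9.0))
)

kw <- kruskal.test(totalass ~ province, data = sim)

# H from the rank sums: 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1)
r  <- rank(sim$totalass)                 # ranks over the pooled sample
n  <- tapply(r, sim$province, length)    # group sizes n_j
Rj <- tapply(r, sim$province, sum)       # rank sums R_j
N  <- sum(n)
H  <- 12 / (N * (N + 1)) * sum(Rj^2 / n) - 3 * (N + 1)

# Under H0, H is approximately chi-squared with (number of groups - 1) df.
p_value <- pchisq(H, df = nlevels(sim$province) - 1, lower.tail = FALSE)

c(by_hand = H, kruskal_test = unname(kw$statistic))
c(by_hand = p_value, kruskal_test = kw$p.value)
```

The two rows should agree, which shows that the test depends on the data only through the ranks and not through the raw, skewed values.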

Question 3: At the .05 level of significance, test for any significant differences due to province, types of ownership, and interaction (ignore the effect of quantity produced and quantity sold). Check all the assumptions of the inference technique you use. Are the assumptions satisfied? Explain. Draw an interaction plot and interpret the plot. Is the plot consistent with the conclusions?

To assess the effects of two independent variables (factors) on one dependent variable, and the interaction between them, two-way factorial analysis of variance is the most suitable technique.

Ha: There is interaction between province and type of ownership

2 Check Assumptions

• Samples are independent, simple random samples of size n_ij from each of k (= a × b) populations
• All populations are normally distributed
• All populations have the same standard deviation

First, we check that the samples are independent. A cross-tabulation of the province and own variables gives the sample size of each stratum in the two-way ANOVA:
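A cross-tabulation of this kind can be produced with base R's table() function. The sketch below uses a simulated data frame `sim` (a hypothetical stand-in, with group labels assumed from the report) and does not reproduce the report's actual cell counts.

```r
# Simulated factor columns standing in for province and own (hypothetical).
set.seed(3)
sim <- data.frame(
  province = factor(sample(c("Haiphong", "Hanoi", "TP HCM"), 300, replace = TRUE)),
  own      = factor(sample(c("One-owner", "Multi-owner"), 300, replace = TRUE))
)

# Cell counts n_ij: rows are provinces, columns are ownership types.
cell_n <- table(sim$province, sim$own)
cell_n

# Row and column totals added for a quick check against the sample size.
addmargins(cell_n)
```

Unequal cell counts indicate an unbalanced design, which is worth knowing before interpreting the main effects of a two-way analysis.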

Next, we check the assumption of equal standard deviations by looking at the output of the by() function in R for all six groups:

> by(dataset5$totalass, list(dataset5$province, dataset5$own), sd)

: Haiphong
: Multi-owner
[1] 26067.86

: Hanoi
: Multi-owner
[1] 6936.263

: TP HCM
: Multi-owner
[1] 78797.02

: Haiphong

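The output shows that the populations do not have the same standard deviation: among the values shown, the largest (78797.02) is more than 11 times the smallest (6936.263), far above the threshold of 2. Therefore, the assumption of equal standard deviations is not satisfied.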

Finally, we use a box plot to check the normality of the populations:

> boxplot(totalass ~ interaction(own, province), data = dataset5,
+         xlab = "Province and Type of ownership", ylab = "Totalassets",
+         col = c("red", "purple", "orange", "yellow", "beige", "maroon"),
+         ylim = c(0, 500000))

Figure 4: Box plot of firms’ total assets based on province and type of ownership

The figure suggests that the data are not normally distributed. The positions of the median lines indicate a lack of symmetry: most of the samples are right-skewed, with the exception of the one-owner firms based in TP HCM. Consequently, the assumption of normality is not met.

Because these assumptions fail, we opt for a method designed for factorial analysis of variance when the assumptions of traditional parametric ANOVA, such as normality and homoscedasticity, are not met. The method is the Aligned Rank Transform (ART) for nonparametric factorial ANOVAs. It aligns and ranks the data, then runs a series of ANOVAs on the transformed data so that they fit the traditional ANOVA model (Wobbrock et al., 2011). The ART technique provides a nonparametric treatment of both main and interaction effects.
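To illustrate what "align, then rank" means, the sketch below carries out the steps by hand for the interaction effect only, using base R. It is a simplified illustration under stated assumptions: the data frame `sim` is simulated and balanced, the variable names are hypothetical, and the steps follow our reading of Wobbrock et al. (2011) rather than the internals of the ARTool package, which handles every effect and should be preferred in practice.

```r
# Simulated balanced 2 x 3 design (hypothetical values, 25 firms per cell).
set.seed(4)
sim <- expand.grid(
  replicate = 1:25,
  own       = c("One-owner", "Multi-owner"),
  province  = c("Haiphong", "Hanoi", "TP HCM")
)
# An assumed interaction: only multi-owner firms in TP HCM are shifted upward.
boost <- 0.8 * (sim$own == "Multi-owner" & sim$province == "TP HCM")
sim$totalass <- rlnorm(nrow(sim), meanlog = 8 + boost, sdlog = 1)

# Step 1: residuals = response minus its cell mean.
grand     <- mean(sim$totalass)
own_mean  <- ave(sim$totalass, sim$own)
prov_mean <- ave(sim$totalass, sim$province)
cell_mean <- ave(sim$totalass, sim$own, sim$province)
resid     <- sim$totalass - cell_mean

# Step 2: estimated interaction effect for each observation's cell.
effect <- cell_mean - own_mean - prov_mean + grand

# Step 3: aligned response, with the main effects stripped out.
aligned <- resid + effect

# Step 4: rank the aligned responses (ties would get average ranks).
sim$art_rank <- rank(aligned)

# Step 5: full factorial ANOVA on the ranks; only the interaction row is
# interpreted, because the data were aligned for that effect alone.
fit <- anova(lm(art_rank ~ own * province, data = sim))
fit["own:province", ]
```

A useful sanity check is that the aligned responses sum to approximately zero, since the alignment removes the grand mean and both main effects. The main effects would each need their own alignment and ranking before being tested.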

Firstly, we install and load the ARTool package:

> install.packages("ARTool")
> library(ARTool)

Secondly, we transform the data using the aligned rank transform (ART):

> totalass_art <- art(totalass ~ own * province, data = dataset5)

We then verify that the ART procedure was correctly applied and is appropriate for this dataset, as follows:

ANOVA Two-way test

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Test statistic

F ownership = 34.6064
F province = 9.8526
F own:province = 6.6387

3 Decision Rule and Conclusion

Reject H0 if p-value < α (α = 0.05).

Interaction effect of type of ownership and province: p-value = 0.0015 < 0.05. Therefore, we reject H0. There is enough evidence to conclude that there is an interaction effect between province and type of ownership on total assets. We represent this relationship in the graph below.

> interaction.plot(dataset5$province, dataset5$own, dataset5$totalass, type = "b",
+                  col = c("blue", "red"), pch = c(16, 18),
+                  main = "Interaction between Province and Ownership")

Figure 5: Interaction plot between the effects of province and ownership on total assets

a The credibility of the interpretations and conclusions

For question 2, two of the ANOVA assumptions were violated, so we replaced the ANOVA test with the Kruskal-Wallis test, which is suitable for this case study. The conclusion of rejecting H0 is reliable because the nonparametric test does not rely on the assumptions that were violated. Regarding question 3, we applied the Aligned Rank Transform to our data so that they could be analyzed with the two-way ANOVA model. According to researchers from the University of Washington, the Aligned Rank Transform (ART) procedure was devised to analyze multi-factor nonparametric designs. The conclusions align with our visual depiction of the dataset and corroborate the relationship between the factors.

b Limitations of the case

The case has several drawbacks. Firstly, the distributions of the numerical variables (quantityproduct, quantitysold, totalass) are non-normal. They are heavily right-skewed, violating the normality assumption of the ANOVA test. Secondly, the population variances are not equal. Consequently, we cannot perform ANOVA tests directly and we pivot to alternative methods, namely the Kruskal-Wallis test and the Aligned Rank Transform, which have drawbacks of their own. The Kruskal-Wallis test generally has lower statistical power than a parametric test whose assumptions hold, because it uses only the ranks of the data; the result, while valid, may therefore be less convincing. Our dataset exhibits extreme skewness and the ART method reduces that skew, which may be undesirable if the distributions are meaningful to the case study (Wobbrock et al., 2011). All in all, we cannot say with 100% confidence whether the samples effectively represent the population of Vietnamese small and medium enterprises at the time.

Question 5: Based on your dataset, make your own problem using simple/multiple linear regression. Interpret the output.

Multiple linear regression is useful for modeling the relationship between a numeric outcome, or dependent variable (Y), and multiple explanatory, or independent, variables (X). In a balance sheet, total assets are calculated as the sum of all short-term, long-term, and other assets. These include cash and inventory. Therefore, we surmise that the quantities of goods produced and sold may have some influence on total assets. In this case, quantity sold and quantity produced are the independent variables, whereas total assets is the dependent variable.

# fit model using quantitysold and quantityproduct as X-variables
> multiple.regression <- lm(totalass ~ quantitysold + quantityproduct, data = dataset5)
> summary(multiple.regression)
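As a guide to reading such output, the sketch below fits the same kind of model to simulated data and pulls out the pieces that are usually interpreted. Everything here is an assumption made for illustration: the data frame `sim`, the coefficients used to generate it, and the link between quantity sold and quantity produced are hypothetical and do not reflect the report's results.

```r
# Simulated data (hypothetical): quantity sold is a fraction of quantity produced.
set.seed(5)
n <- 300
quantityproduct <- rlnorm(n, meanlog = 6, sdlog = 1)
quantitysold    <- quantityproduct * runif(n, min = 0.6, max = 1)
totalass        <- 500 + 2 * quantitysold + 1 * quantityproduct + rnorm(n, sd = 200)
sim <- data.frame(totalass, quantitysold, quantityproduct)

fit <- lm(totalass ~ quantitysold + quantityproduct, data = sim)

# Coefficient table: estimate, standard error, t value and p-value per term.
# Each slope is the expected change in totalass for a one-unit increase in
# that predictor, holding the other predictor fixed.
coef(summary(fit))

# Share of the variance in totalass explained by the two predictors.
r2 <- summary(fit)$r.squared
r2

# The two predictors are strongly related by construction, which inflates the
# standard errors of the individual slopes (multicollinearity).
predictor_cor <- cor(sim$quantitysold, sim$quantityproduct)
predictor_cor
```

Because quantity sold is bounded by quantity produced, a high correlation between the two predictors is plausible in real data as well, so checking it before interpreting the individual slopes is a sensible precaution.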

Title	Business and Economics Statistics Case Study Report
Authors	Nguyễn Thị Hải Anh, Nguyễn Minh Hằng, Nguyễn Nghiêm Hưng, Trần Mỹ Hà, Nguyễn Thị Bích Phượng, Ngô Hồng Giang
Supervisor	Mrs. Tran Thi Thu Hién
Institution	Hanoi University
Major	Business and Economics Statistics
Type	Case Study Report
Year of publication	2023
City	Hanoi
