
group assignment mas202 lo4 utilize common statistical packages to conduct statistical analysis


DOCUMENT INFORMATION

Basic information

Title: Utilize Common Statistical Packages to Conduct Statistical Analysis
Authors: Nguyễn Thị Quỳnh Anh, Hồ Thị Phương Thảo, Nguyễn Minh Anh, Đinh Quốc Bình, Lưu Vũ Trường Giang
Supervisor: Pham Thanh Hieu
School: FPT University Hanoi
Major: Statistical Analysis
Type: Group Assignment
City: Hanoi
Pages: 19
File size: 1.78 MB


Group Assignment - MAS202 LO4: Utilize common statistical packages to conduct statistical analysis

Group 2:

Nguyễn Thị Quỳnh Anh - HS171205

Hồ Thị Phương Thảo - HS171216

Nguyễn Minh Anh - HS170963

Đinh Quốc Bình - HS173058

Lưu Vũ Trường Giang - HS170866

Lecturers: Pham Thanh Hieu


TABLE OF CONTENTS

PART A :

I Part I : Introduction & Methodology

a The topic

b The main issue Question for project

c Experts think about your research issues

d The two continuous variables Explain

e The population

f The samples and the sampling method

g The data

h The survey errors

II Part II: Descriptive Statistics Results

a The demographics information

b The choice of graphs

c The maximum number of graphs

PART B: Inferential Statistics.

I Part 1 :

a Research topic

b Research purpose

c Summary of your descriptive statistics

II Part 2 :

a Confidence interval

b Hypothesis testing

c Linear regression

PART C: Reference


Part A:

Part I: Introduction & Methodology

a What is the topic of your project? (1 short sentence)

- Statistics on average spending of current students

b What are the main issues you plan to address? What questions do you have about your project? (1 to 3 sentences)

The main issues intended to be addressed are:

+ Statistics and calculation of the average online shopping amount of students, to conclude

+ Finally, use linear regression to show how the amount of money received from parents each month relates to students' online shopping decisions

c What do experts think about your research issues? Provide background information on the research topic with reliable in-text references (Word limit: 250 words). State your full references in the reference list at the end of the report (this is not part of the word limit)

Reference :(1),(2),(3)

Experts often have different views on the issue of spending per month for students:

- Some experts recommend that students create a monthly budget and stick to it. Budgeting helps students track and control their spending, increasing their financial management ability

living expenses, tuition, rent, and food. The impact of these expenditures needs to be carefully considered

- At the same time, experts also often recommend that students have a frugal and strict attitude towards unnecessary money, such as entertainment money, unnecessary shopping, or eating out at restaurants too often

- If possible, students should look for ways to save money, such as finding shared accommodation, using old textbooks or buying discounted books, and taking advantage of student incentives that the school or agency may offer

⇒ However, each student has unique circumstances and financial goals, so spending decisions should be individualized based on personal financial situation and priorities

d Identify the two continuous variables (independent and dependent) between which you would like to find the relationship. Explain why you are choosing these two variables. At this point, you must get approval from your lecturer on your chosen variables before proceeding further. If not, you might need to redo the whole assignment

- The two continuous variables are:

+ Amount of online shopping

+ Amount received from parents

⇒ We chose these two variables because we want to calculate and draw conclusions about the relationship between time spent using social networks and the spending patterns of current students, and to see whether spending so much time on social networks has a large impact on the monthly amount received from parents

e Identify the population in your research about which you'll be making inferences

- Population: All students on the campus of FPT University Hanoi


- Samples in the article include:

+ Amount of online shopping

+ Amount received from parents

- Sampling method: surveys were sent at random to 52 different people

- Summary of results: Through survey and calculation, we have the following results

+ Sample data description (Amount of online shopping): the lowest and highest amounts are 0 VND and 5,000,000 VND respectively, the standard deviation is 1,330,284.49 VND, and the average amount is 1,364,808 VND

+ Confidence interval, illustrated with histograms (Amount of online shopping): with 98% confidence, the population average of "Amount of online shopping" ranges from 994,454.49 to 1,735,162.49 VND

g Submit the designed questionnaire that you'll be using to collect the data if you use a survey in your data-collecting step

https://docs.google.com/forms/d/1VLZL0ovB4QK1ovI6snpEEzlhqi7IQz170SvedK63WY8/edit?fbclid=IwAR3gf95HZKcqeIFwdQu-_TCkI51UL_jsu6nEkTCJutoTeY0bm_T4bE3RZlA

h Identify the survey errors that might have occurred in your research while collecting data

- Sampling Error: This occurs when the survey sample is not representative of the target population, leading to biased or ungeneralizable results


or fail to respond to the survey, which may introduce bias if non-respondents differ from respondents

- Measurement Error: This error is related to problems with the survey questions or response options that lead to inaccurate or inconsistent responses. It can result from ambiguous wording, biased phrasing, or lack of clarity

- Question Order Effect: The order in which questions are presented can influence respondents' answers. Questions asked earlier may prime or influence responses to subsequent questions, introducing bias

- Response Bias: Respondents may actively or unconsciously provide responses that do not reflect their true opinions, due to factors such as social pressure, perceived expectations, or lack of motivation

Part II: Descriptive Statistics Results

a,b,c Present the demographic information. You are free in the choice of graphs. The maximum number of graphs should be three

● Amount of online shopping


- According to the table we have Min = 0, Max = 5,000,000, Mean = 1,364,808, Q1 = 500,000, Q2 = 1,000,000, Q3 = 2,000,000, s = 1,330,284.49

Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ = 1,364,808

Median (Q2) = $\frac{n+1}{2}$ ranked value = $\frac{52+1}{2}$ ranked value = 1,000,000

Standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$ = 1,330,284.49

Q1 = $\frac{n+1}{4}$ ranked value = $\frac{52+1}{4}$ ranked value = 13th value = 500,000

Q3 = $\frac{3(n+1)}{4}$ ranked value = $\frac{3(52+1)}{4}$ ranked value = 40th value = 2,000,000

⇒ Amount of online shopping: Min = 0, Max = 5,000,000, Mean = 1,364,808, Q1 = 500,000, Q2 = 1,000,000, Q3 = 2,000,000, s = 1,330,284.49
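The ranked-value rules used above can be sketched in a few lines of Python. This is only an illustration: the helper names `ranked_value` and `describe` and the eight-value sample are hypothetical (the survey's 52 raw responses are not reproduced in this report), and the rounding convention is an assumption inferred from the report's own steps (13.25 becomes the 13th value, 39.75 becomes the 40th value).

```python
import math
import statistics


def ranked_value(sorted_data, position):
    """Return the value at a 1-based ranked position.

    Whole positions return that value, positions ending in .5 average the
    two neighbouring values, and any other position is rounded to the
    nearest whole rank (assumed convention: 13.25 -> 13th, 39.75 -> 40th).
    """
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0:
        return sorted_data[lower - 1]
    if fraction == 0.5:
        return (sorted_data[lower - 1] + sorted_data[lower]) / 2
    return sorted_data[round(position) - 1]


def describe(data):
    """Summary statistics using the (n+1)/4, (n+1)/2 and 3(n+1)/4 rules."""
    xs = sorted(data)
    n = len(xs)
    return {
        "min": xs[0],
        "max": xs[-1],
        "mean": statistics.mean(xs),
        "s": statistics.stdev(xs),  # sample standard deviation (n - 1)
        "q1": ranked_value(xs, (n + 1) / 4),
        "q2": ranked_value(xs, (n + 1) / 2),
        "q3": ranked_value(xs, 3 * (n + 1) / 4),
    }


# Hypothetical amounts in VND, NOT the survey data.
sample = [0, 200_000, 500_000, 500_000, 1_000_000, 1_500_000, 2_000_000, 5_000_000]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value:,.2f}")
```

With n = 52 the same helper picks the 13th and 40th ranked values for Q1 and Q3, which matches the positions used in the calculation above.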

● Amount received from parents


- According to the table we have Min = 0, Max = 10,000,000, Mean = 2,773,076.92, Q1 = 1,500,000, Q2 = 3,125,000, Q3 = 3,000,000, s = 1,976,044.31

Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ = 2,773,076.92

Median (Q2) = $\frac{n+1}{2}$ ranked value = $\frac{52+1}{2}$ ranked value = 3,000,000

Standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$ = 1,976,044.31

Q1 = $\frac{n+1}{4}$ ranked value = $\frac{52+1}{4}$ ranked value = 13th value = 1,500,000

Q3 = $\frac{3(n+1)}{4}$ ranked value = $\frac{3(52+1)}{4}$ ranked value = 40th value = 3,000,000

⇒ Amount received from parents: Min = 0, Max = 10,000,000, Mean = 2,773,076.92, Q1 = 1,500,000, Q2 = 3,125,000, Q3 = 3,000,000, s = 1,976,044.31


PART B: Inferential Statistics (5%)

Part I :

a Research topic :

- Statistics on average spending of current students

b Research purpose :

- The project aims to show the relationship between daily expenses and finances along with the amount of money received from parents. From there, each person can adjust their purchases to suit their current finances

- Furthermore, the research also aims to consolidate the knowledge learned and apply it in practice, making seemingly dry knowledge more effective and beneficial

c Summary of your descriptive statistics :

● Description of sample data

- Online shopping amount: Min = 0, Max = 5,000,000, Mean = 1,364,808, Q1 = 500,000, Q2 = 1,000,000, Q3 = 2,000,000, s = 1,330,284.49

- Amount received from parents: Min = 0, Max = 10,000,000, Mean = 2,773,076.92, Q1 = 1,500,000, Q2 = 3,125,000, Q3 = 3,000,000, s = 1,976,044.31

● Find the confidence interval

- With 98% confidence, the population mean of "spending on e-commerce" will be between 921745.7155 and 1807870.2845

- With 98% confidence, the population mean of "amount of parental support" will be between 2114938.9706 and 3431214.8694

● Hypothesis testing

- Question: Is there any evidence that the population average of "online shopping amount" is greater than 1,500,000?

- ANSWER : There is no evidence that the population average of "online shopping amounts" is greater than 1,500,000

Linear regression: There is a linear relationship

d The two variables:

- The two continuous variables are:

+ Amount of online shopping

+ Amount received from parents

⇒ We chose these two variables because we want to calculate and draw conclusions about the relationship between time spent using social networks and the spending patterns of current students, and to see whether spending so much time on social networks has a large impact on the monthly amount received from parents

Part II:

1 Confidence interval:

a Amount of online shopping

[Table: Excel descriptive statistics for "Spending on e-commerce" (Standard Error, Mode, Sample Variance, Kurtosis, Skewness, Sum, Confidence Level (98.0%) = E)]

- Given: Mean = 1364808, sample standard deviation = 1330284.49, n = 52. This is a confidence interval for the mean with σ unknown

=> Apply the formula

$\bar{x} \pm E$ = 1364808 ± E

With $E = t_{0.01;51} \cdot \frac{1330284.49}{\sqrt{52}}$ = 443062.2845

- Conclusion: the 98% CI for the population mean of "spending on e-commerce" is 921745.7155 to 1807870.2845

b Amount received from parents:

[Table: Excel descriptive statistics for "The amount of money provided by parents" (Standard Error, Mode, Sample Variance, Kurtosis, Skewness, Sum, Confidence Level)]

- Given: Mean = 2773076.92, sample standard deviation = 1976044.31, n = 52. This is a confidence interval for the mean with σ unknown

With $E = t_{0.01;51} \cdot \frac{1976044.31}{\sqrt{52}}$ = 658137.9494

- Conclusion: 98% CI for the population mean of "amount of parental support"

$\bar{x} \pm E$ = 2773076.92 ± 658137.9494

With 98% confidence, the population mean of "amount of parental support" will be between 2114938.9706 and 3431214.8694
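The two margins of error can be checked from the summary statistics alone. The sketch below is a minimal illustration, not the group's Excel workflow: the function name `t_interval` is hypothetical, and the critical value 2.4017 for t_(0.01;51) is an assumed table value, so the results agree with the reported figures only to within a few VND.

```python
import math


def t_interval(mean, s, n, t_crit):
    """Confidence interval for a mean with sigma unknown.

    Returns (lower, upper, margin) where margin = t_crit * s / sqrt(n).
    """
    margin = t_crit * s / math.sqrt(n)
    return mean - margin, mean + margin, margin


# Assumed table value for t_(0.01; 51), i.e. 98% confidence, 51 degrees of freedom.
T_CRIT_98_DF51 = 2.4017

shopping = t_interval(1_364_808, 1_330_284.49, 52, T_CRIT_98_DF51)
parents = t_interval(2_773_076.92, 1_976_044.31, 52, T_CRIT_98_DF51)

for label, (lower, upper, margin) in (("online shopping", shopping),
                                      ("parental support", parents)):
    print(f"{label}: E = {margin:,.2f}, 98% CI = [{lower:,.2f}, {upper:,.2f}]")
```

Note that the standard error divides s by the square root of n (here √52), not by n itself.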

2 Hypothesis testing:

[Table: Excel descriptive statistics for "Spending on e-commerce" (Standard Error, Mode, Sample Variance, Kurtosis, Skewness, Confidence Level (98.0%))]

- We have: Mean = 1,364,808, s = 1,330,284.49, n = 52

Question: Is there any evidence that the population average of "online shopping amount" is greater than 1,500,000?

- Identify the problem: test for the mean with σ unknown (t test)

- Determine H0, H1:

+ H_0: μ = 1,500,000

+ H_1: μ > 1,500,000

- Because H_1 is upper-tailed, the critical value is positive: t_critical value = $t_{\alpha;\,df} = t_{0.02;\,51}$ = 2.108

- We have the formula:

$t_{STAT} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{1364808 - 1500000}{1330284.49/\sqrt{52}} \approx -0.733$

- Conclude: t_STAT = -0.733, t_critical value = 2.108

Because t_STAT is smaller than the critical value, H0 cannot be rejected

=> There is no evidence that the population's average "online shopping amount" is greater than 1,500,000
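As a quick check of this test, the t statistic can be computed directly from the sample summary and the hypothesized mean of 1,500,000 stated in H_0. The sketch below assumes an upper-tail test; the function name `one_sample_t` is hypothetical, and the critical value 2.108 is the magnitude quoted in the report for t_(0.02;51), taken here as given rather than recomputed.

```python
import math


def one_sample_t(mean, s, n, mu0):
    """t statistic for a one-sample test of the mean with sigma unknown."""
    return (mean - mu0) / (s / math.sqrt(n))


# Magnitude of the critical value quoted in the report for alpha = 0.02, df = 51.
# For H1: mu > mu0 the rejection region is t_stat > +T_CRIT.
T_CRIT = 2.108

t_stat = one_sample_t(mean=1_364_808, s=1_330_284.49, n=52, mu0=1_500_000)
reject_h0 = t_stat > T_CRIT

print(f"t_STAT = {t_stat:.3f}")      # approximately -0.733
print(f"Reject H0: {reject_h0}")
```

Because the sample mean (1,364,808) lies below the hypothesized value, the statistic is negative and cannot fall in an upper-tail rejection region, whatever the exact critical value.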


3 Linear regression:

a State two variables X and Y

● X=Money given by parents

● Y=Spending on e-commerce

b Sample data needed for calculation (Calculate with Excel)

- From the collected data, we can calculate the values of the regression equation. The formulas the group used and calculated in Excel yield the following results:

c Calculate by hand (general calculation)

- Formulas used to calculate the data. These formulas come from across the MAS202 course:


d Regression

- After calculating the necessary data, we have the following regression equation:

e Scatter Plot

- A scatter plot is a powerful tool for analyzing the relationship between two variables in a data set

between "amount of money spent on online shopping" and "amount of money provided by parents" there is a linear correlation between the two variables

+ The data points are spread around a straight line, which indicates a strong correlation between "amount of online shopping" and "amount of parent support"

⇒ From here, it can be seen that there is a linear relationship between "amount of money spent on online shopping" and "amount of money provided by parents"

f Test for slope coefficient

⇒ Conclusion: Reject H0. That is, there is a linear relationship between the two variables X and Y
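The regression steps (fitting the line, then testing the slope) can be sketched as follows. This is an illustration under stated assumptions, not the group's calculation: the function names are hypothetical, the five data points are made up (in millions of VND) because the survey data and the fitted equation are not reproduced here, and the hypotheses are assumed to be the usual H_0: β1 = 0 against H_1: β1 ≠ 0.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for Y = b0 + b1 * X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ssx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / ssx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def slope_t_stat(xs, ys):
    """t statistic for H0: slope = 0, with n - 2 degrees of freedom."""
    n = len(xs)
    b0, b1 = fit_line(xs, ys)
    x_bar = sum(xs) / n
    ssx = sum((x - x_bar) ** 2 for x in xs)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(sse / (n - 2))   # standard error of the estimate
    s_b1 = s_yx / math.sqrt(ssx)      # standard error of the slope
    return b1 / s_b1


# Hypothetical data in millions of VND, NOT the survey responses.
money_from_parents = [1.0, 2.0, 3.0, 4.0, 5.0]   # X
online_spending = [0.5, 1.1, 1.4, 2.1, 2.4]      # Y

b0, b1 = fit_line(money_from_parents, online_spending)
t_stat = slope_t_stat(money_from_parents, online_spending)
print(f"Y = {b0:.2f} + {b1:.2f} X, t_STAT for slope = {t_stat:.2f}")
```

H_0 is rejected when the absolute value of this statistic exceeds the two-tail critical value of the t distribution with n - 2 degrees of freedom, which is the decision the conclusion above reports for the survey data.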

C Reference:

(1) dantri.com. "Fallen" with the current level of student shopping spending. Available: https://dantri.com.vn/nhip-song-tre/nga-ngua-voi-muc-chi-tieu-mua-sam-cua-sinh-vien-hien-nay-20221120090234439.htm


(3) laodong.vn. How much do students spend each month? Available: https://laodong.vn/y-kien-ban-doc/sinh-vien-chi-tieu-moi-thang-bao-nhieu-la-du-1076796.ldo

Date posted: 08/05/2024, 12:46
