
business statistics mid term exam


DOCUMENT INFORMATION

Basic information

Title: Business Statistics Mid-Term Exam
Authors: Bùi Thùy Dương, Vương Khánh Linh, Sái Thùy Linh, Lê Anh Minh, Nguyễn Thanh Khánh Ngọc
Supervisor: Assoc. Prof. Tran Thi Bich
School: Ha Noi
Major: Advanced Finance
Type: exam
Year of publication: 2024
City: Ha Noi
Number of pages: 17
File size: 1.57 MB


NATIONAL ECONOMICS UNIVERSITY

BUSINESS STATISTICS

MID-TERM EXAM

CLASS: Advanced Finance 64B

LECTURER: Assoc.Prof Tran Thi Bich


PART I: QUESTIONS

Question 1

The variable incomdol in the gss.sav file is calculated as the class midpoint of the income range for each value of income (family income).

1. Propose how this variable is calculated. Illustrate your answer by calculating the class midpoint of one category from the variable income.

2. Compute summary statistics of this variable and make a histogram as well. What kind of distribution do you get? Explain why you have that kind of distribution.

3. Based on the outputs in (2), indicate below what value of income 25% of respondents fall. Above what value do 25% of respondents fall?

4. Repeat (2), but in the Frequencies procedure select "Values are group midpoints". Indicate the change you notice.

5. Do you think you know the exact income of the families in your sample? Explain your answer.

Question 2

In the gss.sav file, the variable tvhours tells you how many hours per day GSS respondents say they watch TV.

1. Make a frequency table of the hours of television watched. Do any of the values strike you as strange? Explain.

2. Based on the frequency table, answer the following questions: Of the people who answered the question, what percentage don't watch any television? What percentage watch two hours or less? Five hours or more? Of the people who watch TV, what percentage watch one hour? What percentage watch four hours or less?

3. From the frequency table, estimate the 25th, 50th, 75th, and 95th percentiles. What are the values of the median and the mode?

4. Make a bar chart of the hours of TV watched. What problem do you see with this display?

5. Make a histogram of the hours of TV watched. What causes all of the values to be clumped together? Compare this histogram to the bar chart you generated in part 4. Which is a better display for these data?

Question 3

Find a data set which is related to a specific organisational problem (either at the macro or micro level) and apply all descriptive statistical techniques that you think are suitable to the problem. Write a short report, which includes the objectives of your analysis, the research questions, the analytical techniques you apply to address the research questions, and your findings. The maximum length of the report is 5 pages including Tables and Figures.


PART II: ANSWERS

Question 1

1.1

The formula to calculate the class midpoint is: Class midpoint = (Lower class limit + Upper class limit)/2

For the income category with limits 1000 and 2999, the answer is: Midpoint = (1000 + 2999)/2 = 1999.5
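The midpoint formula above can be checked with a few lines of Python. This is only a minimal sketch: the function name `class_midpoint` is a hypothetical helper introduced for illustration, and the class limits are the ones used in the answer.

```python
def class_midpoint(lower_limit: float, upper_limit: float) -> float:
    """Return the midpoint of a class given its lower and upper limits."""
    return (lower_limit + upper_limit) / 2


# Class limits taken from the worked answer above (1000 to 2999).
midpoint = class_midpoint(1000, 2999)
print(midpoint)  # 1999.5
```

The same helper could be applied to any other income category once its limits are read off the value labels of the variable income.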

1.2

Statistics: Family income; ranges recoded to midpoints

N Valid                   1230
N Missing                 189
Median                    32500.00
Std. Deviation            30091.997
Variance                  905528303.176
Std. Error of Skewness    .070
Range                     109500

Family income; ranges recoded to midpoints

                           Frequency   Percent   Valid Percent   Cumulative Percent
Valid     500                     17       1.2             1.4                  1.4
          2000                    17       1.2             1.4                  2.8
          6500                    19       1.3             1.5                  6.7
          7500                    17       1.2             1.4                  8.0
          9000                    40       2.8             3.3                 11.3
          11250                   58       4.1             4.7                 16.0
          13750                   56       3.9             4.6                 20.6
          16250                   50       3.5             4.1                 24.6
          18750                   54       3.8             4.4                 29.0
          21250                   42       3.0             3.4                 32.4
          23750                   59       4.2             4.8                 37.2
          27500                   79       5.6             6.4                 43.7
          32500                   86       6.1             7.0                 50.7
          37500                   82       5.8             6.7                 57.3
          45000                  119       8.4             9.7                 67.0
          55000                  108       7.6             8.8                 75.8
          67500                  111       7.8             9.0                 84.8
          82500                   66       4.7             5.4                 90.2
          100000                  45       3.2             3.7                 93.8
          110000                  76       5.4             6.2                100.0
          Total                 1230      86.7           100.0
Missing   don't know/NA           65       4.6
          refused                124       8.7
          Total                  189      13.3
Total                           1419     100.0

[Figure: histogram of TOTAL FAMILY INCOME FOR LAST YEAR. Mean = 16.13, Std. Dev. = 5.636, N = 1,354]

The histogram shows a left-skewed, or negatively skewed, distribution: the peak lies on the right side, where most observations are concentrated, and the longer tail extends towards the lower income values. A likely reason is that the histogram plots the income categories, whose ranges are narrow at low incomes and wide at high incomes, so many respondents are packed into the few upper categories while each lower category holds only a few respondents. On the dollar scale of incomdol, by contrast, the mean (41840.85) lies above the median (32500.00), which would point to a right skew.

1.3

Statistics: Family income; ranges recoded to midpoints

N Valid           1230
N Missing         189
Percentiles  25   18750.00
             75   55000.00

Based on the output above, about 25% of respondents have a family income below 18,750 (the 25th percentile), and about 25% of respondents have a family income above 55,000 (the 75th percentile).
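To make the reading of the two quartiles concrete, the sketch below looks up the share of respondents strictly below and strictly above a given midpoint, using the cumulative valid percentages from the frequency table in 1.2. The function name `shares_around` is a hypothetical helper, and only the consecutive rows from 13750 to 67500 are copied here.

```python
# (midpoint, cumulative valid percent) rows copied from the frequency table in 1.2.
# The rows are consecutive in that table, which the lookup below relies on.
rows = [
    (13750, 20.6), (16250, 24.6), (18750, 29.0), (21250, 32.4),
    (23750, 37.2), (27500, 43.7), (32500, 50.7), (37500, 57.3),
    (45000, 67.0), (55000, 75.8), (67500, 84.8),
]


def shares_around(rows, value):
    """Return (% of respondents strictly below value, % strictly above value)."""
    below = 0.0
    for midpoint, cum_pct in rows:
        if midpoint == value:
            return round(below, 1), round(100 - cum_pct, 1)
        below = cum_pct  # cumulative percent of the previous (lower) row
    raise KeyError(value)


print(shares_around(rows, 18750))  # (24.6, 71.0)
print(shares_around(rows, 55000))  # (67.0, 24.2)
```

Because the data are grouped into a small number of midpoints, the shares are close to, but not exactly, 25%: 24.6% of respondents lie below 18,750 and 24.2% lie above 55,000.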

1.4

Statistics: Family income; ranges recoded to midpoints

N Valid                   1230
N Missing                 189
Mean                      41840.85
Median                    34583.33 (a)
Mode                      45000
Std. Deviation            30091.997
Variance                  905528303.176
Std. Error of Skewness    .070
Percentiles  25           17668.27 (b)
             50           34583.33
             75           60079.91

a. Calculated from grouped data.
b. Percentiles are calculated from grouped data.

Statistics: Family income; ranges recoded to midpoints

N Valid                   1230
N Missing                 189
Mean                      41840.85
Median                    32500.00
Mode                      45000
Std. Deviation            30091.997
Variance                  905528303.2
Std. Error of Skewness    .070
Percentiles  25           18750.00
             50           32500.00
             75           55000.00

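After selecting "Values are group midpoints" in the Frequencies procedure, the median and the percentiles change: the median (50th percentile) moves from 32500.00 to 34583.33, the 25th percentile from 18750.00 to 17668.27, and the 75th percentile from 55000.00 to 60079.91. The mean, mode, and standard deviation stay the same.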


1.5

The exact income of each family in the sample is not known. The dataset does not record each family's specific income; it only gives the midpoint of the income range into which the family falls. This midpoint cannot represent the individual income of each family member, nor the family's exact income; it only stands for a range, or an approximate income for the whole family. Therefore, without additional information, the exact family income cannot be determined from the provided dataset.

Question 2

2.1

Statistics: Hours per day watching TV

N Valid      906
N Missing    513

Hours per day watching TV

                  Frequency   Percent   Valid Percent   Cumulative Percent
Valid     0              54       3.8             6.0                  6.0
          1             189      13.3            20.9                 26.8
          2             238      16.8            26.3                 53.1
          3             159      11.2            17.5                 70.6
          4             115       8.1            12.7                 83.3
          5              54       3.8             6.0                 89.3
          6              30       2.1             3.3                 92.6
          8              22       1.6             2.4                 96.1
          10             13        .9             1.4                 97.6
          12             13        .9             1.4                 99.3
          24              1        .1              .1                100.0
          Total         906      63.8           100.0
Missing   NAP           486      34.2
          NA             27       1.9
          Total         513      36.2
Total                  1419     100.0

As can be seen from the frequency table above, the figure that stands out is 12, which corresponds to 12 hours a day spent watching television. The number of individuals who watch TV for 0 to 10 hours is fairly high, and the frequencies fall sharply from 11 hours onwards. Among the values from 11 to 24, however, only 12 has a noticeably higher frequency, which strikes us as unusual.

Most people watch 6 hours or less per day, with a cumulative percentage of 92.6.

It is also strange that there is an observation at 24 hours, since watching television around the clock leaves no time for sleep and is not plausible.

2.2

Based on the above frequency table, we can infer the following (using the Valid Percent column).

Of the people who answered the question:

o 6.0% don't watch any television
o 53.1% watch TV for two hours or less
o 16.7% watch TV for five hours or more, since 83.3% (the cumulative percent at four hours) watch four hours or less

Of the people who watch TV (that is, excluding the value 0, which leaves 906 - 54 = 852 respondents):

o 22.2% watch TV for one hour (189/852)
o 82.3% watch TV for four hours or less (701/852)
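These percentages can be recomputed directly from the frequency counts, which avoids adding up rounded values. The sketch below is illustrative only; the helper names `pct` and `up_to` are hypothetical. The shares among people who watch TV use 906 - 54 = 852 as the denominator, because respondents reporting 0 hours are excluded.

```python
# Frequencies for 0 to 4 hours, copied from the "Hours per day watching TV" table.
counts = {0: 54, 1: 189, 2: 238, 3: 159, 4: 115}
n_valid = 906                      # respondents who answered the question
n_watchers = n_valid - counts[0]   # respondents who watch at least some TV (852)


def pct(part, whole):
    """Percentage of whole represented by part, rounded to one decimal."""
    return round(100 * part / whole, 1)


def up_to(hours):
    """Number of respondents watching `hours` hours or less (valid for hours <= 4)."""
    return sum(c for h, c in counts.items() if h <= hours)


none = pct(counts[0], n_valid)
two_or_less = pct(up_to(2), n_valid)
five_or_more = pct(n_valid - up_to(4), n_valid)
one_hour_watchers = pct(counts[1], n_watchers)
four_or_less_watchers = pct(up_to(4) - counts[0], n_watchers)

print(none, two_or_less, five_or_more)            # 6.0 53.1 16.7
print(one_hour_watchers, four_or_less_watchers)   # 22.2 82.3
```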

2.3

Statistics: Hours per day watching TV

N Valid           906
N Missing         513
Median            2.00
Mode              2
Percentiles  25   1.00
             50   2.00
             75   4.00
             95   8.00

In a data distribution, a percentile is the value below which a specified proportion of observations falls. SPSS offers several methods, each with its own formula, for calculating percentiles. We compute the 25th, 50th, 75th, and 95th percentiles for the variable tvhours using the Frequencies procedure, which uses a weighted average algorithm to determine percentiles (as displayed in the SPSS output above).

As can be seen from the results in the SPSS output view:

The value of the 25th percentile is 1.00

The value of the 50th percentile is 2.00

The value of the 75th percentile is 4.00

The value of the 95th percentile is 8.00

The median is 2.00 hours

The mode is 2 hours
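As a rough cross-check, the mode and the lower percentiles can be estimated by hand from the frequency counts. The sketch below uses a simple "first value whose running count reaches the target" rule, which is an assumption made for illustration and not the weighted average algorithm SPSS applies; the function name `value_at_percentile` is hypothetical. Only the rows for 0 to 6 hours are used, so percentiles that fall beyond 6 hours are reported as `None`.

```python
# Frequencies for 0 to 6 hours, copied from the "Hours per day watching TV" table.
counts = {0: 54, 1: 189, 2: 238, 3: 159, 4: 115, 5: 54, 6: 30}
n_valid = 906  # total number of valid responses reported in the table

# Mode: the value with the largest frequency.
mode = max(counts, key=counts.get)


def value_at_percentile(counts, n, p):
    """Return the first value whose running count reaches p percent of n."""
    target = n * p / 100
    running = 0
    for value in sorted(counts):
        running += counts[value]
        if running >= target:
            return value
    return None  # the percentile lies beyond the rows listed in `counts`


print(mode)                                      # 2
print(value_at_percentile(counts, n_valid, 25))  # 1
print(value_at_percentile(counts, n_valid, 50))  # 2
print(value_at_percentile(counts, n_valid, 75))  # 4
```

For these data the simple rule agrees with the SPSS output for the 25th, 50th, and 75th percentiles; the two approaches need not agree in general.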

2.4

[Figure: bar chart of Hours per day watching TV. Categories on the horizontal axis: 0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 14, 15, 20, 24; vertical axis ticks from 50 to 250]

There are a few problems with this bar chart:

o A few outliers with low frequencies exist, spread over a large number of discrete values

o Some values (9, 13, 16, 17, 18, 19, 21, 22, 23) are missing since they do not occur in the survey answers. The bar chart above lacks gaps that reflect these unobserved values, which might mislead readers at first glance

o The real form of the distribution is difficult to determine because the frequencies for the higher values are very low
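The list of values that never occur can be derived mechanically from the categories shown on the horizontal axis of the bar chart. The short sketch below assumes hours are whole numbers from 0 to 24; the variable names are hypothetical.

```python
# Categories shown on the horizontal axis of the bar chart.
observed = [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 14, 15, 20, 24]

# Whole-hour values between 0 and 24 that no respondent reported.
never_reported = [h for h in range(25) if h not in observed]
print(never_reported)  # [9, 13, 16, 17, 18, 19, 21, 22, 23]
```

A bar chart draws one bar per observed category and places them side by side, so these nine values leave no visible gap on the axis.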


2.5

[Figure: histogram of Hours per day watching TV, horizontal axis from 5 to 25. Std. Dev. = 2.56, N = 906]

The values are clumped together because most respondents (96.1%) watch TV for 8 hours or less a day, while a few extreme values of up to 24 hours stretch the horizontal axis. The distribution is therefore skewed to the right (positively skewed): the observations cluster on the left of the histogram and a long tail extends to the right.

A histogram is a better display for these data than a bar chart because it also shows the values with no responses. It therefore gives readers a clearer overview of the distribution across the full range of possible outcomes.


Question 3

Title:

1 INTRODUCTION

In today's fiercely competitive business environment, gaining insights into customer perceptions and purchase outcomes holds immense significance for enterprises, especially those operating in the B2B sector.

Our research focuses on three elements: geography, customer segment, and revenue. Our objective is therefore to present the influence of customer type and region on customers' decisions to purchase products.

The current study aims to fill the gap in the literature by addressing the following specific objectives:

1. Examining the relationship between customer type and purchasing habits, measured by revenue

2. Examining the relationship between geographical regions in the US and customer purchasing power for different products

3. Recommending a specific strategic plan for the company to expand its market share with respect to customer type and geographical region

2 ANALYTICAL TECHNIQUES & DESCRIPTIVE STATISTICS

2.1 What is the relationship between customer type and line total?

To examine this relationship, we use a frequency table and a custom table.

Customer Type

                        Frequency   Percent   Valid Percent   Cumulative Percent
Valid   (blank)                 8        .2              .2                   .2
        Club                  589      11.8            11.8                 11.9
        Distributor           908      18.1            18.1                 30.1
        Export                274       5.5             5.5                 35.5
        Online                874      17.5            17.5                 53.0
        Wholesale            2354      47.0            47.0                100.0
        Total                5007     100.0           100.0

We treat line total as the dependent variable (the outcome we want to measure or predict) and customer type as the independent variable (a factor that may have an impact on the dependent variable). We chose this factor because we believe that customer type is directly related to line total.

There are five customer types: Club, Distributor, Export, Online, and Wholesale. Wholesale customers seem to have the most transactions across all line total ranges: they account for about 47% of all transactions, nearly half, while Export is the least common at 5.5%.

* Custom Tables

[DataSet1]

                                        Customer Type
                   Club          Distributor   Export        Online        Wholesale
LineTotal   Mean   1410.2203     1367.9475     1360.9597     1359.8465     1405.2199
            Sum    830619.7381   1242036.313   372902.9508   1188505.854   3307887.628

Our research mainly analyses the market share of each customer group through the average value (Mean) and the total value (Sum) of revenue for each customer type.

Looking at total revenue, we can see the share of each of the five customer types. The table above shows that this company relies mainly on Distributor and Wholesale customers, because these two types have the largest market share.

Explanation:

Arithmetic mean:

\[ \mu = \frac{\sum_{i=1}^{n} x_i}{n} \]

where x_i is the line total of transaction i and n is the number of transactions.

The line total of Club has the highest Mean but the second-lowest Sum, so its number of transactions is low. The line totals of Distributor, Export, Online, and Wholesale have similar Means (1350 - 1411), but Distributor and Wholesale have higher Sums. Therefore, the sales volume of these two customer types is the highest, and they bring in the largest revenue.

From the table, the mean revenue does not differ much between customer types. However, the sums differ greatly. The reason is that the number of transactions varies across customer types, so when the mean is multiplied by the number of transactions, the sum is affected.
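The link between mean, number of transactions, and sum can be illustrated with a short calculation. This is a sketch under stated assumptions: the transaction counts come from the Customer Type frequency table, and the means are read from the custom table with the decimal points restored (the scan dropped them), so the products are approximations and not the SPSS sums themselves.

```python
# (customer type, number of transactions, mean line total)
# Counts: Customer Type frequency table. Means: custom table, with decimal
# points restored from the scanned output (an assumption of this sketch).
customer_types = [
    ("Club", 589, 1410.2203),
    ("Distributor", 908, 1367.9475),
    ("Export", 274, 1360.9597),
    ("Online", 874, 1359.8465),
    ("Wholesale", 2354, 1405.2199),
]

# Sum = mean * number of transactions, up to rounding of the reported mean.
approx_sum = {name: n * mean for name, n, mean in customer_types}

# Rank customer types from largest to smallest approximate total revenue.
ranked = sorted(approx_sum, key=approx_sum.get, reverse=True)

for name in ranked:
    print(f"{name:12s} {approx_sum[name]:14.2f}")
```

The means differ by less than 4% across customer types, while the counts range from 274 to 2354, so the ranking of the sums is driven almost entirely by the number of transactions.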

In conclusion, Wholesale brings the most revenue for the company. However, this is only because the company has a large number of orders from Wholesale customers. On average, all customer types bring almost the same revenue per order.


Date posted: 12/08/2024, 14:35