
Mid-Term Examination, Business Statistics: in the gss.sav file, the variable tvhours tells you how many hours per day GSS respondents say they watch TV


DOCUMENT INFORMATION

Basic information

Title: Mid-Term Examination Business Statistics
Authors: Mai Le Chau Anh, Nguyen Quynh Anh, Nguyen Ba Gia Bach, Nguyen Ngan Ha, Dinh Bao Ngoc, Ta Thi Minh Thu
Supervisor: Assoc. Prof. Tran Thi Bich
School: National Economics University
Major: Business Statistics
Type: Exam
Pages: 18
Size: 1.99 MB

Content


NATIONAL ECONOMICS UNIVERSITY

-*** -

MID-TERM EXAMINATION BUSINESS STATISTICS

Group 4: 11210447 Mai Le Chau Anh

11215591 Nguyen Quynh Anh

11210899 Nguyen Ba Gia Bach

11211907 Nguyen Ngan Ha

11214285 Dinh Bao Ngoc

11215555 Ta Thi Minh Thu

Class: Advanced International Business Administration 63B

Lecturer: Assoc. Prof. Tran Thi Bich


TABLE OF CONTENTS

PART I: QUESTIONS

PART II: ANSWERS

Question 1
1.1
1.2
1.3
1.4

Question 2
2.1
2.2
2.3
2.4
2.5

Question 3
I INTRODUCTION
II DATA DESCRIPTION
III DESCRIPTIVE STATISTICS
IV ANALYTICAL TECHNIQUES
V FINDINGS AND INSIGHTS
VI CONCLUSION


PART I: QUESTIONS

Question 1: Consider the variable income in the gss.sav file (the variable is total family income in the year before the survey).

1. Make a frequency table for the variable. Does the frequency table make sense? Does it make sense to make a histogram of the variable? A bar chart?

2. What is the scale of measurement for the variable?

3. What descriptive statistics are appropriate for describing this variable and why? Does it make sense to compute a mean?

4. Discuss the advantages and disadvantages of recording income in this manner. Describe other ways of recording income and the problems associated with each of them.

Question 2: In the gss.sav file, the variable tvhours tells you how many hours per day GSS respondents say they watch TV.

1. Make a frequency table of the hours of television watched. Do any of the values strike you as strange? Explain.

2. Based on the frequency table, answer the following questions: Of the people who answered the question, what percentage don’t watch any television? What percentage watch two hours or less? Five hours or more? Of the people who watch TV, what percentage watch one hour? What percentage watch four hours or less?

3. From the frequency table, estimate the 25th, 50th, 75th, and 95th percentiles. What is the value for the Median? The Mode?

4. Make a bar chart of the hours of TV watched. What problem do you see with this display?

5. Make a histogram of the hours of TV watched. What causes all of the values to be clumped together? Compare this histogram to the bar chart you generated in question 2d. Which is a better display for these data?

Question 3:

Find a data set which is related to a specific organizational problem (either at the macro or micro level) and apply all possible descriptive statistical techniques that you think suitable to the problem. Write a short report, which includes the objectives of your analysis, the research questions, and your findings. The maximum length of the report is 5 pages including Tables and Figures.


PART II: ANSWERS

Question 1

1.1

=> This frequency table makes sense because it is a grouped frequency table: the income data contain so many distinct values that grouping is needed to describe the income brackets accurately. Furthermore, the frequencies, percentages, and cumulative percentages reflect the family income categories in general.

A histogram is the most common graph used to display frequency distributions. The intervals on the histogram's X-axis represent the scale of values within which the measurements fall, while the Y-axis represents the number of times values occurred inside each interval. An equal-width histogram of the income variable is achievable, since family income is normally continuous data with the same class intervals, but it is not recommended in this circumstance. The missing values are not shown clearly because they are placed in the same column as the non-missing values, which might lead to misinterpretation. As a result, using a histogram for this variable makes no sense.

BAR CHART

A bar chart is a feasible alternative for displaying a distribution of data points or comparing values across groups. Using the bar chart, we can see which group has the greatest count and how the groups compare with one another. Despite the many categories on the X-axis, we can readily detect the trend and draw a conclusion from this bar chart. So, in this case, using a bar chart makes sense.


1.2

The variable income has an ordinal scale of measurement, since it has been separated into ordered categories that are labeled rather than measured exactly. This matches the scale of measurement recorded for income in the gss.sav file, which is ordinal.

The yellow row is the variable “income”


1.3

- There are 4 types of descriptive statistics:

Measures of Frequency

- Count, Percent, Frequency
- Displays how frequently something occurs
- Use this to display how frequently a response is delivered

Measures of Central Tendency

- Mean, Median, and Mode
- Locates the distribution by various points
- Use this when you want to show an average or most commonly indicated response

Measures of Dispersion or Variation

- Range, Variance, Standard Deviation
- Identifies the spread of scores by stating intervals
- Range = High/Low points
- Variance or Standard Deviation = difference between observed score and mean
- Use this when you want to show how "spread out" the data are. It is helpful to know when your data are so spread out that it affects the mean

Measures of Position

- Percentile Ranks, Quartile Ranks
- Describes how scores fall in relation to one another. Relies on standardized scores
- Use this when you need to compare scores to a normalized score (e.g., a national norm)

=> Measures of central tendency (median and mode) are best suited to describing this variable, since they are closest to our goal of identifying the most frequently reported response.


Median and Mode rather than Mean:

● We have no information on the actual values in the open-ended ranges (less than $1,000 and more than $110,000), where extreme values may occur

● The histogram (drawn in part 1.1) is strongly skewed and the coefficient of skewness is smaller than 0 (negative), which corresponds to a longer left tail. For skewed distributions, the mean is a poor descriptive statistic

● Because this frequency table has 65 missing values, using Mean may produce erroneous results

1.4

Advantages:

● It may help in determining the form and spread of income distribution

● It may help in determining the most common or usual number or range, known as the mode

● It might be useful for comparing income data across several categories, such as gender, age, or occupation

● It can assist in identifying outliers or extreme income numbers that are much higher or lower than the rest of the data

Disadvantages:

● It may overlook some information concerning actual income figures, particularly if they are arranged into ranges or intervals

● Comprehending complex or huge income data sets with multiple values or ranges might be challenging


Question 2 :

2.1 As can be seen from the frequency table below, the figure that stands out is 12, which corresponds to 12 hours a day spent watching television. The number of individuals who watch TV for 0 to 10 hours is fairly high, but the frequencies fall sharply from 11 hours onward. Among the values from 11 to 24, only 12 has an unusually high frequency, which strikes us as strange.

Hours per day watching TV

                  Frequency   Percent   Valid Percent   Cumulative Percent
Valid     0            54       3.8          6.0               6.0
          1           189      13.3         20.9              26.8
          2           238      16.8         26.3              53.1
          3           159      11.2         17.5              70.6
          4           115       8.1         12.7              83.3
          5            54       3.8          6.0              89.3
          6            30       2.1          3.3              92.6
          7            10        .7          1.1              93.7
          8            22       1.6          2.4              96.1
          10           13        .9          1.4              97.6
          12           13        .9          1.4              99.3
          24            1        .1           .1             100.0
          Total       906      63.8        100.0
Missing   NAP         486      34.2
          NA           27       1.9
          Total       513      36.2
Total                1419     100.0


2.2

Of the people who answered the question:

● 6% of the people don’t watch any television

● 53.1% of the people watch TV for two hours or less

● 16.6% of the people watch TV for five hours or more (100% - (6% + 20.9% + 26.3% + 17.5% + 12.7%) = 16.6%)

● 83.4% is the total valid percent of those who watch TV from 0 to 4 hours (6% + 20.9% + 26.3% + 17.5% + 12.7% = 83.4%)

Of the people who watch TV (which means the value 0 is excluded):

● 22.18% watch TV for one hour (189 / (906 − 54) x 100% = 22.18%)

● 82.28% watch TV for four hours or less ((189 + 238 + 159 + 115) / (906 − 54) x 100% = 82.28%)

● 852 is the total number of people who watch TV (906 − 54 = 852)

● 701 is the total number of people who watch TV from 1 to 4 hours per day (189 + 238 + 159 + 115 = 701)
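The percentages in this answer can be re-derived from the raw frequencies. The following is a minimal Python sketch, assuming the counts for 0 to 5 hours and the valid total of 906 are read off the frequency table; the names `freq`, `pct`, and `results` are illustrative and not part of the SPSS output. For the questions about people who watch TV, it divides by the 852 watchers rather than by all 906 valid answers.

```python
# Frequencies copied from the "Hours per day watching TV" table (values 0 to 5);
# the valid total of 906 comes from the table's valid "Total" row.
freq = {0: 54, 1: 189, 2: 238, 3: 159, 4: 115, 5: 54}
valid_total = 906


def pct(count, base):
    """Return count as a percentage of base, rounded to two decimals."""
    return round(100 * count / base, 2)


zero_to_four = sum(freq[h] for h in range(0, 5))  # respondents watching 0-4 hours
watchers = valid_total - freq[0]                  # respondents who watch any TV
one_to_four = sum(freq[h] for h in range(1, 5))   # watchers at 1-4 hours

results = {
    "no TV (of all answers)": pct(freq[0], valid_total),
    "two hours or less (of all answers)": pct(freq[0] + freq[1] + freq[2], valid_total),
    "five hours or more (of all answers)": pct(valid_total - zero_to_four, valid_total),
    "one hour (of watchers)": pct(freq[1], watchers),
    "four hours or less (of watchers)": pct(one_to_four, watchers),
}

for label, value in results.items():
    print(f"{label}: {value}%")
```

Because the sketch works from raw counts rather than from the rounded Valid Percent column, its last digit can differ slightly from sums of the rounded percentages.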

2.3

Statistics Hours per day watching TV

Missing 513

Percentiles 25 1.00

In a data distribution, a percentile is the value below which a specified proportion of values falls. In SPSS, there are several methods, and several equations, for calculating percentiles. Our group computes the 25th, 50th, 75th, and 95th percentiles for the variable tvhours using the Frequencies option, which uses a weighted average algorithm to determine percentiles (as displayed in the SPSS output above).

● The value for the 25th percentile is 1.00

● The value for the 50th percentile is 2.00

● The value for the 75th percentile is 4.00

● The value for the 95th percentile is 8.00

● The value for the Median is 2.00

● The value for the Mode is 2
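These values can also be estimated directly from the frequency table, as the question asks. The sketch below is a simplified cumulative-frequency lookup in Python, not the weighted average algorithm that SPSS applies, although for these data the two agree. It assumes the frequencies for 0 to 10 hours from the table (884 of the 906 valid answers, i.e. the 97.6% cumulative percent shown at 10 hours); the function name `percentile_from_table` is hypothetical.

```python
# Frequencies for 0-10 hours, copied from the frequency table. They cover 884 of
# the 906 valid answers, which is enough for every percentile up to the 97th.
freq = {0: 54, 1: 189, 2: 238, 3: 159, 4: 115, 5: 54, 6: 30, 7: 10, 8: 22, 10: 13}
valid_total = 906


def percentile_from_table(p):
    """Smallest value whose cumulative frequency reaches p percent of valid_total."""
    target = p / 100 * valid_total
    running = 0
    for value in sorted(freq):
        running += freq[value]
        if running >= target:
            return value
    raise ValueError("percentile lies beyond the rows included in freq")


percentiles = {p: percentile_from_table(p) for p in (25, 50, 75, 95)}
median = percentiles[50]             # the median is the 50th percentile
mode = max(freq, key=freq.get)       # the value with the highest frequency

print(percentiles, median, mode)
```

The lookup walks down the cumulative frequency column until it passes the target count, which is the same reading one would do by eye on the Cumulative Percent column.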

2.4

BAR CHART

There are a few problems with this bar chart:

● A few outliers with low frequencies exist among a large number of discrete values

● Some values (9, 13, 16, 17, 18, 19, 21, 22, 23) are missing since they do not occur in the survey answers. The bar chart below lacks gaps that reflect these uncollected data (missing values), which might mislead readers at first glance

● The real form of the distribution is difficult to determine because only low frequencies are shown for the higher classes
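The missing-gap problem can be illustrated without drawing anything. The Python sketch below compares the positions a bar chart would use (one bar per observed value) with a numeric axis running over every whole hour. The list `observed` is an assumption: 0 to 8, 10, 12, and 24 come from the frequency table, while 11, 14, 15, and 20 are inferred from the list of absent values above rather than copied from the SPSS output.

```python
# Values of tvhours assumed to occur in the survey (see the note above on which
# of them are inferred rather than read from the frequency table).
observed = [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 14, 15, 20, 24]

# A bar chart treats each observed value as a category: one bar each, side by side.
bar_chart_positions = observed

# A numeric axis covers every whole hour between the minimum and the maximum.
numeric_axis = list(range(min(observed), max(observed) + 1))

# Hours the bar chart silently skips but a numeric axis would show as gaps.
observed_set = set(observed)
skipped = [h for h in numeric_axis if h not in observed_set]

print(len(bar_chart_positions), "bars versus", len(numeric_axis), "axis positions")
print("skipped hours:", skipped)
```

On a bar chart the bar for 24 hours sits right next to the bar for 20 hours, so the distance between bars no longer reflects the distance between values.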


2.5

This dataset is POSITIVELY SKEWED: most values are clustered at the left of the distribution while the right tail is longer, so the values in the histogram appear clumped together. This indicates that most of the survey respondents watch television for between one and four hours, with only a small percentage watching for more than 10 hours.

In this situation, we believe a histogram performs better than a bar chart. As mentioned in section 2.4, the bar chart DOES NOT show the gaps that represent the uncollected data, whereas the histogram does. Since the histogram can both "show the distributions of the values of data collected" and "show a gap to represent these uncollected data," it is the superior way to display the data.


Question 3:

Title: Understanding Business Perceptions and Purchase Outcomes in a Business-to-Business Context: A Study of HATCO Customers

I INTRODUCTION

In today's fiercely competitive business environment, gaining insights into customer perceptions and purchase outcomes holds immense significance for enterprises, especially those operating in the B2B sector, such as HATCO. HATCO's objective is to conduct an extensive segmentation study encompassing 100 data points across 14 variables.

This study has two core research questions:

1.1 Evaluating Perceptions of HATCO: "How do customers perceive HATCO across various attributes, including delivery speed, pricing, flexibility in negotiations, manufacturer's image, service quality, salesforce image, and product quality? What are the strengths and areas in need of improvement according to customer ratings?"

1.2 Examining Purchase Outcomes: "What are the outcomes of customer interactions with HATCO in terms of usage levels and satisfaction levels? How does this data inform HATCO's market share within its customer base and overall customer satisfaction?"

II DATA DESCRIPTION

2.1 Dataset Origin:

The dataset used in this study was acquired from the fictitious Hair, Anderson, and Tatham Company (HATCO), an industrial supplier created solely for research purposes

2.2 Dataset Structure:

This dataset comprises 100 data points, with each data point associated with 14 variables. These variables can be grouped into three primary categories:

2.2.1 HATCO Perceptions (Variables X1 to X7):

These attributes encompass the speed of product delivery (X1), perceived pricing level (X2), willingness to negotiate prices (X3), the overall image of the manufacturer (X4), service quality (X5), the image of HATCO's salesforce (X6), and product quality (X7)

2.2.2 Purchase Outcomes (Variables X9 and X10):

Two variables capture the outcomes of customer interactions with HATCO:

- X9, "Usage level," quantifies the percentage of a firm's total product purchases made from HATCO, with values ranging from 0 to 100 percent on a 100-point scale

- X10, "Satisfaction level," assesses customer satisfaction with prior purchases from HATCO using a visual rating scale, similar to the one applied to measure perceptions (X1 to X7)

III DESCRIPTIVE STATISTICS

3.1 Perceptions of HATCO

3.1.1 Measures of central tendency (mean, median, mode) and dispersion (range, variance, standard deviation) for variables X1 to X7:

Statistics

                 Delivery   Price    Price         Manufacturer   Service   Salesforce   Product
                 Speed      Level    Flexibility   Image                    Image        Quality
N   Valid        100        100      100           100            100       100          100
    Missing      0          0        0             0              0         0            0
Mean             3.515      2.364    7.894         5.248          2.916     2.665        6.971
Median           3.400      2.150    8.050         5.000          3.000     2.600        7.150
Mode             2.4 (a)    1.3 (a)  9.9           4.5            3.0 (a)   2.5          8.4

(a) Multiple modes exist. The smallest value is shown.

Statistics

                 Delivery   Price    Price         Manufacturer   Service   Salesforce   Product
                 Speed      Level    Flexibility   Image                    Image        Quality
N   Valid        100        100      100           100            100       100          100
    Missing      0          0        0             0              0         0            0
Std Deviation    1.3207     1.1957   1.3865        1.1314         .7513     .7709        1.5852
Variance         1.744      1.430    1.922         1.280          .564      .594         2.513
Range            6.1        5.2      5.0           5.7            3.9       3.5          6.3
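As a quick consistency check on the dispersion figures, the variance of each variable should equal the square of its standard deviation. The Python sketch below tests this, assuming the Service and Salesforce Image entries are read with a leading decimal point (0.7513, 0.7709, 0.564, 0.594); the dictionary name `reported` is illustrative.

```python
# (standard deviation, variance) for X1 to X7 as reported in the statistics table.
# Assumption: Service and Salesforce Image values carry a leading decimal point.
reported = {
    "Delivery Speed": (1.3207, 1.744),
    "Price Level": (1.1957, 1.430),
    "Price Flexibility": (1.3865, 1.922),
    "Manufacturer Image": (1.1314, 1.280),
    "Service": (0.7513, 0.564),
    "Salesforce Image": (0.7709, 0.594),
    "Product Quality": (1.5852, 2.513),
}

# Variance is the squared standard deviation; compare at the table's 3 decimals.
mismatches = [
    name for name, (sd, variance) in reported.items()
    if round(sd ** 2, 3) != variance
]

print("variables whose variance does not match sd squared:", mismatches)
```

An empty list means the two rows of the table agree with each other to three decimal places.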

Based on Pearson's coefficient of skewness, C of S = 3 * (mean - median) / standard deviation, we can calculate the skewness of these data sets:

1. C of S of Delivery Speed = 3 * (3.515 - 3.400) / 1.3207 = 0.26

2. C of S of Price Level = 3 * (2.364 - 2.150) / 1.1957 = 0.54

3. C of S of Price Flexibility = 3 * (7.894 - 8.050) / 1.3865 = -0.34

4. C of S of Manufacturer Image = 3 * (5.248 - 5.000) / 1.1314 = 0.66

5. C of S of Service = 3 * (2.916 - 3.000) / 0.7513 = -0.34

6. C of S of Salesforce Image = 3 * (2.665 - 2.600) / 0.7709 = 0.25

7. C of S of Product Quality = 3 * (6.971 - 7.150) / 1.5852 = -0.34
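The seven calculations above all apply the same formula, so they can be reproduced in one pass. The following Python sketch assumes the means, medians, and standard deviations from the statistics tables (with the Service and Salesforce Image standard deviations read as 0.7513 and 0.7709) and rounds each result to two decimals; `pearson_skewness` is an illustrative helper, not an SPSS function.

```python
# (mean, median, standard deviation) for X1 to X7, copied from the statistics tables.
stats = {
    "Delivery Speed": (3.515, 3.400, 1.3207),
    "Price Level": (2.364, 2.150, 1.1957),
    "Price Flexibility": (7.894, 8.050, 1.3865),
    "Manufacturer Image": (5.248, 5.000, 1.1314),
    "Service": (2.916, 3.000, 0.7513),
    "Salesforce Image": (2.665, 2.600, 0.7709),
    "Product Quality": (6.971, 7.150, 1.5852),
}


def pearson_skewness(mean, median, std_dev):
    """Pearson's second coefficient of skewness: 3 * (mean - median) / std_dev."""
    return 3 * (mean - median) / std_dev


skewness = {name: round(pearson_skewness(*values), 2) for name, values in stats.items()}

for name, value in skewness.items():
    direction = "positive (right tail)" if value > 0 else "negative (left tail)"
    print(f"{name}: {value} -> {direction}")
```

A positive coefficient means the mean lies above the median (longer right tail); a negative one means the mean lies below the median (longer left tail).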

3.1.2 Histograms or bar charts to visualize the distribution of perceptions for each attribute (X1 to X7):

Date posted: 12/08/2024, 14:34
