
Ministry of Education and Training
National Economics University

GROUP MID-TERM EXAM
SUBJECT: STATISTICS

Students:
Hoang Lé Anh Tho - 11203780
Nguyễn Vũ Thùy Linh - 11205851
Nguyễn Nguyệt Minh - 11206121
Nguyễn Hải Nam - 11206222
Nguyễn Phương Thảo - 11206947
Nguyễn Quang Anh - 11204425

Class: Advanced Finance 62C

Ha Noi - 03/2022

Question 1: [...] income in the year before the survey)

1. Make a frequency table for the variable. Does the frequency table make sense? Does it make sense to make a histogram of the variable? A bar chart?

FREQUENCY TABLE

Statistics: TOTAL FAMILY INCOME FOR LAST YEAR
N    Valid      1354
     Missing      65

TOTAL FAMILY INCOME FOR LAST YEAR

                              Frequency   Percent   Valid Percent   Cumulative Percent
Valid     $10000 TO
          $12500 TO
          $15000 TO
          $17500 TO
          $20000 TO
          $22500 TO
          $25000 TO
          $30000 TO 39999
          $40000 TO 49999         119        8,4        8,8             60,9
          $50000 TO 59999         108        7,6        8,0             68,8
          $60000 TO 74999         111        7,8        8,2             77,0
          $75000 TO $89999         66        4,7        4,9             81,9
          $90000 - $109999         45        3,2        3,3             85,2
          $110000 OR OVER          76        5,4        5,6             90,8
          REFUSED                 124        8,7        9,2            100,0
          Total                  1354       95,4      100,0
Missing   DK                       49        3,5
          NA                       16        1,1
          Total                    65        4,6
Total                            1419      100,0

=> It makes sense to use this frequency table. Because income takes so many distinct values, a grouped frequency table is needed to define the income groups clearly. The frequency, percent, and cumulative percent columns together give a general picture of how families are distributed across the income groups.
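As an illustration of how the three percentage columns relate to the frequencies, the sketch below recomputes them for the seven categories whose frequencies are reported in the table. It assumes that Percent divides by all 1419 cases, that Valid Percent divides by the 1354 valid cases, and that the cumulative column starts from the valid cases in the lower categories. The names `rows`, `pct`, and `table` are illustrative only and are not part of the SPSS output.

```python
# Frequencies for the last seven valid categories, as reported in the table.
rows = [
    ("$40000 TO 49999", 119),
    ("$50000 TO 59999", 108),
    ("$60000 TO 74999", 111),
    ("$75000 TO $89999", 66),
    ("$90000 - $109999", 45),
    ("$110000 OR OVER", 76),
    ("REFUSED", 124),
]
N_VALID = 1354   # valid cases
N_TOTAL = 1419   # valid + missing cases


def pct(count, base):
    """Percentage of `count` in `base`, rounded to one decimal place."""
    return round(100 * count / base, 1)


# Valid cases in the categories below $40000 (their rows are not listed here).
running = N_VALID - sum(freq for _, freq in rows)

table = []
for label, freq in rows:
    running += freq
    table.append(
        (label, freq, pct(freq, N_TOTAL), pct(freq, N_VALID), pct(running, N_VALID))
    )

for row in table:
    print(row)
```

Under these assumptions the recomputed values agree with the Percent, Valid Percent, and Cumulative Percent figures in the table, which suggests the columns were matched to the right categories.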

Histogram

[Histogram of total family income; y-axis: Frequency. Mean = 16,13, Std. Dev. = 5,636, N = 1354]

=> A histogram is the graph most often used to show a frequency distribution. The X-axis is divided into intervals that cover the scale of the measurements, while the Y-axis shows how many values fall within each interval. Although it is possible to make an equal-width histogram of the income variable, since family income is generally continuous data with the same class intervals, it is not advisable and does not make sense in this situation. The missing values are not shown clearly, because they lie among the non-missing values, which could in turn cause misunderstandings. Therefore, it does not make sense to use a histogram for this variable.

BAR CHART

[Bar chart of total family income by category; y-axis: Frequency, from 0 to 120]

=> Using a bar chart in this situation makes sense. A bar chart is a viable option when you want to show a distribution of data points or compare values across categories. With the bar chart, we can see which group has the highest frequency and how the groups compare with one another. Despite the many values on the X-axis, we can also easily identify the trend and draw a conclusion.

2. What is the scale of measurement for the variable?

The scale of measurement for the variable income is ordinal, because the incomes have been divided into ordered categories that are not measured values but labels. Moreover, the gss.sav file itself lists the scale of measurement for income as ordinal.

3. What descriptive statistics are appropriate for describing this variable, and why? Does it make sense to compute a mean?

The four types of descriptive statistics are Measures of Frequency, Measures of Central Tendency, Measures of Dispersion or Variation, and Measures of Position. For describing this variable, the Measures of Central Tendency (Median and Mode) are the most suitable and the closest to our purpose, which is to determine the most often stated response.

In this situation, computing a mean does not make sense. There are several reasons to choose the median and mode instead:

• The extreme classes are open-ended (under $1,000 and $110,000 or above), so we have no information about the values inside them.

• The histogram is heavily skewed to the right, and the mean is a poor descriptive statistic for skewed distributions.

• There are 65 missing values in this frequency table, so a mean may give inaccurate results.
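The first reason can be made concrete with a small sketch. The counts below are hypothetical and are NOT the GSS frequencies. They only show that a midpoint-based mean depends on whatever value is assumed for the open-ended top class, while the median class does not. The names `classes`, `grouped_mean`, and `median_class` are illustrative.

```python
# Hypothetical grouped income data (NOT the GSS counts): (lower, upper, frequency).
# upper=None marks an open-ended top class such as "$110000 OR OVER".
classes = [
    (0, 19_999, 30),
    (20_000, 49_999, 40),
    (50_000, 109_999, 20),
    (110_000, None, 10),
]


def grouped_mean(classes, open_class_value):
    """Midpoint-based mean; `open_class_value` is an assumed stand-in for the open class."""
    total = sum(freq for _, _, freq in classes)
    weighted = 0.0
    for lower, upper, freq in classes:
        midpoint = open_class_value if upper is None else (lower + upper) / 2
        weighted += midpoint * freq
    return weighted / total


def median_class(classes):
    """First class at which the running count reaches half of all cases."""
    total = sum(freq for _, _, freq in classes)
    running = 0
    for lower, upper, freq in classes:
        running += freq
        if running >= total / 2:
            return (lower, upper)


mean_low = grouped_mean(classes, 120_000)   # assume top earners average $120,000
mean_high = grouped_mean(classes, 500_000)  # assume top earners average $500,000

print(mean_low, mean_high)    # the mean moves with the assumption
print(median_class(classes))  # the median class stays the same
```

With these made-up counts the two assumptions give very different means, while the median class is unaffected, which is why the median is the safer summary for a variable recorded in open-ended ranges.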

4. Discuss the advantages and disadvantages of recording income in this manner. Describe other ways of recording income and the problem associated with each of them.

- On the one hand, the data can be presented in a comprehensible manner, since it has been divided into small ranges with logical detail. The categories are also easy to see in the Data View, where each one takes up little room as a short label. On the other hand, recording income this way has drawbacks. A mean computed from the ranges will be inexact, and incomes inside the same range cannot be distinguished from one another, because every observation is simply assigned to a “class”. Equally wide intervals help people read and interpret the graph easily, but they are not essential.

- There are other ways to record income. The first is to compile the data by the occupation of the family members. For example, we divide family members into categories such as Adults (>18 years old) and Teenagers (<18 years old), and then divide each category into smaller sectors according to career. Another way is to collect the data by age. With this method, the ages are divided into ranges such as <18, 18 - 24, 24 - 30, 30 - 36, and so on, and a survey is then made to gather the information. Although each method has its own advantages, the problem they share is that the detailed procedures make the information slow to synthesize and potentially expensive to collect. Furthermore, the mean values will be invalid because the variance between variables is so large.

Question 2: In the gss.sav file, the variable tvhours tells you how many hours per day GSS respondents say they watch TV.

a, Make a frequency table of the hours of television watched. Do any of the values strike you as strange? Explain.

FREQUENCY TABLE

Statistics: Hours per day watching TV
N    Missing    513

Hours per day watching TV

[Frequency table with columns Frequency, Percent, Valid Percent, Cumulative Percent]

From the frequency table, the value that strikes us as strange is 12. The number of people is high for 0 to 10 hours of TV and falls significantly for 11 to 24 hours. Within that 11 to 24 range, however, 12 has an unusually high frequency.

b, Based on the frequency table, answer the following questions: Of the people who answered the question, what percentage don’t watch any television? What percentage watch two hours or less? Five hours or more? Of the people who watch TV, what percentage watch one hour? What percentage watch four hours or less?

- Based on the frequency table, of the people who answered this question:

• The percentage of people who do not watch any television is 6%

• The percentage of people who watch two hours or less is 53,1%

• The percentage of people who watch five hours or more is 100% - (6% + 20,9% + 26,3% + 17,5% + 12,7%) = 16,6%

- Based on the frequency table, of the people who watch TV (the value 0 of the variable is eliminated):

• The percentage of people who watch one hour is 20,9% / (100% - 6%) × 100% = 22,2%

• The percentage of people who watch four hours or less is (20,9% + 26,3% + 17,5% + 12,7%) / (100% - 6%) × 100% = 82,3%
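The arithmetic behind these answers can be checked with a short sketch that uses only the valid percentages quoted above for 0 to 4 hours. The variable names are illustrative. The rescaling step assumes that "people who watch TV" means the valid respondents minus those who answered 0 hours.

```python
# Valid percentages for 0 to 4 hours per day, as quoted in the answers above.
valid_pct = {0: 6.0, 1: 20.9, 2: 26.3, 3: 17.5, 4: 12.7}

# Among everyone who answered the question.
none_at_all = valid_pct[0]
two_or_less = sum(valid_pct[h] for h in (0, 1, 2))
five_or_more = 100 - sum(valid_pct.values())

# Among people who watch TV: drop the 0-hour group and rescale to the rest.
watchers = 100 - valid_pct[0]
one_hour = 100 * valid_pct[1] / watchers
four_or_less = 100 * sum(valid_pct[h] for h in (1, 2, 3, 4)) / watchers

print(round(five_or_more, 1))  # five hours or more
print(round(one_hour, 1))      # one hour, among watchers
print(round(four_or_less, 1))  # four hours or less, among watchers
print(round(two_or_less, 1))   # two hours or less
```

Summing the rounded percentages gives 53,2% for two hours or less, while the answer above reports 53,1%. A likely explanation is that the reported figure was read from the Cumulative Percent column, which SPSS computes from the raw counts before rounding.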

c, From the frequency table, estimate the 25th, 50th, 75th, and 95th percentiles. What is the value for the Median? The Mode?

We will calculate the 25th, 50th, 75th, and 95th percentiles for the variable tvhours with the Frequencies option, which generates percentiles, median, and mode (as shown in the SPSS data view above)

Statistics: Hours per day watching TV
N              Valid    906
Percentiles    25       1,00
               50       2,00

In conclusion, the SPSS output view shows:

• The value for the 25th percentile is 1

• The value for the 50th percentile is 2

• The value for the median is 2

• The value for the mode is 2
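Percentiles can also be estimated by hand from a frequency table: accumulate the valid percentages and take the first value at which the cumulative percent reaches the target. The sketch below applies this rule to the valid percentages quoted in part (b) for 0 to 4 hours. The function `percentile_from_table` is a hypothetical helper, and the rule is a simplification, since SPSS may interpolate differently. The table rows above 4 hours are not available here, so the 95th percentile cannot be obtained from these figures.

```python
# Valid percentages for 0 to 4 hours per day, as quoted in part (b).
valid_pct = {0: 6.0, 1: 20.9, 2: 26.3, 3: 17.5, 4: 12.7}


def percentile_from_table(pct_by_value, p):
    """Smallest value whose cumulative percent reaches p, or None if p
    lies beyond the rows that are listed."""
    cumulative = 0.0
    for value in sorted(pct_by_value):
        cumulative += pct_by_value[value]
        if cumulative >= p:
            return value
    return None


p25 = percentile_from_table(valid_pct, 25)
p50 = percentile_from_table(valid_pct, 50)  # the 50th percentile is the median
p95 = percentile_from_table(valid_pct, 95)  # None: needs rows above 4 hours

# The mode is the value with the largest share. The unlisted values together
# account for only 16,6%, so none of them can exceed the 26,3% at 2 hours.
mode = max(valid_pct, key=valid_pct.get)

print(p25, p50, p95, mode)
```

Under this rule the 25th percentile, the median, and the mode agree with the SPSS values listed above.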

d, Make a bar chart of the hours of TV watched. What problem do you see with this display?

BAR CHART

[Bar chart of hours per day watching TV; y-axis from 0 to 250; x-axis categories: 0 1 2 3 4 5 6 7 8 10 11 12 14 15 20 24]

There are a few problems with this bar chart:

• A few outliers with low frequencies take up a large number of the discrete values on the axis.

• Some values are missing (9, 13, 16, 17, 18, 19, 21, 22, 23) because they do not appear in the survey responses.

• Because the higher values have low frequencies, the true shape of the distribution is difficult to understand.
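The second problem can be verified directly from the categories on the bar chart's X-axis. The sketch below assumes the full scale runs from 0 to 24 hours. It lists the values that have no bar and shows that the bar for 24 hours sits in the 16th position rather than at 24 on a true numeric scale. The variable names are illustrative.

```python
# Values that appear as categories on the bar chart's X-axis.
observed = [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 14, 15, 20, 24]

# Assumed full scale for hours per day: every whole number from 0 to 24.
full_scale = range(0, 25)

# Values nobody reported, so the bar chart has no slot for them.
observed_set = set(observed)
missing = [hour for hour in full_scale if hour not in observed_set]

# A bar chart places categories side by side, so each bar's position is its
# index in the list of observed values, not its numeric value.
position_of_24 = observed.index(24)

print(missing)
print(position_of_24)  # zero-based position of the bar for 24 hours
```

Because the gaps are closed up, the distance between neighbouring bars no longer reflects the distance between the values they stand for.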

e, Make a histogram of the hours of TV watched. What causes all of the values to be clumped together? Compare this histogram to the bar chart you generated in question 2d. Which is a better display for these data?

Histogram

[Histogram of hours per day watching TV; x-axis: Hours per day watching TV, from 5 to 25; y-axis from 0 to 250. Mean = 3, Std. Dev. = 2.56, N = 908]

All of the values in the histogram are clumped together because the distribution is skewed right: most values lie at the left of the scale and the right tail is long. The positively skewed histogram shows that most of the people who took the survey watch television for 1 to 4 hours, while very few of them watch for more than 10 hours.

From our point of view, a histogram is a better choice than a bar chart in this situation. As noted in part (d), the bar chart does not represent the values that nobody reported, whereas the histogram does. Because the histogram shows both the distribution of the collected values and the gaps where no values were collected, it is more appropriate than the bar chart.

Question 3: Many factors contribute to the happiness of a marriage (variable hapmar in the gss.sav file). Select one of the factors in the gss.sav file and a descriptive tool to answer the question “what distinguishes people who are very happy with their marriage from those who are less content”. Write at most 5 sentences to justify your selection and findings.

Bar Chart

[Bar chart; x-axis: HAPPINESS OF MARRIAGE (VERY HAPPY, PRETTY HAPPY, NOT TOO HAPPY); y-axis scale from 10000 to 60000]

==> To answer this question, we chose family income (variable incomdol in the gss.sav file) as the factor that determines the happiness of a marriage. We chose it because we believe that income has a direct correlation with people's happiness. A difference in income should therefore lead to a clear difference in people's satisfaction with their marriage, giving a chart with a clear trend from which conclusions are easy to draw. According to the figure above, people who are more content with their marriage are people with higher income, which we attribute to their having fewer money worries and being able to focus more on their marital happiness. In conclusion, a higher income will most likely lead to a higher level of happiness in marriage.

Title	Make a frequency table for the variable. Does the frequency table make sense? Does it make sense to make a histogram of the variable? A bar chart?
Authors	Hoang Lộ Anh Tho, Nguyễn Vũ Thùy Linh, Nguyễn Nguyệt Minh, Nguyễn Hải Nam, Nguyễn Phương Thảo, Nguyễn Quang Anh
School	National Economics University
Major	Statistics
Type	Mid-term Exam
Year of publication	2022
City	Ha Noi
