
Ministry of Education and Training
National Economics University

===========

GROUP 6 MID-TERM EXAM

SUBJECT: STATISTICS

Members:
Bùi Mai Phương - 11219549
Đặng Linh Phương - 11210143
Bùi Nhật Quang - 11210142
Đỗ Minh Quang - 11219550
Đỗ Huy Phúc - 11219
Nguyễn Thị Thu Quỳnh - 11219551

Class: Financial Economics 63

HÀ NỘI, 05/2022


Question 1:

1

Statistics
TOTAL FAMILY INCOME FOR LAST YEAR: Missing = 65

TOTAL FAMILY INCOME FOR LAST YEAR
Income range | Frequency | Percent | Valid Percent | Cumulative Percent
$10000 TO 12499
$12500 TO 14999
$15000 TO 17499
$17500 TO 19999
$20000 TO 22499
$22500 TO 24999
$25000 TO 29999
$30000 TO 34999
$35000 TO 39999
$40000 TO 49999
$50000 TO 59999
$60000 TO 74999
$75000 TO 89999
$90000 TO 109999
$110000 OR OVER
REFUSED | 124 | 8.7 | 9.2 | 100.0

-> It makes sense to use this frequency table because income takes so many distinct values that listing each one separately would be unreadable. A grouped frequency table therefore summarises family income in a manageable number of income ranges, and its associated columns (Percent, Valid Percent and Cumulative Percent) describe each income group both individually and as part of the overall distribution.

-> It does not make sense to use a histogram for this variable. A histogram is the most frequently used graph for showing frequency distributions: the x-axis represents intervals covering the range of measured values, and the y-axis shows how often values fall within each interval. In this case, however, the x-axis would not show the quantity we want (the amount of family income last year) but the code assigned to each income group, and the groups are not of equal width (ranges of $2,500 at the low end but $20,000 at the high end). In addition, the REFUSED responses, which are effectively missing answers, are not marked as missing because they sit among the valid values, which can lead to misunderstandings.

-> A bar chart makes sense in this situation because it shows how the data are distributed while also allowing the frequencies of the discrete income groups to be compared. From the bar chart, we can see the most common group, the direction of the trend, and how the other groups compare with each other.
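The Percent, Valid Percent and Cumulative Percent columns mentioned above are related by simple arithmetic. A minimal Python sketch, assuming SPSS-style definitions (Percent divides by all cases including missing ones, Valid Percent divides by non-missing cases only, Cumulative Percent is the running total of Valid Percent); the function name `frequency_table` and the counts are invented for illustration and are not the survey's figures.

```python
def frequency_table(counts, n_missing):
    """Build Percent, Valid Percent and Cumulative Percent columns.

    counts: list of (label, frequency) pairs in their natural order.
    n_missing: number of missing cases (excluded from Valid Percent).
    """
    n_valid = sum(freq for _, freq in counts)
    n_total = n_valid + n_missing
    rows = []
    cumulative = 0.0
    for label, freq in counts:
        percent = 100 * freq / n_total        # share of all cases
        valid_percent = 100 * freq / n_valid  # share of non-missing cases
        cumulative += valid_percent           # running total of Valid Percent
        rows.append((label, freq, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows


# Hypothetical counts for illustration only (not the survey data).
example = [("low", 20), ("middle", 50), ("high", 30)]
for row in frequency_table(example, n_missing=25):
    print(row)
```

Because missing cases are left out of the Valid Percent denominator, the Cumulative Percent always ends at 100.0, as in the REFUSED row of the table.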

2

The scale of measurement for this variable is ordinal: income was not recorded as a numeric amount but as a set of categories (labels) that have a natural order from the lowest to the highest income.

3

The appropriate descriptive statistics for this variable are measures of central location, specifically the median and the mode.


- The median and the mode are easy to find in this case (the mode is unique and the sample is not too small).

- The distribution is highly skewed to the right, and for skewed distributions the mean is a poor descriptive statistic. In addition, the mean cannot be calculated directly from ordered categories, the top category ($110000 OR OVER) has no upper limit, and 65 values are missing, so using the mean may give an inaccurate result. We therefore do not choose it.

- The median is generally regarded as the best measure of the centre of a skewed distribution.
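A minimal Python sketch of how the median and mode can be read off an ordered frequency table. The function name is hypothetical, the median rule used here (the first category whose cumulative frequency reaches half of the valid cases) is one common convention rather than necessarily the one SPSS applies, and the counts in the usage line are invented for illustration.

```python
def ordinal_median_and_mode(counts):
    """Return (median_category, mode_category) for ordered categories.

    counts: list of (label, frequency) pairs sorted from lowest to highest.
    Median: first category whose cumulative frequency reaches half of all cases.
    Mode: category with the largest frequency (the first one wins a tie).
    """
    total = sum(freq for _, freq in counts)
    running = 0
    median = None
    for label, freq in counts:
        running += freq
        if running >= total / 2:
            median = label
            break
    mode = max(counts, key=lambda pair: pair[1])[0]
    return median, mode


# Invented counts for illustration only.
print(ordinal_median_and_mode([("low", 10), ("middle", 30), ("high", 20)]))
```

Neither statistic needs the categories to be numbers, only ordered, which is why both suit an ordinal variable.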

4

● The advantages and disadvantages of recording income in this manner

Advantages

- The data can be presented in an understandable way, since incomes are divided into a small number of logical ranges.

- In the data view, each case is stored as a short category label rather than an exact amount, which is easy to read.

Disadvantages

- The mean can only be approximated from the ranges (for example, by using class midpoints), so it will deviate from the true mean, and incomes within the same range can no longer be distinguished from one another.

- All observations are assigned to a "class". The intervals are not all the same width (ranges of $2,500 at the low end but $20,000 at the high end); equal widths would make graphs easier to read and compare, although they are not strictly required.
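To make the first disadvantage concrete: with grouped data the mean can only be estimated by treating every case as if it sat at its class midpoint. A minimal Python sketch, assuming the midpoint of "$10000 TO 12499" is taken as (10000 + 12500) / 2 = 11250; this convention, the function name and the frequencies are assumptions for illustration. The open-ended "$110000 OR OVER" class has no midpoint, so the sketch refuses to average it until an assumed value is supplied.

```python
def approximate_mean(classes):
    """Estimate the mean of grouped data from class midpoints.

    classes: list of (lower, upper, frequency) with inclusive dollar bounds,
    e.g. (10000, 12499, f). Use upper=None for an open-ended class.
    """
    total = 0
    weighted_sum = 0.0
    for lower, upper, freq in classes:
        if upper is None:
            raise ValueError("open-ended class needs an assumed value before averaging")
        midpoint = (lower + upper + 1) / 2  # assumption: midpoint of [lower, upper + 1)
        weighted_sum += midpoint * freq
        total += freq
    return weighted_sum / total


# Bounds taken from the table; the frequencies are invented.
print(approximate_mean([(10000, 12499, 2), (12500, 14999, 2)]))  # 12500.0
```

Whatever the true incomes inside each range, the estimate is the same, which is exactly why the mean "will have deviations".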

● Other ways to record income

1. Compiling data on the occupation of each family member

-> Divide the members of a household into different categories.

For example:

- Adults (18 years old and over)

- Teenagers (under 18 years old)

…

-> Continue dividing each category into smaller sectors, focusing on the members' occupations in each one.

2. Collecting the data by age

-> Divide family members into different age groups and survey each group to gather income information.

For example:

16 – 18 (<18), 18 – 24, 24 – 30, 30 – 36,

…

● Problems associated with these methods

- Because the techniques are complex, compiling the data will take a long time.

- They may also be expensive to carry out.

- Mean values may be misleading because the variation between the groups is so large.

Question 2:

Frequencies

Hours per day watching TV
Frequency | Percent | Valid Percent | Cumulative Percent

a) Looking at the frequency table, the value that strikes us as strange is 12. For 1 to 5 hours, the frequencies are high (over 50). From 2 to 24 hours, the frequency tends to decrease, and from 11 to 24 hours the frequencies are much smaller than in the other intervals (fewer than 3). Only 12 hours has a high frequency, 13, which is unusual.

b) Based on the frequency table, the valid percentages of people who:

Don't watch any TV: 6%

Watch 1 hour: 20.9%

Watch 2 hours or less: 53.1%

Watch 4 hours or less: 83.3%

Watch 5 hours or more: 100% - 83.3% = 16.7%
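The figures in b) follow from two simple rules: a complement (100% minus a cumulative percent) and a difference of cumulative percents. A minimal Python sketch using only the percentages quoted above; the "1 hour or less" and "exactly 2 hours" values are derived here by arithmetic rather than read from the table.

```python
# Valid percents quoted in part b) (all in %).
NONE = 6.0            # watch no TV
ONE_HOUR = 20.9       # watch exactly 1 hour
TWO_OR_LESS = 53.1    # watch 2 hours or less
FOUR_OR_LESS = 83.3   # watch 4 hours or less

# round(..., 1) removes floating-point noise such as 16.700000000000003
one_or_less = round(NONE + ONE_HOUR, 1)            # cumulative percent up to 1 hour
exactly_two = round(TWO_OR_LESS - one_or_less, 1)  # difference of two cumulative percents
five_or_more = round(100 - FOUR_OR_LESS, 1)        # complement of "4 hours or less"

print(one_or_less, exactly_two, five_or_more)  # 26.9 26.2 16.7
```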

c)

Statistics

Hours per day watching TV

Percentiles: 25th = 1.00

In conclusion:

The 25th percentile: 1.00

The median: 2

The mode: 2
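The 25th percentile and the median can be read off the cumulative percents: each is the smallest value whose cumulative valid percent reaches 25% or 50%. A minimal Python sketch of that rule; the rule itself is an assumption (SPSS may use a different percentile definition, which can matter when a percentile falls exactly on a cumulative boundary), and the cumulative percents are those derived in b) (6.0, 6.0 + 20.9 = 26.9, 53.1). Values of 3 hours and above are omitted because their cumulative percents are not quoted in the text.

```python
def percentile_from_cumulative(cumulative_percent, p):
    """Return the smallest value whose cumulative valid percent is at least p."""
    for value in sorted(cumulative_percent):
        if cumulative_percent[value] >= p:
            return value
    raise ValueError(f"the {p}th percentile lies beyond the values supplied")


# Cumulative valid percents for 0, 1 and 2 hours, derived from part b).
CUMULATIVE = {0: 6.0, 1: 26.9, 2: 53.1}

print(percentile_from_cumulative(CUMULATIVE, 25))  # 1 (25th percentile)
print(percentile_from_cumulative(CUMULATIVE, 50))  # 2 (median)
```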

d) Problems with the bar chart

In general, the data are not evenly distributed. As can be seen from the bar graph below, most respondents watch TV for 1 to 4 hours per day, whereas only a minority watch TV for more than 10 hours.

Moreover, the values 9, 13, 16, 17, 18, 19, 21, 22 and 23 are not included in the bar chart because no respondent gave these answers (this might have occurred because the number of respondents is not large enough). The problem with the bar chart is therefore that it does not show gaps for these unreported values (values with a frequency of zero), which can lead to misunderstandings.

In addition, it is hard to see the trend from the bar chart because there are many bars of very unequal height.
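One way to make the unreported values visible is to count every possible value in the range, filling in zeros. A minimal Python sketch, assuming answers are whole hours from 0 to 24; the function name and the answers list are invented for illustration, not taken from the survey. A plain `Counter` only lists values that occur, much like the bar chart; the dictionary built here keeps a zero entry for every missing value.

```python
from collections import Counter


def full_frequency_table(responses, low, high):
    """Count every integer value from low to high, including values nobody reported."""
    counts = Counter(responses)
    return {value: counts.get(value, 0) for value in range(low, high + 1)}


# Hypothetical answers for illustration only (not the survey data).
answers = [0, 1, 1, 2, 2, 2, 3, 4, 12, 12]
table = full_frequency_table(answers, 0, 24)
for hours, freq in table.items():
    print(f"{hours:>2} hours: {'#' * freq}")  # an empty row marks a gap
```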

In the histogram, the bars are clumped together because a histogram represents a continuous data set: it is a graphical representation that shows the frequency of numerical data over adjacent intervals. This is different from a bar chart, which is a pictorial representation that uses bars to compare different categories of data; in a bar chart the bars do not touch each other, so there are spaces between them.

Also, the histogram is positively skewed, with a long tail to the right, which means that most of the values lie on the left. Looking at the histogram, we conclude that most people taking the survey watch TV for 1 to 4 hours, while very few of them watch for more than 10 hours.

Bar charts and histograms both display data, but for different purposes. Bar charts allow us to compare specific variables or categories, whereas histograms allow us to understand the distribution of a variable or the frequency of specific occurrences. In this case, a histogram is a better choice than a bar chart because there are many categories and it is more important to understand the distribution of the variable (how many hours of TV people typically watch).

Question 3:

To distinguish people who are very happy with their marriage from those who are less content, our group chose mean family income, using the variable incomdol (family income, with ranges recoded to their midpoints).

After researching, we believe that income is closely related to people's happiness. The chart shows a clear trend: a difference in income goes along with a significant difference in people's level of satisfaction with their marriage.

As can be seen from the figure, people who are happier with their marriage have a higher income than those who are less happy. The reason might be that people with higher incomes have fewer financial worries and can focus more on building marital happiness with their partners.

To conclude, a high level of happiness in marriage depends strongly on people's income.
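The comparison in this question amounts to computing the mean of incomdol separately for each marital-happiness group. A minimal Python sketch using the standard `statistics.mean`; the function name, the group labels and the records are invented for illustration (they are not the survey data), and skipping cases with no income value is an assumption about how missing data are handled.

```python
from statistics import mean


def mean_income_by_group(records):
    """Return the mean income for each marital-happiness group.

    records: iterable of (happiness_label, income) pairs, where income is a
    midpoint-coded value like incomdol; cases with no income (None) are skipped.
    """
    groups = {}
    for label, income in records:
        if income is None:
            continue  # missing income: leave the case out of the average
        groups.setdefault(label, []).append(income)
    return {label: mean(values) for label, values in groups.items()}


# Invented records for illustration only (not the survey data).
sample = [
    ("very happy", 45000), ("very happy", 55000),
    ("less happy", 30000), ("less happy", None), ("less happy", 40000),
]
print(mean_income_by_group(sample))
```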

Besides family income, marital happiness also depends on the mean years of education of husbands and wives (Mean Husband and Wife's Education, in years). The two bar charts above show a clear trend: husbands and wives with more years of education are more likely to have a happy marriage. However, the difference is not as large as it is for mean family income.
