Ministry of Education and Training
National Economics University
===========
GROUP 6 MID-TERM EXAM
SUBJECT: STATISTICS
Members: Bùi Mai Phương - 11219549
Đặng Linh Phương - 11210143
Bùi Nhật Quang - 11210142
Đỗ Minh Quang - 11219550
Đỗ Huy Phúc - 11219
Nguyễn Thị Thu Quỳnh - 11219551
Class: Financial economics 63
HÀ NỘI, 05/2022
Question 1:
1
Statistics
TOTAL FAMILY INCOME FOR LAST YEAR
Missing: 65

TOTAL FAMILY INCOME FOR LAST YEAR
Income group | Frequency | Percent | Valid Percent | Cumulative Percent
$10000 TO 12499 | | | |
$12500 TO 14999 | | | |
$15000 TO 17499 | | | |
$17500 TO 19999 | | | |
$20000 TO 22499 | | | |
$22500 TO 24999 | | | |
$25000 TO 29999 | | | |
$30000 TO 34999 | | | |
$35000 TO 39999 | | | |
$40000 TO 49999 | | | |
$50000 TO 59999 | | | |
$60000 TO 74999 | | | |
$75000 TO $89999 | | | |
$90000 - $109999 | | | |
$110000 OR OVER | | | |
REFUSED | 124 | 8.7 | 9.2 | 100.0
-> It makes sense to use this frequency table, because income data can take so many different values that grouping them is the clearest way to summarise them. A grouped frequency table defines the income groups precisely, and its associated columns (percent, valid percent and cumulative percent) describe each family income group both in detail and overall.
-> It does not make sense to use a histogram for this variable. A histogram is the most frequently used graph for showing frequency distributions: the x-axis represents intervals on the scale of the measured values, and the y-axis shows how many values fall within each interval. In this case, however, the x-axis would show the assigned group codes rather than the actual amount of family income last year, and the missing-value category (such as REFUSED) is not clearly separated because it lies among the non-missing values, which can lead to misunderstandings.
-> A bar chart makes sense in this situation, because it shows the distribution of observations across the discrete income groups while also allowing the groups to be compared. From the bar chart, we can see the most common group, the direction of the trend and how the groups compare with each other.
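The category labels also show that the income brackets are not equally wide, which is one more reason a histogram of the group codes can mislead. The Python sketch below is a minimal illustration, assuming the labels are written as in the frequency table: it extracts the two dollar limits from each closed bracket and computes the bracket's width. bracket_width is a hypothetical helper of our own; the open-ended $110000 OR OVER row and REFUSED are left out.

```python
import re

# Closed income brackets, copied from the frequency table labels.
# "$110000 OR OVER" (open-ended) and REFUSED are deliberately left out.
LABELS = [
    "$10000 TO 12499", "$12500 TO 14999", "$15000 TO 17499", "$17500 TO 19999",
    "$20000 TO 22499", "$22500 TO 24999", "$25000 TO 29999", "$30000 TO 34999",
    "$35000 TO 39999", "$40000 TO 49999", "$50000 TO 59999", "$60000 TO 74999",
    "$75000 TO $89999", "$90000 - $109999",
]

def bracket_width(label: str) -> int:
    """Width in dollars of a closed bracket, e.g. '$10000 TO 12499' -> 2500."""
    low, high = (int(number) for number in re.findall(r"\d+", label))
    return high - low + 1

for label in LABELS:
    print(f"{label:>18}: {bracket_width(label):>6,} dollars wide")
```

The widths range from $2,500 to $20,000, yet a chart drawn over the group codes gives every bracket the same width on the x-axis.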
2
The scale of measurement for this variable is ordinal: income is not collected directly as a numeric value but divided into ordered categories, each of which is assigned a label.
3
The descriptive statistics appropriate for describing this variable are measures of central location, specifically the Median and the Mode:
- The Median and the Mode are easy to compute in this case (the Mode is unique and the sample is not too small).
- The histogram is highly skewed to the right, and for skewed distributions the Mean is a poor descriptive statistic. In addition, the frequency table reports 65 missing values, so using the Mean may give an inaccurate result. We therefore do not choose it.
- The Median is generally considered the best representation of the middle of the data in skewed distributions.
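For an ordinal variable, the Median and the Mode are categories rather than dollar amounts. A minimal Python sketch, using hypothetical counts for five brackets (the real frequencies are in the SPSS output and are not reproduced here) and treating REFUSED as missing: the Median is the first category whose cumulative share of valid responses reaches 50%, and the Mode is the category with the largest count. The names ordinal_median and ordinal_mode are illustrative.

```python
from itertools import accumulate

# HYPOTHETICAL frequencies for illustration only; labels follow the table and are
# listed in their natural (ordinal) order. REFUSED is excluded as missing.
COUNTS = {
    "$10000 TO 12499": 20,
    "$12500 TO 14999": 25,
    "$15000 TO 17499": 35,
    "$17500 TO 19999": 40,
    "$20000 TO 22499": 30,
}

def ordinal_median(counts):
    """First category whose cumulative share of valid responses reaches 50%."""
    total = sum(counts.values())
    for label, running in zip(counts, accumulate(counts.values())):
        if running / total >= 0.5:
            return label
    raise ValueError("frequency table is empty")

def ordinal_mode(counts):
    """Category with the largest count (the first one listed wins a tie)."""
    return max(counts, key=counts.get)

print("Median category:", ordinal_median(COUNTS))
print("Mode category:", ordinal_mode(COUNTS))
```

This is the same reading one would do by hand on the Cumulative Percent column: move down the table until it first reaches 50%.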
4
● The advantages and disadvantages of recording income in this manner
Advantages
- The data can be presented in an easily understandable way, since they have been divided into small, logically ordered ranges.
- Anyone looking at the data view can easily see that each response takes up less space, because it is stored as a short category label rather than an exact amount.
Disadvantages
- Any Mean calculated from the ranges will deviate from the true Mean, because each observation is only known to lie somewhere within its range, and it becomes difficult to distinguish between individual values.
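- Every observation is assigned to a “class”, so its exact value is lost. Moreover, the intervals are not all the same width (for example, $10000 TO 12499 spans $2,500 while $40000 TO 49999 spans $10,000); equal widths are not required, but they would make the graph easier to read and comprehend.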
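The loss of precision can be made concrete. With grouped data, the true Mean is only known to lie between the mean computed from the lower limits of the brackets and the mean computed from their upper limits; a common point estimate places every observation at its bracket's midpoint (the same idea behind the midpoint-recoded variable incomdol used in Question 3). The Python sketch below uses hypothetical counts for four closed brackets; grouped_mean is an illustrative helper, and an open-ended bracket such as $110000 OR OVER would additionally need an assumed value.

```python
# HYPOTHETICAL grouped income data: (lower limit, upper limit, count) in dollars.
# Only closed brackets are used; an open-ended bracket would need an assumed value.
BRACKETS = [
    (10000, 12499, 4),
    (12500, 14999, 6),
    (15000, 17499, 5),
    (40000, 49999, 5),
]

def grouped_mean(brackets, point):
    """Weighted mean in which every bracket is represented by point(lower, upper)."""
    total = sum(count for _, _, count in brackets)
    return sum(point(lo, hi) * count for lo, hi, count in brackets) / total

lower = grouped_mean(BRACKETS, lambda lo, hi: lo)           # everyone at the bracket floor
upper = grouped_mean(BRACKETS, lambda lo, hi: hi)           # everyone at the bracket ceiling
mid = grouped_mean(BRACKETS, lambda lo, hi: (lo + hi) / 2)  # midpoint assumption

print(f"True mean lies between {lower:,.0f} and {upper:,.0f}; midpoint estimate: {mid:,.1f}")
```

The wider the brackets, the wider the gap between the lower and upper bounds, which is exactly the deviation described above.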
● Other ways to record income
1. Compiling data on the occupations of the members of a family
-> Divide members of a household into different categories
For example:
- Adults (18 years old and over)
- Teenagers (<18 years old)
…
-> Divide each category further into smaller sectors and focus on the members' careers in each one
2. Collecting data by age
-> Divide family members into different age groups and conduct a survey to gather income information
For example:
16 – 18 (<18), 18 – 24, 24 – 30, 30 – 36, …
● The problems associated with them
- Because these techniques are complex, synthesising the data will take a long time.
- They may require a large amount of expenditure.
- The mean values may be invalid because the variance between variables is so large.
Question 2:
Frequencies: Hours per day watching TV (columns: Frequency, Percent, Valid Percent, Cumulative Percent)
a) Observing the frequency table, the value that strikes us as strange is 12. From 1 to 5 hours, the frequencies are high (over 50). From 2 to 24 hours of watching TV, the frequency tends to decrease, and from 11 to 24 hours the frequencies are much smaller than in the other intervals (below 3). The only exception is 12 hours, with an unusually high frequency of 13.
b) Based on the frequency table, the valid percentages of people who:
Don’t watch any TV: 6%
Watch 2 hours or less: 53.1%
Watch 5 hours or more: 100% - 83.3% = 16.7%
Watch 1 hour: 20.9%
Watch 4 hours or less: 83.3%
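Because hours are recorded as whole numbers, the figures above follow from the cumulative valid percents with two simple rules: "k hours or more" is 100% minus "(k - 1) hours or less", and "exactly k hours" is the difference between consecutive cumulative percents. The Python sketch below encodes only the values quoted in this part; the cumulative figure for 1 hour (26.9%) is derived as 6% + 20.9% rather than read from the table, and the function names are illustrative.

```python
# Cumulative valid percents taken from part b); hours are whole numbers.
# CUM_VALID_PERCENT[k] = valid percent of respondents watching k hours or less.
# The value for k = 1 is derived, not read off the table: 6.0 + 20.9 = 26.9.
CUM_VALID_PERCENT = {0: 6.0, 1: 26.9, 2: 53.1, 4: 83.3}

def percent_at_least(k):
    """Valid percent watching k hours or more: 100 minus '(k - 1) hours or less'."""
    return round(100.0 - CUM_VALID_PERCENT[k - 1], 1)

def percent_exactly(k):
    """Valid percent watching exactly k hours: difference of consecutive cumulative percents."""
    previous = CUM_VALID_PERCENT[k - 1] if k > 0 else 0.0
    return round(CUM_VALID_PERCENT[k] - previous, 1)

print(percent_at_least(5))  # 16.7
print(percent_exactly(1))   # 20.9
```

Rounding to one decimal place keeps floating-point noise (for example 100.0 - 83.3 evaluating to 16.700000000000003) out of the reported percentages.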
c)
Statistics: Hours per day watching TV, Percentiles: 25th = 1.00
In conclusion:
The value of the 25th percentile: 1.00
The value of the 50th percentile: 2.00
The value of the 75th percentile: 4.00
The value of the 90th percentile: 6.00
The value of the Median: 2
The value of the Mode: 2
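For readers who want to reproduce percentiles outside SPSS, the Python sketch below implements one common definition, a weighted average at rank (n + 1)p, on a small hypothetical sample (not the survey data). This rule matches the "exclusive" method of Python's statistics.quantiles, which the sketch uses as a cross-check; SPSS supports several percentile definitions, so values computed this way may differ slightly from the output above. percentile_n_plus_1 is an illustrative name.

```python
import math
import statistics

# HYPOTHETICAL hours-per-day answers, for illustration only (not the survey data).
HOURS = [0, 1, 1, 2, 2, 2, 3, 4, 5, 6, 12]

def percentile_n_plus_1(data, p):
    """Percentile at fractional rank (n + 1) * p, interpolating between neighbours.

    Ranks below 1 or above n are clamped to the smallest or largest value.
    """
    xs = sorted(data)
    n = len(xs)
    rank = (n + 1) * p          # 1-based, possibly fractional
    if rank <= 1:
        return float(xs[0])
    if rank >= n:
        return float(xs[-1])
    lo = math.floor(rank)       # whole part of the rank
    frac = rank - lo            # weight given to the next value up
    return float(xs[lo - 1] + frac * (xs[lo] - xs[lo - 1]))

for pct in (25, 50, 75, 90):
    print(f"{pct}th percentile: {percentile_n_plus_1(HOURS, pct / 100):.2f}")

print("Median:", statistics.median(HOURS))  # middle value of the sorted data
print("Mode:", statistics.mode(HOURS))      # most frequent value
# Cross-check the quartiles against the standard library's (n + 1)p method.
print(statistics.quantiles(HOURS, n=4, method="exclusive"))
```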
d) Problems with the bar chart
In general, the data are not evenly distributed. As can be seen from the bar chart, most respondents watch TV for 1 to 4 hours per day, whereas only a minority watch TV for more than 10 hours.
Moreover, the values (9, 13, 16, 17, 18, 19, 21, 22, 23) are not included in the bar chart, because no respondent gave these answers (possibly because the number of respondents is not large enough). The problem is therefore that the bar chart does not show gaps for these unreported values, which can lead to misunderstandings.
In addition, it is hard to see the trend in the bar chart, as there are many bars with very unequal heights.
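One way to avoid this problem is to tabulate every possible whole hour from 0 to 24, giving unreported hours a frequency of zero so that they appear as visible gaps. The Python sketch below does this with collections.Counter on a hypothetical list of answers (not the survey data) and prints a simple text bar chart; counts_with_gaps is an illustrative helper name. With a plotting library, the same idea means passing all 25 hour positions to the chart, not only the observed ones.

```python
from collections import Counter

# HYPOTHETICAL answers (hours of TV per day), for illustration only.
ANSWERS = [0, 1, 1, 2, 2, 2, 3, 4, 4, 5, 8, 12, 12, 24]

def counts_with_gaps(values, low=0, high=24):
    """Frequency of every whole hour from low to high, including hours nobody reported."""
    tally = Counter(values)
    return {hour: tally[hour] for hour in range(low, high + 1)}  # Counter gives 0 for absent keys

freq = counts_with_gaps(ANSWERS)
unreported = [hour for hour, count in freq.items() if count == 0]

# A plain-text bar chart: hours with zero count still get a row, so gaps stay visible.
for hour, count in freq.items():
    print(f"{hour:2d} | {'#' * count}")
print("Hours nobody reported:", unreported)
```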
In the histogram, all the values are clumped together, because a histogram represents a continuous data set: it is a graphical representation that shows the frequency of numerical data. This differs from a bar chart, which is a pictorial representation that uses bars to compare different categories of data; in a bar chart the bars do not touch each other, so there are spaces between them.
Also, the histogram is positively skewed with a long tail to the right, which means that most of the values lie on the left. Observing the histogram, we conclude that most people taking the survey watch TV for 1 to 4 hours a day, while very few watch more than 10 hours.
Bar charts and histograms both display data, but for different purposes. Bar charts allow us to compare specific variables or categories, while histograms allow us to understand the distribution of a variable or the frequency of specific occurrences. In this case, a histogram is a better choice than a bar chart, because there are many categories and it is more important to understand the distribution of the variable (how many hours of TV people most commonly watch).
Question 3:
To distinguish people who are very happy with their marriage from those who are less content, our group chose mean family income, with ranges recoded to midpoints (the variable incomdol).
After researching, we believe that income is directly correlated with people's happiness. The chart shows a clear trend: a difference in income corresponds to a marked difference in people's level of satisfaction with their marriage.
As can be seen from the figure, people who are happier with their marriage have higher incomes than those who are less happy. The reason might be that people with higher incomes have fewer financial worries and can focus more on building marital happiness with their partners.
To conclude, a high level of happiness in marriage depends heavily on people's income.
Besides mean family income, happiness in marriage also depends on Mean Husband and Wife's Education (yrs). The two bar charts above show a clear trend that husbands and wives with more years of education are more likely to have a happy marriage. However, the difference is not as pronounced as it is for mean family income.
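The comparisons in this question come down to computing a mean within each marital happiness group. A minimal Python sketch, assuming hypothetical records of the form (happiness group, family income in dollars) rather than the actual survey data; group_means is an illustrative name, and the same function could be applied to years of education. The sketch only computes group means; it does not test whether the difference between them is statistically significant.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL (marital happiness, family income in dollars) records; the numbers
# are illustrative and are not taken from the survey.
RECORDS = [
    ("very happy", 45000.0), ("very happy", 67500.0), ("very happy", 32500.0),
    ("less happy", 27500.0), ("less happy", 45000.0), ("less happy", 21250.0),
]

def group_means(records):
    """Mean family income within each happiness group."""
    groups = defaultdict(list)
    for group, income in records:
        groups[group].append(income)
    return {group: mean(incomes) for group, incomes in groups.items()}

for group, avg in group_means(RECORDS).items():
    print(f"{group}: {avg:,.0f}")
```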