NATIONAL ECONOMICS UNIVERSITY
ADVANCED EDUCATIONAL PROGRAMS
BUSINESS STATISTICS Group Mid-term Assignment
Team members:
Question 1: Answer:
1. Make a frequency table for the variable. Does the frequency table make sense? Does it make sense to make a histogram of the variable? A bar chart?
The frequency table for the variable
TOTAL FAMILY INCOME FOR LAST YEAR

                              Frequency  Percent  Valid Percent  Cumulative Percent
Valid    UNDER $1,000                17      1.2            1.3                 1.3
         $1,000 TO 2,999             17      1.2            1.3                 2.5
         $3,000 TO 3,999              9       .6             .7                 3.2
         $4,000 TO 4,999              7       .5             .5                 3.7
         $5,000 TO 5,999             13       .9            1.0                 4.7
         $6,000 TO 6,999             19      1.3            1.4                 6.1
         $7,000 TO 7,999             17      1.2            1.3                 7.3
         $8,000 TO 9,999             40      2.8            3.0                10.3
         $10,000 TO 12,499           58      4.1            4.3                14.5
         $12,500 TO 14,999           56      3.9            4.1                18.7
         $15,000 TO 17,499           50      3.5            3.7                22.4
         $17,500 TO 19,999           54      3.8            4.0                26.4
         $20,000 TO 22,499           42      3.0            3.1                29.5
         $22,500 TO 24,999           59      4.2            4.4                33.8
         $25,000 TO 29,999           79      5.6            5.8                39.7
         $30,000 TO 34,999           86      6.1            6.4                46.0
         $35,000 TO 39,999           82      5.8            6.1                52.1
         $40,000 TO 49,999          119      8.4            8.8                60.9
         $50,000 TO 59,999          108      7.6            8.0                68.8
         $60,000 TO 74,999          111      7.8            8.2                77.0
         $75,000 TO 89,999           66      4.7            4.9                81.9
         $90,000 TO 109,999          45      3.2            3.3                85.2
         $110,000 OR OVER            76      5.4            5.6                90.8
         REFUSED                    124      8.7            9.2               100.0
         Total                     1354     95.4          100.0
Missing  DK                          49      3.5
         NA                          16      1.1
         Total                       65      4.6
Total                              1419    100.0
- The frequency table does make sense, since income is a continuous variable. The frequencies are usable and give us understandable information. Raw data contain so much detail that analysing them is difficult, whereas the summarised table is concise and still reflects the original data accurately. Moreover, the frequency, the percentage and the cumulative percentage help us strip away details and identify the family-income groups in a general way.
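The Percent, Valid Percent and Cumulative Percent columns follow directly from the frequencies. The Python sketch below rebuilds them, assuming the counts transcribed from the SPSS table; `frequency_table` is a hypothetical helper, not an SPSS function. Percent divides by all 1,419 respondents, Valid Percent by the 1,354 valid answers (REFUSED is counted as valid, as in the SPSS output), and Cumulative Percent is the running total of valid cases.

```python
# Frequencies transcribed from the SPSS table for TOTAL FAMILY INCOME FOR LAST YEAR
labels = [
    "UNDER $1,000", "$1,000 TO 2,999", "$3,000 TO 3,999", "$4,000 TO 4,999",
    "$5,000 TO 5,999", "$6,000 TO 6,999", "$7,000 TO 7,999", "$8,000 TO 9,999",
    "$10,000 TO 12,499", "$12,500 TO 14,999", "$15,000 TO 17,499", "$17,500 TO 19,999",
    "$20,000 TO 22,499", "$22,500 TO 24,999", "$25,000 TO 29,999", "$30,000 TO 34,999",
    "$35,000 TO 39,999", "$40,000 TO 49,999", "$50,000 TO 59,999", "$60,000 TO 74,999",
    "$75,000 TO 89,999", "$90,000 TO 109,999", "$110,000 OR OVER", "REFUSED",
]
counts = [17, 17, 9, 7, 13, 19, 17, 40, 58, 56, 50, 54,
          42, 59, 79, 86, 82, 119, 108, 111, 66, 45, 76, 124]
MISSING = 49 + 16  # DK + NA


def frequency_table(labels, counts, missing):
    """Return (label, frequency, percent, valid %, cumulative %) rows, SPSS-style."""
    valid_total = sum(counts)
    grand_total = valid_total + missing
    rows, running = [], 0
    for label, f in zip(labels, counts):
        running += f
        rows.append((
            label,
            f,
            round(100 * f / grand_total, 1),        # Percent: share of all respondents
            round(100 * f / valid_total, 1),        # Valid Percent: share of valid answers
            round(100 * running / valid_total, 1),  # Cumulative Percent of valid answers
        ))
    return rows


table = frequency_table(labels, counts, MISSING)
for row in table:
    print("{:<20}{:>6}{:>7}{:>7}{:>7}".format(*row))
```

The cumulative column is computed from running counts rather than by adding rounded percentages, so rounding errors do not accumulate.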
- The Bar Chart (Total family income as the variable):
[Figure: Bar chart of TOTAL FAMILY INCOME FOR LAST YEAR]
[Figure: Histogram of TOTAL FAMILY INCOME FOR LAST YEAR; Std. Dev = 5.636, N = 1,354]
- The histogram of total family income makes sense because income is continuous data, and a histogram is the best tool for analysing continuous data. It shows the pattern of family income, for example which income ranges have a higher frequency density than others. The histogram above gives every bar the same width because its horizontal axis shows the category codes, not the actual income values. If we used the actual income values on the horizontal axis, the bars would not all have the same width, because the income classes have unequal widths.
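When the horizontal axis uses actual dollar amounts, the usual way to handle unequal classes is to draw each bar's height as frequency density (frequency divided by class width), so that bar area rather than height is proportional to the count. A minimal Python sketch, assuming the class limits from the frequency table; the open-ended "$110,000 OR OVER" class and REFUSED are left out because they have no finite width, and `density_per_1000` is a hypothetical helper name.

```python
# (lower limit, upper limit exclusive, frequency) for a few classes from the table
classes = {
    "$10,000 TO 12,499": (10_000, 12_500, 58),
    "$40,000 TO 49,999": (40_000, 50_000, 119),
    "$60,000 TO 74,999": (60_000, 75_000, 111),
}


def density_per_1000(lower, upper, freq):
    """Frequency per $1,000 of class width: the bar height for unequal classes."""
    width_in_thousands = (upper - lower) / 1000
    return freq / width_in_thousands


for label, (lo, hi, f) in classes.items():
    print(f"{label:<20} width ${hi - lo:>6,}  frequency {f:>3}  "
          f"density {density_per_1000(lo, hi, f):.1f} per $1,000")
```

Although $40,000 TO 49,999 has about twice the frequency of $10,000 TO 12,499, its density is only about half as high, which is exactly what equal-width bars on the code axis hide.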
- A bar chart of this variable does not make sense for analysing the data, since bar charts are suited to qualitative and discrete data, not continuous data.
=> It makes sense to make a histogram rather than a bar chart, because the histogram is the best tool for continuous data.
2. What is the scale of measurement for the variable?
The scale of measurement for the income variable is ordinal, because we can categorise the data and rank them from the lowest income class to the highest, but we cannot say anything about the intervals between the ranks. For example, the income classes range from under $1,000 to $110,000 or over, and the widths of the classes are unequal.
3. What descriptive statistics are appropriate for describing this variable, and why? Does it make sense to compute a mean?
- Among the median, mode and mean, the total family income variable is best described by the median. When data are skewed, as here, the mean is dragged in the direction of the skew, while the median is much less affected. In such situations the median is generally considered the best representative of the central location of the data. The mean, mode and median are 16.13, 24 and 17 respectively; these are category codes rather than dollar amounts (code 17 is the $35,000 TO 39,999 class, and code 24, the mode, is REFUSED). In this case it does not make sense to compute a mean: the variable is ordinal and its distribution is negatively skewed (skewness = -0.608), so the mean of the codes is pulled toward the lower income classes and has no meaningful dollar interpretation.
The mean, median and mode of the TOTAL FAMILY INCOME FOR LAST YEAR variable:

Frequencies
[DataSet1] C:\Users\Asus\Downloads\gss.sav
Statistics: TOTAL FAMILY INCOME FOR LAST YEAR
N Valid                    1354
N Missing                    65
Mean                      16.13
Median                    17.00
Mode                         24
Skewness                  -.608
Std. Error of Skewness     .066
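The summary statistics in the SPSS output above can be reproduced from the frequency table, which also shows what the mean actually averages. A minimal Python sketch, assuming the category codes run from 1 (UNDER $1,000) to 24 (REFUSED) in table order; this coding is inferred from the reported median of 17 and mode of 24, not stated in the output.

```python
from statistics import mean, median, mode

# Assumed coding: 1-23 are income classes (1 = UNDER $1,000 ... 23 = $110,000 OR OVER),
# 24 = REFUSED. Frequencies are transcribed from the frequency table.
counts = [17, 17, 9, 7, 13, 19, 17, 40, 58, 56, 50, 54,
          42, 59, 79, 86, 82, 119, 108, 111, 66, 45, 76, 124]

# Expand the grouped data back into one code per respondent (1,354 valid cases)
codes = [code for code, f in enumerate(counts, start=1) for _ in range(f)]

print(len(codes))             # number of valid cases
print(round(mean(codes), 2))  # mean of the category codes, not of dollars
print(median(codes))          # middle code: the $35,000 TO 39,999 class
print(mode(codes))            # most frequent code: REFUSED
```

Because the 124 REFUSED answers enter the mean as the value 24, the 16.13 is an average of labels rather than of incomes.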
4. Discuss the advantages and disadvantages of recording income in this manner. Describe other ways of recording income and the problems associated with each of them.
Advantages:
- Income is recorded in clearly defined classes, which lets us see how the frequency of this variable varies across income levels.
- Because income is recorded in many narrow classes, the curve that illustrates this variable follows the data closely.
- Recording in this manner shows the differences between values of this variable more precisely, since it divides income into small ranges.
Disadvantages:
- The classes do not have a single width: some span $1,000 (e.g. $3,000 TO 3,999) while others span $10,000 or more (e.g. $40,000 TO 49,999), so a bar chart of this variable cannot show the distribution with full accuracy.
- Because the recording is so detailed, it is hard to see the overall picture: the data are divided into many small classes.
- The histogram of this variable suffers from the same problem: the values are clumped together, and we can only see the trend easily by using the fitted curve.
Alternative methods:
- We can use a single class width of $5,000 (e.g. $0 TO 4,999, $5,000 TO 9,999) for all values of the variable instead of different class widths.
(Problem: this scheme may not describe the trend or compare values of the variable in as much detail or as accurately as the original method. Furthermore, the low-income range would be split into fewer classes than in the original scheme.)
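The equal-width alternative can only be applied to the existing GSS data where the old class limits nest inside the new ones. A minimal Python sketch, assuming $5,000-wide classes starting at $0; `regroup` is a hypothetical helper. Classes from $40,000 upward (for example $40,000 TO 49,999) cannot be split into $5,000 classes without the raw incomes, so they are left out.

```python
# Original classes below $40,000: (lower limit, upper limit exclusive, frequency)
original = [
    (0, 1_000, 17), (1_000, 3_000, 17), (3_000, 4_000, 9), (4_000, 5_000, 7),
    (5_000, 6_000, 13), (6_000, 7_000, 19), (7_000, 8_000, 17), (8_000, 10_000, 40),
    (10_000, 12_500, 58), (12_500, 15_000, 56), (15_000, 17_500, 50),
    (17_500, 20_000, 54), (20_000, 22_500, 42), (22_500, 25_000, 59),
    (25_000, 30_000, 79), (30_000, 35_000, 86), (35_000, 40_000, 82),
]


def regroup(classes, width):
    """Merge classes into equal-width bins; each class must fit inside one bin."""
    bins = {}
    for lower, upper, freq in classes:
        start = (lower // width) * width
        if upper > start + width:
            raise ValueError(f"class {lower}-{upper} straddles a {width} bin edge")
        bins[start] = bins.get(start, 0) + freq
    return bins


for start, freq in regroup(original, 5_000).items():
    print(f"${start:,} TO {start + 4_999:,}: {freq}")
```

The eight classes below $10,000 collapse into two (50 and 89 families), which is the loss of detail described in the Problem note.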
- We can divide families into groups by number of members and then record the income of each group. This may describe the data more accurately and in more detail.
(Problem: this method is harder to record than the original one, since it takes more time to divide families into smaller groups.)
Question 2: Answer:
(a) The frequency table of the hours of television watched
Hours per day watching TV

                  Frequency  Percent  Valid Percent  Cumulative Percent
Valid    0               54      3.8            6.0                 6.0
         1              189     13.3           20.9                26.8
         2              238     16.8           26.3                53.1
         3              159     11.2           17.5                70.6
         4              115      8.1           12.7                83.3
         5               54      3.8            6.0                89.3
         6               30      2.1            3.3                92.6
         7               10       .7            1.1                93.7
         8               22      1.6            2.4                96.1
         10              13       .9            1.4                97.6
         11               3       .2             .3                97.9
         12              13       .9            1.4                99.3
         14               2       .1             .2                99.6
         15               2       .1             .2                99.8
         20               1       .1             .1                99.9
         24               1       .1             .1               100.0
         Total          906     63.8          100.0
Missing  NAP            486     34.2
         NA              27      1.9
         Total          513     36.2
Total                  1419    100.0
- The values in the table above that we consider unusual are 14, 15, 20 and 24. From our analysis, these four values lie far from the bulk of the data and stand out as exceptional cases.
- Besides, since the question asked GSS respondents how many hours per day they watch TV, it is hardly plausible for someone to watch TV for more than 14 hours a day.
(b)
- According to the data, of all GSS respondents who answered the question on hours per day watching TV, 6% do not watch any television, 53.1% watch 2 hours or less, whereas 16.7% watch 5 hours or more.
- Among people who watch TV, 22.2% watch one hour and 82.3% watch four hours or less.
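The percentages in (b) can be recomputed from the valid frequencies. A minimal Python sketch, assuming the 906 valid answers from the SPSS table; "people who watch TV" is taken to mean respondents reporting at least 1 hour (852 people), which reproduces the 22.2% and 82.3% figures. `pct` is a hypothetical helper.

```python
# Hours per day watching TV -> frequency (906 valid answers), from the SPSS table
tv = {0: 54, 1: 189, 2: 238, 3: 159, 4: 115, 5: 54, 6: 30, 7: 10, 8: 22,
      10: 13, 11: 3, 12: 13, 14: 2, 15: 2, 20: 1, 24: 1}


def pct(part, whole):
    """Percentage rounded to one decimal place, as in the SPSS output."""
    return round(100 * part / whole, 1)


n = sum(tv.values())
watchers = n - tv[0]  # respondents who watch at least 1 hour of TV

print(pct(tv[0], n))                                          # watch no TV
print(pct(sum(f for h, f in tv.items() if h <= 2), n))        # 2 hours or less
print(pct(sum(f for h, f in tv.items() if h >= 5), n))        # 5 hours or more
print(pct(tv[1], watchers))                                   # 1 hour, among watchers
print(pct(sum(f for h, f in tv.items() if 1 <= h <= 4), watchers))  # 4 hours or less, among watchers
```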
(c)
[SPSS output: Frequencies, Statistics for Hours per day watching TV (N Valid, N Missing, Percentiles)]
From the frequency table, the 25th, 50th, 75th and 95th percentiles are 1.00, 2.00, 4.00 and 8.00. The median is 2.00 and the mode is 2.
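Percentiles can be read off the frequency table by rebuilding one value per respondent. A minimal Python sketch using the standard library's `statistics.quantiles`; its `method="exclusive"` option interpolates at rank (n + 1)p, which we believe matches the default percentile definition in SPSS FREQUENCIES, though that equivalence is an assumption here.

```python
from statistics import median, mode, quantiles

# Hours per day watching TV -> frequency (906 valid answers), from the SPSS table
tv = {0: 54, 1: 189, 2: 238, 3: 159, 4: 115, 5: 54, 6: 30, 7: 10, 8: 22,
      10: 13, 11: 3, 12: 13, 14: 2, 15: 2, 20: 1, 24: 1}

# One value per respondent, rebuilt from the frequency table
hours = [h for h, f in tv.items() for _ in range(f)]

# 99 cut points; cuts[p - 1] is the p-th percentile, interpolated at rank (n + 1) * p / 100
cuts = quantiles(hours, n=100, method="exclusive")
for p in (25, 50, 75, 95):
    print(f"{p}th percentile: {cuts[p - 1]:.2f}")
print("median:", median(hours), "mode:", mode(hours))
```

A quick check against the Cumulative Percent column: the 75th percentile is the first value whose cumulative percentage reaches 75% (4 hours, 83.3%), and the 95th is the first to reach 95% (8 hours, 96.1%).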
(d) The bar chart below shows how many hours per day people spend watching TV. The tallest bar shows that nearly 250 people (238) watch TV for 2 hours per day, the most common daily time. Most people watch TV for around 1 to 4 hours per day, and the more hours of viewing, the fewer people report them.
[Figure: Bar chart of Hours per day watching TV; vertical axis: Frequency]
The figure shows that the right-hand tail is longer than the left-hand tail, which typically means the skewness is positive. When data are skewed right, the mean is larger than the median; here the mean is 3.00 and the median is 2.00.
Frequencies
Statistics: Hours per day watching TV
N Valid                     906
N Missing                   513
Mean                       3.00
Median                     2.00
Mode                          2
Skewness                  2.460
Std. Error of Skewness     .081
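The positive skewness reported by SPSS can be checked from the same frequencies. A minimal Python sketch, assuming SPSS reports the adjusted Fisher-Pearson coefficient G1 together with the standard error sqrt(6n(n-1)/((n-2)(n+1)(n+3))); with these frequencies the results should come out close to the reported 2.460 and .081. `sample_skewness` and `skewness_std_error` are hypothetical helper names.

```python
from math import sqrt
from statistics import mean, median

# Hours per day watching TV -> frequency (906 valid answers), from the SPSS table
tv = {0: 54, 1: 189, 2: 238, 3: 159, 4: 115, 5: 54, 6: 30, 7: 10, 8: 22,
      10: 13, 11: 3, 12: 13, 14: 2, 15: 2, 20: 1, 24: 1}
hours = [h for h, f in tv.items() for _ in range(f)]


def sample_skewness(xs):
    """Adjusted Fisher-Pearson skewness G1 = sqrt(n(n-1))/(n-2) * m3 / m2**1.5."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n  # population 2nd central moment
    m3 = sum((x - mu) ** 3 for x in xs) / n  # population 3rd central moment
    g1 = m3 / m2 ** 1.5
    return sqrt(n * (n - 1)) / (n - 2) * g1


def skewness_std_error(n):
    """Standard error of G1 for a sample of size n from a normal population."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


print(round(mean(hours), 2), median(hours))  # mean exceeds median: right skew
print(round(sample_skewness(hours), 3), round(skewness_std_error(len(hours)), 3))
```

A skewness about 30 times its standard error leaves little doubt that the distribution is skewed to the right.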
(e) The bars of the histogram touch because they represent continuous data. A histogram divides the range of possible values in a data set into classes or groups. It looks similar to a vertical bar graph, but when the variable is continuous there are no gaps between the bars. When the variable is discrete, however, gaps should be left between the bars.
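The binning step described above can be sketched in a few lines. A minimal Python sketch, assuming 4-hour bins chosen only for illustration (SPSS chooses its own bin widths for the histogram in the output); `histogram_counts` is a hypothetical helper. Every bin between 0 and 28 hours is listed, including empty ones, so the bins are contiguous, which is why histogram bars touch.

```python
from collections import Counter

# Hours per day watching TV -> frequency (906 valid answers), from the SPSS table
tv = {0: 54, 1: 189, 2: 238, 3: 159, 4: 115, 5: 54, 6: 30, 7: 10, 8: 22,
      10: 13, 11: 3, 12: 13, 14: 2, 15: 2, 20: 1, 24: 1}
hours = [h for h, f in tv.items() for _ in range(f)]


def histogram_counts(values, width, upper):
    """Count values in contiguous bins [k*width, (k+1)*width) below `upper`."""
    bins = Counter((v // width) * width for v in values if v < upper)
    # List every bin, even empty ones: consecutive bins share an edge, so no gaps
    return [(start, start + width, bins[start]) for start in range(0, upper, width)]


for lo, hi, count in histogram_counts(hours, width=4, upper=28):
    print(f"[{lo:>2}, {hi:>2}) hours: {count}")
```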
[Figure: Histogram of Hours per day watching TV; Std. Dev = 2.56, N = 906]
We can notice three differences between histograms and bar charts:
- Histograms are used to show distributions of variables while bar charts are used to compare variables
- Histograms plot binned quantitative data while bar charts plot categorical data
- Bars can be reordered in bar charts but not in histograms