Assignment: Statistics for Managerial Decision-Making
A sales manager at a bicycle company wants to review how several factors affect sales. Data were collected from 30 stores, where:
UnitsSold: The number of bikes sold
FloorSpace: The area of display (square meters)
CompetingAds: Advertising costs of competitors (thousand USD)
Price: Price of product (USD)
Obs UnitsSold FloorSpace CompetingAds Price
Use the data above to answer the following questions:
1. Use a suitable statistical model to comment on the variables:
All the variables to be analyzed are quantitative, so the appropriate descriptive statistics are the mean, median, quartiles, interquartile range, minimum, maximum, variance, standard deviation, outliers, and a confidence interval for each variable. Using the MegaStat add-in for Excel (MegaStat - Descriptive Statistics) with all the data as the input range, we obtain the following statistical description and boxplot for each variable:
Descriptive statistics
                                UnitsSold (unit)  FloorSpace (m2)  CompetingAds (1000 USD)  Price (USD)
count                                         30               30                       30           30
mean                                  1,146.4333          66.6900                  97.6933   1,136.1333
sample variance                      54,335.9092         362.3864                  46.6669  73,186.1885
sample standard deviation               233.1006          19.0365                   6.8313     270.5295
minimum                                      455              8.2                     85.5          480
maximum                                     1493            102.1                    111.1         1725
range                                       1038             93.9                     25.6         1245
confidence interval 95% lower         1,059.3921          59.5817                  95.1425   1,035.1160
confidence interval 95% upper         1,233.4745          73.7983                 100.2442   1,237.1507
half-width                               87.0412           7.1083                   2.5509     101.0174
1st quartile                          1,022.0000          56.1500                  93.6250     957.0000
median                                1,148.5000          67.5000                  97.5000   1,155.5000
3rd quartile                          1,318.5000          77.2500                 101.0750   1,292.7500
interquartile range                     296.5000          21.1000                   7.4500     335.7500
mode                                        #N/A          84.5000                  98.2000         #N/A
low extremes                                   0                0                        0            0
low outliers                                   1                1                        0            0
high outliers                                  0                0                        0            0
high extremes                                  0                0                        0            0
Based on the descriptive statistics, the distributions are relatively symmetric: for each variable the mean and the median are approximately equal. UnitsSold and FloorSpace each have one low outlier (one store has floor space and sales far below those of the other stores). Price and CompetingAds have no outliers.
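The outlier counts can be checked against the quartiles in the table. The following is a minimal Python sketch, assuming the usual boxplot convention that a value is an outlier when it lies more than 1.5 interquartile ranges beyond a quartile and an extreme when it lies more than 3 interquartile ranges beyond it; that MegaStat applies exactly this rule is an assumption. The helper name `fences` is illustrative, and only the minimum and maximum of each variable are tested because the raw data are not reproduced here.

```python
# Quartiles, minimum and maximum copied from the descriptive statistics table.
summary = {
    "UnitsSold":    {"q1": 1022.0, "q3": 1318.5,  "min": 455.0, "max": 1493.0},
    "FloorSpace":   {"q1": 56.15,  "q3": 77.25,   "min": 8.2,   "max": 102.1},
    "CompetingAds": {"q1": 93.625, "q3": 101.075, "min": 85.5,  "max": 111.1},
    "Price":        {"q1": 957.0,  "q3": 1292.75, "min": 480.0, "max": 1725.0},
}

def fences(q1, q3, k):
    """Return (lower, upper) fences located k interquartile ranges beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

flags = {}
for name, s in summary.items():
    inner_lo, inner_hi = fences(s["q1"], s["q3"], 1.5)  # outlier fences (assumed rule)
    outer_lo, outer_hi = fences(s["q1"], s["q3"], 3.0)  # extreme fences (assumed rule)
    flags[name] = {
        "min_is_low_outlier": outer_lo <= s["min"] < inner_lo,
        "min_is_low_extreme": s["min"] < outer_lo,
        "max_is_high_outlier": inner_hi < s["max"] <= outer_hi,
        "max_is_high_extreme": s["max"] > outer_hi,
    }
    print(name, round(inner_lo, 3), round(inner_hi, 3), flags[name])
```

Under these assumed fences, the minimum of UnitsSold (455) and the minimum of FloorSpace (8.2) fall below the lower outlier fence but not below the extreme fence, while CompetingAds and Price stay inside their fences, which agrees with the counts in the table.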
FloorSpace and CompetingAds have relatively concentrated distributions, with small ranges and small standard deviations; CompetingAds is the most concentrated (smallest range, 25.6).
FloorSpace and CompetingAds each have a mode (84.5 and 98.2, respectively); UnitsSold and Price do not.
2. Use scatter plots to evaluate the linear relationship between sales and each of the remaining factors. Do the plots match what economic theory leads you to expect? Use the correlation coefficient to check the results from the plots:
Using MegaStat in Excel (Correlation/Regression - Scatterplot), we draw scatter plots of bicycle sales (vertical axis) against each of the other variables in turn (horizontal axis): floor space in square meters, competitors' advertising costs in thousands of USD, and price in USD. The results are as follows:
The plot shows an increasing trend, indicating a positive correlation between sales and floor space: the larger the floor space, the more units are sold. This is consistent with economic theory and with practice: because the product is branded, customers tend to go to large stores that offer a wide choice in order to find the most suitable product, so higher sales are associated with a larger display area. From the regression output, the fitted line is Y = 10.382x + 454.035, and the correlation coefficient is positive, reflecting the positive relationship. If the floor space increases by 1 m2, sales increase by 10.382 units (about 10 units) on average. The R2 coefficient is 0.719, which means that 71.9% of the variation in sales is explained by floor space; the remaining 28.1% is due to other factors.
The plot also shows an increasing trend, indicating a positive correlation between sales and competitors' advertising. This seems inconsistent with economic theory (sales usually fall when competitors spend more on advertising). However, the trend is not clear, because the points are widely scattered around the trend line. One possible explanation is that, although competitors' advertising generally reduces sales, for bicycles it has little effect on a store's sales.
From the regression output, the fitted line relating sales to competitors' advertising costs is Y = 8.625x + 303.833; the correlation coefficient is positive, reflecting a positive relationship. If competitors' advertising costs increase by 1,000 USD, the store's sales increase by about 9 units. However, the R2 coefficient is 0.064, showing that only 6.4% of the variation in sales is explained by competitors' advertising costs. This share is small, so the relationship between sales and competitors' advertising costs cannot be confirmed.
The relationship between units sold and price: the plot shows a downward trend, with sales declining as price rises, but the decline is slight (the trend line is nearly horizontal), which suggests that price has little influence on the decision to purchase this product. The direction of the effect is consistent with the theory of demand (if the price increases, sales decrease). The fitted regression line is Y = -0.064x + 1,218.619; it is consistent with the plot, and the correlation coefficient is negative, reflecting an inverse relationship between the two variables. However, the R2 coefficient is 0.005, meaning that only 0.5% of the variation in sales is explained by price.
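The three fitted lines can be cross-checked against the descriptive statistics. For a simple regression of Y on x, the slope b, the correlation r and the intercept a satisfy r = b * s_x / s_y and a = mean(Y) - b * mean(x). The Python sketch below applies these identities to the rounded figures reported in this section; the helper name `check_line` is illustrative, and small differences from the reported values are expected because the slopes are rounded.

```python
# Mean and sample standard deviation of UnitsSold from the descriptive statistics.
mean_y, sd_y = 1146.4333, 233.1006

predictors = {
    # name: (mean, sample standard deviation, reported slope)
    "FloorSpace":   (66.6900, 19.0365, 10.382),
    "CompetingAds": (97.6933, 6.8313, 8.625),
    "Price":        (1136.1333, 270.5295, -0.064),
}

def check_line(mean_x, sd_x, slope):
    """Recover the intercept, r and r squared of a simple regression from its slope."""
    r = slope * sd_x / sd_y              # correlation coefficient
    intercept = mean_y - slope * mean_x  # the fitted line passes through the means
    return intercept, r, r * r

results = {name: check_line(*vals) for name, vals in predictors.items()}
for name, (intercept, r, r2) in results.items():
    print(f"{name}: intercept={intercept:.1f}, r={r:.3f}, R2={r2:.3f}")
```

The recovered values agree with the reported intercepts and R2 coefficients to within rounding, which also confirms that the three correlation coefficients are approximately 0.85, 0.25 and -0.07.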
3. Construct 95% confidence intervals for the means of the variables above and explain the meaning of the results. Estimate the proportion of stores with sales greater than 1,200 units:
Use Confidence interval - mean in MegaStat.
Confidence interval - mean, UnitsSold (unit)
  95%            confidence level
  1146.433333    mean
  233.1006418    std. dev.
  30             n
  2.045          t (df = 29)
  87.0412        half-width
  1,233.4745     upper confidence limit
  1,059.3921     lower confidence limit
Comment: With 95% confidence, the mean number of units sold per store lies in the range 1,146 ± 87 (from 1,059 to 1,233).
Confidence interval - mean, FloorSpace (m2)
  95%            confidence level
  66.69          mean
  19.03645052    std. dev.
  30             n
  2.045          t (df = 29)
  7.108          half-width
  73.798         upper confidence limit
  59.582         lower confidence limit
Comment: With 95% confidence, the mean floor space of the stores lies in the range 66.7 ± 7.1 m2 (from 59.6 to 73.8 m2).
Confidence interval - mean, CompetingAds (1000 USD)
  95%            confidence level
  97.69333333    mean
  6.831313971    std. dev.
  30             n
  2.045          t (df = 29)
  2.5509         half-width
  100.2442       upper confidence limit
  95.1425        lower confidence limit
Comment: With 95% confidence, the mean advertising cost of competitors lies in the range 97.69 ± 2.55 (from 95.14 to 100.24), in thousands of USD.
Confidence interval - mean, Price (USD)
  95%            confidence level
  1136.133333    mean
  270.5294596    std. dev.
  30             n
  2.045          t (df = 29)
  101.0174       half-width
  1,237.1507     upper confidence limit
  1,035.1160     lower confidence limit
Comment: With 95% confidence, the mean price across stores lies in the range 1,136 ± 101 USD (from 1,035 to 1,237 USD).
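All four intervals follow the same formula, mean ± t * s / sqrt(n). As a cross-check, the Python sketch below recomputes them from the summary figures. The critical value t = 2.045 (df = 29) is taken from the MegaStat output rather than computed, so the results agree with the output only to about two decimal places; the function name `mean_ci` is illustrative.

```python
import math

T_CRIT = 2.045  # two-sided 95% critical value for df = 29, as reported by MegaStat
N = 30

def mean_ci(mean, sd, n=N, t=T_CRIT):
    """Return (half_width, lower, upper) of the confidence interval for a mean."""
    half_width = t * sd / math.sqrt(n)
    return half_width, mean - half_width, mean + half_width

samples = {
    # name: (sample mean, sample standard deviation)
    "UnitsSold":    (1146.433333, 233.1006418),
    "FloorSpace":   (66.69, 19.03645052),
    "CompetingAds": (97.69333333, 6.831313971),
    "Price":        (1136.133333, 270.5294596),
}

intervals = {name: mean_ci(m, s) for name, (m, s) in samples.items()}
for name, (hw, lo, hi) in intervals.items():
    print(f"{name}: {lo:.4f} to {hi:.4f} (half-width {hw:.4f})")
```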
Estimate the proportion of stores that sold more than 1,200 units:
Use Frequency Distribution - Quantitative to obtain the counts:
Frequency Distribution - Quantitative
UnitsSold
                                                             cumulative
  lower      upper   midpoint   width   frequency   percent   frequency   percent
  (below 800)                                   2       6.7           2       6.7
    800  <   1,000        900     200           5      16.7           7      23.3
  1,000  <   1,200      1,100     200           9      30.0          16      53.3
  1,200  <   1,400      1,300     200           9      30.0          25      83.3
  1,400  <   1,600      1,500     200           5      16.7          30     100.0
                                               30     100.0
From the distribution table, 14 of the 30 sampled stores (9 + 5) have sales greater than 1,200 units, which is 46.7%.
Use Confidence interval - proportion to estimate the proportion of stores with sales of more than 1,200 units:
Confidence interval - proportion
  95%      confidence level
  0.467    proportion
  30       n
  1.960    z
  0.179    half-width
  0.645    upper confidence limit
  0.288    lower confidence limit
Comment: With 95% confidence, the proportion of stores that sold more than 1,200 units lies in the range 46.7% ± 17.9% (from 28.8% to 64.5%).
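The half-width of 0.179 is consistent with the normal-approximation interval p ± z * sqrt(p(1 - p)/n). The Python sketch below reproduces it from the count of 14 stores out of 30; that MegaStat uses exactly this formula is inferred from the z value in its output, and the function name `proportion_ci` is illustrative.

```python
import math

def proportion_ci(successes, n, z=1.960):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, half_width, p_hat - half_width, p_hat + half_width

# 14 of the 30 sampled stores sold more than 1,200 units.
p_hat, half_width, lower, upper = proportion_ci(14, 30)
print(f"p = {p_hat:.3f}, half-width = {half_width:.3f}, CI = ({lower:.3f}, {upper:.3f})")
```

With only 30 stores the interval is wide (roughly 29% to 65%), so the sample pins down the true proportion only loosely.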
4. Test the claims that the average advertising cost of competitors is less than 100 thousand USD and that the average sales per store are less than 1,200 units:
Use a hypothesis test for the claim “the average advertising cost of competitors is less than 100 thousand USD”.
The pair of hypotheses is H0: μ ≤ 100 and H1: μ > 100
Hypothesis Test: Mean vs Hypothesized Value
100.00000 hypothesized value
97.69333 mean CompetingAds(1000USD)
6.83131 std dev
1.24722 std error
30 n
29 df
-1.85 t
.9627 p-value (one-tailed, upper)
95.14248 confidence interval 95.% lower
100.24419 confidence interval 95.% upper
2.55085 margin of error
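Comment: From the table above, p-value = 0.96 > α = 0.05, so we cannot reject H0: μ ≤ 100. The data therefore give no evidence against the claim that the average advertising cost of competitors is at most 100 thousand USD. For the same t statistic, the lower-tailed test of H0: μ ≥ 100 against H1: μ < 100 has p-value 1 - 0.9627 = 0.0373 < 0.05, which supports the claim that the average is less than 100 thousand USD.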
Use a hypothesis test for the claim “the average sales per store are less than 1,200 units”.
The pair of hypotheses is H0: μ ≥ 1,200 and H1: μ < 1,200
Hypothesis Test: Mean vs Hypothesized Value
1,200.00000 hypothesized value
1,146.43333 mean UnitsSold (unit)
233.10064 std dev
42.55816 std error
30 n
29 df
-1.26 t
.1091 p-value (one-tailed, lower)
1,059.39212 confidence interval 95.% lower
1,233.47454 confidence interval 95.% upper
87.04121 margin of error
Comment: From the table above, p-value = 0.11 > α = 0.05, so we cannot reject H0, "the average sales per store are greater than or equal to 1,200 units". There is therefore not enough evidence to support H1, "the average sales per store are less than 1,200 units".
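Both test statistics can be reproduced from the summary figures with t = (mean - hypothesized value) / (s / sqrt(n)). The Python sketch below does this and compares each statistic with the 5% one-tailed critical value for df = 29; the value 1.699 comes from a standard t table, not from the MegaStat output, and the function name `t_statistic` is illustrative.

```python
import math

T_CRIT_ONE_TAILED = 1.699  # 5% one-tailed critical value for df = 29 (standard t table)

def t_statistic(mean, sd, n, mu0):
    """Return (standard error, t) for a one-sample t test of the mean against mu0."""
    std_error = sd / math.sqrt(n)
    return std_error, (mean - mu0) / std_error

# Competitors' advertising costs, hypothesized value 100 (thousand USD).
se_ads, t_ads = t_statistic(97.69333, 6.83131, 30, 100.0)
# Units sold, hypothesized value 1,200.
se_units, t_units = t_statistic(1146.43333, 233.10064, 30, 1200.0)

for label, t in (("CompetingAds", t_ads), ("UnitsSold", t_units)):
    print(f"{label}: t = {t:.2f}, "
          f"reject in upper tail: {t > T_CRIT_ONE_TAILED}, "
          f"reject in lower tail: {t < -T_CRIT_ONE_TAILED}")
```

The direction of the alternative hypothesis matters here: for CompetingAds, t = -1.85 is nowhere near the upper-tail rejection region but does fall in the lower-tail one, whereas for UnitsSold, t = -1.26 falls in neither.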
5. Estimate a linear regression model in which the dependent variable is sales and the independent variables are the remaining variables:
Using MegaStat in Excel (Correlation/Regression - Regression Analysis) on the given data, we obtain the regression output below, which relates sales (denoted by Y) to the independent variables: floor space, denoted by X1 (m2); competitors' advertising costs, denoted by X2 (thousand USD); and price, denoted by X3 (USD):
Regression Analysis
  R²                    0.759
  Adjusted R²           0.731
  n                     30
  Std. Error            120.78
  Dependent variable    UnitsSold (unit)

ANOVA table
  Source                    SS    df              MS        F     p-value
  Regression    1,196,409.6542     3    398,803.2181    27.33    3.36E-08
  Residual        379,331.7125    26     14,589.6812
  Total         1,575,741.3667    29

Regression output
  variables                  coefficients   std. error   t (df=26)    p-value   95% lower    95% upper
  Intercept                    1,225.4435     397.2685       3.085      .0048    408.8464   2,042.0406
  FloorSpace (m2)                 11.5222       1.3296       8.666   3.82E-09      8.7892      14.2553
  CompetingAds (1000 USD)         -6.9351       3.9048      -1.776      .0874    -14.9615       1.0913
  Price (USD)                     -0.1496       0.0893      -1.675      .1059     -0.3331       0.0339
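The summary figures of the regression output are all functions of the sums of squares in the ANOVA table: F = MSR / MSE, R² = SSR / SST, adjusted R² = 1 - MSE / (SST / (n - 1)), and the standard error is sqrt(MSE). As a consistency check, a short Python sketch using the sums of squares and degrees of freedom reported by MegaStat (variable names are illustrative):

```python
import math

# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression, df_regression = 1196409.6542, 3
ss_residual, df_residual = 379331.7125, 26
ss_total, df_total = 1575741.3667, 29

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_statistic = ms_regression / ms_residual
r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - ms_residual / (ss_total / df_total)
std_error = math.sqrt(ms_residual)  # standard error of the regression

print(f"F = {f_statistic:.2f}, R2 = {r_squared:.3f}, "
      f"adjusted R2 = {adjusted_r_squared:.3f}, std error = {std_error:.2f}")
```

The recomputed values match the reported F, R² and adjusted R²; the standard error matches up to rounding in the last digit.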
a. Explain the meaning of the regression coefficients and of the R2 coefficient. Which independent variable has the most impact on sales?
The regression model relating sales to floor space (m2), competitors' advertising costs (thousand USD), and price (USD) is:
Y = b0 + b1*X1 + b2*X2 + b3*X3
Y = 1225.4435 + 11.5222*X1 - 6.9351*X2 - 0.1496*X3
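To illustrate how the fitted equation is used, the Python sketch below wraps it in a function and evaluates it at the sample means of the three predictors. A least-squares fit with an intercept passes through the point of means, so the result should reproduce the mean of UnitsSold (about 1,146.43) up to rounding of the coefficients. The function and argument names are illustrative.

```python
# Coefficients of the fitted model, taken from the regression output.
B0, B1, B2, B3 = 1225.4435, 11.5222, -6.9351, -0.1496

def predict_units_sold(floor_space_m2, competing_ads_kusd, price_usd):
    """Predicted number of bikes sold for one store under the fitted model."""
    return B0 + B1 * floor_space_m2 + B2 * competing_ads_kusd + B3 * price_usd

# Evaluate the model at the sample means of FloorSpace, CompetingAds and Price.
at_means = predict_units_sold(66.6900, 97.6933, 1136.1333)
print(round(at_means, 1))
```

The same function gives a point prediction for any store whose floor space, competitor advertising and price lie within the observed ranges; predictions outside those ranges are extrapolations.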
+ The regression coefficient of FloorSpace is 11.5222, which means that sales and floor space are positively related: if the floor space increases by 1 m2, the number of bikes sold increases by about 11.5, assuming that competitors' advertising costs and price remain the same.
+ The regression coefficient of CompetingAds is -6.9351, which means that sales and competitors' advertising costs are inversely related: if competitors' advertising costs increase by 1,000 USD, the number of bikes sold falls by about 7 units, other factors remaining the same.
+ The regression coefficient of Price is -0.1496, which means that sales and price are inversely related: when the price rises by 1 USD, the number of bikes sold falls by about 0.15 (equivalently, a price increase of about 7 USD reduces sales by 1 unit), provided that competitors' advertising costs and floor space remain the same.
+ The coefficient R2 = 0.759, which means that 75.9% of the variation in sales is explained jointly by floor space, competitors' advertising costs, and price.
+ Of these factors, floor space has the most impact on sales: it has the largest t statistic in absolute value (8.666) and is the only coefficient significant at the 5% level. The raw coefficients are measured in different units, so their sizes alone are not directly comparable.
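One way to compare the three predictors on a common scale is to standardize the coefficients, multiplying each by s_x / s_y so that it measures the change in sales (in standard deviations) per one-standard-deviation change in the predictor. This technique is not used in the original analysis; the Python sketch below is offered only as an illustrative check, using the standard deviations from the descriptive statistics.

```python
# Sample standard deviation of UnitsSold from the descriptive statistics table.
SD_UNITS_SOLD = 233.1006

predictors = {
    # name: (regression coefficient, sample standard deviation)
    "FloorSpace":   (11.5222, 19.0365),
    "CompetingAds": (-6.9351, 6.8313),
    "Price":        (-0.1496, 270.5295),
}

# Standardized coefficient: change in Y (in standard deviations of Y)
# for a one-standard-deviation increase in the predictor.
standardized = {
    name: coef * sd_x / SD_UNITS_SOLD for name, (coef, sd_x) in predictors.items()
}

ranking = sorted(standardized, key=lambda name: abs(standardized[name]), reverse=True)
for name in ranking:
    print(f"{name}: {standardized[name]:+.3f}")
```

On this scale floor space still ranks first by a wide margin, followed by competitors' advertising and then price.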
b. Use an appropriate test to determine which independent variables have an impact on sales and which do not. Then assess whether the choice of independent variables is suitable. Are there any omitted variables that could also affect sales? Give an example:
b1. Test the relationship between floor space and sales:
Hypotheses:
H0: β1 = 0
H1: β1 ≠ 0
t = 8.666 and p-value = 3.82E-09 < α = 0.05; reject H0
Conclusion: Sales and floor space are related; floor space has a significant influence on sales.
b2. Test the relationship between sales and competitors' advertising costs:
Hypotheses:
H0: β2 = 0
H1: β2 ≠ 0
t = -1.776 and p-value = 0.0874 > α = 0.05; fail to reject H0
Thus, there is not enough evidence to confirm a relationship between competitors' advertising costs and store sales.
b3. Test the relationship between sales and price:
Hypotheses:
H0: β3 = 0
H1: β3 ≠ 0
t = -1.675 and p-value = 0.1059 > α = 0.05; fail to reject H0
Thus, a relationship between sales and price cannot be confirmed.
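The three decisions can be reproduced from the coefficients and their standard errors: t = b / se, and H0: β = 0 is rejected at the 5% level when |t| exceeds the two-sided critical value, or equivalently when the 95% confidence interval for the coefficient excludes 0. The Python sketch below uses the critical value 2.056 for df = 26, which comes from a standard t table rather than from the MegaStat output; variable names are illustrative.

```python
T_CRIT = 2.056  # two-sided 5% critical value for df = 26 (standard t table)

coefficients = {
    # name: (coefficient, standard error) from the regression output
    "FloorSpace":   (11.5222, 1.3296),
    "CompetingAds": (-6.9351, 3.9048),
    "Price":        (-0.1496, 0.0893),
}

decisions = {}
for name, (coef, std_err) in coefficients.items():
    t = coef / std_err
    lower, upper = coef - T_CRIT * std_err, coef + T_CRIT * std_err
    decisions[name] = {
        "t": t,
        "ci": (lower, upper),
        "reject_h0": abs(t) > T_CRIT,  # equivalent to the interval excluding 0
    }
    print(f"{name}: t = {t:.3f}, 95% CI = ({lower:.4f}, {upper:.4f}), "
          f"reject H0: {decisions[name]['reject_h0']}")
```

Only the interval for FloorSpace excludes zero; the intervals for CompetingAds and Price both contain zero, in line with their p-values above 0.05.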