Regression analysis is a technique used to determine the mathematical relation between a dependent variable and one or more explanatory variables. The explanatory variables are the economic variables that are thought to affect the value of the dependent variable. In the simple linear regression model, the dependent variable Y is related to only one explanatory variable X, and the relation between Y and X is linear:
Y = a + bX
This is the equation for a straight line, with X plotted along the horizontal axis and Y along the vertical axis. The parameter a is called the intercept parameter because it gives the value of Y at the point where the regression line crosses the Y-axis. (X is equal to 0 at this point.) The parameter b is called the slope parameter because it gives the slope of the regression line. The slope of a line measures the rate of change in Y as X changes (ΔY/ΔX); it is therefore the change in Y per unit change in X.
Note that Y and X are linearly related in the regression model; that is, the effect of a change in X on the value of Y is constant. More specifically, a one-unit change in X causes Y to change by a constant b units. The simple regression model is based on a linear relation between Y and X, in large part because estimating the parameters of a linear model is relatively simple statistically. As it turns out, assuming a linear relation is not overly restrictive. For one thing, many variables are actually linearly related or very nearly linearly related. For those cases where Y and X are instead related in a curvilinear fashion, you will see that a simple transformation of the variables often makes it possible to model nonlinear relations within the framework of the linear regression model. You will see how to make these simple transformations later in this chapter.
regression analysis: A statistical technique for estimating the parameters of an equation and testing for statistical significance.
parameter estimation: The process of finding estimates of the numerical values of the parameters of an equation.
dependent variable: The variable whose variation is to be explained.
explanatory variables: The variables that are thought to cause the dependent variable to take on different values.
intercept parameter: The parameter that gives the value of Y at the point where the regression line crosses the Y-axis.
slope parameter: The slope of the regression line, b = ΔY/ΔX, or the change in Y associated with a one-unit change in X.
A Hypothetical Regression Model
To illustrate the simple regression model, consider a statistical problem facing the Tampa Bay Travel Agents’ Association. The association wishes to determine the mathematical relation between the dollar volume of sales of travel packages (S) and the level of expenditures on newspaper advertising (A) for travel agents located in the Tampa–St. Petersburg metropolitan area. Suppose that the true (or actual) relation between sales and advertising expenditures is
S = 10,000 + 5A
where S measures monthly sales in dollars and A measures monthly advertising expenditures in dollars. The true relation between sales and advertising is unknown to the analyst; it must be “discovered” by analyzing data on sales and advertising. Researchers are never able to know with certainty the exact nature of the underlying mathematical relation between the dependent variable and the explanatory variable, but regression analysis does provide a method for estimating the true relation.
Figure 4.1 shows the true or actual relation between sales and advertising expenditures. If an agency chooses to spend nothing on newspaper advertising, its sales are expected to be $10,000 per month. If an agency spends $3,000 monthly on ads, it can expect sales of $25,000 (= 10,000 + 5 × 3,000). Because ΔS/ΔA = 5, for every $1 of additional expenditure on advertising, the travel agency can expect a $5 increase in sales. For example, increasing outlays from $3,000 to $4,000 per month causes expected monthly sales to rise from $25,000 to $30,000, as shown in Figure 4.1.
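The arithmetic in this paragraph can be reproduced with a short Python sketch. The function name expected_sales and the constant names are only illustrative; the intercept and slope are the values of the hypothetical true relation S = 10,000 + 5A given in the text.

```python
# Hypothetical true relation from the text: S = 10,000 + 5A
INTERCEPT = 10_000  # expected monthly sales (dollars) with no advertising
SLOPE = 5           # extra sales dollars per extra advertising dollar


def expected_sales(advertising):
    """Expected monthly sales (dollars) for a monthly advertising outlay (dollars)."""
    return INTERCEPT + SLOPE * advertising


for outlay in (0, 3_000, 4_000):
    print(f"A = ${outlay:,}: expected S = ${expected_sales(outlay):,}")

# Change in expected sales per dollar of advertising between $3,000 and $4,000
slope_check = (expected_sales(4_000) - expected_sales(3_000)) / (4_000 - 3_000)
print(f"Delta S / Delta A = {slope_check}")
```

The three outlays give expected sales of $10,000, $25,000, and $30,000, and the ratio of the changes equals the slope, 5.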
The Random Error Term
The regression equation (or line) shows the level of expected sales for each level of advertising expenditure. As noted, if a travel agency spends $3,000 monthly on ads, it can expect on average to have sales of $25,000. We should stress that $25,000 should be interpreted not as the exact level of sales that a firm will experience when advertising expenditures are $3,000 but only as an average level. To illustrate this point, suppose that three travel agencies in the Tampa–St. Petersburg area each spend exactly $3,000 on advertising. Will all three of these firms experience sales of precisely $25,000? This is not likely. While each of these three firms spends exactly the same amount on advertising, each firm experiences certain random effects that are peculiar to that firm. These random effects cause the sales of the various firms to deviate from the expected $25,000 level of sales.
true (or actual) relation: The true or actual underlying relation between Y and X that is unknown to the researcher but is to be discovered by analyzing the sample data.
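To see why $25,000 is an average rather than a prediction for any single firm, the following sketch simulates many hypothetical agencies that each spend $3,000. The error distribution (normal, mean zero, standard deviation $3,000), the number of agencies, and the seed are assumptions made only for illustration; the text does not say how the random effects are distributed.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

ADVERTISING = 3_000                  # every simulated agency spends $3,000
EXPECTED = 10_000 + 5 * ADVERTISING  # $25,000 from the true relation
ERROR_SD = 3_000                     # assumed spread of the random effects

# Actual sales = expected sales + a firm-specific random effect
simulated_sales = [EXPECTED + rng.gauss(0, ERROR_SD) for _ in range(10_000)]

print(f"Expected sales:        ${EXPECTED:,}")
print(f"Average simulated:     ${statistics.mean(simulated_sales):,.0f}")
print(f"Lowest / highest firm: ${min(simulated_sales):,.0f} / ${max(simulated_sales):,.0f}")
```

Individual simulated firms land well above or below $25,000, while the average across firms stays close to it, which is the sense in which the regression line gives expected sales.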
Table 4.1 illustrates the impact of random effects on the actual level of sales achieved. Each of the three firms in Table 4.1 spent $3,000 on advertising in the month of January. According to the true regression equation, each of these travel agencies would be expected to have sales of $25,000 in January. As it turns out, the manager of the Tampa Travel Agency used the advertising agency owned and managed by her brother, who gave better than usual service. This travel agency actually sold $30,000 worth of travel packages in January, $5,000 more than the expected or average level of sales. The manager of Buccaneer Travel Service was on a ski vacation in early January and did not start spending money on advertising until the middle of January. Buccaneer Travel Service’s sales were only $21,000, which is $4,000 less than the regression line predicted. In January, nothing unusual happened to Happy Getaway Tours, and its sales of $25,000 exactly matched what the average travel agency in Tampa would be expected to sell when it spends $3,000 on advertising.
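Figure 4.1 The True Regression Line: Relating Sales and Advertising Expenditures
[Figure: the true regression line S = 10,000 + 5A, with monthly advertising expenditures (A) on the horizontal axis and monthly sales in dollars (S) on the vertical axis. Marked points: A = 0, S = 10,000; A = 3,000, S = 25,000; A = 4,000, S = 30,000; A = 9,000, S = 55,000.]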
Table 4.1 The Impact of Random Effects on January Sales

Firm                        Advertising expenditure   Actual sales   Expected sales   Random effect
Tampa Travel Agency         $3,000                    $30,000        $25,000          $5,000
Buccaneer Travel Service     3,000                     21,000         25,000          −4,000
Happy Getaway Tours          3,000                     25,000         25,000               0
Because of these random effects, the level of sales for a firm cannot be exactly predicted. The regression equation shows only the average or expected level of sales when a firm spends a given amount on advertising. The exact level of sales for any particular travel agency (such as the ith agency) can be expressed as
Si = 10,000 + 5Ai + ei
where Si and Ai are, respectively, the sales and advertising levels of the ith agency and ei is the random effect experienced by the ith travel agency. Since ei measures the amount by which the actual level of sales differs from the average level of sales, ei is called an error term, or a random error. The random error term captures the effects of all the minor, unpredictable factors that cannot reasonably be included in the model as explanatory variables.
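As a check on this definition, the random effect for each firm in Table 4.1 can be computed as actual sales minus expected sales. The sketch below uses the January figures from the table; the function and variable names are illustrative only.

```python
# (firm, advertising, actual sales) for January, taken from Table 4.1
firms = [
    ("Tampa Travel Agency", 3_000, 30_000),
    ("Buccaneer Travel Service", 3_000, 21_000),
    ("Happy Getaway Tours", 3_000, 25_000),
]


def random_effect(advertising, actual_sales):
    """ei = Si - (10,000 + 5 * Ai): actual minus expected sales."""
    return actual_sales - (10_000 + 5 * advertising)


effects = {name: random_effect(a, s) for name, a, s in firms}
for name, e in effects.items():
    print(f"{name}: e = {e:+,}")
```

The results, 5,000, −4,000, and 0, match the random effects described for the three agencies.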
Because the true regression line is unknown, the first task of regression analysis is to obtain estimates of a and b. To do this, data on monthly sales and advertising expenditures must be collected from Tampa Bay–area travel agents. Using these data, a regression line is then fitted. Before turning to the task of fitting a regression line to the data points in a sample, we summarize the simple regression model in the following statistical relation:
Relation The simple linear regression model relates a dependent variable Y to a single independent explanatory variable X in a linear equation called the true regression line:
Y = a + bX
where a is the Y-intercept, and b is the slope of the regression line (ΔY/ΔX). The regression line shows the average or expected value of Y for each level of the explanatory variable X.
Now try Technical Problem 1.
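As a preview of what fitting a regression line to sample data can look like, here is a minimal sketch. It assumes the ordinary least-squares formulas, a standard choice that this section has not yet introduced, and it uses a simulated (not real) sample generated from the hypothetical true relation S = 10,000 + 5A with normally distributed random errors. The function name fit_line, the sample size, the error spread, and the seed are all assumptions for illustration.

```python
import random


def fit_line(xs, ys):
    """Return (a_hat, b_hat) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: co-movement of x and y divided by the variation in x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b_hat = sxy / sxx
    # Intercept: the fitted line passes through the point of sample means
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Simulated sample: 200 agencies with advertising between $0 and $9,000
advertising = [rng.uniform(0, 9_000) for _ in range(200)]
sales = [10_000 + 5 * a + rng.gauss(0, 3_000) for a in advertising]

a_hat, b_hat = fit_line(advertising, sales)
print(f"estimated intercept: {a_hat:,.0f}, estimated slope: {b_hat:.2f}")
```

Because the sample contains random errors, the estimates will generally be close to, but not exactly equal to, the true values a = 10,000 and b = 5. With error-free data lying exactly on the line, the same function recovers the true parameters.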