using this last step, but also because it facilitates a feasible computation of an approximation to the cross-validated MSE.

Although we touched on this issue only briefly above, it is now necessary to confront head-on the challenges that models nonlinear in the parameters pose for cross-validation. The challenge is that, to compute exactly the cross-validated MSE associated with a given nonlinear model, one must compute the NLS parameter estimates obtained by holding out each required validation block of observations. There are roughly as many validation blocks as there are observations (thousands here), so the convergence problems encountered in a single NLS optimization over the entire estimation data set are multiplied by the number of validation blocks. Even if this did not present a logistical quagmire (which it surely does), it would also require a huge increase in computation (a factor of approximately 1,700 here). Some means of approximating the cross-validated MSE is thus required.

Here we adopt the expedient of viewing the hidden unit coefficients obtained by the initial NLS fit on the estimation set as identifying potentially useful predictive transforms of the underlying variables, and we hold these fixed in cross-validation. We then need only recompute the hidden-to-output coefficients by OLS for each validation block. As mentioned above, this can be done in a highly computationally efficient manner using Racine's (1997) feasible block cross-validation method. This may well yield overly optimistic cross-validated estimates of MSE, but without some such approximation the exercise is not feasible. (Avoiding the approximation might be feasible on a supercomputer, but, as we see shortly, this brute-force NLS approach is dominated by QuickNet, so the effort is not likely justified.)
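To fix ideas, the following minimal sketch (in Python, with illustrative names of our own choosing) shows the approximation being made: the hidden-unit coefficients are frozen, and only the hidden-to-output coefficients are re-estimated by OLS with each validation block held out. It is the naive version of the computation; Racine's (1997) feasible block method accelerates it, but the quantity being approximated is the same.

```python
import numpy as np

def approx_block_cv_mse(H, y, block_size=1):
    """Approximate block cross-validated MSE with fixed hidden units.

    H stacks a constant, the linear predictors, and the hidden-unit
    activations evaluated at the coefficients from the initial fit on the
    full estimation sample; only the hidden-to-output coefficients are
    re-estimated by OLS for each held-out block."""
    n = len(y)
    sse = 0.0
    for start in range(0, n, block_size):
        held_out = np.arange(start, min(start + block_size, n))
        keep = np.setdiff1d(np.arange(n), held_out)
        beta, *_ = np.linalg.lstsq(H[keep], y[keep], rcond=None)  # OLS refit
        resid = y[held_out] - H[held_out] @ beta                  # held-out errors
        sse += float(resid @ resid)
    return sse / n
```

Because the hidden-unit coefficients are never re-optimized, the resulting CV MSE can be optimistic relative to full nonlinear cross-validation, exactly as noted above.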
Table 1 reports a subset of the results for this first exercise. We report two summary measures of goodness of fit, mean squared error (MSE) and R-squared (R²), for the estimation sample, the cross-validation sample (CV), and the hold-out sample (Hold-Out). For the estimation sample, R² is the standard multiple correlation coefficient. For the cross-validation sample, R² is computed as one minus the ratio of the cross-validated MSE to the estimation-sample variance of the dependent variable. For the hold-out sample, R² is computed as one minus the ratio of the hold-out MSE to the hold-out sample variance of the dependent variable about the estimation-sample mean of the dependent variable. The CV and Hold-Out R²'s can therefore be negative. A positive Hold-Out R² indicates that the out-of-sample predictive performance of the estimated model is better than that afforded by the simple constant prediction given by the estimation-sample mean of the dependent variable.

Table 1. S&P 500: Naive nonlinear least squares – Logistic. Summary goodness of fit.

Hidden units   Estimation MSE   CV MSE      Hold-out MSE   Estimation R²   CV R²       Hold-out R²
 0             1.67890          1.79932∗    0.55548        0.00886         −0.06223    −0.03016∧,∗
 1             1.67819          1.79965     0.56183        0.00928         −0.06242    −0.04194
 2             1.67458          1.79955     0.57721        0.01141         −0.06236    −0.07046
 3             1.67707          1.81529     0.55925        0.00994         −0.07166    −0.03715
 4             1.65754          1.83507     0.58907        0.02147         −0.08333    −0.09245
 5             1.64420          1.86859     0.57978        0.02935         −0.10312    −0.07522
 6             1.67122          1.86478     0.55448        0.01340         −0.10087    −0.02831
 7             1.66337          1.89032     0.56545        0.01803         −0.11595    −0.04865
 8             1.66138          1.86556     0.59504        0.01921         −0.10134    −0.10353
 9             1.65662          1.90687     0.56750        0.02202         −0.12572    −0.05245
10             1.66970          1.94597     0.56098        0.01429         −0.14880    −0.04037
11             1.64669          1.87287     0.58445        0.02788         −0.10565    −0.08390
12             1.65209          1.85557     0.55982        0.02469         −0.09544    −0.03822
13             1.64594          2.03215     0.56302        0.02832         −0.19968    −0.04415
14             1.64064          1.91624     0.58246        0.03145         −0.13125    −0.08020
15             1.64342          2.00411     0.57788        0.02981         −0.18313    −0.07170
16             1.65963          2.00244     0.57707        0.02024         −0.18214    −0.07021
17             1.65444          2.05466     0.58594        0.02330         −0.21297    −0.08665
18             1.64254          1.98832     0.60214        0.03033         −0.17381    −0.11670
19             1.65228          2.01295     0.59406        0.02458         −0.18835    −0.10172
20             1.64575          2.09084     0.60126        0.02843         −0.23432    −0.11506

From Table 1 we see that, as expected, the estimation R² is never very large, ranging from a low of about 0.0089 to a high of about 0.0315. For the full experiment, the greatest estimation-sample R² is about 0.0647, occurring with 50 hidden units (not shown). This apparently good in-sample performance is belied by the uniformly negative CV R²'s. Although the best CV R² or MSE (indicated by "∗") identifies the model with the best Hold-Out R² (indicated by "∧"), namely the model with only linear predictors (zero hidden units), even this model has a negative Hold-Out R², indicating that it does not perform as well in the hold-out sample as simply using the estimation-sample mean as the predictor.

This unimpressive predictive performance is entirely expected, given our earlier discussion of the implications of the efficient market hypothesis. What might not have been expected is the erratic behavior of the estimation-sample MSEs: as we consider increasingly flexible models, we do not observe steadily better in-sample fits. Instead, the fit first improves for hidden units one and two, then worsens at hidden unit three, improves dramatically at hidden units four and five, worsens again at hidden unit six, and so on, bouncing around here and there. Such behavior will not surprise those with prior ANN experience, but it can be disconcerting to those not previously inoculated.

This erratic behavior is a direct consequence of the challenging nonconvexity of the NLS objective function induced by the nonlinearity in parameters of the ANN model, coupled with our choice of a new set of random starting values for the coefficients at each hidden unit addition. It directly reflects and illustrates the challenges posed by parameter nonlinearity pointed out earlier.

The erratic estimation performance also opens the possibility that the observed poor predictive performance is due not to the inherent unpredictability of the target variable, but rather to the poor estimation job done by the brute-force NLS approach. We next investigate the consequences of using a modified NLS procedure designed to eliminate this erratic behavior by picking initial values for the coefficients at each stage in a manner that yields increasingly better in-sample fits as flexibility increases: we simply use as initial values the final values found for the coefficients at the previous stage and select new initial coefficients at random only for the hidden unit added at the current stage. This implements a simple homotopy method.
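The difference between the two initialization schemes can be sketched as follows. This is only an illustration: `fit_nls` stands in for whatever nonlinear least squares routine is used, and the scale of the random draws is an assumption of ours rather than a detail taken from the text.

```python
import numpy as np

def naive_start(q, n_inputs, rng):
    """Naive NLS: draw fresh random starting values for all q hidden units."""
    return rng.normal(scale=0.5, size=(q, n_inputs + 1))  # bias + input weights per unit

def homotopy_start(prev_coeffs, n_inputs, rng):
    """Modified NLS: warm-start from the previous stage's fitted hidden-unit
    coefficients and randomize only the newly added hidden unit."""
    new_unit = rng.normal(scale=0.5, size=(1, n_inputs + 1))
    return np.vstack([prev_coeffs, new_unit])

# Stage-wise use (fit_nls is a placeholder for the chosen NLS optimizer):
#   naive:     start = naive_start(q, n_inputs, rng)               at every stage q
#   modified:  start = homotopy_start(prev_coeffs, n_inputs, rng)  re-using stage q-1's fit
```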
We present the results of this next exercise in Table 2.

Table 2. S&P 500: Modified nonlinear least squares – Logistic. Summary goodness of fit.

Hidden units   Estimation MSE   CV MSE      Hold-out MSE   Estimation R²   CV R²       Hold-out R²
 0             1.67890          1.79932∗    0.55548        0.00886         −0.06223    −0.03016∧,∗
 1             1.67819          1.79965     0.56183        0.00928         −0.06242    −0.04194
 2             1.67813          1.80647     0.56221        0.00932         −0.06645    −0.04264
 3             1.67290          1.80611     0.58417        0.01241         −0.06623    −0.08338
 4             1.67166          1.84150     0.58922        0.01314         −0.08713    −0.09274
 5             1.67024          1.84690     0.59676        0.01398         −0.09032    −0.10673
 6             1.67010          1.84711     0.59660        0.01406         −0.09044    −0.10642
 7             1.66877          1.85188     0.59627        0.01484         −0.09326    −0.10582
 8             1.66782          1.85215     0.59292        0.01541         −0.09341    −0.09961
 9             1.66752          1.89321     0.59516        0.01558         −0.11766    −0.10375
10             1.66726          1.93842     0.59673        0.01573         −0.14434    −0.10666
11             1.66305          1.94770     0.59417        0.01822         −0.14982    −0.10193
12             1.65801          1.95322     0.58804        0.02119         −0.15308    −0.09056
13             1.65795          1.96126     0.58773        0.02123         −0.15783    −0.08998
14             1.65734          1.96638     0.58533        0.02159         −0.16085    −0.08552
15             1.65599          1.98448     0.58592        0.02239         −0.17153    −0.08662
16             1.65548          2.00899     0.58556        0.02269         −0.18601    −0.08595
17             1.65527          2.01352     0.58510        0.02281         −0.18868    −0.08509
18             1.65451          2.02145     0.58404        0.02326         −0.19336    −0.08313
19             1.65397          2.02584     0.58254        0.02358         −0.19595    −0.08035
20             1.65397          2.02583     0.58254        0.02358         −0.19595    −0.08036

Now we see that the in-sample MSEs behave as expected, decreasing nicely as flexibility increases. On the other hand, whereas our naive brute-force approach found a solution with only five hidden units delivering an estimation-sample R² of 0.0293, this second approach requires 30 hidden units (not reported here) to achieve a comparable in-sample fit. Once again we have the best CV performance occurring with zero hidden units, corresponding to the best (but negative) out-of-sample R². Clearly, this modification to naive brute-force NLS does not resolve the question of whether the so far unimpressive results could be due to poor estimation performance, as the estimation performance of the naive method is better, even if more erratic.

Can QuickNet provide a solution? Table 3 reports the results of applying QuickNet to our S&P 500 data, again with the logistic cdf activation function. At each iteration of Step 1, we selected the best of m = 500 candidate hidden units and applied cross-validation using OLS, taking the hidden unit coefficients as given.
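The hidden-unit search in Step 1 can be sketched as follows. This is a stripped-down illustration of the idea (random candidate hidden-unit weights, a cheap OLS fit for each candidate, selection by in-sample fit), not the full QuickNet implementation; the scale of the random weights and the helper names are our own assumptions.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def quicknet_step1(X, y, Z, m=500, act=logistic, rng=None):
    """One hidden-unit addition: draw m candidate units with random input
    weights and keep the one that, added to the regressors already included
    in Z, gives the smallest in-sample sum of squared residuals by OLS."""
    rng = rng or np.random.default_rng()
    _, k = X.shape
    best_sse, best_gamma = np.inf, None
    for _ in range(m):
        gamma = rng.normal(size=k + 1)                 # candidate hidden-unit weights
        h = act(gamma[0] + X @ gamma[1:])              # candidate activation
        W = np.column_stack([Z, h])                    # included terms plus candidate
        beta, *_ = np.linalg.lstsq(W, y, rcond=None)   # cheap OLS, hidden weights fixed
        sse = float(np.sum((y - W @ beta) ** 2))
        if sse < best_sse:
            best_sse, best_gamma = sse, gamma
    return best_gamma, best_sse
```

Cross-validation at each stage then proceeds exactly as in the sketch given earlier, treating the selected hidden units as fixed regressors.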
Table 3. S&P 500: QuickNet – Logistic. Summary goodness of fit.

Hidden units   Estimation MSE   CV MSE      Hold-out MSE   Estimation R²   CV R²       Hold-out R²
 0             1.67890          1.79932     0.55548        0.00886         −0.06223    −0.03016∧
 1             1.66180          1.79907     0.55916        0.01896         −0.06208    −0.03699
 2             1.65123          1.78741     0.55726        0.02520         −0.05520    −0.03346
 3             1.63153          1.76889     0.61121        0.03683         −0.04427    −0.13352
 4             1.62336          1.76625     0.60269        0.04165         −0.04271    −0.11772
 5             1.61769          1.77087     0.60690        0.04500         −0.04543    −0.12552
 6             1.60716          1.76750     0.62050        0.05121         −0.04344    −0.15075
 7             1.59857          1.75783     0.61638        0.05629         −0.03773    −0.14310
 8             1.59297          1.76191     0.61259        0.05959         −0.04014    −0.13609
 9             1.58653          1.75298     0.63545        0.06339         −0.03487    −0.17848
10             1.58100          1.75481     0.64401        0.06666         −0.03595    −0.19436
11             1.57871          1.75054∗    0.64341        0.06801         −0.03343    −0.19323∗
12             1.57364          1.75662     0.65497        0.07100         −0.03702    −0.21467
13             1.56924          1.76587     0.64614        0.07360         −0.04248    −0.19830
14             1.56483          1.76621     0.65012        0.07621         −0.04268    −0.20567
15             1.55869          1.76868     0.64660        0.07983         −0.04414    −0.19915
16             1.55063          1.78549     0.64260        0.08459         −0.05406    −0.19173
17             1.54289          1.78510     0.65037        0.08915         −0.05383    −0.20614
18             1.53846          1.78166     0.65182        0.09177         −0.05180    −0.20883
19             1.53587          1.80860     0.64796        0.09330         −0.06771    −0.20167
20             1.53230          1.81120     0.64651        0.09541         −0.06924    −0.19899

Here we see much better performance in the estimation and CV samples than we saw with either of the two NLS approaches. The estimation-sample MSEs decrease monotonically, as we should expect. Further, the CV MSE first decreases and then increases, as one would like, identifying an optimal complexity of eleven hidden units for the nonlinear model. The estimation-sample R² for this CV-best model is 0.0634, much better than the value of 0.0293 found by the CV-best model in Table 1, and the CV MSE is now 1.751, much better than the corresponding best CV MSE of 1.800 found in Table 1.

Thus QuickNet does a much better job of fitting the data, in terms of both estimation and cross-validation measures. It is also much faster. Apart from the computation time required for cross-validation, which is comparable across the methods, QuickNet required 30.90 seconds to arrive at its solution, whereas naive NLS required 600.30 seconds and modified NLS 561.46 seconds to obtain solutions inferior in terms of both estimation and cross-validated fit.

Another interesting piece of evidence on the flexibility of ANNs and the relative fitting capabilities of the different methods applied here is that QuickNet delivered a maximum estimation R² of 0.1727, compared with 0.0647 for naive NLS and 0.0553 for modified NLS, with 50 hidden units (not shown) generating each of these values. Comparing these and other results, it is clear that QuickNet rapidly delivers much better in-sample fits for given degrees of model complexity, just as it was designed to do.

A serious difficulty remains, however: the CV-best model identified by QuickNet performs quite poorly on the hold-out data. It is thus important to warn that even with a principled attempt to avoid overfitting via cross-validation, there is no guarantee that the CV-best model will perform well on real-world hold-out data. One possible explanation is that, even with cross-validation, the sheer flexibility of ANNs somehow makes them prone to overfitting the data, viewed from the perspective of pure hold-out data. Another strong possibility is that real-world hold-out data can differ from the estimation (and thus cross-validation) data in important ways. If the relationship between the target variable and its predictors changes between the estimation and hold-out data, then even if we have found a good prediction model using the estimation data, there is no reason for that model to be useful on the hold-out data, where a different predictive relationship may hold. A possible response to such situations is to proceed recursively, refitting the model as each new out-of-sample observation becomes available. For simplicity, we leave aside an investigation of such methods here.
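Although we do not pursue recursive re-estimation here, a minimal sketch may help fix ideas; `fit_model` is a placeholder of ours for whichever estimation procedure (QuickNet or NLS) is being used.

```python
import numpy as np

def recursive_forecasts(X, y, n_est, fit_model):
    """Expanding-window scheme: refit the model on all data available at each
    date and produce a one-step-ahead forecast for the next observation."""
    preds = []
    for t in range(n_est, len(y)):
        model = fit_model(X[:t], y[:t])             # refit with data through t-1
        preds.append(model.predict(X[t:t + 1])[0])  # forecast observation t
    return np.array(preds)
```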
This example underscores the usefulness of an out-of-sample evaluation of predictive performance. Our results illustrate that it can be quite dangerous simply to trust that the predictive relationship of interest is sufficiently stable to permit building a model useful over even a modest post-sample time frame.

Below we investigate the behavior of our methods in a less ambiguous environment, using artificial data to ensure (1) that there is in fact a nonlinear relationship to be uncovered, and (2) that the predictive relationship in the hold-out data is identical to that in the estimation data. Before turning to these results, however, we examine two alternatives to the standard logistic ANN applied so far. The first alternative is a ridgelet ANN, and the second is a nonneural-network method that uses the familiar algebraic polynomials. The purpose of these experiments is to compare the standard ANN approach with a promising but less familiar ANN method, and to contrast the ANN approaches with a more familiar benchmark.

In Table 4, we present an experiment identical to that of Table 3, except that instead of the standard logistic cdf activation function we use the ridgelet activation function

    ψ(z) = D⁵φ(z) = (−z⁵ + 10z³ − 15z) φ(z),

where φ is the standard normal density. The choice of h = 5 is dictated by the fact that k = 10 for the present example. As this is a nonpolynomial analytic activation function, it is also GCR, so we may expect QuickNet to perform well in sample. We emphasize that we are simply running QuickNet with a ridgelet activation function and are not implementing any estimation procedure specified by Candès; the results given here thus do not necessarily put ridgelets in their best light, but they are nevertheless of interest, as they indicate what can be achieved with some fairly simple procedures.
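This ridgelet activation is simple to evaluate directly; the short function below (our own illustration) computes it as the fifth derivative of the standard Gaussian density and could replace `logistic` as the `act` argument in the Step 1 sketch given earlier.

```python
import numpy as np

def ridgelet(z):
    """Ridgelet activation psi(z) = D^5 phi(z) = (-z^5 + 10 z^3 - 15 z) * phi(z),
    with phi the standard normal density."""
    phi = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return (-z ** 5 + 10.0 * z ** 3 - 15.0 * z) * phi
```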
Table 4. S&P 500: QuickNet – Ridgelet. Summary goodness of fit.

Hidden units   Estimation MSE   CV MSE      Hold-out MSE   Estimation R²   CV R²       Hold-out R²
 0             1.67890          1.79932     0.55548        0.00886         −0.06223    −0.03016∧
 1             1.66861          1.79555     0.56961        0.01494         −0.06000    −0.05636
 2             1.66080          1.78798     0.59077        0.01955         −0.05553    −0.09561
 3             1.65142          1.78114     0.59605        0.02509         −0.05150    −0.10540
 4             1.63519          1.79177     0.59107        0.03467         −0.05777    −0.09617
 5             1.62747          1.78463     0.60156        0.03922         −0.05356    −0.11561
 6             1.61933          1.77995     0.61657        0.04403         −0.05079    −0.14346
 7             1.60872          1.77598     0.64556        0.05029         −0.04845    −0.19723
 8             1.59657          1.76742     0.67802        0.05747         −0.04339    −0.25742
 9             1.58620          1.76409     0.70122        0.06358         −0.04143    −0.30045
10             1.57463          1.76207     0.72377        0.07042         −0.04023    −0.34226
 …              …                …           …              …               …           …
36             1.35532          1.65232     0.87676        0.19989          0.02456    −0.62600
37             1.34989          1.65332     0.88115        0.20309          0.02396    −0.63414
38             1.34144          1.65063     0.88568        0.20808          0.02555    −0.64253
39             1.33741          1.64768∗    0.88580        0.21046          0.02729    −0.64277∗
40             1.33291          1.65941     0.88432        0.21312          0.02037    −0.64001
41             1.32711          1.65571     0.89149        0.21654          0.02255    −0.65331
42             1.32098          1.65407     0.89831        0.22016          0.02352    −0.66596
43             1.31413          1.66000     0.90193        0.22420          0.02002    −0.67268
44             1.30282          1.65042     0.91420        0.23088          0.02568    −0.69543
45             1.29695          1.65575     0.91205        0.23434          0.02253    −0.69144
46             1.29116          1.65312     0.91696        0.23776          0.02408    −0.70056
47             1.28461          1.65054     0.90577        0.24163          0.02560    −0.67980
48             1.27684          1.64873     0.92609        0.24622          0.02667    −0.71748
49             1.27043          1.65199     0.94510        0.25000          0.02475    −0.75273
50             1.26459          1.64845     0.95154        0.25345          0.02684    −0.76468

Examining Table 4, we see results qualitatively similar to those for the logistic cdf activation function, but with the features noted there even more pronounced. Specifically, the estimation-sample fit improves with additional complexity, but even more quickly, suggesting that the ridgelets are even more successful at fitting patterns in the estimation data. The estimation-sample R² reaches a maximum of 0.2534 at 50 hidden units, an almost 50% increase over the best value for the logistic. The best CV performance occurs with 39 hidden units, with a CV R² that is actually positive (0.0273). As good as this performance is on the estimation and CV data, however, it is quite bad on the hold-out data: the Hold-Out R² with 39 ridgelet units is −0.643, reinforcing our earlier comments about the possible mismatch between the estimation and hold-out predictive relationships and about the importance of hold-out sample evaluation.

In recent work, Hahn (1998) and Hirano and Imbens (2001) have suggested using algebraic polynomials for nonparametric estimation of certain conditional expectations arising in the estimation of causal effects. Polynomials thus represent a familiar and interesting benchmark against which to contrast our previous ANN results. In Table 5 we report the results of nonlinear approximation using algebraic polynomials, performed in a manner analogous to QuickNet. The estimation algorithm is identical, except that instead of randomly choosing m candidate hidden units as before, we now randomly choose m candidate monomials from which to construct polynomials. For concreteness, and to control the erratic behavior that can result from polynomials of too high a degree, we restrict ourselves to polynomials of degree at most four. As before, we always include the linear terms, so we randomly select candidate monomials of degree between 2 and 4. The candidates are chosen as follows: we first randomly select the degree d of the candidate monomial, with degrees 2, 3, and 4 each having probability 1/3; we then randomly select d indexes with replacement from the set {1, ..., 9} and construct the candidate monomial by multiplying together the variables corresponding to the selected indexes.
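A sketch of this candidate-monomial generator follows (our own illustration; it uses zero-based column indexes 0–8 in place of the set {1, ..., 9} in the text). Within the candidate-search loop sketched earlier, terms of this form simply replace the random logistic or ridgelet hidden units.

```python
import numpy as np

def candidate_monomial(X, rng):
    """Draw one candidate monomial: pick a degree d in {2, 3, 4} with equal
    probability, draw d variable indexes with replacement, and multiply the
    corresponding columns of X together."""
    d = rng.integers(2, 5)              # degree 2, 3, or 4, each with probability 1/3
    idx = rng.integers(0, 9, size=d)    # d indexes drawn with replacement from the 9 predictors
    return idx, np.prod(X[:, idx], axis=1)
```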
Table 5. S&P 500: QuickNet – Polynomial. Summary goodness of fit.

Hidden units   Estimation MSE   CV MSE      Hold-out MSE   Estimation R²   CV R²       Hold-out R²
 0             1.67890          1.79932∗    0.55548        0.00886         −0.06223    −0.03016∧,∗
 1             1.65446          1.81835     0.56226        0.02329         −0.07346    −0.04274
 2             1.64104          1.80630     0.56455        0.03121         −0.06635    −0.04698
 3             1.62964          2.56943     0.56291        0.03794         −0.51686    −0.04394
 4             1.62598          2.67543     0.56242        0.04011         −0.57944    −0.04304
 5             1.62234          2.81905     0.56188        0.04225         −0.66422    −0.04203
 6             1.61654          3.57609     0.56654        0.04568         −1.11114    −0.05068
 7             1.60293          3.79118     0.56974        0.05371         −1.23812    −0.05661
 8             1.59820          3.86937     0.56716        0.05650         −1.28428    −0.05183
 9             1.59449          4.01195     0.56530        0.05870         −1.36845    −0.04837
10             1.58759          6.92957     0.56664        0.06277         −3.09087    −0.05086
11             1.58411          7.55240     0.56159        0.06482         −3.45855    −0.04150
12             1.58229          7.56162     0.56250        0.06590         −3.46400    −0.04318
13             1.57722          8.71949     0.56481        0.06889         −4.14755    −0.04747
14             1.57068          9.11945     0.56922        0.07275         −4.38366    −0.05565
15             1.56755          8.98026     0.57053        0.07460         −4.30149    −0.05807
16             1.56073          6.66135     0.57268        0.07862         −2.93253    −0.06206
17             1.55548          6.57781     0.56465        0.08172         −2.88321    −0.04717
18             1.55177          6.53618     0.56305        0.08392         −2.85863    −0.04420
19             1.54951          7.45435     0.56129        0.08525         −3.40067    −0.04094
20             1.54512          7.24081     0.57165        0.08784         −3.27461    −0.06015

The results of Table 5 are interesting in several respects. First, although the estimation fits improve as additional terms are added, the improvement is nowhere near as rapid as it is for the ANN approaches: even with 50 terms, the estimation R² reaches only 0.1422 (not shown). Most striking, however, is the extremely erratic behavior of the CV MSE. It bounces around but generally trends upward, reaching values as high as 41. As a consequence, the CV MSE ends up identifying the simple linear model as best, with its negative Hold-out R². The erratic behavior of the CV MSE is traceable to extreme variation in the distributions of the included monomials. (Standard deviations can range from 2 to 150; moreover, simple rescaling cannot cure the problem, as the associated regression coefficients essentially undo any rescaling.) This variation causes the OLS estimates, which are highly sensitive to leverage points, to vary wildly in the cross-validation exercise, creating large CV errors and effectively rendering the CV MSE useless as an indicator of which polynomial model to select.

Our experiments so far have revealed some interesting properties of our methods, but because of the extremely challenging real-world forecasting environment to which they have been applied, we have not really been able to observe anything of their relative forecasting ability. To investigate the behavior of our methods in a more controlled environment, we now discuss a second set of experiments using artificial data, in which we ensure (1) that there is in fact a nonlinear relationship to be uncovered, and (2) that the predictive relationship in the hold-out data is identical to that in the estimation data.

We achieve these goals by generating artificial estimation data according to the nonlinear relationship

    Y*_t = a f_q(X_t, θ*_q) + 0.1 ε_t,

with q = 4, where X_t = (Y_{t−1}, Y_{t−2}, Y_{t−3}, |Y_{t−1}|, |Y_{t−2}|, |Y_{t−3}|, R_{t−1}, R_{t−2}, R_{t−3})′, as in the original estimation data (note that X_t contains lags of the original Y_t and not lags of Y*_t). In particular, we take ψ to be the logistic cdf and set

    f_q(x, θ*_q) = x′α*_q + Σ_{j=1}^{q} ψ(x′γ*_j) β*_{qj},

where ε_t = Y_t − f_q(X_t, θ*_q), and with θ*_q obtained by applying QuickNet (logistic) to the original estimation data with four hidden units. We choose a to ensure that Y*_t exhibits the same unconditional standard deviation in the simulated data as Y_t does in the actual data. The result is an artificial series of returns that contains an "amplified" nonlinear signal relative to the noise constituted by ε_t. We generate hold-out data according to the same relationship using the actual X_t's, but now with ε_t generated as i.i.d. normal with mean zero and standard deviation equal to that of the errors in the estimation sample.
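A compact sketch of this data-generating scheme follows. It is illustrative only: the coefficient arrays stand in for the QuickNet-estimated values described above, and we assume the design matrix carries a leading column of ones for the constant term.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def simulate_targets(X, alpha, gammas, betas, eps, a):
    """Generate Y*_t = a * f_q(X_t, theta*) + 0.1 * eps_t with
    f_q(x, theta*) = x'alpha + sum_j logistic(x'gamma_j) * beta_j."""
    f = X @ alpha + sum(logistic(X @ g) * b for g, b in zip(gammas, betas))
    return a * f + 0.1 * eps
```

For the estimation sample, `eps` would be the residuals from the four-hidden-unit QuickNet fit; for the hold-out sample it would be drawn i.i.d. normal with matching standard deviation, exactly as described above.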
The maximum possible hold-out sample R² turns out to be 0.574, which occurs when the model uses precisely the right set of coefficients for each of the four hidden units. The relationship is decidedly nonlinear, as using a linear predictor alone delivers a Hold-Out R² of only 0.0667. The results of applying the precisely right hidden units are presented in Table 6.

Table 6. Artificial data: Ideal specification. Summary goodness of fit.

Hidden units   Estimation MSE   CV MSE      Hold-out MSE   Estimation R²   CV R²       Hold-out R²
 0             1.30098          1.58077     0.99298        0.23196         0.06679     0.06664
 1             1.12885          1.19004     0.83977        0.33359         0.29746     0.21065
 2             0.81753          0.86963     0.67849        0.51737         0.48662     0.36225
 3             0.66176          0.70360     0.63142        0.60933         0.58463     0.40649
 4             0.43081          0.45147∗    0.45279        0.74567         0.73348     0.57439∧,∗

First we apply naive NLS to these data, parallel to the results of Table 1. Again we choose initial values for the coefficients at random. Given that the ideal hidden unit coefficients are located in a 40-dimensional space, there is little likelihood of stumbling upon them, so even though the model is in principle correctly specified for specifications with four or more hidden units, whatever results we obtain must be viewed as approximations to an unknown nonlinear predictive relationship.

We report our naive NLS results in Table 7. Here we again see the bouncing pattern of in-sample MSEs first seen in Table 1, but now the CV-best model, containing eight hidden units, also has locally superior hold-out sample performance. For the CV-best model, the estimation-sample R² is 0.6228, the CV sample R² is 0.5405, and the Hold-Out R² is 0.3914. We also include in Table 7 the model with the best Hold-Out R², which has 49 hidden units. For this model the Hold-Out R² is 0.4700; however, its CV sample R² is only 0.1750, so this even better model would not have appeared as a viable candidate. Despite this, these results are encouraging, in that the ANN model now identifies and delivers rather good predictive performance, both in and out of sample.

Table 7. Artificial data: Naive nonlinear least squares – Logistic. Summary goodness of fit.

Hidden units   Estimation MSE   CV MSE      Hold-out MSE   Estimation R²   CV R²       Hold-out R²
 0             1.30098          1.58077     0.99298        0.23196          0.06679    0.06664
 1             1.30013          1.49201     0.99851        0.23247          0.11919    0.06144
 2             1.25102          1.46083     0.93593        0.26146          0.13760    0.12026
 3             1.25931          1.49946     0.93903        0.25657          0.11479    0.11735
 4             1.14688          1.57175     0.92754        0.32294          0.07212    0.12815
 5             1.24746          1.51200     0.93970        0.26356          0.10739    0.11672
 6             1.23788          1.57817     0.96208        0.26922          0.06833    0.09569
 7             1.10184          1.41418     0.86285        0.34953          0.16514    0.18895
 8             0.63895          0.77829∗    0.64743        0.62280          0.54054    0.39144∗
 9             1.07860          1.36222     0.83499        0.36325          0.19582    0.21514
10             1.17196          1.51568     0.89399        0.30814          0.10522    0.15968
11             1.01325          1.44063     0.73511        0.40183          0.14952    0.30902
12             1.04729          1.57122     0.89255        0.38174          0.07243    0.16104
13             1.16834          1.69258     0.92319        0.31027          0.00079    0.13224
14             0.97988          1.67652     0.85443        0.42153          0.01027    0.19687
15             1.17205          1.63191     0.83216        0.30808          0.03660    0.21780
16             1.02739          1.58299     0.77350        0.39348          0.06548    0.27294
17             1.07750          1.62341     0.84962        0.36390          0.04162    0.20140
18             0.97684          1.45189     0.72514        0.42333          0.14288    0.31840
19             1.01071          1.77567     0.75559        0.40333         −0.04827    0.28978
20             1.08027          2.20172     0.80205        0.36226         −0.29979    0.24610
 …              …                …           …              …               …          …
49             0.72198          1.39742     0.56383        0.57378          0.17504    0.47002∧

Table 8 displays the results using the modified NLS procedure, parallel to Table 2. Now the estimation-sample MSEs decline monotonically, but the CV MSEs never approach those seen in Table 7. The best CV R² is 0.4072, which corresponds to a Hold-Out R² of 0.286. The best Hold-Out R² of 0.3879 occurs with 41 hidden units, but again this would not have appeared as a viable candidate, as the corresponding CV R² is only 0.3251.

Next we examine the results obtained by QuickNet, parallel to the results of Table 3. In Table 9 we observe quite encouraging performance. The CV-best configuration has 33 hidden units, with a CV R² of 0.6484 and a corresponding Hold-Out R² of 0.5430. This is quite close to the maximum possible value of 0.574 obtained by using precisely the right hidden units.
Further, the true best hold-out performance has a Hold-Out R² of 0.5510, using 49 hidden units, not much different from that of the CV-best model. The corresponding CV R² is 0.6215, also not much different from that observed for the CV-best model.

The required estimation time for QuickNet here is essentially identical to that reported above (about 31 seconds), whereas naive NLS now takes 788.27 seconds and modified NLS requires 726.10 seconds.

In Table 10, we report the results of applying QuickNet with a ridgelet activation function. Given that the ridgelet basis is less smooth relative to our target function than the standard logistic ANN, which is ideally smooth in this sense, we should not expect