SAS/ETS 9.22 User's Guide

Chapter 22: The SEVERITY Procedure (Experimental)

Example 22.3: Defining a Model for Mixed Tail Distributions

The plots in Output 22.3.2 show that both the lognormal and GPD distributions fit the data poorly, with GPD being the worst. The Burr distribution fits the data as well as the LOGNGPD mixed distribution in the body region, but it fits the tail region more poorly than the LOGNGPD mixed distribution does.

Output 22.3.2 Comparison of the CDF and PDF Estimates of the Fitted Models

The P-P plots of Output 22.3.3 provide a better visual confirmation that the LOGNGPD distribution fits the tail region better than the Burr distribution.

Output 22.3.3 P-P Plots for the LOGNGPD and BURR Distribution Models

The detailed results for the LOGNGPD distribution are shown in Output 22.3.4. The initial values table indicates the values computed by the LOGNGPD_PARMINIT subroutine for the Xr and Pn parameters. It also uses the bounds columns to indicate the constant parameters. The last table in the figure shows the final parameter estimates. The estimates of all free parameters are significantly different from 0. As expected, the final estimates of the constant parameters Xr and Pn have not changed from their initial values.

Output 22.3.4 Detailed Results for the LOGNGPD Distribution

The SEVERITY Procedure

Distribution Information

    Name           logngpd
    Description    Lognormal Body-GPD Tail Distribution. Mu, Sigma, and Xi
                   are free parameters. Xr and Pn are constant parameters.
    Number of Distribution Parameters    5

Initial Parameter Values and Bounds for logngpd Distribution

    Parameter    Initial Value    Lower Bound    Upper Bound
    Mu           1.49954          -Infty         Infty
    Sigma        0.76306          1.05367E-8     Infty
    Xi           0.36661          1.05367E-8     Infty
    Xr           1.27395          Constant       Constant
    Pn           0.80000          Constant       Constant

Convergence Status for logngpd Distribution

    Convergence criterion (GCONV=1E-8) satisfied.

Optimization Summary for logngpd Distribution

    Optimization Technique            Trust Region
    Number of Iterations              11
    Number of Function Evaluations    31
    Log Likelihood                    -209.39116

Parameter Estimates for logngpd Distribution

    Parameter    Estimate    Standard Error    t Value    Approx Pr > |t|
    Mu           1.57921     0.06426           24.57      <.0001
    Sigma        0.31868     0.04459           7.15       <.0001
    Xi           1.03771     0.38205           2.72       0.0078
    Xr           1.27395     Constant          .          .
    Pn           0.80000     Constant          .          .

The following SAS statements use the parameter estimates to compute the value where the tail region is estimated to start ($x_b = e^{\hat{\mu}} \hat{x}_r$) and the scale of the GPD tail distribution ($\theta_t = \frac{G(x_b)}{g(x_b)} \cdot \frac{1 - p_n}{p_n}$):

    /* Compute tail cutoff and tail distribution's scale */
    data xb_thetat(keep=x_b theta_t);
       set parmest(where=(_MODEL_='logngpd' and _TYPE_='EST'));
       x_b = exp(Mu) * Xr;
       theta_t = (CDF('LOGN',x_b,Mu,Sigma)/PDF('LOGN',x_b,Mu,Sigma)) *
                 ((1-Pn)/Pn);
    run;

    proc print data=xb_thetat noobs;
    run;

Output 22.3.5 Start of the Tail and Scale of the GPD Tail Distribution

    Obs    x_b        theta_t
    1      6.18005    1.27865

The computed values of $x_b$ and $\theta_t$ are shown as x_b and theta_t in Output 22.3.5. Equipped with this additional derived information, you can now interpret the results of fitting the mixed tail distribution as follows:

• The tail starts at $y \approx 6.18$. The primary benefit of using the scale-normalized cutoff ($x_r$) as the constant parameter instead of using the actual cutoff ($x_b$) is that the absolute cutoff gets optimized by virtue of optimizing the scale of the body region ($\theta = e^{\mu}$).
• The values $y \le 6.18$ follow the lognormal distribution with parameters $\mu \approx 1.58$ and $\sigma \approx 0.32$. These parameter estimates are reasonably close to the parameters used for simulating the sample.
• The values $y_t = y - 6.18$ ($y_t > 0$) follow the GPD distribution with scale $\theta_t \approx 1.28$ and shape $\xi \approx 1.04$.

Chapter 23
The SIMILARITY Procedure

Contents

Overview: SIMILARITY Procedure
Getting Started: SIMILARITY Procedure
Syntax: SIMILARITY Procedure
    Functional Summary
    PROC SIMILARITY Statement
    BY Statement
    FCMPOPT Statement
    ID Statement
    INPUT Statement
    TARGET Statement
Details: SIMILARITY Procedure
    Accumulation
    Missing Value Interpretation
    Zero Value Interpretation
    Time Series Transformation
    Time Series Differencing
    Time Series Missing Value Trimming
    Time Series Descriptive Statistics
    Input and Target Sequences
    Sliding Sequences
    Time Warping
    Sequence Normalization
    Sequence Scaling
    Similarity Measures
    User-Defined Functions and Subroutines
    Output Data Sets
    OUT= Data Set
    OUTMEASURE= Data Set
    OUTPATH= Data Set
    OUTSEQUENCE= Data Set
    OUTSUM= Data Set
    _STATUS_ Variable Values
    Printed Output
    ODS Table Names
    ODS Graphics
Examples: SIMILARITY Procedure
    Example 23.1: Accumulating Transactional Data into Time Series Data
    Example 23.2: Similarity Analysis
    Example 23.3: Sliding Similarity Analysis
    Example 23.4: Searching for Historical Analogies
References

Overview: SIMILARITY Procedure

The SIMILARITY procedure computes similarity measures associated with time-stamped data, time series, and other sequentially ordered numeric data. PROC SIMILARITY computes similarity measures for time-stamped transactional data (transactions) with respect to time by accumulating the data into a time series format, and it computes similarity measures for sequentially ordered numeric data (sequences) by respecting the ordering of the data.

Given two ordered numeric sequences (input and target), a similarity measure is a metric that measures the distance between the input and target sequences while taking into account the ordering of the data.
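
As a purely illustrative aid, the idea of a distance between two ordered sequences can be written out for the simplest possible case. The notation below is an assumption and does not come from this chapter: the input $x$ and the target $y$ have the same length $N$ and are compared index by index, with no sliding and no time warping. The procedure's own definitions (see the Similarity Measures section) may differ, for example in how warped or unequal-length sequences are handled.

```latex
% Assumption: x = (x_1, ..., x_N) is the input sequence and
% y = (y_1, ..., y_N) is the target sequence, both of length N,
% compared index by index (no sliding, no time warping).
\begin{align*}
d_{\mathrm{sq}}(x, y)  &= \sum_{i=1}^{N} (x_i - y_i)^2, &
d_{\mathrm{abs}}(x, y) &= \sum_{i=1}^{N} \lvert x_i - y_i \rvert, \\
\bar{d}_{\mathrm{sq}}(x, y)  &= \frac{1}{N}\, d_{\mathrm{sq}}(x, y), &
\bar{d}_{\mathrm{abs}}(x, y) &= \frac{1}{N}\, d_{\mathrm{abs}}(x, y).
\end{align*}
% Worked example with x = (1, 2, 3) and y = (1, 3, 5):
% d_sq = 0 + 1 + 4 = 5,  d_abs = 0 + 1 + 2 = 3,
% mean d_sq = 5/3,       mean d_abs = 3/3 = 1.
```

Because each term pairs $x_i$ with $y_i$ at the same position, reordering either sequence generally changes the result, which is the sense in which such a measure respects the ordering of the data.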
The SIMILARITY procedure computes similarity measures between an input sequence and a target sequence, in addition to similarity measures that “slide” the target sequence with respect to the input sequence. The “slides” can be by observation index (sliding-sequence similarity measures) or by seasonal index (seasonal-sliding-sequence similarity measures).

In order to compare the raw input and the raw target time-stamped data, the raw data must be accumulated to a time series format. After the input and target time series are formed, the two accumulated time series can be compared as two ordered numeric sequences.

For raw time-stamped data, after the transactional data are accumulated to form time series and any missing values are interpreted, each accumulated time series can be functionally transformed, if desired. Transformations are useful when you want to stabilize the time series before computing the similarity measures. Transformations performed by the SIMILARITY procedure include the following:

• log (LOG)
• square-root (SQRT)
• logistic (LOGISTIC)
• Box-Cox (BOXCOX)
• user-defined transformations

Each time series can be transformed further by using simple differencing or seasonal differencing or both. Additional time series transformations can be performed by using various time series transformation and analysis techniques provided by this procedure or other SAS/ETS procedures.

After optionally transforming each time series, the accumulated and transformed time series can be stored in an output data set (OUT= data set).

After optional accumulation and transformation, each of these time series is a “working series,” which can now be analyzed as a sequence of numeric data. Each of these sequences can be a target sequence, an input sequence, or both a target and an input sequence. Throughout the remainder of this chapter, the term “original sequence” applies to both the original input and target sequence.
The term “working sequence” applies to a version of both the original input and target sequence under investigation.

Each original sequence can be normalized prior to similarity analysis. Normalizations are useful when you want to compare the “shape” or “profile” of the time series. Normalizations performed by the SIMILARITY procedure include the following:

• standard (STANDARD)
• absolute (ABSOLUTE)
• user-defined normalizations

After each original sequence is optionally normalized, each working input sequence can be scaled to the target sequence prior to similarity analysis. Scaling is useful when you want to compare the input sequence to the target sequence while discounting the variation of the target sequence. Input sequence scaling performed by the SIMILARITY procedure includes the following:

• standard (STANDARD)
• absolute (ABSOLUTE)
• user-defined scaling

After the working input sequence is optionally scaled to the target sequence, similarity measures can be computed. Similarity measures computed by the SIMILARITY procedure include the following:

• squared deviation (SQRDEV)
• absolute deviation (ABSDEV)
• mean square deviation (MSQRDEV)
• mean absolute deviation (MABSDEV)
• user-defined similarity measures

In computing the similarity measure between two time series, tasks are needed for transforming time series, normalizing sequences, scaling sequences, and computing metrics or measures. The SIMILARITY procedure provides built-in routines to perform these tasks. The SIMILARITY procedure also enables you to extend the procedure with user-defined routines.
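
The processing order described in this overview (accumulate, transform, normalize, scale, then measure) might be requested along the following lines. This is a minimal sketch and not an example from this guide. The data set WORK.SALES and the variables DATE, NEWPRODUCT, and OLDPRODUCT are hypothetical. The option spellings TRANSFORM=, NORMALIZE=, SCALE=, and MEASURE= are assumed to be the statement options that correspond to the keywords listed above, so check them against the Syntax section before use.

```sas
/* Hypothetical data: WORK.SALES holds time-stamped rows with a SAS date   */
/* variable DATE and two positive numeric variables, NEWPRODUCT (input)    */
/* and OLDPRODUCT (target). None of these names come from the guide.       */
proc similarity data=work.sales out=work.series outsum=work.summary;
   /* Accumulate the transactions into monthly totals (time series form). */
   id date interval=month accumulate=total;

   /* Input sequence: log transform, standard normalization, and standard */
   /* scaling to the target (option names assumed, see the lead-in).      */
   input newproduct / transform=log normalize=standard scale=standard;

   /* Target sequence: same transform and normalization; compare by using */
   /* the mean absolute deviation measure (MABSDEV).                      */
   target oldproduct / transform=log normalize=standard measure=mabsdev;
run;

/* Print the summary of computed similarity measures (OUTSUM= data set). */
proc print data=work.summary noobs;
run;
```

The log transformation is only meaningful for positive values, which is why the sketch assumes positive series. Swapping MABSDEV for SQRDEV, ABSDEV, or MSQRDEV would select one of the other built-in measures named above.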
