CHAPTER 6 THE DETERMINANTS OF TRADE SECRET INTENSITY
6.8 Endogenous Switching: Endogenous Selection and Sample Selectivity
The use of prosecution data for economic analysis is rife with challenges. One major challenge is self-selection within the sample. Taken as a whole, this problem can be referred to as endogenous switching, which includes both endogenous selection and sample selectivity. In this study, the data are restricted to those cases in which a trade secret was stolen, the theft was detected, the theft was reported to the FBI, the FBI referred the case to the district attorney and the case reached court. This series of steps means that the EEA cases represent a small portion of the wider population of trade secrets, and even of stolen trade secrets. Endogenous variables influence both the observed data and whether an observation is included in the observed data. Hence, a plain OLS regression will fail to account for this sample selectivity or endogenous selection.
This section of the study uses methods to correct for this endogenous switching problem, including the Heckman correction and, while not strictly an endogenous switching correction, truncated regression.
355 Note that the Stata code used in this case was user-generated by Jann (2010) and thus does not conform to the standard Stata output. In this case, no p-values are calculated.
6.8.1 Heckman Correction
One method of correcting for endogenous switching is a sample selectivity correction. The sample is selected in that, of all trade secrets, only those that have been stolen are observed. However, the Heckman correction356 for sample selectivity requires missing values on which to base its analysis. Hence, the use of a complete data set (i.e. one in which all missing values have been replaced) is inappropriate, as the model reverts to an OLS analysis.
In order to examine the sample selectivity of TSI, we proceed by examining the original data set (i.e. before adjusting for missing values). However, the data set covers solely those cases that concluded in prosecution, and no information is available for unprosecuted cases. The missing values are due to incomplete information on the value of the trade secrets, missing details regarding the victim firm, ongoing cases and other complications with data collection. Thus, the data set will only allow the Heckman correction to account for missing information and not for the sample selection concern of the decision to prosecute. Table 6-15 below reports the results of a Heckman correction applied to the data using the logs of firm size (vsales) and a valuation of the trade secret (xref) as the variables in the selection model.
Table 6-15: Heckman Correction of Log-linear model with Sectoral Dummies; Selection Model Based on Sales and Xref
(no model of missing values)
356 Also known as the Heckman selection model, or Heckman’s estimator for sample selection, the Heckman correction calculates the expected value of the error, which is captured by the inverse Mills ratio (IMR), and then uses it as a regressor in the linear outcome model. The IMR is the ratio of the probability density function to the cumulative distribution function of the standard normal distribution and is calculated using a probit model. See Greene (1993) for further details.
However, a number of problems appear with the results in Table 6-15. One is that the model overall, and virtually all of the coefficients, are not significant. The other is that ρ is equal to 1, which indicates that the sample does not conform to the Heckman assumptions. The use of the Heckman correction is, therefore, inappropriate. Additionally, the sample size has dropped to 16, making the analysis rather weak. In this case, the Heckman correction does not further the analysis.
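As an illustration of the quantity described in footnote 356, the inverse Mills ratio can be sketched in a few lines of Python. This is a minimal sketch using only the standard library, not the Stata routine used in this study; the function names and the probit index values are hypothetical and chosen purely for illustration.

```python
import math

def normal_pdf(z):
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills_ratio(z):
    """Ratio of the standard normal pdf to its cdf at probit index z."""
    return normal_pdf(z) / normal_cdf(z)

# Hypothetical fitted probit index values (assumed, not taken from the data).
# A lower index means selection is less likely, so the correction term is larger.
for z in (-1.0, 0.0, 1.0):
    print(z, inverse_mills_ratio(z))
```

In a two-step Heckman procedure, a term of this kind would be computed for each observation from the fitted selection model and added as a regressor to the outcome equation.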
6.8.2 Truncated Regression
Another method of correcting for sample selection is the similar concept of truncation. Truncation assumes that we do not observe values of a variable below or above a certain level. The sample is truncated in that, of stolen trade secrets, only those exceeding a certain minimum value reach the court. That is, the FBI likely investigates only those trade secrets whose value exceeds a minimum. The FBI’s Reporting Theft checklist357 asks victims to place the value of their trade secret within a range. As discussed in Chapter 3, FBI Assistant Director Chip Burrus “likened the FBI’s current fraud-enforcement policies – in which losses below $150,000 have little chance of being addressed – to ‘triage.’ Even cases with losses approaching $500,000 are much less likely to be accepted for investigation than before 9/11.”358 While there is no public document supporting this triage policy, anecdotal evidence suggests that, in practice, it exists. If this were the case, we would expect that a truncated regression would improve the analysis.
The use of missing value analysis is permitted in truncated regression.359 Using the truncreg command in Stata, with the mean imputed for missing values, we get the following:
Table 6-16: Truncated Regression for Log-linear Model with Sectoral Dummies
These results are close to those presented in Table 6-4, which is comforting but also means that the truncation analysis does not add to our earlier analysis.
357 www.justice.gov/criminal/cybercrime/reportingchecklist-ts.pdf.
358 Shukovsky, Paul et al. (2007).
359 Truncated regression assumes that the observed distribution is a truncation of the normal distribution. In this case, $150,000 could be the level at which the observations are truncated. Taking this truncated normal density function, truncated regression performs a maximum likelihood (ML) estimate with a normalized density function.
Additionally, it suggests that the FBI is not, in fact, conducting the alleged triage discussed earlier and in Chapter 3. This is supported further by a case study analysis. The analysis suggests that the FBI is seeking prosecution for a wide range of values of trade secrets. For example, in the case of Genovese,360 the FBI chose to prosecute the theft of Microsoft source code, which the defendant sold for $20. The decision to prosecute a theft which the defendant valued at a mere $20 is likely a case of the FBI strategically choosing to prosecute a single defendant with the intent of dissuading other would-be thieves. Thus, in the absence of a truncation, the truncreg method fails to enhance the analysis.
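The normalized density described in footnote 359 can be illustrated with a short Python sketch of the log-likelihood for a sample truncated from below. This is a sketch under stated assumptions rather than the truncreg implementation: the function names, the sample values and the fitted means are hypothetical, and the truncation point is taken to be the log of $150,000 only because the model is log-linear.

```python
import math

def normal_logpdf(z):
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_loglik(ys, mus, sigma, lower):
    """Log-likelihood when only outcomes above `lower` can be observed.

    Each normal density is divided by the probability of exceeding the
    truncation point, so that it integrates to one over the observed range.
    """
    total = 0.0
    for y, mu in zip(ys, mus):
        log_density = normal_logpdf((y - mu) / sigma) - math.log(sigma)
        log_prob_observed = math.log(1.0 - normal_cdf((lower - mu) / sigma))
        total += log_density - log_prob_observed
    return total

# Hypothetical log values and fitted means (assumed, not taken from the data)
lower = math.log(150_000)
ys = [12.5, 13.0, 14.2]
mus = [12.8, 12.9, 13.9]

print(truncated_loglik(ys, mus, 1.0, lower))          # with truncation
print(truncated_loglik(ys, mus, 1.0, float("-inf")))  # no truncation
```

With the truncation point set to negative infinity the normalizing term vanishes and the expression reduces to the ordinary normal log-likelihood, which mirrors the finding above that, absent an actual truncation, the truncated estimates stay close to the earlier results.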