Ranking Issues, False Positives, and False Negatives

Part of the document Spatial Analysis Methods of Road Traffic Collisions (pages 194-197)

It follows that the identified HRLs inevitably include both true positives and false positives, since the two cannot, by definition, be distinguished. Which of the identified HRLs, then, should be treated, and in what order? A follow-up ranking exercise is needed within the identification process, because site investigation, data analysis, and treatment require substantial time and other resources.

In general, the larger the pool of identified HRLs and/or the smaller the resources available for follow-up action, the more important the ranking exercise becomes in ensuring that HRLs posing different levels of road hazard are treated with the correct priority.

TABLE 9.1

Problems of False Positives and False Negatives Illustrated

                                True State
Decision        Safe                          Not Safe
HRL             Incorrect, false positive     Correct
Not HRL         Correct                       Incorrect, false negative

Cluster Identifications in Networks 167

Following magic figure definitions, HRLs are usually ranked by the simple ranking (SR) method: the set of HRLs is sorted in descending order of observed collision frequency (Oi). A good example of using the SR technique to compile a collision count profile can be found in Nicholson (1989). Despite its ease of implementation, the SR method has been found to produce large numbers of false positives because of random annual fluctuations in collision counts (Hauer and Persaud 1984; Persaud 1986; Hauer 1997). In Kentucky, HRLs identified by the magic figure approach were screened monthly, essentially following the SR technique. Approximately 10% were selected for thorough field investigation by traffic engineers, maintenance engineers, and police personnel, and the recommended improvements were then implemented. However, this approach was inefficient, “in as much as approximately 35% of the locations investigated in the field do not warrant improvement” (Deacon et al. 1975, 16). For the same reason, namely the high random fluctuation of annual collision frequency at any specific location, the SR method also produces an excessive number of false negatives. This allows truly hazardous locations to escape identification and results in inefficient use of resources.
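The sketch below shows how random fluctuation alone can produce SR errors. It simulates one year of counts at sites whose long-run means are known, ranks the sites by observed count Oi as the SR method does, and tallies errors in the sense of Table 9.1. It is a minimal illustration, not the Kentucky procedure. Several elements are assumptions:

- the site means;
- the quota of flagged sites (`top_k`);
- the rule that the `top_k` sites with the highest true means are the "not safe" ones;
- Poisson-distributed annual counts.

```python
import math
import random


def poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) count with Knuth's multiplication method (suitable for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simple_ranking_errors(true_means: list[float], top_k: int, seed: int = 1) -> tuple[int, int]:
    """Rank sites by one year of observed counts (SR method) and count Table 9.1 errors.

    Assumption: the top_k sites by true mean are the truly unsafe ("not safe") ones.
    """
    rng = random.Random(seed)
    observed = [poisson(m, rng) for m in true_means]  # one simulated year of O_i

    # SR method: descending order of observed frequency O_i; the top_k are declared HRLs.
    by_observed = sorted(range(len(observed)), key=lambda i: observed[i], reverse=True)
    declared_hrl = set(by_observed[:top_k])

    by_true_mean = sorted(range(len(true_means)), key=lambda i: true_means[i], reverse=True)
    truly_unsafe = set(by_true_mean[:top_k])

    false_positives = len(declared_hrl - truly_unsafe)  # declared HRL, actually safe
    false_negatives = len(truly_unsafe - declared_hrl)  # not declared, actually unsafe
    return false_positives, false_negatives


# Hypothetical network: 20 hazardous sites (mean 6 collisions/year) among 180 others (means 1 to 4).
means = [6.0] * 20 + [1.0 + 3.0 * i / 179 for i in range(180)]
fp, fn = simple_ranking_errors(means, top_k=20)
print(f"false positives: {fp}, false negatives: {fn}")
```

With a fixed quota, the two error counts are always equal. Each safe site that enters the top `top_k` pushes out an unsafe one.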

The scale of false positives and false negatives seems to be clearly specified by statistical definitions, because the yardsticks are based on classical statistical confidence intervals (CI). Typical confidence levels are 0.95 or 0.99. The Type I error, which corresponds to the false positive error in road safety, is therefore 0.05 or 0.01, respectively. Raising the chosen confidence level reduces the number of HRLs that pass the statistical test. While it is not possible to be certain (i.e., to have a confidence level of 100%), it is at least possible to specify the level of confidence researchers have in the results. The problem, however, is that statistical significance can only be specified with respect to an assumed underlying statistical distribution. In most situations, the normal distribution is assumed (i.e., z = 2.54) (Oppe 1979; Ceder and Livneh 1982). Nonetheless, traffic collisions at a specific road segment over a year are rare events, which follow the Poisson or negative binomial distribution rather than the normal distribution (Cheng and Washington 2005). Various statistical distributions and models have been used to address this statistical drawback (Anderson 2009). Examples include the generalized Poisson (Kemp 1973), logarithmic models (Andreassen and Hoque 1986), Poisson log-linear regression (Blower et al. 1993), and negative binomial models (Persaud 1990; Hauer 1997; Abdel-Aty and Radwan 2000).
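The effect of the distributional assumption can be seen by computing the critical count for one site in two ways. For illustration only, the sketch below assumes that the expected annual count at a site equals a known mean.

- `normal_threshold` applies the normal approximation: the mean plus z standard deviations, with the variance equal to the mean.
- `poisson_threshold` uses the exact Poisson upper tail.

Both function names are hypothetical, and the negative binomial case is not shown.

```python
import math
from statistics import NormalDist


def normal_threshold(mean_count: float, confidence: float) -> float:
    """Critical count under the normal approximation, with variance equal to the mean."""
    z = NormalDist().inv_cdf(confidence)  # one-sided z for the chosen confidence level
    return mean_count + z * math.sqrt(mean_count)


def poisson_threshold(mean_count: float, confidence: float) -> int:
    """Smallest k with P(X <= k) >= confidence for X ~ Poisson(mean_count).

    A site is flagged when its observed count exceeds k, so the false positive
    probability for a site with this mean is at most 1 - confidence.
    """
    k = 0
    pmf = math.exp(-mean_count)  # P(X = 0)
    cdf = pmf
    while cdf < confidence:
        k += 1
        pmf *= mean_count / k  # P(X = k) from P(X = k - 1)
        cdf += pmf
    return k


for confidence in (0.95, 0.99):
    print(confidence, round(normal_threshold(2.0, confidence), 2), poisson_threshold(2.0, confidence))
```

Take a mean of 2 collisions per site per year at a confidence level of 0.99. The normal threshold is about 5.29, so a site with 6 collisions would be flagged. The exact Poisson tail puts the critical count at 6, so only sites with 7 or more collisions exceed it. Here the normal approximation admits extra false positives.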

In the late 1980s, Maher and Mountain (1988) introduced the simulation-based approach to the ranking exercise. Over time, Monte Carlo simulation has become the most widely used means of defining statistically meaningful threshold levels (TL) for identifying and ranking HRLs. It does not depend on the theoretical underlying statistical distribution or form of traffic collisions (Yamada and Thill 2007, 2010).

The general procedure is to simulate a sufficient number of randomly distributed collision patterns to establish statistical significance. In each simulation, the total number of collisions is distributed randomly, with equal chance, over the entire road network. Under the event-based approach, collisions cannot be allocated randomly to the theoretically infinite number of points on the network at which a collision may happen. Hence, GIS can be used to identify representative points at equal intervals along the road network, similar to the logic of the Geographical Analysis Machine (GAM) (Openshaw et al. 1987).

Under the link-attribute approach, all collisions, $\sum_{i=1}^{N} O_i$ in total, are randomly assigned to the N BSUs of the road network in each simulation. Each simulation yields the simulated collision frequency Si of each BSU. When the simulation is repeated 1000 times, the 10th largest value of Si can be used as the threshold level TLi at the significance level of 0.01 (10/1000 = 0.01). The more randomizations are run, the more stable the resulting estimates and the more reliable the pseudo-significance levels (Yamada and Thill 2004; Loo and Yao 2013). Under the EB methods, statistical significance is usually established by specifying the upper percentiles of the distribution of EB estimates of safety specific to each roadway element g (g = 1, 2, …, G) (Elvik 2008).
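The link-attribute procedure can be sketched in a few lines. In this minimal version, each run reassigns every one of the ΣOi collisions to a BSU chosen with equal probability and records Si for every BSU. TLi is then taken as the 10th largest Si over 1000 runs, matching the 0.01 pseudo-significance level described above.

The following are assumptions for illustration:

- the function names;
- the fixed random seed;
- flagging a BSU when Oi strictly exceeds TLi.

Equal probability per BSU also presumes that all BSUs have the same length.

```python
import random


def monte_carlo_thresholds(observed: list[int], n_sims: int = 1000, rank: int = 10,
                           seed: int = 42) -> list[int]:
    """Return TL_i for each BSU: the rank-th largest simulated count S_i over n_sims runs."""
    n_bsu = len(observed)
    total = sum(observed)  # all collisions, sum of O_i, reallocated in every run
    rng = random.Random(seed)
    simulated = [[] for _ in range(n_bsu)]  # simulated[i] holds S_i from each run

    for _ in range(n_sims):
        counts = [0] * n_bsu
        for _ in range(total):
            counts[rng.randrange(n_bsu)] += 1  # each BSU equally likely
        for i, s_i in enumerate(counts):
            simulated[i].append(s_i)

    # With n_sims = 1000 and rank = 10, TL_i sits at the 0.01 pseudo-significance level.
    return [sorted(runs, reverse=True)[rank - 1] for runs in simulated]


def flag_hrls(observed: list[int], thresholds: list[int]) -> list[int]:
    """Indices of BSUs whose observed count O_i exceeds TL_i (strict '>' is an assumption)."""
    return [i for i, (o_i, tl_i) in enumerate(zip(observed, thresholds)) if o_i > tl_i]


# Hypothetical observed counts O_i for 20 BSUs.
observed = [0, 1, 0, 2, 9, 1, 0, 3, 1, 0, 0, 1, 2, 0, 1, 8, 0, 1, 0, 1]
thresholds = monte_carlo_thresholds(observed)
print("TL_i:", thresholds)
print("Flagged BSUs:", flag_hrls(observed, thresholds))
```

Raising `n_sims` (with `rank` scaled to keep rank/n_sims at the chosen significance level) makes TLi more stable, in line with the point above about repeated randomizations.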

Once the statistical tests are passed (whether by making assumptions about the statistical distribution or by the simulation approach), HRLs may simply be ranked by the SR method, or by the rate-quality method, which controls for certain key exposure factors. A more sophisticated but data-intensive ranking exercise may follow the benefit–cost method. The rationale for ranking HRLs can be expressed as

$$\max_i \, (B_i - C_i) \qquad (9.5)$$

or

$$\max_i \left( \frac{B_i}{C_i} \right) \qquad (9.6)$$

where

Bi is the potential benefit of improving HRLi

Ci is the estimated cost of improving HRLi
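The two criteria can rank the same HRLs differently, which is worth checking before committing resources. The sketch below reads Equation (9.5) as the net benefit Bi − Ci and Equation (9.6) as the benefit–cost ratio Bi/Ci. The `Candidate` type and all monetary figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    benefit: float  # B_i: potential benefit of improving HRL_i (monetary units)
    cost: float     # C_i: estimated cost of improving HRL_i (same units)


def rank_by_net_benefit(candidates: list[Candidate]) -> list[str]:
    """Order HRLs by B_i - C_i, largest first (Equation 9.5)."""
    ranked = sorted(candidates, key=lambda c: c.benefit - c.cost, reverse=True)
    return [c.name for c in ranked]


def rank_by_bc_ratio(candidates: list[Candidate]) -> list[str]:
    """Order HRLs by B_i / C_i, largest first (Equation 9.6); costs must be positive."""
    ranked = sorted(candidates, key=lambda c: c.benefit / c.cost, reverse=True)
    return [c.name for c in ranked]


# Hypothetical HRLs: A needs a large scheme, B a cheap one, C a mid-sized one.
hrls = [
    Candidate("HRL-A", benefit=900_000.0, cost=300_000.0),
    Candidate("HRL-B", benefit=200_000.0, cost=40_000.0),
    Candidate("HRL-C", benefit=500_000.0, cost=250_000.0),
]
print(rank_by_net_benefit(hrls))  # ['HRL-A', 'HRL-C', 'HRL-B']
print(rank_by_bc_ratio(hrls))     # ['HRL-B', 'HRL-A', 'HRL-C']
```

Net benefit puts the large scheme at HRL-A first, whereas the ratio puts the cheap scheme at HRL-B first.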

Bi/Ci is the well-known benefit–cost ratio. Historically, Bi = f(PCRi). In other words, the benefit of improving HRLi is PCRi multiplied by the average saving from preventing a collision. More detailed ranking exercises also weight collisions by type, such as property-damage-only collisions, collisions causing slight injury only, and collisions causing serious injury or death. However, none of these collision-based estimates accounts for the fact that the number, injury severity, and health outcomes of the persons injured or killed in a collision can vary substantially. A collision involving buses, for instance, can kill or injure more than a hundred persons, whereas another collision may involve only one slightly injured passenger. Hence, using an average saving from preventing a collision, whether or not further classified by type, is inadequate. Loo et al. (2013) therefore proposed a person-based rather than collision-based approach to identifying HRLs, so that the potential benefits of addressing HRLs are reflected more accurately and in human terms. Put simply, their method directly analyzes the number of persons injured or killed in traffic collisions (PEi), rather than indirectly considering the collision frequency (Oi) or rate (Ri).

Next, the cost of improving HRLi needs to be estimated. Because the ranking exercise aims to screen HRLs for more expensive and detailed site investigation and analysis, Ci is usually estimated using ballpark figures from standard improvement measures. Theoretically, Ci can be obtained by combining the individual costs of a bundle of road safety measures known to be effective in addressing road hazards based on risk factors. In practice, the more advanced road safety administrations maintain a manual of standard costs, covering items such as the installation of pedestrian railings. With the EB approach, Ci is estimated from past records of expenditure on improving HRLs of roadway element g, or from typical improvement scheme costs for roadway element g (Geurts and Wets 2003).
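The sketch below contrasts collision-based and person-based benefits and shows how standard costs can be bundled. It loosely illustrates the idea and is not the method of Loo et al. (2013). PCRi is treated as an expected number of collisions avoided per year, which is an assumption about its units. Every unit saving, unit cost, and measure name is a hypothetical placeholder, not a value from any cost manual.

```python
# Hypothetical unit values; real figures would come from a road safety administration's manuals.
AVG_SAVING_PER_COLLISION = 100_000.0
SAVING_PER_PERSON = {"fatal": 2_000_000.0, "serious": 250_000.0, "slight": 20_000.0}
STANDARD_UNIT_COST = {
    "pedestrian_railing_per_m": 150.0,
    "signal_upgrade_each": 80_000.0,
}


def collision_based_benefit(pcr: float) -> float:
    """B_i = PCR_i times an average saving per collision prevented."""
    return pcr * AVG_SAVING_PER_COLLISION


def person_based_benefit(persons_avoided: dict[str, float]) -> float:
    """B_i from the expected number of persons killed or injured avoided, valued by severity."""
    return sum(SAVING_PER_PERSON[severity] * n for severity, n in persons_avoided.items())


def bundle_cost(measures: dict[str, float]) -> float:
    """C_i as the sum of standard unit costs times quantities for a bundle of measures."""
    return sum(STANDARD_UNIT_COST[item] * qty for item, qty in measures.items())


# Two hypothetical HRLs with the same PCR_i but very different casualty profiles.
bus_corridor = {"serious": 0.5, "slight": 3.0}  # expected persons avoided per year
quiet_junction = {"slight": 1.0}
print(collision_based_benefit(1.0), collision_based_benefit(1.0))        # 100000.0 100000.0
print(person_based_benefit(bus_corridor), person_based_benefit(quiet_junction))  # 185000.0 20000.0
print(bundle_cost({"pedestrian_railing_per_m": 200, "signal_upgrade_each": 1}))  # 110000.0
```

The collision-based estimate makes both sites look identical. The person-based estimate separates them roughly ninefold.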
