A road safety analyst who is well aware of the pertinent issues of HRL identification described earlier is ready to take the additional step of applying spatial analysis in HRL identification. Doing so involves three major stages (Loo 2009; Loo and Yao 2013). Stage I is geovalidation. Stage II is to define the spatial unit of analysis and calculate the collision statistics. Stage III is the HRL identification itself. HRLs can broadly be identified using either the hot spot methodology or the hot zone methodology. As geovalidation was covered in Chapter 7, this chapter assumes that researchers are satisfied with the quality of the spatial data and proceeds to discuss the major methodological issues in Stages II and III.
9.4.1 Defining the Spatial Unit of Analysis and Calculating Collision Statistics
Following the link-attribute approach, Stage II will first involve cutting up the entire road network into small road segments as BSUs having a standard length, l. As far as possible, l should be equal for all BSUs. Following the event-based approach, this step involves determining the number and positions of reference points (RPs), for calculating the kernel density with a standard window width, h. Strictly speaking, the spacing or interval (Int) of RPs is independent of h. The value of h determines the width of the kernels placed over individual collisions (see Chapter 8). An RP is simply an “accounting” point for summarizing the total height of the kernels at a location. Ideally, Int should be equal, resulting in RPs at regular intervals covering the entire road network. As BSUs are also of standard length, one may simply take the starting point, mid-point, or random point r within l of each BSU to generate the RPs (Loo and Yao 2013). The details are described in Chapter 8.
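As a minimal sketch of the segmentation step, the following Python fragment cuts a single road into BSUs of standard length l and places one RP at the start or mid-point of each BSU. The function names `make_bsus` and `reference_points`, the one-road setting, and the handling of the shorter final segment are illustrative assumptions, not the procedure of Loo and Yao (2013).

```python
import math

def make_bsus(road_length, l):
    """Cut one road into BSUs of standard length l.

    Returns (start, end) offsets along the road. Assumption: the last BSU
    is simply left shorter when road_length is not a multiple of l.
    """
    n = math.ceil(road_length / l)
    return [(k * l, min((k + 1) * l, road_length)) for k in range(n)]

def reference_points(bsus, where="mid"):
    """Place one RP per BSU, at its starting point or its mid-point."""
    if where == "start":
        return [start for start, _ in bsus]
    return [(start + end) / 2 for start, end in bsus]

# Hypothetical road of 450 m cut into 100 m BSUs.
bsus = make_bsus(road_length=450.0, l=100.0)
rps = reference_points(bsus, where="mid")
```

With equal-length BSUs and mid-point RPs, the spacing Int between consecutive RPs equals l everywhere except at the shorter final segment.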
9.4.2 Hot Zone Identification
At Stage III, HRLs need to be identified based on the collision statistics of individual spatial units (BSUs or RPs). For the sake of illustration, we shall define HRL sites as hot zones rather than hot spots. To recall, hot zones explicitly take into consideration spatial interdependency among neighboring spatial units. Moreover, although the hot zone methodology is more complex, it can easily be modified to become a hot spot methodology (by setting all the network proximity weights to zero), if desired (e.g., for comparison purposes). Hence, we define HRL sites as hot zones in the illustrations later in the chapter. For the setting of criteria, we shall use the statistical definition with repeated randomization. The major consideration is to keep the procedures reasonably simple, compact, and comparable using both the link-attribute and the event-based approaches. The simple numerical definitions are not used because they are known to suffer from serious false positive and false negative problems. The model-based definitions are highly heterogeneous and much more data intensive. Many of them, such as the EB method, require detailed discussion of the model-building process and are elaborated in other parts of the book.
9.4.2.1 Link-Attribute Approach
At Stage III of HRL identification, researchers need to consider both the network connectivity and the statistical significance of collision records at the same time.
These concerns raise methodological challenges. To properly consider the spatial connectivity of BSUs in HRL identification, Loo (2009) proposes an index, called the hot zone index I(HZ), on the basis of the local Moran’s I method (Anselin 1995).
Depending on the spatial relationships among BSUs, I(HZ)i for BSU i is defined as:
$$I(HZ)_i = z_i \sum_{j=1,\, j \neq i}^{N} W_{ij} z_j \tag{9.7}$$

$$z_i = \begin{cases} 1, & \text{if } O_i \geq t_i \\ 0, & \text{otherwise} \end{cases} \tag{9.8}$$

where
$N$ is the number of BSUs
$t_i$ is the threshold collision rate of BSU $i$
$O_i$ is the observed collision rate at the $i$th BSU
$W_{ij}$ is the network proximity matrix
In spatial analysis, matrices are widely used for representing spatial concepts such as distance, adjacency, interaction, and neighborhood. For hot zone identification, we focus on those contiguous BSUs with relatively high risks. Generally, most collision patterns do not persist strongly beyond the first degree of spatial proximity (Flahaut et al. 2003; Flahaut 2004). Thus, Wij is specified as a binary contiguity matrix whose elements are only ones or zeros. Nonetheless, researchers may use other distance-based proximity matrices, such as $d_{ij}^{-2}$, when more than one degree of neighbor is considered.
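Equations 9.7 and 9.8 can be illustrated with a short Python sketch using a binary contiguity matrix. The function name `hot_zone_index`, the five-BSU chain, and all numerical values are hypothetical and serve only to show how the index behaves.

```python
def hot_zone_index(observed, thresholds, W):
    """Compute I(HZ)_i for every BSU following Equations 9.7 and 9.8.

    observed[i]   : observed collision rate O_i
    thresholds[i] : threshold collision rate t_i
    W[i][j]       : network proximity weight (here 0 or 1 for contiguity)
    """
    n = len(observed)
    # Equation 9.8: indicator of whether BSU i reaches its threshold.
    z = [1 if observed[i] >= thresholds[i] else 0 for i in range(n)]
    # Equation 9.7: count contiguous BSUs that also reach their thresholds.
    return [
        z[i] * sum(W[i][j] * z[j] for j in range(n) if j != i)
        for i in range(n)
    ]

# Hypothetical chain of five BSUs (0-1-2-3-4); only neighbors are contiguous.
W = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]
observed = [3, 4, 0, 5, 1]   # illustrative collision rates
thresholds = [2] * 5         # illustrative common threshold

index = hot_zone_index(observed, thresholds, W)
```

In this example BSUs 0 and 1 receive a positive index because they are contiguous and both reach the threshold, whereas BSU 3 reaches the threshold but has no qualifying neighbor and therefore scores zero.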
To establish the statistical significance of the hot zone results, ti is defined statistically using the simulation approach and the Monte Carlo method. The use of statistical definitions as critical thresholds for detecting link-attribute traffic hot zones was first presented in Loo and Yao (2013). In each simulation, the procedure is to randomly allocate the total number of road collisions over the BSUs and obtain zi(sim) for each BSU. The simulations can then be repeated 100, 500, 1000, or more times. Using the value of zi(sim) that marks the top 5% or 1% of all zi(sim), the pseudo-significance level of 95% or 99% can be obtained, respectively. These cutoff values are then substituted into ti to define the threshold value and compute I(HZ)i in Equations 9.7 and 9.8.
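The randomization step can be sketched as follows. This is one possible reading, under two loudly stated assumptions that are not taken from the source: every collision is equally likely to fall on any BSU (reasonable only when BSUs have equal length), and a single cutoff is pooled over all BSUs and all simulations. The function name `simulate_threshold` and its parameters are hypothetical.

```python
import math
import random

def simulate_threshold(total_collisions, n_bsus, n_sims=1000, alpha=0.05, seed=0):
    """Return a Monte Carlo cutoff for the collision count per BSU.

    Assumptions (illustrative only): collisions are allocated uniformly at
    random over BSUs, and the cutoff is pooled over all simulated values.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    simulated = []
    for _ in range(n_sims):
        counts = [0] * n_bsus
        for _ in range(total_collisions):
            counts[rng.randrange(n_bsus)] += 1
        simulated.extend(counts)
    simulated.sort()
    # Empirical (1 - alpha) quantile of the simulated counts.
    k = min(len(simulated) - 1, math.ceil((1 - alpha) * len(simulated)) - 1)
    return simulated[k]

# Hypothetical network: 200 collisions over 50 BSUs, 200 simulations.
t95 = simulate_threshold(200, 50, n_sims=200, alpha=0.05)
t99 = simulate_threshold(200, 50, n_sims=200, alpha=0.01)
```

The stricter 99% level can never give a lower cutoff than the 95% level, so fewer BSUs qualify as the pseudo-significance level rises.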
Cluster Identifications in Networks 171
After determining the key parameters and methods used in hot zone identification, the implementation procedures involved are summarized in Figure 9.1. Details of the GIS-based algorithm are reported in Loo (2009). To begin with, the first BSU record is examined to see whether its observed collision frequency Oi is greater than or equal to its threshold value ti. If the answer is positive, a new working table is created with an index number equal to one. All contiguous BSUs are identified by GIS and listed in this working table. Then, each contiguous BSU is analyzed. I(HZ)i is computed with the assistance of GIS, and the result is recorded as a variable in the attribute table of the BSU dataset. Whenever the observed collision frequency of any one of the contiguous BSUs is also greater than or equal to its respective threshold, the index number increases by one. The checking continues until all contiguous BSUs in the working table have been checked. When this is done, the index number is examined. If the index number is greater than one, a hot zone has been identified and the “hot zone” variable of the BSU in the main table (default = 0) is updated. The entire process repeats until all BSUs in the road network have been checked.

FIGURE 9.1 A flowchart showing the steps of hot zone identification. (Reprinted from Loo, B.P.Y., Int. J. Sustain. Transp., 3(3), 187, 2009. With permission from Taylor & Francis Group Ltd., http://www.tandfonline.com/.)
Mathematically, the value of I(HZ) is either positive (I(HZ) = 1, 2, …, N − 1) or equal to zero. A positive value of I(HZ)i indicates that the observed collision rates of BSU i and at least one of its neighboring BSUs are no less than their threshold values, and a hot zone is detected. The spatial pattern of HRLs can then be visualized and analyzed by plotting the Oi or other attributes of all BSUs that form part of a hot zone (BSUs with I(HZ) ≥ 1) in a road network map of an appropriate spatial scale.
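The record-by-record scan described above can be sketched in Python with adjacency lists in place of GIS queries. The function names, the BSU labels, and the data are hypothetical. The second function, which merges flagged contiguous BSUs into zones, is an added convenience for mapping and is not a step reported in the source.

```python
def scan_hot_zones(rates, thresholds, neighbors):
    """Flag each BSU that, together with at least one contiguous BSU,
    reaches its threshold. neighbors maps a BSU id to contiguous BSU ids."""
    hotzone = {b: 0 for b in rates}      # default = 0, as in the main table
    for b in rates:
        if rates[b] < thresholds[b]:
            continue                     # move on to the next record
        index = 1                        # working index starts at one
        for nb in neighbors[b]:
            if rates[nb] >= thresholds[nb]:
                index += 1               # a contiguous BSU also qualifies
        if index >= 2:
            hotzone[b] = 1               # hot zone found
    return hotzone

def group_zones(hotzone, neighbors):
    """Assumed extra step: merge flagged contiguous BSUs into zones."""
    seen, zones = set(), []
    for b in hotzone:
        if hotzone[b] and b not in seen:
            stack, zone = [b], []
            seen.add(b)
            while stack:
                cur = stack.pop()
                zone.append(cur)
                for nb in neighbors[cur]:
                    if hotzone[nb] and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
            zones.append(sorted(zone))
    return zones

# Hypothetical small network with a junction at BSU "B".
neighbors = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"],
             "D": ["B", "E"], "E": ["D", "G"], "G": ["E"]}
rates = {"A": 5, "B": 6, "C": 1, "D": 7, "E": 0, "G": 9}
thresholds = {b: 4 for b in rates}

flags = scan_hot_zones(rates, thresholds, neighbors)
zones = group_zones(flags, neighbors)
```

Here BSU "G" has the highest rate but no qualifying neighbor, so it is not flagged, while "A", "B", and "D" form a single hot zone.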
9.4.2.2 Event-Based Approach
How can the network proximity of RPs be considered properly under the event-based approach? Drawing on I(HZ), Loo and Yao (2013) introduce an event-based hot zone indicator LK(HZ) based on the KLINCS approach of Yamada and Thill (2007):
$$LK(HZ)_i = Z_i \sum_{j=1,\, j \neq i}^{m} f(HZ)_{ij} Z_j \tag{9.9}$$

$$Z_i = \begin{cases} 1, & \text{if } LK_i \geq t_i \\ 0, & \text{otherwise} \end{cases} \tag{9.10}$$

where
$LK_i$ is the local network-constrained K-function index for the RP $i$
$t_i$ is the threshold value at RP $i$
LKi can be calculated following Equations 9.9 and 9.10. Similarly, ti can be defined statistically by Monte Carlo simulations rather than an arbitrary number (see preceding text). f(HZ)ij is a binary variable indicating whether or not RP i and j are contiguous. It is measured by

$$f(HZ)_{ij} = \begin{cases} 1, & \text{if } d(HZ)_{ij} \leq \mathit{Int} \\ 0, & \text{otherwise} \end{cases} \tag{9.11}$$

where d(HZ)ij is the network distance between RP i and j. The value of LK(HZ) is also either positive or equal to zero. Once again, the identification of hot zones only focuses on contiguous RPs with positive LK(HZ). For each of the hot zones identified, the profile of LKi can be further analyzed and compared.
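Equations 9.9 to 9.11 can be illustrated with the sketch below. It assumes that the LKi values and the matrix of network distances between RPs have already been computed elsewhere. The function name `lk_hot_zone` and all numbers are hypothetical.

```python
def lk_hot_zone(lk, thresholds, dist, interval):
    """Compute LK(HZ)_i for every RP following Equations 9.9 to 9.11.

    lk[i]         : local network-constrained K-function index LK_i (given)
    thresholds[i] : threshold value t_i
    dist[i][j]    : network distance d(HZ)_ij between RPs i and j (given)
    interval      : RP spacing Int
    """
    m = len(lk)
    # Equation 9.10: indicator of whether RP i reaches its threshold.
    Z = [1 if lk[i] >= thresholds[i] else 0 for i in range(m)]
    # Equation 9.11: RPs are contiguous when no farther apart than Int.
    f = [[1 if dist[i][j] <= interval else 0 for j in range(m)] for i in range(m)]
    # Equation 9.9: count contiguous RPs that also reach their thresholds.
    return [Z[i] * sum(f[i][j] * Z[j] for j in range(m) if j != i) for i in range(m)]

# Hypothetical RPs at 0, 100, 200, and 300 m along one road, with Int = 100 m.
offsets = [0, 100, 200, 300]
dist = [[abs(a - b) for b in offsets] for a in offsets]
lk = [2.5, 3.1, 0.4, 2.9]     # illustrative LK_i values
thresholds = [2.0] * 4        # illustrative common threshold

lk_hz = lk_hot_zone(lk, thresholds, dist, interval=100)
```

As in the link-attribute case, the RP at 300 m exceeds its threshold but has no contiguous RP that does, so its LK(HZ) is zero.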