[...] the general case. Therefore, it is always recommended to take into account the best estimates for the road state and for the relative state of other vehicles.

The last three columns in Figure 11.6 will be of interest for the more advanced vision systems of the future, exploiting the full potential of the sense of vision with high resolution once sufficient computing power becomes available. It is the big advantage of vision over radar and laser range finding that vision allows recognizing the traffic situation with good resolution and up to greater ranges if multifocal vision with active gaze control is used. This is not yet the general state of the art, since the data rates to be handled are rather high (many gigabytes per second) and their interpretation requires sophisticated software. In the case of expectation-based, multifocal, saccadic vision (EMS vision) it has been demonstrated that, from a functional point of view, visual perception as in humans is possible; until the human performance level is reached, however, quite a bit of development still has to be done. We will come back to this point in the final outlook.

Due to this situation, industry has decided to pick radar for obstacle detection in systems already on the market for traffic applications; LRF has also been studied intensively and is being prepared for market introduction in the near future. Radar-based systems for driver assistance in cruise control have been available for several years now. Complementing them by vision for road and lane recognition, as well as for reducing false alarms, has been investigated for about the same time. These combined systems will not be looked at here; the basic goal of this section is to develop and demonstrate the potential of vertebrate-type vision for use in the long run. It exploits exactly the same features as human vision does and should thus be sufficient for safe driving. Multisensor adaptive cruise control will be discussed in Section 14.6.3.

11.3.1 Feature Sets for Visual Vehicle Detection

Many different approaches have been tried for solving this problem since the late 1980s. Regensburger (1993) presents a good survey on the task "visual obstacle recognition in road traffic". In [Carlson, Eklundh 1990], an object detection method using prediction and motion parallax is investigated. In [Kuehnle 1991], the use of symmetries of contours, gray levels, and horizontal lines for obstacle detection and tracking is discussed; [Zielke et al. 1993] investigates a similar approach. Other approaches are the evaluation of optical flow fields [Enkelmann 1990] and model-based techniques like the one described below [Koller et al. 1993]. Solder and Graefe (1990) find road vehicles by extracting the left, right, and lower object boundaries using controlled correlation. An up-to-date survey on the topic may be found in [Masaki 1992++] or in the vision bibliography [http://iris.usc.edu/Vision-Notes/bibliography/contents.html]. Some more recent papers are [Graefe, Efenberger 1996; Kalinke et al. 1998; Fleischer et al. 2002; Labayarde et al. 2002; Broggi et al. 2004].

The main goal of the 4-D approach to dynamic machine vision has, from the beginning, been to take advantage of the full spatiotemporal framework for internal representation and to do as little reasoning as possible in the image plane and between frames.
Instead, temporal continuity in physical space according to some model for the motion of objects is exploited in conjunction with spatial shape rigidity in this "analysis-by-synthesis" approach.

Since a high image evaluation rate had proven more beneficial in this approach than using a wide variety of features, only edge features with adjacent average intensity values in mask regions were used when computing power was very low (see Section 5.2). With increasing computing power, homogeneously shaded blobs, corner features, and in the long run, color and texture are being added. In any case, perturbations from the motion process, from measurements, and from data interpretation tend to change rapidly over time, so that a single image in a sequence should not be given too much weight; instead, filtering likely (maybe not very precise) results at a high rate using motion models with low eigenfrequencies has proven to be a good way to go. Feature extraction therefore concentrated on features that are fast to compute, with the selection of those actually used guided by expectations and by statistical data from the running recursive estimation process.

For this reason, image evaluation rates of less than about ten per second were not considered acceptable from the beginning in the early 1980s; the number of processors in the system and the workload sharing had to be adjusted such that this high evaluation rate was achievable. This was in sharp contrast to the approaches to machine vision studied by most other groups around the globe at that time. Accumulated delay times could be handled by exploiting the spatiotemporal models for compensation by prediction. These short cycle times, of course, left no great choice of features to be used. On the contrary, even simple edge detection could not be applied all over the image but had to be concentrated (attention controlled!) in those regions where objects of interest for the task at hand could be expected. Once the road is known from the specific perception loop for it, "obstacles" can only be those objects in a certain volume above the road region, strictly speaking only those within and somewhat to the side of the width of the wheel tracks.

11.3.1.1 Edge Features and Adjacent Average Gray Values

Edge features are robust to changes in lighting conditions; maybe this is the reason why their extraction is widespread in biological vision systems (striate cortex). Edge features on their own have three parameters specifying them completely: position, orientation, and the value of the extreme intensity gradient. By associating the average intensity on one side of the edge as a fourth parameter with each edge, the average intensities on both sides are known, since the gradient is the difference between both sides; this allows coarse area-based information to be included in the feature.

Mori and Charkari (1993) have shown that the shadow underneath a vehicle is a significant pattern for detecting vehicles; it usually is the darkest region in the environment. Combining this feature with knowledge of 3-D geometric models and 4-D dynamic scene understanding leads to a robust method for obstacle detection and tracking. On this basis, [Thomanek et al. 1994; Thomanek 1996] developed the first vision system capable of tracking half a dozen vehicles on highways in each hemisphere with bifocal vision in closed-loop autonomous driving.
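The four-parameter edge feature can be made concrete with a small sketch. The following Python fragment is only an illustration of the idea, not the original KRONOS/CRONOS implementation; the mask half-width n_w and the function name are chosen here for the example. It scans one vertical search stripe, returns the row of the strongest intensity step, the step (gradient) value, and the average gray value of the trailing (upper) mask region; the leading-side average then follows as the sum of the two.

```python
# Minimal sketch (not the original KRONOS/CRONOS code): extract an edge feature
# with its adjacent average gray value along one vertical search stripe.
# The mask half-width n_w is an illustrative parameter.
import numpy as np

def edge_feature_in_stripe(stripe: np.ndarray, n_w: int = 3):
    """stripe: 1-D array of gray values along the search direction (top-down).
    Returns (row, gradient, mean_upper) of the strongest edge; the average on
    the lower side is mean_upper + gradient."""
    best = None
    for r in range(n_w, len(stripe) - n_w):
        mean_upper = float(stripe[r - n_w:r].mean())   # trailing mask region
        mean_lower = float(stripe[r:r + n_w].mean())   # leading mask region
        grad = mean_lower - mean_upper                 # mask response ~ intensity step
        if best is None or abs(grad) > abs(best[1]):
            best = (r, grad, mean_upper)
    return best
```

In the Thomanek-type system, exactly this kind of dark-to-bright step with a dark average above it served as the cue for the shadow region underneath a vehicle.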
This approach will be taken as a starting point for discussing more modern approaches exploiting the increase in computing power, by at least two orders of magnitude (factor > 100), since then.

Figure 11.7 shows a highway scene from a wide-angle camera with one car ahead in the subject's lane. A search for horizontal edge features is performed in vertical search stripes with KRONOS masks of size 5 × 7, as indicated on the right-hand side (see Section 5.2).

Figure 11.7. Detection of vehicle candidates by searching for horizontal edges in vertical search stripes below the horizon: mask parameters are selected such that several stripes cover a small vehicle [Thomanek 1996]. (Mask sketch at right: search direction top-down; n_d = 3, n_0 = 1, m_d = 7, n_w = 5.)

Owing to the limited computing performance available in the early 1990s, the search stripes did not cover the whole image below the horizon; the evaluation cycle time was 80 ms (every second video field, i.e., fields with the same index). Stripe width and spacing as well as the mask parameters had to be adjusted according to the desired detection range. For improved resolution, there was a second camera with a telelens on the gaze-controlled platform (see Figure 1.3) with a viewing range about three times as far (and a correspondingly narrower field of view) compared to the wide-angle camera. This allowed using exactly the same feature extraction algorithms for vehicles nearby and further away (see Figure 11.22 further below).

Find lower edge of a vehicle: About 30 search stripes of 100 pixels length have been analyzed by shifting the correlation mask top-down to find close-to-horizontal edge features at extreme correlation values. Potential candidates for the dark area underneath the vehicle have to satisfy two criteria:
– The value of the mask response (correlation magnitude) at the edge has to be above a threshold value (corr_min,uv).
– The average gray value of the trailing mask region (upper part) has to be below a threshold value (dark_min,uv).
The first bullet requires a pronounced dark-to-bright transition, and the second one eliminates areas that are too bright to stem from the shaded region underneath a vehicle; adapting these threshold values to the situation actually given is the challenge for good performance. For tanker vehicles and a low-standing sun, the approach very likely does not work; in this case, the big volume above the wheels may require area-based features for robust recognition (homogeneously shaded, for example).

Generate horizontal contours: Edge elements satisfying certain gestalt conditions are aggregated by applying a known chaining algorithm. The following steps are performed, starting from the left window and ending with the right one:
1. For each edge element, search for the nearest one in the neighboring stripe and store the corresponding index if the distance to it is below a threshold value.
2. Tag each edge element with the number of previous corresponding elements (e.g., six, if the contour contains six edge elements up to this point).
3. Read the starting point P_s(y_s, z_s) and end point P_e(y_e, z_e) of each extracted contour and check the slope, whose magnitude |(z_e – z_s)/(y_e – y_s)| must be below a threshold for the contour to be accepted as close to horizontal (see Figure 11.8).
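A minimal sketch of this stripe search, with the two threshold tests applied, might look as follows; it is illustrative only (single-pixel-wide stripes and one candidate per stripe instead of the 5 × 7 masks), and the threshold values corr_min_uv and dark_min_uv are assumed numbers, not values from [Thomanek 1996].

```python
# Hedged sketch of the "find lower edge of a vehicle" search described above.
import numpy as np

def lower_edge_candidates(image, stripe_cols, n_w=3, corr_min_uv=25.0, dark_min_uv=60.0):
    """Scan single-column 'stripes' top-down; keep dark-to-bright edges whose
    upper (trailing) region is dark enough to be a shadow under a vehicle.
    Simplified: one edge per stripe, 1-pixel-wide stripes instead of 5x7 masks."""
    candidates = []
    for c in stripe_cols:
        s = image[:, c].astype(float)
        best = None
        for r in range(n_w, len(s) - n_w):
            mean_upper = s[r - n_w:r].mean()
            grad = s[r:r + n_w].mean() - mean_upper   # > 0: dark above, bright below
            if best is None or grad > best[0]:
                best = (grad, r, mean_upper)
        grad, r, mean_upper = best
        if grad > corr_min_uv and mean_upper < dark_min_uv:
            candidates.append((r, c))                 # (image row, stripe column)
    return candidates
```

Candidates from neighboring stripes are then passed to the chaining steps described above.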
If lines grow too long, they very likely stem from the shadow of a bridge or from other structures in the vicinity; they may be tracked as the hypothesis for a new stationary object (shadow or discontinuity in surface appearance), but eliminating them altogether will do no harm to tracking moving vehicles whose speed is already recognized. Within a few cycles, these elongated lines will have moved out of the actual image.

With knowledge of 3-D geometry (the projection equations link row number to range), the extracted contours are examined to see whether they allow association with certain object classes: side constraints concerning width must be satisfied; a likely height is thereby hypothesized. Contours starting from inhomogeneous areas inside the objects (e.g., bumper bar or rear window) are discarded; they lie above the lower shadow region (see Figure 11.9).

Figure 11.8. Contour generation from edge elements observing gestalt ideas of nearness and collinearity; below an upper limit for total contour length, only the longer branch is kept. (Steps shown in the diagram: 1. chaining of the geometrically nearest element; 2. numbering of edge elements (each branch); 3. elimination of the shorter branch, determination of start and end point.)

Figure 11.9. Extracted horizontal edge elements: the rectangular group of features is an indication of a vehicle candidate; the lower elements (aggregated shadow region under the car) allow estimation of the range to the vehicle.

Determine lateral boundaries: Depending on the lateral position relative to the lane driven in, the vertical object boundaries are extracted additionally. This is done with an edge detector that exploits the fact that the difference in brightness between the object and the background is not constant and can even change sign; in Figure 11.10, the wheels and fender are darker than the light gray of the road, while the white body is brighter than the road. For this purpose, the gradient of brightness is calculated at each position in each image row, and its absolute values are summed up over the rows of interest. The resulting distribution of accumulated correlation values has significantly large maxima at the object boundaries (lower part of the figure). The maxima of the accumulated values yield the width of the obstacle in the image; knowing range and mapping parameters, the obstacle size in the real world is initialized for recursive estimation and updated until it is stable. With clearly visible extremes, as in the lower part of Figure 11.10, the object width of the real vehicle is fixed, and changes in the image width are from then on used to support range estimation.

For vehicles driving in the subject's lane, both the left and the right object boundary must be present to accept the extracted horizontal contour as representing an object. In neighboring lanes, it suffices to find a vertical boundary on the side of the vehicle adjacent to the subject's lane to confirm the hypothesis of an object in connection with the lower contour. This means that in the left neighboring lane, a vertical line to the right of the lower contour has to be found, while in the right neighboring lane, one to the left has to be found for acceptance of the vehicle hypothesis. This allows recognition of partially occluded objects, too. The algorithm was able to detect and track up to five objects in parallel with four INMOS transputers T222 (16 bit) for feature extraction and one T805 (32 bit) for recursive estimation at a cycle time of 80 ms.
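The accumulation of absolute brightness gradients over the rows of interest can be sketched as follows; the row band, the minimum peak separation, and the function name are assumptions for illustration, not values from the original system.

```python
# Hedged sketch of the lateral-boundary search: accumulate absolute horizontal
# brightness differences over the rows of interest and take the two strongest
# maxima as left/right object boundaries.
import numpy as np

def lateral_boundaries(image: np.ndarray, row_lo: int, row_hi: int, min_sep: int = 10):
    """Return (col_left, col_right) of a vehicle body between image rows
    row_lo..row_hi (the lower part of the body above the shadow edge)."""
    band = image[row_lo:row_hi, :].astype(float)
    grad = np.abs(np.diff(band, axis=1))        # |brightness difference| per row
    histogram = grad.sum(axis=0)                # accumulate over rows of interest
    first = int(np.argmax(histogram))
    # suppress a neighborhood around the first peak before taking the second one
    lo, hi = max(0, first - min_sep), min(len(histogram), first + min_sep)
    histogram[lo:hi] = 0
    second = int(np.argmax(histogram))
    return (min(first, second), max(first, second))   # object width in pixels
```

Together with the estimated range and the camera mapping parameters, the pixel width returned here is converted into the metric vehicle width used to initialize recursive estimation.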
Figure 11.10. Determination of the lateral boundaries of a vehicle by accumulation of correlation values at each position in each single row of the lower part of the body with a KRONOS mask (n_w = 1; n_d = large). The maxima of the accumulated values yield the width of the obstacle in the image. (Lower part: histogram of correlation maxima from single rows over pixel position.)

These methods are a powerful tool for extracting vehicle boundaries in monochrome images, also on modern high-performance microprocessors. Adding more features, however, can make the system more versatile with respect to vehicle type and more robust under strong perturbations in lighting conditions.

11.3.1.2 Homogeneous Intensity Blobs

Region-based methods extracting homogeneously shaded or textured areas are important especially for robust recognition of large vehicles. Color recognition considerably alleviates object separation in complex scenes with many objects of different colors; but even regions of homogeneous intensity shading alleviate object separation considerably, especially in connection with other features.

In Figure 11.11, the homogeneously shaded areas of the road form the background against which vehicles are detected as intensity blobs of differing brightness above a dark region on the ground, stemming from the vehicle's shade underneath the body. Though resolution is poor (32 pixels per mel and 128 per mask) and some artifacts normal to the search direction can be seen, relatively good object hypotheses are derivable on this coarse scale. Five vehicle candidates can be recognized, three of which are partially occluded. The car ahead in the same lane and the bus in the right neighboring lane are clearly visible. The truck further ahead in the subject's lane can clearly be recognized by its dark upper body. For the two cars in the left neighboring lane, resolution is too poor to recognize details; however, from the shape of the road area, the presence of two cars can be hypothesized. Low resolution allows a higher evaluation frequency for limited computing power.

Figure 11.11. Highway scene with many vehicles, analyzed with the UBM method (see Section 5.3.2.4) in vertical stripes with coarse resolution (22.42C) and aggregation of homogeneous intensity blobs (see text).

Performing the search for homogeneously shaded regions on the coarse scale in both vertical and horizontal stripes yields sharp edges in the search direction; thus, close-to-vertical blob boundaries should be taken from horizontal search results, while close-to-horizontal boundaries should be taken from vertical search results.

Figure 11.12. Highway scene similar to Figure 11.11 with more vehicles, analyzed with the UBM method in horizontal stripes; the outer regions are treated at coarse resolution (11.44R), while the central region (within the white box), covering a larger look-ahead range above the road, is analyzed on a fine scale (11.11R). (Reconstructed images: coarse (4×4) and fine; see text.)

Figure 11.12 shows results from a row search with different parameters (11.44R) for another image out of the same sequence (see the bus in the right neighboring lane and the dark truck in the subject's lane). Here, however, the central part of the image, into which objects further away on the road are mapped, is analyzed at fine resolution giving full details (11.11R).
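To make the blob idea concrete, the following fragment segments one vertical stripe into runs of nearly constant intensity on a coarse grid; it is a strongly simplified stand-in for the UBM method of Section 5.3 (which also delivers edges and corners), and the cell size and homogeneity tolerance are assumed values.

```python
# Hedged sketch of coarse blob aggregation in one vertical stripe (not the UBM
# implementation): downsample, then merge adjacent samples into runs of
# near-constant intensity.
import numpy as np

def blobs_in_stripe(column: np.ndarray, cell: int = 4, tol: float = 12.0):
    """Return a list of (row_start, row_end, mean_intensity) blobs along one
    image column; rows are in original (full-resolution) coordinates."""
    n = (len(column) // cell) * cell
    coarse = column[:n].astype(float).reshape(-1, cell).mean(axis=1)  # coarse cells
    blobs, start, acc = [], 0, [coarse[0]]
    for i in range(1, len(coarse)):
        if abs(coarse[i] - np.mean(acc)) <= tol:     # still homogeneous
            acc.append(coarse[i])
        else:                                        # close current blob, open new one
            blobs.append((start * cell, i * cell, float(np.mean(acc))))
            start, acc = i, [coarse[i]]
    blobs.append((start * cell, len(coarse) * cell, float(np.mean(acc))))
    return blobs
```

On this coarse scale, a dark blob near the ground with a brighter blob directly above it is the pattern that triggers a vehicle hypothesis.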
This fine-resolution analysis yields many more details and homogeneous intensity blobs; the reconstructed image shown can hardly be distinguished from the original image. A total of eight vehicle candidates can be recognized, six of which are partially occluded. It is easily understood from this image that large vehicles like trucks and buses should be hypothesized from the presence of larger homogeneous areas well above an elevation of one wheel diameter from the ground.

For humans, it is immediately clear that in neighboring lanes, vehicles are recognized by three wheels if no occlusion is present; the far outer front wheel will be self-occluded by the vehicle body. All wheels will be only partially visible. This fact has led to the development of parameterized wheel detectors based on features defined by regional intensity elements [Hofmann 2004].

Figure 11.13 shows the basic idea and the derivation of templates that can be adapted to wheel diameter (including range) and aspect angle in pan (small tilt angles are neglected because they enter with a cosine effect, ≈ 1); since the car body occludes a large part of the wheels, the lower part of the dark tire contrasting with the road to its sides is especially emphasized. For orthogonal and oblique views of the near side of the vehicle, usually, the inner part of the wheel contrasts with the tire around it; ellipticity is continuously adapted according to the best estimate for the relative yaw (pan) angle.

Figure 11.13. Derivation of templates for wheel recognition from coarse shape representations (octagon): (a) basic geometric parameters: width, outer and inner visible radius of the tire; (b) an oblique view transforms the circle into ellipses as a function of aspect angle; (c) shape approximation for templates; radii and aspect angle are parameters; (d) template masks for typically visible parts of wheels [seen from left, right, ≈ orthogonal, far side (underneath the body)]. Intelligently controlled 2-D search is done based on the existing hypothesis for a vehicle body (after [Hofmann 2004]).

The wheels on the near side usually appear in pairs, separated by the axle distance in the longitudinal direction, which lets the front wheel appear higher up in the image due to the camera elevation above the wheel axle. Good default knowledge is available on the geometric parameters involved, so that initialization poses no challenge. Again, being overly accurate in a single image does not make sense, since averaging over time will lead to a stable (maybe slightly noisier) result, with the noise doing no harm. To support estimation of the aspect conditions, taking into account other characteristic subobjects, like light groups in relation to the license plate, as regional features will help.

11.3.1.3 Corner Features

This class of features is especially helpful before a good interpretation of the scene or an object has been achieved. If corner localization can be achieved precisely and consistently from frame to frame, it allows determining feature flow in both image dimensions and is thus optimally suited for tracking without image understanding. However, the challenge is that checking consistency requires some kind of understanding of the feature arrangement. Recognition of complex motion patterns of articulated bodies is very much alleviated using these features. For this reason, their extraction has received quite a bit of attention in the literature (see Section 5.3.3).
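A parameterized wheel template of the kind sketched in Figure 11.13 can be generated roughly as follows; this is a simplified illustration, not the [Hofmann 2004] detector: only the lower, unoccluded part of the dark tire ring is rendered, and the visible fraction is an assumed parameter.

```python
# Hedged sketch of a parameterized wheel template: the wheel circle maps to an
# ellipse whose horizontal semi-axis shrinks with the pan aspect angle; only
# the lower part of the dark tire (below the body) is rendered into a mask.
import numpy as np

def wheel_template(r_outer_px: float, r_inner_px: float, pan_deg: float,
                   visible_fraction: float = 0.4):
    """Return a 2-D float mask: +1 on the visible lower tire ring, -1 on the
    brighter road just outside it, 0 elsewhere (for correlation-style matching)."""
    c = np.cos(np.radians(pan_deg))                 # horizontal foreshortening
    h = int(np.ceil(2 * r_outer_px)) + 3
    w = int(np.ceil(2 * r_outer_px * c)) + 3
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    rho = np.sqrt(((xx - cx) / max(c, 1e-3)) ** 2 + (yy - cy) ** 2)
    mask = np.zeros((h, w))
    lower = yy > cy + (0.5 - visible_fraction) * 2 * r_outer_px   # lower part only
    mask[lower & (rho <= r_outer_px) & (rho >= r_inner_px)] = 1.0     # dark tire ring
    mask[lower & (rho > r_outer_px) & (rho <= r_outer_px + 2)] = -1.0  # road margin
    return mask
```

Templates for a small set of radii and pan angles are then correlated over a 2-D search region below the hypothesized vehicle body, with the ellipticity updated from the current best estimate of the relative yaw angle.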
Even special hardware has been developed for this purpose. With the computing power nowadays available in general-purpose microprocessors, corner detection can be afforded as a standard component of image analysis. The unified blob-edge-corner method (UBM) treated in Section 5.3 first separates candidate regions for corners in a very simple way from those for homogeneously shaded regions and edges. Only a very small percentage of usual road images qualifies as corner candidates, depending on the planarity threshold specified (see Figures 5.23 and 5.26); this allows efficient corner detection in real time together with blobs and edges. The combination then alleviates detection of joint feature flow and of object candidates: jointly moving blobs, edges, and corners in the image plane are the best indicators of a moving object.

11.3.2 Hypothesis Generation and Initialization

The center of gravity of a jointly moving group of features tells us something about the translational motion of the object normal to the optical axis; expanding or shrinking similar feature distributions contain information on radial motion. Changing relative positions of features, other than expansion or shrinking, carry information on rotational motion of the object (a small numerical sketch of these cues follows below). The crucial point is the jump from 2-D feature distributions observed over a short amount of time to an object hypothesis in 3-D space and time.

11.3.2.1 Influence of Domain and Actual Situation

If one had to start from scratch without any knowledge about the domain of the actual task, the problem would be hardly solvable. Even within a known domain (like "road traffic") the challenge is still large, since there are so many types of roads, lighting, and weather conditions; the vehicle may be stationary or moving on a smooth or on a rough surface.

It is assumed here that the human operator has checked the lighting and weather conditions and has found them acceptable for autonomous perception and operation. When observation of other vehicles is started, it is also assumed that road recognition has been initiated successfully and is working properly; this provides the system (via the DOB, see Chapters 4 and 13) with the number and widths of the lanes actually available. With GPS and digital maps onboard and working, the type of road being driven is known: unidirectional or two-way traffic, motorway or general cross-country/urban road. The type of road determines the classes of obstacles that might be expected with certain likelihood; the levels of likelihood may be taken into account in hypothesis generation. Pedestrians are less likely on high-speed than on urban roads. The speed actually driven and the traffic density also influence this choice; for example, in a traffic jam on a freeway with very low average speed, pedestrians are more likely than in normal freeway traffic.

11.3.2.2 Three Components Required for Instantiation

In the 4-D approach, there are always three components necessary for starting perception based on recursive estimation: (1) the generic object type (class and subclass with reasonable parameter settings), (2) the aspect conditions (initial values for the state components), and (3) the dynamic model as knowledge (or side constraint) of evolution over time; for subjects, this includes knowledge of (stereotypical) motion capabilities and their temporal sequence.
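The motion cues from grouped features mentioned at the beginning of Section 11.3.2 can be sketched numerically as follows; this is an illustration only, and feature matching between frames is assumed to be given.

```python
# Hedged sketch of grouped-feature motion cues: the centroid shift of a jointly
# moving feature group indicates translation normal to the optical axis, the
# change of the group's spread indicates radial (range) motion.
import numpy as np

def group_motion_cues(feat_prev: np.ndarray, feat_now: np.ndarray):
    """feat_prev, feat_now: (N, 2) arrays of matched feature positions (y, z)
    in the image plane for two consecutive frames."""
    c_prev, c_now = feat_prev.mean(axis=0), feat_now.mean(axis=0)
    shift = c_now - c_prev                               # lateral/vertical image motion
    spread_prev = np.linalg.norm(feat_prev - c_prev, axis=1).mean()
    spread_now = np.linalg.norm(feat_now - c_now, axis=1).mean()
    scale_change = spread_now / max(spread_prev, 1e-9)   # >1: approaching, <1: receding
    return shift, scale_change
```

A shift with a scale change near one indicates mainly lateral relative motion; a scale change clearly different from one indicates radial motion, while systematic rearrangements of the features beyond pure scaling hint at rotation.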
The latter component (the dynamic model) amounts to an individual capability for animation based on the onsets of maneuvers visually observed; this component will be needed mainly in tracking (see Section 11.3.3). However, a passing car cutting into the subject vehicle's lane immediately ahead will be perceived much faster and more robustly if this motion behavior (normally not allowed) is available also during the initialization phase, which usually takes about one half to one second.

Instantiation of a generic object (3-D shape): The first step is always to establish a good range estimate to the object. If stereovision or direct range measurements are available, this information should be taken from these sources. For monocular vision, this step is done with the row index z_Bu of the lowest features that most likely belong to the object. Then, the first part of the following procedure is, as for static obstacles, to obtain initial values of range and bearing.

With the range information and the known camera parameters, the object in the image can be scaled for comparison with models in the knowledge base of 3-D objects. Homogeneously shaded regions with edges and corners moving in conjunction give an indication of the vehicle type. For example, in Figure 11.11, the car up front, the truck ahead of it (obscured in the lower part), and the bus up front to the right are easily classified correctly; the two cars in the lane to the left allow only uncertain classification because large parts of them are occluded. Humans may feel certain in classifying the car up front left, since they interpret the intensity blobs located vertically at the top and the center of the hypothesized car: the somewhat brighter rectangle at the top may originate from the light of the sky reflected from the curved roof of the car; the bright rectangular patch between two somewhat darker, more nearly square ones, halfway from the roof to the ground, is interpreted as a license plate between the light groups on each rear side of the car.

Figure 11.12 (taken a few frames apart from Figure 11.11) shows in the inner high-resolution part that this interpretation is correct. It can also be seen from the three bright blobs reasonably distributed over the rear surface that the car immediately ahead is now braking (in color vision, these blobs would be bright red). The two cars in the neighboring lane beside the dark truck are also braking. (Note the different locations and partial obscuration of the braking lights on the three cars depending on make and traffic situation.) Confining image interpretation for obstacle detection to the region marked by the white rectangle (as done in the early days) would make vehicle classification much more difficult. Therefore, peripheral low-resolution and foveal high-resolution images in conjunction allow efficient and sufficiently precise image interpretation.

Aspect conditions: The vertical aspect angle is determined by the range and the elevation of the camera in the subject vehicle above the ground. It will differ for cars, vans, and trucks/buses. Therefore, only the aspect angle in yaw has to be derived from image evaluation. In normal traffic situations, with vehicles driving in the direction of the lanes, lane recognition yields the essential input for initializing the aspect angle in yaw. On straight roads, lane width and range to the vehicle determine the yaw aspect angle. It is large for vehicles nearby and decreases with distance.
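Range initialization from the lowest feature row can be written down directly for a flat road and a pinhole camera; the parameter names (camera elevation h_cam, focal length f_pel in pixel units, horizon row) are illustrative assumptions rather than the book's notation.

```python
# Hedged sketch of monocular range initialization from the image row z_Bu of
# the lowest object feature (shadow edge on the ground), assuming a flat road.
import math

def range_from_row(z_Bu: float, z_horizon: float, f_pel: float, h_cam: float) -> float:
    """z_Bu: image row of the lowest feature (pixels, increasing downward);
    z_horizon: row of the horizon; f_pel: focal length in pixel units;
    h_cam: camera elevation above the ground (m). Returns range in meters."""
    depression = math.atan((z_Bu - z_horizon) / f_pel)   # angle below the horizon
    return h_cam / math.tan(depression)                  # valid only for z_Bu below the horizon
```

For example, with assumed values h_cam = 1.3 m, f_pel = 750 pixels, and a shadow edge 20 pixels below the horizon, the initial range estimate is about 1.3/(20/750) ≈ 49 m.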
Therefore, in the right neighboring lane, only the left-hand and the rear side can be seen; in the left neighboring lane, it is the right-hand and the rear side. Tires of vehicles on the left have their dark contact area to the ground on the left side of the elliptically mapped vertical wheel surface (and vice versa for the other side; see Figure 11.13d). Aspect conditions and 3-D shape are closely linked, of course, since both in conjunction determine the feature distribution in the image after perspective projection, which is the only source available for dynamic scene understanding.

Dynamic model: The third essential component for starting recursive estimation is the process model for motion, which implements continuity conditions and knowledge about the evolution of motion over time. This temporal component was the one that allowed achieving superior performance in image sequence interpretation and autonomous driving. As mentioned before, there are big advantages in temporal embedding:
1. Known state variables in a motion process decouple future evolution from the past (by definition); so there is no need to store previous images if all objects of relevance are represented by an individual dynamic process model. Future evolution depends only on (a) the actual state, (b) the control output applied, and (c) external perturbations. Items (b) and (c) are, in principle, the unknowns, while best estimates for (a) are derived by visual observation exploiting a knowledge base of vehicle classes (see Chapter 3).
2. Disturbance statistics can be compiled for both process and measurement noise; knowing these characteristics allows setting up a temporal filter process that (under certain constraints) yields optimal estimates for open parameters and for the state variables in the generic process model.
3. These components together are the means by which "the outside world is transduced into an internal representation in the computer". (The corresponding question often asked for biological systems is: How does the world get into your [...])

[...] architecture of the second-generation vision systems of UniBwM. Here, we just discuss the subsystem for visual perception with the cameras shown in the upper left and the road and obstacle perception system shown in the lower right corner. The upper right part for system integration/locomotion control will be treated in Chapters 13 and 14. (Figure 11.19. Overall system architecture of second-generation vision [...])

[...] purchased for ~20% of the cost of a custom-designed system used before. (Figure 11.24. Object tracking with a COTS PC system at the turn of the century.) The software system applied for intelligently controlled edge extraction was CRONOS. At about the same time, the first systems for automatic cruise control (ACC) came on the market, based on radar for distance [...]

[...] for 3-D shape during motion [Schick 1992], for moving humans [Kinzel 1995], and for signal lights [Tsinas 1997] were developed. This was the visual perception part of the system that formed the base for the other components: for viewing direction control [Schiehlen 1995], for situation assessment [Hock 1994], for behavior decision [Maurer 2000], and for vehicle control [Brüdigam 1994]. Figure 11.19 shows [...]
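Relating back to the "Dynamic model" component above, the prediction part of such a temporal filter can be sketched as follows; the state layout, cycle time, and noise level are illustrative assumptions, not the exact models of the systems described in these fragments.

```python
# Hedged sketch of temporal embedding: a decoupled second-order motion model is
# propagated over one video cycle T, which also lets accumulated processing
# delays be bridged by pure prediction. F, Q and the state layout are assumed.
import numpy as np

T = 0.08                                   # 80 ms evaluation cycle
F = np.array([[1.0, T],                    # state: [range, range rate]
              [0.0, 1.0]])
q_a = 1.0                                  # assumed acceleration noise (m/s^2)^2
Q = q_a * np.array([[T**4 / 4, T**3 / 2],
                    [T**3 / 2, T**2]])

def predict(x: np.ndarray, P: np.ndarray, n_cycles: int = 1):
    """Propagate state estimate x and covariance P over n_cycles video cycles,
    e.g., to compensate a known measurement/processing delay."""
    for _ in range(n_cycles):
        x = F @ x
        P = F @ P @ F.T + Q
    return x, P
```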
[...] longitudinal control is done fully automatically based on the fused evaluation results of radar and vision. Radar is used for hypothesis generation and range estimation; vision checks all of these hypotheses for objects (vehicles) and eliminates those that cannot be substantiated by corresponding sets of visual features. For those confirmed, the precise lateral extension of their lower part and their positions [...]

[...] to be achieved in this step now. In the 4-D approach, this is done in one demanding large step by directly jumping to internal representations of 3-D objects moving in 3-D space over time. Since these objects observed in traffic are themselves capable of perception and motion control (and are thus "subjects" as introduced in Chapters 2 and 3), the relevant parts of their internal decision processes leading [...]

(Figure caption, partly recovered:) [...] selection, and grouping as well as hypothesis generation. The lower part implements the 4-D approach to dynamic vision, in which background knowledge about 3-D shape and motion is exploited for animating the spatiotemporal scene observed in the interpretation process. [...] ...tended process with adaptation of open parameters and of state variables in the generic models describing the geometric relations in the scene [...]

[...] a price of $2 million for the autonomous vehicle capable of driving a 60-mile distance in an urban environment in the least amount of time (below 6 hours as an additional constraint); this shall include reacting to other vehicles, also in stop-and-go traffic. The final test of this "Urban Grand Challenge", with a maximal allowed speed of 20 mph, is planned for the 3rd of November 2007 in a mock-up town [...]

[...] feedback of prediction errors shows the validity of the models used and allows adaptation for improved performance. The dynamic models used in the early days in [Thomanek 1996] were the following (separate, decoupled models for longitudinal and lateral translation, no rotational dynamics): Simplified longitudinal dynamics: The goal was to estimate the range and range rate sufficiently well for automatic [...]

[...] the continuous temporal change of the gap between the front wheel (only partially visible) and the fender of the passing car will indicate an upcoming sideways motion of the car, probably needing special attention for a while. Knowing the causal chain of events allows reasonable predictions and may save time for proper reactions. The capability of drawing the right conclusions from a limited set of information visually [...]

[...] a total of 4 MB/s had to be handled on the two video buses for the four cameras shown at the bottom of the figure. The fields of view of these cameras, looking into the front and rear hemispheres, can be seen for the front hemisphere in Figure 11.20; the ratio of focal lengths of the cameras was about 3.2, which seemed optimal for observing the first two rows of vehicles in front. The system was designed for perceiving [...]

[...] correct model with Ackermann steering is used or whether independent second-order motion models in both translational degrees of freedom and in yaw are implemented (independent Newtonian motion [...]
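Extending the prediction sketch given earlier with a measurement update yields the complete recursive loop in the spirit of the decoupled longitudinal model mentioned in the last fragments; the measurement matrix and noise variance are assumptions (the exact [Thomanek 1996] formulation is not reproduced in this excerpt), and the measured range may come from the shadow-edge row (monocular) or, in the fused systems described above, from radar.

```python
# Hedged sketch of the measurement update for the [range, range rate] estimate;
# complements the predict() sketch above. H and R_meas are assumed values.
import numpy as np

H = np.array([[1.0, 0.0]])                 # we measure range only
R_meas = np.array([[4.0]])                 # assumed range measurement variance (m^2)

def update(x: np.ndarray, P: np.ndarray, range_measured: float):
    """Standard Kalman update of the [range, range rate] estimate."""
    y = np.array([range_measured]) - H @ x          # innovation (prediction error)
    S = H @ P @ H.T + R_meas
    K = P @ H.T @ np.linalg.inv(S)                  # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P, float(y[0])
```

The innovation returned by the update is exactly the prediction error whose statistics, as stated in the fragment above, indicate whether the assumed motion model is still valid.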