Stereo-Based Ground Plane Estimation

2.2.1 Introduction

Ground plane estimation, or road recognition, is an active research field in intelligent transportation systems. It aims to determine which pixels in the input image belong to the road and to locate the road relative to the host vehicle. The estimated road plane helps autonomous vehicles to drive themselves and to detect frontal objects.

Ground plane estimation can be performed on single images or stereo images.

The advantage of the stereo approach is that the road’s position can be determined directly with standard computer vision techniques. This thesis therefore adopts stereo-based ground plane estimation. To explain it, several important concepts in the field are introduced as follows:

1. U-disparity Image:

Given a disparity image, for example mapL(u, v), the U-disparity image computed from it is an image of size D×U, where U is the width of the disparity image and D is the number of disparity levels.

The U-disparity image is computed by projecting the disparity surface, i.e., mapL(u, v), onto the U-D plane of the disparity space and counting the surface pixels that fall on each pixel of the U-disparity image. Thus, the intensity of pixel (ui, dj) in the U-disparity image is the number of pixels that lie on the vertical line u = ui of the disparity image and have disparity dj.

2. V-disparity Image:

Given a disparity image, the V-disparity image computed from it is an image of size V×D. The V-disparity image is computed by projecting the disparity surface onto the V-D plane of the disparity space and counting the surface pixels that fall on each pixel of the V-disparity image. Thus, the intensity of pixel (vi, dj) in the V-disparity image is the number of pixels that lie on the horizontal line v = vi of the disparity image and have disparity dj.
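The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming an integer-valued dense disparity map stored as a list of rows; the function name uv_disparity, the max_disp parameter, and the rule that out-of-range values are skipped as invalid are assumptions of this sketch and are not taken from the thesis.

```python
def uv_disparity(disp, max_disp):
    """Build U- and V-disparity histograms from an integer disparity map.

    disp is a list of rows (V rows by U columns). Values outside
    [0, max_disp] are treated as invalid and skipped (an assumption).
    """
    n_rows, n_cols = len(disp), len(disp[0])
    # U-disparity has D rows and U columns; V-disparity has V rows and D columns.
    u_disp = [[0] * n_cols for _ in range(max_disp + 1)]
    v_disp = [[0] * (max_disp + 1) for _ in range(n_rows)]
    for v, row in enumerate(disp):
        for u, d in enumerate(row):
            if 0 <= d <= max_disp:
                u_disp[d][u] += 1  # pixels in column u with disparity d
                v_disp[v][d] += 1  # pixels in row v with disparity d
    return u_disp, v_disp


# Tiny example: three image rows, four columns, disparities 0..3.
disp = [
    [0, 0, 0, 0],
    [1, 1, 3, 1],
    [2, 2, 3, 2],
]
u_disp, v_disp = uv_disparity(disp, max_disp=3)
```

In the example, the pixel with disparity 3 in column 2 appears in two image rows, so it contributes 2 to the U-disparity image at (u = 2, d = 3) and 1 to each of the two corresponding V-disparity rows.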

Chapter 2. Related Works 40

2.2.2 Camera Geometry

Figure 2.2 shows the geometry of the stereo cameras used in this thesis. The two cameras are assumed to be mounted on the host vehicle, which is not shown in the figure. The car in the figure stands on the ground plane and lies in the cameras’ field of view. The world coordinate system is OWXWYWZW. The left and right camera coordinate systems are OLXLYLZL and ORXRYRZR, respectively. OW, OL, and OR are omitted from the figure for readability. The two cameras are tilted downward by an angle θ, which is referred to as the pitch angle of the stereo cameras.

In this thesis, the roll and yaw angles are assumed to be small and negligible.

The height of the stereo cameras is defined as the distance from OW to the line OLOR.

Figure 2.2: Geometry of stereo cameras.

The left image and the disparity surface obtained from the geometry in Figure 2.2 are shown in Figure 2.3 (this is only an artificial example). In disparity space, the ground plane is a plane, marked by the blue border. With the assumptions mentioned above, the ground plane has two parameters, vhori and α: vhori is the position of the horizontal line of vanishing points, and α is the angle of the ground plane in disparity space. The purpose of ground plane estimation is therefore to determine vhori and α. From the estimated values, the pitch angle and the height of the stereo cameras can be computed by Equations 2.7 and 2.8, respectively [76], where f is the focal length, t is the pixel size, v0 is the optical center in the V-direction, vor is the intersection of the line d = 0 with the ground plane’s profile in disparity space (vor equals vhori in the case of non-verged cameras), cr = tan(α), and b is the baseline.

Given a pixel in the reference image (the left image in this thesis) and the disparity associated with that pixel, the position of the corresponding object point in the world coordinate system can be computed by triangulation using the geometry of the stereo cameras in Figure 2.2. In this thesis, the pitch angle is small, so the depth of frontal objects can be estimated simply by Equation 2.9, where d is the disparity of the detected object.

Figure 2.3: Ground Plane in Disparity Space.

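\[ \theta = \tan^{-1}\left(\frac{v_0 - v_{or}}{f/t}\right) \qquad (2.7) \]

\[ h_{cam} = \frac{b \cos\theta}{c_r} \qquad (2.8) \]

\[ Z = \frac{f}{t} \cdot \frac{b}{d} \qquad (2.9) \]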
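As a numerical illustration of Equations 2.7 to 2.9, the following sketch evaluates the pitch angle, the camera height, and the depth of an object. It assumes the usual triangulation relation Z = (f/t)·b/d for the depth, with f/t being the focal length in pixels. The function names and all numeric values are hypothetical and chosen only for the example.

```python
import math


def pitch_angle(v0, v_or, f, t):
    """Pitch angle in radians from the ground profile's crossing of d = 0."""
    return math.atan((v0 - v_or) / (f / t))


def camera_height(b, theta, alpha):
    """Camera height from the baseline b, the pitch theta and the profile angle alpha."""
    c_r = math.tan(alpha)  # slope of the ground profile in disparity space
    return b * math.cos(theta) / c_r


def depth(f, t, b, d):
    """Depth of a point with disparity d (pixels), assuming a small pitch angle."""
    return (f / t) * b / d


# Hypothetical camera: 6 mm lens, 6 micrometre pixels, 12 cm baseline.
f, t, b = 0.006, 6e-6, 0.12
theta = pitch_angle(v0=240.0, v_or=200.0, f=f, t=t)
h_cam = camera_height(b, theta, alpha=math.atan(0.1))
z_obj = depth(f, t, b, d=10.0)
```

With these values the focal length is 1000 pixels, so a horizon 40 rows above the optical center corresponds to a pitch of about 2.3 degrees, and a disparity of 10 pixels corresponds to a depth of 12 m.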


2.2.3 Hough Transform Based Method

In [76], Labayrade et al. used a sparse disparity image as the input to ground plane estimation. The sparse disparity image was computed with the following settings: stereo matching was performed only at pixels with high horizontal gradients, and normalized cross-correlation was used as the cost function. The sparse disparity image was then used to compute the V-disparity image. Labayrade et al. also proved that the ground plane in the resulting V-disparity image is a slanted straight line, extending from the bottom horizontal line to the left vertical line. They used the Hough transform to extract this line. The intensity of each pixel in the V-disparity image, i.e., the number of pixels accumulated along the corresponding horizontal line of the input image, was also taken into account so that the extracted line contains the largest number of accumulated pixels. The Hough transform was also used in a variety of other studies [77, 78, 103].
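The idea of an intensity-weighted Hough transform on the V-disparity image can be sketched as follows. This is not the implementation of [76]; it is a minimal illustration in which each non-zero V-disparity pixel votes with its accumulated count and only lines whose disparity grows with v are considered. The function name weighted_hough_line, the one-degree angular resolution, and the synthetic data are assumptions of the sketch.

```python
import math


def weighted_hough_line(v_disp, n_angles=180):
    """Return (slope, v_hor, votes) of the strongest line d = slope * (v - v_hor).

    Lines are written as rho = d*cos(phi) + v*sin(phi); each pixel votes with
    its accumulated count instead of a unit vote.
    """
    n_rows, n_cols = len(v_disp), len(v_disp[0])
    max_rho = int(math.hypot(n_rows, n_cols)) + 1
    # Only normals in (90, 180) degrees, i.e. lines whose disparity grows with v.
    angles = [math.pi * k / n_angles for k in range(n_angles // 2 + 1, n_angles)]
    acc = [[0] * (2 * max_rho + 1) for _ in angles]
    for v in range(n_rows):
        for d in range(n_cols):
            weight = v_disp[v][d]
            if weight == 0:
                continue
            for i, phi in enumerate(angles):
                rho = int(round(d * math.cos(phi) + v * math.sin(phi)))
                acc[i][rho + max_rho] += weight
    votes, i, r = max((acc[i][r], i, r)
                      for i in range(len(angles))
                      for r in range(2 * max_rho + 1))
    phi, rho = angles[i], r - max_rho
    # slope is dd/dv of the line; v_hor is the row where the line meets d = 0.
    return -math.tan(phi), rho / math.sin(phi), votes


# Synthetic V-disparity: ground profile d = (v - 10) // 2 plus one obstacle.
v_disp = [[0] * 21 for _ in range(50)]
for v in range(10, 50):
    v_disp[v][(v - 10) // 2] = 30
for v in range(20, 34):
    v_disp[v][12] += 10  # obstacle at constant disparity 12
slope, v_hor, votes = weighted_hough_line(v_disp)
```

The recovered slope and horizon row are only as accurate as the angular and rho quantization allows, which is one reason a subsequent refinement step is often useful.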

The Hough transform based method is computationally simple, and it is well suited to roads with strong edges, for example lane marks and traffic signs. However, it has at least two problems [103]: (1) it is difficult to select suitable thresholds for deciding which gradients count as high, and (2) the Hough transform ignores the distribution of pixels along an extracted line. More specifically, an extracted line that is not the ground plane’s profile may contain only two pixels in the V-disparity image, one with an accumulated value of 1 and the other with a very large accumulated value.

2.2.4 Fitting-Based Method

2.2.4.1 Least-Squares Method

Michele Zanin [104] assumed that the road was a plane and estimated it in two steps: the first step selected the road points, and the second step estimated the road plane with the least-squares method. Because the least-squares method is well known to be sensitive to outliers, his method addressed this problem with the following heuristics:

• He forced the estimated road plane to pass through fixed points, which were the projections of the cameras’ centers onto the ground plane. These fixed points were referred to as anchor points.


• From the input disparity image, he selected only the points that lay in a predefined region; such a region can be defined if the position of the road is estimated in advance (in the calibration step). He also forced the selected points to lie near the ground plane of the previous frame, if it was available.

Like the Hough transform method, the least-squares method is computationally simple. However, the heuristics used in [104] still admit too many outliers when there are objects in front of the cameras. Moreover, the heuristics require advance knowledge of the region within which the ground plane can vary. In addition, shocks to the host vehicle make the anchor points very unstable.
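The anchor-point heuristic and the outlier sensitivity of least squares can be illustrated with a small sketch. It is not the implementation of [104]: it assumes 3D points (x, y, z) with y as the vertical axis, a single anchor point, and a plane written as y − y0 = a(x − x0) + b(z − z0). The function name and the sample points are hypothetical.

```python
def fit_plane_through_anchor(points, anchor):
    """Least-squares fit of y - y0 = a*(x - x0) + b*(z - z0) through the anchor."""
    x0, y0, z0 = anchor
    sxx = sxz = szz = sxy = szy = 0.0
    for x, y, z in points:
        dx, dy, dz = x - x0, y - y0, z - z0
        sxx += dx * dx
        sxz += dx * dz
        szz += dz * dz
        sxy += dx * dy
        szy += dz * dy
    # Solve the 2x2 normal equations [[sxx, sxz], [sxz, szz]] [a, b] = [sxy, szy].
    det = sxx * szz - sxz * sxz
    if abs(det) < 1e-12:
        raise ValueError("degenerate point configuration")
    a = (sxy * szz - sxz * szy) / det
    b = (sxx * szy - sxz * sxy) / det
    return a, b


anchor = (0.0, 0.0, 0.0)
# Road points lying exactly on y = 0.1*x + 0.02*z.
road = [(x, 0.1 * x + 0.02 * z, z)
        for x in (-1.0, 0.0, 1.0) for z in (5.0, 10.0, 15.0)]
# Two points on a frontal object, 1.5 m above the anchor height.
obstacle = [(0.0, 1.5, 10.0), (1.0, 1.5, 10.0)]

a_clean, b_clean = fit_plane_through_anchor(road, anchor)
a_out, b_out = fit_plane_through_anchor(road + obstacle, anchor)
```

With road points only, the fit recovers the true slopes; adding just two obstacle points roughly doubles both estimated slopes, which shows why the point-selection heuristics matter.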

2.2.4.2 Iteratively Reweighted Least-Squares Method

In [105], Nikolay et al. used Iteratively Reweighted Least-Squares (IRLS) [106] to estimate the ground plane from the input dense disparity image. IRLS was performed on selected points in the world coordinate system. The selected points were determined as follows:

• A fixed set of 9 points (a 3×3 lattice) in the lower part of the input disparity image was selected. The disparities of these points were fitted to the ground plane in disparity space. The fitted disparities were then converted to the world coordinate system.

• Two additional points were added: the points where the front wheels of the host vehicle touch the ground plane. These two points could be determined in the calibration step. The total number of points used in IRLS was therefore 11.

Although IRLS in theory suppresses outliers better than ordinary least squares, the method described above is still unstable because it uses only 11 points in the estimation. When there are frontal objects, several of the 11 points are already outliers, so the estimated ground plane can be unstable.
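The reweighting idea can be sketched as follows for 11 points arranged as described above. This is an illustration only, not the method of [105]: it assumes Huber weights with a hypothetical threshold delta, a plane of the form y = a·x + b·z + c with y vertical, and made-up coordinates in which two lattice points fall on a frontal object.

```python
def solve3(A, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, 4):
                    m[r][c] -= factor * m[col][c]
    return [m[i][3] / m[i][i] for i in range(3)]


def weighted_plane_fit(points, weights):
    """Weighted least-squares fit of y = a*x + b*z + c."""
    A = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for (x, y, z), w in zip(points, weights):
        basis = (x, z, 1.0)
        for i in range(3):
            rhs[i] += w * basis[i] * y
            for j in range(3):
                A[i][j] += w * basis[i] * basis[j]
    return solve3(A, rhs)


def irls_plane(points, n_iter=50, delta=0.05):
    """IRLS with Huber weights; n_iter=1 gives the ordinary least-squares fit."""
    weights = [1.0] * len(points)
    for _ in range(n_iter):
        a, b, c = weighted_plane_fit(points, weights)
        residuals = [abs(y - (a * x + b * z + c)) for x, y, z in points]
        # Points far from the current plane get a weight below 1.
        weights = [1.0 if r <= delta else delta / r for r in residuals]
    return a, b, c


# 3x3 lattice on a flat road (y = 0); two far lattice points hit an object.
points = [(x, 0.0, z) for x in (-1.0, 0.0, 1.0) for z in (5.0, 10.0, 15.0)]
points = [(x, 1.0, z) if (x, z) in ((0.0, 15.0), (1.0, 15.0)) else (x, y, z)
          for x, y, z in points]
points += [(-0.8, 0.0, 1.0), (0.8, 0.0, 1.0)]  # wheel contact points
ols = irls_plane(points, n_iter=1)
robust = irls_plane(points, n_iter=50)
```

The reweighted fit tilts less than the ordinary least-squares fit, but with 2 of 11 points on an object the result is still biased, which is consistent with the instability noted above.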

2.2.4.3 RANSAC Method

In fitting-based methods, frontal objects on the road make it difficult to select a set of sample road points, which are referred to as feature points. In [107], Stephen and Michael used Random Sample Consensus (RANSAC) [108] to solve this problem. Using RANSAC, they estimated the ground plane in the following steps:

1. Randomly select three feature points to fit a plane, check for each feature point whether it satisfies this plane, and count the number of supporting points.

2. Repeat step 1 m times, select the triple with the maximum support, and perform a least-squares ground plane fit to this triple together with all its supporting points.

The RANSAC method is computationally simple yet effective at removing outliers, so it has also been used in many other studies on ground plane estimation [114, 115].
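The two RANSAC steps can be sketched as follows. This is a minimal illustration rather than the implementation of [107]: the distance tolerance tol, the number of trials, the fixed random seed, the plane form y = a·x + b·z + c with y vertical, and the synthetic points are all assumptions.

```python
import random


def plane_from_points(p, q, r):
    """Return (unit normal n, offset k) with n . x = k, or None if collinear."""
    u = [q[i] - p[i] for i in range(3)]
    w = [r[i] - p[i] for i in range(3)]
    n = (u[1] * w[2] - u[2] * w[1], u[2] * w[0] - u[0] * w[2], u[0] * w[1] - u[1] * w[0])
    norm = sum(c * c for c in n) ** 0.5
    if norm < 1e-12:
        return None
    n = tuple(c / norm for c in n)
    return n, sum(n[i] * p[i] for i in range(3))


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane_ls(points):
    """Least-squares fit of y = a*x + b*z + c (normal equations, Cramer's rule)."""
    A = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for x, y, z in points:
        basis = (x, z, 1.0)
        for i in range(3):
            rhs[i] += basis[i] * y
            for j in range(3):
                A[i][j] += basis[i] * basis[j]
    d = det3(A)
    sol = []
    for col in range(3):
        m = [row[:] for row in A]
        for i in range(3):
            m[i][col] = rhs[i]
        sol.append(det3(m) / d)
    return sol


def ransac_plane(points, n_trials=200, tol=0.05, seed=0):
    """Step 1 repeated n_trials times, then the least-squares refit of step 2."""
    rng = random.Random(seed)
    best = []
    for _ in range(n_trials):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue
        n, k = model
        support = [pt for pt in points
                   if abs(sum(n[i] * pt[i] for i in range(3)) - k) <= tol]
        if len(support) > len(best):
            best = support
    return best, fit_plane_ls(best)


# 50 road points on y = 0.01*z - 1.5 and 10 points on a frontal object.
road = [(float(x), 0.01 * z - 1.5, float(z))
        for x in range(-2, 3) for z in range(4, 14)]
obstacle = [(0.5, -1.2 + 0.1 * k, 8.0) for k in range(10)]
support, (a, b, c) = ransac_plane(road + obstacle)
```

Because the refit uses only the supporting points, the object points have no influence on the final plane in this example.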

2.2.5 Dynamic Programming Based Method

In [109, 110], Suganuma et al. introduced the concept of a “virtual disparity map”, which is the disparity image that would be obtained if the stereo cameras were located at the projection of their original positions onto the road plane. The stereo cameras’ parameters estimated for the previous frame, i.e., the height and the pitch and roll angles, were used to transform the original disparity map into the virtual disparity map. The V-disparity image was then created from the virtual disparity map. In it, the profile of a non-flat ground plane is a horizontal curve. They used dynamic programming to extract this curve and also to detect frontal objects.

The above method is suitable for long-baseline stereo cameras, for which a large disparity search range is available. With a large disparity search range, the horizontal curve is long enough to be detected by dynamic programming.

However, with short-baseline stereo cameras, the horizontal curve may span only 20 to 30 pixels, and most of the pixels at its left end are noisy because of objects near infinity. On such a short curve, dynamic programming cannot work reliably.
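How dynamic programming can trace a roughly horizontal curve through a V-disparity image is sketched below. The cost function of [109, 110] is not given here, so the sketch assumes a simple one: the path picks one row per disparity column, collects the accumulated values, and pays a penalty proportional to the row jump between neighbouring columns. The function name extract_curve and the penalty weight smooth are hypothetical.

```python
def extract_curve(v_disp, smooth=1.0):
    """Return one row index per disparity column, maximising the summed
    V-disparity values minus smooth * |row jump| between adjacent columns."""
    n_rows, n_cols = len(v_disp), len(v_disp[0])
    score = [[0.0] * n_cols for _ in range(n_rows)]
    back = [[0] * n_cols for _ in range(n_rows)]
    for v in range(n_rows):
        score[v][0] = v_disp[v][0]
    for d in range(1, n_cols):
        for v in range(n_rows):
            # Best predecessor row in the previous column for this row.
            prev = max(range(n_rows),
                       key=lambda p: score[p][d - 1] - smooth * abs(p - v))
            back[v][d] = prev
            score[v][d] = (v_disp[v][d] + score[prev][d - 1]
                           - smooth * abs(prev - v))
    # Backtrack from the best row in the last column.
    v = max(range(n_rows), key=lambda r: score[r][n_cols - 1])
    path = [v]
    for d in range(n_cols - 1, 0, -1):
        v = back[v][d]
        path.append(v)
    path.reverse()
    return path


# Synthetic example: a gently bending curve plus one strong noisy pixel.
v_disp = [[0] * 6 for _ in range(5)]
for d, v in enumerate([2, 2, 3, 3, 3, 2]):
    v_disp[v][d] = 10
v_disp[0][3] = 12  # isolated noise, stronger than any curve pixel
path = extract_curve(v_disp, smooth=5.0)
```

The smoothness penalty keeps the path on the curve even though the isolated noisy pixel has a larger value. With only a few columns available, as in the short-baseline case, there is little evidence to outweigh such noise.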


2.2.6 Parallel-Based Method

The methods discussed in this section assume that the following conditions hold: the pitch and roll angles of the stereo cameras are small enough, the road can be approximated by a plane, and the variation of the stereo cameras’ height is negligible.

Based on the above assumptions, Broggi et al. [111] found experimentally that the ground plane’s profile oscillates while remaining parallel to itself in the V-disparity image during the host vehicle’s movement. Hence, they find the ground plane’s profile by accumulating the values of the V-disparity image along several candidate lines and selecting the line with the greatest accumulated value.
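The search over parallel candidate lines can be sketched as follows. This is an illustration, not the implementation of [111]: it assumes the slope of the profile is known from calibration, that candidates differ only in the integer row v_hor where they cross d = 0, and that each candidate is scored by summing the V-disparity values along it. The function name and the synthetic data are hypothetical.

```python
def best_parallel_line(v_disp, slope, offsets):
    """Return (v_hor, score) of the best line d = slope * (v - v_hor)."""
    n_rows, n_cols = len(v_disp), len(v_disp[0])
    best_offset, best_score = None, -1
    for v_hor in offsets:
        score = 0
        # Only rows below the candidate horizon can belong to the road.
        for v in range(max(0, v_hor), n_rows):
            d = int(round(slope * (v - v_hor)))
            if 0 <= d < n_cols:
                score += v_disp[v][d]
        if score > best_score:
            best_offset, best_score = v_hor, score
    return best_offset, best_score


# Synthetic V-disparity: ground profile with horizon at row 12, slope 0.5.
v_disp = [[0] * 16 for _ in range(40)]
for v in range(12, 40):
    v_disp[v][int(round(0.5 * (v - 12)))] = 25
for v in range(15, 26):
    v_disp[v][10] += 10  # frontal object at constant disparity 10
v_hor, score = best_parallel_line(v_disp, slope=0.5, offsets=range(8, 17))
```

Because the slope is fixed, the search is one-dimensional and cheap, but it fails as soon as the true slope departs from the calibrated one.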

The study in [111] created the V-disparity image from pixel-to-pixel matching costs, so it may not be reliable when texture is lacking. In [112], Junzhao et al. mathematically investigated the characteristics of the ground plane’s oscillation and solved the lack-of-texture problem. They used a very wide window for cost aggregation in stereo matching; in fact, they treated one horizontal scan-line of the input image as a window. With such a wide window, the ground plane can be extracted more reliably.

The above studies cannot handle non-flat roads, and they need to know the direction of the ground plane’s profile in the V-disparity image in advance (from the calibration step). Moreover, if the host vehicle experiences a large shock while moving, the height and the pitch angle no longer satisfy the assumptions mentioned above, so these methods can produce very large estimation errors.

2.2.7 Polar Histogram Based Method

In [113], Nedevschi et al. computed 3D points from the disparities in the input disparity image and then estimated the ground plane from those 3D points. They projected the 3D points onto the vertical plane YOZ, which is perpendicular to the ground plane and parallel to the principal axes of the stereo cameras. The ground plane appears as a horizontal curve in YOZ. They assumed that the ground plane was flat within the first 30 m from the host vehicle. For this first part of the road points, they built a polar histogram whose center was the origin of YOZ. The ground plane in the first part was detected by finding the maximum accumulated value in the histogram. They repeated this estimation for several subsequent parts of the road plane, so they could eventually estimate a non-flat ground plane.
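The polar histogram for the first road segment can be sketched as follows. This is an illustration and not the implementation of [113]: it assumes that the origin of YOZ lies on the road below the cameras, that y points up and z points forward, and that the histogram is built over the angle atan2(y, z). The bin width, the angular range, and the sample points are hypothetical.

```python
import math


def dominant_ground_angle(points_yz, max_range=30.0, bin_deg=0.25, max_abs_deg=10.0):
    """Return (angle_deg, histogram) for the most populated angular bin.

    Only points with 0 < z <= max_range are used, matching the assumption
    that the road is flat in the first part in front of the vehicle.
    """
    n_bins = int(round(2 * max_abs_deg / bin_deg))
    hist = [0] * n_bins
    for y, z in points_yz:
        if not 0.0 < z <= max_range:
            continue
        angle = math.degrees(math.atan2(y, z))
        k = math.floor((angle + max_abs_deg) / bin_deg)
        if 0 <= k < n_bins:
            hist[k] += 1
    k_best = max(range(n_bins), key=lambda k: hist[k])
    # Report the centre of the winning bin.
    return -max_abs_deg + (k_best + 0.5) * bin_deg, hist


# Road points on a line at 2.1 degrees through the origin, plus an object.
road_slope = math.tan(math.radians(2.1))
road = [(z * road_slope, float(z)) for z in range(5, 30)]
obstacle = [(0.8 + 0.1 * k, 15.0) for k in range(10)]
angle, hist = dominant_ground_angle(road + obstacle)
```

All road points fall into the same angular bin, while the points on the object spread over many bins, so the histogram maximum identifies the road segment. Repeating the procedure for later segments would require moving the center of the histogram to the end of the previously estimated segment.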

2.2.8 Ground Plane Estimation Summary

Ground plane estimation is an important research field for guiding autonomous vehicles and detecting on-road objects, and many studies have therefore been devoted to it. In general, research in ground plane estimation addresses the following two problems:

1. How to make ground plane estimation algorithms reliable and robust when the set of sample road points contains many outliers. Outliers arise for many reasons, for example, mismatches in stereo matching and frontal objects.

2. How to estimate a non-flat road. Most existing studies assumed that the road was planar. A reliable method for estimating non-flat roads remains an open problem.

None of the ground plane estimation methods mentioned in this chapter focused on how to obtain a reliable disparity image; the disparity image was computed by classical stereo matching methods. Unfortunately, when the input stereo images are textureless in the road area, only a small number of road points are available, so these methods can produce large estimation errors.

A New Coarse-To-Fine Method for Computing Disparity Images

Robust Approaches for Stereo Matching