Doctoral thesis: A laminar cortical model of three-dimensional surface perception and figure-ground separation: Stereogram depth, lightness, and amodal completion


GRADUATE SCHOOL OF ARTS AND SCIENCES

M.E., Automation Institute, Chinese Academy of Sciences, 2001

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

2007


2006


Wang Professor of Cognitive and Neural Systems Professor of Mathematics, Psychology, and Biomedical Engineering


I would first like to thank my advisor, Professor Stephen Grossberg, for his talented scientific insights and generous guidance throughout my work. I would also like to thank Professor Ennio Mingolla for introducing me to this fantastic field of vision and for much other generous assistance. I appreciate and cherish the assistance and friendship that I received from everyone in the CNS community. Special thanks go to Dr. Yongqiang Cao and Dr. Arash Yazdanbakhsh for the intuitive discussions and wonderful coffee times that we spent together.

I wish to offer my sincere thanks to my parents who, as always, have been constant sources of encouragement and support. My final thanks are reserved for my wife, who has accompanied me all the way through these years and made them part of my lovely memories.

A LAMINAR CORTICAL MODEL OF THREE-DIMENSIONAL SURFACE PERCEPTION AND FIGURE-GROUND SEPARATION:

STEREOGRAM DEPTH, LIGHTNESS, AND AMODAL COMPLETION

LIANG FANG

Boston University Graduate School of Arts and Sciences, 2007

Major Professor: Stephen Grossberg, Wang Professor of Cognitive and Neural Systems; Professor of Mathematics, Psychology, and Biomedical Engineering

ABSTRACT

In viewing a 3D scene, object features are seen on 3D surfaces infused with lightness and color at correct depths. By focusing only on how left and right features are correctly matched, most 3D vision models have not explained how this happens. A 3D LAMINART model (Grossberg and Howe, 2003; Cao and Grossberg, 2005) proposed that laminar cortical mechanisms interact to create 3D surface percepts using interactions between boundary and surface representations. Previous work using this model explained perception of relatively simple objects, like bars and blocks, in relatively simple spatial configurations that did not contain any mutual occlusions. This thesis extends the 3D LAMINART model to predict how textured images with multiple potential false binocular matches, e.g., dense stereograms, generate correct 3D surface representations of figures and their backgrounds. The model also clarifies how sparse stereograms can induce the formation of continuous surfaces at correct depths across contrast-free regions. Furthermore, when textured stereograms define emergent occluding and occluded [...] partially occluded textured surfaces can be amodally completed behind the occluding textured surface. Thus, the model provides a unified explanation of stereopsis, 3D figure-ground separation, and completion of partially occluded object surfaces. The model clarifies how interactions between layers 4, 3B, and 2/3A in V1 and V2 contribute to stereopsis, and proposes how a disparity filter and 3D perceptual grouping laws in V2 interact with 3D surface filling-in operations in V1, V2, and V4 to produce appropriate figure-ground perception. These interactions help to convert the complementary rules for boundary and surface formation (Grossberg, 1994) into a consistent, unitary visual percept.

ACKNOWLEDGEMENTS
ABSTRACT
LIST OF TABLES

1 Introduction

2 Stereopsis processing and model constraints
2.1 Reconciles contrast-specific binocular fusion with contrast-invariant boundary perception
2.2 Implements the contrast magnitude constraint on binocular fusion
2.3 Encourages the unique-matching rule and solves the correspondence [...]
2.4 Combines monocular and binocular information to form depth percepts
2.5 Forms perceptual groupings including amodal boundary completions
2.6 Forms 3D amodal and modal surface representations
2.7 Determines the correct depth for horizontal boundaries, ensures perceptual consistency, and initiates figure-ground separation with surface-to-boundary feedback
2.8 Uses small-scale texture borders to induce large-scale perceptual [...]

3.1 V1 monocular boundaries
3.2 V1 binocular boundaries
3.3 V1 monocular surfaces
3.4 V2 boundaries
3.5 V2 monocular surfaces
3.6 V4 binocular surfaces

4 Model equations
4.1 Index legend and general processes
4.2 Model equations

5 Model simulations
5.1 Border ownership and figure-ground percepts
5.3 Sparse RDS
5.4 Figure-ground percept of RDS textured surfaces with emergent occlusion

6 Discussion
6.1 Anatomical and physiological data
6.2 Model complexity and explanatory power
6.3 Hypotheses and predictions of the model

Table 1: Matrix of inhibition coefficients in the disparity filter
Table 2: The allelotropic shift
Table 3: Anatomical and physiological data that support the model

Figure 15: The V2 disparity filter
Figure 16: da Vinci stereogram
Figure 17: Boundary completion is formed inwardly, instead of outwardly
Figure 18: Demonstration of 3D surface capture
Figure 20: Demonstration of depth-determination of horizontal and monocular boundaries
Figure 21: Simulation of border ownership and figure-ground percepts (part 1)
Figure 22: Simulation of border ownership and figure-ground percepts (part 2)
Figure 23: Simulation of border ownership and figure-ground percepts (part 3)
Figure 24: Role of V1 surface-to-boundary feedback in multiple-scale boundary processing

2D Two Dimensional

Introduction

When we view a 3D scene, the retinas of our two eyes receive two-dimensional arrays of light, but we effortlessly perceive the world in depth. The positional difference between an object's projections on an observer's left and right retinas, or binocular disparity, is a strong cue for perceiving depth at sufficiently near distances (Howard & Rogers, 2002; Julesz, 1971; see Figure 1).

Figure 1. Horizontal binocular disparity: when the two eyes are focused on object A, the projections of object B, which lies at a different distance, fall on different positions on the left and right retinas. This positional difference is used by the brain in reconstructing depth from 2D retinal inputs.
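The dependence of disparity on viewing distance can be made concrete with a small numerical sketch. It is not part of the model. It assumes the standard textbook geometry in which a point straight ahead at distance d subtends a vergence angle of 2·atan(I/2d) for an interocular separation I, and it takes disparity to be the difference between the vergence angles of the fixated point A and a second point B. The function names and the 6.5 cm interocular value are illustrative assumptions.

```python
import math

def vergence_angle(distance_m, interocular_m=0.065):
    """Angle (radians) between the two lines of sight to a point straight ahead."""
    return 2.0 * math.atan(interocular_m / (2.0 * distance_m))

def disparity(fixation_m, target_m, interocular_m=0.065):
    """Disparity (radians) of a target relative to the fixated point.

    Under this sign convention the value is positive when the target
    is farther away than the fixation point.
    """
    return (vergence_angle(fixation_m, interocular_m)
            - vergence_angle(target_m, interocular_m))

def to_arcmin(radians):
    """Convert radians to minutes of arc."""
    return math.degrees(radians) * 60.0

# The same 10 cm depth step is compared at a near and a far fixation distance.
near = to_arcmin(disparity(0.5, 0.6))    # fixate at 0.5 m, target at 0.6 m
far = to_arcmin(disparity(10.0, 10.1))   # fixate at 10 m, target at 10.1 m
```

Under these assumptions the disparity produced by a fixed depth step falls roughly with the square of the viewing distance, which is consistent with disparity being most informative for near objects.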

Binocular disparity is most effective for sufficiently near objects (Tyler, 2004). For distant objects, monocular cues, such as T-junctions, may be used to determine relative depth when one object is nearer than another object, and occludes parts of the farther object (Howard & Rogers, 2002). Occlusion can also elicit strong 3D percepts when we view 2D images; see Figure 2.

Figure 3. Perceptual grouping: in addition to being perceived at a farther depth, the visible fragments of a tiger face are perceptually linked together instead of being seen as unrelated. (Courtesy of an anonymous artist.)

The work that is presented in this thesis further develops a neural model of how the subcortical area LGN and the visual cortical areas V1, V2, and V4 work together to give rise to correct 3D boundary and surface percepts of binocular stimuli that contain disparity and occlusion information.

Object features are seen on 3D surfaces infused with lightness and color at the correct depths. Most previous models of stereopsis (Dev, 1975; Fleet, Wagner, & Heeger, 1996; Grimson, 1981; Julesz, 1971; Lehky & Sejnowski, 1990; Lippert & Wagner, 2002; Marr & Poggio, 1976, 1979; Matthews et al., 2003; Ohzawa, DeAngelis, & Freeman, 1990, 1996; Prince & Eagle, 2000; Read, 2002; Sperling, 1970; Qian, 1997) restricted their attention to how left and right eye contours could be matched, but did not explain how this matching process spontaneously gives rise to continuous percepts in depth of [...] Kelly & Grossberg, 2000; McLoughlin & Grossberg, 1998) has proposed how 3D surface features, including lightness, can be represented as a result of interactions between boundary and surface processing streams. Recently, a 3D LAMINART model has developed FACADE theory to predict how laminar circuits within the visual cortex generate 3D boundary and surface percepts and separate figures from their backgrounds (Cao & Grossberg, 2005; Grossberg & Howe, 2003; Grossberg & Swaminathan, 2004; Grossberg & Yazdanbakhsh, 2005). This thesis further develops the 3D LAMINART model to explain how 3D boundary and surface percepts occur, and how partially occluded objects can be completed and correctly recognized as a whole, in response to both dense and sparse random-dot stereograms (RDS).

A vigorous recent discussion on CVNet, which was initiated by Jeremy Wilmer, summarized key aspects of the rich literature on the advantages of having binocular stereopsis. This thesis sheds new mechanistic light on the properties of disparity-based depth percepts that were highlighted during this discussion. For example, binocular stereopsis is needed to control action in 3D space that requires either high precision (e.g., threading a needle or soldering) or high speed (e.g., reaching for, grasping, or placing objects quickly), especially when obstacle avoidance is needed, as shown by behavioral studies that compare monocular and binocular conditions in stereopsis-normal people (Melmoth & Grant, 2006; Sheedy et al., 1986), or that compare stereopsis-normal people and stereopsis-deficient people (e.g., with amblyopia; see Agrawal et al., 2006; McKee et al., 2003). Stereopsis is also important for object segmentation and surface perception [...] camouflage. Nakayama, Shimojo, & Silverman (1989) further suggested that stereopsis helps to determine border-ownership and, as a consequence, amodal completion and recognition of partially occluded objects. In this regard, binocular stereopsis gives predatory animals evolutionary advantages in hunting (Pettigrew, 1986; Regan, 1999).

Binocular visual experience is important for the normal development of the visual cortex in cats (Mitchell et al., 2003) and monkeys (Sakai et al., 2006). Disparity-selective cells were initially found in the striate cortex of cats (Barlow, Blakemore, & Pettigrew, 1967) and of monkeys (Poggio & Fischer, 1977). Given the importance of stereo vision, it is not surprising that many stereopsis models have been developed, as summarized above. Some models are non-biological, some are biologically inspired, and some are based on known properties of disparity-selective cortical cells. Among them, perhaps one of the most biologically plausible models was developed by Qian and his colleagues (Chen & Qian, 2004; Qian, 1994, 1997; Qian & Zhu, 1997). They based their work on known phase-shift and position-shift receptive field profiles of disparity-selective cells in visual cortex, investigated disparity computation in the visual cortex through population coding, and explained various physiological data on cortical stereopsis processing. However, Qian's model, like most other models, restricted attention to how left and right eye contours could be matched, but did not explain how this matching process spontaneously gives rise to continuous percepts in depth of surface features, including lightness. Although neurons achieve disparity sensitivity by matching edges (Cumming & DeAngelis, 2001), object features are always seen on 3D surfaces infused [...] surface color can influence the percept of depth, and vice versa (Egusa, 1983; Gilchrist, 1977; Grossberg, 1994). Surface percepts of depth, including lightness and color, are thus an essential constraint in cortical stereopsis processing, but have been largely ignored by previous models.

Indeed, neural responses to the interior of any continuous surface are largely suppressed in the ganglion cells of the retina, before signals are sent to the LGN and other visual areas in the brain (Kandel, Schwartz, & Jessell, 2000). Land (1977, 1986; also Land & McCann, 1971) conducted a series of experiments showing that color patches in the "Land-McCann Mondrians" look nearly unchanged even if the illuminating lights are changed wildly by mixing the frequency components differently. Furthermore, two color patches look strikingly different even when the lights reflected from them are exactly the same, if only the illuminating condition is changed while the Mondrian pattern itself remains unchanged. In other words, the subjects "see" the reflectance of color patches instead of the reflected light spectrum; i.e., the illuminants are discounted. The function of discounting the illuminant in the early visual system was proposed as early as the 19th century (Helmholtz, 1866), and it is essential in explaining the perceptual phenomena of brightness constancy and brightness contrast (Grossberg, 1980). Land argued that human color perception of a surface interior is not determined by the spectral components that make up the light that it reflects, but by ratios across the color edges. Such a claim is supported by the finding that neural coding in retinal ganglion cells is reliable only near the contrast edges (Kandel, Schwartz, & Jessell, 2000). One of the ecological reasons for [...] veridical perception of the world, when it is illuminated in widely different light conditions.
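The ratio principle can be illustrated with a short worked equation. It is a simplified sketch, not Land's full Retinex computation. It assumes that the illumination I is the same on both sides of an edge and that the luminance reaching the eye is the product of illumination and surface reflectance; the symbols L, I, and R are introduced here only for illustration.

```latex
L_1 = I\,R_1, \qquad L_2 = I\,R_2
\quad\Longrightarrow\quad
\frac{L_1}{L_2} = \frac{I\,R_1}{I\,R_2} = \frac{R_1}{R_2}.
```

Under this assumption the ratio across the edge depends only on the two reflectances, so changing the illuminant leaves it unchanged.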

If the neural coding for the surface interiors were not somehow reconstructed in the brain, we would see a hollow world consisting only of colorful edges. Land (1977, 1986) proposed an algorithm called Retinex to calculate the perception of surface interiors from the ratios across color contrasts. Despite its successes in explaining perceptual data, Retinex did not explain how the reconstruction of surface interiors is achieved in the real brain. Grossberg and his colleagues (Grossberg, 1984, 1987; Grossberg & Mingolla, 1985a; Grossberg & Todorović, 1988) have proposed a model called BCS/FCS. They argued that visual information in the brain is processed in two parallel yet interacting systems, namely the BCS (Boundary Contour System) and the FCS (Feature Contour System), to generate visual percepts of 3D boundaries and surfaces. The BCS and the FCS both receive inputs from the subcortical area LGN, yet obey different computational rules. Namely, the boundary signals are sensitive to orientation but not to contrast polarity, and are formed inwardly, while the feature signals are not sensitive to orientation but are sensitive to contrast polarity, and spread outwardly, as shown in Figure 4.

Figure 4. Complementary processing of visual information. Boundary processing (left) is sensitive to orientation, but not to contrast polarity, and boundaries are completed inwardly; i.e., boundary completion can be formed between two existing and approximately collinear boundaries. Surface processing (right) is insensitive to orientation, but sensitive to contrast polarity, and surface signals fill in outwardly; i.e., surface signals can freely propagate in all directions until they encounter a boundary.

This opposition in the computational rules that are followed by the boundary and the surface systems implies that these two systems are complementary to each other: each of them calculates one set of properties of the visual scene, and is unselective to another set of properties that is calculated by the other system. These two systems interact with each other to support the consistent processing of a visual scene. Only by recruiting both complementary systems can our visual system perform a full-scale analysis of a visual scene.

[...] hierarchies: V1 interblobs, V2 pale stripes, and part of V4 comprise the BCS, while V1 blobs, V2 thin stripes, and part of V4 comprise the FCS. Such cortical bifurcation of visual processing is supported by physiological findings of different response properties of neurons in different brain regions (Hubel & Livingstone, 1987). In the FCS, feature signals near the contrast edges diffuse freely across space until they are blocked by the boundaries. In this way, the interior of a surface is "filled-in" with appropriate brightness and color. The filling-in of color was previously studied by Yarbus (1967), and "filling-in" was proposed to be the neural mechanism of the perceptual compensation for the blind spot and the shadows of retinal veins. Perceptual filling-in of color is also strikingly demonstrated in various visual illusions, such as neon color spreading (Van Tuijl, 1975) and the watercolor effect (Pinna & Grossberg, 2005).
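The filling-in idea described above can be sketched as a one-dimensional boundary-gated diffusion. This is a minimal illustration under stated assumptions, not the model equations of the thesis: the function name `fill_in`, the parameters `rate` and `decay`, and the rule that a boundary of strength 1 fully blocks flow between neighbouring cells are all hypothetical choices made for the example.

```python
def fill_in(feature, boundary, steps=2000, rate=0.2, decay=0.05):
    """Diffuse feature inputs along a 1D row of cells, gated by boundaries.

    feature[i]  : feature-contour input to cell i (e.g. a signal near an edge)
    boundary[i] : boundary strength in [0, 1] between cell i and cell i + 1
    Returns the surface activity of each cell after `steps` Euler updates.
    """
    n = len(feature)
    assert len(boundary) == n - 1
    # Permeability between neighbours: 1 = free diffusion, 0 = fully blocked.
    perm = [max(0.0, 1.0 - b) for b in boundary]
    s = [0.0] * n
    for _ in range(steps):
        nxt = s[:]
        for i in range(n):
            flow = 0.0
            if i > 0:
                flow += perm[i - 1] * (s[i - 1] - s[i])
            if i < n - 1:
                flow += perm[i] * (s[i + 1] - s[i])
            # Activity changes through lateral flow, passive decay, and input.
            nxt[i] = s[i] + rate * (flow - decay * s[i] + feature[i])
        s = nxt
    return s

# Ten cells; boundaries enclose cells 3..6, with feature input just inside each boundary.
feature = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
boundary = [0, 0, 1, 0, 0, 0, 1, 0, 0]
surface = fill_in(feature, boundary)
```

In this toy setting the activity injected near the two edges spreads across the enclosed cells and stays out of the cells beyond the boundaries, which is the qualitative behavior that the text attributes to the FCS.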

As noted above, monocular cues, such as T-junctions, may be used to determine relative depth when one object is nearer than another object and occludes parts of the farther object (Howard & Rogers, 2002). Occlusion can also elicit strong 3D percepts when we view 2D images. In addition to being perceived at a farther depth, the visible parts of an occluded object are often perceptually linked together behind the occluder, thereby supporting recognition of partially occluded objects (Nakayama, Shimojo, & Silverman, 1989). Such depth percepts for distant objects elicited by occlusion also have not been included in previous stereopsis models.

This thesis proposes a neural model of how the subcortical Lateral Geniculate Nucleus (LGN) and laminar circuits within visual cortical areas V1, V2, and V4 work together to give rise to correct 3D percepts of binocular stimuli that contain disparity and occlusion information, including surface depth and lightness. In particular, the model illustrates how, in seeing a scene that contains occlusion, border-ownership is determined, how boundary and surface representations of partially occluded objects are amodally completed, and how recognition and visibility of these objects are supported in different cortical areas. It does this for the case of stereogram percepts.

Random-dot stereograms (RDS) were introduced by Julesz (1971) to demonstrate that binocular disparity is processed in early vision and can give rise to percepts of 3D forms without monocular cues. Previous models used RDS to study how binocular matching is performed, but few have studied either the surface percepts or the figure-ground percepts that RDS evoke (Grimson, 1981; Marr & Poggio, 1976; Qian, 1994). In this thesis, three kinds of RDS are studied and simulated. Each of them presents a distinct challenge to the cortical processing of stereopsis, thus illuminating crucial modeling constraints that were largely ignored by previous studies, as follows:

Dense RDS. Dense RDS contain crowded features. They challenge the brain by creating many false binocular matches, making the classical correspondence problem hard to solve (Howard & Rogers, 2002; Julesz, 1971). Figure 5 illustrates how the enhanced 3D LAMINART model separates objects and their surface features in depth in response to a dense stereogram.
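The false-match problem can be illustrated with a toy one-dimensional stereogram. The sketch below is only an illustration: pixel-wise matching of dots with the same contrast is a simplified stand-in for the edge-based matching performed by cortical cells, and the function names, image size, dot density, and disparity range are all assumptions made for the example.

```python
import random

def make_rds_row(n, shift, lo, hi, density=0.5, seed=0):
    """Return (left, right) rows of a 1D random-dot stereogram.

    Dots in left[lo:hi] reappear in the right row displaced by `shift` pixels
    (assumes 0 <= shift <= lo); all other dots are identical in both rows.
    """
    rng = random.Random(seed)
    left = [1 if rng.random() < density else 0 for _ in range(n)]
    right = left[:]
    for x in range(lo, hi):
        right[x - shift] = left[x]
    # The strip uncovered by the shift is refilled with fresh random dots.
    for x in range(hi - shift, hi):
        right[x] = 1 if rng.random() < density else 0
    return left, right

def candidate_disparities(left, right, x, dmax):
    """All disparities d in [-dmax, dmax] for which right[x - d] matches left[x]."""
    return [d for d in range(-dmax, dmax + 1)
            if 0 <= x - d < len(right) and right[x - d] == left[x]]

left, right = make_rds_row(n=200, shift=3, lo=80, hi=120)
counts = [len(candidate_disparities(left, right, x, dmax=5)) for x in range(80, 120)]
mean_candidates = sum(counts) / len(counts)
```

Each pixel in the shifted region has its true disparity among the candidates, but typically several other disparities match as well. These extra candidates are the false matches that a stereopsis model has to suppress.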

Figure 5. Simulation of dense random-dot stereogram: (a) Retinal inputs. The left and central images are for uncrossed fusion, while the central and right images are for crossed fusion. This convention for displaying the stereograms is used throughout the thesis. When two images are successfully fused, two textured "L" shaped bars are perceived to be floating above a textured background. Furthermore, the upright "L" shaped bar on the left side is perceived to be nearer than the reversed one on the right side. (b) Boundaries represented by the activation patterns of model V1 binocular complex cells. There are many false matches on every depth plane. (c) Boundaries represented by the activation patterns of model cells in layer 2/3 of model V2 pale stripes. Most false matches are suppressed, while correct ones are preserved. The correspondence problem is thus solved. (d) Visible surfaces represented by the activation patterns of model V4 cells. Filling-in of visible surface features is controlled by the boundaries in (c). See text for details.

Sparse RDS. Sparse RDS contain widely separated features. They challenge the brain's ability to assign definite surface depths to large feature-absent image regions, whose depth is locally ambiguous. Local filtering of contrast features can compute binocular disparities only at the matched edges of the sparse image features. Such filtering, by itself, cannot explain the percept of an entire 3D surface in depth that is elicited by these features. The model proposes how long-range boundary completion by a 3D perceptual grouping process responds to the filtered sparse features to form connected boundaries at multiple depths. These boundaries induce and contain the selective filling-in of surface lightness at their depth, thus lifting entire surfaces to their correct depths, a process that is called 3D surface capture. See Figure 6.

[Figure 6 panels: surfaces at the Near, Fixation, and Far depths in V2 and in V4]

Figure 6. Simulation of sparse random-dot stereogram: (a) Retinal inputs. When two images are successfully fused, a white square dotted with black is perceived to be floating over a white background that is also dotted with black. The square is perceived to be slightly whiter than the background. The question is why the feature-absent spaces between the dots are perceived at the correct depths. (b) Boundaries represented by the activation patterns of model cells in layer 2/3 of model V2 pale stripes. Illusory boundaries formed by a cooperative grouping network in the V2 pale stripes connect the spatially sparse dots and enable the outline of a big square to be recognizable in the boundary stream. (c) Surfaces represented by the activation patterns of model cells in the V2 thin stripes: a square-shaped surface with black dots is captured at the near depth, and separated from the background surface at the far depth. Illusory boundaries contain the filling-in of whiteness within the square, create illusory contrasts at the corresponding depth, and make the square recognizable in the surface stream. (d) Visible surfaces represented by the activation patterns of V4 cells: at the fixation depth plane, the illusory boundaries contain the filling-in of white outside the black dots and within the square, while the dots are filled in with blackness. Thus both the dots and the whole square surface are visible at the fixation depth. At the far depth, the region occluded by the central square formed at the fixation depth is prohibited from being filled in (the gray level there corresponds to zero neural activity), which explains why the square looks opaque instead of transparent. The rest of the background surface is also filled in with white, which makes it visible at the far depth. See text for details.

Dense RDS that implicitly define occlusion. When the retinal projection of a nearer opaque object is superimposed on that of a farther object, occlusion occurs and makes the farther object only partially visible. Occlusion is such a primary cue for depth perception that it can evoke vivid 3D percepts even from 2D images (Howard & Rogers, 2002). Dense RDS that implicitly define a partially occluded object challenge the brain by requiring many small-scale features at the depth of the occluded object to be suppressed at the perceived positions of the occluder, while large-scale groupings form behind the occluder and perceptually link the visible parts of the occluded object. These completed boundaries are amodally recognized, but they are not seen with visible surface lightness. The model explains how this is accomplished by multiple-scale boundary processing. See Figure 7.

[...] be nearer than a textured horizontal bar and to occlude the central part of the horizontal bar, with both bars nearer than the textured background. The visible parts of the textured horizontal bar on the flanks of the vertical bar are perceptually grouped together to give the percept of a partially occluded bar, instead of two separated squares. (b) Small-scale boundaries represented by the activation patterns of model cells with small receptive field sizes in layer 2/3 of model V2 pale stripes. The boundaries that belong to the texture compartments are separated in depth. (c) Large-scale boundaries represented by the activation patterns of model cells with large receptive field sizes in layer 2/3 of model V2 pale stripes. Instead of forming texture compartmental boundaries, the large-scale boundaries form at the borders of textured figures. The boundary of a horizontal bar at the fixation depth explains the amodal perceptual grouping of the two spatially separated textured squares and their recognition as parts of a single bar. The vertical boundary at the near depth inhibits the boundaries at the fixation depth of the right side of the left square and the left side of the right square, since they share borders with the vertical bar outline at the near depth. (d) Visible surfaces represented by the activation patterns of model cells in V4. See text for details.

What refinements of the 3D LAMINART model have been needed to meet the challenges of the previous three types of 3D percepts?

Eliminating Spurious Boundaries and Surface-to-Boundary Feedback. First and foremost, the 3D LAMINART model has been extended to deal with the challenges that arise when there are many possible false matches between left and right eye inputs, and also many spurious horizontal boundaries that are represented at multiple depths. Concerning the latter, although the original 3D LAMINART model can correctly simulate many 3D surface percepts, it does not fully explain how to process horizontal boundaries that contain no binocular information. Instead, the model can leave spurious copies of such horizontal boundaries at the wrong depths (Grossberg, 1994). Even when these spurious boundaries do not degrade the quality of 3D surface representations of objects, they can interfere with successful object recognition that is based upon boundary representations (Biederman & Ju, 1988; Davidoff, 1991; Grossberg & Williamson, 1999). Figure 8 compares how horizontal boundaries are computed by the previous 3D LAMINART model of Grossberg and Howe (2003) and by the present model refinement.

[...] boundaries are suppressed.

In order to eliminate these spurious horizontal boundaries, the model was extended to include feedback between boundary and surface representations that is predicted to occur between the V1 interblobs and blobs, respectively. In particular, successfully filled-in monocular surfaces in the blobs are predicted to send contour-sensitive surface-to-boundary feedback signals to V1 interblobs. These surface-to-boundary signals modulate the activities of V1 binocular complex cells so that the boundaries that they form around successfully filled-in surfaces are selectively enhanced, including the horizontal boundaries.

This feedback interaction extends to V1 a type of feedback between boundaries and surfaces that was earlier predicted to occur between V2 pale stripes and thin stripes (Grossberg, 1994). Both the V1 and the V2 feedback interactions have the role, first and foremost, of realizing consistent boundary and surface representations, even though the computational rules that these processes obey are complementary (Grossberg, 1994, 2003). It has previously been shown that this consistency operation also has the effect of initiating 3D separation of figures from each other and their backgrounds (Grossberg, 1994; Kelly & Grossberg, 2000). The present extension enhances the overall symmetry of model interactions at different cortical levels, while also facilitating 3D figure-ground separation by enabling separation of dense figural boundaries in depth.

The following question naturally arises: why is feedback between V2 boundaries and surfaces not sufficient? How does V1 feedback do more? The main point is to explain how horizontal boundaries are enhanced at their correct depths before they reach the V2 disparity filter, which helps to suppress false matches and thereby to solve the correspondence problem. The disparity filter is part of V2 boundary processing, which occurs before V2 surface-to-boundary feedback can occur. Without the V1 boundary enhancement, the disparity filter can select the wrong horizontal boundaries, as is explained below.

Multiple-Scale Processing and Amodal Completion of Occluded Forms. The FACADE model predicted neural mechanisms that could qualitatively explain 3D figure-ground separation, and how a partially occluded object can be completed and correctly recognized as a whole (Grossberg, 1994, 1997), notably how T-junctions can induce 3D figure-ground separation. Recent development of the 3D LAMINART model has shown how these mechanisms can simulate challenging data about 3D transparency and neon color spreading (Grossberg & Yazdanbakhsh, 2005) and about bistable 3D percepts, such as the Necker cube (Grossberg & Swaminathan, 2004). Recent psychophysical data have provided additional experimental support for these predicted mechanisms (e.g., Dresp, Durand, & Grossberg, 2002; Pinna & Grossberg, 2005; Tse, 2005; Yazdanbakhsh & Watanabe, 2004). This thesis shows how these ideas can be extended to a multiple-scale boundary grouping process. Such multiple-scale grouping is needed to understand how, on the one hand, the small features of an RDS can be binocularly matched and grouped using small-scale receptive field sizes, while, on the other hand, the amodally perceived boundary behind the occluder can be completed by a large-scale grouping process. The following discussion summarizes the nature of the computational dilemma that the brain seems to solve in this way.

a

Figure 9 Examples of amodal completion: (a) Amodal boundary representation: the offset black horizontal lines induce an amodal percept of a vertical boundary that can be recognized even though it does not generate a visible brightness difference (b) While a horizontal bar occludes a vertical bar, visible parts of

the vertical bar are linked and amodally perceived as parts of a continuous bar (c) Amodal surface

representation: as Kanizsa (1979) noted, amodal completion behind the disks does not lead to the more

"likely" perception of squares that the checkerboard would suggest according to a Bayesian theory Instead,

one is amodally aware of a white cross and a black cross that are partially occluded by the gray disks


Figure 9 shows three examples of amodal completion. In Figure 9a, the illusory vertical boundary between the two displaced line gratings is easy to recognize even though it does not separate regions that differ clearly in lightness, brightness, color, or depth in the percept. It is amodal. In Figure 9b, the boundaries of the partially occluded vertical rectangle are completed behind the horizontal rectangular occluder. These completed vertical boundaries can be recognized but not seen. They are also amodal. Figure 9c makes the additional point that boundaries and surfaces may both be completed amodally. Here, a black (lower right) or white (upper left) cross-like boundary is completed behind the occluding gray disks. In addition to being able to amodally recognize the completed boundaries, a viewer can also amodally recognize that the completed square-like surface shape is black or white, respectively, behind its occluding disk. This percept suggests that, in some visual areas, surface representations can also be amodally completed behind occluders, and with the appropriate color. These surface representations are not, however, visible. If they were, then the occluding disks would look transparent. More generally, all occluding surfaces would look transparent. Grossberg (1994, 1997) predicts how and why boundary and surface representations form in such a way that all boundary representations are amodal within the cortical boundary processing stream (through V1 interblobs and V2 pale stripes) and some surface representations are amodal (predicted to be in the V2 thin stripes), whereas some surface representations are visible, or modal (in V4).

Link between Amodal Completion, Recognition, and Visible Surface Percepts. Amodal boundaries and surfaces enable the brain to recognize amodally completed object representations in depth. Modal surface representations enable the brain to see the unoccluded parts of opaque object surfaces in depth. If the brain could not make these distinctions, either partially occluded objects could not be recognized, or all occluders would look transparent. It has elsewhere been predicted that this arrangement enables occluders and partially occluded objects to be better recognized via V2-to-IT projections, and unoccluded object parts to be recognized and grasped via outputs from V4 (Grossberg, 1994).

Many examples illustrate that, when occlusion occurs, figure and ground can be perceptually detached from each other in such a way that both can be better recognized. For example, in Figure 10, the partial "B" letter forms in the Bregman-Kanizsa image (Bregman, 1981; Kanizsa, 1979) are much more recognizable when they co-exist with the black snakelike occluder (Figure 10b) than without it (Figure 10c), although in both cases the visible parts of the letters are identical. The widely accepted psychological explanation is that the shared borders between the occluding and occluded figures are attributed only to the occluding figure, and are somehow detached from the occluded figure, a process called border ownership (Anderson & Julesz, 1995; Bregman, 1981; Grossberg, 1994; Kanizsa, 1979; Nakayama, Shimojo, & Silverman, 1989). The visible parts of the partly occluded object can thereby be grouped together and recognized as a whole without interference from the shared borders. In particular, when the occluder boundaries are removed from the occluded object parts and the occluded object boundaries are formed within a boundary representation that codes for a more distant depth, then the occluded boundaries are free to be collinearly completed at that depth.


Figure 10. […] easily recognized when they are partially occluded by a black snakelike occluder. (c) The same B shapes as in (b), except the occluder is white and therefore merges with the remainder of the white background. Although the exposed portions of the letters are identical in (b) and (c), they are much easier to recognize in (b). This difference in recognition correlates with the percept that the black occluder pops out in front of the gray B fragments, thereby enabling the gray B fragments to be amodally completed behind the black occluder. The black occluder also appears to exclusively own the shared boundaries between it and the B fragments.

Neurophysiological studies show that V2 thin stripes and pale stripes both project heavily to IT (Baizer, Ungerleider, & Desimone, 1991; Seltzer & Pandya, 1978), which carries out aspects of object recognition (Desimone, 1991; Gross et al., 1985; Rolls, 2000; Ungerleider & Mishkin, 1982). The model proposes how the V2 thin stripes implement monocular Surface Representations, or Filling-In-Domains (FIDOs), in which amodal surface representations of objects are formed through a surface filling-in process. These surfaces can be recognized in IT and beyond; see Figures 9b and 9c. Modal surface representations are predicted to occur in binocular Surface Representations, or FIDOs, in V4.
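The idea of surface filling-in within a boundary-enclosed compartment can be sketched as boundary-gated diffusion. The sketch is a simplified illustration, not the filling-in equations of the model (see Chapter 4). It assumes a one-dimensional row of cells, a conservative nearest-neighbor diffusion, and a hypothetical array `boundaries` in which `boundaries[i]` blocks the link between cell `i` and cell `i + 1`. The function name `fill_in` and the parameters `steps` and `rate` are illustrative.

```python
import numpy as np

def fill_in(features, boundaries, steps=500, rate=0.25):
    """Spread feature activity between neighboring cells unless a boundary
    lies between them. boundaries[i] gates the link (i, i + 1)."""
    s = np.asarray(features, dtype=float).copy()
    gate = 1.0 - np.asarray(boundaries, dtype=float)  # 1 = open, 0 = blocked
    for _ in range(steps):
        flow = rate * gate * (s[1:] - s[:-1])  # exchange across each link
        s[:-1] += flow
        s[1:] -= flow
    return s

# Ten cells; a boundary between cells 3 and 4 forms two compartments.
features = np.zeros(10)
features[2] = 4.0            # feature signal injected at a single cell
boundaries = np.zeros(9)
boundaries[3] = 1.0

surface = fill_in(features, boundaries)
```

In this sketch the injected activity spreads evenly over cells 0 to 3 and never crosses the boundary, so the compartment to the right stays empty. That is the sense in which a closed boundary contains the filling-in of surface features.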


The model also proposes how all boundaries, including horizontal boundaries, are assigned to different depth-selective boundary representations in the V2 pale stripes; see Figures 5-7. The assignment of the correct depths to horizontal boundaries helps to ensure consistency between the boundary and surface representations (Grossberg, 1994), and facilitates their object recognition in IT.

Mechanisms of Figure-Ground Separation for Amodal and Modal Percepts. The 3D LAMINART model, and the FACADE model before it (Grossberg, 1994), predicted that several cortical mechanisms work together to accomplish 3D figure-ground separation and perceptual completion of partially occluded figures. The following mechanisms are more completely explained in Chapter 2: (a) surface-to-boundary feedback ensures boundary and surface consistency, and also initiates figure-ground separation, including the assignment of border ownership, using near-to-far inhibition from surface to boundary representations; (b) boundary completion during the 3D perceptual grouping process builds amodally completed boundaries that can contain filling-in of surface features within connected boundary compartments in the surface processing stream; and (c) boundary enrichment and surface-feature pruning in V4 ensure that only the unoccluded parts of opaque surfaces are filled-in and seen. An example of these processes at work is shown in Figure 11. Chapter 5.1 discusses the mechanisms whereby these properties are achieved.
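Mechanism (a) can be caricatured in a few lines. The sketch assumes that contour signals derived from the near-depth surface subtract from far-depth boundary activity at the same positions, followed by half-wave rectification. The function name `near_to_far_inhibition`, the `weight` parameter, and the example arrays are hypothetical, and the feedback circuit of the model itself is not reproduced here.

```python
import numpy as np

def near_to_far_inhibition(near_contour, far_boundary, weight=1.0):
    """Inhibit far-depth boundaries at positions where the near depth
    signals a contour, then rectify so that activity stays non-negative."""
    near = np.asarray(near_contour, dtype=float)
    far = np.asarray(far_boundary, dtype=float)
    return np.maximum(far - weight * near, 0.0)

# Positions along one row; 1.0 marks an active boundary.
near = np.array([0, 0, 0, 1, 0, 0, 1, 0, 0, 0], dtype=float)  # occluder edges
far = np.array([1, 0, 0, 1, 0, 0, 1, 0, 0, 1], dtype=float)   # object edges plus shared edges

far_after = near_to_far_inhibition(near, far)
```

Here the shared edges at positions 3 and 6 are removed from the far-depth representation and remain only at the near depth, which corresponds to the occluder owning those borders. The remaining far-depth edges at positions 0 and 9 are then free to be grouped without interference.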


Figure 11. 3D figure-ground separation and amodal completion. Panels: (a) Left Eye Input, Right Eye Input; (b) V2 boundary, Near and Far; (c) V2 surface, Near and Far; (d) V4 surface, Near and Far. (a) Retinal images, left and right, respectively. (b) Boundary representation in V2: at the far depth, horizontal boundaries belonging to the occluded surface are completed, and all […] suppressed. The complete boundaries at the far depth support IT to recognize the occluded object as a whole. (c) Amodal surface representation in V2: a complete surface is captured and infused with grayness at the far depth. It supports IT to recognize the occluded surface as a continuous gray bar. (d) Modal surface representation in V4: pruned surface features are filled-in within enriched boundaries and created visible […]

Coexistence of Near-to-Far Inhibition and Amodal Boundary Completion. The above […]

[…] scale, boundaries of the small-scale texture compartments are mostly suppressed while large-scale borders of the textured figures are registered. The model predicts that multiple-scale boundaries in the V1 interblobs and V2 pale stripes interact with filled-in surfaces in the V1 blobs and V2 thin stripes, respectively, through surface-to-boundary feedback, to realize these properties.

The thesis is organized in the following way: Chapter 2 summarizes essential characteristics of cortical processing of stereopsis, and how the model is constrained to capture these characteristics. Chapter 3 describes the six model components one by one. Chapter 4 presents the mathematical equations of the model. Chapter 5 illustrates computer simulations of the model in response to four different stereograms. Finally, Chapter 6 discusses the anatomical and physiological data that support the model, how the model achieves its explanatory power with minimum complexity, and the hypotheses and predictions made by the model.

Chapter 2

Stereopsis processing and model constraints

The circuit diagram of the model and its functional neuroanatomical interpretation are illustrated in Figures 12 and 13. As shown in Figure 12, the model contains two complementary processing streams: the Boundary Contour System (BCS) and the Feature Contour System (FCS). The BCS includes the V1 interblobs, V2 pale stripes, and part of V4. The FCS includes the V1 blobs, V2 thin stripes, and part of V4. Both the BCS and the FCS receive illuminant-discounted signals from LGN cells with center-surround receptive fields. The BCS and FCS converge in V4, where visible, figure-ground-separated 3D surfaces are perceived (Felleman & Van Essen, 1991; Schiller, 1994, 1995; Schiller & Lee, 1991; Zeki, 1983a, 1983b). They also output to IT (not shown), where object recognition takes place.
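How a center-surround receptive field can discount the illuminant may be sketched with a generic shunting on-center off-surround steady state. The form used here, (C - S) / (A + C + S) with Gaussian-weighted center input C and surround input S, is an assumed textbook form and is not taken from the equations in Chapter 4. The kernel widths, the decay constant `decay`, and the function names are illustrative choices.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian weights on [-radius, radius]."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def smooth(signal, sigma, radius):
    """Gaussian-weighted input, with edge padding to keep the length."""
    padded = np.pad(np.asarray(signal, dtype=float), radius, mode="edge")
    return np.convolve(padded, gaussian_kernel(sigma, radius), mode="valid")

def center_surround(signal, sigma_c=1.0, sigma_s=3.0, radius=9, decay=1.0):
    """Steady state of an assumed shunting on-center off-surround cell."""
    center = smooth(signal, sigma_c, radius)
    surround = smooth(signal, sigma_s, radius)
    return (center - surround) / (decay + center + surround)

# A luminance step viewed under a dim and a ten times brighter illuminant.
step = np.concatenate([np.full(20, 10.0), np.full(20, 20.0)])
dim = center_surround(step)
bright = center_surround(10.0 * step)
flat = center_surround(np.full(40, 50.0))  # uniform field, no contrast
```

Because the divisive term grows with the total input, the responses to the dim and bright versions of the same step are nearly identical, while a uniform field produces essentially no response. The output therefore reflects relative contrast at the edge and not the absolute illumination level.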

V1 interblobs binocularly match left and right boundary contours; V1 blobs fill-in monocular 3D surfaces that are captured by V1 binocular boundaries and send feedback to V1 interblobs to enhance consistent boundaries in V1 interblobs; V2 pale stripes form complete 3D boundaries in which spurious boundaries have been eliminated; V2 thin stripes fill-in complete monocular 3D surfaces and send feedback to enhance consistent boundaries in V2 pale stripes; boundaries and surfaces formed in V2 are amodal, and sent to IT for object recognition; V4 fills-in binocular figure-ground separated 3D surfaces that support visible 3D percepts, and emits output signals that lead to recognition and grasping of unoccluded object parts.


Figure 12. 3D LAMINART macrocircuit. It illustrates the interactions between the model components: Retina/LGN (left eye and right eye) and cortical areas V1, V2, and V4.

Title	A Laminar Cortical Model of Three-Dimensional Surface Perception and Figure-Ground Separation: Stereogram Depth, Lightness, and Amodal Completion
Author	Liang Fang
Advisors	Stephen Grossberg, Ph.D., Ennio Mingolla, Ph.D., Gail Carpenter, Ph.D.
Institution	Boston University
Field	Cognitive and Neural Systems
Type	Dissertation
Year of publication	2007
City	Boston
