Intelligent Image Processing. Steve Mann
Copyright 2002 John Wiley & Sons, Inc.
ISBNs: 0-471-40637-6 (Hardback); 0-471-22163-5 (Electronic)

THE EYETAP PRINCIPLE: EFFECTIVELY LOCATING THE CAMERA INSIDE THE EYE AS AN ALTERNATIVE TO WEARABLE CAMERA SYSTEMS

This chapter discloses the operational principles of the EyeTap reality mediator, both in its idealized form and as practical embodiments of the invention. The inner workings of the reality mediator, in particular its optical arrangement, are described.

3.1 A PERSONAL IMAGING SYSTEM FOR LIFELONG VIDEO CAPTURE

A device that measures and resynthesizes light that would otherwise pass through the lens of an eye of a user is described. The device diverts at least a portion of eyeward-bound light into a measurement system that measures how much light would have entered the eye in the absence of the device. In one embodiment, the device uses a focus control to reconstruct light in a depth plane that moves to follow subject matter of interest. In another embodiment, the device reconstructs light in a wide range of depth planes, in some cases having infinite or near-infinite depth of field. The device has at least one mode of operation in which it reconstructs these rays of light under the control of a portable computational system. Additionally, the device has other modes of operation in which it can, by program control, cause the user to experience an altered visual perception of reality. The device is useful as a visual communications system, for electronic newsgathering, or to assist the visually challenged.

3.2 THE EYETAP PRINCIPLE

The EyeTap reality mediator is characterized by three components: a lightspace analysis system, a lightspace modification system, and a lightspace synthesis system.

To understand how the reality mediator works, consider the first of these three components, namely the device called a "lightspace analyzer" (Fig. 3.1). The lightspace analyzer absorbs and quantifies incoming light. Typically (but not necessarily) it is completely opaque. It provides a numerical description (e.g., it turns light into numbers). It is not necessarily flat (e.g., it is drawn as curved to emphasize this point).

Figure 3.1 The lightspace analyzer absorbs and quantifies every ray of incoming light. It converts every incoming ray of light into a numerical description. Here the lightspace analyzer is depicted as a piece of glass. Typically (although not necessarily) it is completely opaque.
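The analyzer's function, every ray in and one number out, can be made concrete in a few lines of code. The following is a minimal illustrative sketch, not code from the book: the Ray type, its field names, and the 8-bit quantization are all assumptions chosen for clarity.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Ray:
    """One incoming ray of light: where it crosses the analysis glass,
    which direction it travels, and how intense it is (all hypothetical units)."""
    origin: Tuple[float, float]       # (x, y) point on the analysis glass
    direction: Tuple[float, float, float]  # unit vector of propagation
    intensity: float                  # radiance, normalized to [0.0, 1.0]

def analyze(rays: List[Ray], bits: int = 8) -> List[int]:
    """Lightspace analysis: absorb every ray and return a purely
    numerical description (here, one quantized intensity per ray)."""
    levels = (1 << bits) - 1
    return [round(min(max(r.intensity, 0.0), 1.0) * levels) for r in rays]
```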
The second component, the lightspace modifier, is typically a processor (WearComp, etc.) and will be described later, in relation to the first and third components.

The third component is the "lightspace synthesizer" (Fig. 3.2). The lightspace synthesizer turns an input (a stream of numbers) into the corresponding rays of light.

Figure 3.2 The lightspace synthesizer produces rays of light in response to a numerical input. An incoming numerical description provides information pertaining to each ray of outgoing light that the device produces. Here the lightspace synthesizer is also depicted as a special piece of glass.

Now suppose that we connect the output of the lightspace analyzer to the input of the lightspace synthesizer (Fig. 3.3). What we now have is an illusory transparency.

Figure 3.3 Illusory transparency formed by connecting the output of the lightspace analysis glass to the input of the lightspace synthesis glass.

Moreover, suppose that we could bring the lightspace analyzer glass into direct contact with the lightspace synthesizer glass. Placing the two back-to-back would create a collinear illusory transparency, in which any emergent ray of virtual light would be collinear with the incoming ray of real light that gave rise to it (Fig. 3.4).

Figure 3.4 Collinear illusory transparency formed by bringing together the analysis glass and the synthesis glass to which it is connected.

Now a natural question to ask is: Why make all this effort to obtain a simple illusion of transparency, when we can just as easily purchase a small piece of clear glass? The answer is the second component, the lightspace modifier, which gives us the ability to modify our perception of visual reality. This ability is typically achieved by inserting a WearComp between the lightspace analyzer and the lightspace synthesizer (Fig. 3.5). The result is a computational means of altering the visual perception of reality.

Figure 3.5 Reality mediator satisfying the collinearity (EyeTap) condition.

In summary: A lightspace analyzer converts incoming light into numbers. A lightspace modifier (i.e., a processor that is typically body-worn) alters the lightspace by processing these numbers. A lightspace synthesizer converts these numbers back into light.
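Connecting the three components then amounts to composing three functions: the analyzer's numbers pass through the modifier and into the synthesizer. Continuing the toy sketch above (again an assumption, since the book gives no code), the identity modifier yields the illusory transparency of Figure 3.3, while any other modifier turns the glass into a reality mediator in the sense of Figure 3.5, for instance the left-right reversal used as the illustration in Figure 3.6.

```python
def synthesize(numbers: List[int], bits: int = 8) -> List[float]:
    """Lightspace synthesis: turn a numerical description back into
    outgoing (virtual) light, here represented as intensities."""
    levels = (1 << bits) - 1
    return [n / levels for n in numbers]

def mediate(rays: List[Ray], modifier=lambda numbers: numbers) -> List[float]:
    """EyeTap pipeline: analysis -> modification -> synthesis.
    The default (identity) modifier gives illusory transparency."""
    return synthesize(modifier(analyze(rays)))

# A simple mediation: reverse the scene left to right, assuming the
# numbers arrive in left-to-right scanline order (an assumption).
def flip_left_right(numbers: List[int]) -> List[int]:
    return list(reversed(numbers))
```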
3.2.1 "Lightspace Glasses"

A visor made from the lightspace analysis glass and lightspace synthesis glass could clearly be used as a virtual reality (VR) display because of the synthesis capability. It could absorb and quantify all the incoming rays of light and then simply ignore this information, while the synthesis portion of the glass created a virtual environment for the user (see Fig. 3.6, top panel).

Now, in addition to creating the illusion of allowing light to pass right through, the visor can also create new rays of light having nothing to do with the rays of light coming into it. The combined illusion of transparency and the new light provides the wearer with an AR experience (Fig. 3.6, middle panel). Finally, the glasses could be used to alter the perception of visual reality, as described previously in this chapter and the previous chapter (Fig. 3.6, bottom panel). Thus VR is a special case of AR, which is a special case of MR.

Figure 3.6 Eyeglasses made from lightspace analysis and lightspace synthesis systems can be used for virtual reality, augmented reality, or mediated reality. Such a glass, made into a visor, could produce a virtual reality (VR) experience by ignoring all rays of light from the real world and generating rays of light that simulate a virtual world. Rays of light from real (actual) objects are indicated by solid shaded lines; rays of light from the display device itself are indicated by dashed lines. The device could also produce a typical augmented reality (AR) experience by creating the "illusion of transparency" and also generating rays of light to make computer-generated "overlays." Furthermore, it could "mediate" the visual experience, allowing the perception of reality itself to be altered. In this figure a less useful (except in the domain of psychophysical experiments) but illustrative example is shown: objects are left-right reversed before being presented to the viewer.

3.3 PRACTICAL EMBODIMENTS OF EYETAP

In practice, there are other embodiments of this invention than the one described above. One of these practical embodiments will now be described.

A display system is said to be orthoscopic when the recording and viewing arrangement is such that rays of light enter the eye at the same angle as they would have if the person viewing the display were at the camera's location. The concept of being orthoscopic is generalized to the lightspace passing through the reality mediator; the ideal reality mediator is capable of being (and thus facilitates) the following, the first of which is tested in the sketch after this list:

1. orthospatial (collinear)
   a. orthoscopic
   b. orthofocal
2. orthotonal
   a. orthoquantigraphic (quantigraphic overlays)
   b. orthospectral (nonmetameric overlays)
3. orthotemporal (nonlagging overlays)
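A hedged sketch of that orthospatial test follows. It simply checks that a synthesized ray lies on the same line as the eyeward-bound ray it replaces; the vector representation, the unit-direction assumption, and the tolerance are mine, not the book's.

```python
import math

def collinear(p1, d1, p2, d2, tol=1e-6):
    """Orthospatial (EyeTap) test: the synthesized ray (origin p2,
    direction d2) must lie on the same line as the eyeward-bound ray
    (origin p1, direction d1). Assumes unit direction vectors and a
    tolerance appropriate to the scene scale."""
    def cross(a, b):
        return (a[1]*b[2] - a[2]*b[1],
                a[2]*b[0] - a[0]*b[2],
                a[0]*b[1] - a[1]*b[0])
    def norm(v):
        return math.sqrt(sum(c*c for c in v))
    # The two directions must be parallel ...
    if norm(cross(d1, d2)) > tol:
        return False
    # ... and the offset between origins must lie along that direction.
    offset = tuple(b - a for a, b in zip(p1, p2))
    return norm(cross(d1, offset)) <= tol
```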
An ideal reality mediator is such that it is capable of producing an illusion of transparency over some or all of the visual field of view, and thus meets all of the criteria above. Although in practice there are often slight (and sometimes even deliberately large) deviations from these criteria (e.g., violations of the orthotemporal characteristic are useful for embodiments implementing a photographic/videographic memory recall, or "WearCam flashbacks" [61]), it is preferable that the criteria be achievable in at least some modes of operation. Thus these criteria must be met in the system design, so that they can be deliberately violated at certain specific instants. This is better than not being able to meet them at all, which takes away an important capability. Extended periods of use without being able to meet these criteria have a detrimental effect on the ability to perform other tasks through the camera. Of course, there are more detrimental flashbacks upon removal of the camera after it has been worn for many hours while doing tasks that require good hand-to-eye coordination.

3.3.1 Practical Embodiments of the Invention

The criteria listed above are typically only implemented in a discrete sense (e.g., discrete sampling at a discrete frame rate, which itself imposes limitations on the sense of transparency, just as in virtual reality [62]). Typically the apparatus turns the lightspace into a numerical description of finite word length, with finite sampling, for processing, after which the processed numerical description is converted back to the lightspace, within the limitations of this numerical representation.
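That finite word length already bounds how faithful the reconstructed lightspace can be. Here is a minimal sketch of the limitation, assuming plain uniform quantization of a normalized intensity (the bit depth and encoding are assumptions, not the book's specification):

```python
def quantize_roundtrip(intensity: float, bits: int = 8) -> float:
    """Convert a continuous intensity to a finite word length and
    back, as the apparatus must; the reconstructed value differs from
    the input by up to half a quantization step."""
    levels = (1 << bits) - 1
    code = round(min(max(intensity, 0.0), 1.0) * levels)  # analysis
    return code / levels                                  # synthesis

# e.g., quantize_roundtrip(0.123456) -> ~0.121569 at 8 bits; the
# error (~0.0019) is bounded by 1/(2*255), the half-step size.
```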
3.3.2 Importance of the Collinearity Criterion

The most important criterion is the orthospatial criterion, which mitigates any mismatch between the viewfinder image and the real world that would otherwise create an unnatural mapping. Indeed, anyone who has walked around holding a small camcorder up to his or her eye for several hours a day will obtain an understanding of the ill psychophysical effects that result. Eventually such adverse effects as nausea and flashbacks may persist even after the camera is removed. There is also the question as to whether or not such a so-called mediated reality might, over a long period of time, cause brain damage, such as damage to the visual cortex, in the sense that learning (including the learning of new spatial mappings) permanently alters the brain.

This consideration is particularly important if one wishes to photograph, film, or make video recordings of the experience of eating or playing volleyball, and the like, by doing the task while concentrating primarily on the eye that is looking through the camera viewfinder. Indeed, since known cameras were never intended to be used this way (to record events from a first-person perspective while looking through the viewfinder), it is not surprising that the performance of any apparatus known in the prior art is poor in this usage.

The embodiments of the wearable camera system sometimes give rise to a small displacement between the actual location of the camera and the location of the virtual image of the viewfinder. Therefore either the parallax must be corrected by a vision system, followed by 3D coordinate transformation, followed by rerendering, or, if the video is fed through directly, the wearer must learn to make this compensation mentally. When this mental task is imposed upon a wearer performing tasks at close range, such as looking into a microscope while wearing the glasses, there is a discrepancy that is difficult to learn, and it may give rise to unpleasant psychophysical effects such as nausea or "flashbacks."

If an eyetap is not properly designed, one wearing it will initially tend to put the microscope eyepiece up to an eye rather than to the camera, if the camera is not the eye. As a result the apparatus will fail to record exactly the wearer's experience, unless the camera is the wearer's own eye. Effectively locating the camera elsewhere (other than in at least one eye of the wearer) does not give rise to a proper eyetap, as there will always be some error. It is preferred that the apparatus record exactly the wearer's experience. Thus, if the wearer looks into a microscope, the eyetap should record that experience for others to observe vicariously through at least one eye of the wearer. Although the wearer can learn the difference between the camera position and the eye position, it is preferable that this not be required, for otherwise, as previously described, long-term usage may lead to undesirable flashback effects.

3.3.3 Exact Identity Mapping: The Orthoscopic Reality Mediator

It is easy to imagine a camera connected to a television screen, and carefully arranged in such a way that the television screen displays exactly what is blocked by the screen, so that an illusory transparency results. Moreover, it is easy to imagine a portable miniature device that accomplishes this, especially given the proliferation of consumer camcorder systems (e.g., portable cameras with built-in displays); see Figure 3.7. We may try to achieve the condition shown in Figure 3.7 with a handheld camcorder, perhaps miniaturized to fit into a helmet-mounted apparatus, but it is impossible to line up the images exactly with what would appear in the absence of the apparatus.

Figure 3.7 A modern camcorder (denoted by the reference numeral 10 in the figure) could, in principle, have its zoom setting set for unity magnification. Distant objects 23 then appear to the eye, while one looks through the camcorder, to be identical in size and position as they would in the absence of the camcorder. However, nearby subject matter 23N will be at distance dc, which is closer to the effective center of projection of the camcorder than the distance de to the effective center of projection of the eye. The eye is denoted by reference numeral 39, while the camera iris, denoted 22i, defines the center of projection of the camera lens 22. For distant subject matter the difference in location between iris 22i and eye 39 is negligible, but for nearby subject matter it is not. Therefore nearby subject matter will be magnified, as denoted by the dotted-line figure having reference numeral 23F. Alternatively, setting the camcorder zoom for unity magnification for nearby subject matter will result in significantly less than unity magnification for distant subject matter. Thus there is no zoom setting that will make both near and far subject matter simultaneously appear as they would in the absence of the camcorder.
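The mismatch described in this caption is easy to quantify under a small-angle model. If the camcorder's center of projection sits a distance delta in front of the eye's, a subject at distance d_eye from the eye subtends an angle at the camera that is larger by the factor d_eye/(d_eye - delta). The sketch below works two cases through; the 5 cm displacement and the distances are assumed, illustrative numbers, not figures from the book.

```python
def magnification_mismatch(d_eye: float, delta: float) -> float:
    """Ratio of the subject's angular size at the camera to its angular
    size at the eye, when the camera's center of projection is `delta`
    closer to the subject (small-angle model; assumes d_eye > delta)."""
    d_cam = d_eye - delta
    return d_eye / d_cam

delta = 0.05  # assumed 5 cm eye-to-camera displacement, in meters
print(magnification_mismatch(0.25, delta))  # nearby page: 1.25x too big
print(magnification_mismatch(10.0, delta))  # distant wall: ~1.005x
# No single zoom setting can undo both: dividing out the 1.25 factor
# for the page would shrink the wall to ~0.804 of its true size.
```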
We can better understand this problem by referring to Figure 3.8. In Figure 3.8 we imagine that the objective lens of the camera is much larger than it really is. It captures all eyeward-bound rays of light, and we can imagine that it processes these rays in a collinear fashion.

Figure 3.8 Suppose that the camera portion of the camcorder, denoted by reference numeral 10C, were fitted with a very large objective lens 22F. This lens would collect eyeward-bound rays of light 1E and 2E. It would also collect rays of light coming toward the center of projection of lens 22. Rays of light coming toward this camera center of projection are denoted 1C and 2C. Lens 22 converges rays 1E and 1C to point 24A on the camera sensor element. Likewise, rays of light 2C and 2E are focused to point 24B. Ordinarily the image (denoted by reference numeral 24) is upside down in a camera, but cameras and displays are designed so that when the signal from a camera is fed to a display (e.g., a TV set) it shows right-side up. Thus the image appears with point 32A of the display creating rays of light such as the one denoted 1D. Ray 1D is responsive to, and collinear with, the eyeward-bound ray 1E that would have entered the eye in the absence of the apparatus. Likewise, by similar reasoning, ray 2D is responsive to, and collinear with, eyeward-bound ray 2E. It should be noted, however, that the large lens 22F is just an element of fiction. Lens 22F is a fictional lens because a true lens should be represented by its center of projection; that is, its behavior should not change other than by depth of focus, diffraction, and amount of light passed when its iris is opened or closed. Therefore we could replace lens 22F with a pinhole lens and simply imagine lens 22 to have captured rays 1E and 2E, when it actually only captures rays 1C and 2C.

However, this reasoning is pure fiction, and it breaks down as soon as we consider a scene that has some depth of field, such as is shown in Figure 3.9. Thus we may regard the apparatus consisting of a camera and display as being modeled by a fictionally large camera opening, but only over subject matter confined to a plane.

Figure 3.9 The small lens 22 shown in solid lines collects rays of light 1C and 2C. Consider, for example, eyeward-bound ray of light 1E, which may be imagined to be collected by a large fictional lens 22F (when in fact ray 1C is captured by the actual lens 22) and focused to point 24A. The sensor element collecting light at point 24A is displayed as point 32A on the camcorder viewfinder, which is then viewed through a magnifying lens and emerges as ray 1D into eye 39. It should be noted that the top of nearby subject matter 23N also images to point 24A and is displayed at point 32A, emerging as ray 1D as well. Thus nearby subject matter 23N will appear as shown by the dotted line denoted 23F, with the top point appearing as 23FA even though the actual point should appear as 23NA (e.g., would appear as point 23NA in the absence of the apparatus).

Even if the lens of the camera has sufficient depth of focus to form an image of subject matter at various depths, this collinearity criterion will only hold at one such depth, as shown in Figure 3.10. The same argument may be made for the camera being off-axis. Thus, when the subject matter is confined to a single plane, the illusory transparency can be sustained even when the camera is off-axis, as shown in Figure 3.11. Some real-world examples are shown in Figure 3.12.

Figure 3.10 Camera 10C may therefore be regarded as having a large fictional lens 22F, despite the actual much smaller lens 22, so long as we limit our consideration to a single depth plane and exclude from consideration subject matter 23N not in that same depth plane.

An important limitation is that the system obviously only works for a particular viewpoint and for subject matter in a particular depth plane. This same setup could obviously be miniaturized and concealed in ordinary-looking sunglasses, in which case the limitation to a particular viewpoint is not a problem (since the sunglasses could be anchored to a fixed viewpoint with respect to at least one eye of the user). However, the other important limitation, that the system only works for subject matter in the same depth plane, remains.
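The depth-plane limitation can likewise be put in approximate numbers. Suppose, purely as an illustrative assumption, that the camera and eye centers of projection are separated by a baseline b, and that the display is aligned so that subject matter at depth d0 registers exactly. By ordinary parallax, in a small-angle model, a point at any other depth d is then misregistered by roughly b*|1/d - 1/d0| radians:

```python
def residual_misregistration(d: float, d0: float, b: float) -> float:
    """Angular registration error (radians, small-angle model) for a
    point at depth d when the illusory transparency is aligned for the
    depth plane d0, with an assumed camera-eye baseline b."""
    return abs(b * (1.0 / d - 1.0 / d0))

b = 0.03   # assumed 3 cm camera-eye baseline, in meters
d0 = 1.0   # aligned for subject matter 1 m away
print(residual_misregistration(1.0, d0, b))  # 0.0: in the plane
print(residual_misregistration(0.3, d0, b))  # ~0.07 rad (~4 deg) off
# Only subject matter in the calibrated depth plane appears where it
# should; everything nearer or farther is displaced.
```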
...

Figure 3.15 Focus tracking aremac. (a) With a NEARBY SUBJECT, a point P0 that would otherwise be imaged at P3 in the EYE