Figure 7.14 Reliability support in AOE

[...] by getting the current failure-free strategy first and then calling the desired method (e.g., doPost and doGet) on the strategy. Recoverable Mervlets allow the same application to have different fault-tolerance mechanisms in different contexts. For example, the Web Mail application may be configured to be more reliable for corporate e-mail than for personal e-mail.

Dynamic reconfigurability in fault tolerance is achieved by allowing the two main components, the RMS and the Recoverable Mervlet, to have different failure-free and recovery strategies, which can be set dynamically by the ARM (shown in Figure 7.14). The separation between failure-free and recovery strategies helps in developing multiple recovery strategies corresponding to a failure-free strategy. For example, in the case of the RMS, one recovery strategy may prioritize the order in which messages are recovered, while another may not.

In our current implementation, the adaptability of the fault-tolerance support is reflected in the ability to dynamically switch server-side logging on and off depending on the current server load. Under high server load, the ARM can reconfigure the RMS to stop logging on the server side. In some cases, this can result in a marked improvement in the client-perceived response time.
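The switching described above is essentially a strategy pattern whose concrete strategy can be replaced at runtime. The following Python sketch illustrates the idea; all names (ReliabilityManager, set_strategy, ServerLoggingStrategy, and so on) are illustrative stand-ins of ours, not the actual AOE interfaces.

```python
# Minimal sketch of runtime-switchable fault-tolerance strategies.
# All identifiers are hypothetical; the real AOE/ARM APIs are not shown here.

from abc import ABC, abstractmethod


class FailureFreeStrategy(ABC):
    """Behavior applied on every request while no failure has occurred."""

    @abstractmethod
    def handle(self, message: str) -> None:
        ...


class ServerLoggingStrategy(FailureFreeStrategy):
    def handle(self, message: str) -> None:
        print(f"[log] {message}")        # persist state for later recovery


class NoLoggingStrategy(FailureFreeStrategy):
    def handle(self, message: str) -> None:
        pass                             # skip logging to reduce server load


class ReliabilityManager:
    """Stands in for the RMS: delegates to the current failure-free strategy."""

    def __init__(self, strategy: FailureFreeStrategy) -> None:
        self._strategy = strategy

    def set_strategy(self, strategy: FailureFreeStrategy) -> None:
        self._strategy = strategy        # called by the ARM on a policy event

    def on_message(self, message: str) -> None:
        self._strategy.handle(message)


# The ARM's adaptation policy, reduced here to a single load threshold.
rms = ReliabilityManager(ServerLoggingStrategy())
rms.on_message("mail #1")                # logged

server_load = 0.93                       # hypothetical load metric in [0, 1]
if server_load > 0.9:
    rms.set_strategy(NoLoggingStrategy())

rms.on_message("mail #2")                # not logged; faster response
```

Because the application only ever calls the manager, the logging policy can change between two requests without restarting or even notifying the Mervlet.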
7.7 Conclusions

The evolution of handheld devices clearly indicates that they are becoming highly relevant in users' everyday activities. Voice transmission still plays a central role, but machine-to-machine interaction is becoming important and is poised to surpass voice transmission. This data transmission is triggered by digital services, running on the phone as well as on the network, that allow users to access data and functionality anywhere and at any time.

This digital revolution requires a middleware infrastructure to orchestrate the services running on the handhelds, to interact with remote resources, to discover and announce data and functionality, to simplify the migration of functionality, and to simplify the development of applications. At DoCoMo Labs USA, we understand that this middleware has to be designed to take into account the issues that are specific to handheld devices and that make them different from traditional servers and workstation computers. Examples of these issues are mobility, limited resources, fault tolerance, and security. DoCoMo Labs USA also understands that software running on handheld devices must be built in such a way that it can be dynamically modified and inspected without stopping its execution. Systems built according to this requirement are known as reflective systems. They allow inspection of their internal state, reasoning about their execution, and the introduction of changes whenever required. Our goal is to provide an infrastructure to construct systems that can be fully assembled at runtime and that explicitly externalize their state, logic, and architecture. We refer to these systems as completely reconfigurable systems.

8 Multimedia Coding Technologies and Applications

Minoru Etoh, Frank Bossen, Wai Chu, and Khosrow Lashkari

8.1 Introduction

As the bandwidth provided by next-generation (XG) mobile networks increases, the quality of media communication, such as audiovisual streaming, will improve. However, a huge bandwidth gap (of one to two orders of magnitude) always exists between wireless and wired networks, as explained in Chapter 1. This bandwidth gap demands that coding technologies achieve compact representations of media data over wireless networks. Considering the heterogeneity of radio access networks, we cannot presume the availability of high-bandwidth connectivity at all times. Figure 8.1 illustrates the importance of media coding technologies and radio access technologies. These are complementary and orthogonal approaches for improving media quality over mobile networks. Thus, media coding technologies are essential even in the XG mobile network environment, as discussed in Chapter 1.

Figure 8.1 Essential coding technologies

Speech communication has been the dominant application in the first three generations of mobile networks. 8-kHz sampling has been used for telephony, with the adaptive multirate (AMR) speech codec (encoder and decoder) (3GPP 1999d) used in 3G networks. The 8-kHz restriction ensures interoperability with the legacy wired telephony network. If this restriction is removed and peer-to-peer communication with higher audio sampling rates is adopted, new media types, such as wideband speech and real-time audio, will become more widespread. Figure 8.2 illustrates existing speech and audio coding technologies with regard to usage and bitrate, where adaptive multirate wideband (AMR-WB) (ITU-T 2002) is shown as an example of wideband speech communication, and MPEG-2 as an example of broadcast and storage media. Given 44-kHz sampling and a new type of codec suitable for real-time communication, low-latency hi-fi telephony can be achieved, conveying more realistic sound between users.

Figure 8.2 Speech and audio codecs with regard to bitrate

Video media requires higher bandwidth than speech and audio. In the last decade, video compression technologies have evolved through the series of MPEG-1, MPEG-2, MPEG-4, and H.264, which will be discussed in the following sections. Given a bandwidth of several megabits per second (Mbps), these codecs can transmit broadcast-quality video. Because of the bandwidth gap (even in XG), however, it is important to have a codec that provides better coding efficiency. Figure 8.3 summarizes the typical existing codecs and the low-rate hi-fi video codec that is required by mobile applications.

Figure 8.3 Video codecs with regard to bitrate

This chapter covers the technological progress of the last 10 years and the research directed toward more advanced coding technologies. Current technologies were designed to minimize implementation costs, such as the cost of memory, and also to be compatible with legacy hardware architectures. Moore's Law, which states that computing power doubles every 18 months, has been an important factor in codec evolution. As a result of this law, there have been significant advances in technology in the 10 years since the adoption of MPEG-2.
Future coding technologies will need to incorporate advances in signal processing and large-scale integration (LSI) technologies. Additional computational complexity is the principle driving codec evolution.

This chapter also covers mobile applications enabled by the recent progress of coding technologies. These are the TV phone and multimedia messaging services already realized in 3G, and future media-streaming services.

8.2 Speech and Audio Coding Technologies

In speech and audio coding, digitized speech or audio signals are represented with as few bits as possible, while maintaining a reasonable level of perceptual quality. This is accomplished by removing the redundancies and the irrelevancies from the signal. Although the objectives of speech and audio coding are similar, they have evolved along very different paths.

Most speech coding standards are developed to handle narrowband speech, that is, digitized speech with a sampling frequency of 8 kHz. Narrowband speech provides toll quality suitable for general-purpose communication and is interoperable with legacy wired telephony networks. Recent trends focus on wideband speech, which has a sampling frequency of 16 kHz. Wideband speech (50-7000 Hz) provides better quality and the improved intelligibility required by more demanding applications, such as teleconferencing and multimedia services. Modern speech codecs employ source-filter models to mimic the human sound production mechanism (glottis, mouth, and lips).

The goal in audio coding is to provide a perceptually transparent reproduction, meaning that trained listeners (so-called golden ears) cannot distinguish the original source material from the compressed audio. The goal is not to faithfully reproduce the signal waveform or its spectrum but to reproduce the information that is relevant to human auditory perception. Modern audio codecs employ psychoacoustic principles to model human auditory perception.

This section includes an overview of various standardized speech and audio codecs, an explanation of the relevant issues concerning the advancement of the field, and a description of the most promising research directions.

8.2.1 Speech Coding Standards

A large number of speech coding standards have been developed over the past three decades. Generally speaking, speech codecs can be divided into three broad categories:

1. Waveform codecs using pulse code modulation (PCM), differential PCM (DPCM), or adaptive DPCM (ADPCM).
2. Parametric codecs using linear prediction coding (LPC) or mixed excitation linear prediction (MELP).
3. Hybrid codecs using variations of the code-excited linear prediction (CELP) algorithm.

This subsection describes the essence of these coding technologies and the standards that are based on them. Figure 8.4 shows the landmark standards developed for speech coding.

Figure 8.4 Evolution of speech coding standards

Waveform Codecs

Waveform codecs attempt to preserve the shape of the signal waveform and were widely used in early digital communication systems. Their operational bitrate is relatively high, which is necessary to maintain acceptable quality. The fundamental scheme for waveform coding is PCM, a quantization process in which samples of the signal are quantized and represented using a fixed number of bits. This scheme has negligible complexity and delay, but a large number of bits is necessary to achieve good quality.
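To make uniform PCM concrete, the following minimal sketch quantizes samples in [-1, 1) with a fixed number of bits. It is a generic illustration of ours, not the G.711 characteristic (which, as described next, is nonuniform).

```python
import numpy as np

def pcm_quantize(x: np.ndarray, n_bits: int) -> np.ndarray:
    """Uniform PCM: map samples in [-1, 1) to integer codes of n_bits."""
    levels = 2 ** n_bits
    step = 2.0 / levels                       # quantization step size
    codes = np.floor((x + 1.0) / step).astype(int)
    return np.clip(codes, 0, levels - 1)

def pcm_dequantize(codes: np.ndarray, n_bits: int) -> np.ndarray:
    """Reconstruct each sample at the midpoint of its quantization cell."""
    step = 2.0 / (2 ** n_bits)
    return (codes + 0.5) * step - 1.0

# Each extra bit adds roughly 6 dB of signal-to-quantization-noise ratio,
# which is why good quality needs a large number of bits per sample.
t = np.linspace(0, 1, 8000, endpoint=False)
x = 0.8 * np.sin(2 * np.pi * 440 * t)         # 440-Hz tone at 8-kHz sampling
for n_bits in (4, 8, 12):
    x_hat = pcm_dequantize(pcm_quantize(x, n_bits), n_bits)
    snr = 10 * np.log10(np.mean(x**2) / np.mean((x - x_hat)**2))
    print(f"{n_bits} bits: SNR = {snr:.1f} dB")
```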
Speech samples do not have a uniform distribution, so it is advantageous to use nonuniform quantization. ITU-T G.711 (ITU-T 1988) is a nonuniform PCM standard recommended for encoding speech signals, where the nonlinear transfer characteristics of the quantizer are fully specified. It encodes narrowband speech at 64 kbps.

Most speech samples are highly correlated with their neighbors; that is, the sample value at a given instant is similar to those in the near past and the near future. Therefore, it is possible to make predictions and remove redundancies, thereby achieving compression. DPCM and ADPCM use prediction, where the prediction error is quantized and transmitted instead of the sample itself. Figure 8.5 shows the block diagrams of a DPCM encoder and decoder.

Figure 8.5 DPCM encoder (top) and decoder (bottom). Reproduced by permission of John Wiley & Sons, Inc.

ITU-T G.726 is an ADPCM standard and incorporates a pole-zero predictor. Four operational bitrates are specified: 40, 32, 24, and 16 kbps (ITU-T 1990). The main difference between DPCM and ADPCM is that the latter uses adaptation, where the parameters of the quantizer are adjusted according to the properties of the signal. A commonly adapted element is the predictor, where changes to its parameters can greatly increase its effectiveness, leading to substantial improvement in performance.

The previously described schemes are designed for narrowband signals. The ITU-T standardized a wideband codec known as G.722 (ITU-T 1986) in 1986. It uses subband coding, where the input signal is split into two bands that are separately encoded using ADPCM. This codec can operate at bitrates of 48, 56, and 64 kbps and produces good quality for speech and general audio signals. G.722 operating at 64 kbps is often used as a reference for evaluating new codecs.

Parametric Codecs

In parametric codecs, a multiple-parameter model is used to generate speech signals. This type of codec makes no attempt to preserve the shape of the waveform, and the quality of the synthetic speech is linked to the sophistication of the model. A very successful model is based on linear prediction (LP), where a time-varying filter is used. The coefficients of the filter are derived by an LP analysis procedure (Chu 2003).

The FS-1015 linear prediction coding (LPC) algorithm developed in the early 1980s (Tremain 1982) relies on a simple model for speech production (Figure 8.6) derived from practical observations of the properties of speech signals. Speech signals may be classified as voiced or unvoiced. Voiced signals possess a clear periodic structure in the time domain, while unvoiced signals are largely random. As a result, it is possible to use a two-state model to capture the dynamics of the underlying signal. The FS-1015 codec operates at 2.4 kbps, where the quality of the synthetic speech is considered low. The coefficients of the synthesis filter are recomputed within short time intervals, resulting in a time-varying filter. A major shortcoming of the LPC model is that misclassification of voiced and unvoiced signals can create annoying artifacts in the synthetic speech; in fact, under many circumstances, the speech signal cannot be strictly classified. Thus, many speech coding standards developed after FS-1015 avoid the two-state model to improve the naturalness of the synthetic speech.

Figure 8.6 The LPC model of speech production. Reproduced by permission of John Wiley & Sons, Inc.
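The LP analysis step can be made concrete with a small sketch. The following assumes the standard autocorrelation method solved by the Levinson-Durbin recursion, one common way to derive the filter coefficients; real codecs such as FS-1015 add windowing, parameter quantization, and pitch/voicing estimation on top of this.

```python
import numpy as np

def lpc_coefficients(frame: np.ndarray, order: int) -> np.ndarray:
    """Levinson-Durbin solution of the autocorrelation normal equations.

    Returns a[1..order] such that the predictor is
    x_pred[n] = sum_k a[k] * x[n-k].
    """
    r = np.array([np.dot(frame[:len(frame)-k], frame[k:])
                  for k in range(order + 1)])
    a = np.zeros(order)
    err = r[0]
    for i in range(order):
        # Reflection coefficient for stage i+1.
        k = (r[i+1] - np.dot(a[:i], r[i:0:-1])) / err
        a_new = a.copy()
        a_new[i] = k
        a_new[:i] = a[:i] - k * a[i-1::-1]
        a = a_new
        err *= (1.0 - k * k)
    return a

# A strongly periodic ("voiced-like") synthetic frame: 30 ms at 8 kHz.
n = np.arange(240)
frame = np.sin(2 * np.pi * 100 / 8000 * n) + 0.05 * np.random.randn(240)
a = lpc_coefficients(frame, order=10)
pred = np.array([np.dot(a, frame[i-1::-1][:10]) for i in range(10, 240)])
residual = frame[10:] - pred
print("prediction gain (dB):",
      10 * np.log10(np.mean(frame[10:]**2) / np.mean(residual**2)))
```

For a voiced frame like this one, the short-term predictor removes most of the signal energy, which is exactly what makes the parametric and hybrid codecs below efficient.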
The MELP codec (McCree et al. 1997) emerged as an improvement to the basic LPC codec. In the MELP codec, many features were added to the speech production model (Figure 8.7), including a subband mixture of voiced and unvoiced excitation, transmission of harmonic magnitudes for voiced signals, handling of transitions using aperiodic excitation, and additional filtering for signal enhancement. The MELP codec operates at the same 2.4-kbps bitrate as FS-1015. It incorporates many technological advances, such as vector quantization. Its quality is much better than that of the LPC codec because the strict signal classification is avoided and is replaced by mixing noise and periodic excitation to obtain a mixed excitation (Chu 2003).

Figure 8.7 The MELP model of speech production. Reproduced by permission of John Wiley & Sons, Inc.

The harmonic vector-excitation codec (HVXC), which is part of the MPEG-4 standard (Nishiguchi and Edler 2002), was designed for narrowband speech and operates at either 2 or 4 kbps. This codec also supports a variable-bitrate mode and can operate at bitrates below 2 kbps. The HVXC codec is based on the principles of linear prediction and, like the MELP codec, transmits the spectral shape of the excitation for voiced frames. For unvoiced frames, it employs a mechanism similar to CELP to find the best excitation.

Hybrid Codecs

Hybrid codecs combine features of waveform codecs and parametric codecs. They use a model to capture the dynamics of the signal, and attempt to match the synthetic signal to the original signal in the time domain. The code-excited linear prediction (CELP) algorithm is the best representative of this family of codecs, and many standardized codecs are based on it. Among the core techniques of a CELP codec are the use of long-term and short-term linear prediction models for speech synthesis, and the incorporation of an excitation codebook containing the code to excite the synthesis filters. Figure 8.8 shows the block diagram of a basic CELP encoder, where the excitation codebook is searched in a closed-loop fashion to locate the best excitation for the synthesis filter, with the coefficients of the synthesis filter found through an open-loop procedure.

Figure 8.8 Block diagram showing the key components of a CELP encoder. Reproduced by permission of John Wiley & Sons, Inc.
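The closed-loop search of Figure 8.8 can be illustrated with a toy sketch. It assumes a fixed random codebook and a two-tap all-pole synthesis filter; real CELP codecs add an adaptive codebook, perceptual weighting, and fast search structures, so this is a minimal illustration rather than any standardized codec.

```python
import numpy as np

rng = np.random.default_rng(0)

FRAME = 40                                     # 5-ms subframe at 8 kHz
CODEBOOK = rng.standard_normal((128, FRAME))   # fixed stochastic codebook

def synthesize(excitation: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Run the excitation through the all-pole synthesis filter 1/A(z)."""
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        past = out[max(0, n - len(a)):n][::-1]
        out[n] = excitation[n] + np.dot(a[:len(past)], past)
    return out

def search_codebook(target: np.ndarray, a: np.ndarray):
    """Closed-loop search: pick the index and gain minimizing the
    waveform error between synthetic and target speech."""
    best = (None, 0.0, np.inf)
    for idx, code in enumerate(CODEBOOK):
        y = synthesize(code, a)
        gain = np.dot(target, y) / np.dot(y, y)   # optimal gain per code
        err = np.sum((target - gain * y) ** 2)
        if err < best[2]:
            best = (idx, gain, err)
    return best

# Coefficients of a (hypothetical) stable short-term predictor.
a = np.array([0.9, -0.2])
target = synthesize(2.0 * CODEBOOK[37], a)     # a frame we know is codable
idx, gain, err = search_codebook(target, a)
print(idx, round(gain, 2), round(err, 8))      # -> 37 2.0 ~0.0
```

Note that the encoder scores each candidate by synthesizing it and comparing with the input, which is what "analysis-by-synthesis" means in the CELP literature; only the index and gain need to be transmitted.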
The key components of a CELP bitstream are the gain, which contains the power information of the signal; the filter coefficients, which contain the local spectral information; an index to the excitation codebook, which contains information related to the excitation waveform; and the parameters of the long-term predictors, such as a pitch period and an adaptive codebook gain.

CELP codecs are best operated in the medium bitrate range of 5-15 kbps. They provide higher performance than most low-bitrate parametric codecs because the phase of the signal is partially preserved through the encoding of the excitation waveform. This technique allows a much better reproduction of plosive sounds, where strong transients exist.

Standardized CELP codecs for narrowband speech include the TIA IS54 vector-sum-excited linear prediction (VSELP) codec, the FS-1016 CELP codec, the ITU-T G.729 (ITU-T 1995) conjugate-structure algebraic CELP (ACELP) codec, and the AMR codec (3GPP 1999d). For wideband speech, the best representatives are the ITU-T G.722.2 AMR-WB codec (ITU-T 2002) and the MPEG-4 version of CELP (Nishiguchi and Edler 2002).

Recent trends in CELP codec design have focused on the development of multimode codecs. They take advantage of the dynamic nature of the speech signal and adapt to time-varying network conditions. In multimode codecs, one of several distinct coding modes is selected. There are two methods for choosing the coding mode: source control, where the choice is based on the local properties of the input speech, and network control, where the switching obeys external commands issued in response to network or channel conditions. An example of a source-controlled multimode codec is the TIA IS96 standard (Chu 2003), which dynamically selects one of four data rates every 20 ms, depending on speech activity. The AMR and AMR-WB standards, on the other hand, are network controlled. The AMR standard is a family of eight codecs operating at 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, and 4.75 kbps. The selectable mode vocoder (SMV) (3GPP2 2001) is both network controlled and source controlled. It is based on four codecs operating at 8.55, 4.0, 2.0, and 0.8 kbps and four network-controlled operating modes. Depending on the selected mode, a different rate-determination [...]
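A caricature of the network-controlled mode switching just described might look as follows. Only the eight AMR bitrates are taken from the standard; the idea that the network supplies a carrier-to-interference (C/I) measurement and the threshold values themselves are invented for illustration and are not from the 3GPP specification.

```python
# Hedged sketch of network-controlled multimode operation: the network
# reports channel quality, and the codec picks the highest AMR bitrate
# whose (hypothetical) threshold the channel clears.

AMR_MODES_KBPS = [12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75]

def select_amr_mode(carrier_to_interference_db: float) -> float:
    """Map a C/I measurement to one of the eight AMR codec modes."""
    # Illustrative thresholds: better channels leave more of the gross
    # bitrate for speech; worse channels shift bits to channel coding.
    thresholds_db = [13, 12, 10, 9, 8, 7, 6]   # one fewer than the modes
    for mode_kbps, threshold in zip(AMR_MODES_KBPS, thresholds_db):
        if carrier_to_interference_db >= threshold:
            return mode_kbps
    return AMR_MODES_KBPS[-1]                  # worst channel: lowest rate

for ci in (15.0, 9.5, 4.0):
    print(f"C/I = {ci:4.1f} dB -> AMR {select_amr_mode(ci)} kbps")
```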
[...] The activities of this standardization body have culminated in a number of successful and popular coding standards. The MPEG-1 audio standard was completed in 1992. MPEG-2 BC is a backward-compatible extension to MPEG-1 and was finalized in 1994. MPEG-2 AAC is a more efficient audio coding standard. MPEG-4 Audio includes tools for general audio coding and was issued in 1999. These standards support audio encoding [...]

[...] In particular, the HVXC standard can be used to handle speech, while the harmonic and individual lines plus noise (HILN) standard is used to handle music (Herre and Purnhagen 2002; Nishiguchi and Edler 2002).

[...] latest standard.

Table 8.2 Timeline of the evolution of video coding standards

    Year  Body              Standard            Application Domain
    1989  ITU-T             H.261               p x 64 kbps videoconferencing
    1991  ISO/IEC           MPEG-1              Stored media (e.g., video CD)
    1994  ISO/IEC, ITU-T    MPEG-2 / H.262      Digital broadcasting and DVD
    1997  ITU-T             H.263               Videoconferencing
    1999  ISO/IEC           MPEG-4              Mobile and Internet
    2003  ITU-T, ISO/IEC    H.264 / MPEG-4 AVC  Mobile, Internet, broadcasting, HD-DVD

[...] is expected to cover a wide range of applications, from mobile communications at 64 kbps to high-definition broadcasting at 10 Mbps. With respect to mobile networks and 3G in particular, MPEG-4 Simple Profile and H.263 Baseline are two standardized codecs that have been deployed as of mid-2003. 3GPP defines H.263 Baseline as a mandatory codec and MPEG-4 as an optional one. Both standards [...]

[...] To multiplex voice and image into one mobile communication channel and to control the messages exchanged in each communication phase, H.223 and H.245 are used. On the basis of H.324 Annex C, the 3GPP Codec Working Group selected essential speech and video codecs and operation modes optimized for W-CDMA requirements, and prescribed the 3GPP standard 3G-324M in December 1999. In particular, a codec optimal for 3G was selected [...]

[...] applications: mobile TV phone, video clip download, and multimedia messaging services over its 3G network. Figure 8.18 summarizes the current mobile multimedia applications operated by DoCoMo as of early 2004.

8.4.1 Mobile TV Phone

With the rapid spread of mobile communication and the progress of standardization for 3G mobile networks, the ITU-T began studies on audiovisual terminals for mobile communication [...]

[...] the handset's built-in cameras or downloaded from sites. In this context, ISO/IEC and 3GPP standards have been adopted for Multimedia Messaging Services (MMS). The media types specified for MMS are text, AMR for speech, MPEG-4 AAC for audio, JPEG for still images, GIF and PNG for bitmap graphics, H.263 and MPEG-4 for video, and Scalable Vector Graphics (SVG). MMS specifications and usages are given in (3GPP [...]

[...] the 3GPP open standard and has opened their operational specification to the public. Figure 8.20 shows the history of the multimedia file format. The 3GPP standard-based file format is now being used for both DoCoMo's MMS and content download service.

Figure 8.20 History of the MMS file format

8.4.3 Future Trends

Mobile applications evolve with the generations of mobile networks and compression technologies. Through the increase in available bandwidth, [...]

[...] Interactive applications that use streaming services include on-demand and live information delivery applications. Examples of on-demand applications are music, music video, and news-on-demand applications [...]

[...] heterogeneity. This approach, however, requires multiple codecs and multiple standards. Scalability is a clear trend in speech coding and offers distinct advantages in this regard. Narrowband and wideband AMR and the selectable mode vocoder (SMV) are examples of scalable codecs. MPEG-4 speech coding standards also support scalability (Nishiguchi and Edler 2002). Because of extra overhead, scalable codecs typically [...]
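The scalability idea sketched above can be illustrated with an embedded quantizer, in which a coarse base layer is refined by successive enhancement layers and a decoder may stop after any layer. This toy construction is ours for illustration; it is not the mechanism of any of the codecs named above.

```python
import numpy as np

def encode_scalable(x: np.ndarray, layer_bits=(4, 2, 2)):
    """Embedded quantization: a coarse base layer plus enhancement layers
    that successively refine the residual. Truncating trailing layers
    still yields a decodable (lower-quality) signal."""
    layers, residual, scale = [], x.copy(), 1.0
    for bits in layer_bits:
        levels = 2 ** bits
        step = 2.0 * scale / levels
        codes = np.clip(np.floor((residual + scale) / step), 0, levels - 1)
        layers.append((codes.astype(int), step, scale))
        residual = residual - ((codes + 0.5) * step - scale)
        scale = step / 2.0              # next layer refines within one cell
    return layers

def decode_scalable(layers):
    """Sum the reconstructions of however many layers arrived."""
    x_hat = 0.0
    for codes, step, scale in layers:
        x_hat = x_hat + (codes + 0.5) * step - scale
    return x_hat

x = np.clip(0.6 * np.random.default_rng(1).standard_normal(1000), -1, 1)
layers = encode_scalable(x)
for n_layers in range(1, 4):
    x_hat = decode_scalable(layers[:n_layers])
    snr = 10 * np.log10(np.mean(x**2) / np.mean((x - x_hat)**2))
    print(f"{n_layers} layer(s): SNR = {snr:.1f} dB")
```

Each additional layer raises the fidelity, so a network or terminal can trade bitrate for quality by simply discarding enhancement layers, which is the extra-overhead-for-flexibility trade-off noted above.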