Once channel capacity is computed for a particular set of system parameters, it is the task of the communication link designer to devise coding and modulation strategies that approach this capacity. After 50 years of effort since Shannon's seminal work, it is now safe to say that this goal has been accomplished for some of the most common channel models. The proofs of the fundamental theorems of information theory indicate that Shannon limits can be achieved by random code constructions using very large block lengths. While this appeared to be computationally infeasible in terms of both encoding and decoding, the invention of turbo codes by Berrou et al. in 1993 provided implementable mechanisms for achieving just this. Turbo codes are random-looking codes obtained from easy-to-encode convolutional codes, which can be decoded efficiently using iterative decoding techniques instead of ML decoding (which is computationally infeasible for such constructions).
Since then, a host of "turbo-like" coded modulation strategies have been proposed, including rediscovery of the low-density parity check (LDPC) codes invented by Gallager in the 1960s. These developments encourage us to postulate that it should be possible (with the application of sufficient ingenuity) to devise a turbo-like coded modulation strategy that approaches the capacity of a very large class of channels. Thus, it is more important than ever to characterize information-theoretic limits when setting out to design a communication system, both in terms of setting design goals and in terms of gaining intuition on design parameters (e.g., size of constellation to use). The goal of this chapter, therefore, is to provide enough exposure to Shannon theory to enable computation of capacity benchmarks, with the focus on the AWGN channel and some variants. There is no attempt to give a complete,
or completely rigorous, exposition. For this purpose, the reader is referred to information theory textbooks mentioned in Section 6.5.
The techniques discussed in this chapter are employed in Chapter 8 in order
to obtain information-theoretic insights into wireless systems. Constructive coding strategies, including turbo-like codes, are discussed in Chapter 7.
We note that the law of large numbers (LLN) is a key ingredient of information theory: if $X_1, \ldots, X_n$ are i.i.d. random variables, then their empirical average $(X_1 + \cdots + X_n)/n$ tends to the statistical mean $E[X_1]$ (with probability one) as $n \to \infty$, under rather general conditions. Moreover, associated with the LLN are large deviations results, which say that the probability of an $O(1)$ deviation of the empirical average from the mean decays exponentially with $n$. These can be proved using the Chernoff bound (see Appendix B). In this chapter, when I invoke the LLN to replace an empirical average or sum by its statistical counterpart, I implicitly rely on such large deviations results as the underlying mathematical justification, although I do not provide the technical details behind such justification.
Map of this chapter In Section 6.1, I compute the capacity of the continuous- and discrete-time AWGN channels using geometric arguments, and discuss the associated power-bandwidth tradeoffs. In Section 6.2, I take a more systematic view, discussing some basic quantities and results of Shannon theory, including the discrete memoryless channel model and the channel coding theorem. This provides a framework for capacity computations that I use in Section 6.3, where I discuss how to compute capacity under input constraints (specifically focusing on computing AWGN capacity with standard constellations such as PAM, QAM, and PSK). I also characterize the capacity for parallel Gaussian channels, and apply it to modeling dispersive channels. Finally, Section 6.4 provides a glimpse of optimization techniques for computing capacity in more general settings.
6.1 Capacity of AWGN channel: modeling and geometry
In this section, I discuss fundamental benchmarks for communication over a bandlimited AWGN channel.
Theorem 6.1.1 For an AWGN channel of bandwidth W and received power P, the channel capacity is given by the formula
$$C = W \log_2\left(1 + \frac{P}{N_0 W}\right) \ \text{bit/s}, \qquad (6.1)$$
where $N_0$ is the noise power spectral density.
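As a quick numerical companion to Theorem 6.1.1, the sketch below evaluates (6.1) for a few illustrative parameter values; the function name and the example numbers are mine, not from the text.

```python
import math

def awgn_capacity_bps(W_hz, P_watts, N0):
    """Bandlimited AWGN capacity (6.1): C = W*log2(1 + P/(N0*W)), in bit/s."""
    return W_hz * math.log2(1 + P_watts / (N0 * W_hz))

# Illustrative numbers (not from the text): W = 1 MHz, N0 = 1e-12 W/Hz.
print(awgn_capacity_bps(1e6, 1e-6, 1e-12))   # SNR = 0 dB -> 1.0 Mbit/s
print(awgn_capacity_bps(1e6, 2e-6, 1e-12))   # double P   -> ~1.58 Mbit/s
print(awgn_capacity_bps(2e6, 1e-6, 1e-12))   # double W   -> ~1.17 Mbit/s
```

Doubling the power buys less than a full extra bit/s/Hz, and doubling the bandwidth helps less and less as the per-Hz SNR drops, which is the power-bandwidth tradeoff discussed next.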
Consider a communication system that provides an information rate of R bit/s. Denoting by $E_b$ the energy per information bit, the transmitted power is $P = E_b R$. For reliable transmission, we must have R < C, so that we have from (6.1):
$$\frac{E_b}{N_0} > \frac{2^r - 1}{r}, \qquad (6.2)$$
where $r = R/W$ is the spectral efficiency in bit/s per Hz.
Equation (6.2) brings out a fundamental tradeoff between power and bandwidth. The required $E_b/N_0$, and hence the required power (assuming that the information rate R and noise PSD $N_0$ are fixed), increases as we increase the spectral efficiency r, while the bandwidth required to support a given information rate decreases if we increase r. Taking the log of both sides of (6.2), we see that the spectral efficiency and the required $E_b/N_0$ in dB have an approximately linear relationship. This can be seen from Figure 6.1, which plots achievable spectral efficiency versus $E_b/N_0$ (dB). Reliable communication is not possible above the curve. In comparing a specific coded modulation scheme with the Shannon limit, we compare the $E_b/N_0$ required to attain a certain reference BER (e.g., $10^{-5}$) with the minimum possible $E_b/N_0$, given by (6.2) at that spectral efficiency (excess bandwidth used in the modulating pulse is not considered, since that is a heavily implementation-dependent parameter). With this terminology, uncoded QPSK achieves a BER of $10^{-5}$ at an $E_b/N_0$ of about 9.5 dB. For the corresponding spectral efficiency r = 2, the Shannon limit given by (6.2) is 1.76 dB, so that uncoded QPSK is about 7.8 dB away from the Shannon limit at a BER of $10^{-5}$. A similar gap also exists for uncoded 16-QAM. As we shall see in the next chapter, the gap to Shannon capacity can be narrowed considerably by the use of channel coding. For example, suppose that we use a rate 1/2 binary code (1 information bit/2 coded bits), with the coded bits mapped to a QPSK constellation (2 coded bits/channel use). Then the spectral efficiency
is r = 1/2 × 2 = 1, and the corresponding Shannon limit is 0 dB. We now know how to design turbo-like codes that get within a fraction of a dB of this limit.
Figure 6.1 Spectral efficiency as a function of $E_b/N_0$ (dB); the 7.8 dB gap of uncoded QPSK (at BER $10^{-5}$) from the Shannon limit is marked.
The large gap to capacity for uncoded constellations (at a reference BER of $10^{-5}$) shows the significant potential benefits of channel coding, which I discuss in Chapter 7.
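The Shannon limits and the quoted gap can be reproduced with a few lines of code. The sketch below evaluates (6.2) at several spectral efficiencies (including the r → 0 power-limited regime discussed next) and recomputes the uncoded BPSK/QPSK operating point at BER $10^{-5}$ using the standard expression $Q(\sqrt{2E_b/N_0})$; the function names are mine.

```python
import math
from statistics import NormalDist

def shannon_limit_ebn0_db(r):
    """Minimum E_b/N_0 in dB at spectral efficiency r (bit/s/Hz), from (6.2)."""
    return 10 * math.log10((2 ** r - 1) / r)

def uncoded_bpsk_qpsk_ebn0_db(ber):
    """E_b/N_0 (dB) at which uncoded BPSK/QPSK attains the given BER,
    using BER = Q(sqrt(2*Eb/N0))."""
    q_inv = NormalDist().inv_cdf(1 - ber)          # Q^{-1}(ber)
    return 10 * math.log10(q_inv ** 2 / 2)

for r in (2.0, 1.0, 1e-4):
    print(f"r = {r:g}: Shannon limit = {shannon_limit_ebn0_db(r):.2f} dB")
# r = 2 -> 1.76 dB, r = 1 -> 0.00 dB, r -> 0 approaches 10*log10(ln 2) = -1.59 dB

ebn0 = uncoded_bpsk_qpsk_ebn0_db(1e-5)             # ~9.6 dB (text quotes ~9.5 dB)
print(f"gap at r = 2: {ebn0 - shannon_limit_ebn0_db(2.0):.1f} dB")   # ~7.8 dB
```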
The preceding discussion focuses on spectral efficiency, which is important when there are bandwidth constraints. What if we have access to unlimited bandwidth (for a fixed information rate)? As discussed below, even in this scenario, we cannot transmit at arbitrarily low powers: there is a fundamental limit on the smallest possible value of $E_b/N_0$ required for reliable communication.
Power-limited communication As we let the spectral efficiency r → 0,
we enter a power-limited regime. Evaluating the limit of (6.2) as $r \to 0$ tells us that, for reliable communication, we must have
$$\frac{E_b}{N_0} > \ln 2 \approx -1.6 \ \text{dB} \qquad \text{(minimum required for reliable communication)}. \qquad (6.3)$$
That is, even if we let bandwidth tend to infinity for a fixed information rate, we cannot reduce $E_b/N_0$ below its minimum value of −1.6 dB. As we have seen in Chapters 3 and 4, M-ary orthogonal signaling is asymptotically optimum in this power-limited regime, both for coherent and noncoherent communication.
Let me now sketch an intuitive proof of the capacity formula (6.1). While the formula refers to a continuous-time channel, both the proof of the capacity formula and the kinds of constructions we typically employ to try to achieve capacity are based on discrete-time constructions.
6.1.1 From continuous to discrete time
I now consider an ideal complex WGN channel bandlimited to $(-W/2, W/2)$. If the transmitted signal is s(t), then the received signal is
$$y(t) = (s * h)(t) + n(t),$$
where h is the impulse response of an ideal bandlimited channel, and n(t) is complex WGN. We wish to design the set of possible signals that we would send over the channel so as to maximize the rate of reliable communication, subject to a constraint that the signal s(t) has average power at most P.
To start with, note that it does not make sense for s(t) to have any component outside of the band $(-W/2, W/2)$, since any such component would be annihilated once we pass it through the ideal bandlimited filter h. Hence, without loss of generality, s(t) must be bandlimited to $(-W/2, W/2)$ for an optimal signal set design. We now recall the discussion on modulation degrees of freedom from Chapter 2 in order to obtain a discrete-time model.
By the sampling theorem, a signal bandlimited to $(-W/2, W/2)$ is completely specified by its samples at rate W. Signal design therefore consists of specifying these samples, and modulation for transmission over the ideal bandlimited channel consists of invoking the interpolation formula. Thus, once we have designed the samples, the complex baseband waveform that we send is given by
$$s(t) = \sum_i s\!\left(\frac{i}{W}\right) p\!\left(t - \frac{i}{W}\right), \qquad p(t) = \mathrm{sinc}(Wt), \qquad (6.4)$$
where the translates $\{p(t - i/W)\}$ of the sinc pulse are orthogonal, ideally bandlimited functions, so that (6.4) specifies a basis expansion for s(t).
For signaling under a power constraint P over a (large) interval $T_o$, the transmitted signal energy should satisfy
$$\int_0^{T_o} |s(t)|^2\, dt \approx P T_o.$$
Let $P_s = \overline{|s(i/W)|^2}$ denote the average power per sample. Since energy is preserved under the basis expansion (6.4), and we have about $T_o W$ samples in this interval, we also have
$$T_o W P_s \|p\|^2 \approx P T_o.$$
For $p(t) = \mathrm{sinc}(Wt)$, we have $\|p\|^2 = 1/W$, so that $P_s = P$. That is, for the scaling adopted in (6.4), the samples obey the same power constraint as the continuous-time signal.
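A quick numerical sketch of these two facts, using the interpolation formula (6.4) on an arbitrary set of samples of my own choosing (the parameter values are illustrative, not from the text):

```python
import numpy as np

W = 1.0                                   # bandwidth; samples taken at rate W
rng = np.random.default_rng(0)
samples = rng.standard_normal(64) + 1j * rng.standard_normal(64)   # s(i/W)

# Interpolation formula (6.4): s(t) = sum_i s(i/W) * sinc(W*(t - i/W)),
# evaluated on a fine time grid (np.sinc is the normalized sinc).
t = np.arange(0.0, 64.0, 0.01)
i = np.arange(64)
s_t = (samples[None, :] * np.sinc(W * (t[:, None] - i[None, :] / W))).sum(axis=1)

power_per_sample = np.mean(np.abs(samples) ** 2)   # P_s
waveform_power = np.mean(np.abs(s_t) ** 2)         # time-averaged |s(t)|^2
print(power_per_sample, waveform_power)            # nearly equal (edge effects aside)
```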
When the bandlimited signal s(t) passes through the ideally bandlimited complex AWGN channel, we get
$$y(t) = s(t) + n(t),$$
where n is complex WGN. Since s is linearly modulated at symbol rate W using modulating pulse p, we know that the optimal receiver front end is to pass the received signal through a filter matched to p(t), and to sample at the symbol rate W. For notational convenience, we use a receive filter with transfer function $G_R(f) = I_{[-W/2, W/2]}(f)$, which is a scalar multiple of the matched filter $P^*(f) = P(f) = \frac{1}{W} I_{[-W/2, W/2]}(f)$. This ideal bandlimited filter lets the signal s(t) through unchanged, so that the signal contribution to the output of the receive filter, sampled at rate W, is simply s(i/W). The noise at the output of the receive filter is bandlimited complex WGN with PSD $N_0 I_{[-W/2, W/2]}(f)$, from which it follows that the noise samples at rate W are independent complex Gaussian random variables with $E[|N[i]|^2] = N_0 W$. To summarize, the noisy samples at the receive filter output can be written as
$$y[i] = s(i/W) + N[i], \qquad (6.6)$$
where the signal samples are subject to an average power constraint $\overline{|s(i/W)|^2} \le P$, and the N[i] are i.i.d. complex Gaussian noise samples with $E[|N[i]|^2] = N_0 W$. Thus, we have reduced the continuous-time bandlimited passband AWGN channel model to the discrete-time complex WGN channel model (6.6), which we get to use W times per second if we employ bandwidth W. We can now characterize the capacity of the discrete-time channel, and then infer that of the continuous-time bandlimited channel.
6.1.2 Capacity of the discrete-time AWGN channel
Since the real and imaginary parts of the discrete-time complex AWGN model (6.6) can be interpreted as two uses of a real-valued AWGN channel, we consider the latter first.
Consider a discrete-time real AWGN channel in which the output at any given time instant is given by
$$Y = X + Z, \qquad Z \sim N(0, N), \qquad (6.7)$$
where the noise is i.i.d. across channel uses and the input is subject to an average power constraint of S per symbol. We communicate over n channel uses by designing a set of $2^{nR}$ such signals (codewords) $\mathbf{X}^{(k)} = (X_1^{(k)}, \ldots, X_n^{(k)})^T$, $k = 1, \ldots, 2^{nR}$, each having an equal probability of being chosen for transmission over the channel. Thus, nR bits are conveyed over n channel uses. Capacity is defined as the largest rate R for which the error probability tends to zero as $n \to \infty$.
Shannon has provided a general framework for computing capacity for a discrete memoryless channel, which I discuss in Section 6.3. However, I provide here a heuristic derivation of capacity for the AWGN channel (6.7) that specifically utilizes the geometry induced by AWGN.
Sphere packing based derivation of capacity formula For a transmitted signal $\mathbf{X}^{(j)}$, the n-dimensional output vector $\mathbf{Y} = (Y_1, \ldots, Y_n)^T$ is given by
$$\mathbf{Y} = \mathbf{X}^{(j)} + \mathbf{Z} \qquad (\mathbf{X}^{(j)} \text{ sent}),$$
where $\mathbf{Z}$ is a vector of i.i.d. N(0, N) noise samples. For equal priors, the MPE and ML rules are equivalent. The ML rule for the AWGN channel is the minimum distance rule
$$\hat{\mathbf{X}}_{ML}(\mathbf{Y}) = \arg\min_{1 \le k \le 2^{nR}} \|\mathbf{Y} - \mathbf{X}^{(k)}\|^2.$$
Now, the noise vector $\mathbf{Z}$ that perturbs the transmitted signal has energy $\|\mathbf{Z}\|^2 \approx nN$ by the LLN, so that, with high probability, the received vector lies within a decoding sphere of radius about $\sqrt{nN}$ around the transmitted codeword. Moreover, denoting by overbars the empirical averages over the n components, the LLN also gives
$$\overline{Y^2} = \overline{(X+Z)^2} = \overline{X^2} + \overline{Z^2} + 2\,\overline{XZ} \approx \overline{X^2} + \overline{Z^2} \le S + N, \qquad (6.8)$$
since the cross term averages out to approximately zero. Invoking the law of large numbers again, the received signal energy satisfies
$$\|\mathbf{Y}\|^2 \approx n(S + N),$$
so that, with high probability, the received signal vector lies within an n-dimensional sphere with radius $R_n = \sqrt{n(S+N)}$. The problem of signal design for reliable communication now boils down to packing disjoint decoding spheres of radius $r_n = \sqrt{nN}$ within a sphere of radius $R_n$, as shown in Figure 6.2. The volume of an n-dimensional sphere of radius r equals $K r^n$, where K is a constant depending only on the dimension n, so the number of disjoint decoding spheres that can be packed satisfies
$$2^{nR} \le \frac{K R_n^n}{K r_n^n} = \left(\frac{S+N}{N}\right)^{n/2}.$$
Solving, we obtain that the rate $R = \frac{1}{2}\log_2(1 + S/N)$. I shall show in Section 6.3 that this rate exactly equals the capacity of the discrete-time real AWGN channel. (It is also possible to make the sphere packing argument rigorous, but we do not attempt that here.) I now state the capacity formula formally.
Theorem 6.1.2 Capacity of discrete-time real AWGN channel The capacity of the discrete-time, real AWGN channel (6.7) is
$$C_{AWGN} = \frac{1}{2}\log_2(1 + \mathrm{SNR}) \ \text{bit/channel use}, \qquad (6.9)$$
where SNR = S/N is the signal-to-noise ratio.
Thus, capacity grows approximately logarithmically with SNR, or approximately linearly with SNR in dB.
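A minimal sketch tying (6.9) to the sphere-packing count above; the function name and the SNR value are illustrative choices of mine.

```python
import math

def awgn_capacity_real(snr):
    """Capacity (6.9) of the discrete-time real AWGN channel, bit/channel use."""
    return 0.5 * math.log2(1 + snr)

snr, n = 10.0, 100                         # S/N and block length, arbitrary
max_codewords = (1 + snr) ** (n / 2)       # sphere-packing bound (R_n/r_n)**n
print(awgn_capacity_real(snr))             # ~1.73 bit/channel use
print(math.log2(max_codewords) / n)        # same rate, read off the packing bound
```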
6.1.3 From discrete to continuous time
For the continuous-time bandlimited complex baseband channel that we considered earlier, we have 2W uses per second of the discrete-time real AWGN channel (6.7). With the normalization we employed in (6.4), we have that, per real-valued sample, the average signal energy S = P/2 and the noise energy $N = N_0 W/2$, where P is the power constraint on the continuous-time signal. Plugging in, we get
$$C = 2W \times \frac{1}{2}\log_2\left(1 + \frac{P/2}{N_0 W/2}\right) = W \log_2\left(1 + \frac{P}{N_0 W}\right) \ \text{bit/s},$$
which is exactly the capacity formula (6.1), and this rate can be approached by using the modulation formula (6.4) to send the symbols s(i/W) produced by a capacity-achieving code for the discrete-time channel.
Of course, as we discussed in Chapter 2, the sinc pulse used in this formula cannot be used in practice, and should be replaced by a modulating pulse whose bandwidth is larger than the symbol rate employed. A good choice would be a square root Nyquist modulating pulse at the transmitter, and its matched filter at the receiver, which again yields the ISI-free discrete-time model (6.6) with uncorrelated noise samples.
In summary, good codes for the discrete-time AWGN channel (6.6) can be translated into good signal designs for the continuous-time bandlimited AWGN channel using practical linear modulation techniques; this corresponds to using translates of a square root Nyquist pulse as an orthonormal basis for the signal space. It is also possible to use an entirely different basis: for example, orthogonal frequency division multiplexing, which I discuss in Chapter 8, employs complex sinusoids as basis functions. In general, the use of appropriate signal space arguments allows us to restrict attention to discrete-time models, both for code design and for deriving information-theoretic benchmarks.
Real baseband channel The preceding observations also hold for a physical (i.e., real-valued) baseband channel. That is, both the AWGN capacity formula (6.1) and its corollary (6.2) hold, where W for a physical baseband channel refers to the bandwidth occupancy for positive frequencies. Thus, a real baseband signal s(t) occupying a bandwidth W actually spans the interval $(-W, W)$, with the constraint that $S(f) = S^*(-f)$. Using the sampling theorem, such a signal can be represented by 2W real-valued samples per second. This is the same result as for a passband signal of bandwidth W, so that the arguments I have made so far, relating the continuous-time model to the discrete-time real AWGN channel, apply as before. For example, suppose that we wish to find out how far uncoded binary antipodal signaling at a BER of $10^{-5}$ is from Shannon capacity. Since we transmit at 1 bit per sample, the information rate is 2W bits per second, corresponding to a spectral efficiency of r = R/W = 2. This corresponds to a Shannon limit of 1.8 dB $E_b/N_0$, using (6.2). Setting the BER $Q(\sqrt{2E_b/N_0})$ of binary antipodal signaling to
$10^{-5}$, we find that the required $E_b/N_0$ is 9.5 dB, which is 7.7 dB away from the Shannon limit. There is good reason for this computation looking familiar: we obtained exactly the same result earlier for uncoded QPSK on a passband channel. This is because QPSK can be interpreted as binary antipodal modulation along the I and Q channels, and is therefore exactly equivalent to binary antipodal modulation for a real baseband channel.
At this point, it is worth mentioning the potential for confusion when dealing with Shannon limits in the literature. Even though PSK is a passband technique, the term BPSK is often used when referring to binary antipodal signaling on a real baseband channel. Thus, when we compare the performance of BPSK with rate 1/2 coding to the Shannon limit, we should actually be keeping in mind a real baseband channel, so that r = 1, corresponding to a Shannon limit of 0 dB $E_b/N_0$. (On the other hand, if we had literally interpreted BPSK as using only the I channel in a passband system, we would have gotten r = 1/2.) That is, whenever we consider real-valued alphabets, we restrict ourselves to the real baseband channel for the purpose of computing spectral efficiency and comparing Shannon limits. For a passband channel, we can use the same real-valued alphabet over the I and Q channels (corresponding to a rectangular complex-valued alphabet) to get exactly the same dependence of spectral efficiency on $E_b/N_0$.
6.1.4 Summarizing the discrete-time AWGN model
In previous chapters, I have used constellations over the AWGN channel with a finite number of signal points. One of the goals of this chapter is to be able to compute Shannon theoretic limits for performance when we constrain ourselves to using such constellations. In Chapters 3 to 5, when sampling signals corrupted by AWGN, we model the discrete-time AWGN samples as having variance $\sigma^2 = N_0/2$ per dimension. On the other hand, the noise variance in the discrete-time model in Section 6.1.3 depends on the system bandwidth W. I would now like to reconcile these two models, and use a notation that is consistent with that in the prior chapters.
Real discrete-time AWGN channel Consider the following model for a real-valued discrete-time channel:
$$Y = X + Z, \qquad Z \sim N(0, \sigma^2), \qquad (6.10)$$
where X is a power-constrained input, $E[X^2] \le E_s$, as well as possibly constrained to take values in a given alphabet (e.g., BPSK or 4-PAM). This notation is consistent with that in Chapter 3, where we use $E_s$ to denote the average energy per symbol. Suppose that we compute the capacity of this discrete-time model as $C_d$ bits per channel use, where $C_d$ is a function of SNR = $E_s/\sigma^2$. If $E_b$ is the energy per information bit, we must have $E_s = E_b C_d$ joules per channel use. Now, if this discrete-time channel arose from a real
baseband channel of bandwidth W, we would have 2W channel uses per second, so that the capacity of the continuous-time channel is $C_c = 2W C_d$ bits per second. This means that the spectral efficiency is given by
$$r = \frac{C_c}{W} = 2 C_d \ \text{bit/s per Hz}, \qquad (6.11)$$
which is consistent with our notation in prior chapters. To apply the results to a bandlimited system as in Sections 6.1.1 and 6.1.3, all we need is the relationship (6.11), which specifies the spectral efficiency (bits per Hz) in terms of the capacity of the discrete-time channel (bits per channel use).
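The sketch below shows one way to use this bookkeeping: starting from a discrete-time capacity $C_d$ at a given SNR, it produces the spectral efficiency via (6.11) and the implied $E_b/N_0$, assuming the per-dimension noise convention $\sigma^2 = N_0/2$ stated above and $E_s = E_b C_d$; sweeping SNR then traces a curve like Figure 6.1. The function names are mine.

```python
import math

def spectral_efficiency(c_d):
    """r = 2*C_d from (6.11), for a real discrete-time channel used 2W times/s."""
    return 2 * c_d

def ebn0_db(snr, c_d):
    """E_b/N_0 (dB) implied by SNR = E_s/sigma^2 and capacity C_d,
    using E_s = E_b*C_d and sigma^2 = N_0/2, i.e. E_b/N_0 = SNR/(2*C_d)."""
    return 10 * math.log10(snr / (2 * c_d))

# Unconstrained real AWGN input: C_d = 0.5*log2(1 + SNR).
for snr_db in (0, 5, 10, 15):
    snr = 10 ** (snr_db / 10)
    c_d = 0.5 * math.log2(1 + snr)
    print(f"r = {spectral_efficiency(c_d):.2f}, Eb/N0 = {ebn0_db(snr, c_d):.2f} dB")
# The first line gives r = 1 at Eb/N0 = 0 dB, matching the Shannon limit of (6.2).
```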
Complex discrete-time AWGN model The real-valued model (6.10) can be used to calculate the capacity for rectangular complex-valued constellations such as rectangular 16-QAM, which can be viewed as a product of two real-valued 4-PAM constellations. However, for constellations such as 8-PSK, it is necessary to work directly with a two-dimensional observation. We can think of this as a complex-valued symbol, plus proper complex AWGN (discussed in Chapter 4). The discrete-time model we employ for this purpose is
$$Y = X + Z, \qquad Z \sim CN(0, 2\sigma^2), \qquad (6.13)$$
where $E[|X|^2] \le E_s$ as before. However, we can also express this model in terms of a two-dimensional real-valued observation (in which case, we do not need to invoke the concepts of proper complex Gaussianity covered in Chapter 4):
$$Y_c = X_c + Z_c, \qquad Y_s = X_s + Z_s, \qquad (6.14)$$
with $Z_c$, $Z_s$ i.i.d. $N(0, \sigma^2)$, and $E[X_c^2 + X_s^2] \le E_s$.
Since the complex-valued channel is used W times per second over a passband bandwidth W, the corresponding spectral efficiency is
$$r = C_d \ \text{bit/s per Hz}, \qquad (6.15)$$
where $C_d$ now denotes the capacity of the complex-valued model in bits per channel use. The relationships (6.11) and (6.15) are also consistent: if we get a given capacity for a real-valued model, we should be able to double that in a consistent complex-valued model by using the real-valued model twice.
6.2 Shannon theory basics
From the preceding sphere packing arguments, we take away the intuition that
we need to design codewords so as to achieve a good packing of decoding spheres in n dimensions. A direct approach to trying to realize this intuition is not easy (although much progress has been made in recent years in the encoding and decoding of lattice codes that attempt to implement the sphere packing prescription directly). We are interested in determining whether standard constellations (e.g., PSK, QAM), in conjunction with appropriately chosen error-correcting codes, can achieve the same objectives. In this section, I discuss just enough of the basics of Shannon theory to enable me to develop elementary capacity computation techniques. I introduce the general discrete memoryless channel model, for which the model (6.7) is a special case. Key information-theoretic quantities such as entropy, mutual information, and divergence are discussed. I end this section with a statement and partial proof of the channel coding theorem.
While developing this framework, I emphasize the role played by the LLN as the fundamental basis for establishing information-theoretic benchmarks: roughly speaking, the randomness that is inherent in one channel use is averaged out by employing signal designs spanning multiple independent channel uses, thus leading to reliable communication. We have already seen this approach at work in the sphere packing arguments in Section 6.1.2.
Definition 6.2.1 Discrete memoryless channel A discrete memoryless channel is specified by a transition density or probability mass function p(y|x) specifying the conditional distribution of the output y given the input x. For multiple channel uses, the outputs are conditionally independent given the inputs. That is, if $x_1, \ldots, x_n$ are the inputs, and $y_1, \ldots, y_n$ denote the corresponding outputs, for n channel uses, then
$$p(y_1, \ldots, y_n \,|\, x_1, \ldots, x_n) = p(y_1|x_1) \cdots p(y_n|x_n).$$
Real AWGN channel For the real Gaussian channel (6.10), the channel transition density is given by
$$p(y|x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y-x)^2}{2\sigma^2}\right). \qquad (6.17)$$
In addition to the power constraint, we may constrain the input x to be drawn from a finite constellation: for example, for BPSK, the input would take values $x = \pm\sqrt{E_s}$.
Complex AWGN channel For the complex Gaussian channel (6.13) or (6.14), the channel transition density is given by
$$p(y|x) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{|y-x|^2}{2\sigma^2}\right).$$
As (6.14) makes explicit, one use of the complex AWGN model is equivalent to two uses of the real model (6.17), where the I component $x_c$ and the Q component $x_s$ of the input may be correlated due to constraints on the input alphabet.
Figure 6.3 Binary symmetric channel with crossover probability p (each input is flipped with probability p and passed through unchanged with probability 1 − p).
Binary symmetric channel (BSC) In this case, x and y both take values in {0, 1}, and the channel is characterized by the crossover probability $p = P(y = 1 | x = 0) = P(y = 0 | x = 1)$, as depicted in Figure 6.3. The BSC arises, for example, when hard bit decisions are made at the output of an AWGN channel with BPSK input; since hard decisions discard information, the capacity of the resulting BSC is inferior to the maximum achievable rate on the AWGN channel with BPSK input.
6.2.1 Entropy, mutual information and divergence
I now provide a brief discussion of relevant information-theoretic quantities and discuss their role in the law of large numbers arguments invoked in information theory.
Definition 6.2.2 Entropy For a discrete random variable (or vector) X with probability mass function p(x), the entropy H(X) is defined as
$$H(X) = E[-\log_2 p(X)] = -\sum_i p(x_i) \log_2 p(x_i), \qquad \text{(Entropy)} \quad (6.20)$$
where the $x_i$ range over the values taken by X.
Entropy is a measure of the information gained from knowing the value of the random variable X. The more uncertain we are regarding the random variable from just knowing its distribution, the more information we gain when its value is revealed, and the larger its entropy. The information is measured in bits, corresponding to the base 2 used in the logarithms in (6.20).
Example 6.2.1 (Binary entropy) We set aside the special notation $H_B(p)$ for the entropy of a Bernoulli random variable X with $P(X = 1) = p = 1 - P(X = 0)$. From (6.20), we can compute this entropy as
$$H_B(p) = -p \log_2 p - (1 - p) \log_2(1 - p). \qquad \text{(Binary entropy function)} \quad (6.21)$$
Note that $H_B(p) = H_B(1 - p)$: as expected, the information content of X does not change if we switch the labels 0 and 1. The binary entropy function is plotted in Figure 6.4. The end points p = 0 and p = 1 correspond to certainty regarding the value of the random variable, so that no information is gained by revealing its value. On the other hand, $H_B(p)$ attains its maximum value of 1 bit at p = 1/2, which corresponds to maximal uncertainty regarding the value of the random variable (which maximizes the information gained by revealing its value).
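A two-line helper for (6.21), useful in the capacity computations later in this chapter; the function name and the spot values are my own illustration.

```python
import math

def binary_entropy(p):
    """Binary entropy H_B(p) from (6.21), in bits; H_B(0) = H_B(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))    # 1.0 bit: maximal uncertainty
print(binary_entropy(0.11))   # ~0.5 bit
print(binary_entropy(0.89))   # same value, since H_B(p) = H_B(1 - p)
```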
Law of large numbers interpretation of entropy Let $X_1, \ldots, X_n$ be i.i.d. random variables, each with pmf p(x); then their joint pmf satisfies
$$-\frac{1}{n}\log_2 p(X_1, \ldots, X_n) = -\frac{1}{n}\sum_{i=1}^n \log_2 p(X_i) \to E[-\log_2 p(X_1)] = H(X)$$
as $n \to \infty$, by the LLN, so that
$$p(X_1, \ldots, X_n) \approx 2^{-nH(X)}. \qquad (6.23)$$
A sequence that satisfies this behavior is called a typical sequence. The set of such sequences is called the typical set. The LLN implies that
$$P(X_1, \ldots, X_n \ \text{is typical}) \to 1 \quad \text{as } n \to \infty. \qquad (6.24)$$
That is, any sequence of length n that is not typical is extremely unlikely to occur. Using (6.23) and (6.24), we infer that there must be approximately $2^{nH(X)}$ typical sequences, each occurring with probability approximately $2^{-nH(X)}$. This is an important principle, called the asymptotic equipartition property (AEP), stated informally as follows.
Figure 6.4 The binary entropy function.
Asymptotic equipartition property (discrete random variables) For a length n sequence of i.i.d. discrete random variables $X_1, \ldots, X_n$, where n is large, the typical set consists of about $2^{nH(X)}$ sequences, each occurring with probability approximately $2^{-nH(X)}$. Sequences outside the typical set occur with negligible probability for large n.
Since nH(X) bits are required to specify the $2^{nH(X)}$ typical sequences, the AEP tells us that describing n i.i.d. copies of the random variable X requires about nH(X) bits, so that the average number of bits per copy of the random variable is H(X). This gives a concrete interpretation for what we mean by entropy measuring information content. The implications for data compression (not considered in detail here) are immediate: by arranging i.i.d. copies of the source in long blocks, we can describe it at rates approaching H(X) per source symbol, by only assigning bits to represent the typical sequences.
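A quick numerical illustration of the AEP for a Bernoulli source; the parameters are arbitrary. The per-symbol log-probability of a single long observed sequence concentrates around $H_B(p)$, which is the LLN statement underlying (6.23).

```python
import numpy as np

p, n = 0.2, 10_000
rng = np.random.default_rng(1)
x = rng.random(n) < p                                          # one length-n Bernoulli(p) sequence
log2_prob = np.sum(np.where(x, np.log2(p), np.log2(1 - p)))    # log2 of its probability
h_b = -p * np.log2(p) - (1 - p) * np.log2(1 - p)               # H_B(0.2) ~ 0.722 bits

print(-log2_prob / n)   # empirical -log2(prob)/n, close to h_b
print(h_b)              # so the sequence probability is roughly 2**(-n*h_b)
```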
I have defined entropy for discrete random variables. We also need an analogous notion for continuous random variables, termed differential entropy, defined as follows.
Definition 6.2.3 Differential entropy For a continuous random variable (or vector) X with probability density function p(x), the differential entropy h(X) is defined as
$$h(X) = E[-\log_2 p(X)] = -\int p(x) \log_2 p(x)\, dx. \qquad \text{(Differential entropy)}$$
Example 6.2.2 (Differential entropy for a Gaussian random variable) For $X \sim N(m, v^2)$,
$$h(X) = \frac{1}{2}\log_2\left(2\pi e v^2\right). \qquad (6.25)$$
Note that the differential entropy does not depend on the mean, since that is a deterministic parameter that can be subtracted out from X without any loss of information.
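A Monte Carlo sanity check of (6.25), estimating $h(X) = E[-\log_2 p(X)]$ by averaging over samples; the chosen mean and standard deviation are arbitrary.

```python
import numpy as np

m, v = 3.0, 2.0                      # mean and standard deviation (variance v**2)
rng = np.random.default_rng(2)
x = rng.normal(m, v, size=500_000)

# log2 of the N(m, v^2) density evaluated at the samples
log2_pdf = -0.5 * ((x - m) / v) ** 2 / np.log(2) - np.log2(v * np.sqrt(2 * np.pi))

print(-np.mean(log2_pdf))                        # Monte Carlo estimate of h(X)
print(0.5 * np.log2(2 * np.pi * np.e * v ** 2))  # (6.25): ~3.05 bits for v = 2
```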
Cautionary note There are key differences between entropy and differential entropy. While entropy must be nonnegative, this is not true of differential entropy (e.g., set $v^2 < 1/(2\pi e)$ in (6.25)). Moreover, while the entropy of a discrete random variable is unchanged when the variable is scaled, differential entropy is not, even though scaling a random variable by a known constant should not change its information content. These differences can be traced to the differences between probability mass functions and probability density functions. Scaling changes the location of the mass points for a discrete random variable, but does not change their probabilities. On the other hand, scaling changes both the location and size of the infinitesimal intervals used to define a probability density function for a continuous random variable. However, such differences between entropy and differential entropy are irrelevant for our main purpose of computing channel capacities, which, as we shall see, requires computing differences between unconditional and conditional entropies or differential entropies. The effect of scale factors "cancels out" when we compute such differences.
Law of large numbers interpretation of differential entropy Let $X_1, \ldots, X_n$ be i.i.d. random variables, each with density p(x); then their joint density satisfies
$$p(X_1, \ldots, X_n) \approx 2^{-nh(X)}$$
for large n, by the same argument as in the discrete case. This leads to the AEP for continuous random variables stated below.
Asymptotic equipartition property (continuous random variables) For a length n sequence of i.i.d. continuous random variables $X_1, \ldots, X_n$, where n is large, the joint density takes value approximately $2^{-nh(X)}$ over a typical set of volume $2^{nh(X)}$. The probability mass outside the typical set is negligible for large n.
Joint entropy and mutual information The entropy H(X, Y) of a pair of random variables (X, Y) (e.g., the input and output of a channel) is called the joint entropy of X and Y, and is given by
$$H(X, Y) = E[-\log_2 p(X, Y)], \qquad (6.28)$$
where p(x, y) = p(x) p(y|x) is the joint pmf. The mutual information between X and Y is defined as
$$I(X; Y) = H(X) + H(Y) - H(X, Y). \qquad (6.29)$$
Conditional entropy The conditional entropy H(Y|X) is defined as
$$H(Y|X) = E[-\log_2 p(Y|X)] = -\sum_x \sum_y p(x, y) \log_2 p(y|x). \qquad (6.30)$$
Since p(y|x) = p(x, y)/p(x), we have
$$\log_2 p(Y|X) = \log_2 p(X, Y) - \log_2 p(X).$$
Taking expectations and changing sign, we get
$$H(Y|X) = H(X, Y) - H(X).$$
Substituting into (6.29), we get an alternative formula for the mutual information: $I(X; Y) = H(Y) - H(Y|X)$. By symmetry, we also have $I(X; Y) = H(X) - H(X|Y)$. For convenience, I state all of these formulas for mutual information together:
$$I(X; Y) = H(Y) - H(Y|X) = H(X) - H(X|Y) = H(X) + H(Y) - H(X, Y), \qquad (6.31)$$
and note that
$$H(Y|X) = \sum_x p(x) H(Y|X = x), \qquad \text{where } H(Y|X = x) = -\sum_y p(y|x) \log_2 p(y|x). \qquad (6.32)$$
The preceding definitions and formulas hold for continuous random variables
as well, with entropy replaced by differential entropy.
One final concept that is closely related to entropies is information-theoretic divergence, also termed the Kullback–Leibler (KL) distance.
Divergence The divergence D(P‖Q) between two distributions P and Q (with corresponding densities p(x) and q(x)) is defined as
$$D(P\|Q) = E_P\left[\log_2 \frac{p(X)}{q(X)}\right] = \sum_x p(x) \log_2 \frac{p(x)}{q(x)}.$$
Divergence is nonnegative The divergence $D(P\|Q) \ge 0$, with equality if and only if $P \equiv Q$.
The proof is as follows:
$$-D(P\|Q) = \sum_x p(x) \log_2 \frac{q(x)}{p(x)} \le \log_2\left(\sum_x p(x) \frac{q(x)}{p(x)}\right) = \log_2 1 = 0,$$
where the inequality is Jensen's inequality applied to the concave function $\log_2(\cdot)$, with equality if and only if q(x) = p(x) for all x (for continuous random variables, the equalities would only need to hold "almost everywhere").
Mutual information as a divergence The mutual information between two random variables can be expressed as a divergence between their joint distribution, and a distribution corresponding to independent realizations of these random variables, as follows:
$$I(X; Y) = D(P_{XY} \| P_X P_Y). \qquad (6.33)$$
This follows by noting that
$$E\left[\log_2 \frac{p(X, Y)}{p(X) p(Y)}\right] = H(X) + H(Y) - H(X, Y) = I(X; Y).$$
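The quantities of this subsection are easy to compute for small alphabets. The sketch below evaluates entropy, mutual information via (6.29), and divergence for a joint pmf of my choosing (a BSC with uniform input), and checks the identity (6.33) numerically.

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a pmf given as an array; zero entries contribute 0."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(p_xy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) from a joint pmf matrix p_xy[x, y], per (6.29)."""
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    return entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())

def divergence(p, q):
    """D(P||Q) in bits; assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

# Joint pmf of a BSC with crossover probability 0.1 and equiprobable input.
eps = 0.1
p_xy = 0.5 * np.array([[1 - eps, eps], [eps, 1 - eps]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

print(mutual_information(p_xy))                               # ~0.531 bits
print(divergence(p_xy.ravel(), np.outer(p_x, p_y).ravel()))   # same value, per (6.33)
```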
6.2.2 The channel coding theorem
I first introduce joint typicality, which is the central component of a random coding argument for characterizing the maximum achievable rate on a DMC.
Joint typicality Let X and Y have joint density p(x, y). Then the law of large numbers can be applied to n channel uses with i.i.d. inputs $X_1, \ldots, X_n$, leading to outputs $Y_1, \ldots, Y_n$, respectively. Note that the pairs $(X_i, Y_i)$ are i.i.d., as are the outputs $Y_i$. The LLN therefore implies that
$$-\frac{1}{n}\log_2 p(\mathbf{X}) \to H(X), \quad -\frac{1}{n}\log_2 p(\mathbf{Y}) \to H(Y), \quad -\frac{1}{n}\log_2 p(\mathbf{X}, \mathbf{Y}) \to H(X, Y) \qquad (6.34)$$
as $n \to \infty$. For an input sequence $\mathbf{x} = (x_1, \ldots, x_n)^T$ and an output sequence $\mathbf{y} = (y_1, \ldots, y_n)^T$, the pair $(\mathbf{x}, \mathbf{y})$ is said to be jointly typical if its empirical characteristics conform to the statistical averages in (6.34); that is, if
$$-\frac{1}{n}\log_2 p(\mathbf{x}) \approx H(X), \qquad -\frac{1}{n}\log_2 p(\mathbf{y}) \approx H(Y), \qquad -\frac{1}{n}\log_2 p(\mathbf{x}, \mathbf{y}) \approx H(X, Y).$$
In the following, we apply the concept of joint typicality to a situation in which X is the input to a DMC, and Y its output. In this case, p(x, y) = p(x) p(y|x), where p(x) is the marginal pmf of X, and p(y|x) is the channel transition pmf.
Random coding For communicating at rate R bit/channel use over a DMC p(y|x), we use $2^{nR}$ codewords, where a codeword of the form $\mathbf{X} = (X_1, \ldots, X_n)^T$ is sent using n channel uses (input $X_i$ sent for the ith channel use). The elements $X_i$ of all codewords are drawn i.i.d. with pmf p(x), hence the term random coding (of course, the encoder and decoder both know the set of codewords once the random codebook choice has been made). All codewords are equally likely to be sent.
Joint typicality decoder While ML decoding is optimal for equiprobable transmission, it suffices to consider the following joint typicality decoder for our purpose. This decoder checks whether the received vector $\mathbf{Y} = (Y_1, \ldots, Y_n)^T$ is jointly typical with any codeword $\hat{\mathbf{X}} = (\hat{X}_1, \ldots, \hat{X}_n)^T$. If so, and if there is exactly one such codeword, then the decoder outputs $\hat{\mathbf{X}}$. If not, it declares decoding failure. Decoding error occurs if $\hat{\mathbf{X}}$ differs from the transmitted codeword. Let us now estimate the probability of decoding error or failure.
If $\mathbf{X}$ is the transmitted codeword, and $\hat{\mathbf{X}}$ is any other codeword, then $\hat{\mathbf{X}}$ and the output $\mathbf{Y}$ are independent by our random coding construction, so that $p(\hat{\mathbf{X}}, \mathbf{Y}) = p(\hat{\mathbf{X}})\, p(\mathbf{Y}) \approx 2^{-n(H(X) + H(Y))}$ if $\hat{\mathbf{X}}$ and $\mathbf{Y}$ are typical. Now, since there are about $2^{nH(X, Y)}$ jointly typical pairs, the probability that $\hat{\mathbf{X}}$ and $\mathbf{Y}$ are jointly typical is approximately $2^{nH(X,Y)}\, 2^{-n(H(X)+H(Y))} = 2^{-nI(X;Y)}$. By the union bound, the probability that some codeword other than the transmitted one is jointly typical with $\mathbf{Y}$ is at most
$$\left(2^{nR} - 1\right) 2^{-nI(X;Y)} \le 2^{-n(I(X;Y) - R)}, \qquad (6.36)$$
which tends to zero as $n \to \infty$, as long as $R < I(X; Y)$.
There are some other possible events that lead to decoding error that we also need to estimate (but that I omit here). However, the estimate (6.36) is the crux of the random coding argument for the "forward" part of the noisy channel coding theorem, which I now state below.
Theorem 6.2.1 (Channel coding theorem: achievability) (a) For a DMC with channel transition pmf p(y|x), we can use i.i.d. inputs with pmf p(x) to communicate reliably, as long as the code rate satisfies $R < I(X; Y)$. (b) Consequently, reliable communication is possible at any rate below the channel capacity $C = \max_{p(x)} I(X; Y)$, where the maximum is taken over input distributions p(x).
I omit detailed discussion and proof of the "converse" part of the channel coding theorem, which states that it is not possible to do better than the achievable rates promised by the preceding theorem.
Note that, while we considered discrete random variables for concreteness, the preceding discussion goes through unchanged for continuous random variables (as well as for mixed settings, such as when X is discrete and Y is continuous), by appropriately replacing entropy by differential entropy.
6.3 Some capacity computations
We are now ready to make some example capacity computations. In Section 6.3.1, I discuss capacity computations for guiding the choice of signal constellations and code rates on the AWGN channel. Specifically, for a given constellation, we wish to establish a benchmark on the best rate that it can achieve on the AWGN channel as a function of SNR. Such a result is nonconstructive, saying only that there is some error-correcting code which, when used with the constellation, achieves the promised rate (and that no code can achieve reliable communication at a higher rate). However, as mentioned earlier, it is usually possible with a moderate degree of ingenuity to obtain a turbo-like coded modulation scheme that approaches these benchmarks quite closely. Thus, the information-theoretic benchmarks provide valuable guidance on the choice of constellation and code rate. I then discuss the parallel Gaussian channel model, and its application to modeling dispersive channels, in Section 6.3.2. The optimal "waterfilling" power allocation for this model is an important technique that appears in many different settings.
6.3.1 Capacity for standard constellations
I now compute mutual information for some examples. We term the maximum mutual information attained under specific input constraints the channel capacity under those constraints. For example, we compute the capacity of the AWGN channel with BPSK signaling and a power constraint. This is, of course, smaller than the capacity of power-constrained AWGN signaling when there are no constraints on the input alphabet, which is what we typically refer to as the capacity of the AWGN channel.
Binary symmetric channel capacity Consider the BSC with crossover probability p as in Figure 6.3. Given the symmetry of the channel, it is plausible that the optimal input distribution is to send 0 and 1 with equal probability (see Section 6.4 for techniques for validating such guesses, as well as for computing optimal input distributions when the answer is not "obvious"). We now calculate $C = I(X; Y) = H(Y) - H(Y|X)$. By symmetry, the resulting output distribution is also uniform over {0, 1}, so that H(Y) = 1 bit. Conditioned on X = 0, the output Y equals 1 with probability p and 0 with probability 1 − p, so that
$$H(Y|X = 0) = -P(Y = 0|X = 0)\log_2 P(Y = 0|X = 0) - P(Y = 1|X = 0)\log_2 P(Y = 1|X = 0) = -p\log_2 p - (1 - p)\log_2(1 - p) = H_B(p),$$
where $H_B(p)$ is the entropy of a Bernoulli random variable with probability p of taking the value one. By symmetry, we also have $H(Y|X = 1) = H_B(p)$,
so that, from (6.32), we get
$$H(Y|X) = H_B(p).$$
We therefore obtain the capacity of the BSC with crossover probability p as
$$C_{BSC}(p) = 1 - H_B(p). \qquad (6.37)$$
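A quick numerical check of (6.37); the crossover values are illustrative.

```python
import math

def bsc_capacity(p):
    """C_BSC(p) = 1 - H_B(p) from (6.37), in bits/channel use."""
    if p in (0.0, 1.0):
        return 1.0                     # a deterministic (or relabeled) channel
    return 1 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.5):
    print(p, bsc_capacity(p))          # 1.0, ~0.531, 0.0
```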
AWGN channel capacity Consider the channel model (6.10), with the observation $Y = X + Z$, where the input satisfies $E[X^2] \le E_s$ and $Z \sim N(0, \sigma^2)$. We wish to compute the capacity
$$C = \max I(X; Y) = \max\left[h(Y) - h(Y|X)\right] = \max h(Y) - h(Z), \qquad (6.38)$$
where the maximum is over input distributions satisfying the power constraint, and we have used the fact that $h(Y|X) = h(Z)$ does not depend on the input distribution, so that maximizing mutual information is equivalent to maximizing h(Y).
Since X and Z are independent (the transmitter does not know the noise realization Z), we have $E[Y^2] = E[X^2] + E[Z^2] \le E_s + \sigma^2$. Subject to this constraint, it follows from Problem 6.3 that h(Y) is maximized if Y is zero mean Gaussian. This is achieved if the input distribution is $X \sim N(0, E_s)$, independent of the noise Z, which yields $Y \sim N(0, E_s + \sigma^2)$. Substituting the expression (6.25) for the entropy of a Gaussian random variable into (6.38), we obtain the capacity:
$$C = \frac{1}{2}\log_2\left(2\pi e (E_s + \sigma^2)\right) - \frac{1}{2}\log_2\left(2\pi e \sigma^2\right) = \frac{1}{2}\log_2\left(1 + \frac{E_s}{\sigma^2}\right) \ \text{bit/channel use}.$$
I now consider the capacity of the AWGN channel when the signal constellation is constrained.
Example 6.3.1 (AWGN capacity with BPSK signaling) Let us first consider BPSK signaling, for which we have the channel model
$$Y = X + Z, \qquad X \in \{-\sqrt{E_s}, +\sqrt{E_s}\}, \qquad Z \sim N(0, \sigma^2).$$
It can be shown (e.g., using the techniques to be developed in Section 6.4.1) that the mutual information I(X; Y), subject to the constraint of BPSK signaling, is maximized for equiprobable signaling. Let us now compute the mutual information I(X; Y) as a function of the signal power $E_s$ and the noise power $\sigma^2$. I first show that, as with the capacity without an input alphabet constraint, the capacity for BPSK also depends on these parameters only through their ratio, the SNR $E_s/\sigma^2$.
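Since the BPSK-constrained mutual information has no closed form, it is typically evaluated numerically. The sketch below is one such evaluation by Monte Carlo, written on the assumption of equiprobable inputs and the model above (with $\sigma^2$ normalized to 1, which is harmless because only the ratio $E_s/\sigma^2$ matters); the function name is mine.

```python
import numpy as np

def bpsk_awgn_capacity(snr_db, n=200_000, seed=0):
    """Monte Carlo estimate of I(X;Y), bits/channel use, for equiprobable
    BPSK X = +/- sqrt(Es) over Y = X + Z, Z ~ N(0, sigma^2), with sigma^2 = 1."""
    rng = np.random.default_rng(seed)
    snr = 10 ** (snr_db / 10)
    a = np.sqrt(snr)                               # amplitude sqrt(Es) when sigma = 1
    x = a * rng.choice([-1.0, 1.0], size=n)
    y = x + rng.standard_normal(n)
    # I(X;Y) = E[ log2( p(y|x) / p(y) ) ], with p(y) = 0.5*p(y|+a) + 0.5*p(y|-a);
    # the common Gaussian normalization constants cancel in the ratio.
    log_p_y_given_x = -0.5 * (y - x) ** 2
    log_p_y = np.log(0.5 * (np.exp(-0.5 * (y - a) ** 2) + np.exp(-0.5 * (y + a) ** 2)))
    return float(np.mean(log_p_y_given_x - log_p_y) / np.log(2))

for snr_db in (-5, 0, 5, 10):
    print(snr_db, round(bpsk_awgn_capacity(snr_db), 3))
# Saturates at 1 bit/channel use at high SNR, well below 0.5*log2(1+SNR) there.
```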