7.3 RELATIONSHIPS WITH CHANNEL CODING
There are several relationships between the channel coding theory of Chaps. 2 through 6 and rate distortion theory. Some of these appear simply because the same mathematical tools are applied in both theories, while others are of a more fundamental nature.
Suppose we no longer assume a noiseless channel and consider both source and channel block coding as shown in Fig. 7.6. Assume the discrete memoryless source emits a symbol once every $T_s$ seconds and that we have a source encoder and decoder for a source block code, $\mathscr{B}$, of block length $N$ and rate $R$ nats per symbol of duration $T_s$ seconds. For each $NT_s$ seconds, a minimum distortion codeword in $\mathscr{B} = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_M\}$ is chosen to represent the source sequence $\mathbf{u} \in \mathscr{U}^N$, and the codeword index $m$ is sent over the channel. Hence, once every $NT_s$ seconds, one of $M = e^{NR}$ messages is sent over the channel. We assume that the memoryless channel is used once every $T_c$ seconds and has a channel capacity of $C$ nats per channel use of duration $T_c$ seconds. The channel encoder and decoder use a channel block code, $\mathscr{C}$, of block length $N_c$ and rate $R_c$ nats per channel use, where^{10}

$$N_c = \frac{T_s}{T_c}\, N \qquad R_c = \frac{T_c}{T_s}\, R \tag{7.3.1}$$

so that $N_c R_c = NR$ and both codes index the same $M = e^{NR}$ messages.

10 It is not strictly necessary for the channel block length to satisfy (7.3.1), since the channel encoder can regard sequences of source encoder outputs as channel input symbols; that is, $N_c$ could be any multiple of its value as given by (7.3.1).
Figure 7.6 Combined source and channel coding. [Block diagram: source → source encoder ($m \in \{1, 2, \ldots, M\}$, $R$ nats every $T_s$ seconds) → channel encoder ($R_c$ nats every $T_c$ seconds) → channel → channel decoder ($\hat{m} \in \{1, 2, \ldots, M\}$, with $\Pr\{\mathscr{E}\} = \Pr\{\hat{m} \neq m\}$) → source decoder → destination.]
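To make the bookkeeping in (7.3.1) concrete, the following minimal Python sketch (the function name and example values are ours, purely illustrative) computes the channel code parameters matched to a given source code:

    import math

    def channel_code_params(N, R, Ts, Tc):
        """Channel block length and rate matched to a source code of block
        length N and rate R (nats per source symbol) by (7.3.1)."""
        Nc = (Ts / Tc) * N        # same time span: Nc*Tc = N*Ts
        Rc = (Tc / Ts) * R        # nats per channel use
        assert math.isclose(Nc * Rc, N * R)   # both index M = e^{NR} messages
        return Nc, Rc

    # Example: two channel uses per source symbol halves the rate per use.
    print(channel_code_params(N=100, R=0.5, Ts=1.0, Tc=0.5))   # (200.0, 0.25)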
Here let $\mathscr{E} = \{\hat{m} \neq m\}$ be the event that a channel message error occurs and let $\bar{\mathscr{E}} = \{\hat{m} = m\}$ denote its complement. The average distortion attained when using source code $\mathscr{B}$ and channel code $\mathscr{C}$ is

$$\bar{d}(\mathscr{B}, \mathscr{C}) = E\{d_N(\mathbf{u}, \mathbf{v}_{\hat{m}}) \mid \bar{\mathscr{E}}\} \Pr\{\bar{\mathscr{E}}\} + E\{d_N(\mathbf{u}, \mathbf{v}_{\hat{m}}) \mid \mathscr{E}\} \Pr\{\mathscr{E}\} \tag{7.3.2}$$

where the expectation $E\{\cdot\}$ is over both the source output random variables and the noisy channel outputs. When no channel errors occur, we have

$$d_N(\mathbf{u}, \mathbf{v}_{\hat{m}}) = d_N(\mathbf{u}, \mathbf{v}_m) = d(\mathbf{u} \mid \mathscr{B}) = \min_{\mathbf{v} \in \mathscr{B}} d_N(\mathbf{u}, \mathbf{v}) \tag{7.3.3}$$

From (7.2.1), we have $E\{d_N(\mathbf{u}, \mathbf{v}_{\hat{m}}) \mid \mathscr{E}\} \le d_0$, where $d_0$ is the bound on the distortion measure given in (7.2.1). Substituting this bound and

$$\Pr\{\bar{\mathscr{E}}\} \le 1 \tag{7.3.4}$$

in (7.3.2), we have

$$\bar{d}(\mathscr{B}, \mathscr{C}) \le E\{d(\mathbf{u} \mid \mathscr{B})\} + d_0 \Pr\{\mathscr{E}\} \tag{7.3.5}$$
From channel coding theory (Theorem 3.2.1), we know that there exists a channel code $\mathscr{C}$ such that the probability of a channel message error $\Pr\{\mathscr{E}\}$ is bounded by

$$\Pr\{\mathscr{E}\} \le e^{-N_c E(R_c)} \tag{7.3.6}$$

where $E(R_c) > 0$ for $0 \le R_c < C$.
Similarly, from Theorem 7.2.1, we know that there exists a source code $\mathscr{B}$ such that

$$d(\mathscr{B}) \le D + d_0\, e^{-NE(R, D)} \tag{7.3.7}$$

where $E(R, D) > 0$ for $R > R(D)$. Applying these codes to the combined source and channel coding scheme of Fig. 7.6 and substituting (7.3.6) and (7.3.7) in (7.3.5) yields the average distortion bound given in the following theorem.
Theorem 7.3.1 For the combined source and channel coding scheme of Fig. 7.6 discussed above, there exists a source code $\mathscr{B}$ of rate $R$ and block length $N$ and a channel code $\mathscr{C}$ of rate $R_c$ and block length $N_c$ satisfying (7.3.1) such that the average distortion is bounded by

$$\bar{d}(\mathscr{B}, \mathscr{C}) \le D + d_0\, e^{-NE(R, D)} + d_0\, e^{-(T_s/T_c) N E((T_c/T_s) R)} \tag{7.3.8}$$

where $E(R, D) > 0$ and $E((T_c/T_s)R) > 0$ for $R$ satisfying

$$R(D) < R < \bar{C} \tag{7.3.9}$$

where

$$\bar{C} = \frac{T_s}{T_c}\, C \tag{7.3.10}$$

is the channel capacity in nats per $T_s$ seconds.
As long as the rate distortion function is less than the channel capacity, $R(D) < \bar{C}$, we can achieve average distortion arbitrarily close to $D$. When $R(D) > \bar{C}$, this is impossible, as established by the following theorem.
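As a concrete instance of the condition $R(D) < \bar{C}$, consider a binary symmetric source with Hamming distortion, for which $R(D) = \ln 2 - \mathscr{H}(D)$, used over a BSC of crossover probability $p$ with one channel use per source symbol ($T_s = T_c$), so that $\bar{C} = C = \ln 2 - \mathscr{H}(p)$. The sketch below (ours; bisection on the strictly decreasing $R(D)$) finds the smallest distortion with $R(D) \le \bar{C}$, which here is exactly $D^* = p$:

    import math

    def Hnats(x):
        """Binary entropy function in nats."""
        if x <= 0.0 or x >= 1.0:
            return 0.0
        return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

    def min_achievable_distortion(p, iters=100):
        """Bisect for D* with ln2 - H(D*) = C = ln2 - H(p).  Theorem 7.3.1
        then gives average distortion arbitrarily close to any D > D*;
        Theorem 7.3.2 rules out any D < D*."""
        C = math.log(2.0) - Hnats(p)
        lo, hi = 0.0, 0.5                    # R(D) decreases from ln2 to 0 here
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if math.log(2.0) - Hnats(mid) > C:
                lo = mid                     # R(mid) still above capacity
            else:
                hi = mid
        return 0.5 * (lo + hi)

    print(min_achievable_distortion(0.1))    # ~0.1: D* equals the crossover p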
Theorem 7.3.2 It is impossible to reproduce the source in Theorem 7.3.1 with fidelity $D$ at the receiving end of any discrete memoryless channel of capacity $\bar{C} < R(D)$ nats per source letter.

PROOF The proof of this converse follows directly from the data-processing theorem (Theorem 1.2.1) and the converse source coding theorem (Theorem 7.2.3) (see Prob. 7.5).
The above converse theorem is true regardless of what type of encoders and decoders are used. In fact, they need not be separated as shown in Fig. 7.6, nor do they need to be block coding schemes, for Theorem 7.3.2 to be true. Since Theorem 7.3.1 is true for the block source and channel coding scheme of Fig. 7.6, we see that in the limit of large block lengths there is no loss of generality in assuming a complete separation of source coding and channel coding. From a practical viewpoint, this separation is desirable since it allows channel encoders and decoders to be designed independently of the actual source and user. The source encoder and decoder in effect adapt the source and user to any channel coding system which has sufficient capacity. As block length increases, the source encoder outputs become equally likely (asymptotic equipartition property) so that, in the limit of large block lengths, all source encoder outputs depend only on the rate of the encoder and not on the detailed nature of the source.
From Fig. 7.6, we see a natural duality between source and channel block coding. The source encoder performs an operation similar to the channel decoder, while the channel encoder is similar to the source decoder. Generally, in channel coding, the channel decoder is the more complex device, while in source coding the source encoder is the more complex device. We shall see in Sec. 7.4 that this duality also holds for trellis coding systems. Finally, we note that, although the source encoder removes redundancy from source sequences and channel encoding adds redundancy, these operations are done for quite different reasons. The source encoder takes advantage of the statistical regularity of long sequences of the source output in order to represent the source outputs with a limited rate $R(D)$. The channel encoder adds redundancy so as to achieve immunity to channel errors.
We next show an interesting channel coding interpretation for the source coding theorems of Sec. 7.2. For the general discrete memoryless source, representation alphabet, and distortion measure defined earlier, consider $\mathscr{P}_D = \{P(v \mid u): D(P) \le D\}$ for some fidelity $D$. For any $P \in \mathscr{P}_D$, define the channel transition probability for a discrete memoryless channel with input alphabet $\mathscr{V}$ and output alphabet $\mathscr{U}$ as

$$P(u \mid v) = \frac{q(u) P(v \mid u)}{\sum_{u'} q(u') P(v \mid u')} \tag{7.3.11}$$

This is sometimes referred to as the "backward test channel." Now consider any source code $\mathscr{B} = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_M\}$ of rate $R$ and block length $N$. We can regard $\{\mathbf{v}_0, \mathbf{v}_1, \ldots, \mathbf{v}_M\}$ as a channel code^{11} for the above backward test channel, as shown in Fig. 7.7. Assume that the codewords are equally likely, so that the maximum probability of correct detection, denoted $P_c(\mathbf{v}_0, \mathbf{v}_1, \ldots, \mathbf{v}_M)$, would be achieved by the usual maximum likelihood decoder. But suppose we use a suboptimum channel decoder with the decision rule: for given channel output $\mathbf{u} \in \mathscr{U}^N$,

$$\text{choose } \mathbf{v} \in \{\mathbf{v}_0, \mathbf{v}_1, \ldots, \mathbf{v}_M\} \text{ which minimizes } d_N(\mathbf{u}, \mathbf{v}) \tag{7.3.12}$$

Then for this suboptimum decoder, the probability of a correct decision is upper-bounded by
$$\frac{1}{M+1} \sum_{m=0}^{M} \Pr\Big\{d_N(\mathbf{u}, \mathbf{v}_m) < \min_{m' \neq m} d_N(\mathbf{u}, \mathbf{v}_{m'}) \,\Big|\, \mathbf{v}_m \text{ is sent}\Big\} \le P_c(\mathbf{v}_0, \mathbf{v}_1, \ldots, \mathbf{v}_M) \le e^{-N[E_0(\rho, P) - \rho R]} \qquad -1 \le \rho < 0 \tag{7.3.13}$$
where the last inequality follows from the strong converse to the coding theorem (Theorem 3.9.1). We now use (7.3.13) to show why, in the source coding theorem (Theorem 7.2.1; see also Lemma 7.2.1), the source coding exponent $E(R, D)$ is essentially the exponent in the strong converse to the coding theorem.

11 The vector $\mathbf{v}_0$ plays the same role as the dummy vector $\mathbf{v}_0$ in the proof of Lemma 7.2.1.

Figure 7.7 Backward test channel. [Block diagram: codeword $\mathbf{v}_m$ → backward test channel $P(u \mid v)$ → output $\mathbf{u}$ → minimum-distortion decoder.]
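The construction in (7.3.11) is just Bayes' rule applied to the joint distribution $q(u)P(v \mid u)$. A minimal sketch for finite alphabets (the array names and example values are ours):

    import numpy as np

    def backward_test_channel(q, P):
        """q: source distribution q(u), shape (U,).  P: test channel P(v|u),
        shape (U, V), rows summing to 1.  Returns P(u|v) of (7.3.11) as a
        (V, U) matrix: Bayes' rule on the joint q(u)P(v|u)."""
        joint = q[:, None] * P          # joint[u, v] = q(u) P(v|u)
        pv = joint.sum(axis=0)          # output marginal p(v) (assumed > 0)
        return (joint / pv).T           # row v of the result is P(. | v)

    q = np.array([0.5, 0.5])                    # binary symmetric source
    P = np.array([[0.9, 0.1],
                  [0.1, 0.9]])                  # a test channel with D(P) = 0.1 (Hamming)
    back = backward_test_channel(q, P)
    print(back)                                 # here equal to P, by symmetry
    print(back.sum(axis=1))                     # rows sum to 1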
We are primarily interested in the term $\Pr\{d_N(\mathbf{u}, \mathbf{v}_0) < \min_{1 \le m \le M} d_N(\mathbf{u}, \mathbf{v}_m) \mid \mathbf{v}_0 \text{ is sent}\}$, which may or may not be larger than $P_c(\mathbf{v}_0, \mathbf{v}_1, \ldots, \mathbf{v}_M)$. However, if we average (7.3.13) over the ensemble of codes $\{\mathbf{v}_0, \mathbf{v}_1, \ldots, \mathbf{v}_M\}$ in which all components are chosen independently according to $\{P(v): v \in \mathscr{V}\}$, we have^{12}

$$\overline{\Pr}\Big\{d_N(\mathbf{u}, \mathbf{v}_0) < \min_{1 \le m \le M} d_N(\mathbf{u}, \mathbf{v}_m) \,\Big|\, \mathbf{v}_0 \text{ is sent}\Big\} = \bar{P}_c(\mathbf{v}_0, \mathbf{v}_1, \ldots, \mathbf{v}_M) \le e^{-N[E_0(\rho, P) - \rho R]} \qquad -1 \le \rho < 0 \tag{7.3.14}$$

12 We again use the overbar to denote the code ensemble average. Symmetry gives the equality here.
which is exactly Lemma 7.2.1. Then, as in Sec. 7.2, for $P \in \mathscr{P}_D$ we have average distortion

$$\bar{d}(\mathscr{B}) \le D + d_0\, \overline{\Pr}\Big\{d_N(\mathbf{u}, \mathbf{v}_0) < \min_{1 \le m \le M} d_N(\mathbf{u}, \mathbf{v}_m) \,\Big|\, \mathbf{v}_0 \text{ is sent}\Big\} \le D + d_0\, e^{-N[E_0(\rho, P) - \rho R]} \tag{7.3.15}$$

where

$$\max_{-1 \le \rho \le 0} \left[E_0(\rho, P) - \rho R\right] > 0 \qquad \text{for } R > I(P)$$
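The left side of (7.3.14) is easy to estimate by simulation. The sketch below (all parameter values are illustrative, and the scenario is our choice) uses the binary symmetric source with Hamming distortion, for which the backward test channel induced by a symmetric test channel is again a BSC; it draws random codes with i.i.d. components and counts how often the dummy word $\mathbf{v}_0$ is strictly closer to the channel output than every other codeword:

    import numpy as np

    rng = np.random.default_rng(0)

    def estimate_lhs(N=16, R=0.4, eps=0.1, trials=1000):
        """Monte Carlo estimate of the left side of (7.3.14) for the binary
        symmetric source with Hamming distortion.  The backward test channel
        is then a BSC with crossover eps, and P(v) is uniform on {0, 1}."""
        M = int(np.exp(N * R))                       # M = e^{NR} codewords
        hits = 0
        for _ in range(trials):
            v0 = rng.integers(0, 2, N)               # dummy codeword ~ P(v)
            code = rng.integers(0, 2, (M, N))        # v_1 .. v_M, i.i.d. P(v)
            u = v0 ^ (rng.random(N) < eps)           # backward-channel output
            d0 = np.sum(u != v0)                     # Hamming count to v0
            dmin = np.min(np.sum(u != code, axis=1)) # closest competitor
            hits += int(d0 < dmin)                   # strict, as in (7.3.14)
        return hits / trials

    print(estimate_lhs())    # decays ~ e^{-N[E_0(rho,P) - rho R]} as N grows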
Here we see that the source coding theorem can be derived directly from the strong converse to the coding theorem due to Arimoto [1973] by applying it to the backward test channel corresponding to any $P \in \mathscr{P}_D$, as shown in Fig. 7.7. Since the strong converse to the coding theorem results in an exponent that is dual to the ensemble average error exponent, the source coding exponent is dual to the ensemble average error exponent.
Perhaps the least direct relationship between channel and source coding theories is the relationship between the low-rate expurgated error bounds of channel coding theory and the natural rate distortion function associated with the Bhattacharyya distance measure of the channel. In particular, suppose we have a DMC with input alphabet $\mathscr{X}$, output alphabet $\mathscr{Y}$, and transition conditional probabilities $\{p(y \mid x): y \in \mathscr{Y},\, x \in \mathscr{X}\}$. For any two channel input letters $x, x' \in \mathscr{X}$, we have the Bhattacharyya distance defined [see (2.3.15)] as

$$d(x, x') = -\ln \sum_y \sqrt{p(y \mid x)\, p(y \mid x')} \tag{7.3.16}$$
and we suppose that the channel input letters have a probability distribution $\{q(x): x \in \mathscr{X}\}$. Alternatively, for a source with alphabet $\mathscr{U} = \mathscr{X}$, probability distribution $\{q(x): x \in \mathscr{X}\}$, representation alphabet $\mathscr{V} = \mathscr{X}$, and the Bhattacharyya distance (7.3.16) as a distortion measure, we have a rate distortion function, which we denote $R(D; q)$. This leads us to define the natural rate distortion function for the Bhattacharyya distance (7.3.16) as

$$R(D) = \max_q R(D; q) \tag{7.3.17}$$
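The Bhattacharyya distance matrix (7.3.16) of any DMC is simple to compute. Here is a sketch (ours) that also checks the BSC case, where the off-diagonal entry is $-\ln\sqrt{4p(1-p)}$:

    import numpy as np

    def bhattacharyya_matrix(W):
        """W[x, y] = p(y|x), rows sum to 1.  Returns the matrix of (7.3.16):
        d[x, x'] = -ln sum_y sqrt(p(y|x) p(y|x'))."""
        S = np.sqrt(W) @ np.sqrt(W).T     # S[x, x'] = sum_y sqrt(p(y|x)p(y|x'))
        return -np.log(S)

    p = 0.05
    bsc = np.array([[1 - p, p],
                    [p, 1 - p]])
    d = bhattacharyya_matrix(bsc)
    alpha = -np.log(np.sqrt(4 * p * (1 - p)))
    print(d)                              # zero diagonal, alpha off-diagonal
    print(np.isclose(d[0, 1], alpha))     # True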
To show the relationship between $R(D)$ and the expurgated exponent for the DMC, let us consider the BSC with crossover probability $p$. Here $\mathscr{X} = \mathscr{Y} = \{0, 1\}$ and the distortion measure is

$$d(x, x') = \begin{cases} 0 & x = x' \\ -\ln \sqrt{4p(1-p)} & x \neq x' \end{cases} \tag{7.3.18}$$

Thus, letting $\alpha = -\ln \sqrt{4p(1-p)}$, we see that the Bhattacharyya distance is proportional to the Hamming distance. It is easy to show (Sec. 7.6) that

$$R(D) = \max_q R(D; q) = \ln 2 - \mathscr{H}\!\left(\frac{D}{\alpha}\right) \tag{7.3.19}$$

where $\mathscr{H}(\cdot)$ is the binary entropy function (in nats), and the corresponding source is the binary symmetric source (BSS).
Recall from Sec. 3.4 [see (3.4.8)] that, by the expurgated bound for the BSC, there exists a block code $\mathscr{C}$ of block length $N$ and rate $R$ such that

$$P_E \le e^{-N E_{ex}(R)} \tag{7.3.20}$$

where $D = E_{ex}(R)$ satisfies $R = R(D)$, and $R(D)$ is given by (7.3.19). Hence, we see that the natural rate distortion function for the BSC yields the expurgated exponent as the distortion level.
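Since $D = E_{ex}(R)$ satisfies $R = R(D)$ with $R(D)$ given by (7.3.19), the expurgated exponent of the BSC can be obtained by numerically inverting the natural rate distortion function. A sketch (ours, by bisection on the strictly decreasing right side of (7.3.19)):

    import math

    def Hnats(x):
        if x <= 0.0 or x >= 1.0:
            return 0.0
        return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

    def expurgated_exponent_bsc(R, p, iters=100):
        """Invert R = ln2 - H(D/alpha) for D, per (7.3.19); by (7.3.20) this D
        is the expurgated exponent E_ex(R).  Assumes 0 < R < ln 2."""
        alpha = -math.log(math.sqrt(4.0 * p * (1.0 - p)))
        lo, hi = 0.0, 0.5                     # delta = D/alpha lies in [0, 1/2]
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if math.log(2.0) - Hnats(mid) > R:
                lo = mid
            else:
                hi = mid
        return alpha * 0.5 * (lo + hi)        # D = alpha * delta

    print(expurgated_exponent_bsc(R=0.1, p=0.01))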
We can also prove the Gilbert bound discussed in Sec. 3.9 by using the above relationship with rate distortion theory. Let

$$d(N, R) = \max_{\mathscr{C}} d_{\min}(\mathscr{C}) \tag{7.3.21}$$

where

$$d_{\min}(\mathscr{C}) = \min_{m \neq m'} d_N(\mathbf{x}_m, \mathbf{x}_{m'}) \tag{7.3.22}$$

and where the maximization is over all codes of block length $N$ and rate $R$. Next let $\mathscr{C}^*$ be a code of block length $N$ and rate $R$ that achieves the maximum minimum distance with the fewest codeword pairs that attain the minimum distance $d(N, R)$. Hence

$$d(N, R) = d_{\min}(\mathscr{C}^*) \ge d(\mathbf{x} \mid \mathscr{C}^*) \qquad \text{for all } \mathbf{x} \in \mathscr{X}^N \tag{7.3.23}$$

where

$$d(\mathbf{x} \mid \mathscr{C}^*) = \min_{\mathbf{x}' \in \mathscr{C}^*} d_N(\mathbf{x}, \mathbf{x}')$$
This inequality follows from the fact that if there existed an $\mathbf{x}^* \in \mathscr{X}^N$ such that $d(\mathbf{x}^* \mid \mathscr{C}^*) > d_{\min}(\mathscr{C}^*)$, then by interchanging $\mathbf{x}^*$ with a codeword in $\mathscr{C}^*$ that achieves the minimum distance when paired with another codeword, there would result a new code with fewer pairs of codewords that achieve the minimum distance. This contradicts the definition of $\mathscr{C}^*$. With (7.3.23), we can now prove the Gilbert bound.
Theorem 7.3.3: Gilbert bound

$$d(N, R) \ge D \tag{7.3.24}$$

where

$$R = R(D) = \ln 2 - \mathscr{H}\!\left(\frac{D}{\alpha}\right)$$

and $D_H = D/\alpha$ is the corresponding Hamming distance.
PROOF $\mathscr{C}^*$ defined above is a code of rate $R$ which has average distortion $d(\mathscr{C}^*)$ satisfying

$$d(\mathscr{C}^*) = \sum_{\mathbf{x}} q_N(\mathbf{x})\, d(\mathbf{x} \mid \mathscr{C}^*) \le d(N, R) \tag{7.3.25}$$

where (7.3.23) is used in this inequality. Here we consider $\mathscr{C}^*$ as a source block code. The converse source coding theorem (Theorem 7.2.3) states that any source code $\mathscr{C}^*$ with distortion $d(\mathscr{C}^*)$ must have

$$R \ge R(d(\mathscr{C}^*)) \tag{7.3.26}$$

Since $D$ is given by (7.3.24), we must have $R(D) = R \ge R(d(\mathscr{C}^*))$. Then, since $R(D)$ is a strictly decreasing function of $D$ on $0 \le D \le \alpha/2$, we have

$$d(N, R) \ge d(\mathscr{C}^*) \ge D$$
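Dividing (7.3.24) by $\alpha$ gives the familiar Gilbert guarantee on the per-letter Hamming distance: binary codes of rate $R$ exist with normalized minimum distance at least $D_H$, where $R = \ln 2 - \mathscr{H}(D_H)$. A sketch (ours) evaluating $D_H$ for a given rate, mirroring the bisection used for the expurgated exponent above:

    import math

    def Hnats(x):
        if x <= 0.0 or x >= 1.0:
            return 0.0
        return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

    def gilbert_distance(R, iters=100):
        """D_H in [0, 1/2] with R = ln2 - H(D_H): by Theorem 7.3.3, binary
        codes of rate R (nats per symbol) exist whose per-letter minimum
        Hamming distance is at least D_H."""
        lo, hi = 0.0, 0.5
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if math.log(2.0) - Hnats(mid) > R:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    # Rate 1/2 bit per symbol = 0.5 ln2 nats: relative distance at least ~0.11
    print(gilbert_distance(0.5 * math.log(2.0)))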
The results for the BSC generalize to all DMCs, when we use the Bhattacharyya distance, if, for the parameter $s$ such that $D = D_s$, the matrix $[e^{s\, d(x, x')}]$ is positive definite (see Jelinek [1968b] and Lesh [1976]). This positive definite condition holds for all $s < 0$ in most channels of interest. This shows that, for an arbitrary DMC, the Bhattacharyya distance is the natural generalization of the Hamming distance for binary codes used over the BSC, and a generalized Gilbert bound analogous to Theorem 7.3.3 can be found (see Probs. 7.8 and 7.9).
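The positive-definiteness condition on $[e^{s\, d(x, x')}]$ is straightforward to test numerically for any given channel. The sketch below (ours, with an illustrative symmetric ternary channel) reuses the Bhattacharyya matrix computed in the earlier snippet:

    import numpy as np

    def bhattacharyya_matrix(W):
        S = np.sqrt(W) @ np.sqrt(W).T     # as in the earlier sketch
        return -np.log(S)

    def exp_matrix_is_pos_def(W, s):
        """Test whether [exp(s * d(x, x'))] is positive definite."""
        B = np.exp(s * bhattacharyya_matrix(W))
        return bool(np.all(np.linalg.eigvalsh(B) > 0))   # B is symmetric

    # An illustrative symmetric ternary channel, tested for several s < 0:
    W = np.array([[0.90, 0.05, 0.05],
                  [0.05, 0.90, 0.05],
                  [0.05, 0.05, 0.90]])
    print([exp_matrix_is_pos_def(W, s) for s in (-0.1, -1.0, -5.0)])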