IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 3, MARCH 2010 1347

A Limited Feedback Joint Precoding for Amplify-and-Forward Relaying

Yongming Huang, Luxi Yang, Member, IEEE, Mats Bengtsson, Senior Member, IEEE, and Björn Ottersten, Fellow, IEEE

Abstract—This paper deals with the practical precoding design for a dual hop downlink with multiple-input multiple-output (MIMO) amplify-and-forward relaying. First, assuming that full channel state information (CSI) of the two hop channels is available, a suboptimal dual hop joint precoding scheme, i.e., precoding at both the base station and relay station, is investigated. Based on its structure, a scheme of limited feedback joint precoding using joint codebooks is then proposed, which uses a distributed codeword selection to concurrently choose two joint precoders such that the feedback delay is considerably decreased. Finally, the joint codebook design for the limited feedback joint precoding system is analyzed, and results reveal that independent codebook designs at the base station and relay station using the conventional Grassmannian subspace packing method are able to guarantee that the overall performance of the dual hop joint precoding scheme improves with the size of each of the two codebooks. Simulation results show that the proposed dual hop joint precoding system using the distributed codeword selection scheme exhibits a rate or BER performance close to that of the optimal centralized codeword selection scheme, while having lower computational complexity and shorter feedback delay.

Index Terms—Amplify-and-forward relaying, dual hop, Grassmannian codebook, joint precoding, limited feedback, multiple-input multiple-output.

Manuscript received November 23, 2008; accepted September 09, 2009. First published November 06, 2009; current version published February 10, 2010. This work was supported in part by the National Basic Research Program of China by Grant 2007CB310603, the National Natural Science Foundation of China by Grants 60902012 and 60672093, the National High Technology Project of China by Grant 2007AA01Z262, the Ph.D. Programs Foundation of the Ministry of Education of China under Grant 20090092120013, the European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013)/ERC Grant agreement no. 228044, and by the Huawei Technologies Corporation. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Shahram Shahbazpanahi.

Y. Huang is with the School of Information Science and Engineering, Southeast University, Nanjing 210096, China. He is also with the ACCESS Linnaeus Center, KTH Signal Processing Lab, Royal Institute of Technology, SE-100 44 Stockholm, Sweden (e-mail: huangym@seu.edu.cn).

L. Yang is with the School of Information Science and Engineering, Southeast University, Nanjing 210096, China (e-mail: lxyang@seu.edu.cn).

M. Bengtsson is with the ACCESS Linnaeus Center, KTH Signal Processing Lab, Royal Institute of Technology, SE-100 44 Stockholm, Sweden (e-mail: mats.bengtsson@ee.kth.se).

B. Ottersten is with the ACCESS Linnaeus Center, KTH Signal Processing Lab, Royal Institute of Technology, SE-100 44 Stockholm, Sweden. He is also with securityandtrust.lu, University of Luxembourg (e-mail: bjorn.ottersten@ee.kth.se).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSP.2009.2036061

I. INTRODUCTION

THE introduction of relaying technology in cellular networks shows great promise to increase coverage and system capacity at a low cost and is therefore considered in IMT-Advanced standardization work such as 3GPP LTE-Advanced and IEEE 802.16m. The same holds for multiple-input multiple-output (MIMO) technology [1]–[7] and its application in multiuser environments [8]–[14]. As for the combination of MIMO and relaying technology, most previous studies focus on the information theoretic limits for multi-antenna
relay channels with different protocols. Capacity bounds of relaying channels in a single MIMO relay network have been developed in [15], where a regenerative MIMO relay is considered. For the multiple MIMO relay network, an asymptotic quantitative capacity result is presented in [16], where distributed diversity is achieved through cooperation among all the nonregenerative relays available in the network.

This paper focuses on practical signalling design for a dual hop transmission with a MIMO relay. Although regenerative relays employing decode-and-forward (DF) show advantages over nonregenerative relays using amplify-and-forward (AF) in many scenarios, DF requires much higher delay tolerance and may cause security problems; we therefore concentrate on the AF MIMO relaying strategy. For dual hop transmission with a single MIMO AF relay station, the optimal linear transceiver design at the relay-destination link has been developed in [17], [18], assuming that the channel state information (CSI) of both the source-relay and relay-destination links is available at the relay station. It is revealed that such a dual hop transmission can be transformed into several simultaneous data streams transmitted over orthogonal subchannels. In the case of multiple AF relay stations, a relay selection scheme is presented in [19] to exploit the additional diversity offered by the multiple relay stations available in the network, where the preferred relay station is chosen as a function of CSI to implement a dual hop transmission. Moreover, assuming that the CSI of all the links is available, a quasi-optimal joint design of linear transceivers at both the source-relay and the relay-destination links is developed in [20] and [21], which achieves very good performance at the cost of high computational complexity. Note that the above dual hop transmit schemes all require full CSI of both hop channels and are unfortunately infeasible in practical frequency division duplex (FDD)
systems, though they provide considerable performance gains. To overcome this problem, a limited feedback beamforming scheme for MIMO AF relaying was proposed in [22], which employs a Grassmannian codebook to reduce the feedback overhead. It can even be extended to the case where the second order statistics of the channel vectors are used instead of the limited instantaneous channel knowledge. However, this scheme is limited to the beamforming case, and its extension to the precoding case (multiple simultaneous data streams) is nontrivial. Restricting the transmission to beamforming usually results in a rate performance loss, especially when all the nodes are equipped with multiple antennas, because the multiplexing gain offered by MIMO channels cannot be fully exploited.

In this paper we aim to design a practical dual hop transmit scheme which can fully exploit the multiplexing gains offered by multiple antennas. More specifically, we propose a limited feedback joint precoding scheme using the criterion of optimizing the system rate or the BER performance, where the reduction of both feedback overhead and feedback delay is fully considered. The main contributions are listed as follows:

1) We first present a CSI based suboptimal joint precoding scheme for a dual hop downlink with AF, where the overall dual hop MIMO channels can be effectively transformed into several orthogonal subchannels by using the optimal pairing between the eigenmodes of the dual MIMO channels. Based on this, we then propose a codebook based limited feedback joint precoding scheme, where a distributed codeword selection (CS) scheme is further proposed based on the newly derived bounds for the capacity and the mean square error (MSE) sum of a dual hop MIMO transmission with a linear minimum mean square error (MMSE) receiver, such that the feedback burden and feedback delay are both greatly reduced.

2) Furthermore, we
investigate the codebook design for the proposed limited feedback joint precoding scheme, and disclose that if the conventional method of Grassmannian subspace packing is separately employed to construct the codebooks at the base station and relay station, the overall performance of the dual hop transmit scheme can be guaranteed to improve with the size of each of the two codebooks The rest of this paper is organized as follows In the next section we introduce the system model for the dual hop joint precoding In Section III we investigate the expression of the optimal joint precoders based on full CSI, and provide a suboptimal joint precoding scheme which can reduce to a limited feedback scheme In Section IV we first present a codebook based joint precoding system using a centralized codeword selection scheme, and then propose a distributed codeword selection scheme to reduce computational complexity and feedback delay In Section V we analyze the design criterion of the joint codebooks used in the dual hop precoding system Simulation results are presented in Section VI and conclusions are drawn in Section VII II SYSTEM MODEL We consider a dual hop downlink model which consists of a base station and a relay station transmitting through two time anslots We assume that the base station is equipped with tennas, the relay station is equipped with antennas and the user antennas As depicted in Fig 1, terminal is equipped with during the first slot, the base station employs linear precoding to transmit simultaneous data streams, i.e., a data vector , to the relay station Without loss of generality, we assume , with denoting the expectation operator The received baseband signal at the relay station is written as (1) Fig The signal model for the dual hop joint precoding system where denotes the precoding matrix at the base station, without loss of generality, we assume with being the trace operator, denotes the first hop channel matrix between the base station and the relay 
denotes the total transmit power at the base station station, denotes a white Gaussian noise vector with zero mean and and variance Keeping in mind that a multiuser downlink can be transformed into several single-user downlinks by employing multiple access techniques such as TDMA and OFDMA, here we concentrate on the single-user dual hop downlink Moreover, we focus on relay deployments intended for coverage expansion, where the direct link between the base station and the user terminal can be neglected due to path loss or severe shadowing To succeed a downlink communication between the base station and the user terminal, during the second slot the relay station will forward its received signal using a linear that has to be designed With precoding matrix at the relay station, should the transmit power constraint satisfy that (2) The received baseband signal at the user terminal during this time slot is written as (3) where denotes the second hop channel matrix bedenotes a tween the relay station and the user terminal, and white Gaussian noise vector with zero mean and variance Note that in the above system model we can normalize the variand , and have the effects of large scale ances of both and The fading incorporated into the noise variances of key point of the above dual hop joint precoding system lies in and , which commonly rethe design of two precoders quires channel information feedback in FDD systems Also, the number of simultaneous data streams should be carefully determined It is well known that a MIMO channel transmit antennas and receive antennas can be with orthogonal subtransformed into a maximum of channels via singular value decomposition (SVD) The simuldata streams over ortaneous transmission of thogonal subchannels can fully utilize the multiplexing gain and is thereby capacity-approaching, while the scheme of always transmitting a single data stream in general cannot achieve the HUANG et al.: A LIMITED FEEDBACK JOINT PRECODING 1349 potential rate 
offered by MIMO channels, due to the fact that the multiplexing gain cannot be fully exploited in this case This result can be easily extended to the dual hop MIMO transmission Considering that the overall performance of the dual hop downlink is dominated by the worse one of the two hops, it is reasonable to choose the number of simultaneous data streams in our if possible, instead of always system equal to using a single data stream regardless of antenna configuration, such that the overall rate performance can be optimized , , , and are unitary matrices, where and are diagonal matrices with their elements and , respectively Obvibeing the singular values of and (and ously, the ordering of the singular values in , the corresponding ordering of the singular vectors in , 2) influences the specific decomposition expressions Here we first assume an arbitrary ordering and leave its optimization to be solved later By substituting (6) in (4) the MSE matrix can be rewritten as III JOINT PRECODING WITH FULL CSI This section concentrates on the design of two joint precoders assuming that full channel state information of the two hops is available In difference to the previous related work which aims at the optimal performance by using an iterative approach, we are more interested in the suboptimal scheme which has a simple structure and can provide some insight on the design of a limited feedback joint precoding scheme We consider an MMSE receiver at the user terminal, as shown in [17], [18], the MSE matrix for the dual hop joint precoding can be written as (4), shown at the top of the next page (7) The diagonalization of can be obtained by (8) (9) where denotes the submatrix formed by the first columns , and are two diagonal matrices with nonnegative of and , elements denoted as , , and as respectively We partition the matrices (10) , and all belong to , , , and By substituting (8)–(10) in (7), the MSE matrix can be simplified as , where (4) (11) and as express and , 
respectively, the achieved sum rate can be easily derived as Then, The sum rate achieved by an MMSE receiver is upper bounded by the instantaneous capacity , which can be expressed as [17], [23] if we further (12) (5) denotes the th diagonal element of , dewhere notes of the determinant of , the factor 0.5 is due to the two channel uses which are needed by a dual hop downlink, and will be omitted henceforth for convenience Obviously, the equality is diagonal, which means that the capacity in (5) holds when is achieved by an MMSE receiver in this case Therefore, the and should first satisfy the condition that the design of and MSE matrix is diagonalized [19] Let the SVD of be (6) It is shown that with the above joint precoding, the overall dual hop channel can be transformed into orthogonal subchannels, with their channel gains each represented by the product of a and , while the diagonal matrices pair of eigenmodes and can be viewed as the power allocation for the joint precoding Since does not influence the sum rate, it should be set to zero to avoid wasting power The resulting precoding matrix at the relay station is (13) where and denote the submatrices formed by the first columns of and , respectively Aiming to maximize the sum rate of the dual hop transmission, we need to optimize the 1350 power allocation matrices optimization problem: IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL 58, NO 3, MARCH 2010 and by solving the following (14) where the two constraints are obtained from the power constraints at the base station and the relay station Specifically, the first constrain is obtained by substituting (8) in , while the second constraint is obtained by substituting (8) and (9) in (2) We would like to note here that this optimization should be done over the optimal ordering of the singular and , since different ordering will values at the SVD of Defining new notations give different values of , , , and , by replacing the notations in (14) with the newly defined 
notations, the above optimization problem can be simplified as (15) It is clear from the above steps that the ordering of singular and influences both the sum rate values at the SVD of and the specific expression of the optimal joint precoders Thus, the joint optimal ordering of singular values at the SVD of and needs to be addressed It is seen from (12) that only singular values of each hop, i.e., eigenmodes of each hop, affect the sum rate Therefore, the problem reduces to the optimal selection of active eigenmodes from each hop followed by the optimal pairing of active eigenmodes between the two hops Since the sum rate expressed in (12) monotonically inand the eigenmode , creases with both the eigenmode eigenmodes from each the scheme of selecting the largest hop will give a maximum sum rate Moreover, it is found from (15) that the eigenmode pairing problem is equivalent to the subchannel pairing problem of the dual hop MIMO-OFDM systems in [20] The results in [20] showed that it is optimal to pair the active eigenmodes of the first hop ordered in with the active eigenmodes of the second hop or, which means that the dered in and optimal joint precoders should be given by the SVD of both having its singular values arranged in a nonincreasing order For notation simplicity, henceforth the SVD expressions and refer to a nonincreasing ordering of singular of values It should be noted that although (8) and (13) provide a simple expression for the optimal joint precoders, the closed-form and solution for the included power allocation matrices are difficult to obtain Hammerström et al [20] showed that the optimization problem in (15) cannot be exactly solved but its quasi-optimal solution can be obtained using an iterative method, and the optimal power allocation schemes at both the base station and relay station are similar to the waterfilling scheme in point-to-point MIMO systems Since it is well known that an uniform power allocation (UPA) in general only suffers 
from slight performance loss compared to the optimal waterfilling scheme, while having lower cost and reduced feedback burden in FDD systems, we will use UPA to form a suboptimal joint precoding scheme Next we will show that such a UPA based dual hop joint precoding scheme can reduce to a practical limited feedback joint precoding scheme IV LIMITED FEEDBACK PRECODING By employing UPA, it is seen from (8), (13) that the joint and can be precoders with full channel knowledge of simplified as (16) (17) where is a common scaling to fulfill the transmit power constraint at the relay station Since it is reasonable to assume that is available at the relay station and available at the user terminal, the above joint precoding solution requires the to the base station and to the relay stafeedback of tion In order to reduce the feedback burden, we use two codeand , such that, similar to the precoding books to quantize for point-to-point MIMO systems, only the indices of the preferred codewords are required to be fed back to the base station and relay station, respectively However, the extension of point-to-point precoding to a dual hop transmission is nontrivial and the following problems need to be addressed and depend on and , 1) Though the optimal respectively, the codebook based choice of the precoder at the base station or the relay station is in general a function and In practical FDD systems, however, of both only the user terminal may know the channel of both two hops without feedback If both two precoders are selected by the user, it will suffer from a severe feedback delay due to the fact that the communication between the base station and the user terminal has to be forwarded by the relay station Therefore, the precoder selection and feedback scheme should be carefully designed to reduce the feedback delay 2) The criterion for precoding codebook design has been widely studied in point-to-point MIMO communication systems However, it is an open problem whether these 
developed codebook design criteria can be directly employed in the dual hop joint precoding systems In order to address the first problem, we first present a centralized codeword selection scheme which provides the optimal performance but a high feedback delay Then, we propose a suboptimal distributed codeword selection scheme where feedback delay and complexity are both greatly reduced A Centralized Codeword Selection We employ precoding according to (16) and (17) and assume and have been designed and dethat two codebooks for and , respectively In order to maximize the canoted as HUANG et al.: A LIMITED FEEDBACK JOINT PRECODING pacity expressed in (5), the codeword selection for can be written as 1351 and and terminal needs), the distributed codeword selection for should be merely based on and , respectively, such that the feedback overhead and feedback delay can be considerably reduced To this end, a new objective function, either from the capacity or the error rate perspective, should be designed In this section we will derive bounds for the capacity and the MSE-trace, and then use them as the objective functions with its SVD expression, the MSE matrix By replacing in (4) can be simplified as (18) Alternatively, considering that the minimization of the trace of MSE matrix means to some degree the optimization of the error rate performance of an MMSE receiver, an MSE-trace selection scheme aiming to minimize the error rate may be employed and is expressed as (20) Based on this, the capacity of the dual hop transmission can be lower bounded by (19) (21) Obviously, the codeword selection either from the sum rate and , or the error rate perspective is a function of both which requires the selection operator to know full CSI of both two hops, and thereby is called a centralized codeword selection scheme Due to the fact that each calculation of the objective function includes one or two matrix inversions, this centralized selection scheme requires a high computational 
complexity Moreover, since full knowledge of the two hop channels may only be available at the user terminal without feedback in and practical FDD systems, the codeword selection for should be both conducted by the user Unfortunately, the feedfrom the user terminal to the base back of selection result for station has to be forwarded by the relay station, which results in a high delay where , , , are the eigenvalues of the Hermitian matrix and arranged in a nonincreasing order For a proof, refer to Appendix A Note that this capacity lower bound increases with both and , namely, the lower bound increases if is increased, for any value of , or increases if is increased, for any value of Since and merely depend on and respectively, the and following distributed codeword selection scheme for , will maximize the lower bound of the capacity , B Distributed Codeword Selection In order to reduce the feedback latency, we propose a distributed codeword selection scheme where the codeword selecand can be concurrently conducted by the relay tions for station and the user terminal, respectively Since in practical can be available at the relay station without systems only can be easily available at the user terfeedback, while only should be fed forward by the relay station if the user minal ( (22) In order that the proposed distributed codeword selection scheme can minimize the error rate of the dual hop transmission, we derive two upper bounds for the MSE trace Both and , and can decrease with two decoupled functions of 1352 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL 58, NO 3, MARCH 2010 be utilized as the codeword selection criteria Based on (20), upper bounds of the MSE trace can be expressed as (23) (24) See Appendix B for proofs Obviously, minimization of the upper bound in (23) is equivalent to the maximization of the lower bound in (21) Thus, the distributed codeword selection scheme of (22) also works from the perspective of minimizing the error rate In addition, since 
the upper bound in (24) is and , an alternative formed by a sum of two functions of distributed codeword selection scheme, to optimize the error rate performance, is given as (25) V CODEBOOK DESIGN CRITERIA We have derived codeword selection schemes for the dual hop joint precoding system, and it is important that the codebook and are designed specifically for the chosen sepair of lection schemes Love et al [5] have shown that the criterion of maximizing the minimum Grassmannian subspace distance between any pair of codewords is quasi-optimal for point-to-point precoding systems In dual hop precoding systems using the proposed distributed codeword selection scheme, our following and using the analysis shows that a separate design for conventional Grassmannian subspace packing method is able to guarantee that the overall performance increases with the size of each of the two codebooks To define a notion of an optimal codebook, we need a distortion measure with which to measure the average distortion It is seen from (21), (23), and (24) that when the term is maximized, the lower bound of capacity will be maximized, and the upper bound of MSE trace will be minimized as well Thus, we utilize this term as a performance metric and define the following error difference: C Distributed Beamforming Selection In general, the proposed distributed codeword selection schemes for the joint precoding system are able to reduce both the overall feedback delay and the computational complexity, while they may suffer from a performance loss compared to the centralized selection scheme, due to the fact that the employed selection objective functions are not the exact capacity or the MSE trace, but their bounds However, our following brief analysis shows that the proposed distributed selection scheme in the special case ) will of beamforming (it happens when suffer from no performance loss as compared with the centralized one, which is consistent with the result found in [22], though 
different analyzing methods are used For the beamforming case, the MSE matrix in (20) reduces into a scalar and can be written as (26) and reduce to As now both scalars, their eigenvalues are equal to themselves Also, it follows from (2), (16), and (17) that (29) which is nonnegative for any choices of and , since the first term is the performance metric obtained by the and Furthermore, we will design optimal precoders of our codebook pair to minimize the average distortion (30) where denotes the expectation with respect to and If we define the minimum distances of the codebook pair as , (31) (27) namely, the so-called projection two-norm distance between two subspaces is employed, the average distortion can be upper bounded as Substituting (27) in (26), yields (28) where and It can be easily derived and that the MSE is minimized when both are maximized, which means that the proposed distributed codeword selection schemes are optimal from the perspective of both the capacity and the error rate (32) HUANG et al.: A LIMITED FEEDBACK JOINT PRECODING 1353 where and denote the sizes of the codebooks and , respectively For a proof, refer to Appendix C Similar to the and conclusion in [5], assuming that , we always have that the average disand Thus, we can design tortion is decreased with both and separately, with each codebook the codebook pair constructed to maximize the minimum projection two-norm distance between any pair of codewords VI SIMULATION RESULTS Monte Carlo simulations are performed to illustrate the performance of the proposed dual hop joint precoding system with distributed and centralized codeword selection schemes A block fading flat MIMO channel model is used throughout and the simulations The two hop channel matrices are both assumed to have entries independently and identi, with the large scale factors of cally distributed with channels incorporated into the effective noise variances The , antenna configurations are focused on and The Grassmannian 
codebook provided in [24] is employed in our simulations, and we use the same codebook at the base station and relay station, with its size shown in the figures in terms of the number of feedback bits. The average SNRs at the relay station and the user terminal are defined as and , respectively.

For comparison, some optimal or suboptimal dual hop precoding systems based on full channel state information are also simulated. Hereinafter, the joint optimal scheme denotes the precoding system in (8) and (13); the suboptimal scheme denotes the precoding system in (16) and (17) with uniform power allocation; and the relay side optimal scheme denotes the system in [17], [18], where only the precoding matrix at the relay station is optimized based on full CSI and the rate performance is calculated as the information theoretic instantaneous capacity of an equivalent open-loop MIMO system. Note that in the case of , the joint optimal precoding cannot be solved analytically, since the objective function in (15) is not concave with respect to . Here we use the alternating optimization method presented in [20] to find a global or local optimum, repeat it with 50 randomly generated starting vectors, and use the maximum one in the comparison.

A. Dual Hop Joint Beamforming

This section focuses on the configurations and . Since the receiver is equipped with only a single antenna, joint beamforming, i.e., , should be employed. As disclosed in Section IV-C, in this case the proposed distributed codeword selection scheme does not result in any performance loss compared with the centralized codeword selection scheme, and it reduces to the same scheme as the one presented in [22]. Fig shows that the proposed dual hop joint beamforming scheme using distributed CS exhibits a slight rate loss compared with the full CSI based joint optimal beamforming scheme, especially for the case of . Fig illustrates the cumulative distribution function of the rate achieved by the dual hop joint
beamforming; the results also show a slight gap between the proposed limited feedback joint beamforming scheme and the joint optimal scheme. Fig illustrates the BER performance of the proposed dual hop joint beamforming using QPSK modulation. Similar results are also observed.

Fig. The rate of the dual hop joint beamforming system with (M, L, N) = (4 1) and (M, L, N) = (2 1), 15 dB SNR at the relay station.

Fig. The cumulative distribution functions of the rate achieved by the proposed dual hop joint beamforming system, with (M, L, N) = (4 1) and (M, L, N) = (2 1), 15 dB SNR at both the receiver and relay station.

Fig. The BER of the proposed dual hop joint beamforming system with (M, L, N) = (4 1) and (M, L, N) = (2 1), 15 dB SNR at the relay station.

B. Dual Hop Joint Precoding

This section focuses on the configurations and . Fig shows the sum rate of the dual hop joint precoding system using two different codeword selection schemes. It is seen that the performances of the dual hop joint precoding schemes using distributed and centralized CS both increase with the codebook size. Compared with the centralized CS, the distributed CS suffers from a slight rate loss. This is because the distributed CS is based on a bound rather than an exact rate metric. However, the distributed CS has a shorter feedback delay and requires much lower computational complexity. Also, it is reasonable to see that even the scheme using centralized CS has a gap from the full CSI based suboptimal scheme, due to the quantization of the optimal joint precoders.

Moreover, the results reveal that the proposed joint precoding scheme with distributed CS shows an obvious advantage over the beamforming scheme presented in [22] in terms of the rate performance, especially in medium-to-high SNR regions. This is because our proposed precoding scheme employs multiple simultaneous data streams and thus can fully exploit the multiplexing gain offered by the dual hop MIMO channels. Fig shows the cumulative distribution function of the rate achieved by the dual hop joint precoding system. Similar results are seen as in Fig .

Interestingly, it is also found from Fig and Fig that the full CSI based suboptimal scheme with UPA shows a slight performance loss compared with the joint optimal scheme, and only in the medium-to-high SNR range. In addition, the relay side optimal scheme shows the worst performance among the three full CSI based schemes, especially in the high SNR region, because the precoder at the base station is not optimized. It should also be noted that, although the curves suggest that the relay side optimal scheme outperforms the proposed scheme in most of the SNR region, this is the result of an unfair comparison: the performance of the relay side optimal scheme is calculated as the instantaneous capacity, not as the sum rate achieved by an MMSE receiver.

Fig shows the BER performance of the dual hop joint precoding scheme using QPSK and an MMSE receiver. Both proposed distributed CS schemes, i.e., (22) and (25), are simulated. It is seen that the BER performances of these two schemes (denoted as distributed CS #1 and #2) are very close, and they both improve with the codebook size. Compared with the centralized CS scheme, a loss of less than dB is observed in the two proposed distributed CS schemes.

Fig. The rate of the dual hop joint precoding system with (M, L, N) = (4 2) and 15 dB SNR at the relay station.

Fig. The cumulative distribution functions of the rate achieved by the proposed dual hop joint precoding system, with (M, L, N) = (4 2), 15 dB SNR at both the receiver and relay station.

Fig. The BER of the proposed dual hop joint precoding system with (M, L, N) = (4, 4, 2) and 15 dB SNR at the relay station.

VII. CONCLUSION

In this paper we have presented a limited feedback joint precoding scheme for the dual hop downlink with amplify-and-forward relaying. The proposed
scheme employs a distributed codeword selection and thus has lower computational complexity and feedback delay. Also, we have analyzed the joint codebook design for the joint precoding system, and revealed that a separate codebook design for the base station and the relay station using the Grassmannian subspace packing method can guarantee that the overall performance of the proposed scheme improves with the size of each of the two codebooks. Finally, computer simulations have confirmed the advantage of the proposed scheme in terms of the tradeoff between performance and complexity, as compared with the limited feedback joint precoding with a centralized codeword selection.

APPENDIX A
PROOF OF (21)

We first present the following matrix inequalities [25]: Given two positive semidefinite Hermitian matrices with eigenvalues arranged in nonincreasing order, respectively, we have

(33)

Since the matrix determinant equals the product of the eigenvalues, the capacity of the dual hop transmission with the two precoders can be rewritten as

(34)

By applying the inequality in (33), this yields

(35)

Assuming that the relay station transmits its signal with full power, it is derived from (2) that

(36)

Thus, we further have

(37)

This concludes the proof.

APPENDIX B
PROOF OF (23) AND (24)

We first prove the first upper bound of the MSE trace in (23):

(38)

where the inequality in (a) comes from the lower bound derived in Appendix A. Similar to the above derivation, the second upper bound of the MSE trace in (24) can be obtained as follows:

(39)

This concludes the proof.

APPENDIX C
PROOF OF (32)

Before the proof of (32), we first give the following inequality. Given three arbitrary nonnegative variables, we have [22, Lemma 1]

(40)

With that, the average distortion can now be upper bounded as shown in (41), where the inequality is a result of direct use of (40).

(41)

Based on the results in [5, eqs. (29), (30)], the two terms on the right-hand side (RHS) can be further upper bounded as

(42)

(43)

Thus, the upper bound of the average distortion can be modified as

(44)

This concludes the proof.

ACKNOWLEDGMENT

The authors would like to thank all the anonymous reviewers and the editor for their valuable comments that have helped to improve the quality of this paper.

REFERENCES

[1] G. J. Foschini and M. J. Gans, “On limits of wireless communications in a fading environment when using multiple antennas,” Wireless Pers. Commun., vol. 6, no. 3, pp. 311–335, 1998.
[2] G. J. Foschini, “Layered space-time architecture for wireless communication in a fading environment when using multielement antennas,” Bell Labs Tech. J., vol. 1, no. 2, pp. 41–59, Aug. 1996.
[3] A. Scaglione, P. Stoica, S. Barbarossa, G. B. Giannakis, and H. Sampath, “Optimal designs for space-time linear precoders and decoders,” IEEE Trans. Signal Process., vol. 50, no. 5, pp. 1051–1064, May 2002.
[4] H. Sampath, P. Stoica, and A. Paulraj, “Generalized linear precoder and decoder design for MIMO channels using the weighted MMSE criterion,” IEEE Trans. Commun., vol. 49, no. 12, pp. 2198–2206, Dec. 2001.
[5] D. J. Love and R. W. Heath, Jr., “Limited feedback unitary precoding for spatial multiplexing systems,” IEEE Trans. Inf. Theory, vol. 51, no. 8, pp. 2967–2976, Aug. 2005.
[6] Y. Huang, D. Xu, L. Yang, and W. P. Zhu, “A limited feedback precoding system with hierarchical codebook and linear receiver,” IEEE Trans. Wireless Commun., vol. 7, no. 12, pp. 4843–4848, Dec. 2008.
[7] S. M. Alamouti, “A simple transmit diversity technique for wireless communications,” IEEE J. Sel. Areas Commun., vol. 16, no. 8, pp. 1451–1458, Oct. 1998.
[8] H. Weingarten, Y. Steinberg, and S. Shamai, “The capacity region of the Gaussian multiple-input multiple-output broadcast channel,” IEEE Trans. Inf. Theory, vol. 52, no. 9, pp. 3936–3964, Sep. 2006.
[9] Q. H. Spencer, A. L. Swindlehurst, and M. Haardt, “Zero forcing methods for downlink spatial
multiplexing in multiuser MIMO channels,” IEEE Trans. Signal Process., vol. 52, pp. 461–471, Feb. 2004.
[10] K. K. Wong, R. Murch, and K. B. Letaief, “A joint-channel diagonalization for multiuser MIMO antenna systems,” IEEE Trans. Wireless Commun., vol. 2, pp. 773–786, Jul. 2003.
[11] N. Jindal, “MIMO broadcast channels with finite-rate feedback,” IEEE Trans. Inf. Theory, vol. 52, pp. 5045–5060, Nov. 2006.
[12] M. Sharif and B. Hassibi, “On the capacity of MIMO broadcast channels with partial side information,” IEEE Trans. Inf. Theory, vol. 51, pp. 506–522, Feb. 2005.
[13] D. Xu, Y. Huang, L. Yang, and B. Li, “Linear transceiver design for multiuser MIMO downlink,” in Proc. IEEE Int. Conf. Commun., May 2008, pp. 761–765.
[14] Y. Huang, L. Yang, and J. Liu, “A limited feedback SDMA for downlink of multiuser MIMO communication system,” EURASIP J. Adv. Signal Process., vol. 2008, Oct. 2008.
[15] B. Wang, J. Zhang, and A. Høst-Madsen, “On the capacity of MIMO relay channels,” IEEE Trans. Inf. Theory, vol. 51, no. 1, pp. 29–43, Jan. 2005.
[16] H. Bolcskei, R. U. Nabar, O. Oyman, and A. J. Paulraj, “Capacity scaling laws in MIMO relay networks,” IEEE Trans. Wireless Commun., vol. 5, no. 6, Jun. 2006.
[17] O. Munoz-Medina, J. Vidal, and A. Agustin, “Linear transceiver design in nonregenerative relays with channel state information,” IEEE Trans. Signal Process., vol. 55, no. 6, pp. 2593–2604, Jun. 2007.
[18] X. Tang and Y. Hua, “Optimal design of non-regenerative MIMO wireless relays,” IEEE Trans. Wireless Commun., vol. 6, no. 4, pp. 1398–1407, Apr. 2007.
[19] Y. Fan and J. Thompson, “MIMO configurations for relay channels: Theory and practice,” IEEE Trans. Wireless Commun., vol. 5, no. 5, pp. 1774–1786, May 2007.
[20] I. Hammerström and A. Wittneben, “Power allocation schemes for amplify-and-forward MIMO-OFDM relay links,” IEEE Trans. Wireless Commun., vol. 6, no. 8, pp. 2798–2802, Aug. 2007.
[21] Z. Fang, Y. Hua, and J. C. Koshy, “Joint source and relay optimization for a non-regenerative MIMO relay,” in Proc. IEEE Workshop Sens. Array Multichannel Signal Process., Jul. 2006, pp. 239–243.
[22] B. Khoshnevis, W. Yu, and R. Adve, “Grassmannian beamforming for MIMO amplify-and-forward relaying,” IEEE J. Sel. Areas Commun., vol. 26, no. 8, pp. 1397–1407, Oct. 2008.
[23] R. W. Heath, Jr., S. Sandhu, and A. Paulraj, “Antenna selection for spatial multiplexing systems with linear receivers,” IEEE Commun. Lett., vol. 5, no. 4, pp. 142–144, Apr. 2001.
[24] D. J. Love, Personal Webpage on Grassmannian Subspace Packing [Online]. Available: http://dynamo.ecn.purdue.edu/djlove/grass.html
[25] H. Sha, “Estimation of the eigenvalues of AB for A > 0, B > 0,” Linear Algebra Its Appl., vol. 73, pp. 147–150, 1986.

Yongming Huang received the B.S. and M.S. degrees from Nanjing University, China, in 2000 and 2003, respectively, and the Ph.D. degree from the School of Information Science and Engineering, Southeast University, China, in 2007. Since 2007, he has been an Assistant Professor with the School of Information Science and Engineering, Southeast University. In December 2008, he joined the Signal Processing Lab, Electrical Engineering, Royal Institute of Technology (KTH), Stockholm, Sweden, as a Postdoctoral Researcher. His current research interests include MIMO communication systems, multiuser MIMO communications, and cooperative communications.

Luxi Yang (M’96) received the M.S. and Ph.D. degrees in electrical engineering from Southeast University, Nanjing, China, in 1990 and 1993, respectively. Since 1993, he has been with the Department of Radio Engineering, Southeast University, where he is currently a Professor of Information Systems and Communications and the Director of the Digital Signal Processing Division. His current research interests include signal processing for wireless communications, MIMO communications, cooperative relaying systems, and statistical signal processing. He is the author or coauthor of two published books and more than 100 journal papers, and holds 10 patents. Prof. Yang received the first- and
second-class prizes of the Science and Technology Progress Awards of the State Education Ministry of China in 1998 and 2002. He is currently a member of the Signal Processing Committee of the Chinese Institute of Electronics.

Mats Bengtsson (M’00–SM’06) received the M.S. degree in computer science from Linköping University, Linköping, Sweden, in 1991 and the Tech. Lic. and Ph.D. degrees in electrical engineering from the Royal Institute of Technology (KTH), Stockholm, Sweden, in 1997 and 2000, respectively. From 1991 to 1995, he was with Ericsson Telecom AB Karlstad. He currently holds a position as Associate Professor with the Signal Processing Laboratory, School of Electrical Engineering, KTH. His research interests include statistical signal processing and its applications to antenna-array processing and communications, radio resource management, and propagation channel modeling. Dr. Bengtsson served as Associate Editor for the IEEE TRANSACTIONS ON SIGNAL PROCESSING during 2007–2009 and is a member of the IEEE SPCOM Technical Committee.

Björn Ottersten (S’87–M’89–SM’99–F’04) was born in Stockholm, Sweden, in 1961. He received the M.S. degree in electrical engineering and applied physics from Linköping University, Linköping, Sweden, in 1986. In 1989, he received the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA. He has held research positions with the Department of Electrical Engineering, Linköping University, the Information Systems Laboratory, Stanford University, the Katholieke Universiteit Leuven, Leuven, and the University of Luxembourg. During 1996–1997, he was Director of Research at ArrayComm, Inc., a start-up company in San Jose, CA, based on Ottersten’s patented technology. In 1991, he was appointed Professor of Signal Processing at the Royal Institute of Technology (KTH), Stockholm. From 1992 to 2004, he was head of the Department for Signals, Sensors, and Systems at KTH, and from 2004 to 2008 he was dean of the School of Electrical Engineering at KTH. Currently, he is Director for the Interdisciplinary Centre for Security, Reliability and Trust at the University of Luxembourg. His research interests include security and trust, reliable wireless communications, and statistical signal processing. Dr. Ottersten has served as Associate Editor for the IEEE TRANSACTIONS ON SIGNAL PROCESSING and on the editorial board of the IEEE Signal Processing Magazine. He is currently Editor-in-Chief of the EURASIP Signal Processing Journal and a member of the editorial board of the EURASIP Journal of Applied Signal Processing. He has coauthored papers that received the IEEE Signal Processing Society Best Paper Award in 1993, 2001, and 2006. He is a Fellow of the EURASIP. He is a first recipient of the European Research Council advanced research grant.