Digital Signal Processing (www.elsevier.com/locate/dsp)

Second-order optimization based adaptive PARAFAC decomposition of three-way tensors

Viet-Dung Nguyen a,*, Karim Abed-Meraim a, Nguyen Linh-Trung b

a PRISME Laboratory, University of Orléans, 12 rue de Blois BP 6744, Orléans, France
b University of Engineering and Technology, Vietnam National University Hanoi, 144 Xuan Thuy, Cau Giay, Hanoi, Viet Nam

Article history: Available online xxxx.
Keywords: Fast adaptive PARAFAC; Big data; Parallel computing; Non-negative constraint.

Abstract. A fast adaptive parallel factor (PARAFAC) decomposition algorithm is proposed for a class of third-order tensors that have one dimension growing linearly with time. It is based on an alternating least squares approach in conjunction with a Newton-type optimization technique. By preserving the Khatri–Rao product and exploiting the reduced-rank update structure of the estimated subspace at each time instant, the algorithm achieves linear complexity and superior convergence performance. A modified version of the algorithm is also proposed to deal with the non-negative constraint. In addition, parallel implementation issues are investigated. Finally, the performance of the algorithm is numerically studied and compared to several state-of-the-art algorithms. © 2017 Elsevier Inc. All rights reserved.

* Corresponding author. E-mail addresses: viet-dung.nguyen@univ-orleans.fr (V.-D. Nguyen), karim.abed-meraim@univ-orleans.fr (K. Abed-Meraim), linhtrung@vnu.edu.vn (N. Linh-Trung).
http://dx.doi.org/10.1016/j.dsp.2017.01.006
1051-2004/© 2017 Elsevier Inc. All rights reserved.

1. Introduction

With the recent advances in sensor and streaming technologies, processing massive volumes of data (or "big data") under time constraints, or even in real time, is not only crucial but also challenging [1], in a wide range of applications including MIMO radars [2], biomedical imaging [3], and signal processing [4]. A typical situation in those cases is that data are acquired along multiple dimensions, one among which is time. Such data can be naturally represented by multi-way arrays, called tensors. Tensor decomposition can thereby be used as an important tool to analyze, understand or eventually compress data. Tucker decomposition and parallel factor (PARAFAC) decomposition are two widely used tensor decomposition methods and can be considered as generalizations of the singular value decomposition (SVD) to multi-way arrays. While Tucker decomposition lacks uniqueness and often requires dealing with imposed constraints such as orthogonality, non-negativity, or sparseness [5], PARAFAC decomposition is unique up to scale and permutation indeterminacy under a mild condition. For recent surveys on tensor decomposition, the reader is referred to [4,6,7], and references therein for more details. In this paper, PARAFAC decomposition is the method of interest.

For streaming tensors, direct application of batch (i.e., offline) PARAFAC decomposition is computationally demanding. Instead, an adaptive (i.e., incremental) approach is more suitable and, hence,
should provide a good trade-off between quality and efficiency. In contrast to adaptive filtering [8] or subspace tracking [9–11], which have a long-standing history and are well understood, adaptive tensor decomposition has received little attention so far. In [2], Nion and Sidiropoulos proposed an adaptive decomposition model for a class of third-order tensors that have one dimension growing with time. Accordingly, they proposed two algorithms: recursive least-squares tracking (PARAFAC-RLST) and simultaneous diagonalization tracking (PARAFAC-SDT). In [3], Mardani, Mateos and Giannakis also proposed an adaptive PARAFAC method for streaming data under partial observation. A common basis in these studies is the use of first-order methods (i.e., using gradients) to optimize an exponentially weighted least-squares cost function.

For the above class of third-order tensors, we have recently proposed a fast algorithm for adaptive PARAFAC decomposition that has only linear complexity [12]. This algorithm, called 3D-OPAST, generalizes the orthonormal projection approximation subspace tracking (OPAST) algorithm [13] by exploiting a special interpretation of the Khatri–Rao product as collinear vectors inside each column of the estimated subspace. In this paper, we provide an improved version of 3D-OPAST: a new algorithm for second-order optimization based adaptive PARAFAC decomposition (SOAP). Compared to 3D-OPAST, SOAP is slightly better in terms of performance and has long run-time stability. The main contributions of the proposed algorithm are summarized as follows.

• SOAP has lower complexity and comparable or superior convergence as compared to PARAFAC-RLST and PARAFAC-SDT. In terms of complexity, if I and K are the two tensor dimensions other than time, and R is the tensor rank, then SOAP requires only O(IKR) flops per iteration (linear complexity with respect to R), while PARAFAC-RLST and PARAFAC-SDT require O(IKR²) (quadratic complexity). This is achieved by first exploiting a second-order stochastic gradient algorithm, in place of the first-order gradient algorithm used in [2], to improve estimation accuracy. Then, at each step of the algorithm, a column of the estimated subspace is forced to have a Kronecker product structure so that the overall subspace approximately preserves the Khatri–Rao product structure. When possible, a rank-one update is also exploited to achieve linear complexity.

• A variant of SOAP is proposed for adaptive PARAFAC decomposition with a non-negativity constraint. It is known that imposing a non-negativity constraint on PARAFAC, when applicable, not only improves the physical interpretation [14] but also helps to avoid diverging components [15,16]. To the best of our knowledge, adaptive non-negative PARAFAC has not been addressed in the literature.

• SOAP is ready for parallel/decentralized computing implementation, an advantage not considered in [2]. This is especially important when performing large-scale online processing tasks. SOAP allows a reduction of algorithm complexity and storage when several parallel computing units (DSPs) are available.

Notations: We follow the notations used in [7].
Calligraphic letters are used for tensors (A, B, ...). Matrices, (row and column) vectors, and scalars are denoted by boldface uppercase, boldface lowercase, and lowercase letters, respectively; for example A, a, and a. The element (i, j, k) of a tensor A ∈ C^{I×J×K} is symbolized as a_{ijk}, the element (i, j) of a matrix A ∈ C^{I×J} as a_{ij}, and the entry i of a vector a ∈ C^{I} as a_i. A ⊗ B denotes the Kronecker product of A and B, A ⊙ B the Khatri–Rao product (column-wise Kronecker product), A ∗ B the Hadamard product (element-wise matrix product), a ∘ b the outer product of a and b, and [A]_+ = max{0, a_{ij}}, for all i, j, is the positive orthant projection of a real-valued matrix A. A^T, A*, A^H and A^# denote the transpose, the complex conjugate, the complex conjugate transpose and the pseudo-inverse of A, respectively. A ≥ 0 denotes a non-negative A, whose entries satisfy a_{ij} ≥ 0 for all i, j.

2. Batch and adaptive PARAFAC

2.1. Batch PARAFAC

Consider a tensor X ∈ C^{I×J×K}. The PARAFAC decomposition of X can be written as follows:

X = Σ_{r=1}^{R} a_r ∘ b_r ∘ c_r,    (1)

summing R rank-one tensors, where R is the rank of X. The sets of vectors {a_r}, {b_r}, and {c_r} can be grouped into the so-called loading matrices A = [a_1 ... a_R] ∈ C^{I×R}, B = [b_1 ... b_R] ∈ C^{J×R}, and C = [c_1 ... c_R] ∈ C^{K×R}. In practice, (1) holds only approximately. In other words, in a noisy environment we have

X = Σ_{r=1}^{R} a_r ∘ b_r ∘ c_r + N,    (2)

where N is a noise tensor. Thus, given a data tensor X, PARAFAC tries to achieve the best rank-R least squares approximation. Equation (1) can also be re-formulated in matrix form as

X^{(1)} = (A ⊙ C) B^T,    (3)

where X^{(1)} ∈ C^{IK×J} with X^{(1)}_{(i−1)K+k, j} = x_{ijk}. We can write analogous expressions for X^{(2)} and X^{(3)} [2]. Without loss of generality, we assume that 2 ≤ I ≤ K ≤ J. PARAFAC is generically unique if it satisfies the following condition [17,18]:

R ≤ min(I + K − 2, J).    (4)

Moreover, if 2 ≤ I ≤ K, then generic uniqueness holds generally (see [18] and references therein) when

R ≤ (I − 1)(K − 1) and (I − 1)(K − 1) ≤ J.    (5)

2.2. Adaptive PARAFAC

In batch PARAFAC, the dimensions of X are fixed. In contrast, in adaptive PARAFAC, they grow with time, that is, X(t) ∈ C^{I(t)×J(t)×K(t)} in the general case. In this paper, we consider the case where only one dimension grows with time, in particular X(t) ∈ C^{I×J(t)×K}. For ease of comparison, we follow the basic adaptive PARAFAC model and assumptions introduced in [2]. Under this model, the mode-1 tensor represented in matrix form at time t, X^{(1)}(t), is given by

X^{(1)}(t) ≃ H(t) B^T(t),    (6)

where H(t) = A(t) ⊙ C(t) ∈ C^{IK×R} and B(t) ∈ C^{J(t)×R}. Taking into account two successive time instants, it can be expressed as a concatenation of the mode-1 tensor of past data at time t − 1 and a vector of new data at time t, i.e.,

X^{(1)}(t) = [X^{(1)}(t − 1), x(t)],    (7)

where x(t) ∈ C^{IK} is obtained by vectorizing the new slice of data at time t. Fig. 1 illustrates this formulation.

[Fig. 1. Adaptive third-order tensor model and its equivalent matrix form.]

The loading matrices A and C are assumed to follow unknown but slowly time-varying models such that A(t) ≃ A(t − 1) and C(t) ≃ C(t − 1), and hence H(t) ≃ H(t − 1). Accordingly, we have

B^T(t) ≃ [B^T(t − 1), b^T(t)].    (8)

It means that at each time instant we only need to estimate the row vector b(t) and augment it to B(t − 1) to obtain B(t), instead of updating the whole B(t).
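To make this model concrete, the following NumPy sketch (our own illustration, not code from the paper; the helper name khatri_rao and the toy dimensions are assumptions) builds the mode-1 unfolding of (3)/(6) and appends a vectorized new slice as in (7)–(8).

```python
import numpy as np

def khatri_rao(A, C):
    # column-wise Kronecker product: column r is A[:, r] kron C[:, r]
    I, R = A.shape
    K = C.shape[0]
    return (A[:, None, :] * C[None, :, :]).reshape(I * K, R)

rng = np.random.default_rng(0)
I, K, R, J = 4, 5, 3, 10
A = rng.standard_normal((I, R))
C = rng.standard_normal((K, R))
B = rng.standard_normal((J, R))

H = khatri_rao(A, C)               # H = A (Khatri-Rao) C, IK x R, eq. (6)
X1 = H @ B.T                       # mode-1 unfolding X^(1) = (A kr C) B^T, eq. (3)

# a new slice arrives at time t, vectorized into x(t) = H b^T(t), eq. (7)
b_new = rng.standard_normal(R)
x_t = H @ b_new
X1 = np.column_stack([X1, x_t])    # X^(1)(t) = [X^(1)(t-1), x(t)]
B = np.vstack([B, b_new])          # B^T(t) = [B^T(t-1), b^T(t)], eq. (8)
```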
Also, the tensor rank, R, is assumed to be known and constant, so that at each time instant, when new data is added to the old tensor, the uniqueness property of the new tensor is fulfilled by (4) and (5). We note that estimating the tensor rank is an NP-complete problem [7]; several heuristic methods can be found in [19] and references therein.

3. Proposed second-order optimization based adaptive PARAFAC

Consider the following exponentially weighted least-squares cost function:

Φ(t) = Σ_{τ=1}^{t} λ^{t−τ} φ(τ),    (9)

where

φ(τ) = ‖x(τ) − H(t) b^T(τ)‖²₂    (10)

and λ ∈ (0, 1] is referred to as the forgetting factor. In (10), we rely on the slow time-varying assumption of the loading matrices, so that H(τ) ≃ H(t) in the considered processing window. Now, finding the loading matrices of the adaptive PARAFAC model of (6) corresponds to minimizing (9), that is,

min_{H(t), B(t)} Φ(t), s.t. H(t) = A(t) ⊙ C(t).    (11)

This cost function is well known in adaptive filter theory [8] and can be solved by a recursive least-squares method as used in [2]. In this paper, we provide an alternative way to not only improve the performance of the algorithm but also reduce the complexity. For the performance, our idea is to first optimize the exponentially weighted least-squares cost function by using a second-order stochastic gradient, and then approximately preserve the Khatri–Rao product structure of the estimated subspace H(t) at each step. To achieve linear complexity, we propose to update one column of the subspace at a time, using a cyclic strategy. Thus, our algorithm is called Second-Order Optimization based Adaptive PARAFAC (SOAP).

Given the estimates of A(t − 1), B(t − 1) and C(t − 1), the objective of SOAP is to construct the recursive update expressions for A(t), B(t) and C(t) using alternating minimization. The algorithm includes four main steps, as follows:

Step 1: Estimate b^T(t).
Step 2: Given b(t), estimate H(t).
Step 3: Extract A(t) and C(t), and estimate one column of H(t).
Step 4: Calculate H^#(t) and update b(t).

While the steps are similar to those in [2], the details in these steps contain our various improvements. We now explain these steps in detail. The summary of SOAP is given in Table 1.

Table 1. Summary of SOAP.

Inputs: A(t − 1), B(t − 1), C(t − 1), H(t − 1), H^#(t − 1), R^{−1}(t − 1)
1. Estimate b^T(t):
   b^T(t) = H^#(t − 1) x(t)    (13)
2. Estimate H(t):
   β(t) = 1 + λ^{−1} b*(t) R^{−1}(t − 1) b^T(t)    (22)
   u(t) = λ^{−1} R^{−1}(t − 1) b^T(t)    (23)
   R^{−1}(t) = λ^{−1} R^{−1}(t − 1) − β^{−1}(t) u(t) u^H(t)    (21)
   γ(t) = η(1 − β^{−1}(t) b*(t) u(t))    (26)
   d(t) = γ(t)[x(t) − H(t − 1) b^T(t)]    (25)
   H(t) = H(t − 1) + d(t) u^H(t)    (24)
3. Extract A(t) and C(t), update one column of H(t):
   – Extract A(t) and C(t): for i = 1, ..., R
     H_i(t) = unvec(h_i(t))
     a_i(t) = H_i^T(t) c_i(t − 1)    (28)
     c_i(t) = H_i(t) a_i(t) / ‖H_i(t) a_i(t)‖    (29)
   – Update column j of H(t):
     j = (t mod R) + 1
     ĥ_j(t) = a_j(t) ⊗ c_j(t)    (30)
     z(t) = ĥ_j(t) − h_j(t)    (32)
     H(:, j)(t) = ĥ_j(t)
4. Calculate H^#(t) and update b(t):
   Calculate H^#(t) using the fast matrix inversion lemma
   b^T(t) = H^#(t) x(t)    (33)
   B^T(t) = [B^T(t − 1), b^T(t)]    (8)
Outputs: A(t), B(t), C(t), H(t), H^#(t), R^{−1}(t)
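For reference, the cost (9)–(10) can be evaluated directly (non-recursively), as in the sketch below. This is only a monitoring tool we add for illustration, with b(τ) replaced by its least-squares estimate; it is not part of the fast recursion.

```python
import numpy as np

def weighted_cost(H, X1, lam=0.8):
    # Phi(t) = sum_tau lam^(t - tau) * ||x(tau) - H b^T(tau)||^2, eqs. (9)-(10),
    # with b^T(tau) = H^# x(tau) plugged in for each past slice
    Hp = np.linalg.pinv(H)
    t = X1.shape[1]
    cost = 0.0
    for tau in range(t):
        r = X1[:, tau] - H @ (Hp @ X1[:, tau])
        cost += lam ** (t - 1 - tau) * (r @ r)
    return cost
```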
3.1. Estimate b^T(t)

This step is the same as in [2]. Vector b^T(t) can be obtained as the least-squares solution of (10), according to

arg min_{b^T} ‖x(t) − H(t − 1) b^T(t)‖²₂,    (12)

which is

b̂^T(t) = H^#(t − 1) x(t),    (13)

where H^#(t − 1) has been calculated in the previous iteration.

3.2. Estimate H(t)

Unlike [2], we use here a Newton-type method to find H(t). Let h = vec(H). Function φ(τ) in (10) can be rewritten as

φ(τ) = ‖x(τ) − (b(τ) ⊗ I_{IK}) h(t)‖²₂,    (14)

wherein we have exploited the fact that vec(ABC^T) = (C ⊗ A) vec(B). As mentioned in [20], the direction of the maximum rate of change of a function with respect to h is given by the derivative of Φ with respect to h*, i.e., [D_{h*}Φ(h, h*)]^T. Consequently, we have

[D_{h*}Φ(h, h*)]^T |_{h=h(t−1)} = − Σ_{τ=1}^{t} λ^{t−τ} [(b^H(τ) ⊗ I_{IK}) x(τ) − (b^H(τ) b(τ) ⊗ I_{IK}) h(t − 1)].    (15)

Thus, its Hessian is computed as

H = D_h([D_{h*}Φ(h, h*)]^T) |_{h=h(t−1)} = Σ_{τ=1}^{t} λ^{t−τ} [(b^H(τ) b(τ)) ⊗ I_{IK}] = R(t) ⊗ I_{IK},    (16)

where

R(t) = Σ_{τ=1}^{t} λ^{t−τ} b^H(τ) b(τ) = λ R(t − 1) + b^H(t) b(t).

To achieve linear complexity, we replace (15) by an instantaneous gradient estimate (i.e., a stochastic gradient):

[D_{h*}φ(h, h*, t)]^T |_{h=h(t−1)} = −[(b^H(t) ⊗ I_{IK}) x(t) − ((b^H(t) b(t)) ⊗ I_{IK}) h(t − 1)].    (17)

The update rule for h is thus given by

h(t) = h(t − 1) − η H^{−1} [D_{h*}φ(h, h*, t)]^T,    (18)

where η is a step size (the minus sign compensates the leading minus in (17)). By substituting (16) and (17) into (18), we obtain

h(t) = h(t − 1) + η [R^{−1}(t)(b^H(t) ⊗ I_{IK}) x(t) − R^{−1}(t)((b^H(t) b(t)) ⊗ I_{IK}) h(t − 1)].    (19)

We can stack (i.e., unvec) (19) in matrix form as follows:

H(t) = H(t − 1) + η [x(t) − H(t − 1) b^T(t)] b*(t) R^{−1}(t).    (20)

Here, we can see that calculating and storing the Hessian explicitly as in (16) is not necessary. Instead, we only need to calculate the pseudo-inverse of R(t). Since R(t) has a rank-1 update structure, its inverse can be efficiently updated using the inversion lemma as

R^{−1}(t) = [λ R(t − 1) + b^T(t) b*(t)]^{−1} = λ^{−1} R^{−1}(t − 1) − β^{−1}(t) u(t) u^H(t),    (21)

where

β(t) = 1 + λ^{−1} b*(t) R^{−1}(t − 1) b^T(t),    (22)
u(t) = λ^{−1} R^{−1}(t − 1) b^T(t).    (23)

Substituting (21) into (20) yields

H(t) = H(t − 1) + d(t) u^H(t),    (24)

where

d(t) = γ(t)[x(t) − H(t − 1) b^T(t)],    (25)
γ(t) = η(1 − β^{−1}(t) b*(t) u(t)).    (26)
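A minimal real-valued NumPy rendering of Step 2, i.e., of the recursion (21)–(26), is given below. It is our own sketch: the function name and the default λ, η values are assumptions, and for complex data the plain transposes become conjugate transposes.

```python
import numpy as np

def soap_step2(H, Rinv, x, b, lam=0.8, eta=0.8):
    # Newton-type subspace update of eqs. (21)-(26); H: IK x R,
    # Rinv: R x R inverse of R(t-1), x: new data vector, b: Step-1 estimate
    u = (Rinv @ b) / lam                        # eq. (23)
    beta = 1.0 + b @ u                          # eq. (22)
    Rinv = Rinv / lam - np.outer(u, u) / beta   # eq. (21)
    gamma = eta * (1.0 - (b @ u) / beta)        # eq. (26)
    d = gamma * (x - H @ b)                     # eq. (25)
    H = H + np.outer(d, u)                      # eq. (24)
    return H, Rinv
```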
3.3. Extract A(t) and C(t) & update one column of H(t)

The purpose of this step is to (i) preserve an approximate Khatri–Rao product structure of H(t) in order to improve the estimation accuracy and ensure the convergence of the algorithm, (ii) provide a reduced-rank update structure that allows the calculation of H^#(t) in the next step with linear complexity, and (iii) extract from H(t) the loading matrices A(t) and C(t). This can be implemented efficiently as follows.

Before proceeding further, recall from [21,2] that A(t) and C(t) can be extracted from H(t), based on

H(t) ≃ A(t) ⊙ C(t) = [a_1(t) ⊗ c_1(t) ··· a_R(t) ⊗ c_R(t)] = [vec(c_1(t) a_1^T(t)) ··· vec(c_R(t) a_R^T(t))].    (27)

Each column is the vectorization of a rank-1 matrix. The loading vectors c_i(t) and a_i(t) are, thus, the principal left singular vector and the conjugate of the principal right singular vector of the matrix H_i(t) = unvec(a_i(t) ⊗ c_i(t)), respectively [2]. However, since a batch SVD is not suitable for adaptive tracking, we use a single Bi-SVD iteration [22] to update a_i(t) and c_i(t) recursively, according to, for i = 1, ..., R,

a_i(t) = H_i^T(t) c_i(t − 1),    (28)
c_i(t) = H_i(t) a_i(t) / ‖H_i(t) a_i(t)‖.    (29)

Then, having obtained A(t) and C(t), H(t) may be re-updated according to (27), as done in [2]. However, to achieve linear complexity, in this paper we choose to re-update only one column of H(t) at each iteration. In particular, at time instant t, we select the column of H(t) to be updated in a cyclic way; that is, we select column j where j = (t mod R) + 1. This column is then updated as

ĥ_j(t) = a_j(t) ⊗ c_j(t).    (30)

Because of the best rank-1 approximation property of the SVD, we take advantage of the denoised loading vectors a_j(t) and c_j(t) and, thereby, improve the accuracy of the H(t) estimate. The other columns of H(t) are left unchanged to preserve the reduced-rank structure of the updated matrix. The updated version of H(t) can be expressed as

Ĥ(t) = H(t) + z(t) e_j^T(t),    (31)

where

z(t) = ĥ_j(t) − h_j(t),    (32)

and e_j(t) is the unit vector whose j-th entry is one. Here, vector h_j(t) is the j-th column of H(t) estimated in the previous step, and z(t) is the error term of the j-th column between the current step (i.e., ĥ_j(t)) and the previous step (i.e., h_j(t)). It is straightforward to see that Ĥ(t) has a rank-2 update structure, by substituting (24) into (31).

3.4. Calculate H^#(t) and update b(t)

As mentioned in Step 3, we can compute the pseudo-inverse of Ĥ(t) efficiently thanks to its rank-2 update structure. Our case corresponds to the theorem in [23] that is given in the Appendix. Then, we can update b(t) by

b^T(t) = Ĥ^#(t) x(t),    (33)

and hence obtain B(t) from (8).

3.5. Algorithm initialization

To initialize A(0), B(0), C(0), H^#(0) and R^{−1}(0) before tracking, we can capture J₀ slices, where J₀ is chosen to satisfy the uniqueness conditions (4) and (5), and then run a batch PARAFAC algorithm to obtain A(0), B(0), and C(0). After that, we compute H^#(0) = (A(0) ⊙ C(0))^# and R^{−1}(0) = (B^T(0) B(0))^{−1}.
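The cyclic column re-update of Step 3, i.e., (28)–(32), can be sketched as follows. This is our own rendering (0-based indexing, so column j = t mod R here corresponds to j = (t mod R) + 1 in the paper); it assumes real-valued data.

```python
import numpy as np

def soap_step3(H, A, C, t):
    # H: IK x R with column i ~ a_i kron c_i; A: I x R; C: K x R
    I = A.shape[0]
    K = C.shape[0]
    R = H.shape[1]
    for i in range(R):
        Hi = H[:, i].reshape(I, K).T            # unvec(h_i) ~ c_i a_i^T, K x I
        A[:, i] = Hi.T @ C[:, i]                # eq. (28)
        w = Hi @ A[:, i]
        C[:, i] = w / np.linalg.norm(w)         # eq. (29)
    j = t % R                                   # cyclic column selection
    h_hat = np.kron(A[:, j], C[:, j])           # eq. (30)
    z = h_hat - H[:, j]                         # eq. (32)
    H[:, j] = h_hat                             # eq. (31): H + z e_j^T
    return H, A, C, z                           # z feeds the rank-2 pinv update
```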
4. Adaptive non-negative PARAFAC

In this section, we consider the case where a non-negativity constraint is imposed, and modify SOAP accordingly. We call this modification of SOAP non-negative SOAP (NSOAP). Given the non-negative estimates A(t − 1) ≥ 0, B(t − 1) ≥ 0, and C(t − 1) ≥ 0, we want to find recursive updates of A(t) ≥ 0, B(t) ≥ 0, and C(t) ≥ 0, which are the loading matrices of the PARAFAC decomposition. We note that, while SOAP works in the general case of complex values, in this section we only consider real non-negative ones.

A simple approach is to use the positive orthant projection. That is, at each step of SOAP, we project the result on the positive orthant; for example, in Step 1, set b^T(t) := [b^T(t)]_+. However, this naive combination does not work for Step 2 (the so-called projected Newton-type method), as indicated in the context of constrained optimization [24] or least-squares non-negative matrix approximation [25]. In practice, for batch processing, the projected Newton-type method requires a combination of restrictions on the Hessian (e.g., diagonal or partly diagonal [26]) and an Armijo-like step rule [24] to guarantee a quadratic rate of convergence. In spite of their advantages, computing the Armijo-like step rule is expensive and not suitable for adaptive algorithms. It is even more difficult in our case because the global optimum can change continuously, depending on the new data. Therefore, we propose to use a simpler strategy. In particular, because of the slowly time-varying model assumption, we restrict ourselves to calculating only the diagonal of the Hessian and using a fixed step rule. Even though a convergence proof is not available yet, this strategy still gives an acceptable performance and represents a good trade-off between performance and complexity, as indicated in Section 6.

Now, for the details, the modifications of SOAP for handling the non-negativity constraint are as follows. At the end of Step 1, we add one minor step after obtaining b^T(t), by setting

b^T(t) := [b^T(t)]_+.    (34)

In Step 2, after calculating R^{−1}(t), we extract its diagonal matrix (which is non-negative since R is positive definite Hermitian) as

R̂^{−1}(t) = diag(diag(R^{−1}(t))),    (35)

and then calculate

û(t) = λ^{−1} R̂^{−1}(t) b^T(t).    (36)

Thus, H(t) is updated, using d(t) and û(t) instead of d(t) and u(t), by

H(t) = H(t − 1) + d(t) û^T(t).    (37)

Then, we project H(t) on the positive orthant to obtain

H̃(t) = [H(t)]_+.    (38)

As previously discussed in Step 3, c_i(t) and a_i(t) are respectively the principal left singular vector and the conjugate of the principal right singular vector of the matrix H_i(t) = unvec(a_i(t) ⊗ c_i(t)). The updated loading matrices A(t) and C(t), obtained by (28) and (29), are still non-negative since H̃_i(t) is non-negative and so are A(t − 1) and C(t − 1) (already obtained at the previous time instant). In Step 4, a positive orthant projection like (34) is used. A summary of all the steps of the NSOAP algorithm is given in Table 2. The initialization of NSOAP is similar to that of SOAP, except that a batch non-negative PARAFAC algorithm is used instead of the standard PARAFAC.

Table 2. Summary of NSOAP.

Inputs: A(t − 1) ≥ 0, B(t − 1) ≥ 0, C(t − 1) ≥ 0, H(t − 1), H^#(t − 1), R^{−1}(t − 1)
1. Estimate b^T(t):
   Perform Step 1 of SOAP
   b^T(t) = [b^T(t)]_+    (34)
2. Estimate H(t):
   β(t) = 1 + λ^{−1} b(t) R^{−1}(t − 1) b^T(t)    (22)
   u(t) = λ^{−1} R^{−1}(t − 1) b^T(t)    (23)
   R^{−1}(t) = λ^{−1} R^{−1}(t − 1) − β^{−1}(t) u(t) u^T(t)    (21)
   R̂^{−1}(t) = diag(diag(R^{−1}(t)))    (35)
   γ(t) = η(1 − β^{−1}(t) b(t) u(t))    (26)
   d(t) = γ(t)(x(t) − H(t − 1) b^T(t))    (25)
   û(t) = λ^{−1} R̂^{−1}(t) b^T(t)    (36)
   H(t) = H(t − 1) + d(t) û^T(t)    (37)
   H̃(t) = [H(t)]_+    (38)
3. Same as Step 3 of SOAP but with H̃(t)
4. Calculate H^#(t) and update b(t):
   Perform Step 4 of SOAP
   b^T(t) = [b^T(t)]_+    (34)
   B^T(t) = [B^T(t − 1), b^T(t)]    (8)
Outputs: A(t) ≥ 0, B(t) ≥ 0, C(t) ≥ 0, H(t), H^#(t), R^{−1}(t)
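A real-valued sketch of the NSOAP version of Step 2, with the diagonal Hessian restriction (35)–(36) and the positive-orthant projections (34) and (38), is given below; again, the function name and the default parameter values are our assumptions.

```python
import numpy as np

def nsoap_step2(H, Rinv, x, b, lam=0.8, eta=0.8):
    b = np.maximum(b, 0.0)                          # eq. (34), after Step 1
    u = (Rinv @ b) / lam                            # eq. (23)
    beta = 1.0 + b @ u                              # eq. (22)
    Rinv = Rinv / lam - np.outer(u, u) / beta       # eq. (21)
    Rinv_hat = np.diag(np.diag(Rinv))               # eq. (35)
    u_hat = (Rinv_hat @ b) / lam                    # eq. (36)
    gamma = eta * (1.0 - (b @ u) / beta)            # eq. (26)
    d = gamma * (x - H @ b)                         # eq. (25)
    H = H + np.outer(d, u_hat)                      # eq. (37)
    return np.maximum(H, 0.0), Rinv                 # eq. (38)
```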
5. Discussions

In this section, we provide some important comments on the similarities and differences between our algorithms and those developed in [2]. Discussions on parallel implementations are also given.

First, it is straightforward to see that, in all steps, the main cost of SOAP comes from matrix–vector products. Thus, it has a linear complexity of O(IKR). NSOAP is slightly more expensive but has the same complexity order as SOAP.

Second, in Step 3 of SOAP, we can obviously choose to update d > 1 columns of H(t) instead of only 1 column. This parameter d can be chosen to balance between the estimation accuracy and the numerical cost. However, in our simulation contexts, we have observed that the loss of estimation accuracy is quite minor as compared to updating all the columns of H(t), and so we opted for d = 1 in all the simulation experiments given in this paper.

Third, the main reason that SOAP achieves linear complexity, while still having a comparable or even superior performance (as shown in the next section), stems from Steps 2 and 3. In fact, the subspace H(t) in SOAP is updated two times. In Step 2, both the (stochastic) gradient and the Hessian are used, instead of only the gradient as in PARAFAC-RLST; this first update is for all R columns (i.e., equation (24)). However, this subspace update does not preserve the Khatri–Rao product structure of H(t). Thus, in Step 3, we exploit the Khatri–Rao product structure to enhance the performance; this is the second update. Moreover, we note that, in [2], PARAFAC-RLST and PARAFAC-SDT extract A(t) and C(t) directly from the subspace update, without preserving the Khatri–Rao product as our proposed algorithm does. The fast calculation of H^#(t) using the pseudo-inverse lemma in Step 4 is a consequence of designing H(t) to have a rank-2 update in Steps 2 and 3.

Finally, in Step 3, we could update all columns of H(t) using (28) and (29) for i = 1, ..., R. However, this leads to calculating H^#(t) without the rank-2 update structure used in SOAP, by exploiting the Khatri–Rao product structure as follows:

H^#(t) = [A^T(t) A(t) ∗ C^T(t) C(t)]^{−1} [A(t) ⊙ C(t)]^T.    (39)

The cost of this implementation is O(IKR²) and it is thus disregarded in this paper.

Now, we show that both SOAP and NSOAP are easy to realize in a parallel scheme. This implementation is important when used for massive data (large dimensional systems). It can be observed that the main computational cost comes from matrix–vector products. Assume that R computational units (DSPs) are available. Then, in Step 1, Equation (13) corresponds to

b_i^T(t) = h̃_i(t − 1) x(t), i = 1, ..., R,    (40)

where h̃_i(t − 1) is the i-th row of H^#(t − 1). It means that we have replaced the matrix–vector product by vector–vector products. This procedure can also be applied in Step 2. Steps 3 and 4 themselves already have a parallel structure and, again, each column of H(t) can be estimated independently. In this way, the overall cost per unit can be reduced, by approximately a factor of R, to O(IK) flops per iteration.
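The row-wise split of (40) is embarrassingly parallel; the sketch below (ours) emulates the R computing units with a plain loop, each unit performing one inner product per time instant.

```python
import numpy as np

def parallel_b_estimate(Hpinv, x):
    # eq. (40): b_i(t) = h~_i(t-1) x(t); each row of H^# can live on its
    # own computing unit, so the loop body is the per-unit workload
    return np.array([h_row @ x for h_row in Hpinv])
```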
6. Simulations

In this section, we study the performance of the proposed algorithms using both synthetic and real data from [27]. We also consider the effect of different system parameters on the performance.

6.1. Performance comparison

First, we use the framework provided by the authors in [2] to verify and compare the performance of the considered algorithms. A time-varying model is thereby constructed so that, at time instant t, we generate the loading matrices A(t) and C(t) as

A(t) = (1 − ε_A) A(t − 1) + ε_A N_A,    (41)
C(t) = (1 − ε_C) C(t − 1) + ε_C N_C,    (42)

where ε_A and ε_C control the speed of variation of A and C between two successive observations, and N_A and N_C are random matrices of the same sizes as A and C. A vector b(t) is generated randomly and the noiseless input data x(t) is given by x(t) = [A(t) ⊙ C(t)] b^T(t). Thus, this observation vector follows the model described in Section 2.2 and is constrained by the assumptions therein. Then, the noisy observation is given by

x̃(t) = x(t) + σ n(t),    (43)

where n(t) is a zero-mean, unit-variance noise vector, while the parameter σ is introduced to control the noise level. We set the default value of σ to 10^{−3}. To have a fair comparison, we keep all default parameters of the algorithms and the model as offered by the authors of [2]. A summary of the parameters used in our experiments is shown in Table 3.

[Table 3. Experimental set-up parameters (I, J₀, K, R, T, σ, ε_A = ε_C, λ, η) used for Figs. 2–9.]

The performance measures for the loading matrices A(t) and C(t) are the standard deviations (STD) between the true loading matrices, A(t) and C(t), and their estimates, A_es(t) and C_es(t), up to a scale and permutation indeterminacy at each time instant:

STD_A(t) = ‖A(t) − A_es(t)‖_F,    (44)
STD_C(t) = ‖C(t) − C_es(t)‖_F.    (45)

For the loading matrix B(t), because of its time-shift structure, we verify its performance through x(t) by

STD_B(t) = ‖x(t) − x_es(t)‖.    (46)

We recall that, in the noiseless case, x(t) = [A(t) ⊙ C(t)] b^T(t). Thus, when we assess the performance of the algorithms by equation (46), we evaluate the whole model at each time instant and indirectly verify the estimation accuracy of b(t).

To assess the convergence rate of the algorithms, we set up the following scenario: we always keep the speed of variation of A and C constant, except at a few specific time instants at which the speed of variation arbitrarily increases. Thus, the algorithm that recovers faster yields a better convergence rate. This scenario is similar to the convergence rate assessment in the context of subspace tracking; see for example [28,29].
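The synthetic model (41)–(43) and the error measures (44)–(45) can be sketched as follows. This is our illustration only (the toy dimensions are not Table 3's values), and it omits the scale/permutation matching that a full evaluation of (44)–(45) must perform before taking the Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(1)

def evolve(M, eps):
    # eqs. (41)-(42): slowly time-varying loading matrices
    return (1.0 - eps) * M + eps * rng.standard_normal(M.shape)

I, K, R, sigma, eps = 20, 20, 5, 1e-3, 1e-3   # toy values, chosen by us
A = rng.standard_normal((I, R))
C = rng.standard_normal((K, R))

A, C = evolve(A, eps), evolve(C, eps)
b = rng.standard_normal(R)
x = np.einsum('ir,kr,r->ik', A, C, b).reshape(I * K)  # x(t) = [A kr C] b^T
x_noisy = x + sigma * rng.standard_normal(I * K)      # eq. (43)

def std_metric(M_true, M_est):
    # eqs. (44)-(45), after the permutation/scale indeterminacy is resolved
    return np.linalg.norm(M_true - M_est, 'fro')
```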
The first experiment compares the performance of SOAP with that of PARAFAC-RLST, PARAFAC-SDT (exponential window), 3D-OPAST and batch PARAFAC-ALS (alternating least squares). Batch PARAFAC here serves as a "lower bound" for the adaptive algorithms. As shown in Fig. 2, SOAP outperforms PARAFAC-RLST, PARAFAC-SDT and 3D-OPAST, and is closer to batch PARAFAC than the others. In addition, SOAP, 3D-OPAST, PARAFAC-RLST, and PARAFAC-SDT have approximately the same convergence rate. Again, we note that the computational complexities of SOAP and 3D-OPAST are O(IKR), as compared to O(IKR²) for PARAFAC-RLST and PARAFAC-SDT.

[Fig. 2. Performance and speed convergence rate comparison of five algorithms when loading matrices change relatively fast, ε_A = ε_C = 10^{−3}.]

In the second experiment, for non-negative data, we take the absolute value of the previous model, i.e.,

A^+(t) = |A(t)|,    (47)
C^+(t) = |C(t)|,    (48)
x^+(t) = |x(t)|,    (49)

where (^+) means non-negative. Since there exist no other adaptive non-negative PARAFAC algorithms apart from our NSOAP, we compare NSOAP with the batch non-negative PARAFAC (Batch NPARAFAC) algorithm implemented in the N-way toolbox [30]. The results are shown in Fig. 3. As expected, the performance of batch NPARAFAC is better than that of NSOAP. However, the advantage of NSOAP is its low computational complexity, suitable for streaming data contexts.

[Fig. 3. Performance comparison of NSOAP with batch non-negative PARAFAC when loading matrices change relatively fast, ε_A = ε_C = 10^{−3}.]

To further study the performance of NSOAP, we apply it to a typical example using a fluorescence dataset [27], which includes five samples of fluorescence excitation–emission, of size 5 samples × 201 emission wavelengths × 61 excitation wavelengths. It has been shown that the loading matrices estimated by PARAFAC with the non-negative constraint are similar to the pure spectra. Here, we use an initialization tensor of size 5 × 70 × 61. Note that the emission wavelength range is relatively short and, during the interval [250, 300], one of the three components is almost zero. Fig. 4 shows that NSOAP can recover the tensor components in this particular example.

[Fig. 4. Performance comparison of NSOAP with batch non-negative PARAFAC on the fluorescence data set. For B(t), we present only a part of the recovered loading matrix; the initialization part is disregarded.]

6.2. Relative execution time comparison

In Section 5, we indicated that our algorithms have linear complexity O(IKR). Here, we provide a rough complexity assessment of the algorithms, using CPU execution time as a measure. We emphasize the relativity of this comparison because the CPU execution time depends on various factors, including platform, programming language and implementation. We compare SOAP with PARAFAC-RLST and PARAFAC-SDT (we disregard 3D-OPAST in this experiment because its code is not optimized yet). Again, all parameters of PARAFAC-RLST and PARAFAC-SDT are kept at their default values. The algorithms are run on an Intel Core i7 2.8 GHz computer with Matlab R2015a. Fig. 6 shows that SOAP is faster than PARAFAC-RLST and PARAFAC-SDT.

[Fig. 6. CPU time-based comparison when loading matrices change relatively fast, ε_A = ε_C = 10^{−3}.]

6.3. Effect of the speed of variation

In this section, we consider different values of the speeds of variation ε_A and ε_C to evaluate their effect on the performance. Fig. 5 shows that SOAP adapts better to fast variations (ε_A = ε_C = 10^{−2}) than PARAFAC-RLST, PARAFAC-SDT, and 3D-OPAST. In this case, SOAP still converges while the others diverge. When the variation is smaller (ε_A = ε_C = 10^{−3} or 10^{−5}), SOAP is comparable to PARAFAC-RLST and PARAFAC-SDT.

[Fig. 5. The effect of the speed of variation on the algorithm performance.]

6.4. Long run-time stability

The performance of various algorithms, including the famous recursive least squares (RLS) and least mean squares (LMS) algorithms, can suffer when running for a long time. This is referred to as long run-time stability, or the limited precision effect [31], caused by quantization errors accumulating in time. We show by experiment that SOAP is more stable than PARAFAC-RLST, PARAFAC-SDT and 3D-OPAST in this aspect (Fig. 7), at least in the described context and with the given parameters; an experiment where we ran SOAP for 10^6 iterations was conducted to confirm the stability of the proposed algorithm. A theoretical analysis to explain why SOAP is more stable is still an open problem and deserves future work. We note that, in practice, the limited precision effect can be resolved by re-initializing the algorithms after, typically, several thousands of iterations.

[Fig. 7. Long run-time stability of four algorithms when loading matrices change relatively fast, ε_A = ε_C = 10^{−3}.]
6.5. Waveform-preserving character

The waveform-preserving character is important in communication and biomedical applications, for example in blind receivers for direct-sequence code-division multiple access (DS-CDMA) [32] and in multi-subject functional magnetic resonance imaging (fMRI) analysis [33]. In this section, we illustrate this property of our algorithms via a synthetic example. We first generate the loading matrix B(t) comprising three kinds of signals: a chirp, a rectangular wave and a sawtooth wave. The signal length is 1 s and the sample rate is 1 kHz. For the chirp, the instantaneous frequency is 0 at t = 0 and crosses 300 Hz at t = 1 s. The rectangular wave has a frequency of 50 Hz, and the symmetric sawtooth has a repetition frequency of 20 Hz, i.e., a sawtooth width of 0.05 s. For A(t) and C(t), we use the loading matrices from the fluorescence example, where the components of A(t) have a sharp change and the components of C(t) have a smooth change. The PARAFAC model is then disturbed by Gaussian noise with a signal-to-noise ratio (SNR) of 15 dB. The SNR is defined by

SNR = E{‖[A(t) ⊙ C(t)] b^T(t)‖²} / σ².    (50)

The simulation results when applying SOAP and NSOAP are shown in Figs. 8 and 9, corresponding to the first 200 data samples (iterations). As we can see, both algorithms lead to a good restoration of the original components.

[Fig. 8. Illustration of the waveform-preserving character of SOAP through a synthetic example with SNR = 15 dB. The solid line represents the solution of SOAP while the dash–dot line represents the ground truth.]
[Fig. 9. Illustration of the waveform-preserving character of NSOAP through a synthetic example with SNR = 15 dB.]
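For reproducibility, the three test waveforms can be generated with SciPy as below; this is our sketch, with the 1 s / 1 kHz values following the description above.

```python
import numpy as np
from scipy.signal import chirp, square, sawtooth

fs, T = 1000, 1.0                                  # 1 kHz sample rate, 1 s length
t = np.arange(0.0, T, 1.0 / fs)

b1 = chirp(t, f0=0.0, t1=1.0, f1=300.0)            # 0 Hz at t=0, 300 Hz at t=1 s
b2 = square(2.0 * np.pi * 50.0 * t)                # 50 Hz rectangular wave
b3 = sawtooth(2.0 * np.pi * 20.0 * t, width=0.5)   # symmetric, 20 Hz (0.05 s period)

B = np.column_stack([b1, b2, b3])                  # loading matrix B(t), J x 3
```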
7. Conclusions

In this paper, we have proposed two efficient adaptive PARAFAC decomposition algorithms: SOAP for the standard setup and NSOAP for the non-negativity-constrained case. To the best of our knowledge, adaptive non-negative PARAFAC had not been addressed before. By exploiting the data structure, the proposed algorithms achieve a linear computational complexity of O(IKR) per iteration while enjoying a good performance as compared to the state-of-the-art algorithms. These algorithms can be considered as a starting point for real-time PARAFAC-based applications (the program codes will be made available on-line after publication of this work).

Acknowledgments

We would like to thank the associate editor and the reviewers for their efforts and careful evaluations, comments and suggestions. We would like to thank Dr. Nion and Prof. Sidiropoulos for making their codes available. This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.02-2015.32.

Appendix A

To make our paper self-contained, we present here the rank-1 update for the pseudo-inverse, as discussed in Step 4 of the proposed algorithm (Section 3.4). Because the matrix Ĥ(t) has a rank-2 update structure, we can apply formula (51) twice to obtain its pseudo-inverse Ĥ^#(t).

Given a matrix A ∈ C^{I×J}, its pseudo-inverse A^# ∈ C^{J×I} and two vectors c ∈ C^{I×1} and d ∈ C^{J×1}, the fast update of (A + cd^H)^#, corresponding to the theorem in [23], is given by

(A + c d^H)^# = A^# + (1/β*) A^# h^H u^H − (β*/σ) p q^H,    (51)

where

β = 1 + d^H A^# c,
h = d^H A^#,
k = A^# c,
u = c − A k,
p = −(‖u‖²/β*) A^# h^H − k,
q^H = −(‖h‖²/β*) u^H − h,
σ = ‖h‖² ‖u‖² + |β|².

We note that this update includes only matrix–vector multiplications and, thus, preserves the linear complexity of our algorithm.
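A real-valued sketch of the update (51) is given below (the function name is ours); the rank-2 update of Ĥ(t) is handled by calling it twice, once for d(t)u^H(t) and once for z(t)e_j^T(t).

```python
import numpy as np

def pinv_rank1_update(Ap, A, c, d):
    # fast update of (A + c d^T)^# from A^#, eq. (51), real-valued;
    # assumes c has a nonzero residual outside the range of A (u != 0)
    k = Ap @ c                          # k = A^# c
    h = d @ Ap                          # h = d^T A^# (row vector)
    u = c - A @ k                       # residual of c
    beta = 1.0 + d @ k                  # beta = 1 + d^T A^# c
    sigma = (h @ h) * (u @ u) + beta ** 2
    Aph = Ap @ h                        # A^# h^T
    p = -((u @ u) / beta) * Aph - k
    q = -((h @ h) / beta) * u - h       # q^T as a row vector
    return Ap + np.outer(Aph, u) / beta - (beta / sigma) * np.outer(p, q)
```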
References

[1] K. Slavakis, G. Giannakis, G. Mateos, Modeling and optimization for big data analytics: (statistical) learning tools for our era of data deluge, IEEE Signal Process. Mag. 31 (5) (2014) 18–31.
[2] D. Nion, N.D. Sidiropoulos, Adaptive algorithms to track the PARAFAC decomposition of a third-order tensor, IEEE Trans. Signal Process. 57 (6) (2009) 2299–2310.
[3] M. Mardani, G. Mateos, G.B. Giannakis, Subspace learning and imputation for streaming big data matrices and tensors, IEEE Trans. Signal Process. 63 (10) (2015) 2663–2677.
[4] A. Cichocki, D. Mandic, L. De Lathauwer, G. Zhou, Q. Zhao, C. Caiafa, H.A. Phan, Tensor decompositions for signal processing applications: from two-way to multiway component analysis, IEEE Signal Process. Mag. 32 (2) (2015) 145–163.
[5] M. Mørup, L.K. Hansen, S.M. Arnfred, Algorithms for sparse nonnegative Tucker decompositions, Neural Comput. 20 (8) (2008) 2112–2131.
[6] P. Comon, Tensors: a brief introduction, IEEE Signal Process. Mag. 31 (3) (2014) 44–53.
[7] T.G. Kolda, B.W. Bader, Tensor decompositions and applications, SIAM Rev. 51 (3) (2009) 455–500.
[8] S.S. Haykin, Adaptive Filter Theory, Pearson Education India, 2007.
[9] P. Comon, G.H. Golub, Tracking a few extreme singular values and vectors in signal processing, Proc. IEEE 78 (8) (1990) 1327–1343.
[10] Y. Hua, Y. Xiang, T. Chen, K. Abed-Meraim, Y. Miao, A new look at the power method for fast subspace tracking, Digit. Signal Process. 9 (4) (1999) 297–314.
[11] X.G. Doukopoulos, G.V. Moustakides, Fast and stable subspace tracking, IEEE Trans. Signal Process. 56 (4) (2008) 1452–1465.
[12] V.-D. Nguyen, K. Abed-Meraim, N. Linh-Trung, Fast adaptive PARAFAC decomposition algorithm with linear complexity, in: International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 6235–6239.
[13] K. Abed-Meraim, A. Chkeif, Y. Hua, Fast orthonormal PAST algorithm, IEEE Signal Process. Lett. 7 (3) (2000) 60–62.
[14] A. Cichocki, R. Zdunek, A.H. Phan, S. Amari, Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation, John Wiley & Sons, 2009.
[15] L.-H. Lim, P. Comon, Nonnegative approximations of nonnegative tensors, J. Chemom. 23 (7–8) (2009) 432–441.
[16] A. Stegeman, Finding the limit of diverging components in three-way CANDECOMP/PARAFAC – a demonstration of its practical merits, Comput. Stat. Data Anal. 75 (2014) 203–216.
[17] J.B. Kruskal, Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics, Linear Algebra Appl. 18 (2) (1977) 95–138.
[18] I. Domanov, L. De Lathauwer, Generic uniqueness conditions for the canonical polyadic decomposition and INDSCAL, SIAM J. Matrix Anal. Appl. 36 (4) (2015) 1567–1589.
[19] K. Liu, J.P.C. da Costa, H.C. So, L. Huang, J. Ye, Detection of number of components in CANDECOMP/PARAFAC models via minimum description length, Digit. Signal Process. 51 (2016) 110–123.
[20] A. Hjorungnes, D. Gesbert, Complex-valued matrix differentiation: techniques and key results, IEEE Trans. Signal Process. 55 (6) (2007) 2740–2746.
[21] F. Roemer, M. Haardt, Tensor-based channel estimation (TENCE) for two-way relaying with multiple antennas and spatial reuse, in: IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 2009, pp. 3641–3644.
[22] P. Strobach, Bi-iteration SVD subspace tracking algorithms, IEEE Trans. Signal Process. 45 (5) (1997) 1222–1240.
[23] C.D. Meyer Jr., Generalized inversion of modified matrices, SIAM J. Appl. Math. 24 (3) (1973) 315–323.
[24] D.P. Bertsekas, Projected Newton methods for optimization problems with simple constraints, SIAM J. Control Optim. 20 (2) (1982) 221–246.
[25] D. Kim, S. Sra, I.S. Dhillon, Fast Newton-type methods for the least squares nonnegative matrix approximation problem, in: SDM, vol. 7, SIAM, 2007, pp. 343–354.
[26] M. Schmidt, D. Kim, S. Sra, Projected Newton-type methods in machine learning, in: Optimization for Machine Learning, 2012, p. 305.
[27] R. Bro, PARAFAC. Tutorial and applications, Chemom. Intell. Lab. Syst. 38 (2) (1997) 149–171.
[28] P. Strobach, Fast recursive subspace adaptive ESPRIT algorithms, IEEE Trans. Signal Process. 46 (9) (1998) 2413–2430.
[29] R. Badeau, G. Richard, B. David, Fast and stable YAST algorithm for principal and minor subspace tracking, IEEE Trans. Signal Process. 56 (8) (2008) 3437–3446.
[30] C.A. Andersson, R. Bro, The N-way toolbox for MATLAB, Chemom. Intell. Lab. Syst. 52 (1) (2000) 1–4.
[31] J.M. Cioffi, Limited-precision effects in adaptive filtering, IEEE Trans. Circuits Syst. 34 (7) (1987) 821–833.
[32] N.D. Sidiropoulos, G.B. Giannakis, R. Bro, Blind PARAFAC receivers for DS-CDMA systems, IEEE Trans. Signal Process. 48 (3) (2000) 810–823.
[33] C.F. Beckmann, S.M. Smith, Tensorial extensions of independent component analysis for multisubject FMRI analysis, Neuroimage 25 (1) (2005) 294–311.
Viet-Dung Nguyen received the bachelor degree from the VNU University of Engineering and Technology, Vietnam, in 2009, the M.Sc. degree in network and telecommunication from the École Normale Supérieure (ENS), Cachan, France, in 2012, and the Ph.D. degree from the University of Orléans, France, in 2016, in the field of signal processing. He currently holds a postdoctoral research fellow position at the Signals and Systems Laboratory (L2S), University of Paris-Saclay. His current research interests include: wireless communication for the Internet of Things, matrix and tensor decompositions, adaptive signal processing, blind source separation, array signal processing and statistical performance analysis.

Karim Abed-Meraim was born in 1967. He received the State Engineering Degree from École Polytechnique, Paris, France, in 1990, the State Engineering Degree from École Nationale Supérieure des Télécommunications (ENST), Paris, France, in 1992, the M.Sc. degree from Paris XI University, Orsay, France, in 1992, and the Ph.D. degree from the ENST in 1995 (in the field of signal processing and communications). From 1995 to 1998, he held a research fellow position at the Electrical Engineering Department of the University of Melbourne, where he worked on research projects related to "Blind System Identification for Wireless Communications" and "Array Processing for Communications". From 1998 to 2012 he was Assistant and then Associate Professor at the Signal and Image Processing Department of Telecom-ParisTech. His research interests are related to statistical signal processing with applications to communications, system identification, adaptive filtering and tracking, radar and array processing, biomedical signal processing and statistical performance analysis. In September 2012 he joined the University of Orléans (PRISME Lab.)
as a full Professor. He has also been a visiting scholar at the Centre for Wireless Communications (National University of Singapore) in 1999, at the EEE Department of Nanyang Technological University (Singapore) in 2001, at Telecom Malaysia Research and Development Centre in 2004, at the School of Engineering and Mathematics of Edith Cowan University (Perth, Australia) in 2004, at the EEE Department of the National University of Singapore in 2006, at Sharjah University (UAE) in 2008–2009, and at King Abdullah University of Science and Technology (KSA) in 2013 and 2014. He is the author of about 400 scientific publications including book chapters, international journal and conference papers, and patents.

Nguyen Linh-Trung studied for the B.Eng. and Ph.D. degrees, both in electrical engineering, at Queensland University of Technology, Brisbane, Australia. Since 2006, he has been with the faculty of the University of Engineering and Technology (VNU-UET), a member university of Vietnam National University, Hanoi (VNU), where he is currently an associate professor of electronic engineering in the Faculty of Electronics and Telecommunications. His technical interest is in signal processing methods and algorithms (especially time–frequency signal analysis, blind source separation, compressive sampling, tensor-based signal analysis and graph signal processing) and their application to wireless communication and networking and to biomedical engineering, with a current focus on large-scale processing. He has held a postdoctoral research fellow position at the French National Space Agency (CNES), and visiting positions at Télécom ParisTech, Vanderbilt University, CentraleSupélec, the Université Paris Sud, the Université Paris 13, and the University of Illinois. He has served the Radio-Electronics Association of Vietnam (REV) and the IEEE in a number of positions, including member of the REV Standing Committee, senior member of the IEEE, TPC co-chair of the REV-IEEE annual International Conference on Advanced Technologies for Communications (ATC), and managing editor of the REV Journal on Electronics and Communications.