Some distance functions in quantum information theory and related problems
Matrix theory fundamentals
Let N represent the set of all natural numbers, and for each n in N let M_n denote the set of all n×n complex matrices. H_n refers to the set of n×n Hermitian matrices, while H^+_n denotes the set of n×n positive semi-definite matrices. P_n is the cone of positive definite matrices within M_n, and D_n comprises the density matrices, i.e., positive definite matrices with trace equal to one. Additionally, I and O denote the identity and zero elements of M_n, respectively. This thesis focuses on matrix-related problems as operators in finite-dimensional Hilbert spaces H, with indications provided for cases involving infinite dimensions.
The inner product of two vectors \( x = (x_j) \) and \( y = (y_j) \) in \( C^n \) is defined as \( \langle x, y \rangle \equiv \sum_j x_j \overline{y_j} \). For a matrix \( A \) in \( M_n \), the conjugate transpose, or adjoint, \( A^* \) is the complex conjugate of the transpose \( A^T \); it follows that \( \langle Ax, y \rangle = \langle x, A^* y \rangle \). A matrix \( A = (a_{ij})_{i,j=1}^n \) in \( M_n \) is said to be:
(ii) invertible if there exists a matrix B of order n×n such that AB = I_n. In this situation, A has a unique inverse matrix A^{-1} ∈ M_n such that A^{-1}A = AA^{-1} = I_n;
(vi) positive semi-definite if ⟨Ax, x⟩ ≥ 0 for all x ∈ C^n;
(vii) positive definite if ⟨Ax, x⟩ > 0 for all x ∈ C^n \ {0}.
Definition 1.1.2 (Löwner's order, [86]). Let A and B be two Hermitian matrices of the same order n. We say that A ≥ B if and only if A − B is a positive semi-definite matrix.
Definition 1.1.3. A complex number λ is said to be an eigenvalue of a matrix A corresponding to its non-zero eigenvector x if Ax = λx.
The multiset of the eigenvalues of A is denoted by Sp(A) and called the spectrum of A.
There are several conditions that characterize positive matrices; some of them are listed in the theorem below [10].
A matrix A is considered positive semi-definite if it is Hermitian and has nonnegative eigenvalues In contrast, A is classified as positive definite if it is Hermitian and all its eigenvalues are positive.
A matrix A is classified as positive semi-definite if it meets two criteria: it must be Hermitian, and all of its principal minors must be nonnegative In contrast, A is considered positive definite when it is Hermitian and all of its principal minors are strictly positive.
(iii) A is positive semi-definite if and only if A = B*B for some matrix B. Moreover, A is positive definite if and only if B is nonsingular.
A matrix A is considered positive semi-definite if it can be expressed as A = T* T, where T is an upper triangular matrix with nonnegative diagonal entries In cases where A is positive definite, the matrix T is unique, a concept known as the Cholesky decomposition of A Additionally, A is positive definite if and only if T is nonsingular.
A matrix A is considered positive semi-definite if there exists a unique positive matrix B such that A equals B squared, denoted as B = A^(1/2), which is referred to as the positive square root of A Furthermore, A is classified as positive definite if and only if the matrix B is also positive definite.
(vi) A is positive semi-definite if and only if there exist x_1, ..., x_n in H such that a_{ij} = ⟨x_i, x_j⟩.
A is positive definite if and only if the vectors x_j, 1 ≤ j ≤ n, are linearly independent.
For A ∈ M_n, we denote the eigenvalues of A by λ_j(A), for j = 1, 2, ..., n. The notation λ(A) ≡ (λ_1(A), λ_2(A), ..., λ_n(A)) means that λ_1(A) ≥ λ_2(A) ≥ ... ≥ λ_n(A). The absolute value of a matrix A ∈ M_n is the square root of the matrix A*A, denoted by |A| = (A*A)^{1/2}.
We call the eigenvalues of |A| the singular values of A and denote them by s_j(A), for j = 1, 2, ..., n. For a matrix A ∈ M_n, the notation s(A) ≡ (s_1(A), s_2(A), ..., s_n(A)) means that s_1(A) ≥ s_2(A) ≥ ... ≥ s_n(A).
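The characterizations above are easy to check numerically. The following sketch, assuming numpy is available (the variable names are ours, not from the text), verifies items (i), (iii), (iv) and (v) on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random positive definite matrix A = B* B with B nonsingular (item (iii)).
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B.conj().T @ B

# Item (i): A is Hermitian with positive eigenvalues.
herm_ok = np.allclose(A, A.conj().T)
pos_ok = bool(np.all(np.linalg.eigvalsh(A) > 0))

# Item (iv): Cholesky decomposition A = T* T with T upper triangular.
L = np.linalg.cholesky(A)      # numpy returns the lower-triangular factor L
T = L.conj().T                 # T = L* is upper triangular and A = T* T
chol_ok = np.allclose(T.conj().T @ T, A)

# Item (v): the positive square root B0 = A^{1/2} satisfies B0^2 = A.
w, V = np.linalg.eigh(A)
B0 = (V * np.sqrt(w)) @ V.conj().T
sqrt_ok = np.allclose(B0 @ B0, A)
```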
There are some basic properties of the spectrum of a matrix.
(ii) If A is a Hermitian matrix then Sp(A) ⊂ R.
(iii) A is positive semi-definite (respectively positive definite) if and only if A is a Hermitian matrix and Sp(A) ⊂ R_{≥0} (respectively Sp(A) ⊂ R_{+}).
The trace of a matrix A = (a_{ij}) ∈ M_n, denoted by Tr(A), is the sum of all diagonal entries; equivalently, it is the sum of all eigenvalues λ_i(A) of A, i.e.,
Tr(A) = Σ_{i=1}^n a_{ii} = Σ_{i=1}^n λ_i(A).
Related to the trace of a matrix, we recall the Araki-Lieb-Thirring trace inequality [18], used throughout the thesis.
Theorem 1.1.1. Let A and B be two positive semi-definite matrices and let q > 0. Then for 0 ≤ r ≤ 1,
Tr[(B^{1/2} A B^{1/2})^{rq}] ≥ Tr[(B^{r/2} A^r B^{r/2})^q],
and the inequality is reversed for r ≥ 1.
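The inequality can be tested numerically. A minimal sketch, assuming numpy; `mpow` is a helper of ours, not notation from the text:

```python
import numpy as np

def mpow(A, p):
    # Power of a positive semi-definite matrix via eigendecomposition.
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 0, None)**p) @ V.T

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 5)); A = X @ X.T + np.eye(5)
Y = rng.standard_normal((5, 5)); B = Y @ Y.T + np.eye(5)

r, q = 0.5, 0.7    # 0 <= r <= 1, q > 0
Bh = mpow(B, 0.5)
lhs = np.trace(mpow(Bh @ A @ Bh, r * q))
Br2 = mpow(B, r / 2)
rhs = np.trace(mpow(Br2 @ mpow(A, r) @ Br2, q))
alt_ok = lhs >= rhs - 1e-10
```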
The determinant of A is denoted and defined by
det(A) = Σ_{ρ∈S_n} sgn(ρ) Π_{j=1}^n a_{jρ(j)} = Π_{j=1}^n λ_j,
where S_n is the set of all permutations ρ of the set S = {1, 2, ..., n}.
Proposition 1.1.3. Let A, B ∈ H_n with λ(A) = (λ_1, λ_2, ..., λ_n) and λ(B) = (μ_1, μ_2, ..., μ_n). Then
(i) If A > 0 and B > 0, then A ≥ B if and only if B^{-1} ≥ A^{-1};
(ii) if A ≥ B, then X*AX ≥ X*BX for every X ∈ M_n;
(iii) if A ≥ B, then λ_j ≥ μ_j for each j = 1, 2, ..., n.
A function ∥·∥ : M_n → R is said to be a matrix norm if for all A, B ∈ M_n and all α ∈ C we have:
(i) ∥A∥ ≥ 0;
(ii) ∥A∥ = 0 if and only if A = 0;
(iii) ∥αA∥ = |α| ∥A∥;
(iv) ∥A + B∥ ≤ ∥A∥ + ∥B∥.
In addition, a matrix norm is said to be sub-multiplicative if ∥AB∥ ≤ ∥A∥ ∥B∥ for all A, B ∈ M_n.
A matrix norm is said to be unitarily invariant if for every A ∈ M_n we have ∥UAV∥ = ∥A∥ for all unitary matrices U, V ∈ U_n. It is denoted by |||·|||.
These are some important norms on M_n.
The operator norm of A is defined by ∥A∥ = s_1(A), the largest singular value of A.
The Ky Fan k-norm is the sum of the k largest singular values, i.e., ∥A∥_{(k)} = Σ_{j=1}^k s_j(A).
The Schatten p-norm is defined as ∥A∥_p = (Σ_{j=1}^n s_j^p(A))^{1/p}, for p ≥ 1.
When p = 2, we have the Frobenius norm, sometimes called the Hilbert-Schmidt norm:
∥A∥_2 = (Σ_{j=1}^n s_j^2(A))^{1/2} = (Tr(A*A))^{1/2}.
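The coincidence of the Schatten 2-norm with the Frobenius norm can be illustrated numerically; a sketch assuming numpy (`schatten` is our helper name):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
s = np.linalg.svd(A, compute_uv=False)     # singular values, s[0] >= s[1] >= ...

op_norm = s[0]                              # operator norm = largest singular value
ky_fan_2 = s[:2].sum()                      # Ky Fan 2-norm: two largest singular values
schatten = lambda p: (s**p).sum() ** (1 / p)

# The Schatten 2-norm coincides with the Frobenius (Hilbert-Schmidt) norm.
frob_match = np.isclose(schatten(2), np.linalg.norm(A, 'fro'))
op_match = np.isclose(op_norm, np.linalg.norm(A, 2))
```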
Let x = (x_1, x_2, ..., x_n) and y = (y_1, y_2, ..., y_n) be in R^n. Let x↓ = (x_{[1]}, x_{[2]}, ..., x_{[n]}) denote a rearrangement of the components of x such that x_{[1]} ≥ x_{[2]} ≥ ... ≥ x_{[n]}. We say that x is majorized by y, denoted by x ≺ y, if
Σ_{j=1}^k x_{[j]} ≤ Σ_{j=1}^k y_{[j]} for k = 1, ..., n−1, and Σ_{j=1}^n x_{[j]} = Σ_{j=1}^n y_{[j]}.
We say that x is weakly majorized by y, denoted by x ≺_w y, if
Σ_{j=1}^k x_{[j]} ≤ Σ_{j=1}^k y_{[j]} for k = 1, ..., n.
If x > 0 (i.e., x_i > 0 for i = 1, ..., n) and y > 0, we say that x is log-majorized by y, denoted by x ≺_log y, if
Π_{j=1}^k x_{[j]} ≤ Π_{j=1}^k y_{[j]} for k = 1, ..., n−1, and Π_{j=1}^n x_{[j]} = Π_{j=1}^n y_{[j]}.
In other words, x ≺_log y if and only if log x ≺ log y.
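The definition translates directly into a small checker; the sketch below, assuming numpy, also illustrates the classical Schur-Horn fact that the diagonal of a Hermitian matrix is majorized by its eigenvalues:

```python
import numpy as np

def majorizes(y, x, tol=1e-12):
    # True if x is majorized by y (x ≺ y).
    xs, ys = np.sort(x)[::-1], np.sort(y)[::-1]
    partial = np.all(np.cumsum(xs)[:-1] <= np.cumsum(ys)[:-1] + tol)
    return bool(partial) and bool(np.isclose(xs.sum(), ys.sum()))

# Schur-Horn: the diagonal of a Hermitian matrix is majorized by its eigenvalues.
rng = np.random.default_rng(3)
H = rng.standard_normal((5, 5)); H = (H + H.T) / 2
schur_horn_ok = majorizes(np.linalg.eigvalsh(H), np.diag(H))
```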
In linear algebra, a matrix \( P \in M_n \) is classified as a projection if it satisfies the condition \( P^2 = P \) When a projection is also Hermitian, it is referred to as a Hermitian projection Furthermore, an orthogonal projection is defined as one where the range of \( P \) is orthogonal to its null space The relationship between projections follows a straightforward partial ordering; specifically, if \( P \) and \( Q \) are projections, then \( P \leq Q \) indicates that the range of \( P \) is contained within the range of \( Q \).
In algebraic terms, the relation P ≤ Q for projections is equivalent to PQ = P. Within M_n, the identity matrix I is the largest projection and 0 the smallest, so 0 ≤ P ≤ I for any projection P in M_n. For two projections P and Q on the same Hilbert space, among the projections smaller than both P and Q there is a maximal projection, denoted by P ∧ Q, which is the orthogonal projection onto the intersection of the ranges of P and Q.
Theorem 1.1.2 [45] Assume thatP andQare orthogonal projections Then
Matrix function and matrix mean
Now let us recall the spectral theorem which is one of the most important tools in functional analysis and matrix theory.
Theorem 1.2.1 (Spectral decomposition, [9]). Let λ_1 > λ_2 > ... > λ_k be the distinct eigenvalues of a Hermitian matrix A. Then
A = Σ_{j=1}^k λ_j P_j,
where P_j is the orthogonal projection onto the subspace spanned by the eigenvectors associated to the eigenvalue λ_j.
For a real-valued function f defined on some interval K ⊂ R, and for a self-adjoint matrix A ∈ M_n with spectrum in K, the matrix f(A) is defined by means of the functional calculus, i.e.,
f(A) := Σ_{j=1}^k f(λ_j) P_j.
Or, if A = U diag(λ_1, ..., λ_n) U* is a spectral decomposition of A (where U is some unitary), then
f(A) := U diag(f(λ_1), ..., f(λ_n)) U*.
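The functional calculus is straightforward to implement from the spectral decomposition; a sketch assuming numpy (`funm_hermitian` is our helper name):

```python
import numpy as np

def funm_hermitian(A, f):
    # f(A) for Hermitian A via the spectral decomposition A = U diag(lambda) U*.
    w, U = np.linalg.eigh(A)
    return (U * f(w)) @ U.conj().T

rng = np.random.default_rng(4)
X = rng.standard_normal((4, 4))
A = X @ X.T + np.eye(4)                        # positive definite

sqrtA = funm_hermitian(A, np.sqrt)
sq_ok = np.allclose(sqrtA @ sqrtA, A)          # (A^{1/2})^2 = A

logA = funm_hermitian(A, np.log)
exp_ok = np.allclose(funm_hermitian(logA, np.exp), A)   # exp(log A) = A
```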
In this section we recall matrix and operator functions, going back to the foundational work of Löwner, who first studied operator monotone functions in his influential 1930s papers; at about the same time, Kraus studied operator convex functions.
Definition 1.2.1 ([63]). A continuous function f defined on an interval K (K ⊂ R) is said to be operator monotone of order n on K if for two Hermitian matrices A and B in M_n with spectra in K,
A ≤ B implies f(A) ≤ f(B).
If f is operator monotone of all orders n, then f is called operator monotone.
Theorem 1.2.2 (Löwner-Heinz inequality, [86]). The function f(t) = t^r is operator monotone on [0,∞) for 0 ≤ r ≤ 1. More specifically, for two positive semi-definite matrices A and B,
A ≥ B implies A^r ≥ B^r.
Definition 1.2.2 ([55]). A continuous function f defined on an interval K (K ⊂ R) is said to be operator convex of order n on K if for any Hermitian matrices A and B in M_n with spectra in K, and for all real numbers 0 ≤ λ ≤ 1,
f(λA + (1−λ)B) ≤ λf(A) + (1−λ)f(B).
If f is operator convex of all orders n, then f is called operator convex. If −f is operator convex, then we call f operator concave.
Theorem 1.2.3 ([10]). The function f(t) = t^r on [0,∞) is operator convex when r ∈ [−1,0] ∪ [1,2]. More specifically, for any positive semi-definite matrices A, B and for any λ ∈ [0,1],
(λA + (1−λ)B)^r ≤ λA^r + (1−λ)B^r.
Another important example is the function f(t) = log t, which is operator monotone on (0,∞), while the function g(t) = t log t is operator convex. The relation between operator monotonicity and operator convexity is given by the theorem below.
Theorem 1.2.4 ([9]). Let f be a (continuous) real function on the interval [0, α). Then the following two conditions are equivalent:
(i) f is operator convex and f(0) ≤ 0;
(ii) the function g(t) = f(t)/t is operator monotone on (0, α).
Definition 1.2.3 ([10]). Let f(A, B) be a real-valued function of two matrix variables. Then f is called jointly concave if for all 0 ≤ α ≤ 1,
f(αA_1 + (1−α)A_2, αB_1 + (1−α)B_2) ≥ αf(A_1, B_1) + (1−α)f(A_2, B_2)
for all A_1, A_2, B_1, B_2. If −f is jointly concave, we say f is jointly convex.
We now give a brief overview of fundamental concepts in Fréchet differential calculus in the context of matrix analysis. Consider real Banach spaces X and Y, and the space L(X, Y) of bounded linear operators from X to Y. A continuous function f defined on an open subset U of X is differentiable at a point u in U if there exists an operator T in L(X, Y) such that
lim_{v→0} ∥f(u + v) − f(u) − T(v)∥ / ∥v∥ = 0.
If such an operator T exists, it is unique. When f is differentiable at u, we denote its derivative by Df(u) or ∂f(u), often referred to as the Fréchet derivative. A function f is differentiable on a set U if it is differentiable at every point of U. Furthermore, if f is differentiable at u, then for every vector v in X,
Df(u)(v) = d/dt |_{t=0} f(u + tv).
This is also called the directional derivative of f at u in the direction v.
If f_1, f_2 are two differentiable maps, then f_1 + f_2 is differentiable and
D(f_1 + f_2)(u) = Df_1(u) + Df_2(u).
The composite of two differentiable maps f and g is differentiable, and we have the chain rule
D(g ∘ f)(u) = Dg(f(u)) · Df(u).
One important rule of differentiation for real functions is the product rule: (fg)′ = f′g + fg′.
In the context of Banach spaces, the product of two maps f and g is not defined unless their range lies in an algebra; still, a general product rule can be established for differentiable maps. Let f and g be differentiable maps from X into Banach spaces Y_1 and Y_2, respectively, and let B be a continuous bilinear map from Y_1 × Y_2 into Z. Let ϕ be the map from X to Z defined as ϕ(x) = B(f(x), g(x)). Then for all u, v in X,
Dϕ(u)(v) = B(Df(u)(v), g(u)) + B(f(u), Dg(u)(v)).
A special case is the product of bounded operators in a Banach space Y, i.e., Y_1 = Y_2 = L(Y) and ϕ(x) = f(x)g(x); the rule above then gives
Dϕ(u)(v) = Df(u)(v) g(u) + f(u) Dg(u)(v).
Higher-order Fréchet derivatives are multilinear maps. For a differentiable function f from X to Y, the derivative Df(u) at each point u is an element of the Banach space L(X, Y). Thus we have a map Df from X to L(X, Y), defined as Df : u ↦ Df(u).
If the map Df is differentiable at a point u, we say that f is twice differentiable at u. The derivative of Df at u is called the second derivative of f, denoted by D²f(u); it is an element of the space L(X, L(X, Y)). The space L²(X, Y) consists of bounded bilinear maps from X × X into Y, i.e., maps f that are linear in both variables and for which there exists a constant c such that
∥f(x_1, x_2)∥ ≤ c ∥x_1∥ ∥x_2∥ for all x_1, x_2 ∈ X.
The infimum of all such c is called ∥f∥. This is a norm on the space L²(X, Y), and the space is a Banach space with this norm. If ϕ is an element of L(X, L(X, Y)), let
ϕ̃(x_1, x_2) = [ϕ(x_1)](x_2) for x_1, x_2 ∈ X.
The map ϕ̃ belongs to L²(X, Y), and it is easy to see that ϕ ↦ ϕ̃ is an isometric isomorphism. Consequently, the second derivative of a twice differentiable function f from X to Y can be regarded as a bilinear map from X × X to Y. It is symmetric in its two variables:
D²f(u)(v_1, v_2) = D²f(u)(v_2, v_1) for all u, v_1, v_2.
Derivatives of higher order can be defined by repeating the above procedure. The p-th derivative of a map f from X to Y can be identified with a p-linear map from X × X × ··· × X (p copies) into Y. A convenient method of calculating the p-th derivative of f is provided by the formula
D^p f(u)(v_1, ..., v_p) = ∂^p/∂t_1 ··· ∂t_p |_{t_1 = ··· = t_p = 0} f(u + t_1 v_1 + ··· + t_p v_p).
For the convenience of the reader, let us provide some examples of derivatives of matrix functions.
Example 1.2.1. In these examples X = Y = L(H).
(i) Let f(A) = A². Then Df(A)(B) = AB + BA and D²f(A)(B_1, B_2) = B_1B_2 + B_2B_1.
(ii) Let f(A) = A^{-1} for each invertible A. Then Df(A)(B) = −A^{-1}BA^{-1}.
(iii) Let f(A) = A^{-2} for each invertible A. Then Df(A)(B) = −(A^{-2}BA^{-1} + A^{-1}BA^{-2}).
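The formula in (ii) can be checked against a finite-difference approximation of the directional derivative; a sketch assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)    # well-conditioned, invertible
B = rng.standard_normal((4, 4))                    # direction of differentiation

Ainv = np.linalg.inv(A)
Df = -Ainv @ B @ Ainv          # Df(A)(B) for f(A) = A^{-1}, as in Example 1.2.1(ii)

# Finite-difference approximation of the directional derivative.
t = 1e-6
fd = (np.linalg.inv(A + t * B) - Ainv) / t
deriv_ok = np.allclose(Df, fd, atol=1e-4)
```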
In connection with electrical engineering, Anderson and Duffin [3] defined the parallel sum of two positive definite matrices A and B by
A : B = (A^{-1} + B^{-1})^{-1}.
The harmonic mean is 2(A : B), which is the dual of the arithmetic mean A∇B = (A + B)/2. In the same period, Pusz and Woronowicz [69] introduced the geometric mean as
A#B = A^{1/2}(A^{-1/2}BA^{-1/2})^{1/2}A^{1/2}.
They also proved that the geometric mean is the unique positive solution of the Riccati equation
XA^{-1}X = B.
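These means and the Riccati characterization can be verified numerically; a sketch assuming numpy (`mpow` is our helper for matrix powers):

```python
import numpy as np

def mpow(A, p):
    w, V = np.linalg.eigh(A)
    return (V * w**p) @ V.T

rng = np.random.default_rng(6)
X = rng.standard_normal((4, 4)); A = X @ X.T + np.eye(4)
Y = rng.standard_normal((4, 4)); B = Y @ Y.T + np.eye(4)

# Parallel sum and harmonic mean.
par = np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(B))      # A : B
harmonic = 2 * par

# Geometric mean A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}.
Ah, Aih = mpow(A, 0.5), mpow(A, -0.5)
G = Ah @ mpow(Aih @ B @ Aih, 0.5) @ Ah

# G solves the Riccati equation X A^{-1} X = B.
riccati_ok = np.allclose(G @ np.linalg.inv(A) @ G, B)

# Harmonic-geometric-arithmetic mean inequality: 2(A:B) <= A#B <= (A+B)/2.
hga_ok = bool(np.all(np.linalg.eigvalsh(G - harmonic) > -1e-8)
              and np.all(np.linalg.eigvalsh((A + B) / 2 - G) > -1e-8))
```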
In 2005, Moakher [65], and then in 2006, Bhatia and Holbrook [14], investigated the structure of the Riemannian manifold H^+_n. They showed that the curve
γ(t) = A#_tB = A^{1/2}(A^{-1/2}BA^{-1/2})^t A^{1/2}, t ∈ [0, 1],
is the unique geodesic joining A and B, and called it the t-geometric mean or weighted geometric mean. The weighted harmonic and weighted arithmetic means are defined by
A !_t B = ((1−t)A^{-1} + tB^{-1})^{-1} and A ∇_t B = (1−t)A + tB.
The well-known inequality relating these quantities is the harmonic, geometric, and arithmetic means inequality [47, 60], that is,
A !_t B ≤ A #_t B ≤ A ∇_t B.
We now recall the Kubo-Ando theory of operator means. For x > 0 and t ≥ 0, the function
φ(x, t) = x(1 + t)/(x + t)
is bounded and continuous on the extended half-line [0, ∞]. The Löwner theory on operator-monotone functions shows that the map m ↦ f, where
f(x) = ∫_{[0,∞]} φ(x, t) dm(t) for x > 0,
establishes an affine isomorphism from the class of positive Radon measures on [0, ∞] onto the class of operator-monotone functions. In the representation above, f(0) = inf_x f(x) = m({0}) and inf_x f(x)/x = m({∞}).
Theorem 1.2.5 (Kubo-Ando). For each operator connection σ there exists a unique operator monotone function f : R_+ → R_+ satisfying
f(t)I_n = I_n σ (tI_n), t > 0,
and for A, B > 0 the formula
AσB = A^{1/2} f(A^{-1/2}BA^{-1/2}) A^{1/2}
holds, with the right-hand side defined via functional calculus, and extended to A, B ≥ 0 by AσB = lim_{ε↓0} (A + εI)σ(B + εI).
We call f the representing function of σ.
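The formula of Theorem 1.2.5 can be illustrated numerically: for the harmonic mean the representing function is f(x) = 2x/(1+x), and for the geometric mean f(x) = √x. A sketch assuming numpy (helper names are ours):

```python
import numpy as np

def mpow(A, p):
    w, V = np.linalg.eigh(A)
    return (V * w**p) @ V.T

def connection(A, B, f):
    # A sigma B = A^{1/2} f(A^{-1/2} B A^{-1/2}) A^{1/2}, via functional calculus.
    Ah, Aih = mpow(A, 0.5), mpow(A, -0.5)
    w, V = np.linalg.eigh(Aih @ B @ Aih)
    return Ah @ ((V * f(w)) @ V.T) @ Ah

rng = np.random.default_rng(12)
X = rng.standard_normal((4, 4)); A = X @ X.T + np.eye(4)
Y = rng.standard_normal((4, 4)); B = Y @ Y.T + np.eye(4)

# Harmonic mean: representing function f(x) = 2x/(1+x).
harm_sigma = connection(A, B, lambda x: 2 * x / (1 + x))
harm_direct = 2 * np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(B))
harm_ok = np.allclose(harm_sigma, harm_direct)

# Geometric mean: representing function f(x) = sqrt(x); it solves X A^{-1} X = B.
G = connection(A, B, np.sqrt)
geo_ok = np.allclose(G @ np.linalg.inv(A) @ G, B)
```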
The next theorem follows from the integral representation of matrix monotone functions and from the previous theorem.
Theorem 1.2.6. The map m ↦ σ, defined by
AσB = aA + bB + ∫_{(0,∞)} ((1 + t)/t) {(tA) : B} dm(t),
where a = m({0}) and b = m({∞}), establishes an affine isomorphism from the class of positive Radon measures on [0, ∞] onto the class of connections.
If P and Q are two projections, then the explicit formulation for PσQ is simpler.
Theorem 1.2.7. If σ is a mean, then for every pair of projections P and Q
An immediate consequence of the above theorem is the following relation for projections P and Q
Let f denote the representing function of σ. The function xf(x^{-1}) is the representing function of the transpose σ′. Therefore, σ is symmetric if and only if f(x) = xf(x^{-1}). The next theorem provides the representation for a symmetric connection.
Theorem 1.2.8. The map n ↦ σ, defined by
AσB = c(A + B)/2 + ∫_{(0,1]} ((1 + t)/(2t)) {(tA) : B + A : (tB)} dn(t),
where c = n({0}), establishes an affine isomorphism from the class of positive Radon measures on the unit interval [0, 1] onto the class of symmetric connections.
In recent years, many researchers have paid attention to different distance functions on the set P_n of positive definite matrices. Along with the traditional Riemannian metric
d_R(A, B) = (Σ_{i=1}^n log² λ_i(A^{-1/2}BA^{-1/2}))^{1/2},
where the λ_i(A^{-1/2}BA^{-1/2}) are the eigenvalues of the matrix A^{-1/2}BA^{-1/2}, important distance functions include the Bures-Wasserstein distance, which arises from optimal transport theory and the study of quantum states,
d_b(A, B) = (Tr(A + B) − 2 Tr(A^{1/2}BA^{1/2})^{1/2})^{1/2},
and the Hellinger metric or Bhattacharyya metric [11] in quantum information:
d_h(A, B) = (Tr(A + B) − 2 Tr(A^{1/2}B^{1/2}))^{1/2}.
Notice that the metric d_h is the same as the Euclidean distance between A^{1/2} and B^{1/2}, i.e., d_h(A, B) = ∥A^{1/2} − B^{1/2}∥_F.
Recently, Minh [43] introduced the Alpha Procrustes distance as follows: for α > 0 and for two positive semi-definite matrices A and B,
d_{b,α}(A, B) = (1/α) d_b(A^{2α}, B^{2α}).
The Alpha Procrustes distances are Riemannian distances associated with a family of Riemannian metrics on the manifold of positive definite matrices, including the Log-Euclidean and Wasserstein Riemannian metrics. Since these distances are derived from the Bures-Wasserstein distance, they are also referred to as weighted Bures-Wasserstein distances. In this chapter, we introduce the weighted Hellinger metric for two positive semi-definite matrices, defined as
d_{h,α}(A, B) = (1/α) d_h(A^{2α}, B^{2α}),
and explore its properties.
The results of this chapter are taken from [32].
Weighted Hellinger distance
Definition 2.1.1. For two positive semi-definite matrices A and B and for α > 0, the weighted Hellinger distance between A and B is defined as
d_{h,α}(A, B) = (1/α) d_h(A^{2α}, B^{2α}) = (1/α) (Tr(A^{2α} + B^{2α}) − 2 Tr(A^α B^α))^{1/2}. (2.1.1)
The metric dh,α(A, B) serves as an interpolating metric bridging the Log-Euclidean and Hellinger metrics As α approaches 0, the weighted Hellinger distance converges to the Log-Euclidean distance Additionally, we establish the equivalence between the weighted Bures-Wasserstein distance and the weighted Hellinger distance, as demonstrated in Proposition 2.1.2.
Proposition 2.1.1. For two positive semi-definite matrices A and B,
lim_{α→0} d²_{h,α}(A, B) = ∥log(A) − log(B)∥²_F.
Proof. We rewrite the expression of d_{h,α}(A, B) as
d²_{h,α}(A, B) = (1/α²) d²_h(A^{2α}, B^{2α}) = (1/α²) ∥A^α − I∥²_F + (1/α²) ∥B^α − I∥²_F − (2/α²) Tr((A^α − I)(B^α − I)).
From the Taylor expansion of the power function,
(A^α − I)(B^α − I) = A^αB^α − A^α − B^α + I = α² log A · log B + ···.
Consequently,
d²_{h,α}(A, B) = ∥A^α − I∥²_F/α² + ∥B^α − I∥²_F/α² − 2 Tr(log A · log B) + o(1).
Letting α tend to zero, we obtain
lim_{α→0} d²_{h,α}(A, B) = ∥log A∥²_F + ∥log B∥²_F − 2 ⟨log A, log B⟩ = ∥log A − log B∥²_F.
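The limit in Proposition 2.1.1 can be observed numerically by taking a small α; a sketch assuming numpy (`mpow`, `mlog`, `d_h_alpha` are our helper names):

```python
import numpy as np

def mpow(A, p):
    # Power of a positive definite matrix via its spectral decomposition.
    w, V = np.linalg.eigh(A)
    return (V * w**p) @ V.T

def mlog(A):
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

def d_h_alpha(A, B, alpha):
    # Weighted Hellinger distance, definition (2.1.1).
    val = (np.trace(mpow(A, 2 * alpha) + mpow(B, 2 * alpha))
           - 2 * np.trace(mpow(A, alpha) @ mpow(B, alpha)))
    return np.sqrt(max(val, 0.0)) / alpha

rng = np.random.default_rng(7)
X = rng.standard_normal((3, 3)); A = X @ X.T + np.eye(3)
Y = rng.standard_normal((3, 3)); B = Y @ Y.T + np.eye(3)

# For small alpha, d_{h,alpha} is close to the Log-Euclidean distance.
logeuc = np.linalg.norm(mlog(A) - mlog(B), 'fro')
limit_ok = abs(d_h_alpha(A, B, 1e-5) - logeuc) <= 1e-2 * (logeuc + 1)
zero_ok = d_h_alpha(A, A, 0.7) <= 1e-4
```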
It is interesting to note that the weighted Bures-Wasserstein and weighted Hellinger distances are equivalent.
Proposition 2.1.2. For two positive semi-definite matrices A and B,
d_{b,α}(A, B) ≤ d_{h,α}(A, B) ≤ √2 d_{b,α}(A, B).
Proof. According to the Araki-Lieb-Thirring inequality (Theorem 1.1.1), we have
Tr((A^α B^{2α} A^α)^{1/2}) ≥ Tr(A^α B^α).
In other words, d_{b,α}(A, B) ≤ d_{h,α}(A, B).
With ρ, σ ∈ D_n, we have
d²_h(ρ, σ) = 2 − 2 Tr(ρ^{1/2}σ^{1/2}) ≤ 4 − 4 Tr((ρ^{1/2}σρ^{1/2})^{1/2}) = 2 d²_b(ρ, σ),
or, equivalently,
2 Tr((ρ^{1/2}σρ^{1/2})^{1/2}) ≤ 1 + Tr(ρ^{1/2}σ^{1/2}).
In the above inequality, replace ρ with A^{2α}/Tr(A^{2α}) and σ with B^{2α}/Tr(B^{2α}) to get
2 Tr((A^α B^{2α} A^α)^{1/2}) ≤ (Tr(A^{2α}) Tr(B^{2α}))^{1/2} + Tr(A^α B^α).
By the AM-GM inequality, the above inequality implies
4 Tr((A^α B^{2α} A^α)^{1/2}) ≤ Tr(A^{2α}) + Tr(B^{2α}) + 2 Tr(A^α B^α),
which is equivalent to d_{h,α}(A, B) ≤ √2 d_{b,α}(A, B).
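The two-sided bound of Proposition 2.1.2 can be checked on random positive definite matrices; a sketch assuming numpy (helper names are ours):

```python
import numpy as np

def mpow(A, p):
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 0, None)**p) @ V.T

def d_h_alpha(A, B, alpha):
    val = (np.trace(mpow(A, 2 * alpha) + mpow(B, 2 * alpha))
           - 2 * np.trace(mpow(A, alpha) @ mpow(B, alpha)))
    return np.sqrt(max(val, 0.0)) / alpha

def d_b_alpha(A, B, alpha):
    # Weighted Bures-Wasserstein distance d_b(A^{2a}, B^{2a}) / a.
    X, Y = mpow(A, 2 * alpha), mpow(B, 2 * alpha)
    Xh = mpow(X, 0.5)
    fid = np.trace(mpow(Xh @ Y @ Xh, 0.5))
    return np.sqrt(max(np.trace(X + Y) - 2 * fid, 0.0)) / alpha

rng = np.random.default_rng(8)
X = rng.standard_normal((4, 4)); A = X @ X.T + 0.5 * np.eye(4)
Y = rng.standard_normal((4, 4)); B = Y @ Y.T + 0.5 * np.eye(4)

alpha = 0.8
db, dh = d_b_alpha(A, B, alpha), d_h_alpha(A, B, alpha)
sandwich_ok = db <= dh + 1e-10 and dh <= np.sqrt(2) * db + 1e-10
```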
In-betweenness property
In 2016, Audenaert introduced the in-betweenness property of matrix means: a matrix mean σ satisfies the in-betweenness property with respect to the metric d if, for any pair of positive definite operators A and B,
d(A, AσB) ≤ d(A, B).
The authors in [34] studied the in-sphere property of matrix means. In a series of papers [26, 28], Dinh, Franco, and Dumitru studied the matrix power mean
μ_p(t; A, B) := (tA^p + (1−t)B^p)^{1/p}
in relation to various distance functions; they also investigated the matrix power mean in the sense of Kubo-Ando [54].
In this section we study the in-betweenness property of matrix power means with respect to the weighted Bures-Wasserstein and weighted Hellinger distances. We show that the matrix power mean μ_p(t; A, B) satisfies the in-betweenness property with respect to d_{h,α} (Theorem 2.2.1) and d_{b,α} (Theorem 2.2.2), using the operator convexity and operator concavity of power functions. We also show that, among symmetric means, the arithmetic mean is the only one satisfying the in-betweenness property with respect to the weighted Bures-Wasserstein and weighted Hellinger distances.
We are now ready to show that the matrix power mean μ_p(t; A, B) satisfies the in-betweenness property in d_{h,α} and d_{b,α}.
Theorem 2.2.1. Let 0 < p/2 ≤ α ≤ p and 0 ≤ t ≤ 1. Then
d_{h,α}(A, μ_p(t; A, B)) ≤ d_{h,α}(A, B), for all A, B ∈ H^+_n.
Proof. We have
d²_{h,α}(A, μ_p(t; A, B)) = (1/α²) Tr(A^{2α} + μ_p(t; A, B)^{2α} − 2 A^α μ_p(t; A, B)^α).
Therefore, the above result follows if
Tr(μ_p(t; A, B)^{2α} − 2 A^α μ_p(t; A, B)^α) ≤ Tr(B^{2α} − 2 A^α B^α).
By the operator convexity of the map x ↦ x^{2α/p}, when p/2 ≤ α ≤ p (so that 2α/p ∈ [1, 2]),
μ_p(t; A, B)^{2α} = (tA^p + (1−t)B^p)^{2α/p} ≤ tA^{2α} + (1−t)B^{2α}.
By the operator concavity of the map x ↦ x^{α/p}, when α ≤ p (so that α/p ∈ [0, 1]),
μ_p(t; A, B)^α = (tA^p + (1−t)B^p)^{α/p} ≥ tA^α + (1−t)B^α.
Therefore, the desired inequality follows if
t Tr(A^{2α}) + (1−t) Tr(B^{2α}) − 2t Tr(A^{2α}) − 2(1−t) Tr(A^α B^α) ≤ Tr(B^{2α}) − 2 Tr(A^α B^α),
which reduces to
t Tr(A^{2α} + B^{2α} − 2 A^α B^α) ≥ 0,
which follows from the AM-GM inequality.
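Theorem 2.2.1 can be checked on random matrices; a sketch assuming numpy (helper names are ours), with parameters chosen to satisfy 0 < p/2 ≤ α ≤ p:

```python
import numpy as np

def mpow(A, p):
    w, V = np.linalg.eigh(A)
    return (V * w**p) @ V.T

def d_h_alpha(A, B, alpha):
    val = (np.trace(mpow(A, 2 * alpha) + mpow(B, 2 * alpha))
           - 2 * np.trace(mpow(A, alpha) @ mpow(B, alpha)))
    return np.sqrt(max(val, 0.0)) / alpha

def mu(p, t, A, B):
    # Matrix power mean mu_p(t; A, B) = (t A^p + (1-t) B^p)^{1/p}.
    return mpow(t * mpow(A, p) + (1 - t) * mpow(B, p), 1.0 / p)

rng = np.random.default_rng(9)
X = rng.standard_normal((4, 4)); A = X @ X.T + np.eye(4)
Y = rng.standard_normal((4, 4)); B = Y @ Y.T + np.eye(4)

p, alpha, t = 1.0, 0.8, 0.3       # satisfies 0 < p/2 <= alpha <= p
M = mu(p, t, A, B)
between_ok = d_h_alpha(A, M, alpha) <= d_h_alpha(A, B, alpha) + 1e-10
```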
Theorem 2.2.2. Let 0 < p/2 ≤ α ≤ p and 1/2 ≤ t ≤ 1. Then
d_{b,α}(A, μ_p(t; A, B)) ≤ d_{b,α}(A, B), for all A, B ∈ H^+_n.
Proof. Firstly, we show that for any positive semi-definite matrices A and B, for p/2 ≤ α ≤ p and 1/2 ≤ t ≤ 1,
d_{b,α}(A, μ_p(t; A, B)) ≤ d_{h,α}(A, μ_p(t; A, B)) ≤ √(2(1−t)) d_{b,α}(A, B).
By the Araki-Lieb-Thirring inequality, we have d_{b,α}(A, μ_p(t; A, B)) ≤ d_{h,α}(A, μ_p(t; A, B)).
By the operator convexity of the function x ↦ x^{2α/p} and the operator concavity of the function x ↦ x^{α/p}, we obtain
d²_{b,α}(A, μ_p(t; A, B)) ≤ d²_{h,α}(A, μ_p(t; A, B)) ≤ (1/α²)(1−t) Tr(A^{2α} + B^{2α} − 2 A^α B^α) = (1−t) d²_{h,α}(A, B) ≤ 2(1−t) d²_{b,α}(A, B).
From here, applying the square root to both sides, with t ∈ [1/2, 1] so that 2(1−t) ≤ 1, we have
d_{b,α}(A, μ_p(t; A, B)) ≤ √(2(1−t)) d_{b,α}(A, B) ≤ d_{b,α}(A, B).
Note that the Kubo-Ando power mean P_p(t; A, B) satisfies the in-betweenness property, as established in [28, Theorem 2], thanks to the concavity of the function g(t) = Tr(A^{1/2} P_p(t; A, B) A^{1/2}). Note also that P_t(A, B) ≠ P_t(B, A), i.e., P_t is not symmetric. For symmetric means, however, we have the following result, whose proof is adapted from [22].
Theorem 2.2.3. Let σ be a symmetric mean. If for any pair of positive definite matrices A and B either
d_{h,α}(A, AσB) ≤ d_{h,α}(A, B) (2.2.2)
or d_{b,α}(A, AσB) ≤ d_{b,α}(A, B) holds, then σ is the arithmetic mean.
Proof. By Theorems 1.2.6 and 1.2.8, the symmetric operator mean σ is represented as follows:
AσB = δ(A∇B) + ∫_{(0,∞)} ((λ + 1)/λ) {(λA) : B + A : (λB)} dμ(λ), (2.2.4)
where A, B ≥ 0, λ ≥ 0 and μ is a positive measure on (0, ∞) with δ + μ((0, ∞)) = 1, and the parallel sum A : B is given by A : B = (A^{-1} + B^{-1})^{-1} when A and B are invertible.
For two orthogonal projections P, Q acting on a Hilbert space H, let us denote by P ∧ Q their infimum, which is the orthogonal projection onto the subspace P(H) ∩ Q(H). If P ∧ Q = 0, then by Theorem 1.2.7,
Let us consider the following orthogonal projections:
P = ( 1 0 ; 0 0 ), Q_θ = ( cos²θ cosθ sinθ ; cosθ sinθ sin²θ ).
Notice that Q_θ → P as θ → 0 and Q_θ ∧ P = 0. From the projections above, it is easy to see that the inequality (2.2.2) becomes
d_{h,α}(P, δ(P + Q_θ)/2) ≤ d_{h,α}(P, Q_θ).
This holds for all θ > 0; taking the limit as θ → 0⁺, we obtain d_{h,α}(P, δP) ≤ d_{h,α}(P, P) = 0, which forces δ = 1. This means that μ = 0 and σ is the arithmetic mean. A similar proof applies to the statement for d_{b,α}.
In this chapter, based on Minh's approach to the weighted Bures distance, we introduced the weighted Hellinger distance and studied its properties. The weighted Bures distance is an extension of the Bures distance depending on one parameter. In the next chapter, we introduce the α-z-Bures Wasserstein divergence, a quantum divergence that extends the Bures distance with two parameters.
In the Riemannian manifold of positive definite matrices, for A, B ∈ P_n the curve of the weighted geometric mean A#_tB = A^{1/2}(A^{-1/2}BA^{-1/2})^t A^{1/2} is the unique geodesic joining A and B. When t = 1/2, A#_{1/2}B is the geometric mean of A and B, a matrix extension of the geometric mean √(ab) of positive numbers a and b. In 2004, Moakher, and later Bhatia and Holbrook, studied the least squares problem for positive definite matrices A_1, A_2, ..., A_m:
min_{X>0} Σ_{i=1}^m δ²(X, A_i),
where δ(A, B) = ∥log(A^{-1}B)∥_F is the Riemannian distance between A and B. The solution of this problem is the Karcher mean of A_1, A_2, ..., A_m, which is known in the literature under various names: the Fréchet mean, the Cartan mean, the Riemannian center of mass, etc. Notably, the solution of this problem is the unique positive definite solution of the Karcher equation
Σ_{i=1}^m log(X^{-1/2} A_i X^{-1/2}) = O. (3.0.2)
In [60], Lim and Pálfia showed that the solution of (3.0.2) is nothing but the limit, as t → 0, of the solution of the following matrix equation:
X = Σ_{i=1}^m w_i (X #_t A_i).
Recently, adapting the approach developed by Lim and Pálfia, Franco and Dumitru introduced the Rényi power means of matrices. For positive definite matrices A_i and parameters 0 < α_i ≤ z_i ≤ 1, they showed that the equation
X = Σ_{i=1}^m ω_i P_{α_i,z_i}(X, A_i)
has a unique positive definite solution, where ω = (ω_1, ..., ω_m) is a probability vector (ω_i ≥ 0 and Σ_{i=1}^m ω_i = 1), and
P_{α,z}(A, B) = (B^{(1−α)/(2z)} A^{α/z} B^{(1−α)/(2z)})^z
is the matrix function in the α-z-Rényi relative entropy, introduced by Audenaert and Datta in 2015. Moreover, replacing P_{α_i,z_i}(X, A_i) with the weighted geometric mean X #_t A_i, the solution of the associated matrix equation is the weighted power mean.
Changing the distance function can lead to different solutions, if they exist In various applications, distance-like functions that measure the separation between two data points are often of interest These functions may lack symmetry and do not always adhere to the triangle inequality Divergences serve as examples of such distance-like functions.
One typical example of divergences is the Bures-Wasserstein metric, studied by Bhatia and collaborators:
d_b(A, B) = (Tr((A + B)/2) − Tr(A^{1/2}BA^{1/2})^{1/2})^{1/2},
where Tr((A^{1/2}BA^{1/2})^{1/2}) is the quantum fidelity between two positive definite matrices A and B. The authors showed that d²_b is a quantum divergence and solved the least squares problem with respect to this metric. In a subsequent study, they introduced the weighted Bures-Wasserstein distance
d_{b,t}(A, B) = (Tr((1−t)A + tB) − Tr(F_t(A, B)))^{1/2},
where F_t(A, B) = (A^{(1−t)/(2t)} B A^{(1−t)/(2t)})^t and Tr(F_t(A, B)) is the sandwiched quasi-relative entropy; they also solved the least squares problem with respect to this divergence. Notably, (A^{1/2}BA^{1/2})^{1/2} and (A^{(1−t)/(2t)} B A^{(1−t)/(2t)})^t are matrix generalizations of the geometric mean and the weighted geometric mean of positive numbers, respectively.
In this chapter, we explore the properties of the α-z-Bures Wasserstein divergence, defined for positive definite matrices A and B as
Φ(A, B) = Tr((1−α)A + αB) − Tr(Q_{α,z}(A, B)),
where Q_{α,z}(A, B) = P_{α,z}(B, A) = (A^{(1−α)/(2z)} B^{α/z} A^{(1−α)/(2z)})^z.
Q_{α,z}(A, B) is also a parameterized matrix version of the weighted geometric mean a^{1−α}b^α.
In the following section, we demonstrate that Φ(A, B) qualifies as a quantum divergence. By applying the Brouwer fixed point theorem, we establish that the average of m positive semi-definite matrices A_1, A_2, ..., A_m with respect to Φ is the unique positive definite solution of the matrix equation
Σ_{i=1}^m w_i Q_{α,z}(X, A_i) = X,
which is called the α-z-weighted right mean. We also establish some properties of this quantity.
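The quantities Q_{α,z} and Φ can be computed directly from their definitions; the sketch below, assuming numpy, checks the two basic divergence features Φ(A, B) ≥ 0 and Φ(A, A) = 0 on random positive definite matrices (helper names are ours):

```python
import numpy as np

def mpow(A, p):
    w, V = np.linalg.eigh(A)
    return (V * w**p) @ V.T

def Q(A, B, a, z):
    # Q_{alpha,z}(A, B) = (A^{(1-a)/(2z)} B^{a/z} A^{(1-a)/(2z)})^z
    C = mpow(A, (1 - a) / (2 * z))
    return mpow(C @ mpow(B, a / z) @ C, z)

def Phi(A, B, a, z):
    # alpha-z-Bures Wasserstein divergence.
    return np.trace((1 - a) * A + a * B) - np.trace(Q(A, B, a, z))

rng = np.random.default_rng(10)
X = rng.standard_normal((4, 4)); A = X @ X.T + np.eye(4)
Y = rng.standard_normal((4, 4)); B = Y @ Y.T + np.eye(4)

a, z = 0.4, 0.7                    # 0 <= alpha <= z <= 1
nonneg_ok = Phi(A, B, a, z) >= -1e-10
zero_ok = abs(Phi(A, A, a, z)) <= 1e-8
```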
The α-z-Bures Wasserstein divergence and the least squares problem
The first main result of this section is that the α-z-Bures Wasserstein divergence, defined in (3.0.5), is a quantum divergence.
Recall that for 0 < p < 1 and x > 0, we have
x^p = (sin(pπ)/π) ∫_0^∞ (x/(x + t)) t^{p−1} dt.
The next lemma follows from the Spectral Theorem and the above representation.
Lemma 3.1.1 ([9]). Let 0 < p < 1. Then for any positive definite matrix X,
X^p = (sin(pπ)/π) ∫_0^∞ X(X + tI)^{-1} t^{p−1} dt.
Theorem 3.1.1. For 0 ≤ α ≤ z ≤ 1 (z > 0), the quantity Φ(A, B) is a divergence.
Proof. For z = α, the theorem was obtained in [14]. The case 0 ≤ α ≤ z = 1 was proved in a recent paper by Nguyen and Le in [56]. We need to consider the case 0 < α < z < 1. Next, we study the least squares problem with respect to the divergence Φ: for positive definite matrices A_1, A_2, ..., A_m and a probability vector ω = (w_1, ..., w_m), minimize Σ_{i=1}^m w_i Φ(A_i, X) over X > 0, subject to the constraint of positive definiteness.
This problem was solved by Bhatia, Lim and Jain in [14] whenz =α.
The function F(X) = Σ_{i=1}^m ω_i Φ(A_i, X) attains its minimum at X_0, where X_0 is the unique positive definite solution of the following matrix equation:
Σ_{i=1}^m w_i Q_{α,z}(X, A_i) = X. (3.1.13)
where C = Σ_{i=1}^m w_i z A_i^{(1−α)/(2z)} φ^{z−1} A_i^{(1−α)/(2z)}. Therefore, the only critical point of F(X) is the solution of the equation (3.1.14).
Now, let us choose an orthonormal basis in which the matrix X is diagonal, i.e., X = diag(x_1, x_2, ..., x_n), and let C = (c_{ij}) be the representation of C in this basis. From the equation (3.1.14), comparing entries, it follows that C is diagonal, and
C = Σ_{i=1}^m w_i z A_i^{(1−α)/(2z)} φ^{z−1} A_i^{(1−α)/(2z)} = z X^{1−α/z}.
Multiplying both sides of the last identity from the left and from the right by X^{α/(2z)}, we get
The equation (3.1.13) has at most one solution due to the strict convexity of the function F(X): if a solution exists, it is unique. To complete the proof, it suffices to show that the map X ↦ Σ_{i=1}^m w_i Q_{α,z}(X, A_i) has a fixed point. Choose positive numbers a and b such that aI ≤ A_i ≤ bI for all 1 ≤ i ≤ m.
By the operator monotonicity of the map x ↦ x^z, when z ∈ [0, 1], we have
that the map X ↦ Σ_{i=1}^m w_i Q_{α,z}(X, A_i) is a self-map of the compact convex set K, where K = {X ∈ P_n : aI ≤ X ≤ bI}.
According to Brouwer's fixed point theorem, this map has a fixed point.
Jeong et al. in [49] introduced the concept of the α-z-weighted right mean, denoted by R_{α,z}(ω; A), and established a number of its properties. For the reader's convenience, we recall some notation.
Let Δ_m denote the set of all positive probability vectors in R^m, convexly spanned by the unit coordinate vectors. Let A = (A_1, ..., A_m) ∈ P^m_n, ω = (w_1, ..., w_m) ∈ Δ_m, and let σ be a permutation on m letters. Let p ∈ R and let M ∈ GL_n, the group of n×n invertible matrices. We define ω_σ = (w_{σ(1)}, ..., w_{σ(m)}).
For completeness, we recall some properties obtained in [49].
Proposition 3.1.1. The weighted right mean R_{α,z} satisfies the following:
(ii) R_{α,z}(ω, cA) = c R_{α,z}(ω, A) for any c > 0.
(iii) R_{α,z}(ω_π, A_π) = R_{α,z}(ω, A) for any permutation π on {1, ..., m}.
(v) R_{α,z}(ω, UAU*) = U R_{α,z}(ω, A) U* for any unitary matrix U.
(vi) det R_{α,z}(ω, A) ≥ Π_{j=1}^m (det A_j)^{w_j}, and equality holds if and only if A_1 = ··· = A_m.
(vii) X = R_{α,z}(ω, A_1, ..., A_{m−1}, X) implies that X = R_{α,z}(ω̂, A_1, ..., A_{m−1}).
j=1 wj, wk+1, , wm;A1, Ak+1, , Am
If R_{α,z}(ω, A) ≥ I, then the reverse inequality holds.
If R_{α,z}(ω, A) ≤ I, then the reverse inequality holds.
The matrix norm |||·||| on M_n is said to be unitarily invariant if |||UAV||| = |||A||| for any matrix A ∈ M_n and unitary matrices U, V. In [49, Remark 3.6] the authors showed the following
We establish an upper bound as follows.
Proof. Let $X = R_{\alpha,z}(\omega;\mathbb{A})$. By the triangle inequality, the sub-multiplicativity of the operator norm, and the fact that $\|A^t\| = \|A\|^t$ for any positive definite $A$ and $t \ge 0$, from equation (3.1.13) we get
We now derive a version of the AM-GM inequality for the $\alpha$-$z$-weighted right mean. First, we need the following lemma ([61]).
Lemma 3.1.3. Let $T > 0$. The following inequalities hold:
Theorem 3.1.3. Let $0 \le \alpha \le z \le 1$, $\alpha \ne 1$, $z \ne 0$. Let $\mathbb{A} = (A_1, \ldots, A_m)$ be an $m$-tuple of positive definite matrices and $\omega = (w_1, \ldots, w_m)$ a probability vector. We have
The second inequality holds when $(1 + z - \alpha)I - z$
Proof. Recall that $R_{\alpha,z}(\omega,\mathbb{A})$ is the unique solution of the following equation
Multiplying both sides of the above inequality from the left and from the right by $X^{-\frac{\alpha}{2z}}$, we get
By the AM-GM inequality,
Let $\varphi(X) = X^{1+t} - (1-z)X^{t}$, where $t = -\frac{\alpha}{z} \in [-1, 0]$. By Lemma 3.1.3, we have
\[
\varphi(X) = X^{-\frac{1}{2}} X^{2+t} X^{-\frac{1}{2}} - (1-z)\, X^{-\frac{1}{2}} X^{1+t} X^{-\frac{1}{2}}
\]
Now let us prove the first inequality of the theorem. By the harmonic mean-geometric mean inequality, we have
Since the map $x \mapsto x^{-1}$ is convex, from the last inequality we get
Let $\Psi(X) = X^{p-1} - (1-z)X^{p}$, where $p = \frac{\alpha}{z} \in [0,1]$. Using Lemma 3.1.3 again, we obtain
\[
\Psi(X) \ge (p-1)X + (2-p)I - (1-z)\bigl(pX + (1-p)I\bigr)
\]
The matrix power mean $P_t(\omega,\mathbb{A})$ for $t \in (0,1]$ was introduced by Lim and Palfia [60] as the unique solution $X \in \mathbb{P}_n$ of the following equation
Note that for $t \in [-1, 0)$ we define $P_t(\omega,\mathbb{A}) = P_{-t}(\omega,\mathbb{A}^{-1})^{-1}$. In particular,
$P_1(\omega,\mathbb{A}) = A(\omega,\mathbb{A})$ and $P_{-1}(\omega,\mathbb{A}) = H(\omega,\mathbb{A})$ are the weighted arithmetic and harmonic means, respectively.
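The defining equation of the matrix power mean, $X = \sum_j w_j (X \sharp_t A_j)$, lends itself to fixed-point iteration: for $t \in (0,1]$ the right-hand side is a contraction in the Thompson metric by the Lim-Palfia theory, so the iteration converges. A minimal NumPy sketch (the helper names `funm`, `sharp_t`, `power_mean`, `rand_spd` are ours), checking that $t = 1$ recovers the weighted arithmetic mean and that $P_{1/2}$ sits between the harmonic and arithmetic means:

```python
import numpy as np

rng = np.random.default_rng(1)

def funm(A, f):
    # spectral calculus f(A) for a symmetric matrix A
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

def sharp_t(X, A, t):
    # weighted geometric mean X #_t A = X^{1/2} (X^{-1/2} A X^{-1/2})^t X^{1/2}
    Xh = funm(X, np.sqrt)
    Xmh = funm(X, lambda x: 1.0 / np.sqrt(x))
    return Xh @ funm(Xmh @ A @ Xmh, lambda x: x**t) @ Xh

def power_mean(w, As, t, iters=300):
    # fixed-point iteration for X = sum_j w_j (X #_t A_j)
    X = sum(wj * Aj for wj, Aj in zip(w, As))   # start from the arithmetic mean
    for _ in range(iters):
        X = sum(wj * sharp_t(X, Aj, t) for wj, Aj in zip(w, As))
    return X

def rand_spd(n):
    Y = rng.standard_normal((n, n))
    return Y @ Y.T + np.eye(n)

As = [rand_spd(3) for _ in range(3)]
w = [0.5, 0.3, 0.2]
arith = sum(wj * Aj for wj, Aj in zip(w, As))
harm = np.linalg.inv(sum(wj * np.linalg.inv(Aj) for wj, Aj in zip(w, As)))

P1 = power_mean(w, As, 1.0)
P_half = power_mean(w, As, 0.5)
print(np.allclose(P1, arith))                       # P_1 is the arithmetic mean
print(np.linalg.eigvalsh(arith - P_half).min() >= -1e-8 and
      np.linalg.eigvalsh(P_half - harm).min() >= -1e-8)   # H <= P_{1/2} <= A
```

The second check is exactly inequality (3.1.18) for the power mean, which (unlike the $\alpha$-$z$-weighted right mean discussed below) does lie between the harmonic and arithmetic means.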
Let $\omega = (w_1, w_2, \ldots, w_m)$ be a probability vector with $m \ge 2$. A weighted $m$-mean $G_m$ defined on $\mathbb{P}_n^m$ is an idempotent map $G_m(\omega, \cdot) : \mathbb{P}_n^m \to \mathbb{P}_n$, that is, $G_m(\omega, X, X, \ldots, X) = X$ for all $X \in \mathbb{P}_n$. Let $A_m := A_m(\omega,\mathbb{A}) = \sum_{j=1}^m w_j A_j$ and $H_m := H_m(\omega,\mathbb{A}) = \bigl(\sum_{j=1}^m w_j A_j^{-1}\bigr)^{-1}$ be the arithmetic mean and the harmonic mean, respectively. In [48], Hwang and Kim proved that for any $G_m$ between $A_m$ and $H_m$, i.e.,
\[
H_m \le G_m \le A_m, \tag{3.1.18}
\]
the function $G_m^{\omega} := G_m(\omega,\cdot) : \mathbb{P}_n^m \to \mathbb{P}_n$ is differentiable at $\mathbb{I} = (I, \ldots, I)$ with
Notice that the $\alpha$-$z$-weighted right mean does not satisfy inequality (3.1.18). However, we have the following result.
Theorem 3.1.4. Let $\omega = (w_1, \ldots, w_m)$ be a probability vector and let $R_{\alpha,z}^{\omega} := R_{\alpha,z}(\omega,\cdot) : \mathbb{P}_n^m \to \mathbb{P}_n$. Then $R_{\alpha,z}^{\omega}$ is differentiable at $\mathbb{I} = (I, \ldots, I)$, and
Proof. Consider $X_1, X_2, \ldots, X_m \in \mathbb{H}_n$. If $X_1 = X_2 = \cdots = X_m = 0$, the conclusion is obvious. So, without loss of generality, assume that some $X_i$ is non-zero and set $\tau = \max\{\mathrm{spr}(X_i) : i = 1, 2, \ldots, m\} > 0$, where $\mathrm{spr}(X)$ denotes the spectral radius of $X$. We then consider the function $f(t)$ defined by
Since $1 + t\lambda(X_j) > 0$, where $\lambda(X)$ is any eigenvalue of $X$, we have $I + tX_j \in \mathbb{P}_n$ for all $t \in \bigl(-\frac{1}{\tau}, \frac{1}{\tau}\bigr)$. Therefore, $f(t)$ and $g(t)$ are well-defined on $\bigl(-\frac{1}{\tau}, \frac{1}{\tau}\bigr)$, and $f(0) = g(0) = I$. We have
\[
\frac{d}{dt}(I+tX_j)^{-\frac{1-z}{\alpha}} = -(I+tX_j)^{-\frac{1-z}{\alpha}}\, \frac{d}{dt}(I+tX_j)^{\frac{1-z}{\alpha}}\, (I+tX_j)^{-\frac{1-z}{\alpha}}
\]
At $t = 0$, we have
\[
\frac{d}{dt}\Bigl.(I+tX_j)^{-\frac{1-z}{\alpha}}\Bigr|_{t=0}
\]
Similarly, $\frac{d}{dt}g(t) = -h(t)^{-1}\,\frac{d}{dt}h(t)\, h(t)^{-1}$, where $h(t) = g(t)^{-1}$. Since $h(0) = I$ and $\frac{d}{dt}h(t)$
By Theorem 3.1.3, we have
\[
f(t) - I \;\le\; R_{\alpha,z}(\omega, I+tX_1, \ldots, I+tX_m) - I \;\le\; g(t) - I.
\]
Since $R_{\alpha,z}(\omega, I, \ldots, I) = I$, for any sufficiently small $t > 0$ we have
\[
\frac{f(t)-f(0)}{t} \;\le\; \frac{R_{\alpha,z}(\omega, I+tX_1, \ldots, I+tX_m) - R_{\alpha,z}(\omega, I, \ldots, I)}{t} \;\le\; \frac{g(t)-g(0)}{t}.
\]
Letting $t \to 0^+$ in the above inequality, we obtain $\lim_{t\to 0^+}$
Similarly for $t < 0$. Suppose $\varepsilon > 0$ and $\gamma_j : (-\varepsilon, \varepsilon) \to \mathbb{P}_n$ are differentiable curves with $\gamma_j(0) = I$ for all $j = 1, 2, \ldots, m$.
Proof. For each $j = 1, 2, \ldots, m$, since $\gamma_j$ is a continuous map with $\gamma_j(0) = I$, there exists $\delta_j > 0$ such that $\gamma_j(s) \in B_r(I)$
$\lambda_i(\gamma_j(s) - I) \le \|\gamma_j(s) - I\| \le r$, where $\lambda_i(A)$ is the $i$-th eigenvalue of $A \in \mathbb{H}_n$ in decreasing order. Therefore, $\lambda_i$
$\frac{1-z}{\alpha}$. From here it follows that $\gamma_j(s) \ge z$
According to the operator monotonicity of the logarithmic function, from the last inequalities for $s > 0$ we get
Using L'Hôpital's rule, we obtain
\[
\lim_{s \to 0^+} \frac{1}{s}\log R_{\alpha,z}\bigl(\omega, \gamma_1(s), \ldots, \gamma_m(s)\bigr) = \sum_{j=1}^{m} w_j\, \gamma_j'(0).
\]
Similarly for $s < 0$.
Proof. (1) Since $A$ and $B$ commute, so do $A^{-1}$ and $B$. Thus $A^{-1} \sharp_t B = (A^{-1})^{1-t} B^{t}$, and we have
(2) For any $a, b > 0$, we have $(aA) \sharp_t (bB) = a^{1-t} b^{t} (A \sharp_t B)$. Consequently,
(3) Note that $U^*(A \sharp_t B)^{1/2} U = \bigl(U^*(A \sharp_t B)U\bigr)^{1/2}$ and $U^* A^{2-2t} U = (U^* A U)^{2-2t}$ for any
$= F_t(U^*AU,\, U^*BU)$, where the last equality follows from $U^*(A \sharp_t B)U = (U^*AU) \sharp_t (U^*BU)$.
(4) Applying $(A \sharp_t B)^{-1} = A^{-1} \sharp_t B^{-1}$, we obtain
(5) Since $\det(AB) = \det A \cdot \det B$, we obtain
\[
\det F_t(A,B) = \det\bigl(A^{-1}\sharp_t B\bigr)\cdot \det A^{2-2t} = (\det A)^{t-1}(\det B)^{t}(\det A)^{2-2t} = (\det A)^{1-t}(\det B)^{t}.
\]
(6) Let $X = F_t(A,B)$. By the arithmetic-geometric-harmonic mean inequality and the operator monotonicity of the function $X \mapsto X^{t}$ for $t \in [0,1]$, we have
Since $F_t(A,B) = \bigl(F_t(A^{-1}, B^{-1})\bigr)^{-1}$, we obtain the first inequality.
Using the second inequality in (4.1.4) and similar arguments, one can prove the second inequality.
Remark 4.1.1. The analog of Lemma 4.1.1(3) for $F_t(A,B)$ is not true, i.e., the equality $F_t(A,B) = F_{1-t}(B,A)$ does not hold in general. Indeed, from the last identity we have
\[
\bigl(A^{-1}\sharp_t B\bigr)^{1/2} A^{2-2t} \bigl(A^{-1}\sharp_t B\bigr)^{1/2} = \bigl(A^{-1}\sharp_t B\bigr)^{-1/2} B^{2t} \bigl(A^{-1}\sharp_t B\bigr)^{-1/2},
\]
or equivalently,
By the Riccati equation, this implies that
$A^{-1}\sharp_t B = B^{2t} \,\sharp\, A^{2t-2}$, which is not true in general.
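The closed form $F_t(A,B) = \bigl(A^{-1}\sharp_t B\bigr)^{1/2} A^{2-2t} \bigl(A^{-1}\sharp_t B\bigr)^{1/2}$ is easy to evaluate numerically. The sketch below (NumPy; the helper names `funm`, `sharp_t`, `F_mean`, `rand_spd` are ours) checks the determinant identity of property (5) and illustrates, for one random pair, the failure of the symmetry $F_t(A,B) = F_{1-t}(B,A)$ discussed in Remark 4.1.1:

```python
import numpy as np

rng = np.random.default_rng(2)

def funm(A, f):
    # spectral calculus f(A) for a symmetric matrix A
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

def sharp_t(X, A, t):
    # weighted geometric mean X #_t A
    Xh = funm(X, np.sqrt)
    Xmh = funm(X, lambda x: 1.0 / np.sqrt(x))
    return Xh @ funm(Xmh @ A @ Xmh, lambda x: x**t) @ Xh

def F_mean(A, B, t):
    # F_t(A, B) = (A^{-1} #_t B)^{1/2} A^{2-2t} (A^{-1} #_t B)^{1/2}
    M = sharp_t(np.linalg.inv(A), B, t)
    Mh = funm(M, np.sqrt)
    return Mh @ funm(A, lambda x: x**(2.0 - 2.0 * t)) @ Mh

def rand_spd(n):
    Y = rng.standard_normal((n, n))
    return Y @ Y.T + np.eye(n)

A, B, t = rand_spd(3), rand_spd(3), 0.3
F = F_mean(A, B, t)
det_ok = np.isclose(np.linalg.det(F),
                    np.linalg.det(A)**(1 - t) * np.linalg.det(B)**t)
sym_fails = not np.allclose(F, F_mean(B, A, 1 - t))
print(det_ok, sym_fails)
```

For commuting $A, B$ the two sides would agree (both reduce to $A^{1-t}B^{t}$ by property (1)); for generic non-commuting pairs they differ, as the remark states.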
4.2 The Lie-Trotter formula and weak log-majorization
Let $\mathcal{B}(\mathcal{H})$ be the Banach space of bounded linear operators on a Hilbert space $\mathcal{H}$, and let $\mathbb{P}(\mathcal{H})$ be the open convex cone of positive definite operators in $\mathcal{B}(\mathcal{H})$. Calculus for operator mappings and operator-valued functions allows one to extend the classical Lie-Trotter formula.
Proposition 4.2.1 ([1]). For any differentiable curve $\gamma : (-\varepsilon, \varepsilon) \to \mathbb{P}_n$ with $\gamma(0) = I$,
\[
e^{\gamma'(0)} = \lim_{t \to 0} \gamma(t)^{1/t} = \lim_{n \to \infty} \gamma(1/n)^{n}.
\]
Indeed, the exponential map $\exp : \mathcal{B}(\mathcal{H}) \to \mathbb{P}(\mathcal{H})$ and the logarithm $\log : \mathbb{P}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$ are mutually inverse diffeomorphisms. The derivative of $\exp$ at the origin of $\mathcal{B}(\mathcal{H})$ is the identity map, and hence the derivative of $\log$ at the identity operator of $\mathbb{P}(\mathcal{H})$ is also the identity map on $\mathcal{B}(\mathcal{H})$. It follows that
\[
\gamma'(0) = (\log \circ \gamma)'(0) = \lim_{t\to 0} \frac{\log\gamma(t) - \log\gamma(0)}{t} = \lim_{t\to 0} \frac{\log\gamma(t)}{t}.
\]
Notice that for $X, Y \in \mathbb{H}_n$ and $\alpha \in [0,1]$, the following curves are smooth and pass through the identity matrix $I$ at $t = 0$:
\[
\gamma_1(t) = e^{t(1-\alpha)X/2}\, e^{t\alpha Y}\, e^{t(1-\alpha)X/2}, \qquad
\gamma_2(t) = (1-\alpha)e^{tX} + \alpha e^{tY}, \qquad
\gamma_3(t) = \bigl((1-\alpha)e^{-tX} + \alpha e^{-tY}\bigr)^{-1},
\]
\[
\gamma_4(t) = e^{tX} \sharp_\alpha\, e^{tY}, \qquad
\gamma_5(t) = e^{tX} \diamond_\alpha\, e^{tY}.
\]
Applying Proposition 4.2.1, one obtains the following Lie-Trotter formulas:
\[
e^{(1-\alpha)X + \alpha Y} = \lim_{n \to \infty} \bigl(e^{(1-\alpha)X/2n}\, e^{\alpha Y/n}\, e^{(1-\alpha)X/2n}\bigr)^{n}
\]
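The first formula above is easy to verify numerically; with this symmetric splitting the error decays on the order of $1/n^2$. A sketch assuming NumPy (matrix exponentials of symmetric matrices computed via eigendecomposition; the helper names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)

def expm_sym(A):
    # matrix exponential of a symmetric matrix via eigendecomposition
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

def sym(M):
    return (M + M.T) / 2

X, Y = sym(rng.standard_normal((3, 3))), sym(rng.standard_normal((3, 3)))
alpha, n = 0.4, 2000

# one symmetric Trotter step  e^{(1-a)X/2n} e^{aY/n} e^{(1-a)X/2n}
S = expm_sym((1 - alpha) * X / (2 * n)) @ expm_sym(alpha * Y / n) \
    @ expm_sym((1 - alpha) * X / (2 * n))
approx = np.linalg.matrix_power(S, n)
exact = expm_sym((1 - alpha) * X + alpha * Y)
err = np.linalg.norm(approx - exact)
print(err < 1e-3)   # the n-th Trotter power is already close to e^{(1-a)X + aY}
```

The same scheme applies to the curves $\gamma_2, \ldots, \gamma_5$ above: evaluating each curve at $1/n$ and raising to the $n$-th power converges to $e^{\gamma'(0)}$ by Proposition 4.2.1.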
In the next theorem we establish the Lie-Trotter formula for $F_t$, namely,
\[
\lim_{p\to 0} F_t^{1/p}\bigl(e^{pA}, e^{pB}\bigr) = e^{(1-t)A + tB}, \qquad A, B \in \mathbb{H}_n,\ t \in [0,1].
\]
Theorem 4.2.1. Let $A, B \in \mathbb{H}_n$ and $t \in [0,1]$. Then $\lim_{p\to 0} F_t^{1/p}(e^{pA}, e^{pB}) = e^{(1-t)A + tB}$.
Proof. Since $F_t^{-1}(A,B) = F_t(A^{-1}, B^{-1})$, we have $\lim_{p\to 0^-} F_t^{-1/p}$
So we only need to prove that $\lim_{p\to 0^+} F_t(e^{pA}, e^{pB})^{1/p} = e^{(1-t)A+tB}$. For $p \in (0,1)$ we may write $p = \frac{1}{m+s}$, where $m \in \mathbb{N}$ and $s \in (0,1)$. Set
By [62, Theorem 1.1], $e^{pA} \sharp_t e^{pB} \prec_{\log} e^{p[(1-t)A + tB]}$, so we have
\[
X(p) = \bigl(e^{-pA} \sharp_t e^{pB}\bigr)^{\frac{1}{2}}\, e^{p(2-2t)A}\, \bigl(e^{-pA} \sharp_t e^{pB}\bigr)^{\frac{1}{2}}
\]
and hence
\[
\bigl\|X(p)\bigr\| \le \bigl\|\bigl(e^{-pA} \sharp_t e^{pB}\bigr)^{\frac{1}{2}}\bigr\|\; \bigl\|e^{p(2-2t)A}\bigr\|\; \bigl\|\bigl(e^{-pA} \sharp_t e^{pB}\bigr)^{\frac{1}{2}}\bigr\|
\]
As $pm \le 1$, we have $\|X(p)\|^{m} \le e^{pm[(3-3t)\|A\| + 2t\|B\|]} \le e^{(3-3t)\|A\| + 2t\|B\|}$. Since $A > 0$, there exist a unitary matrix $U$ and a diagonal matrix $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ such that $A = UDU^*$. Therefore
2. This is equivalent to
\[
f(x) := \frac{t-1}{t}\, x^{\frac{1}{2t}} + \frac{1}{t}\, x^{\frac{1-t}{2t}}
\]
Thus, $f'(x) = 0$ if and only if $x = 1$. Hence $f(x)$ attains its maximum at $f(1) = 1$, and $f(x) \le 1$ for all $0 \le t \le \frac{1}{2}$ and $x > 0$. Therefore, $\lambda_1(F_t(A,B)) \le 1$, which implies
(ii) Let $\frac{1}{2} \le t \le 1$ and set $C = B^{-1} \sharp_{1-t} A = B^{-1/2}\bigl(B^{1/2} A B^{1/2}\bigr)^{1-t} B^{-1/2}$. Consequently,
$B^{1/2} C B^{1/2} = \bigl(B^{1/2} A B^{1/2}\bigr)^{1-t}$. This implies $\bigl(B^{1/2} C B^{1/2}\bigr)^{1/(2-2t)} = \bigl(B^{1/2} A B^{1/2}\bigr)^{1/2}$. Recall that
If $A \diamond_t B \le I$, then we have $tB + (1-t)\bigl(B^{1/2} A B^{1/2}\bigr)^{1/2} \le B^{1/2}$. This is equivalent to
$\frac{1}{1-t}B^{1/2}$, since the map $x \mapsto x^{2-2t}$ is operator monotone when $\frac{1}{2} \le t \le 1$. Then we have
Since $B > 0$, there exist a unitary matrix $U$ and a diagonal matrix $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ such that $B = UDU^*$. Therefore,
$\frac{1}{1-t}\,x^{t/(2-2t)}$, where $\frac{1}{2} \le t \le 1$ and $x > 0$. We have $f'(x) = t$
Thus, $f'(x) = 0$ if and only if $x = 1$. Hence $f(x)$ attains its maximum at $f(1) = 1$, and $f(x) \le 1$ for all $\frac{1}{2} \le t \le 1$ and $x > 0$. Therefore, $\lambda_1(F_{1-t}(B,A)) \le 1$, that is,
In this chapter, we present the F-mean, a novel spectral geometric mean, and outline its fundamental properties We demonstrate that the F-mean adheres to the Lie-Trotter formula and compare it to the solution of the least squares problem in relation to the Bures distance.
This thesis obtained the following main results:
We introduce the weighted Hellinger distance $d_{h,\alpha}(A,B)$, an interpolating metric between the Log-Euclidean and Hellinger metrics. We establish the equivalence of the weighted Bures-Wasserstein and weighted Hellinger distances and show that both metrics satisfy the in-betweenness property. Furthermore, we show that among symmetric means, the arithmetic mean is the only one that satisfies the in-betweenness property with respect to the weighted Bures-Wasserstein and weighted Hellinger distances.
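For illustration, the unweighted Bures-Wasserstein and Hellinger distances underlying this result can be computed directly. The sketch below (NumPy; the helper names are ours, and only the unweighted versions are shown) also checks the classical bound $d_{b} \le d_{h}$, which follows from $\mathrm{Tr}\,(A^{1/2}BA^{1/2})^{1/2} \ge \mathrm{Tr}\,(A^{1/2}B^{1/2})$:

```python
import numpy as np

rng = np.random.default_rng(4)

def sqrtm_psd(A):
    # principal square root of a symmetric PSD matrix
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.T

def d_bw(A, B):
    # Bures-Wasserstein: d^2 = Tr A + Tr B - 2 Tr (A^{1/2} B A^{1/2})^{1/2}
    Ah = sqrtm_psd(A)
    fid = np.trace(sqrtm_psd(Ah @ B @ Ah))
    return np.sqrt(max(np.trace(A) + np.trace(B) - 2 * fid, 0.0))

def d_hellinger(A, B):
    # Hellinger: d^2 = Tr A + Tr B - 2 Tr (A^{1/2} B^{1/2})
    return np.sqrt(max(np.trace(A) + np.trace(B)
                       - 2 * np.trace(sqrtm_psd(A) @ sqrtm_psd(B)), 0.0))

def rand_spd(n):
    Y = rng.standard_normal((n, n))
    return Y @ Y.T + np.eye(n)

A, B = rand_spd(3), rand_spd(3)
print(np.isclose(d_bw(A, A), 0.0) and np.isclose(d_hellinger(A, A), 0.0))
print(d_bw(A, B) <= d_hellinger(A, B) + 1e-10)
```

The weighted variants studied in the thesis interpolate these quantities toward the Log-Euclidean metric; the numerical structure (square roots and traces of congruences) is the same.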
We introduce a novel quantum divergence known as the α-z-Bures-Wasserstein divergence, which fulfills the in-betweenness property and the data processing inequality in quantum information theory Additionally, we address the least squares problem related to this divergence and show that its solution aligns precisely with the unique positive solution of a corresponding matrix equation.
$\bigl(A^{\frac{1-\alpha}{2z}}\, B^{\frac{\alpha}{z}}\, A^{\frac{1-\alpha}{2z}}\bigr)^{z}$ and $0 < \alpha \le z \le 1$. We then study the properties of the solution to this problem and obtain several significant results.
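Assuming the divergence takes the form $\Phi(A,B) = \mathrm{Tr}\bigl[(1-\alpha)A + \alpha B\bigr] - \mathrm{Tr}\,\bigl(A^{\frac{1-\alpha}{2z}} B^{\frac{\alpha}{z}} A^{\frac{1-\alpha}{2z}}\bigr)^{z}$, consistent with the quantity displayed above, a NumPy sketch (helper names ours) checking nonnegativity and $\Phi(A,A) = 0$:

```python
import numpy as np

rng = np.random.default_rng(5)

def funm(A, f):
    # spectral calculus f(A) for a symmetric matrix A
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

def Q(A, B, alpha, z):
    # Q_{alpha,z}(A,B) = (A^{(1-alpha)/2z} B^{alpha/z} A^{(1-alpha)/2z})^z
    Ae = funm(A, lambda x: x**((1 - alpha) / (2 * z)))
    inner = Ae @ funm(B, lambda x: x**(alpha / z)) @ Ae
    return funm(inner, lambda x: x**z)

def divergence(A, B, alpha, z):
    # Phi(A,B) = Tr[(1-alpha)A + alpha B] - Tr Q_{alpha,z}(A,B)
    return np.trace((1 - alpha) * A + alpha * B) - np.trace(Q(A, B, alpha, z))

def rand_spd(n):
    Y = rng.standard_normal((n, n))
    return Y @ Y.T + np.eye(n)

A, B = rand_spd(3), rand_spd(3)
alpha, z = 0.3, 0.5          # within the admissible range 0 < alpha <= z <= 1
print(divergence(A, B, alpha, z) >= -1e-10)         # nonnegativity
print(np.isclose(divergence(A, A, alpha, z), 0.0))  # Phi(A,A) = 0
```

The vanishing at $A = B$ is immediate from the exponents: $\frac{1-\alpha}{2z} + \frac{\alpha}{z} + \frac{1-\alpha}{2z} = \frac{1}{z}$, so $Q_{\alpha,z}(A,A) = A$.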
In addition, we provide an inequality for quantum fidelity and its parameterized versions. We then use the $\alpha$-$z$-fidelity to measure the distance between two quantum orbits.
We present the F-mean, a novel weighted geometric mean, and explore its properties. We show that the F-mean satisfies the Lie-Trotter formula, and we compare the F-mean with the Wasserstein mean in terms of weak log-majorization.
In the future, we intend to continue the investigation in the following directions:
• Construct new distance functions based on non-Kubo-Ando means.
• Construct a new distance function between two matrices with different dimensions.
For $X, Y > 0$ and $0 < t$