
Algorithms for Sparsity-Constrained Optimization (Bahmani, 2013). Data Structures and Algorithms.


DOCUMENT INFORMATION

Structure

  • Supervisor's Foreword

  • Acknowledgements

  • Parts of This Thesis Have Been Published in the Following Articles

  • Contents

  • List of Algorithms

  • List of Figures

  • List of Tables

  • Notations

  • 1 Introduction

    • 1.1 Contributions

    • 1.2 Thesis Outline

    • References

  • 2 Preliminaries

    • 2.1 Sparse Linear Regression and Compressed Sensing

    • 2.2 Nonlinear Inference Problems

      • 2.2.1 Generalized Linear Models

      • 2.2.2 1-Bit Compressed Sensing

      • 2.2.3 Phase Retrieval

    • References

  • 3 Sparsity-Constrained Optimization

    • 3.1 Background

    • 3.2 Convex Methods and Their Required Conditions

    • 3.3 Problem Formulation and the GraSP Algorithm

      • 3.3.1 Algorithm Description

        • 3.3.1.1 Variants

      • 3.3.2 Sparse Reconstruction Conditions

      • 3.3.3 Main Theorems

    • 3.4 Example: Sparse Minimization of L2-Regularized Logistic Regression

      • 3.4.1 Verifying SRH for L2-Regularized Logistic Loss

      • 3.4.2 Bounding the Approximation Error

    • 3.5 Simulations

      • 3.5.1 Synthetic Data

      • 3.5.2 Real Data

    • 3.6 Summary and Discussion

    • References

  • 4 1-Bit Compressed Sensing

    • 4.1 Background

    • 4.2 Problem Formulation

    • 4.3 Algorithm

    • 4.4 Accuracy Guarantees

    • 4.5 Simulations

    • 4.6 Summary

    • References

  • 5 Estimation Under Model-Based Sparsity

    • 5.1 Background

    • 5.2 Problem Statement and Algorithm

    • 5.3 Theoretical Analysis

      • 5.3.1 Stable Model-Restricted Hessian

      • 5.3.2 Accuracy Guarantee

    • 5.4 Example: Generalized Linear Models

      • 5.4.1 Verifying SMRH for GLMs

      • 5.4.2 Approximation Error for GLMs

    • 5.5 Summary

    • References

  • 6 Projected Gradient Descent for Lp-Constrained Least Squares

    • 6.1 Background

    • 6.2 Projected Gradient Descent for Lp-Constrained Least Squares

    • 6.3 Discussion

    • References

  • 7 Conclusion and Future Work

  • Appendix A Proofs of Chap. 3

    • A.1 Iteration Analysis For Smooth Cost Functions

    • A.2 Iteration Analysis For Non-smooth Cost Functions

  • Appendix B Proofs of Chap. 4

    • B.1 On Non-convex Formulation of BM:PlanRobust2013

    • Reference

  • Appendix C Proofs of Chap. 5

  • Appendix D Proofs of Chap. 6

    • D.1 Proof of Theorem 6.1

    • D.2 Lemmas for Characterization of a Projection onto Lp-Balls

    • References

Content

Springer Theses: Recognizing Outstanding Ph.D. Research

Sohail Bahmani
Algorithms for Sparsity-Constrained Optimization

For further volumes: http://www.springer.com/series/8790

Aims and Scope

The series "Springer Theses" brings together a selection of the very best Ph.D. theses from around the world and across the physical sciences. Nominated and endorsed by two recognized specialists, each published volume has been selected for its scientific excellence and the high impact of its contents for the pertinent field of research. For greater accessibility to non-specialists, the published versions include an extended introduction, as well as a foreword by the student's supervisor explaining the special relevance of the work for the field. As a whole, the series will provide a valuable resource both for newcomers to the research fields described, and for other scientists seeking detailed background information on special questions. Finally, it provides an accredited documentation of the valuable contributions made by today's younger generation of scientists.

Theses are accepted into the series by invited nomination only and must fulfill all of the following criteria:

  • They must be written in good English.
  • The topic should fall within the confines of Chemistry, Physics, Earth Sciences and related interdisciplinary fields such as Materials, Nanoscience, Chemical Engineering, Complex Systems and Biophysics.
  • The work reported in the thesis must represent a significant scientific advance.
  • If the thesis includes previously published material, permission to reproduce this must be gained from the respective copyright holder.
  • They must have been examined and passed during the 12 months prior to nomination.
  • Each thesis should include a foreword by the supervisor outlining the significance of its content.
  • The theses should have a clearly defined structure including an introduction accessible to scientists not expert in that particular field.

Sohail Bahmani
Algorithms for Sparsity-Constrained Optimization
Springer

Sohail Bahmani, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA

ISSN 2190-5053; ISSN 2190-5061 (electronic)
ISBN 978-3-319-01880-5; ISBN 978-3-319-01881-2 (eBook)
DOI 10.1007/978-3-319-01881-2
Springer Cham Heidelberg New York Dordrecht London
Library of Congress Control Number: 2013949675

© Springer International Publishing Switzerland 2014. This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper. Springer is part of Springer Science+Business Media (www.springer.com).

To my parents …

Supervisor's Foreword

The problem of sparse optimization has gathered a lot of attention lately. The reason is simple: sparsity is a fundamental structural characteristic of much of the data we encounter. Indeed, one may claim that the structure in these data is an expression of sparsity. The sparsity may manifest in different ways. Often the data themselves are sparse, in that the majority of their components are zero-valued. More commonly, the data may simply be restricted in the values they can take or the patterns they may follow, and here the structure in the data can often be characterized as sparsity in a transformed domain. For instance, the data may be restricted to inhabit only a restricted set of subspaces; in this case, descriptions of the data in terms of their projections on these subspaces will be sparse. This sparsity can be exploited for a variety of purposes: compressed sensing techniques exploit sparsity in signals to characterize them using far fewer measurements than would otherwise be required, RADAR and SONAR applications exploit the spatial sparsity of sources for better detection and localization of sources, and so on. At other times, sparsity may be imputed to characterizations of various aspects of the data, in an attempt to bring out the structure in them. Thus, statistical analyses and various machine learning techniques often attempt to fit sparse models to data, to enable better predictions, to identify important variables, etc. At yet other times, sparsity may be enforced simply to compensate for a paucity of data, in order to learn richer or more detailed models.

In all cases, one ends up having to estimate the sparsest solution that minimizes a loss function of some kind, i.e., with an instance of the aforementioned sparse-optimization problem. The specifics vary chiefly in the loss function minimized. For instance, compressed sensing attempts to minimize the squared error between observations of the data and observations that might be engendered by the sparse solution; machine learning techniques attempt to minimize the negative log probability of the observed data, as predicted by the model; and so on.

Obtaining sparse solutions, however, is not trivial. Sparsity is defined through the ℓ0 norm, i.e., the number of nonzero components of the variable being optimized. To obtain a sparse solution, this norm must hence be directly minimized or, alternately, imposed as a constraint on the optimization problem. Unfortunately, optimization problems involving the ℓ0 norm require determination of the optimal set of components to be assigned nonzero values; they are hence combinatorial in nature and generally computationally intractable.
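For concreteness, the problems described above take the generic form of a sparsity-constrained program. The following display is an illustrative formulation in generic notation ($f$ a smooth loss, $s$ the target number of nonzero entries), not a quotation from any particular chapter.

```latex
% Generic sparsity-constrained optimization problem (illustrative notation):
%   f : R^n -> R is the loss, s is the target number of nonzero entries.
\[
  \hat{x} \;\in\; \arg\min_{x \in \mathbb{R}^n} f(x)
  \quad \text{subject to} \quad \|x\|_0 \le s ,
\]
% e.g. f(x) = (1/2)||y - Ax||_2^2 for compressed sensing, or a (possibly
% regularized) negative log-likelihood for sparse logistic regression.
```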
As a result, one must either employ greedy algorithms to obtain a solution or employ proxies that are relaxations of the ℓ0 norm. Both of these approaches have yielded highly effective algorithms for optimization when the loss function is quadratic or, more generally, convex in nature. For more generic classes of loss functions, however, the situation is not so clear. Proxies to the ℓ0 norm which can be shown to result in optimally sparse solutions for quadratic or convex loss functions are no longer guaranteed to provide optimal solutions for other loss functions. It is similarly unclear whether greedy algorithms that are effective for well-behaved loss functions will be equally effective in the most general case.

This is the problem space that Sohail tackles in this monograph. In an outstanding series of results, he develops and analyzes a greedy framework for sparsity-constrained optimization of a wide class of loss functions, shows how it may be applied to various problems, and finally extends it to handle the case where the solutions are not merely sparse, but restricted to lie in specified subspaces.

GraSP is the proposed greedy framework for sparse optimization of loss functions. Through rigorous analysis, Sohail demonstrates that it imposes far fewer constraints on the loss function, only requiring it to be convex on sparse subspaces, and converges linearly to the optimal solution. As an illustrative application, he applies GraSP to the problem of feature selection through sparse optimization of logistic functions, and demonstrates that it results in significantly better solutions than current methods. One-bit compressive sensing, the problem of reconstructing a signal from a series of one-bit measurements, is a challenging but exciting problem; Sohail demonstrates that GraSP-based solutions can result in greatly improved signal recovery over all other current methods. Subsequently, he develops a solution to deal with model-based sparsity: problems where the solutions are not only required to be sparse, but are further restricted to lie on only specific subspaces. Such problems frequently arise, for instance, when additional information is available about the interdependence between the locations of nonzero values in the estimated variables. Finally, he reverses gear and addresses a more philosophical problem, that of identifying the best proxy for gradient-based algorithms for sparsity-constrained least-squares optimization, and arrives at the remarkable result that the optimal proxy is the ℓ0 norm itself.

Together, the contributions of this monograph lay a solid foundation of techniques and results for any aspiring or established researcher wishing to work on the problem of sparse optimization of difficult-to-optimize loss functions. As such, I believe that this monograph is a mandatory inclusion in the library of anybody working on the topic.

Language Technologies Institute
Carnegie Mellon University
Pittsburgh, USA

Prof. Bhiksha Raj
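The greedy framework referred to in the foreword is described there only at a high level. The following Python sketch shows the general shape of a GraSP-style iteration for minimizing f(x) subject to ||x||_0 <= s; it is a schematic illustration rather than the thesis's pseudocode, and the helper `argmin_on_support` (a routine that approximately minimizes the loss over a fixed support) is an assumed black box.

```python
import numpy as np

def grasp_like(grad_f, argmin_on_support, n, s, iters=50):
    """Schematic GraSP-style iteration for min f(x) s.t. ||x||_0 <= s.

    grad_f            : gradient of the loss, R^n -> R^n
    argmin_on_support : hypothetical helper that (approximately) minimizes the
                        loss over vectors supported on a given index set and
                        returns a length-n vector
    """
    x = np.zeros(n)
    for _ in range(iters):
        z = grad_f(x)
        # indices of the 2s largest gradient entries in magnitude
        Z = np.argsort(np.abs(z))[-2 * s:]
        # merge with the current support
        T = np.union1d(Z, np.flatnonzero(x))
        # minimize the loss restricted to the merged support
        b = argmin_on_support(T)
        # prune back to the s largest entries in magnitude
        x = np.zeros(n)
        keep = np.argsort(np.abs(b))[-s:]
        x[keep] = b[keep]
    return x
```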
Acknowledgements

I would like to thank Professor Bhiksha Raj, my advisor, for his continuous support and encouragement during my studies at Carnegie Mellon University. He made every effort to allow me to achieve my goals in research during the course of the Ph.D. studies. I would also like to thank Dr. Petros T. Boufounos for his insightful comments that helped me improve the quality of my work during our collaboration, and for serving in my thesis defense committee. I would like to thank Professor José M. F. Moura and Professor Soummya Kar, who also served in the defense committee, for their enlightening advice and comments on my thesis. Above all, I would like to express my sincere gratitude to my parents, who supported me throughout my life in every aspect. I especially thank my mother for giving me motivation and hope that helped me endure and overcome difficulties.

Appendix C Proofs of Chap. 5

It is straightforward to verify that using (C.1) for $u_0$ and $v_0$ as the unit-norm vectors, and multiplying both sides of the resulting inequality by $\|u\|\,\|v\|$, yields the desired general case.

Proof of Theorem 5.1. Using the optimality of $x^{(t+1)}$ and the feasibility of $\bar{x}$ one can deduce
$$\big\| x^{(t+1)} - z^{(t)} \big\| \le \big\| \bar{x} - z^{(t)} \big\|,$$
with $z^{(t)}$ as defined in the corresponding step of the algorithm. Expanding the squared norms using the inner product of $\mathcal{H}$ then shows
$$0 \le \big\langle x^{(t+1)} - \bar{x},\; 2 z^{(t)} - x^{(t+1)} - \bar{x} \big\rangle,$$
or equivalently, writing $\Delta^{(t)} = x^{(t)} - \bar{x}$ and $\Delta^{(t+1)} = x^{(t+1)} - \bar{x}$,
$$\big\| \Delta^{(t+1)} \big\|^2 \le \big\langle \Delta^{(t+1)},\; \Delta^{(t)} - 2\eta^{(t)} \nabla f\big( \bar{x} + \Delta^{(t)} \big) \big\rangle.$$
Adding and subtracting $2\eta^{(t)} \langle \Delta^{(t+1)}, \nabla f(\bar{x}) \rangle$ and rearranging yields
$$\big\| \Delta^{(t+1)} \big\|^2 \le \big\langle \Delta^{(t+1)},\; \Delta^{(t)} - 2\eta^{(t)} \big( \nabla f(\bar{x} + \Delta^{(t)}) - \nabla f(\bar{x}) \big) \big\rangle - 2\eta^{(t)} \big\langle \Delta^{(t+1)}, \nabla f(\bar{x}) \big\rangle. \quad (C.2)$$
Since $f$ is twice continuously differentiable by assumption, it follows from the mean-value theorem that $\langle \Delta^{(t+1)}, \nabla f(\bar{x} + \Delta^{(t)}) - \nabla f(\bar{x}) \rangle = \langle \Delta^{(t+1)}, \nabla^2 f(\bar{x} + \tau \Delta^{(t)})\, \Delta^{(t)} \rangle$ for some $\tau \in (0,1)$. Furthermore, because $\bar{x}$, $x^{(t)}$, and $x^{(t+1)}$ all belong to the model set $\mathcal{M}(\mathcal{C}_k)$, we have $\operatorname{supp}(\bar{x} + \tau \Delta^{(t)}) \in \mathcal{M}(\mathcal{C}_{k^2})$ and thereby $\operatorname{supp}(\Delta^{(t+1)}) \cup \operatorname{supp}(\bar{x} + \tau \Delta^{(t)}) \in \mathcal{M}(\mathcal{C}_{k^3})$. Invoking the SMRH condition of the cost function over $\mathcal{C}_{k^3}$ and applying Lemma C.1 with the sparsity model $\mathcal{M}(\mathcal{C}_{k^3})$, $x = \bar{x} + \tau \Delta^{(t)}$, and $\eta = \eta^{(t)}$ then bounds the first inner product on the right-hand side of (C.2) by a multiple of $\|\Delta^{(t+1)}\|\,\|\Delta^{(t)}\|$, with the contraction factor furnished by Lemma C.1. Using the Cauchy–Schwarz inequality and the fact that $\|\nabla f(\bar{x})|_{\operatorname{supp}(\Delta^{(t+1)})}\| \le \|\nabla f(\bar{x})|_{\mathcal{I}}\|$ by the definition of $\mathcal{I}$, (C.2) then implies a bound of the form
$$\big\| \Delta^{(t+1)} \big\|^2 \le 2\gamma\, \big\| \Delta^{(t)} \big\|\, \big\| \Delta^{(t+1)} \big\| + 2\eta^{(t)} \big\| \Delta^{(t+1)} \big\|\, \big\| \nabla f(\bar{x})|_{\mathcal{I}} \big\|,$$
with $\gamma$ the constant delivered by Lemma C.1. Canceling $\|\Delta^{(t+1)}\|$ from both sides proves the theorem.

Lemma C.2 (Bounded Model Projection). Given an arbitrary $h_0 \in \mathcal{H}$, a positive real number $r$, and a sparsity model generator $\mathcal{C}_k$, a projection $P_{\mathcal{C}_k, r}(h_0)$ can be obtained as the projection of $P_{\mathcal{C}_k, +\infty}(h_0)$ onto the sphere of radius $r$.

Proof. To simplify the notation, let $\hat{h} = P_{\mathcal{C}_k, r}(h_0)$ and $\hat{S} = \operatorname{supp}(\hat{h})$. For $S \subseteq [p]$ define
$$h_0(S) = \arg\min_{h} \|h - h_0\| \quad \text{s.t. } \|h\| \le r \text{ and } \operatorname{supp}(h) \subseteq S.$$
It follows from the definition of $P_{\mathcal{C}_k, r}(h_0)$ that $\hat{S} \in \arg\min_{S \in \mathcal{C}_k} \|h_0(S) - h_0\|$. Using
$$\|h_0(S) - h_0\|^2 = \big\| h_0(S) - h_0|_S - h_0|_{S^c} \big\|^2 = \big\| h_0(S) - h_0|_S \big\|^2 + \big\| h_0|_{S^c} \big\|^2,$$
we deduce that $h_0(S)$ is the projection of $h_0|_S$ onto the sphere of radius $r$. Therefore we can write $h_0(S) = \min\{1,\, r / \|h_0|_S\|\}\, h_0|_S$, and from that
$$\hat{S} \in \arg\max_{S \in \mathcal{C}_k} q(S), \qquad q(S) := \|h_0|_S\|^2 - \big( \|h_0|_S\| - r \big)_+^2.$$
Furthermore, let
$$S_0 = \operatorname{supp}\big( P_{\mathcal{C}_k, +\infty}(h_0) \big) \in \arg\max_{S \in \mathcal{C}_k} \|h_0|_S\|. \quad (C.3)$$
If $\|h_0|_{S_0}\| \le r$, then $q(S) = \|h_0|_S\|^2 \le q(S_0)$ for any $S \in \mathcal{C}_k$, and thereby $\hat{S} = S_0$. Thus we focus on the case $\|h_0|_{S_0}\| > r$, which implies $q(S_0) = 2\|h_0|_{S_0}\| r - r^2$. For any $S \in \mathcal{C}_k$, if $\|h_0|_S\| \le r$ we have $q(S) = \|h_0|_S\|^2 \le r^2 < 2\|h_0|_{S_0}\| r - r^2 = q(S_0)$, and if $\|h_0|_S\| > r$ we have $q(S) = 2\|h_0|_S\| r - r^2 \le 2\|h_0|_{S_0}\| r - r^2 = q(S_0)$, where (C.3) is applied. Therefore we have shown that $\hat{S} = S_0$. It is then straightforward to show the desired result that projecting $P_{\mathcal{C}_k, +\infty}(h_0)$ onto the centered sphere of radius $r$ yields $P_{\mathcal{C}_k, r}(h_0)$.
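Lemma C.2 reduces the norm-bounded model projection to two elementary steps: project onto the sparsity model first, then shrink onto the sphere of radius r. The Python sketch below illustrates these two steps for the plain s-sparse model, where the model projection is just hard thresholding; it is an illustration of the lemma's statement under that simplifying assumption, not code from the thesis.

```python
import numpy as np

def bounded_sparse_projection(h0, s, r):
    """Project h0 onto {h : h is s-sparse and ||h||_2 <= r}.

    Following Lemma C.2 (specialized to the plain s-sparse model), this equals
    the unconstrained s-sparse projection of h0, rescaled onto the radius-r ball.
    """
    # Step 1: model projection, here hard thresholding to the s largest entries.
    h = np.zeros_like(h0)
    keep = np.argsort(np.abs(h0))[-s:]
    h[keep] = h0[keep]
    # Step 2: projection onto the centered ball of radius r.
    norm = np.linalg.norm(h)
    if norm > r:
        h *= r / norm
    return h
```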
Appendix D Proofs of Chap. 6

D.1 Proof of Theorem 6.1

To prove Theorem 6.1, a series of lemmas is established first. In what follows, $\bar{x}^\star$ is a projection of the $s$-sparse vector $x^\star$ onto $\hat{\mathcal{B}}$, and $x^\star - \bar{x}^\star$ is denoted by $d^\star$. Furthermore, for $t = 0, 1, 2, \ldots$ we denote $x^{(t)} - \bar{x}^\star$ by $d^{(t)}$ for compactness.

Lemma D.1. If $x^{(t)}$ denotes the estimate in the $t$-th iteration of $\ell_p$-PGD, then
$$\big\| d^{(t+1)} \big\|_2^2 \le 2\,\mathrm{Re}\Big[ \big\langle d^{(t)}, d^{(t+1)} \big\rangle - \eta^{(t)} \big\langle A d^{(t)}, A d^{(t+1)} \big\rangle \Big] + 2\eta^{(t)}\, \mathrm{Re}\, \big\langle A d^{(t+1)}, A d^\star + e \big\rangle.$$

Proof. Note that $x^{(t+1)}$ is a projection of $x^{(t)} - \eta^{(t)} A^{\mathrm{H}} (A x^{(t)} - y)$ onto $\hat{\mathcal{B}}$. Since $\bar{x}^\star$ is also a feasible point (i.e., $\bar{x}^\star \in \hat{\mathcal{B}}$), we have
$$\Big\| x^{(t+1)} - \big( x^{(t)} - \eta^{(t)} A^{\mathrm{H}} (A x^{(t)} - y) \big) \Big\|_2^2 \le \Big\| \bar{x}^\star - \big( x^{(t)} - \eta^{(t)} A^{\mathrm{H}} (A x^{(t)} - y) \big) \Big\|_2^2.$$
Using (2.1) we obtain
$$\Big\| d^{(t+1)} - d^{(t)} + \eta^{(t)} A^{\mathrm{H}} \big( A d^{(t)} - A d^\star - e \big) \Big\|_2^2 \le \Big\| d^{(t)} - \eta^{(t)} A^{\mathrm{H}} \big( A d^{(t)} - A d^\star - e \big) \Big\|_2^2.$$
Therefore, we obtain
$$\mathrm{Re}\, \Big\langle d^{(t+1)},\; d^{(t+1)} - 2 d^{(t)} + 2\eta^{(t)} A^{\mathrm{H}} \big( A d^{(t)} - A d^\star - e \big) \Big\rangle \le 0,$$
which yields the desired result after straightforward algebraic manipulations.

The following lemma is a special case of the generalized shifting inequality proposed in Foucart (2012, Theorem 2); we refer the reader to that reference for the proof.

Lemma D.2 (Shifting Inequality, Foucart (2012)). If $0 < p \le 2$ and $u_1 \ge u_2 \ge \cdots \ge u_{l+r} \ge 0$, then
$$\bigg( \sum_{i=l+1}^{l+r} u_i^2 \bigg)^{1/2} \le C \bigg( \sum_{i=1}^{r} u_i^p \bigg)^{1/p},$$
where the constant $C$ depends only on $p$, $l$, and $r$ and is given explicitly in Foucart (2012, Theorem 2).

Lemma D.3. For $\bar{x}^\star$, a projection of $x^\star$ onto $\hat{\mathcal{B}}$, we have $\operatorname{supp}(\bar{x}^\star) \subseteq S = \operatorname{supp}(x^\star)$.

Proof. The proof is by contradiction. Suppose that there exists a coordinate $i$ such that $x_i^\star = 0$ but $\bar{x}_i^\star \ne 0$. Then one can construct a vector $x'$ which is equal to $\bar{x}^\star$ except at the $i$-th coordinate, where it is zero. Obviously $x'$ is feasible, because $\|x'\|_p^p < \|\bar{x}^\star\|_p^p \le \hat{c}$. Furthermore,
$$\|x' - x^\star\|_2^2 = \sum_{j \ne i} \big| \bar{x}_j^\star - x_j^\star \big|^2 < \sum_{j=1}^{n} \big| \bar{x}_j^\star - x_j^\star \big|^2 = \|\bar{x}^\star - x^\star\|_2^2.$$
Since by definition $\bar{x}^\star$ minimizes $\|x - x^\star\|_2^2$ over $\hat{\mathcal{B}}$, we have a contradiction.

To continue, we introduce the following sets, which partition the coordinates of the vector $d^{(t)}$ for $t = 0, 1, 2, \ldots$. As in Lemma D.3, let $S = \operatorname{supp}(x^\star)$; Lemma D.3 shows that $\operatorname{supp}(\bar{x}^\star) \subseteq S$, so we can assume that $\bar{x}^\star$ is $s$-sparse. Let $S_{t,1}$ be the support of the $s$ largest entries of $d^{(t)}|_{S^c}$ in magnitude, and define $T_t = S \cup S_{t,1}$. Furthermore, let $S_{t,2}$ be the support of the $s$ largest entries of $d^{(t)}|_{T_t^c}$, $S_{t,3}$ the support of the next $s$ largest entries of $d^{(t)}|_{T_t^c}$, and so on. We also set $T_{t,j} = S_{t,j} \cup S_{t,j+1}$ for $j \ge 1$. This partitioning of the vector $d^{(t)}$ is illustrated in Fig. D.1.

Fig. D.1 Partitioning of the vector $d^{(t)} = x^{(t)} - \bar{x}^\star$. The color gradient represents the decrease of the magnitudes of the corresponding coordinates.
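The iteration analyzed throughout this appendix is projected gradient descent for ℓp-constrained least squares (see the proof of Lemma D.1 above). As a point of reference, the following Python sketch shows one such iteration, assuming an oracle `project_lp_ball` for the (generally non-convex) projection onto the feasible set and a fixed step size; both are illustrative assumptions rather than the thesis's implementation.

```python
import numpy as np

def lp_pgd(A, y, project_lp_ball, step, iters=100):
    """Schematic projected gradient descent for min ||y - Ax||_2^2 s.t. x in B-hat.

    project_lp_ball : assumed oracle mapping a vector to a projection onto the
                      lp-ball constraint set
    step            : gradient step size eta
    """
    n = A.shape[1]
    x = np.zeros(n)
    for _ in range(iters):
        grad = A.conj().T @ (A @ x - y)        # gradient of (1/2)||y - Ax||_2^2
        x = project_lp_ball(x - step * grad)   # projection step analyzed in Lemma D.1
    return x
```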
Lemma D.4. For $t = 0, 1, 2, \ldots$ the vector $d^{(t)}$ obeys
$$\sum_{i \ge 2} \big\| d^{(t)}|_{S_{t,i}} \big\|_2 \le \bigg( \frac{2p}{2-p} \bigg)^{1/2} (2s)^{\frac{1}{2} - \frac{1}{p}}\, \big\| d^{(t)}|_{S^c} \big\|_p.$$

Proof. Since $S_{t,j}$ and $S_{t,j+1}$ are disjoint and $T_{t,j} = S_{t,j} \cup S_{t,j+1}$ for $j \ge 1$, we have
$$\big\| d^{(t)}|_{S_{t,j}} \big\|_2 + \big\| d^{(t)}|_{S_{t,j+1}} \big\|_2 \le \sqrt{2}\, \big\| d^{(t)}|_{T_{t,j}} \big\|_2.$$
Adding over even $j$'s, we deduce
$$\sum_{j \ge 2} \big\| d^{(t)}|_{S_{t,j}} \big\|_2 \le \sqrt{2} \sum_{i \ge 1} \big\| d^{(t)}|_{T_{t,2i}} \big\|_2.$$
Because of the structure of the sets $T_{t,j}$, Lemma D.2 can be applied to obtain
$$\big\| d^{(t)}|_{T_{t,j}} \big\|_2 \le \bigg( \frac{p}{2-p} \bigg)^{1/2} (2s)^{\frac{1}{2} - \frac{1}{p}}\, \big\| d^{(t)}|_{T_{t,j-1}} \big\|_p. \quad (D.1)$$
To be precise, Lemma D.2 yields a slightly sharper coefficient on the right-hand side; for simplicity, however, we use the upper bound stated in (D.1). Verifying this upper bound reduces to showing that $\psi(p) = p \log p + (2-p) \log (2-p) \ge 0$ for $p \in (0, 1]$; since $\psi$ is a decreasing function over $(0, 1]$, it attains its minimum at $p = 1$, which means that $\psi(p) \ge \psi(1) = 0$, as desired. Then (D.1) yields
$$\sum_{j \ge 2} \big\| d^{(t)}|_{S_{t,j}} \big\|_2 \le \bigg( \frac{2p}{2-p} \bigg)^{1/2} (2s)^{\frac{1}{2} - \frac{1}{p}} \sum_{i \ge 1} \big\| d^{(t)}|_{T_{t,2i-1}} \big\|_p.$$
Since $(\omega_1 + \omega_2 + \cdots + \omega_l)^p \le \omega_1^p + \omega_2^p + \cdots + \omega_l^p$ holds for $\omega_1, \ldots, \omega_l \ge 0$ and $p \in (0, 1]$, we can write
$$\sum_{i \ge 1} \big\| d^{(t)}|_{T_{t,2i-1}} \big\|_p \le \bigg( \sum_{i \ge 1} \big\| d^{(t)}|_{T_{t,2i-1}} \big\|_p^p \bigg)^{1/p}.$$
The desired result then follows using the fact that the sets $T_{t,2i-1}$ are disjoint and $\bigcup_{i \ge 1} T_{t,2i-1} \subseteq S^c$.
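The norm comparison invoked as the "power means inequality" in the next proof is the following elementary consequence of Hölder's inequality, stated here for convenience in generic notation:

```latex
% For a vector v with at most k nonzero entries and 0 < p <= 2:
%   ||v||_p <= k^{1/p - 1/2} ||v||_2,
% with equality when all nonzero entries have the same magnitude.
\[
  \|v\|_p \;=\; \Big(\sum_{i=1}^{k} |v_i|^p\Big)^{1/p}
  \;\le\; k^{\frac{1}{p}-\frac{1}{2}}\,\Big(\sum_{i=1}^{k} |v_i|^2\Big)^{1/2}
  \;=\; k^{\frac{1}{p}-\frac{1}{2}}\,\|v\|_2 .
\]
```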
Proof of the following lemma relies mostly on common inequalities that have been used in the compressed sensing literature (see, e.g., Chartrand 2007, Theorem 2.1, and Gribonval and Nielsen 2007, Theorem 2).

Lemma D.5. The error vector $d^{(t)}$ satisfies $\big\| d^{(t)}|_{S^c} \big\|_p \le s^{\frac{1}{p} - \frac{1}{2}}\, \big\| d^{(t)}|_{S} \big\|_2$ for all $t = 0, 1, 2, \ldots$

Proof. Since $\operatorname{supp}(\bar{x}^\star) \subseteq S = \operatorname{supp}(x^\star)$, we have $d^{(t)}|_{S^c} = x^{(t)}|_{S^c}$. Furthermore, because $x^{(t)}$ is a feasible point by assumption, we have $\|x^{(t)}\|_p^p \le \hat{c} = \|\bar{x}^\star\|_p^p$, which implies
$$\big\| d^{(t)}|_{S^c} \big\|_p^p = \big\| x^{(t)}|_{S^c} \big\|_p^p \le \big\| \bar{x}^\star \big\|_p^p - \big\| x^{(t)}|_{S} \big\|_p^p \le \big\| \bar{x}^\star - x^{(t)}|_{S} \big\|_p^p = \big\| d^{(t)}|_{S} \big\|_p^p \le s^{1 - \frac{p}{2}}\, \big\| d^{(t)}|_{S} \big\|_2^p,$$
where the last step is the power means inequality. This yields the desired result.

The next lemma is a straightforward extension of a previously known result (Davenport and Wakin 2010, Lemma 3.1) to the case of complex vectors and asymmetric RIP.

Lemma D.6. For $u, v \in \mathbb{C}^n$, suppose that the matrix $A$ satisfies RIP of order $\max\{\|u+v\|_0, \|u-v\|_0\}$ with constants $\alpha$ and $\beta$. Then we have
$$\Big| \mathrm{Re}\big[ \eta \langle A u, A v \rangle \big] - \mathrm{Re}\, \langle u, v \rangle \Big| \le \bigg( \frac{\eta(\alpha - \beta)}{2} + \Big| \frac{\eta(\alpha + \beta)}{2} - 1 \Big| \bigg) \|u\|_2\, \|v\|_2.$$

Proof. If either of the vectors $u$ and $v$ is zero the claim is trivial, so without loss of generality we assume that neither is zero. The RIP condition holds for the vectors $u \pm v$, and we have
$$\beta\, \|u \pm v\|_2^2 \le \big\| A (u \pm v) \big\|_2^2 \le \alpha\, \|u \pm v\|_2^2.$$
Therefore we obtain
$$\mathrm{Re}\, \langle A u, A v \rangle = \tfrac{1}{4} \Big( \| A(u+v) \|_2^2 - \| A(u-v) \|_2^2 \Big) \le \frac{\alpha - \beta}{4} \Big( \|u\|_2^2 + \|v\|_2^2 \Big) + \frac{\alpha + \beta}{2}\, \mathrm{Re}\, \langle u, v \rangle.$$
Applying this inequality to the unit-norm vectors $u / \|u\|_2$ and $v / \|v\|_2$ yields
$$\mathrm{Re}\Big[ \eta \Big\langle A \tfrac{u}{\|u\|_2}, A \tfrac{v}{\|v\|_2} \Big\rangle \Big] - \mathrm{Re}\, \Big\langle \tfrac{u}{\|u\|_2}, \tfrac{v}{\|v\|_2} \Big\rangle \le \frac{\eta(\alpha - \beta)}{2} + \Big| \frac{\eta(\alpha+\beta)}{2} - 1 \Big|.$$
Similarly it can be shown that the same quantity with the roles of the two sides exchanged obeys the same bound. The desired result follows by multiplying the last two inequalities by $\|u\|_2 \|v\|_2$.

Lemma D.7. If the step size of $\ell_p$-PGD obeys $\big| \eta^{(t)} (\alpha_{3s} + \beta_{3s})/2 - 1 \big| \le \epsilon$ for some $\epsilon \ge 0$, then
$$\mathrm{Re}\Big[ \big\langle d^{(t)}, d^{(t+1)} \big\rangle - \eta^{(t)} \big\langle A d^{(t)}, A d^{(t+1)} \big\rangle \Big] \le \gamma_{3s} \bigg( 1 + \Big( \frac{2p}{2-p} \Big)^{1/2} \bigg)^{2} \big\| d^{(t)} \big\|_2\, \big\| d^{(t+1)} \big\|_2,$$
where $\gamma_{3s} := \eta^{(t)} (\alpha_{3s} - \beta_{3s})/2 + \big| \eta^{(t)} (\alpha_{3s} + \beta_{3s})/2 - 1 \big|$ is the constant supplied by Lemma D.6 at order $3s$; under the step-size assumption, $\gamma_{3s} \le (1+\epsilon)(\alpha_{3s} - \beta_{3s})/(\alpha_{3s} + \beta_{3s}) + \epsilon$.

Proof. Decompose the left-hand side over the partition introduced above:
$$\mathrm{Re}\Big[ \big\langle d^{(t)}|_{T_t}, d^{(t+1)}|_{T_{t+1}} \big\rangle - \eta^{(t)} \big\langle A d^{(t)}|_{T_t}, A d^{(t+1)}|_{T_{t+1}} \big\rangle \Big] + \sum_{i \ge 2} \mathrm{Re}[\cdots] + \sum_{j \ge 2} \mathrm{Re}[\cdots] + \sum_{i, j \ge 2} \mathrm{Re}[\cdots], \quad (D.2)$$
where the omitted summands pair $d^{(t)}|_{S_{t,i}}$ with $d^{(t+1)}|_{T_{t+1}}$, $d^{(t)}|_{T_t}$ with $d^{(t+1)}|_{S_{t+1,j}}$, and $d^{(t)}|_{S_{t,i}}$ with $d^{(t+1)}|_{S_{t+1,j}}$, respectively. Note that $|T_t \cup T_{t+1}| \le 3s$ and, for $i, j \ge 2$, $|T_t \cup S_{t+1,j}| \le 3s$, $|T_{t+1} \cup S_{t,i}| \le 3s$, and $|S_{t,i} \cup S_{t+1,j}| \le 2s$. Therefore, applying Lemma D.6 to each of the summands in (D.2) bounds it by $\gamma_{3s}$ times the product of the corresponding restricted $\ell_2$ norms. Collecting terms, and applying Lemma D.4 followed by Lemma D.5 to bound $\sum_{i \ge 2} \| d^{(t)}|_{S_{t,i}} \|_2$ and $\sum_{j \ge 2} \| d^{(t+1)}|_{S_{t+1,j}} \|_2$ by multiples of $\| d^{(t)} \|_2$ and $\| d^{(t+1)} \|_2$, respectively, yields the stated bound, which is the desired result.
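Lemma D.6 is stated in terms of asymmetric RIP constants, i.e., numbers α and β with β‖v‖₂² ≤ ‖Av‖₂² ≤ α‖v‖₂² for all sparse v of the relevant order. The following Python sketch estimates such constants empirically for a given matrix by sampling random supports; it is only a crude numerical illustration (an exact RIP computation is combinatorial), and all names in it are ours rather than the thesis's.

```python
import numpy as np

def estimate_rip_constants(A, k, trials=2000, seed=0):
    """Crude Monte Carlo estimate of asymmetric RIP constants of order k.

    Returns (beta_hat, alpha_hat) bracketing ||A v||^2 / ||v||^2 over the sampled
    k-sparse unit vectors v; this only lower-bounds alpha and upper-bounds beta,
    since the search over supports is not exhaustive.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    alpha_hat, beta_hat = 0.0, np.inf
    for _ in range(trials):
        support = rng.choice(n, size=k, replace=False)
        v = np.zeros(n)
        v[support] = rng.standard_normal(k)
        v /= np.linalg.norm(v)
        ratio = np.linalg.norm(A @ v) ** 2
        alpha_hat = max(alpha_hat, ratio)
        beta_hat = min(beta_hat, ratio)
    return beta_hat, alpha_hat

# Example: a scaled Gaussian matrix is a standard RIP candidate.
# A = np.random.default_rng(1).standard_normal((128, 512)) / np.sqrt(128)
# print(estimate_rip_constants(A, k=10))
```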
Now we are ready to prove the accuracy guarantee for the $\ell_p$-PGD algorithm.

Proof of Theorem 6.1. Recall that the contraction coefficient $\tilde{\gamma}$ is defined by (6.5). It follows from Lemmas D.1 and D.7 that
$$\big\| d^{(t)} \big\|_2^2 \le 2\tilde{\gamma}\, \big\| d^{(t-1)} \big\|_2\, \big\| d^{(t)} \big\|_2 + 2\eta^{(t)}\, \mathrm{Re}\, \big\langle A d^{(t)}, A d^\star + e \big\rangle \le 2\tilde{\gamma}\, \big\| d^{(t-1)} \big\|_2\, \big\| d^{(t)} \big\|_2 + 2\eta^{(t)}\, \big\| A d^{(t)} \big\|_2\, \big\| A d^\star + e \big\|_2.$$
Furthermore, using (D.1) and Lemma D.5 we deduce
$$\big\| A d^{(t)} \big\|_2 \le \big\| A d^{(t)}|_{T_t} \big\|_2 + \sum_{i \ge 1} \big\| A d^{(t)}|_{T_{t,2i}} \big\|_2 \le \sqrt{\alpha_{2s}} \Big( \big\| d^{(t)}|_{T_t} \big\|_2 + \sum_{i \ge 1} \big\| d^{(t)}|_{T_{t,2i}} \big\|_2 \Big) \le \sqrt{\alpha_{2s}}\, \big( 1 + \xi(p) \big)\, \big\| d^{(t)} \big\|_2,$$
where $\xi(p)$ is as defined in the statement of the theorem. Therefore,
$$\big\| d^{(t)} \big\|_2 \le 2\tilde{\gamma}\, \big\| d^{(t-1)} \big\|_2 + 2\eta^{(t)} \sqrt{\alpha_{2s}}\, \big( 1 + \xi(p) \big) \big( \| A d^\star \|_2 + \| e \|_2 \big)$$
after canceling $\|d^{(t)}\|_2$ from both sides. Since $\bar{x}^\star$ is a projection of $x^\star$ onto the feasible set $\hat{\mathcal{B}}$, and $\big( \hat{c} / \|x^\star\|_p^p \big)^{1/p} x^\star \in \hat{\mathcal{B}}$, we have
$$\| d^\star \|_2 = \big\| \bar{x}^\star - x^\star \big\|_2 \le \bigg\| \Big( \frac{\hat{c}}{\|x^\star\|_p^p} \Big)^{1/p} x^\star - x^\star \bigg\|_2 = \bigg( 1 - \Big( \frac{\hat{c}}{\|x^\star\|_p^p} \Big)^{1/p} \bigg) \| x^\star \|_2.$$
Furthermore, $\operatorname{supp}(d^\star) \subseteq S$; thereby we can use RIP to obtain $\| A d^\star \|_2 \le \sqrt{\alpha_s}\, \| d^\star \|_2 \le \sqrt{\alpha_s}\, \| x^\star \|_2$. Combining the last three displays, applying the resulting inequality recursively, and using the fact that
$$\sum_{i=0}^{t} (2\tilde{\gamma})^i < \sum_{i=0}^{\infty} (2\tilde{\gamma})^i = \frac{1}{1 - 2\tilde{\gamma}},$$
which holds because of the assumption $\tilde{\gamma} < 1/2$, we can finally deduce
$$\big\| x^{(t)} - x^\star \big\|_2 = \big\| d^{(t)} - d^\star \big\|_2 \le \big\| d^{(t)} \big\|_2 + \big\| d^\star \big\|_2 \le (2\tilde{\gamma})^t\, \| x^\star \|_2 + C_1 \bigg( 1 - \Big( \frac{\hat{c}}{\|x^\star\|_p^p} \Big)^{1/p} \bigg) \| x^\star \|_2 + C_2\, \| e \|_2,$$
where the constants $C_1$ and $C_2$ collect the factors $2(1 + \xi(p))/(1 - 2\tilde{\gamma})$, $\eta^{(t)} \sqrt{\alpha_{2s}}$, and $\sqrt{\alpha_s}$ exactly as they appear in the statement of the theorem.

D.2 Lemmas for Characterization of a Projection onto $\ell_p$-Balls

In what follows we assume that $\mathcal{B}$ is an $\ell_p$-ball with $p$-radius $c$ (i.e., $\mathcal{B} = \mathcal{F}_p(c)$). For $x \in \mathbb{C}^n$ we derive some properties of
$$x^\star \in \arg\min_{u} \| x - u \|_2^2 \quad \text{s.t. } u \in \mathcal{B}, \quad (D.3)$$
a projection of $x$ onto $\mathcal{B}$.

Lemma D.8. Let $x^\star$ be a projection of $x$ onto $\mathcal{B}$. Then for every $i \in \{1, 2, \ldots, n\}$ we have $\operatorname{Arg}(x_i) = \operatorname{Arg}(x_i^\star)$ and $|x_i^\star| \le |x_i|$.

Proof. The proof is by contradiction. Suppose that for some $i$ we have $\operatorname{Arg}(x_i) \ne \operatorname{Arg}(x_i^\star)$ or $|x_i^\star| > |x_i|$. Consider the vector $x'$ for which $x_j' = x_j^\star$ for $j \ne i$ and
$$x_i' = \min\big\{ |x_i|, |x_i^\star| \big\} \exp\big( \imath \operatorname{Arg}(x_i) \big),$$
where $\imath$ denotes the imaginary unit. We have $\|x'\|_p \le \|x^\star\|_p$, which implies that $x' \in \mathcal{B}$. Since $|x_i - x_i'| < |x_i - x_i^\star|$, we have $\|x' - x\|_2 < \|x^\star - x\|_2$, which contradicts the choice of $x^\star$ as a projection.

Assumption. Lemma D.8 asserts that the projection $x^\star$ has the same phase components as $x$. Therefore, without loss of generality and for simplicity, in the following lemmas we assume that $x$ has real-valued non-negative entries.

Lemma D.9. For any $x$ in the positive orthant there is a projection $x^\star$ of $x$ onto the set $\mathcal{B}$ such that for $i, j \in \{1, 2, \ldots, n\}$ we have $x_i^\star \le x_j^\star$ if and only if $x_i \le x_j$.

Proof. Note that the set $\mathcal{B}$ is closed under any permutation of coordinates. In particular, by interchanging the $i$-th and $j$-th entries of $x^\star$ we obtain another vector $x'$ in $\mathcal{B}$. Since $x^\star$ is a projection of $x$ onto $\mathcal{B}$ we must have $\| x - x^\star \|_2^2 \le \| x - x' \|_2^2$. Therefore
$$\big( x_i - x_i^\star \big)^2 + \big( x_j - x_j^\star \big)^2 \le \big( x_i - x_j^\star \big)^2 + \big( x_j - x_i^\star \big)^2,$$
and from that $0 \le (x_i - x_j)(x_i^\star - x_j^\star)$. For $x_i \ne x_j$ the result follows immediately, and for $x_i = x_j$ we can assume $x_i^\star \le x_j^\star$ without loss of generality.
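Lemmas D.8 and D.9 describe qualitative properties of any Euclidean projection onto an ℓp ball: phases are preserved, magnitudes never grow, and the ordering of the magnitudes is preserved. The short Python sketch below checks the ordering property numerically on a random instance, using a generic local solver as a stand-in for an exact projection; it is a sanity-check illustration only, and the solver-based "projection" is approximate.

```python
import numpy as np
from scipy.optimize import minimize

def lp_ball_projection(x, p, c):
    """Approximate Euclidean projection of a nonnegative vector x onto
    {u >= 0 : sum(u**p) <= c}, computed with a generic local solver.
    The box 0 <= u_i <= x_i encodes the magnitude property of Lemma D.8."""
    constraint = {"type": "ineq",
                  "fun": lambda u: c - np.sum(np.clip(u, 0.0, None) ** p)}
    bounds = [(0.0, float(xi)) for xi in x]
    result = minimize(lambda u: np.sum((u - x) ** 2), x0=0.5 * x,
                      bounds=bounds, constraints=[constraint], method="SLSQP")
    return result.x

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.1, 1.0, size=6))   # nonnegative, sorted increasing
u = lp_ball_projection(x, p=0.5, c=1.0)
# Lemma D.9 predicts that the projection preserves the ordering of the entries.
print(np.all(np.diff(u) >= -1e-6))
```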
Lemma D.10. Let $S^\star$ be the support set of $x^\star$. Then there exists a $\lambda \ge 0$ such that
$$x_i^\star + \lambda\, p\, \big( x_i^\star \big)^{p-1} = x_i \qquad \text{for all } i \in S^\star.$$

Proof. The fact that $x^\star$ is a solution to the minimization expressed in (D.3) implies that $x^\star|_{S^\star}$ must be a solution to
$$\arg\min_{v} \big\| x|_{S^\star} - v \big\|_2^2 \quad \text{s.t. } \|v\|_p^p \le c.$$
The normal to the feasible set (i.e., the gradient of the constraint function) is uniquely defined at $x^\star|_{S^\star}$, since all of its entries are positive by assumption. Consequently, the Lagrangian
$$L(v, \lambda) = \big\| x|_{S^\star} - v \big\|_2^2 + \lambda \big( \|v\|_p^p - c \big)$$
has a well-defined partial derivative $\partial L / \partial v$ at $x^\star|_{S^\star}$, which must be equal to zero for an appropriate $\lambda \ge 0$. Hence, for all $i \in S^\star$,
$$x_i^\star - x_i + \lambda\, p\, \big( x_i^\star \big)^{p-1} = 0,$$
which is equivalent to the desired result.

Lemma D.11. Let $\lambda \ge 0$ and $p \in [0, 1]$ be fixed numbers, and set
$$T_0 = (2 - p) \bigg( \frac{\lambda\, p}{(1-p)^{1-p}} \bigg)^{\frac{1}{2-p}}.$$
Denote the function $t^{1-p} (T - t)$ by $h_p(t)$. The following statements hold regarding the roots of $h_p(t) = \lambda p$:

(i) For $p = 1$ and $T \ge T_0$, the equation $h_1(t) = \lambda$ has a unique solution $t = T - \lambda \in [0, T]$, which is an increasing function of $T$.

(ii) For $p \in [0, 1)$ and $T \ge T_0$, the equation $h_p(t) = \lambda p$ has two roots $t_-$ and $t_+$ satisfying $t_- \in \big[ 0, \tfrac{1-p}{2-p} T \big)$ and $t_+ \in \big( \tfrac{1-p}{2-p} T, T \big]$. As functions of $T$, $t_-$ and $t_+$ are decreasing and increasing, respectively, and they coincide at $T = T_0$.

Proof. Figure D.2 illustrates $h_p(t)$ for different values of $p \in [0, 1]$. To verify part (i), observe that $T_0 = \lambda$ and thereby $T \ge \lambda$; the claim is then obvious since $h_1(t) = T - t$ takes the value $\lambda$ at $t = T - \lambda$. Part (ii) is more intricate and we divide it into two cases: $p = 0$ and $p \ne 0$. At $p = 0$ we have $T_0 = 0$, and $h_0(t) = t (T - t)$ has two zeros, at $t_- = 0$ and $t_+ = T$, that obviously satisfy the claim. So we can now focus on the case $p \in (0, 1)$. It is straightforward to verify that $t_{\max} = \tfrac{1-p}{2-p} T$ is the location at which $h_p$ peaks, and straightforward algebraic manipulations show that $T > T_0$ is equivalent to $\lambda p < h_p(t_{\max})$. Furthermore, inspecting the sign of $h_p'(t)$ shows that $h_p(t)$ is strictly increasing over $[0, t_{\max}]$ and strictly decreasing over $[t_{\max}, T]$. Then, using the fact that $h_p(0) = h_p(T) = 0 \le \lambda p < h_p(t_{\max})$, it follows from the intermediate value theorem that $h_p(t) = \lambda p$ has exactly two roots, $t_-$ and $t_+$, that straddle $t_{\max}$, as claimed. Furthermore, taking the derivative of $t^{1-p} (T - t) = \lambda p$ with respect to $T$ yields
$$(1-p)\, t^{-p}\, t'\, (T - t) + t^{1-p} \big( 1 - t' \big) = 0, \qquad \text{hence} \qquad t' = \frac{-t}{(1-p)(T - t) - t}.$$
Because $t_- < t_{\max} = \tfrac{1-p}{2-p} T$ implies $(1-p)(T - t_-) - t_- > 0$, we get $t_-' < 0$; thus $t_-$ is a decreasing function of $T$. Similarly one can show that $t_+$ is an increasing function of $T$, using the fact that $t_+ \ge t_{\max}$. Finally, as $T$ decreases to $T_0$, the peak value $h_p(t_{\max})$ decreases to $\lambda p$, which implies that $t_-$ and $t_+$ both tend to the same value $\tfrac{1-p}{2-p} T_0$.

Fig. D.2 The function $t^{1-p}(T - t)$ for different values of $p$.
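Lemma D.11 is the scalar fact behind the projection characterization: for p < 1 the equation t^(1-p)(T - t) = λp has either no positive root (T below the threshold) or two roots that straddle t_max = (1-p)T/(2-p). The following Python snippet illustrates this root structure numerically with a brute-force grid; the particular values of p, λ, and T are arbitrary examples.

```python
import numpy as np

def hp(t, T, p):
    """h_p(t) = t**(1-p) * (T - t), the scalar function of Lemma D.11."""
    return t ** (1.0 - p) * (T - t)

p, lam, T = 0.5, 0.2, 2.0
t_max = (1.0 - p) / (2.0 - p) * T               # location of the peak of h_p
t_grid = np.linspace(1e-9, T, 200001)
vals = hp(t_grid, T, p) - lam * p               # roots of h_p(t) = lambda * p
sign_changes = np.flatnonzero(np.diff(np.sign(vals)) != 0)
roots = t_grid[sign_changes]
print(roots)            # two roots, since T exceeds the threshold here
print(roots < t_max)    # expected [True, False]: the roots straddle t_max
```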
Lemma D.12. Suppose that $x_i = x_j > 0$ for some $i \ne j$. If $x_i^\star = x_j^\star > 0$, then $x_i^\star \ge \tfrac{1-p}{2-p}\, x_i$.

Proof. For $p \in \{0, 1\}$ the claim is obvious: at $p = 0$ we have $x_i^\star = x_i > \tfrac{1}{2} x_i$, and at $p = 1$ we have $\tfrac{1-p}{2-p} x_i = 0$. Therefore, without loss of generality, we assume $p \in (0, 1)$. The proof is by contradiction. Suppose that $w = x_i^\star / x_i = x_j^\star / x_j < \tfrac{1-p}{2-p}$. Since $x^\star$ is a projection, it follows that $a = b = w$ must be the solution to
$$\arg\min_{a, b}\ \tfrac{1}{2} \Big[ (1 - a)^2 + (1 - b)^2 \Big] \quad \text{s.t. } a^p + b^p = 2 w^p,\ a > 0,\ b > 0;$$
otherwise the vector $x'$ that is identical to $x^\star$ except for $x_i' = a x_i \ne x_i^\star$ and $x_j' = b x_j \ne x_j^\star$ is also a feasible point (i.e., $x' \in \mathcal{B}$) that satisfies
$$\| x' - x \|_2^2 - \| x^\star - x \|_2^2 = \Big[ (1 - a)^2 + (1 - b)^2 - 2 (1 - w)^2 \Big] x_i^2 < 0,$$
which is absurd. If $b$ is considered as a function of $a$ through the constraint $a^p + b^p = 2 w^p$, then the objective can be seen merely as a function of $a$, say $\Phi(a)$, and its derivative can be expressed, via the mean value theorem, in terms of some point between $\min\{a, b\}$ and $\max\{a, b\}$. Choosing $r_1 > w$ and $r_0 = \big( 2 w^p - r_1^p \big)^{1/p} < w$ appropriately, straightforward algebra shows that if either $a$ or $b$ belongs to the interval $[r_0, r_1]$ then so does the other one, and that over this interval we always have $0 < r_1 \le \tfrac{1-p}{2-p}$. Therefore, as $a$ increases in $[r_0, r_1]$, the sign of $\Phi'(a)$ changes at $a = w$ from positive to negative. Thus $a = b = w$ is a local maximum of $\Phi$, which is a contradiction.

References

R. Chartrand. Nonconvex compressed sensing and error correction. In Proceedings of the 32nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), volume 3, pages 889–892, April 2007.

M. Davenport and M. Wakin. Analysis of orthogonal matching pursuit using the restricted isometry property. IEEE Transactions on Information Theory, 56(9):4395–4401, September 2010.

S. Foucart. Sparse recovery algorithms: sufficient conditions in terms of restricted isometry constants. In Approximation Theory XIII: San Antonio 2010, volume 13 of Springer Proceedings in Mathematics, pages 65–77. Springer, New York, 2012.

R. Gribonval and M. Nielsen. Highly sparse representations from dictionaries are unique and independent of the sparseness measure. Applied and Computational Harmonic Analysis, 22(3):335–355, 2007.