Tài liệu hạn chế xem trước, để xem đầy đủ mời bạn chọn Tải xuống
1
/ 190 trang
THÔNG TIN TÀI LIỆU
Thông tin cơ bản
Định dạng
Số trang
190
Dung lượng
5,81 MB
Nội dung
Towards Scalable Bayesian Nonparametric Methods for Data Analytics by Viet Huu Huynh, M.Eng ˜u Viê.t) (aka Huỳnh Hư Submitted in fulfillment of the requirements for the degree of Doctor of Philosophy Deakin University January, 2017 Acknowledgements In many ways, I wouldn’t have been able to finish this thesis without the guidance, support, and assistance of many great people over the course of this dissertation I would like to gratefully acknowledge the individuals and their contributions here First and foremost, I would like to express sincere gratitude and thanks to my principal supervisor, Prof Dinh Phung, for his endless motivation, constant encouragement and support As an advisor, Dinh has enthusiasm and passion which provide driving inspiration to me, a beginning researcher, while he simultaneously allows me free rein to investigate emerging interests I also would like to thank my co-supervisor Prof Svetha Venkatesh for her valuable encouragement and guidance during the course of this thesis Svetha’s scientific writing workshops greatly helped me to improve my writing and reading skills I, fortunately, benefited from insightful interactions and guidance from two collaborators Dr Hung Bui and A/Prof XuanLong Nguyen Although geographically far away, I received great insights from discussions through video conference and email exchanges with them I am grateful for their time, expertise and sharpness in thinking and in shaping many ideas over the course of this thesis I am also grateful for the opportunity to interact with Dr Matthew Hoffman His helpful discussions and valuable comments shaped the work in stochastic variational inference My thanks also go to all members of PRaDA for creating our workplace an encouraging environment with many social activities after hours Also, my special thanks go to Tu Nguyen and Thin Nguyen for kindly providing datasets which were used in Chapter I also would like to thank PRaDA for providing financial support for this thesis I owe special thanks to my beloved wife, Hien, for her love, understanding, encouragement, and endless support in the best and worst moments My thanks also go to her for proofreading this thesis Last, but surely not least, I am infinitely indebted to my parents Without their eternal support and encouragement, I cannot have the opportunity to freely pursue my academic interests To them, this thesis is dedicated v Relevant Publications Part of this thesis and some related works have been published and documented elsewhere The details are as follows: Chapter 3: • Viet Huynh, Dinh Phung, Long Nguyen, Svetha Venkatesh, Hung Bui (2015) Learning conditional latent structures from multiple data sources In Proceedings of the 19 th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pp 343-354, Vietnam Springer-Verlag, Berlin Heidelberg Chapter 4: • Viet Huynh, Dinh Phung, Svetha Venkatesh (2015) Streaming Variational Inference for Dirichlet Process Mixtures In Proceedings of the th Asian Conference on Machine Learning (ACML),volume 45, pages 237–252, Hong Kong • Viet Huynh, Dinh Phung (2017) Streaming Clustering with Bayesian Nonparametric Models Neurocomputing (2017) Chapter 6: • Viet Huynh, Dinh Phung, Svetha Venkatesh, Long Nguyen, Matt Hoffman, Hung Bui (2016) Scalable Nonparametric Bayesian Multilevel Clustering In Proceedings of the 32th Conference on Uncertainty in Artificial Intelligence, New York City, NY,USA vi Contents Acknowledgements v Relevant Publications vi Abstract xvi Abbreviations xix Introduction 1.1 Aims and Approaches 1.2 Significance and Contribution 1.3 Structure of the Thesis Related Background 2.1 2.2 Probabilistic Graphical Models 2.1.1 Representation 2.1.2 Inference and Learning 14 Exponential Family 19 2.2.1 Exponential Family of Distributions 19 2.2.2 Maximum Entropy and Exponential Representation 22 2.2.3 Graphical Models as Exponential Families 22 2.2.4 Some popular exponential family distributions 24 2.2.4.1 Multinomial and Categorical distributions 24 2.2.4.2 Dirichlet distribution 26 2.2.4.3 Generalized Dirichlet distribution 28 vii 2.3 2.4 2.5 Learning from Data with Bayesian Models 30 2.3.1 Bayesian Methods 30 2.3.2 Bayesian Nonparametrics 34 2.3.2.1 Dirichlet process and Dirichlet process mixtures 34 2.3.2.2 Advanced Dirichlet process-based models 41 Approximate Inference for Graphical Models 46 2.4.1 Variational inference 46 2.4.2 Markov Chain Monte Carlo (MCMC) 50 2.4.2.1 Monte Carlo estimates from independent samples 51 2.4.2.2 Markov chain Monte Carlo 52 Conclusion 56 Bayesian Nonparametric Learning from Heterogeneous Data Sources 57 3.1 Motivation 58 3.2 Context sensitive Dirichlet processes 60 3.2.1 Model description 60 3.2.2 Model Inference using MCMC 62 3.3 Context sensitive DPs with multiple contexts 67 3.4 Experiments 69 3.5 3.4.1 Reality Mining dataset 70 3.4.2 Experimental settings and results 70 Conclusion 72 Stream Learning for Bayesian Nonparametric Models 74 4.1 Motivation 75 4.2 Streaming clustering with DPM 77 4.3 4.2.1 Truncation-free variational inference 78 4.2.2 Streaming learning with DPM 83 Clustering with heterogeneous data sources 84 4.4 4.5 4.3.1 DPM with product space (DPM-PS) 84 4.3.2 Inference for DPM-PS 85 Experiments 86 4.4.1 Datasets and experimental settings 87 4.4.2 Experimental results 90 Conclusion 94 Robust Collapsed Variational Bayes for Hierarchical Dirichlet Processes 95 5.1 Problem Statement 96 5.2 Recent Advances in HDP Inference Algorithms 98 5.3 5.4 5.5 5.2.1 Truncation representation of Dirichlet process 98 5.2.2 Variational Inference for HDP 100 Truly collapsed variational Bayes for HDP 102 5.3.1 Marginalizing out document stick-breaking 102 5.3.2 Marginalizing out topic atoms 105 Distributed Inference for HDP on Apache Spark 106 5.4.1 Apache Spark and GraphX 5.4.2 Sparkling HDP 108 Experiments 109 5.5.1 5.5.2 5.6 106 Inference Performance and Running Time 109 5.5.1.1 Datasets and statistics 110 5.5.1.2 Evaluation metric 111 5.5.1.3 Results 111 Robust Pervasive Context Discovery 113 5.5.2.1 Datasets and Experimental Settings 113 5.5.2.2 Learned Patterns from Pervasive Signals 114 Conclusion 116 Scalable Bayesian Nonparametric Multilevel Clustering 117 6.1 Motivation 118 6.2 Multilevel clustering with contexts (MC2) 121 6.3 SVI for MC2 6.4 6.5 123 6.3.1 Truncated stick-breaking representations 123 6.3.2 Mean-field variational approximation 124 6.3.3 Mean-field updates 125 6.3.4 Stochastic variational inference 126 Experiments 127 6.4.1 Datasets 128 6.4.2 Experiment setups 129 6.4.3 Evaluation metrics 130 6.4.4 Experimental result 131 Conclusion 134 Conclusion and Future Directions 135 7.1 Summary of contributions 135 7.2 Future directions 137 A Supplementary Proofs 140 A.1 Properties of Exponential Family 140 A.2 Variational updates for multi-level clustering model (MC2) 141 A.2.1 Naive Variational for MC2 142 A.2.1.1 Stick-breaking variable updates 143 A.2.1.2 Content and context atom updates 144 A.2.1.3 Indicator variable updates 145 A.2.2 Structured Variational for MC2 146 A.2.2.1 Stick-breaking variable updates 147 A.2.2.2 Content and context atom updates 147 A.2.2.3 Indicator variable updates 148 A.3 Stochastic Variational for MC2 149 A.3.1 Stochastic updates for stick-breaking variables 151 A.3.2 Stochastic updates for content and context atoms 153 A.3.3 Stochastic updates for global indicator variables 154 A.3.4 Comparison between naive and structured mean field 155 Bibliography 158 Bibliography Aldous, D J (1985) Exchangeability and related topics Springer Amari, S.-I (1998) Natural gradient works efficiently in learning Neural computation, 10(2), 251–276 Andrieu, C., De Freitas, N., Doucet, A., and Jordan, M I (2003) An introduction to MCMC for machine learning Machine learning, 50(1-2), 5–43 Antoniak, C (1974) Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems The Annals of Statistics, 2(6), 1152–1174 Arnborg, S., Corneil, D G., and Proskurowski, A (1987) Complexity of finding embeddings in a k-tree SIAM Journal on Algebraic Discrete Methods, 8(2), 277– 284 Barber, D (2012) Bayesian reasoning and machine learning Cambridge University Press Beal, M J (2003) Variational Algorithms for Approximate Bayesian Inference PhD thesis Bishop, C M (2006) Pattern recognition and machine learning, volume Springer New York Blackwell, D and MacQueen, J (1973) Ferguson distributions via Pólya urn schemes The Annals of Statistics, 1(2), 353–355 Blei, D M and Jordan, M I (2006) Variational inference for Dirichlet process mixtures Bayesian Analysis, 1(1), 121–144 Blei, D M., Ng, A Y., and Jordan, M I (2003) Latent Dirichlet allocation Journal of Machine Learning Research, 3, 993–1022 Boyd, S and Vandenberghe, L (2004) Convex optimization Cambridge University Press Broderick, T., Boyd, N., Wibisono, A., Wilson, A C., and Jordan, M I (2013) Streaming variational Bayes In Advances in Neural Information Processing Systems, pages 1727–1735 158 Bibliography 159 Bryant, M and Sudderth, E B (2012) Truly nonparametric online variational inference for hierarchical Dirichlet processes In Advances in Neural Information Processing Systems, pages 2699–2707 Buntine, W and Hutter, M (2010) A Bayesian view of the Poisson-Dirichlet process arXiv preprint arXiv:1007.0296 Buntine, W and Jakulin, A (2006) Discrete component analysis In Subspace, Latent Structure and Feature Selection, pages 1–33 Springer Buntine, W L (1995) Chain graphs for learning In Proceedings of the Eleventh conference on Uncertainty in artificial intelligence, pages 46–54 Morgan Kaufmann Publishers Inc Buntine, W L and Mishra, S (2014) Experiments with non-parametric topic models In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 881–890 ACM Chickering, D M and Heckerman, D (1997) Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables Machine learning, 29(2-3), 181–212 Chua, T.-S., Tang, J., Hong, R., Li, H., Luo, Z., and Zheng, Y (2009) NUSWIDE: a real-world web image database from National University of Singapore In Proceedings of the ACM International Conference on Image and Video Retrieval, page 48 ACM Connor, R J and Mosiman, J E (1969) Concepts of independence for proportions with a generalization of the Dirichlet distribution Journal of the American Statistical Association, 64, 194–206 Cover, T M and Thomas, J A (2006) Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing) Wiley-Interscience Cowles, M K and Carlin, B P (1996) Markov chain Monte Carlo convergence diagnostics: a comparative review Journal of the American Statistical Association, 91(434), 883–904 Cuturi, M (2013) Sinkhorn distances: Lightspeed computation of optimal transport In Advances in Neural Information Processing Systems, pages 2292–2300 Cuturi, M and Doucet, A (2014) Fast computation of Wasserstein barycenters In Proceedings of The 31st International Conference on Machine Learning (ICML), pages 685–693 Bibliography 160 Cuturi, M and Peyré, G (2016) A smoothed dual approach for variational Wasserstein problems SIAM Journal on Imaging Sciences, 9(1), 320–343 Dayan, P., Hinton, G E., Neal, R M., and Zemel, R S (1995) The Helmholtz machine Neural computation, 7(5), 889–904 De Vries, C M., Nayak, R., Kutty, S., Geva, S., and Tagarelli, A (2010) Overview of the INEX 2010 XML mining track: Clustering and classification of XML documents In Comparative evaluation of focused retrieval, pages 363–376 Springer Berlin Heidelberg Dean, J and Ghemawat, S (2008) MapReduce: simplified data processing on large clusters Communications of the ACM, 51(1), 107–113 Diez-Roux, A V (2000) Multilevel analysis in public health research Annual review of public health, 21(1), 171–192 Du, N., Farajtabar, M., Ahmed, A., Smola, A J., and Song, L (2015) Dirichlet-Hawkes processes with applications to clustering continuous-time document streams In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 219–228 ACM Dubey, A., Hefny, A., Williamson, S., and Xing, E P (2012) A non-parametric mixture model for topic modeling over time arXiv preprint arXiv:1208.4411 Eagle, N and Pentland, A (2006) Reality Mining: Sensing complex social systems Personal and Ubiquitous Computing, 10(4), 255–268 Elango, P K and Jayaraman, K (2005) Clustering images using the latent Dirichlet allocation model University of Wisconsin Escobar, M and West, M (1995) Bayesian density estimation and inference using mixtures Journal of the american statistical association, 90(430), 577–588 Ferguson, T (1973) A Bayesian analysis of some nonparametric problems The Annals of Statistics, 1(2), 209–230 Frey, B and Dueck, D (2007) Clustering by passing messages between data points Science, 315(5814), 972–976 Frigyik, B A., Kapila, A., and Gupta, M R (2010) Introduction to the Dirichlet distribution and related processes department of electrical engineering, university of washignton Technical report, UWEETR-2010-0006 Gal, Y and Ghahramani, Z (2015) Bayesian convolutional neural networks with Bernoulli approximate variational inference arXiv preprint arXiv:1506.02158 Bibliography 161 Gelfand, A., Kottas, A., and MacEachern, S (2005) Bayesian nonparametric spatial modeling with Dirichlet process mixing Journal of the American Statistical Association, 100(471), 1021–1035 Genevay, A., Cuturi, M., Peyré, G., and Bach, F (2016) Stochastic optimization for large-scale optimal transport In Advances In Neural Information Processing Systems, pages 3432–3440 Getoor, L (2007) Introduction to statistical relational learning MIT press Gonzalez, J E., Xin, R S., Dave, A., Crankshaw, D., Franklin, M J., and Stoica, I (2014) GraphX: Graph processing in a distributed dataflow framework In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pages 599–613 Griffiths, T L and Ghahramani, Z (2005) Infinite latent feature models and the Indian buffet process In Advances in neural information processing systems, pages 475–482 Hawkes, A G (1971) Spectra of some self-exciting and mutually exciting point processes Biometrika, 58(1), 83–90 He, K., Zhang, X., Ren, S., and Sun, J (2015) Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification In Proceedings of the IEEE International Conference on Computer Vision, pages 1026–1034 Heckerman, D (2008) A tutorial on learning with Bayesian networks In Innovations in Bayesian networks, pages 33–82 Springer Ho, N and Nguyen, X (2016) Convergence rates of parameter estimation for some weakly identifiable finite mixtures Annals of statistics Hoffman, M., Blei, D., and Bach, F (2010) Online learning for latent Dirichlet allocation Advances in Neural Information Processing Systems, 23, 856–864 Hoffman, M D., Blei, D M., Wang, C., and Paisley, J (2013) Stochastic variational inference The Journal of Machine Learning Research, 14(1), 1303–1347 Hox, J (2010) Multilevel analysis: Techniques and applications Routledge Hsu, L C and Shiue, P J.-S (1998) A unified approach to generalized Stirling numbers Advances in Applied Mathematics, 20(3), 366–384 Hughes, M C and Sudderth, E (2013) Memoized online variational inference for Dirichlet process mixture models In Advances in Neural Information Processing Systems, pages 1133–1141 Bibliography 162 Ishwaran, H and James, L (2001) Gibbs sampling methods for stick-breaking priors Journal of the American Statistical Association, 96(453), 161–173 Ishwaran, H and James, L F (2002) Approximate Dirichlet process computing in finite normal mixtures Journal of Computational and Graphical Statistics, 11 (3), 508–532 Ishwaran, H and Zarepour, M (2002) Exact and approximate sum representations for the Dirichlet process Canadian Journal of Statistics, 30(2), 269–283 Jaakkola, T S Tutorial on variational approximation methods, chapter 10, pages 129–160 Advanced mean field methods: theory and practice MIT Press (2000) Jaynes, E T (1957) Information theory and statistical mechanics Physical review, 106(4), 620 Jaynes, E T (1982) On the rationale of maximum-entropy methods Proceedings of the IEEE, 70(9), 939–952 Jordan, M I (1998) Learning in graphical models, volume 89 Springer Science & Business Media Jordan, M I (2003) An Introduction to Probabilistic Graphical Models Unpublished Book Jordan, M I (2004) Graphical models Statistical Science, pages 140–155 Kingma, D P and Welling, M (2014) Auto-encoding variational Bayes In The International Conference on Learning Representations (ICLR), Banff Kingma, D P., Salimans, T., and Welling, M (2015) Variational dropout and the local reparameterization trick In Advances in Neural Information Processing Systems, pages 2575–2583 Curran Associates, Inc Koller, D and Friedman, N (2009) Probabilistic graphical models: Principles and techniques MIT press Kschischang, F R., Frey, B J., and Loeliger, H.-A (2001) Factor graphs and the sum-product algorithm IEEE Transactions on information theory, 47(2), 498–519 Kurihara, K., Welling, M., and Vlassis, N A (2006) Accelerated variational Dirichlet process mixtures In Advances in Neural Information Processing Systems, pages 761–768 Kurihara, K., Welling, M., and Teh, Y W (2007) Collapsed variational Dirichlet process mixture models In IJCAI, volume 7, pages 2796–2801 Bibliography 163 Laurila, J K., Gatica-Perez, D., Aad, I., Bornet, O., Do, T.-M.-T., Dousse, O., Eberle, J., Miettinen, M., et al (2012) The mobile data challenge: Big data for mobile computing research In Pervasive Computing Lauritzen, S L (1996) Graphical models, volume 17 Clarendon Press Liang, P and Klein, D (2007) Structured Bayesian nonparametric models with variational inference NIPS tutorial Lin, D (2013) Online learning of nonparametric mixture models via sequential variational approximation In Advances in Neural Information Processing Systems, pages 395–403 Liu, X., Zeng, J., Yang, X., Yan, J., and Yang, Q (2015) Scalable parallel EM algorithms for latent Dirichlet allocation in multi-core systems In Proceedings of the 24th International Conference on World Wide Web, pages 669–679 ACM Lochner, R H (1975) A generalized Dirichlet distribution in Bayesian life testing Journal of the Royal Statistical Society Series B (Methodological), pages 103–113 Lu, Y., Mei, Q., and Zhai, C (2011) Investigating task performance of probabilistic topic models: an empirical study of PLSA and LDA Information Retrieval, 14(2), 178–203 MacEachern, S (1999) Dependent nonparametric processes In Proceedings of the Section on Bayesian Statistical Science, pages 50–55 MacKay, D J (1992) A practical Bayesian framework for backpropagation networks Neural computation, 4(3), 448–472 MacKay, D J (1998) Introduction to Gaussian processes NATO ASI Series F Computer and Systems Sciences, 168, 133–166 Manning, C., Raghavan, P., and Schütze, H (2008) Introduction to information retrieval Cambridge University Press Meeds, E and Osindero, S (2006) An alternative infinite mixture of Gaussian process experts Advances in Neural Information Processing Systems, 18, 883 Meyn, S P and Tweedie, R L (2012) Markov chains and stochastic stability Springer Science & Business Media Montavon, G., Müller, K.-R., and Cuturi, M (2016) Wasserstein training of restricted Boltzmann machines In Advances In Neural Information Processing Systems, pages 3711–3719 164 Bibliography Murphy, K (2001) Learning Bayes net structure from sparse data sets Technical report, Computer Science Division, UC Berkeley Murphy, K (2012) Machine learning: A probabilistic perspective MIT press Neal, R (2000) Markov chain sampling methods for Dirichlet process mixture models Journal of computational and graphical statistics, 9(2), 249–265 Neal, R M (1993) Probabilistic inference using Markov chain Monte Carlo methods Technical report, Department of Computer Science, University of Toronto Toronto, Ontario, Canada Neal, R M (1996) Bayesian learning for neural networks, volume 118 of Lecture Notes in Statistics New York: Springer-Verlag Neal, R M (2004) Bayesian methods for machine learning NIPS tutorial, 13 Nguyen, T.-B., Nguyen, V., Venkatesh, S., and Phung, D (2016)a Mcnc: Multichannel nonparametric clustering from heterogeneous data In Proceedings of ICPR Nguyen, T.-B., Nguyen, V., Venkatesh, S., and Phung, D (2016)b Learning multi-faceted activities from heterogeneous data with the product space hierarchical dirichlet processes In PAKDD Workshop Nguyen, T C., Phung, D., Gupta, S., and Venkatesh, S (2013) Extraction of latent patterns and contexts from social honest signals using hierarchical Dirichlet processes In Proceedings of the IEEE International Conference on Pervasive Computing and Communications, pages 47–55 Nguyen, V., Phung, D., Venkatesh, S Nguyen, X., and Bui, H (2014) Bayesian nonparametric multilevel clustering with group-level contexts In Proceedings of International Conference on Machine Learning (ICML), pages 288–296, Beijing, China Nguyen, X (2013) Convergence of latent mixing measures in finite and infinite mixture models The Annals of Statistics, 41(1), 370–400 Pearl, J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference Morgan Kaufmann Petersen, K B et al (2012) The matrix cookbook Phung, D and Vo, B.-N (2014) A random finite set model for data clustering In 17th International Conference on Information Fusion (Fusion 2014) Bibliography 165 Phung, D., Nguyen, T C., Gupta, S., and Venkatesh, S (2014) Learning latent activities from social signals with hierarchical Dirichlet process In Gita Sukthankar, et al., editor, Handbook on Plan, Activity, and Intent Recognition, pages 149–174 Elsevier Pitman, J (2002) Combinatorial stochastic processes Technical report, Department of Statistics, University of California at Berkeley Pitman, J and Yor, M (1997) The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator Annals of Probability, 25, 855–900 Rasmussen, C E (2004) Gaussian processes in machine learning In Advanced lectures on machine learning, pages 63–71 Springer Rasmussen, C E and Ghahramani, Z (2002) Infinite mixtures of Gaussian process experts Advances in neural information processing systems, 2, 881–888 Rezende, D J., Mohamed, S., and Wierstra, D (2014) Stochastic backpropagation and approximate inference in deep generative models In Proceedings of The 31st International Conference on Machine Learning (ICML), Beijing, China Robert, C (2007) The Bayesian choice: from decision-theoretic foundations to computational implementation Springer Science & Business Media Robert, C P and Casella, G (2005) Monte Carlo statistical methods (Springer Texts in Statistics) Springer-Verlag New York, Inc., Secaucus, NJ, USA Rodriguez, A., Dunson, D., and Gelfand, A (2008) The nested Dirichlet process Journal of the American Statistical Association, 103(483), 1131–1154 Russell, S J and Norvig, P Artificial intelligence: a modern approach (3rd edition) (2009) Sato, I., Kurihara, K., and Nakagawa, H (2012) Practical collapsed variational Bayes inference for hierarchical Dirichlet process In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 105–113 ACM Scott, J A (2015) Getting Started with Apache Spark: Inception to Production MapR Technologies Sethuraman, J (1994) A constructive definition of Dirichlet priors Statistica Sinica, 4(2), 639–650 Simma, A and Jordan, M I (2012) Modeling events with cascades of Poisson processes In Proceedings of International Conference on Uncertainty in Artificial Intelligence (UAI) Bibliography 166 Smolan, R and Erwitt, J (2013) The human face of big data Against All Odds Productions Speed, T and Kiiveri, H (1986) Gaussian Markov distributions over finite graphs The Annals of Statistics, pages 138–150 Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A (2015) Going deeper with convolutions In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9 Tank, A., Foti, N., and Fox, E (2015) Streaming variational inference for Bayesian nonparametric mixture models In Proc of Int Conf on Artificial Intelligence and Statistics (AISTAT), pages 977–985 Tayal, A., Poupart, P., and Li, Y (2012) Hierarchical double Dirichlet process mixture of Gaussian processes In AAAI Teh, Y and Jordan, M (2009) Hierarchical Bayesian nonparametric models with applications In Hjort, N., Holmes, C., Müller, P., and Walker, S., editors, Bayesian Nonparametrics: Principles and Practice, pages 158–180 Cambridge University Press Teh, Y., Jordan, M., Beal, M., and Blei, D (2006) Hierarchical Dirichlet processes Journal of the American Statistical Association, 101(476), 1566–1581 Teh, Y W., Kurihara, K., and Welling, M (2007) Collapsed variational inference for HDP In Advances in neural information processing systems, pages 1481–1488 Thibaux, R and Jordan, M (2007) Hierarchical Beta processes and the Indian buffet process In Proc of Int Conf on Artificial Intelligence and Statistics (AISTAT), volume 11, pages 564–571 Tran, N.-Q., Vo, B.-N., Phung, D., and Vo, B.-T (2016) Clustering for point pattern data In 23rd International Conference on Pattern Recognition (ICPR 2016) Villani, C (2008) Optimal transport: old and new, volume 338 Springer Science & Business Media Vo, B.-N., Tran, N.-Q., Phung, D., and Vo, B.-T (2016) Model-based classification and novelty detection for point pattern data In 23rd International Conference on Pattern Recognition (ICPR 2016) Bibliography 167 Wainwright, M J and Jordan, M I (2008) Graphical models, exponential families, and variational inference Foundations and Trends R in Machine Learning, (1-2), 1–305 Wallach, H., Murray, I., Salakhutdinov, R., and Mimno, D (2009) Evaluation methods for topic models In Proceedings of International Conference on Machine Learning (ICML), pages 1105–1112 ACM Wang, C and Blei, D M (2012) Truncation-free online variational inference for Bayesian nonparametric models In Advances in neural information processing systems, pages 413–421 Wang, C., Paisley, J., and Blei, D M (2011) Online variational inference for the hierarchical Dirichlet process In Artificial Intelligence and Statistics Wang, R., Chen, F., Chen, Z., Li, T., Harari, G., Tignor, S., Zhou, X., Ben-Zeev, D., and Campbell, A T (2014) Studentlife: assessing mental health, academic performance and behavioral trends of college students using smartphones In Proceedings UbiComp Wasserman, L (2013) All of statistics: A concise course in statistical inference Springer Science & Business Media Welling, M and Kurihara, K (2006) Bayesian k-means as a "maximizationexpectation" algorithm In SDM, pages 474–478 SIAM Wong, T.-T (1998) Generalized Dirichlet distribution in Bayesian analysis Applied Mathematics and Computation, 97, 165–181 Wulsin, D., Jensen, S., and Litt, B (2012) A hierarchical Dirichlet process model with multiple levels of clustering for human EEG seizure modeling In Proceedings of International Conference on Machine Learning (ICML) Xie, P and Xing, E P (2013) Integrating document clustering and topic modeling Zaharia, M., Chowdhury, M., Franklin, M J., Shenker, S., and Stoica, I (2010) Spark: cluster computing with working sets In Proceedings of the 2nd USENIX conference on Hot topics in cloud computing, volume 10, page 10 Zhai, K., Boyd-Graber, J., Asadi, N., and Alkhouja, M L (2012) Mr LDA: A flexible large scale topic modeling package using variational inference in MapReduce In Proceedings of the 21st international conference on World Wide Web, pages 879–888 ACM Bibliography 168 Every reasonable effort has been made to acknowledge the owners of copyright material I would be pleased to hear from any copyright owner who has been omitted or incorrectly acknowledged 1/8/2017 RightsLink Printable License SPRINGER LICENSE TERMS AND CONDITIONS Jan 08, 2017 This Agreement between Viet H. Huynh ("You") and Springer ("Springer") consists of your license details and the terms and conditions provided by Springer and Copyright Clearance Center License Number 4024020357978 License date Jan 08, 2017 Licensed Content Publisher Springer Licensed Content Publication Springer eBook Licensed Content Title Learning Conditional Latent Structures from Multiple Data Sources Licensed Content Author Viet Huynh Licensed Content Date Jan 1, 2015 Type of Use Thesis/Dissertation Portion Full text Number of copies Author of this Springer article Yes and you are the sole author of the new work Order reference number Title of your thesis / dissertation Towards Scalable Bayesian Nonparametric Methods for Data Analytics Expected completion date Jan 2017 Estimated size(pages) 180 Requestor Location Viet H. Huynh 75 Pigdons Rd Highton, VIC 3216 Australia Attn: Viet H. Huynh Billing Type Invoice Billing Address Viet H. Huynh 75 Pigdons Rd Highton, Australia 3216 Attn: Viet H. Huynh Total 0.00 USD Terms and Conditions Introduction The publisher for this copyrighted material is Springer. By clicking "accept" in connection with completing this licensing transaction, you agree that the following terms and conditions apply to this transaction (along with the Billing and Payment terms and conditions established by Copyright Clearance Center, Inc. ("CCC"), at the time that you opened your Rightslink account and that are available at any time at http://myaccount.copyright.com) https://s100.copyright.com/AppDispatchServlet 1/4 1/8/2017 RightsLink Printable License Limited License With reference to your request to reuse material on which Springer controls the copyright, permission is granted for the use indicated in your enquiry under the following conditions: Licenses are for onetime use only with a maximum distribution equal to the number stated in your request Springer material represents original material which does not carry references to other sources. If the material in question appears with a credit to another source, this permission is not valid and authorization has to be obtained from the original copyright holder This permission • is nonexclusive • is only valid if no personal rights, trademarks, or competitive products are infringed • explicitly excludes the right for derivatives Springer does not supply original artwork or content According to the format which you have selected, the following conditions apply accordingly: • Print and Electronic: This License include use in electronic form provided it is password protected, on intranet, or CDRom/DVD or Ebook/Ejournal. It may not be republished in electronic open access • Print: This License excludes use in electronic form • Electronic: This License only pertains to use in electronic form provided it is password protected, on intranet, or CDRom/DVD or Ebook/Ejournal. It may not be republished in electronic open access For any electronic use not mentioned, please contact Springer at permissions.springer@spi global.com Although Springer controls the copyright to the material and is entitled to negotiate on rights, this license is only valid subject to courtesy information to the author (address is given in the article/chapter) If you are an STM Signatory or your work will be published by an STM Signatory and you are requesting to reuse figures/tables/illustrations or single text extracts, permission is granted according to STM Permissions Guidelines: http://www.stmassoc.org/permissions guidelines/ For any electronic use not mentioned in the Guidelines, please contact Springer at permissions.springer@spiglobal.com. If you request to reuse more content than stipulated in the STM Permissions Guidelines, you will be charged a permission fee for the excess content Permission is valid upon payment of the fee as indicated in the licensing process. If permission is granted free of charge on this occasion, that does not prejudice any rights we might have to charge for reproduction of our copyrighted material in the future If your request is for reuse in a Thesis, permission is granted free of charge under the following conditions: This license is valid for onetime use only for the purpose of defending your thesis and with a maximum of 100 extra copies in paper. If the thesis is going to be published, permission needs to be reobtained includes use in an electronic form, provided it is an authorcreated version of the thesis on his/her own website and his/her university’s repository, including UMI (according to the definition on the Sherpa website: http://www.sherpa.ac.uk/romeo/); is subject to courtesy information to the coauthor or corresponding author Geographic Rights: Scope Licenses may be exercised anywhere in the world Altering/Modifying Material: Not Permitted Figures, tables, and illustrations may be altered minimally to serve your work. You may not alter or modify text in any manner. Abbreviations, additions, deletions and/or any other alterations shall be made only with prior written authorization of the author(s) https://s100.copyright.com/AppDispatchServlet 2/4 1/8/2017 RightsLink Printable License Reservation of Rights Springer reserves all rights not specifically granted in the combination of (i) the license details provided by you and accepted in the course of this licensing transaction and (ii) these terms and conditions and (iii) CCC's Billing and Payment terms and conditions License Contingent on Payment While you may exercise the rights licensed immediately upon issuance of the license at the end of the licensing process for the transaction, provided that you have disclosed complete and accurate details of your proposed use, no license is finally effective unless and until full payment is received from you (either by Springer or by CCC) as provided in CCC's Billing and Payment terms and conditions. If full payment is not received by the date due, then any license preliminarily granted shall be deemed automatically revoked and shall be void as if never granted. Further, in the event that you breach any of these terms and conditions or any of CCC's Billing and Payment terms and conditions, the license is automatically revoked and shall be void as if never granted. Use of materials as described in a revoked license, as well as any use of the materials beyond the scope of an unrevoked license, may constitute copyright infringement and Springer reserves the right to take any and all action to protect its copyright in the materials Copyright Notice: Disclaimer You must include the following copyright and permission notice in connection with any reproduction of the licensed material: "Springer book/journal title, chapter/article title, volume, year of publication, page, name(s) of author(s), (original copyright notice as given in the publication in which the material was originally published) "With permission of Springer" In case of use of a graph or illustration, the caption of the graph or illustration must be included, as it is indicated in the original publication Warranties: None Springer makes no representations or warranties with respect to the licensed material and adopts on its own behalf the limitations and disclaimers established by CCC on its behalf in its Billing and Payment terms and conditions for this licensing transaction Indemnity You hereby indemnify and agree to hold harmless Springer and CCC, and their respective officers, directors, employees and agents, from and against any and all claims arising out of your use of the licensed material other than as specifically authorized pursuant to this license No Transfer of License This license is personal to you and may not be sublicensed, assigned, or transferred by you without Springer's written permission No Amendment Except in Writing This license may not be amended except in a writing signed by both parties (or, in the case of Springer, by CCC on Springer's behalf) Objection to Contrary Terms Springer hereby objects to any terms contained in any purchase order, acknowledgment, check endorsement or other writing prepared by you, which terms are inconsistent with these terms and conditions or CCC's Billing and Payment terms and conditions. These terms and conditions, together with CCC's Billing and Payment terms and conditions (which are incorporated herein), comprise the entire agreement between you and Springer (and CCC) concerning this licensing transaction. In the event of any conflict between your obligations established by these terms and conditions and those established by CCC's Billing and Payment terms and conditions, these terms and conditions shall control Jurisdiction All disputes that may arise in connection with this present License, or the breach thereof, shall be settled exclusively by arbitration, to be held in the Federal Republic of Germany, in accordance with German law Other conditions: https://s100.copyright.com/AppDispatchServlet 3/4 1/8/2017 RightsLink Printable License V 12AUG2015 Questions? customercare@copyright.com or +18552393415 (toll free in the US) or +19786462777 https://s100.copyright.com/AppDispatchServlet 4/4 ... the Bayesian Choice” (Robert, 2007, Chapter 11) for an interesting discussion on the choice for Bayesian methods 31 2.3 Learning from Data with Bayesian Models Define models and priors Gather data. .. resilience to over-fitting of Bayesian nonparametrics makes them be the suitable framework for learning with big data Therefore, we ground our works in this thesis on Bayesian nonparametric methodology... 6.2 Log perplexity of Wikipedia and PubMed data 6.3 Extended Normalized mutual information (NMI) for Pubmed data 6.4 Clustering performance for AUA data 111 112