Scalable data parallel graph algorithms from generation to management


Scalable Data-Parallel Graph Algorithms from Generation to Management

Sadegh Nobari
(B.Eng. (Hons.), IUST) (Ph.D., NUS)

A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
DEPARTMENT OF COMPUTER SCIENCE
SCHOOL OF COMPUTING
NATIONAL UNIVERSITY OF SINGAPORE
2012

Declaration

I hereby declare that this thesis is my original work and that it has been written by me in its entirety. I have duly acknowledged all the sources of information that have been used in the thesis. This thesis has also not been submitted for any degree in any university previously.

Sadegh Nobari
23 July 2012

Acknowledgements

My Ph.D. was a wonderful, extraordinary, once-in-a-lifetime experience. I would like to say thanks:
To my parents (Zeynab and Nader) and my only brother (Ghasem); through their sacrifice my opportunities were possible.
To my advisors, Professor Stéphane Bressan, and Professors Anastasia Ailamaki, Panagiotis Karras, Panos Kalnis, Nikos Mamoulis and Yannis Velegrakis, for patiently supporting me.
To my committee, Professors Tan Tiow Seng, Tan Kian-Lee, M. Tamer Özsu and Leong Hon Wai, for gladly suffering my impenetrable prose and helping me to communicate better.
To my friends, Xuesong Lu, Song Yi, Tang Ruiming, Antoine Veillard, Quoc Trung Tran, Cao Thanh Tung, Ehsan Kazemi, Siarhei Bykau, Mohammad Oliya, Behzad Nemat Pajouh, Thomas Heinis, Clemens Lay and Reza Sherkat, for accompanying me.
To my groups, the people in the Database research and Embedded System labs of NUS, DIAS of EPFL, dbTrento of the University of Trento and Dennis Shasha's group at NYU, for accepting me.
To my wife Mozhdeh, for redefining my senses.

Best wishes,
Dr. Sadegh Nobari
With a quarter-century of life experience
2012

Abstract

In an 1878 article on chemistry and algebra in Nature, J. J. Sylvester called a mathematical structure that models connections between objects a "graph". More than a century later, the versatility of graphs as a data model is demonstrated by the long list of their applications in mathematics, science, engineering
and the humanities. Cormen, Leiserson, Rivest, and Stein describe the role of graphs and graph algorithms in computer science as follows in their popular textbook: "graphs are a pervasive data structure in computer science, and algorithms working with them are fundamental to the field."

Graphs are natural data structures for modern applications. Social network data are typically represented as graphs; the Semantic Web is based on the RDF formalism, which is a graph model; and software models and program dependences in software engineering are represented via graphs. In many cases these graphs are very large and dynamic. The convergence of applications that manage large graphs with the availability of cheap parallel processing hardware has caused a renewed interest in managing very large graphs on parallel systems.

In this dissertation, we design scalable and practical graph algorithms for a selected set of large graph generation and management problems. In particular, we provide parallel solutions for graph generation with both random and real-world graph models. Afterwards, we propose techniques for processing large graphs in parallel, specifically for computing the Minimum Spanning Forest and the shortest paths between vertices.

Chapter 3 focuses on the generation of very large graphs. The naïve algorithm for generating Erdős-Rényi graphs does not scale to large graphs. In this chapter we take a systematic approach to the development of the PPreZER algorithm by proposing a series of seven algorithms. The results of our study show that our fine-tuned algorithm for generating random graph data, PPreZER, runs on a typical GPU on average 19 times faster than its fastest sequential version on the CPU.

Chapter 4 moves beyond random graphs and considers the generation of real-world graphs. This chapter considers spatial datasets and the generation of graphs by taking the spatial join of the elements in two datasets. We propose an algorithm (called HiDOP) to perform this spatial join operation
efficiently. Consequently, we design a data-parallel algorithm inspired by the HiDOP algorithm.

Chapters 5 and 6 cover the data management part of the thesis. Two graph algorithms, a.k.a. graph queries, are studied: Minimum Spanning Forest (Chapter 5) and All-Pairs Shortest Path (Chapter 6). In Chapter 5, we propose PMA, a novel data-parallel algorithm inspired by Borůvka's and Prim's MSF algorithms. PMA is shown experimentally to be superior to state-of-the-art MSF algorithms. Chapter 6 introduces a threshold L into the problem definition of all-pairs shortest path such that only the paths with weight less than L are found; we call this problem L-APSP. This threshold is advantageous when only close connections are of interest, as in large social networks. A large number of APSP algorithms are studied; for each, a counterpart L-APSP algorithm is designed and a parallel version that exploits the GPU is proposed.

Finally, this dissertation has led to the proposal of four scalable data-parallel algorithms for graph data processing.

Table of Contents
Acknowledgements
Abstract
Table of Contents
List of Figures
List of Tables
List of Algorithms
1 Introduction
  1.1 Graph
  1.2 Parallel processing
  1.3 Contributions
  1.4 Graph data generation
    1.4.1 Generating random graphs
      Application
      Existing algorithms
      Proposed algorithm
    1.4.2 Generating real-world graphs
      Application
      Existing algorithms
      Proposed algorithm
  1.5 Graph data management
    1.5.1 Finding Minimum Spanning Forest
      Application
      Existing algorithms
      Proposed algorithm
    1.5.2 Finding Shortest Path
      Application
      Existing algorithms
      Proposed algorithm
  1.6 Overview
2 Parallel processing on Graphics Processing Unit (GPU)
  2.1 Many and Multi core architectures
  2.2 GPU Architecture
  2.3 The CUDA and BrookGPU programming frameworks
  2.4 SIMT: Single Instruction, Multiple Threads
  2.5 Parallel Thread Execution (PTX)
  2.6 GPU Memory hierarchy
  2.7 GPU Optimizations
  2.8 GPU empirical analysis
  2.9 Programming the GPU
    2.9.1 Parallel Pseudo-Random Number Generator
    2.9.2 Parallel Prefix Sum
    2.9.3 Parallel Stream Compaction
  2.10 Chapter summary
3 Scalable Random Graph Generation
  3.1 Introduction
  3.2 Related Work
  3.3 Baseline algorithm
  3.4 Sequential algorithms
    3.4.1 Skipping Edges
    3.4.2 ZER
    3.4.3 PreLogZER
    3.4.4 PreZER
  3.5 Parallel algorithms
    3.5.1 PER
    3.5.2 PZER
    3.5.3 PPreZER
  3.6 Performance Evaluation
    3.6.1 Setup
    3.6.2 Results
      Overall Comparison
      Speedup Assessment
      Comparison among Parallel algorithms
      Parallelism
      Speedup
      Size Scalability
      Performance Tuning
    3.6.3 Discussion
  3.7 Chapter Summary
4 Scalable Real-world graph generation
  4.1 Introduction
  4.2 Related Work
    4.2.1 In-Memory Approaches
    4.2.2 On-disk Approaches
      Both Datasets Indexed
      One Dataset Indexed
      Unindexed
  4.3 Motivation
    4.3.1 Touch Detection
    4.3.2 Motivation Examples
    4.3.3 Motivation Experiments
  4.4 HiDOP: Hierarchical Data Oriented Partitioning
    4.4.1 Problem Definition
    4.4.2 HiDOP Ideas
    4.4.3 Algorithm Overview
    4.4.4 Tree Building Phase
    4.4.5 Assignment Phase
    4.4.6 Probing Phase
    4.4.7 Proof of Correctness
  4.5 Implementation
    4.5.1 Partitioning
    4.5.2 Design Parameters
      Tree Parameters
      Local Join Parameters
      Join Order
  4.6 Parallel algorithms
  4.7 Experimental Evaluation
    4.7.1 Setup
    4.7.2 Experimental Methodology
    4.7.3 Loading the Data
    4.7.4 Varying Dataset B
      Small Datasets
      Large Datasets
    4.7.5 Varying Epsilon
    4.7.6 Neuroscience Datasets
    4.7.7 Parallel HiDOP experiments
      Overall Comparison
      Speedup Assessment
  4.8 Chapter Summary
5 Scalable Parallel Minimum Spanning Forest Computation
  5.1 Introduction
  5.2 Related Work
    5.2.1 Sequential algorithms
      Borůvka
      Kruskal
      Reverse-Delete
      Prim
    5.2.2 Parallel algorithms
  5.3 DPMST: Borůvka based Data Parallel MST algorithm
    5.3.1 Implementation on GPU
  5.4 Motivation for scalability
  5.5 PMA: Scalable Parallel MSF algorithm
    5.5.1 Partial Prim
    5.5.2 Unification step
    5.5.3 Proof of Correctness
    5.5.4 Complexity Analysis
  5.6 PMA implementation
    5.6.1 Partial Prim implementation
      MinPMA algorithm
      SortPMA algorithm
      HybridPMA algorithm
    5.6.2 Unification implementation
    5.6.3 Implementation notes
  5.7 Experiments
    5.7.1 DPMST performance evaluation
      Experimental Setup
      Experimental Results
    5.7.2 PMA performance evaluation

...sampling these gigantic graphs in order to discover the properties of the whole graph from the properties of the sampled subgraphs. The goal of processing the sampled graphs is to quickly find properties of huge dynamic graphs such as social networks.

7.4.2 Long term goals

Parallel graph processing. We are interested in similarly data-parallelizing algorithms for dynamically changing graphs, i.e., dynamic graphs [157]. Cloud computing, e.g., MapReduce [54], and streaming data-parallel graph processing are two approaches to studying this data deluge [6].

Distributed graph processing. Our current work focuses on algorithms tuned for a single many-core processor, namely a single GPU. However, given the recently developed clusters of many-core processors, we want to redesign our solutions to exploit the parallel processing power of these clusters.

REFERENCES
[1] AMD CTM. http://ati.amd.com/products/streamprocessor/
[2] Boost C++ graph library. http://www.boost.org/
[3] BrookGPU. http://graphics.stanford.edu/projects/brookgpu/index.html
[4] CUDA Zone: Toolkit & SDK. http://www.nvidia.com/object/what_is_cuda_new.html
[5] CUDPP. http://code.google.com/p/cudpp
[6] The data deluge. The Economist. http://www.economist.com/node/15579717
[7] Folding@home. http://www.scei.co.jp/folding/
[8] GTgraph: a suite of synthetic random
graph generators. https://sdm.lbl.gov/~kamesh/software/GTgraph/
[9] The ninth DIMACS challenge on shortest paths. http://www.dis.uniroma1.it/~challenge9/
[10] OpenGL. http://www.opengl.org/
[11] Scopus indexing database. http://www.scopus.com
[12] SETI@home. http://setiathome.berkeley.edu/
[13] Stanford large network dataset collection. http://snap.stanford.edu/data/index.html
[14] Statistics. http://news.netcraft.com/archives/2012/04/04/april-2012-web-serversurvey.html/, http://www.radicati.com/wp/wp-content/uploads/2012/04/EmailStatistics-Report-2012-2016-Executive-Summary.pdf, http://newsroom.fb.com/, http://www.slideshare.net/amover/linked-in-demographics-and-statistics-2011
[15] M. M. Abd Ellah, A. A. Megawer, and Y. M. Kadah. Software development for low cost, high quality, real-time 4D ultrasound imaging system on personal computers (PCs). In Radio Science Conference, 2009. NRSC 2009. National, pages 1–2, March 2009.
[16] C. C. Adams. The Knot Book: An Elementary Introduction to the Mathematical Theory of Knots. American Mathematical Society, 2004.
[17] Charu C. Aggarwal, Fatima Al-Garawi, and Philip S. Yu. Intelligent crawling on the world wide web with arbitrary predicates. In Proceedings of the 10th International Conference on World Wide Web, WWW '01, pages 96–105, New York, NY, USA, 2001. ACM.
[18] Charu C. Aggarwal and Haixun Wang. Managing and Mining Graph Data. Springer Publishing Company, Incorporated, 1st edition, 2010.
[19] Anastassia Ailamaki, Naga K. Govindaraju, Stavros Harizopoulos, and Dinesh Manocha. Query co-processing on commodity processors. In VLDB '06: Proceedings of the 32nd International Conference on Very Large Data Bases, pages 1267–1267. VLDB Endowment, 2006.
[20] Reka Albert, Hawoong Jeong, and Albert-Laszlo Barabasi. The diameter of the world wide web. Nature, 401:130, 1999.
[21] Walid G. Aref and Hanan Samet. A Cost Model for Query Optimization Using R-Trees. In ACM Conference on Geographical Information Systems
(GIS), 1994.
[22] Lars Backstrom, Cynthia Dwork, and Jon Kleinberg. Wherefore art thou R3579X?: Anonymized social networks, hidden patterns, and structural steganography. In WWW, 2007.
[23] David A. Bader and Guojing Cong. Fast shared-memory algorithms for computing the minimum spanning forest of sparse graphs. In IPDPS, 2004.
[24] David A. Bader and Guojing Cong. Fast shared-memory algorithms for computing the minimum spanning forest of sparse graphs. J. Parallel Distrib. Comput., 66(11):1366–1378, 2006.
[25] Gisele Busichia Baioco, Agma J. M. Traina, and Caetano Traina Jr. Mamcost: Global and local estimates leading to robust cost estimation of similarity queries. In Proceedings of the 19th International Conference on Scientific and Statistical Database Management, SSDBM '07, pages 6–, Washington, DC, USA, 2007. IEEE Computer Society.
[26] Albert-Laszlo Barabasi, Reka Albert, and Hawoong Jeong. Mean-field theory for scale-free random networks, 1999.
[27] A. Barabási. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
[28] Vladimir Batagelj and Ulrik Brandes. Efficient generation of large random networks. Physical Review E, 71(3):036113, 2005.
[29] C. Berge. Théorie des graphes et ses applications. Methuen, 1962.
[30] C. Berge. Graphs and Hypergraphs. North-Holland Mathematical Library. North-Holland Publishing Company, 1976.
[31] Smriti Bhagat, Graham Cormode, Balachander Krishnamurthy, and Divesh Srivastava. Class-based graph anonymization for social network data. PVLDB, 2(1), 2009.
[32] Smriti Bhagat, Graham Cormode, Balachander Krishnamurthy, and Divesh Srivastava. Privacy in dynamic social networks. In WWW, 2010.
[33] D. Blythe. The Direct3D 10 system. ACM Trans. Graph., 25(3):724–734, 2006.
[34] B. Bollobas. Random Graphs. Academic Press, 2nd edition, 2001.
[35] A. Borel. Linear Algebraic Groups. Graduate Texts in Mathematics. Springer-Verlag, 1991.
[36] Otakar Borůvka. O jistém problému
minimálním (About a certain minimal problem). Práce mor. přírodověd. spol. v Brně III, pages 37–58, 1926.
[37] Paul Bratley, Bennet L. Fox, and Linus E. Schrage. A Guide to Simulation. Springer, 2nd edition, 1987.
[38] Thomas Brinkhoff, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. Multi-step Processing of Spatial Joins. In ACM SIGMOD International Conference on Management of Data, 1994.
[39] Thomas Brinkhoff, Hans-Peter Kriegel, and Bernhard Seeger. Efficient Processing of Spatial Joins Using R-Trees. In ACM SIGMOD International Conference on Management of Data, 1993.
[40] Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, and Janet Wiener. Graph structure in the web. Computer Networks, 33(16):309–320, 2000.
[41] Andrei Z. Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, and Janet L. Wiener. Graph structure in the web. Computer Networks, 33(1-6):309–320, 2000.
[42] Aydin Buluç, John R. Gilbert, and Ceren Budak. Solving path problems on the GPU. Parallel Computing, 36(5-6):241–253, 2010.
[43] Alina Campan and Traian Marius Truta. Data and structural k-anonymity in social networks. In PinKDD, 2008.
[44] Jean Carle and David Simplot-Ryl. Energy-efficient area monitoring for sensor networks. Computer, 37(2):40–46, 2004.
[45] Yifeng Chen and J. W. Sanders. Abstraction of object graphs in program verification. In Proceedings of the 10th International Conference on Mathematics of Program Construction, MPC'10, pages 80–99, Berlin, Heidelberg, 2010. Springer-Verlag.
[46] James Cheng, Ada Wai-Chee Fu, and Jia Liu. K-isomorphism: Privacy-preserving network publication against structural attacks. In SIGMOD, 2010.
[47] Jong-Sheng Cherng and Mei-Jung Lo. A hypergraph based clustering algorithm for spatial data sets. In Proceedings of the 2001 IEEE International Conference on Data Mining,
ICDM '01, pages 83–90, Washington, DC, USA, 2001. IEEE Computer Society.
[48] Sun Chung and Anne Condon. Parallel implementation of Borůvka's minimum spanning tree algorithm. Parallel Processing Symposium, International, 0:302, 1996.
[49] Guojing Cong and David A. Bader. Designing irregular parallel algorithms with mutual exclusion and lock-free protocols. Volume 66, pages 854–866, Orlando, FL, USA, 2006. Academic Press, Inc.
[50] Jason Cong, Lei He, Cheng-Kok Koh, and Patrick H. Madden. Performance optimization of VLSI interconnect layout. Integration, 21(1-2):1–94, 1996.
[51] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Third Edition. The MIT Press, 3rd edition, 2009.
[52] Graham Cormode, Divesh Srivastava, Ting Yu, and Qing Zhang. Anonymizing bipartite graph data using safe groupings. The VLDB Journal, 19(1):115–139, 2010.
[53] S. Crisóstomo, U. Schilcher, C. Bettstetter, and J. Barros. Analysis of probabilistic flooding: How we choose the right coin. In IEEE ICC, 2009.
[54] Jeffrey Dean and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters. In OSDI, 2004.
[55] F. Dehne and S. Götz. Practical parallel algorithms for minimum spanning trees. In SRDS '98: Proceedings of the 17th IEEE Symposium on Reliable Distributed Systems, page 366, Washington, DC, USA, 1998. IEEE Computer Society.
[56] Ramez Elmasri and Shamkant B. Navathe. Fundamentals of Database Systems. Addison Wesley, 3rd edition, 2000.
[57] E. Elmroth, F. Gustavson, I. Jonsson, and B. Kågström. Recursive blocked algorithms and hybrid data structures for dense matrix library software. SIAM Review, 46(1):3–45, 2004.
[58] P. Erdős and A. Rényi. On random graphs I. Publicationes Mathematicae, 6:290–297, 1959.
[59] Leonhard Euler. Solutio problematis ad geometriam situs pertinentis. Comment. Academiae Sci. I. Petropolitanae 8, pages 128–140, 1736.
[60] Alton
Farris, Ashish Sharma, Cristobal Niedermayr, Daniel Brat, David Foran, Fusheng Wang, Joel Saltz, Jun Kong, Lee Cooper, Tae Oh, Tahsin Kurc, Tony Pan, and Wenjin Chen. A Data Model and Database for High-resolution Pathology Analytical Image Informatics. Journal of Pathology Informatics, 2(1):32, 2011.
[61] Robert W. Floyd. Algorithm 97: Shortest path. Communications of the ACM, 5(6):345, 1962.
[62] James H. Fowler, Christopher T. Dawes, and Nicholas A. Christakis. Model of genetic variation in human social networks. PNAS, 106(6):1720–1724, 2009.
[63] Geoffrey Fox. Parallel computing comes of age: Supercomputer level parallel computations at Caltech. Concurrency: Practice and Experience, pages 63–103, 1989.
[64] Ayalvadi J. Ganesh, Laurent Massoulié, and Donald F. Towsley. The effect of network topology on the spread of epidemics. In INFOCOM, 2005.
[65] Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA, 1979.
[66] Gabriel Ghinita, Panagiotis Karras, Panos Kalnis, and Nikos Mamoulis. Fast data anonymization with low information loss. In VLDB, 2007.
[67] P. B. Gibbons. A more practical PRAM model. 1st ACM Symposium on Parallel Algorithms and Architectures, pages 158–168, 1989.
[68] E. N. Gilbert. Random graphs. The Annals of Mathematical Statistics, 30(4):1141–1144, 1959.
[69] Aristides Gionis, Heikki Mannila, Taneli Mielikäinen, and Panayiotis Tsaparas. Assessing data mining results via swap randomization. TKDD, 1(3), 2007.
[70] S. Gnanakaran, Hugh Nymeyer, John Portman, Kevin Y. Sanbonmatsu, and Angel E. García. Peptide folding simulations. Current Opinion in Structural Biology, 13(2):168–174, 2003.
[71] Sharad Goel, Roby Muhamad, and Duncan Watts. Social search in "small-world" experiments. In WWW, 2009.
[72] Naga K. Govindaraju, Jim Gray, Ritesh Kumar, and Dinesh Manocha. GPUTeraSort: high performance graphics co-processor sorting for large
database management. In SIGMOD, 2006.
[73] Naga K. Govindaraju, Brandon Lloyd, Wei Wang, Ming Lin, and Dinesh Manocha. Fast computation of database operations using graphics processors. In SIGMOD, 2004.
[74] Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar. Introduction to Parallel Computing, 2nd edition. Addison-Wesley, 2003.
[75] Antonin Guttman. R-trees: a Dynamic Index Structure for Spatial Searching. In ACM SIGMOD International Conference on Management of Data, 1984.
[76] Sami Hanhijärvi, Gemma C. Garriga, and Kai Puolamäki. Randomization techniques for graphs. In SDM, 2009.
[77] Pawan Harish and P. J. Narayanan. Accelerating large graph algorithms on the GPU using CUDA. In Proceedings of the 14th International Conference on High Performance Computing, HiPC'07, pages 197–208, Berlin, Heidelberg, 2007. Springer-Verlag.
[78] J. M. Harris, J. L. Hirst, and M. J. Mossinghoff. Combinatorics and Graph Theory. Undergraduate Texts in Mathematics. Springer, 2008.
[79] Michael Hay, Gerome Miklau, David Jensen, Don Towsley, and Philipp Weis. Resisting structural re-identification in anonymized social networks. PVLDB, 1(1), 2008.
[80] Michael Hay, Gerome Miklau, David Jensen, Donald F. Towsley, and Philipp Weis. Resisting structural re-identification in anonymized social networks. PVLDB, 1(1):102–114, 2008.
[81] Bingsheng He, Ke Yang, Rui Fang, Mian Lu, Naga Govindaraju, Qiong Luo, and Pedro Sander. Relational joins on graphics processors. In SIGMOD '08: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 511–524, New York, NY, USA, 2008. ACM.
[82] Xiaoyun He, Jaideep Vaidya, Basit Shafiq, Nabil Adam, and Vijay Atluri. Preserving privacy in social networks: A structure-aware approach. In WI-IAT, 2009.
[83] Justin Hensley, Thorsten Scheuermann, Greg Coombe, Montek Singh, and Anselmo Lastra. Fast summed-area table generation and its applications. Computer Graphics
Forum, 24(3):547–555, 2005.
[84] W. Daniel Hillis and Guy L. Steele, Jr. Data parallel algorithms. Commun. ACM, 29(12):1170–1183, December 1986.
[85] W. Daniel Hillis and Lewis W. Tucker. The CM-5 Connection Machine: a scalable supercomputer. Commun. ACM, 36(11):31–40, November 1993.
[86] Daniel Horn. Stream reduction operations for GPGPU applications. In Matt Pharr, editor, GPU Gems. Addison Wesley, 2005.
[87] E. Horowitz and S. Sahni. Fundamentals of Computer Algorithms. Potomac, Md., Computer Science Press, 1978.
[88] Lee Howes and David Thomas. Efficient random number generation and application using CUDA. In Hubert Nguyen, editor, GPU Gems. Addison Wesley, 2007.
[89] Akihiro Inokuchi, Takashi Washio, and Hiroshi Motoda. Complete mining of frequent patterns from graphs: Mining graph data. Machine Learning, 50(3):321–354, 2003.
[90] Yannis M. Ioannides. Random graphs and social networks: An economics perspective. Technical Report 0518, Department of Economics, Tufts University, 2005.
[91] Kenneth E. Iverson. A Programming Language. Wiley, 1962.
[92] Edwin H. Jacox and Hanan Samet. Spatial Join Techniques. ACM Transactions on Database Systems, 32(1):7, 2007.
[93] Changhao Jiang and Marc Snir. Automatic tuning matrix multiplication performance on graphics hardware. In PACT '05: Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques, pages 185–196, Washington, DC, USA, 2005. IEEE Computer Society.
[94] Joseph B. Kruskal, Jr. On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society, 7:48–50, Feb. 1956.
[95] Marcus Kaiser, Robert Martin, Peter Andras, and Malcolm P. Young. Simulation of robustness against lesions of cortical networks. European Journal of Neuroscience, 25(10):3185–3192, 2007.
[96] Seunghwa Kang and David A. Bader. An efficient transactional memory algorithm for computing minimum spanning forest of sparse
graphs. In Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '09, pages 15–24, New York, NY, USA, 2009. ACM.
[97] N. Kashtan, S. Itzkovitz, R. Milo, and U. Alon. Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics, 20(11):1746–1758, July 2004.
[98] D. Kirk and W. Hwu. Programming Massively Parallel Processors: A Hands-On Approach. Applications of GPU Computing Series. Morgan Kaufmann Publishers, 2010.
[99] Jon Kleinberg and Éva Tardos. The minimum spanning tree problem. In Algorithm Design. Pearson/Addison-Wesley, Boston, 2006.
[100] Jon M. Kleinberg, Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew Tomkins. The web as a graph: Measurements, models, and methods. In COCOON, pages 1–17, 1999.
[101] D. E. Knuth. The linear congruential method. In The Art of Computer Programming, Volume 2: Seminumerical Algorithms, pages 10–26. Addison Wesley, 3rd edition, 1997.
[102] Aleksandra Korolova, Rajeev Motwani, Shubha U. Nabar, and Ying Xu. Link privacy in social networks. In CIKM, 2008.
[103] Nick Koudas and Kenneth C. Sevcik. Size separation spatial join. In ACM SIGMOD International Conference on Management of Data, 1997.
[104] J. Kozloski, K. Sfyrakis, S. Hill, F. Schürmann, C. Peck, and H. Markram. Identifying, Tabulating, and Analyzing Contacts Between Branched Neuron Morphologies. IBM Journal of Research and Development, 52(1/2):43–55, 2008.
[105] William B. Langdon. A fast high-quality pseudo-random number generator for graphics processing units. In IEEE Congress on Evolutionary Computation, 2008.
[106] William B. Langdon. A fast high-quality pseudo-random number generator for NVIDIA CUDA. In GECCO (Companion), 2009.
[107] William B. Langdon. A many-threaded CUDA interpreter for genetic programming. In EuroGP, 2010.
[108] E. Scott Larsen and David McAllister. Fast matrix multiplies using
graphics hardware. In Supercomputing, pages 55–55, New York, NY, USA, 2001. ACM.
[109] Jure Leskovec and Eric Horvitz. Planetary-scale views on a large instant-messaging network. In WWW, 2008.
[110] Scott T. Leutenegger, J. M. Edgington, and Mario A. Lopez. STR: A Simple and Efficient Algorithm for R-Tree Packing. In IEEE ICDE International Conference on Data Engineering, 1997.
[111] Xiang-Yang Li, Yu Wang, and Wen-Zhan Song. Applications of k-local MST for topology control and broadcasting in wireless ad hoc networks. IEEE TPDS, 15(12):1057–1069, 2004.
[112] Michael D. Lieberman, Jagan Sankaranarayanan, and Hanan Samet. A fast similarity join algorithm using graphics processing units. In Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE '08, pages 1111–1120, Washington, DC, USA, 2008. IEEE Computer Society.
[113] Kun Liu and Evimaria Terzi. Towards identity anonymization on graphs. In SIGMOD, 2008.
[114] Weiguo Liu, Bertil Schmidt, Gerrit Voss, and Wolfgang Muller-Wittig. Streaming algorithms for biological sequence alignment on GPUs. Volume 18, pages 1270–1281, Piscataway, NJ, USA, 2007. IEEE Press.
[115] Ming-Ling Lo and Chinya V. Ravishankar. Spatial Joins Using Seeded Trees. In SIGMOD International Conference on Management of Data, 1994.
[116] Ming-Ling Lo and Chinya V. Ravishankar. Spatial Hash-Joins. In ACM SIGMOD International Conference on Management of Data, 1996.
[117] David P. Luebke, Mark Harris, Naga K. Govindaraju, Aaron E. Lefohn, Mike Houston, John D. Owens, Mark Segal, Matthew Papakipos, and Ian Buck. S07 - GPGPU: general-purpose computation on graphics hardware. In SC, page 208, 2006.
[118] Qin Lv, Pei Cao, Edith Cohen, Kai Li, and Scott Shenker. Search and replication in unstructured peer-to-peer networks. In SIGMETRICS, 2002.
[119] A. Ma'ayan, A. Lipshtat, R. Iyengar, and E. D. Sontag. Proximity of intracellular regulatory networks to monotone systems. Systems Biology, IET,
2(3):103–112, 2008.
[120] Bruce M. Maggs and Serge A. Plotkin. Minimum-cost spanning tree as a path-finding problem. Inf. Process. Lett., 26:291–293, January 1988.
[121] Nikos Mamoulis and Dimitris Papadias. Slot index spatial join. IEEE Transactions on Knowledge and Data Engineering, 15(1):211–231, 2003.
[122] William B. March, Parikshit Ram, and Alexander G. Gray. Fast Euclidean minimum spanning tree: algorithm, analysis, and applications. In KDD, 2010.
[123] Henry Markram. The Blue Brain Project. Nature Reviews Neuroscience, 7(2):153–160, 2006.
[124] George Marsaglia and Wai Wan Tsang. The ziggurat method for generating random variables. Journal of Statistical Software, 5(8):1–7, 2000.
[125] Paolo Massa and Paolo Avesani. Trust Metrics in Recommender Systems. 2009.
[126] Daniel McDonald, Laura Waterbury, Rob Knight, and M. Betterton. Activating and inhibiting connections in biological network dynamics. Biology Direct, 3(1):49, 2008.
[127] Stanley Milgram. The small world problem. Psychology Today, 2:60–67, 1967.
[128] Harvey J. Miller and Jiawei Han. Geographic Data Mining and Knowledge Discovery. 2009.
[129] Priti Mishra and Margaret H. Eich. Join Processing in Relational Databases. ACM Computing Surveys, 24(1):63–113, 1992.
[130] B. M. E. Moret and H. D. Shapiro. An empirical assessment of algorithms for constructing a minimal spanning tree. In DIMACS Monographs, pages 99–117, 1994.
[131] Rajeev Motwani and Prabhakar Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
[132] Marc Najork, Dennis Fetterly, Alan Halverson, Krishnaram Kenthapadi, and Sreenivas Gollapudi. Of hammers and nails: an empirical comparison of three paradigms for processing large graphs. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, WSDM '12, pages 103–112, New York, NY, USA, 2012. ACM.
[133] M. E. J. Newman, D. J. Watts, and S. H. Strogatz. Random graph models of social networks. PNAS, 99(Suppl 1):2566–2572, 2002.
[134] John Nickolls, Ian Buck, Michael Garland, and Kevin Skadron. Scalable parallel programming with CUDA. Queue, 6(2):40–53, March 2008.
[135] Sadegh Nobari, Stéphane Bressan, and Chedy Raïssi. A data parallel minimum spanning tree algorithm for most graphics processing units. In Proceedings of the 2010 International Conference on Advances in Distributed and Parallel Computing, ADPC '10. GSTF, 2010.
[136] Sadegh Nobari, Thanh-Tung Cao, Panagiotis Karras, and Stéphane Bressan. Scalable parallel minimum spanning forest computation. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, pages 205–214, New York, NY, USA, 2012. ACM.
[137] Sadegh Nobari, Panagiotis Karras, and Stéphane Bressan. L-opacity: Linkage-aware graph anonymization. Under review.
[138] Sadegh Nobari, Xuesong Lu, Panagiotis Karras, and Stéphane Bressan. Fast random graph generation. In Proceedings of the 2011 International Conference on Extending Database Technology, EDBT '11, New York, NY, USA, 2011. ACM.
[139] Sadegh Nobari, Farhan Tauheed, Thomas Heinis, Panagiotis Karras, Stéphane Bressan, and Anastasia Ailamaki. HiDOP: In-memory spatial join by hierarchical data-oriented partitioning. Under review.
[140] NVIDIA. Compute PTX: Parallel Thread Execution and Instruction Set Architecture (ISA), Version 2.0, 2010.
[141] Jack Orenstein. A Comparison of Spatial Query Processing Techniques for Native and Parameter Spaces. In ACM SIGMOD International Conference on Management of Data, 1990.
[142] Stephen K. Park and Keith W. Miller. Random number generators: Good ones are hard to find. Communications of the ACM, 31(10):1192–1201, 1988.
[143] Jignesh M. Patel and David J. DeWitt. Partition Based Spatial-Merge Join. In ACM SIGMOD International Conference on Management of Data, 1996.
[144] Seth Pettie and Vijaya Ramachandran. Randomized minimum spanning tree algorithms using exponentially
fewer random bits ACM Transactions on Algorithms, 4(1):5:1–5:27, 2008 33 [145] Fabio Pietrucci and Wanda Andreoni Graph theory meets ab initio molecular dynamics: Atomic structures and transformations at the nanoscale Physical Review Letters, 107(8):1–4, 2011 [146] Franco Preparata and Michael Shamos Computational Geometry: An Introduction Springer, 1993 66 [147] R C Prim Shortest connection networks and some generalizations Bell System Technical Journal, 36:1389–1401, 1957 104, 106 [148] Paul Wintz Rafael C Gonzalez Digital Image Processing Addison-Wesley, second edition, 1987 10, 103 [149] Christian P Robert and George Casella Monte Carlo Statistical Methods (Springer Texts in Statistics) Springer, 2005 7, 35 [150] Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas The earth mover’s distance as a metric for image retrieval 40(2):99–121, 2000 187 [151] Pierangela Samarati Protecting respondents’ identities in microdata release IEEE TKDE, 13(6):1010–1027, 2001 12, 167 [152] Nadathur Satish, Mark Harris, and Michael Garland Designing efficient sorting algorithms for manycore GPUs In IPDPS, 2009 31, 119, 121 [153] J Scott Social Network Analysis: A Handbook Sage Publications, 2000 [154] Shubhabrata Sengupta, Mark Harris, and Michael Garland Efficient parallel scan algorithms for gpus Technical Report NVR-2008-003, NVIDIA Corporation, December 2008 ix, 29, 30 [155] Shubhabrata Sengupta, Mark Harris, Yao Zhang, and John D Owens Scan primitives for GPU computing In Graphics Hardware, 2007 29, 119 [156] Shubhabrata Sengupta, Aaron Lefohn, and John D Owens A work-efficient stepefficient prefix sum algorithm In Proceedings of the Workshop on Edge Computing Using New Commodity Architectures (EDGE), 2006 29 215 [157] D.D Siljak Large-Scale Dynamic Systems: Stability and Structure Dover Civil and Mechanical Engineering Series Dover Publications, 2007 203 [158] Yasin N Silva, Walid G Aref, and Mohamed H Ali The similarity join database operator In ICDE, pages 892–903, 2010 89 
[159] Yi Song, Sadegh Nobari, Xuesong Lu, Panagiotis Karras, and Stéphane Bressan. On the privacy and utility of anonymized social networks. In iiWAS, pages 246–253, 2011.
[160] J. A. Stratton, S. S. Stone, and W. W. Hwu. MCUDA: An efficient implementation of CUDA kernels for multi-core CPUs. LCPC 2008, page 1630, Jul 2008.
[161] M. Sussman, W. Crutchfield, and M. Papakipos. Pseudorandom number generation on the GPU. In ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware, 2006.
[162] John Joseph Sylvester. Chemistry and algebra. Nature, 17:284, 1878.
[163] R. E. Tarjan. Data structures and network algorithms. Society for Industrial and Applied Mathematics, Philadelphia, 1983.
[164] Charalampos E. Tsourakakis, U. Kang, Gary L. Miller, and Christos Faloutsos. DOULION: counting triangles in massive graphs with a coin. In KDD, 2009.
[165] Michael Ubell. The Montage Extensible DataBlade Architecture. In ACM SIGMOD International Conference on Management of Data, 1994.
[166] Vibhav Vineet, Pawan Harish, Suryakant Patidar, and P. J. Narayanan. Fast minimum spanning tree for large graphs on the GPU. In HPG, 2009.
[167] Jeffrey Scott Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software, 11(1):37–57, 1985.
[168] Vasily Volkov and James W. Demmel. Benchmarking GPUs to tune dense linear algebra. In Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, SC ’08, pages 31:1–31:11, Piscataway, NJ, USA, 2008. IEEE Press.
[169] Xiaochun Wang, Xiali Wang, and D. Mitchell Wilkes. A divide-and-conquer approach for minimum spanning tree-based clustering. IEEE TKDE, 21(7):945–958, 2009.
[170] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press, 1994.
[171] Duncan J. Watts. Six Degrees: The Science of a Connected Age. W. W. Norton, New York, 2003.
[172] Wentao Wu, Yanghua Xiao, Wei Wang, Zhenying He, and Zhihui Wang. k-symmetry model for identity anonymization in social networks. In EDBT, 2010.
[173] Ying Xu, Victor Olman, and Dong Xu. Clustering gene expression data using a graph-theoretic approach: an application of minimum spanning trees. Bioinformatics, 18(4):536–545(10), 2002.
[174] Xiaowei Ying and Xintao Wu. Graph generation with prescribed feature constraints. In SDM, 2009.
[175] Xiaowei Ying and Xintao Wu. On link privacy in randomizing social networks. In PAKDD, 2009.
[176] Mingxuan Yuan, Lei Chen, and Philip S. Yu. Personalized privacy protection in social networks. PVLDB, 4:141–150, Nov 2010.
[177] F. Benjamin Zhan and Charles E. Noon. Shortest path algorithms: An evaluation using real road networks. Transportation Science, 32(1):65–73, 1998.
[178] Lijie Zhang and Weining Zhang. Edge anonymity in social network graphs. In CSE, 2009.
[179] Elena Zheleva and Lise Getoor. Preserving the privacy of sensitive relationships in graph data. In PinKDD, 2007.
[180] Caiming Zhong, Duoqian Miao, and Ruizhi Wang. A graph-theoretical clustering method based on two rounds of minimum spanning trees. Pattern Recogn., 43(3):752–766, 2010.
[181] Bin Zhou and Jian Pei. Preserving privacy in social networks against neighborhood attacks. In ICDE, 2008.
[182] Lei Zou, Lei Chen, and M. Tamer Özsu. Distancejoin: Pattern match query in a large graph database. PVLDB, 2(1):886–897, 2009.
[183] Lei Zou, Lei Chen, and M. Tamer Özsu. k-automorphism: A general framework for privacy-preserving network publication. PVLDB, 2(1), 2009.

... graph algorithms, with database applications, for these massively parallel processors in order to design scalable graph algorithms. We study both graph data generation problems and graph data management. ...

... detail.

1.4 Graph data generation

This thesis first studies the algorithms for generating graphs. Graphs may be generated from a random process, i.e., random graphs, or from modeling real-world data, ...

... narrow our attention to graph data generators and graph data management. We first study the algorithms for generating random graphs [138] in chapter and for generating real-world graphs [139] in chapter
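The sketch below makes "generated from a random process" concrete. It assumes the classic Erdős–Rényi G(n, p) model, in which each of the n(n−1)/2 possible undirected edges is included independently with probability p. The naive approach draws one random number per candidate pair. This sketch instead jumps directly to the next selected pair, because the number of rejected candidates between two selected ones follows a geometric distribution.

This is a minimal sequential Python illustration under those assumptions. It is not the data-parallel generator studied in this thesis, and the function names `gnp_skip` and `index_to_edge` are hypothetical.

```python
import math
import random
from typing import List, Optional, Tuple

Edge = Tuple[int, int]


def index_to_edge(k: int) -> Edge:
    """Map a linear index k to the undirected edge (v, w) with w < v,
    using the enumeration k = v * (v - 1) // 2 + w."""
    v = (1 + math.isqrt(1 + 8 * k)) // 2
    w = k - v * (v - 1) // 2
    return v, w


def gnp_skip(n: int, p: float, seed: Optional[int] = None) -> List[Edge]:
    """Sample an undirected Erdos-Renyi G(n, p) graph without self-loops.

    Hypothetical helper: each of the n(n-1)/2 candidate edges is kept with
    probability p, but instead of testing every candidate we draw the gap
    to the next kept candidate from a geometric distribution.
    """
    if n < 0 or not 0.0 <= p <= 1.0:
        raise ValueError("need n >= 0 and 0 <= p <= 1")
    total = n * (n - 1) // 2              # number of candidate edges
    if p == 0.0 or total == 0:
        return []
    if p == 1.0:                          # log(1 - p) is undefined; keep every candidate
        return [index_to_edge(k) for k in range(total)]

    rng = random.Random(seed)
    log_q = math.log1p(-p)                # log of the per-candidate rejection probability
    edges: List[Edge] = []
    k = -1                                # index of the last kept candidate
    while True:
        # Inverse-CDF draw: with U uniform in (0, 1], floor(log(U) / log(1 - p))
        # is the number of rejected candidates before the next kept one.
        gap = math.log1p(-rng.random()) / log_q
        if gap >= total - 1 - k:          # the next hit would fall past the last candidate
            return edges
        k += int(gap) + 1
        edges.append(index_to_edge(k))
```

For instance, `gnp_skip(1000, 0.01, seed=42)` yields about 0.01 × 499,500 ≈ 5,000 edges in expectation. It uses one random draw per kept edge, plus one final draw, rather than one draw per candidate pair. This is why skipping pays off for sparse graphs.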
