Processing Linked Data on Mobile Devices

Tuan Anh, Le

Submitted in fulfillment of the requirements for the degree of Master of Science

SUPERVISOR
Prof Manfred Hauswirth
Dr Danh Le Phuoc

Digital Enterprise Research Institute, National University of Ireland, Galway
January 2015

“If you optimize everything you will always be unhappy.”
Donald Knuth

Abstract

The Semantic Web has been evolving and is becoming the preferred choice for representing information, not only on the Web but also on mobile systems. Applying Linked Data technologies on mobile systems could enable a wide range of novel mobile applications. Furthermore, the computational, storage and network capabilities of mobile devices are constantly improving, thereby raising interest in moving Linked Data processing closer to mobile applications. Storing and processing Linked Data locally in RDF triplestores on mobile devices could reduce data transmission costs, lessen the impact of intermittent connectivity, and improve the security of personal data. Because of these advantages, several works have ported existing tools for processing Linked Data from workstations to mobile devices. However, whereas workstation systems are designed to use gigabytes of main memory and to store data efficiently on magnetic disks, mobile devices have limited memory and are equipped with flash storage. These differences in computing environment and storage media lead to performance and scalability issues in existing mobile RDF triplestores. Therefore, the algorithms and techniques used in RDF triplestores need to be carefully re-engineered to create an efficient RDF triplestore specifically for mobile devices.

In this work, we first conduct an empirical evaluation of existing mobile RDF triplestores to identify their performance and storage shortcomings. These findings serve as input for a novel design of an RDF triplestore tailor-made for mobile devices. We then implement this design on the Android platform as a faster and more scalable version
of RDF on the Go. Finally, we evaluate the performance and scalability of RDF on the Go to demonstrate the advantages of our design.

Acknowledgements

First and foremost, I would like to thank my mentor Dr Danh Le Phuoc for his guidance and valuable support throughout the preparation of this work, and in particular for his input and feedback on the formal model. I would also like to thank my supervisor Prof Manfred Hauswirth for his helpful and constructive contributions during the GRC meetings, which greatly helped to direct the research efforts of this work. Despite his positions as both vice director of DERI and group leader, he always had an open ear for my concerns. Special thanks are due to my colleagues Dr Gregor Schiele and Dr Martin Serrano, with whom I co-authored two papers. Moreover, I want to express my gratitude to Aidan Hogan and Hung Ngo Quoc, my friends at DERI, who spent their valuable time helping me correct my writing. Above all, I wish to express my sincere gratitude to my beloved family for all their kind support and the sacrifices they made throughout this work.

Contents

Abstract
Acknowledgements
Contents
List of Figures
List of Tables

1 Introduction
1.1 Motivation
1.2 Problem Statement
1.3 Thesis Contributions
1.4 Thesis Outline

2 Background
2.1 Resource Description Framework
2.2 Physical Storage of RDF data
2.3 Mobile Database
2.3.1 Mobile Applications
2.3.2 Requirements of Mobile Databases
2.4 Flash Memory
2.5 Mobile Semantic Web Frameworks
2.5.1 Mobile RDF Frameworks
2.5.2 Query and Persistence Frameworks

3 Empirical Evaluation and Experiment Analysis
3.1 Evaluation Design
3.2 Evaluation Results
3.3 Analysis of Evaluation Results
3.4 Conclusion

4 An Architecture of Mobile RDF Triplestore
4.1 Architecture Overview
4.2 Dictionary
4.3 Physical Storage
4.3.1 Access Pattern
4.3.2 Sorted list
4.3.3 Two-layer Index
4.4 Summary

5 RDF On The Go - mobile RDF triplestore
5.1 Performance Evaluation
5.1.1 Experimental Setup
5.1.2 Evaluation Results
5.2 Scalability Evaluation
5.2.1 Experiment Setup
5.2.2 Evaluation Results

6 Conclusions and Future Research
6.1 Conclusion
6.2 Future research

A Performance of Berkeley DB & SQLite
A.1 Comparison of write performance
A.2 Comparison of read performance

B Evaluation queries

Bibliography

List of Figures

2.1 Triple concept
2.2 RDF triple
2.3 RDF Graph
2.4 SPARQL Query
2.5 SPARQL Triple Pattern
3.1 Testing application design of Android RDF store
3.2 Updating throughput of OTG-BDB & TDBoid
3.3 Memory usage of OTG-BDB & TDBoid
3.4 Query response time (OTG-BDB & TDBoid)
3.5 Inserting throughput and memory usage of OTG-BDB
4.1 System architecture overview
4.2 Node Store
4.3 Triple lists
4.4 Index table
4.5 Sparse Index on Triple Table
5.1 The update throughput of RDF-OTG, OTG-BDB and TDBoid
5.2 Memory usage of RDF-OTG, RDF-BDB and TDBoid
5.3 Inserting throughput and memory usage of RDF-OTG
5.4 Query response time of RDF-OTG, TDBoid and RDF-BDB
5.5 Scalability of RDF-OTG
A.1 Writing speed of SQLite and Berkeley DB
A.2 Read speed of SQLite and Berkeley DB

List of Tables

4.1 Possible query patterns and corresponding indexes
5.1 Memory consumption of mix queries/size of data
5.2 Android devices
5.3 Memory consumption of queries mix/size of data
5.4 Query response time (seconds) on HTC Desire
5.5 Query response time (seconds) on Galaxy Nexus
5.6 Query response time (seconds) on Nexus tablet
5.7 Memory consumption query mix on different size of data
B.1 Parameters of SPARQL queries