The 13th IEEE International Symposium on Consumer Electronics (ISCE2009)

An Ontological Approach to Lifelog Representation for Disclosure Control

Truong Thi Thu Hien(1), Shin-ichiro Eitoku(2), Tomohiro Yamada(2), Shin-yo Muto(2) and Masanobu Abe(2)
(1) Department of Information System, College of Technology, Vietnam National University, Hanoi
(2) NTT Cyber Solutions Laboratories, Nippon Telegraph and Telephone Corporation
978-1-4244-2976-9/09/$25.00 ©2009 IEEE

Abstract— In services that use lifelogs collected continuously over a long period of time, disclosure control is essential for protecting privacy, because lifelogs include confidential information that users are unwilling to disclose freely. This paper proposes an ontology model for the disclosure control needed to handle each user's lifelog. With it, the user can easily set disclosure rules on diverse data collected from various data sources. The features of the proposed model are as follows: (1) values can be differentiated by their context, (2) various time expressions can be handled, and (3) a disclosure attribute is set on each lifelog item (fine-grained control). We show an example implementation and the results of experiments using the implemented system.

Keywords: Lifelog, Disclosure Control, Ontology

I. INTRODUCTION

The miniaturization of mobile terminals with various sensors has made it possible to collect lifelog data from sensors (GPS devices, acceleration sensors, etc.) in addition to lifelog data from PCs (schedules, search words, e-mail, etc.). This trend has stimulated the rise of services that use lifelogs. Because each user can carry one or more mobile terminals all the time, he/she can not only collect various kinds of lifelog but also collect the data continuously over long periods of time. Therefore, the amount of data will be huge; moreover, various time expressions are needed to time-stamp the data. For example, search keywords might be given instantaneous time stamps, while stays at places should be given time intervals.

The other requirement is to deal with the privacy concerns of users. The collected data might include information that users are unwilling to disclose. For example, place data in the lifelog could be useful for organizing business records, but the user may want to keep some places secret, such as where he met his girlfriend.

Taking this background into account, we have proposed the architecture of a lifelog processing system [1]. To receive a service, the user has to provide the lifelog data that the service requires. However, part of that data might include information the user does not want to disclose, as mentioned above. It is therefore very important to provide a framework that reassures users that their privacy will not be compromised. The requirements with regard to disclosure control can be summarized as follows:

(1) Values can be differentiated by their context. Lifelogs collected from different information sources should have different meanings (e.g., actually visited locations (GPS data) differ from locations the user intended to visit (schedule data)).
(2) Various time expressions should be handled (e.g., time point, time interval and social time), because the appropriate representation of time differs depending on the kind of information.
(3) A disclosure attribute is set on each lifelog item (fine-grained control). The judgment of disclosure depends on each individual lifelog event, even if the kind of event (e.g., visited place, searched word, etc.) is the same.

In order to satisfy these requirements, we propose a domain ontology model that yields a semantic representation of everyday lifelogs. The model serves to represent a user's lifelog data, and the system employs it to represent the disclosure rules given by the end-user. We compare it with existing ontology models for realizing lifelog disclosure control, implement a system based on this model, and show the results of experiments on this system.

II. LIFELOG PROCESSING SYSTEM

Recently, location-based services based on GPS sensors or cell tower information have become popular [2][3]. The data used in such services (a person's location, the transportation used, activity, etc.) are one kind of lifelog data. We foresee that the number of lifelogs captured during our daily lives will increase and that concierge services based on lifelogs will become essential to us.

In order to realize these services, we are developing a lifelog processing system. Fig. 1 shows its architecture. It consists of a local system (user's PC, home server, etc.), an aggregated lifelog manager, and an application system. The local system generates and manages the user's lifelogs. The aggregated lifelog manager manages many users' lifelogs uploaded from their local systems and creates valuable information from them; this information is used to provide appropriate services to each user.

In the local system, lifelogs are generated from the data acquired by various sensors [4][5][6]. For example, from the latitude and longitude data acquired by GPS devices, the local system generates address information, transportation, etc. To receive a service, the user first obtains information about which lifelogs the application system needs. After the user decides which lifelogs are to be disclosed, the system uploads them to the aggregated lifelog manager. Therefore, only the lifelogs that the user has decided to disclose to the application system are stored in the aggregated lifelog manager, and the user retains control of the lifelog data in his/her local system. Disclosure control in our work focuses on this uploading of lifelogs to the aggregated lifelog manager.

Figure 1. Lifelog processing system (local system: sensors, feature extraction into "Where", "Who" and "What" type lifelogs, user's lifelog DB, disclosure rule DB and disclosure control; aggregated lifelog manager: data mining on many users' lifelogs; application system: service provider that requests the kinds of lifelogs it needs)

III. RELATED WORK AND COMPARISON

Ontologies play a pivotal role not only in the Semantic Web but also in pervasive computing and next-generation mobile communication systems [7]. Several ontology models exist to capture context in pervasive computing. However, no ontology model satisfies the three requirements listed in Section I.

GLOSS [8] (GLObal Smart Space) supports interactions between people and places on a global scale. By exploiting the features of physical spaces, GLOSS software provides appropriate information or services. GLOSS provides an easy-to-extend ontology that includes personal profile, location, mode of transportation, time, and activity. It is a powerful tool for modeling human activities that are closely related to geographical information. However, it lacks a representation of information sources (only activity is modeled); thus, for example, it cannot distinguish between events actually done and events scheduled to be done.

SOUPA [9] (Standard Ontology for Ubiquitous and Pervasive Applications) is a well-structured model that can capture context and express policies, but it cannot apply disclosure control to individual items of context, because the represented events are not connected to the channel through which they were acquired (only to location and time attributes).

COMANTO [10] (COntext MAnagement oNTOlogy) was implemented as middleware to support context-aware services. It represents the wide range of concepts needed for context modeling; unfortunately, it pays no attention to user privacy. Moreover, only activities are modeled, and time can be described only as absolute time. Accordingly, it cannot represent the time intervals or social time needed to handle lifelog events.

CAUSB [11] (Context-Aware Ubiquitous Services Browsers) was proposed for classifying services based on the user's context. It supports the policy adaptability needed in ubiquitous environments, but it does not distinguish the possible forms of rule conditions (no condition, a simple condition, or a complex condition).

Table I shows a comparison of these models.

TABLE I. COMPARISON OF MODELS

Requirement \ Model      | GLOSS | SOUPA | COMANTO    | CAUSB
Distinguishable meaning  | No    | No    | Yes        | Basic (*)
Various time expressions | Yes   | Yes   | Basic (**) | Basic (**)
Fine-grained control     | No    | No    | No         | Limited (***)

(*) No fundamental class supporting other types of information (e.g., search words or schedules).
(**) Time is only a data-type property or an absolute value; therefore, policies cannot be described with abstract time or social time.
(***) Rules apply only to a type of data with a pre-defined action; there is no attempt to control each item of event information.

IV. PROPOSED ONTOLOGY MODEL

The lifelogs handled by the model come from two kinds of information sources: sensors (GPS, acceleration sensor, etc.) and PC-based applications (search word logger, scheduler, etc.).
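Requirement (1) is the reason the information source has to stay attached to every lifelog item. The Python sketch below is only an illustration under stated assumptions: the record type `LifelogItem`, its field names, the helper `visited_places` and the example values are all hypothetical and are not part of the authors' system. It shows that a place recorded by GPS and the same place recorded by the scheduler look identical as values, but remain distinct once the source is part of the item, so a filter can select actually visited places only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LifelogItem:
    """Hypothetical flat lifelog record; field names are illustrative only."""
    sensor: str     # information source, e.g. "GPS" or "Scheduler"
    channel: str    # kind of information, e.g. "Place"
    time: datetime  # when the item was recorded (or scheduled)
    value: str      # recorded value, e.g. "University"


def visited_places(items):
    """Keep place items that were actually observed (GPS), not merely planned."""
    return [i for i in items if i.channel == "Place" and i.sensor == "GPS"]


# Hypothetical example: same channel, time and value, different sources.
visited = LifelogItem("GPS", "Place", datetime(2008, 12, 22, 14, 0), "University")
planned = LifelogItem("Scheduler", "Place", datetime(2008, 12, 22, 14, 0), "University")

print(visited.value == planned.value)      # True: the values alone look identical
print(visited == planned)                  # False: the source keeps them apart
print(visited_places([visited, planned]))  # only the GPS item remains
```

Because the source is a field of the item itself, a later disclosure decision can treat "visited University" and "planned University" differently, which is the distinction Table I records as "Distinguishable meaning".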
Fig. 2 shows our lifelog ontology model. It consists of five abstract basic classes: Sensor, Channel, Time, Event, and Policy. The Sensor class represents the information sources. The Channel class corresponds to the various kinds of lifelog information related to the user. Generally, each kind of sensor collects a specific type of data. However, different sensors can collect the same type of data, and several sensors may be needed to collect a complex type of data. The Time class represents the time attribute; it includes the Clock, Calendar, Time-Point, and Time-Interval classes [12][13]. The Event class represents the actual personal lifelogs. The Policy class represents the disclosure rules given by the user.

Figure 2. Overview of lifelog ontology model (basic classes SENSOR, CHANNEL, TIME, EVENT and POLICY with pre-defined and extended sub-classes, e.g., GPS, Scheduler and PC Log; Address, Place, Transport, People, TV Prog. and Search word; the corresponding Address-, Place-, Transport-, People-, TV Prog.- and Search-Event classes; Clock, Calendar, Time-Point, Time-Interval, Social-Time, Holiday, Office-Hour, Break-Time and the units Year to Second; Rule, Head, Condition, Simple Condition and Complex Condition)

In this model, sub-classes can easily be added to the basic classes. They are either pre-defined or attached by the system according to the lifelog data received. For example, once the system receives visited-place information from GPS, GPS can be added to the Sensor class and Place to the Channel class as new sub-classes. A sub-class can be further extended to achieve more detailed expressions of lifelog data.

The links among classes indicate the relations from one class (domain) to another class (range). Table II shows these relations: the left column gives the relation name, the center column the domain class, and the right column the range class or instance. sensorOf, channelOf and timeOf associate an Event with the data source, the kind of information, and the time when the lifelog was recorded, respectively. In the Time-Point class, the inCalendar and atClock relations represent a point in time in different formats. For example, January 1st has only month and day attributes, so only the inCalendar relation is used; in contrast, January 1st 10:00:00 has month, day, hour, minute and second attributes, so both inCalendar and atClock are used. The Time-Interval class has start and end relations. The Social-Time class has two relations, atPoint and inInterval, which associate a social-time instance with a time point or a time interval. In the Policy class, the relations ruleOf, hasCondition, and hasHead serve for rule description. The relations withEvent, withType, and withValue connect the Policy directly to classes or instances in the domain model. These relations connect all lifelog events semantically and make it easy for users to manage and maintain disclosure rules.

TABLE II. RELATIONS IN THIS MODEL

Relation           | Domain           | Range
sensorOf           | Event            | Sensor
channelOf          | Event            | Channel
timeOf             | Event            | Time
inCalendar         | Time-Point       | Calendar
atClock            | Time-Point       | Clock
start              | Time-Interval    | Time-Point
end                | Time-Interval    | Time-Point
atPoint            | Social-Time      | Time-Point
inInterval         | Social-Time      | Time-Interval
ruleOf             | Rule             | Policy
hasHead            | Rule             | Head
hasCondition       | Rule             | Condition
hasSimpleCondition | ComplexCondition | SimpleCondition
withEvent          | Head             | Event
withType           | SimpleCondition  | Any class of lifelog domain
withValue          | SimpleCondition  | Any instance of lifelog domain

The basic classes and extended sub-classes satisfy the requirements mentioned in Section I as follows. Firstly, the Sensor and Channel classes distinguish the semantic context of data from different information sources and channels, which addresses point (1). Secondly, the Time class covers the time expressions of point (2). Additionally, to improve manageability for users, we introduce the Social-Time class to represent times that are defined differently in different communities (e.g., holiday, office-hour). Thirdly, for point (3), the disclosure rules are set as instances of the Policy class. Each rule consists of a head and conditions, instantiated in two sub-classes, SimpleCondition and ComplexCondition. A SimpleCondition is defined on lifelog data by the withType and withValue relations. A ComplexCondition consists of several simple conditions linked by the AND operator. To ease users' management burden, complex conditions exclude the operators NOT, OR and XOR (following Rei [14]); instead, a complex condition is defined as a chain of simple conditions.

V. DISCLOSURE CONTROL

This section explains how disclosure rules are defined using this ontology model. Each rule has two components: a head and conditions. If all of the corresponding conditions are true, the system treats the data designated by the head as undisclosed. Some rules may have only a head and no condition, which means that all instances in the class related to the head are undisclosed.

To show the potential of this model for representing lifelog data and forming disclosure rules, we use the example shown in Table III. The user can easily define disclosure rules to handle the data. The following three rules prevent the disclosure of the highlighted entries in Table III:

r1: Don't disclose any information about accompanying people.
r2: Don't disclose the search words from 08/12/21 00:00 to 08/12/21 10:00.
r3: Don't disclose the visited places in Tokyo collected by the GPS sensor.

TABLE III. EXAMPLE OF LIFELOG DATA (entries time-stamped 08/12/21 00:00, 08/12/21 08:00, 08/12/22 09:00, 08/12/22 14:00, 08/12/22 17:00, 08/12/22 21:00 and 08/12/23 19:00, recorded by the sensors GPS, Scheduler and PC Log in the channels Place, Address, People and Search word; recorded values include Home, Restaurant, University, Plaza, Cinema, Yokohama, Tokyo, Professor, Classmate, Film and Traintime)

The corresponding disclosure rule representations in the system are illustrated in Fig. 3. Rule r1 has the head People-Event and no condition. Rule r2 has the head Search word-Event and one simple condition, "Time is 08/12/21 00:00 to 10:00". Similarly, rule r3 is easily created with the head Place-Event and one complex condition consisting of two connected simple conditions: "Sensor is GPS" and "Address is Tokyo".

Figure 3. Example of setting rules using domain model (policy rules r1, r2 and r3 with heads People-Event, Search word-Event and Place-Event, and conditions on Time 08/12/21 00:00-10:00, Sensor GPS and Address Tokyo; italic: class, underline: instance)

VI. IMPLEMENTATION

A. OWL Representation and Disclosure Setting

The lifelog domain model and the lifelog disclosure rules are defined in OWL and stored as RDF/XML documents. Fig. 4 shows the flow of this process. An example Place-Event, with place name University at 2008-06-21T11:50:00 from GPS, is shown in Fig. 5. Disclosure rules defined by the user are created as instances of the Rule class. A rule can belong to one or several policies, and each policy is applied to the corresponding service provider(s). Fig. 6 shows the description of the simple condition "Address is Tokyo".

Figure 5. Example of instance in Place-Event

Although OWL provides considerable power for modeling lifelogs and disclosure rules, it has some limitations. OWL cannot capture the relationship of one property to another in the domain. For example, if TV watching and place are recorded at the same time, it is possible to infer that the user watched TV at that place, but the relation atPlace cannot be defined directly without taking time into account. To overcome this restriction, we use additional definitions written in the Semantic Web Rule Language (SWRL) [15], which realize extra relations in the lifelog domain.

A simple rule editor is used to define disclosure rules with simple or complex conditions. The lifelog data in the ontology model are loaded and displayed to help the end-user create his/her own disclosure rules. For example, rule r3 (in Fig. 3) can be set with the rule editor as
shown in Fig. 7.

Figure 6. Example of instance in Rule

We implemented a prototype of this model and the disclosure control system using the Protégé library and the Jess (Java Expert System Shell) rule engine, which uses the well-known Rete algorithm for rule matching. Various personal lifelog data are represented in this model; for example, lifelogs of address, place, transport, people, TV program, search word, and activity are represented in different sub-classes of the Channel class. The domain model was used for setting disclosure rules, and these rules were then translated into SWRL. Jess accepts rules formulated in SWRL, which are translated into Jess rules by the SWRL factory of the Java API. Jess fires the rules and returns a disclosed/undisclosed attribute for each instance in the lifelog model.

Figure 7. User interface for setting rules

Figure 4. Flow of disclosure processing in this implementation (lifelog data form the Domain (OWL) of the lifelog OWL model; the policy created with the policy editor is stored as Policy (OWL) and translated into Policy (SWRL); the rule engine outputs the disclosed data)

B. Experiments

To clarify the feasibility of the proposed ontology model for representing lifelog data, and to check its run-time performance in matching domain data with user-defined disclosure rules, we performed two experiments. In these experiments, we used a laptop computer with an Intel Core Duo 1.40 GHz CPU and 1.5 GB RAM. Results are calculated as the average of runs. Lifelog data and disclosure rules are represented in five basic classes and seven sub-classes: GPS-sensor, Remote-controller, Address, Place, Transport, People, and TV-program. The size of the dataset was measured in terms of the number of instances in the Event class.

In the first experiment, we measured the run time for matching disclosure rules with lifelog data while increasing the number of rules. Each rule in this case has a simple condition (Place is Home, Transportation is Private car, Address is Tokyo, etc.). The number of instances in the Event class was 1000. The data for this experiment were extracted from the real-life records of one person.

In the second experiment, we measured the run time while increasing the number of lifelogs. The number of classes and the following three complex disclosure rules were kept constant. These complex rules require combining different events, with the time attribute used as the key relation:
Don't disclose the Place located in Tokyo that is collected by the GPS sensor.
Don't disclose the People accompanying me in a private car.
Don't disclose the TV-program at a friend's home when the time is a weekend.

Fig. 8 and Fig. 9 show the results.

Figure 8. Run time with growing number of disclosure rules
Figure 9. Run time with growing number of lifelogs

Run time grows linearly with the number of disclosure rules. However, it grows as a power of the number of lifelog events input to the rule engine. This shows that the run time for extracting disclosed data according to the rules depends largely on the dataset and the nature of the rules rather than on the number of rules. Therefore, to deal with huge datasets collected continuously over long periods, we should manage the scale of the dataset input to the rule engine. One solution is for the system to perform the matching process on partial sets of events instead of the entire dataset in a single step. The dataset could be divided into several parts depending on the structure of the rules and the time period (by one day, by one week, etc.).

When the number of lifelogs is 1000, the run time was under 5 s. If we assume that one lifelog is obtained every minute, the number of lifelogs per day is on the order of 1000 (1440 for a full 24 hours). Thus, the user's rules can be applied to one day's lifelogs in a matter of seconds. We therefore think this result shows that the performance of our algorithm is acceptable for users.

VII. CONCLUSION

This paper studied the representation of lifelogs collected continuously from sensors and PC-based applications, and the rules that control their disclosure. We proposed an ontology model that consists of five basic classes, Channel, Sensor, Time, Event, and Policy, together with their extensible sub-classes. We compared the model with previous works and showed its advantages with regard to disclosure control. In future work, we will investigate user-friendly interfaces for creating rules that more closely mimic natural-language policies. We will also apply this model to actual lifelog services and evaluate its performance and the ease with which rules can be set.

REFERENCES

[1] S. Eitoku, et al., "A Study on Visualization of Personal Lifelog Considering Disclosure Control", IEICE Technical Report, ISEC2008-88, OIS2008-64, pp. 97-104, 2008 (in Japanese).
[2] J. Hightower, et al., "Practical Lessons from Place Lab", IEEE Pervasive Computing, vol. 5, no. 3, pp. 32-39, 2006.
[3] A. Tomioka, et al., "Information Distributing System Based on User Behavior", NTT DoCoMo Technical Journal, vol. 9, no. 1, pp. 51-56, 2007.
[4] M. Abe, et al., "A Life Log Collector Integrated with a Remote-Controller for Enabling User Centric Services", IEEE Transactions on Consumer Electronics, vol. 55, no. 1, 2009 (to appear).
[5] M. Nishino, et al., "A Place Prediction Algorithm Based on Time-Sensitive Frequent Patterns", Proc. of Pervasive 2009 (to appear).
[6] S. Seko, et al., "An Algorithm to Estimate the Level of Friendship Based on the Mode of Transportation and the Time Spent Sharing Movement Tracks", Proc. of Pervasive 2009 (to appear).
[7] P. Floreen, et al., "Towards a Context Management Framework for MobiLife", IST Mobile & Wireless Communications Summit, 2005.
[8] G. Kirby, et al., "GloSS Ontology and Narratives", GLOSS Consortium Report D7, 2002.
[9] H. Chen, et al., "SOUPA: Standard Ontology for Ubiquitous and Pervasive Applications", Proc. of the First Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous'04), pp. 258-267, 2004.
[10] M. A. Strimpakou, et al., "A Context Ontology for Pervasive Service Provision", Proc. of the 20th International Conference on Advanced Information Networking and Applications, pp. 775-779, 2006.
[11] N. A. Hamdeh, et al.,
"OWL-based Ontology for Secure and Adaptable Ubiquitous Environment", Proc. of the Third International Conference on Semantics, Knowledge and Grid, pp. 230-235, 2007.
[12] F. Pan, et al., "Temporal Aggregates in OWL-Time", Proc. of the 18th International Conference of the Florida Artificial Intelligence Research Society, pp. 560-565, AAAI Press, 2005.
[13] Q. Zhou, et al., "A Reusable Time Ontology", Proc. of the AAAI Workshop on Ontologies for the Semantic Web, 2002.
[14] L. Kagal, "Rei: A Policy Language for the Me-Centric Project", Technical Report, HP Laboratories Palo Alto, 2002.
[15] M. O'Connor, et al., "Supporting Rule System Interoperability on the Semantic Web with SWRL", The Semantic Web - ISWC 2005, pp. 974-986, 2005.