Rank  Visualization  Profile
• Coffee/Tea lovers (Group Size: 795)
• Food lovers (Group Size: 679)
• Shoppers (Group Size: 599)
• Travellers (Group Size: 584)
• Fast Food Lovers (Group Size: 576)

4.9 Top Five Communities in New York City

In this section, we show the top five communities detected in New York City. We manually observe and derive the community profiles from the prominent venue categories, tip topics and photo concepts of the communities.

Rank  Visualization  Profile
• Food lovers (Group Size: 784)
• Clubbing lovers (Group Size: 698)
• Shoppers (Group Size: 643)
• Travellers (Group Size: 639)
• Tourists (Group Size: 589)

4.10 Summary

In this chapter, we investigated the problem of community understanding in LBSNs. We proposed a novel, unified framework that models the heterogeneous entities and interactions by constructing a heterogeneous, non-uniform hypergraph. We then formulated community detection as the problem of finding dense subgraphs in this hypergraph, with constraints added to ensure the interpretability of the detected communities, and proposed an efficient procedure to solve the resulting optimization problem. Extensive qualitative and quantitative experiments were performed to verify the proposed approach. Meaningful and interpretable communities were detected, and interesting cultural differences were revealed by analyzing the communities in Singapore and New York City.

A few aspects are worth further exploration. First, the time-dependent behaviors of users allow interest communities to be detected and understood in a timely manner. For example, it is interesting to mine and profile different interest groups that are active during different time periods. Second, users often participate in several social networks. Aggregating user behaviors across multiple sources is expected to lead to more accurate and timely communities that enrich community understanding.
Chapter 5 Community Matching across Geographical Regions

People are usually active only within small geographical regions, as described in the previous chapters. While it is easy to connect users when they visit similar sets of venues in the same geographical region, it is interesting and challenging to investigate ways to correlate users across geographical regions based on their local behaviors. In this chapter, we study the problem of community matching in LBSNs across geographical regions in the context of generating personalized recommendations of locally interesting venues for tourists. To do so, we first propose a Bayesian approach to extract the latent social dimensions of users from their local behaviors. We then match users’ preferences across geographical regions based on latent global interest factors. In the experiments, we validate both the quality of the extracted latent social dimensions and the community matching across geographical regions within the recommendation framework.

The rest of the chapter is organized as follows. Section 5.1 reiterates the motivation of community matching across geographical regions by describing a real-world application scenario and the use of LBSNs as a solution, and discusses the challenges of the problem. Section 5.2 reviews related works that were not covered in the literature review in Chapter 2 but are related to the problem and techniques described in this chapter. Section 5.3 gives an overview of the proposed framework, while Section 5.4 formally defines the problem. Sections 5.5 and 5.6 detail social dimensions extraction and recommendations generation, respectively. Section 5.7 reports the experimental results. Finally, Section 5.8 gives the concluding remarks.
5.1 Motivation and Challenges

When we travel to new places, in addition to sightseeing, we are often interested in exploring the aspects of local culture that match our personal interests, such as sampling local cuisines, understanding local customs, and visiting shops selling local specialities. However, there is a large gap between what we want and what is provided by the dominant tourism resources, such as Wikitravel, Lonely Planet and the official tourism boards of certain countries, such as YourSingapore (http://www.yoursingapore.com/) and AustraliaTravel (http://www.australiatravel.com/). The gap has two main causes. First, these sites mainly provide information about famous attractions or popular local landmarks instead of locally interesting places. However, many tourists may want to experience local cultures that match their interests in terms of local food, events and shops. These locally interesting places or activities may not be famous enough to be included in these tourism resources. Second, they generate user-independent content, while in reality people usually have drastically different personal preferences. For example, people who love shopping may want to visit more popular local shops, while food lovers are more interested in sampling different kinds of local foods, such as the local foods in Shilin Night Market in Taipei.

On the other hand, rich location data at a fine-grained level is now available from the recently emerging LBSNs. They are becoming more and more popular thanks to the recent availability of open mobile platforms, which makes LBSNs much more accessible to mobile users. These LBSNs are able to provide sufficient resources to bridge the aforementioned gap. First, they allow users to voluntarily annotate the real world with check-ins, which indicate the specific times at which the users were at particular locations.
In addition, LBSNs provide “location-specific data”, in which users may check in at nearly the same geographical coordinates but at very different venues. For example, users can check in at a cinema or a restaurant in the same shopping mall, where both venues share the same geographical coordinates. In contrast, cell phone data provides coarse location accuracy and cannot differentiate users’ presence across different floors of the same building. The active participation of Foursquare users and the fine-grained venue annotations make personalized recommendation of locally interesting venues possible.

Collaborative filtering (CF) based approaches [48, 58] seem to be plausible solutions to this problem, as demonstrated by their great successes in commercial applications, such as Amazon [69], Netflix [11], Tivo [2] and eBay [152], and in research on point-of-interest (POI) recommendations [150, 24, 156, 151]. These approaches automatically generate recommended items for a user using the known preferences of other users or known items preferred by the target user. However, CF-based algorithms, whether memory-based or model-based, require sufficient overlap among users in terms of items rated so that the correspondences among users or items can be readily identified. In LBSNs, however, users usually visit venues that are within a small geographical distance of their homes [25, 27], which makes it hard, if not impossible, to correlate users when they visit very different sets of venues with little or no overlap. Consider the user-venue matrix shown in Table 5.1, where users {u1, ..., u4} never visited venues {v1, ..., v5} and users {u5, ..., u8} never visited venues {v6, ..., v10}. If we were to use traditional CF techniques, the ratings marked with “?” in Table 5.2 would be hard to estimate.

Table 5.1: User-Venue Matrix (values indicate number of visits).
u1 u2 u3 u4 u5 u6 u7 u8
v1 0 0
v2 0 0 11
v3 0 0
v4 0 0 3
v5 0 0
v6 10 21 10 0 0
v7 15 0 0
v8 3 0 0
v9 0 0 0
v10 12 0 0

Table 5.2: “?” stands for preferences to be predicted.
u1 u2 u3 u4 u5 u6 u7 u8
v1 ? ? ? ?
v2 ? ? ? ? 11
v3 ? ? ? ?
v4 ? ? ? ? 3
v5 ? ? ? ?
v6 10 21 10 ? ? ? ?
v7 15 ? ? ? ?
v8 3 ? ? ? ?
v9 0 ? ? ? ?
v10 12 ? ? ? ?

In addition, most CF algorithms are based on static models in which relations are assumed to be fixed over time. However, users’ visiting behaviours often evolve over time [97] and exhibit strong temporal patterns, such as daily/weekly patterns and periodicity [25]. For example, people perform more check-ins at restaurants at meal times and visit shops mostly during weekends and weekday evenings. Hence, an effective way to incorporate the temporal information is required.

5.2 Related Work

The problem that we are investigating and the techniques we propose are related to four research areas, namely location prediction, location recommendation, travel recommendation and latent factor models.

5.2.1 Location Prediction

Location prediction based on cellular network traces has recently attracted considerable attention in mobile computing [49, 154, 113, 43, 138]. The various proposed mobility models aim to provide an accurate prediction of an individual’s future location, which is an essential requirement for various mobile applications, such as home heating control [115], urban planning [105, 20], mobile advertising [8, 7] and demographic prediction [16]. The basic concept in this research line is to compare a current pattern with historical data and to extract similar patterns for predicting the next location. Unlike location prediction, location recommendation (reviewed in Section 5.2.2) aims to recommend new locations to users to widen their choices, although the two adopt similar evaluation strategies.
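As a rough illustration of the idea of matching the current pattern against historical data, the sketch below fits a first-order Markov model to one user's visit history and predicts the most frequent follower of the current location. This is a deliberately simple stand-in and not one of the cited mobility models; the function names and the sample history are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(history):
    """Count how often each location is followed by each other location."""
    transitions = defaultdict(Counter)
    for prev, nxt in zip(history, history[1:]):
        transitions[prev][nxt] += 1
    return transitions


def predict_next(transitions, current):
    """Return the most frequent follower of `current`, or None if unseen."""
    followers = transitions.get(current)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical visit history of a single user, in chronological order.
history = ["home", "office", "cafe", "office", "home", "office", "cafe", "home"]
model = fit_transitions(history)
print(predict_next(model, "home"))    # most frequent location after "home"
print(predict_next(model, "gym"))     # never observed, so no prediction
```

Such a predictor can only return locations the user has already visited, which is exactly the limitation that separates location prediction from location recommendation.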
In addition, location semantics are usually not readily available in mobile phone data, while venues in user-generated data come with rich annotations in various aspects, such as categories, comments and photos.

5.2.2 Location Recommendation

The recent boom of LBSNs has motivated emerging research on point-of-interest (POI) or, more generally, location recommendations [150, 151, 156]. Location recommendation aims to recommend a list of POIs or locations to a user based on the user’s past visiting history. These lines of work usually focus on general recommendation tasks in a traditional CF framework. For example, Ye et al. compared the influences of user similarity based on historical behavior, geographical distance and friend network in the POI recommendation task [150]. Ying et al. proposed to consider both user preferences and location properties in their recommendation framework [151]. Recently, Zhou et al. studied and compared the performance of different CF recommenders, including user-based, item-based and probabilistic latent semantic analysis, in location recommendation [156], and reported that the probabilistic approach performs best. There are two main differences between our work and these related works: (1) we study a new problem, which aims to provide tourists with recommendations based on their local visits; and (2) none of these works has studied the effects of simultaneously considering time, social relations and venue similarities.

5.2.3 Travel Recommendation

In Web 2.0 communities, people often share their traveling experiences in blogs, forums and social networks in the form of travelogues, photos, etc. These geo-referenced media resources contain rich tourism information, which motivates research on generating travel recommendations from such user-generated content. Hao et al. proposed a location-topic model to model travelogue documents and to recommend tour destinations [57].
To recommend a destination, a user needs to issue a query, and the system then uses the topic model to select the destination with the highest matching score. Cheng et al. leveraged community-contributed photos from Flickr to provide personalized travel recommendations based on people’s attributes, such as gender, race and age, in a probabilistic Bayesian learning framework [23]. More recently, Lucchese et al. proposed an interactive random walk approach for personalized recommendations of tourist places based on the knowledge mined from Flickr and Wikipedia [75]. While these works all aim to provide personalized recommendations of tourist points based on users’ past be-

[96] A. Noulas, S. Scellato, R. Lambiotte, M. Pontil, and C. Mascolo. A tale of many cities: universal patterns in human urban mobility. PLoS ONE, 7(5):e37027, 2012.
[97] A. Noulas, S. Scellato, C. Mascolo, and M. Pontil. An empirical study of geographic user activity patterns in Foursquare. Proceedings of the 5th International AAAI Conference on Weblogs and Social Media, 2011.
[98] A. Noulas, S. Scellato, C. Mascolo, and M. Pontil. Exploiting semantic annotations for clustering geographic areas and users in location-based social networks. The Social Mobile Web, 11, 2011.
[99] G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043):814–818, 2005.
[100] F. Perronnin, J. Sánchez, and T. Mensink. Improving the Fisher kernel for large-scale image classification. In Proceedings of the 11th European Conference on Computer Vision: Part IV, ECCV ’10, pages 143–156, 2010.
[101] M. A. Porter, J.-P. Onnela, and P. J. Mucha. Communities in networks. Notices of the AMS, 56(9):1082–1097, 2009.
[102] D. Preotiuc-Pietro and T. Cohn. Mining user behaviours: A study of check-in patterns in location based social networks. In ACM Web Science, 2013.
[103] I. Psorakis, S. Roberts, M. Ebden, and B. Sheldon.
Overlapping community detection using Bayesian non-negative matrix factorization. Physical Review E, 83:066114, 2011.
[104] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi. Defining and identifying communities in networks. Proceedings of the National Academy of Sciences of the United States of America, 101(9):2658–2663, 2004.
[105] C. Ratti, S. Williams, D. Frenchman, and R. Pulselli. Mobile landscapes: using location data from cell phones for urban analysis. Environment and Planning B: Planning and Design, 33(5):727, 2006.
[106] W. Ren, G. Yan, X. Liao, and L. Xiao. Simple probabilistic algorithm for detecting community structure. Physical Review E, 79(3):036111, 2009.
[107] I. Rhee, M. Shin, S. Hong, K. Lee, S. J. Kim, and S. Chong. On the Levy-walk nature of human mobility. Networking, IEEE/ACM Transactions on, 19(3):630–643, 2011.
[108] M. Rosvall and C. T. Bergstrom. An information-theoretic framework for resolving community structure in complex networks. Proceedings of the National Academy of Sciences, 104(18):7327–7331, 2007.
[109] D. M. Roy, C. Kemp, V. K. Mansinghka, and J. B. Tenenbaum. Learning annotated hierarchies from relational data. Advances in Neural Information Processing Systems, 19:1185, 2007.
[110] R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th International Conference on Machine Learning, ICML ’08, pages 880–887, 2008.
[111] P. Sarkar and A. W. Moore. Dynamic social network analysis using latent space models. ACM SIGKDD Explorations Newsletter, 7(2):31–40, 2005.
[112] S. Scellato, C. Mascolo, M. Musolesi, and V. Latora. Distance matters: geosocial metrics for online social networks. In Proceedings of the 3rd Conference on Online Social Networks, WOSN ’10, pages 8–8, 2010.
[113] S. Scellato, M. Musolesi, C. Mascolo, V. Latora, and A. Campbell. NextPlace: A spatio-temporal prediction framework for pervasive systems. 6696:152–169, 2011.
[114] S. Scellato, A. Noulas, R. Lambiotte, and C. Mascolo. Socio-spatial properties of online location-based social networks. Proceedings of the 5th International AAAI Conference on Weblogs and Social Media, 11:329–336, 2011.
[115] J. Scott, A. Bernheim Brush, J. Krumm, B. Meyers, M. Hazas, S. Hodges, and N. Villar. PreHeat: controlling home heating using occupancy prediction. In Proceedings of the 13th International Conference on Ubiquitous Computing, pages 281–290. ACM, 2011.
[116] D. Seung and L. Lee. Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems, 13:556–562, 2001.
[117] J. Shi and J. Malik. Normalized cuts and image segmentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(8):888–905, 2000.
[118] Y. Shi, X. Zhao, J. Wang, M. Larson, and A. Hanjalic. Adaptive diversification of recommendation results via latent factor portfolio. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’12, pages 175–184, 2012.
[119] B. Sigurbjörnsson and R. van Zwol. Flickr tag recommendation based on collective knowledge. In Proceedings of the International Conference on World Wide Web, pages 327–336, 2008.
[120] A. P. Singh and G. J. Gordon. A unified view of matrix factorization models. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases - Part II, ECML PKDD ’08, pages 358–373, 2008.
[121] S. Smyth and S. White. A spectral clustering approach to finding communities in graphs. In Proceedings of the Fifth SIAM International Conference on Data Mining, volume 119, page 274, 2005.
[122] C. Song, Z. Qu, N. Blumm, and A.-L. Barabási. Limits of predictability in human mobility. Science, 327(5968):1018–1021, 2010.
[123] W. Stanley and K. Faust. Social network analysis: Methods and applications. New York: Cambridge University, 1994.
[124] J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos. Neighborhood formation and anomaly detection in bipartite graphs. In The 5th IEEE International Conference on Data Mining, pages pp.–, 2005.
[125] P. Tan, M. Steinbach, V. Kumar, et al. Introduction to data mining. 2006.
[126] P.-N. Tan et al. Introduction to data mining. Pearson Education India, 2007.
[127] L. Tang and H. Liu. Relational learning via latent social dimensions. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’09, pages 817–826, 2009.
[128] L. Tang and H. Liu. Scalable learning of collective behavior based on sparse social dimensions. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM ’09, pages 1107–1116, 2009.
[129] L. Tang and H. Liu. Understanding group structures and properties in social media. In Link Mining: Models, Algorithms, and Applications, pages 163–185. Springer, 2010.
[130] L. Tang, H. Liu, J. Zhang, and Z. Nazeri. Community evolution in dynamic multi-mode networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’08, pages 677–685, 2008.
[131] L. Tang, X. Wang, and H. Liu. Uncoverning groups via heterogeneous interaction analysis. In Proceedings of the 2009 Ninth IEEE International Conference on Data Mining, ICDM ’09, pages 503–512, 2009.
[132] L. Tang, X. Wang, and H. Liu. Understanding emerging social structures: A group-profiling approach. Technical Report, 2010.
[133] M. Thelwall. Homophily in MySpace. Journal of the American Society for Information Science and Technology, 60(2):219–231, 2009.
[134] M. A. Vasconcelos, S. Ricci, J. Almeida, F. Benevenuto, and V. Almeida. Tips, dones and todos: uncovering user profiles in Foursquare. In Proceedings of the 5th ACM International Conference on Web Search and Data Mining, WSDM ’12, pages 653–662, 2012.
[135] A. Vedaldi and B. Fulkerson. VLFeat: an open and portable library of computer vision algorithms. In Proceedings of the International Conference on Multimedia, MM ’10, pages 1469–1472, 2010.
[136] U. Von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.
[137] G. Wang, Y. Shen, and M. Ouyang. A vector partitioning approach to detecting community structure in complex networks. Comput. Math. Appl., 55(12):2746–2752, 2008.
[138] J. Wang and B. Prabhala. Periodicity based next place prediction. In Nokia Mobile Data Challenge 2012 Workshop. p. Dedicated task, volume 2, 2012.
[139] K. Wang, J. Zhang, D. Li, X. Zhang, and T. Guo. Adaptive affinity propagation clustering. arXiv preprint arXiv:0805.1096, 2008.
[140] X. Wang, L. Tang, H. Gao, and H. Liu. Discovering overlapping groups in social media. In Proceedings of the 2010 IEEE International Conference on Data Mining, ICDM ’10, pages 569–578, 2010.
[141] Z. Wang, D. Zhang, D. Yang, Z. Yu, X. Zhou, and Z. Yu. Investigating city characteristics based on community profiling in LBSNs. In 2012 Second International Conference on Cloud and Green Computing, pages 578–585, 2012.
[142] C.-Y. Weng, W.-T. Chu, and J.-L. Wu. RoleNet: Movie analysis from the perspective of social networks. Multimedia, IEEE Transactions on, 11(2):256–271, 2009.
[143] A. W. Wolfe. Social network analysis: Methods and applications. American Ethnologist, 24(1):219–220, 1997.
[144] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma. Robust face recognition via sparse representation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 31(2):210–227, 2009.
[145] X. Wu, J. Yan, N. Liu, S. Yan, Y. Chen, and Z. Chen. Probabilistic latent semantic user segmentation for behavioral targeted advertising. In Proceedings of the 3rd International Workshop on Data Mining and Audience Intelligence for Advertising, ADKDD ’09, pages 10–17, 2009.
[146] J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In Computer Vision and Pattern Recognition, 2010 IEEE Conference on, pages 3485–3492, 2010.
[147] J. Xie, S. Kelley, and B. Szymanski. Overlapping community detection in networks: the state of the art and comparative study. arXiv preprint arXiv:1110.5813, 2011.
[148] L. Xiong, X. Chen, T. Huang, J. Schneider, and J. Carbonell. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. In SIAM, 2010.
[149] B. Xu, J. Bu, C. Chen, and D. Cai. An exploration of improving collaborative recommender systems via user-item subgroups. In Proceedings of the 21st International Conference on World Wide Web, WWW ’12, pages 21–30, 2012.
[150] M. Ye, P. Yin, W.-C. Lee, and D.-L. Lee. Exploiting geographical influence for collaborative point-of-interest recommendation. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’11, pages 325–334, 2011.
[151] J. J.-C. Ying, E. H.-C. Lu, W.-N. Kuo, and V. S. Tseng. Urban point-of-interest recommendation by mining user check-in behaviors. In Proceedings of the ACM SIGKDD International Workshop on Urban Computing, UrbComp ’12, pages 63–70, 2012.
[152] T. T. Yuan, Z. Chen, and M. Mathieson. Predicting eBay listing conversion. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’11, pages 1335–1336, 2011.
[153] S. Zhang, R.-S. Wang, and X.-S. Zhang. Identification of overlapping community structure in complex networks using fuzzy c-means clustering. Physica A: Statistical Mechanics and its Applications, 374(1):483–490, 2007.
[154] Y. Zheng, Q. Li, Y. Chen, X. Xie, and W.-Y. Ma. Understanding mobility based on GPS data. In Proceedings of the 10th International Conference on Ubiquitous Computing, pages 312–321. ACM, 2008.
[155] D. Zhou, S. A. Orshanskiy, H. Zha, and C. L. Giles. Co-ranking authors and documents in a heterogeneous network.
In Proceedings of the 7th IEEE International Conference on Data Mining, ICDM ’07, pages 739–744, 2007.
[156] D. Zhou, B. Wang, S. Rahimi, and X. Wang. A study of recommending locations on location-based social network by collaborative filtering. 7310:255–266, 2012.
[157] T. Zhou, J. Ren, M. Medo, and Y.-C. Zhang. Bipartite network projection and personal recommendation. Physical Review E, 76:046115, 2007.
[158] V. Zlatić, G. Ghoshal, and G. Caldarelli. Hypergraph topological quantities for tagged social networks. Physical Review E, 80(3):036118, 2009.

Appendix A Conditional Distribution in Gibbs Sampling

In this appendix, we give the updated conditional distributions used in Algorithm 3. According to the graphical model shown in Fig. 5.2 in the paper, the joint posterior distribution can be factorized as:

\[
\begin{aligned}
p(\mathbf{U},\mathbf{V},\mathbf{S},\mathbf{D},\mathbf{T},\tau_Q,\tau_B,\tau_R,\Theta_U,\Theta_V,\Theta_S,\Theta_D,\Theta_T \mid \mathbf{Q},\mathbf{R},\mathbf{B})
\propto{}& p(\mathbf{Q}\mid\mathbf{U},\mathbf{V},\mathbf{T},\tau_Q)\,p(\mathbf{R}\mid\mathbf{U},\mathbf{S},\tau_R)\,p(\mathbf{B}\mid\mathbf{V},\mathbf{D},\tau_B) \\
&\times p(\mathbf{U}\mid\Theta_U)\,p(\mathbf{V}\mid\Theta_V)\,p(\mathbf{S}\mid\Theta_S)\,p(\mathbf{D}\mid\Theta_D)\,p(\mathbf{T}\mid\Theta_T) \\
&\times p(\tau_Q)\,p(\tau_B)\,p(\tau_R)\,p(\Theta_U)\,p(\Theta_V)\,p(\Theta_S)\,p(\Theta_D)\,p(\Theta_T)
\end{aligned}
\tag{A.1}
\]

By plugging in all the model components described in Section 5.5.3.4 in the paper and carrying out marginalization for each variable, we derive the conditional distributions in the following subsections for hyperparameters and model parameters, respectively.

A.1 Conditional Distributions of Hyperparameters

In this subsection, we give details of the conditional distributions of the hyperparameters, which include precision variables and model parameters for latent variables.

Precision Variables

(1) $\tau_Q$

By using the conjugate prior for the precision $\tau_Q$, the conditional distribution of $\tau_Q$ given $\mathbf{Q}, \mathbf{U}, \mathbf{V}, \mathbf{T}$ also follows the Wishart distribution:

\[
p(\tau_Q \mid \mathbf{Q},\mathbf{U},\mathbf{V},\mathbf{T}) = \mathcal{W}(\tau_Q \mid W_1^*, v_1^*),
\tag{A.2}
\]

where $W_1^*$ and $v_1^*$ are the parameters of the posterior distribution and are updated as follows:

\[
\begin{aligned}
(W_1^*)^{-1} &= W_1^{-1} + \sum_{i=1}^{N}\sum_{j=1}^{M}\sum_{k=1}^{T} I_{ij}^{k}\,\big(Q_{ij}^{k} - \langle \mathbf{u}_i, \mathbf{v}_j, \mathbf{t}_k \rangle\big)^2, \\
v_1^* &= v_1 + \sum_{i=1}^{N}\sum_{j=1}^{M}\sum_{k=1}^{T} I_{ij}^{k}.
\end{aligned}
\]
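The update for the precision can be sketched in a few lines. Because the precision here is a scalar, the Wishart distribution is one-dimensional, and a one-dimensional Wishart with scale W and v degrees of freedom is a Gamma distribution with shape v/2 and scale 2W; the sketch relies on that identity. The helper names, the toy factors and the prior values are hypothetical, and the three-way inner product is assumed to be the sum over dimensions of the element-wise product of the three vectors.

```python
import random


def three_way_inner(u, v, t):
    """<u, v, t>: sum over latent dimensions of u_d * v_d * t_d (assumed form)."""
    return sum(a * b * c for a, b, c in zip(u, v, t))


def precision_posterior(W1, v1, observations, U, V, T):
    """Posterior Wishart parameters (W1_star, v1_star) for the scalar precision.

    observations: list of (i, j, k, q) tuples, one per observed entry,
    i.e. the entries whose indicator I_ij^k equals 1.
    """
    squared_error = 0.0
    for i, j, k, q in observations:
        squared_error += (q - three_way_inner(U[i], V[j], T[k])) ** 2
    W1_star = 1.0 / (1.0 / W1 + squared_error)
    v1_star = v1 + len(observations)
    return W1_star, v1_star


def sample_precision(W_star, v_star, rng):
    """Draw from a 1-D Wishart(W_star, v_star) via Gamma(v/2, scale=2W)."""
    return rng.gammavariate(v_star / 2.0, 2.0 * W_star)


# Toy latent factors with two dimensions (hypothetical values).
U = {0: [1.0, 0.5]}
V = {0: [2.0, 1.0]}
T = {0: [1.0, 1.0], 1: [0.5, 2.0]}
observations = [(0, 0, 0, 3.0), (0, 0, 1, 2.0)]

W1_star, v1_star = precision_posterior(1.0, 1, observations, U, V, T)
tau = sample_precision(W1_star, v1_star, random.Random(0))
print(W1_star, v1_star, tau)
```

The same two functions would serve for the other two precision variables by swapping the three-way inner product for the corresponding two-way product and iterating over the observed entries of the relevant matrix.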
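(2) $\tau_R$

Similarly, we derive the conditional distribution of $\tau_R$ given $\mathbf{S}, \mathbf{U}, \mathbf{R}$ as follows:

\[
p(\tau_R \mid \mathbf{R},\mathbf{U},\mathbf{S}) = \mathcal{W}(\tau_R \mid W_1^*, v_1^*),
\tag{A.3}
\]

where

\[
\begin{aligned}
(W_1^*)^{-1} &= W_1^{-1} + \sum_{r=1}^{N}\sum_{i=1}^{N} I_{ri}^{R}\,\big(R_{ri} - \mathbf{s}_r^{T}\mathbf{u}_i\big)^2, \\
v_1^* &= v_1 + \sum_{r=1}^{N}\sum_{i=1}^{N} I_{ri}^{R}.
\end{aligned}
\tag{A.4}
\]

(3) $\tau_B$

The conditional distribution of $\tau_B$ given $\mathbf{B}, \mathbf{V}, \mathbf{D}$ is:

\[
p(\tau_B \mid \mathbf{B},\mathbf{V},\mathbf{D}) = \mathcal{W}(\tau_B \mid W_1^*, v_1^*),
\tag{A.5}
\]

where

\[
\begin{aligned}
(W_1^*)^{-1} &= W_1^{-1} + \sum_{j=1}^{M}\sum_{l=1}^{M} I_{jl}^{B}\,\big(B_{jl} - \mathbf{v}_j^{T}\mathbf{d}_l\big)^2, \\
v_1^* &= v_1 + \sum_{j=1}^{M}\sum_{l=1}^{M} I_{jl}^{B}.
\end{aligned}
\tag{A.6}
\]

Hyperparameters for Model Variables

Next, we work out the conditional distributions for $\Theta_U \equiv \{\mu_U, \Lambda_U\}$, $\Theta_V \equiv \{\mu_V, \Lambda_V\}$, $\Theta_S \equiv \{\mu_S, \Lambda_S\}$, $\Theta_D \equiv \{\mu_D, \Lambda_D\}$ and $\Theta_T \equiv \{\mu_T, \Lambda_T\}$.

(1) $\Theta_U$

$\Theta_U$ is conditionally independent of all the other parameters given $\mathbf{U}$. We thus integrate out all the random variables in Eq. (A.1) except $\mathbf{U}$ and obtain the Gaussian-Wishart distribution:

\[
p(\Theta_U \mid \mathbf{U}) = \mathcal{N}\big(\mu_U \mid \mu_0^*, (\beta^*\Lambda_U)^{-1}\big)\,\mathcal{W}(\Lambda_U \mid W_0^*, v_0^*),
\tag{A.7}
\]

where the parameters are updated as follows:

\[
\begin{aligned}
\mu_0^* &= \frac{\beta\mu_0 + N\bar{\mathbf{u}}}{\beta + N}, \qquad \beta^* = \beta + N, \qquad v_0^* = v_0 + N, \\
(W_0^*)^{-1} &= W_0^{-1} + N\Phi + \frac{\beta N}{\beta + N}\,(\mu_0 - \bar{\mathbf{u}})(\mu_0 - \bar{\mathbf{u}})^{T},
\end{aligned}
\tag{A.8}
\]

where $\bar{\mathbf{u}} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{u}_i$ and $\Phi = \frac{1}{N}\sum_{i=1}^{N}(\mathbf{u}_i - \bar{\mathbf{u}})(\mathbf{u}_i - \bar{\mathbf{u}})^{T}$.

(2) $\Theta_V$

Similarly, the conditional distribution for $\Theta_V$ is:

\[
p(\Theta_V \mid \mathbf{V}) = \mathcal{N}\big(\mu_V \mid \mu_0^*, (\beta^*\Lambda_V)^{-1}\big)\,\mathcal{W}(\Lambda_V \mid W_0^*, v_0^*),
\tag{A.9}
\]

where

\[
\begin{aligned}
\mu_0^* &= \frac{\beta\mu_0 + M\bar{\mathbf{v}}}{\beta + M}, \qquad \beta^* = \beta + M, \qquad v_0^* = v_0 + M, \\
(W_0^*)^{-1} &= W_0^{-1} + M\Phi + \frac{\beta M}{\beta + M}\,(\mu_0 - \bar{\mathbf{v}})(\mu_0 - \bar{\mathbf{v}})^{T},
\end{aligned}
\tag{A.10}
\]

where $\bar{\mathbf{v}} = \frac{1}{M}\sum_{j=1}^{M}\mathbf{v}_j$ and $\Phi = \frac{1}{M}\sum_{j=1}^{M}(\mathbf{v}_j - \bar{\mathbf{v}})(\mathbf{v}_j - \bar{\mathbf{v}})^{T}$.

(3) $\Theta_S$

The conditional distribution for $\Theta_S$ is:

\[
p(\Theta_S \mid \mathbf{S}) = \mathcal{N}\big(\mu_S \mid \mu_0^*, (\beta^*\Lambda_S)^{-1}\big)\,\mathcal{W}(\Lambda_S \mid W_0^*, v_0^*),
\tag{A.11}
\]

where

\[
\begin{aligned}
\mu_0^* &= \frac{\beta\mu_0 + N\bar{\mathbf{s}}}{\beta + N}, \qquad \beta^* = \beta + N, \qquad v_0^* = v_0 + N, \\
(W_0^*)^{-1} &= W_0^{-1} + N\Phi + \frac{\beta N}{\beta + N}\,(\mu_0 - \bar{\mathbf{s}})(\mu_0 - \bar{\mathbf{s}})^{T},
\end{aligned}
\tag{A.12}
\]

where $\bar{\mathbf{s}} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{s}_i$ and $\Phi = \frac{1}{N}\sum_{i=1}^{N}(\mathbf{s}_i - \bar{\mathbf{s}})(\mathbf{s}_i - \bar{\mathbf{s}})^{T}$.

(4) $\Theta_D$

The conditional distribution for $\Theta_D$ is:

\[
p(\Theta_D \mid \mathbf{D}) = \mathcal{N}\big(\mu_D \mid \mu_0^*, (\beta^*\Lambda_D)^{-1}\big)\,\mathcal{W}(\Lambda_D \mid W_0^*, v_0^*),
\tag{A.13}
\]

where

\[
\begin{aligned}
\mu_0^* &= \frac{\beta\mu_0 + M\bar{\mathbf{d}}}{\beta + M}, \qquad \beta^* = \beta + M, \qquad v_0^* = v_0 + M, \\
(W_0^*)^{-1} &= W_0^{-1} + M\Phi + \frac{\beta M}{\beta + M}\,(\mu_0 - \bar{\mathbf{d}})(\mu_0 - \bar{\mathbf{d}})^{T},
\end{aligned}
\]

where $\bar{\mathbf{d}} = \frac{1}{M}\sum_{l=1}^{M}\mathbf{d}_l$ and $\Phi = \frac{1}{M}\sum_{l=1}^{M}(\mathbf{d}_l - \bar{\mathbf{d}})(\mathbf{d}_l - \bar{\mathbf{d}})^{T}$.

(5) $\Theta_T$

Finally, the conditional distribution of $\Theta_T$ also follows the Gaussian-Wishart distribution:

\[
p(\Theta_T \mid \mathbf{T}) = \mathcal{N}\big(\mu_T \mid \mu_1^*, (\beta^*\Lambda_T)^{-1}\big)\,\mathcal{W}(\Lambda_T \mid W_0^*, v_0^*),
\tag{A.14}
\]

where

\[
\begin{aligned}
\mu_1^* &= \frac{\beta\mu_1 + \mathbf{t}_1}{\beta + 1}, \qquad \beta^* = \beta + 1, \qquad v_0^* = v_0 + T, \\
(W_0^*)^{-1} &= W_0^{-1} + \sum_{k=2}^{T}(\mathbf{t}_k - \mathbf{t}_{k-1})(\mathbf{t}_k - \mathbf{t}_{k-1})^{T} + \frac{\beta}{\beta + 1}\,(\mathbf{t}_1 - \mu_1)(\mathbf{t}_1 - \mu_1)^{T}.
\end{aligned}
\tag{A.15}
\]

A.2 Conditional Distributions of Model Variables

In this subsection, we give details of the conditional distributions of the model parameters $\mathbf{U}, \mathbf{V}, \mathbf{S}, \mathbf{D}, \mathbf{T}$.

(1) $\mathbf{U}$

The conditional distribution of $\mathbf{U}$ can be factorized with respect to each individual user:

\[
p(\mathbf{U} \mid \mathbf{Q},\mathbf{V},\mathbf{T},\mathbf{R},\mathbf{S},\tau_Q,\tau_R,\Theta_U) = \prod_{i=1}^{N} p(\mathbf{u}_i \mid \mathbf{Q},\mathbf{V},\mathbf{T},\mathbf{R},\mathbf{S},\tau_Q,\tau_R,\Theta_U)
\tag{A.16}
\]

and

\[
p(\mathbf{u}_i \mid \mathbf{Q},\mathbf{V},\mathbf{T},\mathbf{R},\mathbf{S},\tau_Q,\tau_R,\Theta_U) = \mathcal{N}\big(\mathbf{u}_i \mid \mu_i^*, (\Lambda_i^*)^{-1}\big),
\tag{A.17}
\]

where

\[
\begin{aligned}
\mu_i^* &= (\Lambda_i^*)^{-1}\Big(\Lambda_U\mu_U + \tau_Q\sum_{j=1}^{M}\sum_{k=1}^{T} I_{ij}^{k} Q_{ij}^{k}\,\mathbf{a}_{jk} + \tau_R\sum_{r=1}^{N} I_{ri}^{R} R_{ri}\,\mathbf{s}_r\Big), \\
\Lambda_i^* &= \Lambda_U + \tau_Q\sum_{j=1}^{M}\sum_{k=1}^{T} I_{ij}^{k}\,\mathbf{a}_{jk}\mathbf{a}_{jk}^{T} + \tau_R\sum_{r=1}^{N} I_{ri}^{R}\,\mathbf{s}_r\mathbf{s}_r^{T},
\end{aligned}
\]

where $\mathbf{a}_{jk} = \mathbf{v}_j \circ \mathbf{t}_k$ is the element-wise product of $\mathbf{v}_j$ and $\mathbf{t}_k$.

(2) $\mathbf{V}$

Similarly, the conditional distribution of $\mathbf{V}$ can be factorized with respect to each venue as follows:

\[
p(\mathbf{v}_j \mid \mathbf{Q},\mathbf{U},\mathbf{T},\mathbf{B},\mathbf{D},\tau_Q,\tau_B,\Theta_V) = \mathcal{N}\big(\mathbf{v}_j \mid \mu_j^*, (\Lambda_j^*)^{-1}\big),
\tag{A.18}
\]

where

\[
\begin{aligned}
\mu_j^* &= (\Lambda_j^*)^{-1}\Big(\Lambda_V\mu_V + \tau_Q\sum_{i=1}^{N}\sum_{k=1}^{T} I_{ij}^{k} Q_{ij}^{k}\,\mathbf{f}_{ik} + \tau_B\sum_{l=1}^{M} I_{jl}^{B} B_{jl}\,\mathbf{d}_l\Big), \\
\Lambda_j^* &= \Lambda_V + \tau_Q\sum_{i=1}^{N}\sum_{k=1}^{T} I_{ij}^{k}\,\mathbf{f}_{ik}\mathbf{f}_{ik}^{T} + \tau_B\sum_{l=1}^{M} I_{jl}^{B}\,\mathbf{d}_l\mathbf{d}_l^{T},
\end{aligned}
\]

where $\mathbf{f}_{ik} = \mathbf{u}_i \circ \mathbf{t}_k$ is the element-wise product of $\mathbf{u}_i$ and $\mathbf{t}_k$.

(3) $\mathbf{S}$

The conditional distribution of $\mathbf{s}_r$ is:

\[
p(\mathbf{s}_r \mid \mathbf{R},\mathbf{U},\tau_R,\Theta_S) = \mathcal{N}\big(\mathbf{s}_r \mid \mu_r^*, (\Lambda_r^*)^{-1}\big),
\tag{A.19}
\]

where

\[
\begin{aligned}
\mu_r^* &= (\Lambda_r^*)^{-1}\Big(\Lambda_S\mu_S + \tau_R\sum_{i=1}^{N} I_{ri}^{R} R_{ri}\,\mathbf{u}_i\Big), \\
\Lambda_r^* &= \Lambda_S + \tau_R\sum_{i=1}^{N} I_{ri}^{R}\,\mathbf{u}_i\mathbf{u}_i^{T}.
\end{aligned}
\]

(4) $\mathbf{D}$

The conditional distribution of $\mathbf{d}_l$ is:

\[
p(\mathbf{d}_l \mid \mathbf{B},\mathbf{V},\tau_B,\Theta_D) = \mathcal{N}\big(\mathbf{d}_l \mid \mu_l^*, (\Lambda_l^*)^{-1}\big),
\tag{A.20}
\]

where

\[
\begin{aligned}
\mu_l^* &= (\Lambda_l^*)^{-1}\Big(\Lambda_D\mu_D + \tau_B\sum_{j=1}^{M} I_{jl}^{B} B_{jl}\,\mathbf{v}_j\Big), \\
\Lambda_l^* &= \Lambda_D + \tau_B\sum_{j=1}^{M} I_{jl}^{B}\,\mathbf{v}_j\mathbf{v}_j^{T}.
\end{aligned}
\]

(5) $\mathbf{T}$

Finally, the conditional distribution of $\mathbf{t}_k$ also follows the Gaussian distribution:

\[
p(\mathbf{t}_k \mid \mathbf{Q},\mathbf{U},\mathbf{V},\mathbf{t}_{-k},\tau_Q,\Theta_T) = \mathcal{N}\big(\mathbf{t}_k \mid \mu_k^*, (\Lambda_k^*)^{-1}\big),
\tag{A.21}
\]

where $\mathbf{t}_{-k}$ denotes all the time feature vectors except $\mathbf{t}_k$.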
The mean vectors and the precision matrices are updated as follows:

\[
\mu_k^* =
\begin{cases}
\dfrac{\mathbf{t}_2 + \mu_T}{2}, & \text{if } k = 1, \\[2mm]
(\Lambda_k^*)^{-1}\Big(\Lambda_T(\mathbf{t}_{k-1} + \mathbf{t}_{k+1}) + \tau_Q\sum_{i=1}^{N}\sum_{j=1}^{M} I_{ij}^{k} Q_{ij}^{k}\,\mathbf{x}_{ij}\Big), & \text{if } 2 \le k < T, \\[2mm]
(\Lambda_k^*)^{-1}\Big(\Lambda_T\mathbf{t}_{k-1} + \tau_Q\sum_{i=1}^{N}\sum_{j=1}^{M} I_{ij}^{k} Q_{ij}^{k}\,\mathbf{x}_{ij}\Big), & \text{if } k = T,
\end{cases}
\]

\[
\Lambda_k^* =
\begin{cases}
2\Lambda_T + \tau_Q\sum_{i=1}^{N}\sum_{j=1}^{M} I_{ij}^{k}\,\mathbf{x}_{ij}\mathbf{x}_{ij}^{T}, & \text{if } k < T, \\[2mm]
\Lambda_T + \tau_Q\sum_{i=1}^{N}\sum_{j=1}^{M} I_{ij}^{k}\,\mathbf{x}_{ij}\mathbf{x}_{ij}^{T}, & \text{if } k = T,
\end{cases}
\]

where $\mathbf{x}_{ij} = \mathbf{u}_i \circ \mathbf{v}_j$ is the element-wise product of $\mathbf{u}_i$ and $\mathbf{v}_j$.

[...]

... exercising preferences in terms of venues and times: some prefer jogging in the morning in their neighbourhoods; some like to exercise during weekends in nature parks and the others may prefer to exercise in the gyms after work. The inherent heterogeneous user preferences make it hard to interpret the connections between people in social networks. Towards gaining insights on the underlying users’ interests, Tang...

... foreign check-ins
      New York City   Chicago   Singapore   London
N            26,411     7,138       8,033    6,320
ML           64,249    36,164      50,722   25,031
CL          448,072   353,290     406,490  258,605
MF          301,782   120,940      20,940   66,031
CF          810,545   341,651      36,874  158,605

where $C_k^b$, $k = 1, 2, \cdots$ are communities at region b and $s(\cdot, \cdot)$ is the cosine similarity between two communities’ sparse representations.

5.7 Experiments

In this section,...

Table 5.3 (continued)
Notation                    Explanation
$T$                         the set of location-independent time periods
$C^g$                       the set of check-ins in geographical region g
$u_i^g$                     the ith user in geographical region g
$v_i^g$                     the ith venue in geographical region g
$t_i$                       the ith location-independent time period
$(u_i^g, v_j^g, t_k)$       the ith user visits the jth venue in geographical region g during...
simultaneously considering all users as potential community centres and then keep exchanging messages among them until a good set of communities emerges [39]. In this work, to avoid parameter tuning, we use adaptive affinity propagation (AAP), which improves AP by automatically adjusting the damping factor and preference during the learning process [139]. Given the set of interest communities detected in each geographical...

... locally interesting venues in a, and $N_b$ is the number of users in b.

5.5 Social Dimensions Extraction

In LBSNs, users exhibit heterogeneous visiting behaviors, which naturally classify them into different interest groups, such as food lovers, shoppers, etc. In addition, even within similar interest groups, people exhibit different preferences. For example, sports lovers may have different exercising preferences...

... 0.3351)

Figure 5.5: Comparison between the popular venue distributions in London (JS Divergence = 0.4066). Panels: (a) Total check-ins, (b) Sampled check-ins.

Figure 5.6: Comparison between the popular venue distributions in Singapore (JS Divergence = 0.2841). Panels: (a) Total check-ins, (b) Sampled check-ins.

5.7.1 Dataset Reliability/Representativeness

We conduct the experiments on the dataset described in Section...

... the dataset is reproduced in Table 5.4. In this section, we aim to investigate whether the check-ins we obtained are reliable. This is because we sample the Foursquare check-ins by using the Twitter streaming API, while not all users share their check-ins through Twitter. To verify that the check-ins we sampled are reliable, we count the number of check-ins for each venue of interest and compare it...
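The figure captions above report a JS (Jensen-Shannon) divergence between the venue distribution of the total check-ins and that of the sampled check-ins. A minimal sketch of that comparison is given below, assuming the two inputs are per-venue check-in counts listed in the same venue order. The logarithm base (2, which bounds the value between 0 and 1) and the sample counts are assumptions, since the excerpt does not state how the divergence was computed.

```python
import math


def normalize(counts):
    """Turn raw per-venue counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(counts_a, counts_b):
    """Jensen-Shannon divergence between two count vectors over the same venues."""
    p = normalize(counts_a)
    q = normalize(counts_b)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical check-in counts for five venues.
total_checkins = [500, 300, 120, 60, 20]
sampled_checkins = [40, 35, 10, 10, 5]
print(round(js_divergence(total_checkins, sampled_checkins), 4))
```

A value close to 0 means the sampled check-ins follow nearly the same venue distribution as the totals, while a value close to 1 means the two distributions barely overlap.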
community matching

5.6.1 Local Community Profiling

With the extracted users’ underlying social dimensions, we seek to first group them according to their latent interests at the regional level. There are several approaches to detecting communities or dense subgroups, such as clustering or community detection. However, we do not know the number of communities in a region beforehand. Also the number of interest...

Notation       Explanation
$G^g$          the undirected social network graph in geographical region g
$E_1^g$        the edge set representing the social relations between users in geographical region g
$R^g$          the adjacency matrix representing the social relations between users in geographical region g
$H^g$          the undirected social network graph in geographical region g
$E_2^g$        the edge set representing the affinity relations between venues in geographical region...

... $k_a$, $k_b$ are the numbers of local interest communities in regions a and b, respectively. The joint community representation of the communities at these two regions is then $C^{ab} = [C^a \; C^b] \in \mathbb{R}^{l \times (k_a + k_b)}$. People usually have multiple interests with different strengths. For example, most of the tourists are interested in local food sampling and shopping, but some of them are more interested in food while others
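The excerpts above mention comparing communities from two regions through the cosine similarity s(·, ·) of their representations. A minimal sketch of that matching step follows, assuming each community is already summarized as a fixed-length vector over shared latent interest factors. The community names and vectors are hypothetical, and the sketch uses plain dense vectors rather than the sparse representations the text refers to.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_communities(region_a, region_b):
    """Map each community in region a to its most similar community in region b."""
    matches = {}
    for name_a, vec_a in region_a.items():
        best_name = max(region_b, key=lambda name_b: cosine(vec_a, region_b[name_b]))
        matches[name_a] = (best_name, cosine(vec_a, region_b[best_name]))
    return matches


# Hypothetical community vectors over three latent interest factors.
region_a = {"a_food": [0.9, 0.1, 0.0], "a_shopping": [0.1, 0.8, 0.1]}
region_b = {"b_shopping": [0.0, 1.0, 0.0], "b_food": [1.0, 0.0, 0.2]}

for name, (partner, score) in match_communities(region_a, region_b).items():
    print(name, "->", partner, round(score, 3))
```

Matching each community independently keeps the sketch short; a one-to-one assignment between regions would need an additional step that this example leaves out.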