Three local subsets were selected to further evaluate the integrated product by comparing it with the 1:1000000 scale China vegetation map and with classification products based on Landsat-7 satellite data (Fig. 3). Generally, the land cover pattern of the integrated product in each subset appears similar to the China vegetation map and the Landsat-7 classification. The integrated product was validated against the Landsat-7 classification. Its accuracy was 53.4%, 52.8%, and 55.3% for regions A, B, and C, respectively, an improvement of about 8.2% to 24.8%. For region A in the Tibetan Plateau, the distribution pattern of the land cover classes in the integrated product agreed well with the Landsat-based classification and the national vegetation map, whereas the MC51 product apparently underestimated grassland. For region B in the boundary area of the Daxin’anling Mountains, the MC51 product significantly overestimated croplands, and the transition zone was not clear. The extent of cropland in the integrated product is consistent with reality, and forest types were also classified with high accuracy. For region C in the Sanjiang Plain, the MC51 product underestimated water and permanent wetlands and overestimated grassland, whereas in the integrated product these classes agreed well with the reference data. Overall, our integrated product achieved greater overall accuracy than the MC51 product. However, some land cover classes were overestimated to some degree in some regions. Interestingly, where grassland or wetlands were overestimated in the local product, they were also overestimated in the integrated product. Apparently, the accuracy of the participant products also affects the quality of the integrated product.
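The per-region accuracies quoted above are agreement rates between the integrated product and the Landsat-7 classification. As a minimal sketch of how such an overall accuracy can be computed, assuming both maps are available as equal-length sequences of class labels per pixel (the function name and the sample labels below are hypothetical and are not data from the study):

```python
def overall_accuracy(predicted, reference):
    """Fraction of pixels whose predicted class matches the reference class."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    if not predicted:
        raise ValueError("no pixels to compare")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(predicted)


# Hypothetical class labels for six pixels (illustration only).
integrated = ["grass", "grass", "crop", "forest", "water", "grass"]
landsat = ["grass", "crop", "crop", "forest", "water", "wetland"]

# Four of the six pixels agree.
print(f"{overall_accuracy(integrated, landsat):.1%}")
```

Class-specific accuracy can be obtained the same way by restricting the comparison to the pixels of one class.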
672 H. Gao et al.
Fig. 3. Comparison of the three local subsets among the four land cover maps. (a) Subset A in the Tibetan Plateau of northwest China; (b) Subset B in the Daxin’anling Mountains of northeast China; (c) Subset C in the Sanjiang Plain of northeast China.
4 Discussion and Conclusion
A method based on the accuracy of multi-satellite land cover products was proposed and applied to integrate a land cover product for 2001 over China. The accuracy of the MODIS, GLC2000, WESTDC, and NLCD2000 land cover products was assessed with validation points to establish the integration rules. Generally, the integrated product reliably captured the distribution patterns of land cover. The overall accuracy of the integrated product was 68.7%, a major improvement over the original MC51 product, and most class-specific accuracies increased. Additionally, the distribution patterns of land cover in our integrated product showed good agreement with the Landsat-based classification and the 1:1000000 scale national vegetation map. The quality of the input land cover products is critical in our integration method: the accuracy of the integrated product depends on the accuracy of the participant land cover products. Local land cover products played an important role in the integration process and contributed significantly. Major improvements in classification accuracy were found in classes that are supported by the local products.
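The integration rules themselves are not spelled out in this section, so the following is only an illustrative sketch of one way accuracy-based integration could work, not the authors' actual rules. It assumes that each product proposes a class for a pixel and that each proposal is weighted by that product's validated accuracy for the proposed class. The function name, the tie-break, and all accuracy values are hypothetical.

```python
from collections import defaultdict


def integrate_pixel(labels, class_accuracy):
    """Pick the class with the largest summed accuracy weight across products.

    labels: {product_name: class_label} for one pixel
    class_accuracy: {product_name: {class_label: accuracy in [0, 1]}}
    """
    support = defaultdict(float)
    for product, label in labels.items():
        support[label] += class_accuracy[product].get(label, 0.0)
    # Iterate classes alphabetically so that ties resolve deterministically.
    return max(sorted(support), key=lambda label: support[label])


# Hypothetical class-specific accuracies (not values from the study).
class_accuracy = {
    "MODIS": {"cropland": 0.55, "grassland": 0.60},
    "GLC2000": {"cropland": 0.50, "grassland": 0.65},
    "NLCD2000": {"cropland": 0.85, "grassland": 0.70},
}

pixel_a = {"MODIS": "grassland", "GLC2000": "grassland", "NLCD2000": "cropland"}
pixel_b = {"MODIS": "cropland", "GLC2000": "grassland", "NLCD2000": "cropland"}

print(integrate_pixel(pixel_a, class_accuracy))  # grassland (1.25 vs 0.85)
print(integrate_pixel(pixel_b, class_accuracy))  # cropland (1.40 vs 0.65)
```

Under a scheme like this, a product with high accuracy for a class dominates the decision for that class, which is consistent with the observation that classes supported by the local products improved the most.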
The integration rules in our study are established in terms of accuracy validated against ground truth, which makes them more objective and credible, but some limitations still need to be considered. First, in preprocessing, the LCCS classification scheme used in the GLC2000 product was translated to the IGBP classification scheme according to the recommended relationship. Meanwhile, products with different spatial resolutions were resampled to the same spatial resolution, which may have introduced additional uncertainty. Second, the accuracy of the participant land cover products needs to be further validated as additional reference data become available. Our results imply that high accuracy of the participant products could greatly improve the quality of the integrated product. Finally, the integrated product also needs more rigorous validation against ground truth. As high-quality land cover products become available in the future, we also expect to improve our methods to produce more accurate and up-to-date land cover products for climate modeling.
Acknowledgments. The study was supported by the National Key Research and Development Program of China (Grant No. 2016YFA0600303). We thank the Land Processes Distributed Active Archive Center (LP DAAC), the Joint Research Centre (JRC) of Global Vegetation Monitoring Units, the Data Sharing Infrastructure of Earth System Science, and the Environmental & Ecological Science Data Center for West China, National Natural Science Foundation of China, for their helpful responses to our inquiries about satellite products.
Generate Integrated Land Cover Product for Regional Climate Model 675
Security and Privacy in Collaborative System: Workshop on Social Network Analysis
A Novel Social Search Model Based on Clustering Friends in LBSNs
Yang Sun, Jiuxin Cao(✉), Tao Zhou, and Shuai Xu
Jiangsu Provincial Key Laboratory of Network and Information Security, School of Computer Science and Engineering, Southeast University, Nanjing, China
{sunyang,jx.cao,zhoutao,xushuai}@seu.edu.cn
Abstract. With the development of online social networks (OSNs), OSNs have become an indispensable part of people’s lives. People tend to search for information through OSNs rather than traditional search engines. Especially with the appearance of location-based social networks (LBSNs), social search in LBSNs is increasingly important in the burgeoning mobile trend. This paper proposes a novel social search model that harnesses users’ social relationships and the location features provided by LBSNs to design a ranking algorithm. The algorithm combines three kinds of ranking scores, Social Score (based on social influence), Searching Score (based on professional relevance) and Spatial Score (based on distance), to produce high-quality search results. On receiving a user’s query, the social search engine aims to return a ranked list of POIs (points of interest) that satisfies the user. The dataset is extracted from Foursquare, a real-world LBSN. The experimental results show that the ranking algorithm clearly benefits the social search model in LBSNs.
Keywords: LBSNs · Social search model · Social Score · Searching Score · Spatial Score
1 Introduction
In the past few years, with rapid development in the mobile field, location-based social network services such as Foursquare and Yelp have seen increasing popularity, attracting millions of users. Supported by portable devices such as smartphones and location-aware technologies such as GPS and Wi-Fi, people can easily share their locations, comments and other information with other users. LBSN [1] services not only help users strengthen their social connections, but also provide useful search information.
Information retrieval and knowledge discovery are the main purposes of web search. Because information on the web is rapidly updated and readily available, users usually rely on search engines to obtain it. Searching has long been considered an individual activity [2] in traditional search engines like Google; however, with the popularity of OSNs, people are pursuing personalized searching and mass collaboration. Social search [3] could meet these needs by helping users quickly and accurately find the right people (friends, other similar users or domain experts) to answer questions. Recently, Facebook has partnered with Bing and introduced a social search engine called “Graph Search” [4] that associates the results with friends’ suggestions.
© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2017 S. Wang and A. Zhou (Eds.): CollaborateCom 2016, LNICST 201, pp. 679–689, 2017.
DOI: 10.1007/978-3-319-59288-6_68
Applying social search to LBSNs is an appealing trend. When users search for the nearest POI using the visiting experiences of friends provided by LBSNs, exploiting location information in addition to traditional social information could make search results more accurate. For example, suppose a user wants to find a suitable restaurant for dinner but is not familiar with the surrounding restaurants. All nearby restaurants are then candidates, and it is better to pick one that is close to the user and has received high evaluations from the user’s friends. On the one hand, the search results are of high quality: the picked restaurants are both nearby and highly rated, which is better than traditional social search that considers only social relations. On the other hand, the social search engine provides believable results: users are more inclined to trust and choose POIs when shown friends’ experiences or evaluations.
Considering these problems, in this paper we propose a social search model and design a novel ranking algorithm. The dataset is extracted from Foursquare, which is a heterogeneous network, and the data is quite sparse. Sparse data can strongly affect the accuracy of results. To enhance the data density, we cluster users’ friends. Based on the clustered friends, the ranking algorithm considers Social Score, Searching Score and Spatial Score together. Social Score measures social influence, where social features include not only traditional social relationships but also location features. Searching Score measures professional relevance, that is, the similarity between the query and POIs. Spatial Score is based on the distance between the locations of users and POIs: the shorter the distance, the better the score.
The contributions of this paper can be summarized as follows:
1. To enhance the data density and reduce the influence of sparse data, we cluster users’ friends; as far as we know, this is the first time such clustering has been applied in research on social search in LBSNs;
2. To obtain high-quality ranking results and account for real-world distance, we take Searching Score and Spatial Score into account in addition to the traditional Social Score.
The rest of this paper is organized as follows. Section 2 reviews related work on social search in OSNs and LBSNs. An overview of the social search model is given in Sect. 3. The details of the ranking algorithm are presented in Sect. 4. Section 5 describes the validation of our model. Finally, we conclude this paper and state several directions for future work in Sect. 6.
2 Related Work
With the increasing popularity of social networking platforms, social search is attracting significant research interest, since traditional search engines do not always provide high-quality search results. However, social search is personalized, so there are different social search engines and social search algorithms [5], and much work has been done from different starting points.
Some research concentrates on the problem of designing social search engines. HeyStaks [6, 7] is an Irish social search engine; it applies recommendation technology on top of Google, Bing and Yahoo based on users’ interests and reputation, and then returns search results from Twitter and OSNs. M. R. Bouadjenek, H. Hacid and M. Bouzeghoub [8] introduce a social search engine called LAICOS, which includes social information and personal services. On the one hand, it can provide personalized social document representations; on the other hand, users can use its personalized social query expansion framework to expand the search process. Horowitz and Kamvar [9] design a large-scale social search engine called Aardvark. They use an intimacy metric between users and route specific questions to the user who is most likely to be able to answer them. The intimacy is computed from many features, including vocabulary match, profile similarity and social connection.
Other research focuses on the problem of improving social search algorithms. D. Sharma et al. [10] present a self-adapting social search algorithm based on proximity, similarity and interaction. Bao et al. [11] explore the use of social annotations; they propose SocialPageRank to measure page popularity based on annotations and SocialSimRank to measure the similarity between social annotations and web queries. Guo Liang et al. [12] present two ranking algorithms: topic relevance rank (TRR) evaluates users’ professional score on the relevant topics, and social relation rank (SRR) captures the strength of the social relation between users.
However, there is relatively little research on social search in LBSNs. Hu et al. [13] define the friends-based k nearest neighbors (F-KNN) query, which aims at finding objects that are near the query location and have received high evaluations from the user’s friends. However, they focus mainly on increasing search speed, for which they design an F-Quadtree index, and their approach does not perform well on search accuracy based on social features. Yuan et al. [14] propose a KNN search on road networks that incorporates social influence, but computing the social influence over large road and social networks is slow. In contrast to the above works, our research aims to design a social search model that provides accurate results quickly.
3 The Social Search Model
The social search model is shown in Fig. 1. Vertically, like traditional search engines, the whole architecture is divided into two parts: offline crawler and online searching.
Horizontally, there are three main parts: Social Score, Searching Score and Spatial Score. The different functions of these components are described below.
Database. The database provides the data basis of the social search architecture. The dataset is crawled and extracted from Foursquare. The data types include users’ IDs and relations; information on the POIs that users search for, including each POI’s name, ID, category, description, latitude and longitude; check-in IDs; and time zone.
Searching Score. In this part, we design a search engine based on Lucene [15]. The core of the search engine is an inverted index [16], which we build from the POIs’ information. Then, by calculating the similarity scores between the user’s query and each “document” in the index, the alternative POIs’ names are picked out and sent to the Social Score and Spatial Score components. The similarity scores are sent to the Ranking Engine.
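To make the inverted index idea concrete, the following is a minimal sketch in plain Python rather than Lucene. It assumes whitespace tokenization and a simple term-overlap score (the fraction of distinct query terms found in a POI's text); Lucene's real analyzers and similarity functions are considerably more elaborate. All function names and POI records are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple analyzer)."""
    return text.lower().split()


def build_inverted_index(pois):
    """Map each term to the set of POI ids whose text contains it."""
    index = defaultdict(set)
    for poi_id, text in pois.items():
        for term in tokenize(text):
            index[term].add(poi_id)
    return index


def search(index, query):
    """Score each POI by the fraction of distinct query terms it contains."""
    terms = set(tokenize(query))
    hits = defaultdict(int)
    for term in terms:
        for poi_id in index.get(term, ()):
            hits[poi_id] += 1
    return {poi_id: count / len(terms) for poi_id, count in hits.items()}


# Hypothetical POI "documents" built from name and category.
pois = {
    "p1": "Starbucks coffee shop",
    "p2": "Blue Bottle coffee",
    "p3": "City library",
}
index = build_inverted_index(pois)
results = search(index, "Starbucks coffee")
print(results)  # p1 matches both query terms, p2 matches one, p3 is absent
```

Only POIs that share at least one term with the query appear in the result, which corresponds to the step of picking out the alternative POIs before the other two scores are computed.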
Social Score. This part provides two offline algorithms, an updated K-means and an updated KNN. The updated K-means clusters users’ friends to enhance the data density, and the updated KNN finds the friends who are most similar to the user, because users are more inclined to believe their most similar friends [17]. We then extract social features, including friends’ activity and evaluations on the alternative POIs, to calculate social scores that are sent to the Ranking Engine.
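The modifications made to K-means and KNN are not described at this point, so the sketch below shows only the basic idea of selecting the k friends most similar to a user. It assumes, purely for illustration, that each person is represented by counts of check-ins per POI category and that similarity is cosine similarity; the profiles, names and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(u.get(key, 0) * v.get(key, 0) for key in set(u) | set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar_friends(user_profile, friend_profiles, k):
    """Return the names of the k friends with the highest similarity."""
    ranked = sorted(
        friend_profiles,
        key=lambda name: cosine(user_profile, friend_profiles[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical check-in counts per POI category.
user = {"coffee": 3, "museum": 1}
friends = {
    "ann": {"coffee": 2, "museum": 1},
    "bob": {"bar": 4},
    "cat": {"coffee": 1},
}
print(most_similar_friends(user, friends, k=2))  # ['ann', 'cat']
```

The evaluations of the selected friends on each alternative POI could then be aggregated into a social score; how that aggregation is done is left open here.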
Spatial Score. Users are more inclined to visit nearby POIs, so the distance between the user’s location and each alternative POI is transformed into a score that is sent to the Ranking Engine.
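The exact transformation from distance to score is not given here. As a sketch under stated assumptions, the example below computes the great-circle (haversine) distance from latitude and longitude and maps it to a score with 1 / (1 + d), so that shorter distances give higher scores. Both the haversine choice and the mapping are assumptions made for illustration, and the coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def spatial_score(distance_km):
    """Map a distance to (0, 1]; an assumed monotonically decreasing transform."""
    return 1.0 / (1.0 + distance_km)


# Hypothetical user and POI coordinates.
user_location = (32.06, 118.79)
poi_location = (32.05, 118.78)
distance = haversine_km(*user_location, *poi_location)
print(round(distance, 2), round(spatial_score(distance), 3))
```

Any decreasing function of distance would serve the same purpose; this one has the convenience that a POI at the user's own location scores exactly 1.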
Ranking Engine. This part produces the final ranking of the alternative POIs. We assign weight coefficients to the three kinds of ranking scores according to their importance, with the three coefficients summing to 1. The final results are returned to users.
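The weighted combination described for the Ranking Engine can be sketched as follows. The weight values (0.4, 0.3, 0.3) and the per-POI scores are hypothetical, and the sketch assumes that all three scores have already been scaled to a comparable range such as [0, 1].

```python
def rank_pois(scores, weights=(0.4, 0.3, 0.3)):
    """Rank POIs by a weighted sum of (social, searching, spatial) scores.

    scores: {poi_name: (social_score, searching_score, spatial_score)}
    weights: (w_social, w_searching, w_spatial), which must sum to 1
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weight coefficients must sum to 1")
    w_social, w_search, w_spatial = weights
    final = {
        poi: w_social * social + w_search * searching + w_spatial * spatial
        for poi, (social, searching, spatial) in scores.items()
    }
    # Highest combined score first.
    return sorted(final.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for three alternative POIs.
scores = {
    "A": (0.9, 0.5, 0.2),
    "B": (0.2, 1.0, 0.9),
    "C": (0.1, 0.5, 0.1),
}
ranked = rank_pois(scores)
print([poi for poi, _ in ranked])  # ['B', 'A', 'C']
```

With these weights, B (0.65) outranks A (0.57) even though A has the highest social score, which shows how the coefficients trade the three factors off against each other.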
4 The Ranking Algorithm
The ranking algorithm of the social search in this paper takes Searching Score, Social Score and Spatial Score into account together. Each kind of score has a different weight coefficient according to its importance. In the overall process, a user inputs a query such as “Starbucks coffee”, and the ranking result is a list of names of POIs related to Starbucks and coffee. The aim of our research is fast response and accurate results. The detailed description of the algorithm is given below.
Fig. 1. The model of social search