Multimedia question answering 6

[Figure 5.7: The performance (NDCG@50) with different t when K is fixed as 304. Axes: t (15 to 40) and NDCG@50.]

We first perform a grid search with step size 1 to find the t and K that give the best reranking performance; the optimum is located at t = 28 and K = 304. Figure 5.7 presents the NDCG@50-t curve with K fixed at 304. As illustrated, the performance increases as t grows and peaks at a certain t; it then decreases sharply and finally becomes relatively constant. This result is consistent with our previous analysis: when t tends towards infinity, all the starting points become indistinguishable. Similarly, Figure 5.8 shows the NDCG@50-K curve with t fixed at 28, where the performance varies with K. As K gradually increases, more relevant samples are connected to each other, but “incorrect” edges between relevant and irrelevant samples are potentially introduced as well. From Figure 5.8, it can be observed that NDCG@50 peaks at K = 304, which is a trade-off value.

[Figure 5.8: The performance (NDCG@50) with different K when t is fixed as 28. Axes: K (240 to 350) and NDCG@50.]

5.7 Applications

In this section, we introduce two potential application scenarios of image reranking for complex queries: photo-based question answering and textual news visualization.

5.7.1 Photo-based Question Answering

Community question answering (cQA) services have gained great popularity over the past decades [102, 124, 29]. They encourage askers to post specific questions on any topic and obtain answers provided by other participants, and they also help general users seek information from the large repository of well-answered questions. However, existing cQA forums, such as Y!A, Answerbag and MetaFilter, usually support only textual answers, which are not intuitive for many questions, such
as the question “what is the difference between alligators and crocodiles”. Even when the answer in Y!A is described by several very long sentences, it is still hard for users to grasp the differences in appearance. This reflects the fact that a picture is worth a thousand words. However, not all QA pairs call for image answers. A textual answer is sufficient for quantity-type questions, such as “what is the population in China”, while video answers are more lively and interesting for procedure-oriented questions, such as “how to assemble a computer”. This is the so-called multimedia question answering [29], a rising topic in the media search domain. In this work, we focus only on the QA pairs that may be better explained with images.

Table 5.4: The distribution of visual concepts embedded in the generated queries for photo-based QA.

  One Visual Concept    Two Visual Concepts    More Than Two Visual Concepts
  46.15%                38.85%                 15.0%

However, as stated in [102], the queries generated from textual QA pairs are usually verbose and complex, and are not well supported by current commercial image search engines. Based on our proposed approach, we develop a photo-based QA (PQA) system, which automatically complements the original textual answers with relevant web images.

To demonstrate the effectiveness of the PQA system, we conducted an experiment on 1000 non-conversational QA pairs selected from the Y!A dataset [124], which contains 4,483,032 QA pairs. For each QA pair, five volunteers were invited to vote on whether adding images would provide users with a better experience than purely textual descriptions. Around 260 QA pairs were selected. We then directly employed the method in [102] to generate the most informative query from each QA pair. The statistics are shown in Table 5.4: more than 53% of the queries contain two or more visual concepts. Accordingly, a query-aware reranking approach is proposed to select the top 10
relevant images. To be specific, if the query is simple, i.e., it contains only one visual concept, then RW [53] is used directly. On the other hand, if the query is complex, we employ the proposed NRCC. We compare our proposed approach with the following methods:

• Naive Search: Simply perform image search with each query on Google Image without reranking.
• Naive Fusion: Simply perform image search with each visual concept in the generated complex query, and then fuse the results.

Figure 5.12 shows the comparison of these three methods. It can be observed that our query-aware reranking approach remarkably outperforms the other two methods.

5.7.2 Textual News Visualization

“Every picture tells a story” captures the essence of visual communication via pictures. This phrase is also consistent with common sense: pictures in textual news always facilitate and expedite our understanding, especially for elderly and juvenile readers. Meanwhile, searching an image database for several meaningful and illustrative pictures to accompany their textual news is a routine task for news writers. However, news documents usually contain very few pictures, as shown in Table 5.5: more than 46% of the news documents do not contain any picture. This statistic is based on the experimental dataset.

Table 5.5: The distribution of the number of pictures involved in news documents.

  Without Any Picture    One Picture    Two Pictures    More Than Two Pictures
  46.15%                 38.85%         15.0%           38.85%

To assist news readers and news writers, we propose a scheme that automatically seeks relevant web images that best contextualize the content of the news. We directly used the news dataset in [79], crawled from ABCNews.com, BBS.co.uk, CNN.com and GoogleNews; it contains 48,429 unique documents after duplicate removal. To save manual labelling effort, we randomly select 100 news documents from the whole dataset for evaluation. It is observed that most of the news articles are
fairly long, and it is not easy to extract descriptive queries from them. We therefore simply regard the expert-generated titles of the news documents as complex queries, since titles naturally summarize the content. Further, it is observed that more than 43% of the titles contain at least one person-related visual concept, so we propose to employ query-dependent image representations for reranking. Specifically, let $X_c$ and $X$ be the sets of images retrieved by the visual concept $q_c$ and the complex query $Q$, respectively, where $q_c$ is predicted to be a person-related query by the method in [35]. For each image in $X_c$ and $X$, we perform face detection. We extract the 256-dimensional Local Binary Pattern (LBP) features [110] from the largest face region for any $x_i$ in $X_c$, and the same features are extracted from all the detected faces for any $x_u$ in $X$. The similarity between $x_i$ and $x_u$ is then computed as

  $W_{iu} = \max_{x \in O_u} K(x_i, x)$    (5.19)

where $O_u$ is the set of LBP features extracted from the faces in image $x_u$. The similarity of other image pairs is computed as previously introduced. We call this the query-aware representation method.

To demonstrate the effectiveness of our proposed query-aware image representation method, we compare it with the query-independent unified image representation method described earlier, in which all the images are represented by the combination of bag-of-visual-words and global features. The result is presented in Figure 5.13, which shows that our query-aware image representation performs better than the query-independent one, even though both are based on the same reranking principles. The initial ranking shows lower search performance. This is because news titles generally contain some redundant terms, which overwhelm the key concepts and potentially confuse the search engines.

5.8 System Evaluation

5.8.1 Data Presentation

After reranking, we perform duplicate removal and present the images together with the textual answers, depending on the results of
answer medium selection. Figure 5.9 shows the multimedia answers for example queries.

5.8.2 On Informativeness of Enriched Media Data

In this work, all the complementary media data are collected based on textual queries, which are extracted from QA pairs and may be somewhat biased away from the original meanings. In other words, the queries do not always reflect the intention of the original QA pairs. In the above evaluation, NDCG is used to measure the relevance of the ranked images/videos to the generated query. However, because there is a gap between a QA pair and the generated query, NDCG cannot reflect how well these media data answer the original questions or enrich the textual answers. So, in addition to evaluating search relevance, we further define an informativeness measure to estimate how well the media data answer a question. Specifically, there are three candidate scores, i.e., 2, 1 and 0, which indicate that the media sample can perfectly, partially, or cannot answer the question, respectively. We randomly select 300 QA pairs that have enriched media data for evaluation. For each QA pair, the five previously introduced labelers manually label the informativeness score of each enriched image or video. Figure 5.10 illustrates the distribution of the informativeness scores. The results indicate that, for at least 79.57% of the questions, there exist enriched media data that can answer the questions well. The average rating score is 1.248.

Answering: Who was the most talented member of NWA?
Original QA
Question: Who was the most talented member of NWA?
Description: Eazy E, Dr Dre, Ice Cube, Arabian Prince, Yella, MC Ren?
Best Answer (from Y!A): Ice Cube was the best talent - look at the longevity of his career. Dre was the best producer and he's still in the business, but Cube is by far the best on mic/on screen personality.
Complementary Images
(a)

Answering: How can I teach my son to tie his shoelaces?
Original QA
Question: How can I teach my son to tie his shoelaces?
Description: He just turned and is not at all interested in learning. He can read… but won’t learn to tie his shoes. Any suggestions for the stubborn lil ones???
Best Answer (from Y!A): Bribes usually work. Try using that stringy candy and tie bows in that – then he can eat it afterwards!!! I’m not sure how effective that would be.
Complementary Videos
(b)

Answering: What happened on September 11, 2001?
Original QA
Question: What happened on September 11, 2001?
Description: I need it for a school project. I need to know what happened.
Best Answer (from Y!A): Three buildings in the World Trade Centre were destroyed by controlled demolition, and a hole was blown by a missile, and the whole thing was blamed by the government on 19 Arabs whom were said to have hijacked four passenger jets.
Complementary Images and Complementary Videos
(c)

Figure 5.9: Results of multimedia answering for the example queries “the most talented member of NWA”, “tie shoelace”, and “September 11”. Our scheme answers the three questions with “text + image”, “text + video”, and “text + image + video”, respectively.

[Figure 5.10 plot. Axes: average informativeness (0 to 2) and probability.]

Figure 5.10: The distribution of informativeness. It is observed that for more than 79% of the questions the average informativeness scores of multimedia answers are above

Table 5.6: The left part shows the average rating scores and standard deviations of textual QA before and after media data enrichment. The right part shows the ANOVA test results.

  Average and Variance               The Factor of Schemes        The Factor of Users
  MMQA             Textual cQA       F-statistic   p-value        F-statistic   p-value
  2.25 ± 0.6184    1.15 ± 0.1342     21.09         × 10^{-4}      0.31          0.9927

Table 5.7: Statistics of the comparison of our multimedia answer and the original textual
answer. The left part is based on the whole testing set, while the right part excludes the questions for which purely textual answers are sufficient.

  Including the questions with pure textual answers | Excluding the questions with pure textual answers
  Prefer MM   Neutral   Prefer original             | Prefer MM   Neutral   Prefer original
  answer                textual answer              | answer                textual answer
  47.99%      49.01%    3.0%                        | 88.66%      6.17%     5.54%

5.8.3 Subjective Test of Multimedia Answering

We first conduct a user study at the system level. Twenty volunteers who frequently use Y!A or WikiAnswers are invited. They come from multiple countries and their ages range from 22 to 31. They do not know the researchers and are not told which method was developed by the researchers. Each user is asked to freely browse the conventional textual answers and our multimedia answers for the questions they are interested in (that is, they act as information seekers in this process), and then to rate the two systems. We adopt the following quantization approach: score is assigned to the worse scheme and the other scheme is assigned with score 1, and if it is comparable, better and much better than this one, respectively. The volunteers are trained with the following rules before coding: if the enriched media data are fairly irrelevant to the contextual content, users should assign to our scheme, because users are distracted rather than obtaining valuable visual information; otherwise, they should assign to the original system. The average rating scores and the standard deviations are given in Table 5.6. The results clearly show that users prefer the multimedia answering. We also perform a two-way ANOVA test, whose results are given in the right part of Table 5.6. The p-values demonstrate that the superiority of our system is statistically significant, whereas the difference among users is statistically insignificant.

We then conduct a
more detailed study. For each question in the testing set, we simultaneously show the conventional best answer and the multimedia answer generated by our approach, and each user is asked to choose the preferred one. Table 5.7 presents the statistical results. From the left part of the table, it is observed that users prefer our answer in about 47.99% of the cases and prefer the original answer in only 3.0% of the cases, while 49.01% of the cases are neutral. This is because many questions are classified to be answered by text only, and for these questions our answer and the original textual answer are the same. If we exclude such questions, i.e., we only consider questions for which the original answer and our answer differ, the statistics turn into the right part of Table 5.7. We can see that for 88.66% of these questions users prefer the multimedia answers, i.e., the added image or video data are helpful. The cases in which users prefer the original textual answers are mainly due to irrelevant image or video content.

Table 5.8: Comparison of the classification accuracy of answer medium selection with and without textual answers.

  Method \ Testing Set       Y!A       WikiAnswers   Both
  With Textual Answers       81.72%    84.97%        83.49%
  Without Textual Answers    78.30%    82.01%        80.32%

5.8.4 On the Absence of Textual Answer

In our proposed scheme, the existing community-contributed textual answers play an important role in question understanding. A natural question is therefore whether the scheme can cope, and how it performs, when there is no textual answer. For example, there may be newly added questions that do not have textual answers yet or are not well answered in cQA forums. From the introduction of the proposed scheme in Sections 3.3, 3.4 and 3.5, we can see that it can easily deal with the cases where there is no textual answer: we only need to remove the information clues from textual answers in the answer medium
selection and multimedia query generation components. Here we further investigate the performance of the scheme without textual answers.

We first examine answer medium selection. When there is no textual answer, only 7-D features are available for classification in the integration of multiple evidences (see Section 3.3.4). We compare the performance of answer medium se-

[41] S. Gerard and B. Christopher. Term-weighting approaches in automatic text retrieval. In Information Processing and Management, 1988.
[42] S. A. Golder and B. A. Huberman. Usage patterns of collaborative tagging systems. Journal of Information Science, 2006.
[43] M. Gupta, R. Li, Z. Yin, and J. Han. Survey on social tagging techniques. SIGKDD Explorations Newsletter, 2010.
[44] F. M. Harper, D. Moy, and A. Joseph. Facts or friends?: distinguishing informational and conversational questions in social qa sites. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2009.
[45] F. M. Harper, D. Raban, S. Rafaeli, and J. A. Konstan. Predictors of answer quality in online qa sites. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2008.
[46] C. Hauff, L. Azzopardi, and D. Hiemstra. The combination and evaluation of query performance prediction methods. In Proceedings of the European Conference on Information Retrieval Research, 2009.
[47] C. Hauff, D. Hiemstra, and F. d. Jong. A survey of pre-retrieval query performance predictors. In Proceedings of Conference on Information and Knowledge Management, 2008.
[48] C. Hauff, V. Murdock, and R. Baeza-Yates. Improved query difficulty prediction for the web. In Proceedings of Conference on Information and Knowledge Management, 2008.
[49] B. He and I. Ounis. Inferring query performance using pre-retrieval predictors. In Proceedings of Symposium on String Processing and Information Retrieval, 2004.
[50] Y. M. Hironobu, H. Takahashi, and R. Oka. Image-to-word transformation based on dividing and vector quantizing images with words. In International Workshop on
Multimedia Intelligent Storage and Retrieval Management, 1999.
[51] L. Hirschman and R. Gaizauskas. Natural language question answering: The view from here. Natural Language Engineering, 2001.
[52] R. Hong, M. Wang, G. Li, L. Nie, Z.-J. Zha, and T.-S. Chua. Multimedia question answering. IEEE Multimedia, 2012.
[53] W. H. Hsu, L. S. Kennedy, and S.-F. Chang. Video search reranking through random walk over document-level context graph. In Proceedings of the ACM International Conference on Multimedia, 2007.
[54] Y. Huang, Q. Liu, S. Zhang, and D. Metaxas. Image retrieval via probabilistic hypergraph ranking. In IEEE Conference on Computer Vision and Pattern Recognition, 2010.
[55] Z. Huang, M. Thint, and Z. Qin. Question classification using head words and their hypernyms. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2008.
[56] J. Jeon, W. B. Croft, and J. H. Lee. Finding semantically similar questions based on their answers. In Proceedings of the International ACM SIGIR Conference, 2005.
[57] J. Jeon, W. B. Croft, and J. H. Lee. Finding similar questions in large question and answer archives. In Proceedings of the ACM International Conference on Information and Knowledge Management, 2005.
[58] J. Jeon, W. B. Croft, J. H. Lee, and S. Park. A framework to predict the quality of answers with non-textual features. In Proceedings of the International ACM SIGIR Conference, 2006.
[59] J. Jeon, V. Lavrenko, and R. Manmatha. Automatic image annotation and retrieval using cross-media relevance models. In Proceedings of the International ACM SIGIR Conference, 2003.
[60] Y. Jing and S. Baluja. Visualrank: Applying pagerank to large-scale image search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
[61] G. Kacmarcik. Multi-modal question-answering: Questions without keyboards. Asia Federation of Natural Language Processing, 2005.
[62] F. Kang, R. Jin, and R. Sukthankar. Correlated label propagation with application to multi-label learning. In IEEE Conference on Computer Vision and Pattern
Recognition, 2006.
[63] J. W. Kim, K. Candan, and J. Tatemura. Cdip: Collection-driven, yet individuality-preserving automated blog tagging. In International Conference on Semantic Computing, 2007.
[64] G. Kumaran and J. Allan. A case for shorter queries, and helping users create them. In North American Chapter of the Association for Computational Linguistics, 2007.
[65] G. Kumaran and J. Allan. Effective and efficient user interaction for long queries. In Proceedings of the International ACM SIGIR Conference, 2008.
[66] M. Lease, J. Allan, and W. B. Croft. Regression rank: Learning to meet the opportunity of descriptive queries. In Proceedings of the European Conference on Information Retrieval Research, 2009.
[67] Y.-S. Lee, Y.-C. Wu, and J.-C. Yang. Bvideoqa: Online english/chinese bilingual video question answering. Journal of the American Society for Information Science and Technology, 2009.
[68] M. S. Lew. Content-based multimedia information retrieval: State of the art and challenges. ACM Transactions on Multimedia Computing, Communications and Applications, 2006.
[69] B. Li, Y. Liu, A. Ram, E. V. Garcia, and E. Agichtein. Exploring question subjectivity prediction in community qa. In Proceedings of the International ACM SIGIR Conference, 2008.
[70] G. Li, R. Hong, Y.-T. Zheng, S. Yan, and T.-S. Chua. Learning cooking techniques from youtube. In Proceedings of the International Conference on Advances in Multimedia Modeling, 2010.
[71] G. Li, H. Li, Z. Ming, R. Hong, S. Tang, and T.-S. Chua. Question answering over community contributed web video. ACM International Conference on Multimedia, 2010.
[72] H. Li, M. Wang, and X. S. Hua. Msra-mm 2.0: A large-scale web multimedia dataset. In IEEE International Conference on Data Mining Workshop, 2009.
[73] J. Li, N. Allinson, D. Tao, and X. Li. Multi-training support vector machine for image retrieval. IEEE Transactions on Image Processing, 2006.
[74] J. Li and J. Z. Wang. Automatic linguistic indexing of pictures by a statistical modeling approach. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 2003.
[75] X. Li and D. Roth. Learning question classifiers. In Proceedings of the International Conference on Computational Linguistics, 2002.
[76] X. Li, C. Snoek, and M. Worring. Annotating images by harnessing worldwide user-tagged photos. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2009.
[77] X. Li, C. G. M. Snoek, M. Worring, and A. W. M. Smeulders. Harvesting social images for bi-concept search. IEEE Transactions on Multimedia, 2012.
[78] Y. Li, Y. Luo, D. Tao, and C. Xu. Query difficulty guided image retrieval system. In Proceedings of International Conference on Advances in Multimedia Modeling, 2011.
[79] Z. Li, M. Wang, J. Liu, C. Xu, and H. Lu. News contextualization with geographic and visual information. In Proceedings of the ACM International Conference on Multimedia, 2011.
[80] D. Liu, X.-S. Hua, M. Wang, and H.-J. Zhang. Image retagging. In Proceedings of the ACM International Conference on Multimedia, 2010.
[81] D. Liu, X.-S. Hua, L. Yang, M. Wang, and H.-J. Zhang. Tag ranking. In Proceedings of the International Conference on World Wide Web, 2009.
[82] Q. Liu, E. Agichtein, G. Dror, E. Gabrilovich, Y. Maarek, D. Pelleg, and I. Szpektor. Predicting web searcher satisfaction with existing community-based answers. In Proceedings of the International ACM SIGIR Conference, 2011.
[83] Y. Liu and E. Agichtein. You’ve got answers: towards personalized models for predicting success in community question answering. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2008.
[84] Y. Liu, S. Li, Y. Cao, C.-Y. Lin, D. Han, and Y. Yu. Understanding and summarizing answers in community-based question answering services. In Proceedings of the International Conference on Computational Linguistics, 2008.
[85] Y. Liu and T. Mei. Optimizing visual search reranking via pairwise learning. IEEE Transactions on Multimedia, 2011.
[86] Y. Liu, T. Mei, X.-S. Hua, J. Tang, X. Wu, and S. Li. Learning to video search rerank via pseudo preference feedback. In IEEE
International Conference on Multimedia and Expo, 2008.
[87] C. D. Manning, P. Raghavan, and H. Schütze. Introduction to information retrieval. Cambridge University Press, 2008.
[88] M. Manoj and J. Elizabeth. Information retrieval on internet using metasearch engines: A review. Journal of Scientific and Industrial Research, 2008.
[89] Z.-Y. Ming, K. Wang, and T.-S. Chua. Prototype hierarchy based clustering for the categorization and navigation of web collections. In Proceedings of the International ACM SIGIR Conference, 2010.
[90] G. Mishne. Autotag: a collaborative approach to automated tag assignment for weblog posts. In Proceedings of the International Conference on World Wide Web, 2006.
[91] D. Mollá and J. L. Vicedo. Question answering in restricted domains: An overview. Computational Linguistics, 2007.
[92] F. Monay and D. Gatica-Perez. On image auto-annotation with latent space models. In Proceedings of the ACM International Conference on Multimedia, 2003.
[93] F. Monay and D. Gatica-Perez. Plsa-based image auto-annotation: constraining the latent space. In Proceedings of the ACM International Conference on Multimedia, 2004.
[94] C. Monz. From document retrieval to question answer. Ph.D. Thesis, University of Amsterdam, 2003.
[95] N. Morioka and J. Wang. Robust visual reranking via sparsity and ranking constraints. In Proceedings of the ACM International Conference on Multimedia, 2011.
[96] J. Mothe and L. Tanguy. Linguistic features to predict query difficulty. In Proceedings of the International ACM SIGIR Conference Workshop, 2005.
[97] S. Narr, E. W. De Luca, and S. Albayrak. Extracting semantic annotations from twitter. In Proceedings of the Workshop on Exploiting Semantic Annotations in Information Retrieval, 2011.
[98] A. P. Natsev, A. Haubold, J. Tešić, L. Xie, and R. Yan. Semantic concept-based query expansion and re-ranking for multimedia retrieval. In Proceedings of the ACM International Conference on Multimedia, 2007.
[99] A. P. Natsev, M. R. Naphade, and J. Tešić. Learning the
semantics of multimedia queries and concepts from a small number of examples. In Proceedings of the ACM International Conference on Multimedia, 2005.
[100] S.-Y. Neo, J. Zhao, M.-Y. Kan, and T.-S. Chua. Video retrieval using high level features: Exploiting query matching and confidence-based weighting. In Proceedings of the International Conference on Image and Video Retrieval, 2006.
[101] L. Nie, M. Wang, Z.-J. Zha, and T.-S. Chua. Oracle in image search: A content-based approach to performance prediction. ACM Transactions on Information Systems, 2012.
[102] L. Nie, M. Wang, Z.-J. Zha, G. Li, and T.-S. Chua. Multimedia answering: enriching text qa with media information. In Proceedings of the International ACM SIGIR Conference, 2011.
[103] L. Nie, S. Yan, M. Wang, R. Hong, and T.-S. Chua. Harvesting visual concepts for image search with complex queries. In Proceedings of the ACM International Conference on Multimedia, 2012.
[104] J. H. Park and W. B. Croft. Query term ranking based on dependency parsing of verbose queries. In Proceedings of the International ACM SIGIR Conference, 2010.
[105] S. Patwardhan. Using wordnet-based context vectors to estimate the semantic relatedness of concepts. In Proceedings of the European Association of Chinese Linguistics, 2006.
[106] S. U. Pillai, T. Suel, and S. Cha. The perron-frobenius theorem: some of its applications. IEEE Signal Processing Magazine, 2005.
[107] A. Popescu. Image retrieval using a multilingual ontology. In Large Scale Semantic Access to Content (Text, Image, Video, and Sound), 2007.
[108] A. L. Powell, J. C. French, J. Callan, M. Connell, and C. L. Viles. The impact of database selection on distributed searching. In Proceedings of the International ACM SIGIR Conference, 2000.
[109] J. Prager. Open-domain question-answering. Foundations and Trends in Information Retrieval, 2006.
[110] H.-T. Pu. An analysis of failed queries for web image retrieval. Journal of Information Science, 2008.
[111] G.-J. Qi, X.-S. Hua, Y. Rui, J. Tang, T. Mei, and H.-J. Zhang. Correlative
multi-label video annotation. In Proceedings of the ACM International Conference on Multimedia, 2007.
[112] S. A. Quarteroni and S. Manandhar. Designing an interactive open domain question answering system. Journal of Natural Language Engineering, 2008.
[113] G. Quénot, T. P. Tan, V. B. Le, S. Ayache, L. Besacier, and P. Mulhem. Content-based search in multilingual audiovisual documents using the international phonetic alphabet. Multimedia Tools and Applications, 2010.
[114] E. Selberg and O. Etzioni. Multi-service search and comparison using the metacrawler. In Proceedings of the International Conference on World Wide Web, 1995.
[115] C. Shah and J. Pomerantz. Evaluating and predicting answer quality in community qa. In Proceedings of the International ACM SIGIR Conference, 2010.
[116] L. Si and J. Callan. Modeling search engine effectiveness for federated search. In Proceedings of the International ACM SIGIR Conference, 2005.
[117] S. Siegel and Castellan. Nonparametric statistics for the social sciences. New York: McGraw-Hill, 1988.
[118] B. Sigurbjörnsson and R. van Zwol. Flickr tag recommendation based on collective knowledge. In Proceedings of the International Conference on World Wide Web, 2008.
[119] A. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain. Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000.
[120] L. S. Smith and A. R. Hurson. A search engine selection methodology. In Proceedings of the International Conference on Information Technology: Computers and Communications, 2003.
[121] C. G. M. Snoek and M. Worring. Concept-based video retrieval. Foundations and Trends in Information Retrieval, 2009.
[122] S. C. Sood and K. J. Hammond. Tagassist: Automatic tag suggestion for blog posts. In International Conference on Weblogs and Social, 2007.
[123] S. B. Subramanya and H. Liu. Socialtagger - collaborative tagging for blogs in the long tail. In Proceeding of the ACM Workshop on Search in Social Media, 2008.
[124] M. Surdeanu, M. Ciaramita,
and H. Zaragoza. Learning to rank answers on large online QA collections. In Proceedings of the Association for Computational Linguistics, 2008.
[125] M. Szummer and T. Jaakkola. Partially labeled classification with markov random walks. Advances in Neural Information Processing Systems, 2002.
[126] A. Tamura, H. Takamura, and M. Okumura. Classification of multiple-sentence questions. In International Joint Conference on Natural Language Processing, 2005.
[127] J. Tang, H. Li, G.-J. Qi, and T.-S. Chua. Image annotation by graph-based inference with integrated multiple/single instance representations. IEEE Transactions on Multimedia, 2010.
[128] X. Tian, L. Yang, J. Wang, Y. Yang, X. Wu, and X.-S. Hua. Bayesian video search reranking. In Proceedings of the ACM International Conference on Multimedia, 2008.
[129] A. Torralba, R. Fergus, and W. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
[130] K. Wang, Z. Ming, and T.-S. Chua. A syntactic tree matching approach to finding similar questions in community-based qa services. In Proceedings of the International ACM SIGIR Conference, 2009.
[131] M. Wang, K. Yang, X.-S. Hua, and H.-J. Zhang. Towards a relevant and diverse search of social images. IEEE Transactions on Multimedia, 2010.
[132] R. C. Wang, N. Schlaefer, W. W. Cohen, and E. Nyberg. Automatic set expansion for list question answering. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2008.
[133] X.-J. Wang, X. Tu, D. Feng, and L. Zhang. Ranking community answers by modeling question-answer relationships via analogical reasoning. In Proceedings of the International ACM SIGIR Conference, 2009.
[134] P. Wu, S. C.-H. Hoi, P. Zhao, and Y. He. Mining social images with distance metric learning for automated image tagging. In Proceedings of the International Conference on Web Search and Data Mining, 2011.
[135] W. Wu, B. Zhang, and M. Ostendorf. Automatic generation of personalized
annotation tags for twitter users. In The Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010.
[136] Y.-C. Wu, C.-H. Chang, and Y.-S. Lee. Clvq: Cross-language video question/answering system. IEEE International Symposium on Multimedia Software Engineering, 2004.
[137] Y.-C. Wu and J.-C. Yang. A robust passage retrieval algorithm for video question answering. IEEE Transactions on Circuits and Systems for Video Technology, 2008.
[138] Y. Xiang, X. Zhou, T.-S. Chua, and C.-W. Ngo. A revisit of generative model for automatic image annotation using markov random fields. In IEEE Conference on Computer Vision and Pattern Recognition, 2009.
[139] X. Xing, Y. Zhang, and M. Han. Query difficulty prediction for contextual image retrieval. In Proceedings of the European Conference on Information Retrieval Research, 2010.
[140] H. Xu, J. Wang, X.-S. Hua, and S. Li. Image search by concept map. In Proceedings of the International ACM SIGIR Conference, 2010.
[141] Z. Xu, Y. Fu, J. Mao, and D. Su. Towards the Semantic Web: Collaborative Tag Suggestions. In Proceedings of the International Conference on World Wide Web, 2006.
[142] X. Xue, J. Jeon, and W. B. Croft. Retrieval models for question and answer archives. In Proceedings of the International ACM SIGIR Conference, 2008.
[143] R. Yan, E. Hauptmann, and R. Jin. Multimedia search with pseudo-relevance feedback. In Proceedings of the International Conference on Image and Video Retrieval, 2003.
[144] C. Yang, M. Dong, and J. Hua. Region-based image annotation using asymmetrical support vector machine-based multiple-instance learning. In IEEE Conference on Computer Vision and Pattern Recognition, 2006.
[145] H. Yang, L. Chaisorn, Y. Zhao, S.-Y. Neo, and T.-S. Chua. Videoqa: question answering on news video. In Proceedings of the ACM International Conference on Multimedia, 2003.
[146] H. Yang, T.-S. Chua, S. Wang, and C.-K. Koh. Structured use of external knowledge for event-based open domain question answering. In Proceedings of the International
ACM SIGIR Conference, 2003 [147] L Yang and A Hanjalic Supervised reranking for web image search In Proceedings of the ACM International Conference on Multimedia, 2010 [148] T Yeh, J J Lee, and T Darrell Photo-based question answering In Proceedings of the ACM International Conference on Multimedia, 2008 [149] H Yu and V Hatzivassiloglou Towards answering opinion questions: separating facts from opinions and identifying the polarity of opinion sentences In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2003 [150] J Yu, D Tao, and M Wang Adaptive hypergraph learning and its application in image classification IEEE Transactions on Image Processing, 2012 171 [151] J Yuan, Z.-J Zha, Y.-T Zheng, W Meng, X Zhou, and T.-S Chua Utilizing related samples to enhance interactive concept-based video search IEEE Transactions on Multimedia, 2011 [152] J Yuan, Z.-J Zha, Y.-T Zheng, M Wang, X Zhou, and T.-S Chua Learning concept bundles for video search with complex queries In Proceedings of the ACM International Conference on Multimedia, 2011 [153] D Zhang and W S Lee Question classification using support vector machines In Proceedings of the International ACM SIGIR Conference Conference, 2003 [154] J Zhang, R Lee, and Y J Wang Support vector machine classifications for microarray expression data set In Proceedings of the International Conference on Computational Intelligence and Multimedia Applications, 2003 [155] W Zhang, L Pang, and C.-W Ngo Snap-and-ask: answering multimodal question by naming visual instance In ACM Proceedings of the ACM International Conference on Multimedia, 2012 [156] Y Zhao, F Scholer, and Y Tsegay Effective pre-retrieval query performance prediction using similarity and variability evidence In Proceedings of the European Conference on Information Retrieval Research, 2008 [157] D Zhou, O Bousquet, T N Lal, J Weston, and B Schălkopf Learning with o local and global consistency In Advances in Neural Information 
Processing Systems, 2004 [158] D Zhou, J Huang, and B Schlkopf Learning with hypergraphs: Clustering, classification, and embedding In Advances in Neural Information Processing Systems, 2006 172 [159] Y Zhou and W B Croft Query performance prediction in web search environments In Proceedings of the International ACM SIGIR Conference, 2007 [160] G Zoltan, K Georgia, P Jan, and G.-M Hector Questioning yahoo! answers Technical report, Stanford InfoLab, 2007 ... Research 6. 1 Conclusions This thesis explored the question- aware multimedia question answering system in a penetrating way A systematic and novel framework suitable for answering given textual questions... reranking for complex queries: photo-based question answering and textual news visualization 5.7.1 Photo-based Question Answering Community question answering (cQA) services have gained great... answers can partially bridge the gap between questions 1 46 Naive Search Naive Fusion Query-Aware 0.90 NDCG@n 0.85 0.80 0.75 0.70 0 .65 0 .60 10 20 30 40 50 60 70 80 90 100 NDCG-Depth n Figure 5.12:
