Adaptive Dual Control of Topic-Based Information Retrieval

A dissertation submitted in partial fulfilment of the requirement for the degree of Doctor of Philosophy

by Vitaliy Vitsentiy
M.Sc., Ternopil Academy of National Economy, Ukraine

School of Information Technology
Faculty of Science and Technology
Queensland University of Technology
Brisbane, Australia
2009

Keywords

topic-based information retrieval, dual control, stochastic programming

Abstract

Information Retrieval (IR) is an important, albeit imperfect, component of information technologies. Insufficient diversity of retrieved documents is one of the primary issues studied in this research. This study shows that the problem reduces precision and recall, the traditional measures of information retrieval effectiveness.

This thesis presents an adaptive IR system based on the theory of adaptive dual control. The aim of the approach is to optimize retrieval precision after all feedback has been issued. This is done by increasing the diversity of retrieved documents. This study shows that the value of recall reflects this diversity.

The Probability Ranking Principle is viewed in the literature as the "bedrock" of current probabilistic Information Retrieval theory. Neither the proposed approach nor the other methods of diversifying retrieved documents found in the literature conform to this principle. This study shows by counterexample that the Probability Ranking Principle does not in general lead to optimal precision in a search session with feedback (a setting for which it may not have been designed but in which it is actively used).

To accomplish the aim, the retrieval precision of the search session should be optimized with a multistage stochastic programming model. However, such models are computationally intractable. Therefore, approximate linear multistage stochastic programming models are derived in this study, where the multistage improvement of the probability distribution is modelled using the proposed feedback correctness method.

The proposed optimization models are based on several assumptions, starting with the assumption that Information Retrieval is conducted in units of topics. The use of clusters is the primary reason why a new method of probability estimation is proposed.

The Adaptive Dual Topic-Based Information Retrieval (ADTIR) system was evaluated in a series of experiments conducted on the Reuters, Wikipedia and TREC collections of documents. The Wikipedia experiment revealed that the dual control feedback mechanism improves precision and S-recall when all the underlying assumptions are satisfied. In the TREC experiment, this feedback mechanism was compared to a state-of-the-art adaptive IR system based on BM25 term weighting and the Rocchio relevance feedback algorithm. The baseline system exhibited better effectiveness than the cluster-based optimization model of ADTIR. The main reason was the insufficient quality of the clusters generated for the TREC collection, which violated the underlying assumption.

Table of Contents

Keywords
Abstract
Table of Contents
List of Tables
List of Figures
List of Acronyms and Abbreviations
Basic Mathematical Notation
Statement of Original Authorship
Acknowledgements

I Introduction
  Motivation
    Importance of IR
    Empirical Evidence of Problems in IR
    Uncertainty as the Main Cause of the Problems
    Feedback in IR and its Problems
    Summary
  Definition of Adaptive Dual Topic-Based IR
    Adaptive Dual IR
    Topic-Based IR
    The Proposed Vision of IR
    Summary
  A Counterexample to PRP
    Relevance for a Minority User
    Expected Relevance across all Users
    Summary

II Design of the Research
  Methodology of the Research
    Guidelines
    Hypotheses of the Research
    Contributions
    Summary
  Taxonomy of the Research Problem
    Information Retrieval
    Adaptive Dual Control and Stochastic Programming
    Theory of Algorithms
    Artificial Intelligence
    Machine Learning
    Summary
  Outline of the Further Narrative

III Review of Probabilistic and Topic-Based IR
  Probabilistic Approaches to IR
    Probability Ranking Principle
    Probabilistic Models
    Language models
    Summary
  Topic-Based IR
    Latent Semantic Analysis
    Cluster model
    Probabilistic Latent Semantic Analysis
    Latent Dirichlet Allocation
    Summary
  Feedback in Adaptive IR
    Feedback for Vector Space Models
    Feedback for Probabilistic Models
    Feedback for Language Models
    Feedback for LSA model
    Summary

IV Review of Uncertainty-Related Methods
  Problems of Uncertainty and Diversity in IR
    Uncertainty in IR
    Diversity in IR
    Evaluation of Diversity in IR
    Summary
  Approaches to Tackle Uncertainty and Diversity in IR
    Diversity Stimulation
    Multicriterion Matching Scores
    Active Learning
    Reinforcement Learning
    Summary
  Adaptive Dual Control and Stochastic Programming
    Adaptive Dual Control
    Direct Methods
    Indirect Methods
    Stochastic Programming
    Summary

V Relevance Estimation
  Probability Estimation
    Modelling Probabilities Based on Searched Features
    The Language Modelling Approach
    The Document Sampling Approach
    Probabilistic User-based Model
    Summary
  Expected Relevance
    The General Approach
    Smoothing
    Learning Topic-Relevance and Bias Coefficients
    Feedback
    Summary

VI Decision Optimization
  Two-stage Stochastic Program
    Optimization in Space of Documents
    Optimization in the Space of Clusters
    Approximate Formulation
    Relaxed Approximate Formulation
    Linear Approximate Formulation
    Summary
  Multistage Stochastic Program

D Candidate User-based Model Functions

[Table: candidate functions of x_i^g with coefficients k1 and k2 (the forms that survive in part include x_i^g + k1 · log x_i^g and a Beta probability density p(x = x_i | t), α, β), compared under two settings, "Bias only" and "Relevance + Bias coefficients". For each setting the columns are Error, Coefficients Range (footnote 13) and Coefficients Best. The assignment of functions to rows, of blocks to settings, and the Best values cannot be recovered; the Error and Range values follow in source order.]

13 For every coefficient: minimal value, maximal value, increment.

First block
Error      Coefficients Range
2.230230   0.0, 2.0, 0.1; 1.0, 2.0, 0.1
2.375605   0.0, 2.0, 0.1
2.375605   0.0, 2.0, 0.1; 0.0, 2.0, 0.1
2.382519   0.0, 3.0, 0.1; 0.0, 3.0, 0.1
2.774559   ––
2.774559   0.0, 3.0, 0.01

Second block
Error      Coefficients Range
2.029739   0.0, 2.0, 0.1; 1.0, 2.0, 0.1
2.207589   0.0, 2.0, 0.1
2.207589   0.0, 2.0, 0.1; 0.0, 2.0, 0.1
2.213101   0.0, 3.0, 0.1; 0.0, 3.0, 0.1
2.573006   ––
2.573006   0.0, 3.0, 0.01

E An Example of Program Output

_solving for
terms: 32698 1591
rel topics:
rel documents: 2640 (14) 2912 (14) 2914 (43) 4206 (43) 4975 (43) 27770 (14) 27821 (14) 28000 (14) 28003 (14) 28004 (14) 28269 (14) 28271 (14) 49378 (42) 55338 (43) 65083 (14) 79540 (43) 89972 (14) 90284 (14) 91027 (48) 91136 (42) 91274 (42) 91627 (14) 91760 (43) 91761 (43) 91921 (14) 91964 (14) 92557 (14) 92852 (10) 101859 (14) 115444 (14) 117317 (14) 189836 (43) 193780 (14) 218358 (14) 224664 (14) 224868 (10) 256296 (26) 261428 (43) 266033 (14) 267905 (14) 271123 (14) 286702 (43) 298044 (43) 319495 (14) 409922 (14) 411651 (14) 413599 (14) 431931 (14) 436488 (14) 448306 (14) 451917 (10) 452198 (43) 462552 (14) 479146 (10) 486664 (43) 487111 (14) 487371 (43) 489538 (43) 514005 (14) 525528 (26)

iteration
topic solved
[Per-topic table, 50 rows. Columns: #, prob, nRetr, topicID, nResourse.]
retrieved docs
docID    topicID  relevance
120824   14       2.77361
95553    14       2.59178
27508    14       2.49512
14192    14       2.49512
12782    14       2.49512
56385    14       2.46809
56885    14       2.375
75027    14       2.25236
65839    14       2.2441
473851   43       2.47776
retrieved 10 docs: 473851 (43) 65839 (14) 75027 (14) 56885 (14) 56385 (14) 12782 (14) 14192 (14) 27508 (14) 95553 (14) 120824 (14)
terms: 32698 1591
precision 0.000000 recall 0.000000
33.23 s

iteration
topic solved
[Per-topic table, 50 rows. Columns: #, prob, nRetr, topicID, nResourse.]
retrieved docs
docID    topicID  relevance
109899   10       2.10094
419937   10       2.03879
480241   10       2.02461
143446   10       2.02101
398879   10       2.01405
11579    13       1.943
452203   26       2.05277
136218   38       2.06143
68301    42       1.96463
33519    48       2.07049
retrieved 10 docs: 33519 (48) 68301 (42) 136218 (38) 452203 (26) 11579 (13) 398879 (10) 143446 (10) 480241 (10) 419937 (10) 109899 (10)
terms: 32698 1591
precision 0.000000 recall 0.000000
17.52 s

iteration
topic solved
[Per-topic table, 50 rows. Columns: #, prob, nRetr, topicID, nResourse.]
retrieved docs
docID    topicID  relevance
408546   3        1.97124
107618   8        1.92927
90848    14       2.11752
53333    18       1.02339
141354   24       1.67508
413804   26       2.03918
82190    40       1.47736
80427    42       1.95278
490804   43       2.19617
8767     48       2.07049
retrieved 10 docs: 8767 (48) 490804 (43) 80427 (42) 82190 (40) 413804 (26) 141354 (24) 53333 (18) 90848 (14) 107618 (8) 408546 (3)
terms: 32698 1591
precision 0.000000 recall 0.000000
4.27 s

iteration
topic solved
[Per-topic table, 50 rows. Columns: #, prob, nRetr, topicID, nResourse.]
retrieved docs
docID    topicID  relevance
73216    25       0.478847
525175   26       2.01996
433112   26       2.01384
479212   26       2.00459
459849   26       1.90883
477625   26       1.90265
409194   26       1.89928
455969   26       1.89655
73252    28       0.636823
100372   28       0.047518
retrieved 10 docs: 100372 (28) 73252 (28) 455969 (26) 409194 (26) 477625 (26) 459849 (26) 479212 (26) 433112 (26) 525175 (26) 73216 (25)
terms: 32698 1591
precision 0.000000 recall 0.000000
1.40 s

List of Acronyms and Abbreviations

ADIR    Adaptive Dual Information Retrieval
ADTIR   Adaptive Dual Topic-Based Information Retrieval
DB      Database
EM      Expectation Maximization
IN      Information Need
IR      Information...
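
The program output in Appendix E reports precision and recall for each feedback iteration, and the abstract uses S-recall as the diversity-aware measure. The following is a minimal sketch of how such figures could be computed for one ranked list. It assumes that S-recall is subtopic recall at a cutoff K, that is, the fraction of relevant topics covered by the top K documents; the exact definition used in the thesis is not given in this excerpt. The function names, document identifiers and topic assignments below are hypothetical and are not taken from the author's program.

```python
from typing import Dict, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant_docs: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant_docs)
    return hits / k


def s_recall_at_k(
    ranked: Sequence[str],
    doc_topics: Dict[str, Set[int]],
    relevant_topics: Set[int],
    k: int,
) -> float:
    """Assumed definition: share of relevant topics covered by the top-k documents."""
    if not relevant_topics:
        return 0.0
    covered: Set[int] = set()
    for doc_id in ranked[:k]:
        # Only topics that the document is judged relevant to count as covered.
        covered |= doc_topics.get(doc_id, set()) & relevant_topics
    return len(covered) / len(relevant_topics)


# Hypothetical judgements: three relevant topics, three relevant documents.
ranked = ["d1", "d2", "d3", "d4"]
relevant_docs = {"d1", "d3", "d9"}
doc_topics = {"d1": {14}, "d3": {14, 43}, "d9": {42}}
relevant_topics = {14, 42, 43}

print(precision_at_k(ranked, relevant_docs, 4))
print(s_recall_at_k(ranked, doc_topics, relevant_topics, 4))
```

Under this assumed definition, a list drawn from a single topic can have high precision while covering few relevant topics, which is the kind of gap a diversity-aware measure is meant to expose.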