Manning & Schütze, Statistical NLP, part 10 (ppsx)

16.5 Further Reading

The purpose of this chapter is to give the student interested in classification for NLP some orientation points. A recent in-depth introduction to machine learning is (Mitchell 1997). Comparisons of several learning algorithms applied to text categorization can be found in (Yang 1999), (Lewis et al. 1996), and (Schütze et al. 1995). The features and the data representation based on the features used in this chapter can be downloaded from the book's website.

Some important classification techniques which we have not covered are: logistic regression and linear discriminant analysis (Schütze et al. 1995); decision lists, where an ordered list of rules that change the classification is learned (Yarowsky 1994); winnow, a mistake-driven online linear threshold learning algorithm (Dagan et al. 1997a); and the Rocchio algorithm (Rocchio 1971; Schapire et al. 1998).

Another important classification technique, Naive Bayes, was introduced in section 7.2.1. See (Domingos and Pazzani 1997) for a discussion of its properties, in particular the fact that it often does surprisingly well even when the feature independence assumed by Naive Bayes does not hold.

Other examples of the application of decision trees to NLP tasks are parsing (Magerman 1994) and tagging (Schmid 1994). The idea of using held-out training data to train a linear interpolation over all the distributions between a leaf node and the root was used both by Magerman (1994) and in earlier work at IBM. Rather than simply using cross-validation to determine an optimal tree size, an alternative is to grow multiple decision trees and then to average the judgements of the individual trees. Such techniques go under names like bagging and boosting, and have recently been widely explored and found to be quite successful (Breiman 1994; Quinlan 1996).
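The bagging idea described above can be sketched in a few lines of Python. The technique trains many classifiers on bootstrap resamples of the training data and combines their judgements. This is a minimal illustration under stated assumptions, not the method of any cited paper. To keep it short, one-feature decision stumps stand in for full decision trees, and the judgements are combined by simple majority vote. The function names (`train_stump`, `bagging`, `bagged_predict`) and the toy data are hypothetical.

```python
import random
from collections import Counter

def stump_predict(stump, x):
    """Apply a one-feature decision stump (threshold, polarity) to x; returns 0 or 1."""
    threshold, polarity = stump
    above = x > threshold
    return int(above) if polarity == 1 else int(not above)

def train_stump(data):
    """Pick the (threshold, polarity) with the fewest training errors on (x, label) pairs.

    A stump is used here as a stand-in for a full decision tree (assumption)."""
    xs = sorted({x for x, _ in data})
    # Candidate thresholds: one below every point (this yields the constant
    # classifiers) plus the midpoints between neighbouring distinct values.
    candidates = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best_stump, best_errors = None, None
    for threshold in candidates:
        for polarity in (0, 1):
            stump = (threshold, polarity)
            errors = sum(stump_predict(stump, x) != y for x, y in data)
            if best_errors is None or errors < best_errors:
                best_stump, best_errors = stump, errors
    return best_stump

def bagging(data, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap replicate of the data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # A bootstrap replicate has len(data) points drawn with replacement.
        sample = [rng.choice(data) for _ in range(len(data))]
        models.append(train_stump(sample))
    return models

def bagged_predict(models, x):
    """Combine the individual judgements by majority vote."""
    votes = Counter(stump_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical one-dimensional toy data: class 0 on the left, class 1 on the right.
data = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 0),
        (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]
models = bagging(data)
print([bagged_predict(models, x) for x in (-5.0, 0.0, 9.0, 20.0)])
```

With an odd number of models and two classes, the vote cannot tie. For real-valued outputs one would average the predictions instead of voting. Boosting differs in that each new model is trained on data reweighted toward the examples that earlier models got wrong, and this sketch does not show it.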
One of the first papers to apply decision trees to text categorization is (Lewis and Ringuette 1994).

Jelinek (1997: ch. 13-14) provides an in-depth introduction to maximum entropy modeling. See also (Lau 1994) and (Ratnaparkhi 1997b). Darroch and Ratcliff (1972) introduced the generalized iterative scaling procedure and showed its convergence properties. Feature selection algorithms are described by Berger et al. (1996) and Della Pietra et al. (1997). Maximum entropy modeling has been used for tagging (Ratnaparkhi 1996), text segmentation (Reynar and Ratnaparkhi 1997), prepositional phrase attachment (Ratnaparkhi 1998), sentence boundary detection (Mikheev 1998), determining coreference (Kehler 1997), named entity recognition (Borthwick et al. 1998) and partial parsing (Skut and Brants 1998). Another important application is language modeling for speech recognition (Lau et al. 1993; Rosenfeld 1994, 1996). Franz (1996, 1997) used iterative proportional fitting, a technique related to generalized iterative scaling, to fit loglinear models for tagging and prepositional phrase attachment.

Neural networks, or multi-layer perceptrons, were one of the statistical techniques that revived interest in Statistical NLP in the eighties. Two pieces of work drove this revival. The first was by Rumelhart and McClelland (1986) on learning the past tense of English verbs. The second was Elman's (1990) paper "Finding Structure in Time," which tried to develop an alternative framework for the conceptualization and acquisition of hierarchical structure in language. Introductions to neural networks and backpropagation are (Rumelhart et al. 1986), (McClelland et al. 1986), and (Hertz et al. 1991). Other neural network research on NLP problems includes tagging (Benello et al. 1989; Schütze 1993), sentence boundary detection (Palmer and Hearst 1997), and parsing (Henderson and Lane 1998).
Examples of neural networks used for text categorization are (Wiener et al. 1995) and (Schütze et al. 1995). Miikkulainen (1993) develops a general neural network framework for NLP.

The Perceptron Learning Algorithm in figure 16.7 is adapted from (Littlestone 1995). A proof of the perceptron convergence theorem appears in (Minsky and Papert 1988) and (Duda and Hart 1973: 142).

KNN, sometimes called memory-based learning, has also been applied to a wide range of NLP problems. These include pronunciation (Daelemans and van den Bosch 1996), tagging (Daelemans et al. 1996; van Halteren et al. 1998), prepositional phrase attachment (Zavrel et al. 1997), shallow parsing (Argamon et al. 1998), word sense disambiguation (Ng and Lee 1996) and smoothing of estimates (Zavrel and Daelemans 1997). For KNN-based text categorization see (Yang 1994), (Yang 1995), (Stanfill and Waltz 1986; Masand et al. 1992), and (Hull et al. 1996). Yang (1994, 1995) suggests methods for weighting neighbors according to their similarity. We used cosine as the similarity measure. Another common metric is Euclidean distance. It ranks neighbors differently from cosine only if the vectors are not normalized, as discussed in section 8.5.1. A third is the Value Difference Metric (Stanfill and Waltz 1986).

Tiny Statistical Tables

These tiny tables are not a substitute for a decent statistics textbook or computer software, but they give the key values most commonly needed in Statistical NLP applications.

Standard normal distribution. Entries give the proportion of the area under a standard normal curve from -∞ to z for selected values of z.

z            -3      -2     -1     0     1      2      3
Proportion   0.0013  0.023  0.159  0.5   0.841  0.977  0.9987

(Student's) t test critical values. A t distribution with d.f. degrees of freedom has percentage C of the area under the curve between -t* and t* (two-tailed), and proportion p of the area under the curve between t* and ∞ (one-tailed).
The values with infinite degrees of freedom are the same as critical values for the z test.

p           0.05    0.025   0.01    0.005   0.001   0.0005
C           90%     95%     98%     99%     99.8%   99.9%
d.f. 1      6.314   12.71   31.82   63.66   318.3   636.6
d.f. 10     1.812   2.228   2.764   3.169   4.144   4.587
d.f. 20     1.725   2.086   2.528   2.845   3.552   3.850
∞ (z)       1.645   1.960   2.326   2.576   3.091   3.291

χ² critical values. A table entry is the point χ²* such that proportion p of the area under a χ² curve with d.f. degrees of freedom lies in the right-hand tail, from χ²* to ∞. (When using an r × c table, there are (r - 1)(c - 1) degrees of freedom.)

p           0.99     0.95    0.10    0.05    0.01    0.005   0.001
d.f. 1      0.00016  0.0039  2.71    3.84    6.63    7.88    10.83
d.f. 2      0.020    0.10    4.60    5.99    9.21    10.60   13.82
d.f. 3      0.115    0.35    6.25    7.81    11.34   12.84   16.27
d.f. 4      0.297    0.71    7.78    9.49    13.28   14.86   18.47
d.f. 100    70.06    77.93   118.5   124.3   135.8   140.2   149.4

Bibliography

The following conference abbreviations are used in this bibliography:

ACL n      Proceedings of the nth Annual Meeting of the Association for Computational Linguistics
ANLP n     Proceedings of the nth conference on Applied Natural Language Processing
COLING n   Proceedings of the nth International Conference on Computational Linguistics (COLING-year)
EACL n     Proceedings of the nth Conference of the European Chapter of the Association for Computational Linguistics
EMNLP n    Proceedings of the nth Conference on Empirical Methods in Natural Language Processing
WVLC n     Proceedings of the nth Workshop on Very Large Corpora

These conference proceedings are all available from the Association for Computational Linguistics, P.O. Box 6090, Somerset NJ 08875, USA, acl@aclweb.org, http://www.aclweb.org.

SIGIR 'y   Proceedings of the (y - 77)th Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval. Available from the Association for Computing Machinery, acmhelp@acm.org, http://www.acm.org.
Many papers are also available from the Computation and Language subject area of the Computing Research Repository e-print archive, a part of the xxx.lanl.gov e-print archive on the World Wide Web.

Abney, Steven. 1991. Parsing by chunks. In Robert C. Berwick, Steven P. Abney, and Carol Tenny (eds.), Principle-Based Parsing, pp. 257-278. Dordrecht: Kluwer Academic.
Abney, Steven. 1996a. Part-of-speech tagging and partial parsing. In Steve Young and Gerrit Bloothooft (eds.), Corpus-Based Methods in Language and Speech Processing, pp. 118-136. Dordrecht: Kluwer Academic.
Abney, Steven. 1996b. Statistical methods and linguistics. In Judith L. Klavans and Philip Resnik (eds.), The Balancing Act: Combining Symbolic and Statistical Approaches to Language, pp. 1-26. Cambridge, MA: MIT Press.
Abney, Steven P. 1997. Stochastic attribute-value grammars. Computational Linguistics 23:597-618.
Ackley, D. H., G. E. Hinton, and T. J. Sejnowski. 1985. A learning algorithm for Boltzmann machines. Cognitive Science 9:147-169.
Aho, Alfred V., Ravi Sethi, and Jeffrey D. Ullman. 1986. Compilers: Principles, Techniques, and Tools. Reading, MA: Addison-Wesley.
Allen, James. 1995. Natural Language Understanding. Redwood City, CA: Benjamin Cummings.
Alshawi, Hiyan, Adam L. Buchsbaum, and Fei Xia. 1997. A comparison of head transducers and transfer for a limited domain translation application. In ACL 35/EACL 8, pp. 360-365.
Alshawi, Hiyan, and David Carter. 1994. Training and scaling preference functions for disambiguation. Computational Linguistics 20:635-648.
Anderson, John R. 1983. The architecture of cognition. Cambridge, MA: Harvard University Press.
Anderson, John R. 1990. The adaptive character of thought. Hillsdale, NJ: Lawrence Erlbaum.
Aone, Chinatsu, and Douglas McKee. 1995. Acquiring predicate-argument mapping information from multilingual texts. In Branimir Boguraev and James Pustejovsky (eds.), Corpus Processing for Lexical Acquisition, pp. 175-190. Cambridge, MA: MIT Press.
Appelt, D. E., J. R. Hobbs, J. Bear, D. Israel, and M. Tyson. 1993. Fastus: A finite-state processor for information extraction from real-world text. In Proc. of the 13th IJCAI, pp. 1172-1178, Chambery, France.
Apresjan, Jurij D. 1974. Regular polysemy. Linguistics 142:5-32.
Apté, Chidanand, Fred Damerau, and Sholom M. Weiss. 1994. Automated learning of decision rules for text categorization. ACM Transactions on Information Systems 12:233-251.
Argamon, Shlomo, Ido Dagan, and Yuval Krymolowski. 1998. A memory-based approach to learning shallow natural language patterns. In ACL 36/COLING 17, pp. 67-73.
Atwell, Eric. 1987. Constituent-likelihood grammar. In Roger Garside, Geoffrey Leech, and Geoffrey Sampson (eds.), The Computational Analysis of English: A Corpus-Based Approach. London: Longman.
Baayen, Harald, and Richard Sproat. 1996. Estimating lexical priors for low-frequency morphologically ambiguous forms. Computational Linguistics 22:155-166.
Bahl, Lalit R., Frederick Jelinek, and Robert L. Mercer. 1983. A maximum likelihood approach to continuous speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-5:179-190. Reprinted in (Waibel and Lee 1990), pp. 308-319.
Bahl, Lalit R., and Robert L. Mercer. 1976. Part-of-speech assignment by a statistical decision algorithm. In International Symposium on Information Theory, Ronneby, Sweden.
Baker, James K. 1975. Stochastic modeling for automatic speech understanding. In D. Raj Reddy (ed.), Speech Recognition: Invited papers presented at the 1974 IEEE symposium, pp. 521-541. New York: Academic Press. Reprinted in (Waibel and Lee 1990), pp. 297-307.
Baker, James K. 1979. Trainable grammars for speech recognition. In D. H. Klatt and J. J. Wolf (eds.), Speech Communication Papers for the 97th Meeting of the Acoustical Society of America, pp. 547-550.
Baldi, Pierre, and Søren Brunak. 1998. Bioinformatics: The Machine Learning Approach. Cambridge, MA: MIT Press.
Barnbrook, Geoff. 1996. Language and computers: a practical introduction to the computer analysis of language. Edinburgh: Edinburgh University Press.
Basili, Roberto, Maria Teresa Pazienza, and Paola Velardi. 1996. Integrating general-purpose and corpus-based verb classification. Computational Linguistics 22:559-568.
Basili, Roberto, Gianluca De Rossi, and Maria Teresa Pazienza. 1997. Inducing terminology for lexical acquisition. In EMNLP 2, pp. 125-133.
Baum, L. E., T. Petrie, G. Soules, and N. Weiss. 1970. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics 41:164-171.
Beeferman, Doug, Adam Berger, and John Lafferty. 1997. Text segmentation using exponential models. In EMNLP 2, pp. 35-46.
Bell, Timothy C., John G. Cleary, and Ian H. Witten. 1990. Text Compression. Englewood Cliffs, NJ: Prentice Hall.
Benello, Julian, Andrew W. Mackie, and James A. Anderson. 1989. Syntactic category disambiguation with neural networks. Computer Speech and Language 3:203-217.
Benson, Morton. 1989. The structure of the collocational dictionary. International Journal of Lexicography 2:1-14.
Benson, Morton, Evelyn Benson, and Robert Ilson. 1993. The BBI combinatory dictionary of English. Amsterdam: John Benjamins.
Berber Sardinha, A. P. 1997. Automatic Identification of Segments in Written Texts. PhD thesis, University of Liverpool.
Berger, Adam L., Stephen A. Della Pietra, and Vincent J. Della Pietra. 1996. A maximum entropy approach to natural language processing. Computational Linguistics 22:39-71.
Berry, Michael W. 1992. Large-scale sparse singular value computations. The International Journal of Supercomputer Applications 6:13-49.
Berry, Michael W., Susan T. Dumais, and Gavin W. O'Brien. 1995. Using linear algebra for intelligent information retrieval. SIAM Review 37:573-595.
Berry, Michael W., and Paul G. Young. 1995. Using latent semantic indexing for multilanguage information retrieval. Computers and the Humanities 29:413-429.
Bever, Thomas G. 1970. The cognitive basis for linguistic structures. In J. R. Hayes (ed.), Cognition and the development of language. New York: Wiley.
Biber, Douglas. 1993. Representativeness in corpus design. Literary and Linguistic Computing 8:243-257.
Biber, Douglas, Susan Conrad, and Randi Reppen. 1998. Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.
Black, Ezra. 1988. An experiment in computational discrimination of English word senses. IBM Journal of Research and Development 32:185-194.
Black, E., S. Abney, D. Flickinger, C. Gdaniec, R. Grishman, P. Harrison, D. Hindle, R. Ingria, F. Jelinek, J. Klavans, M. Liberman, M. Marcus, S. Roukos, B. Santorini, and T. Strzalkowski. 1991. A procedure for quantitatively comparing the syntactic coverage of English grammars. In Proceedings, Speech and Natural Language Workshop, pp. 306-311, Pacific Grove, CA. DARPA.
Black, Ezra, Fred Jelinek, John Lafferty, David M. Magerman, Robert Mercer, and Salim Roukos. 1993. Towards history-based grammars: Using richer models for probabilistic parsing. In ACL 31, pp. 31-37. Also appears in the Proceedings of the DARPA Speech and Natural Language Workshop, Feb. 1992, pp. 134-139.
Bod, Rens. 1995. Enriching Linguistics with Statistics: Performance Models of Natural Language. PhD thesis, University of Amsterdam.
Bod, Rens. 1996. Data-oriented language processing: An overview. Technical Report LP-96-13, Institute for Logic, Language and Computation, University of Amsterdam.
Bod, Rens. 1998. Beyond Grammar: An experience-based theory of language. Stanford, CA: CSLI Publications.
Bod, Rens, and Ronald Kaplan. 1998. A probabilistic corpus-driven model for lexical-functional analysis. In ACL 36/COLING 17, pp. 145-151.
Bod, Rens, Ron Kaplan, Remko Scha, and Khalil Sima'an. 1996. A data-oriented approach to lexical-functional grammar. In Computational Linguistics in the Netherlands 1996, Eindhoven, The Netherlands.
Boguraev, Bran, and Ted Briscoe. 1989. Computational Lexicography for Natural Language Processing. London: Longman.
Boguraev, Branimir, and James Pustejovsky. 1995. Issues in text-based lexicon acquisition. In Branimir Boguraev and James Pustejovsky (eds.), Corpus Processing for Lexical Acquisition, pp. 3-17. Cambridge MA: MIT Press.
Boguraev, Branimir K. 1993. The contribution of computational lexicography. In Madeleine Bates and Ralph M. Weischedel (eds.), Challenges in natural language processing, pp. 99-132. Cambridge: Cambridge University Press.
Bonnema, R. 1996. Data-oriented semantics. Master's thesis, Department of Computational Linguistics, University of Amsterdam.
Bonnema, Remko, Rens Bod, and Remko Scha. 1997. A DOP model for semantic interpretation. In ACL 35/EACL 8, pp. 159-167.
Bonzi, Susan, and Elizabeth D. Liddy. 1988. The use of anaphoric resolution for document description in information retrieval. In SIGIR '88, pp. 53-66.
Bookstein, Abraham, and Don R. Swanson. 1975. A decision theoretic foundation for indexing. Journal of the American Society for Information Science 26:45-W.
Booth, Taylor L. 1969. Probabilistic representation of formal languages. In Tenth Annual IEEE Symposium on Switching and Automata Theory, pp. 74-81.
Booth, Taylor L., and Richard A. Thomson. 1973. Applying probability measures to abstract languages. IEEE Transactions on Computers C-22:442-450.
Borthwick, Andrew, John Sterling, Eugene Agichtein, and Ralph Grishman. 1998. Exploiting diverse knowledge sources via maximum entropy in named entity recognition. In WVLC 6, pp. 152-160.
Bourigault, Didier. 1993. An endogeneous corpus-based method for structural noun phrase disambiguation. In EACL 6, pp. 81-86.
Box, George E. P., and George C. Tiao. 1973. Bayesian Inference in Statistical Analysis. Reading, MA: Addison-Wesley.
Brants, Thorsten. 1998. Estimating Hidden Markov Model Topologies. In Jonathan Ginzburg, Zurab Khasidashvili, Carl Vogel, Jean-Jacques Levy, and Enric Vallduví (eds.), The Tbilisi Symposium on Logic, Language and Computation: Selected Papers, pp. 163-176. Stanford, CA: CSLI Publications.
Brants, Thorsten, and Wojciech Skut. 1998. Automation of treebank annotation. In Proceedings of NeMLaP-98, Sydney, Australia.
Breiman, Leo. 1994. Bagging predictors. Technical Report 421, Department of Statistics, University of California at Berkeley.
Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. 1984. Classification and Regression Trees. Belmont, CA: Wadsworth International Group.
Brent, Michael R. 1993. From grammar to lexicon: Unsupervised learning of lexical syntax. Computational Linguistics 19:243-262.
Brew, Chris. 1995. Stochastic HPSG. In EACL 7, pp. 83-89.
Brill, Eric. 1993a. Automatic grammar induction and parsing free text: A transformation-based approach. In ACL 31, pp. 259-265.
Brill, Eric. 1993b. A Corpus-Based Approach to Language Learning. PhD thesis, University of Pennsylvania.
Brill, Eric. 1993c. Transformation-based error-driven parsing. In Proceedings Third International Workshop on Parsing Technologies, Tilburg/Durbuy, The Netherlands/Belgium.
Brill, Eric. 1995a. Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics 21:543-565.
Brill, Eric. 1995b. Unsupervised learning of disambiguation rules for part of speech tagging. In WVLC 3, pp. 1-13.
Brill, Eric, David Magerman, Mitch Marcus, and Beatrice Santorini. 1990. Deducing linguistic structure from the statistics of large corpora. In Proceedings of the DARPA Speech and Natural Language Workshop, pp. 275-282, San Mateo CA. Morgan Kaufmann.
Brill, Eric, and Philip Resnik. 1994. A transformation-based approach to prepositional phrase attachment disambiguation. In COLING 15, pp. 1198-1204.
Briscoe, Ted, and John Carroll. 1993. Generalized probabilistic LR parsing of natural language (corpora) with unification-based methods. Computational Linguistics 19:25-59.
Britton, J. L. (ed.). 1992. Collected Works of A. M. Turing: Pure Mathematics. Amsterdam: North-Holland.
[...]
... to automatic speech recognition. Bell System Technical Journal 62:1035-1074.
Lewis, David D. 1992. An evaluation of phrasal and clustered representations on a text categorization task. In SIGIR '92, pp. 37-50.
Lewis, David D., and Karen Sparck Jones. 1996. Natural language processing for information retrieval. Communications of the ACM 39:92-101.
Lewis, David D., and Marc Ringuette. 1994. A comparison ...
[...]
... pp. 228-235.
Chen, Stanley F., and Joshua Goodman. 1996. An empirical study of smoothing techniques for language modeling. In ACL 34, pp. 310-318.
Chen, Stanley F., and Joshua Goodman. 1998. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Center for Research in Computing Technology, Harvard University.
Chi, Zhiyi, and Stuart Geman. 1998. Estimation of probabilistic ...
[...]
... & Associates.
Fu, King-Sun. 1974. Syntactic Methods in Pattern Recognition. London: Academic Press.
Fung, Pascale, and Kenneth W. Church. 1994. K-vec: A new approach for aligning parallel texts. In COLING 15, pp. 1096-1102.
Fung, Pascale, and Kathleen McKeown. 1994. Aligning noisy parallel corpora across language groups: Word pair feature matching by dynamic time warping. In Proceedings of the Association for Machine ...
[...]
... Donald. 1994. A parser for text corpora. In B. T. S. Atkins and A. Zampolli (eds.), Computational Approaches to the Lexicon, pp. 103-151. Oxford: Oxford University Press.
Hindle, Donald, and Mats Rooth. 1993. Structural ambiguity and lexical relations. Computational Linguistics 19:103-120.
Hirst, Graeme. 1987. Semantic Interpretation and the Resolution of Ambiguity. Cambridge: Cambridge University Press.
[...]
... corpora: Combining corpus and machine-readable dictionary data for building bilingual lexicons. Journal of Machine Translation 10.
Klein, Sheldon, and Robert F. Simmons. 1963. A computational approach to grammatical coding of English words. Journal of the Association for Computing Machinery 10:334-347.
Kneser, Reinhard, and Hermann Ney. 1995. Improved backing-off for m-gram language modeling. In Proceedings of the ...
[...]
... Eugene. 1993. Statistical Language Learning. Cambridge, MA: MIT Press.
Charniak, Eugene. 1996. Tree-bank grammars. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI '96), pp. 1031-1036.
Charniak, Eugene. 1997a. Statistical parsing with a context-free grammar and word statistics. In Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI '97), pp. 598-603.
[...]
... newswire text. In Branimir Boguraev and James Pustejovsky (eds.), Corpus Processing for Lexical Acquisition, pp. 41-59. Cambridge, MA: MIT Press.
Manning, Christopher D. 1993. Automatic acquisition of a large subcategorization dictionary from corpora. In ACL 31, pp. 235-242.
Manning, Christopher D., and Bob Carpenter. 1997. Probabilistic parsing using left corner language models. In Proceedings of the Fifth International ...
[...]
... Information Sources. PhD thesis, University of California at Santa Barbara.
Domingos, Pedro, and Michael Pazzani. 1997. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning 29:103-130.
Doran, Christy, Dania Egedi, Beth Ann Hockey, B. Srinivas, and Martin Zaidel. 1994. XTAG system - a wide coverage grammar for English. In COLING 15, pp. 922-928.
Dorr, Bonnie J., and Mari Broman Olsen ...
[...]
... unrestricted text for information retrieval. In ACL 34, pp. 17-24.
Fagan, Joel L. 1987. Automatic phrase indexing for document retrieval: An examination of syntactic and non-syntactic methods. In SIGIR '87, pp. 91-101.
Fagan, Joel L. 1989. The effectiveness of a nonsyntactic approach to automatic phrase indexing for document retrieval. Journal of the American Society for Information Science 40:115-132.
Fano, Robert M. ...
[...]
... Mercer. 1992a. Analysis, statistical transfer, and synthesis in machine translation. In Proceedings of the 4th International Conference on Theoretical and Methodological Issues in Machine Translation, pp. 83-100.
Brown, Peter F., Stephen A. Della Pietra, Vincent J. Della Pietra, Jennifer C. Lai, and Robert L. Mercer. 1992b. An estimate of an upper bound for the entropy of English. Computational Linguistics 18:31-40.
