... utterance segmentation and lack of punctuation.
Roland and Jurafsky (1998) have suggested that there are
substantial subcategorization differences between written
corpora and spoken corpora. ... value
drops progressively for both WSJ and Switchboard.
Again, Roland and Jurafsky (1998) have suggested that one
major subcategorization difference between written and spoke...
... processability is dominant. Can these
two functional demands be reconciled? There is in fact no
a priori reason to believe that the demands of learnability and
parsability are necessarily compatible. ...
morphological and phonological systems, and advances the notion
of k-reversibility as the analog of bounded context parsability for
such finite state systems.
II BOUNDED CON...
... domains
balanced roughly equally: news (80,000 words),
arts and leisure (50,000), business and finance
(50,000), and opinion (50,000), and each covers the
period from April to September 2004. Each ... Baker. 1993. Corpus linguistics and translation
studies: Implications and applications. In Gill Francis,
Mona Baker, and Elena Tognini Bonelli, editors, Text
and technology: in h...
... project, and a doctoral fellowship,
both from IBM Deutschland GmbH, and by the
Esprit Basic Research Action Project 3175 (DYANA).
I thank Jochen Dörre, Glyn Morrill, Remo
Pareschi, and Henk ... has scope over a whole sequent, and
therefore, over a complete subproof, and not only
over a single category. In this way, correct variable
bindings for hypothetic categories, whic...
... example, there are Naive Bayes
(McCallum and Nigam, 1998), Rocchio (Lewis et
al., 1996), Nearest Neighbor (kNN) (Yang et al.,
2002), TCFP (Ko and Seo, 2002), and Support
Vector Machine (SVM) (Joachims, ... co-occurred
with the title words and keywords:
‘driver’, ‘clutch’, ‘trunk’, and so on. They are
words in first-order co-occurrence with the title
words and the keywords....
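As a rough illustration of first-order co-occurrence, the sketch below collects every word that appears in the same context unit as at least one seed word (a title word or keyword). It is only a sketch under stated assumptions: the context unit is taken to be a tokenized sentence, and the function name `first_order_cooccurrences`, the seed word `car`, and the toy sentences are hypothetical and do not come from the source.

```python
from typing import Iterable, List, Set


def first_order_cooccurrences(
    sentences: Iterable[List[str]], seeds: Set[str]
) -> Set[str]:
    """Return words sharing a sentence with at least one seed word.

    Assumption: a "context" is one tokenized sentence. The source does
    not say which unit (sentence, window, document) it actually uses.
    """
    related: Set[str] = set()
    for tokens in sentences:
        present = set(tokens)
        if present & seeds:            # sentence mentions a seed word
            related |= present - seeds  # keep its other words
    return related


# Hypothetical toy data: "car" stands in for a title word.
sentences = [
    ["driver", "starts", "car"],
    ["car", "trunk", "opens"],
    ["stocks", "fell"],
]
related_words = first_order_cooccurrences(sentences, {"car"})
```

A real system would probably also filter stop words and rank candidates by frequency; both steps are left out here to keep the idea visible.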
... between 1 and n the triple
$(s, s, w_s) \in T$.
• The tree is binary branching and consistent.
Formally, for every $(s, t, X)$ in $T$, $s \neq t$, there is
exactly one $r$, $Y$, and $Z$ such that $s < r < t$ and ...
crosses some constituent in the correct parse. On
the other hand, the Bracketed Recall and Bracketed
Tree Rates are easier to handle, since computing the
probability that a bracke...
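To make the bracket-based criteria concrete, here is a minimal sketch of a crossing test and a bracketed recall computation over unlabeled spans. It assumes the usual definitions: spans are inclusive (start, end) word positions, two spans cross when they overlap without one containing the other, and bracketed recall is the fraction of correct-parse brackets that also occur in the guessed parse. The function names and toy spans are hypothetical, and details may differ from the definitions used in the paper.

```python
from typing import Set, Tuple

Span = Tuple[int, int]  # inclusive (start, end) word positions


def crosses(a: Span, b: Span) -> bool:
    """True if the spans overlap but neither contains the other."""
    (s1, t1), (s2, t2) = a, b
    return (s1 < s2 <= t1 < t2) or (s2 < s1 <= t2 < t1)


def has_crossing(guess: Set[Span], gold: Set[Span]) -> bool:
    """True if any guessed bracket crosses a bracket of the correct parse."""
    return any(crosses(g, c) for g in guess for c in gold)


def bracketed_recall(guess: Set[Span], gold: Set[Span]) -> float:
    """Fraction of correct-parse brackets found in the guess (labels ignored)."""
    if not gold:
        return 0.0
    return len(guess & gold) / len(gold)


# Hypothetical five-word sentence.
gold = {(1, 5), (1, 2), (3, 5), (4, 5)}
guess = {(1, 5), (1, 2), (3, 5), (3, 4)}
recall = bracketed_recall(guess, gold)
crossing = has_crossing(guess, gold)
```

In this toy case three of the four correct brackets are recovered, and the guessed bracket (3, 4) crosses the correct bracket (4, 5).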
... task of data extraction and language identification,
and on using ODIN to “discover” linguistic knowledge.
Then we outline a plan for the demo presentation.
2 Background and Previous work on ODIN
ODIN ... structures), Xia and
Lewis (2007) proposed to enrich the original IGT and
then extract syntactic information (e.g., context-free
rules) to bootstrap NLP tools such as POS taggers a...
... between lexicon and
morphology from the point of view of both
theoretical linguistics and computational
linguistics. Section 2 discusses the relational
word-based model of the lexicon and the role ...
1973; Aronoff, 1976; Lieber, 1980; Selkirk, 1983,
and others), and lexical redundancy rules (cf.
Jackendoff, 1975; Bresnan, 1977).
By and large, there seems to be widespread...
... would like to thank Eugene Charniak and the other
members of BLLIP for their comments and suggestions.
Fernando Pereira was especially generous with comments and
suggestions, as were the ACL reviewers; ... in an LR
parser). The isomorphism between shift-reduce
parses and standard parse trees is well-known
(Hopcroft and Ullman, 1979), and so is not
described here.
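For readers who want the correspondence spelled out, the following sketch converts a parse tree into a shift-reduce action sequence and back. It is a simplified illustration and not the construction from the cited text: it assumes strictly binary trees, a single labeled `reduce` action that pops two items, and hypothetical names (`tree_to_actions`, `actions_to_tree`) and example tree.

```python
from typing import List, Tuple, Union

# A tree is a word (str) or a (label, left, right) tuple. Binary only (assumption).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]
Action = Tuple[str, str]  # ("shift", word) or ("reduce", label)


def tree_to_actions(tree: Tree) -> List[Action]:
    """Post-order walk: shift each leaf, reduce once both children are built."""
    if isinstance(tree, str):
        return [("shift", tree)]
    label, left, right = tree
    return tree_to_actions(left) + tree_to_actions(right) + [("reduce", label)]


def actions_to_tree(actions: List[Action]) -> Tree:
    """Replay the actions on a stack to rebuild the tree."""
    stack: List[Tree] = []
    for kind, arg in actions:
        if kind == "shift":
            stack.append(arg)
        elif kind == "reduce":
            if len(stack) < 2:
                raise ValueError("reduce needs two items on the stack")
            right = stack.pop()
            left = stack.pop()
            stack.append((arg, left, right))
        else:
            raise ValueError(f"unknown action: {kind}")
    if len(stack) != 1:
        raise ValueError("action sequence does not yield a single tree")
    return stack[0]


# Hypothetical example tree.
tree: Tree = ("S", ("NP", "the", "dog"), "barks")
actions = tree_to_actions(tree)
rebuilt = actions_to_tree(actions)
```

Because each tree yields exactly one action sequence and replaying that sequence restores the same tree, the two representations carry the same information in this restricted binary setting.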
A (joint) shift-re...
...
(b-cost) of a segmentation candidate. In this way, the
collocation tends to take priority over the ordinary
word. The standard and initial value of each
segment cost is 2, and it is increased by ... approximately
12.6, the rise in the CS, TS and SS rates is 2.4%,
5.2% and 5.7%, respectively. As a consequence,
the rise in the CS, TS and SS rates is 6.2%, 9.1% and
3.8% on the ave...