... obtain automatic word classifications
for large vocabularies (>1 million words) using
such large training corpora (>30 billion tokens).
The resulting clusterings are then used
in training ...
Proceedings of ACL-08: HLT, pages 755–762,
Columbus, Ohio, USA, June 2008.
© 2008 Association for Computational Linguistics
Distributed Word Clustering for Large Scale Cl...
... emphasis in LDA is on modeling topics,
not word meanings, there is no guarantee that
the row (word) vectors are sensible as points in a
k-dimensional space. Indeed, we show in Section
4 that using ... weighting in previous work
suggests that incorporating sentiment information
into VSM values via supervised methods is helpful
for sentiment analysis. We adopt this insight,
bu...
... descriptor for word type i. We next
include a normalization step in which each row
in each of $L^{*}$ and $R^{*}$ is scaled to unit length,
yielding matrices $L^{**}$ and $R^{**}$. Finally, we form a
single ... descriptors into
$k_1 = 500$ groups, using a k-means clustering algorithm.
Centroid initialization is done by placing
the $k$ initial centroids on the descriptors of the $k$
most freq...
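The normalization and clustering steps above can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' implementation. It assumes that the single descriptor matrix is formed by concatenating each word's rows from $L^{**}$ and $R^{**}$ (the excerpt elides how the matrices are combined), uses squared Euclidean distance, and takes word frequencies as given. The helper names `normalize_rows`, `combine_descriptors` and `kmeans_frequency_init` are hypothetical.

```python
import math


def normalize_rows(matrix):
    """Scale every row to unit Euclidean length; all-zero rows are left unchanged."""
    result = []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row))
        result.append([x / norm for x in row] if norm > 0 else list(row))
    return result


def combine_descriptors(left, right):
    """Assumed combination step: concatenate each word's normalized L** and R** rows."""
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_frequency_init(descriptors, frequencies, k, max_iters=50):
    """k-means whose initial centroids are the descriptors of the k most frequent word types."""
    by_frequency = sorted(range(len(descriptors)), key=lambda i: frequencies[i], reverse=True)
    centroids = [list(descriptors[i]) for i in by_frequency[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each descriptor joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(d, centroids[c]))
            for d in descriptors
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [descriptors[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

In the described setting, `k` would be $k_1 = 500$ and the vocabulary far larger. Frequency-based initialization makes the result deterministic, unlike random seeding, and it anchors clusters on well-estimated descriptors.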
... objects.
Therefore, in the English-to-Chinese machine
translation task we need to make additional efforts
to generate the missing measure words in Chinese.
For example, when translating the English ...
four major kinds of errors, as listed in Table 8.
Most errors are caused by failures to find the
positions where measure words should be generated.
The main reason for this is some hint in...
... inspectors for viewing, searching
and editing the static and dynamic resources
and a Link Reporter that can summarize and configure
the information in the database, including
compiling fine-grained ... updated
incrementally during the manual revision
stage. Each time the user confirms a proposed
link, the information inherent in the link is stored
in the different dynamic resourc...
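The incremental update described above can be illustrated with a small store that is modified each time a link is confirmed. This is only a sketch under assumed names: the class `DynamicResources`, the method `confirm_link`, and the two resources it holds (a confirmation count per word pair and a set of confirmed translations per source word) are hypothetical, since the excerpt does not specify what the dynamic resources contain.

```python
from collections import Counter, defaultdict


class DynamicResources:
    """Hypothetical dynamic resources, updated incrementally during manual revision."""

    def __init__(self):
        self.link_counts = Counter()          # (source, target) -> times confirmed
        self.translations = defaultdict(set)  # source word -> confirmed target words

    def confirm_link(self, source, target):
        """Store the information carried by one user-confirmed link."""
        self.link_counts[(source, target)] += 1
        self.translations[source].add(target)
```

Because each confirmation touches only the entries for one word pair, the resources stay current without being rebuilt from the whole database.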
... bigram
ACM used in a Chinese text input system [Gao et al.
2002]. However, quite a few techniques (including
clustering) were integrated to construct a Chinese
language modeling system, and ...
Asymmetric clustering
The basic criterion for statistical clustering is to
maximize the resulting probability (or minimize the
resulting perplexity) of the training data. Many
tradit...
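The perplexity criterion can be made concrete by scoring a candidate clustering on the training data. The sketch below assumes one simple asymmetric class bigram, $P(w_i \mid w_{i-1}) \approx P(p(w_i) \mid c(w_{i-1}))\,P(w_i \mid p(w_i))$, where $c$ assigns conditional clusters and $p$ assigns predictive clusters, both estimated by maximum likelihood. This form is chosen for illustration and is not necessarily the exact ACM of Gao et al.; the function and argument names are hypothetical.

```python
import math
from collections import Counter


def training_perplexity(tokens, cond_cluster, pred_cluster):
    """Perplexity of an assumed asymmetric class bigram on its own training data."""
    bigrams = list(zip(tokens, tokens[1:]))
    # Maximum-likelihood counts for P(pred cluster | cond cluster) and P(word | pred cluster).
    cluster_pair = Counter((cond_cluster[a], pred_cluster[b]) for a, b in bigrams)
    cond_total = Counter(cond_cluster[a] for a, _ in bigrams)
    word_in_pred = Counter((pred_cluster[b], b) for _, b in bigrams)
    pred_total = Counter(pred_cluster[b] for _, b in bigrams)
    log_prob = 0.0
    for a, b in bigrams:
        p_class = cluster_pair[(cond_cluster[a], pred_cluster[b])] / cond_total[cond_cluster[a]]
        p_word = word_in_pred[(pred_cluster[b], b)] / pred_total[pred_cluster[b]]
        log_prob += math.log(p_class * p_word)
    # Lower perplexity means higher training-data probability, the clustering criterion.
    return math.exp(-log_prob / len(bigrams))
```

A clustering search would compare candidate `cond_cluster`/`pred_cluster` mappings with this score and keep the one with lower perplexity. On training data every observed event has nonzero count, so no smoothing is needed here, although held-out evaluation would require it.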
... similarities between NEs.
The approach that we propose is inspired by
the language modeling framework introduced in
the information retrieval field (see, for example,
Lavrenko and Croft, 2003). Then, we ... cliques containing Oxford
2.4 Cliques clustering
We use a clustering technique in order to group
cliques of NEs which are mutually highly similar.
The clusters of cliques...
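A clique of mutually highly similar NEs can be read as a clique in a graph whose edges join NEs with similarity above a threshold. The sketch below builds such a graph from an arbitrary similarity function and enumerates maximal cliques with the basic Bron–Kerbosch algorithm. The threshold, the helper names, and the use of Bron–Kerbosch are assumptions for illustration; the excerpt does not say how the cliques or their clusters are computed.

```python
def similarity_graph(names, sim, threshold):
    """Adjacency sets: an edge joins two NEs whose similarity reaches the threshold."""
    adj = {n: set() for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if sim(a, b) >= threshold:
                adj[a].add(b)
                adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Basic Bron-Kerbosch enumeration of all maximal cliques (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques
```

For example, with a hypothetical similarity table in which "Oxford", "Oxford University" and "University of Oxford" are pairwise similar and "Cambridge" is not, the function returns one three-member clique for the Oxford variants and a singleton for "Cambridge". Pivoting would speed up enumeration on large graphs but is omitted for brevity.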
... for 10 min in a bath-type
sonicator [22]. Vitamin D3, cholesterol and hydroxyvitamin
D3 derivatives were included in the mixture for sonication as
required. Purified P450scc was incorporated into ... 20,23-dihydroxyvitamin
D3 (80 µg) for structure determination by
NMR was performed using a 50 mL incubation of 50 µM
vitamin D3 with 2 µM P450scc in 0.45% cyclodextrin, with
the prod...
... x-WISH
INFORMATIVE
various S-INFORM
3.2. Unification-based analysis
Figure 1 diagrams an overview of the
procedure for translating the speaker's meaning. In
contrast to a conventional machine ...
REQUESTING
COMPLAINING
ADVISING
CONFIRMING
etc.
Conversely, the same intention can be conveyed
through various surface expressions, as in the
following variations of (2-1):
RE...
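Unification-based analysis generally relies on merging partial feature structures and failing when they conflict. The sketch below shows generic feature-structure unification over nested dictionaries. It does not reproduce the grammar or intention representation of this work. The feature names (`intention`, `content`, `mood`) are hypothetical, and the intention labels are taken from the list above. Structure sharing (reentrancy), which full unification grammars support, is omitted.

```python
def unify(fs1, fs2):
    """Unify two feature structures (nested dicts); return None if they clash."""
    result = dict(fs1)
    for key, val2 in fs2.items():
        if key not in result:
            result[key] = val2  # feature only in fs2: copy it over
            continue
        val1 = result[key]
        if isinstance(val1, dict) and isinstance(val2, dict):
            sub = unify(val1, val2)  # recurse into embedded structures
            if sub is None:
                return None
            result[key] = sub
        elif val1 != val2:
            return None  # atomic values (or atom vs. structure) disagree
    return result
```

Two partial descriptions of one utterance, for instance `{"intention": "REQUESTING", "content": {"action": "open-window"}}` and `{"content": {"agent": "hearer"}, "mood": "interrogative"}`, unify into a single structure containing all four features. Adding `{"intention": "COMPLAINING"}` to the first description fails because the two intention values clash.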
... a
basis for the continuity of haemoglobin and myoglobin
functions in vivo, since the autoxidation reaction is inevitable
in nature for all oxygen-binding haem proteins [21,23,24], as
well as for ... contacts in HbA
In haemoglobin (Hb) research, the central problem is
understanding the mechanism for the cooperative oxygen
binding to the α₂β₂ tetramer. For human HbA, the a...