... efficacy of this
method in the context of Chinese word
segmentation and part-of-speech tagging,
where no segmentation and POS tagging
standards are widely accepted due to the
lack of morphology in Chinese. ... of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, pages 522–530,
Suntec, Singapore, 2-7 August 2009.
© 2009 ACL and AFNLP
Automatic Adaptation of Annotation Standards:
Chinese ... method we choose
Chinese word segmentation and part-of-speech
tagging, where the problem of incompatible
annotation standards is one of the most evident: so
far no segmentation standard is widely...
... Joint Chinese Word Segmentation and POS Tagging
Canasai Kruengkrai†‡ and Kiyotaka Uchimoto‡ and Jun’ichi Kazama‡
Yiou Wang‡ and Kentaro Torisawa‡ and Hitoshi Isahara†‡
†Graduate School of ... discriminative
word-character hybrid model for joint
Chinese word segmentation and POS tagging.
Our word-character hybrid model offers
high performance since it can handle both
known and unknown words. ... Liu, and Yajuan Lü.
2008a. A cascaded linear model for joint Chinese
word segmentation and part-of-speech tagging. In
Proceedings of ACL.
Wenbin Jiang, Haitao Mi, and Qun Liu. 2008b. Word
lattice...
... word w
11 the starting characters $c_1$ and $c_2$ of two consecutive words
12 the ending characters $c_1$ and $c_2$ of two consecutive words
13 a word of length l with previous word w
14 a word of ... UK
{yue.zhang,stephen.clark}@comlab.ox.ac.uk
Abstract
For Chinese POS tagging, word segmentation
is a preliminary step. To avoid error propagation
and improve segmentation by utilizing
POS information, segmentation and tagging
can be ... proposed a hybrid
model for word segmentation and POS tagging
using an HMM-based approach. Word information is
used to process known words, and character
information is used for unknown words...
... −l), and select for position i an N-best list
of candidate results from all these candidates. When
we derive a candidate result from a word-POS pair
p and a candidate q at prior position of p, ... and joint segmentation and
part-of-speech tagging. On the Penn Chinese
Treebank 5.0, we obtain an error reduction of
18.5% on segmentation and 12% on joint segmentation
and part-of-speech tagging ... that segmentation and POS tagging task
is to divide a character sequence into several subsequences
and label each of them with a POS tag.
It is a better idea to perform segmentation and
POS tagging...
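The task described above (split a character sequence into subsequences and give each one a POS tag) can be illustrated with a short Python sketch. It assumes one common encoding that the excerpt does not state: every character receives a joint label made of a position marker (B, M, E or S) and the POS tag of its word. The function name and the tag strings NN and CD are illustrative only.

```python
def to_joint_tags(pairs):
    """Turn (word, pos) pairs into one (character, joint_tag) per character.

    Joint tags look like "B-NN": a position marker plus the word's POS tag.
    This encoding is an assumption made for illustration.
    """
    tagged = []
    for word, pos in pairs:
        if len(word) == 1:
            tagged.append((word, f"S-{pos}"))  # single-character word
            continue
        for index, char in enumerate(word):
            if index == 0:
                marker = "B"  # first character of a multi-character word
            elif index == len(word) - 1:
                marker = "E"  # last character
            else:
                marker = "M"  # any character in between
            tagged.append((char, f"{marker}-{pos}"))
    return tagged


pairs = [("多少", "CD"), ("钱", "NN")]
print(to_joint_tags(pairs))
```

With this encoding a single sequence labeller predicts segmentation and POS tags in one pass, which is one way to avoid passing segmentation errors on to a separate tagger.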
... sequence of characters $c = (c_1, \ldots, c_{\#c})$,
the task of word segmentation and POS tagging is
to predict a sequence of word and POS tag pairs
$y = (w_1, p_1, \ldots, w_{\#y}, p_{\#y})$, where $w_i$
is a word, ... model, joint word segmentation
and POS tagging is decomposed into two
steps: (1) coarse-grained word segmentation and
tagging, and (2) fine-grained sub-word tagging. The
workflow is shown in ... decoding for POS tagging over
sub-words is efficient. Finally, the Chinese language is
characterized by the lack of morphology that often
provides important clues for POS tagging, and the
POS tags...
... Liang Huang, and Qun Liu. 2009.
Automatic adaptation of annotation standards: Chinese
word segmentation and POS tagging – a case study. In
Proceedings of the Joint Conference of the 47th
Annual ... and Hitoshi
Isahara. 2009. An error-driven word-character hybrid
model for joint Chinese word segmentation and POS
tagging. In Proceedings of the Joint Conference of the
47th Annual Meeting of ... model
for word structure parsing is integrated with
constituent parsing. There have been many efforts to
integrate Chinese word segmentation, part-of-speech
tagging and parsing (Wu and Zixin,...
... Comparison of Combined
Model and KLD Model
5 Conclusions and Future Work
A discriminative pruning criterion of n-gram language
model for Chinese word segmentation was
proposed in this paper, and ... and
word segmentation performance is also
discussed.
1 Introduction
Chinese word segmentation is the initial stage of
many Chinese language processing tasks, and
has received a lot of attention ... of the 41st Annual
Meeting of Association for Computational Linguistics
(ACL-2003), pages 272-279.
Jianfeng Gao, Mu Li, Andi Wu, and Chang-Ning
Huang. 2005. Chinese Word Segmentation and...
... J. and A. Wu and Mu Li and C. N. Huang and H. Li
and X. Xia and H. Qin. 2004. Adaptive Chinese Word
Segmentation. In Proceedings of ACL-2004.
Meng, H. and C. W. Ip. 1999. An Analytical Study of
Transformational ... N. 2003. Chinese Word Segmentation as Character
Tagging. Computational Linguistics and Chinese
Language Processing. 8(1): 29-48.
Redington, M. and N. Chater and C. Huang and L. Chang
and K. Chen. ... that
Chinese word segmentation is the classification
of a string of character-boundaries
(CB’s) into either word-boundaries (WB’s)
or non-word-boundaries. In Chinese, CB’s
are delimited and...
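The boundary-classification view above can be made concrete with a small Python sketch. It assumes that a sentence of n characters has n−1 internal character-boundaries and that each one is labelled either as a word-boundary (True) or as a non-word-boundary (False). The function names are hypothetical and no classifier is included; the sketch only shows that a segmentation and a boundary labelling carry the same information.

```python
def boundary_labels(words):
    """Label every internal character-boundary of a segmented sentence.

    True marks a word-boundary, False a non-word-boundary.
    """
    labels = []
    for index, word in enumerate(words):
        # Boundaries inside a word are non-word-boundaries.
        labels.extend([False] * (len(word) - 1))
        # The boundary after every word except the last is a word-boundary.
        if index < len(words) - 1:
            labels.append(True)
    return labels


def apply_boundaries(text, labels):
    """Rebuild the word list from raw text and its boundary labels."""
    words, start = [], 0
    for position, is_word_boundary in enumerate(labels, start=1):
        if is_word_boundary:
            words.append(text[start:position])
            start = position
    if text:
        words.append(text[start:])
    return words


words = ["多少", "钱", "的", "伞", "吗", "?"]
labels = boundary_labels(words)
print(labels)
print(apply_boundaries("".join(words), labels))
```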
... "[…]" should be separated and "[…]" be combined ...
dis(locmax, y:z) = dts(x:y) - dts(y:z)
Definition 7 Suppose 'vxyzw' is a Chinese
Chinese Word Segmentation
without Using Lexicon and Hand-crafted Training Data
Sun Maosong, Shen Dayang*, ...
Any Chinese word is composed of either single
or multiple characters. Chinese texts are explicitly
concatenations of characters; words are not
delimited by spaces as they are in English. Chinese...
... beginning of a word and I
all other positions; and 2) BMES: where B, M and E
represent the beginning, middle and end of a
multi-character word respectively, and S tags a
single-character word. ... decoding.
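The BMES scheme described above can be sketched in a few lines of Python. The function names are hypothetical, and the example sentence is only an illustration of how a segmented sentence maps to one tag per character and back.

```python
def words_to_bmes(words):
    """Map segmented words to one B/M/E/S tag per character."""
    tags = []
    for word in words:
        if not word:
            continue  # ignore empty strings
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # B for the first character, E for the last, M for the rest.
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def bmes_to_words(chars, tags):
    """Rebuild words from characters and their B/M/E/S tags."""
    words, current = [], ""
    for char, tag in zip(chars, tags):
        current += char
        if tag in ("E", "S"):  # these tags close a word
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that stops mid-word
        words.append(current)
    return words


words = ["这些", "雨伞", "多少", "钱", "?"]
tags = words_to_bmes(words)
print(tags)
print(bmes_to_words("".join(words), tags))
```

Casting segmentation as per-character tagging in this way lets any sequence labeller be used as a segmenter, since the word list can always be recovered from the tags.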
3 Chinese Word Segmentation (CWS)
3.1 Word segmentation as character tagging
Considering the ambiguity problem that a Chinese
character may appear in any relative position in a
word and the ... $(c_{n-l_k+1} \ldots c_n)$ represents a
segmentation of k words and the lengths of the first
and last word are $l_1$ and $l_k$ respectively.
In early work, rule-based models find words one
by one based on heuristics...
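The excerpt breaks off before naming a specific heuristic. As an illustration only, the sketch below uses forward maximum matching, a classic rule-based heuristic that finds words one by one by always taking the longest lexicon entry starting at the current position. Treating it as the kind of method meant here is an assumption, and the lexicon, function name and max_len parameter are hypothetical.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation with a word list.

    At each position, take the longest lexicon word of at most max_len
    characters; fall back to a single character if nothing matches.
    """
    words, position = [], 0
    while position < len(text):
        longest = min(max_len, len(text) - position)
        for size in range(longest, 0, -1):
            candidate = text[position:position + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                position += size
                break
    return words


lexicon = {"这些", "雨伞", "多少", "钱"}
print(forward_max_match("这些雨伞多少钱", lexicon))
```

Because each decision is greedy and local, a heuristic like this cannot revise an early choice, which is one motivation for the statistical models discussed in the surrounding excerpts.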
... ICTCLAS2009.
NUS Chinese word segmenter (NUS): The NUS
Chinese word segmenter uses a maximum entropy
approach to Chinese word segmentation, which
achieved the highest F-measure on three of the four ... into words:
Translation: 多少_钱_的_伞_吗_?
Reference: 这些_雨伞_多少_钱_?
The word “伞” is a synonym for the word “雨伞”,
and both words are translations of the English
word “umbrella”. If a word-level ... the four
corpora in the open track of the Second International
Chinese Word Segmentation Bakeoff (Ng
and Low, 2004; Low et al., 2005). The segmentation
standard adopted in this paper is CTB...
... segmented Chinese text,
most of the tokens are uni- and bigrams but most of
the types are bi- and trigrams (as unigrams are often
high-frequency grammatical words and trigrams the
result of more ... unambiguous cases of numbers and dates in Chinese script.
From $h_{\rightarrow}(x_{0..n})$ and $h_{\rightarrow}(x_{0..n-1})$ on the one hand,
and from $h_{\leftarrow}(x_{0..n})$ and $h_{\leftarrow}(x_{1..n})$ we estimate the
Variation of Branching Entropy ... redefine the sentence
segmentation problem as the maximization of the
autonomy measure of its words. For a character
sequence s, if we call Seg(s) the set of all the possible
segmentations, then...
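The idea of choosing, among all segmentations in Seg(s), the one that maximizes an autonomy measure can be sketched in Python as follows. The excerpt is cut off before the exact objective, so two things here are assumptions: the score of a segmentation is taken to be the plain sum of its words' autonomy values, and the autonomy values themselves come from a made-up toy table rather than from branching entropy.

```python
def all_segmentations(s):
    """Yield every split of s into contiguous non-empty words, i.e. Seg(s)."""
    if not s:
        yield []
        return
    for end in range(1, len(s) + 1):
        for rest in all_segmentations(s[end:]):
            yield [s[:end]] + rest


def best_segmentation(s, autonomy):
    """Return the segmentation whose summed word autonomy is largest."""
    return max(all_segmentations(s),
               key=lambda seg: sum(autonomy(word) for word in seg))


# Toy autonomy values, invented purely for illustration.
toy_scores = {"ab": 2.0, "c": 0.5}


def toy_autonomy(word):
    return toy_scores.get(word, -1.0)


print(len(list(all_segmentations("abc"))))
print(best_segmentation("abc", toy_autonomy))
```

A string of n characters has 2^(n-1) segmentations, so exhaustive enumeration is only workable for short strings; a practical implementation would normally use dynamic programming over word end positions.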
... synonyms and other types
of semantically related words such as
antonyms, (co)hyponyms and hypernyms.
We present a method based on automatic
word alignment of parallel corpora consisting
of documents ... context
and one using translational context based on word
alignment and the combination of both. For both
approaches, we used a cutoff n for each row in our
word-by-context matrix. A word is ... word
P(W) is the probability of seeing the word
P(f) is the probability of seeing the feature
P(W,f) is the probability of seeing the word and the feature
together.
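The three probabilities listed above are the ingredients of pointwise mutual information, a standard weighting for word-by-context matrices. The excerpt does not show the formula itself, so the sketch below assumes the usual definition, log2 of P(W,f) / (P(W) P(f)), with probabilities estimated from raw counts. The function name, its parameters and the counts are hypothetical.

```python
import math


def pmi(count_wf, count_w, count_f, total):
    """Pointwise mutual information (base 2) from raw co-occurrence counts.

    count_wf: times the word and the feature were seen together
    count_w:  times the word was seen
    count_f:  times the feature was seen
    total:    total number of observations
    """
    if count_wf == 0:
        return float("-inf")  # the pair was never observed together
    p_wf = count_wf / total   # P(W,f)
    p_w = count_w / total     # P(W)
    p_f = count_f / total     # P(f)
    return math.log2(p_wf / (p_w * p_f))


print(pmi(count_wf=10, count_w=20, count_f=50, total=1000))
```

A positive value means the word and the feature occur together more often than independence would predict, and a value near zero means they are roughly independent.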
3.3 Word Alignment
The multilingual...