... understanding, and customizing complex rule-based named-entity annotators for different domains. Ratinov and Roth (2009) systematically study the challenges in NER, compare several solutions and ... drop non-English tweets and get about 11,371,389, from which 15,800 tweets are randomly sampled, and are then labeled by two independent annotators, so that the beginning and the end of each named entity ... tweets, forming the gold-standard data set. Figure 1 shows the portion of named entities of different types. On average, a named entity has 1.2 words. The gold-standard data set is evenly split...