... runs using an optimal threshold (box 3) for the experiment as determined by using the test set. In all remaining experiments, we learn the threshold from the training set as in the BASELINE ... including the number of documents, annotated CEs, coreference chains, annotated CEs per chain (average), and number of documents in the train/test split. We use st to indicate a standard train/test ... rarely inaccurate, assumption that there are no cataphoric expressions in the data.
Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, pages 656–664, Suntec, Singapore,...