Medical document clustering using ontology-based term similarity measures

Scientific report document: "Ensemble Document Clustering Using Weighted Hypergraph Generated by NMF" docx

... results using random initialization, and selected the cluster. We used the clustering toolkit CLUTO for clustering the hypergraph. Conclusion. This paper proposed a new ensemble document clustering ... the ensemble method using a standard hypergraph and the ensemble method using a weighted hypergraph. Our method achieved the best results. NMF. The NMF decomposes the term-document matrix to ... our ensemble method. Any number is available for each clustering. Experience shows that the ensemble clustering using k-means succeeds when each clustering has many clusters, and they are combined...

Uploaded: 20/02/2014, 12:20

4 393 0
Scientific report: "Multi-Document Summarization using Sentence-based Topic Models" docx

... proposed for document clustering and summarization by making use of both term-document matrix Y and term-sentence matrix B. The FGB model computes two matrices U and V by optimizing number of document ... model leads to better summarization results. [Notation: term-document matrix; term-sentence matrix; the number of latent topics; sentence-topic matrix; auxiliary document-topic matrix] 1: Randomly initialize ... than LexPageRank. Note that FGB model makes use of both term-document and term-sentence matrices. Our BSTM model outperforms FGB since the document-topic allocation is marginalized out in BSTM and...

Uploaded: 23/03/2014, 17:20

4 381 0
Scientific report: "Profile Based Cross-Document Coreference Using Kernelized Fuzzy Relational Clustering" docx

... distinctions between document level and profile based cross document coreference. Document level CDC makes a simplifying assumption that a named entity (and its variants) in a document has one underlying ... We have presented a profile-based Cross Document Coreference (CDC) approach based on a novel fuzzy relational clustering algorithm KARC. In contrast to traditional hard clustering methods, KARC produces ... pointer to another entity). The profile based CDC method generates a partition of E, 2.2 CDC Using Fuzzy Relational Clustering. 2.2.1 Preliminaries. Traditionally, hard clustering algorithms (where uij...

Uploaded: 08/03/2014, 00:20

9 207 0
Scientific report: "Multilingual Document Clustering: an Heuristic Approach Based on Cognate Named Entities" docx

... 3.2 Clustering. The algorithm for clustering multilingual documents based on cognate NEs is of heuristic nature. It consists of main phases: (1) first clusters creation, (2) addition of remaining documents ... event); then, the documents are represented by a set of terms (keywords or named entity types). In addition, they use document frequency to select relevant features among the extracted terms. Finally, ... multilingual clustering; the multilingual clustering takes input from the monolingual clusters. The authors select different type of features depending on the clustering: for the monolingual clustering...

Uploaded: 31/03/2014, 01:20

8 421 0
Medical image analysis using statistical shape model based on subdivision surface wavelet

... topology of biological objects) based on the subdivision surface wavelet transform, termed Statistical Surface Wavelet Model (SSWM). And besides, a framework of using SSWM for model-guided segmentation ... appropriate mapping [41, 44] is determined by iteratively solving a constrained optimization problem based on the diffusion equation. Next, to express a surface using spherical harmonics, the 20 ... caudate nucleus used for model-guided segmentation is built based on a training set using the proposed method. 3.1 The Shape Representation Based on Subdivision Surface Wavelets. In this section, we...

Uploaded: 12/09/2015, 08:19

122 362 0
Document clustering on target entities using persons and organizations

... Common Document Clustering Algorithms. Document Clustering algorithms attempt to identify groups of documents that are similar to each other more than the rest of the collection. Here each document ... follows. Section introduces related work and Section discusses named entity based, link-based, content-based and structure-based document features and presents the algorithm to identify DPs and seeds ... to perform clustering to deliver IDPs for the corresponding Target entities. PnO page clustering is a special case of web document clustering, which attempts to identify groups of documents that...

Uploaded: 04/10/2015, 17:04

90 274 0
Thermal error modelling of machine tools based on ANFIS with fuzzy c means clustering using a thermal imaging camera

... Thermal error modelling of machine tools based on ANFIS with fuzzy c-means clustering using a thermal imaging camera. Ali M Abdulshahed, Andrew P Longstaff, Simon ... machine tools using data obtained from a thermal imaging camera is introduced. Different groups of key temperature points were identified from thermal images using a novel schema based on a Grey ... c-means clustering; Grey system theory. Abstract: Thermal errors are often quoted as being the largest contributor to CNC machine tool errors, but they can be effectively reduced using error...

Uploaded: 08/11/2015, 00:02

17 818 0
Why are US firms using more short-term debt

... negatively related to the term spread. The interpretation is that managers time the market and prefer to issue short-term debt when short-term interest rates are low compared with long-term rates. In contrast, ... Founding age; Taxes; Term spread; Short-term rate; Inflation; Real short-term rate; Default spread; Recession dummy; Bank stock index return; Government share. Definition: Ratio of long-term debt (DLTT) minus ... information asymmetry will issue short-term debt to avoid locking in their cost of financing with long-term debt because they expect to borrow at more favorable terms later. Consistent with the asymmetric...

Uploaded: 04/04/2013, 23:19

31 586 1
Scientific report document: "Predicate Argument Structure Analysis using Transformation-based Learning" pdf

... Mitchell Marcus. 1995. Text chunking using transformation-based learning. In Proc. of the third workshop on very large corpora, pages 82–94. Dan Shen and Mirella Lapata. 2007. Using semantic roles to improve ... Conclusion. We performed experiments for Japanese predicate argument structure analysis using transformation-based learning and extracted rules that indicate the tendencies annotators have. We presented ... Transformation-based error-driven parsing. In Proc. of the Third International Workshop on Parsing Technologies. Time Loc 51.5 38.0 59.6 1.7 55.8 37.4 Eric Brill. 1995. Transformation-based error-driven...

Uploaded: 20/02/2014, 04:20

6 496 0
Scientific report document: "Finding Synonyms Using Automatic Word Alignment and Measures of Distributional Similarity" pdf

... paper we use both monolingual syntax-based approaches and multilingual alignment-based approaches and compare their performance when using the same similarity measures and evaluation set ferent ... results from the two synonym extraction approaches based on distributional similarity: one using syntactic context and one using translational context based on word alignment and the combination of ... less than times. Distributional Similarity Based on Syntactic Relations. This section contains the description of the synonym extraction approach based on distributional similarity and syntactic relations...

Uploaded: 20/02/2014, 12:20

8 516 0
Scientific report document: "Improving Pronoun Resolution Using Statistics-Based Semantic Compatibility Information" doc

... pronoun resolution systems on the same data set. Web-based feature vs. Corpus-based feature: The third column of the table lists the results using the web-based compatibility feature for neutral pronouns ... corpus-based semantic feature. However, the increase is not as large as using the web-based feature: Under the two learning models, the success rate of the best system with the corpus-based feature ... the utility of the statistics-based semantic feature is more salient under TC than under SC for N-Pron resolution: the best gains using the corpus-based and the web-based semantic features under...

Uploaded: 20/02/2014, 15:20

8 377 0


... Sag. 1987. Information-Based Syntax and Semantics, Vol. 1. CSLI Lecture Notes 13. Stanford: CSLI. Shieber, S. 1985. "Using Restriction to Extend Parsing Algorithms for Complex-Feature-Based Formalisms". 23rd ... ure-Based Categories: A Preliminary Modification. Fig. 3 is an example production using feature-based syntactic categories. The notations are adapted from Pollard and Sag (1987) and Shieber ... includes the item < v P ~ v NP >. The ACTION/GOTO table used in the above example can be constructed using the procedures given in Fig. 2 (adapted from Aho and Ullman (1987)). The procedure CLOSURE computes...

Uploaded: 22/02/2014, 10:20

6 334 0
Scientific report: "Learning to Grade Short Answer Questions using Semantic Similarity Measures and Dependency Graph Alignments" ppt

... Several measures are compared, including knowledge-based and corpus-based measures, with the best results being obtained with a corpus-based measure using Wikipedia combined with a “relevance feedback” ... subgraph-subgraph) matches. Of these, 36 are based upon the semantic similarity [0 3] of four subgraphs defined by Nx. All eight WordNet-based similarity measures listed in Section 3.3 plus the LSA ... scores obtained from semantic similarity measures. Following Mihalcea et al. (2006) and Mohler and Mihalcea (2009), we use eight knowledge-based measures of semantic similarity: shortest path [PATH],...

Uploaded: 07/03/2014, 22:20

11 478 0
Scientific report: "Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models" pptx

... corpus, existing cross-document coreference approaches could not be applied to this dataset. However, since a majority of related work consists of using clustering after defining a similarity function ... supervised clustering, and Haghighi and Klein (2010) use entity profiles to assist within-document coreference. Since many related methods use clustering, there are a number of distributed clustering ... evaluation of clustering with SubSquare (Bshouty and Long, 2010), a scalable, distributed clustering method. SubSquare takes as input a weighted graph with mentions as nodes and similarity between...

Uploaded: 07/03/2014, 22:20

11 319 0
Scientific report: "An Ontology–Based Approach for Key Phrase Extraction" docx

... semantic similarity concept in the ViO ontology as Step. After that, the key phrase extracting process will go to phase. • Step 3: The idea of the most specific category identification process based ... the semantic similarity concept for each concept t that is still unknown after phase 2, we traverse the ontology hierarchy from its root to find the best node. We choose the semantic similarity ... the current node c while traversing, the similarity values between t and all children of c are calculated. If the maximum of similarity values is less than similarity value between t and c, then...

Uploaded: 08/03/2014, 01:20

4 429 3
Scientific report: "Paraphrase Recognition Using Machine Learning to Combine Similarity Measures" ppt

... (s1, s2), where fj (1 ≤ j ≤ 9) are the string similarity measures. Finally, we locate the s1 with the best average similarity (over all similarity measures) to s2, namely s1∗: 10 S2: Fewer than ... of nouns, the voice of verbs etc.; this increases the similarity of positive s3, s3 pairs. A common problem is that the string similarity measures may be misled by differences in the lengths of ... treating the two verbs as the same token during the calculation of the string similarity measures would yield a higher similarity. The second method, called INIT + WN, treats words from S1 and S2...

Uploaded: 08/03/2014, 01:20

9 402 0
Scientific report: "Topic-Focused Multi-document Summarization Using an Approximate Oracle Score" doc

... query terms. Table shows a list of query terms for our two illustrative topics. The number of query terms extracted in this way ranged from a low of terms for document set d360f to 20 terms for document ... d324e. 6.2 The second collection of terms we use to estimate P(t|τ) are signature terms. Signature terms are the terms that are more likely to occur in the document set than in the background ... give rise to query terms and the latter to signature terms. 6.1 Signature Terms. 6.3 An estimate of P(t|τ). To estimate P(t|τ), we view both the query terms and the signature terms as “samples”...

Uploaded: 08/03/2014, 02:21

8 339 0
Scientific report: "Enriching the Output of a Parser Using Memory-Based Learning" potx

... used simple pattern-based heuristics to detect conjuncts and mark all conjuncts as heads of a conjunction. After the conversion, every resulting dependency structure is modified deterministically: ... We then learned a mapping from the parser’s labels to those in the dependency corpus, using TiMBL, a memory-based classifier (Daelemans et al., 2003). The features used for the relabelling were similar ... its head, dependent, and label are correct. For traces, this corresponds to the evaluation using the head-based antecedent representation described in (Johnson, 2002), and for empty nodes without...

Uploaded: 08/03/2014, 04:22

8 379 0
Scientific report: "Generalised PP-Attachment Disambiguation using Corpus-based Linguistic Diagnostics" pot

... this discrimination is not amenable to a corpus-based treatment. In recent work, however, we succeed in distinguishing arguments from adjuncts using evidence extracted from a parsed corpus (Merlo ... linguistic diagnostics to determine whether a PP is an adjunct or an argument. We illustrate here those countable diagnostics that can be approximated statistically and estimated using corpus counts, ... disambiguating the noun-verb attachment reaches an accuracy of 80.2% (baseline 71.6% using only the preposition) using information about argumenthood and 77.2% if the decision tree induction is performed...

Uploaded: 08/03/2014, 21:20

8 299 0
Scientific report: "Word classification based on combined measures of distributional and semantic similarity" docx

... weight proportional to its distributional similarity to the test word ("distributional similarity weighting"). The weight in the third version was determined according to Equation 3, whereby A ... distributional similarity values, "semantic similarity weighting"). Figure describes the precision demonstrated by these three weighting possibilities on the BNC data (for "semantic similarity weighting", ... versions of KNN in terms of precision: (1) without weighting of neighbors; (2) with weighting by their distributional similarity to the test word and (3) with weighting by their semantic similarity...

Uploaded: 08/03/2014, 21:20

4 345 0