a data mining approach

Scientific report: "Learning to Tell Tales: A Data-driven Approach to Story Generation"

Scientific report

... Figure for an example). These stories exhibit several recurrent patterns and are thus amenable to a data-driven approach. Although they have limited vocabulary and non-elaborate syntax, they nevertheless ... Briscoe, E. and J. Carroll. 2002. Robust accurate statistical annotation of general text. In Proceedings of the 3rd LREC, Las Palmas, Gran Canaria, pages 1499–1504. Callaway, Charles B. and James C. Lester ... (K-F-NSAMP), familiarity (FAM), concreteness (CONC), imagery (IMAG), age of acquisition (AOA), and meaningfulness (MEANC and MEANP). Correlation analysis was used to assess the degree of linear relationship...
  • 9
  • 318
  • 0
Medical report: "ProCAT: a data analysis approach for protein microarrays"

Scientific report

... background signals. In addition, spatial variations can be reduced efficiently through a novel two-parameter signal normalization approach and calling positive spots locally. After generating a ... three aspects: experimental designs, data file formats and normalization parameters. Experimental design contains parameters such as the number of test arrays and negative control arrays for one particular ... one particular assay. Data file format describes the layout in the uploaded dataset so that ProCAT can recognize and extract the useful information from it. Normalization parameters allow users...
  • 11
  • 268
  • 0
data mining a heuristic approach

Macroeconomics

... Is Data Modeling? 1.5.3 Data Quality. The data held in a database is usually a valuable business asset built up over a long period. Inaccurate data (poor data quality) reduces the value of the asset ... functional specification), specifying the business processes that the system is ... [Figure: diagram labeled Report, Program, data, DATABASE] ... common for the same data to appear in more than one database and for problems to arise in drawing together data from multiple databases. How many other databases hold similar data about our customers...
  • 562
  • 1,097
  • 1
hash-based approach to data mining

Information technology

... work with a small database; however, the transaction data that concern us grow quickly, so we have to face an extremely large database. The idea of reading data repeatedly is ... Example 4: same as in Example 2, but working with the PHS algorithm. Transaction database (TID: Items): 100: ABCD; 200: ABCDF; 300: BCDE; 400: ABCDF; 500: ABEF. Table 5: Scan and count ... of a product in a period of time… To build the system, we must have a database that contains transactions. Each transaction has two fields: transaction ID (TID) and...
  • 47
  • 566
  • 0
Document: A Comparison of Approaches to Large-Scale Data Analysis

Databases

... Muralikrishna. GAMMA - A High Performance Dataflow Database Machine. In VLDB ’86, pages 228–237, 1986. S. Fushimi, M. Kitsuregawa, and H. Tanaka. An Overview of The System Software of A Parallel Relational Database ... data) have approximately constant performance for this task, since each node has the same amount of Document data to process and this amount of data remains constant (7GB) as more nodes are added ... that it was a mistake for the database research community in the 1970s to focus on data sub-languages that could be embedded in any programming language, rather than adding high-level data access...
  • 14
  • 923
  • 0
Document: Mining Database Structure; Or, How to Build a Data Quality Browser

Databases

... [Figure: Estimated vs actual q-gram vector distance, 150 sketch samples] 5.5.1 Using Multiset Resemblance ... [Figure: Estimated vs Actual Q-gram Resemblance; axes: actual resemblance, estimated resemblance, q-gram vector distance] ... 5.5 Qualitative Experiments [Figure: Estimated vs actual q-gram vector distance, 50 sketch samples] ...
  • 12
  • 581
  • 0
Scientific report: "Transparent combination of rule-based and data-driven approaches in a speech understanding architecture"

Scientific report

... these features are defined by hand-coded rules, and some by surface utterance characteristics like word N-grams. The available data is used to train statistics which evaluate each feature's reliability ... of training data available, and the statistical methods used are simple and unsophisticated. However, we still get a significant improvement on rules alone by adding a trainable component. An obvious ... each line in the training file: a text string, a parsed representation and a confidence score. For text data, it yields either a parsed representation or an annotation marking that the utterance...
  • 8
  • 461
  • 0
Scientific report: "A multi-staged approach to identifying complex events in textual data"

Scientific report

... phrase vocabulary, the distance to the target head, and local context (words and phrases). Our initial evaluation of this approach has given us encouraging first results. Based on a hand-annotated ... to a relational database via CGI scripts (html, JavaScript, and Python). The off-line part of the system hosts the bulk of the linguistic and statistical processing that creates document meta-data: ... set. Paragraphs from these documents, which are from 50 to 150 pages long, were annotated with the types of financial transactions they are most related to. Paragraphs that did not fall into a category...
  • 4
  • 404
  • 0
agile analytics a value-driven approach to business intelligence and data warehousing

General studies

... [Figure 1.1: Classical data warehouse architecture] ... reality the data model was a replication of parts of one of the legacy operational databases. This replicated database did not include any data scrubbing and was wrapped in a significant amount of ... operational data store or an enterprise information integration (EII) repository that acts as a system of record for all relevant operational data. The integration database is typically based on a...
  • 366
  • 1,106
  • 1
Chemistry report: "Investigating the missing data mechanism in quality of life outcomes: a comparison of approaches"

Chemistry - Oil and gas

... [Table: MCAR/MAR classifications by time point (baseline, 12 months, 24 months) for KAT and PRISM] ... analysis method. Complete-case analysis (excluding patients who have incomplete data) will only be unbiased (although not optimal) if the data are MCAR. Under MAR, available case analysis such as ... study was the ability to make use of reminder data to investigate the missing data mechanism. Previous work has simulated missing data subject to a known mechanism whereas we have used real data to...
  • 10
  • 424
  • 0
DATA MINING IN BANKING AND FINANCE: A NOTE FOR BANKERS

Accounting - Auditing

... query languages; because human analysis breaks down with volume and dimensionality. Traditional statistical methods do not have the capacity and scale to analyse these data, and hence modern data mining ... management as well. [Figure labels: Foreign exchange, Option, Equities, Custom Data, Portfolio Data, Company Data, Global Data; Data Warehouse & Data Marts; Using Data Mining Techniques for Credit Risk, Market Risk, Trading, Portfolio ...] ... credit and market risk present the central challenge, one can observe a major change in the area of how to measure and deal with them, based on the advent of advanced database and data mining...
  • 15
  • 556
  • 0
Chemistry report: "Research Article: A Rules-Based Approach for Configuring Chains of Classifiers in Real-Time Stream Mining Systems", Brian Foo and Mihaela van der Schaar

Chemistry - Oil and gas

... necessary to have a model of stream dynamics [EURASIP Journal on Advances in Signal Processing]. Stream dynamics affect the APP of positive data arriving at each classifier, which in turn affects each ... binary classifier partitions input data objects into two classes, a “yes” class H and a “no” class H̄. A binary classifier chain is a special case of a binary classifier tree, where multiple binary ... observable data. By exchanging these locally obtained parameters and configurations across all classifiers, each classifier can then estimate the overall stream processing utility. Table summarizes...
  • 17
  • 416
  • 0
Chemistry report: "Research Article: Statistical Analysis of Hyper-Spectral Data: A Non-Gaussian Approach"

Scientific report

... a measurement campaign held in Italy in 2002. The aim of the campaign was to collect data to support the development and the analysis of classification and detection algorithms. The data were gathered ... the statistical behavior of each background class in real hyper-spectral images has been investigated. The availability of statistical models that properly describe hyper-spectral data variability ... covariance matrix, and A is a scalar nonnegative random variable with unit squared mean value. The two variables Z and A are statistically independent. According to (4), the p.d.f. of X is strictly related...
  • 10
  • 332
  • 0
a data-driven fuzzy rule-based approach for student academic performance evaluation

Informatics

... SAP-1 for classification of student performance, the SAP-2 dataset is used. Data Driven FRBS: Steps. Subsethood-Based Rule Generation Algorithm (SBA): handle classification problems; classify training ... (SBA) (2): Fuzzy rules dependent on the fuzzy subsethood values and a prespecified threshold value α ∈ [0, 1]. Any variables that have a subsethood value that is greater than or equal to α will automatically ... Outline: Basic concepts of academic performance evaluation; Basic concepts of Fuzzy Rule-Based System demonstration; Data Driven FRBS; Subsethood-Based Rule Generation Algorithm (SBA); Weighted...
  • 40
  • 264
  • 0
Scientific report: "A quasi-score approach to the analysis of ordered categorical data via a mixed heteroskedastic threshold model"

Scientific report

... Marginal likelihood and Bayesian approaches to the analysis of heterogeneous residual variances in mixed linear Gaussian models, Comput. Stat. Data Anal. 13 (1992) 291-305. [15] Foulley J.L., Quaas ... to be a natural alternative to the MAP approach proposed by Foulley and Gianola [8]. The main advantage of the MAP approach lies in both its conceptual and computational simplicity. Part of this ... variance-covariance matrix of data by a Taylor expansion about small intra-class correlations. Moreover, as pointed out by Knuiman and Laird [27], u solutions to equation (26) have no clear justification. An...
  • 18
  • 297
  • 0
Medical report: "A blind deconvolution approach to high-resolution mapping of transcription factor binding sites from ChIP-seq data"

Scientific report

... were aligned as described in [3]. GABP ChIP-seq dataset: ChIP-seq data for the GABP transcription factor in humans was obtained from Valouev and associates [4]. This dataset contains 7,862,231 aligned ... Valencia, CA, USA). The DNA was blunted, and adapters were ligated to each end to facilitate Solexa sequencing. PCR was then used specifically to enrich for DNA fragments with adapter molecules ligated ... Initial peak shape estimation: We make an initial estimate of the shape of an enrichment profile as follows. We aim to select a pulse that is strong (of large amplitude) and narrow, such as to...
  • 12
  • 313
  • 0
Scientific report: "Development of a novel data mining tool to find cis-elements in rice gene promoter regions"

Scientific report

... TGACAGGT CCAC [AC ]A [ACGT] [AC] [ACGT] [CT] [AC] GG [ACGT]CCCAC GTGG [ACGT]CCC CAACA [ACGT]*CACCTG A [TC]G [AT ]A [CT]CT AATATATTT TGTCTC TGACGTGG CCA [ACGT]TG CACCC CC [AT]{6}GG AATAAA [CT]AAA ... Kawai J, Nakamura M, Hirozane-Kishikawa T, Kanagawa S, Arakawa T, Takahashi-Iida J, Murata M, Ninomiya N, Sasaki D, Fukuda S, Tagami M, Yamagata H, Kurita K, Kamiya K, Yamamoto M, Kikuta A, Bito ... 6.231 PRHA BS in PAL1*3 PRHA BS in PAL1 PRHA BS in PAL1 PRHA BS in PAL1 - ACACAC ATACACA ATACACAC TACACAC CATGTCTC GTGTCTC TGTCTCCG TGTCTCTG *1 The number of TU possessing the designated motif...
  • 10
  • 397
  • 0
Biology report: "A linear programming approach for estimating the structure of a sparse linear genetic network from transcript profiling data"

Scientific report

... is regulated often by a small number of other genes [3,4], so a reasonable representation of a network is a sparse graph. A sparse graph is a graph parametrized by a sparse matrix W, a matrix with ... statistically using gold standard data and evaluation metrics from the Dialogue for Reverse Engineering Assessments and Methods (DREAM) initiative [36]. The LP-based approach compares favourably with algorithms ... profiling data sets used to estimate LPSLGNs are an N = 605 × T = 15 matrix (ALPHA) and an N = 605 × T = 18 matrix (CDC15). Training data for regression analysis: A training set for regression analysis,...
  • 15
  • 392
  • 0
a system for managing experiments in data mining

Network administration

... forward and can be easily understood. The main entities identified are rawdata, ruledata, testdata, experimentdata. Raw data contains all information about the data and attributes of the dataset ... performing data mining tasks and making predictive analysis, but this analysis is made in a single data mining task. In reality, many data mining tasks are performed on a single data set, when there are ... use many datasets, and we might perform many experiments on the same dataset. It is necessary to manage the datasets accordingly with respect to the raw data, learned data, test data etc. Management...
  • 64
  • 319
  • 0
A model driven approach to imbalanced data learning

College - University

... training data and testing data are balanced. The assumption of total accuracy gives equal weight to each class in the data. However, neither assumption is valid anymore in imbalanced data. In an ... for addressing the imbalanced data problems. Data level approaches alter the training data distributions by various data sampling techniques. Algorithm level approaches include learning rare class ... majorities. In this thesis, we review the existing approaches to deal with the imbalanced data problem, including data level approaches and algorithm level approaches. Most data sampling approaches...
  • 209
  • 468
  • 0
