...
important in the KDD process
[Figure: the KDD process, steps 1 to 5, with the labels data cleaning, data integration, data warehouse, selection, task-relevant data, data mining, pattern evaluation, and knowledge]
Definition of a Data Warehouse (cont.)
• According to ... Aggregated data
5/12/2009
Time-Variant
[Figure: data over time (01/97, 02/97, 03/97); the data for January, February, and March go into the data warehouse]
Stable (Non-Volatile)
• A store ... executive-level decision making in the organization, with data that is complex and important
Data mining: exploring and searching data for new knowledge that was not known in advance
Some algorithms...
... merging data from many different sources into a single data warehouse
Data transformation: data normalization
Data reduction: reducing ... data
Data cleaning/cleansing: removing noise and correcting data inconsistencies
Data integration: ... data preprocessing
The process of treating raw/original data to improve the quality of the data and, as a result, the quality of the mining results.
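The data normalization step named above can be illustrated with min-max scaling. This is only a minimal sketch: the source does not say which normalization method is meant, and the function name `min_max_normalize` and the sample income values are assumptions made for illustration.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale numeric values linearly into [new_min, new_max].

    Min-max scaling is one common normalization; it is assumed here,
    not taken from the source.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to new_min.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


# Hypothetical raw attribute values.
incomes = [12000, 35000, 58000, 104000]
print(min_max_normalize(incomes))
```

After scaling, attributes measured in very different units share one range, so a distance-based mining method is not dominated by the attribute with the largest raw values.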
Data...
... Statistics, University of Economics Ho Chi Minh City
Figure 5.9: Model panel
Model name: the name of the model
Use partitioned data: use the partitioned data
Mode: the method used to build the model.
General model: ...
Figure 5.3: Neural options panel
Model:
Model name: the name of the model
Use partitioned data: use the partitioned data
Method: the method. There are six methods for building a neural network model...
... of others)
Why the confusion?
The evil Multicollinearity!!
(correlated X’s)
Data Mining - What is it?
• Large datasets
• Fast methods
• Not significance testing
• Topics
  – Trees (recursive ...
[Figure: lift, with values 3.3 and 1]
Multiple testing
• 50 different BPs in data, m=49 ways to split
• Multiply p-value by 49
• Bonferroni: original idea
• Kass: apply to data mining (trees)
• Stop splitting if minimum p-value ... Death = 79.24 - 1.367(lifeline)
Would NOT be unusual if there is no true relationship.
Martian Height
Martian Weight
2 points: no information on variation of errors
n points: n-2 error DF
How...
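The multiple-testing adjustment listed earlier (multiply the smallest p-value by the m = 49 candidate splits) can be sketched as follows. The threshold `alpha=0.05` and the function names are assumptions; the source truncates the actual stopping rule.

```python
def bonferroni_adjust(p_value, m):
    """Bonferroni adjustment: multiply the p-value by the number of tests m,
    capped at 1.0 so the result remains a valid probability."""
    return min(1.0, p_value * m)


def should_split(min_p_value, m, alpha=0.05):
    """Split only if the adjusted minimum p-value is still below alpha.

    alpha = 0.05 is an assumed threshold, not a value from the source.
    """
    return bonferroni_adjust(min_p_value, m) < alpha


# 50 distinct blood-pressure values give m = 49 candidate split points.
print(bonferroni_adjust(0.001, 49))
print(should_split(0.001, 49))
print(should_split(0.002, 49))
```

A raw p-value of 0.002 would look significant alone, but after accounting for the 49 splits that were tried it no longer clears the assumed 0.05 threshold.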
... the large itemsets of the database.
Table 1: Transaction database
TID Items
100 ABCD
200 ABCDF
300 BCDE
400 ABCDF
500 ABEF
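Finding the large itemsets of Table 1 can be sketched by counting, for each k-itemset, the transactions that contain it. This is a brute-force illustration, not the method of the source; the minimum support of 3 and the function names are assumptions.

```python
from collections import Counter
from itertools import combinations

# Table 1: transaction ID -> items bought.
transactions = {100: "ABCD", 200: "ABCDF", 300: "BCDE", 400: "ABCDF", 500: "ABEF"}


def count_itemsets(transactions, k):
    """Count how many transactions contain each k-itemset."""
    counts = Counter()
    for items in transactions.values():
        for itemset in combinations(sorted(set(items)), k):
            counts[itemset] += 1
    return counts


def large_itemsets(transactions, k, min_support):
    """Keep the k-itemsets whose support count reaches min_support."""
    counts = count_itemsets(transactions, k)
    return {s: c for s, c in counts.items() if c >= min_support}


# min_support = 3 is a hypothetical threshold chosen for illustration.
print(large_itemsets(transactions, 1, 3))
print(large_itemsets(transactions, 2, 3))
```

Enumerating every itemset per transaction becomes expensive as k and the database grow, which is the cost that hash-based approaches aim to reduce.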
Hash-Based Approach to Data Mining
CHAPTER ... of data structures and algorithms, the hash method often uses an array
structure to store the database. If the database is too large, we can apply a multi-level scheme.
By doing so, we are able to access the database ... approach to data mining focuses on the
hash-based method to improve the performance of finding association rules in
transaction databases and uses the PHS (perfect hashing and data shrinking)...
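To illustrate how an array of hash buckets can speed up itemset counting, the sketch below hashes every 2-itemset into a small array and keeps only pairs whose bucket count reaches the minimum support. It follows the general idea of hash-based candidate pruning and is not the PHS method named above, whose details are not given here. The bucket count, the hash function, and the minimum support are all assumptions.

```python
from itertools import combinations

transactions = ["ABCD", "ABCDF", "BCDE", "ABCDF", "ABEF"]
NUM_BUCKETS = 7  # hypothetical array size


def bucket_of(pair, num_buckets=NUM_BUCKETS):
    """Deterministic toy hash of an ordered item pair into the bucket array."""
    a, b = pair
    return (ord(a) * 31 + ord(b)) % num_buckets


def hash_pair_counts(transactions, num_buckets=NUM_BUCKETS):
    """One pass over the database: add 1 to the bucket of every pair seen."""
    buckets = [0] * num_buckets
    for items in transactions:
        for pair in combinations(sorted(set(items)), 2):
            buckets[bucket_of(pair, num_buckets)] += 1
    return buckets


def candidate_pairs(transactions, min_support, num_buckets=NUM_BUCKETS):
    """Keep pairs whose bucket count reaches min_support.

    A bucket count is an upper bound on the support of each pair hashed
    into it, so no truly frequent pair is dropped; collisions may let
    some infrequent pairs through, and those need an exact recount.
    """
    buckets = hash_pair_counts(transactions, num_buckets)
    items = sorted({item for t in transactions for item in t})
    return [
        pair
        for pair in combinations(items, 2)
        if buckets[bucket_of(pair, num_buckets)] >= min_support
    ]


print(hash_pair_counts(transactions))
print(candidate_pairs(transactions, min_support=3))
```

The array is much smaller than a table of exact pair counts, which is the trade-off: less memory and a cheap first pass in exchange for a later exact count of the surviving candidates.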
... field is being born,
called data engineering. One of the essential notions of data engineering is metadata. It is data about
data, i.e., a data description of other data. As an example we can ... The latter are
usually represented by texts (text messages) or strings of numerals (data messages), i.e., by data in the
general sense introduced above.
Data whose origin is completely unknown ...
[Figure: diagram with boxes labeled Knowledge-based Ranking, Access to data repositories, Literature Search, Hypothesis, Data and evidence, Data Mining, Data Analysis, and Experiment planning]
... process.
... Analysis
(Consistent family of criteria)
[Figure: flowchart with the boxes Development of questionnaire; Survey; MUSA; Data Mining; Search Engines; Rule Induction Engine; Data Mining Global Satisfaction Prediction; Satisfaction Functions; Patterns ... New Clusters; Separation of Data Set (training and test set); Filling the empty cells; MUSA Final Analysis; Selection of complete questionnaires; and the decision "Is the Data Set Complete?" (Yes / No)]
CUSTOMER SATISFACTION USING DATA MINING
TECHNIQUES
Nikolaos...
... actual mining due to their limited data capacity and
inability to handle certain types of operations needed in data preparation, data surveying,
and data modeling. For exploring small data sets, ... of data and the data set, and various ways of
structuring data in order to work with it. Problems that afflict the data and the data set (and
also the miner!) were introduced. All of this data, ... information is
crucial to data mining. It is the very substance enfolded within a data set for which the
data set is being mined. It is the reason to prepare the data set for mining to best expose...
... activities.
Data Issue: Representative Samples
A perennial problem is determining how much data is needed for modeling. One tenet of
data mining is “all of the data, all of the ... prepared, the next
step is to prepare data sets, which is to say, to consider the data as a whole.)
Data Set Issue: Reducing Width
Data sets for mining can be thought of as being ... considered alone.
Data Set/Data Survey Issue: Well- and Ill-Formed Manifolds
This is really the first data survey step as well as the last data preparation step. The data
survey, discussed...
... understand
the data.
Once the assay is completed, the mining data set, or sets, can be assembled. Given
assembled data sets, much preparatory work still remains to be done before the data is ... access to data about
the whole population, it is necessary to deal with data that represents only some part of
the population. Such data is called a sample.
Even if the whole of the data ... merging separate data streams, it may well be that the time of data capture is
different from stream to stream. While this is partly a data access issue and is discussed
in “Data Access Issues”...
... alphas, but also for conducting the data survey and for
addressing various problems and issues in data mining. Becoming comfortable with the
concept of data existing in state space yields insight ... the original data sample. Random
sampling does that. If the original data set represents a biased sample, that is evaluated
partly in the data assay (Chapter 4), again when the data set itself ... important metrics in
both statistical analysis and data mining.
It is this concept of “level of confidence” that allows sampling of data sets to be made. If
the miner decided to use...
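Drawing a simple random sample, as discussed above, can be sketched with the standard library. The function name `random_sample`, the 10% fraction, and the seed are assumptions for illustration; the source does not prescribe a sample size here.

```python
import random


def random_sample(records, fraction, seed=None):
    """Draw a simple random sample without replacement.

    Every record has the same chance of selection, so the sample tends to
    reflect the distribution of the data set it was drawn from.
    """
    rng = random.Random(seed)  # private generator, so runs are repeatable
    k = max(1, round(len(records) * fraction))
    return rng.sample(records, k)


# Hypothetical data set of 1,000 record IDs; keep about 10% of them.
population = list(range(1000))
sample = random_sample(population, 0.10, seed=42)
print(len(sample))
```

Fixing the seed makes the sample reproducible, which helps when a model built on the sample has to be rebuilt or audited later.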
... end of a line
of input.
2 Overview
Oracle Data Mining (ODM) embeds data mining within the Oracle
database. The data never leaves the database: the data, data preparation,
model building, and ... Export and Import
Data mining models can be moved between Oracle databases or schemas.
For example, data mining specialists may build and test data mining
models in a data mining lab. After ... import all data mining models as well as other database
objects
■ Run DBMS_DATA_MINING.IMPORT_MODEL to import data mining
models only, either all models or selected models
The Oracle Data Pump...
... the status of the
mining operations as they are executed.
1.2 Oracle9i Data Mining Components
Oracle9i Data Mining has two main components:
■ Oracle9i Data Mining API
■ Data Mining Server (DMS)
1.2.1 ...
1 Basic ODM Concepts
Oracle9i Data Mining (ODM) embeds data mining within the Oracle9i database.
The data never leaves the database: the data, data preparation, model building,
and ...
standards. Oracle9i Data Mining will comply with the JDM standard when that
standard is published.
1.2.2 Data Mining Server
The Data Mining Server (DMS) is the server-side, in-database component...
... determining density just by looking at the number of points in a
given area, particularly if in some places the given volume only has one data point, or
even no data points, in it. If enough data ... cure! The data survey, in part, examines the manifold
carefully and should report the location and extent of any such areas in the data. At least
when modeling in such an area of the data, the ... position, and estimate the
distance from there to each of the nearest data points in each dimension. The mean
distance to neighboring data points serves as a surrogate measurement for density. For...
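The idea of using the mean distance to neighboring points as a stand-in for density can be sketched as below. This version is an assumption-laden simplification: it averages the Euclidean distance to the k nearest points rather than measuring the nearest point in each dimension, and the function name and `k=3` default are hypothetical.

```python
import math


def mean_neighbor_distance(point, data, k=3):
    """Mean Euclidean distance from `point` to its k nearest data points.

    A small value suggests a densely populated region; a large value
    suggests a sparse one. Points identical to `point` are skipped.
    """
    distances = sorted(math.dist(point, other) for other in data if other != point)
    nearest = distances[:k]
    if not nearest:
        raise ValueError("no neighboring points to measure against")
    return sum(nearest) / len(nearest)


# Hypothetical 2-D data: three points close together and one far away.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
print(mean_neighbor_distance((0.0, 0.0), data, k=2))
print(mean_neighbor_distance((10.0, 10.0), data, k=2))
```

Unlike counting points in a fixed volume, this measure still returns a usable number where a cell would hold one point or none.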