... level, the data mining effort is working and
the data is reasonably accurate. This can be quite comforting. If the data and
the data mining techniques applied to it are powerful enough to discover ... used to sort a list of customers from
most to least loyal or most to least likely to respond or most to least likely to
default on a loan.
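The sorting step described above can be sketched in a few lines. This is a minimal illustration, not the book's method: the `ScoredCustomer` type, the customer IDs, and the score values are all hypothetical, and the score stands for whatever a model produces (loyalty, likelihood to respond, or likelihood to default).

```python
from typing import List, NamedTuple


class ScoredCustomer(NamedTuple):
    """Hypothetical record: a customer ID paired with a model score."""
    customer_id: str
    score: float


def rank_by_score(customers: List[ScoredCustomer]) -> List[ScoredCustomer]:
    """Return customers ordered from highest score to lowest."""
    return sorted(customers, key=lambda c: c.score, reverse=True)


# Made-up scores, e.g. estimated likelihood of responding to an offer.
customers = [
    ScoredCustomer("c1", 0.12),
    ScoredCustomer("c2", 0.87),
    ScoredCustomer("c3", 0.45),
]
ranked = rank_by_score(customers)
ranked_ids = [c.customer_id for c in ranked]
```

Because only the ordering matters here, the scores do not need to be calibrated probabilities; any monotone score gives the same ranked list.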
The data mining process is sometimes referred to as ... of Data Mining 33
Table 2.1 Data Mining Differs from Typical Operational Business Processes
TYPICAL OPERATIONAL SYSTEM DATA MINING SYSTEM
Operations and reports on Analysis on historical data...
... Sutiwaraphun, J., To, H.W., and
Yang, D. Large scale data mining: Challenges and responses. Proc. of the Third Int’l Conference on Knowledge
Discovery and Data Mining.
Goil, S., Aluru, S., and Ranka, ...
performance and wide area data mining systems for over ten years. More recently, he has
worked on standards and testbeds for data mining. He has an AB in Mathematics from
Harvard University and a ... start-up, developing data mining technologies for
application to targeted email marketing. Prior to this, he was a researcher at Hitachi’s data mining research labs.
He did his B. Tech. from Indian...
... chapter.
Links to data mining data sets and software. We will provide a set of links to data
mining data sets and sites containing interesting data mining software packages,
such as IlliMine from the ... Cattell and Douglas K. Barry
Data on the Web: From Relations to Semistructured Data and XML
Serge Abiteboul, Peter Buneman, and Dan Suciu
Data Mining: Practical Machine Learning Tools and Techniques ... Reference Data in Enterprise Databases: Binding Corporate Data to the Wider World
Malcolm Chisholm
Data Mining: Concepts and Techniques
Jiawei Han and Micheline Kamber
Understanding SQL and Java Together:...
... and Transformation
Data mining often requires data integration, the merging of data from multiple data
stores. The data may also need to be transformed into forms appropriate for mining.
This section ... the data. These
tools rely on parsing and fuzzy matching techniques when cleaning data from multiple
sources. Data auditing tools find discrepancies by analyzing the data to discover rules
and ... discrepancy detection and
data transformation.
Data integration combines data from multiple sources to form a coherent data store.
Metadata, correlation analysis, data conflict detection, and the resolution...
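As a rough illustration of combining data from multiple sources into one store while detecting value conflicts, the sketch below merges two keyed record sets. Everything in it is an assumption made for illustration: the `integrate` function, the field names, the `rename_b` mapping (standing in for metadata that says which attributes correspond), and the policy of keeping the first source's value when the two disagree.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def integrate(
    source_a: Dict[int, Record],
    source_b: Dict[int, Record],
    rename_b: Dict[str, str],
) -> Tuple[Dict[int, Record], List[Tuple[int, str, Any, Any]]]:
    """Merge two record sets keyed by ID and report conflicting values.

    rename_b maps field names used in source_b to the names used in
    source_a. On a conflict the value from source_a is kept and the
    disagreement is recorded as (key, field, value_a, value_b).
    """
    merged = {key: dict(rec) for key, rec in source_a.items()}
    conflicts = []
    for key, rec in source_b.items():
        target = merged.setdefault(key, {})
        for field, value in rec.items():
            name = rename_b.get(field, field)
            if name in target and target[name] != value:
                conflicts.append((key, name, target[name], value))
            else:
                target[name] = value
    return merged, conflicts


# Hypothetical sources that describe overlapping customers.
source_a = {101: {"name": "Ann", "city": "Oslo"},
            102: {"name": "Bob", "city": "Rome"}}
source_b = {101: {"cust_name": "Ann", "town": "Bergen"},
            103: {"cust_name": "Cy", "town": "Lima"}}
rename_b = {"cust_name": "name", "town": "city"}

merged, conflicts = integrate(source_a, source_b, rename_b)
```

Returning the conflicts rather than resolving them silently leaves the resolution policy to the caller, which is one reasonable design choice among several.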
... tools for data warehousing can be
categorized into access and retrieval tools, database reporting tools, data analysis tools, and
data mining tools.
Business users need to have the means to know ... Warehouse and OLAP Technology: An Overview
3.5
From Data Warehousing to Data Mining
“How do data warehousing and OLAP relate to data mining?” In this section, we study the
usage of data warehousing ... reasons:
High quality of data in data warehouses: Most data mining tools need to work
on integrated, consistent, and cleaned data, which requires costly data cleaning,
data integration, and data transformation...
... data
mining. Descriptive data mining describes data in a concise and summarative manner
and presents interesting general properties of the data. This is different from predic-
tive data mining, ... mined from transactional data.
Suppose, however, that rather than using a transactional database, sales and related
information are stored in a relational database or data warehouse. Such data stores ... levels. Data
generalization approaches include data cube–based data aggregation and attribute-
oriented induction.
From a data analysis point of view, data generalization is a form of descriptive data
mining. ...
... resorting. SPRINT was designed to be easily parallelized, further
contributing to its scalability.
While both SLIQ and SPRINT handle disk-resident data sets that are too large to fit into
memory, the scalability of ... becomes inefficient due to swapping of the training tuples in
and out of main and cache memories. More scalable approaches, capable of handling
training data that are too large to fit in memory, are ... initialized to small random num-
bers (e.g., ranging from −1.0 to 1.0, or −0.5 to 0.5). Each unit has a bias associated with
it, as explained below. The biases are similarly initialized to small random...
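The initialization described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `init_layer` helper, the layer sizes, and the choice of a uniform distribution are hypothetical, and the default range of −0.5 to 0.5 is one of the two example ranges given in the text.

```python
import random
from typing import List, Optional, Tuple


def init_layer(
    n_inputs: int,
    n_units: int,
    low: float = -0.5,
    high: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Tuple[List[List[float]], List[float]]:
    """Draw small random weights and one bias per unit from [low, high]."""
    rng = rng or random.Random()
    # One row of incoming connection weights per unit in the layer.
    weights = [
        [rng.uniform(low, high) for _ in range(n_inputs)]
        for _ in range(n_units)
    ]
    # Biases are initialized the same way as the weights.
    biases = [rng.uniform(low, high) for _ in range(n_units)]
    return weights, biases


# A layer of 2 units fed by 3 inputs; the seed only makes runs repeatable.
weights, biases = init_layer(3, 2, rng=random.Random(0))
```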
... functions
(Hanson and Burr [HB88]), dynamic adjustment of the network topology (Mézard
and Nadal [MN89], Fahlman and Lebiere [FL90], Le Cun, Denker, and Solla [LDS90],
and Harp, Samad, and Guha [HSG90] ), and ... data in preparation for classification and prediction can involve
data cleaning to reduce noise or handle missing values, relevance analysis to remove
irrelevant or redundant attributes, and data ... difficult to control.
Ability to deal with noisy data: Most real-world databases contain outliers or missing,
unknown, or erroneous data. Some clustering algorithms are sensitive to such data
and may...
... telecommunications
data, transaction data from the retail industry, and data from electric power
grids. Traditional OLAP and data mining methods typically require multiple scans of
the data and are therefore ... simple and structured data sets, such as data in relational
databases, transactional databases, and data warehouses. The growth of data in various
complex forms (e.g., semi-structured and unstructured, ... be extended to mine such
patterns efficiently.
8
Mining Stream, Time-Series,
and Sequence Data
Our previous chapters introduced the basic concepts and techniques of data mining. The techniques
studied,...
... substructures.
9. Metadata mining. Metadata are data about data. Metadata provide semi-structured
data about unstructured data, ranging from text and Web data to multimedia databases.
It is useful for data ... what window size to use, and CpG islands tend to vary in length.
What if, instead, we merge the two Markov chains from above (for CpG islands and
non-CpG islands, respectively) and add transition ... domains. Metadata mining can
be used for schema mapping (where, say, the attribute customer_id from one database
is mapped to cust_number from another database because they both refer to the
9.2...
... are closely linked to image
analysis and scientific data mining, and thus many image analysis techniques and scientific
data analysis methods can be applied to image data mining.
The popular ... data, and computer
tomography. It is important to explore data mining in raster or image databases. Methods
for mining raster and image data are examined in the following section regarding the
mining ... multimedia data mining focuses on image data mining.
Mining text data and mining the World Wide Web are studied in the two subsequent
638 Chapter 10 Mining Object, Spatial, Multimedia, Text, and Web Data
where...