...
demonstrates that, on a technical level, the data mining effort is working and
the data is reasonably accurate. This can be quite comforting. If the data and
the data mining techniques applied to ... combination of techniques to
apply in a particular situation depends on the nature of the data mining task,
the nature of the available data, and the skills and preferences of the data
miner.
Data ... the charge of banks into data mining, but
other divisions are not far behind. At Wachovia, a large North Carolina-based
bank, data mining techniques are used to predict which customers are likely...
... has an equal number of training data items. After the moving phase and before
the load balancing phase starts, each processor has training data item count varying from 0
to Each processor can ... Mathematics from
Harvard University and a Ph.D. in Mathematics from Princeton University.
Mehta, M., Agrawal, R., and Rissanen, J. 1996. SLIQ: A fast scalable classifier ... details.
References
Agrawal, R., Imielinski, T., and Swami, A. 1993. Database mining: A performance perspective. IEEE Transactions
Alsabti, K., Ranka, S., and Singh, V. 1997. A one-pass algorithm...
... subclass hierarchies,
property inheritance, and methods and procedures.
Temporal Databases, Sequence Databases, and
Time-Series Databases
A temporal database typically stores relational data that ... fiscal years, academic years, or calendar years. Years may be further decomposed into
quarters or months.
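The decomposition of years into quarters or months can be sketched as a small time hierarchy. This is a minimal illustration for calendar years only; the function name `time_levels` is a hypothetical helper, not something from the text, and fiscal or academic years would need their own start-month conventions.

```python
from datetime import date


def time_levels(d: date) -> tuple[int, int, int]:
    """Return (calendar year, quarter, month) for a date.

    Assumes calendar years, with quarter 1 covering January to March.
    """
    quarter = (d.month - 1) // 3 + 1
    return d.year, quarter, d.month


# Example: a date in August falls in the third calendar quarter.
levels = time_levels(date(2004, 8, 15))
print(levels)
```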
Spatial Databases and Spatiotemporal Databases
Spatial databases contain spatial-related ... data
repository, as well as to transient data, such as data streams. Thus the scope of our
examination of data repositories will include relational databases, data warehouses,
transactional databases,...
... cleaning on future versions of the
same data store.
2.4
Data Integration and Transformation
Data mining often requires data integration: the merging of data from multiple data
stores. The data may ... you may already have regarding properties of the data. Such knowledge or data
about data is referred to as metadata. For example, what are the domain and data type of
each attribute? What are ... inconsistent. Data preprocessing
includes data cleaning, data integration, data transformation, and data reduction.
Descriptive data summarization provides the analytical foundation for data
preprocessing....
... transaction databases, relational databases, spatial databases,
text databases, time-series databases, flat files, data warehouses, and so on.
On-line analytical mining (OLAM) (also called OLAP mining) ... Metadata Repository
Metadata are data about data. When used in a data warehouse, metadata are the data that
define warehouse objects. Figure 3.12 showed a metadata repository within the bottom
tier ... data mining technology.
3.5.1 Data Warehouse Usage
Data warehouses and data marts are used in a wide range of applications. Business
executives use the data in data warehouses and data marts to...
... rules are
commonly mined from transactional data.
Suppose, however, that rather than using a transactional database, sales and related
information are stored in a relational database or data warehouse. ... database from a relatively low conceptual level to higher conceptual levels. Data
generalization approaches include data cube–based data aggregation and
attribute-oriented induction.
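The idea of moving data from a low conceptual level to a higher one can be sketched as follows. This is a simplified illustration in the spirit of attribute-oriented induction, not the full algorithm (it omits attribute removal and generalization thresholds). The concept hierarchy, the sample rows, and the function name `generalize` are all hypothetical.

```python
from collections import Counter

# Hypothetical concept hierarchy: city -> country (illustrative values only).
CITY_TO_COUNTRY = {
    "Vancouver": "Canada",
    "Toronto": "Canada",
    "Chicago": "USA",
    "Seattle": "USA",
}


def generalize(rows, hierarchy):
    """Replace each low-level value with its higher-level concept and
    merge identical generalized tuples, accumulating a count for each."""
    counts = Counter()
    for city, category in rows:
        counts[(hierarchy[city], category)] += 1
    return dict(counts)


# Made-up (city, item category) tuples at the low conceptual level.
rows = [
    ("Vancouver", "TV"),
    ("Toronto", "TV"),
    ("Chicago", "TV"),
    ("Seattle", "computer"),
    ("Chicago", "computer"),
]
summary = generalize(rows, CITY_TO_COUNTRY)
for key, count in sorted(summary.items()):
    print(key, count)
```

Five city-level tuples collapse into three country-level tuples, each carrying the number of original tuples it summarizes.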
From a data ... , $a_{100}$}, it has to generate at least $2^{100} - 1 \approx 10^{30}$ candidates in total.
It may need to repeatedly scan the database and check a large set of candidates by pattern
matching. It is costly to...
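As a quick check of the candidate count quoted above: a set of 100 items has 2^100 − 1 nonempty subsets, which is on the order of 10^30. The function name `count_nonempty_subsets` below is an illustrative assumption, not from the source.

```python
def count_nonempty_subsets(n: int) -> int:
    """Number of nonempty subsets of an n-item set: 2^n - 1."""
    return 2 ** n - 1


total = count_nonempty_subsets(100)
print(total)           # exact integer value
print(f"{total:.2e}")  # the same value in scientific notation
```

Even at a billion candidate checks per second, enumerating this many candidates would be infeasible, which is why repeated candidate generation and testing becomes the bottleneck.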
... due to swapping of the training tuples in
and out of main and cache memories. More scalable approaches, capable of handling
training data that are too large to fit in memory, are required. Earlier ... Chapter 6 Classification and Prediction
analysis to help guess whether a customer with a given profile will buy a new computer.
A medical researcher wants to analyze breast cancer data in order to ... a
predefined class as determined by another database attribute called the class label attribute.
The class label attribute is discrete-valued and unordered. It is categorical in that each
value...
... typically
assume that the data are memory resident, a limitation to data mining on large
databases. Several scalable algorithms, such as SLIQ, SPRINT, and RainForest, have
been proposed to address ... preparation for classification and prediction can involve
data cleaning to reduce noise or handle missing values, relevance analysis to remove
irrelevant or redundant attributes, and data transformation, ... neighbors and restricts the search to subgraphs that are smaller
than the original graph. While CLARA draws a sample of nodes at the beginning of a
search, CLARANS dynamically draws a random sample...
... concepts and techniques of data mining. The techniques
studied, however, were for simple and structured data sets, such as data in relational
databases, transactional databases, and data warehouses. ... semantic information, such as time-series streams, spatiotemporal data
streams, and video and audio data streams.
8.2
Mining Time-Series Data
“What is a time-series database?” A time-series database ... evolving data streams was
proposed by Aggarwal, Han, Wang, and Yu [AHWY03]. A framework for projected
clustering of high-dimensional data streams was proposed by Aggarwal, Han, Wang, and Yu
[AHWY04a].
A...
... unstructured data, ranging from text and Web data to multimedia
databases. It is useful for data integration tasks in many domains. Metadata mining can
be used for schema mapping (where, say, the attribute ... to protein structures. In
chemistry, we can search for subgraphs representing chemical substructures.
9. Metadata mining. Metadata are data about data. Metadata provide semi-structured
data about ... data streams. The k-median-based
STREAM algorithm was proposed by Guha, Mishra, Motwani, and O’Callaghan
[GMMO00] and by O’Callaghan, Mishra, Meyerson, et al. [OMM+02]. Aggarwal, Han,
Wang, and...
... spatial and nonspatial data in support of spatial data mining and
spatial-data-related decision-making processes.
Let’s look at the following example.
Example 10.5
Spatial data cube and spatial ... large multimedia databases, multimedia data cubes can be designed and
constructed in a manner similar to that for traditional data cubes from relational data.
A multimedia data cube can contain ... we can integrate
spatial data to construct a data warehouse that facilitates spatial data mining. A spatial
data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection
of...