... test_ct = 5e6;
    double data[] = { 30, 86,
                      24, 38 };
    apop_data *testdata = apop_line_to_data(data, 0, 2, 2);
    for (i = 0; i < test_ct; i++)
        apop_test_fisher_exact(testdata);
}
Listing 1.5 ... times. Online source:
.
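Listing 1.5 only times the call. It does not show what a Fisher exact test computes. As a rough illustration, here is a minimal, dependency-free sketch of a two-sided Fisher exact test for a 2x2 table. It is not Apophenia's implementation. The function name fisher_exact_2x2 is hypothetical. The two-sided rule is also an assumption chosen to match common practice: it sums the probabilities of all tables with the same margins that are no more likely than the observed one, with a 1e-7 relative tolerance.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not Apophenia's API.
 * Two-sided Fisher exact test for the 2x2 table
 *     | a  b |
 *     | c  d |
 * With all margins fixed, the top-left cell x follows a hypergeometric
 * distribution. Returns the summed probability of every feasible table
 * no more likely than the observed one, or -1.0 if allocation fails. */
double fisher_exact_2x2(long a, long b, long c, long d) {
    assert(a >= 0 && b >= 0 && c >= 0 && d >= 0);
    long r1 = a + b, r2 = c + d, c1 = a + c, n = r1 + r2;
    long lo = (c1 - r2 > 0) ? c1 - r2 : 0;  /* smallest feasible x */
    long hi = (r1 < c1) ? r1 : c1;          /* largest feasible x */
    long mode = (c1 + 1) * (r1 + 1) / (n + 2);
    if (mode < lo) mode = lo;
    if (mode > hi) mode = hi;

    /* w[x - lo] is P(x) up to a common constant. Start at 1 at the mode
       and walk outward, so no weight exceeds 1 and nothing overflows. */
    double *w = malloc((size_t)(hi - lo + 1) * sizeof *w);
    if (!w) return -1.0;
    w[mode - lo] = 1.0;
    /* P(x+1)/P(x) = (r1-x)(c1-x) / ((x+1)(r2-c1+x+1)) */
    for (long x = mode; x < hi; x++)
        w[x + 1 - lo] = w[x - lo] * (double)(r1 - x) * (double)(c1 - x)
                        / ((double)(x + 1) * (double)(r2 - c1 + x + 1));
    /* Same ratio inverted to step from x down to x-1. */
    for (long x = mode; x > lo; x--)
        w[x - 1 - lo] = w[x - lo] * (double)x * (double)(r2 - c1 + x)
                        / ((double)(r1 - x + 1) * (double)(c1 - x + 1));

    double obs = w[a - lo], total = 0.0, tail = 0.0;
    for (long x = lo; x <= hi; x++) {
        total += w[x - lo];
        if (w[x - lo] <= obs * (1.0 + 1e-7))  /* as likely or less */
            tail += w[x - lo];
    }
    free(w);
    return tail / total;
}
```

Calling fisher_exact_2x2(30, 86, 24, 38) inside a loop mirrors the shape of the timing test above. Normalizing by the summed weights avoids computing binomial coefficients or factorials directly.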
test_ct <- 5e6
data <- c( 30, 86,
           24, 38 )
testdata <- matrix(data, nrow=2)
for (i in 1:test_ct){
    fisher.test(testdata)
}
Listing 1.6 R code to do the same ...
... like the debugger and profiler
• Methods for reliability testing functions and making them more robust
• Databases, and how to get them to produce data in the format you need
• Talking to external...
... that, on a technical level, the data mining effort is working and
the data is reasonably accurate. This can be quite comforting. If the data and
the data mining techniques applied to it are powerful ... resolve these issues. Data mining can help make more informed
decisions. It can suggest tests to make. Ultimately, though, the business needs
What Is Data Mining?
Data mining, as we use the ... of techniques to
apply in a particular situation depends on the nature of the data mining task,
the nature of the available data, and the skills and preferences of the data
miner.
Data mining...
... level
data, 96
publications
Building the Data Warehouse (Bill
Inmon), 474
Business Modeling and Data Mining
(Dorian Pyle), 60
Data Preparation for Data Mining
(Dorian Pyle), 75
The Data ...
discussed, 7
Data Preparation for Data Mining
(Dorian Pyle), 75
The Data Warehouse Toolkit (Ralph
Kimball), 474
data warehousing
customer patterns, 5
for decision support, 13
discussed, 4
database ...
Business Modeling and Data Mining, 60
Data Preparation for Data Mining, 75
470643 bindex.qxd 3/8/04 11:08 AM Page 619
C
Index 619
calculations, probabilities, 133–135
call detail databases, 37
call-center...
[Figure 2.1 not reproduced. Surviving stage labels: "Identify ... analyzing data ... can provide value"; "Transform data into actionable information using data mining techniques"; "... on the information"; "Measure the results of the efforts".]
... of data mining in practice. Figure 2.1
shows the four stages:
1. Identifying the business problem.
2. Mining data to transform the data into actionable information.
3. Acting on the information.
... the data mining techniques discussed in this book are suitable for
use in prediction so long as training data is available in the proper form. The
of Data...
...
before. The newly discovered relationships suggest new hypotheses to test,
and the data mining process begins all over again.
Lessons Learned
Data mining comes in two forms. Directed data mining ... California based on data that excludes calls to Los Angeles.
Step Six: Transform Data to Bring Information to the Surface
Once the data has been assembled and major data problems fixed, the data ...
Data Mining Applications 97
mining techniques used to generate the scores. It is worth noting, however,
that many of the data mining techniques in this book can...
... which messages are most appropriate for each one.
Even a customer with low scores for every offer has higher scores for some
than others. In Mastering Data Mining (Wiley, 1999), we describe how ...
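The point that even a uniformly low-scoring customer still has a best offer can be made concrete. Below is a minimal sketch. It assumes that each offer's model produces scores on a comparable scale and that one customer's scores are stored as a row of a customers-by-offers array. The function name best_offer is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: return the index of the highest-scoring offer
   for one customer. scores points at that customer's row of n_offers
   scores, assumed comparable across offers. Ties go to the lowest index. */
int best_offer(const double *scores, int n_offers) {
    assert(n_offers > 0);
    int best = 0;
    for (int j = 1; j < n_offers; j++)
        if (scores[j] > scores[best])
            best = j;
    return best;
}
```

For a customer scored {0.02, 0.05, 0.01} on three offers, best_offer returns 1. Every score is low, but the second offer is still the one most worth making. The comparison is only meaningful if the scores really are on a common scale.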
Start Tracking Customers before They Become Customers
It is a good idea to start recording information about prospects even before
they become ... for other reasons as well. For instance, it is one way of
taking several variables and converting them to similar ranges. This can be
useful for several data mining techniques, such as clustering...
... in several areas:
• Data miners tend to ignore measurement error in raw data.
• Data miners assume that there is more than enough data and processing power.
• Data mining assumes dependency ...
... make the same classification, although each leaf makes
that classification for a different reason. For example, in a tree that classifies
fruits and vegetables by color, the leaves for apple, ...
The Lure of Statistics: Data Mining Using Familiar Tools 159
statisticians use similar techniques to solve similar problems, the data mining
approach differs from the standard...
... Networks 219
Neural Networks for Directed Data Mining
The previous example illustrates the most common use of neural networks:
building a model for classification or prediction. The steps in this ... test set to see how well it performs.
7. Apply the model generated by the network to predict outcomes for
unknown inputs.
Fortunately, data mining software now performs most of these steps
automatically. ...
children variable might be mapped as follows: 0 (for 0 children), 0.5 (for one
child), 0.75 (for two children), 0.875 (for three children), and so on. For
categorical variables, it is often easier...
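The mapping for the number of children follows a simple pattern. Each additional child adds half of the remaining distance to 1, so n children map to 1 - 2^(-n). The closed form is inferred from the listed values rather than stated in the text. Below is a minimal sketch of that encoding, in which the function name encode_count is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: map a non-negative count n to 1 - 2^(-n),
   so 0 -> 0, 1 -> 0.5, 2 -> 0.75, 3 -> 0.875, ...
   The result approaches 1 as n grows and never exceeds it. */
double encode_count(int n) {
    assert(n >= 0);
    double value = 0.0, step = 0.5;
    for (int i = 0; i < n; i++) {
        value += step;  /* add half of the remaining gap to 1 */
        step /= 2.0;
    }
    return value;
}
```

Because the encoded values are bounded, a customer with an unusually large count cannot push the input far outside the range the network was trained on.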
... not appropriate for all types of problems. It is not a prediction tool or classification tool like
a neural network that takes data in and produces an answer. Many types of
data are simply ... applied to data. These patterns can be turned into new features of the data,
for use in conjunction with other directed data mining techniques.
Automatic Cluster ... X. Clearly, they must all be converted to a
common scale before distances will make any sense.
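A small example makes the scale problem concrete. The sketch below compares squared Euclidean distances before and after rescaling each variable to [0, 1] by an assumed range (min-max scaling). The variables (age and income), their ranges, and the function names are hypothetical. Squared distance is used only because it ranks neighbors the same way as ordinary Euclidean distance while avoiding a square root.

```c
#include <assert.h>

/* Squared Euclidean distance on raw, unscaled values. */
double sq_dist(const double *x, const double *y, int k) {
    double s = 0.0;
    for (int i = 0; i < k; i++) {
        double diff = x[i] - y[i];
        s += diff * diff;
    }
    return s;
}

/* Squared Euclidean distance after min-max scaling each variable to
   [0, 1] with the supplied bounds lo[i] and hi[i] (hi[i] > lo[i]). */
double scaled_sq_dist(const double *x, const double *y,
                      const double *lo, const double *hi, int k) {
    double s = 0.0;
    for (int i = 0; i < k; i++) {
        assert(hi[i] > lo[i]);
        double range = hi[i] - lo[i];
        double diff = (x[i] - lo[i]) / range - (y[i] - lo[i]) / range;
        s += diff * diff;
    }
    return s;
}
```

Consider customers A = (age 25, income 50000), B = (55, 51000) and C = (26, 90000), with assumed ranges of 20 to 60 years and 20000 to 100000 dollars. On raw values B is nearer to A, because income differences swamp age differences. Once both variables share a common scale, C is nearer.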
Unfortunately, in commercial data mining there is usually no common scale
available because the...
... calculation for these customers, paying particular attention to the role of
censoring. When looking at customer data for hazard calculations, both the tenure
and the censoring flag are needed. For ... the data speak instead of
finding a special function to speak for it. Empirical hazard probabilities simply
let the historical data determine what is likely to happen, without trying to fit
data ... customer databases often contain data on millions of customers and former
customers. Much of the statistical background of survival analysis is focused
on extracting every last bit of information...
...
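The empirical hazard calculation described above needs only each customer's tenure and a flag saying whether the customer has stopped. Below is a minimal sketch, assuming integer tenures:

- The hazard at tenure t is the number of customers who stopped at exactly t, divided by the number still at risk at t (tenure of at least t).
- Censored customers (those still active) count toward the population at risk up to their current tenure, but never as stops.
- Survival uses one common convention: the running product of one minus the hazard for each earlier period, with survival at tenure 0 equal to 1.

The function name and array layout are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper. For n customers, tenure[i] is the customer's
   tenure and stopped[i] is 1 if the customer stopped (0 if censored,
   i.e. still active). Fills hazard[0..max_t] and survival[0..max_t]. */
void empirical_hazard(const int *tenure, const int *stopped, int n,
                      int max_t, double *hazard, double *survival) {
    assert(n > 0 && max_t >= 0);
    for (int t = 0; t <= max_t; t++) {
        int at_risk = 0, stops = 0;
        for (int i = 0; i < n; i++) {
            if (tenure[i] >= t) at_risk++;               /* still around at t */
            if (tenure[i] == t && stopped[i]) stops++;   /* stopped exactly at t */
        }
        /* Convention: hazard is 0 when nobody is left at risk. */
        hazard[t] = at_risk > 0 ? (double)stops / at_risk : 0.0;
    }
    survival[0] = 1.0;
    for (int t = 1; t <= max_t; t++)
        survival[t] = survival[t - 1] * (1.0 - hazard[t - 1]);
}
```

As an example, take five customers with tenures {1, 2, 2, 3, 3} and stop flags {1, 0, 1, 1, 0}. The hazards at tenures 0 to 3 are 0, 0.2, 0.25 and 0.5, and survival falls from 1 to 0.6 by tenure 3. This version loops over every customer for every tenure to stay short. Tallying stops and customers per tenure in a single pass scales better for databases with millions of customers.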
Choosing a Data Mining Technique
The choice of which data mining technique or techniques to apply depends on
the particular data mining task to be accomplished and on the data available
for analysis. ... to build the data mining team and secure sponsorship for a data
mining pilot.
The successful efforts crossed corporate boundaries to involve people from
both marketing and information technology. ...
understood the data, people who understood the data mining techniques, people
who understood the business problem to be addressed, and at least one
person with experience applying data mining to...