Contents
Overview
Introducing Data Mining
Training a Data Mining Model
Building a Data Mining Model with OLAP Data
Browsing the Dependency Network
Lab A: Creating a Decision Tree with Relational Data
Review
Module 17: Introduction to Data Mining
BETA MATERIALS FOR MICROSOFT CERTIFIED TRAINER PREPARATION PURPOSES ONLY
Information in this document is subject to change without notice. The names of companies,
products, people, characters, and/or data mentioned herein are fictitious and are in no way intended
to represent any real individual, company, product, or event, unless otherwise noted. Complying
with all applicable copyright laws is the responsibility of the user. No part of this document may
be reproduced or transmitted in any form or by any means, electronic or mechanical, for any
purpose, without the express written permission of Microsoft Corporation. If, however, your only
means of access is electronic, permission to print one copy is hereby granted.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual
property rights covering subject matter in this document. Except as expressly provided in any
written license agreement from Microsoft, the furnishing of this document does not give you any
license to these patents, trademarks, copyrights, or other intellectual property.
© 2000 Microsoft Corporation. All rights reserved.
Microsoft, BackOffice, MS-DOS, Windows, Windows NT, <plus other appropriate product
names or titles. Replace this example list with list of trademarks provided by copy editor.
Microsoft is listed first, followed by all other Microsoft trademarks in alphabetical order. > are
either registered trademarks or trademarks of Microsoft Corporation in the U.S.A. and/or other
countries.
<This is where mention of specific, contractually obligated to, third party trademarks, which are
added by the Copy Editor>
Other product and company names mentioned herein may be the trademarks of their respective
owners.
Instructor Notes
This module introduces students to data mining and explains how to build and
browse data mining models by using Microsoft® SQL Server™ 2000 Analysis
Services. Students will learn fundamental data mining terminology, concepts,
techniques, and algorithms.
This is an overview module that focuses on the use of built-in Analysis
Manager wizards. It is not intended to provide in-depth knowledge of data
mining.
After completing this module, students will be able to:
• Describe data mining characteristics, applications, and modeling techniques.
• Describe the process of training a model.
• Use the online analytical processing (OLAP) Mining Model Wizard to edit,
  process, and explore the decision trees.
• Analyze relational data relationships in the dependency network browser.
• Describe the steps required to build a clustering model by using OLAP data.
Materials and Preparation
This section lists the required materials and preparation tasks that you need to
teach this module.
Required Materials
To teach this module, you need Microsoft PowerPoint® file 2074A_17.ppt.
Preparation Tasks
To prepare for this module, you should:
• Read all the materials for this module.
• Read the instructor notes and margin notes.
• Practice combining the lecture with the demonstrations.
• Complete the lab.
• Review the Trainer Preparation presentation for this module on the Trainer
  Materials compact disc.
• Review any relevant white papers that are located on the Trainer Materials
  compact disc.
Presentation: 40 minutes
Lab: 20 minutes
Demonstration: Determining Why Students Attend College
The following demonstration procedures provide information that will not fit
in the margin notes or is not appropriate for student notes.
In this demonstration, you will create a data mining model by using a decision
tree with relational data. Specifically, you will create a decision tree that
determines why students attend college.
You will create a new OLAP database with a data source connecting to the
Module 17 relational database.
To create an OLAP database
1. In Analysis Manager, expand the Analysis Servers folder, right-click your
   local server, and then click New Database.
2. Enter Module 17 as the database name, and then click OK.
3. Expand the Module 17 database, right-click the Data Sources folder, and
   then click New Data Source.
4. On the Provider tab of the Data Link Properties dialog box, click
   Microsoft OLE DB Provider for SQL Server, and then click Next.
5. In Step 1, type localhost.
6. In Step 2, click Use Windows NT Integrated security.
7. In Step 3, click Module 17 in the list of databases, and then click OK.
To create the data mining model
In this procedure, you will create the data mining model by selecting the
source, case table, data mining technique, and key column.
1. In the Module 17 database, right-click the Mining Models folder, and then
   click New Mining Model.
2. At the welcome page, click Next.
3. From the Select source type step of the Mining Model Wizard, click
   Relational data, and then click Next.
   Point out that either relational tables or OLAP cubes can be used as source
   data. For this model, you are accessing relational data.
4. From the Select case tables step, in the Available tables list, click College
   Plans, and then click Next.
5. From the Select data mining technique step, in the Technique list, click
   Microsoft Decision Trees, and then click Next.
   Two algorithms ship with Analysis Services: Microsoft Decision Trees and
   Microsoft Clustering. Use the Decision Trees algorithm for this
   demonstration.
6. From the Select the key column step, in the Case key column list, click
   StudentID, and then click Next.
Demonstration: 10 minutes
To select input and predictable columns for the mining model
1. From the Select input and predictable columns step of the Mining Model
   Wizard, in the Available columns list, click CollegePlans at the bottom of
   the column list.
2. Click the top arrow (>) to choose CollegePlans as a predictable column.
3. In the Available columns list, click Gender, and then click the bottom
   arrow (>) to choose that column as an input column.
4. In the Available columns list, click ParentIncome, and then click the
   bottom arrow (>) to choose that column as an input column.
5. In the Available columns list, click IQ, and then click the bottom arrow (>)
   to choose that column as an input column.
6. In the Available columns list, click ParentEncouragement, and then click
   the bottom arrow (>) to choose that column as an input column. Click Next.
To finish the Mining Model Wizard
In this procedure, you name the model, initiate processing, and then close the
wizard.
1. From the Finish the mining model wizard step, in the Model name box,
   type CollegePlans.
2. Click Finish to create and process the model.
3. When the model has completed processing, click Close to close the Process
   dialog box.
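The choices made in the wizard procedures above can be summarized as a small data structure. The following Python sketch is only an illustration: `MiningModelSpec`, `KNOWN_TECHNIQUES`, and `problems` are hypothetical names invented here and are not part of Analysis Services or any Microsoft API. The column, table, and technique names are the ones used in the demonstration.

```python
from dataclasses import dataclass

# The two techniques the module says ship with Analysis Services.
KNOWN_TECHNIQUES = ("Microsoft Decision Trees", "Microsoft Clustering")


@dataclass(frozen=True)
class MiningModelSpec:
    """Hypothetical record of the choices made in the Mining Model Wizard."""
    name: str
    case_table: str
    technique: str
    key_column: str
    input_columns: tuple
    predictable_columns: tuple

    def problems(self):
        """Return a list of problems; an empty list means the spec looks consistent."""
        found = []
        if self.technique not in KNOWN_TECHNIQUES:
            found.append(f"unknown technique: {self.technique}")
        if not self.input_columns:
            found.append("no input column selected")
        if not self.predictable_columns:
            found.append("no predictable column selected")
        if self.key_column in self.input_columns + self.predictable_columns:
            found.append("key column is also used as an input or predictable column")
        return found


# Values taken from the demonstration steps.
college_plans = MiningModelSpec(
    name="CollegePlans",
    case_table="College Plans",
    technique="Microsoft Decision Trees",
    key_column="StudentID",
    input_columns=("Gender", "ParentIncome", "IQ", "ParentEncouragement"),
    predictable_columns=("CollegePlans",),
)

print(college_plans.problems())
```

Writing the choices down this way makes it easy to see that one case table supplies the key, the inputs, and the predicted attribute.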
To explore data in the decision tree
1. In the Relational Mining Model Editor, click the Content tab.
2. In the Content Detail pane, click the All node.
   View the Totals tab of the Attributes pane, and point out that more than 67
   percent of the students interviewed do not plan to attend college.
3. Click the Parent Encouragement = Encouraged node.
   Point out to the students that parental encouragement is the most dominant
   attribute in this model. More than 57 percent of students who are encouraged
   by their parents plan to attend college.
4. Click the Parent Encouragement = Not Encouraged node.
   Fewer than 7 percent of students who are not encouraged by their parents
   plan to attend college.
5. Close the Relational Mining Model Editor.
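The idea of a "most dominant attribute" and of per-node percentages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the eight cases below are invented toy data (not the College Plans table), and information gain is used as the split score only because it is a common textbook choice. The module does not say which score Microsoft Decision Trees uses.

```python
from collections import Counter
from math import log2

# Hypothetical toy cases, invented for illustration only.
CASES = [
    {"ParentEncouragement": "Encouraged", "Gender": "Male", "CollegePlans": "Yes"},
    {"ParentEncouragement": "Encouraged", "Gender": "Female", "CollegePlans": "Yes"},
    {"ParentEncouragement": "Encouraged", "Gender": "Male", "CollegePlans": "Yes"},
    {"ParentEncouragement": "Encouraged", "Gender": "Female", "CollegePlans": "No"},
    {"ParentEncouragement": "Not Encouraged", "Gender": "Male", "CollegePlans": "No"},
    {"ParentEncouragement": "Not Encouraged", "Gender": "Female", "CollegePlans": "No"},
    {"ParentEncouragement": "Not Encouraged", "Gender": "Male", "CollegePlans": "No"},
    {"ParentEncouragement": "Not Encouraged", "Gender": "Female", "CollegePlans": "No"},
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(cases, attribute, target):
    """Entropy reduction obtained by splitting the cases on one attribute."""
    base = entropy([c[target] for c in cases])
    remainder = 0.0
    for value in {c[attribute] for c in cases}:
        subset = [c[target] for c in cases if c[attribute] == value]
        remainder += len(subset) / len(cases) * entropy(subset)
    return base - remainder


def node_distribution(cases, attribute, value, target):
    """Share of each target value among the cases that fall into one node."""
    subset = [c[target] for c in cases if c[attribute] == value]
    return {label: n / len(subset) for label, n in Counter(subset).items()}


gains = {
    attribute: information_gain(CASES, attribute, "CollegePlans")
    for attribute in ("ParentEncouragement", "Gender")
}
best = max(gains, key=gains.get)
encouraged = node_distribution(CASES, "ParentEncouragement", "Encouraged", "CollegePlans")
print(best, encouraged)
```

In this toy data set, splitting on ParentEncouragement separates the two outcomes far better than splitting on Gender, which is the sense in which an attribute "dominates" the tree. The node distribution plays the role of the percentages shown on the Totals tab.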
Module Strategy
Use the following strategy to present this module.
This module is structured as a series of demonstrations that show students how
to build and browse various types of data mining models. Except for the first
example about students attending college, the demonstrations are documented
directly in the student manual. Integrate your lecture with live demonstrations,
following the procedures included in the student notes. Encourage students to
follow along with your demonstrations on their computers. Some students may
choose to watch your demonstrations only, which is also acceptable.
• Introducing Data Mining
  The case study introduces students to data mining. Data mining may be new
  to many students and should be described in very simple terms, highlighting
  the business applications and uses. Emphasize to students why this
  technology is useful and complementary to the other forms of analysis they
  have been exposed to. Then describe the various data mining techniques that
  are available.
• Training a Data Mining Model
  Describe the process required to create a data mining model. Define training
  data and cases.
• Building a Data Mining Model with OLAP Data
  Introduce students to the membership card scenario. Use the membership
  card scenario to step students through the process of building a data mining
  model with OLAP data by using the Mining Model Wizard. Describe each
  step in the process: selecting the data mining technique, selecting the case,
  selecting the training data, creating a dimension and virtual cube, and
  browsing the data mining model.
• Browsing the Dependency Network
  Demonstrate how to browse the dependency network. Explain that the
  Dependency Network Browser can be used to view all the relationships in
  your model.
Overview
• Introducing Data Mining
• Training a Data Mining Model
• Building a Data Mining Model with OLAP Data
• Browsing the Dependency Network
This module provides you with an introduction to Microsoft® SQL Server™
2000 Analysis Services Data Mining.
The objective of the module is to introduce you to both data mining principles
and applications while exploring the Analysis Services wizard-driven interface
for creating data mining models.
After completing this module, you will be able to:
• Describe data mining characteristics, applications, and modeling techniques.
• Describe the process of training a model.
• Use the online analytical processing (OLAP) Mining Model Wizard to edit,
  process, and explore the decision trees.
• Analyze relational data relationships in the dependency network browser.
• Describe the steps required to build a clustering model by using OLAP data.
Topic Objective
To provide an overview of the module topics and objectives.
Lead-in
In this module, you will learn about data mining, how data mining can be used
to address business application requirements, and how to create data mining
models by using Analysis Manager.
Introducing Data Mining
• Defining Data Mining
• Data Mining Applications
• Data Mining Models
• Introductory Example
• Exploring the Decision Tree
This section introduces data mining concepts, including:
• Defining data mining.
• Discussing how data mining can be applied to solve common business
  problems.
• Describing what data mining models are available.
• Presenting a simple example of how data mining can be used.
• Exploring the decision tree.
Topic Objective
To introduce the concept of data mining.
Lead-in
In this section, you will be introduced to a simple case study example. In that
example, data mining will be defined, common applications and techniques
discussed, and its role in the data warehouse explored.
Defining Data Mining
• Is the Process of Deducing Meaningful Patterns and Rules from Large
  Quantities of Data
• Searches for Patterns in Data Rather than Answering Predefined Questions
• Is Used To:
  - Provide historical insights
  - Predict future values or outcomes
  - Close the loop for analysis
In many organizations, data volumes are so large that it is difficult, even for the
most seasoned analyst, to identify the key information most relevant to
managing the business.
Data mining is the automatic or semi-automatic process of deducing meaningful
patterns and rules from large quantities of data. These patterns provide valuable
insights to business managers and offer information that may be overlooked by
more traditional manual methods of analysis.
Data mining programs search for patterns in data rather than answer predefined
questions. Because of this, they can be used for knowledge discovery in
addition to hypothesis testing.
Data mining is used to:
• Provide insight into historical data.
• Predict future values or outcomes based on historical patterns.
• Close the analysis loop by taking action based on the information derived
  from the analysis.
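As a rough illustration of "predict future values or outcomes based on historical patterns", the Python sketch below learns the most common outcome for each value of a single attribute and then applies it to a new case. It is a deliberately simple stand-in, not the algorithm used by Analysis Services. The data, the attribute names (`Segment`, `Purchased`), and the functions `train` and `predict` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical historical cases where the outcome is already known.
HISTORY = [
    {"Segment": "Returning", "Purchased": "Yes"},
    {"Segment": "Returning", "Purchased": "Yes"},
    {"Segment": "Returning", "Purchased": "No"},
    {"Segment": "New", "Purchased": "No"},
    {"Segment": "New", "Purchased": "No"},
    {"Segment": "New", "Purchased": "Yes"},
]


def train(cases, input_attr, target):
    """Learn the most common target value for each value of one input attribute."""
    counts = defaultdict(Counter)
    for case in cases:
        counts[case[input_attr]][case[target]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def predict(model, case, input_attr, default=None):
    """Look up the learned outcome for a new case; fall back to a default."""
    return model.get(case[input_attr], default)


model = train(HISTORY, "Segment", "Purchased")
new_case = {"Segment": "Returning"}  # outcome unknown
print(predict(model, new_case, "Segment"))
```

The learned mapping is the "historical insight"; applying it to a case whose outcome is unknown is the prediction step.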
Topic Objective
To provide a definition of data mining.
Lead-in
Data mining provides a means by which the system deduces knowledge from
the data by identifying correlations and other patterns in the data.
Data Mining Applications
• Advertising on the Internet
  - “What banner will I display to this visitor?”
  - “What other products is this customer likely to buy?”
• Detecting Fraud
  - “Is this insurance claim a fraud?”
• Pricing Insurance
  - “How much of a discount will I offer to this customer?”
• Managing Credit Risk
  - “Will I approve the loan for this customer?”
Data mining techniques are used in a variety of applications. This section
provides some examples.
Advertising on the Internet
You can use data mining to classify groups of customers with similar
information into segments for targeting advertising or special offers.
Following are two examples involving Internet customers:
• An e-commerce Web site sells sporting equipment. When a customer
  registers, a database management system collects information about the
  customer, such as gender, marital status, favorite sport, and age.
  By using data mining techniques, the Web site displays a masculine banner
  ad with a golfing motif for the male, golf-loving, 40-year-old who returns to
  the Web site after registering.
• When you purchase merchandise on the Internet, you are sometimes offered
  additional merchandise that the Web site predicts you might be interested
  in, for example, a book similar to the one you are currently purchasing.
  Such recommendations are based on data mining techniques that search out
  purchase patterns of customers who purchased the same book you are now
  buying. The system recommends: “If you like xyz books, check out the
  additional books below.”
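The book recommendation example can be approximated by counting which items appear in the same orders as the item being purchased. The Python sketch below is an assumption-laden illustration: the orders and titles are invented, `recommend` is a hypothetical helper, and simple co-purchase counting is only one of many ways such a feature might be built.

```python
from collections import Counter

# Hypothetical past orders; each order is the set of titles bought together.
ORDERS = [
    {"Book A", "Book B"},
    {"Book A", "Book B", "Book C"},
    {"Book A", "Book B"},
    {"Book B", "Book D"},
]


def recommend(orders, item, top_n=2):
    """Return the titles most often bought together with `item`."""
    together = Counter()
    for basket in orders:
        if item in basket:
            together.update(basket - {item})
    # Sort by descending count, then by title, so the result is deterministic.
    ranked = sorted(together.items(), key=lambda pair: (-pair[1], pair[0]))
    return [title for title, _ in ranked[:top_n]]


print(recommend(ORDERS, "Book A"))
```

Customers who bought "Book A" most often also bought "Book B", so that title is ranked first.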
Detecting Fraud
You can use a data mining system to identify characteristics of suspicious
insurance claims by analyzing characteristics of legitimate and fraudulent
claims. For example, specific types of injuries that are difficult to diagnose,
such as neck and back injuries, may be more likely candidates for a fraudulent
claim.
Topic Objective
To identify different applications for data mining.
Lead-in
Data mining is used for a variety of different applications. We are now going
to talk about some common uses.
Delivery Tips
Incorporate your own examples of how data mining is used to solve business
problems. Ask students for examples from their businesses.
Point out that data mining is no longer an art used by just PhDs. This
technology is available and useful to a variety of businesses.
[...]

Training a Data Mining Model
Topic Objective
To explain the methodology for creating a mining model and to define
terminology.
Lead-in
When creating a data mining model, you need a training data set. This is
typically historical data where the attributes to be predicted are known.
[Slide diagram labels: Training Data, Mining Model, Data To Predict, DM Engine,
Mining Model]
Delivery ...

Pricing Insurance
In the insurance industry, you use data mining techniques to analyze historical
data such as age, marital status, gender, and driving history. All these factors
play a role in predicting the likelihood that a specific driver will get into an
automobile accident. Data mining techniques help you to weigh and factor
these data points into pricing ...

Building a Data Mining Model with OLAP Data
Topic Objective
To describe the steps used to build a data mining model with OLAP data.
Lead-in
There are a variety of steps involved in building a data mining model with
OLAP data.
• Introducing the Membership Card Scenario
• Selecting the Data Mining Technique
• Selecting the Case ...

... segmenting the data based on various attributes you collect. To answer the
question, you can spend several hours exploring the data manually, or you can
use data mining to explore the data automatically.

Demonstration: Determining Why Students Attend College
Topic Objective
To demonstrate how to create ...

... build slide to explain how Analysis Server evaluates training data to build a
data mining model, and then uses the model to predict future outcomes based
on new data sets.
[Slide diagram labels: DM Engine, Predicted Data]
To create a model, you must assemble a set of data where the attributes to be
predicted are known. Such a data set is called the training data. During the
training process, data is inserted into the data mining model ...

Selecting the Data Mining Technique
Topic Objective
To demonstrate how to select the data mining technique by using the wizard.
Lead-in
Microsoft offers two data mining techniques: Microsoft Decision Trees and
Microsoft Clustering. You select decision trees in this case because it is a good
technique for prediction.
There are varieties of data mining techniques ...

Browsing the Data Mining Model
Topic Objective
To demonstrate how to browse the results in a mining model.
Lead-in
The OLAP Mining Model Editor can be used to edit properties in your model or
browse the results.
[Slide diagram labels: Content Navigator, Content Detail, Attributes, Node Path]
To finish creating the model, you must name, save, ...

... standing. By using data mining techniques applied to historical loan
application information, the bank can predict whether you are a good or bad
credit risk and can use this information when deciding on loan approval.

Data Mining Models
Topic Objective
To describe different data mining models ...

Browsing the Decision Tree
Once the decision tree is created and processed, you can examine the results by
using the browser. Returning to the membership card example, the task now is
to analyze which customers are likely to purchase Golden cards.

... using the editor.
Note: Although only one predicted entity can be selected by using the wizard,
additional entities may be added by using the Data Mining Editor.
The Mining Model Wizard prompts you to select the predicted entity for this
model.

Selecting Training Data
Topic Objective
To review ...