
Module 17: Introduction to Data Mining

Pages: 40
File size: 1.18 MB

Content

Contents

  Overview
  Introducing Data Mining
  Training a Data Mining Model
  Building a Data Mining Model with OLAP Data
  Browsing the Dependency Network
  Lab A: Creating a Decision Tree with Relational Data
  Review

Module 17: Introduction to Data Mining

BETA MATERIALS FOR MICROSOFT CERTIFIED TRAINER PREPARATION PURPOSES ONLY

Information in this document is subject to change without notice. The names of companies, products, people, characters, and/or data mentioned herein are fictitious and are in no way intended to represent any real individual, company, product, or event, unless otherwise noted. Complying with all applicable copyright laws is the responsibility of the user. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without the express written permission of Microsoft Corporation. If, however, your only means of access is electronic, permission to print one copy is hereby granted.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2000 Microsoft Corporation. All rights reserved.

Microsoft, BackOffice, MS-DOS, Windows, Windows NT, <plus other appropriate product names or titles. Replace this example list with list of trademarks provided by copy editor. Microsoft is listed first, followed by all other Microsoft trademarks in alphabetical order.> are either registered trademarks or trademarks of Microsoft Corporation in the U.S.A. and/or other countries.
<This is where mention of specific, contractually obligated to, third party trademarks, which are added by the Copy Editor>

Other product and company names mentioned herein may be the trademarks of their respective owners.

Instructor Notes

Presentation: 40 Minutes
Lab: 20 Minutes

This module introduces students to data mining and explains how to build and browse data mining models by using Microsoft® SQL Server™ 2000 Analysis Services. Students will learn fundamental data mining terminology, concepts, techniques, and algorithms. This is an overview module that focuses on the use of built-in Analysis Manager wizards. It is not intended to provide in-depth knowledge of data mining.

After completing this module, students will be able to:

  • Describe data mining characteristics, applications, and modeling techniques.
  • Describe the process of training a model.
  • Use the online analytical processing (OLAP) Mining Model Wizard to edit, process, and explore the decision trees.
  • Analyze relational data relationships in the dependency network browser.
  • Describe the steps required to build a clustering model by using OLAP data.

Materials and Preparation

This section lists the required materials and preparation tasks that you need to teach this module.

Required Materials

To teach this module, you need Microsoft PowerPoint® file 2074A_17.ppt.

Preparation Tasks

To prepare for this module, you should:

  • Read all the materials for this module.
  • Read the instructor notes and margin notes.
  • Practice combining the lecture with the demonstrations.
  • Complete the lab.
  • Review the Trainer Preparation presentation for this module on the Trainer Materials compact disc.
  • Review any relevant white papers that are located on the Trainer Materials compact disc.

Demonstration: Determining Why Students Attend College

Demonstration: 10 Minutes

The following demonstration procedures provide information that will not fit in the margin notes or is not appropriate for student notes.

In this demonstration, you will create a data mining model by using a decision tree with relational data. Specifically, you will create a decision tree that determines why students attend college. You will create a new OLAP database with a data source connecting to the Module 17 relational database.

To create an OLAP database

  1. In Analysis Manager, expand the Analysis Servers folder, right-click your local server, and then click New Database.
  2. Enter Module 17 as the database name, and then click OK.
  3. Expand the Module 17 database, right-click the Data Sources folder, and then click New Data Source.
  4. On the Provider tab of the Data Link Properties dialog box, click Microsoft OLE DB Provider for SQL Server. Click Next.
  5. In Step 1, type localhost.
  6. In Step 2, click Use Windows NT Integrated security.
  7. In Step 3, click Module 17 from the list of databases. Click OK.

To create the data mining model

In this procedure, you will create the data mining model by selecting the source, case table, data mining technique, and key column.

  1. In the Module 17 database, right-click the Mining Models folder, and then click New Mining Model.
  2. At the welcome page, click Next.
  3. From the Select source type step of the Mining Model Wizard, click Relational data, and then click Next.
     Point out that either relational tables or OLAP cubes can be used as source data. For this model, you are accessing relational data.
  4. From the Select case tables step, in the Available tables list, click College Plans, and then click Next.
  5. From the Select data mining technique step, in the Technique list, click Microsoft Decision Trees, and then click Next.
     Two algorithms ship with Analysis Services: Microsoft Decision Trees and Microsoft Clustering. Use the Decision Trees algorithm for this demonstration.
  6. From the Select the key column step, in the Case key column list, click StudentID, and then click Next.

To select input and predictable columns for the mining model

  1. From the Select input and predictable columns step of the Mining Model Wizard, in the Available columns list, click CollegePlans at the bottom of the column list.
  2. Click the top arrow (>) to choose CollegePlans as a predictable column.
  3. In the Available columns list, click Gender, and then click the bottom arrow (>) to choose that column as an input column.
  4. In the Available columns list, click ParentIncome, and then click the bottom arrow (>) to choose that column as an input column.
  5. In the Available columns list, click IQ, and then click the bottom arrow (>) to choose that column as an input column.
  6. In the Available columns list, click ParentEncouragement, and then click the bottom arrow (>) to choose that column as an input column. Click Next.

To finish the Mining Model Wizard

In this procedure, you name the model, initiate processing, and then close the wizard.

  1. From the Finish the mining model wizard step, in the Model name box, type CollegePlans.
  2. Click Finish to create and process the model.
  3. When the model has completed processing, click Close to close the Process dialog box.

To explore data in the decision tree

  1. In the Relational Mining Model Editor, click the Content tab.
  2. In the Content Detail pane, click the All node.
     View the Totals tab of the Attributes pane, and point out that more than 67 percent of the students interviewed do not plan to attend college.
  3. Click the Parent Encouragement = Encouraged node.
     Point out to the students that parental encouragement is the most dominant attribute in this model. More than 57 percent of students who are encouraged by their parents plan to attend college.
  4. Click Parent Encouragement = Not Encouraged.
     Fewer than 7 percent of students who are not encouraged by their parents plan to attend college.
  5. Close the Relational Mining Model Editor.

Module Strategy

Use the following strategy to present this module:

This module is structured as multiple demonstrations that show students how to build and browse various types of data mining models. Except for the first example about students attending college, the demonstrations are documented directly in the student manual. Integrate your lecture with live demonstration, following the procedures included in the student notes. Encourage students to follow along with your demonstrations on their computers. Some students may choose to watch your demonstrations only, which is also acceptable.

  • Introducing Data Mining
    The case study introduces students to data mining. Data mining may be new to many students and should be described in very simple terms, highlighting the business applications and uses. Emphasize to students why this technology is useful and complementary to the other forms of analysis they have been exposed to. Then describe the various data mining techniques that are available.
  • Training a Data Mining Model
    Describe the process required to create a data mining model. Define training data and cases.
  • Building a Data Mining Model with OLAP Data
    Introduce students to the membership card scenario. Use the membership card scenario to step students through the process of building a data mining model with OLAP data by using the Mining Model Wizard. Describe each step in the process: selecting the data mining technique, selecting the case, selecting the training data, creating a dimension and virtual cube, and browsing the data mining model.
  • Browsing the Dependency Network
    Demonstrate how to browse the dependency network. Explain that the Dependency Network Browser can be used to view all the relationships in your model.

Overview

  • Introducing Data Mining
  • Training a Data Mining Model
  • Building a Data Mining Model with OLAP Data
  • Browsing the Dependency Network

Topic Objective: To provide an overview of the module topics and objectives.
Lead-in: In this module, you will learn about data mining, how data mining can be used to address business application requirements, and how to create data mining models by using the Analysis Manager.

This module provides you with an introduction to Microsoft® SQL Server™ 2000 Analysis Services Data Mining. The objective of the module is to introduce you to both data mining principles and applications while exploring the Analysis Services wizard-driven interface for creating data mining models.

After completing this module, you will be able to:

  • Describe data mining characteristics, applications, and modeling techniques.
  • Describe the process of training a model.
  • Use the online analytical processing (OLAP) Mining Model Wizard to edit, process, and explore the decision trees.
  • Analyze relational data relationships in the dependency network browser.
  • Describe the steps required to build a clustering model by using OLAP data.

Introducing Data Mining

  • Defining Data Mining
  • Data Mining Applications
  • Data Mining Models
  • Introductory Example
  • Exploring the Decision Tree

Topic Objective: To introduce the concept of data mining.
Lead-in: In this section, you will be introduced to a simple case study example. In that example, data mining will be defined, common applications and techniques discussed, and its role in the data warehouse explored.

This section introduces data mining concepts, including:

  • Defining data mining.
  • Discussing how data mining can be applied to solve common business applications.
  • Describing what data mining models are available.
  • Presenting a simple example of how data mining can be used.
  • Exploring the decision tree.

Defining Data Mining

  • Is the Process of Deducing Meaningful Patterns and Rules from Large Quantities of Data
  • Searches for Patterns in Data Rather than Answering Predefined Questions
  • Is Used To:
      - Provide historical insights
      - Predict future values or outcomes
      - Close the loop for analysis

In many organizations, data volumes are so large that it is difficult, even for the most seasoned analyst, to identify the key information most relevant to managing the business.

Data mining is the automatic or semi-automatic process of deducing meaningful patterns and rules from large quantities of data. These patterns provide valuable insights to business managers and offer information that may be overlooked by more traditional manual methods of analysis.

Data mining programs search for patterns in data rather than answer predefined questions. Because of this, they can be used for knowledge discovery in addition to hypothesis testing.

Data mining is used to:

  • Provide insight into historical data.
  • Predict future values or outcomes based on historical patterns.
  • Close the analysis loop by taking action based on the information derived from the analysis.
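
The idea of searching for patterns rather than answering a predefined question can be made concrete with a small sketch. The Python code below ranks the attributes of a handful of cases by how well each one separates the predicted column. It uses information gain, a common textbook criterion for decision trees; the module does not state which criterion Microsoft Decision Trees uses, so treat this as an illustration only. The column names echo the College Plans demonstration, but the rows are invented, and `entropy`, `information_gain`, and `rank_attributes` are hypothetical helper names.

```python
from collections import Counter
from math import log2

# Hypothetical toy cases. The column names echo the College Plans
# demonstration, but every row here is invented for illustration.
CASES = [
    {"Gender": "F", "ParentEncouragement": "Encouraged", "CollegePlans": "Yes"},
    {"Gender": "M", "ParentEncouragement": "Encouraged", "CollegePlans": "Yes"},
    {"Gender": "F", "ParentEncouragement": "Encouraged", "CollegePlans": "Yes"},
    {"Gender": "M", "ParentEncouragement": "Encouraged", "CollegePlans": "No"},
    {"Gender": "F", "ParentEncouragement": "Not Encouraged", "CollegePlans": "No"},
    {"Gender": "M", "ParentEncouragement": "Not Encouraged", "CollegePlans": "No"},
    {"Gender": "F", "ParentEncouragement": "Not Encouraged", "CollegePlans": "No"},
    {"Gender": "M", "ParentEncouragement": "Not Encouraged", "CollegePlans": "No"},
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(cases, attribute, target):
    """Reduction in entropy of `target` after splitting `cases` on `attribute`."""
    base = entropy([case[target] for case in cases])
    groups = {}
    for case in cases:
        groups.setdefault(case[attribute], []).append(case[target])
    remainder = sum(len(g) / len(cases) * entropy(g) for g in groups.values())
    return base - remainder


def rank_attributes(cases, target):
    """Return the input attributes ordered from most to least informative."""
    attributes = [name for name in cases[0] if name != target]
    return sorted(
        attributes,
        key=lambda name: information_gain(cases, name, target),
        reverse=True,
    )


if __name__ == "__main__":
    for name in rank_attributes(CASES, "CollegePlans"):
        print(name, round(information_gain(CASES, name, "CollegePlans"), 3))
```

Nobody asks the code "does parental encouragement matter?"; it scores every attribute and reports the strongest one, which is the knowledge-discovery behavior the definition describes.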
Topic Objective: To provide a definition of data mining.
Lead-in: Data mining provides a means by which the system deduces knowledge from the data by identifying correlations and other patterns in the data.

Data Mining Applications

  • Advertising on the Internet
      - “What banner will I display to this visitor?”
      - “What other products is this customer likely to buy?”
  • Detecting Fraud
      - “Is this insurance claim a fraud?”
  • Pricing Insurance
      - “How much of a discount will I offer to this customer?”
  • Managing Credit Risk
      - “Will I approve the loan for this customer?”

Data mining techniques are used in a variety of applications. This section provides some examples.

Advertising on the Internet

You can use data mining to classify groups of customers with similar information into segments for targeting advertising or special offers. Following are two Internet customer examples:

  • An e-commerce Web site sells sporting equipment. When a customer registers, a database management system collects information about the customer, such as gender, marital status, favorite sport, and age. By using data mining techniques, the Web site displays a masculine banner ad with a golfing motif for the male, golf-loving, 40-year-old who returns to the Web site after registering.
  • When you purchase merchandise on the Internet, you are sometimes offered additional merchandise that the Web site predicts you might be interested in, for example, a book similar to the one you are currently purchasing. Such recommendations are based on data mining techniques that search out purchase patterns of customers who purchased the same book you are now buying.
    The system recommends: “If you like xyz books, check out the additional books below.”

Detecting Fraud

You can use a data mining system to identify characteristics of suspicious insurance claims by analyzing characteristics of legitimate and fraudulent claims. For example, specific types of injuries that are difficult to diagnose, such as neck and back injuries, may be more likely candidates for a fraudulent claim.

Topic Objective: To identify different applications for data mining.
Lead-in: Data mining is used for a variety of different applications. We are now going to talk about some common uses.
Delivery Tips: Incorporate your own examples of how data mining is used to solve business problems. Ask students for examples from their businesses. Point out that data mining is no longer an art used by just PhDs. This technology is available and useful to a variety of businesses.

[...]

Training a Data Mining Model

Topic Objective: To explain the methodology for creating a mining model and to define terminology.
Lead-in: When creating a data mining model, you need a training data set. This is typically historical data where the attributes to be predicted are known.
[Slide diagram labels: Training Data, Data To Predict, DM Engine, Mining Model]

...

Pricing Insurance

In the insurance industry, you use data mining techniques to analyze historical data such as age, marital status, gender, and driving history. All these factors play a role in predicting the likelihood of a specific driver getting into an automobile accident. Data mining techniques help you to weigh and factor these data points into pricing...

Building a Data Mining Model with OLAP Data

Topic Objective: To describe the steps used to build a data mining model with OLAP data.
Lead-in: There are a variety of steps involved in building a data mining model with OLAP data.

  • Introducing the Membership Card Scenario
  • Selecting the Data Mining Technique
  • Selecting the Case...

... segmenting the data based on various attributes you collect. To answer the question, you can spend several hours exploring the data manually, or you can use data mining to explore the data automatically.

Demonstration: Determining Why Students Attend College

Topic Objective: To demonstrate how to create...

... build slide to explain how Analysis Server evaluates training data to build a data mining model, and then uses the model to predict future outcomes based on new data sets.
[Slide diagram labels: DM Engine, Predicted Data]

To create a model, you must assemble a set of data where the attributes to be predicted are known. Such a data set is called the training data. During the training process, data is inserted into the data mining model...

Selecting the Data Mining Technique

Topic Objective: To demonstrate how to select the data mining technique by using the Wizard.
Lead-in: Microsoft offers two data mining techniques: Microsoft Decision Trees and Microsoft Clustering. You select decision trees in this case because it is a good technique for prediction.

There are varieties of data mining techniques...

Browsing the Data Mining Model

Topic Objective: To demonstrate how to browse the results in a mining model.
Lead-in: The OLAP Mining Model Editor can be used to edit properties in your model or browse the results.
[Slide diagram labels: Content Navigator, Content Detail, Attributes, Node Path]

To finish creating the model, you must name, save,...

... standing. By using data mining techniques applied to historical loan application information, the bank can predict whether you are a good or bad credit risk and can use this information when deciding on loan approval.

Data Mining Models

Topic Objective: To describe different data mining models...

Browsing the Decision Tree

Once the decision tree is created and processed, you can examine the results by using the browser. Returning to the membership card example, the task now is to analyze which customers are likely to purchase Golden cards.

... using the editor.

Note: Although only one predicted entity can be selected by using the wizard, additional entities may be added by using the Data Mining Editor.

The Mining Model Wizard prompts you to select the predicted entity for this model.

Selecting Training Data

Topic Objective: To review ...
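
The training description in the excerpts (assemble cases where the attribute to be predicted is known, build a model from them, then apply the model to new data) can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not how Analysis Services works internally: the "model" is just a table that maps each value of one input attribute to the most common outcome seen in training. The function names `train` and `predict`, the dictionary layout of the model, and all data rows are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train(cases, input_attribute, target):
    """Build a one-attribute rule table from cases whose target value is known."""
    counts = defaultdict(Counter)
    for case in cases:
        counts[case[input_attribute]][case[target]] += 1
    overall = Counter(case[target] for case in cases)
    return {
        "attribute": input_attribute,
        # Most common outcome observed for each value of the input attribute.
        "rules": {value: c.most_common(1)[0][0] for value, c in counts.items()},
        # Fallback for attribute values never seen during training.
        "default": overall.most_common(1)[0][0],
    }


def predict(model, case):
    """Predict the target for a new case whose outcome is not yet known."""
    value = case.get(model["attribute"])
    return model["rules"].get(value, model["default"])


# Hypothetical training data: the predicted attribute (CollegePlans) is known.
TRAINING_DATA = [
    {"ParentEncouragement": "Encouraged", "CollegePlans": "Yes"},
    {"ParentEncouragement": "Encouraged", "CollegePlans": "Yes"},
    {"ParentEncouragement": "Encouraged", "CollegePlans": "No"},
    {"ParentEncouragement": "Not Encouraged", "CollegePlans": "No"},
    {"ParentEncouragement": "Not Encouraged", "CollegePlans": "No"},
]

MODEL = train(TRAINING_DATA, "ParentEncouragement", "CollegePlans")

if __name__ == "__main__":
    new_case = {"ParentEncouragement": "Encouraged"}  # outcome unknown
    print(predict(MODEL, new_case))
```

The split between `train` and `predict` mirrors the two halves of the slide: the engine sees historical cases once to build the model, and afterward only the model is needed to score new cases.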

Date posted: 24/01/2014, 19:20
