Lecture Business management information system - Lecture 26: Data mining

DOCUMENT INFORMATION

Basic information

Pages: 48
File size: 638.89 KB

Contents

This lecture, Lecture 26 of the Business management information system course, covers data mining and addresses the following questions: What is data mining? Why data mining? What applications? What techniques? What process? What software?

Data Mining
Lecture 26

Today's Lecture
- What is data mining?
- Why data mining?
- What applications?
- What techniques?
- What process?
- What software?

Definition
Data mining may be defined as follows: data mining is a collection of techniques for efficient automated discovery of previously unknown, valid, novel, useful and understandable patterns in large databases. The patterns must be actionable so that they may be used in an enterprise's decision making.

What is Data Mining?
- Efficient automated discovery of previously unknown patterns in large volumes of data.
- Patterns must be valid, novel, useful and understandable.
- Businesses are mostly interested in discovering past patterns to predict future behaviour.
- A data warehouse, as discussed earlier, is an enterprise's memory. Data mining can provide intelligence using that memory.

Examples
- amazon.com uses associations. Recommendations to customers are based on past purchases and on what other customers are purchasing.
- "Just for Feet", a retailer in the USA, has about 200 stores, each carrying up to 6000 shoe styles, each style in several sizes. Data mining is used to find the right shoes to stock in the right store.
- More examples appear in the case studies to be discussed later.

Data Mining
- We assume we are dealing with large data, perhaps gigabytes, perhaps terabytes.
- Although data mining is possible with smaller amounts of data, the bigger the data, the higher the confidence in any unknown pattern that is discovered.
- There is considerable hype about data mining at the present time, and Gartner Group has listed data mining as one of the top ten technologies to watch.
Question: How many books could one store in one terabyte of memory?

Why Data Mining Now?
- Growth in the generation and storage of corporate data: the information explosion.
- Need for sophisticated decision making: current database systems are Online Transaction Processing (OLTP) systems. The OLTP data is difficult to use for such applications. Why?
- Evolution of technology: much cheaper storage, easier data collection, better database management, data analysis and understanding.

Information explosion
- Database systems have been in use since the 1960s in Western countries (perhaps since the 1980s in India).
- These systems have generated mountains of data.
- Point of sale terminals and bar codes on many products, railway bookings, educational institutions, the huge number of mobile phones and electronic commerce all generate data.
- Government is now collecting a lot of information.

- Internet banking via networked computers and ATMs
- Credit and debit cards
- Medical data, doctors, hospitals
- Transportation, Indian railways, automatic toll collection on toll roads, growing air travel
- Passports, NRI visas, other visas, NRI money transfers
Question: Can you think of other examples of data collection?

Many adults in India generate:
- Mobile phone transactions. There are more than 300 million phones in India, reportedly growing at the rate of 10,000 new ones every hour! Mobile companies must save information about calls.
- Credit and debit card transactions, from a growing middle class. There were about 25m credit cards and 70m debit cards in 2007, with annual growth rates of about 30% and 40% respectively. There could be 55m credit cards and 200m debit cards in 2010, resulting in perhaps 500m transactions annually.

Better Marketing
It has been reported that more than 1000 variable values on each customer are held by some mail order marketing companies. The aim is to "lift" the response rate.

Trend analysis
In a large company, not all trends are always visible to the management.
It is then useful to use data mining software that will identify trends. Trends may be long-term, cyclic or seasonal.

Market Basket Analysis
- Aims to find what the customers buy and what they buy together.
- This may be useful in designing store layouts or in deciding which items to put on sale.
- Basket analysis can also be used for applications other than just analysing what items customers buy together.

Customer Churn
- In businesses like telecommunications, companies try very hard to keep their good customers and perhaps to persuade good customers of their competitors to switch to them.
- In such an environment, businesses want to find which customers are good, why customers switch and what makes customers loyal.
- It is cheaper to develop a retention plan and retain an old customer than to bring in a new customer.
- The aim is to get to know the customers better so that you will be able to keep them longer.
- Given the competitive nature of businesses, customers will move if not looked after.
- Also, some businesses may wish to get rid of customers that cost more than they are worth, e.g. credit card holders who don't use the card, or bank customers with very small amounts of money in their accounts.

Web site design
- A Web site is effective only if the visitors easily find what they are looking for.
- Data mining can help discover the affinity of visitors to pages, and the site layout may be modified based on this information.

Data Mining Process
Successful data mining involves carefully determining the aims and selecting appropriate data. The following steps should normally be followed:
1. Requirements analysis
2. Data selection and collection
3. Cleaning and preparing data
4. Data mining exploration and validation
5. Implementing, evaluating and monitoring
6. Results visualisation

Requirements Analysis
The enterprise decision makers need to formulate goals that the data mining process is expected to achieve. The business problem must be clearly defined. One cannot use data mining without a good idea of what kind of outcomes the enterprise is looking for. If objectives have been clearly defined, it is easier to evaluate the results of the project.

Data Selection and Collection
Find the best source databases for the data that is required. If the enterprise has implemented a data warehouse, then most of the data could be available there. Otherwise, source OLTP systems need to be identified and the required information extracted and stored in some temporary system. In some cases, only a sample of the data available may be required.

Cleaning and Preparing Data
This may not be an onerous task if a data warehouse containing the required data exists, since most of this must have already been done when the data was loaded into the warehouse. Otherwise this task can be very resource intensive; perhaps more than 50% of the effort in a data mining project is spent on this step. Essentially, a data store that integrates data from a number of databases may need to be created. When integrating data, one often encounters problems like identifying data, dealing with missing data, data conflicts and ambiguity. An ETL (extraction, transformation and loading) tool may be used to overcome these problems.

Exploration and Validation
Assuming that the user has access to one or more data mining tools, a data mining model may be constructed based on the enterprise's needs. It may be possible to take a sample of data and apply a number of relevant techniques. For each technique the results should be evaluated and their significance interpreted. This is likely to be an iterative process which should lead to the selection of one or more techniques that are suitable for further exploration, testing and validation.

Implementing, Evaluating and Monitoring
Once a model has been selected and validated, it can be implemented for use by the decision makers. This may involve software development for generating reports, or for results visualisation and explanation for managers. If more than one technique is available for the given data mining task, it is necessary to evaluate the results and choose the best. This may involve checking the accuracy and effectiveness of each technique.

Regular monitoring of the performance of the techniques that have been implemented is required. Every enterprise evolves with time and so must the data mining system. Monitoring may from time to time lead to the refinement of tools and techniques that have been implemented.

Results Visualisation
Explaining the results of data mining to the decision makers is an important step. Most DM software includes data visualisation modules, which should be used in communicating data mining results to the managers. Clever data visualisation tools are being developed to display results that deal with more than two dimensions. The visualisation tools available should be tried and used if found effective for the given problem.

Summary
We have seen today:
- What is data mining?
- Why data mining?
- What applications?
- What techniques?
- What process?
- What software?

... cards were:
- Credit cards: 88%
- ATM cards: 60%
- Membership cards: 58%
- Debit cards: 35%
- Prepaid cards: 35%
- Loyalty cards: 29%
Question: What kind of data do these cards generate?
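
The market basket analysis described in the lecture (finding what customers buy together) can be illustrated with a small sketch. It is not taken from the lecture: the baskets are invented, the function name `pair_rules` and the `min_support` threshold are hypothetical, and support and confidence are two commonly used association measures that the text shown here does not define. Support is the fraction of baskets containing both items; confidence is the fraction of baskets containing the first item that also contain the second.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
    {"bread", "butter", "tea"},
]

def pair_rules(baskets, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for frequent pairs."""
    n = len(baskets)
    # How many baskets contain each single item.
    item_counts = Counter(item for basket in baskets for item in basket)
    # How many baskets contain each unordered pair of items.
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be of interest
        # Each frequent pair gives two candidate rules: a -> b and b -> a.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Highest confidence first; ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

rules = pair_rules(transactions)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

With these invented baskets, every basket containing butter also contains bread, so the rule "butter -> bread" comes out on top. A result of this kind is what might inform a store layout or a promotion decision, although real data sets would need a more scalable algorithm than counting every pair.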

Date posted: 18/01/2020, 17:28
