Building the Data Mining Environment

The flip side of this challenge is establishing a single image of the company and its brand across all channels of communication with the customer, including retail stores, independent dealers, the Web site, the call centers, advertising, and direct marketing. The goal is not only to make more informed decisions; the goal is to improve the customer experience in a measurable way. In other words, the customer strategy has both analytic and operational components. This book is more concerned with the analytic component, but both are critical to success.

TIP: Building a customer-centric organization requires a strategy with both analytic and operational components. Although this book is about the analytic component, the operational component is also critical.

Building a customer-centric organization requires centralizing customer information from a variety of sources in a single data warehouse, along with a set of common definitions and well-understood business processes describing the source of the data. This combination makes it possible to define a set of customer metrics and business rules used by all groups to monitor the business and to measure the impact of changing market conditions and new initiatives.

The centralized store of customer information is, of course, the data warehouse described in the previous chapter. As shown in Figure 16.1, there is two-way traffic between the operational systems and the data warehouse. Operational systems supply the raw data that goes into the data warehouse, and the warehouse in turn supplies customer scores, decision rules, customer segment definitions, and action triggers to the operational systems. As an example, the operational systems of a retail Web site capture all customer orders. These orders are then summarized in a data warehouse. Using data from the data warehouse, association rules are created and used to generate cross-sell recommendations that are sent back to the operational systems. The end result: a customer comes to the site to order a skirt and ends up with several pairs of tights as well.

Creating a Single Customer View

Every part of the organization should have access to a single shared view of the customer and present the customer with a single image of the company. In practical terms, that means sharing a single customer profitability model, a single payment default risk model, a single customer loyalty model, and shared definitions of such terms as customer start, new customer, loyal customer, and valuable customer.

Figure 16.1 A customer-centric organization requires centralized customer data. (The diagram shows operational systems feeding operational data such as billing and usage into a common repository of customer information with common metadata; segments, actions, and definitions flow back to the operational systems and out to business users.)

It is natural for different groups to have different definitions of these terms. At one publication, the circulation department and the advertising sales department have different views on who the most valuable customers are, because the people who pay the highest subscription prices are not necessarily the people of most interest to the advertisers. The solution is to have an advertising value and a subscription value for each customer, using ideas such as advertising fitness introduced in Chapter 4.

At another company, the financial risk management group considers a customer "new" for the first 4 months of tenure, and during this initial probationary period any late payments are pursued aggressively.
Meanwhile, the customer loyalty group considers the customer "new" for the first 3 months, and during this welcome period the customer is treated with extra care. So which is it: a honeymoon or a trial engagement? Without agreement within the company, the customer receives mixed messages.

For companies with several different lines of business, the problem is even trickier. The same company may provide Internet service and telephone service, and, of course, maintain different billing, customer service, and operational systems for the two services. Furthermore, if the ISP was recently acquired by the telephone company, it may have no idea what the overlap is between its existing telephone customers and its newly acquired Internet customers.

Defining Customer-Centric Metrics

On September 24, 1929, Lieutenant James H. Doolittle of the U.S. Army Air Corps made history by flying "blind" to demonstrate that, with the aid of newly invented instruments such as the artificial horizon, the directional gyroscope, and the barometric altimeter, it was possible to fly a precise course even with the cockpit shrouded by a canvas hood. Before the invention of the artificial horizon, pilots flying into a cloud or fog bank would often end up flying upside down. Now, thanks to all those gauges in the cockpit, we calmly munch pretzels, sip coffee, and revise spreadsheets in weather that would have grounded even Lieutenant Doolittle.

Good business metrics are just as crucial to keeping a large business flying on the proper course. Business metrics are the signals that tell management which levers to move and in what direction. Selecting the right metrics is crucial because a business tends to become what it is measured by. A business that measures itself by the number of customers it has will tend to sign up new customers without regard to their expected tenure or prospects for future profitability. A business that measures itself by market share will tend to increase market share at the expense of other goals such as profitability. The challenge for companies that want to be customer-centric is to come up with realistic customer-centric measures. It sounds great to say that the company's goal is to increase customer loyalty; it is harder to come up with a good way to measure that quality in customers. Is merely having lasted a long time a sign of loyalty? Or should loyalty be defined as being resistant to offers from competitors? If the latter, how can it be measured?

Even seemingly simple metrics such as churn or profitability can be surprisingly hard to pin down. When does churn actually occur?

■ On the day phone service is actually deactivated?
■ On the day the customer first expressed an intention to deactivate?
■ At the end of the first billing cycle after deactivation?
■ On the date when the telephone number is released for new customers?

Each of these definitions plays a role in different parts of a telephone business. For wireless subscribers on a contract, these events may be far apart. And which churn events should be considered voluntary? Consider a subscriber who refuses to pay in order to protest bad service and is eventually cut off; is that voluntary or involuntary churn? What about a subscriber who stops voluntarily and then doesn't pay the final amount owed?
These questions do not have a right answer; they do suggest the subtleties of defining the customer relationship. As for profitability, which customers are considered profitable depends a great deal on how costs are allocated.

Collecting the Right Data

Once metrics such as loyalty, profitability, and churn have been properly defined, the next step is to determine the data needed to calculate them correctly. This is different from simply approximating the definition using whatever data happens to be available. Remember, in the ideal data mining environment, the data mining group has the power to determine what data is made available! Information required for managing the business should drive the addition of new tables and fields to the data warehouse.

For example, a customer-centric company ought to be able to tell which of its customers are profitable. In many companies this is not possible because there is not enough information available to sensibly allocate costs at the customer level. One of our clients, a wireless phone company, approached this problem by compiling a list of questions that would have to be answered in order to decide what it costs to provide service to a particular customer. They then determined what data would be required to answer those questions and set up a project to collect it. The list of questions was long, and included the following (a sketch of how a couple of them might be computed appears after the list):

■ How many times per year does the customer call customer care?
■ Does the customer pay bills online, by check, or by credit card?
■ What proportion of the customer's airtime is spent roaming?
■ On which outside networks does the customer roam?
■ What is the contractual cost for these networks?
■ Are the customer's calls to customer care handled by the IVR or by human operators?
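A minimal sketch of how two of these questions might be answered once the underlying data has been collected, using Oracle-flavored SQL like the fragments later in the book; the care_call and call_detail tables and their columns are hypothetical, not from the original text:

SELECT c.customer_id,
       (SELECT COUNT(*)
          FROM care_call cc
         WHERE cc.customer_id = c.customer_id
           AND cc.call_date >= ADD_MONTHS(SYSDATE, -12)) AS care_calls_per_year,
       (SELECT SUM(CASE WHEN cd.is_roaming = 1 THEN cd.minutes ELSE 0 END)
               / NULLIF(SUM(cd.minutes), 0)
          FROM call_detail cd
         WHERE cd.customer_id = c.customer_id
           AND cd.call_date >= ADD_MONTHS(SYSDATE, -12)) AS roaming_share
  FROM customer c

The point is not the particular query; it is that each question on the list implies specific tables and fields that must exist before costs can be allocated at the customer level.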
Answering these cost-related questions required data from the call-center system, the billing system, and a financial system. Similar exercises around other important metrics revealed a need for call detail data, demographic data, credit data, and Web usage data.

From Customer Interactions to Learning Opportunities

A customer-centric organization maintains a learning relationship with its customers. Every interaction with a customer is an opportunity for learning, an opportunity that can be seized when there is good communication between data miners and the various customer-facing groups within the company. Almost any action the company takes that affects customers—a price change, a new product introduction, a marketing campaign—can be designed so that it is also an experiment to learn more about customers. The results of these experiments should find their way into the data warehouse, where they will be available for analysis. Often the actions themselves are suggested by data mining.

As an example, data mining at one wireless company showed that having had service suspended for late payment was a predictor of both voluntary and involuntary churn. That late payment is a predictor of later nonpayment is hardly a surprise, but the fact that late payment (or the company's treatment of late payers) was a predictor of voluntary churn seemed to warrant further investigation. The observation led to the hypothesis that having had their service suspended lowers customers' loyalty to the company and makes it more likely that they will take their business elsewhere when presented with an opportunity to do so. It was also clear from credit bureau data that some of the late payers were financially able to pay their phone bills. This suggested an experiment: treat low-risk customers differently from high-risk customers by being more patient with their delinquency and employing gentler methods of persuading them to pay before suspending them.

A controlled experiment tested whether this approach would improve customer loyalty without unacceptably driving up bad debt. Two similar cohorts of low-risk, high-value customers received different treatments. One was subjected to the "business as usual" treatment, while the other got the kinder, gentler treatment. At the end of the trial period, the two groups were compared on the basis of retention and bad debt in order to determine the financial impact of switching to the new treatment. Sure enough, the kinder, gentler treatment turned out to be worthwhile for the lower-risk customers—increasing payment rates and slightly increasing long-term tenure.

Mining Customer Data

When every customer interaction is generating data, there are endless opportunities for data mining. Purchasing patterns and usage patterns can be mined to create customer segments. Response data can be mined to improve the targeting of future campaigns. Multiple response models can be combined into best next offer models. Survival analysis can be employed to forecast future customer attrition. Churn models can spot customers at risk for attrition. Customer value models can identify the customers worth keeping. Of course, all this requires a data mining group and the infrastructure to support it.

The Data Mining Group

The data mining group is specifically responsible for building models and using data to learn about customers—as opposed to leading marketing efforts, devising new products, and so on. That is, this group has technical responsibilities rather than business responsibilities.
We have seen data mining groups located in several different places in the corporate hierarchy:

■ Outside the company, as an outsourced activity
■ As part of IT
■ As part of the marketing, customer relationship management, or finance organization
■ As an interdisciplinary group whose members still belong to their home departments

Each of these structures has certain benefits and drawbacks, as discussed below.

Outsourcing Data Mining

Companies have varying reasons for considering outsourcing data mining. For some, data mining is only an occasional need and so not worth investing in an internal group. For others, data mining is an ongoing requirement, but the skills required seem so different from the ones currently available in the company that building this expertise from scratch would be very challenging. Still others have their customer data hosted by an outside vendor and feel that the analysis should take place close to the data.

Outsourcing Occasional Modeling

Some companies think they have little need for building models and using data to understand customers. These companies generally fall into one of two types. The first are the companies with few customers, either because the company is small or because each customer is very large. As an example, the private banking group at a typical bank may serve a few thousand customers, and the account representatives personally know their clients. In such an environment, data mining may be superfluous, because people are so intimately involved in the relationship.

However, data mining can play a role even in this environment. In particular, data mining can make it possible to understand best practices and to spread them. For instance, some employees in the private bank may do a better job in some way (retaining customers, encouraging customers to recommend friends, family members, colleagues, and so on). These employees may have best practices that should be spread through the organization.

TIP: Data mining may be unnecessary for companies where dedicated staff maintain deep and personal long-term relationships with their customers.

Data mining may also seem unimportant to rapidly growing companies in a new market. In this situation, customer acquisition drives the business, and advertising, rather than direct marketing, is the principal way of attracting new customers. Applications for data mining in advertising are limited, and, at this stage in their development, companies are not yet focused on customer relationship management and customer retention. For the limited direct marketing they do, outsourced modeling is often sufficient.

Wireless communications, cable television, and Internet service providers all went through periods of exponential growth that have only recently come to an end as these markets matured (and before them, wired telephones, life insurance, catalogs, and credit cards went through similar cycles). During the initial growth phases, understanding customers may not be a worthwhile investment—an additional cell tower, switch, or whatever may provide better return. Eventually, though, the business and the customer base grow to a point where understanding the customers takes on increased importance. In our experience, it is better for companies to start early along the path of customer insight, rather than waiting until the need becomes critical.

Outsourcing Ongoing Data Mining

Even when a company has recognized the need for data mining, there is still the possibility of outsourcing.
This is particularly true when the company is built around customer acquisition. In the United States, credit bureaus and household data suppliers are happy to provide modeling as a value-added service with the data they sell. There are also direct marketing companies that handle everything from mailing lists to fulfillment—the actual delivery of products to customers. These companies often offer outsourced data mining.

Outsourcing arrangements have financial advantages for companies. The problem is that customer insight is being outsourced as well. A company that relies on outsourcing customer analytics runs the risk that customer understanding will be lost between the company and the vendor.

For instance, one company used direct mail for a significant proportion of its customer acquisition and outsourced the direct mail response modeling work to the mailing list vendors. Over the course of about 2 years, there were several direct mail managers in the company and the emphasis on this channel decreased. What no one had realized was that direct mail was driving acquisition that was being credited to other channels. Direct mail pieces could be filled in and returned by mail, in which case the new acquisition was credited to direct mail. However, the pieces also contained the company's URL and a free phone number. Many prospects who received the direct mail found it more convenient to respond by phone or on the Web, often forgetting to provide the special code identifying them as direct mail prospects. Over time, the response attributed to direct mail decreased, and consequently the budget for direct mail decreased as well. Only later, when decreased direct mail led to decreased responses in other channels, did the company realize that ignoring this echo effect had caused them to make a less-than-optimal business decision.

Insourcing Data Mining

The modeling process creates more than models and scores; it also produces insights. These insights often come during the process of data exploration and data preparation that is an important part of the data mining process. For that reason, we feel that any company with ongoing data mining needs should develop an in-house data mining group to keep the learning in the company.

Building an Interdisciplinary Data Mining Group

Once the decision has been made to bring customer understanding in-house, the question is where. In some companies, the data mining group has no permanent home. It consists of a group of people seconded from their usual jobs to come together to perform data mining. By its nature, such an arrangement seems temporary, and often it is the result of some urgent requirement such as the need to understand a sudden upsurge in customer defaults. While it lasts, such a group can be very effective, but it is unlikely to last very long because the members will be recalled to their regular duties as soon as a new task requires their attention.

Building a Data Mining Group in IT

A possible home is in the systems group, since this group is often responsible for housing customer data and for running customer-facing operational systems. Because the data mining group is technical and needs access to data and powerful software and servers, the IT group seems like a natural location. In fact, analysis can be seen as an extension of providing databases and access tools and maintaining such systems.

Being part of IT has the advantage that the data mining group has access to hardware and data as needed, since the IT group has these technical resources and access to data.
In addition, the IT group is a service organization with clients in many business units. In fact, the business units that are the "customers" for data mining are probably already used to relying on IT for data and reporting.

On the other hand, IT is sometimes a bit removed from the business problems that motivate customer analytics. Since very slight misunderstandings of the business problems can lead to useless results, it is very important that people from the business units be very closely involved with any IT-based data mining projects.

Building a Data Mining Group in the Business Units

The alternative to putting the data mining group where the data and computers are is to put it close to the problems being addressed. That generally means the marketing group, the customer relationship management group (where such a thing exists), or the finance group. Sometimes there are several small data mining groups, one in each of several business units: a group in finance building credit risk models and collections models, one in marketing building response models, and one in CRM building cross-sell models and voluntary churn models. The advantages and disadvantages of this approach are the inverse of those for putting data mining in IT. The business units have a great understanding of their own business problems, but may still have to rely on IT for data and computing resources. Although either approach can be successful, on balance we prefer to see data mining centered in the business units.

What to Look for in Data Mining Staff

The best data mining groups are often eclectic mixes of people. Because data mining has not existed very long as a separately named activity, there are few people who can claim to be trained data miners. There are data miners who used to be physicists, data miners who used to be geologists, data miners who used to be computer scientists, data miners who used to be marketing managers, data miners who used to be linguists, and data miners who are still statisticians.

This makes lunchtime conversation in a data mining group fairly interesting, but it doesn't offer much guidance for hiring managers. The things that make good data miners better than mediocre ones are hard to teach and impossible to automate: good intuition, a feel for how to coax information out of data, and a natural curiosity. No one individual is likely to have all the skills required for completing a data mining project. Among them, the team members should cover the following:

■ Database skills (SQL, if the data is stored in relational databases)
■ Data transformation and programming skills (SAS, SPSS, S-Plus, Perl, other programming languages, ETL tools)
■ Statistics
■ Machine learning skills
■ Knowledge of the relevant industry
■ Data visualization skills
■ Interviewing and requirements-gathering skills
■ Presentation, writing, and communication skills

Preparing Data for Mining

The following sections discuss these methods, giving examples of derived variables and highlighting important points about computing them.

Extracting Features from a Single Value

Computationally, parsing values is a very simple operation because all the data needed is present in a single value. Even though it is so simple, it is quite useful, as these examples show:

■ Calculating the day of the week from a date
■ Extracting the credit card issuer code from a credit card number
■ Taking the SCF (first three digits) of a zip code
■ Determining the vehicle manufacturer code from the VIN
■ Adding a flag when a field is missing

These operations generally require only rudimentary capabilities that data mining tools should be able to handle. Unfortunately, many statistical tools focus more on numeric data types than on the strings, dates, and times often encountered in business data—so string operations and date arithmetic can be difficult. In such cases, these variables may need to be added during a preprocessing phase or as data is extracted from data sources.
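As a rough sketch of what such parsing looks like in SQL (Oracle-flavored, like the fragments later in this chapter), assuming a hypothetical orders table with order_date, zip_code, credit_card_number, and email columns:

SELECT order_id,
       TO_CHAR(order_date, 'DY')                   AS order_day_of_week,  -- day of week from a date
       SUBSTR(zip_code, 1, 3)                      AS zip_scf,            -- SCF from a zip code
       SUBSTR(credit_card_number, 1, 2)            AS card_issuer_prefix, -- issuer code from a card number
       (CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS email_missing_flag  -- flag for a missing field
  FROM orders

The same derivations could equally well be done in SAS or in an ETL tool; the point is that they need only single-row string and date operations.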
Combining Values within a Record

As with the extraction of features from a single value, combining values within a record is computationally simple—instead of using one variable, there are several variables. Most data mining tools support adding derived variables that combine values from several fields, particularly for numeric fields. This can be very useful for adding ratios, sums, averages, and so on. Such derived values are often more useful for modeling purposes than the raw data because these variables start to capture underlying customer behavior. Date fields are often combined. Taking the difference of two dates to calculate duration is an especially common and useful example.

It is not usually necessary to combine string fields, unless the fields are somehow related. For instance, it might be useful to combine a "credit card payment flag" with a "credit card type," so there is one field representing the payment type.

Looking Up Auxiliary Information

Looking up auxiliary information is a more complicated process than the previous two calculations. A lookup is an example of joining two tables together (to use relational database terminology), with the simplifying assumption that one table is big and the other table is relatively small.

When the lookup table is small enough, such as Table 17.3, which describes the mapping between initial digits of a credit card number and the credit card type, then a simple formula can suffice for the lookup. The more common situation is having a secondary table or file with the information. This table might, for instance, contain:

■ Populations and median household incomes of zip codes (usefully provided for downloading for the United States by the U.S. Census Bureau at www.census.gov)
■ Hierarchies for product codes
■ Store type information about retail locations

Unfortunately, data mining tools do not, as a rule, make it easy to do lookups without programming. Tools that do provide this facility, such as I-Miner from Insightful, usually require that both tables be sorted by the field or fields used for the lookup; an example of this is shown in Figure 17.12. This is palatable for one such field, but it is cumbersome when there are many different fields to be looked up. In general, it is easier to do these lookups outside the tool, especially when the lookup tables and original data are both coming from databases.

Figure 17.12 Insightful Miner enables users to use and create lookup tables from the graphical user interface.

Sometimes, the lookup tables already exist. Other times, they must be created as needed. For instance, one useful predictor of customer attrition is the historical attrition rate by zip code. To add this to a customer signature requires calculating the historical attrition rate for each zip code and then using the result as a lookup table.

WARNING: When using database joins to look up values in a lookup table, always use a left outer join to ensure that no customer rows are lost in the process!
An outer join in SQL looks like:

SELECT c.*, l.value
FROM (customer c left outer join lookup l on c.code = l.code)

Table 17.3 Credit Card Prefixes

CARD TYPE          PREFIX   LENGTH
MasterCard         51       16
MasterCard         52       16
MasterCard         53       16
MasterCard         54       16
MasterCard         55       16
Visa               4        13
Visa               4        16
American Express   34       15
American Express   37       15
Diners Club        300      14
Diners Club        301      14
Diners Club        302      14
Diners Club        303      14
Diners Club        304      14
Diners Club        305      14
Discover           6011     16
enRoute            2014     15
enRoute            2149     15
JCB                3        16
JCB                2131     15
JCB                1800     15

Pivoting Regular Time Series

Data about customers is often stored at a monthly level, where each month has a separate row of data. For instance, billing data is often stored this way, since most subscription-based companies bill customers once a month. This data is an example of a regular time series, because the data occurs at fixed, defined intervals. Figure 17.13 illustrates the process needed to put this data into a customer signature. The data must be pivoted, so values that start out in rows end up in columns.

This is generally a cumbersome process, because neither data mining tools nor SQL makes it easy to do pivoting. Data mining tools generally require programming for pivoting. To accomplish this, the customer file needs to be sorted by customer ID, and the billing file needs to be sorted by the customer ID and the billing date. Then, special-purpose code is needed to calculate the pivoting columns. In SAS, proc TRANSPOSE is used for this purpose. The sidebar "Pivoting Data in SQL" shows how it is done in SQL.

Most businesses store customer data on a monthly basis, usually by calendar month. Some industries, though, show strong weekly cyclical patterns, because customers either do or do not do things over the weekend. For instance, Web sites might be most active during weekdays, and newspaper subscriptions generally start on Mondays or Sundays. Such weekly cycles interfere with the monthly data, because some months are longer than others. Consider a Web site where most activity is on weekdays. Some months have 20 weekdays; others have up to 23 (not including holidays). The difference between successive months could be 15 percent, due solely to the difference in the number of weekdays. To take this into account, divide the monthly activity by the number of weekdays during the month, to get an "activity per weekday." This only makes sense, though, when there are strong weekly cycles.

Figure 17.13 Pivoting a field takes values stored in one or more rows for each customer and puts them into a single row for each customer, but in different columns. (Before pivoting, the data has one row per customer per month: Cust 1 Jan $38.43, Feb $41.22, Mar $21.09, Apr $66.02; Cust 2 Mar $14.36, Apr $9.52. After pivoting, each customer has a single row with JAN, FEB, MAR, and APR amount columns, and months with no data are left empty.)

PIVOTING DATA IN SQL

SQL does not have great support for pivoting data (although some databases may have nonstandard extensions with this capability). However, when using standard SQL it is possible to pivot data. Assume that the data consists of billing records and that each has a sequential billing number assigned to it. The first billing record has a "1," the second "2," and so on. The following SQL fragment shows how to pivot this data:

SELECT customer_id,
       sum(case when bill_seq = 1 then bill_amt end) as bill_1,
       sum(case when bill_seq = 2 then bill_amt end) as bill_2,
       sum(case when bill_seq = 3 then bill_amt end) as bill_3
FROM billing
GROUP BY customer_id

One problem with this fragment is that different customers have different numbers of billing periods. However, the query can only take a fixed number. When a customer has fewer billing periods than the query wants, then the later periods are filled with NULLs.

Actually, this code fragment is not generally what is needed for customer signatures, because the signature wants the most recent billing periods—such as the last 12 or 24. For customers who are active, this is the most recent period. However, for customers who have stopped, this requires considering their stop date instead. The following code fragment takes this into account:

SELECT b.customer_id,
       sum(case when trunc(months_between(cutoff, bill_date)) = 1
                then bill_amt else 0 end) as bill_1,
       sum(case when trunc(months_between(cutoff, bill_date)) = 2
                then bill_amt else 0 end) as bill_2
FROM billing b,
     (select customer_id,
             (case when status = 'ACTIVE' then sysdate else stop_date end) as cutoff
      from customer) c
WHERE b.customer_id = c.customer_id
GROUP BY b.customer_id

This code fragment does use some extensions to SQL for the date calculations (these are expressed as Oracle functions in this example). However, most databases have similar functions. The above code is an example of a killer query, because it is joining a big table (the customer table) with an even bigger table (the customer billing table) and then doing a grouping operation. Fortunately, modern databases can take advantage of multiple processors and multiple disks to perform this query in a reasonable amount of time.

Summarizing Transactional Records

Transactional records are an example of an irregular time series—that is, the records can occur at any point in time. Such records are generated by customer interactions, as is the case with:

■ Automated teller machine transactions
■ Telephone calls
■ Web site visits
■ Retail purchases

There are several challenges when working with irregular time series. First, the transaction volumes are very, very large. Working with such voluminous data requires sophisticated tools and powerful computers. Second, there is no standard way of working with them. The regular time series data has a natural way of pivoting. For irregular time series, it is necessary to determine how best to summarize the data.

One way is to transform the irregular time series into regular time series and then to pivot the series. For instance, calculate the number of calls per month or the amount withdrawn from ATMs each month, and then pivot the sums by month. When working with transactions, these calculations can be more complex, such as the number of calls longer than 10 minutes or the number of withdrawals less than $50. These specialized summaries can be quite useful. More complicated examples that describe customer behavior are provided just after the next section.

Another approach is to define a set of data transformations that are run on the transactional data as it is being collected. This is an approach taken in the telecommunications industry, where the volume of data is vast. Some variables may be as simple as minutes of use; others may be as complex as a score for whether the calling number is a business or residence. This approach hardcodes the calculations, and such calculations are hard to change. Although such variables can be useful, a more flexible environment for summarizing transactional data is strategically more useful.
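A minimal sketch of the first approach, turning an irregular series of call-detail transactions into a regular monthly series that can then be pivoted as in the sidebar above; the call_detail table and its columns are hypothetical, and the count of calls longer than 10 minutes is the kind of specialized summary mentioned in the text:

SELECT customer_id,
       trunc(call_date, 'MM')                                 as call_month,
       count(*)                                               as num_calls,
       sum(duration_minutes)                                  as total_minutes,
       sum(case when duration_minutes > 10 then 1 else 0 end) as calls_over_10_min
FROM call_detail
GROUP BY customer_id, trunc(call_date, 'MM')

Each row of the result is one customer-month, so the output is a regular time series ready for pivoting into the customer signature.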
Summarizing Fields across the Model Set

The last method for deriving variables is summarizing values across fields in the customer signature itself. There are several examples of such fields:

■ Binning values into equal-sized bins requires calculating the breakpoints for the bins
■ Standardizing a value (subtracting the mean and dividing by the standard deviation) requires calculating the mean and standard deviation for the field and then doing the calculation
■ Ranking a value (so the smallest value has a value of 1, the second smallest 2, and so on) requires sorting all the values to get the ranking

Although these are complicated operations, they are performed directly on the model set. Data mining tools provide support for these operations, especially for binning numeric values, which is the most important of the three.

One type of binning that would be very useful is not readily available. This is binning for codes based on frequency. That is, it would be useful to keep all codes that have at least, say, 1,000 instances in the model set and to place all other codes in a single "other" category. This is useful for working with outliers, such as the many old and unpopular handsets that show up in mobile telephone data although few customers use them. One way to handle this is to identify the handsets to keep and to add a new field "handset for analysis" that keeps these handsets and places the rest into an "other" category. A more automated way is to create a lookup table to map the handsets. However, perhaps a better way is to replace the handset ID itself with information such as the date the handset was released, its weight, and the features it uses—information that is probably available in a lookup table already.
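A sketch of how these model-set-wide summaries might be computed in SQL, assuming a hypothetical model_set table with a numeric total_spend column and a character handset_id column; NTILE, RANK, and the windowed AVG, STDDEV, and COUNT used here are standard analytic functions in databases such as Oracle:

SELECT customer_id,
       ntile(10) over (order by total_spend)            as spend_decile,       -- equal-sized bins
       (total_spend - avg(total_spend) over ())
           / nullif(stddev(total_spend) over (), 0)     as spend_standardized, -- subtract mean, divide by std dev
       rank() over (order by total_spend)               as spend_rank,         -- smallest value gets rank 1
       (case when count(*) over (partition by handset_id) >= 1000
             then handset_id else 'OTHER' end)          as handset_for_analysis
FROM model_set

The last column implements the frequency-based "other" category described above without a separate lookup table.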
Examples of Behavior-Based Variables

The real power of derived variables comes from the ability to summarize customer behaviors along known dimensions. This section builds on the ideas already presented and gives three examples of useful behavior-based variables.

Frequency of Purchase

Once upon a time, catalogers devised a clever method for characterizing customer behavior using three dimensions—recency, frequency, and monetary value. RFM, which relies on these three variables, has been used at least since the 1970s.

Of these three descriptions of customer behavior, recency is usually the most predictive, but frequency is the most interesting. Recency simply means the length of time since a customer made a purchase. Monetary value is traditionally the total amount purchased (although we have found the average purchase value more useful, since the total is highly correlated with frequency). In traditional RFM analysis, frequency is just the number of purchases. However, a simple count does not do a good job of characterizing customer behavior. There are other approaches to determining frequency, and these can be applied to other areas not related to catalog purchasing—frequency of complaints, frequency of making international telephone calls, and so on. The important point is that customers may perform an action at irregular intervals, and we want to characterize this behavior pattern because it provides potentially useful information about customers.

One method of calculating frequency would be to take the length of time indicated by the historical data and divide it by the number of times the customer made a purchase. So, if the catalog data goes back 6 years and a customer made a single purchase, then that frequency would be once every 6 years. Although simple, this approach misses an important point. Consider two customers:

■ John made a purchase 6 years ago and has received every catalog since then
■ Mary made a purchase last month when she first received the catalog

Does it make sense that both these customers have the same frequency? No. John more clearly has a frequency of no more than once every 6 years. Mary only had the opportunity to make one purchase in the past month, so her frequency would more accurately be described as once per month. The first point about frequency is that it should be measured from the first point in time that a customer had an opportunity to make a purchase.

There is another problem. What we really know about John and Mary is that their frequencies are no more than once every 6 years and no more than once per month, respectively. Historically, one observation does not contain enough information to deduce a real frequency. This is really a time-to-event problem, such as those discussed in Chapter 12. Our goal here is to characterize frequency as a derived variable, rather than predict the next event (which is best approached using survival analysis). To do this, let's assume that there are two or more events, so the average time between events is the total span of time divided by the number of events minus one, as shown in Figure 17.14. This provides the average time between events for the period when the events occurred.

There is no perfect solution to the question of frequency, because customer events occur irregularly and we do not know what will happen in the future—the data is censored. Taking the time span from the first event to the most recent event runs into the problem that customers whose events all took place long ago may have a high frequency. The alternative is to take the time since the first event, in essence pretending that the present is an event. This is unsatisfying, because the next event is not known, and care must be taken when working with censored data. In practice, taking the number of events since the first event could have happened and dividing by the total span of time (or the span when the customer was active) is the best solution.

Figure 17.14 There is no perfect way to estimate frequency, but these four ways are all reasonable. (The figure shows a timeline that starts at first contact, with purchases at times A, B, and C and the current time at D. The four estimates are: frequency = 2 / (C - A), which does not include time after C; frequency = 3 / C, which also does not include time after C; frequency = 3 / (D - A), where the data is censored; and frequency = 3 / D, where the data is censored.)

Declining Usage

In telecommunications, one significant predictor of churn is declining usage—customers who use services less and less over time are more likely to leave than other customers. Customers who have declining usage are likely to have many variables indicating this:

■ Billing measures, such as recent amounts spent, are quite small
■ Usage measures, such as recent amounts used, are quite small or always at monthly minimums
■ Optional services recently have no usage
■ Ratios of recent measures to older measures are less than 1, often significantly less than 1, indicating that recent usage is smaller than historical usage

The existence of so many different measures for the same underlying behavior suggests a situation where a derived variable might be useful to capture the behavior in a single variable. The goal is to incorporate as much information as possible into a "declining usage" indicator.

TIP: When many different variables all suggest a single customer behavior, then it is likely that a derived variable that incorporates this information will do a better job for data mining.
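One simple way to combine several of these signals, before turning to the best-fit line described next, is a ratio of recent usage to earlier usage. A sketch, assuming a hypothetical monthly_usage table in which months_ago counts back from the cutoff date:

SELECT customer_id,
       sum(case when months_ago between 1 and 3 then minutes_used else 0 end)
       / nullif(sum(case when months_ago between 4 and 6 then minutes_used else 0 end), 0)
           as recent_to_past_usage_ratio  -- values well below 1 suggest declining usage
FROM monthly_usage
WHERE months_ago between 1 and 6
GROUP BY customer_id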
Fortunately, mathematics provides an elegant solution, in the form of the best-fit line, as shown in Figure 17.15. The goodness of fit is described by the R² statistic, which varies from 0 to 1, with values near 0 being a poor fit and values near 1 being very good. The slope of the line provides the average rate of increase or decrease in some variable over time. In statistics, this slope is called beta and is calculated according to the following formula:

slope = sum((x - average(x)) * (y - average(y))) / sum((x - average(x))²)

To give an example of how this might be used, consider the following data for the customer shown in the figure. Table 17.4 walks through the calculation for a typical customer.

Table 17.4 Example of Calculating the Slope for a Time Series

MONTH (X)   X - AVG(X)   (X - AVG(X))²   Y (FROM CUST A)   Y - AVG(Y)   (X - AVG(X)) * (Y - AVG(Y))
1           -5.5         30.25           53.47             3.19         -17.56
2           -4.5         20.25           46.61             -3.67        16.52
3           -3.5         12.25           47.18             -3.10        10.84
4           -2.5         6.25            49.54             -0.74        1.85
5           -1.5         2.25            48.71             -1.57        2.35
6           -0.5         0.25            52.04             1.76         -0.88
7           0.5          0.25            48.45             -1.83        -0.91
8           1.5          2.25            54.16             3.88         5.83
9           2.5          6.25            54.47             4.19         10.47
10          3.5          12.25           53.69             3.42         11.95
11          4.5          20.25           45.93             -4.35        -19.59
12          5.5          30.25           49.10             -1.18        -6.51
TOTAL                    143                                            14.36
SLOPE       0.1004 (= 14.36 / 143)

Figure 17.15 The slope of the line of best fit provides a good measure of changes over time. (The chart plots the 12 monthly values against the fitted line y = 0.1007x + 49.625, with R² = 0.0135.)

This example shows a very typical use for calculating the slope—finding the slope over the previous year's usage or billing patterns. The tabular format shows the calculation in a way most suitable for a spreadsheet. However, many data mining tools provide a function to calculate beta values directly from a set of variables in a single row. When such a function is not available, it is possible to express it using more basic arithmetic functions.

Although monthly data is often the most convenient for such calculations, remember that different months have different numbers of days. This issue is particularly significant for businesses that have strong weekly cycles. Some months have five full weekends, for instance, while others only have four. Different months have between 20 and 23 working days (not including holidays). These differences can account for up to 25 percent of the difference between months. When working with data that has such cycles, it is a good idea to calculate the "average per weekend" or "average per working day" to see how the chosen measure is changing over time.

TIP: When working with data that has weekly cycles but must be reported by month, consider variables such as "average per weekend day" or "average per work day" so that comparisons between months are more meaningful.
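When the monthly values sit in one row per customer per month rather than in pivoted columns, the same slope can be computed in the database. Oracle and several other databases provide REGR_SLOPE and REGR_R2 aggregates that implement the beta formula above; the monthly_usage table and its columns here are hypothetical:

SELECT customer_id,
       regr_slope(minutes_used, month_number) as usage_slope,  -- beta: average change per month
       regr_r2(minutes_used, month_number)    as usage_fit_r2  -- goodness of fit, 0 to 1
FROM monthly_usage
WHERE month_number between 1 and 12
GROUP BY customer_id

A strongly negative usage_slope is a compact "declining usage" indicator of the kind discussed in this section.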
Revolvers, Transactors, and Convenience Users: Defining Customer Behavior

Often, business people can characterize different groups of customers based on their behavior over time. However, translating an informal business description into a form useful for data mining is challenging. Faced with such a challenge, the best response is to determine measures of customer behavior that match the business understanding. This example is about a credit card group at a major retail bank, which has found that profitable customers come in three flavors:

■ Revolvers are customers who maintain large balances on their credit cards. These are highly profitable customers because every month they pay interest on large balances.
■ Transactors are customers who have high balances every month, but pay them off. These customers do not pay interest, but the processing fee charged on each transaction is an important source of revenue. One component of the transaction fee is based on a percentage of the transaction value.
■ Convenience users are customers who periodically charge large amounts, for vacations or large purchases, for example, and then pay them off over several months. Although not as profitable as revolvers, they are lower risk, while still paying significant amounts of interest.

The marketing group believes that these three types of customers are motivated by different needs. So, understanding future customer behavior would allow future marketing campaigns to send the most appropriate message to each customer segment. The group would like to predict customer behavior 6 months in the future. The interesting part of this example is not the prediction, but the definition of the segments. The training set needs examples where customers are already classified into the three groups. Obtaining this classification proves to be a challenge.

Data

The data available for this project consisted of 18 months of billing data, including:

■ Credit limit
■ Interest rate
■ New charges made during each month
■ Minimum payment
■ Amount paid
■ Total balance in each month
■ Amount paid in interest and related charges each month

The rules for these credit cards are typical. When a customer has paid off the balance, there is no interest on new charges (for 1 month). However, when there is an outstanding balance, then interest is charged on both the balance and on new charges. What does this data tell us about customers?

Segmenting by Estimating Revenue

Estimated revenue is a good way of understanding the value of customers. (By itself, this value does not provide much insight into customer behavior, so it is not very useful for messaging.)
Basing customer value on revenue alone assumes that the costs for all customers are the same. This is not true, but it is a useful approximation, since a full profitability model is quite complicated, difficult to develop, and beyond the scope of this example. Table 17.5 illustrates 1 month of billing for six customers. The last column is the estimated revenue, which has two components. The first is the amount of interest paid. The second is the transaction fee on new transactions, which is estimated to be 1 percent of the new transaction volume for this example.

Table 17.5 Six Credit Card Customers and 1 Month of Data

CUSTOMER     CREDIT LIMIT   RATE    NEW CHARGES   BEGINNING BALANCE   MIN PAYMENT   AMOUNT PAID   INTEREST   TRANSACTION REVENUE   EST. REVENUE
Customer 1   $500           14.9%   $50           $400                $15           $15           $4.97      $0.50                 $5.47
Customer 2   $5,000         4.9%    $0            $4,500              $135          $135          $18.38     $0.00                 $18.38
Customer 3   $5,000         11.9%   $100          $3,300              $99           $1,000        $32.73     $1.00                 $33.73
Customer 4   $8,000         12.9%   $2,500        $0                  $0            $75           $0.00      $25.00                $25.00
Customer 5   $10,000        14.9%   $6,500        $0                  $0            $6,500        $0.00      $65.00                $65.00
Customer 6   $6,000         17.9%   $0            $4,500              $135          $135          $67.13     $0.00                 $67.13

Estimated revenue is a good way to compare different customers with a single number. The table clearly shows that someone who rarely uses the credit card (Customer 1) has very little estimated revenue. On the other hand, those who make many charges or pay interest create a larger revenue stream.

However, estimated revenue does not differentiate between different types of customers. In fact, a transactor (Customer 5) has very high revenue. So does a revolver who has no new charges (Customer 6). This example shows that estimated revenue has little relationship to customer behavior. Frequent users of the credit card and infrequent users both generate a lot of revenue. And this is to be expected, since there are different types of profitable customers.

The real world is more complicated than this simplified example. Each customer has a risk of bankruptcy, where the outstanding balance must be written off. Different types of cards have different rules. For instance, many co-branded cards have the transaction fee going to the co-branded institution. And the cost of servicing different customers varies, depending on whether the customer uses customer service, disputes charges, pays bills online, and so on. In short, estimating revenue is a good way of understanding which customers are valuable. But it does not provide much insight into customer behavior.

Segmentation by Potential

In addition to actual revenue, each customer has a potential revenue. This is the maximum amount of revenue that the customer could possibly bring in each month. The maximum revenue is easy to calculate. Simply assume that the entire credit line is used either in new charges (hence transaction revenue) or in carry-overs (hence interest revenue). The greater of these is the potential revenue.

Table 17.6 compares the potential revenue with the actual revenue for the same six customers during one month. This table shows some interesting characteristics. Some not-so-profitable customers are already saturating their potential. Without increasing their credit limits or interest rate, it is not possible to increase their value.
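A sketch of how the estimated and potential revenue behind Tables 17.5 and 17.6 might be derived from a month of billing data, using the 1 percent transaction fee from the example; the monthly_billing table, its columns, and the chosen month are hypothetical:

SELECT customer_id,
       interest_paid + 0.01 * new_charges          as est_revenue,
       greatest(credit_limit * annual_rate / 12,   -- entire credit line carried as a balance
                0.01 * credit_limit)               as potential_revenue  -- or turned over in new charges
FROM monthly_billing
WHERE billing_month = DATE '2004-01-01'

The ratio of est_revenue to potential_revenue then shows which customers are already saturating their potential.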
... specifications for a data mining platform adequate for the anticipated dataset sizes and expected usage patterns ...

The Scoring Platform

The scoring platform is where models developed on the mining platform ... input for data mining models used for targeting, cross-selling, and retention. There are several approaches to incorporating data mining into a company's marketing and customer relationship management ...

... values for that column. Table 17.1 shows range characteristics for typical types of data used for data mining ...

Table 17.1 Range Characteristics for Typical Types of Data Used for Data Mining