Cookbook Modeling Data for Marketing_5 docx

DOCUMENT INFORMATION

Basic information

Format
Pages	29
File size	413.99 KB

Content

How can you tell this has happened to you? If response rates seem extremely low but still have somewhat of a pulse, and if the offer is a proven offer, this may be an area that you want to investigate further.

How can you confirm it? First, take the mail file and have this group's data (that would have been used to score them) appended. Score the model or apply the schema. Are they in the correct deciles/groups? If the answer is yes, you may need to look elsewhere for the source of your problem. If the answer is no, perform one other check. Go back to the main file/database where these persons' scores are stored. Pull out those names that were mailed and confirm that they belong to the deciles/groups they should. This two-part validation answers two questions: Was the data scored properly to begin with, and was the model inverted?

In the example of the direct marketing agency, the problem lay with having two databases in two different IT environments. The mainframe held the main database and was where the models were scored. A copy of these scores and deciles was extracted and given to an IT group in a relational setting. The scores were added to the relational environment and, in doing so, the programmers ignored the decile codes and redeciled, with the highest scores being assigned to decile 10 instead of 1. In revalidating and investigating all the efforts, if we had just compared the individual scores on the file in the relational setting without comparing back to the mainframe, we would have missed the problem. The cautionary tale here is that it can happen, so be careful not to let it.

6. Like a good farmer, check your crop rotation. This is another elementary point in database management, but again it can be overlooked. I was once asked if "list fatigue" existed, and I believe it does but can be avoided/minimized. One tactic is to develop some sound business rules that allow you to systematically rotate your lists.
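
A rotation rule of this kind could be sketched in Python as follows. This is only an illustration under stated assumptions: the 90-day rest period is the usual direct-marketing rule of thumb, opt-outs are never contacted, and the function name, customer IDs, and dates are hypothetical rather than taken from the case study.

```python
from datetime import date, timedelta

# Assumption: a 90-day rest period between contacts (rule of thumb, not a fixed law).
MIN_DAYS_BETWEEN_CONTACTS = 90


def eligible_for_mailing(last_contact, campaign_date, opted_out=False,
                         min_days=MIN_DAYS_BETWEEN_CONTACTS):
    """Return True if a name may be mailed on campaign_date under the rotation rule."""
    if opted_out:
        return False  # opt-outs are never contacted
    if last_contact is None:
        return True   # no earlier contact on record
    return (campaign_date - last_contact) >= timedelta(days=min_days)


# Hypothetical contact history: customer id -> (last contact date, opted out?)
history = {
    "A001": (date(2000, 1, 10), False),
    "A002": (date(2000, 5, 1), False),
    "A003": (None, False),
    "A004": (date(1999, 6, 1), True),
}

campaign_date = date(2000, 6, 15)
mail_list = [cid for cid, (last, out) in history.items()
             if eligible_for_mailing(last, campaign_date, out)]
print(mail_list)
```

The `min_days` parameter is left adjustable because the right contact frequency can differ by customer segment.
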
In direct marketing, the rule of thumb is usually 90-day intervals. There are some exceptions, though. With in-house files/databases, in-depth profiling will tell you what your frequency should be for talking to the customer. Some customers love constant communications (frequent purchasers, heavy users), while others would prefer you never talk to them (the opt-outs).

E-mail solicitations have become very popular, mainly due to the low costs associated with producing them, but caution should be exercised in how often you fill up someone's inbox with offers. Even though we have all become somewhat numb to the number of mailbox stuffers we receive, e-mail solicitations have a slightly more invasive feel than direct mail, similar to telemarketing calls. I often wonder how businesses that I haven't bought from get my e-mail address. If we as direct marketers can appreciate this distinction with e-mail and refrain from spamming our hearts out, we can probably assure ourselves that we won't be regulated in how often we can e-mail people and preserve a low-cost alternative for talking to our customers.

How can you tell if list fatigue is setting in? Are response and conversion rates gradually declining in a nice steady curve? Can you tell me the average number of times a person is mailed and with what frequency? Do you have business rules that prevent you from over-communicating to customers? If the answers to these questions are yes, no, and no, chances are you aren't rotating your crops enough.

7. Does your model/schema have external validity? This is a question that sometimes is forgotten. You have great analysts who build technically perfect models. But can anyone interpret them in the context of the business? If the answer is no, your models/schemas do not have external validity. External validity to modeling is analogous to common sense.
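
One way to make this common-sense check routine is to write down the direction the business expects for each predictor and compare it with the sign of the fitted coefficient. The sketch below is a minimal Python illustration, not a method prescribed by the text. It uses the FICO score in a high-interest-rate mortgage model, discussed in this tip, as the counterintuitive case; the coefficient values and the other two variable names are hypothetical.

```python
def sign(value):
    """Return -1, 0, or +1 according to the sign of value."""
    return (value > 0) - (value < 0)


def flag_counterintuitive(coefficients, expected_sign):
    """List predictors whose fitted sign contradicts the business expectation."""
    return sorted(
        name
        for name, value in coefficients.items()
        if name in expected_sign
        and sign(value) != 0
        and sign(value) != expected_sign[name]
    )


# Hypothetical fitted weights for a high-interest-rate mortgage demand model.
coefficients = {
    "fico_score": 0.8,          # weighted positively by the model
    "debt_to_income": 0.5,      # hypothetical predictor
    "months_at_address": -0.1,  # hypothetical predictor
}

# Direction a marketing manager would expect: +1 raises demand, -1 lowers it.
# A high FICO score should lower demand for a high-rate mortgage.
expected_sign = {
    "fico_score": -1,
    "debt_to_income": +1,
    "months_at_address": -1,
}

flagged = flag_counterintuitive(coefficients, expected_sign)
print(flagged)
```

A flagged variable is not proof of a bad model, only a prompt to go back to the data and the business experts before the model is used.
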
For example, consider a financial services model built to predict demand for a high-interest-rate mortgage, in which one of the factors is someone's FICO score. FICO is weighted positively, which would be interpreted to mean that someone with a really high FICO score is more likely to convert. Well, any mortgage banker in the crowd will tell you that goes against what really happens. People with high FICO scores have excellent credit and therefore would most likely not be interested in, or likely to borrow at, high interest rates. Try evaluating and interpreting analytical work from a marketing manager's perspective. It will help you to evaluate whether your model/schema has external validity.

8. Does your model have good internal validity? When I refer to internal validity, I am referring to the validity of the model/schema building process itself. There are many ways to prevent a badly built model/schema from ever seeing the light of day. One good approach is to have the model/schema building process formalized with validation checks and reviews built into the process. Good modelers always keep a "hold-out" sample for validating their work. Documentation at every step of the process is good, so that if something goes wrong, one can follow the model-building process much like a story. Not every modeler is very thorough. Having a formalized documentation/process can help to avoid errors. Having modelers review each other's work is also helpful.

Often, I am asked to judge whether a model is "good" or not by just looking at the algorithm. That in itself is not enough to determine the quality of the model. Understanding the underlying data, as well as the process by which the modeler built the algorithm, is crucial. In one such case, the model seemed to be valid. On reviewing the data, however, I found the culprit. The algorithm included an occupation code variable.
However, when I looked at the data, this variable was an alphanumeric code that would have had to be transformed to be of any use in a model. And that hadn't happened.

This example brings up another related issue. With the explosion in the importance and demand for dataminers, there are many groups/people operating out there who are less than thorough when building models/schemas. If someone builds you a model, ask him or her to detail the process by which he or she built it and by what standards he or she evaluated it. If you aren't sure how to evaluate his or her work, hire or find someone who can.

9. Bad ingredients make bad models. Nothing will ruin a model or campaign faster than bad data. Model-building software has become so automated that anyone can build a model with a point and click. But the real service that an experienced analyst brings is being able to detect bad data early on. EDA, or exploratory data analysis, is the first step toward building a good model/schema and avoiding the bad data experience. If you are the analyst, don't take someone's word that the data is what it is; check it out for yourself. Know your data inside and out. I once had an experience where the client gave me all the nonresponders but told me they were responders. Only when I got to the part where I checked my external validity did I find the problem and correct it.

If you work in database marketing, don't assume that others understand data the same way. Confirm how samples are pulled, confirm data content, and examine files very closely. If you are working with appended data, make sure that the data is clean. This is more difficult because you may not be as familiar with it. Ask for ranges of values for each field and for the mean scores/frequencies for the entire database that the data came from. A related issue with appended data is that it should make sense with what you are trying to predict.
Financial data is a very powerful ingredient in a model/schema to predict demand for financial services, but as a predictor for toothpaste purchase behavior, it is not. Choose your ingredients wisely.

10. Sometimes good models, like good horses, need to be put out to pasture. Good models, built on well-chosen data, will perform over time. But like all good things, models do have a life cycle. Because not every market is the same and consumers tend to change over time, prediction is almost certain to be an ongoing process rather than a one-time event.

How can you tell if it is time to refresh/rebuild your model? Have you seen a complete drop-off in response/conversion without a change in your creative/offer or the market at large? If yes, it's time to rebuild. But nobody wants to wait until that happens; you would prefer to be proactive about rebuilding models. So, that said, how do you know when it's time? The first clue is to look at the market itself. Is it volatile and unpredictable? Or is it staid and flat? Has something changed in the marketplace recently (e.g., legislation, new competitors, new product improvements, new usage) that has changed overall demand? Are you communicating/distributing through new channels (e.g., the Internet)? Have you changed the offer/creative? All of the preceding questions will help you to determine how often and when new models should be built.

If you are proactive by watching the market, the customers, and the campaigns, you will know when it is time. One suggestion is to always be testing a "challenger" to the "established champ." When the challenger starts to outperform the champ consistently, it's time to retire the champ.

Back-end Validation

In my opinion, the most exciting and stressful part of the modeling process is waiting for the results to come in. I usually set up a daily monitoring program to track the results.
That approach can be dangerous, though, because you can't determine the true performance until you have a decent sample size. My advice is to set up a tracking program and then be patient. Wait until you have at least a couple hundred responders before you celebrate.

In the case study, I am predicting the probability of a prospect becoming an active account. This presumes that the prospect responds. I can multiply the number of early responders times the expected active rate, given response, to get a rough idea of how the campaign is performing.

Once all of the results are in, it is critical to document the campaign performance. It is good to have a standard report. This becomes part of a model log (described in the next section). For the case study, the company mailed deciles 1 through 5 and sampled deciles 6 through 10. In Figure 7.9, the model results are compared with the expected performance shown in Figure 7.6. Each component within each decile is compared. We notice a slight difference between the expected performance and the actual performance. But overall, model performance is good. For both the "active rate" and the "average NPV," the rank ordering is strong and the variation from expected performance is at or below 10%.

Figure 7.9 Back-end validation report.

Model Maintenance

I have worked with modelers, marketers, and managers for many years. And I am always amazed at how little is known about what models exist within the corporation, how they were built, and how they have been used to date. After all the time and effort spent developing and validating a model, it is worth the extra effort to document and track the model's origin and utilization. The first step is to determine the expected life of the model.

Model Life

The life of a model depends on a couple of factors. One of the main factors is the target. If you are modeling response, it is possible to redevelop the model within a few months.
If the target is risk, it is difficult to know how the model performs for a couple of years. If the model has an expected life of several years, it is always possible to track the performance along the way.

Benchmarking

As in our case study, most predictive models are developed on data with performance appended. If the performance window is three years, it should contain all the activity for the three-year period. In other words, let's say you want to predict bankruptcy over a three-year period. You would take all names that are current for time T. The performance is then measured in the time period from T + 6 to T + 36 months. So when the model is implemented on a new file, the performance can be measured or benchmarked at each six-month period. If the model is not performing as expected, then the choice has to be made whether to continue use, rebuild, or refresh.

Rebuild or Refresh?

When a model begins to degrade, the decision must be made to rebuild or refresh the model. To rebuild means to start from scratch, as I did in chapter 3. I would use new data, build new variables, and rework the entire process. To refresh means that you keep the current variables and rerun the model on new data.

It usually makes sense to refresh the model unless there is an opportunity to introduce new predictive information. For example, if a new data source becomes available, it might make sense to incorporate that information into a new model. If a model is very old, it is often advisable to test building a new one. And finally, if there are strong shifts in the marketplace, a full-scale model redevelopment may be warranted. This happened in the credit card industry when low introductory rates were launched. The key drivers for response and balance transfers were changing with the drop in rates.

Model Log

A model log is a register that contains information about each model such as development details, key features, and an implementation log.
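
As a rough illustration, a register of this kind could be represented in Python as one record per model with a list of implementations attached. This is a sketch under assumptions, not the author's own tooling: the class and field names are hypothetical, and the sample values are the case-study figures for the LIFEA2000 model, with the NPV amounts kept as the plain numbers printed in the log.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Implementation:
    """One use of a model: a campaign, its date, and expected vs. actual results."""
    campaign: str
    date: str
    expected_npv: float  # 726 stands for the "$726M NPV" printed in the log
    actual_npv: float    # 703 stands for the "$703M NPV" printed in the log

    def variance_from_expected(self) -> float:
        """Relative gap between actual and expected performance."""
        return (self.actual_npv - self.expected_npv) / self.expected_npv


@dataclass
class ModelLogEntry:
    """Development details and key features for one model."""
    name: str
    developer: str
    objective: str
    target: str
    key_drivers: List[str]
    implementations: List[Implementation] = field(default_factory=list)


entry = ModelLogEntry(
    name="LIFEA2000",
    developer="O. Parr Rud",
    objective="Increase NPV",
    target="Accounts with premium amount > 0",
    key_drivers=["Population density", "Life stage variables"],
)
entry.implementations.append(
    Implementation("NewLife750", "6/15/00", expected_npv=726.0, actual_npv=703.0)
)

gap = entry.implementations[0].variance_from_expected()
print(f"{entry.name}: actual vs. expected NPV = {gap:+.1%}")
```

Keeping implementations as a list on the entry means each new campaign adds a record without touching the development details.
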
Table 7.2 is an example of a model log for our case study. A model log saves hours of time and effort as it serves as a quick reference for managers, marketers, and analysts to see what's available, how models were developed, who's the target audience, and more.

Table 7.2 Sample Model Log

NAME OF MODEL                          LIFEA2000
Dates of development                   3/00–4/00
Model developer                        O. Parr Rud
Overall objective                      Increase NPV
Specific target                        Accounts with premium amount > 0
Model development data (date)          NewLife600 (6/99)
First campaign implementation          NewLife750 (6/00)
Implementation date                    6/15/00
Score distribution (validation)        Mean = .037, St Dev = .00059, Min = .00001, Max = .683
Score distribution (implementation)    Mean = .034, St Dev = .00085, Min = .00001, Max = .462
Selection criteria                     Decile 5
Selection business logic               > $.05 NPV
Preselects                             Age 25–65; minimum risk screening
Expected performance                   $726M NPV
Actual performance                     $703M NPV
Model details                          Sampled lower deciles for model validation and redevelopment
Key drivers                            Population density, life stage variables

It tracks models over the long term with details such as the following:

Model name or number. Select a name that reflects the objective or product. Combining it with a number allows for tracking redevelopment models.
Date of model development. Range of development time.
Model developer. Name of person who developed model.
Model development data. Campaign used for model development.
Overall objective. Reason for model development.
Specific target. Specific group of interest or value estimated.
Development data. Campaign used for development.
Initial campaign. Initial implementation campaign.
Implementation date. First use date.
Score distribution (validation). Mean, standard deviation, minimum and maximum values of score on validation sample.
Score distribution (implementation). Mean, standard deviation, minimum and maximum values of score on implementation sample.
Selection criteria. Score cut-off or depth of file.
Selection business logic. Reason for selection criteria.
Preselects. Cuts prior to scoring.
Expected performance. Expected rate of target variable; response, approval, active, etc.
Actual performance. Actual rate of target variable; response, approval, active, etc.
Model details. Characteristics about the model development that might be unique or unusual.
Key drivers. Key predictors in the model.

I recommend a spreadsheet with separate pages for each model. One page might look something like the page in Table 7.2. A new page should be added each time a model is used. This should include the target population, date of score, date of mailing, score distribution parameters, preselects, cut-off score, product code or codes, and results.

Summary

In this chapter, I estimated the financial impact of the model by calculating net present value. This allowed me to assess the model's impact on the company's bottom line. Using decile analysis, the marketers and managers are able to select the number of names to solicit to best meet their business goals. As with any great meal, there is also the clean-up! In our case, tracking results and recording model development are critical to the long-term efficiency of using targeting models.

PART THREE: RECIPES FOR EVERY OCCASION

Do you like holiday dinners? Are you a vegetarian? Do you have special dietary restrictions? When deciding what to cook, you have many choices! Targeting models also serve a variety of marketing tastes. Determining who will respond, who is low risk, who will be active, loyal, and above all, profitable: these are all activities for which segmentation and targeting can be valuable. In this part of the book, I cover a variety of modeling objectives for several industries. In chapter 8, I begin with profiling and segmentation, a prudent first step in any customer analysis project.
I provide examples for both the catalog and financial services industry using both data-driven and market-driven techniques. In chapter 9 I detail the steps for developing a response model for a business-to-business application. In chapter 10 I develop a risk model for the telecommunication industry. And in chapter 11, I develop a churn or attrition model for the credit card industry. Chapter 12 continues the case study from chapters 3 through 7 with the development of a lifetime value model for the direct-mail life insurance industry.

If your work schedule is anything like mine, you must eat fast food once in a while. Well, that's how I like to describe modeling on the Web. It's designed to handle large amounts of data very quickly and can't really be done by hand. In chapter 13, I discuss how the Web is changing the world of marketing. With the help of some contributions from leading thinkers in the field, I discuss how modeling, both traditional and interactive, can be used on a Web site for marketing, risk, and customer relationship management.

[...]

... Obtain data from the various sources you've identified for the sample you've selected. The analytical database may contain transactional data, survey data, and geo-demographic data. Data will likely be delivered to you in different formats and will need to be reformatted to populate a common analytical database. 7. "Clean" the data where necessary. In some cases, records can contain data that might not be representative ...

... marketing, sales, market research, database analysis, information systems, financial analysis, operations, and risk management. This will vary by organization and industry. 3. Review and evaluate your data requirements. Make sure you have considered all necessary data elements for analysis and segmentation purposes. Remember to view internal as well as external data overlays. Types of data could include survey, ...
... PROC FREQ for the first two variables. This gives us information about the distribution of our customers. Notice how 33% of the customers are between the ages of 45 and 50. In order to make use of this information for new acquisition marketing, we need to compare this finding to the general population. The next PROC FREQ creates similar profiles for the general population:

    proc freq data=ch08.pop;
    format ...

... following data step creates a new variable called segment. This variable has a value for each of our four segments. Following the data step, I format the segment values for use in our profile table:

    data ch08.profit;
    set ch08.profit;
    if riskscr < 651 then do;
       if acctrev < 151 then segment = '1';
       else segment = '2';
    end;
    else do;
       if acctrev < 151 then segment = '3';
       else segment = '4';
    end;
    run;

    proc format; ...

... extreme values (min and max) for irregularities. Once I am comfortable with the range of the variables, I display the mean values only in a table that is useful for developing marketing strategies. In Figure 8.9, the averages for each variable are displayed, and each segment is named for its overall character. Managers and marketers find this type of analysis very useful for developing marketing strategies ...

... specializing in gifts and tools for the home and garden. It has been running a successful business for more than 10 years and now has a database of 35,610 customers. But SAMs noticed that its response rates have been dropping, and so it is interested in learning some of the key drivers of response. It is also interested in expanding its customer base. It is therefore looking for ways to identify good prospects ...

... begin by creating formats to collapse the subgroups. PROC FORMAT creates templates that can be used in various summary procedures. The following code creates the formats and produces the frequencies:

    value sales low- ...

Date posted: 21/06/2014, 21:20
