55 Organizational Data Mining

Hamid R. Nemati¹ and Christopher D. Barko²

¹ Information Systems and Operations Management Department, Bryan School of Business and Economics, The University of North Carolina at Greensboro, nemati@uncg.edu
² Customer Analytics, Inc., 7009 Austin Creek Drive, Summerfield, NC 27358, chris.barko@customer-analytics.com

Summary. Many organizations today possess substantial quantities of business information but have very little real business knowledge. A recent survey of 450 business executives reported that managerial intuition and instinct are more prevalent than hard facts in driving organizational decisions. To reverse this trend, businesses of all sizes would be well advised to adopt Organizational Data Mining (ODM). ODM is defined as leveraging Data Mining tools and technologies to enhance the decision-making process by transforming data into valuable and actionable knowledge to gain a competitive advantage. ODM has helped many organizations optimize internal resource allocations while better understanding and responding to the needs of their customers. The fundamental aspects of ODM can be categorized into Artificial Intelligence (AI), Information Technology (IT), and Organizational Theory (OT), with OT being the key distinction between ODM and Data Mining. In this chapter, we introduce ODM, explain its unique characteristics, and report on the current status of ODM research. Next, we illustrate how several leading organizations have adopted ODM and are benefiting from it. Then we examine the evolution of ODM to the present day and conclude the chapter by contemplating ODM's challenging yet opportunity-rich future.

Key words: Organizational Data Mining, Customer Relationship Management

55.1 Introduction

Data experts estimate that in 2002 the world generated 5 exabytes of information. This amount of data is more than all the words ever spoken by human beings.
And the rate of growth is just as staggering: the amount of data produced in 2002 was up 68% from just two years earlier. The size of the typical business database has grown a hundred-fold during the past five years as a result of Internet commerce, ever-expanding computer systems and record keeping mandated by government regulations. To better grasp how much data this is, consider the following: if one byte of data is the equivalent of this dot (.), the amount of data produced globally in 2002 would equal the diameter of 4,000 suns. And that amount has probably doubled since then (Hardy, 2004).

O. Maimon, L. Rokach (eds.), Data Mining and Knowledge Discovery Handbook, 2nd ed., DOI 10.1007/978-0-387-09823-4_55, © Springer Science+Business Media, LLC 2010

In spite of this enormous growth in enterprise databases, research from IBM reveals that organizations use less than 1 percent of their data for analysis (Brown, 2002). This is the fundamental irony of the Information Age we live in: organizations possess enormous amounts of business information, yet have so little real business knowledge. To magnify the problem further, a leading business intelligence firm recently surveyed executives at 450 companies and discovered that 90 percent of these organizations rely on gut instinct rather than hard facts for most of their decisions because they lack the necessary information when they need it (Brown, 2002). And even where sufficient business information is available, those organizations are able to utilize less than 7 percent of it (The Economist, 2001).

This proclamation about data volume growth is no longer surprising, yet it continues to amaze even the experts. For businesses, however, more data isn't always better. Organizations must assess what data they need to collect and how best to leverage it.
Collecting, storing and managing business data and associated databases can be costly, and expending scarce resources to acquire and manage extraneous data fuels inefficiency and hinders optimal performance. The generation and management of business data also loses much of its potential organizational value unless important conclusions can be extracted from it quickly enough to influence decision making while the business opportunity is still present. Managers must rapidly and thoroughly understand the factors driving their business in order to sustain a competitive advantage. Organizational speed and agility supported by fact-based decision making are critical to ensure an organization remains at least one step ahead of its competitors.

In the past, companies struggled to make decisions because of a lack of data. But in the current environment, more and more organizations are struggling to overcome "information paralysis": there is so much data available that it is difficult to determine what is relevant and how to extract meaningful knowledge. Organizations today routinely collect and manage terabytes of data in their databases, making information paralysis a key challenge in enterprise decision making. Once the essential data elements are identified, the data must be reformatted, preprocessed and analyzed to generate knowledge. The resulting knowledge is then delivered to the decision makers for collaboration, review and action. Once made, the final decision must be communicated to the appropriate parties in a rapid, efficient and cost-effective manner.

55.2 Organizational Data Mining

The manner in which organizations execute this intricate decision-making process is critical to their well-being and industry competitiveness. Organizations that make swift, fact-based decisions by optimally leveraging their data resources will outperform those that do not.
A robust technology that facilitates this process of optimal decision making is known as Organizational Data Mining (ODM). ODM is defined as leveraging Data Mining tools and technologies to enhance the decision-making process by transforming data into valuable and actionable knowledge to gain a competitive advantage (Nemati and Barko, 2001). ODM eliminates the guesswork that permeates so much of corporate decision making. By adopting ODM, an organization's managers and employees are able to act sooner rather than later, be proactive rather than reactive and know rather than guess. ODM technology has helped many organizations optimize internal resource allocations while better understanding and responding to the needs of their customers.

ODM spans a wide array of technologies, including, but not limited to, e-business intelligence, data analysis, online analytical processing (OLAP), customer relationship management (CRM), electronic CRM (e-CRM), executive information systems (EIS), digital dashboards and information portals. ODM enables organizations to answer questions about the past (what has happened?), the present (what is happening?), and the future (what might happen?). Armed with this capability, organizations can generate valuable knowledge from their data, which in turn enhances enterprise decisions. This decision-enhancing technology enables many advantages in operations (faster product development, increased market share with quicker time to market, optimal supply chain management), marketing (higher profitability and increased customer loyalty through more effective marketing campaigns and customer profitability analyses), finance (improved performance through financial analytics and economic evaluation of business units and products) and strategy implementation (business performance management (BPM), the Balanced Scorecard, and related strategy alignment and measurement systems).
The result of this enhanced decision making at all levels of the organization is optimal resource allocation and improved business performance.

Profitability in business today relies on speed, agility and efficiency at quality levels thought unobtainable just a few years ago. The slightest imbalance along the supply chain can increase costs, lengthen internal cycle times and delay new product introductions. These imbalances can eventually lead to a loss in both market share and competitive advantage. Meanwhile, organizations are also forging closer relationships with their customers and suppliers by defining tighter agreements in terms of shared processes and risks. As a result, many businesses are deeply immersed in continuously reengineering their processes to improve quality. Six Sigma and Balanced Scorecard efforts are increasingly prevalent. ODM enables organizations to remove supply chain imbalances while improving the speed, flexibility and efficiency of their business processes. This leads to stronger customer and partner relationships and a sustainable competitive advantage.

55.3 ODM versus Data Mining

Data Mining is the process of discovering and interpreting previously unknown patterns in databases. It is a powerful technology that converts data into information and potentially actionable knowledge. However, obtaining new knowledge in an organizational vacuum does not facilitate optimal decision making in a business setting. The unique organizational challenge of understanding and leveraging ODM to engineer actionable knowledge requires assimilating insights from a variety of organizational and technical fields and developing a comprehensive framework that supports an organization's quest for a sustainable competitive advantage.
These multidisciplinary fields include Data Mining, business strategy, organizational learning and behavior, organizational culture, organizational politics, business ethics and privacy, knowledge management, information sciences and decision support systems. These fundamental elements of ODM can be summarized into three main groups: Artificial Intelligence (AI), Information Technology (IT), and Organizational Theory (OT). Our research and industry experience suggest that successfully leveraging ODM requires integrating insights from all three categories in an organizational setting typically characterized by complexity and uncertainty. This is the essence and uniqueness of ODM. Obtaining maximum value from ODM involves a cross-department team effort that includes statisticians/data miners, software engineers, business analysts, line-of-business managers, subject matter experts, and upper management support.

55.3.1 Organizational Theory and ODM

Organizations are primarily concerned with studying how operating efficiencies and profitability can be achieved through the effective management of customers, suppliers, partners, and employees. To achieve these goals, research in Organizational Theory (OT) suggests that organizations use data in three vital knowledge creation activities. This organizational knowledge creation and management is a learned ability that can only be achieved via an organized and deliberate methodology. This methodology is a foundation for successfully leveraging ODM within the organization. The three knowledge creation activities (Choo, 1997) are:

• Sense making is the ability to interpret and understand information about the environment and events happening both inside and outside the organization.
• Knowledge making is the ability to create new knowledge by combining the expertise of members to learn and innovate.
• Decision making is the ability to process and analyze information and knowledge in order to select and implement the appropriate course of action.

First, organizations use data to make sense of changes and developments in the external environment, a process called sense making. This is a vital activity wherein managers discern the most significant changes, interpret their meaning, and develop appropriate responses. Second, organizations create, organize, and process data to generate new knowledge through organizational learning. This knowledge creation activity enables the organization to develop new capabilities, design new products and services, enhance existing offerings, and improve organizational processes. Third, organizations search for and evaluate data in order to make decisions. This data is critical since all organizational actions are initiated by decisions and all decisions are commitments to actions, the consequences of which will, in turn, lead to the creation of new data. Adopting an OT methodology enables an enterprise to enhance the knowledge engineering and management process.

In another OT study, researchers have observed that there is no direct correlation between information technology (IT) investments and organizational performance. Research has confirmed that identical IT investments in two different companies may give a competitive advantage to one company but not the other. Therefore, the key factor for competitive advantage is not the IT investment itself but the effective utilization of information as it relates to organizational performance (Brynjolfsson and Hitt, 1996). This finding emphasizes the necessity of integrating OT practices with robust information technology and artificial intelligence techniques in successfully leveraging ODM.
55.4 Ongoing ODM Research

Given the scarcity of past research in ODM along with its growing acceptance and importance in organizations, we conducted empirical research during the past several years that explored the utilization of ODM in organizations along with project implementation factors critical for success. We surveyed ODM professionals from multiple industries in both domestic and international organizations. Our initial research examined the ODM industry status and best practices, identified both technical and business issues related to ODM projects, and elaborated on how organizations are benefiting through enhanced enterprise decision making (Nemati and Barko, 2001). The results of our research suggest that ODM can improve the quality and accuracy of decisions for any organization willing to make the investment.

After exploring the status and utilization of ODM in organizations, we focused subsequent research on how organizations implement ODM projects and the factors critical for their success. As with our initial research, this was pursued in response to the scarcity of empirical research investigating the implementation of ODM projects. To that end, we developed a new ODM Implementation Framework based on data, technology, organizations, and the Iron Triangle (Nemati and Barko, 2003). Our research demonstrated that selected organizational Data Mining project factors, when modeled under this new framework, have a significant influence on the successful implementation of ODM projects.

Our latest research has focused on a specific ODM technology known as Electronic Customer Relationship Management (e-CRM) and its data integration role within organizations. We developed a new e-CRM Value Framework to better examine the significance of integrating data from all customer touch points with the goal of improving customer relationships and creating additional value for the firm.
Our research findings suggest that despite the cost and complexity, data integration for e-CRM projects contributes to a better understanding of the customer and leads to higher return on investment (ROI), a greater number of benefits, improved user satisfaction and a higher probability of attaining a competitive advantage (Nemati et al., 2003).

55.5 ODM Advantages

A 2002 Strategic Decision Making study conducted by Hackett Best Practices determined that "world-class" companies have adopted ODM technologies at more than twice the rate of "average" companies (Hoblitzell, 2002). ODM technologies provide these world-class organizations greater opportunities to understand their business and make informed decisions. ODM also enables world-class organizations to leverage their internal resources more efficiently and effectively than their "average" counterparts who have not fully embraced ODM.

Many of today's leading organizations credit their success to the development of an integrated, enterprise-level ODM system. For example, Harrah's Entertainment has saved over $20 million per year since implementing its Total Rewards CRM program. This ODM system has given Harrah's a better understanding of its customers and enabled the company to create targeted marketing campaigns that almost doubled the profit per customer and delivered same-store sales growth of 14 percent after only the first year. In another notable case, Travelocity.com, an Internet-based travel agency, implemented an ODM system and improved total bookings and earnings by 100 percent in 2000. Gross profit margins improved 150 percent, and booker conversion rates rose to 8.9 percent, the highest in the online travel services industry.
In another significant study, executives from twenty-four leading companies in customer knowledge management, including FedEx, Frito-Lay, Harley-Davidson, Procter & Gamble and 3M, all realized that in order to succeed, they must go beyond simply collecting customer data and translate it into meaningful knowledge about existing and potential customers (Davenport et al., 2001). This study revealed that several objectives were common to all of the leading companies, and these objectives can be facilitated by ODM. A few of these objectives are segmenting the customer base, prioritizing customers, understanding online customer behavior, engendering customer loyalty, and increasing cross-selling opportunities.

55.6 ODM Evolution

55.6.1 Past

Initially, IT systems were developed to automate expensive manual systems. This automation provided cost savings through labor reductions and more accurate, faster processes. Over the last three decades, the organizational role of information technology has evolved from efficiently processing large amounts of batch transactions to providing information in support of tactical and strategic decision-making activities. This evolution from automating expensive manual systems to providing strategic organizational value led to the birth of Decision Support Systems (DSS) such as data warehousing and Data Mining. Operational and decision support systems are now a vital part of many organizations. The organizational need to combine data from multiple stand-alone systems (e.g., financial, manufacturing and distribution) grew as corporations began to acknowledge the power of combining these data sources for reporting. This spurred the growth of data warehousing, where multiple data sources were stored in a format that supported advanced data analysis.

The slow adoption of ODM techniques in the 1990s was partly due to organizational and cultural resistance.
Business management has always been reluctant to trust something it doesn't fully understand. Until recently, most businesses were managed by instinct, intuition and "gut feel". The transition over the past twenty years to a method of managing by the numbers is the result of both technology advances and a generational shift in the business world as younger managers arrive with information technology training and experience.

55.6.2 Present

Many current ODM techniques trace their origins to traditional statistics and artificial intelligence research from the 1980s. Today, there are extensive vertical Data Mining applications providing analysis in the domains of banking and credit, bioinformatics, CRM, e-CRM, healthcare, human resources, e-commerce, insurance, investment, manufacturing, marketing, retail, entertainment, and telecommunications.

Our latest survey findings indicate that the banking, accounting/financial, e-commerce, and retail industries display the highest ODM maturity level to date. For service organizations (banking, financial, healthcare and insurance), building a holistic view of their customers through a mass customization marketing strategy is critical to remaining competitive. And organizations in the e-commerce industry are continuing to improve online customer relationships and overall profitability via e-CRM technologies (Nemati and Barko, 2001). Continuous technological innovations now enable the affordable exploration of enormous volumes of data. It is the combination of technological innovation, new advanced pattern-recognition and data-analysis techniques, ongoing research in organizational theory, and the availability of large quantities of data that has guided ODM to where it is today.

55.6.3 Future

The number of ODM projects is projected to grow more than 300 percent in the next decade (Linden, 1999).
As the collection, organization and storage of data rapidly increase, ODM will be the only means of extracting timely and relevant knowledge from large corporate databases. The growing mountains of business data, coupled with recent advances in Organizational Theory and technological innovations, provide organizations with a framework to effectively use their data to gain a competitive advantage. An organization's future success will depend largely on whether or not it adopts and leverages this ODM framework. ODM will continue to expand and mature as the corporate demand for one-to-one marketing, CRM, e-CRM, Web personalization, and related interactive media increases.

As information technology advances, organizations are able to collect, store, process, analyze and distribute an ever-increasing amount of data. Data and information are rampant, but knowledge is scarce. As a result, most organizations today are governed by managerial intuition and historical reporting. This is the byproduct of years of system automation. However, we believe organizations are slowly moving from the Information Age to the Knowledge Age, where decision makers will leverage ODM and Internet technologies to augment intuition in order to allocate scarce enterprise resources for optimal performance.

As organizations set a strategic course into the Knowledge Age, a number of difficulties await them. As its name suggests, ODM is part technological and part organizational. Organizations are composed of individuals, management, politics, culture, hierarchies, teams, processes, customers, partners, suppliers, and shareholders. The never-ending challenge is to successfully integrate Data Mining technologies with organizations to enhance decision making with the objective of optimally allocating scarce enterprise resources. As many consultants, professionals, industry leaders and the authors of this chapter can attest, this is not an easy task.
The media can oversimplify the effort, but successfully implementing ODM is not accomplished without political battles, project management struggles, cultural shocks, business process reengineering, personnel changes, short-term financial and budgetary shortages, and overall disarray. ODM is a journey, not a destination, so there must be a continual effort in revising existing knowledge bases and generating new ones. But the benefits far outweigh both the technical and organizational costs, and the enhanced decision-making capabilities can lead to a sustainable competitive advantage.

Recent ODM research has revealed a number of industry predictions that are expected to be key ODM issues in the future (Nemati and Barko, 2001). About 80 percent of survey respondents expect web farming/mining and consumer privacy to be significant issues, while over 90 percent predict ODM integration with external data sources to be important. We also foresee the development of widely accepted standards for ODM processes and techniques to be an influential factor for knowledge seekers in the 21st century. One attempt at ODM standardization is the Cross Industry Standard Process for Data Mining (CRISP-DM) project, which developed an industry- and tool-neutral data-mining process model to solve business problems. Another attempt at industry standardization is the work of the Data Mining Group in developing and advocating the Predictive Model Markup Language (PMML), an XML-based language that provides a quick and easy way for companies to define predictive models and share models between compliant vendors' applications. Lastly, Microsoft's OLE DB for Data Mining is a further attempt at industry standardization and integration. This specification offers a common interface for Data Mining that will enable developers to embed data-mining capabilities into their existing applications.
One only has to consider Microsoft's industry-wide dominance of the office productivity (Microsoft Office), software development (Visual Basic and .Net) and database (SQL Server) markets to envision the potential impact this could have on the ODM market and its future direction.

55.7 Summary

Although many improvements have materialized over the last decade, the knowledge gap in many organizations is still prevalent. Industry professionals have suggested that many corporations could maintain current revenues at half the current costs if they optimized their use of corporate data. Whether or not this claim is accurate, it sheds light on an important issue. Leading corporations in the next decade will adopt and weave these ODM technologies into the fabric of their organizations at all levels, from upper management all the way down to the lowest organizational level. Those enterprises that see the strategic value of evolving into knowledge organizations by leveraging ODM will benefit directly in the form of improved profitability, increased efficiency, and a sustainable competitive advantage. Once the first organization within an industry realizes a competitive advantage through ODM, it is only a matter of time before one of three events transpires: its industry competitors adopt ODM, change industries, or vanish. By adopting ODM, an organization's managers and employees are able to act sooner rather than later, anticipate rather than react, know rather than guess, and ultimately, succeed rather than fail.

References

Anonymous (2001), "The slow progress of fast wires", The Economist, London, Vol. 358, No. 8209, February 17.
Brown, E. (2002), "Analyze This", Forbes, Vol. 169, No. 8, April 1, pp. 96-98.
Brynjolfsson, E. and Hitt, L. (1996), "The Customer Counts", InformationWeek, September 9, www.informationweek.com/596/96mit.htm.
Choo, C. W. (1997), The Knowing Organization: How Organizations Use Information to Construct Meaning, Create Knowledge, and Make Decisions, Oxford University Press, www.choo.fis.utoronto.ca/fis/ko/default.html.
Davenport, T. H., Harris, J. G. and Kohli, A. K. (2001), "How Do They Know Their Customers So Well?", Sloan Management Review, Vol. 42, No. 2, Winter, pp. 63-73.
Hardy, Q. (2004), "Data of Reckoning", Forbes, Vol. 173, No. 10, May 10, pp. 151-154.
Hoblitzell, T. (2002), "Disconnects in Today's BI Systems", DM Review, Vol. 12, No. 6, July, pp. 56-59.
Linden, A. (1999), CIO Update: Data Mining Applications of the Next Decade, Inside Gartner Group, Gartner Inc., July 7.
Nemati, H. R. and Barko, C. D. (2001), "Issues in Organizational Data Mining: A Survey of Current Practices", Journal of Data Warehousing, Vol. 6, No. 1, Winter, pp. 25-36.
Nemati, H. R. and Barko, C. D. (2003), "Key Factors for Achieving Organizational Data Mining Success", Industrial Management and Data Systems, Vol. 103, No. 4, pp. 282-292.
Nemati, H. R., Barko, C. D. and Moosa, A. (2003), "E-CRM Analytics: The Role of Data Integration", Journal of Electronic Commerce in Organizations, Vol. 1, No. 3, July-Sept, pp. 73-89.

56 Mining Time Series Data

Chotirat Ann Ratanamahatana¹, Jessica Lin¹, Dimitrios Gunopulos¹, Eamonn Keogh¹, Michail Vlachos², and Gautam Das³

¹ University of California, Riverside
² IBM T.J. Watson Research Center
³ University of Texas, Arlington

Summary. Much of the world's supply of data is in the form of time series. In the last decade, there has been an explosion of interest in mining time series data. A number of new algorithms have been introduced to classify, cluster, segment, index, discover rules, and detect anomalies/novelties in time series.
While the techniques used to solve these problems vary widely, they all have one common factor: they require some high-level representation of the data, rather than the original raw data. These high-level representations are necessary as a feature extraction step, or simply to make the storage, transmission, and computation of massive datasets feasible. A multitude of representations have been proposed in the literature, including spectral transforms, wavelet transforms, piecewise polynomials, eigenfunctions, and symbolic mappings. This chapter gives a high-level survey of time series Data Mining tasks, with an emphasis on time series representations.

Key words: Data Mining, Time Series, Representations, Classification, Clustering, Time Series Similarity Measures

56.1 Introduction

Time series data accounts for an increasingly large fraction of the world's supply of data. A random sample of 4,000 graphics from 15 of the world's newspapers published from 1974 to 1980 found that more than 75% of all graphics were time series (Tufte, 1983). Given the ubiquity of time series data, and the exponentially growing sizes of databases, there has recently been an explosion of interest in time series Data Mining. In the medical domain alone, large volumes of data as diverse as gene expression data (Aach and Church, 2001), electrocardiograms, electroencephalograms, gait analysis and growth development charts are routinely created. Similar remarks apply to industry, entertainment, finance, meteorology and virtually every other field of human endeavour. Although statisticians have worked with time series for more than a century, many of their techniques hold little utility for researchers working with massive time series databases (for reasons discussed below). Below are the major tasks considered by the time series Data Mining community.

O. Maimon, L. Rokach (eds.), Data Mining and Knowledge Discovery Handbook, 2nd ed., DOI 10.1007/978-0-387-09823-4_56, © Springer Science+Business Media, LLC 2010