What leaders must know about data for machine learning

MIT SMR CONNECTIONS · MANAGER’S GUIDE

CONTENTS
What Leaders Must Know About Data to Drive Success With Machine Learning
• Align machine learning initiatives with business priorities
• Create and maintain a comprehensive view of all data assets
• Lay the groundwork for data governance
• Identify the specific roles required to build a strong data foundation for machine learning
Data Management Strategy Checklist
Sponsor’s Viewpoint: Your Data Strategy Is Key to Machine Learning; a Data Lake Can Help

MIT SMR Connections develops content in collaboration with our sponsors. It operates independently of the MIT Sloan Management Review editorial group.
Copyright © Massachusetts Institute of Technology, 2020. All rights reserved.

What Leaders Must Know About Data to Drive Success With Machine Learning

Machine learning is taking predictive analytics to the next level to drive tangible business value for a wide array of industries. Algorithms allow credit card companies to detect fraud in real time and help retailers direct offers to the customers most likely to respond. In health care, tools powered by machine learning help doctors transcribe notes more easily so they can focus on patient care. Manufacturers can take in data from sensors on plant equipment and recommend maintenance before malfunctions cause production delays.

But machine learning models are only as good as the data they ingest. “If data is not clean, if it’s not accessible, if it isn’t stitched together to form a strong foundation, the machine learning and artificial intelligence capabilities built on top of it will have problems,” warns Ashok Srivastava, senior vice president and chief data officer at financial software provider Intuit. Poor data can lead to difficulties such as inaccurate insights or inherent bias, factors that can hamper intelligent business decision-making.

Fortunately, businesses can avoid these perils by designing a data management strategy that develops new capabilities, initiatives, and roles around machine learning. This guide shares lessons from business leaders and industry experts on how, with the right policies and frameworks in place, data can serve as a strategic corporate asset.

Align machine learning initiatives with business priorities

The first step in creating an enterprise data management strategy is understanding the business’s goal for machine learning. For example, Intuit’s machine learning initiatives aim to improve customer service by providing personalized recommendations to subscribers of its accounting and tax software programs. An online retailer may plan to use machine learning to create more-effective targeted marketing campaigns, while an automotive manufacturer may be building machine learning systems to predict equipment failures.

Establishing which of a business’s strategic priorities have the best potential to be advanced via machine learning clarifies which data sets are most important to collect, store, and prepare for analysis.

“Being focused on knowing what data is truly driving your business and matters most is the first piece to a data strategy,” says Juan Tello, chief data officer at Deloitte Consulting and principal in its Strategy & Analytics practice. “So, for example, if business priorities are to win more customers and provide more-competitive pricing based on the products a company sells, that requires three critical data domains: customer data, pricing data, and product data. Prioritizing the data strategy on those areas as a starting point will maximize business outcomes. Organizations should also reevaluate and adjust as their business priorities change.”

This focus is essential, given the vast volumes of data generated by enterprise applications, connected devices, and customer interactions via the web or social media platforms, to name just a few sources. By narrowing the scope of data management to three or four key sources, businesses can concentrate on the data sets that will deliver the most value.

Create and maintain a comprehensive view of all data assets

For data to be useful, a business must know it exists. Unfortunately, legacy systems, mergers and acquisitions, and poor data onboarding practices can create silos of unidentified and untagged information.

At Intuit, data management experts “meet with the teams that own data systems or data pipelines, and we start to build a catalog of that information. That means understanding what data they have and how it is stored.” The result, says Srivastava, is “a robust list of data assets that we have within the company.”

But data troves are constantly evolving as businesses deploy new systems. GE Healthcare offers a perfect example of how to stay ahead of the curve. The manufacturer of diagnostic imaging equipment, which uses machine learning algorithms to improve traditional imaging technologies like CT scanning and X-ray, continuously works with collaborators and partners to inventory and onboard de-identified data. A dedicated team of data specialists receives, processes, and properly catalogs contractually de-identified data sets and then uploads them for use in AI development. This process leads to greater data transparency and availability.

Business leaders must also be held accountable for maintaining a comprehensive view of data assets. At GE Healthcare, says chief data officer Derek Danois, broad communication and transparency are key to building trust: Business units now collaborate so that the company knows the moment a new data set becomes available.

Lay the groundwork for data governance

At the core of every data management strategy is data governance: a set of rules and systems that ensures that data is secure, handled in compliance with applicable regulations, accessible, and usable.

Data security and compliance with privacy laws are table stakes and, as such, have been the primary drivers of data governance for most enterprises. In addition to guarding against intruders via cybersecurity measures that protect the IT perimeter, businesses must also establish controls that limit how data is accessed, used, and managed by employees. This typically means granting different access levels depending on variables such as role, tenure, and function. Compliance with regulations such as the European Union’s GDPR (General Data Protection Regulation) and similar requirements in other jurisdictions means that companies must also be prepared to explain to consumers how their data is being used to make decisions that affect them.

Another key component of data governance is quality: A machine learning model’s output depends on the quality of its training data. At GE Healthcare, for instance, a team of data architects and data scientists evaluates data quality based on a variety of metrics. A medical imaging study might be vetted for standard-of-care parameters (such as slice thickness or scan geometry), field of view (the area of a scanned object), and metadata content requirements. If quality standards are met, GE Healthcare de-identifies or anonymizes the data and establishes a chain of custody that chronicles the data’s control, transfer, and analysis before it’s uploaded for use in AI development.

Maintaining consistently high levels of data quality calls for continuous monitoring of metrics and key performance indicators such as accuracy, timeliness, consistency, and integrity, a process that can become overwhelming, according to Tello. AI-powered data quality tools can speed up the work of managing and governing data, he says. Enterprise master data management software can also ease the burden by creating a single master reference source for all critical business data, thereby reducing redundancies and the likelihood of errors.

Identify the specific roles required to build a strong data foundation for machine learning

An explosion of new data science job titles has raised questions regarding who is responsible for which tasks within a machine learning practice. A well-thought-out organizational structure can make sense of this landscape by clarifying roles and delineating responsibilities.

According to Peter Nichol, director of IT portfolio management for research and development at Regeneron Pharmaceuticals, some of the key roles required to execute a data management strategy include the following:
• Chief digital/data officer: Oversees all digital functions, provides support and leadership, and articulates a strategy for data governance that’s consistent across the company.
• Data scientist: Creates tools or processes based on machine learning and applies them to well-defined business problems.
• Decision scientist: Uses expertise in technology, math, and statistics, along with business domain knowledge, to enable informed decision-making.
• Compliance/legal team member: Handles privacy, compliance, data rights, and regulatory aspects impacting a business.

Ancillary positions include data management specialist, business intelligence specialist, and data architect.

But there’s also a place for sales executives, HR managers, and chief marketing officers in machine learning initiatives. “The business owners who are making decisions on a daily basis are some of the most important contributors to our overall data strategy,” says Intuit’s Srivastava. That’s because business leaders possess domain knowledge: an in-depth understanding of the relevant data within the enterprise, the processes that generate useful data, what data might be useful for a model, and how different variables might impact a model’s output. Without this guidance, businesses risk creating machine learning applications that don’t deliver useful results.

Looking Forward

Machine learning has the potential to improve results in nearly every aspect of business. But to harness it, businesses need a data management strategy that continuously improves the quality, integrity, accessibility, and security of data.

DATA MANAGEMENT STRATEGY CHECKLIST

Keep the following practices in mind to successfully design and execute a data management strategy in support of machine learning:

✓ Establish rules and processes around how data is sourced, managed, accessed, and used across the business.
✓ Ascertain which data sets are driving the business and how they can be used to help solve problems, generate revenue, and deliver customer benefits.
✓ Inventory known data assets, classify them, and organize them in a data catalog.
✓ Meet with the teams that own and operate data systems to better understand what data they have and how it is stored.
✓ Understand where your data comes from, who has access, and how it can be used.
✓ Establish internal security precautions (such as provisioning user access), as well as external safeguards (such as anonymizing data), to protect sensitive data.
✓ Create access controls that limit how data is accessed and how it might be used.
✓ Design processes and systems to ensure that the data created is accurate and useful.
✓ Identify specific roles required to build a strong data foundation, including chief digital officer, data scientist, decision scientist, and compliance team member.

SPONSOR’S VIEWPOINT

Your Data Strategy Is Key to Machine Learning; a Data Lake Can Help

Machine learning success is highly dependent on having relevant and high-quality data. Without a proper data strategy in place, machine learning initiatives fail to scale. Worse yet, if the machine learning models are informed by bad data, the results they generate may be misleading, or even incorrect.

The right data strategy for machine learning should aim to break silos, enabling your IT teams to easily, quickly, and securely access and collect the data they need. While modern data strategies take many forms, data lakes are becoming an increasingly popular core component of the most efficient models. Data lakes offer more agility and flexibility than traditional data management systems, allowing organizations to manage multiple data types from a wide variety of sources and to store the data, whether structured or unstructured, in a centralized repository. Once stored, the data can be leveraged by many types of analytics and machine learning services faster and more efficiently than with traditional, siloed approaches.

Data lake architectures also enable multiple groups within the organization to benefit from analyzing a consistent pool of data that spans the entire business. For help developing a more holistic data strategy that includes data lakes, interact with the AWS Data Flywheel.

Amazon’s ML Solutions Lab program can also help you build the right data strategy. The Amazon ML Solutions Lab pairs your team with Amazon machine learning experts to prepare data, build and train models, and put models into production. It combines hands-on educational workshops with brainstorming sessions and advisory professional services to help you work backward from business challenges and then go step-by-step through the process of developing solutions based on machine learning.

Moreover, one of our machine learning partners can also help you build the right data strategy for your machine learning initiatives. Our AWS Machine Learning Competency Partners have demonstrated relevant expertise and offer a range of services and solutions to help you create intelligent applications for your business, from enabling data science workflows to enhancing applications with AI services. Learn more at aws.ai.

About Amazon Web Services
AWS offers the broadest and deepest set of machine learning and AI services. On behalf of our customers, we are focused on solving some of the toughest challenges that hold back machine learning from being in the hands of every developer. Tens of thousands of customers are already using AWS for their machine learning efforts. You can choose from fully managed AI services for computer vision, language, recommendations, forecasting, fraud detection, and search; or Amazon SageMaker to quickly build, train, and deploy machine learning models at scale. Amazon SageMaker Studio offers the first fully integrated development environment for machine learning. You can also build custom models with support for all of the popular open-source frameworks. Our capabilities are built on the most comprehensive cloud platform, optimized for machine learning with high-performance computing and no compromises on security and analytics. Learn more at aws.ai.
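
Returning to the governance practices described earlier in this guide: limiting how employees access, use, and manage data “typically means granting different access levels depending on variables such as role, tenure, and function.” The Python sketch below shows one way those three variables might combine into a single decision. It is an illustration only; the access tiers, role names, probation period, and the “most restrictive limit wins” design are hypothetical assumptions, not a policy described by any organization in this guide.

```python
from dataclasses import dataclass
from enum import IntEnum


class AccessLevel(IntEnum):
    """Hypothetical access tiers, ordered from least to most privileged."""
    NONE = 0
    AGGREGATED = 1    # summary statistics only
    DEIDENTIFIED = 2  # record-level data with direct identifiers removed
    IDENTIFIED = 3    # full records, including personal data


@dataclass(frozen=True)
class Employee:
    role: str            # hypothetical role key, e.g. "data_scientist"
    function: str        # business function, e.g. "marketing"
    tenure_years: float


# Hypothetical policy: the most each role may ever receive.
ROLE_CEILING = {
    "chief_data_officer": AccessLevel.IDENTIFIED,
    "compliance": AccessLevel.IDENTIFIED,
    "data_scientist": AccessLevel.DEIDENTIFIED,
    "decision_scientist": AccessLevel.DEIDENTIFIED,
    "business_analyst": AccessLevel.AGGREGATED,
}

# Hypothetical probation period before record-level access, in years.
MIN_TENURE_YEARS = 0.5


def access_level(emp: Employee, owning_function: str) -> AccessLevel:
    """Combine role, tenure, and function into a single access decision."""
    # Role sets the ceiling; unknown roles get nothing.
    level = ROLE_CEILING.get(emp.role, AccessLevel.NONE)
    # Tenure: recent hires are capped at aggregated data.
    if emp.tenure_years < MIN_TENURE_YEARS:
        level = min(level, AccessLevel.AGGREGATED)
    # Function: outside the owning function, identified data is withheld.
    if emp.function != owning_function:
        level = min(level, AccessLevel.DEIDENTIFIED)
    return level


if __name__ == "__main__":
    new_hire = Employee("data_scientist", "marketing", tenure_years=0.2)
    print(access_level(new_hire, "marketing").name)  # AGGREGATED
```

Because each variable can only lower the level (via `min`), no single rule can accidentally widen access, which keeps the policy easy to audit. In a real organization such rules would more likely be enforced by an identity and access management system than written into application code.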

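The guide names accuracy, timeliness, consistency, and integrity as key performance indicators that data quality monitoring should track continuously, and it points to master data management as a single reference source for critical business data. The sketch below (Python 3.9+) computes one simple proxy for each KPI over a batch of records and lists the KPIs that fall below target. The record fields, the metric definitions, and the thresholds are hypothetical; in particular, a validity rule stands in for accuracy, which strictly requires ground truth.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Record:
    customer_id: str
    country: str
    price: float
    updated_at: datetime


# Hypothetical targets; each KPI is the share of records that pass its check.
THRESHOLDS = {"accuracy": 0.99, "timeliness": 0.95,
              "integrity": 1.0, "consistency": 0.98}


def _share(flags: list[bool]) -> float:
    """Fraction of True values; an empty batch counts as fully passing."""
    return sum(flags) / len(flags) if flags else 1.0


def quality_kpis(records: list[Record], master: dict[str, str],
                 now: datetime, max_age: timedelta) -> dict[str, float]:
    """Illustrative KPIs. `master` maps customer_id -> country and stands in
    for a master data management reference source (an assumption)."""
    return {
        # Accuracy proxy: values satisfy a simple validity rule.
        "accuracy": _share([r.price > 0 for r in records]),
        # Timeliness: record was refreshed within the allowed window.
        "timeliness": _share([now - r.updated_at <= max_age for r in records]),
        # Integrity: every customer_id resolves to a master record.
        "integrity": _share([r.customer_id in master for r in records]),
        # Consistency: attributes agree with the master reference.
        "consistency": _share([master.get(r.customer_id) == r.country
                               for r in records]),
    }


def failing_kpis(kpis: dict[str, float]) -> list[str]:
    """Names of KPIs below target, e.g. to alert the owning data team."""
    return [name for name, value in kpis.items() if value < THRESHOLDS[name]]


if __name__ == "__main__":
    now = datetime(2020, 6, 1)
    master = {"c1": "US", "c2": "DE"}
    batch = [
        Record("c1", "US", 10.0, now - timedelta(days=1)),
        Record("c2", "FR", 5.0, now - timedelta(days=40)),  # stale; conflicts with master
        Record("c3", "US", -1.0, now),                     # invalid price; unknown customer
        Record("c1", "US", 7.5, now),
    ]
    kpis = quality_kpis(batch, master, now, timedelta(days=30))
    print(kpis)                # accuracy, timeliness, integrity 0.75; consistency 0.5
    print(failing_kpis(kpis))  # all four KPIs miss their targets
```

Expressing every KPI as a pass rate between 0 and 1 lets one threshold table drive alerting, so adding a new check means adding one line to `quality_kpis` and one target to `THRESHOLDS`.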

Posted: 20/10/2022, 14:04
