BIG DATA and HEALTH ANALYTICS

EDITED BY

KATHERINE MARCONI
The Graduate School
University of Maryland University College

HAROLD LEHMANN
School of Medicine
The Johns Hopkins University

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2015 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Version Date: 20141023

International Standard Book Number-13: 978-1-4822-2925-7 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com
and the CRC Press Web site at http://www.crcpress.com

Contents

Foreword
Karen Bandeen-Roche

List of Contributors

Introduction
Katherine Marconi and Harold Lehmann

List of Abbreviations

Section I

Chapter 1 Little Big Data: Mastering Existing Information as a Foundation for Big Data
Donald A. Donahue, Jr.

Chapter 2 Managing Unstructured Data in a Health Care Setting
David E. Parkhill

Chapter 3 Experiences with Linking Data Systems for Analyzing Large Data
Dilhari DeAlmeida, Suzanne J. Paone, and John A. Kellum

Chapter 4 The Ecosystem of Federal Big Data and Its Use in Health Care
Ryan H. Sandefer and David T. Marc

Chapter 5 Big Data from the Push of Clinical Information: Harvesting User Feedback for Continuing Education
Roland Grad, Pierre Pluye, Michael Shulha, David L. Tang, Jonathan Moscovici, Carol Repchinsky, and Jamie Meuser

Chapter 6 Addressing Social Determinants of Health Using Big Data
Gregory D. Stevens

Chapter 7 An International Perspective: Institutionalizing Quality Improvement through Data Utilization at a Multicountry, Multiclinic Level
Martine Etienne-Mesubi, Peter Memiah, Ruth Atukunda, Constance Shumba, Francesca Odhiambo, Mercy Niyang, Barbara Bastien, Patience Komba, Eva Karorero, Mwansa Mulenga, Lanette Burrows, and Kristen Stafford

Section II

Chapter 8 Big Data: Architecture and Its Enablement
Bruce Johnson

Chapter 9 Health Data Governance: Balancing Best Practices for Data Governance and Management with User Needs
Linda Dimitropoulos and Charles (Chuck) Thompson

Chapter 10 Roadblocks, Regulation, and Red Tape: How American Health Policy and Industry Norms Threaten the Big Data Revolution
Matthew Dobra, Dorothy Weinstein, and Christopher Broyles

Chapter 11 Education and Training of Health Informaticists
Lynda R. Hardy

Section III

Chapter 12 Interactive Visualization
Catherine Plaisant, Megan Monroe, Tamra Meyer, and Ben Shneiderman

Chapter 13 Driving Successful Population Health Management and Achieving Triple Aim with Clinical Analytics
Kim S. Jayhan

Chapter 14 Improving Decision-Making Using Health Data Analytics
Margrét V. Bjarnadóttir, Ritu Agarwal, Kenyon Crowley, QianRan Jin, Sean Barnes, and Kislaya Prasad

Chapter 15 Measuring e-Health Impact: An e-Health Evaluation Framework that Leverages Process Control Theory and Big Data Analytics
Derek Ritz

Author Biographies

Index

Foreword

Karen Bandeen-Roche

Over much of history, the generation of data was the cost-limiting step for the advancement of science. Tycho Brahe labored for decades in collecting the celestial observations that Johannes Kepler ultimately would use to deduce his laws of planetary motion. The last hundred years have witnessed huge data-related investments in field after field, whether in the vast accelerators that have been crucial to modern advancements in particle physics, satellites that have surveyed both our planet and the cosmos, technologies through which we can now sequence the genome, hundreds of thousands of persons who have been assessed through public health cohort studies and social science surveys, or efforts to implement exhaustive electronic medical records. With infrastructure increasingly in place, the costs of biomedical data collection plummeting, and crowd-sourcing exploding, the cost-limiting paradigm has inverted. Data availability is outstripping existing paradigms for governing, managing, analyzing, and interpreting those data.

Forces to meet this new demand are strengthening throughout our society. Academically, we have seen the genesis of the field of “data science.” Industry demand for data scientists is skyrocketing. Government agencies such as the National Science Foundation and
National Institutes of Health (NIH) are investing hundreds of millions of dollars toward producing the workforce, norms, methods, and tools needed to reap the benefits of “big data,” that is, collections increasingly of terabyte scope or even larger. The NIH, for example, has established a new associate directorship of data science, whose holder will, among other responsibilities, oversee the “Big Data to Knowledge” (BD2K) program. BD2K will make investments, largely through grants, to “enable biomedical scientists to capitalize more fully on the Big Data being generated by those research communities.” In 2013, requests for information were issued, and workshops bringing together big data experts and leaders were convened to prioritize areas for investment, including ones to consider workforce training and development.

One loud-and-clear message from the training workshop was that the science needed is interdisciplinary, including no less than computer science, statistics, applied mathematics, engineering, information science, medicine, physics, public health, and “domain” sciences such as biology, neuroscience, and social science. A second was that training must go beyond creating experts in these fields, even ones with specialty skills in big data. Rather, what is desperately needed is training to create effective teams spanning these fields, as well as transdisciplinary or “pi-shaped” people who cross boundaries with depth in two or more fields. Finally, we seem to be moving toward a reality in which data-intensive activity will touch all areas of science, so that training will increasingly need to span all possibilities of depth: from those needing merely to be “conversant,” to those who can adeptly apply existing tools for dealing with big data, to experts who will create the new methods and tools that are urgently needed if our expertise in utilizing the data is to catch up with the volume and complexity of the data itself.

This volume targets crucial members of the teams who will be needed to unlock the potential of big data: health care and medical professionals, scientists, and their students. It engages and grounds its readers in the issues to be faced by showing how health care practitioners and organizations are linking data within and across their medical practice on scales that only now have become possible. It also elucidates the realities of moving from medical and administrative records to useful information and the innovative ways that this can be accomplished.

An initial seven chapters sketch the landscape of biomedical big data, and in so doing, communicate the enormous diversity of data sources and types that are contributing to modern health care practice and research environments, and the massive challenges and needs that are posed by their effective integration and dissemination. They also expose us to the many uses to which these data are being applied, ranging from clinical decision-making and risk assessment, to mentorship and training to promote transformation of health care through effective data usage, to the assessment of social risks for poor health and the use of resulting measures to target interventions and investments. A subsequent eight chapters then examine critical aspects relating to the data side of the equation, including governance, architecture, public policy issues that affect the use and usefulness of big health care, and the use of emerging information-capture technologies to leverage not only newly accruing data but also existing data. A concluding section samples the space of analytics tools: for interactive visualization; in the open source

Author Biographies

David T. Marc is assistant professor and program director for the graduate program in health informatics at the College of St. Scholastica in the Department of Health Informatics and Information Management. Previously, Marc was an adjunct faculty member at St. Scholastica, where he taught courses on databases in health care, data analytics, and research
design. Marc has a master’s in biological sciences and has extensive experience working with large health care data sets. Prior to working for St. Scholastica, Marc was employed at a biotech company, where he applied myriad analytic approaches to investigate the predictive value of potential biomarkers for psychiatric diseases. Currently, he is pursuing a PhD in health informatics at the University of Minnesota.

Peter Memiah is director of outcomes and evaluation at the Institute of Human Virology. His expertise ranges from evaluation research to community outreach and support. For more than seven years, he has worked across several countries in Africa and the Caribbean implementing quality improvement programs and supporting governments, health facilities, and health professionals alike in using data to impact outcomes. This has allowed for comprehensive development and implementation of work plans used to support successful patient scale-up. Memiah is a public health epidemiologist, and his fieldwork has been effective in combining clinical data with community strategies. These methods have allowed him to support health facilities in collecting and analyzing site-specific data and in making complete use of data through inclusive facility feedback and facility-specific interventions. He has also developed a broad spectrum of job aids and material for complete assessment, treatment, and continuity of care for the lifelong treatment of HIV. These materials are currently being used by local health care providers all over Africa and the Caribbean. His commitment to education and teaching is evident through the support he provides his staff in the field. Memiah has also published several manuscripts on quality improvement processes.

Tamra Meyer is an epidemiologist at the Pharmacovigilance Center of the U.S. Army in Falls Church, Virginia, where she uses administrative claims data from the Military Health System to conduct drug safety studies.

Megan Monroe is a senior PhD student and graduate research assistant in the computer science department of the University of Maryland. As part of her PhD thesis research, she is the main developer of EventFlow.

David E. Parkhill is vice president of consulting services with the Hitachi Consulting Corporation. He is a seasoned technology executive specializing in the creation and application of service solutions for the telecom industry. Mr. Parkhill was a founder of Network Effects, LLC and has previously held positions as CIO and CTO of Virgin Mobile USA; director of BT’s Service Assembly Project; national director at Andersen Business Consulting; and principal with IBM Global Services. By combining his passion for leading-edge software technology with delighting customers through innovative products and services, he has delivered products and systems projects that customers and clients have used to grow revenue, acquire new customers, and reduce costs.

Catherine Plaisant received the doctorat d’Ingénieur in France in 1982. She is a senior research scientist at the Human-Computer Interaction Laboratory of the University of Maryland Institute for Advanced Computer Studies. She most enjoys working with multidisciplinary teams on designing and evaluating new interface technologies that are usable and useful. She has written more than 100 refereed technical publications on the subjects of information visualization, evaluation methods, electronic health record interfaces, digital libraries, and online help. She coauthored with Ben Shneiderman the fifth edition of Designing the User Interface.

Derek Ritz is the principal consultant at ecGroup Inc. He is an advisor to international public and private sector clients regarding e-health strategy, enterprise architecture, informatics standards, and national-scale implementations of e-health infrastructure.

Ryan H. Sandefer is assistant professor and chair at the College of St. Scholastica in the Department of Health Informatics and Information Management. Previously, he was research coordinator for the Center for Healthcare Innovation at the College of St. Scholastica. Sandefer has a master’s in political science and is completing a PhD in health informatics with a focus on consumer engagement in the use of health information technology. He is on the editorial advisory board and review panel for Perspectives in Health Information Management with the American Health Information Management Association (AHIMA). He is an elected board member and the chair-elect of the AHIMA Council for Excellence in Education and is chair of its research and periodicals workgroup. Sandefer teaches research methods and health care data analytics and participates in the Minnesota e-health advisory workgroups.

Ben Shneiderman is distinguished university professor in the Department of Computer Science and was the founding director (1983–2000) of the Human-Computer Interaction Laboratory at the University of Maryland. He is a fellow of the ACM, AAAS, and IEEE and a member of the National Academy of Engineering. He received the ACM SIGCHI Lifetime Achievement Award in 2001. His research interests are human–computer interaction, information visualization, and user interface

Gregory D. Stevens is associate professor of family medicine and preventive medicine at the Keck School of Medicine, University of Southern California. He has been working to improve health care systems by enhancing the quality of primary care practice and integrating the fields of primary care and public health. He earned his doctorate and MHS from the Bloomberg School of Public Health at Johns Hopkins University.

Dorothy Weinstein has a long tenure working in Washington, D.C., on national, state, and local health policy. She has been employed at Georgetown University Institute of Health Policy Analysis, Association of American Medical Colleges, American Diabetes Association, Endocrine
Society, American College of Cardiology, and National Health Council Her various positions include research and writing, crafting legislation, and directing government relations departments at leading major nonprofit health organizations Her most recent activities concentrated on designing and now implementing health care reform legislation The focus of her efforts is on patient engagement in the health care delivery process She also teaches health policy at the University of Phoenix Weinstein also works in health care volunteer and philanthropic work at Children’s National Medical Center (CNMC) and at the Prevention of Blindness Society in Washington, D.C Dorothy has a BA in philosophy from the University of Maryland and a MA from Duke University’s Sanford School of Public Policy She is a member of Phi Beta Kappa and is a published author in areas including fetal tissue/stem cell research, environmental policy on noise abatement, de minimus standards in risk assessment, and personalized/genomic medicine and patient engagement Index A Accountable care organization (ACO), 38–39, 48, 185 Accreditation health informatics, 235–238 standards, 182 Accreditation Council for Graduate Medical Education, 238 Acquired immunodeficiency syndrome See hIV/AIDS Acquisition-Cognition-Application-Levels of Outcome (ACA-LO), 83–84 Acute care office visits, 270–271 Adaptable elements, 312 Administrative data, 185–186 Admission and concurrent review coordination, 275–276 Agency for Health Research and Quality (AHRQ), 68 Aggregation, data, 13 AIDS Relief Consortium, 128, 130 AIDSRelief Model of Care, 130–131 See also president’s Emergency Plan for AIDS Relief (PEPFAR) outcomes and evaluation, 131–138 training of health care providers, 138–145 Amazon, 224 American Board of Medical Specialties, 237–238 American Diabetes Association Clinical Practice Guidelines, 270 American Health Information Management Association (AHIMA), 235 American Health Insurance Plans (AHIP), 264 American Heart 
Association, 270 American Medical Association (AMA), 235, 270 American Medical Informatics Association (AMIA), 226, 235, 238 American Nurses Association (ANA), 235 American Nursing Informaticists Association (ANIA), 237 American Recovery and Reinvestment Act, 27 American Society of Health Information Management (ASHIM), 235 Analytics See also health informatics; visualization applied to achieve greater return on investment, 275 clinical information delivery programs, 87–91 comparative, 268, 271 database design and, 168–171 decision making and, 286–304 enterprise data, 160–162 free text feedback, 98–99 health care delivery continuous quality improvement, 131–138 health care specific topics, 162 IAM programs, 93–94, 97–99 market growth, 286–287 open-source tools for open-source data, 69–70 patient care, 277–278 predictive, 12, 40–41, 266–278, 292–302 programs, 11–19 provider profiling and evaluation, 278–281 retrospective, 273 search text, 35–41 target solutions, 162–163 text, 34–35 tools, 170 unstructured data, 26–27, 34–35 utilization, 266–278 341 342 • Index Andersen, R M., 112 Android, Antiretroviral therapy, 129–130, 135, 151. 
See also hIV/AIDS Apple, Associates in Process Improvement, 146 Asthma medications prescription patterns, 254–260 Ayivor, Israelmore, 21 B Ballistic Research Laboratory, Aberdeen Proving Ground, Bandeen-Roche, Karen, 233 Bayh-Dole Act, 218 Berwick, D M., 269 Best practices, data architecture, 172–174 Big data, 156–157 See also data commercial use of, 224 creating, 204–211 defined, 58–59, 225 growth of computer sophistication and, 5–7 health care spending impact of use of, health informaticists and, 225–228 meaningful use and, 27, 64–67 open-source, 69–70 public policy and, 203–204, 212–217 Bioinformaticists, 225 Blackberry, Blood tests, 30, 33 Bloom, Nicholas, 217 Bloom’s taxonomy of learning, 231, 232 Building Healthy Communities Initiative, 115–116, 117–118 Business drivers, health organization, 185 C California Endowment, 115–116, 117–118 California Health Interview Survey, 116, 117–118 Cameron, William, 212 Canada Health Infoway, 329 Canadian Medical Association (CMA), 84–87, 100–101 limitations of information delivery programs and, 96 quantitative analysis and findings of information delivery programs for, 87–91 two-way knowledge translation, 94–96 Canadian Pharmacists Association (CPhA), 85–86, 93, 100–101 two-way knowledge translation, 95–96 Cancer care, 291–296 CanMEDS Framework, 82 Cascading analytics, 14–16 Case management, 272–275 Catholic Medical Mission Board (CMMB), 130 Catholic Relief Services (CRS), 130 CD4 testing, 150–151 Center for Assistance in Research involving the eRecord (CARe), 49–55 Centers for Medicare and Medicaid Services, 65–66, 228, 275 cost and quality transparency and, 288–290 health care data related to meaningful use attestations, 67–68 mapping meaningful use attestation data for EP and EH, 70–74 Triple Aim and, 268 Chan, C L., 302 Change management, 191–193 Change Management Learning Center, 192 Chart abstractions, 149–150 Chen, Jean-Yi, 301 Choosing Wisely, 92 Churchman, C West, City of Chicago Data Portal, 288 Claims data, 
298–302 Clinical data, 185 Clinical information channel (CIC), 81 content validity, 84–85 Clinical information delivery case studies, 85–87 continuing education and, 80 future of, 100–101 limitations of, 96 need for, 80–84 quantitative analysis and findings, 87–91 reducing overdiagnosis, 92–96 two-way knowledge translation, 94–96 Index • 343 Clinical relevance of information index (CRII), 87–91, 100–101 Clustering, 99 Cognos, 11 Collection costs, data, 207–208 Collins, H M., 94 Commission for Accreditation of Health Information and Information Management (CAHIM), 235, 236 Commission on Certification for Health Informatics and Information Management (CCHIIM), 236–237 Common data elements (CDE), 226–227 Common Procedure Terms (CPT), 37 Commonwealth Fund, 122 Communication strategies, change management, 193 Comparative analytics, 268, 271 Comparative effectiveness research (CER), 213, 216 Complex adaptive systems (CAS), Computer Research Institute of Montréal, 98 Confidentiality of Alcohol and Drug Abuse Patient Records, 181 Content validity, 84–85 Context, 312–313 Continuing medical education (CME) programs, 80 See also physicians analytic challenges, 97–99 case studies, 85–87 quantitative analysis and findings of information delivery, 87–91 reducing overdiagnosis, 92–96 spaced online education, 86 Continuous quality improvement (CQI), 128 AIDSRelief Model of Care and, 130–131 case studies, 148–151 data collection, 132–138 data demand and use and sustainability of, 145–148 importance of, 129–130 management, 190–191 outcome and evaluation, 131–138 patient-level outcomes (PLO), 137–138 training of health care providers for, 138–145 Core-based statistical area (CBSA), 72–73 Core competencies, health informatics, 238 Cost and quality transparency, 287–290 County Health Rankings and Roadmaps, 288 Critical access hospitals (CAHs), 27, 73–74 Cumulative social risks, 111–115 D Data See also big data; data architecture; data governance administrative, 185–186 aggregation, 13 
analytics, 9–11 architecture, 159–163 claims, 298–302 clinical, 185 collection costs, 207–208 cost and quality transparency through publicly available, 287–290 delivery, 162 demand and use and sustainability, 145–148 descriptors, diversity, 208–209 encounter notes, 28–29, 33–34 financial, 186 future of, 21 indexing, 33–34 integration, 161 mandatory collection of, 206–207 marts, 165, 166, 168, 169 medical images, 29–31 meta-, 31–33 modeling techniques, 163–165 open-source, 69–70 operations, 186 ownership of, 218 pattern-based analysis (PBA), 13–19 preparation, 161–162 privacy protection, 209–210 problem of knowing where to look for, 8–9 sources, 185–186 stakeholders, 187–189 344 • Index standards, 158–163, 161 structured, 27 tags, 32–33 transformation of health, 47 unstructured, 26–35 utility of existing, 19–20 vast amount of, 7, 46 Data architecture best practices, 172–174 data, 159–163 database designs and, 163–171 enterprise, 160–162 ties to governance, 171–172 Database design, 163–165, 166 analytics and, 168–171 data marts in, 165, 166, 168 data mining in, 165, 166, 168 EDW in, 165, 166, 166–167 ODS in, 165, 166, 167 super marts in, 168 Database management systems (DBMS), 174–175 Data.gov, 60–63, 288 Data governance, 158, 178–179 See also federal government; policy, public health background, 179–180 current status of, 180–182 framework, 182–183, 184 in the future, 197–198 health organization business drivers and, 185 manager, 188 managing risk, continuous improvement, and change, 190–193 overcoming challenges and achieving desired outcomes with, 193, 194–196 people involved in, 187–189 program mission, 186–187 ties to architecture, 171–172 Data Governance Institute (DGI), 183 Decision making, 12, 286–287 cancer care, 291–296 cost and quality transparency through publicly available data and, 287–290 intervention program selection, 296–298 and predictive analytics for detection of fraud, waste, and abuse using claims data, 298–302 using individual-level clinical 
data, 290–298 Degree programs in health informatics, 234 Delivery, data, 162 Deming, W Edwards, 313 Denormalized models, 164, 167 Descriptive analytics, 12 Descriptor extraction, 99 Diagnosis evidence-based practice and, 214–216 over-, 92–96 Digital Imaging and Communications in Medicine (DICOM) standard, 41 Disability-adjusted life-years (DALYs), 315–318 Disability-free life expectancy (DFLE), 317 Discharge planning, 276–277 Discourse, 40 Disease/case management programs, 272–275 Diversity, data, 208–209 Donabedian, Avedis, 13 Doyle, Arthur Conan, 12 Drucker, Peter, 313 E Edelstein, Peter, 264, 265 Education See health informatics E-health as a challenging problem, 310 definition of health systems and, 311–313 future directions, 329 health production function and, 319–322 impact of, 323–326 importance of, 326–329 quantification of health and, 313–318 Electronic health records (EHR), 26, 27, 49, 58 data architecture, 159–163 EventFlow for temporal sequence analysis of, 251–252 HITECH Act and, 64–67 Index • 345 Electronic medical records (EMR), 27 chart abstractions, 149–150 clinical information delivery and, 100 continuous quality improvement and, 135–136, 139 encounter notes in, 29 Electronic numerical integrator and computer (ENIAC), 5–6 Enabling risk factors, 112, 113–114 Encounter notes, 28–29, 33 indexing, 33–34 lexical analysis, 35–36 ENIAC, 5–6 Enterprise architecture, 160–162 Enterprise data warehouses (EDWs), 160, 165, 166, 166–167 Epic EMR™, 27 E-Therapeutics+, 86, 93–94, 95 Evaluation analytics, 278–281 Evans, R., 94 EventFlow, 245, 250 case study, 254–260 interface, 252–254 for temporal sequence analysis of EHR data, 251–252 Evidence-based practice (EBP), 213, 268 assessing risk and treatment options in, 214–216 Extended relational data store, 27 Extract, transform, and load (ETL) processes, 13 Extracts and feeds, 170 F Facebook, 224 Federal Bureau of Investigation (FBI), 298 Federal government See also data governance big data and, 58–59 meaningful use 
and HITECH Act, 64–67 Open Government Initiative, 60–63 role in health policy, 219 spending on big data, 59 Feedback, free text, 98–99 Feedback process control, 321 Feedforward process control, 321 Feldman, B., 58 Financial data, 186 Fitbit, 246 Follow-up post-inpatient stay, 276–277 Fraud, waste, and abuse (FWA) detection, 298–302 Freedom of Information Act (FOIA), 218 Free text feedback, 98–99 Fryback, Dennis G., 314, 317 G Gartner, Inc., 265 GBD Compare, 247–248 General model of vulnerability, 111–115 Genetic Information Non-discrimination Act (GINA), 182 Ggplot2, 70 Global Fund, 132 Good health financing system, 312 Good health services, 311 Google, 224 Gramm-Leach Bliley Act, 181 Grossman, Michael, 319 H Hayek, F A., 212 Head Start, 119 Health managing for, 328 measurement of, 326–328 optimizing, 328–329 production function, 319–322 quantification of, 313–318 Health-adjusted life expectancy (HALE), 315, 317–318, 324 Healthcare Cost and Utilization Project (HCUP), 68 Healthcare Data Warehousing Association (HDWA), 172–174 Health care organizations (HCOs) background, 179–180 business drivers, 185 data governance framework, 182–183, 184 data governance mission, 186–187 data governance people, 187–189 data sources, 185–186 future of data governance and, 197–198 346 • Index managing risk, continuous improvement, and change, 190–193 overcoming challenges and achieving data governance program desired outcomes, 193, 194–196 status of health data governance in, 180–182 Health care-specific lexicons, 36–37 Health Data All-Stars, 288 Health Data Consortium (HDS), 288 HealthData.gov, 62–63 Health disparities, 106–107 See also social determinants of health vulnerable populations and, 108–110 Health informatics accreditation, 235–238 core competencies, 238 definition of, 225 education and training methods, 232–234 when should training begin in, 231–232 who should be trained in, 228–231 Health Information and Management Systems Society (HIMSS), 235, 237 Health information 
exchange (HIE), 188 Health information technology (HIT), See also analytics analytical programs, 11–19 challenge of using, 9–11 multidisciplinary staffing in, 47–49 pace of change of, 5–7 Health Information Technology for Economic and Clinical Health (HITECH) Act, 7, 64–67, 188, 198, 209, 328 Health Insurance Portability and Accountability Act (HIPAA), 181, 209, 219, 287 Health Leads, 116–119 Health-related quality of life (HRQoL), 314, 318, 326, 328 Health Resources and Services Administration (HRSA), 67–68 Health systems defined, 311–313 health production function and, 319–322 quantification of health and, 313–318 Healthy equity surveillance, 121 Healthy People 2020, 121 Hey, Tony, HIV/AIDS, 128 See also president’s Emergency Plan for AIDS Relief (PEPFAR) AIDSRelief Model of Care, 130–131 care and treatment, 129–130 case studies, 148–151 data demand and use and sustainability, 145–148 training of health care providers in CQI and, 138–145 Homelessness, 119–120 Human-Computer Interaction Lab, University of Maryland, 244, 245, 251 Human immunodeficiency virus See hIV/AIDS I IBM, 21 ICD-9, 37, 38 ICD-10, 37, 54–55 Image management and analysis, 41–42 Incentive programs, EHR, 64–67 Indexing, 33–34 Individual-level clinical data in decision making, 290–298 Inferential feedforward process control, 321 InfoPOEMs, 85, 86 analytic challenges, 97–99 limitations of, 96 quantitative analysis and findings, 87–91 reducing overdiagnosis through, 92–96 Informatics, health accreditation, 235–238 core competencies, 238 definition of, 225 education and training methods, 232–234 when should training begin in, 231–232 who should be trained in, 228–231 Information assessment method (IAM), 82–83 analytic challenges, 97–99 content validity, 84–85 Index • 347 intelligent personalized reminders, 100 limitations of, 96 qualitative analysis of, 93–96 two-way knowledge translation, 94–96 Inherent order, 313 Inpatient Prospective Payment System (IPPS), 68 Institute for Healthcare Improvement 
(IHI), 268 Institute for Health Technology Transformation, 180 Institute of Health Improvement (IHI), 58 Institute of Medicine (IOM), 7, 213, 245, 298 Insurance, health, 109 fraud, waste, and abuse, 298–302 Integrated delivery network (IDN), 265 Integration, data, 161 Intelligence personal reminders, 100 Interactive visualization See visualization Interchurch Medical Assistance World Wide (IMA), 130 Intermountain Healthcare, 203 Internal Revenue Service, 208 International Quality Solutions (IQSolutions), 135–136, 145 Interorganizational systems (IOS), Interventions e-health infrastructure and, 325 program selection and risk prediction models, 296–298 IPhone, J Jami, Criss, Jhee, Won Chul, 302 Joint Commission, 189 Jurafsky, Daniel, 39, 40 K Kafuman Task Force on Cost-Effective Health Care Innovation, 211 Kaiser Permanente, 180 Kaufman Foundation, 210 Knowledge modeling and classification techniques, 37 Kopec, J., 317 L Lan, C H., 302 Leadership and governance in health systems, 312 Lee, Junwoo, 302 Levels of outcomes (LO), 83–84 Lewin Group, 216 Lexical analysis, 35 encounter notes, 35–36 health care-specific lexicons, 36–37 natural language processing (NLP), 39–40 patient records, 37–39 Life expectancy (LE), 316 LifeFlow, 251 LifeLines, 250 LifeLines2, 251 Liou, Fen-May, 301 LOINC, 37 M Magnetic resonance imaging, 29, 31 Manager, data governance, 188 Martin, E M., 58 Martin, James H., 39, 40 Marts, data, 165, 166, 168, 169 Massive open online courses (MOOCs), 234 McKinsey & Company, McKinsey Global Institute, 180, 224, 225, 227 Meaningful use attestation data mapping, 70–74 big data and, 64–67 health care data related to, 67–68 incentives, 27 Medicaid, 62, 64–67, 181, 203, 288 applying analytics to achieve greater return on investment with, 275 health care data related to meaningful use attestations, 67–68 Medical images analytics, 34 management and analysis, 41–42 as unstructured data, 29–31 Medical Subject Headings, 36–37 348 • Index Medicare, 59, 62, 64–67, 
181, 203, 288 Claims Synthetic Public Use Files (SynPUFs), 68 Conditions of Participation, 182 DRG payment system, 276 health care data related to meaningful use attestations, 67–68 Inpatient Prospective Payment System (IPPS), 68 Severity Diagnosis Related Group (MS-DRG), 68 Memorial Sloan-Kettering Cancer Center, 21 Metadata, 31–32 analytics, 34 tags and, 32–33 Microsoft Excel, 11, 54 Research, SharePoint, 35 Mining, data, 165, 166, 168 Modeling, data, 163–165 cancer care and, 292–296 Moore School of Electrical Engineering, University of Pennsylvania, 5–6 Moore’s Law, 227 Morphology, 40 Multidisciplinary teams eRecord project, 49–55 need for, 47–49 MySQL, 69, 70
N
Naïve Bayes, 98 National Academy of Sciences, 213 National Committee for Quality Assurance (NCQA), 270 National Institute for Health and Care Excellence (NICE), 329 National Institutes of Health (NIH), 209, 225, 325 BD2K Committee and Workshop, 230–231, 233 National Security Agency, 219 National Surgical Quality Improvement Program (NSQIP), 171 Natural language processing (NLP), 34–35, 39–40 Natural Language Toolkit (NLTK), 35 Need risk factors, 112 New York Open Data Portal, 288 NodeXL, 249 Nolan, T. W., 269 Nondegree training in health informatics, 233–234 Normalization, 163
O
Obama, Barack, 60, 217, 288 Office of the National Coordinator for Health Information Technology (ONC), 66 Office visits, preventive or acute care, 270–271 Open Government Initiative, 60–63, 65 OpenNLP, 35 Open-source tools, 69–70 mapping meaningful use attestation data for EP and EH, 70–74 Operational data store (ODS), 165, 166, 167 Operations data, 186 Optimizing health, 328–329 Outreach based on proactive risk assessment, 271–272 Overdiagnosis, reduction of, 92–96
P
Pacific Research Institute, 213 Park, Hayoung, 302 Patient care, 268–270, 277–278 during admission and concurrent review coordination in, 275–276 applying analytics to achieve greater return on investment for Medicaid, 275 cancer, 291–296 deliberate outreach based
on proactive risk assessment, 271–272 discharge planning and follow-up post-inpatient stay, 276–277 informed decision making using individual-level clinical data, 290–298 managing case and disease/care management programs in, 272–275 office visits for preventive or acute care, 270–271 Patient-centered medical homes (PCMHs), 185, 269–270, 277, See also Patient care Patient-level outcomes (PLO), 137–138 Patient management tools, PEPFAR, 133–134 Patient-Oriented Evidence that Matters (POEM), 82 Patient Protection and Affordable Care Act (ACA), 4, 7, 38, 159, 181, 203, 217, 264, 287 hospital readmissions reduction program and, 276 Triple Aim and, 269 Patient records, text analytics applied to, 37–39 PatientsLikeMe, 246 Pattern-based analysis (PBA), 13–19 Patterns of prescriptions, 254–260 Patton, 323 Pharmacists, See Clinical information delivery Physicians, See also Clinical information delivery; Patient care patient-centered medical home and, 185, 269–270, 277 provider profiling and evaluation analytics, 278–281 Picture archiving and communication system (PACS), 27, 163 Plan-do-study-act (PDSA) cycles, 146–147 Policy, public health, 114–115, 202, See also Data governance assessing risk and treatment options and, 214–216 big data and, 203–204, 212–217 examination of, 210–211 federal government role in, 219 ownership of data and, 218 privacy protection and, 209–210, 218–219 tort reform and, 218 uncertainty and, 216–217 Portals, 170 Poverty, 108–109 Pragmatics, 40 Predictive analytics, 12, 40–41, 266–268 applied in patient care, 268–278 cancer care and, 292–296 combined with evaluation of evidence-based medicine protocols, 268 for detection of fraud, waste, and abuse using claims data, 298–302 intervention program selection and, 296–298 Predisposing risk factors, 112, 113 Preparation, data, 161–162 Prescriptive analytics, 12 President’s Emergency Plan for AIDS Relief (PEPFAR) data indicators, 132–133 e-health tools and electronic medical records, 135–137
implementation, 129–130 information collected, 134–135 patient-level outcome, 137–138 patient management tools, 133–134 state of data and information gathering prior to, 132 sustainability of PLO activity, 138 Preventive care office visits, 270–271 Primary care physicians and Triple Aim, 270, 279–280 Privacy protection, 209–210, 218–219 Proactive risk assessment, 271–272 Process control theory, 321, 322 Project 50, 119–120 Provider profiling and evaluation analytics, 278–281 Pruning and strengthening, 231 Public health Building Healthy Communities Initiative, 115–116, 117–118 Health Leads, 116–119 policy, 114–115 Project 50, 119–120 strategy for assessing social determinants of health, 120–122 Public use file (PUF), CMS, 65–66 Push-pull information framework, 81, 84–85
Q
QlikView, 11 Quality-adjusted life-years (QALYs), 314–316 Quals, 228 Quantification of health, 313–318 Quants, 228
R
Radiology, 31 reports, 32 Random Forest, 98 Randomized controlled trial (RCT), 206, 216 Rattle, 69 Retrospective analytics, 273 Rind, A., 248 Risk assessment and treatment options, 214–216 deliberate outreach based on proactive assessment of, 271–272 disease/case management and analysis of, 273–274 factors, health (see social risks, cumulative) management, 190–193 prediction models and selection of intervention programs, 296–298 predictive analytics and, 267 Robert Wood Johnson Foundation, 117, 288 RStudio, 69, 70 Russell, J., 59
S
SAS, 11, 54 Scilogs, 234 Search text analytics, 35–41 Semantic modeling, 37 Semantics, 40 Semantic technology, 20 Sentiment analysis, 40–41 Severity Diagnosis Related Groups, 68 Shin, Hyunjung, 302 Shneiderman, B., 245 Simon, H. A., 313, 324 Simple Logistic, 98 Site capacity assessment (SCA) tool, 147 Skotnes, T., 58 Smart disclosures, 61–62 Smartphones, SNOMED, 37 Snowflake designs, 168 Social determinants of health, 106–107 cumulative social risks and general model of vulnerability in, 111–115 implications for public health and policy, 114–115
public health, medical practice, and, 115–122 role in vulnerability, 109–110 strategy for assessing, 120–122 vulnerable populations and, 108–110 Social risks, cumulative, 111–115 Spaced online education, 86 Spotfire, 245, 246 SPSS, 11, 54 Stakeholders, data, 187–189 Standards, data, 158–163, 161 Standards of practice, 182 State children’s health insurance program (SCHIP), 203 Structured data, 27 Studer, Quint, Sunstein, C. R., 62 Super marts, 168 Sustainability data demand and use and, 145–148 of PLO activity, 138 Syntax, 40
T
Tableau, 11 Tags, 32–33 analytics, 34 Tang, Ying-Chan, 301 Target analytic solutions, 162–163 Taxonomies, 37 Text analytics, 34–35 applied to patient records, 37–39 free text feedback, 98–99 search, 35–41 TIGER/Line Shapefiles Pre-joined with Demographic Data, 68 To Err Is Human: Building a Safer Health System, Tort reform, 218 Trainers, quality improvement, 140–143, 144 Training, See Health informatics Transactional information systems, 46 eRecord project, 49–55 need for multidisciplinary staffing and, 47–49 transformation health data and, 47 Transparency, cost and quality, 287–290 Triple Aim, 58 applying utilization and predictive analytics in patient care, 268–278 goals of, 269 provider profiling and evaluation analytics, 278–281 Twitter, 224 Two-way knowledge translation, 94–96
U
Uncertainty, 216–217 Unified Medical Language System (UMLS), 37 United Nations Millennium Development Goals, 129 University of Maryland Institute of Human Virology (UMSOM/IHV), 129–130, See also AIDSRelief Model of Care; President’s Emergency Plan for AIDS Relief (PEPFAR) training of health care providers, 138–145 University of Pittsburgh and the Medical Center (UPMC), 49–55 Unpredictable changes, 313 Unstructured data, 26–28 analysis of, 26–27, 34–35 encounter notes, 28–29, 33–34, 35–36 indexing, 33–34 medical images, 29–31, 34 metadata, 31–33, 34 natural language processing (NLP), 34–35, 39–40 sentiment analysis and predictive analysis, 40–41 tags,
32–33, 34 text analytics and natural language processing (NLP), 34–35 Unstructured Information Management Architecture, 35 U.S. Census Bureau, 68, 208 U.S. Centers for Disease Control and Prevention, 121, 248, 317 U.S. Department of Education, 235 U.S. Department of Health and Human Services, 288 U.S. Food and Drug Administration, 228 U.S. National Action Plan, 60 U.S. Patient Centered Outcomes Research Institute (PCORI), 329 Utilization analytics, 266–268 applied in patient care, 268–278
V
Validity, content, 84–85 Value-based reimbursement, 279 Vendor neutral archive (VNA), 41 Vendor solutions in data architecture, 170–171 Visualization, 244–245 case study, 254–260 EventFlow interface for, 252–254 opportunities and tools for, 245–250 for temporal sequence analysis of EHR data, 251–252 Volume-based reimbursement, 279 Vulnerable populations and social determinants of health, 108–110 cumulative social risks and general model of vulnerability in, 111–115 public health and, 115–122
W
Watson supercomputer, 21 Weik, Martin H., Well-functioning health information system, 312 Well-functioning health system, 312 Well-performing health workforce, 312 WellPoint, 21 Whittington, J., 269 Williams, Alan, 322 World Bank, 312, 319, 320, 321 World Health Organization, 106, 121, 135, 311–312, 316–317 monitoring and evaluation framework, 319, 320
X
X-rays, 31 metadata and tags, 33 reports, 32

... for health big data and analytics
ORGANIZATION OF CHAPTERS
Our book is organized into three sections that reflect the available data and potential analytics: sources and uses of health data, business... need it, data capture will not add value. The authors in this volume thus provide examples of how big data’s management and use can improve access, reduce cost, and improve quality. Big data and health...