Big data in healthcare

SpringerBriefs in Pharmaceutical Science & Drug Development

Pouria Amirian · Trudie Lang · Francois van Loggerenberg, Editors

Big Data in Healthcare
Extracting Knowledge from Point-of-Care Machines

More information about this series at http://www.springer.com/series/10224

Editors
Pouria Amirian, Centre for Tropical Medicine and Global Health, University of Oxford, Oxford, UK
Francois van Loggerenberg, Centre for Tropical Medicine and Global Health, University of Oxford, Oxford, UK
Trudie Lang, Centre for Tropical Medicine and Global Health, University of Oxford, Oxford, UK

ISSN 1864-8118, ISSN 1864-8126 (electronic)
SpringerBriefs in Pharmaceutical Science & Drug Development
ISBN 978-3-319-62988-9, ISBN 978-3-319-62990-2 (eBook)
DOI 10.1007/978-3-319-62990-2
Library of Congress Control Number: 2017946047

Editors keep the copyright © The Editors and Authors 2017

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature. The registered company is Springer International Publishing AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Contents

Introduction—Improving Healthcare with Big Data
Francois van Loggerenberg, Tatiana Vorovchenko and Pouria Amirian

Data Science and Analytics
Pouria Amirian, Francois van Loggerenberg and Trudie Lang

Big Data and Big Data Technologies
Pouria Amirian, Francois van Loggerenberg and Trudie Lang

Big Data Analytics for Extracting Disease Surveillance Information: An Untapped Opportunity
Pouria Amirian, Trudie Lang, Francois van Loggerenberg, Arthur Thomas and Rosanna Peeling

#Ebola and Twitter. What Insights Can Global Health Draw from Social Media?
Tatiana Vorovchenko, Proochista Ariana, Francois van Loggerenberg and Pouria Amirian

Index

About the Editors

Pouria Amirian has a Ph.D. in Geospatial Information Science (GIS) and is a Principal Research Scientist in Data Science and Big Data at the Ordnance Survey GB and a Data Science Research Associate with the Global Health Network. He managed and led a joint project (Oxford and Stanford) on “Using Big Data Analysis Tools to Extract Disease Surveillance Information from Point-of-Care Diagnostic Machines”. Pouria has carried out research and development projects and lectured on Big Data, Data Science, Machine Learning, Spatial Databases, GIS and Spatial Analytics since 2008.

Trudie Lang is Professor of Global Health Research, Head of the Global Health Network, Senior Research Scientist in Tropical Medicine at the Nuffield Department of Medicine and Research Fellow at Green Templeton College at the University of Oxford. She has a Ph.D. from the London School of Hygiene and Tropical Medicine and has worked within industry, the World Health Organisation (WHO), NGOs and academia conducting clinical research studies in low-resource settings. Dr Lang is a clinical trial research methodologist with specific expertise in capacity development and trial operations in low-resource settings. She currently leads the Global Health Network (GHN), a focused network of researchers that helps clinical researchers with trial design, methods, interpretation of regulations and general operations.

Francois van Loggerenberg is Scientific Lead of the Global Health Network, based at the Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine. Originally trained as a research psychologist, Francois was employed from 2002 to 2012 at the Nelson R Mandela School of Medicine in Durban, South Africa, where he worked initially as the study coordinator on a large HIV pathogenesis study at the Centre for the AIDS Programme of Research in South Africa (CAPRISA).
In 2005, he was awarded a Doris Duke Foundation Operations Research for AIDS Care and Treatment in Africa grant that funded his Ph.D. work on enhancing adherence to antiretroviral therapy (2011, London School of Hygiene and Tropical Medicine).

Chapter 1
Introduction—Improving Healthcare with Big Data
Francois van Loggerenberg, Tatiana Vorovchenko and Pouria Amirian

1.1 Introduction

With the advancement of computing systems and the availability of new types of sensors, there has been a huge increase in the amount, type and variety of data that are collected and stored [1]. By some estimates in 2013, over 90% of the world’s data had been created in the previous two years [2]. In terms of health data, this growth has been driven by the increased use of Electronic Health Records (EHR), personalized medicine, and administrative data. Although it is difficult to characterise comprehensively and simply what constitutes Big Data, several key characteristics of the data itself have been identified, which create particular opportunities and challenges [3, 4]. These characteristics include the large size of these datasets (volume), the speed with which these data are generated and collected (velocity), and the diversity of the data generated (variety). Some sources add a fourth ‘V’, veracity, to highlight the fact that the quality of data collected this way needs to be carefully considered [1]. However, we discuss veracity later in this book and show that it is not a characteristic of data in Big Data and, more importantly, that Big Data is not just about data [5]. As often used, Big Data also refers to datasets that have been collected for a specific purpose but are used in new secondary analyses, to the linking of datasets collected for different purposes, or to datasets that are generated from routine activity and are often collected and stored autonomously and automatically. These characteristics create huge and rapidly expanding datasets that are ripe for linking, and for algorithmic 
analysis to detect and characterise relationships and patterns that would be very difficult to detect in smaller and individual purpose-collected datasets.

F. van Loggerenberg (corresponding author) · T. Vorovchenko · P. Amirian
University of Oxford, Oxford, UK
e-mail: francois.vanloggerenberg@psych.ox.ac.uk
P. Amirian e-mail: Pouria.Amirian@os.uk
© The Editors and Authors 2017. P. Amirian et al. (eds.), Big Data in Healthcare, SpringerBriefs in Pharmaceutical Science & Drug Development, DOI 10.1007/978-3-319-62990-2_1

1.2 Big Data and Health

The use of Big Data in biomedical and health sciences has received a lot of attention in recent years. These data present a significant opportunity for improving the diagnosis, treatment and prevention of various diseases, and for interventions to improve health outcomes [1, 6]. However, this opportunity is tied to obvious risks to the privacy of, and trust in, this sensitive information, and to the exposure of the vulnerability of people requiring interventions or treatments. The Big Data revolution has affected the biomedical sciences largely because of technological advances in genome sequencing, improvements in and digitalisation of imaging, the development and growth of vast patient data repositories, the rapid growth in biomedical knowledge, and the central role patients are taking in the management of their own health data, including the collection of personal activity and health data [3].

Some of the key sources of data for biomedicine and health that have contributed to the volume, velocity, variety and veracity of health-related data are [3]:

• Medical Records: Increased digitalisation of electronic health records (EHR); these data are collected for patient care and follow-up, but are key data sources for secondary analysis and for combination with other large datasets of longitudinal free text, laboratory and other parameters, imaging, medication records, and a vast array of other key data. When combined with data like genomic data, 
these represent potential sources for making genotype-phenotype associations at the population level.
• Administrative Data: These data are usually generated for billing or insurance claims, and are not generally available as immediately as EHR data. However, they have the benefit of usually being coded in a standardised way and verified, with errors corrected, and so usually represent higher-quality, comparable data.
• Web Search Logs, click streams and interaction-based data: The internet has become an increasingly important source of information for people about their health complaints, especially prior to seeking professional help, and the systematic collection and analysis of these data have yielded insights into syndromic surveillance and potential public health interventions based on concerns. These data have been used to identify epidemic outbreaks [7], and have been useful in highlighting potential issues with pharmaceutical side effects, for example.
• Social Media: As social media continues to evolve, its definition is constantly changing to capture all its features and reflect the role it plays in the modern world. Social media has been described as being “the platforms that enable the interactive web by engaging users to participate in, comment on and create content as means of communicating with their social graph, other users and the public” [8]. Social media continues to develop and to integrate deeply into human lives, and may serve a variety of purposes such as social interaction, information seeking, passing time, entertainment, relaxation, communicatory utility, expression of opinions, convenience utility, information sharing, and surveillance and watching others [9]. For example, LinkedIn allows its users to build professional connections, Facebook is widely used to connect with friends, Twitter allows public broadcasting of short messages, Instagram is used to share favourite pictures, and YouTube allows the 
sharing of videos. This area of data collection and analysis has grown rapidly over recent years, as populations have greater access to, and generate more and more, social data. This area also includes blogs, Q and A sites (like Quora) and networking sites, and the data have been used to find things like unreported side effects, to monitor disease-related beliefs, and to identify or track disasters or disease outbreaks.

As one of the projects outlined in this book deals with social media, a bit more will be said about this specific data type. The number of active social media users has been growing rapidly. As of 2015, it is estimated that nearly billion people globally use social networks (Fig. 1.1). Social media platforms differ in popularity and in their numbers of active users. As of June 2016, Facebook is the most popular platform, with 1590 million users (Fig. 1.2).

Big Data Analytics is also being used for health and human welfare. One example of this is Google Flu Trends. Millions of users around the world search for health information online. Google estimates how much flu is circulating in different

Fig. 1.1 Number of social network users worldwide from 2010 to 2014, with projections to 2019 [10] (vertical axis: number of users in billions; horizontal axis: years 2010 to 2019)

5.2 Ebola Virus Disease and Media Coverage

The Ebola Virus Disease (EVD) is a severe and often fatal disease in humans that causes haemorrhagic fever. The virus is transmitted to people from wild animals and spreads in the human population through human-to-human transmission. The average case fatality rate of the disease is estimated to be as high as 50%. The 2014 West African Ebola outbreak has been one of the most complex and the largest since the discovery of the Ebola virus in 1976. Its first case dates back to the 26th of December 2013, when a 2-year-old boy in a remote Guinean village, 
located in a triangle-shaped area where the borders of Guinea, Sierra Leone and Liberia converge, fell ill. The most severely affected countries in the 2014 outbreak were Guinea, Sierra Leone, and Liberia. In March 2016 WHO declared the end of Ebola transmission in Sierra Leone, and in June 2016 in Guinea and Liberia. Over the course of the outbreak, there were more than 28,600 reported Ebola cases and more than 11,300 deaths worldwide [1]. According to the UN, more than 22,000 children lost at least one parent to the disease. During the outbreak there was no Ebola vaccine or treatment approved by WHO.

In the beginning, the outbreak failed to receive attention from the Western press, Western government websites, and social media. Only at the end of July 2014, when two American health workers contracted Ebola, did the outbreak begin to be called an “epidemic” in global headlines. Coverage of the outbreak exploded at the end of September 2014, when a Liberian man who had recently arrived in the USA from Liberia was diagnosed with Ebola on US soil. On the 1st of October 2014, the Twitter Ebola conversation reached a rate of 6000 tweets per minute, up from a mere 100 tweets per minute during September 2014 [2]. Initiated by media attention, fear of Ebola started to spread globally faster than the virus, and social media amplified that fear even further. The cases of Ebola in the West were covered voluminously, but this coverage was far from balanced, as thousands of West African Ebola cases and deaths were long neglected. Each Ebola-related news item inspired tens of thousands of Ebola tweets and Internet searches [3].

In addition to fear, considerable misinformation about the Ebola Virus Disease was spread on social media [4]. Among the most common types of misinformation were messages that Ebola can be treated with the ewedu plant (see footnote 1) or by blood transfusion, and by drinking and washing in salty water. Considering the fact that most Americans used the Internet (96%) and media (86%) as major 
sources of new information about Ebola in 2014 [5], it is evident how important it is to track the quality of information being spread on social media.

Social media also provides an opportunity for trusted sources, such as WHO and CDC (US Centers for Disease Control and Prevention), to spread scientifically accurate and reliable information to the public. Indeed, in collaboration with their local, national, and international partners, WHO and CDC shared and amplified key messages through social media. For example, they provided information to the Ministry of Health of Nigeria, local healthcare organisations, popular bloggers, and others with large numbers of followers on Facebook and Twitter [6].

Footnote 1: Ewedu is a flowering herb; its leaves are eaten throughout West African countries in soups and sauces.

Unfortunately, their current potential to spread information on Twitter is limited. One of the reasons for that is the comparatively low number of followers these organisations have on their social media channels. For example, a CDC infographic tweet that was posted on the 30th of September 2014, explaining how EVD is spread, was retweeted more than 4000 times. At that time CDC had nearly 400,000 followers. However, a humour-focused Twitter account, “Tweet like a girl”, which had more than a million followers, retweeted a CDC “Facts About Ebola” image, and its retweet was itself retweeted more than 12,000 times, more than the original tweet posted through the CDC account [7].

The other factor limiting how far public health information reaches the public is low social media and Internet penetration in certain parts of the world. Very often countries affected by public health emergencies do not have high Internet coverage, resulting in low social media use. Therefore, those who are most in need often do not have access to reliable public health information through social media channels. Nevertheless, in the case of Ebola, despite the level of poverty in 
the countries that were most affected by the epidemic, the use of tablets, computers, and smartphones is increasing. For example, Nigeria, the most populous country in Africa, had approximately million Twitter users in 2014 [8].

It has been demonstrated that social media Big Data can be used for disease surveillance. There is a large body of research that attempts to evaluate the ability of Twitter data to track and predict influenza [9, 10], norovirus [11], dengue [12], and other disease outbreaks. Interestingly, it was demonstrated that Twitter could be used as a tool for Ebola surveillance as well. In the first couple of months, before Ebola spread to the US (which generated a lot of background noise on Twitter), it was possible to predict the spread of the Ebola virus in the affected countries (based on a comparison of Twitter, WHO and CDC data) [13].

Therefore, social media Big Data carries insights that can help public health authorities inform their programmes and deliver them more efficiently. Understanding how to manage and analyse these data would benefit future global efforts to combat diseases.

5.3 How Can We Study Social Media Data?
The aim of the case-study in this chapter was to assess the ways in which Twitter was used during the 2014 West African Ebola outbreak, both by the general Twitter population and by public health authorities, in order to better inform future efforts to harness the public health potential of Twitter. This study also aimed to identify the geography of Ebola tweets, the accounts that demonstrated the highest level of engagement on Twitter related to the 2014 West African Ebola outbreak, and how global health organisations ranked amongst them.

The reason for analysing Twitter data rather than other possible social media sources is that both real-time and historic Twitter data are publicly available. It is possible to obtain a full historic Twitter dataset that meets particular search criteria by purchasing it from Twitter.

Twitter data can be analysed both quantitatively and qualitatively. In terms of quantitative analysis, descriptive analytics, with consideration of various metrics of interest, can be applied to tweet metadata. We can investigate such parameters as the number of accounts tweeting about a given topic, the number of followers these accounts have, the number of messages sent by a user, and the types of tweets sent.

Temporal metrics can help investigate activity peaks and identify the events leading to the increase. For example, researchers were interested in finding out which real-world events make people talk about antibiotics [14], health reforms [15], or Ebola [16]. As has been previously discussed, conversations on Twitter can be influenced by media coverage. Depending on the event, it can be appropriate to estimate temporal metrics minute-by-minute (live sports, television shows), day-by-day (election campaigns, natural disasters), or month-by-month (military conflicts, policy implementation, outbreaks) [17].

It is possible to analyse Twitter activity in different geo-locations and generate tweet distribution maps. This approach 
demonstrates where people tweet from and is essential for disease detection and tracking. Even though less than 3% of tweets are geo-located [18], several computational techniques may provide further location information.

Many tweets contain URLs, which may refer to, for example, a webpage, a picture, or a video. Many of them lead to external pages, and there are services that resolve them to an end point. These pages often contain more extensive content, which can be analysed along with the tweet itself. For example, URLs might be used to study geographical affinity in terms of the topics users discuss. This can be done by looking at the location of users and the geographical focus of their tweets as identified through URLs [19].

Measuring Twitter engagement can give valuable insights. Twitter engagement is an interactive, synchronous communication and collaboration among numerous participants via Twitter [20]. In addition to simply counting the number of followers a user has, more effective ways of measuring user interactions on the platform can be used. For example, the number of active interactions can be investigated via the usage of original tweets, retweets, and replies. The number of retweets can demonstrate tweet popularity, Twitter user popularity, and the level of engagement with other Twitter users. The number of replies and mentions received by a Twitter user can demonstrate the level of direct engagement with other Twitter users.

In addition to these metrics, tweets contain data that can be analysed qualitatively, such as the account description or the tweet content itself. The account description can suggest whether an individual or an organisation owns the account. Furthermore, we can learn whether this individual represents civil society or whether they are a key player in the area of interest. If it is an organisation, we can find out whether it is a governmental, non-governmental, international organisation, private sector, or
media. This information can help us compare the relative performance of individual or institutional accounts in various Twitter communicative contexts. For example, the tweets for breast cancer awareness month that were posted by celebrities reached more people than those posted by other groups [21].

However, the content analysis of the tweet itself is usually of prime interest to researchers. It can give us insights into people’s beliefs, perceptions, and behaviour. These insights can help detect diseases and prevent their further spread, identify misinformation and rumours about epidemics, and inform the communication strategies of public health authorities. Tweet content analysis is challenging because tweets are brief and contain unstructured grammar, slang, sarcasm, and other unconventional forms of written expression [22]. This analysis can be performed both automatically and manually. The former requires computational linguistics expertise and can be applied to large datasets; the latter is more accurate, but it requires time, and human and financial resources.

The case-study sample comprised more than 60 million tweets covering the period of 17 months from the beginning of the outbreak in December 2013 to May 2015. To analyse this huge amount of data, Splunk was used: a platform for machine data that combines time-series optimised NoSQL data storage with analytics and visualisation. The tweets included in the dataset contained the word “ebola” (Latin script) or the most relevant and popular hashtags that were identified from the literature, media coverage and social data platforms. Moreover, the data included all tweets that contained URLs to articles that had the word “ebola” in their title, description, and among their keywords, even if the tweet text itself did not include the word “ebola” or one of the hashtags.

5.4 Insights from the Ebola Twitter Dataset

What can we learn from Ebola tweets?
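Before turning to the findings, the inclusion criteria described at the end of Section 5.3 can be made concrete with a short sketch. This is only an illustration under stated assumptions, not the case-study's actual Splunk query: the tweet field names (`text`, `hashtags`, `linked_articles`) and the hashtag set are hypothetical, since the chapter does not list the hashtags or the data schema it used. The sketch reads the article rule literally, requiring the word in the title, the description and the keywords.

```python
import re

# Hypothetical hashtag set: the chapter does not enumerate the hashtags it used.
RELEVANT_HASHTAGS = {"evd"}

# Case-insensitive match for the word "ebola" in Latin script.
EBOLA_WORD = re.compile(r"ebola", re.IGNORECASE)


def mentions_ebola(text):
    """Return True if the given text contains the word 'ebola'."""
    return bool(EBOLA_WORD.search(text or ""))


def matches_inclusion_criteria(tweet):
    """Apply the three inclusion rules to one tweet (a dict with assumed fields)."""
    # Rule 1: the tweet text itself contains the word "ebola".
    if mentions_ebola(tweet.get("text")):
        return True
    # Rule 2: the tweet carries one of the relevant hashtags.
    tags = {tag.lstrip("#").lower() for tag in tweet.get("hashtags", [])}
    if tags & RELEVANT_HASHTAGS:
        return True
    # Rule 3: the tweet links to an article with "ebola" in its title,
    # description, and keywords, even if the tweet text does not mention it.
    for article in tweet.get("linked_articles", []):
        if (mentions_ebola(article.get("title"))
                and mentions_ebola(article.get("description"))
                and any(mentions_ebola(k) for k in article.get("keywords", []))):
            return True
    return False


def filter_tweets(tweets):
    """Keep only the tweets that satisfy at least one inclusion rule."""
    return [t for t in tweets if matches_inclusion_criteria(t)]
```

A filter of this kind keeps link-only tweets that a plain keyword search on the tweet text would miss, which matters here because more than half of the tweets in the dataset contained URLs.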
The analysis of a large case-study dataset of more than 60 million tweets posted by more than 12 million users over a 17-month period revealed the temporal dynamics, geographical distribution, and Twitter engagement patterns within it. The dataset included tweets in 149 languages, with 83% of tweets written in English, followed by Spanish, Portuguese, French, and Indonesian. In the study dataset, 1.2% of the tweets were geo-tagged.

Geographical mapping (Fig. 5.1a) demonstrated that over the period of the outbreak most of the tweets were posted from North America, Europe, and Latin America. At the beginning of the outbreak, before the first Ebola case was diagnosed in the USA on the 30th of September, a substantial proportion (approximately 14%) of tweets was posted from West Africa (Fig. 5.1b). Although the absolute number of tweets from this region remained relatively high throughout the whole outbreak, their prevalence was overtaken by the vast number of tweets posted from North America after the 30th of September, when fear and anxiety spread on social media among the Western population.

Fig. 5.1 Geographical distribution of Ebola tweets

Fig. 5.2 The overall temporal tweet frequency distribution (by day) and key news events that influenced it

Figure 5.2 represents the frequency of tweets over the whole study period. The Ebola conversation on Twitter did not correspond with EVD epidemiology (i.e. incidence, case fatality rate, or infection rate). However, and not surprisingly, it corresponded with Ebola news events. The Twitter conversation started at the end of March 2014, when a mysterious haemorrhagic fever in Guinea was confirmed as Ebola, which was followed by the release of the first WHO report concerning the Ebola outbreak in West Africa. The frequency of tweets grew at the end of July 2014, when the first Ebola case was diagnosed in Nigeria. It rapidly increased further when two US 
missionary workers infected with Ebola were evacuated to the USA from Liberia at the beginning of August 2014 and when WHO declared the Ebola outbreak a Public Health Emergency of International Concern. Tweet frequency reached its peak at the beginning of October 2014, after the first Ebola case was diagnosed in the USA on the 30th of September 2014, and showed significant peaks throughout October 2014, during which the first Ebola-infected patient died and three more Ebola cases were diagnosed in the US. Of the 60 million tweets in the dataset, 40 million were posted during October 2014, most of them from North America (Fig. 5.1c). The Twitter conversation started to drop gradually, along with public interest in the topic, from the beginning of November 2014, when the Ebola epidemic stopped dominating global news headlines and public anxiety fell.

These findings supported the observations that the Ebola Twitter conversation was initiated and amplified by Western media coverage and that the world’s attention was brought to the Ebola epidemic only after the first cases were diagnosed outside of West Africa at the end of September 2014. However, once the media coverage dropped, public interest also decreased dramatically, despite the fact that the epidemic was far from over.

What can public health organisations learn from these tweets?
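A useful starting point for this question is the day-by-day frequency analysis summarised above (Fig. 5.2): counting tweets per day, finding the peak days, and measuring how much of the conversation fell within a given month. The following is a minimal sketch of that idea, assuming tweet timestamps are available as ISO 8601 strings without a time-zone suffix; the function names are illustrative and are not taken from the case-study, which used Splunk.

```python
from collections import Counter
from datetime import datetime


def daily_counts(timestamps):
    """Count tweets per calendar day from ISO 8601 timestamp strings."""
    return Counter(datetime.fromisoformat(ts).date() for ts in timestamps)


def peak_days(counts, top_n=3):
    """Return the top_n (day, count) pairs with the highest tweet counts."""
    return counts.most_common(top_n)


def monthly_share(counts, year, month):
    """Return the fraction of all tweets that were posted in the given month."""
    total = sum(counts.values())
    in_month = sum(n for day, n in counts.items()
                   if day.year == year and day.month == month)
    return in_month / total if total else 0.0
```

Applied to the study dataset, a function like `monthly_share(counts, 2014, 10)` corresponds to the observation that 40 million of the 60 million tweets were posted in October 2014, and the peak days are the natural places to look for the news events that drove the conversation.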
The possibility of using Twitter data to detect and track the spread of a disease has been widely demonstrated in the case of influenza [9, 10]. However, the Ebola epidemic was different, as the Ebola Twitter conversation was driven not by the progression of the epidemic but by news events and media coverage. These, in turn, were highly imbalanced and did not correlate with the actual disease epidemiology. This case-study demonstrated that in a context like this, public health organisations can use Twitter data to identify news events, track the information being spread on the Twittersphere, and ensure that accurate information on topics of public interest is provided.

Such news events can be easily distinguished by using Big Data analytic tools to analyse URLs and identify the topics of the most shared articles within the dataset. In this case-study, 54.5% of tweets in the dataset contained URLs. Within a time period when over half of the tweets contained URLs and over half of the total unique users shared URLs, it was possible to identify the main topic of discussion. For example, on the 25th of March 2014 most of the shared articles within the dataset were about a Canadian man who arrived in Canada from West Africa and developed symptoms of haemorrhagic fever resembling Ebola. This insight could inform global public health authorities about increased public concern about this topic and suggest providing more information to the public about this case in order to prevent the spread of unnecessary rumours and misinformation. On the 8th of August 2014, one of the peak days in the dataset, most of the top shared articles were about the news that the WHO had declared the Ebola outbreak a Public Health Emergency of International Concern. This insight could enable public health organisations to realise the necessity of drawing public attention to the current state of an epidemic, as well as of highlighting important public health information (such as the way the disease is spread, what 
the first symptoms are, and what people should do if they have them).

Who were the key players involved in the Twitter discussion surrounding the West African Ebola outbreak and why is it important for public health organisations to know that?

In order to discern the key players involved in the Twitter discussion surrounding the West African Ebola outbreak, this case-study identified the accounts that achieved the highest level of engagement with other Twitter users during the studied period (Fig. 5.3).

Fig. 5.3 The proportion of retweets, mentions and replies sorted by account type, received by the top 20 most retweeted, mentioned, or “replied to” accounts in the dataset respectively (panels: Retweets, Mentions, Replies)

News organisations achieved the highest level of engagement with the general Twitter public over the period of the outbreak, suggesting that the general public was sharing news stories and actively seeking information on Twitter. This also supports the notion that the Twitter Ebola conversation corresponded with news events, and highlights the importance of Twitter as a tool for rapid communication and information sharing during epidemics. Therefore, in future outbreak scenarios, it is important for public health organisations to closely track the information provided by news organisations on Twitter in real time and to collaborate with them.

Interestingly, humour accounts also achieved a high level of engagement during the outbreak. This finding may suggest that users were sharing humorous content rather than news and information. Humour accounts’ activity was especially high during October 2014 (when Ebola cases were diagnosed in the US). This observation also corresponds with the findings from a study of tweets during the 2009 H1N1 pandemic, which also reported a high prevalence of humorous tweets [23]. Although there are a number of humour accounts that have millions of followers on Twitter, some of the most 
retweeted ones during the Ebola epidemic were created during the outbreak, with Ebola as a specific focus of their tweets. Humour accounts’ popularity might be explained by the general popularity of Ebola as a topic for Twitter conversation, and by the increased concern of the general public and Twitter users about this issue. Reading and sharing humorous content on Twitter might have acted as a means of relief and of tackling anxiety, as it is known that humour, even dark humour, is used in stressful situations as a buffer. However, these accounts did not always post humorous content. For example, as previously discussed, it was reported that a CDC tweet was retweeted by one of them, which in turn helped CDC to reach an even wider audience.

Twitter celebrities and general celebrities also achieved a high level of engagement with the public during the 2014 West African Ebola outbreak. Twitter celebrities are the accounts of people or organisations that are not famous outside of the Twittersphere, but have a high number of followers on Twitter. Among them were health specialists, journalists, and bloggers, who actively use Twitter to communicate with the public. This suggests that they have high influence in the Twittersphere during outbreak periods. Accounts of general celebrities also achieved a high level of Twitter engagement, most likely due to the high number of followers and the reputation they have on the Twittersphere, which also demonstrates their potential to spread information during public health emergencies. Many UN organisations already collaborate with celebrities, appointing them as goodwill or global ambassadors in order to attract public attention and focus the world’s eyes on the organisation’s goals.

Therefore, it is important for public health organisations to identify accounts that achieve a high level of Twitter engagement during public health emergencies, for several reasons. Firstly, it is important to monitor and be aware of the content posted by these 
accounts in order to detect and address any inaccurate information. For example, the popularity of humour accounts and the accounts of Twitter celebrities during the West African Ebola outbreak was beyond the control of public health organisations and could potentially have led to the dissemination of misinformation and decreased the credibility of public health authorities. On the other hand, public health organisations could consider collaborating with these accounts in order to be retweeted by them or to have accurate public health content included in their messages. Furthermore, understanding which content features lead to wider Twitter engagement may help public health organisations to strategically incorporate them in their messages. This is well illustrated by the popularity of humour accounts, which demonstrates that during periods of public panic and anxiety people pay attention to and share humorous content. If so, public health organisations should be aware of this and strategically include humorous and emotional content in their tweets for a wider dissemination of accurate information.

How did governmental and non-governmental organisations fit into the Twitter conversation related to the Ebola outbreak?
The accounts of WHO, UNICEF, CDC, and the White House were among those that achieved the highest level of overall Twitter engagement. This suggests that Twitter users turned to them in order to seek information during the Ebola outbreak. Whereas the WHO and UNICEF accounts were sources of information for the global community, the accounts of the CDC and White House provided information targeting the US public.

Fig. 5.4 Most retweeted tweets of the WHO, CDC, and overall in the dataset (panels: most retweeted WHO tweet, retweeted more than 3,000 times; most retweeted CDC tweet, retweeted more than 4,000 times; most retweeted tweet in the dataset, retweeted more than 63,000 times)

However, none of the tweets of these organisations achieved the highest levels of popularity. The most retweeted WHO and CDC tweets were posted on the day when the first Ebola case was diagnosed in the USA (30 September 2014). The WHO tweet contained an emotional personal story and was retweeted more than 3,000 times. The most retweeted CDC tweet contained public health information and was retweeted more than 4,000 times. Interestingly, the most retweeted tweet in the whole dataset was posted by a Twitter celebrity who had only around 2,500 followers. This tweet contained a political statement expressed as a joke and was retweeted nearly 63,000 times (Fig. 5.4). Considering the relatively low number of followers that this account had, this tweet was likely retweeted due to its controversial content, which attracted people’s attention. This suggests that the combination of emotional content with important public health messages can help public health organisations to achieve even higher levels of engagement with the Twitter public.

5.5 Conclusion

Health authorities would benefit from building their Twitter presence, since people turn to these organisations to obtain accurate health information in public health emergencies. Twitter provides a unique opportunity for public health organisations
to listen to their audience, and to share scientifically accurate information. Public health organisations may benefit from employing humorous and emotional content, and from collaborating with accounts that have a high level of Twitter engagement, such as those of news organisations, humour accounts, Twitter celebrities, and general celebrities. It is also important for them to monitor how accurate the information posted by these accounts is, preferably in real time.

Health authorities need to consider the use of Big Data analytics tools in order to quickly gain insights from social media data to inform their communications strategies. The volume of data in the dataset required the use of Big Data analysis tools, and this study illustrates the utility of applications such as Splunk for analysing very large datasets in a manageable and quick way. Moreover, these applications are flexible and allow filters and search terms to be changed in order to track real-time changes during events of interest.

This case-study demonstrates a variety of methods that can be applied to a Twitter dataset, with the potential to be used by different actors participating in an emergency response, including public health authorities, in order to answer a range of questions. For example, global health authorities may identify the locations where the majority of tweets come from in order to understand which parts of the world express greater concern about a situation and to identify a target audience for their messages, as well as the events that raise public concern. They may also quickly identify what kinds of accounts achieve the highest level of engagement with the Twitter population and examine their own position in relation to them. This information could enable health authorities to track the information provided by these accounts and, if needed, to address it. It might also help them identify the accounts with which they may consider collaborating for the dissemination of information.
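The engagement ranking described in this case-study (the top 20 most retweeted, mentioned, or replied-to accounts, broken down by account type) can be illustrated with a minimal sketch. It is written in plain Python rather than in Splunk, which the study actually used, and it is not the authors' implementation. The record layout, the account names, the account-type labels, and the function names `top_accounts` and `share_by_account_type` are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def top_accounts(interactions, kind, n=20):
    """Return the n accounts that received the most interactions of one kind."""
    counts = Counter(target for k, target in interactions if k == kind)
    return counts.most_common(n)


def share_by_account_type(top, account_types):
    """Proportion of interactions received by each account type within `top`."""
    totals = Counter()
    for account, count in top:
        totals[account_types.get(account, "unknown")] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {acc_type: c / grand_total for acc_type, c in totals.items()}


# Hypothetical sample data: (interaction kind, account that received it).
interactions = [
    ("retweet", "@news_a"), ("retweet", "@news_a"), ("retweet", "@humour_b"),
    ("retweet", "@health_org"), ("mention", "@health_org"), ("reply", "@news_a"),
]
# Hypothetical manual coding of accounts into account types.
account_types = {
    "@news_a": "news",
    "@humour_b": "humour",
    "@health_org": "health organisation",
}

top_retweeted = top_accounts(interactions, "retweet")
retweet_shares = share_by_account_type(top_retweeted, account_types)
```

Running the same two functions with `"mention"` or `"reply"` as the interaction kind gives the other two breakdowns. In practice the account-type coding would have to be done by hand or by a separate classification step, which this sketch simply assumes is available.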
Acknowledgements

The work was done within The Global Health Network group, Nuffield Department of Medicine, University of Oxford, and supported by Splunk4Good, the corporate social responsibility programme of Splunk Inc.

References

1. WHO: Ebola virus disease outbreak [Internet]. WHO [cited 24 July 2016]. Available from: http://www.who.int/csr/disease/ebola/en/
2. Watch How Word of Ebola Exploded in America. TIME [Internet] [cited 24 July 2016]. Available from: http://time.com/3478452/ebola-twitter/
3. Towers, S., Afzal, S., Bernal, G., Bliss, N., Brown, S., Espinoza, B., et al.: Mass Media and the Contagion of Fear: The Case of Ebola in America. PLoS ONE 10(6), e0129179, Nov (2015)
4. Oyeyemi, S.O., Gabarron, E., Wynn, R.: Ebola, Twitter, and misinformation: a dangerous combination? BMJ 349, g6178, 14 Oct (2014)
5. Rolison, J.J., Hanoch, Y.: Knowledge and risk perceptions of the Ebola virus in the United States. Prev Med Rep 2, 262–264 (2015)
6. Carter, M.: How Twitter may have helped Nigeria contain Ebola. BMJ 349, g6946, 19 Nov (2014)
7. Luckerson, V.: Fear, misinformation, and social media complicate Ebola fight. Time [Internet] Oct (2014) [cited 24 July 2016]. Available from: http://time.com/3479254/ebolasocial-media/
8. Social media in Nigeria: state of the art, and marketing best practices. Digital in the round [Internet] [cited 24 July 2016]. Available from: http://www.digitalintheround.com/socialmedia-marketing-nigeria/
9. de Quincey, E., Kostkova, P.: Early warning and outbreak detection using social networking websites: the potential of Twitter. In: Kostkova, P. (ed.)
Electronic Healthcare [Internet] [cited 24 July 2016]. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, pp. 21–24. Springer, Berlin (2009). Available from: http://link.springer.com/chapter/10.1007/978-3-642-11745-9_4
10. Kostkova, P., Szomszor, M., St Louis, C.: #swineflu: The use of Twitter as an early warning and risk communication tool in the 2009 swine flu pandemic. ACM Trans Manag Inf Syst (2), 1–25, July (2014)
11. Jayawardene, W., YoussefAgha, A., Lohrmann, D.: Role of social media in early warning of norovirus outbreaks: A longitudinal Twitter-based infoveillance. In: ResearchGate [Internet] (2013) [cited 24 July 2016]. Available from: https://www.researchgate.net/publication/256095323_Role_of_Social_Media_in_Early_Warning_of_Norovirus_Outbreaks_A_Longitudinal_Twitter-Based_Infoveillance
12. Dengue surveillance based on a computational model of spatio-temporal locality of Twitter [Internet] [cited 24 July 2016]. Available from: http://dl.acm.org/citation.cfm?id=2527049
13. Smailhodvic, A., Andrew, K., Hahn, L., Womble, P.C., Webb, C.: Sample NLPDE and NLODE Social-media modeling of information transmission for infectious diseases: Case Study Ebola. ArXiv150100198 Phys [Internet] 31 Dec (2014) [cited 24 July 2016]. Available from: http://arxiv.org/abs/1501.00198
14. Dyar, O.J., Castro-Sánchez, E., Holmes, A.H.: What makes people talk about antibiotics on social media?
A retrospective analysis of Twitter use. J Antimicrob Chemother 69(9), 2568–2572, Sep (2014)
15. King, D., Ramirez-Cano, D., Greaves, F., Vlaev, I., Beales, S., Darzi, A.: Twitter and the health reforms in the English National Health Service. Health Policy Amst Neth 110(2–3), 291–297, May (2013)
16. The Life Cycle of Ebola on Twitter [Internet] [cited 24 July 2016]. Available from: http://www.symplur.com/blog/the-life-cycle-of-ebola-on-twitter/
17. Bruns, A., Stieglitz, S.: Towards more systematic Twitter analysis: metrics for tweeting activities. Int J Soc Res Methodol 16(2), 91–108, Mar (2013)
18. Burton, S.H., Tanner, K.W., Giraud-Carrier, C.G., West, J.H., Barnes, M.D.: “Right time, right place” health communication on Twitter: value and accuracy of location information. J Med Internet Res 14(6), e156 (2012)
19. Leetaru, K., Wang, S., Cao, G., Padmanabhan, A., Shook, E.: Mapping the global Twitter heartbeat: the geography of Twitter. First Monday 18(5) [Internet] 22 Apr (2013) [cited 24 July 2016]. Available from: http://firstmonday.org/ojs/index.php/fm/article/view/4366
20. Heldman, A.B., Schindelar, J., Weaver, J.B. III: Social media engagement and public health communication: implications for public health organizations being truly “social.” ResearchGate 35(1) [Internet] Jan (2013) [cited 24 July 2016]. Available from: https://www.researchgate.net/publication/285931224_Social_media_engagementand_public_health_communicationimplications_for_public_health_organizationsbeing_truly_social
21. Thackeray, R., Burton, S.H., Giraud-Carrier, C., Rollins, S., Draper, C.R.: Using Twitter for breast cancer prevention: an analysis of breast cancer awareness month. BMC Cancer 13, 508 (2013)
22. Kim, A.E., Hansen, H.M., Murphy, J., Richards, A.K., Duke, J., Allen, J.A.: Methodological considerations in analyzing Twitter data. J Natl Cancer Inst Monogr 47, 140–146, Dec (2013)
23. Chew, C., Eysenbach, G.: Pandemics in the age of Twitter: content analysis of Tweets during the 2009 H1N1 outbreak. PLoS ONE
5(11), e14118, 29 Nov (2010)

Index

A
Accuracy, 21, 28, 30, 32, 33
Analysis, 1–3, 7–12
Analytics, 16, 20, 21, 24, 36, 39, 40, 43, 45, 52, 54–57, 60, 62, 64, 66, 68, 71, 72, 80, 81
Anomaly detection, 60, 68, 75, 81
Artificial intelligence, 15

B
Big data, 1–3, 5, 6, 8–12, 39–43, 45, 47, 48, 53–55, 57

C
Cartridge, 66, 67, 73, 75
CDC, 86, 94, 95
Classification, 18, 19, 26–30, 33, 35
Click streams
Clinic, 61, 62, 69, 71
Cloud, 63, 69–71
Cluster, 40, 41, 47–50, 54, 56
Communication, 62, 63, 69
Connectivity, 62–65, 69
CRISP-DM, 23, 24, 35

D
Dashboard, 71, 72
Data pipeline, 55
Data science, 15–19, 21–28, 30, 32, 34–36
Demographic, 59, 63–65
Descriptive, 60, 66, 81
Diagnostics, 60, 61, 63, 66, 71, 76
Disease, 59, 62, 64, 71, 76
Distributed computing, 15

E
#Ebola, 85
Ebola virus, 85–87
Epidemic, 86, 87, 91–93

F
Facebook, 3, 7

H
Hadoop, 48–50, 52–55
Health, 2, 3, 5–9, 12
Healthcare, 5, 6, 12, 18, 20, 25–27, 35, 36
Healthcare setting, 62, 65, 72, 76
High-performance computing, 50
HIV, 61, 62, 66, 77
Hospital, 62, 69, 71, 72, 79
Humour, 87, 93, 94, 96
Hyperparameter, 27, 30, 32

I
Ingestion, 42, 48, 53–56
Instagram
Internet, 60, 69

K
Kappa architecture, 54–57
Knowledge, 60, 71

L
Lambda architecture, 55, 56
Latency, 48, 55, 56
Learning, 16–19, 21, 22, 26, 30, 31
LinkedIn
Local-level, 62, 64, 65
Location, 60, 64, 66, 67, 72, 79, 80
Low and middle-income countries

M
Machine generated data, 40
Machine learning, 15, 20, 21, 31, 34
Maintenance, 71, 73, 76, 78
MapReduce, 48–53
Measurement, 65, 76
Message, 69
Misinformation, 85, 86, 89, 94
Mobile lab, 72, 74, 77, 79

N
National-level, 62, 64
News, 86, 91, 92, 96

O
Observation, 72–75, 77

P
Patient, 18–20, 25, 26, 29, 36
Performance, 22, 24–31, 33, 35
POC, 59–61, 64, 65, 71, 72, 76, 81
POC device, 60–63, 65, 66, 68, 72, 76, 81
Population-level, 59, 63–65
Predictive, 18, 25–28, 33
Predictive analytics, 19, 22, 27, 30, 35
Process, 15, 16, 19, 20, 22–24, 26, 27, 34, 35
Public health, 85, 87, 91–94, 96

Q
Quora

R
Real-time, 40, 45, 52–57
Real-time analysis, 68
Regression, 18, 19, 26, 27, 30–33, 35
Results, 59–66, 70, 76
Retweet, 87, 88, 94

S
Scalable, 40, 42, 46, 47, 55
Semi-structured, 40, 43, 45, 46, 49
Similarity, 18, 19, 35
SMS, 69
Social media, 2, 3, 5, 7, 10, 12, 85
Spark, 53
Splunk, 54
Spread, 85–87, 89, 92, 94
Statistical, 16, 18, 20–22, 26, 30, 31
Statistics, 16, 21, 22
Structured, 40, 43, 44, 49
Supervised, 16–19, 26, 27, 30

T
Train, 27, 30, 31, 33
Twitter, 86–89, 91, 92, 94, 96
Twitter celebrities, 94, 96
Twittersphere, 92, 94

U
Unstructured, 40, 43–45, 49
Unsupervised, 16–19

V
Variety, 1–3, 8, 9, 12, 39, 40, 42, 43, 47
Velocity, 1, 2, 8, 39, 40, 42, 43, 47
Veracity, 1
Volume, 1, 2, 8, 9, 39, 40, 42, 43, 45–47, 56

W
Web, 70
Web search logs
2014 West African Ebola outbreak, 86, 87, 94
WHO, 85–87, 91, 92, 95, 96

Y
Youtube

Date posted: 26/08/2021, 16:13

Table of contents

  • Contents

  • About the Editors

  • 1 Introduction—Improving Healthcare with Big Data

    • 1.1 Introduction

    • 1.2 Big Data and Health

    • 1.3 Big Data and Health in Low- and Middle-Income Countries

      • 1.3.1 Analytical Challenges
