Kenneth Cukier and Viktor Mayer-Schoenberger, as well as Alex Pentland, believe that Big Data Analysis is on its way to changing society, and that it is doing so for the better. Others question whether that is indeed the case and warn against the dangers of this changed society. After summarizing the positive vision of Pentland and of Cukier and Mayer-Schoenberger, we survey the concerns that have been raised about the changes that Big Data Analysis is bringing to society.
The Benefits of Big Data Analysis for Society
Alex Pentland is a great believer in the societal changes that Big Data Analysis can bring about. He believes that the management of organizations such as cities or governments can be improved using Big Data Analysis, and he develops a vision for the future in his article entitled “Data Driven Society” [52]. In particular, his research on social interactions leads him to believe that free exchanges between entities (people, organizations, etc.) improve productivity and creativity. He would, therefore, like to create societies that permit the flow of ideas between citizens, and believes that such activity could help prevent major disasters such as financial crashes or epidemics of dangerous diseases. Cukier and Mayer-Schoenberger agree that Big Data Analysis applications can improve the management of organizations or the effectiveness of certain processes. However, they do not go as far as Pentland, who implemented the idea of an open-data city in an actual city (Trento, Italy) that serves as a living lab for this experiment.
The Downside of Big Data Analysis for Society
In this section, we discuss the negative societal repercussions that have been attributed to Big Data Analysis since its advent. First, however, we would like to mention that not everyone is convinced that Big Data Analysis is as significant as it is made out to be. Marcus and Davis, for example, wonder whether the hype surrounding Big Data Analysis is justified [44]. Big Data Analysis is held up as a revolutionary advance and, as Marcus and Davis suggest, it is an important innovation, but they wonder how tools built from Big Data, such as Google Flu Trends, compare to advances such as the discovery of antibiotics or the invention of cars and airplanes. This consideration aside, it is clear that Big Data Analysis causes a number of changes that can affect society, some of them in a negative way, as listed below:
• Big Data Analysis yields a carefree and dangerous attitude toward the validity of the results: Traditional statistical tools rely on assumptions about the characteristics of the data and the way in which they were sampled. However, as previously discussed, such assumptions are more likely to be violated when dealing with huge data sets whose provenance is not always known, and which have been assembled from disparate sources [24]. Because of these data limitations, Boyd and Crawford, as well as Tufekci, caution scientists against drawing wrong interpretations and inferences from the observed results. Indeed, massive data makes researchers less careful about what the data set represents: instances and variables are simply thrown in with the expectation that the learning system will spit out the relevant results. This danger was less present in carefully assembled smaller data collections.
• Big Data Analysis causes a mistaken semblance of authority: Marcus and Davis [44] note that answers given by tools based on Big Data Analysis may have a semblance of authority when, in fact, the results are not valid. They cite the case of tools that search large databases for an answer and, in particular, the example of two tools that searched Wikipedia documents for a ranking of the most important people in history. Unfortunately, the notion of “importance” was not well defined, and because the question was imprecise, the tools were allowed to go in unintended directions. For example, although the tools correctly retrieved people like Jesus, Lincoln and Shakespeare, one of them also asserted that Francis Scott Key, whose claim to fame is the writing of the US National Anthem, “The Star-Spangled Banner”, was the 19th most important poet in history. The tools seem authoritative because they are exhaustive (or at least, they search a much larger space than any human could possibly search). However, they suffer from the same “idiot savant” predicament as the expert systems of the 1970s.
• Data Privacy and Transparency are compromised by Big Data Analysis: Many Big Data studies concern personal data. Some personal data are submitted by individuals on their own initiative (as in social networks or as a result of gaining access to free services), others may be collected automatically (by using some devices or specific services) or may be shared with external sources to enrich data sets. Finally, some data may be inferred from other data, and the apparent anonymity may get lost, as was previously discussed. Therefore, privacy and data protection are more serious challenges than they have ever been before. While a number of computational and legal solutions have been proposed, this problem is far from resolved and will continue to cause great concern in society. In his open-data city experiment, Pentland proposes a solution to this issue in which people would keep ownership of their data the way they do of money in a bank, and, likewise, would control how this data is used by choosing to share it or not, on a one-to-one basis.
Another solution is proposed by Kord Davis, the author of [19], who believes in the need for serious conversations within the Big Data Analysis community regarding companies’ policies and codes of ethics related to data privacy, identifiable customer information, data ownership and allowed actions with data results. In his opinion, transparency is a key issue, and the data owners need to have a transparent view of how personal data is being used. Transparent rules should also cover how data is sold or transferred to third parties [5]. In addition, transparency may also be needed in the context of algorithms. For instance, Cukier and Mayer-Schoenberger, in Chapter 9 of their book [47], call for the special monitoring of algorithms and data, especially if they are used to judge people. This is another critical issue since algorithms may make decisions concerning bank credits, insurance or job offers depending on various individual data and indicators of individual behaviour.
• Big Data Analysis causes a new digital divide: As previously mentioned, and as noted by Boyd and Crawford and by Tufekci, everyone has access to most of Twitter’s data, but not everyone can access Google or Facebook data. Furthermore, as discussed by Boyd and Crawford, Big Data processing requires access to large computers, which are available in some facilities but not in others. Likewise, Big Data research is accessible to people with the required computational skills but not to others. All these requirements for working in the field of Big Data Analysis create a divide that will perpetuate itself: students trained in top-class universities, where large computing facilities are available and access to Big Data may have been paid for, will be the ones publishing the best-quality Big Data research, being invited to work in large corporations, and so on. As a result, less fortunate individuals will be left out of these interesting and lucrative opportunities.
This concludes our discussion of the effect of Big Data Analysis on the world as we know it. The next section takes a look at the various scientific contributions made in the remainder of this volume and organizes them by themes and applications.
4 Edited Volume’s Contributions
The contributed chapters of this book span the whole framework established in this introduction and enhance it by providing deeper investigations of, and reflections on, a number of its categories. The papers can be roughly divided into two groups: the problem-centric contributions and the domain-centric ones. Though most papers span both groups, each was found to put more emphasis on one or the other and is, therefore, classified accordingly.
In the problem-centric category, we present four chapters on the following topics:
1. The challenges of Big Data Analysis from a Statistician’s viewpoint
2. A framework for Problem-Solving Support tools for Big Data Analysis
3. Proposed solutions to the Concept Drift problem
4. Proposed solutions to the mining of complex Information Networks
In the domain-centric category, we present seven chapters that fit in the areas of Business, Science and Technology, and Life Science. More specifically, the papers focus on the following topics:
1. Issues to consider when using Big Data Analysis in the Business field
2. Dealing with data uncertainties in the Financial Domain
3. Dealing with Capacity issues in the Insurance Domain
4. New issues in Big Data Analysis emanating from the Internet of Things
5. The mining of complex Information Networks in the Telecommunication Sector
6. Issues to consider when using Big Data Analysis for DNA sequencing
7. High-dimensionality in Life Science problems
We now give a brief summary of each of these chapters in turn, and explain how they fit in the framework we have created. A deeper discussion of each of these contributions along with their analysis will be provided in the conclusion of this edited volume. The next four paragraphs pertain to the problem-centric type of papers.