Data Analytics is a powerful tool to analyze today’s business environment and to predict future developments. Is it not the dream of every business owner to know what the customer will buy in 6 months or what the next product hype will look like in your own industry? Data Analytics is the tool that will help you answer these questions. Here’s why Data Analytics for Beginners will bring your business to a completely new level:
• How you can use data analytics to improve your business
• How to plan data analysis to know exactly what your target group wants
• How to implement descriptive analysis
You will learn the exact techniques that are required to master Data Analytics.
Data Analytics for Beginners
Basic Guide to Master Data Analytics
Table of Contents:
Introduction
Chapter 1: Overview of Data Analytics
Foundations of Data Analytics
Getting Started
Mathematics and Analytics
Analysis and Analytics
Communicating Data Insights
Automated Data Services
Chapter 2: The Basics of Data Analytics
Planning a Study
Surveys
Experiments
Gathering Data
Selecting a Useful Sample
Avoiding Bias in a Data Set
Explaining Data
Descriptive analytics
Charts and Graphs
Chapter 3: Measures of Central Tendency
Time Charts and Line Graphs
Create a Line Graph in MS Excel
Customizing Your Chart
Annual Employee Losses
Adding another Set of Data
Histograms
Create a Histogram with MS Excel
Creating a Histogram
Scatter Plots
Create a Scatter Chart with MS Excel
Spatial Plots and Maps
Chapter 5: Applying Data Analytics to Business and Industry
Business Intelligence (BI)
Data Analytics in Business and Industry
BI and Data Analytics
Chapter 6: Final Thoughts on Data
Conclusion
Introduction
We live in thrilling and innovative times. As business moves to the digital environment, virtually every action we take produces data. Information is collected from every online interaction. All sorts of devices gather and store data about who we are, where we are, and what we are doing. Increasingly massive warehouses of data are now freely available to the public. Skilled analyses of all this data can help businesses, governments, and organizations to make better-informed decisions, respond quickly to changing needs, and gain deeper insights into our rapidly changing environment. It is a challenge to even attempt to make good use of all of the available data. In order to answer specific questions, a person must decide what data to collect, which methods to use, and how to interpret the results.
Data analytics is a way to make valuable use of all types of information. Analytics is used to help categorize data, identify patterns, and predict results. Data use has become so ubiquitous that it has become necessary for individuals in every profession to learn how to work with data. Those who become the most proficient at working with data in useful and creative ways will be the most successful in the new world of business.
Until recently, data analytics was limited to an exclusive culture of data analysts, who characteristically presented this topic in complicated and often unintelligible terminology. Fortunately, data analytics is not as complicated as many believe. It simply consists of using analytical methods and processes to develop and explain specific and useful information from data. The point of data analytics is to enhance practices and to support better-informed decisions. This can result in safer practices within an industry, greater revenues for a business, higher customer satisfaction, or improvement in any other area of focus. This eBook introduces a wide range of ideas and concepts used for deriving useful information from a set of data, including data analytics techniques and what can be achieved by using them.
Chapter 1: Overview of Data Analytics
With a little statistical understanding and procedural training, you will be able to use analytical methods to develop data-based insights. Data analytics offers new ways to understand the world. In the past, businesses and organizations were in the habit of making decisions based on assumptions and hoping for favorable outcomes. Data analytics gives people the insights that they need to plan for improvements and specific results. Analytics are generally used for the following purposes:
• To enhance business organizations and increase returns on investment (ROIs)
• To improve the success of sales and marketing campaigns
• To identify trends and emerging developments
• To make society safer
Foundations of Data Analytics
Data analytics requires the use of mathematical and statistical procedures. It also requires the skills to work with certain software applications and a knowledge of the subject area you are working with. Without knowledge of the subject matter, analytics is reduced to simple statistics. Due to the increasing demand for data insights, every field of business has begun to implement data analytics. This has resulted in a variety of analytic specialties, such as market analytics, financial analytics, clinical analytics, geographical analytics, retail analytics, educational analytics, and many other areas of interest.
Getting Started
This chapter explains the major components of data analytics: gathering, exploring, and interpreting data. As a data analyst, you will be collecting and sorting large volumes of raw, unstructured, and partially-structured data. The amounts of data that you are likely to be working with can be too large for a normal database system to process effectively. A data set that is too large, changes too quickly, or does not conform to the structure of standard database designs requires a special skillset to manage. Data analytics consists of analyzing, predicting, and visualizing data. When data analysts gather, query, and interpret data, they conduct a process that is quite similar to data engineering.
Although useful insights can be produced from an individual source of data, the blending of several sources gives the data the context that is necessary to make more informed decisions. As a data analyst, you can combine multiple datasets that are maintained in a single database. You can also work with several different databases maintained within a large data warehouse. Data can also be maintained and managed within a cloud-based platform specially designed for that purpose. However the data is pooled and wherever it is stored, the analyst must still issue queries on the data and make commands to retrieve specific information. This is typically done using a specialized database language called Structured Query Language (SQL).
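The querying step can be sketched with Python's built-in sqlite3 module. The table names (customers, orders), their columns, and the rows below are hypothetical and exist only for illustration; this is a minimal sketch of blending two datasets held in a single database, not a recommended schema.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "North"), (2, "South"), (3, "North")])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 20.0), (1, 35.0), (2, 15.0), (3, 50.0)])

# SQL query that combines both tables: total order amount per region.
query = """
    SELECT c.region, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.region
    ORDER BY c.region
"""
totals = conn.execute(query).fetchall()
conn.close()

for region, total in totals:
    print(region, total)
```

The JOIN clause is what gives each order the context of its customer's region, which neither table could provide on its own.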
When using a database software application or conducting an analysis using other programming languages, like R or Python, you can utilize a variety of digital file formats, such as:
• Comma-separated values (CSV) files: Virtually all data-based software applications (including cloud-based programs) and scripting languages are compatible with the CSV file type.
• Programming Scripts: Professional data analysts generally know how to write programming scripts in order to work with data and visualizations in languages like Python and R.
• Common File Extensions: MS Excel files have the .xls or .xlsx extension. Geospatial applications save their work in their own file formats (e.g., the .mxd extension for ArcGIS and the .qgs extension for QGIS).
• Web Programming Files: Web-based data visualizations often use the Data-Driven Documents JavaScript library (D3.js). D3.js visualizations are saved as .html files.
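As a small illustration of working with the CSV format from a scripting language, the sketch below reads CSV text with Python's built-in csv module. The column names and values are hypothetical, and the text is held in a string rather than a file so that the example is self-contained.

```python
import csv
import io

# Hypothetical CSV content; in practice this would come from a file on disk.
csv_text = """product,units_sold
widget,12
gadget,7
gizmo,21
"""

# DictReader maps each row to the column names in the header line.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# CSV values arrive as text, so convert them before doing arithmetic.
total_units = sum(int(row["units_sold"]) for row in rows)
print(total_units)
```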
Mathematics and Analytics
Data analytics requires the ability to perform mathematical and statistical operations. These skills are necessary both to make sense of the data and to evaluate its relative significance. They are also important because they can be used to conduct data forecasting, decision analytics, and testing of hypotheses. Before getting into more advanced explorations of mathematical and statistical procedures, we will take some time to explain some distinctions between mathematics and analytics.
Mathematics relies on specific numerical procedures and deductive reasoning to develop a mathematical explanation of some phenomenon. Like mathematics, analytics provides a mathematical description of a phenomenon, and its methods are themselves based on mathematics. However, analytics uses inductive reasoning and probability to form conclusions and explanations.
Data analysts use mathematical procedures to make decision models, to produce estimations, and to make forecasts. In order to follow this book, you need little more than common math skills. This book will teach you how to use statistical techniques to develop insights from data. In the field of data analytics, statistical procedures are used to determine the meaning and significance of data. The results can then be utilized to test hypotheses, build data simulations, and make predictions about future outcomes.
Analysis and Analytics
The major difference between data analysis and data analytics is the need for subject knowledge. Typical statisticians specialize in data procedures and have little to no knowledge of other fields of study. They must consult with others who have subject-specific expertise to know which data to look for and to help find meaning in that data. Data analysts, on the other hand, must understand their subject matter. They seek to gain important insights and then use their subject-matter expertise to make meaning of those insights. Below is a list of ways that subject-matter experts use analytics to enhance performance in their areas:
• Engineering analysts use data analytics with building designs
• Clinical data analysts use predictive methods to foresee future health issues
• Marketing data analysts use regression data to predict and moderate customer turnover
• Data journalists search databases for patterns that may be worth investigating
• Crime data analysts develop spatial models to identify patterns and predict future crimes
• Disaster relief data analysts work to organize and explain important data about the effects of disasters, which is then used to determine the types of assistance needed
Communicating Data Insights
Data analysts often have to explain data in ways that non-technical people can comprehend. They must be able to create understandable data visualizations and reports. Generally, people have to visually process data in the form of charts, graphs, and pictures to be able to understand it. Analysts have to be both creative and practical in the ways that they communicate their findings.
Organizational leaders often have difficulties trying to figure out what to do with all of the data that their organization collects. What they do know, however, is that effectively using analytical tools can help them to both gain and strengthen a valuable competitive edge for their business or organization. Currently, very few of these leaders know the available options for engaging in the process. The following section discusses the major data analytics solutions and the benefits that can be gained by organizations.
When implementing data analytics within an organization, there are three key approaches. One can create an internal data analytics department. One could contract out the assignments to independent data analysts. Or one could pay for a cloud-based software-as-a-service (SaaS) solution that enables novices to utilize powerful data analytics tools.
There are a few major ways to create an internal data analytics team:
• Train current personnel. This can be an inexpensive way to provide an organization with ongoing data analytics. This training can be used to transform certain employees into highly-skilled subject-matter experts who are proficient in data analysis.
• Train current personnel and also hire professional analysts. This strategy follows the same process as the first method, but also includes hiring a few data professionals to oversee the process and personally handle the most challenging problems and tasks.
• Hire data professionals. An organization can get its needs met by hiring or contracting with professional data analysts. This is the most expensive option, because professional data analysts are in short supply and generally have high salary requirements.
Securing highly-skilled data analysts to meet the needs of an organization can be extremely difficult. Many businesses and organizations outsource their data analytics jobs to external experts. This happens in two different ways. One way is to contract with someone to develop a wide-ranging data analytics plan to serve the entire organization. Another way is to contract with experts to provide individual data analytics solutions for specific situations and problems that the organization may encounter.
Automated Data Services
Although you must understand certain statistical and mathematical procedures, it is not essential to learn how to code like professional analysts. Computer applications have been developed that provide powerful capabilities without the need to code or script. Cloud-based platform solutions can provide organizations with most or all of their data analytics needs, although training is still required for personnel to operate the cloud platform programs.
This book will teach you how to use the power of data analytics to achieve individual and organizational goals. Regardless of your field of work, learning data analytics can help you to become a more proficient and sought-after professional. Below is a brief list of benefits that data analytics provides for various areas:
• Benefits for corporations: Cost minimization, higher return on investment (ROI), increased staff productivity, reduction of customer loss, higher customer satisfaction, sales forecasting, pricing-model enhancement, loss detection, and more efficient processes
• Benefits for governments: Increased staff productivity, improved decision-making models, more reliable budget forecasting, more efficient resource allocations, and discovery of organizational patterns
• Benefits for academia: More efficient resource allocations, improved instructional focus and student performance, increased student retention, refinement of processes, reliable budget forecasting, and increased ROI for student recruitment practices
This chapter provided an introduction to the concept of data analytics. Analytics is a growing field of science that brings together traditional statistical procedures and computer science in order to ascertain meaningful insights from huge sets of raw data for the benefit of businesses, organizations, governments, and society. Data analytics is sometimes confused with Business Intelligence (BI) because of the common tools they both share, particularly data visualizations, such as traditional charts and graphs. BI, however, is a discipline designed for business leaders without the advanced training necessary to engage in data analytics. The following chapter discusses the basic principles of data analytics.
Chapter 2: The Basics of Data Analytics
This chapter will help you to understand the big picture of the field of analytics. It will discuss the steps of the scientific method, and it will help you to learn how to apply analytics at each step of the scientific process. Analytics does not only consist of analyzing data. It also consists of using the scientific process to find answers to questions and make important decisions. The process includes designing studies, gathering useful information, explaining the data with figures and charts, exploring the data, and drawing conclusions. We will now examine each step in this process and discuss the critical role of analytics.
Planning a Study
Once the research question is established, it is time to design a study to answer that specific question. This requires figuring out the methods that you will use to extract the necessary data. This section covers the two main types of studies: descriptive studies and experimental studies.
Surveys
With a descriptive study, data are gathered from people in a way that does not have an impact on them. The most widely used type of descriptive study is a survey. Surveys are questionnaires that are given to people who are randomly selected from a target population. Surveys are useful tools for gathering information. As with all methods of gathering data, improperly conducted surveys are likely to result in inaccurate information. Common issues with surveys include poorly worded questions, which can be confusing, a lack of participant response, or a lack of randomization in the selection process. Any of these problems can invalidate the results of the survey; therefore, surveys must be carefully planned before they are implemented.
A limitation of surveys is that they can only provide information on relationships that exist between variables and not information on causes and effects. If the survey researchers observe that the people who smoke cigarettes, for example, tend to work longer hours per day than those who do not smoke, they are not in a position to suggest that smoking is the cause of the longer work hours. Variables that were not part of the research design, such as the number of hours the respondents sleep every night, might cause the relationship.
Experiments
Experiments involve the application of one or more treatments to subjects in a controlled environment. The treatments are things that may or may not affect the subject under study. Some studies involve medical experiments, wherein the subjects are patients who undergo medical treatments. Other experiments might include students who receive tutoring, or exposure to a particular instructional tool, as the treatment. Businesses engage in experiments that involve sample participants from the consumer market. These participants may be exposed to a certain type of advertisement and asked how they were emotionally affected.
Once the treatments are applied, the responses are systematically recorded. For instance, to study the effect of a drug dosage amount on blood pressure, a group of subjects may be administered 15 mg of a medicine. A different sample group may be administered 30 mg of the same drug. Typically, a control group is also involved, where subjects each receive a placebo treatment (i.e., a substance with no medicinal properties).
Experiments are often designed to take place in a controlled setting, in order to reduce the number of potential unrelated variables and possible biases that might affect the results. Some possible problems might include: researchers knowing which participants received particular treatments; a particular circumstance or condition, not factored into the study, that may impact the results (e.g., other medications that a participant may be taking); or not including an experimental control group. However, when experiments are designed correctly, differences in responses found when the groups are compared allow the researchers to conclude that there is a cause-and-effect relationship. No matter what the study, it must be designed so that the original questions can be answered in a credible way.
Selecting a Useful Sample
In analytics, as with computer programming, garbage in results in garbage out. If subjects are improperly chosen, for example by giving some more of a chance to be selected than others, the results will be unreliable and not useful for making decisions. For example, John is researching the attitudes of individuals about a possible new tax. John stands in front of a local grocery store and asks passers-by to share their thoughts and attitudes. The problem is that John will only get the attitudes of individuals who a) shop at that grocery store, b) on that specific day, c) at that specific time, and d) actually choose to participate. Because of his limited selection process, the subjects in his survey are not representative of the entire population of the town. Likewise, John could design an online survey and ask people to input their feedback on the new tax. However, only people who are aware of the website, have access to the Internet, and choose to participate will provide data. Characteristically, only people with the strongest attitudes are likely to participate. Again, these participants would not be representative of everyone in the town.
In order to avoid such selection bias, it is necessary to select the sample randomly, using some type of process that gives everyone in the population the same statistical opportunity to be chosen. There are various methods for randomly selecting subjects in order to get valid and usable results.
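One such method, simple random sampling, can be sketched with Python's built-in random module. The population of 1,000 resident ID numbers, the sample size of 50, and the seed value are all assumptions chosen for illustration.

```python
import random

# Hypothetical population: ID numbers for 1,000 town residents.
population = list(range(1, 1001))

# A fixed seed makes the sketch repeatable; omit it for a fresh sample each run.
rng = random.Random(42)

# sample() draws without replacement, and every resident
# has the same chance of being selected.
sample = rng.sample(population, k=50)
print(len(sample))
```

Unlike standing outside a grocery store, this process does not favor any subgroup, provided the population list itself is complete.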
Avoiding Bias in a Data Set
If you were conducting a phone survey on political voting preferences, and you made your calls to people’s land lines at home between the hours of 8:00 a.m. and 4:00 p.m., you would fail to get feedback from individuals who work at that time. Perhaps those who work during those hours have different preferences than those at home during those hours. For example, more business owners may be at home and express voting preferences for something completely different than members of the working class. Surveys that are poorly designed may be too lengthy, resulting in some participants quitting before they finish. Participants may not be completely honest if the questions are too personal. If the list of choices is too limited, the survey will not be able to capture valuable data that people would have provided. Many things can render survey data invalid.
Experiments can be even more problematic in terms of gathering data. If you want to test how well people retain information when exposed to loud music, a variety of factors could affect the outcomes. The experiment designer should consider whether everyone will listen to the same song, whether participants will be asked about the amount of sleep they got the night before, whether they have prior knowledge of the subject matter, how they feel about participating in the experiment, whether they use drugs or alcohol regularly, and a host of other factors that must be taken into account in order to control for outside variables.
Explaining Data
Once data has been collected, it is time to compile it in order to get a view of the entire data set. Analysts describe data in two basic ways: with images, like graphs and charts, and with figures, called descriptive analytics. Descriptive analytics are the most commonly-used methods for describing data to the general population. When used effectively, a chart or graph can easily explain volumes of data in a single snapshot.
Descriptive analytics
Data can be summarized by using descriptive analytics. Descriptive analytics are numerical representations of data that highlight the most important features of a dataset. With categorical data, wherein everything is sorted into groups (e.g., gender, ethnicity, type of currency, age group, or price range), things are usually summarized by the number of units in each category. This is referred to in terms of frequency or percentage.
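Frequency and percentage summaries of a categorical variable can be sketched with Python's collections.Counter. The variable (mode of transport) and the responses are hypothetical.

```python
from collections import Counter

# Hypothetical survey responses for one categorical variable.
responses = ["car", "bus", "car", "bike", "car", "bus", "car", "bike"]

# Frequency: the number of units in each category.
frequency = Counter(responses)

# Percentage: each category's share of all responses.
percentage = {category: 100 * count / len(responses)
              for category, count in frequency.items()}

for category in frequency:
    print(category, frequency[category], percentage[category])
```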
Numerical data consists of literal quantities or totals (e.g., height, weight, amount of money, etc.), wherein the actual numbers are meaningful. When working with numerical data, more aspects can be summarized than just the number or percentage within each category. Such elements include measures of center (i.e., the center point of the data) and measures of variation (i.e., how widely spread or how tightly clustered the data are around the center). Another consideration is a measure of the relationship between different variables.
Depending on the particular situation, certain descriptive analytics are more appropriate than others. For example, if you were to assign the codes 1 for men and 2 for women, it would not make sense to attempt to average those numbers when analyzing the data. Likewise, attempting to use a percentage to explain a singular amount of time would not be useful.
Another type of data, ordinal data, is somewhat of a combination of the first two types. Ordinal data appear in categories; however, the categories have a hierarchical order, such as rankings from 1 to 10, or student ranks of freshman through senior. This data can be analyzed the same way as categorical data. Numerical data procedures can also be used when the categories represent meaningful numbers.
Charts and Graphs
Data can be presented visually with graphs and charts. Such graphs include pie charts and bar charts, which can be used with categorical variables like gender or type of car. A bar graph might present data about attitudes using, for example, a series of five ordered bars labeled from “Strongly Disagree” through “Strongly Agree.”
Not all data, however, can be presented clearly with these types of charts. Numerical data that represent measures or totals, such as height, time, or dollars, require the types of graphs that can either summarize the numbers or group them numerically. One such graph is a histogram, which will be discussed later in this book.
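To illustrate what grouping numbers numerically means, the sketch below sorts numerical values into equal-width bins and prints a simple text histogram. The heights and the bin width of 10 are hypothetical choices made for the example.

```python
from collections import Counter

# Hypothetical heights in centimeters.
heights = [152, 158, 161, 163, 167, 168, 171, 174, 175, 182]
bin_width = 10

# Assign each value to the bin that starts at the nearest lower multiple of 10.
bins = Counter((h // bin_width) * bin_width for h in heights)

# Print one row per bin, with one mark per value in that bin.
for start in sorted(bins):
    print(f"{start}-{start + bin_width - 1}: {'#' * bins[start]}")
```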
Once the data is collected and described with pictures and numbers, it is time to begin the process of data analysis. Assuming that the study was planned well, the research question can be properly answered by applying an appropriate data analysis. As with all previous steps in the process, selecting an appropriate analytical procedure determines the usefulness of the results.
This chapter discussed the foundations of data analytics. Using mathematical techniques and scientific procedures to collect, measure, analyze, and draw conclusions from data is what data analytics is all about. The following chapter discusses the major kinds of data analyses necessary to conduct effective data analytics. You will learn the basics of calculating common descriptive analytics for measuring central tendency and variation within a set of data, as well as the analytics necessary to evaluate the relative position of a specific value within that data set.
Chapter 3: Measures of Central Tendency
The essence of data analytics is the analysis of data. Analysts use analytical procedures to make sense out of large amounts of data and their characteristics. Analytical methods can be applied to find commonalities within groups of people, which can then be used to influence the decisions that those people make. This is done all of the time by advertisers and politicians. A governmental department, for example, may want to find out the average number of people below the age of 18 who use smokeless tobacco products. Based on the results of their study, the department could propose a new requirement that smokeless tobacco products be restricted from advertising near schools. Likewise, a fashion designer might want to learn the average height and weight of U.S. women with full-time jobs. A great deal of data analytics is conducted to find averages and other measures of central characteristics among sets of data.
When investigating a total of 100 units, it can be convenient to gather the entire population and apply measurements. When dealing with larger numbers, reaching the millions or even billions, measuring the entire population is considerably more challenging. In such situations, it is necessary to take random samples from the total population and allow the sample to represent the total group. This section discusses some of the essential principles of data analytics that lay the foundation for all types of data analysis. These important concepts are: mean, median, mode, variance, and standard deviation.
Mean
The mean, or average, of a set of data is the sum of all the numbers within a group divided by the number of units in the group. The mean of a group is a representative property of the collective group. Useful assumptions can be made about an entire set of data by figuring out its mean. The formula for calculating the mean is below.
Mean = Sum of all the set elements / Number of elements
For example: (1+2+3+4+5) / 5 = 3
The mean of a data set summarizes all of the data with a single value. An analyst might want to compare the average price of houses between two different neighborhoods. In order to compare these housing prices, it would be impractical to compare the price of each individual house to the price of every other house in the study. The best way to approach this research question would be to find the mean price of houses in each of the two neighborhoods, and then compare the two means with each other. By doing this, the analyst will be able to draw a reasonable conclusion about which neighborhood has the more expensive houses.
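The neighborhood comparison can be sketched with the mean function from Python's built-in statistics module. The sale prices below are hypothetical and serve only to show the comparison of two means.

```python
from statistics import mean

# Hypothetical sale prices (in dollars) for two neighborhoods.
neighborhood_a = [210_000, 250_000, 190_000, 230_000]
neighborhood_b = [310_000, 280_000, 295_000, 335_000]

# One summary value per neighborhood replaces many house-by-house comparisons.
mean_a = mean(neighborhood_a)
mean_b = mean(neighborhood_b)

more_expensive = "A" if mean_a > mean_b else "B"
print(mean_a, mean_b, more_expensive)
```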
Median
Median is the middle number of a data set. For a set of data that is composed of an odd number of values, the value in the middle is the median. For a set of data composed of an even number of values, the average of the two middle numbers is the median. The median is commonly utilized to divide a collection of data into two separate halves.
In order to find the median of a set of data, write the numbers of the set in order from smallest to largest, count the number of units, and identify the one or two numbers in the center. This is different from calculating the mean, because the size of each value is not taken into consideration, only its position in the ordered list. Consider this set of numbers: (1, 2, 3, 4, 20):
Mean: (1+2+3+4+20) / 5 = 6
Median: (1, 2, 3, 4, 20) = 3
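The procedure for finding the median (sort, count, take the middle) can be written out as a short Python function. This is a minimal sketch for illustration; the function name is an assumption, and Python's statistics module already provides an equivalent median function.

```python
def median(values):
    """Return the middle value of a list of numbers."""
    ordered = sorted(values)          # smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd count: the single middle value
        return ordered[middle]
    # even count: the average of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([1, 2, 3, 4, 20]))
print(median([1, 2, 3, 4]))
```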
The median of a data set is important, because it is not affected by abnormal deviations in the data set. As we can see in our example, the value "20" disproportionately affects our mean, making it appear as though half of the values would be below 6 and the other half above 6. The mean, in this case, does not provide a realistic representation of the data set. If the values represented dollars per week in allowance, it would appear that the individual receives amounts that are half over and half under $6, when in fact, the person would have only once received more than $4. The median, in this case, provides us with a more accurate description of the contents of the data set. Bear in mind that this small collection of data only consists of 5 values, so it is easy to understand with a quick glance. When the data set contains hundreds of thousands of values, accurate estimations cannot be made with a quick glance.
The most significant feature of this data set is the single outlier that raises the mean. An outlier is a value that deviates markedly from the majority of the data set. For instance, if a set of data contains the values 10, 20, 30, 40, 1000, the value 1000 is considered an outlier. Outliers can move the value of the mean far from its logical central location. The mean of the above set is 1100/5 = 220 and the median is 30. The median of this set more accurately represents the data than does the mean.
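The effect of the outlier can be checked with Python's statistics module. The sketch below uses the data set from the text; the step that removes the outlier is added here for comparison only.

```python
from statistics import mean, median

values = [10, 20, 30, 40, 1000]   # the example set from the text
print(mean(values), median(values))

# Dropping the outlier shows how far it pulled the mean upward,
# while the median barely moves.
without_outlier = [v for v in values if v != 1000]
print(mean(without_outlier), median(without_outlier))
```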
Mode
In a data set, the mode is the value that occurs most frequently. Mode is a measure of central tendency, like mean and median. The mode also represents a set of data with a single value. For instance, the mode of the dataset (1,2,3,3,3,4,4,4,4,4,5,5,6,7) is 4, because it appears more often than any other value.
If a data set has a normal distribution of values, the mode is equal to the values of the median and the mean. With data distributions that are skewed (not normal), the mean, median, and mode values may all be different. In a normal distribution, the data is symmetrical about the central value, and the distribution curve is symmetrical about a vertical axis through that value. Also, in a perfectly normal distribution, half of the data values are lower than the mean, and the other half are higher.
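The mode can be found by counting how often each value occurs. The sketch below uses the data set from the text, followed by a small symmetric data set (an assumption, made up for illustration) in which the mean, median, and mode coincide.

```python
from collections import Counter
from statistics import mean, median, mode

data = [1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 6, 7]  # example set from the text

# Count the occurrences of each value and take the most frequent one.
counts = Counter(data)
most_common_value, occurrences = counts.most_common(1)[0]
print(most_common_value, occurrences)

# In a symmetric data set, all three measures of center agree.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
print(mean(symmetric), median(symmetric), mode(symmetric))
```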
Variance
It is sometimes necessary, and always helpful, to measure the variation from the mean value within a set of data. As we saw earlier, one or two outliers can result in an inaccurate representation of the data set. For example, a few very wealthy families can raise the mean family income of a mostly poor city above that of a solidly middle-class city; the large variance in the first city's income data reveals the difference. Measuring variance adds context to a standard data analysis. Below is the procedure for finding the variance:
Subtract the mean of the entire data set from each of the individual values in the data set, and square each difference so that no negative numbers remain. Add the squared differences together and divide the total by one less than the number of values.
The variance is used to calculate the Standard Deviation, which is a critical concept of data analytics.
Standard Deviation
Standard deviation is a single value that represents how widely spread the values in a data set are from the central value (mean). The more spread out a data distribution is, the greater its standard deviation. This value provides a precise measure of how widely dispersed the values are in a data set, allowing for more advanced statistical analyses. Standard deviation is derived by calculating the square root of the variance. It is a highly reliable analytical value that can be used to conduct sophisticated analytical procedures. Standard deviation is also necessary to perform probability calculations, making it that much more important to data analytics.
Step 1
Calculate the variance of the data set. This is necessary to find the standard deviation.
In our earlier example the variance was 2.5.
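The relationship between variance and standard deviation can be sketched in Python as follows. The data values are hypothetical (the data behind the earlier example is not reproduced here); they were picked only because their sample variance happens to be 2.5. Dividing by n - 1 (the sample variance) rather than by n is likewise an assumption of this sketch.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical data, chosen so that the sample variance is 2.5.
data = [1, 2, 3, 4, 5]

def sample_variance(values):
    """Average squared deviation from the mean, dividing by n - 1."""
    m = mean(values)
    squared_deviations = [(v - m) ** 2 for v in values]
    return sum(squared_deviations) / (len(values) - 1)

var = sample_variance(data)  # 10 / 4 = 2.5
sd = sqrt(var)               # the standard deviation is the square root of the variance

print(var)           # 2.5
print(round(sd, 2))  # 1.58

# The standard library gives the same results.
print(variance(data), round(stdev(data), 2))
```

Squaring the deviations keeps every term non-negative, which is why the square root is needed at the end to return to the original units.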
distribution is not very uniform and, therefore, it is less representative of the average member), we must
normalize it by calculating the Coefficient of Variation.
monthly income in the neighborhood of $5,000. If the standard deviation is low, then we may tend to consider the population generally affluent.
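The Coefficient of Variation mentioned above is commonly defined as the standard deviation divided by the mean, which makes the spread of data sets comparable. The sketch below assumes that definition. The two sets of monthly incomes are hypothetical and were constructed to share a mean of $5,000.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Standard deviation expressed as a fraction of the mean."""
    return stdev(values) / mean(values)

# Hypothetical monthly incomes, both with a mean of 5000.
uniform_incomes = [4800, 4900, 5000, 5100, 5200]
uneven_incomes = [1000, 1500, 2000, 2500, 18000]

cv_uniform = coefficient_of_variation(uniform_incomes)
cv_uneven = coefficient_of_variation(uneven_incomes)

print(round(cv_uniform, 3))  # small: the mean describes a typical member well
print(round(cv_uneven, 3))   # large: the mean is not representative
```

A low coefficient suggests that the mean is a fair description of a typical member of the group; a high one suggests that it is not.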
In a normally distributed data set, about 68% of the values are within one standard deviation of the mean. About 95% of the values are within two standard deviations of the mean, and about 99.7% of all values are within three standard deviations of the mean. Consider the statement, “Ninety-five percent of a town’s residents are between the ages of 4 and 84 years old.” If the ages are roughly normally distributed, this range extends two standard deviations on either side of the mean, so the mean is the midpoint of the range: (4 + 84) / 2 = 44. The range spans four standard deviations, so the standard deviation is (84 - 4) / 4 = 20. We can therefore expect about 68% of the citizens to be within one standard deviation of 44, that is, between 24 and 64 years old.
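The 68%, 95%, and 99.7% figures are properties of the normal distribution, and they can be reproduced with Python's standard `statistics.NormalDist` class. This is a sketch for a standard normal distribution (mean 0, standard deviation 1); the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.2%}")
```

The printed shares come out at roughly 68%, 95%, and 99.7%. For data that is not normally distributed, these percentages should not be assumed.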
Drawing Conclusions
Analysts utilize computers and formulas. However, neither computers nor formulas can detect whether they are being used to perform useful operations. Nor can these things determine the meaning or significance of the results. A common error made in analytics is to overemphasize the significance of the results, or to apply the results to the general population, when there is no logical basis for doing so. For example, a research team is studying which types of restaurants airline travelers prefer to frequent. They interview 100 travelers from the local airport and ask them to rate each restaurant from a provided list. They produce a top 5 list, and conclude that travelers like those 5 restaurants the most. However, they actually only know which ones those particular travelers like the most; they cannot draw conclusions about travelers everywhere.
Analytics is much more than just numbers. It is important for analysts to know how to draw sensible conclusions from their results.
This chapter discussed measures of central tendency and the role they play in data analytics. Analytical concepts were explained, including standard deviation, variance, relative standing, and other measures of variance. All data analysis is affected by variation and by how the values within the set of data are distributed. Normally distributed data values strengthen both the inferences that can be drawn and the predictions that can be made from statistical procedures conducted on a set of data.
Chapter 4: Charts and Graphs
This chapter presents visual ways to present data, including Pie Charts and Bar Graphs for categorical data, Time Charts for time series data, and Histograms and Boxplots for numerical data. The primary purpose of data displays is to organize and present data clearly and effectively. The reader will learn the most common types of data displays used to present both categorical and numerical data. Also discussed are caveats concerning data interpretation, and guidelines for data evaluation.
Pie Charts
Pie charts are used for categorical data. They illustrate the percentage of individuals that fall in each category. The pieces of the pie total 100%. Because the pie chart is visually straightforward, categories can clearly be compared and contrasted with each other. Budgets are typically presented with pie charts to show how money is distributed.
Total Yearly Sales by Quarter
In order to assess the accuracy of a pie chart:
• Make sure that the percentages add up to 100% or very close
• Check for pieces of the pie labeled “other” that are disproportionately big in relation to other slices
• Verify that the pie chart consists of percentages for each category and not the literal numbers in each group
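The first two checks in the list above can be automated. The following Python sketch is one possible way to do it, under stated assumptions: the chart is given as a dictionary of labels and percentages, a total within one percentage point of 100 counts as "very close", and an "Other" slice is flagged when it is larger than every named slice. The function name, thresholds, and sample data are all hypothetical.

```python
def check_pie_chart(slices, tolerance=1.0, other_label="Other"):
    """Return a list of warnings for a pie chart given as {label: percentage}."""
    warnings = []

    # Check 1: the percentages should add up to 100% or very close.
    total = sum(slices.values())
    if abs(total - 100) > tolerance:
        warnings.append(f"percentages add up to {total}, not 100")

    # Check 2: an "Other" slice should not dwarf the named categories.
    other = slices.get(other_label)
    named = [value for label, value in slices.items() if label != other_label]
    if other is not None and named and other > max(named):
        warnings.append(f"'{other_label}' slice ({other}%) is larger than every named slice")

    return warnings

# Hypothetical charts for illustration.
good_chart = {"Q1": 20, "Q2": 25, "Q3": 25, "Q4": 30}
suspect_chart = {"Brand A": 30, "Brand B": 20, "Other": 45}

print(check_pie_chart(good_chart))     # []
print(check_pie_chart(suspect_chart))  # two warnings
```

The third check, whether the slices show percentages rather than raw counts, still has to be judged by reading the chart's labels.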
Items Amount Spent
Create a Pie Chart in MS Excel
Step 1 Open a new MS Excel spreadsheet. Enter your data into two columns.
In this example, the pie chart will be created to identify the relative percentage of money spent at the grocery store. The data table includes a column with the list of grocery items and another column with the amount of money spent on each item. The process is the same whether it is for a small list of groceries or a large list of corporate transactions.
Step 2 Highlight the information that you would like to include in your pie chart. You do not have to include all of the data in your table; however, you must have at least 1 data record. Do this by clicking and dragging your mouse over the area. Be sure to include the column headings when you do this. In this example, those would be “Items” and “Amount Spent.” This way, you can include the headings in your chart.
Step 3 Click on the "Insert" menu on the tool bar along the top of the screen. Select "Chart" from the list of options. Then select the Pie Chart.
Step 4 Choose the type of pie chart you would like to make from the range of options. The pie chart options consist of a flat chart, a 3D chart, an exploded chart, a pie-of-pie chart, or a bar-of-pie chart. Each of the last two options shows a section of the chart in more detail.
If you would like to preview each pie chart, click the "Press and Hold to View Sample" button.
Step 5 Click “Enter”, and review your pie chart. To edit or modify your chart, right-click on it and select from the extensive range of options.
Percentages Spent on Groceries
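Behind the chart, each slice is simply that item's share of the total amount. The Python sketch below shows that arithmetic for a two-column table like the one described in Step 1. The grocery items and amounts are hypothetical, since the original table's values are not reproduced here.

```python
# Hypothetical "Items" / "Amount Spent" table.
spending = {
    "Produce": 24.0,
    "Dairy": 12.0,
    "Meat": 36.0,
    "Bread": 8.0,
}

total = sum(spending.values())  # 80.0

# Each slice is the item's amount as a percentage of the total.
percentages = {item: round(amount * 100 / total, 1) for item, amount in spending.items()}

for item, share in percentages.items():
    print(f"{item:<8} {share:5.1f}%")
```

The resulting shares (30%, 15%, 45%, and 10%) add up to 100%, which is the first thing to verify when assessing a pie chart.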
Bar Graphs
Bar graphs are another way to summarize categorical data. Like pie charts, bar graphs display data by category, indicating how many objects are in each group, or the percentage in each category. Analysts typically use bar graphs to compare and contrast categorical groups by drawing a separate bar for each category and displaying the resultant bars next to each other.
Average Number of Cars Sold per Month
Below is a checklist for evaluating bar graphs:
• Make sure that the units on the Y-axis are evenly spaced
• Consider units of measurement on the scale of the bar graph. Smaller scales can make minor differences appear to be huge
• If the bars represent percentages, as opposed to total numbers, look for the total number of units being summarized
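The second point in the checklist can be demonstrated with a small Python sketch that draws text bars for the same data on two different scales. The sales figures, the function name `bar_lengths`, and the baseline of 95 are hypothetical choices made for this illustration.

```python
def bar_lengths(values, baseline=0, width=40):
    """Scale each value to a bar length, measuring from the given axis baseline."""
    top = max(values.values())
    return {
        label: round((value - baseline) / (top - baseline) * width)
        for label, value in values.items()
    }

# Hypothetical monthly car sales for two dealers (a 2% difference).
sales = {"Dealer A": 98, "Dealer B": 100}

full_scale = bar_lengths(sales)                  # axis starts at 0
narrow_scale = bar_lengths(sales, baseline=95)   # axis starts at 95

for title, bars in (("Axis from 0", full_scale), ("Axis from 95", narrow_scale)):
    print(title)
    for label, length in bars.items():
        print(f"  {label} {'#' * length}")
```

On the full scale the two bars are nearly the same length (39 and 40 characters). On the narrow scale the same 2% difference produces bars of 24 and 40 characters, which makes Dealer A look far behind.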
Create a Bar Graph with MS Excel
Step 1 Create a data table with 1 independent variable. Bar graphs are horizontal visualizations that illustrate values or data from a single variable.
Include labels for the data and variable at the head of each column. If you want to graph the number of military personnel recruited in a month, you would write "Branch" at the head of the first column and "Recruited" at the head of the second column.
As an option, you could insert a third column containing a sub-data category. The Bar Graph menu allows you to choose from a standard, clustered, or stacked bar graph. The stacked bar graph displays an additional number that is related to the variable.
Step 2 Click on the "Insert" menu on the tool bar along the top of the screen. Select "Chart" from the list of options. Then select the “Bar Graph.” Click on the kind of bar graph you want from the choices available in the bar menu. Bar graph options include Cylinder, 2-D, 3-D, and Pyramid- or Cone-shaped bar graphs.