NATIONAL ECONOMICS UNIVERSITY
STATISTICS
Class: Auditing CFAB K64
Group 4: Lê Nguyễn Ngọc Hảo (11222182)
Nguyễn Diệu Anh (11220318)
Nguyễn Phương Thảo (11225911)
Pham Minh Anh (11220538)
Luc Thi VG Anh (11220288)
Nguyễn Thị Thu Hà (11221942)
Lê Thị Nhàn (11224867)
Lê Hà Phương Anh (11220234)
Lê Minh Thu (11226042)
Nguyễn Huyền Trang
TABLE OF CONTENTS
A ADVANTAGES AND DISADVANTAGES OF PRIMARY AND SECONDARY DATA
II. Reasons to choose the topic
A ADVANTAGES AND DISADVANTAGES OF PRIMARY AND SECONDARY DATA
Advantages of primary data
Primary data is collected explicitly for a specific goal, and that goal shapes the design, the method, and the data analysis techniques to be used. Primary data collectors have complete control over the process. For a survey, for example, this means that the questions are created specifically to achieve that goal. To obtain a better outcome, the questionnaire or the data collection process can also be modified or updated: if problems appear or something in the questionnaire must be clarified, changes can be made.
- Updated information:
Primary data is first-hand data and is likely to be the most recent available at the moment it is collected.
- Higher reliability and accuracy:
This follows from the greater control over the data and from the fact that primary data is newly collected. Data collected by others might contain mistakes or might even be deliberately misleading, whereas newly collected data is more likely to be accurate, and the collectors will pay closer attention to detail. Because collectors are gathering data for themselves, they will generally be truthful and reliable. However, this assumes that the collectors have enough expertise.
- The target problem is dealt with:
The people engaged in collecting the data prepare the questionnaire and sometimes interview members of the targeted group to obtain data. The problem is therefore addressed directly, so that after proper feedback it can be brought to light and resolved. In this way the program can be made productive and problems can be handled easily.
- A better understanding of data:
Data collected by someone else might be wrongly interpreted. For primary data, however, the collector is the one who best understands the data and the method used to collect it, so misunderstanding is far less likely.
- Ownership:
With the approval of the people who were surveyed, the researchers have complete control over whether the core data is made public, patented, or sold to other parties. To maintain their competitive advantage (being the first to gather and evaluate the data), the collectors can keep the information secret and inaccessible to anybody else, such as their competitors.

Advantages of secondary data
- Lower cost or free:
Most secondary sources can be accessed for free or at very low cost, which saves time and effort in addition to money. Secondary research lets you obtain data with little or no spending, in contrast to primary research, which requires you to plan and carry out a whole primary study from the outset.
- Time-saving:
As the advantage above suggests, secondary research can be performed quickly. Sometimes a few Google searches are enough to find a source of data.
- Generate new insights from previous analysis:
Re-analyzing old data can bring unexpected new understandings and points of view, or even new relevant conclusions.
- Large sample size:
Secondary data often covers large populations, providing a broader perspective and increased statistical power.
- Longitudinal analysis:
Secondary data allows you to perform longitudinal analysis, meaning studies that span a long period of time. This can help you identify trends. In addition, you can find secondary data ranging from many years back to a couple of hours ago, which allows you to compare data over time.
- Availability:
In general, secondary data sources are simple to access. Anyone can find the information gathered by other researchers, particularly on the Internet. Many sources of information are available for reference, and those who are unfamiliar with other methods of data generation can benefit from them.
Disadvantages of primary data
- Time-consuming:
It takes a lot of time to collect data from raw sources. Secondary data, by contrast, is gathered from already processed sources, which makes the process much easier.
- Higher cost and labor:
Experts need specific tools and programs, and may have to employ workers, to collect data. These are expensive tasks, and primary data collection may not be feasible on one's own. In some situations respondents to surveys or questionnaires must also be compensated. As a result, finding suitable candidates is more difficult and requires more resources, and sometimes it is impossible.
- The questionnaire must be easy and understandable:
Researchers get correct and valid feedback only if the questionnaire is easy to understand. They have to design the questionnaire, or choose a method or technique, so that people can interpret it easily; otherwise the feedback produced will be wrong or inaccurate.
- High difficulty and expertise required:
Non-experts or inexperienced persons may not always be able to apply the best technique or to create the ideal survey that will meet their goals. There is also a risk that the feedback is gathered incorrectly because inexperienced collectors used the wrong technique. After gathering the raw data, they might also want the assistance of a specialist to work with the data.

Disadvantages of secondary data
- Outdated or incomplete information:
Data provided through different sources may have been stored and managed for many years. It may therefore be outdated and no longer relevant to today's scenario.
- Lack of control over the collection process:
Secondary data might lack quality. A limitation of secondary data is that data collected over past years may be inaccurate. The source of the information may also be questionable, especially when you gather the data via the Internet. If you rely on secondary data for data-driven decision-making, you must evaluate its reliability by finding out how the information was collected and analyzed.
- Anyone can access data:
The data is not kept private by the person who owns it; anyone who wants to study the subject can access it. There is no data secrecy, but a person who accesses the data cannot contest its possession or ownership.
- Not specific to the researcher's needs:
Secondary data is not tailored to the needs of the researcher. Because of this, it may not be dependable for your present requirements. You can get a large amount of information from secondary data sources, but quantity is not always a good indicator of relevance.
- Bias:
Because secondary data is collected by someone other than you, it is often biased in favor of the person who gathered it. It might therefore not meet your requirements as a researcher or marketer.
B TYPES OF SAMPLING
I Probability sampling
Probability sampling is a sampling technique in which researchers choose samples from a larger population using a method based on the theory of probability. This sampling method considers every member of the population and forms samples based on a fixed process.
There are 5 types of probability sampling methods: simple random sampling, systematic sampling, stratified random sampling, cluster sampling, and multi-stage random sampling.
Definition
- Simple random sampling:
Simple random sampling is a method in which every member of the population has an equal chance of being selected.
- Systematic sampling:
Systematic sampling is a method that selects members of a population at regular intervals. It requires selecting a starting point for the sample and a sample size, from which a selection interval is fixed and then repeated. This type of sampling method has a predefined range, and hence it is the least time-consuming.
- Stratified random sampling:
Stratified random sampling is a method in which the researcher divides the population into smaller groups that do not overlap but together represent the entire population. While sampling, these groups can be organized, and a sample is then drawn from each group separately.
- Cluster sampling:
Cluster sampling is a method where the elements in the population are first divided into separate groups called clusters. Each element of the population belongs to one and only one cluster. A simple random sample of the clusters is then taken. This method works best when each cluster is a representative small-scale version of the entire population.
Types of cluster sampling:
+ One-stage cluster sampling
+ Two-stage cluster sampling
Steps
- Simple random sampling:
+ Step 5: Select the members that correspond to the randomly chosen numbers.
- Systematic sampling:
+ Step 4: Divide the population into groups of k individuals, where k = N/n (population size divided by sample size).
+ Step 5: Randomly select one individual from the 1st group.
+ Step 6: Select every kth individual thereafter.
- Stratified random sampling:
+ Step 1: Choose the stratifying variable.
+ Step 2: Divide the sampling frame into strata or categories.
+ Step 3: Draw a systematic or random sample of each stratum.
- Cluster sampling:
+ Step 1: Define the population.
+ Step 2: Divide your sample into clusters.
+ Step 3: Randomly select clusters.
Example
- Simple random sampling:
The instructor of the CFAB 64 audit class decided to call on any student from the class list to answer questions reviewing the previous Macroeconomics lecture. Since the class has a total of 54 students and any of them may be called, the selection is essentially random.
- Systematic sampling:
Suppose the manager of a department store wants to improve the experience of his customers by collecting feedback from them. For this purpose, he asks an employee to stand by the entrance and survey every 30th customer.
- Stratified random sampling:
Researchers want to analyze the characteristics of people of different ages, and in particular to know how their choices depend on age. They group people by age, for example: less than 10, 10-20, 20-30, 30-40.
- One-stage cluster sampling:
A bakery owner is planning to expand her business. Before that, she wants to know how many people from the neighborhood buy her bakery products. She splits the neighborhood into several areas and randomly selects customers to form cluster samples. Then she surveys every member chosen from the neighborhood for her research.
- Two-stage cluster sampling:
Suppose the management of a toy company wants to examine how all of its outlets are performing in the market. The management divides the outlets based on location and randomly selects samples to form clusters. Then they use the cluster sample to study the performance of all the outlets.
- Multi-stage Random Sampling:
+ A complex form of cluster and stratified sampling
+ Carried out in stages
+ Uses smaller and smaller sampling units at each stage
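The difference between one-stage and two-stage cluster sampling can be illustrated with a short Python sketch. It is only an illustration under assumptions: the function names and the `areas` data (6 areas of 20 households each) are hypothetical and are not taken from the bakery or toy company examples. In the one-stage version every member of a chosen cluster is kept; in the two-stage version a simple random sample is drawn inside each chosen cluster.

```python
import random


def one_stage_cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every member of each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Pick clusters at random, then draw a simple random sample inside each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }


# Hypothetical neighbourhood: 6 areas (clusters) with 20 households each.
areas = {
    f"area_{i}": [f"household_{i}_{j}" for j in range(20)] for i in range(6)
}

rng = random.Random(42)  # fixed seed so the sketch is repeatable
one_stage = one_stage_cluster_sample(areas, 2, rng)
two_stage = two_stage_cluster_sample(areas, 3, 5, rng)

print({name: len(members) for name, members in one_stage.items()})
print({name: len(members) for name, members in two_stage.items()})
```

Adding further stages (for example, sampling streets inside areas before sampling households) would turn the same idea into multi-stage sampling.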
Comparison of the probability sampling methods
Simple random sampling
Advantages
- Less knowledge required:
This type of research involves basic observation and recording skills. It requires no special knowledge of the population base or the items being researched. It also removes any classification errors that may be involved if other forms of data collection were being used.
- No additional information required except the contact information:
Researchers only need the contact information of the respondents to choose random people for the survey.
- Reduce researcher bias:
Two common approaches are used in random sampling to limit potential bias in the data. The first is the lottery method, in which a drawing from the population group determines who is included and who is not. Researchers can also assign random numbers to specific individuals and then select a random collection of those numbers to be part of the project.
- It offers an equal chance of selection for everyone within the population:
It allows everyone or everything within a defined region to have an equal chance of being selected. This helps to make the collected data more accurate, because every person or item has the same probability of being chosen. The process builds an inherent "fairness" into the research being conducted, because no previous information about the individuals or items involved is included in the data.

Disadvantages
- Identification of all members of the population can be difficult:
A simple random sample can yield an accurate statistical measure of a large population only when a complete list of that population is available. Consider a list of university students or a group of workers at a particular business. The availability of these lists is the issue, and accessing the entire list can be difficult. Some colleges or universities may not want to give researchers an exhaustive list of their faculty or students. Similarly, some businesses might not be able or willing to provide information about particular employee groups because of privacy policies.
- Time-consuming for a large population:
When a full list of a larger population is not available, those conducting simple random sampling must gather information from other sources. If publicly available, smaller subset lists can be used to recreate a full list of the larger population, but this strategy takes time to complete. Alternatively, researchers may have to obtain the full list or smaller subgroup lists from a third-party data source.
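The random-number approach to simple random sampling can be sketched in a few lines of Python. This is a minimal sketch, assuming a complete list of the population is available; the function name `simple_random_sample` and the generated student labels are hypothetical, and the class size of 54 is borrowed from the class example in this report.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical class list of 54 students (labels are made up for illustration).
class_list = [f"student_{i:02d}" for i in range(1, 55)]

picked = simple_random_sample(class_list, 5, seed=1)

# With n = 5 drawn from N = 54, each student's chance of being included is n / N.
inclusion_probability = 5 / len(class_list)

print(picked)
print(round(inclusion_probability, 4))
```

Note that the chance of selection is n/N for every member, which is what "equal chance" means here; it is only 50% in the special case where half of the population is sampled.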
Systematic sampling
Advantages
- Easy to construct, execute, compare, and understand:
The formula for choosing sample subsets is predetermined, so the only random aspect of the study is choosing the initial subject. From there, the selection process follows a fixed pattern until the desired sample group is complete. Additionally, since systematic sampling builds representative data for the overall group, researchers don't need to number each subject. This means sample selection and data analysis are quick and easy.
- Samples are evenly distributed:
Systematic sampling is highly structured, resulting in a more authentic representation of the overall population. No matter how diverse the group is, this selection process produces an evenly distributed collection of subjects. This makes the results easier to compare and analyze.
- Quick and cost-effective:
The way systematic sampling is structured makes surveys easy to create and the data easy to analyze. This type of sampling is also effective when the budget is tight, because the sample selection process is relatively straightforward, with no further research needed at the outset.

Disadvantages
- High sampling bias if periodicity exists:
If study participants deduce the sampling interval, this can bias the sample, as non-participants will differ from study participants. The same problem arises if the list itself has a periodic pattern that coincides with the sampling interval, because the sample then over-represents one kind of element.
- Greater risk of data manipulation:
There is a greater risk of data manipulation with systematic sampling, because researchers might be able to construct their systems to increase the likelihood of achieving a targeted outcome rather than letting the random data produce a representative answer. Any resulting statistics could not be trusted.
- Success relies on population count:
The effectiveness of systematic sampling depends on the initial count of the population. After all, that is the number that is divided by the desired sample size to determine the fixed interval for sample selection. When the population isn't measurable or available, researchers have to be able to make a close approximation. If the population is estimated to be smaller or larger than its actual number, this can affect the samples and produce inaccurate results.
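The dependence of systematic sampling on the population count can be seen in a small Python sketch. It is a sketch under assumptions: the function name `systematic_sample` is hypothetical, the population is assumed to be available as an ordered list, and the interval is taken as k = N // n (population size divided by sample size, rounded down). The figure of 300 customers is made up so that the interval comes out as 30.

```python
import random


def systematic_sample(population, n, seed=None):
    """Pick a random start in the first k items, then take every k-th item."""
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and the population size")
    k = N // n  # fixed sampling interval
    start = random.Random(seed).randrange(k)  # random position within the 1st group
    return [population[start + i * k] for i in range(n)]


# Hypothetical day with 300 customers, numbered in order of arrival.
customers = list(range(1, 301))

surveyed = systematic_sample(customers, 10, seed=7)  # k = 300 // 10 = 30
print(surveyed)
```

Because k is computed from N, a wrong estimate of the population size changes the interval and therefore which individuals are selected.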
Stratified random sampling
Advantages
- More accurate sample:
It can be more accurate than other sampling techniques because it divides the population into smaller groups, or strata, based on important characteristics. This allows researchers to gather more precise data and make more accurate predictions about the larger population.
- Effective representation of all subgroups:
Stratified random sampling also ensures that each stratum is represented in the sample, which helps to reduce bias and increase the accuracy of the data. This makes it an ideal technique for studying populations that are diverse and have distinct subgroups.
- Comparisons:
Stratified sampling facilitates valid comparisons between different subgroups within the population. Researchers can analyze and compare results across strata, gaining insights into similarities and variations among different segments.
- Statistical inference:
Stratified sampling assists in statistical inference by improving representativeness and reducing bias. Estimates derived from stratified samples often yield smaller sampling errors, thus enhancing the reliability and robustness of the findings.

Disadvantages
- Complex to apply at practical levels:
Stratified random sampling is a more complex and time-consuming technique than other sampling methods. This is because it requires researchers to divide the population into smaller strata and then sample from each stratum in proportion to its size. This can be a challenging task, especially for large or diverse populations.
- Increased cost and time:
Stratified sampling may involve increased costs, as researchers need to collect data from multiple strata. The process of identifying, selecting, and sampling from each stratum can also consume more time and resources compared to simpler sampling methods.
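Sampling from each stratum in proportion to its size can be sketched in Python as follows. This is only an illustrative sketch: the function name `proportional_stratified_sample` and the stratum sizes are hypothetical, while the age-group labels follow the age example used earlier in this report. Each stratum's share is rounded, so for awkward sizes the total may differ slightly from the requested n.

```python
import random


def proportional_stratified_sample(strata, n, seed=None):
    """Draw a simple random sample from every stratum, sized by its population share."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        share = round(n * len(members) / total)  # rounding can shift the total slightly
        sample[name] = rng.sample(members, min(share, len(members)))
    return sample


# Hypothetical population of 1,000 people split into non-overlapping age strata.
sizes = {"under 10": 100, "10-20": 200, "20-30": 400, "30-40": 300}
strata = {
    name: [f"{name}_person_{i}" for i in range(count)]
    for name, count in sizes.items()
}

sample = proportional_stratified_sample(strata, 50, seed=3)
allocation = {name: len(members) for name, members in sample.items()}
print(allocation)
```

Every stratum appears in the sample, which is the property that makes comparisons between subgroups possible.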
Cluster sampling
Advantages
- Convenience:
Cluster sampling simplifies the logistics of data collection. Researchers can concentrate their efforts on selected clusters, making it more convenient to reach and interview participants within those designated areas.
- Cost efficient:
Cluster sampling is generally a cost-efficient sampling process. It allows you to gather responses from a certain niche audience without having to pay for the whole sample to come from that audience (which can be expensive, depending on their criteria).
- Applicable where no complete lists of units are available:
Cluster sampling should only be considered when there are economic justifications to use this approach. If reduced costs can be used to overcome precision losses, then it can be a useful tool. This advantage occurs most often when the construction of a complete list
Disadvantages
- May not be representative of the whole population:
Cluster sampling can provide a useful dataset that applies to a large population group. However, the findings apply only to that specific demographic, so generalized findings that apply to everyone cannot be obtained with this method. One neighborhood is not reflective of an entire city, just as a single state or province isn't reflective of an entire country.
- Biased samples:
If the clusters in each sample are formed under the influence of the researchers' own biased opinions, the data obtained can easily be manipulated to convey the desired message. This distorts inferences about the entire population or demographic and creates a bias in that segment at the same time.
- Higher sampling error: