Study designs
Learning objectives
At the end of this session, participants should be able to:
- recognize and list the various types of descriptive studies
- understand the advantages and disadvantages of cross-sectional studies
- know and understand the principles of planning and implementing a case-control study
- know and understand the potential biases associated with a case-control study
- know and understand the advantages and disadvantages of a case-control study
- describe a cohort study design and indicate its strengths and weaknesses
- given a research question, design an appropriate cohort study to investigate the problem
- describe an RCT design and indicate its strengths and weaknesses
- describe potential sources of bias in RCTs
Introduction
A study may involve different study designs. Study design characteristics include the type of data (qualitative vs quantitative), the type of comparisons (with or without a control group), the type of setting or unit of analysis chosen, etc. The selection of a research strategy is therefore the core of a research design and is probably the single most important decision the investigator has to make. This section deals with the different types of epidemiological research designs.
Selection of study design
Depending on the existing state of knowledge about the problem being studied, different types of questions may be asked, and these require different study designs. Some examples are given in the following table:
Table 1: Research questions and study designs

| State of knowledge of the problem | Types of research questions | Types of study design |
|---|---|---|
| Knowing that a problem exists but knowing little about its characteristics or possible causes | What is the nature/magnitude of the problem? Who is affected? When and where? How do the affected people behave? What do they know, believe, and think about the problem? | Descriptive: case studies; cross-sectional surveys; qualitative methods |
| Suspecting that certain factors contribute to the problem | Are certain factors associated with the problem? (e.g. Is lack of school sex education related to high incidence of STD?) | Cross-sectional comparative; case-control studies; cohort studies |
| Having established that certain factors are associated with the problem, desiring to establish the extent to which a particular factor causes or contributes to the problem | What is the cause of the problem? Will the removal of a particular factor prevent or reduce the problem? (e.g. stopping Khat, stopping smoking, providing safe water) | Experimental or quasi-experimental study designs |
| Having sufficient knowledge about cause to develop and assess an intervention which would prevent, control or solve the problem | What is the effect of a particular intervention/strategy? (e.g. new drug, special educational programme) | Experimental or quasi-experimental study designs |
The type of study design chosen depends on (see examples in Table 1):
- The type of problem
- The knowledge already available about the problem
- Resources available for the study
Observational versus Experimental (Intervention) studies
Observational study design is the more common approach in public health for testing hypotheses. The investigator can only observe the occurrence of disease in people who are already segregated into groups on the basis of some exposure. In this kind of study, allocation into groups on the basis of exposure to a factor is not under the control of the investigator.
The experimental (intervention) study is an epidemiologic design that can provide data of high quality. The distinguishing characteristic of experimental study design is that the investigators themselves allocate the exposure.
Although an experiment is an important step in establishing causality, it is often neither feasible nor ethical to subject human beings to risk factors in etiological studies Therefore, experimental studies are not commonly done.
Observational studies
Observational studies are classified into two types: descriptive and analytical studies. The following sections provide detailed descriptions.
When an epidemiological study is not structured formally as an analytical or experimental study, i.e. when it is not aimed specifically at testing a hypothesis, it is called a descriptive study.
Descriptive studies characterize the occurrence and distribution of problems by time, place and person. The wealth of material obtained in most descriptive studies allows the generation of hypotheses, which can then be tested by analytical or experimental designs.
A descriptive study assesses morbidity or mortality in a population and the occurrence and distribution in population groups according to (1) characteristics of persons, (2) characteristics of place, and (3) characteristics of time.
The numbers of events (mortality or morbidity) are enumerated and the population at risk identified. Rates, ratios and proportions are calculated as measures of the probability of events. One must be careful to use the right measurements and the right ‘denominators’ when assessing these measures of probability.
The case report is the type of descriptive study that gives a detailed report of a single patient.
Classical example: In 1941 Gregg (an Australian ophthalmologist) reported a new syndrome of congenital cataract linked to rubella in the mother during pregnancy.
Clinical observations such as this can give the first clues in the identification of a new disease and the effect of an exposure.
A case series is a descriptive study that reports a series of cases of a specific condition, or a series of treated cases. These represent the numerator of disease occurrence, and should not be used to estimate risks.
Example: In the 1940s, Alton Ochsner, USA, observed that virtually all of the patients on whom he was operating for lung cancer gave a history of cigarette smoking. Based on his case series observation he hypothesized that cigarette smoking was linked with lung cancer.
In classical infectious disease epidemiology, a case series is often used as an early means of identifying the presence of an epidemic.
Ecological descriptive studies: when the unit of observation is an aggregate (e.g. family, clan or school) or an ecological unit (a village, town or country) the study becomes an ecological descriptive study.
As mentioned earlier, hypothesis testing is not generally an objective of the descriptive study. However, in some cross-sectional surveys and ecological studies some hypothesis testing may be appropriate.
Descriptive cross-sectional studies or community (population) surveys: cross-sectional studies entail the collection of data on, as the term implies, a cross-section of the population, which may comprise the whole population or a proportion (sample) of it. Many cross-sectional studies do not aim at testing a hypothesis about an association, and are thus descriptive. They provide a prevalence rate at a particular point in time (point prevalence) or over a period of time (period prevalence). The study population at risk is the denominator for these prevalence rates. Included in this type of descriptive study are surveys in which the distribution of a disease, disability, or nutritional status is assessed. This design may also be used in health systems research to describe ‘prevalence’ by certain characteristics (pattern of health service utilization and compliance) or in opinion surveys. A common procedure used in family planning and in other services is the KAP survey (survey of knowledge, attitudes and practice).
Trend studies: data may be collected at different points in time, and changes in the pattern are analyzed. Though different study subjects are studied at each time, each sample can represent the same type of population.
It should be noted that trend studies often involve a rather long period of data collection. In most cases, the same researcher does not personally collect the data used in a trend study, but instead conducts a secondary analysis of data collected over time by several other observers, or of routinely collected data.
Table 2: Advantages and disadvantages of cross-sectional studies
Advantages
- They are relatively quick and inexpensive
- Often a good first step for a cohort study
- Provide prevalence information
- Researcher has control over the selection of study subjects
- Researcher has control over the measurements used
- Can study several factors or outcomes at one time
- Often provide early clues for hypothesis generation
Disadvantages
- Do not allow the true temporal sequence of exposure and outcome to be ascertained, and are therefore unable to shed light on cause and effect associations
- Potential bias in measuring exposure
- Potential sampling and/or survivor bias
- Not feasible for rare conditions
- Do not yield incidence or true relative risk
An example of a cross-sectional study
An indigenous malaria transmission in the outskirts of Addis Ababa, Akaki Town and its environs
Adugna Woyessa, Teshome Gebre-Micheal, Ahmed Ali
Abstract
Background: In recent years malaria is becoming endemic in highland areas beyond its previously known upper limit of transmission. Assessment of the situation of the disease in such areas is necessary in order to institute appropriate control activities.
Objectives: The objectives of the study were to determine the prevalence of malaria, the parasite species involved and Anopheles species responsible in local malaria transmission.
Methods: A systematic sampling technique was used to select survey households. Blood films were collected monthly between October and December 1999 from all household members by a trained and experienced laboratory technician. Larval and adult mosquitoes were collected monthly using different methods from September 1999 to October 2000.
Results: Among 2136 examined blood films, 78 (3.7%) were malaria positive, of which 54 (69%) were due to Plasmodium vivax and 24 (31%) due to P. falciparum. Anopheles gambiae s.l. (presumably An. arabiensis) and An. christyi were the dominant man-biting species, with the former being the major vector in the area. Both these species were found to be more exophagic and active in the early evening, unlike An. pharoensis, which showed an endophagic tendency.
Conclusion: This study indicated that indigenous transmission of malaria occurs in the study area. Transmission is reckoned to be maintained by low density of vector species for short period of time under favorable conditions. Therefore, the acquisition of communal immunity is interrupted by long duration of non-malaria season leading to the occurrence of recurrent malaria epidemics. [Ethiop.J.Health Dev 2004;18(1):2-7]
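The percentages reported in the Results of this abstract can be re-derived from the counts it gives. The short Python sketch below does so; the 95% confidence interval is not reported in the abstract and is added here only as an illustration, using the simple normal (Wald) approximation, which is an assumption of this sketch.

```python
import math

def proportion_with_ci(events, total, z=1.96):
    """Return a proportion and an approximate (Wald) confidence interval."""
    p = events / total
    se = math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

# Counts taken from the abstract
examined = 2136      # blood films examined (the denominator)
positive = 78        # malaria-positive films
vivax = 54           # positives due to P. vivax
falciparum = 24      # positives due to P. falciparum

prevalence, low, high = proportion_with_ci(positive, examined)
print(f"Prevalence: {prevalence:.1%} (approx. 95% CI {low:.1%} to {high:.1%})")
print(f"P. vivax share of positives: {vivax / positive:.0%}")
print(f"P. falciparum share of positives: {falciparum / positive:.0%}")
```

Note that the denominator for the prevalence is the number of films examined, while the denominator for the species shares is the number of positive films.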
Observational studies in which the primary goal is establishing a relationship (association) between a ‘risk factor’ (etiological agent) and an outcome (disease) are termed analytical. Analytical studies always require a comparison group.
The basic approach in analytical studies is to develop a specific, testable hypothesis, and to design the study to control any extraneous variables that could potentially confound the observed relationship between the studied factor and the disease. The approach varies according to the specific strategy used, as described below for case-control and cohort studies.
Case-control study design is a design whereby people diagnosed as having a disease (cases) are compared with persons who do not have the disease (controls) to determine if the two groups differ in the proportion of persons exposed to a specific factor or factors.
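As a rough illustration of the comparison a case-control study makes, the sketch below compares the proportion exposed among cases and controls and summarizes the difference as an odds ratio, a measure commonly reported for this design. The counts are hypothetical and were chosen only for the example.

```python
def exposure_proportion(exposed, unexposed):
    """Proportion of a group (cases or controls) that was exposed."""
    return exposed / (exposed + unexposed)

def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

# Hypothetical counts, not taken from any study
exposed_cases, unexposed_cases = 40, 60
exposed_controls, unexposed_controls = 20, 80

cases_exposed = exposure_proportion(exposed_cases, unexposed_cases)
controls_exposed = exposure_proportion(exposed_controls, unexposed_controls)
estimate = odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)

print(f"Exposed among cases:    {cases_exposed:.0%}")
print(f"Exposed among controls: {controls_exposed:.0%}")
print(f"Odds ratio: {estimate:.2f}")
```

An odds ratio above 1 suggests that exposure is more common among cases than among controls; whether that reflects a real association still depends on how the cases and controls were selected and on possible confounding.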
Experimental Studies
The experimental study, or clinical trial, is an epidemiologic design that can provide data of high quality. As in a cohort study, individuals are enrolled on the basis of their exposure status; however, the distinguishing characteristic of an experimental study design is that the investigators themselves allocate the exposure.
The experimental study is the best epidemiological study design to prove causation. It can be viewed as the final or definitive step in the research process. The experimenter (investigator) has control of the subjects, the intervention, and outcome measurements, and sets the conditions under which the experiment is conducted. In particular, the investigator determines who will be exposed to the intervention and who will not. This selection is done in such a way that the comparison of outcome measures between the exposed and unexposed groups is as free of bias as possible.
In health research, we are often interested in comparative experiments, where one or more groups with specific interventions are compared with a group unexposed to interventions (clinical trials) or exposed to the best treatment currently available. The effect of the new interventions on one or more outcome variables is compared between the groups by the use of statistical procedures. Two types of comparative experiments, the randomized clinical trial (RCT) and the community intervention trial (CIT), are discussed in this section.
Figure 3: Flow chart of an experiment (the experimental (study) population is defined by inclusion/exclusion criteria)
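The key step in which the investigator allocates the exposure can be sketched as a simple random allocation of eligible participants to two arms. This is a minimal illustration only: the participant identifiers, the function name and the equal split into two arms are assumptions of this sketch, and real trials usually use more elaborate schemes (for example blocked or concealed allocation).

```python
import random

def randomize_participants(participant_ids, seed=None):
    """Randomly split eligible participants into two arms of (nearly) equal size."""
    rng = random.Random(seed)          # a seed makes the allocation reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "intervention": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }

# Hypothetical participants who met the inclusion/exclusion criteria
participants = [f"P{n:03d}" for n in range(1, 21)]
arms = randomize_participants(participants, seed=2024)

for arm, members in arms.items():
    print(arm, len(members), members)
```

Because chance alone decides the arm, neither the investigator nor the participant can influence who receives the intervention, which is what keeps the comparison between groups as free of bias as possible.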
1.6.1 The randomized clinical trial (RCT)
The most commonly encountered experiment in health science research, and the research strategy by which evidence of effectiveness is measured, is the randomized, controlled, double blind clinical trial, commonly known as the RCT. Clinical trials may be done for various purposes. Some of the common types of clinical trial (according to purpose) are:
a. prophylactic trials, e.g. immunization, contraception;
b. therapeutic trials, e.g. drug treatment, surgical procedure;
c. safety trials, e.g. side effects of oral contraceptives and injectables;
d. risk-factor trials, e.g. proving the etiology of a disease by inducing it with the putative agent in animals, or withdrawing the agent (e.g. smoking) through cessation.
Therapeutic trials may be conducted to test efficacy (e.g. does a therapeutic agent work in an ideal, controlled situation?) or to test effectiveness (e.g. after having established efficacy, if the therapy is introduced to the population at large, will it be effective when having to deal with other co-interventions, confounding, contamination, etc.?).
The intervention in a clinical trial may include:
- drugs for prevention, treatment or palliation;
- clinical devices, such as intrauterine devices;
- surgical procedures, rehabilitation procedures;
- medical counseling;
- diet, exercise, change of other lifestyle habits;
- hospital services, e.g. integrated versus non-integrated, acute vs chronic care;
- risk factors;
- communication approaches, e.g. face-to-face communication vs pamphlets;
- different categories of health personnel, e.g. doctors versus nurses;
- treatment regimens, e.g. once-a-day dispensation versus three times a day.
The major difference between Randomized Clinical Trials and Community Intervention Trials is that in the latter the randomization is done on communities rather than individuals. The classic example of a community intervention trial would be that of testing a vaccine. Some communities will be randomly assigned to receive the vaccine, while other communities will either not be vaccinated, or will be vaccinated with a placebo. Another example would be a test of whether the introduction of iron-fortified salt in the community would reduce the incidence of anemia in the community. Communities selected for entry to the study have to be as similar as possible, especially since only a small number of communities will be entered.
Very often, blinding is not possible in these types of studies, and contamination and co-interventions become serious problems. Contamination occurs when individuals from one of the experimental groups receive the intervention from the other experimental group. For example, in the study of iron-fortified salt, some of the members of the community receiving non-fortified salt might hear about the fortified salt, and may acquire it from the other community. (The reverse is also possible.) This is particularly so if the communities are geographically close.
Table 5: Advantages and disadvantages of the experimental approach
Advantages
- The ability to manipulate or assign the exposure
- The ability to randomize subjects to experimental and control groups
- The ability to control confounding and eliminate sources of spurious association
- The ability to ensure temporality
- The ability to replicate findings
Disadvantages
- Lack of reality. In most human situations, it is impossible to randomize all risk factors except those under examination
- Ethical problems. In human experimentation, people are either deliberately exposed to risk factors (in etiological studies) or treatment is deliberately withheld from cases (intervention trials)
- Difficulties in manipulating the independent variable
- Non-representativeness of samples. Many experiments are carried out on captive populations or volunteers, who are not necessarily representative of the population at large
- Experiments in hospitals (where the experimental approach is most feasible and is frequently used) suffer from several sources of selection bias
Example of a Randomized Clinical Trial
Clinical efficacy of three common treatments in acute otitis externa in primary care: randomized controlled trial
Frank A M van Balen, W Martijn Smit, Nicolaas P A Zuithoff, Theo J M Verheij
Abstract Objective: To compare the clinical efficacy of ear drops containing acetic acid, corticosteroid and acetic acid, and steroid and antibiotic in acute otitis externa in primary care
Participants: 213 adults with acute otitis externa
Main outcome measures: Primary outcome: duration of symptoms (days) according to patient diaries. Secondary outcome: cure rate according to general practitioner completed questionnaires and recurrence of symptoms between days 21 and 42.
Results: Symptoms lasted for a median of 8.0 days (95% confidence interval 7.0 to 9.0) in the acetic acid group, 7.0 days (5.8 to 8.3) in the steroid and acetic acid group, and 6.0 days (5.1 to 6.9) in the steroid and antibiotic group. The overall cure rates at seven, 14, and 21 days were 38%, 68%, and 75%, respectively.
Compared with the acetic acid group, significantly more patients were cured in the steroid and acetic acid group and steroid and antibiotic group at day 14 (odds ratio 2.4, 1.1 to 5.3, and 3.5, 1.6 to 7.7, respectively) and day 21 (5.3, 2.0 to 13.7, and 3.9, 1.7 to 9.1, respectively).
Recurrence of symptoms between days 21 and 42 occurred in 29% (50/172) of patients and was seen significantly less in the steroid and acetic acid group (0.3, 0.1 to 0.7) and steroid and antibiotic group (0.4, 0.2 to 1.0) than in the acetic acid group.
Conclusions: Ear drops containing corticosteroids are more effective than acetic acid ear drops in the treatment of acute otitis externa in primary care. Steroid and acetic acid or steroid and antibiotic ear drops are equally effective.
Do this exercise in groups. One group will then present its answer, followed by discussion.
You are asked to design an RCT to evaluate the effect of a new anti-AIDS drug. Discuss the issues involved in selecting the study sample and implementing the study.
Summary points on study designs
There are two types of epidemiological research designs:
- Observational
- Experimental: randomized clinical trials (RCTs) and community intervention trials (CITs)
The distinctive feature of the descriptive study design is that its primary concern is with description rather than with the testing of hypotheses or proving causality.
Descriptive studies include:
- Case reports
- Case series
- Cross-sectional studies or community surveys
- Ecological descriptive studies
Observational studies, where establishing a relationship (association) between a ‘risk factor’ (etiological agent) and an outcome (disease) is the primary goal, are termed analytical. In this type of study, hypothesis testing is the primary tool of inference.
Types of analytical studies:
- Case-control study
- Cohort study
Sampling Methods and Sample Size
Learning Objectives
At the end of this session participants should be able to:
- identify and define the population to be studied
- identify and describe common methods of sampling
- discuss problems of bias that should be avoided when selecting a sample
- list the factors to consider when deciding on sample size
- decide on the sampling methods and sample size most appropriate for the research design they are developing
What is sampling
Most research studies involve the observation of a sample from some predefined population of interest. The conclusions drawn from the study are often based on generalizing the results observed in the sample to the entire population from which the sample was drawn. Therefore, the accuracy of the conclusions will depend on how well the samples have been collected, and especially on how representative the sample is of the population. In this chapter, we will discuss the major issues that a researcher has to face in selecting an appropriate sample.
Sampling is a process of choosing a section of the population for observation and study.
Why sampling?
There are several reasons why samples are chosen for a study, rather than studying the entire population. First and foremost, a researcher wants to minimize the costs (financial and otherwise) of collecting the data, processing and reporting on the results. If a reasonable picture of a population can be obtained by observing only a section of it, the researcher economizes by choosing such a section of the population. Obviously, when a sample is observed, the total information will be less than if one were to observe the entire population.
A major advantage of sampling over complete enumeration is the fact that the available resources can be better spent in refining the measuring instruments and methods so that the information collected is accurate (valid and reliable). Some information, such as monitoring of the body burden of toxic metals in the population, which may require specialized equipment and staff, cannot be collected from the entire population. A sample in such cases would provide a reasonable picture of the population status.
When we draw a sample from a population we will be confronted with the following questions:
- What is the group of people (study population) from which we want to draw a sample?
- How many people do we need in our sample?
- How will these people be selected?
The study population has to be clearly defined, for example according to age, sex, and residence. Apart from persons, a study population may consist of villages, institutions, records, etc.
Example of Study Population and Study Units

| Problem | Study population | Study unit |
|---|---|---|
| Malnutrition related to weaning in District X | All children 6-24 months of age in District X | One child between 6-24 months in District X |
| High dropout rates in primary schools in District Y | All primary schools in District Y | One primary school in District Y |
| Inappropriate record-keeping for leprosy patients registered in Hospital Z | All records on leprosy patients in Hospital Z | One record on a leprosy patient registered in Hospital Z |
The primary concern in selecting an appropriate sample is that the sample should be representative of the population. Every variable of interest should ideally have the same distribution in the sample as in the population from which the sample is chosen. This requires knowledge of the variables and their distribution in the population, which of course is why we are doing the study in the first place! Therefore, it is often not possible to ensure the representativeness of the sample. However, statisticians have come up with ways in which we can give a reasonable guarantee of representativeness. We will discuss some of these methods briefly in this section.
A REPRESENTATIVE SAMPLE has all the important characteristics of the population from which it is drawn
There are two types of sampling methods: non-probability sampling (convenience and quota sampling) and probability sampling. Non-probability sampling methods are inappropriate if the aim is to measure variables and generalize findings obtained from a sample to the total study population. For this purpose probability sampling methods should be used.
Many clinic-based studies use convenience samples
CONVENIENCE SAMPLING is a method in which, for convenience's sake, the study units that happen to be available at the time of data collection are selected in the sample.
Example: A researcher wants to study the attitudes of villagers towards family planning services provided by an MCH clinic. He decides to interview all adult patients who visit the outpatient clinic during one particular day. This is more convenient than taking a random sample of people in the village, and it gives a useful first impression.
A drawback of convenience sampling is that the sample may be quite unrepresentative of the population you want to study. Some units may be over-selected, others under-selected or missed altogether. It is impossible to adjust for such a distortion: if you need to be representative you have to use another sampling method.
QUOTA SAMPLING is a method that ensures that a certain number of sample units from different categories with specific characteristics appear in the sample so that all these characteristics are represented.
In this method the investigator interviews as many people in each category of study unit as he can find until he has filled his quota.
How sampling
If a sampling frame does exist or can be compiled, probability sampling methods can be used. With these methods, each study unit has an equal or at least a known probability of being selected in the sample. The following probability sampling methods will be discussed:
- Simple random sampling
- Systematic sampling
- Stratified sampling
- Cluster sampling
- Multi-stage sampling
PROBABILITY SAMPLING involves random selection procedures to ensure that each unit of the sample is chosen on the basis of chance. All units of the study population should have an equal, or at least a known, chance of being included in the sample.
Simple random sampling is the most common and the simplest of the sampling methods. In this method, the subjects are chosen from the population with equal probability of selection. One may use a random number table (see ANNEX 1 and 2), or use techniques such as putting the names of people into a hat and selecting the appropriate number of names blindly. Computer programs can also draw simple random samples from a given population; this will be dealt with in Module 3. The simple random sample has the advantages that it is easy to administer, is representative of the population in the long run, and the analysis of data using such a sampling scheme is straightforward.
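As a small illustration of drawing a simple random sample by computer, the sketch below uses Python's standard `random` module, the electronic equivalent of drawing names from a hat. The sampling frame of 500 numbered households and the sample size of 20 are hypothetical values chosen for the example.

```python
import random

def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw units without replacement, each with equal probability of selection."""
    rng = random.Random(seed)   # fixing the seed makes the draw reproducible
    return rng.sample(list(sampling_frame), sample_size)

# Hypothetical sampling frame: a numbered list of 500 households
sampling_frame = [f"Household {n}" for n in range(1, 501)]
sample = simple_random_sample(sampling_frame, 20, seed=1)

print(len(sample), "units selected")
print(sample[:5])
```

The method requires a complete sampling frame: every unit of the study population must appear in the list exactly once for the selection probabilities to be equal.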
In SYSTEMATIC sampling individuals are chosen at regular intervals (for example every fifth) from the sampling frame. Ideally we randomly select a number to tell us where to start selecting individuals from the list.
Example: A systematic sample is to be selected from 1200 students of a school. The sample size selected is 100. The sampling fraction is:
Sampling fraction = sample size / study population = 100 / 1200 = 1/12
The sampling interval is, therefore, 12. The number of the first student to be included in the sample is chosen randomly, for example by blindly picking one out of twelve pieces of paper, numbered 1 to 12. If number 6 is picked, then every twelfth student will be included in the sample, starting with student number 6, until 100 students are selected: the numbers selected would be 6, 18, 30, 42, etc.
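The school example above can be reproduced in a few lines of Python. This is a sketch under the assumption that the students are numbered 1 to 1200 on the sampling frame; the function name is illustrative.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every k-th unit, where k = population_size // sample_size."""
    interval = population_size // sample_size
    if start is None:
        # Random start between 1 and the sampling interval
        start = random.Random(seed).randint(1, interval)
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and the sampling interval")
    return [start + i * interval for i in range(sample_size)]

# The example from the text: 1200 students, sample of 100, number 6 picked first
selected = systematic_sample(1200, 100, start=6)

print("Sampling interval:", 1200 // 100)
print("First students selected:", selected[:4])
print("Last student selected:", selected[-1])
```

With a start of 6 the selection runs 6, 18, 30, 42 and so on, ending at student 1194, so exactly 100 students are chosen.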
Systematic sampling is usually less time consuming and easier to perform than simple random sampling. However, there is a risk of bias, as the sampling interval may coincide with a systematic variation in the sampling frame. For instance, if we want to select a random sample of days on which to count clinic attendance, systematic sampling with a sampling interval of 7 days would be inappropriate, as all study days would fall on the same day of the week, which might, for example, be market day.
When the size of the sample is small and we have some information about the distribution of a particular variable (e.g. gender: 50% male, 50% female), it may be advantageous to select simple random samples from within each of the subgroups defined by that variable. By choosing half the sample from males and half from females, we ensure that the sample is representative of the population with respect to gender. When confounding is an important issue (such as in case-control studies), stratified sampling will reduce potential confounding by selecting homogeneous subgroups.
If it is important that the sample includes representative groups of study units with specific characteristics (for example, residents from urban and rural areas, or different age groups), then the sampling frame must be divided into groups or strata, according to these characteristics. Random or systematic samples of predetermined size will then have to be obtained from each group (stratum). This is called STRATIFIED SAMPLING.
Stratified sampling is only possible when we know what proportion of the study population belongs to each group we are interested in.
An advantage of stratified sampling is that we can take a relatively large sample from a small group in our study population. This allows us to get a sample that is big enough to enable us to draw valid conclusions about a relatively small group without having to collect an unnecessarily large (and hence expensive) sample of the other, larger groups. However, in doing so, we are using unequal sampling fractions, and it is important to correct for this when generalizing our findings to the whole study population.
Example: A survey is conducted on household water supply in a district comprising 20,000 households, of which 20% are urban and 80% rural. It is suspected that in urban areas the access to safe water sources is much more satisfactory. A decision is made to include 100 urban households (out of 4,000, which gives a 1 in 40 sample) and 200 rural households (out of 16,000, which gives a 1 in 80 sample). Because we know the sampling fraction for both strata, the access to safe water for all the district households can be calculated.
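The correction for unequal sampling fractions can be sketched as follows. The stratum sizes follow the water supply example, taking the rural stratum as 16,000 households (80% of 20,000); the numbers of sampled households found to have access to safe water (90 urban, 100 rural) are hypothetical and serve only to show the calculation.

```python
def weighted_proportion(strata):
    """Scale each stratum's sample result up to its population size, then combine."""
    total_households = sum(s["population"] for s in strata.values())
    estimated_with_access = sum(
        s["population"] * s["with_access"] / s["sampled"] for s in strata.values()
    )
    return estimated_with_access / total_households

strata = {
    # population and sampled follow the example; with_access is hypothetical
    "urban": {"population": 4000, "sampled": 100, "with_access": 90},
    "rural": {"population": 16000, "sampled": 200, "with_access": 100},
}

weighted = weighted_proportion(strata)
unweighted = (
    sum(s["with_access"] for s in strata.values())
    / sum(s["sampled"] for s in strata.values())
)

for name, s in strata.items():
    print(f"{name}: 1 in {s['population'] // s['sampled']} sample")
print(f"Unweighted (misleading) estimate: {unweighted:.1%}")
print(f"Weighted district estimate:       {weighted:.1%}")
```

The unweighted figure overstates access because urban households, which were sampled at twice the rural rate, count for too much; weighting each stratum by its population size removes this distortion.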
The selection of groups of study units (clusters) instead of the selection of study units individually is called CLUSTER SAMPLING.
In many administrative surveys, studies are done on large populations which may be geographically quite dispersed. To obtain the required number of subjects for the study by a simple random sample method will require large costs and will be inconvenient. In such cases, clusters may be identified (e.g. households) and random samples of clusters will be included in the study; then every member of the cluster will also be part of the study. This introduces two types of variation in the data (between clusters and within clusters), and this will have to be taken into account when analyzing data.
Example: In a study of knowledge, attitudes, and practices related to family planning in rural communities of a region, a list is made of all the villages. Using this list, a random sample of villages is chosen and all the adults in the selected villages are interviewed.
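The village example can be sketched in code as below. The list of ten villages with five adults each is hypothetical, as is the choice of three villages; the point is only that villages are sampled at random and every adult in a chosen village is then included.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters; every unit in a chosen cluster is included."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical sampling frame of clusters: 10 villages with 5 adults each
villages = {
    f"Village {i}": [f"Adult {i}-{j}" for j in range(1, 6)]
    for i in range(1, 11)
}

selected = cluster_sample(villages, 3, seed=7)
respondents = [adult for adults in selected.values() for adult in adults]

print("Villages chosen:", sorted(selected))
print("Adults to interview:", len(respondents))
```

Only a list of villages is needed to draw the sample; a list of individual adults is required only for the villages that are actually selected.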
Many studies, especially large nationwide surveys, will incorporate different sampling methods for different groups, and may be done in several stages. In experiments, or common epidemiological studies such as case-control or cohort studies, this is not a common practice.
Example: In a study of utilization of pit latrines in a district, 150 homesteads are to be visited for interviews with family members as well as for observations on types and cleanliness of latrines. The district is composed of six wards and each ward has between six and nine villages. The following four-stage sampling procedure could be performed:
1.Select three wards out of the six by simple random sampling
2.For each ward, select five villages by simple random sampling (15 villages in total)
3.For each village select ten households Because simply choosing households in the center of the village would produce a biased sample, the following systematic sampling procedure is proposed: ¨ Go to the center of the village ¨ Choose a direction in random way: spin a bottle on the ground and choose the direction the bottleneck indicates ¨ Walk in the chosen direction and select every third or every fifth household (depending on the size of the village) until you have the ten you need If you reach the boundary of the village and you still do not have ten households return to the center of the village, walk in the opposite direction and continue to select your sample in the same way until you get ten If there is nobody in a chosen household, take the next nearest one
4.Decide beforehand whom to interview (for example the head of the household, if present, or the oldest adult who lives there and who is available.)
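The first three stages of this procedure can be sketched as below. The sampling frame (ward and village names, 60 households per village) is hypothetical, and the walk through the village is simplified to an ordered list of households along one route; the fourth stage (whom to interview) is a fieldwork rule and is not modelled.

```python
import random

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# HYPOTHETICAL frame: 6 wards, each with 6 to 9 villages of 60 households.
frame = {
    f"ward_{w}": {
        f"w{w}_village_{v}": [f"w{w}v{v}_hh{h}" for h in range(1, 61)]
        for v in range(1, rng.randint(6, 9) + 1)
    }
    for w in range(1, 7)
}


def every_kth(units, k, n):
    """Systematic selection: take every k-th unit along the route, n in all."""
    picked = units[k - 1::k][:n]
    if len(picked) < n:
        raise ValueError("route too short for this sampling interval")
    return picked


def multistage_sample(frame, n_wards=3, n_villages=5, n_households=10, k=5):
    sample = []
    for ward in rng.sample(sorted(frame), n_wards):                # stage 1
        villages = frame[ward]
        for village in rng.sample(sorted(villages), n_villages):   # stage 2
            sample.extend(every_kth(villages[village], k, n_households))  # stage 3
    return sample


households = multistage_sample(frame)
print(len(households))  # 3 wards x 5 villages x 10 households = 150
```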
Table 6: The main advantages and disadvantages of cluster and multi-stage sampling
Advantages
¨ A sampling frame of individual units is not required for the whole population. Initially a sampling frame of clusters is sufficient. Only within the clusters that are finally selected do we need to list and sample the individual units.
¨ The sample is easier to select than a simple random sample of similar size, because the individual units in the sample are physically together in groups instead of scattered all over the study population.
Disadvantages
¨ Compared to simple random sampling, there is a larger probability that the final sample will not be representative of the total study population. The likelihood of the sample not being representative depends mainly on the number of clusters selected in the first stage. The larger the number of clusters, the greater the likelihood that the sample will be representative.
The main determinant of the sample size is how accurate the results need to be
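How strongly the required accuracy drives the sample size can be illustrated with a small sketch. The formula used here, n = z² p(1 − p) / d² for estimating a single proportion by simple random sampling, is a common textbook formula that is not given in this section; it is shown only as an assumption about how such a calculation might be done, and the function name is hypothetical.

```python
import math


def sample_size_for_proportion(p, margin, z=1.96):
    """Approximate sample size to estimate a proportion p within +/- margin.

    Assumes simple random sampling and a 95% confidence level (z = 1.96).
    """
    if not 0 < p < 1 or margin <= 0:
        raise ValueError("p must be between 0 and 1 and margin must be positive")
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Halving the acceptable margin of error roughly quadruples the sample size.
print(sample_size_for_proportion(0.5, 0.10))
print(sample_size_for_proportion(0.5, 0.05))
```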
Learning objectives
At the end of this section participants should be able to: ¨ identify the sources of health data in the community where they work, ¨ describe various data collection techniques and state their uses and limitations ¨ identify the limitations and strength of routine data sources ¨ state the benefits of using a combination of different data collection techniques ¨ state various sources of bias in data collection and ways of preventing bias ¨ promote the collection of accurate data by members of their health team
Data collection techniques allow us to systematically collect information about our subjects of study and about the settings in which they occur
In the collection of data, we have to be systematic. If data are collected haphazardly, it will be difficult to answer research questions in any conclusive way.
Data collection techniques
¨ Using available information (record review) ¨ Observing ¨ Interviewing ¨ Administering written questionnaires ¨ Focus group discussions ¨ Other data collection techniques
There is a large amount of data that has already been collected by others. Locating these sources and retrieving the information is a good starting point in any data collection effort. Some sources of such data are listed below: ¨ Mortality reports ¨ Morbidity reports ¨ Epidemic reports ¨ Reports of laboratory utilization (including laboratory test results) ¨ Reports of individual case investigations ¨ Reports of epidemic investigations ¨ Special surveys (e.g., hospital admissions, disease registers, and serologic surveys) ¨ Demographic data
Analysis of health services data, census data, unpublished reports, and publications in libraries or in offices at the various levels of health and health related services may be a study in itself. In order to retrieve the data from available sources, the researcher will have to design an instrument such as a checklist or compilation sheet. In designing such instruments, it is important to inspect the layout of the source documents from which the data are to be extracted and to design the data compilation sheet so that the items of data can be transferred in the order in which they appear in the source document. This will save time and reduce error.
The assessment of the health status of the community is the basis for planning and evaluation of the health services. Useful information needed for making decisions can often be obtained from routinely available data, even though these are not accurate or complete enough for detailed or elaborate analysis. We shall consider in this section what information you can obtain on the frequency and distribution of morbidity, mortality and their causes from routine sources.
Now do Exercise 3.1 below on the uses and limitations of routine data (Can be done in a group or individually, general discussion at the end)
What do you think are the uses and limitations of the hospital-related sources of information shown in the table below? Think in terms of the people served and the levels of health care provided and then write down your ideas in the spaces provided in the table. When you have done this, turn over the page and compare with the explanation provided on the next page.
Source of data                          Uses          Limitations
Health center and hospital returns
In-patient and outpatient records
Immunization reports
Health center and hospital returns: health center and hospital returns are likely to be accurate with respect to disease diagnosis, but the data may only relate to the area served by the hospital. Time-based data, such as length of stay, and organizational information, such as staffing or the distance patients travel to the hospital, can also be used.
In-patient and outpatient records: Analysis of hospital records can provide high quality information on the most important causes of major illness in a community. But to be useful as an indicator of the health status of the population, you must make allowances for the fact that patients treated in hospital are not representative of the general population in the area. People from remote areas, infants and the elderly, for example, will be under-represented. In some countries many, if not most, seriously ill patients never reach hospital.
Outpatient records: the records of patients seen in hospitals, health centers, health posts and clinics often provide much ill-defined data. Diagnostic data are usually given in terms of the chief complaint. Those coming for immunizations or other preventive services may be included with those who come because of illness. The patients who are seen are again probably not representative of the general population: although coverage of the population may be greater than with a hospital because of greater geographical distribution, the people who live near a facility or who can afford the time to come will be over-represented. However, these records do provide information about the usage of outpatient facilities and the most frequent complaints, and may help you to understand the pattern of disease in your community.
Immunization: it is useful to compare the number of births with the number of children immunized; this can give an indication of the coverage of any immunization programme.
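The comparison described here amounts to a simple ratio. A minimal sketch, using hypothetical figures (1,200 registered births and 900 children immunized) that are not taken from any real report:

```python
def immunization_coverage(children_immunized, births):
    """Crude coverage: children immunized as a percentage of registered births."""
    if births <= 0:
        raise ValueError("number of births must be positive")
    return 100.0 * children_immunized / births


# HYPOTHETICAL figures for one year in one catchment area.
coverage = immunization_coverage(children_immunized=900, births=1200)
print(f"coverage: {coverage:.1f}%")  # coverage: 75.0%
```

As with other routine data, the result is only as good as the birth registration used as the denominator.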
Childhood diseases: MCH clinics are one of the best sources of data on childhood diseases such as measles and malnutrition and, over a period of months or years, are reasonably accurate. MCH records alone are not enough, as they are only a source of data on births and on deaths in children under five years. Use other sources of data to obtain a more representative picture. MCH records can also be used to measure the workload of the MCH workers.
Routine data ¨ Fail to include a great deal of important illness and disability. In particular, much of the chronic illness due to tropical diseases such as schistosomiasis, leprosy, blindness, undernutrition and crippling due to birth trauma or polio will not be detected from routine records ¨ Relate only to numerator data
OBSERVATION is a technique which involves systematically selecting, watching and recording behaviors and characteristics of living beings, objects or phenomena
Observation of human behaviors is a much used data collection technique. It can be undertaken in two different ways: ¨ Participant observation: the observer takes part in the situation he or she observes ¨ Non-participant observation: the observer watches the situation, openly or concealed, but does not participate
Observations are usually complementary to other data collection techniques. They can give additional, more accurate information on the behavior of people than interviews or questionnaires: questionnaires may be incomplete because we forget to ask certain questions, and informants may forget or be unwilling to mention certain things. Observations can therefore check on information collected (especially on sensitive topics such as alcohol or drug use, or stigmatization of leprosy, TB, epilepsy or AIDS patients). Observation can also be a primary source of information.
Observations of human behaviors can form part of any type of study, but as they are time consuming they are most often used in small-scale studies. Observations can also be made on objects. For example, the presence or absence of a latrine and its state of cleanliness may be observed.
An INTERVIEW is a data collection technique that involves oral questioning of respondents, either individually or as a group
Answers to the questions posed during an interview can be recorded by writing them down. Interviews can be conducted with varying degrees of flexibility. The two extremes, high and low degree of flexibility, are described below:
a) High degree of flexibility:
An unstructured or loosely structured method of asking questions can be used for interviewing individuals as well as groups of key informants
A flexible method of interviewing is useful if a researcher has as yet little understanding of the problem or situation under investigation. It is frequently applied in exploratory studies and also used during case studies.
Example: Interviews using an interview schedule, to ensure that all issues are discussed, but allowing flexibility in timing and the order in which the questions are asked. The interviewer may ask additional questions on the spot in order to gain as much useful information as possible. Questions are open ended: the respondent is unrestricted in what and how he or she answers.
b) Low degree of flexibility:
Less flexible methods of interviewing are useful when the researcher is relatively knowledgeable about expected answers or when the number of respondents being interviewed is relatively large
Example: Interviews using a questionnaire with a fixed list of questions in a standard sequence, which have mainly fixed or pre-categorized answers
A SELF-ADMINISTERED QUESTIONNAIRE is a data collection tool in which written questions are presented to be answered by the respondents in written form
Bias in Information Collection and its possible causes
BIAS in information collection is a distortion which results in the information not being representative of the true situation
Bias in information collection can occur as a result of:
Defective instruments, for example weighing scales which are not standardized, or questionnaires with ¨ fixed or closed questions on topics about which too little is known; ¨ open ended questions without guidelines on how to ask (or to answer) them; ¨ vaguely phrased questions; or ¨ questions placed in an illogical order
These sources of bias can be prevented by carefully planning the data collection process and by pre-testing the data collection tools
Observer bias can easily occur when conducting observation or utilizing loosely structured group or individual interviews. There is a risk that the data collector will only see or hear things in which he or she is interested or will miss information that is critical to the research. Observation protocols and guidelines for conducting loosely structured interviews should be prepared, and training and practice should be provided to data collectors in using both these tools. Moreover, it is highly recommended that data collectors work in pairs when using flexible research techniques and discuss and interpret the data immediately after collecting it.
If a large proportion of the population under study refuses to cooperate (non-response) or if the sampling procedure used in the study is not adequate, this results in selection bias. This type of bias affects the representativeness of the study and will be discussed at length in other sections.
Information bias may occur while abstracting information from records or statistics. Many times, medical records are incomplete or incomprehensible. This poses some problems if you want to use these records in your research.
Another example of information bias is called recall (or memory) bias. This form of bias is related to the inconsistencies in the memory of informants.
3.3.5 Effect of the Interview(er) on the Informant
This is a possible factor in all interview situations. The informant may mistrust the intention of the interview and move away from certain questions or give misleading answers. Such bias can be reduced by adequately introducing the purpose of the study to informants, by taking sufficient time for the interview, and by assuring informants that the data collected will remain confidential.
It is also important to be careful in the selection of interviewers. In a study soliciting the reasons for the low utilization of a local health service, for example, one should not ask health workers of the health center concerned to interview the population. Their use as interviewers would certainly influence the results of the study.
By being aware of these potential biases it is possible, to a certain extent, to prevent them. If the researcher does not fully succeed, it is important to report honestly in what ways the data may be biased.
Importance of Combining Different Data Collection Techniques
Different data collection techniques can complement each other. A skillful use of a combination of different techniques can maximize the quality of the data collected and reduce the chance of bias.
Example: To determine the extent of the malnutrition problem in your area, you could make use of: ¨ Growth charts and the existing health center records of malnourished children in the area; ¨ Focus group discussions (FGDs) with several groups of mothers and/or in-depth interviews with a small group of mothers to find out how they feed their young children; ¨ A household survey, testing the relevant findings of the exploratory study on a larger scale
Exercise 3.2: (Can be done in a group or individually, general discussion at the end)
Name three ways in which data about the age of individuals could be inaccurate. Answer this based on your experience. Classify your answers into those inaccuracies that may be due to the people questioned or due to the observer.
Possible Sources of inaccuracy about age
People ¨ In areas where older age is highly respected, people will add to their age ¨ Where there is no tradition for counting age by years, events have to be used. For adults and children, therefore, the date of their birth, or their marriage, or their first child’s birth has to be related to an event
Observer ¨ An inaccurate observer may round off an age to the nearest five years or may routinely suggest ‘40’ or ‘25’ years. Therefore, the ages which are registered in this way are of very little value
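Rounding of ages by an observer leaves a trace in the recorded data that can be checked. The sketch below simply counts what share of recorded ages end in 0 or 5; if ages were recorded accurately, roughly one age in five would be expected to do so. The list of ages is hypothetical, and this simple check is an illustration added here, not a method prescribed by the text.

```python
def share_ending_in_0_or_5(ages):
    """Fraction of recorded ages (whole years) whose last digit is 0 or 5."""
    if not ages:
        raise ValueError("no ages recorded")
    return sum(1 for age in ages if age % 5 == 0) / len(ages)


# HYPOTHETICAL register entries, invented for illustration only.
recorded_ages = [25, 40, 40, 33, 25, 50, 40, 27, 25, 60, 45, 40]

share = share_ending_in_0_or_5(recorded_ages)
print(f"{share:.0%} of recorded ages end in 0 or 5")
```

A share far above 20%, as in this invented register, suggests that ages were rounded or guessed rather than measured.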
Exercise 3.3 (Can be done in a group or individually, general discussion at the end)
What could you do in your area to promote the collection of reliable data? Write down your own ideas before reading the suggestions on the following page.
There are ways in which you can make the data collected more reliable:
Training: train all the members of your health team/data collectors to collect accurate data, to avoid bias and to record carefully, and emphasize that you will check the accuracy of their work
Use of different sources: take the information from a number of different sources. If you then compare the data from the different sources, you may well be able to identify inconsistencies and thus inaccuracies
Pre-testing: pre-testing is a try out of the questionnaire. Pre-testing is carried out on a small number of respondents who are comparable to the sample of respondents but are not part of it
Supervision: regular supervision during the data collection process
Few data are wholly accurate. The degree of inaccuracy that can be tolerated cannot be expressed in figures. Therefore, ¨ Be clear about the data you really want ¨ Decide how this can be collected most accurately ¨ Explain the reasons and methods carefully to the members of your team ¨ Check from time to time on the data which are being collected ¨ Let your colleagues and field workers know how the data they have collected has helped your work
When everyone is ready, one group will present the answers and there will be a discussion
Two health-related problems for which studies must be developed are described below For each problem you are asked to state: ¨ what type(s) of study you would propose ¨ from whom (or from what) you would collect the data required for each study; and ¨ what data collection techniques you would use
1. In your region you have noticed an increase in the defaulter rate from tuberculosis treatment. You decide to study the reason and you also want to know the local socio-cultural aspects of the disease to improve the treatment outcome of tuberculosis in your region.
2. You have recently been appointed as a Woreda Research Officer in a remote Woreda of your region. The government wants to improve health services in this area. You want to collect information that will contribute to the development of the plan.
Summary points on Data collection
¨ The following are the methods of data collection:
§ Using available information (records)
§ Observing
§ Interviewing
§ Administering written questionnaires
§ Focus group discussions
¨ BIAS in information collection is a distortion, which results in the information not being representative of the true situation
¨ Possible sources of bias during data collection:
§ Defective instruments
§ Observer bias
§ Selection bias
§ Information bias
§ Effect of the interview on the informant
¨ Data collection can be improved by:
§ Training of data collectors
§ Pre-testing the questionnaire
§ Supervision
§ Use of different sources for comparison
Chapter 4 Variables and Measurement Errors
Learning objectives
At the end of this course participants should be able to: ¨ define what variables are and describe why their selection is important in research ¨ state the difference between numerical and categorical variables ¨ discuss the difference between dependent and independent variables and how they are used in designing research ¨ identify the variables that will be measured in the research project you are designing and ¨ develop operational definitions with indicators for those variables that cannot be measured directly.
What is a variable?
A variable is a measurable characteristic of a person, object or phenomenon, which can take on different values
A simple example of a variable is “a person’s age”. The variable "age" is measurable and can take on different values since a person can be 20 years old, 35 years old and so on. Other examples of numerical variables are: § weight (expressed in kilograms or in pounds); § distance between homes and clinic (expressed in kilometers or in minutes walking distance); and § monthly income (expressed in birr, dollars).
The different values of a variable may also be expressed in categories. For example, the variable sex has two values, male and female, which are distinct categories. Other examples of categorical variables are:
Table 9: Examples of categorical variables
Variable: Categories
Color: red, blue, green, etc.
Outcome of disease: recovery, chronic illness, death
Main type of staple food eaten: maize, millet, rice, cassava, etc.
Operationalizing variables by choosing appropriate indicators
For some variables it is sometimes not possible to find meaningful categories unless the variables are made operational with one or more precise INDICATORS. Operationalizing variables means that you make them “measurable”
1. You want to determine the level of knowledge concerning a specific issue in order to find out to what extent the factor “poor knowledge” influences the problem under study (for example low utilization of a VCT programme by high school students).
The variable level of knowledge cannot be measured as such. You would need to develop a series of questions to assess students' knowledge, for example on risk factors related to acquiring HIV/AIDS. The answers to these questions form an indicator of someone’s knowledge on this issue, which can then be categorized. If 10 questions were asked, you might decide that the knowledge of those with ¨ 0 to 3 correct answers is poor ¨ 4 to 6 correct answers is reasonable, and ¨ 7 to 10 correct answers is good
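The categorization of a 10-question knowledge score described above can be written down as a small rule. The function name is hypothetical; the cut-off points are the ones given in the example.

```python
def knowledge_level(correct_answers):
    """Turn a score out of 10 into the categories used in the example."""
    if not 0 <= correct_answers <= 10:
        raise ValueError("score must be between 0 and 10")
    if correct_answers <= 3:
        return "poor"
    if correct_answers <= 6:
        return "reasonable"
    return "good"


for score in (2, 5, 9):
    print(score, "correct answers ->", knowledge_level(score))
```

Writing the rule out in this way forces the cut-off points to be decided before data collection, which is part of making the variable operational.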
2. You want to determine the nutritional status of under 5 year olds. You need to choose appropriate indicators for the variable “nutritional status”. Widely used indicators for nutritional status include: ¨ Weight for age; ¨ Weight for height; ¨ Height for age; and ¨ Upper-arm circumference.
For the classification of nutritional status, internationally accepted categories already exist, which are based on standard growth curves. For the indicator “weight/age”, for example, children are: ¨ Well nourished if they are above 80% of the standard ¨ Moderately malnourished if they are between 60% and 80% ¨ Severely malnourished if they are below 60%
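The weight-for-age classification can be sketched in the same way. The standard weight for a child's age would come from a growth reference chart; the values passed in below are hypothetical. The text does not say how a child at exactly 80% or exactly 60% is classified, so treating both boundary values as "moderately malnourished" is an assumption of this sketch.

```python
def nutritional_status(weight_kg, standard_weight_kg):
    """Classify weight-for-age as a percentage of the standard weight.

    ASSUMPTION: exactly 80% and exactly 60% are counted as moderate,
    because the source text leaves the boundary values undefined.
    """
    if weight_kg <= 0 or standard_weight_kg <= 0:
        raise ValueError("weights must be positive")
    percent = 100.0 * weight_kg / standard_weight_kg
    if percent > 80:
        return "well nourished"
    if percent >= 60:
        return "moderately malnourished"
    return "severely malnourished"


# HYPOTHETICAL children, all compared against a 10 kg standard for their age.
for weight in (9.0, 7.0, 5.0):
    print(weight, "kg ->", nutritional_status(weight, 10.0))
```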
Defining variables and indicators of variables
To ensure that everyone (the researcher, the data collectors, and eventually, the reader of the research report) understands exactly what has been measured and to ensure that there will be consistency in measurement, it is necessary to clearly define the variables (and indicators of variables). For example, to define the indicator “waiting time” it is necessary to decide what will be considered the starting point of the “waiting period”, e.g. is it when the patient enters the front door, or when he has been registered and obtained his card?
For certain variables, it may not be possible to adequately define the variable or the indicator immediately because further information may be needed for this purpose. The researcher may need to review the literature to find out what definitions have been used by other researchers, so that he can standardize his definitions and thus be able later to easily compare his findings with those of the other studies. In some cases the opinions of “experts” or of community members or health care providers may be needed in order to define the variable or indicator.
Because in health research you often look for causal explanations, it is important to make a distinction between dependent and independent variables
The variable that is used to describe or measure the problem under study is called the DEPENDENT variable. This is also known as the outcome variable.
The variables that are used to describe or measure the factors that are assumed to cause or at least influence the problem are called the INDEPENDENT variables. These are also known as exposure variables.
For example, in a study of the relationship between use of prophylactic Isoniazid (INH) treatment and tuberculosis, “development of clinical TB” (with the values yes, no) would be the dependent variable and “prophylactic INH” the independent variable
Whether a variable is dependent or independent is determined by the statement of the problem and objectives of the study. It is therefore important when designing a study to clearly state which variable is the dependent and which are the independent variables
A variable that is associated with the problem and with possible cause of the problem is a potential confounding variable
Confounding is a mixing of the effect of the exposure under study on the disease with that of a third factor. This third factor must be associated with the exposure and, independent of that exposure, be a risk factor for the disease
A confounding variable may either strengthen or weaken the apparent relationship between the problem and possible cause
Therefore, in order to give a true picture of cause and effect, the confounding variables must be considered, either during planning or while doing data analysis
For example: A relationship is shown between the low level of the mother’s education and malnutrition in under 5 children. However, family income is related to the mother’s education as well as to malnutrition.
Family income is therefore a potential confounding variable. In order to give a true picture of the relationship between mother’s education and malnutrition, the family income should also be considered and measured. This could either be incorporated into the research design, such as by selecting only mothers with a specific level of family income, or it can be taken into account in the analysis of the findings, with mother’s education and malnutrition among their children being analyzed for families with different categories of income
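Analyzing the relationship separately for each category of income can be sketched as follows. All counts are hypothetical and were invented so that the effect is easy to see: overall, children of mothers with low education appear to have about twice the risk of malnutrition, yet within each income group the risk is the same for both education levels. In this invented data set the crude association is therefore explained entirely by income.

```python
# (income group, mother's education) -> (malnourished children, all children)
# HYPOTHETICAL counts invented to illustrate confounding; not study data.
counts = {
    ("low income", "low education"): (32, 80),
    ("low income", "high education"): (8, 20),
    ("high income", "low education"): (2, 20),
    ("high income", "high education"): (8, 80),
}


def risk(education, income=None):
    """Proportion malnourished for one education level.

    If income is given, only that income group is used (stratified analysis);
    otherwise all income groups are pooled (crude analysis).
    """
    cases = total = 0
    for (inc, edu), (c, n) in counts.items():
        if edu == education and income in (None, inc):
            cases += c
            total += n
    return cases / total


def risk_ratio(income=None):
    """Risk in the low education group divided by risk in the high education group."""
    return risk("low education", income) / risk("high education", income)


print("crude risk ratio:", round(risk_ratio(), 1))
for income in ("low income", "high income"):
    print(income, "risk ratio:", round(risk_ratio(income), 1))
```

In real data the stratum-specific ratios rarely equal 1 exactly; the comparison of crude and stratified results is what indicates whether confounding is present.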
4.5 What is validity and reliability?
Two common sources of error that need to be controlled arise from problems with ‘reliability’ and ‘validity’. Our inference should have high reliability (if the observations are repeated under similar conditions, the inferences should be similar) and high validity (the inference should be a reflection of the true nature of the relationship)
Reliability of measurements: If repeated measurements of a characteristic in the same individual under identical conditions produce similar results, we would say that the measurement is reliable
A study result is said to be reliable if the same result is obtained when the study is repeated under the same conditions
Reliability is often closely related to the matter of validity, but refers to the repeatability of scientific observations. If the same set of door-to-door interviews on respondents' sexual behavior produces approximately the same set of responses on repeated trials and with different interviewers, we can say that this observational technique has high reliability, regardless of the validity of the findings
A measurement is said to be valid if it measures what it is supposed to measure
Thus, if we use a scale that is not calibrated to zero, the weights we obtain using this scale will not be valid
Validity refers to the degree to which scientific observations actually measure or record what they allege to measure. Door-to-door interviewing about intimate details of respondents' sexual behavior might produce a lot of answers duly recorded in interviewers’ notebooks, but we would seriously doubt that the answers were an accurate representation of actual behaviors. Thus, such interviewing on sensitive subjects generally lacks validity
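The difference between the two ideas can be illustrated with the weighing scale mentioned above. In this sketch the readings are hypothetical: they come from an imagined scale that starts at about 2 kg instead of zero, used repeatedly on a person whose true weight is taken to be 60 kg. The readings agree closely with one another (reliable) but are all too high (not valid).

```python
import statistics

# HYPOTHETICAL values: a person of known weight measured five times
# on a scale that was not calibrated to zero.
true_weight_kg = 60.0
readings_kg = [62.1, 61.9, 62.0, 62.0, 62.1]

# Reliability: how much do repeated measurements differ from each other?
spread = statistics.pstdev(readings_kg)

# Validity: how far is the average reading from the true value?
bias = statistics.mean(readings_kg) - true_weight_kg

print(f"spread of repeated readings: {spread:.2f} kg")
print(f"average error (bias): {bias:.2f} kg")
```

A small spread together with a large bias is the pattern of a measurement that is reliable but not valid.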
Exercise 4 is also a group exercise. When everyone is ready, one group will present and there will be a discussion.
1. A health researcher believes that in a certain region, anemia, malaria and malnutrition are serious problems among adult males and, in particular, among farmers. He wishes to study the prevalence of these diseases among adult males of various ages, occupational groups and educational backgrounds to determine how serious a problem these diseases are for this population. ¨ What are the dependent and independent variables in the study? ¨ Which of these are categorical and which are numerical variables?
2. A zonal health manager receives a complaint from a particular woreda that one health centre often runs out of anti TB drugs. In a preliminary investigation, this shortage of anti TB drugs is confirmed. The zonal manager decides to investigate why there is a shortage of anti TB drugs in the health centre. ¨ What is the dependent variable in the study? ¨ What would be a meaningful indicator for the dependent variable? ¨ How would you define shortage of anti TB drugs? ¨ Can you think of some independent variables? ¨ Which variables are measurable as they are and which ones need indicators?
3. Look at the following description of a research problem and then answer the questions that follow:
In a study concerning the patterns of distribution of schistosomiasis in the adult population of a village community, a researcher found that the adults were predominantly farmers and that overall, 20% of them had schistosomiasis. The researcher believed that the prevalence of the disease was moderately low in the adult population. ¨ Are there any variables whose inclusion in the study might have shown that the prevalence of the disease varied greatly among different categories of adults in the village?
Summary points on Variables and Measurement
¨ A variable is a measurable characteristic of a person, object or phenomenon, which can take on different values.
¨ For some variables it is not possible to find meaningful categories unless the variables are made operational with one or more precise INDICATORS. Operationalizing variables means that you make them “measurable”.
¨ To ensure that everyone (the researcher, the data collectors, and eventually, the reader of the research report) understands exactly what has been measured, and to ensure that there will be consistency in the measurement, it is necessary to clearly define the variables (and indicators of variables).
¨ The variable that is used to describe or measure the problem under study is called the DEPENDENT variable (outcome variable).
¨ The variables that are used to describe or measure the factors that are assumed to cause or at least influence the problem are called the INDEPENDENT variables (exposure variables).
¨ A variable that is associated with the problem and with a possible cause of the problem is a potential confounding variable.
¨ A confounding variable may either strengthen or weaken the apparent relationship between the problem and a possible cause. Therefore, in order to give a true picture of cause and effect, the confounding variables must be considered, either during planning or during data analysis.
¨ VALIDITY refers to the degree to which scientific observations actually measure or record what they purport to measure.
¨ RELIABILITY is often closely related to the matter of validity, but refers to the repeatability of scientific observations.
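How a confounding variable can strengthen an apparent relationship may be easier to see with numbers. The following Python sketch uses entirely hypothetical counts: the exposure, the outcome, the third variable (smoking) and all figures are invented for illustration, and the helper names `risk`, `risk_ratio` and `pooled` are assumptions of this example. Smoking is linked both to the exposure and to the disease, so the crude comparison suggests an effect that disappears once the data are examined within each smoking stratum.

```python
def risk(cases, total):
    """Proportion of people in a group who developed the disease."""
    return cases / total

def risk_ratio(exposed, unexposed):
    """Each argument is a (cases, total) pair."""
    return risk(*exposed) / risk(*unexposed)

# Hypothetical counts as (cases, total), split by the potential confounder.
strata = {
    "smokers":     {"exposed": (24, 80), "unexposed": (6, 20)},
    "non-smokers": {"exposed": (2, 20),  "unexposed": (8, 80)},
}

def pooled(group):
    """Add up cases and totals over all strata, ignoring the confounder."""
    cases = sum(s[group][0] for s in strata.values())
    total = sum(s[group][1] for s in strata.values())
    return cases, total

crude_rr = risk_ratio(pooled("exposed"), pooled("unexposed"))
stratum_rr = {name: risk_ratio(s["exposed"], s["unexposed"])
              for name, s in strata.items()}

print(f"crude risk ratio: {crude_rr:.2f}")
for name, rr in stratum_rr.items():
    print(f"{name}: risk ratio {rr:.2f}")
```

With these invented counts the crude risk ratio is well above 1, yet within smokers and within non-smokers the exposed and unexposed have the same risk. Stratifying during analysis is one way of taking the confounder into account; it is shown here only as an example.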
Qualitative Research Methods
Learning Objectives
At the end of the session participants should be able:
¨ to list the key features of qualitative research,
¨ to describe the basic design questions in qualitative research methods,
¨ to identify different methods used for addressing different research questions,
¨ to describe the sampling strategies used in qualitative approaches.
Introduction
Qualitative research is a type of formative research that offers specialized techniques for obtaining in-depth responses about what people think and how they feel. It enables programme management to gain insight into the attitudes, beliefs, motives and behaviors of the target population. By its very nature, qualitative research deals with the emotional and contextual aspects of human response rather than with objective, measurable behaviors and attitudes.
In the past, qualitative research methods were less accepted, and research findings based on these methods were criticized as being of lower quality. However, qualitative research methods can effectively be used to describe social determinants of health and disease. A qualitative study may be designed to explore concepts, develop hypotheses or theories, develop research tools, and clarify the findings of a quantitative study.
Why Use Qualitative Research?
There are both conceptual and practical reasons for using qualitative research. The primary conceptual reason is that it provides greater depth of response and, therefore, greater understanding than can be acquired through quantitative techniques.
Qualitative research:
¨ is a good source of descriptions and explanations of processes in identifiable local contexts
¨ can describe chronological flow, showing which events led to which consequences, and derive fruitful explanations
¨ can help researchers to get beyond initial conceptions and to generate or revise conceptual frameworks
¨ is fundamentally well suited for locating the meanings people place on events, processes, and structures of their lives and for connecting these meanings to the social world
There are three domains in which qualitative research tends to be used in public health:
1. The first domain includes economic, political, social and cultural, environmental and organizational factors which influence health.
2. The second domain focuses on gaining an understanding of how people make sense of their experiences of health and disease.
3. The third domain includes the interaction of actors involved in different public health activities.
These domains underscore the role of qualitative research in public health.
How is qualitative research used?
Qualitative research is used largely in four general ways:
1. as a tool to generate ideas
2. as a step in developing a quantitative study
3. as an aid in evaluating a quantitative study and
4. on occasion, as the primary data collection method for a research topic
Characteristics of qualitative research
Qualitative research methods have many distinguishing characteristics. In comparison with quantitative research methods, qualitative methods take the views of informants as points of departure, whereas quantitative research takes the ideas of the researcher. Furthermore, the lines of reasoning in the two methods differ. The line of reasoning in quantitative research is deductive, typically starting with the generation of a hypothesis based on existing theory, followed by testing of the hypothesis against existing reality, i.e., the hypothesis is supported or rejected based on the data collected. On the other hand, the line of reasoning in the qualitative method is inductive. However, qualitative researchers may also test emerging hypotheses or theories against data, and thus oscillate between data and theory.
Another characteristic of qualitative research is concerned with reliability and validity. The strength of the quantitative approach lies in its reliability (repeatability), that is, the same measurements should yield the same results time after time. The strength of qualitative research lies in validity (closeness to the truth), that is, good qualitative research should touch the core of what is going on rather than just skimming the surface. The validity of qualitative methods is greatly improved by a process known as triangulation and by independent analysis of the data by two or more researchers.
Some qualitative researchers suggest that qualitative research can complement quantitative methods, and explain that qualitative description is a prerequisite for good quantitative research, particularly in areas that have received little previous investigation. This is the first area in which qualitative research complements quantitative research. The second area of complementarity is the validation process (triangulation), where three or more methods are used and the results compared for convergence (e.g., a large-scale survey, focus groups, and a period of observation), or as part of a multi-method approach which examines a particular phenomenon or topic on several different levels. The third way in which qualitative research can complement quantitative work is by exploring complex phenomena or areas not amenable to quantitative research, such as in studies of health service organization and policy.
Qualitative research has its own designs and methods. The study designs include ethnography, phenomenology, grounded theory and participant action research. There are many other qualitative research designs, and some are extensions of the more popular ethnographic and phenomenological designs.
A variety of methods are used by qualitative researchers to answer research questions. The most common methods of data collection in qualitative research are participant observation, interviews, focus groups, and historical methods. Focus groups are discussed below.
Focus groups are far more widely used than individual in-depth interviews. The main reasons focus groups are selected more often as the qualitative technique include:
¨ Group interaction: Interaction of respondents will generally stimulate richer responses and allow new and valuable thoughts to emerge.
¨ Cost and timing: Focus groups can be completed more quickly and generally less expensively than in-depth interviews.
¨ Idea generation: A group works best to build on ideas generated.
¨ Evaluation of message concepts: Messages in some rough, pre-production form are presented to potential target audience groups for evaluation and refinement. A group works best because creative personnel can be present to view the group.
Table 10: Checklist for setting up focus groups
Determine the number of groups needed
¨ Are there at least two groups for each relevant variable?
¨ Are there enough groups to rotate the stimulus materials?
¨ Were groups conducted until responses were showing similarities?
¨ Are groups needed in different geographic regions?
Determine the composition of each group
¨ Are respondents of the same social class?
¨ Are respondents similar in terms of their lifecycle or experience status regarding the topic area?
¨ Can users and non-users be put together without stifling group interaction?
¨ Do respondents have similar levels of expertise on complex topics?
¨ Is it important to separate respondents by age and/or marital status?
¨ Are respondents of similar cultural background?
Determine the length of the group discussion
¨ Can the information needs be met in one to two hours?
¨ If not, is another research technique more appropriate or should additional groups be set up?
Determine the size of the group
¨ Will respondents be able to say all they know in ten minutes? (Eight to ten respondents)
¨ Is the subject complex enough for each respondent to give twenty minutes of relevant information?
¨ Does the subject matter require a small, intimate group?
Determine the group setting
¨ Will respondents have sufficient privacy to talk freely?
¨ Is the location accessible to respondents?
¨ Will respondents be threatened or intimidated by the location?
Table 11: Distinction Between Qualitative and Quantitative Research
Qualitative                                         Quantitative
Provides depth of understanding                     Measures level of occurrence
Asks why?                                           Asks how many? How often?
Studies motivations                                 Studies action
Is subjective                                       Is objective
Enables discovery                                   Provides proof
Is exploratory                                      Is definitive
Allows insights into behaviour, trends and so on    Measures level of actions, trends, and so on
Interprets                                          Describes
Design questions in qualitative research
Choosing the best research design(s), appropriate data collection methods and sampling techniques requires a broad understanding of the concepts and methods used in qualitative research. This will enable researchers to justify the study designs, data collection methods and sampling techniques to be employed in the study.
The following points discuss some of the key design questions in designing qualitative research.
5.6.1 Defining an area of inquiry
As in quantitative research, the starting point in conducting qualitative research is to specifically define an area of enquiry. This can be drawn from personal experience, reviewing literature and auditing earlier studies. The choice may depend on a desire to solve the problem, the feasibility of the prospective research topic, and values and expectations that the study will benefit society at large.
At this stage, the broad area of enquiry will be defined in terms of specific issues that will form the core of the study. A literature review helps to provide detailed information on the potential significance of the problem, and to avoid duplication of research. This means that a gap in scientific knowledge can be described by stating the problem and purpose of the study. In addition, the statement of the problem helps to describe what has been done so far and to identify questions that remain unanswered. Finally, the ways in which the findings of the present study might be utilized are put forward.
A conceptual framework is an alternative way of depicting a set of related variables and outcomes in the study in an elaborative schematic diagram. It shows the key factors, presumed relationships and possible outcomes of the research problem. The conceptual framework helps to outline the research questions and core variables included in the data collection instrument. As the study progresses, concepts and their relationships become clearer through interaction with the participants.
A thoroughly defined research problem helps to examine the issue with more specific and relevant questions. Some research questions may be more suitable for qualitative methods, others for quantitative research methods. Therefore, research questions and information needs should be clearly defined first. Then, the choice of appropriate data collection methods (see Annex 4 for choice of data collection methods) and sampling strategies will become straightforward.
5.7 Relevant concepts for designing qualitative research
This section attempts to describe some of the basic concepts that have relevance to the design of qualitative research projects.
Natural setting
The natural context of people’s lives is central to qualitative research design. Qualitative research describes social phenomena as they occur naturally and discovers the meaning that people themselves ascribe to events or phenomena. The natural setting is critically important because it may influence the perspectives, experiences, interactions and actions of participants. Unlike the quantitative approach, which is conducted under controlled settings, the qualitative approach leaves participants free from any control, and no attempt is made to manipulate the situation under study.
Holism
In qualitative research, understanding of a situation is gained through a holistic perspective by looking at a total, rather than a fragmented, reality. This takes into consideration many aspects of the social, historical and physical context to discover multiple subjective realities.
The human research instrument
The general perspective of qualitative research is that knowledge is generated through exchange of experiences during interaction with people. The researcher is an instrument in qualitative research. Data are collected by the researcher through direct contact with study participants, through one-to-one interviews, group interviews or observation. Hence, the researcher must be involved in every step of the research process, from initiation of the process to the final report writing stage. This requires the researcher to be responsive, flexible, adaptive, and a good listener. The researcher is an insider; he or she becomes a full participant in the data collection process. This is in contrast to a quantitative study, where the major decisions about the study are made before the actual data collection and the researcher tries to be an objective outsider.
Emergent design
One of the main characteristics of qualitative research is its ability to use a flexible design. Qualitative designs do not begin with narrowly specified issues; over time, a more focused study design emerges as a result of the increased understanding that the researcher gains through the research process. The aim of the researcher is to learn from every step of the research through an inductive approach. The design shifts constantly with the changing phenomena and context; the method that fits now may not work best at another point in time.
Saturation and redundancy
This is the stage at which further data collection yields no substantial new information for the study. After several cycles of data collection and analysis, the results of the next data collection can be predicted. This implies that a pattern that makes sense to the researchers and study participants has emerged. This may alert the researchers to stop data collection, implying that the saturation or redundancy stage has been reached.
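The idea of stopping at saturation can be sketched as a simple rule of thumb. The Python snippet below is a minimal illustration, not a prescribed procedure: the function name `interviews_until_saturation`, the `window` parameter (how many consecutive interviews must add nothing new) and the coded themes are all hypothetical. In practice the judgement that saturation has been reached rests with the researchers.

```python
def interviews_until_saturation(coded_interviews, window=2):
    """Return the number of interviews analysed when `window` consecutive
    interviews have added no new theme, or None if that never happens."""
    seen = set()
    quiet_run = 0  # consecutive interviews that added no new theme
    for count, themes in enumerate(coded_interviews, start=1):
        new_themes = set(themes) - seen
        seen |= new_themes
        quiet_run = 0 if new_themes else quiet_run + 1
        if quiet_run >= window:
            return count
    return None

# Hypothetical themes coded from six successive interviews.
coded = [
    {"cost", "distance"},
    {"distance", "stigma"},
    {"cost", "staff attitude"},
    {"stigma", "cost"},
    {"distance"},
    {"waiting time"},
]

print(interviews_until_saturation(coded, window=2))
```

A larger `window` makes the rule more cautious: with these invented data, requiring three quiet interviews in a row would not signal saturation at all, because the sixth interview introduces a new theme.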
Sampling strategies in qualitative research
The sampling strategies for qualitative and quantitative research approaches are different. Sampling techniques often used in quantitative research are based upon probability sampling, so that everybody in the study population has a known, non-zero chance of being selected (an equal chance under simple random sampling). The samples are assumed to be representative, and generalizations can be made to their source population. Probability sampling is appropriate when the researcher wants to answer questions of how many, or of how strongly the factors under consideration are associated.
In qualitative research, purposive sampling is commonly used in selecting the study participants. Purposive sampling, in contrast to probability sampling, selects study subjects for their ability to generate rich information.
Purposiveness in qualitative sampling is a strategic approach and should not be equated with convenience sampling, because the latter is primarily guided by ease of access to study participants. The samples selected through purposive sampling are considered to be theoretically representative of the source population, because the range of variation among subjects in the study site can be represented. This implies that a small number of study subjects with rich information may yield credible and valid information.
There are no hard and fast rules for obtaining the optimum sample size for a qualitative research design. In contrast to quantitative research, the sample size in a qualitative study design is usually small and not predetermined. Rather, sampling continues until the researcher determines that information saturation has been reached.
This implies that the selection of study participants continues to the point of redundancy. In general, the sample size in qualitative research depends on the purpose of the study, the specific research questions to be addressed, available time and resources, and the credibility of the information generated.
Purposive sampling in qualitative research can be achieved through different techniques:
Snowball or chain sampling
This type of sampling technique depends on locating participants by asking others to identify individuals or groups with rich information on the phenomenon under study. This implies that the first subject is used to identify the next person or group, to facilitate the identification of cases of interest. This sampling technique is especially valuable when the researcher is new to the study site, and is also important for identifying individuals who have rich information but are difficult to reach.
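The chain of referrals in snowball sampling can be pictured as following links from one participant to the next. The Python sketch below is illustrative only: the referral network, the participant labels and the function name `snowball_sample` are hypothetical, and a real study would rely on the researcher's judgement about whom to approach rather than on a fixed order.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Follow referrals outward from the seed participants until
    `max_size` people have been recruited or no referrals remain."""
    sampled = []
    queue = deque(seeds)
    seen = set(seeds)
    while queue and len(sampled) < max_size:
        person = queue.popleft()
        sampled.append(person)
        # Ask this participant to name others with rich information.
        for referred in referrals.get(person, []):
            if referred not in seen:
                seen.add(referred)
                queue.append(referred)
    return sampled

# Hypothetical referral network: who names whom.
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": ["F"],
}

print(snowball_sample(referrals, seeds=["A"], max_size=5))
```

Starting from a single seed, the sketch reaches people the researcher did not know at the outset, which mirrors why the technique is useful for hard-to-reach informants.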
Homogeneous sampling
This type of purposive sampling includes people with basically similar characteristics, in order to study the group in depth. The selection of participants is usually done within certain strata, with participants of similar demographic or social characteristics being included in the same stratum. Focus groups usually use this type of sampling. The group interaction stimulates people within the group to discuss their experiences. The main advantage of homogeneous sampling is that it focuses on a similar type of respondent, thereby simplifying analysis and group interviewing.
Extreme or deviant sampling
This technique chooses extreme cases, such as outstanding successes or crisis events, after the typical case is known, in order to highlight and understand the situation. For example, a researcher may be interested in studying two health facilities, one whose family planning clients are highly satisfied and another whose clients are not satisfied, in order to identify factors that favor or discourage the utilization of services. This type of sampling is valuable for testing emerging theories by learning from highly unusual manifestations.
Maximum variation sampling
This is sometimes known as heterogeneous sampling. It is useful for obtaining maximum differences among information-rich informants or groups. The subjects included in the study differ from each other on predetermined criteria. For example, a study comparing rural, urban and suburban residents, merchants and academicians, or high-activity and low-activity college students employs this type of sampling to identify issues that cut across individuals.
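One way to picture selection on predetermined criteria is to make sure every distinct combination of those criteria is represented. The Python sketch below is a simplified illustration under stated assumptions: the candidate list, the criteria names and the function `maximum_variation_sample` are hypothetical, and taking the first candidate with each profile is only a placeholder for the researcher's judgement about who is most information-rich.

```python
def maximum_variation_sample(candidates, criteria):
    """Pick one candidate for every distinct combination of the criteria."""
    chosen = {}
    for person in candidates:
        profile = tuple(person[c] for c in criteria)
        chosen.setdefault(profile, person)  # keep the first candidate per profile
    return list(chosen.values())

# Hypothetical pool of potential informants.
candidates = [
    {"name": "P1", "residence": "rural", "occupation": "merchant"},
    {"name": "P2", "residence": "rural", "occupation": "merchant"},
    {"name": "P3", "residence": "urban", "occupation": "merchant"},
    {"name": "P4", "residence": "urban", "occupation": "academician"},
    {"name": "P5", "residence": "suburban", "occupation": "academician"},
]

sample = maximum_variation_sample(candidates, ["residence", "occupation"])
print([p["name"] for p in sample])
```

Here P2 is left out because P1 already represents rural merchants, so the resulting sample covers each profile once.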
Convenience sampling
Study participants are selected based on ease of access and availability. The researcher selects those individuals who are most readily available. This may help to save time, money and effort. However, it may be the weakest sampling scheme because of its low credibility.
Opportunistic sampling
Additional study subjects may be selected to take advantage of unexpected opportunities at the field level.
Sampling politically important subjects
This type of sampling involves the selection of people who are politically important, to give emphasis to the study. It is particularly important for making the program sustainable and ensuring community participation through involving responsible people.
Trustworthiness
Ensuring the quality of data based on certain established criteria is a main activity of the researcher in both the qualitative and quantitative research traditions. This is particularly important for qualitative research, where the challenge of understanding and making meaning is put upon the researcher. The four common criteria for assessing the trustworthiness of qualitative research findings are truth value, applicability, consistency and neutrality.
Truth value refers to the ability of the study to detect what the research really aimed at studying. The corresponding criterion is internal validity in quantitative research and credibility in the qualitative approach.
Applicability refers to the ability to determine the extent to which the findings are applicable in other settings, situations, populations or circumstances.
The basic question asked by researchers while dealing with consistency is “can the findings be repeated with the same (or similar) respondents in the same context?” The consistency of findings in quantitative and qualitative research designs is expressed as reliability and dependability, respectively.
The concept of neutrality refers to the role of the researchers, mainly during data collection. This is assessed by objectivity in quantitative research and confirmability in the qualitative approach.
Individual exercise, discussion when everyone is ready.
Formulate a research question for which qualitative research could be conducted.
List the data collection method(s) you will use and explain why you chose the particular method or methods of data collection.
Summary points on Qualitative Research Methods
Qualitative research is a type of formative research that offers specialized techniques for obtaining in-depth responses about what people think and how they feel. A qualitative study may be designed to explore concepts; develop hypotheses or theories; develop research tools; or clarify the findings of a quantitative study.
Qualitative research:
¨ is a good source of descriptions and explanations of processes in identifiable local contexts
¨ can describe chronological flow - which events led to which consequences - and derive fruitful explanations
¨ can help researchers get beyond initial conceptions and generate or revise conceptual frameworks
¨ is fundamentally well suited for locating the meanings people place on events, processes, and structures of their lives and for connecting these meanings to the social world
There are three domains in which qualitative research tends to be used in public health:
¨ The first domain includes economic, political, social and cultural, environmental and organizational factors which influence health.
¨ The second domain focuses on gaining an understanding of how people make sense of their experiences of health and disease.
¨ The third domain includes the interaction of actors involved in different public health activities.
These domains underscore the role of qualitative research in public health.
Qualitative research is used largely in four general ways:
1. as a tool to generate ideas
2. as a step in developing a quantitative study
3. as an aid in evaluating a quantitative study and
4. on occasion, as the primary data collection method for a research topic
The line of reasoning in quantitative research is deductive. On the other hand, the line of reasoning in the qualitative method is inductive.
The strength of qualitative research lies in validity (closeness to the truth).
The study designs in qualitative research include ethnography, phenomenology, grounded theory and participant action research.
The most common methods of data collection in qualitative research are:
¨ Participant observation
¨ Interviews
¨ Focus groups
¨ Historical methods
Key design issues in designing qualitative research:
¨ Define area of inquiry
¨ State the research problem
¨ Develop conceptual framework
¨ Formulate qualitative research questions
Relevant concepts for designing qualitative research:
¨ Natural setting
¨ Holism
¨ The human research instrument
¨ Emergent design
¨ Saturation and redundancy
Purposive sampling is the sampling method commonly used in qualitative research.
Sample size in qualitative research is usually small and not predetermined.
The sampling techniques used in qualitative research are:
¨ Snowball or chain sampling
¨ Homogeneous sampling
¨ Extreme or deviant sampling
¨ Maximum variation sampling
¨ Convenience sampling
¨ Opportunistic sampling
¨ Sampling politically important subjects
Four criteria for assessing the trustworthiness of qualitative research:
¨ Truth value
¨ Applicability
¨ Consistency
¨ Neutrality
These questions are designed to help you assess how well you have learned the content of this module. You may refer to the text whenever you are unsure of the answer.
1. Descriptive epidemiology includes all EXCEPT:
2. A study in which a group of residents exposed to an environmental pollutant has been followed since the 1960s to identify the occurrence of lung disease is an example of which type(s) of study? (Circle ALL that apply.)
3. The Cancer and Steroid Hormone study, in which women with breast cancer and a comparable group of women without breast cancer were asked about their prior use of oral contraceptives (“the Pill”), is an example of which type of study? (Circle ALL that apply.)
4. Because socioeconomic status is difficult to quantify, we commonly use all of the following substitute measures EXCEPT:
5. The primary difference between an experimental and an observational study is:
A. the investigator is “blinded” (prevented from knowing the subjects’ true exposure status until the end of the study) in an experimental study but not in an observational study
B. the investigator controls the subject’s exposure in an experimental study but not in an observational study
C. the investigator controls the subject’s outcome in an experimental study but not in an observational study
D. experimental studies are conducted with animals; observational studies are conducted with humans
Questions 6-9: For each numbered objective below, select the most appropriate epidemiologic approach from the lettered options. Each option can be used once, more than once, or not at all.
6. Which approach is used to compare the relative benefits of two alternative pharmacologic treatments for peptic ulcer disease?
7. Which approach is best to evaluate factors suspected of contributing to the development of intrauterine growth retardation?
8. Which approach is used to assess the association of frequent gastroenteritis and bottle feeding?
9. Which approach is used to assess the effect of an educational programme in improving immunization coverage?
10. Which of the following is/are not a limitation of the cohort study design?
D. The exposure precedes the development of the disease
11. One of the following is not true about descriptive studies:
A. They allow for generation of hypotheses
B. They assess the occurrence and distribution of diseases in a population
C. Their main purpose is testing hypotheses
D. They characterize the occurrence and distribution of a problem by time, person and place
12. Which of the following is not a descriptive study?
13. Which of the following is false about a case-control study design?
A. Used to study rare diseases
D. No problem of loss to follow-up
14. Which of the following is/are not true about sampling?
B. Minimizes time for data collection and processing
D. Results might be less reliable
15. Which of the following is true about reliability?
A. The same result is obtained when the study is repeated under the same conditions
B. The same set of door-to-door interviews of respondents will give the same set of responses on different occasions
C. A measurement measures what it is supposed to measure
Answers to self-assessment questions
1. The correct answer is E. Descriptive epidemiology provides the what, who, when, and where of health-related events. Analytic epidemiology provides the why.
2. It is a cohort study because subjects were enrolled, classified by exposure, and then followed for evidence of disease.
3. The correct answers are B and D. The study is an observational study rather than an experimental study or clinical trial because the investigators did not attempt to influence the subjects’ choices; they simply asked about past use. It is a case-control study rather than a cohort study because the subjects were enrolled on the basis of whether or not they had disease, and then asked about exposure.
4. The correct answer is D. Educational achievement, family income, and occupation are used because they are easy to measure. Social standing is not.
5. The correct answer is B. The hallmark of an experimental study is that the investigator dictates each subject’s exposure. In an observational study, the investigator observes, measures, or asks about the exposure, but does not dictate it.
6. The answer is E, Randomized Clinical Trial. It is the only method whereby the benefits of two alternative pharmacologic treatments can be ascertained.
7. The answer is B, Case-control study. Cases and controls can be classified according to the presence of intrauterine growth retardation and can be compared to determine if the two groups differ in the proportion of persons exposed to a specific factor or factors.
8. The answer is A and B. Cohort study: cohorts of infants who are bottle fed and those who are not can be followed and compared for the difference in occurrence of gastroenteritis between the two groups. A case-control study could also be used.
9. The answer is C, a Community Intervention Trial. Communities can be randomized to intervention and control groups. The intervention communities receive the educational programme, and the two groups of communities can be compared.
10. The answer is D, Exposure precedes the development of disease. This is not a limitation; it is one of the strengths of the cohort study.
11. The answer is C. The main purpose of descriptive studies is not testing hypotheses; however, they are useful for generating hypotheses.
12. The answer is B, Case-control study. A case-control study is an observational and analytic type of study design.
13. The answer is C. Risk cannot be calculated in a case-control study; this is one of the drawbacks of case-control studies.
14. The answer is D, Results might be less reliable. Sampling rather increases the reliability and accuracy of measurements.
15. The answers are A and B
A. The same result is obtained when the study is repeated under the same conditions
B. The same set of door-to-door interviews on respondents will give the same set of responses on different occasions
How to use random number tables
1) First, decide how large a number you need. Next, count whether it is a one-digit, two-digit or larger number. For example, if your sampling frame consists of 10 units, you must choose from the numbers 1-10 (inclusive). You must use two digits to ensure that 10 has an equal chance of being included.
You also use two digits for a sampling frame consisting of 0-99 units.
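The digit-counting rule above can be illustrated with a short Python sketch. It is not part of the original manual: the function names `digits_needed` and `read_numbers`, and the row of digits in the usage example, are hypothetical. The rule for skipping out-of-range or repeated numbers is an assumption based on common practice, not something stated here.

```python
def digits_needed(largest_label: int) -> int:
    """Return how many digits must be read per number so that the
    largest unit label in the sampling frame can appear."""
    if largest_label < 0:
        raise ValueError("largest_label must be non-negative")
    return len(str(largest_label))


def read_numbers(digit_row: str, width: int, lowest: int, highest: int, n: int) -> list:
    """Read groups of `width` digits from left to right and keep the first
    `n` distinct values between `lowest` and `highest` (inclusive).

    Assumption: out-of-range and repeated numbers are simply skipped.
    """
    digits = [c for c in digit_row if c.isdigit()]  # ignore spaces in the table
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        value = int("".join(digits[i:i + width]))
        if lowest <= value <= highest and value not in chosen:
            chosen.append(value)
        if len(chosen) == n:
            break
    return chosen


# Hypothetical usage: a frame of 10 units labelled 1-10 needs two digits.
width = digits_needed(10)
sample = read_numbers("03 47 10 73 07 03 98", width, lowest=1, highest=10, n=3)
print(width, sample)
```

With two-digit reading, unit 10 can be selected like any other unit; reading single digits would never produce it.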