Epidemiology in Practice
Data Management Using EPI-DATA and STATA
Course Manual

This document contains the material written to support teaching sessions in data management, which are part of the Epidemiology in Practice module for London-based MSc students. The material is designed to introduce students to the practicalities of managing data in epidemiological studies, including field procedures, questionnaire design, the cleaning and checking of data, and organizing a dataset in preparation for analysis.

IMPORTANT NOTE: This copy is for DL MSc Epidemiology students. Please note that the datasets used are part of the EPM103 materials, and DL students will need to access these from the core module CD-ROM (rather than in the location described in the document).

LONDON SCHOOL OF HYGIENE AND TROPICAL MEDICINE
October 2011

No part of this teaching material may be reproduced by any means without the written authority of the School given by the Secretary and Registrar.

Contents

Introduction to the Unit

Session 1: Introduction to data management and questionnaire design
1.1 Introduction to data management
1.2 Designing a data management strategy
  1.2.1 What we mean by data?
  1.2.2 Data management considerations at the preparation phase
  1.2.3 Data management considerations in the fieldwork
  1.2.4 Data management strategy and manual
  1.2.5 Database construction
1.3 Questionnaire design
  1.3.1 General guidelines for writing questions
  1.3.2 Other considerations
  1.3.3 Coding responses
  1.3.4 Identifying respondents and data collection forms
  1.3.5 General layout - introduction
    1.3.5.1 Ordering of questions
    1.3.5.2 Instructions
1.4 Pilot testing
1.5 Quality control
1.6 Information and consent
Practical 1: Questionnaire design

Practical 2: Introduction to EpiData
2.1 Introduction to EpiData
  2.1.1 Creating a data file
  2.1.2 Datasets, cases and files
  2.1.3 Defining database file structure
  2.1.4 Variable names
  2.1.5 Variable types
  2.1.6 Null values - missing and not applicable data
  2.1.7 Identifying (ID) numbers
2.2 Data quality and checking
  2.2.1 Errors in data entry
  2.2.2 Quality and validity
  2.2.3 Data checking functions in EpiData
2.3 The onchocerciasis dataset
2.4 Using EpiData
  2.4.1 Starting EpiData
  2.4.2 Creating a QES file using the EpiData editor
  2.4.3 Saving the QES file
  2.4.4 Creating a data (.REC) file
  2.4.5 Entering data
  2.4.6 Variable names
  2.4.7 Finding records
  2.4.8 Summary so far
2.5 Adding checks in EpiData
  2.5.1 Creating specified ranges
  2.5.2 Creating labels for variables
  2.5.3 Setting up checks
  2.5.4 Specifying a KEY UNIQUE variable
  2.5.5 Limitations of interactive data checking
2.6 Double entry and validation
2.7 Exporting data from EpiData
  2.7.1 The onchocerciasis dataset
  2.7.2 Exporting to STATA
  2.7.3 Folder management in STATA

Practical 3: Data management using Stata 12 - basics
3.1 Introduction
3.2 When to save and when to clear?
3.3 Do Files
  3.3.1 What is a file?
  3.3.2 Comments in files
  3.3.3 The "set more off" command
  3.3.4 A do-file template
3.4 The display command
3.5 Log files
3.6 Appending data
3.7 Naming variables, labeling variables and labeling values
  3.7.1 Variable names
  3.7.2 Variable labels
  3.7.3 Value labels
3.8 Describing data
  3.6.1 Equal signs in Stata commands - one or two?
3.9 Creating and transforming variables - gen, replace, recode
  3.9.1 Recategorising data
3.10 The egen command
  3.10.1 What's the difference between gen and egen?
3.11 Exercise

Practical 4: Data management using Stata 12 - essentials
4.1 Types of variable - string and numeric
  4.1.1 Converting string variables to numeric (destring, encode)
  4.1.2 Converting numeric variables to string (decode)
4.2 Collapsing datasets
4.3 Reshaping data
4.4 Merging files

Practical 5: Data management using Stata 12 - advanced
5.1 Dates
5.2 Identifying duplicate observations
5.3 Shortcuts for repeating commands: using foreach loops
5.4 Working with sub-groups - use of Stata system variables
5.5 Final integrating exercise

Appendix 1: Example data processing manual
Appendix 2: The progress of a questionnaire
Appendix 3: Questionnaires to enter
Appendix 4: Common Stata commands

Data Management using EpiData and Stata

Introduction

The aim of these sessions is to equip you with the practical and essential epidemiological skills needed to design a questionnaire, enter data using Epi-Data and prepare data for analysis in Stata 12.

By the end of this Teaching Unit, you should:
• appreciate the need for a coherent data management strategy for an epidemiological study
• understand the principles of good questionnaire design
• be able to create an Epi-Data database to translate data from the questionnaire to computer
• be able to enter and verify data in Epi-Data
• know how to transfer data from Epi-Data to Stata and other statistical packages
• be able to create and use Stata do- and log-files
• know how to undertake common management tasks in Stata, including
merging, appending, and collapsing files
• be able to use common Stata commands to generate, recode and replace variables
• be familiar with more advanced topics such as dates and substrings

The total class contact time for Data Management is 15 hours, consisting of five three-hour sessions. You are advised to read the notes for practical sessions before the class and to work through the material for each session. Each week's exercises build on work completed in earlier sessions. If you cannot complete the work within the three face-to-face hours, you are strongly advised to complete the practical in your own time.

Recommended texts

An excellent book which covers essential issues of data management and questionnaire design is:
Field Trials of Health Interventions in Developing Countries: A Toolbox. Edited by Smith & Morrow; WHO.

A useful book on Stata is:
A Short Introduction to Stata for Biostatistics. Hills & DeStavola; Timberlake Consultants Ltd.

There are several copies of both books in the library.

Dataset

During this course we will use data from a randomised controlled trial of an intervention against river blindness (onchocerciasis) in Sierra Leone in West Africa. Onchocerciasis is a common and debilitating disease of the tropics. It is chronic, and affects the skin and eyes. Its pathology is thought to result from the cumulative effects of inflammatory responses to immobile and dead microfilariae in the skin and eyes. Microfilariae are tiny wormlike parasites that breed in fast-flowing tropical rivers and are deposited in the skin by blackfly (Simulium). The fly bites and injects the parasite larvae (microfilariae) under the skin. These mature and produce further larvae that may migrate to the eye, where they may cause lesions leading to blindness. The worms are detectable by microscopic examination of skin samples, usually snipped from around the hips; severity of infection is measured by counting the average number of worms per microgram of skin examined.

A
double-blind, placebo-controlled trial was designed to study the clinical and parasitological effects of repeated treatment with a newly developed drug called Ivermectin. Subjects were enrolled from six villages in Sierra Leone (West Africa), and initial demographic and parasitological surveys were conducted between June and October 1987. Subjects were randomly allocated to either the Ivermectin treatment group or the placebo control group. Randomisation was done in London.

The questionnaire in section 2.3 is similar to that used to collect baseline data for the study. It contains questions on background demographic and socio-economic factors, and on subjects' previous experience of onchocerciasis. Follow-up parasitology and repeated treatment were performed for five further surveys at six-monthly intervals. The principal outcome of interest was the comparison of microfilarial counts before and after treatment, and between the two treatment groups.

Reference

Whitworth JAG, Morgan D, Maude GH, Downham MD, and Taylor DW (1991), A community trial of ivermectin for onchocerciasis in Sierra Leone: clinical and parasitological responses to the initial dose, Transactions of the Royal Society of Tropical Medicine and Hygiene, 85, 92-6.

Files and Variables

You will need the following files during the course. They are currently in the drive u:\download\teach\dataepi. Create a new folder in your drive called h:\dataepi, and use Windows Explorer to copy all the files from the u:\download\teach\dataepi folder into h:\dataepi.

File            Contents
DEMOG_x.REC     Baseline demography
MICRO.REC       Microfilariae counts
BLOOD.REC       Blood samples
TMTCODES.REC    Treatment codes

DEMOG_x.REC
Variable   Contents                Values / Codes
IDNO       Subject ID number
VILL       Village number
DATE       Date of interview       dd/mm/yyyy
AGE        Age in years            Positive integers
SEX        Sex                     Coded: Male, Female
TRIBE      Tribe code              Coded: Mende, Other
HHNO       Household number        Positive integers
REL        Relation to household   Coded: Self, Parent, Child, Sibling, Other blood, Spouse, Other nonblood, Friend, Other
STAY       Years in village        Positive integers; 99 = Missing
OCC        Occupation code         Coded: At home, At school, Farming, Fishing, Office work, Trading, Housework, Mining, Other

MICRO.REC
Variable   Contents                Values / Codes
IDNO       Subject ID number
SR         Survey round
MFRIC      MF count (right)        Positive integers
MFLIC      MF count (left)         Positive integers
SSRIC      Skin diameter (right)   Positive real numbers
SSLIC      Skin diameter (left)    Positive real numbers

BLOOD.REC
Variable   Contents                Values / Codes
IDNO       Subject ID number
SR         Survey round
EOSIN      Eosinophil count        Positive integers
PCV        Packed cell volume      Positive integers
MPS        Malaria parasites       Coded: Negative, Positive

TMTCODES.REC
Variable   Contents                Values / Codes
DRUG       Drug batch              IVER = drug, PLAC = placebo
IDNO       Subject ID number

Codes for missing values (in the order listed in the original table): 99, 99, 99, 999, 999, 9999, 99

Session 1: Introduction to Data Management and Questionnaire Design

In this session we cover the essentials of data management, and give an introduction to questionnaire design.

Objectives of this session

After this session, you should understand how to:
• start planning data sources and computer files for your own studies
• use a data management strategy as a way of ensuring quality of data in preparation for the analysis
• outline tasks to be included in the procedures guide for fieldwork and data management
• start designing a precise and informative questionnaire

1.1 Introduction to data management

An essential part of any epidemiological study is the collection of relevant data on participants. Data will include identification information such as name, age, sex and place of residence, information on the main outcome and exposure, and information on other factors that may be potential confounders of the association under study. This will usually include clinical and lab data, which need to be linked to the socio-demographic and behavioural data. Often different sources of data are used: for example, some data may be collected at community level, or samples may be collected from the same individual at different timepoints.

In an epidemiological study, we need to transfer
information from the study population to the computer, ready for statistical analysis. There are, however, many things that can go wrong in this process, so that the final dataset may not represent the true situation.

Data management is the process of transferring information from the study population to a dataset ready for analysis. The main aim of data management is to ensure that sufficient and necessary information is collected and processed to answer the research question.

[Figure: flow from reality to the final dataset. Reality is information about the target population (e.g. height, weight, income, concentration of malaria parasites in blood, number of sexual partners in the last year). The stages shown are: Target population, Sample population, Study population, Data collection, Data entry, Preparation of dataset for analysis, Final dataset. The final dataset on the computer should represent reality; analysis of these data is used to draw conclusions about the target population.]

Example: The multi-centre study of sexual behaviour in Africa

As an example, we will consider a cross-sectional study of sexual behaviour in Africa. This study was a population-based survey of around 2000 adults and 300 sex workers in each of four cities in Africa. The aim of the study was to explore the factors associated with the differential spread of HIV in these cities (Buvé et al, AIDS 2001 Aug; 15 Suppl 4). Households in each city were selected randomly, and all adults living in the selected household were eligible for the study.

The following information was collected:
- household information through interview
- individual information (socio-demographic, behavioural) through interviewer-administered questionnaires
- clinical examination for men
- specimen form detailing which biological samples were taken
- laboratory form with test results

Like many epidemiological studies, this was a large, complex and expensive project involving collaborators in several African and European countries. The success of the study depended on two main features of the data processing and management:
1. Standardised data management across all sites
2. Accurate data entry, with all queries being meticulously recorded and processed

A data management strategy had to be designed to facilitate these two objectives.

1.2 Designing a data management strategy

A data management strategy comprises the data processing needs of each stage of the research study. This strategy outlines how the work should be tackled, the problems that may be encountered, and how to overcome them.

1.2.1 What we mean by data?

'Data' are all values needed in the analysis of the research question. Results from lab tests and clinical measurements may be data, as can observations and measurements of a community or household. We must plan how these data are to be used in the study, and how they are collected, stored, processed and made suitable for analysis.

Some data may not be used in the analysis, but may feed into the study prior to other activities. For example, qualitative data from focus group discussions may be needed to establish the nature of community norms, and these can then be used to design the questionnaire; or community characteristics may be needed in order to stratify or randomise the sample.

1.2.2 Data management considerations at the preparation phase

When writing your study proposal, you must start planning how to manage the data. If you don't, problems with data processing are likely to delay the study considerably or, even worse, result in poor quality data and meaningless results. To ensure that the data collected represent reality you need to consider the following:
• Study design: is this to be a cross-sectional survey, case-control, cohort etc?
• Sampling: how will you identify eligible respondents in your study population? For example, where will the list of the study sample come from, and what criteria will you use for inclusion in the sample?
• What data are to be collected? What are the outcomes, exposures and potential confounders?
• What questionnaires and forms will be needed? Will data be collected from different sources (questionnaires, clinical records, lab tests etc)? What hardware and software needs are there?
• What is the sample size needed to meet the objectives of the study? What percentage will you add to take into account refusals, loss to follow-up etc? What methods can be used to minimise the refusal rate?

... the

     01         001        01
     /           |           \
  village    household     person

so that this ID would represent the first person in the first household of the first village (or cluster). Often this identifier is allocated to the person at the beginning of the study, and the information entered into the computer. Subsequent forms can be pre-printed with the individual's number before the interview.

Check Digits

When this identification number is entered onto the computer it is important that it is typed in correctly, otherwise the pieces of information for a person will not link together. One way to ensure that the identification is typed in properly is the use of check digits. These are numbers or letters which are calculated from the numbers in the ID and are then added to the ID by the computer. This is done at the stage when the IDs are being generated, and the check digit may be displayed in subsequent printed records. So instead of the ID 01 001 01 being generated on its own, a check digit would be calculated from this ID. The final ID might then be 01 001 01 A. If the data entry clerk types in any part of the ID number or check digit incorrectly, an internal calculation exposes this and generates an error message. This is described more fully in "Methods for field trials of interventions against tropical diseases: a toolbox" by PG Smith and RH Morrow.

Linking samples to forms

Data in the computer can be printed as lists, but it is also possible for the computer to print out sticky labels. These labels are useful to put on specimen samples and other biological data that may be required (slides, swabs, tubes etc). If laboratory specimens are
involved it is very important that the labels are able to withstand temperatures well below freezing (often -20°C to -70°C). Sometimes the label will have the ID number of the individual, but this may compromise confidentiality. Thus sequential labels could be produced that do not identify the individual. Many labels of the same number could be printed (one for each sample to be collected), and before use, an extra label could be attached to the list next to the individual's name or ID number.

For the multi-centre study we used labels in the following way. A questionnaire was available for each eligible respondent. The ID sticker was attached to this questionnaire. Further labels were stuck to the biological specimens (blood, urine) collected. The remaining labels with that ID number were put in the small polythene bag containing the specimen tubes. When this arrived at the laboratory the staff divided up the blood and urine into two samples, and stuck one of the extra labels onto the extra tubes. The laboratory staff also stuck a label onto the laboratory form which was used to record the results of the analysis, and was returned to the data entry team.

The alternative is to write the identification number on at each stage. This takes longer and introduces a possible transcription error. It is generally advised that handwritten transcription of numbers and data should be kept to the minimum, as mistakes can easily happen.

Quality control

Supervision of the quality of the fieldwork is crucial to ensure that the data are accurate. This can be done by a dedicated field supervisor, whose duties include monitoring the interviewers, clinicians and sample collection. Alternatively it can be done by the interviewers themselves, by checking and comparing answers on the questionnaires, finding discrepancies and going back to the respondents to clarify the response before they leave.

Many studies use separate quality control (QC) questionnaires for this. These can be delivered by the
supervisor, or by another interviewer blind to the first interview. Some studies insist that the QC responses are compared immediately and mistakes rectified; other studies are not able to compare responses until the end of the day, when it may be difficult to rectify mistakes. The data management strategy should outline how the QC questionnaires are used, by whom, and what will be done if discrepancies are found. The strategy should consider how respondents might feel about repeating the questionnaire or clinical examination.

The duties of the supervisor may include:
• Arrangements for transport: which clusters to visit on which day;
• Checking where and when market days occur so that these can be avoided (as many respondents may be absent from the household);
• Greeting community leaders, informing them about the study and seeking their cooperation;
• Advising interviewers which households are eligible and keeping a checklist of the outcome of each interviewer's visit to each household;
• Partial re-interviewing of a random sample of households already interviewed by interviewers;
• Spot checks that interviewers are following procedures;
• Checking questionnaires and forms to ensure that they are completed properly;
• Coordinating follow-up;
• Handing completed questionnaires to data processing staff;
• Answering queries by data processing staff;
• Ensuring security and safety of interviewers;
• Solving unanticipated problems (eg ineligible people asking to join the study).

Prepare data processing procedures manual

This needs to set out all the procedures to be followed to ensure the quality of the data. It should include the following:
• The flow of the questionnaires and forms from the field to final storage. For example, the field manager puts the questionnaires in box1 after they have been completed and he has checked them for legibility and completeness. After initial checking (and possibly some coding by data entry clerk or manager) this batch of questionnaires
goes into box2. One data entry clerk takes these from box1, does the first data entry, then puts them in box2 (after marking that they have been entered once). The second data entry clerk takes them from box2, does the second entry, then puts the forms in box3. A clerk takes the forms from box3, sorts them into order and puts them in the final storage place. If anyone in this process has a query about any form, the query is marked and put in a special queries box for the field manager to pick up. Sometimes tick sheets are used to check which data have been received and where they are in the data entry process.
• How to use the data entry system set up according to section 3.4. Include lists of filenames, which computers will be used for first and second data entry, and in particular any merging needed.
• Procedures for entering data which have been corrected following discussions with the field manager
• Taking backups (if you don't do this you can lose the data!)
• Running virus checking programs
• Producing sticky labels
• Producing lists or summary statistics to facilitate fieldwork (eg response rates, eligible respondents needing follow-up etc.)
• Responsibilities for maintaining hardware and software

Planning the computer resources and data sources

Computing hardware

If an institution's computers are to be used rather than buying computers especially for the project you should check:
• availability of equipment (time frames: other projects are likely to be using them as well)
• number and type of machines
• are zip drives, USB drives etc available to facilitate backup?
• are up-to-date virus detection programmes installed?
• what software is loaded, and whether versions are the same on different machines
• where they are located, i.e. all together in one room or spread out around the building

If you are going to buy new equipment, consider:
• how many machines of which type are needed
• requirements for other equipment: UPS, printer, zip drives, voltage regulators etc
• the time frame to order and ship the equipment
• customs regulations for import of equipment
• what will happen to the equipment after the project has finished

Computer software

There are many specialist packages for different computing "jobs". It is important to get the best software for each job. If different packages are used, it is usually easy to transfer the data from one package to another for the next job. However, this will involve the expense of buying more packages.
• Databases (for entering, storing and manipulating data) e.g. dBASE, Access, FoxPro
• Spreadsheets (matrices of numbers, useful for accounting and data manipulation) e.g. Excel, Lotus
• Graphics programs (for producing attractive plots and graphs; some of this can be done from spreadsheets or statistics packages) e.g. Prism
• Word processors (for writing reports etc.) e.g. MS Word
• Statistical packages (for statistical analysis of data) e.g. SPSS, SAS, Stata

On this course we will use two packages: EpiData and Stata.

Data entry

Design the data entry system

You will be learning how to do this in detail during the Epi-Data sessions. The basic stages are:
• Set up a data entry screen for the questionnaire and other forms
• Set up range and consistency checks within the data entry screen
• Enter the data
• Check the data (this can be done by double data entry, manual checking of all data, or manual checking of a random selection of data)
• Clean the data (additional range and consistency checks, eg that there are no laboratory forms for which there is no questionnaire etc.)
• Merge the data (eg the data from the questionnaires with the different forms)
• Recode the data (eg age into age groups)

Facilitating computer data entry

There are some further points to be considered in relation to transfer of the data from the questionnaire to a computer. Examples of different designs are attached to the handout.

It is important to make clear which parts of the form are to be entered in the computer. This is usually done by providing boxes for each answer. It can be useful to reserve the right-hand side of the form for "coding boxes". The questions, codes, written responses etc are all put on the left-hand side, and the coding boxes on the right are reserved for the numbers which are to be entered into the computer. This makes data entry and checking easier.

One box is needed for every digit. Hence, for each response you need to work out the maximum number of digits which may be needed, e.g. for height, what is the largest height that you are likely to record? Also, for a variable such as height, there may be a decimal point; indicate this between boxes on the questionnaire.

The first few coding boxes of the questionnaire usually contain a code for the type of questionnaire, followed by identification codes for the respondent; typically, village (or community) code, household code, individual code.

There may be some specific requirements for the way the data are to be recorded, depending on the software that it is intended to use. Consultation with data-processing staff and looking ahead to the requirements of the proposed analysis will help the design.

Some computer packages cannot distinguish between a blank and a zero. This particular problem is becoming less common, but it needs to be considered when deciding on numerical codes, e.g. code "missing values" with a specific numeric code rather than leaving a blank. Note that, except where questions are not applicable and have therefore been skipped, blanks should not be used for "not known" since they might be ambiguous for simply a
missing value. 8's are conventionally used to code "don't know".

Unless it is very obvious from the questionnaire, a "code book" should be created so that during analysis the numeric codes can be easily interpreted. Some packages allow notes to be attached to variables or datasets. This would be a useful way of ensuring that the 'codes' do not get lost.

Most computer software useful for this kind of data analysis uses short identifying names (usually limited to a few characters) for each variable. It is sensible to choose variable names that are indicative of the variable (though an alternative is to call them by question number: Q1 and so on). It may be useful to put these variable names on the questionnaire itself. (Some computer software allows a longer name as well that will appear on the output but is not used in programming.)

Epi-Data and some other packages enable the questionnaire to be reproduced on the screen, and the data to be entered through this medium. They also enable range and consistency checks to be built in to the entry process. This is called "direct data entry".

For very small data sets (for example laboratory data) it may not be worth setting up a direct data entry screen, and the data can be entered straight into a spreadsheet or a word-processing package/editor, or may be downloaded from the laboratory machine (PCR for example).

Keeping a master copy of the entered data

The data entry clerks will normally enter a small amount of data into a file. The file will be validated against a second entry of the same data. Each of the small files can then be added to a master file. It is usual for the data supervisor to keep this master file, and to add the data using a programme.

The master file can be named at the beginning of the study, so that programmes can be written for that data file. Programmes can be written not only to update (add) new data to the master file, but also to provide frequencies of the key variables. In this way study personnel can have up to date
information about how many communities have been seen, the percentage of the target population that have been interviewed, the male-female split in the survey, etc. The programmes provide a record of what is happening to the data at that point in the study.

Cleaning data

At this stage, the data are known to be an accurate reflection of the questionnaires (or other data collection device), but there may still be many mistakes. Cleaning the data is important to identify inconsistencies and mistakes. Data may be cleaned using the same software as data entry or in other software. If new software is being used, a direct translation of the master data must be used for the cleaning programmes.

Cleaning programmes have two tasks: firstly to identify potential mistakes, and secondly to rectify those mistakes. Examples of mistakes could be:
• Inconsistencies missed by the data entry screen
• Impossible differences between two fields (eg follow-up date before the recruitment date)
• Inconsistent reports between two fields (eg use of a condom by someone who reports never having sex)
• Identification problems (eg a questionnaire exists for a subject who refused, or a sample sticker label does not correspond with the list of subjects or questionnaire numbers)

After the mistakes have been identified, a list can be generated. A data supervisor must then go over all the mistakes, compare them with the questionnaire and see if they can be solved. This is very time consuming and requires attention to detail. If the mistakes cannot be rectified then the data will normally be assigned as missing. In most cases a programme is used to change the data and to clean the mistakes; this ensures that any changes are reproducible and verifiable.

Keeping a master copy of the cleaned data

Once the data have been cleaned, a master copy of the cleaned data is kept. These clean data are the basis for the analysis, and can be used by others to examine different questions and hypotheses. These data may also
provide the template for other future studies. The master data should be kept in the original format, and be translated into the software to be used for analysis (eg STATA or SAS). Once the master data have been put into the analysis format, a programme can be written to label all the variables and values in the dataset. Notes can be attached to explain any missing values, or other difficulties in the data.

Merging different data

Only when the data are known to be clean can they be merged with other data from different sources. Before data can be merged another source of difficulty must be overcome: the variable on which the data are merged must match exactly. These problems should have been identified in the cleaning process, but if they have not, another iteration of cleaning may be required.

It may be possible to merge data with some other datasets early on in the analysis. This will enable a partial analysis to be done. However, other data may take longer to compile (eg from a laboratory doing complex tests in another country). Thus a full analysis may not be possible until long after the data collection.

Merging should be done using a programme in the analysis software. When merging, new variables can be created, and variables can be recoded to suit the analysis to be done. Categories can be obtained for age groups, or for the results of lab tests. By using programmes the master data are not altered in any way, and the analysis can be defined to answer the required question.

Decisions may need to be taken over the 'unit of analysis' when merging data. If there is a hierarchy among the data, which level corresponds to the unit of analysis? For example, the data may include village based data, household based data, data on adults (mothers or children) and on children. Depending on the question being asked, and the hypothesis being tested, the unit of analysis may be the village, the household or the individual child.

Appendix 3: Data to enter from onchocerciasis survey
[Six completed copies of the form "Onchocerciasis Baseline Survey - Personal Data" appeared here as scanned images. The handwritten entries are not legible in this copy. Each form contains the following fields.]

Field                                                   Variable
ID number                                               IDNO
Date of interview                                       DATE
Age in years                                            AGE
Sex (M or F)                                            SEX
Tribe (Mende 1, Other 2)                                TRIBE
Household number                                        HHNO
Relationship to head of household?                      REL
    Self, Parent, Child, Sibling, Other Blood Relative,
    Spouse, Other Non-Blood Relative, Friend
Have you lived in this village all your life? (Y/N)     STAY
How many years have you lived in this village? (years)  LIV
What is your main occupation?                           OCC
    None / Missing, At Home, At School, Farming, Fishing,
    Office Work, Trading, Housework, Mining
Appendix - Useful Stata commands for data management and manipulation

You can find out more about any of these by typing ‘help’ followed by the command. We will explain the commands in bold further during the practical.

Command     Description
append      Append two data sets
cf          Compare two files
codebook    Display codebook
collapse    Make data set of means, medians etc.
compare     Compare two variables
compress    Compress data in memory
count       Count observations satisfying specified conditions
cross       Form every pairwise combination of two data sets
dates       Date conversions
decode      Creates a string variable from an encoded numeric variable
describe    Describe contents of data in memory or on disk
drop        Eliminate variables or observations
edit        Edit and list data using spreadsheet editor
egen        Extensions to generate
encode      Creates a numeric variable from a string variable
expand      Duplicate observations
fillin      Rectangularise data set
format      Specify permanent display format
generate    Create or change contents of variable
infile      Read non-Stata data (i.e. text) into memory
input       Enter data from keyboard
inspect     Display simple summary of data's characteristics
ipolate     Linearly interpolate or extrapolate variables
label       Label manipulation
list        List values of variables
merge       Merge data sets
modify      Interactively modify data values
mvencode    Recode missing values
order       Reorder variables in a data set
outfile     Save data in non-Stata format (i.e. text)
quietly     Carry out the next command without giving output
recast      Change storage type of variable
recode      Recode categorical variable
rename      Rename variables
replace     Replace values of continuous variables
reshape     Convert data from wide to long and vice versa
sample      Draw a random sample
save        Save and use Stata data sets
scalar      Define scalars (i.e. constants)
sort        Sort data
use         Use Stata dataset
xpose       Interchange observations and variables
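
As an illustration of how several of the commands listed in this appendix fit together, the following do-file sketches the cleaning and merging steps described earlier in this section: flag inconsistencies, list them for the data supervisor, set unresolved values to missing, derive an age group, and merge with laboratory results. It is a sketch under stated assumptions, not the course's own programme. All variable names (idno, recruit, followup, eversex, condom, labresult), the age bands and the data values are hypothetical and were invented for the example.

```stata
* Sketch only: every variable name and value below is hypothetical
* and is not taken from the course datasets.
clear all
set more off

* Stand-in for results received later from a laboratory
input idno labresult
1 0
2 1
3 1
5 0
end
* isid stops with an error unless idno uniquely identifies each record
isid idno
tempfile lab
save `lab'

* Stand-in for the questionnaire data after double entry
clear
input idno age str10 recruit_s str10 followup_s eversex condom
1 34 "01/02/2011" "01/05/2011" 1 1
2 22 "15/02/2011" "10/01/2011" 1 0
3 51 "20/02/2011" "22/06/2011" 0 1
4 27 "03/03/2011" "04/07/2011" 0 0
end

* Convert the date strings (day/month/year) to Stata dates
generate recruit  = date(recruit_s, "DMY")
generate followup = date(followup_s, "DMY")
format recruit followup %td

* Task 1: identify potential mistakes
* Missing values sort as very large numbers, so exclude them explicitly
generate byte err_date   = followup < recruit & !missing(followup, recruit)
generate byte err_condom = condom == 1 & eversex == 0

* List flagged records for checking against the paper questionnaires
list idno recruit followup eversex condom if err_date | err_condom

* Task 2: rectify. Here we assume the supervisor could not resolve
* either problem, so the doubtful values are set to missing.
replace followup = . if err_date
replace condom   = . if err_condom
drop err_date err_condom recruit_s followup_s

* Derived categories are created by programme, so the master data
* on disk are never edited by hand
recode age (0/14 = 1 "0-14") (15/44 = 2 "15-44") (45/max = 3 "45+"), generate(agegrp)

* The merge variable must match exactly in both datasets
isid idno
merge 1:1 idno using `lab'

* _merge == 3 means matched; anything else needs investigating
tabulate _merge
list idno _merge if _merge != 3
```

With these invented values, idno 2 is flagged because the follow-up date precedes recruitment, and idno 3 because a condom is reported by someone who reports never having sex. After the merge, idno 4 has a questionnaire but no laboratory result, and idno 5 has a laboratory result but no questionnaire, which is the kind of identification problem that would send the data back for another round of cleaning. The `merge 1:1` syntax assumes Stata 11 or later.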