Databases course book
Version 4.1 (8 October 2013)
Free University of Bolzano Bozen, Paolo Coletti

Introduction

This book contains the lessons of the relational databases and Access course held at the Free University of Bolzano Bozen.

The book is divided into levels; the level is indicated in parentheses after each section's title:
- students of the Information Systems and Data Management 3 credits course use level 1;
- students of the Information Systems and Data Management 5 credits course use levels 1, 2 and 3;
- students of the Computer Science and Information Processing course use levels 1, 2 and 3;
- students of the Advanced Data Analysis course use levels 2 and 5.

This book refers to Microsoft Access 2010, with references to versions 2007 and 2003 in footnotes, to MySQL Community Server version 5.5 and to HeidiSQL version 7.0.0.

This book is in continuous development; please take a look at its version number, which marks important changes.

Disclaimers

This book is designed for novice database designers. It contains simplifications of theory, and many technical details are purposely omitted.

Table of Contents

Introduction
Table of contents
1. Relational databases (level 2)
    1.1. Database in normal form
    1.2. Relations
    1.3. One to many relation
    1.4. One to one relation
    1.5. Many to many relation
    1.6. Foreign key with several relations
    1.7. Referential integrity
    1.8. Temporal versus static database
    1.9. Non relational structures
    1.10. Entity relationship model (level 9)
2. Microsoft Access (level 1)
    2.1. Basic operations
    2.2. Tables (level 1)
    2.3. Forms (level 3)
    2.4. Queries (level 1)
    2.5. Reports (level 3)
3. MySQL (level 5)
    3.1. HeidiSQL
    3.2. Installing MySQL server
4. SQL language for MySQL (level 5)
    4.1. Basic operations
    4.2. Simple selection queries
    4.3. Inner joins
    4.4. Summary queries
    4.5. Modifying records
    4.6. External data
    4.7. Tables
5. Designing a database (level 2)
    5.1. Paper diagram
    5.2. Building the tables
    5.3. Inserting data
6. Technical documentation (level 9)
    6.1. My farm example

1. Relational databases (level 2)

This chapter presents the basic ideas and motivations which lie behind the concept of relational database. Readers with previous experience in building schemas for relational databases can skip this part.

A relational database is defined as a collection of tables connected via relations. It is always a good idea to have these tables organized in a structured way that is called normal form.

1.1. Database in Normal Form

The simplest form of database, which can be handled even by Microsoft Excel, is a single table. To be a database in normal form, the table must satisfy some requisites:

1. the first line contains the headers of the columns, which uniquely define the content of the column. For example:

    Student number   Name   Surname   Telephone
    2345             Mary   Smith     0471234567

2. each column contains only what is indicated in its header. For example, in a column with header “telephone number” we may not put two numbers or an indication of the preferred calling time, such as in the second row of this table:

    Student number   Name   Surname    Telephone
    2345             Mary   Smith      0471234567
    2348             John   McFlurry   0471234567 or 3378765432

3. each row refers to a single object. For example, there may not be a row with information on several objects or on a group of objects, such as in the second row of this table:

    Student number    Name   Surname   Degree course
    2345              Mary   Smith     Economics and Management
    Starting with 5                    Logistics and Production Engineering

4. rows are independent, i.e. no cell has references to other rows, such as in the second row of this table:

    Student number   Name   Surname   Notes
    2345             Mary   Smith
    2376             John   Smith     is the brother of 2345

5. rows and columns are unordered, i.e. their order is not important. For example, these four tables are the same one:

    Student number   Name   Surname
    2345             Mary   Smith
    2376             John   McFlurry

    Student number   Name   Surname
    2376             John   McFlurry
    2345             Mary   Smith

    Name   Student number   Surname
    Mary   2345             Smith
    John   2376             McFlurry

    Surname    Student number   Name
    McFlurry   2376             John
    Smith      2345             Mary

6. cells do not contain values which can be directly calculated from cells of the same row, such as in the last column of this table:

    Student number   Name   Surname    Tax 1st semester   Tax 2nd semester   Total tax
    2345             Mary   Smith      550 €              430 €              980 €
    2376             John   McFlurry   450 €              0 €                450 €

Database rows are called records and database columns are called fields.

Single table databases can be easily handled by many programs and by human beings, even when the table is very long or has many fields. There are however situations in which a single table is not an efficient way to handle the information.

1.1.1. Primary key

Each table should have a primary key, which means a field whose value is different for every record. Often the primary key has a natural candidate, for example the student number for a students table, the tax code for a citizens table, the telephone number for a telephones table. Other times a good primary key candidate is difficult to detect: for example, in a cars table the car name is not a primary key, since there are different series and different motor types of the same car. In these cases it is possible to add an extra field, called ID or surrogate key, with a progressive number, to be used as primary key. In many database programs this progressive number is handled directly by the program itself.

It is also possible to define several fields together as primary key: for example, in a people table the first name together with the last name and with the place and date of birth form a unique sequence for every person. In this case the primary key is also called composite key or compound key. On some database management programs, however, handling a composite key can create problems, and therefore it is a better idea to use an ID in this case.

1.2. Relations

1.2.1. Information redundancy

In some situations, trying to put the information we need in a single table database causes a duplication of identical data, which can be called information redundancy. For example, if we add to our students table the information on who is the reference secretary for each student, together with other secretary's information such as office telephone number, office room and timetable, we get this table:

    Student number   Name    Surname    Secretary    Telephone    Office   Time
    2345             Mary    Smith      Anne Boyce   0471222222   C340     14-18
    2376             John    McFlurry   Jessy Codd   0471223334   C343     9-11
    2382             Elena   Burger     Jessy Codd   0471223334   C343     9-11
    2391             Sarah   Crusa      Anne Boyce   0471222222   C340     14-18
    2393             Bob     Fochs      Jessy Codd   0471223334   C343     9-11

Information redundancy is not a problem by itself, but:
- storing the same information several times is a waste of computer space (hard disk and memory), which, for a very large table, has a bad impact on the size of the file and on the speed of every search or sorting operation;
- whenever we need to update a repeated piece of information (e.g. the secretary changes office), we need to make a lot of changes;
- manually inserting the same information several times can lead to typing (or copying and pasting) mistakes, which decrease the quality of the database.

In order to avoid this situation, it is a common procedure to split the table into two distinct tables, one for the students and another one for the secretaries. To each secretary we assign a unique code, and for each student we indicate the secretary's code.
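The split described above can be tried out in a few lines. The following is only a minimal sketch, assuming Python's built-in sqlite3 module instead of the Access and MySQL tools used in this book; the table and column names are hypothetical, the Time column is left out for brevity, and the new office C350 is invented for the illustration.

```python
import sqlite3

# In-memory database; table and column names are assumptions of this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE secretaries (
    code      INTEGER PRIMARY KEY,   -- unique code assigned to each secretary
    name      TEXT,
    surname   TEXT,
    telephone TEXT,
    office    TEXT
);
CREATE TABLE students (
    student_number INTEGER PRIMARY KEY,
    name           TEXT,
    surname        TEXT,
    secretary      INTEGER REFERENCES secretaries(code)  -- only the code is stored
);
INSERT INTO secretaries VALUES
    (1, 'Anne',  'Boyce', '0471222222', 'C340'),
    (2, 'Jessy', 'Codd',  '0471223334', 'C343');
INSERT INTO students VALUES
    (2345, 'Mary',  'Smith',    1),
    (2376, 'John',  'McFlurry', 2),
    (2382, 'Elena', 'Burger',   2),
    (2391, 'Sarah', 'Crusa',    1),
    (2393, 'Bob',   'Fochs',    2);
""")

# The secretary changes office (C350 is a made-up room):
# a single record is modified, not one record per student.
cursor = conn.execute("UPDATE secretaries SET office = 'C350' WHERE code = 2")
updated_rows = cursor.rowcount

# Finding each student's secretary office means following the code
# from the students table into the secretaries table.
offices = conn.execute("""
    SELECT students.surname, secretaries.office
    FROM students
    JOIN secretaries ON students.secretary = secretaries.code
    ORDER BY students.student_number
""").fetchall()

print(updated_rows)
print(offices)
```

In this sketch the office is stored once per secretary, so the three students who share secretary 2 all see the new office after a single update.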
Students

    Student number   Name    Surname    Secretary
    2345             Mary    Smith      1
    2376             John    McFlurry   2
    2382             Elena   Burger     2
    2391             Sarah   Crusa      1
    2393             Bob     Fochs      2

Secretaries

    Secretary code   Name    Surname   Telephone    Office   Time
    1                Anne    Boyce     0471222222   C340     14-18
    2                Jessy   Codd      0471223334   C343     9-11

In this way the information on each secretary is written and stored only once and can be updated very easily. The price for this is that every time we need to know who a student's secretary is, we have to look at the student's secretary code and find the corresponding code in the Secretaries table: this can be a long and frustrating procedure for a human being when the Secretaries table has many records, but it is a very fast task for a computer program which is designed to quickly search through tables.

1.2.2. Empty fields

Another typical problem which arises with single table databases is the case of many empty fields. For example, if we want to build an address book with the telephone numbers of all the people, we will have somebody with no telephone number, many people with a few telephone numbers, and some people with a lot of telephone numbers. Moreover, we must also take into consideration that new numbers will probably be added in the future for anybody.

If we reserve a field for every telephone, the table looks like this:

    Name    Surname    Phone1    Phone2    Phone3    Phone4    Phone5    Phone6    Phone7
    Mary    Smith      0412345
    John    McFlurry   0412375   3396754
    Elena   Burger     0412976   3397654   0436754   3376547   0487652   3387655   0463456
    Sarah   Crusa      0418765   0412345
    Bob     Fochs      0346789   0765439   3376543

Clearly, if we reserve several fields for the telephone numbers, a lot of cells are empty. The problems of empty cells are:
- an empty cell is a waste of computer space;
- there is a fixed limit on the number of fields which may be used. If a record needs another field (for example, Elena Burger gets another telephone number) the entire structure of the table must be changed;
- since all these fields contain the same type of information, it is difficult to check whether a piece of information is present, since it must be looked for in every field, including the cells which are empty.
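The problems listed above can be made concrete with a small experiment. This is a sketch under stated assumptions: it uses Python's built-in sqlite3 module rather than Access or MySQL, and the table name address_book and the column names phone1 to phone7 are hypothetical. The data is the address book shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One column per possible telephone, as in the wide table above.
phone_columns = [f"phone{i}" for i in range(1, 8)]
column_definitions = ", ".join(f"{column} TEXT" for column in phone_columns)
conn.execute(f"CREATE TABLE address_book (name TEXT, surname TEXT, {column_definitions})")

address_book = [
    ("Mary", "Smith", ["0412345"]),
    ("John", "McFlurry", ["0412375", "3396754"]),
    ("Elena", "Burger", ["0412976", "3397654", "0436754", "3376547",
                         "0487652", "3387655", "0463456"]),
    ("Sarah", "Crusa", ["0418765", "0412345"]),
    ("Bob", "Fochs", ["0346789", "0765439", "3376543"]),
]
for name, surname, phones in address_book:
    # Unused telephone fields are stored as empty (NULL) cells.
    padded = phones + [None] * (len(phone_columns) - len(phones))
    conn.execute("INSERT INTO address_book VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
                 (name, surname, *padded))

# Searching for one number requires testing every telephone field.
condition = " OR ".join(f"{column} = ?" for column in phone_columns)
owners = conn.execute(
    f"SELECT surname FROM address_book WHERE {condition} ORDER BY surname",
    ["0412345"] * len(phone_columns),
).fetchall()

# Counting the empty cells: each "IS NULL" test yields 1 or 0.
empty_test = " + ".join(f"({column} IS NULL)" for column in phone_columns)
empty_cells = conn.execute(f"SELECT SUM({empty_test}) FROM address_book").fetchone()[0]

print(owners)
print(empty_cells, "empty cells out of", len(address_book) * len(phone_columns))
```

The search condition grows with every telephone field that is added, which is the third problem in the list.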
In order to avoid this situation, we again split the table into two distinct tables, one for the people and another one for their telephone numbers. This time, however, we assign a unique code to each person and we build the second table with person-telephone combinations.

People

    Person code   Name    Surname
    1             Mary    Smith
    2             John    McFlurry
    3             Elena   Burger
    4             Sarah   Crusa
    5             Bob     Fochs

Telephones

    Owner   Number
    1       0412345
    2       0412375
    2       3396754
    3       0412976
    3       3397654
    3       0436754
    3       3376547
    3       0487652
    3       3387655
    3       0463456
    4       0418765
    4       0412345
    5       0346789
    5       0765439
    5       3376543

Even though it seems strange, each person's code appears several times in the Telephones table. This is correct, since the Telephones table uses exactly as many records as needed and has no empty cells: each person appears as many times as the number of telephones he or she has, and people with no telephone do not appear at all.

The drawback is that every time we want to know a person's telephone numbers we have to go through the entire Telephones table searching for the person's code, but again this procedure is very fast for an appropriate computer program.

1.2.3. Foreign key

When a field which is not the primary key is used in a relation with another table, this field is called a foreign key. This field is important for the database management program, such as Access, when it has to check referential integrity (see section 1.7).

For example, in the previous examples Owner is a foreign key for the Telephones table and Secretary is a foreign key for the Students table.

1.3. One to many relation

A relation is a connection between a field of table A (which becomes a foreign key) and the primary key of table B: on the B side the relation is “1”, meaning that for each record of table A there is one and only one corresponding record of table B, while on the A side the relation is “many” (indicated with the mathematical symbol ∞), meaning that for each record of table B there can be none, one or more corresponding records in table A.
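The definition above can be sketched in code. This is a minimal illustration assuming Python's built-in sqlite3 module, not the Access or MySQL tools used in this book; the lowercase table and column names are hypothetical, only part of the example data is loaded, and the person with code 6 is invented to show the “none” case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE people (                 -- table B, the 1 side
    person_code INTEGER PRIMARY KEY,
    name        TEXT,
    surname     TEXT
);
CREATE TABLE telephones (             -- table A, the many side
    owner  INTEGER NOT NULL REFERENCES people(person_code),  -- foreign key
    number TEXT NOT NULL
);
INSERT INTO people VALUES
    (1, 'Mary', 'Smith'),
    (2, 'John', 'McFlurry'),
    (6, 'Tom',  'Grey');              -- hypothetical person with no telephone
INSERT INTO telephones VALUES
    (1, '0412345'),
    (2, '0412375'),
    (2, '3396754');
""")

# For each record on the 1 side there can be none, one or more telephones.
counts = conn.execute("""
    SELECT people.surname, COUNT(telephones.number)
    FROM people
    LEFT JOIN telephones ON telephones.owner = people.person_code
    GROUP BY people.person_code
    ORDER BY people.person_code
""").fetchall()

# For each telephone there must be exactly one owner:
# a number pointing to a person code that does not exist is refused.
try:
    conn.execute("INSERT INTO telephones VALUES (99, '0000000')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(counts)
print(rejected)
```

The LEFT JOIN is used here so that people without telephones still appear in the result, with a count of zero.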
For the example of section 1.2.1, the tables are indicated in this way, meaning that for each student there is exactly one secretary and for each secretary there are many students. This relation is called many-to-one relation.

    Students (Student number, Name, Surname, Secretary)  many ---- 1  Secretaries (ID, Name, Surname, Telephone, Office, Time)

For the example of section 1.2.2, the tables are instead indicated in this way, meaning that for each person there can be none, one or several telephone numbers and for each number there is only one corresponding owner. This relation is called one-to-many relation.

    People (Person code, Name, Surname)  1 ---- many  Telephones (Owner, Number)

Clearly one-to-many and many-to-one are the same relation, the only difference being the order in which the tables are drawn. It is however very important to correctly identify the “1” side, since it has several implications for the correct working of the database. For example, in the previous example putting the “1” side on the Telephones table means that for each person there is only one telephone and that for each telephone there are many people. This situation was possible up to the 1990s, when there was only one telephone for a whole family, used by all its members, but it is not what we want to describe with the current 21st century database. Moreover, reversing the relation also requires a small change in the structure of the tables, putting the foreign key Telephone in the People table instead of the foreign key Owner in the Telephones table, such as

    People (Person code, Name, Surname, Telephone)  many ---- 1  Telephones (Number)

1.4. One to one relation

A one-to-one relation is a direct connection between two primary keys. Each record of the first table has exactly one corresponding record in the second table and vice versa. An example can be countries and national flags. This relation can sometimes be useful to separate into two tables two conceptually different objects with a lot of fields, but it should be avoided, since the two tables can be easily joined together in a single table.

Two tables in a one-to-one relation:

    Countries (Name, Size, Population, Continent)  1 ---- 1  Flags (ID, Shape, Picture)

The same information joined in a single table:

    Countries (Name, Size, Population, Continent, Flag shape, Flag picture)

1.5. Many to many relation

Even though a many-to-many relation is very common in real applications, unfortunately it cannot be handled automatically by relational databases. In order to deal with it, relational databases use a junction table, which is an extra table with the task of connecting together the two fields which are many-to-many related. Sometimes this junction table has a corresponding meaning in everyday experience, other times it is only an abstract representation of the relation. In any case, it is always a good idea to give a meaningful name to the junction table, often using a question form such as “What is owned by whom”, to always have its meaning clearly in mind.

For example, we build a database with houses and owners. Each house may be owned by several people (with different percentages or, if we are building a historical database, with different starting and ending dates), and on the other hand each person may own several portions of houses. In order to represent this many-to-many relation between houses and owners we use a junction table which can be called either “What is owned by whom” or “Who owns what” or, using a more tangible name, Property Acts.

    Houses (Address, Square meters, Height, Construction year)  1 ---- many  Property Acts (Act number, Percentage, House, Owner, Begin date, End date)  many ---- 1  Owners (Tax code, Name, Surname, Birth place, Birth date)

Each owner can therefore have many property acts and each house can have many property acts which refer to that house. On the other hand each property act has written on it only one owner and one house.

This is the typical structure of the junction table: it contains two or more foreign keys on the “many” side of the relation. An example where the junction table contains four foreign keys is the database of car competitions with CarTypes, Tires, Races, Drivers.

1.5.1. Details table

In everyday applications the relation is often so complicated that a junction table is not enough. This is the case, for example, of a selling database, with table Customers and table Products. Clearly each customer may order different products and each product is hopefully ordered by several customers, therefore we need an Orders junction table. This table also contains all the details of the order, such as the amount of products, the date and the shipping cost.

However, while it is correct that for each order there is one and only one customer, in this schema for each order there is also one and only one product. This is not what usually happens in real applications, where a customer orders several products at the same time and also wants to pay for them all together with combined shipping costs.

In order to deal with this situation, we need a details table. We leave all the order's administrative information, including the customer relation, in the Orders table and we move the list of ordered products into the details table, which will look like the Telephones table of section 1.2.2.
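The details table idea can be sketched as follows. This is only an illustration, assuming Python's built-in sqlite3 module in place of Access or MySQL; all table and column names are assumptions of the sketch, and the customer, products and order shown are made-up data, not taken from the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY, name TEXT, surname TEXT
);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY, description TEXT, unit_price REAL
);
-- Administrative information only: one customer per order.
CREATE TABLE orders (
    order_id      INTEGER PRIMARY KEY,
    date          TEXT,
    customer_id   INTEGER REFERENCES customers(customer_id),
    shipping_cost REAL
);
-- Details table: one record for each product ordered within an order.
CREATE TABLE order_details (
    id         INTEGER PRIMARY KEY,
    order_id   INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    quantity   INTEGER
);
-- Made-up data for the illustration.
INSERT INTO customers VALUES (1, 'Mary', 'Smith');
INSERT INTO products  VALUES (10, 'Pen', 2.0), (11, 'Notebook', 5.0);
INSERT INTO orders    VALUES (100, '2013-10-08', 1, 4.0);
INSERT INTO order_details VALUES (1, 100, 10, 3), (2, 100, 11, 1);
""")

# Who ordered, and the order's administrative information.
header = conn.execute("""
    SELECT customers.surname, orders.shipping_cost
    FROM orders
    JOIN customers ON orders.customer_id = customers.customer_id
    WHERE orders.order_id = ?
""", (100,)).fetchone()

# The list of ordered products, reached through the details table.
lines = conn.execute("""
    SELECT products.description, order_details.quantity
    FROM order_details
    JOIN products ON order_details.product_id = products.product_id
    WHERE order_details.order_id = ?
    ORDER BY order_details.id
""", (100,)).fetchall()

print(header)
print(lines)
```

The order itself is stored once, with a single shipping cost, while the number of records in the details table grows with the number of products in the order.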
Car competitions database of section 1.5, where the junction table Participants contains four foreign keys:

    CarTypes (Car type, Brand, Engine cc, Speed)   1 ---- many  Participants (Car type)
    Drivers (Tax code, Name, Surname, Address)     1 ---- many  Participants (Driver)
    Tires (Tire name, Radial, Type, Width)         1 ---- many  Participants (Tires)
    Races (Race name, Date, Length)                1 ---- many  Participants (Race)
    Participants (Car plate, Car type, Driver, Tires, Race, Race time, Arrival position)

Selling database of section 1.5.1 with Orders used as a junction table:

    Customers (Customer ID, Name, Surname, Address)  1 ---- many  Orders (Order number, Date, Customer, Product, Shipping cost, Amount)  many ---- 1  Products (Product code, Description, Unit Price, Category, Weight)

Selling database of section 1.5.1 with a details table:

    Customers (Customer ID, Name, Surname, Address)                    1 ---- many  Orders (Order ID, Date, Customer ID, Shipping cost)
    Orders (Order ID, Date, Customer ID, Shipping cost)                1 ---- many  OrderDetails (ID, Order ID, Product ID, Quantity)
    Products (Product ID, Description, Unit Price, Category, Weight)   1 ---- many  OrderDetails (ID, Order ID, Product ID, Quantity)

Each record in the OrderDetails table represents a product which is ordered, together with its amount, and clearly an order can have several details. In this way an entire order can be represented by taking the information on who ordered it from the Customers table, the information on the products from the Products table through the OrderDetails table, and the administrative information from the Orders table itself.

Using queries and reports (explained in sections 2.4 and 2.5 for Access) all these data can be conveniently put together, taking them from the tables and automatically joining them following the relations, into a report like this one.

    Orders:        Order ID, Date
    Customers:     Name, Surname, Address
    OrderDetails:  Product, Description, Weight, Amount (one line for each ordered product)
    Orders:        Shipping cost

A details table is in general used every time the junction table, even with several foreign keys, is not enough to describe the relation. In some cases further subdetail tables may even be necessary.

1.6. Foreign key with several relations

Consider a database with people and companies. Clearly these two objects must be in two different tables since they require different fields. If however we need to build a table containing phones, we have several choices. We can build two distinct tables:

    People's phones (Number, Owner)     many ---- 1  People (ID, Name, Surname, Birth date)
    Companies' phones (Number, Owner)   many ---- 1  Companies (ID, Name, Type, Administrator)

An alternative schema is the following, which uses two relations coming out from the same foreign key field:

    Phones (Number, Owner)  many ---- 1  People (ID, Name, Surname, Birth date)
    Phones (Number, Owner)  many ---- 1  Companies (ID, Name, Type, Administrator)

However this schema creates a technical problem: many database management programs which automatically follow relations, such as Access, do not know whether to follow the first or the second relation in order to find the name of the phone's owner. Therefore, if the database designer does not have good experience, it is better to avoid this second schema and to choose, according to the problem, the more appropriate between the first one and this third one:

    Phones (Number, Owner)  many ---- 1  PeopleCompanies (ID, Company (yes/no), Name, Person surname, Person birth date, Company type, Company administrator)

filling the non-appropriate fields (such as Person surname and Person birth date when the record refers to a company) with an empty value, technically called Null.

1.7. Referential integrity

If two tables are related via a many-to-one relation, like the one between students and secretaries of section 1.2.1, we are no longer free to modify the data on the “1” side at will. For example, if we delete a secretary or if we change its ID, there are probably corresponding students in the Students table which become orphans, i.e. they do not have their corresponding secretary anymore, and following their relation to the Secretaries table leads to a non-existent ID. This issue is known as referential integrity, which is the property of a database to have all the foreign key's data correctly related to the primary key's data. When a record in the “1” side table is deleted, referential integrity can be broken and this results in an inconsistent database.

Secretaries

    Secretary code   Name    Surname   Telephone    Office   Time
    1                Anne    Boyce     0471222222   C340     14-18
    2                Jessy   Codd      0471223334   C343     9-11

    Students (Student number, Name, Surname, Secretary)  many ---- 1  Secretaries (ID, Name, Surname, Telephone, Office [...]

... hotel booking table it is necessary to have departure dates not before arrival dates and therefore the condition here is [Departure Date] .
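A rule of the kind stated in words above (departure dates not before arrival dates) can also be sketched outside Access. The Access condition itself is cut off in the text and is not reconstructed here; the following only illustrates the rule as described, assuming Python's built-in sqlite3 module and a hypothetical bookings table with made-up dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; dates are stored as ISO text (YYYY-MM-DD),
# which compares correctly as plain text.
conn.execute("""
    CREATE TABLE bookings (
        booking_id     INTEGER PRIMARY KEY,
        arrival_date   TEXT NOT NULL,
        departure_date TEXT NOT NULL,
        CHECK (departure_date >= arrival_date)  -- departure not before arrival
    )
""")

# A valid booking is accepted.
conn.execute("INSERT INTO bookings VALUES (1, '2013-10-08', '2013-10-10')")

# A booking that departs before it arrives is refused by the table itself.
try:
    conn.execute("INSERT INTO bookings VALUES (2, '2013-10-08', '2013-10-01')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT COUNT(*) FROM bookings").fetchone()[0]
print(rejected, stored)
```

Placing the condition in the table definition means that it is checked for every record, whichever program inserts the data.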