Tap chi Khoa hoc Truong Dai hoc Can Tho (Can Tho University Journal of Science), Part A: Natural Sciences, Technology and Environment: 33 (2014): 49-57
Website: sj.ctu.edu.vn

DETECTION OF THE KEY COURSES AFFECTING THE LEARNING OUTCOMES OF INFORMATION TECHNOLOGY STUDENTS

Do Thanh Nghi (1), Pham Nguyen Khang (1), Nguyen Minh Trung (2) and Trinh Trung Hung (3)
(1) Faculty of Information and Communication Technology, Can Tho University
(2) Faculty of Natural Sciences, Can Tho University
(3) Center for Joint Training, Can Tho University

Article info:
Received: 20/01/2014
Accepted: 28/08/2014

Keywords: Study program of information technology, Data mining, Random forests, Feature extraction

ABSTRACT

This paper presents a data mining approach for detecting the key courses that affect the learning outcomes of information technology students. We collect the study results of undergraduate students in the information technology programs at Can Tho University; a pre-processing step then transforms the dataset into a structured one (i.e. the table format) suited as input to the data mining algorithms used in the next step. A random forest model is learnt from the dataset to extract the important features (the key courses). The experimental results show that the key courses extracted by our proposed approach provide useful information to educational managers for improving training efficiency.

1 INTRODUCTION

Over the past years, the number of graduates trained in information technology (IT) by universities and colleges has increased severalfold, while the demand for IT human resources has grown rapidly. According to employers, however, IT training at these institutions does not yet meet practical needs, mainly because the quality of IT graduates remains low. Raising that quality to meet practical needs requires close coordination between employers and training institutions, in which educational managers, lecturers, academic advisors and students all play a role. Academic advisors need to be able to tell students which areas of knowledge matter most for their final results. Students can then focus early on the key courses to improve their learning outcomes. At the same time, managers have the opportunity to arrange the curriculum and assign teaching staff appropriately for the courses that belong to the key areas of knowledge.

We propose an approach for detecting the key courses that affect the learning outcomes of IT students at Can Tho University, based on knowledge discovery and data mining (Fayyad et al., 1996). With it, managers can adopt suitable strategies to improve the teaching quality of the group of key courses, and academic advisors can advise students to concentrate on improving their learning, thereby raising the quality of IT graduates. The steps of our study are: collecting the study results of graduated IT students, then pre-processing the data into a table structure, from which the random forest algorithm
(Breiman, 2001) is trained to extract the key courses in the study program. The extracted courses, such as probability and statistics, discrete mathematics and data structures, can provide useful information to educational managers, lecturers and students in organizing teaching so as to improve training efficiency.

The remainder of this paper is organized as follows: Section 2 briefly reviews related work; Section 3 presents the random forest learning algorithm and how features are extracted; Section 4 presents the experimental results, followed by the conclusion and future work.

2 RELATED WORK

Applying data mining to education and training management is considered essential for educational managers, because it makes management and strategic planning increasingly effective. Several recent studies have applied data mining techniques with considerable benefit to education. (Le, 2002) proposes using association rule mining (Agrawal et al., 1993) and fuzzy logic (Zadeh, 1965) on the results of upper and lower secondary school graduation examinations, in order to evaluate training effectiveness and to provide the information needed to improve student quality. A similar approach by (Nguyen, 2002) uses association rules in grade calculation to identify weak students who need extra tutoring. The master's thesis of (Phan, 2009) studies association rule mining on educational data, with experiments on the study results of students at Ton Duc Thang University, to support the assessment and prediction of student learning outcomes and thereby improve training quality. The master's thesis of (Nguyen, 2011) focuses on building a system that predicts upper secondary school graduation; the author applies fuzzy association rule mining to predict graduation results from students' academic performance and conduct. Another study (Nguyen, 2012) reports the results obtained by applying
the k-means clustering algorithm (MacQueen, 1967) to mine information from student grades at Van Lang Vocational College, Hanoi. The author examines how region, family circumstances, ethnicity and conduct affect students' learning outcomes, and classifies those outcomes to assess learners' understanding quickly, so that teachers can adjust their teaching to learners' abilities. (Nguyen et al., 2007) propose using decision tree learning (Breiman et al., 1984), (Quinlan, 1993) and Bayesian networks (Pearl, 1985) to predict the learning outcomes of undergraduate and postgraduate students at Can Tho University. Another study (Nguyen et al., 2011) proposes matrix factorization to predict student learning outcomes. (Pal & Pal, 2013) propose decision trees (Breiman et al., 1984), (Quinlan, 1993) and Bagging (Breiman, 1996) to predict the learning outcomes of students at Purvanchal University, India. (Bukralia et al., 2012) propose machine learning techniques such as neural networks, logistic regression (Hastie et al., 2001), decision trees (Breiman et al., 1984), (Quinlan, 1993) and support vector machines, SVM (Vapnik, 1995), to predict the learning outcomes of students in the distance-learning program of a Midwest university in the USA.

All of the studies above focus on predicting learning outcomes or course grades. Our study does not aim at accurate prediction of learning outcomes. Instead, we are interested in detecting the key courses that affect the learning outcomes of IT students, using the random forest learning algorithm.

3 RANDOM FOREST ALGORITHM

The random forest approach proposed by (Breiman, 2001) is one of the most successful ensemble methods. The random forest algorithm builds a set of decision trees
(Breiman et al., 1984), (Quinlan, 1993) without pruning; each tree is built on a bootstrap sample (drawn with replacement from the training set), and at each node the best split is chosen from a randomly selected subset of the attributes. The generalization error of the forest depends on the accuracy of each member tree and on the dependence between the member trees. The random forest algorithm grows unpruned trees to keep the bias component of the error low (the bias is the error component of the learning algorithm itself and is independent of the training set) and uses randomness to keep the correlation between the trees in the forest low. The random forest approach achieves high accuracy compared with current supervised learning algorithms. As shown in (Breiman, 2001), random forests learn fast, are robust to noise and do not overfit. The random forest algorithm produces highly accurate models that meet practical requirements for classification and regression problems.

3.1 Building the random forest

The random forest learning algorithm (Figure 1) can be summarized as follows. From a training set LS with m examples and n variables (attributes), T decision trees are built independently of one another. The t-th decision tree is built on the t-th bootstrap sample (m examples drawn with replacement from LS).
- At each internal node, n' variables are chosen at random (n' << n) and the best split is computed on these n' variables.
- The tree is grown to maximum depth without pruning.
Once the T base models have been built, a majority vote is used to classify a new example X.

Figure 1: The random forest algorithm

3.2 Feature extraction

The extraction of important features is carried out while the random forest model is being trained. At each step t, the set Bootstrap_t (m examples drawn with replacement from the training set LS) is used to build the t-th base decision tree (DT_t) of the random forest.
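The construction steps of Section 3.1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names (`grow_tree`, `build_forest`, `predict_forest`) are hypothetical, attributes are assumed to be numeric, class labels are assumed to be strings or numbers, and the Gini impurity is assumed as the split criterion, which the text does not specify. The examples left out of each bootstrap sample are stored alongside each tree so that they remain available for evaluation.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Best (feature, threshold) among the candidate features, or None."""
    best, best_score = None, float("inf")
    for f in features:
        # Dropping the largest value keeps both sides of every split non-empty.
        for thr in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, thr), score
    return best


def grow_tree(rows, labels, n_prime, rng):
    """Grow an unpruned tree; a leaf is a bare label, a node is a 4-tuple."""
    if len(set(labels)) == 1:
        return labels[0]
    features = rng.sample(range(len(rows[0])), n_prime)  # n' random variables
    split = best_split(rows, labels, features)
    if split is None:  # the sampled variables are constant at this node
        return Counter(labels).most_common(1)[0][0]
    f, thr = split
    left = [i for i, r in enumerate(rows) if r[f] <= thr]
    right = [i for i, r in enumerate(rows) if r[f] > thr]
    return (
        f,
        thr,
        grow_tree([rows[i] for i in left], [labels[i] for i in left], n_prime, rng),
        grow_tree([rows[i] for i in right], [labels[i] for i in right], n_prime, rng),
    )


def predict_tree(tree, row):
    """Follow the splits down to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if row[f] <= thr else right
    return tree


def build_forest(rows, labels, n_trees, n_prime, seed=0):
    """Return a list of (tree, out_of_bag_indices), one pair per tree."""
    rng = random.Random(seed)
    m = len(rows)
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(m) for _ in range(m)]  # m draws with replacement
        in_bag = set(boot)
        oob = [i for i in range(m) if i not in in_bag]
        tree = grow_tree([rows[i] for i in boot], [labels[i] for i in boot], n_prime, rng)
        forest.append((tree, oob))
    return forest


def predict_forest(forest, row):
    """Classify a new example by majority vote over all trees."""
    votes = Counter(predict_tree(tree, row) for tree, _ in forest)
    return votes.most_common(1)[0][0]
```

Each tree keeps growing until a node is pure or the sampled variables no longer vary, which corresponds to the "maximum depth without pruning" step; choosing `n_prime` much smaller than the number of attributes is what decorrelates the trees.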
The algorithm takes the Out-Of-Bootstrap set OOB_t (the examples of the training set LS that are not in Bootstrap_t) as a test set to compute the classification accuracy of the tree DT_t in the random forest.

An important attribute is one that strongly affects the classification result of the random forest. Specifically, if its values are changed (permuted), the classification accuracy of the random forest drops considerably compared with the accuracy before the change. The importance of the attributes in a random forest is computed as follows. When the t-th tree has been learnt from Bootstrap_t, the accuracy of DT_t on OOB_t is computed, denoted Acc(DT_t, OOB_t). The values of each attribute i of OOB_t are then permuted in turn, giving OOB_t(rand(i)), and the accuracy of DT_t is recomputed on OOB_t(rand(i)), denoted Acc(DT_t, OOB_t(rand(i))). Next, the difference in the accuracy of DT_t before and after permuting the values of attribute i is computed. For the attributes i = 1, 2, ..., k, we have:

$$
\begin{aligned}
\Delta acc_{t,1} &= Acc(DT_t, OOB_t) - Acc(DT_t, OOB_t(rand(1))) \\
\Delta acc_{t,2} &= Acc(DT_t, OOB_t) - Acc(DT_t, OOB_t(rand(2))) \\
&\;\;\vdots \\
\Delta acc_{t,k} &= Acc(DT_t, OOB_t) - Acc(DT_t, OOB_t(rand(k)))
\end{aligned}
$$

For a random forest RF with T trees, the total difference in accuracy before and after permuting the values of each attribute is:

$$
\begin{aligned}
\text{attribute } 1: \quad \alpha_1 &= \Delta acc_{1,1} + \Delta acc_{2,1} + \dots + \Delta acc_{T,1} \\
\text{attribute } 2: \quad \alpha_2 &= \Delta acc_{1,2} + \Delta acc_{2,2} + \dots + \Delta acc_{T,2} \\
&\;\;\vdots \\
\text{attribute } k: \quad \alpha_k &= \Delta acc_{1,k} + \Delta acc_{2,k} + \dots + \Delta acc_{T,k}
\end{aligned}
$$

Sorting \alpha_1, \alpha_2, ..., \alpha_k in decreasing order ranks the attributes by their total difference in accuracy before and after permutation. The larger the difference, the more important the corresponding attribute.

Following this idea, we extract the key courses that affect the learning outcomes of IT students. Each student can be seen as a row (a record, an example of the data), the student's courses as attributes (columns, fields) and the final graduation classification as the class (label). The students' study data thus forms a data table. We use random forests to learn to classify students. While the model is being built, the random forest extracts the important courses (attributes) as just described. The key courses extracted from the random forest model can be interpreted as the courses that most strongly affect the classification of students' learning outcomes.

4 EXPERIMENTAL RESULTS

For the experiments, we collected the study results of students from the Training Office of Can Tho University. The data cover IT students of cohorts 20 to 29 (enrolled from 1994 to 2003). Cohorts from 30 onwards fall under the credit-based training regulations, which grade learning outcomes on a letter scale (A, B, C, ...), so their data are not homogeneous with the earlier cohorts; since the data already collected are large enough for the study, we did not collect data for these cohorts.
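The permutation-based ranking of Section 3.2 can be sketched as follows. This is a minimal illustration, not the authors' code: the forest is assumed to be given as a list of pairs (a prediction function for one tree, the indices of that tree's out-of-bootstrap examples), and the names `accuracy`, `permute_column`, `importance` and `rank_attributes` are hypothetical. For each tree it computes the drop in out-of-bootstrap accuracy after shuffling one attribute, sums the drops over all trees to obtain one total per attribute, and sorts the attributes by that total.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label equals the true label."""
    if not rows:
        return 0.0
    hits = sum(1 for r, y in zip(rows, labels) if predict(r) == y)
    return hits / len(rows)


def permute_column(rows, i, rng):
    """Return copies of the rows with the values of attribute i shuffled."""
    column = [r[i] for r in rows]
    rng.shuffle(column)
    shuffled = []
    for r, v in zip(rows, column):
        copy = list(r)  # never modify the original example
        copy[i] = v
        shuffled.append(copy)
    return shuffled


def importance(forest, rows, labels, seed=0):
    """Sum, over all trees, of the accuracy drop caused by permuting each attribute.

    forest: list of (predict, oob_indices) pairs, one per tree (assumed layout).
    """
    rng = random.Random(seed)
    k = len(rows[0])
    alpha = [0.0] * k
    for predict, oob in forest:
        oob_rows = [rows[j] for j in oob]
        oob_labels = [labels[j] for j in oob]
        base = accuracy(predict, oob_rows, oob_labels)  # accuracy before permutation
        for i in range(k):
            permuted = permute_column(oob_rows, i, rng)
            alpha[i] += base - accuracy(predict, permuted, oob_labels)
    return alpha


def rank_attributes(alpha):
    """Attribute indices sorted by decreasing total accuracy drop."""
    return sorted(range(len(alpha)), key=lambda i: alpha[i], reverse=True)
```

With students as rows and courses as attributes, the first indices returned by `rank_attributes` would correspond to the courses whose grades matter most for the predicted graduation classification.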
The collected data are in tabular form and organized by semester and academic year. Each semester has data files such as diem (student grades), dtotng (graduated students), ctdt (study program) and other data files for that semester. In addition, there are files that explain the codes used by the system, such as student names, course names and program names. Each diem file contains the student ID, the course ID (elective and compulsory courses) and the examination grades of the courses that the student took in each semester (Figure 2).

Figure 2: Structure of the diem (grades) file

The dtotng file contains the students' graduation classification, including student ID, program ID, graduation score and graduation classification (Figure 3).

Figure 3: Structure of the graduation score file

In addition, we use the ctdmh file (course names) to look up the course name from the course ID (Figure 4).

Figure 4: Structure of the course file

We also studied how students' scores are calculated for graduation classification. The graduation score (DTN) is the weighted mean of the grades of the courses accumulated up to the time of consideration (excluding conditional courses and courses with grade F). The formula is as follows:

$$
\text{DTN} = \frac{\sum_{i=1}^{n} \text{weight}_i \times \text{grade}_i}{\sum_{i=1}^{n} \text{weight}_i}
$$
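The pre-processing into a table and the weighted graduation score can be illustrated with the short sketch below. It rests on several assumptions that this section does not confirm: grade records are taken to be (student ID, course ID, grade) triples as described for the diem file, a course that a student did not take is stored as `None`, the weight of a course is assumed to be its number of study units, and conditional or failed courses are marked with boolean flags. The function names `build_table` and `graduation_score` are hypothetical.

```python
def build_table(records, classification):
    """Pivot grade records into one row per student and one column per course.

    records: list of (student_id, course_id, grade) triples (assumed layout).
    classification: dict mapping student_id to the graduation classification.
    Returns (courses, rows, labels); missing grades are stored as None.
    """
    courses = sorted({course for _, course, _ in records})
    column = {course: j for j, course in enumerate(courses)}
    grades = {}
    for student, course, grade in records:
        row = grades.setdefault(student, [None] * len(courses))
        row[column[course]] = grade
    students = sorted(grades)
    rows = [grades[s] for s in students]
    labels = [classification[s] for s in students]
    return courses, rows, labels


def graduation_score(entries):
    """Weighted mean of the counted grades, or None if nothing is counted.

    entries: iterable of (grade, weight, is_conditional, is_failed) tuples;
    conditional and failed courses are excluded, as the text describes.
    """
    counted = [(g, w) for g, w, conditional, failed in entries
               if not conditional and not failed]
    total_weight = sum(w for _, w in counted)
    if total_weight == 0:
        return None
    return sum(g * w for g, w in counted) / total_weight
```

How missing grades are treated before learning (for example, elective courses that only some students took) is left open here, since the section does not describe it.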