DSpace at VNU: A feature-based opinion mining model on product reviews in Vietnamese
Studies in Computational Intelligence, Volume 381
Editor-in-Chief: Prof. Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, ul. Newelska, 01-447 Warsaw, Poland. E-mail: kacprzyk@ibspan.waw.pl

Radosław Katarzyniak, Tzu-Fu Chiu, Chao-Fu Hong, and Ngoc Thanh Nguyen (Eds.)
Semantic Methods for Knowledge Management and Communication

Editors

Prof. Radosław Katarzyniak
Institute of Informatics
Wrocław University of Technology
Str. Wybrzeże Wyspiańskiego 27
50-370 Wrocław, Poland
E-mail: radoslaw.katarzyniak@pwr.wroc.pl

Prof. Chao-Fu Hong
Department of Information Management
Aletheia University
No. 32, Chen-Li Street
Tamsui District, New Taipei City, Taiwan, R.O.C.
E-mail: au4076@au.edu.tw

Prof. Tzu-Fu Chiu
Department of Industrial Management & Enterprise Information
Aletheia University
No. 32, Chen-Li Street
Tamsui District, New Taipei City, Taiwan, R.O.C.
E-mail: chiu@mail.au.edu.tw

Prof. Ngoc Thanh Nguyen
Institute of Informatics
Wrocław University of Technology
Str. Wybrzeże Wyspiańskiego 27
50-370 Wrocław, Poland
E-mail: thanh@pwr.wroc.pl

ISBN 978-3-642-23417-0
e-ISBN 978-3-642-23418-7
DOI 10.1007/978-3-642-23418-7
Studies in Computational Intelligence, ISSN 1860-949X
Library of Congress Control Number: 2011935117

© 2011 Springer-Verlag Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India
Printed on acid-free paper
springer.com

Preface

Knowledge
management and communication have already become vital research and practical issues, studied intensively by highly developed societies. These societies have already used countless computing techniques to create, collect, process, retrieve and distribute enormous volumes of knowledge, and have created complex human activity systems involving both artificial and natural agents. In these practical contexts, effective management and communication of knowledge is badly needed to keep human activity systems running. Unfortunately, the diversity of computational models applied in the knowledge management field has led to a situation in which humans (the end users of all artificial technology) find it almost impossible to use their own products effectively. To cope with this problem, the concept of human-centered computing, strongly combined with computational collective techniques and supported by new semantic methods, has been developed and put on the current research agenda by the main academic and industry centers.

In this book many interesting issues related to the above-mentioned concepts are discussed in a rigorous scientific way and evaluated from a practical point of view. All chapters in this book contribute directly or indirectly to the concept of human-centered computing, in which semantic methods are a key factor of success. These chapters are extended versions of oral presentations given during the 3rd International Conference on Computational Collective Intelligence - Technologies and Applications, ICCCI 2011 (21–23 September 2011, Gdynia, Poland) and the 1st Polish-Taiwanese Workshop on Semantic Methods for Knowledge Discovery and Communication (21–23 September 2011, Gdynia, Poland), as well as individual contributions prepared independently of these two scientific events.

September 2011
Radosław Katarzyniak
Tzu-Fu Chiu
Chao-Fu Hong
Ngoc Thanh Nguyen

Contents

Part I: Knowledge Processing in Agent and Multiagent Systems

Chapter 1: A Multiagent
System for Consensus-Based Integration of Semi-hierarchical Partitions - Theoretical Foundations for the Integration Phase
Radosław P. Katarzyniak, Grzegorz Skorupa, Michał Adamski, Łukasz Burdka

Chapter 2: Practical Aspects of Knowledge Integration Using Attribute Tables Generated from Relational Databases
Stanisława Kluska-Nawarecka, Dorota Wilk-Kołodziejczyk, Krzysztof Regulski

Chapter 3: A Feature-Based Opinion Mining Model on Product Reviews in Vietnamese
Tien-Thanh Vu, Huyen-Trang Pham, Cong-To Luu, Quang-Thuy Ha

Chapter 4: Identification of an Assessment Model for Evaluating Performance of a Manufacturing System Based on Experts Opinions
Tomasz Wiśniewski, Przemysław Korytkowski

Chapter 5: The Motivation Model for the Intellectual Capital Increasing in the Knowledge-Base Organization
Przemysław Różewski, Oleg Zaikin, Emma Kusztina, Ryszard Tadeusiewicz

Chapter 6: Visual Design of Drools Rule Bases Using the XTT2 Method
Krzysztof Kaczor, Grzegorz Jacek Nalepa, Łukasz Łysik, Krzysztof Kluza

Chapter 7: New Possibilities in Using of Neural Networks Library for Material Defect Detection Diagnosis
Ondrej Krejcar

Chapter 8: Intransitivity in Inconsistent Judgments
Amir Homayoun Sarfaraz, Hamed Maleki

Part II: Computational Collective Intelligence in Knowledge Management

Chapter 9: A Double Particle Swarm Optimization for Mixed-Variable Optimization Problems
Chaoli Sun, Jianchao Zeng, Jengshyang Pan, Shuchuan Chu, Yunqiang Zhang

Chapter 10: Particle Swarm Optimization with Disagreements on Stagnation
Andrei Lihu, Ştefan Holban

Chapter 11: Classifier Committee Based on Feature Selection Method for Obstructive Nephropathy Diagnosis
Bartosz Krawczyk

Chapter 12: Construction of New Cubature Formula of Degree Eight in the Triangle Using Genetic Algorithm
Grzegorz Kusztelak, Jacek Stańdo

Chapter 13: Affymetrix Chip Definition Files Construction Based on Custom Probe Set Annotation Database
Michał Marczyk, Roman Jaksik, Andrzej Polański, Joanna Polańska

Part III: Models for Collectives of Intelligent Agents

Chapter 14: Advanced Methods for Computational Collective Intelligence
Ngoc Thanh Nguyen, Radosław P. Katarzyniak, Janusz Sobecki

Chapter 15: Identity Criterion for Living Objects Based on the Entanglement Measure
Mariusz Nowostawski, Andrzej Gecow

Chapter 16: Remedial English e-Learning Study in Chance Building Model
Chia-Ling Hsu

Chapter 17: Using IPC-Based Clustering and Link Analysis to Observe the Technological Directions
Tzu-Fu Chiu, Chao-Fu Hong, Yu-Ting Chiu

Chapter 18: Using the Advertisement of Early Adopters’ Innovativeness to Investigate the Majority Acceptance
Chao-Fu Hong, Tzu-Fu Chiu, Yuh-Chang Lin, Jer-Haur Lee, Mu-Hua Lin

Chapter 19: The Chance for Crossing Chasm: Constructing the Bowling Alley
Chao-Fu Hong, Yuh-Chang Lin, Mu-Hua Lin, Woo-Tsong Lin, Hsiao-Fang Yang

Chapter 20: Visualization of the Technological Evolution of the DVD Business Ecosystem
Yan-Ru Li

Part IV: Models and Environments for Human-Centered Computing

Chapter 21: Discovering Students’ Real Voice through Computer-Mediated Dialogue Journal Writing
Ai-Ling Wang, Dawn Michele Ruhl

Chapter 22: The ALCN Description Logic Concept Satisfiability as a SAT Problem
Adam Meissner

Chapter 23: Embedding the HeaRT Rule Engine into a Semantic Wiki
Grzegorz Jacek Nalepa, Szymon Bobek

Chapter 24: The Acceptance Model of e-Book for On-Line Learning Environment
Wei-Chen Tsai, Yan-Ru Li

Chapter 25: Human Computer Interface for Handicapped People Using Virtual Keyboard by Head Motion Detection
Ondrej Krejcar

Chapter 26: Automated Understanding of a Semi-natural Language for the Purpose of Web Pages Testing
Marek Zachara, Dariusz Pałka

Chapter 27: Emerging Artificial Intelligence Application: Transforming Television into Smart Television
Sasanka Prabhala, Subhashini Ganapathy

Chapter 28: Secure Data Access Control Scheme Using
Type-Based Re-encryption in Cloud Environment
Namje Park

Chapter 29: A New Method for Face Identification and Determing Facial Asymmetry
Piotr Milczarski

Chapter 30: 3W Scaffolding in Curriculum of Database Management and Application – Applying the Human-Centered Computing Systems
Min-Huei Lin, Ching-Fan Chen

Chapter 31: Geoparsing of Czech RSS News and Evaluation of Its Spatial Distribution
Jiří Horák, Pavel Belaj, Igor Ivan, Peter Nemec, Jiří Ardielli, Jan Růžička

Author Index

Part I
Knowledge Processing in Agent and Multiagent Systems

342 M.-H. Lin and C.-F. Chen

diffusion of content knowledge usually cannot succeed because communication between students and teachers is difficult.

Data modeling and database development is usually a required core course in MIS departments. Its content knowledge includes the entity-relationship model, the relational data model, SQL, normalization, etc. [3, 5, 8]. The researcher has taught this course for 13 years; many students worked hard at database analysis and design but did not learn it well. When they took the project analysis and implementation course, they eventually sensed that insufficient ability in database development would affect the progress and quality of developing and implementing information systems.

In this study, the researcher investigates the teaching and learning of the database modeling and development course in the Department of MIS at Aletheia University. Teachers play the role of innovators and plan the scheduled progress of courses; they use the taxonomy of Bloom’s educational objectives in the cognitive domain to specify learning objectives, activities and assessments precisely. Every new or unknown course unit is an innovative product for students, and students are divided into leading adopters and underachieved majority according to the time they need to reach the knowledge cognition level. We use the 3W approach, ‘When, Who, What’, to collect students’ learning
data through data mining and text mining technology and to analyze the data with human-centered computing systems (Hong 2009). Our goal is to extract the features of students’ knowledge clusters and to discover when the proper time is, who needs assistance, and what scaffolding they need, in order to support students’ learning effectively.

2 Literature Review

2.1 Diffusion of Innovation

In 1995 Rogers defined the diffusion process as the process by which a new idea spreads to end users. His diffusion of innovation model divides individuals into five groups according to the time at which they accept an innovation: innovators, early adopters, early majority, late majority, and laggards (Fig. 1). He also repeatedly pointed out that only about 20% of early adopters may accept an innovation when a new product emerges, because they have characteristics that let them accept innovation quickly and early, and use and develop its useful and creative value. In ‘Inside the Tornado’, Moore indicated that a large chasm exists between the early adopters and the early majority; if the early adopters and their creative value are not identified, the diffusion of the innovation will be terminated. In this study, students are divided into leading adopters and underachieved majority according to the time they need to reach the knowledge cognition level, and we try to extract the features of the student clusters and let the chasm appear. When teachers discover the chasm, they have a chance to build a platform that supports teachers and students in crossing it. When the underachieved majority achieves more, the diffusion of innovation may happen.

3W Scaffolding in Curriculum of Database Management and Application 343

Fig. 1. Clusters of students and chasm in learning and teaching

2.2 Scaffolding Instruction

Scaffolding instruction theory originates from constructivism, which emphasizes that knowledge is constructed by the individual and that this constructive process takes place through interaction with others
in social systems. Extended to instruction, scaffolding means that adults or teachers provide temporary support to help learners develop their learning ability. Such assistance resembles the scaffolding erected when buildings are constructed, reinforced or decorated (Wood et al. 1976). The key to implementing scaffolding instruction is that adults can provide appropriate assistance at the right moment only when they have really discovered the learners’ zone of proximal development.

2.3 Revised Taxonomy of Bloom’s Educational Objectives in Cognitive Domain

Bloom’s taxonomy has provided a foundation for developing learning objectives designed for learners to acquire knowledge. The taxonomy is not only an assessment tool but also a common specification that teachers use when they set learning objectives. In the revised taxonomy for the cognitive domain, educational objectives are divided into a knowledge dimension and a cognitive process dimension; the former helps teachers distinguish what to teach, and the latter aims to promote the retention and transfer of the knowledge students learn. The knowledge dimension has four classes of knowledge: factual, conceptual, procedural and metacognitive. The cognitive process dimension contains remember, understand, apply, analyze, evaluate and create [1, 2]. The two-way taxonomy table is shown in Table 1; teachers usually fill learning objectives, learning activities, assessments, etc. into the taxonomy table.

Table 1. Taxonomy table of revised taxonomy of Bloom’s educational objectives in cognitive domain
Knowledge Dimension Cognitive process dimension Factual Knowledge Activity Remember Understand - Apply Analyze - Activity Evaluate Create - - - - - Evaluation Conceptual Knowledge - Object 1~2 Activity 2~3 Evaluation Procedural Knowledge - Evaluation Activity 3~4 - - Metacognitive Knowledge - Evaluation - - - - - Evaluation -

3 Methodology

In this study, we applied the innovation diffusion model and the human-centered computing system, and proposed the U-framework for crossing the chasm in teaching and learning. The experimental subjects are sophomores of the Department of MIS at Aletheia University taking the Database Management and Application course. We use the 3W approach, ‘When, Who, What’, to find out, by analyzing data collected from students, who needs what assistance and when under the scheduled progress of this course.

3.1 Framework of Using Scaffolding to Cross Chasm of Teaching and Learning

The framework is depicted in Fig. 2. Three roles are defined in the innovation diffusion social system: teachers, leading adopters and underachieved majority. At first, teachers act as innovators: they design materials, assessments, activities and the scheduled instructional progress, classify the course knowledge and the related cognitive processes, and then start teaching. Whenever a new unit is taught, it is an innovation to the students. Students carry on their daily routine: they attend class, ask questions, discuss, consult references, teach peers, do assignments, and take tests. As teaching and learning proceed, teachers continuously gather the students’ learning data and use data mining technology to extract from these data the student clusters with different knowledge cognition. Students are divided into leading adopters and underachieved majority according to the time they need to reach the knowledge cognition level: the leading adopters’ actual development level is almost consistent with the level the teachers defined, while the underachieved majority’s actual development level is apparently incomplete. Teachers can recognize the features of every cluster and design appropriate scaffolding for it, and students can obtain the learning support they need. After the scaffolding
activities are finished, teachers observe how the students’ actual development level and potential development level change.

Fig. 2. U-framework for crossing chasm

3.2 Human-Centered Computing Systems – Extract the Features of Clusters of Students

In this study, we consider the process by which the underachieved majority accepts the content knowledge from the viewpoint of scaffolding instruction. Teachers therefore first collect the students’ learning data and analyze them to extract the students’ actual development level, and generate the associative network of the students’ knowledge cognition. The interactive steps are as follows (‘H’ marks a step performed by a human; ‘C’ marks a step done by computer):

Step 1: Data preprocess
1-1H) Teachers define the key words and concept terms according to content knowledge and cognition.
1-1C) Teachers specify a time period and course unit, and then retrieve the required students’ learning data from the database on Moodle.
1-2H) Teachers interpret the students’ learning data using their domain knowledge, and then tag words, eliminate meaningless words, and append concept labels to words.

Step 2: Words co-occurrence analysis
2-1C) The associative value of two words is computed by formula (1). It is based on their co-occurrence in the same sentence:

$$\mathrm{assoc}(W_i, W_j) = \sum_{s \in D} \left( |W_i|_s,\ |W_j|_s \right) \qquad (1)$$

where $W_i$ and $W_j$ are the $i$th and $j$th words; $s$ denotes a sentence and is a set of words; $D$ is the set of sentences and includes all the students’ learning data; $|W_i|_s$ and $|W_j|_s$ denote the frequencies with which the words $W_i$ and $W_j$ occur in the sentence $s$.
2-2C) The result of the co-occurrence analysis is visualized as a co-occurrence associative graph.
2-1H) The co-occurrence associative graph helps teachers recognize the concepts and categories in it, and gives teachers a preliminary understanding of the associations among the students’ knowledge clusters that appear in the learning
data.

3.3 Learning Content Knowledge

In this study, we use the case of developing the database for an mp3 music download website, and try to find out whether the students have achieved the abilities of understanding and executing the conversion between an entity-relationship diagram (ERD) and a relational database schema.

Content knowledge classification on the revised Bloom’s taxonomy in cognitive domain. Teachers classify the content knowledge based on the revised Bloom’s taxonomy in the cognitive domain, and list these dimensions in Table 2.

Table 2. Content knowledge classification
Factual knowledge: entity, relationship, simple attribute, single-valued attribute, key attribute, multi-valued attribute, compound multi-valued attribute, table, primary key, foreign key.
Conceptual knowledge: entity-relationship diagram, cardinality, relational database diagram. An ERD can be transformed to tables: 1. a regular entity type is converted to a table; 2. a multi-valued attribute is converted to a table; 3. a many-to-many relationship type is converted to a table; 4. one-to-many and one-to-one relationship types are converted to fields.
Procedural knowledge: the method and steps for converting an ERD to tables: 1. regular entity type: add the simple and single-valued attributes, and choose one key attribute to act as the primary key; the other key attributes serve as unique keys; 2. multi-valued attribute: add the primary key of the original entity’s table (one set of foreign keys) and the multi-valued attribute; both fields together form the primary key; 3. many-to-many relationship type: add the two participating entities’ primary keys (two sets of foreign keys); both together form the primary key; one-to-many and one-to-one relationship type (others omitted).
Metacognitive knowledge: -

Learning activities and learning objectives. We provide students with the ERD (drawn with ER Assistant) and the relational database diagram (MS SQL Server 2008) shown in Fig. 3, and the objectives and activities (listed in Table 3) are designed so that students should express the relation between the ERD and the database diagram.

Fig. 3. Entity-relationship diagram and relational database diagram

Table 3. Learning objectives and learning activities
Objectives
objective1: Understand the concept of conversion between ERD and relational database schema.
objective2: Execute the conversion of ERD to relational database schema.
Activities
activity1: Teachers explain the activities. Teachers show the ERD and the corresponding database diagram. Teachers tell students that they need to use their knowledge of the entity-relationship model and the relational data model to describe which part of the ERD each assigned item in the database diagram corresponds to.
activity2: Differentiate essential information. Teachers ask students about the learning tasks, (1) What do I need to do? (2) What do I need to know?, so that students focus on differentiating the essential information of the entity-relationship model and the relational data model by comparing the two pictures.
activity3: How about tNo of member table
activity4: How about download table
activity5: How about songSinger table
activity6: How about songBillboard table
activity7: How about cId of billboard table
activity8: How about songCategory table

The learning objectives and learning activities are recorded in the taxonomy table (Table 4). In activity 1, teachers explain the following activities and the tasks that students should carry out. After activity 2, teachers proceed with activities 3~8 and record the students’ responses in the database on Moodle.

Table 4. Learning objectives and learning activities in taxonomy table
Knowledge dimension Factual Conceptual Cognitive process dimension remember activity 3~8 understand objective1 apply - activity 3~8 Procedural - - Metacognitive - - objective2 analyze activity2 - evaluate create - - - - - - activity 3, - - - -

4 Data Analysis and Findings

After the instructional implementation was completed, we
retrieved the related students’ data from Moodle and started using the human-centered computing system to analyze them. We tried to visualize the data as an associative graph to find the clusters within the leading adopters and the clusters within the underachieved majority. These clusters of students were linked to various content knowledge and misconceptions, so we could understand what each cluster had learned and where it underachieved. In future work, we will try to design scaffolding for the various clusters to assist their learning.

4.1 Data Resource

The experimental data were collected from 20 students in the practice course of database management and application on 10 March 2011; the learning objectives and activities have been described in ‘3.3 Learning Content Knowledge’.

4.2 Human-Centered Computing Phase: Extract the Knowledge Clusters of Students

Based on the framework integrating human intelligence with data mining and text mining technology, teachers read the students’ learning data in detail, tokenize words (Fig. 4), and define three conceptual labels according to the content knowledge in Table 2 and Table 4: [one-to-many relationship to field], [many-to-many relationship to table] and [multi-valued attribute to table]. These three conceptual labels belong to the category {ERtoTable_C}. After reading the students’ data, teachers additionally define five wrong conceptual labels according to the students’ wrong responses: [entity to foreign key], [fail to convert one-to-many relationship], [fail to convert many-to-many relationship], [fail to convert multi-valued attribute] and [meaningless conversion]. These five conceptual labels belong to the category {ERtoTable_W}.

Fig. 4. Tokenization - key words for students’ learning data and teachers’ data (footnote 3)

Fig. 5. Append the category and conceptual labels with words

Then, words whose meaning is similar to a concept are assigned to that concept,
and the concept label is appended to those words (Fig. 5). Then, by computing the frequency and co-occurrence of terms, the associative graph is generated (Fig. 6). We separate the associative graph into two divisions, top and bottom, according to the achievement and underachievement of learning. In the bottom of the associative graph, the content knowledge conceptual graph is defined by teachers; the header of these blocks is yellow and their body is blue. In the top of the graph, the partial content knowledge conceptual graph is derived from the students’ data; the header of these blocks is pink and their body is white, where white means empty and underachieved. From the associative graph, we can see the connections between Student1 and the conceptual blocks, and find some difference between Student1’s knowledge cognition and the teachers’. The student has understood the concept that a one-to-many relationship is converted to a set of fields, but she made a mistake when executing such a conversion: she answered ‘entity to foreign key’. When teachers observe this student’s features, they can design proper scaffolding to support her.

Fig. 6. One student’s knowledge cognition and teachers’ knowledge cognition associative graph. Annotations in the figure: ‘Underachievement of content knowledge linked to Student1’; ‘Content knowledge defined by teachers’; ‘Student1 also linked to these blocks, it means Student1 achieved’.

Fig. 7 contains the twenty students’ learning data and the teachers’ knowledge structure in one associative graph, in which six student clusters, labeled G1~G6, exist. The students in the same cluster have similar features of knowledge cognition, so teachers can observe the underachieved content knowledge of every cluster. For example, clusters G1 and G2 had no connection with the teacher’s knowledge, which meant that these students failed to understand and to apply the knowledge; cluster G3 needed to enhance its understanding and application of some procedural knowledge; and G5 had many links with meaningless conversion, which meant that teachers should pay much more attention to these students: although they had understood some procedural knowledge, their wrong responses showed that they seriously lacked knowledge of the entity-relationship model and the relational data model, so they just presented several meaningless conversions. Both G6 and G4 belonged to the leading adopters; the actual development level of G6 reached the level defined by teachers, but G4 still needed to enhance its application of procedural knowledge.

Fig. 7. Knowledge cognition associative graph – students clusters’ vs. teacher’s. Annotation in the figure: ‘Underachievement of content knowledge linked to various clusters’.

Footnote 3: (E) means entity type, (R) means relationship type, (MVA) means multi-valued attribute, (T) means table, (F) means field.

5 Suggestions

Teachers plan the scheduled progress of courses and use the taxonomy of Bloom’s educational objectives in the cognitive domain to master learning objectives, activities and assessments precisely. While students participate in the activities, teachers collect the students’ learning data and define conceptual labels according to the content knowledge cognition of the course; further conceptual labels may emerge from the students’ learning data; teachers then code the students’ learning data with the predefined conceptual labels. When the knowledge cognition associative graph is created, teachers can observe and interpret the features of the various clusters of students, and these features give teachers a reference for designing appropriate scaffolding for the right students just when they need assistance.

References

1. Yeh, L.C., Lin, S.P.: The study of the Revised Taxonomy of Educational Objectives in Cognitive Domain. Journal of Education Research 105, 94–106 (2003)
2. Lee, K.C.: Taxonomy of Educational Objectives
in Cognitive, Affective, and Psychomotor Domains: Applications in Assessment. High Education Press (2009)
3. Chen, P.S.: The Entity Relationship Model, Towards a Unified View of Data. ACM Transactions on Database Systems 1(1), 9–36 (1976)
4. Hong, C.-F.: Qualitative Chance Discovery – Extracting competitive advantages. Information Sciences 179, 1570–1583 (2009)
5. Halpin, T., Bloesch, A.: Data Modeling in UML and ORM: A Comparison. Journal of Database Management 14(4), 4–14 (1999)
6. Moore, G.A.: Inside the Tornado: Marketing Strategies from Silicon Valley’s Cutting Edge. Harper Business, New York (1999)
7. Rogers, E.M.: Diffusion of Innovations. Free Press, New York (2003)
8. Connolly, T.M., Begg, C.E.: A Constructivist-Based Approach to Teaching Database Analysis and Design. Journal of Information Systems Education 7(1), 43–53 (2006)
9. Wood, C., Bruner, J.S., Ross, G.: The role of tutoring in problem solving. Journal of Child Psychology and Psychiatry 17, 89–100 (1976)

Geoparsing of Czech RSS News and Evaluation of Its Spatial Distribution

Jiří Horák1, Pavel Belaj1, Igor Ivan1, Peter Nemec2, Jiří Ardielli1, and Jan Růžička1

1 VSB Technical University of Ostrava, Institute of Geoinformatics, 17. listopadu 15, 70833 Ostrava-Poruba, Czech Republic
{jiri.horak,pavel.belaj,igor.ivan,jiri.ardielli,jan.ruzicka}@vsb.cz
2 Software602 a.s., Hornokrčská 15, 140 00 Praha 4, Czech Republic
pnemec@602.cz

Abstract. Geoparsing assigns geographic identifiers to textual words and phrases in documents. The specific problem is how to apply geoparsing in languages where changes of word termination occur. An appropriate method requires a flexible solution reflecting different strategies and priorities. Sixteen Czech RSS news channels were evaluated according to ten criteria. Three selected RSS channels were monitored for more than two years. The applied geoparsing consisted of the successive application of different filters and used the generation of different grammatical cases for recognized entities. Various problems
with geographical names are classified and documented. The quality assessment shows satisfactory results, namely for the identification of names in domiciles (94%). The pessimistic strategy is applied to analyze the geographical balance of news distribution. The results show significant differences between the distributions of news in the monitored channels and document a high concentration of cultural and national news in several locations.

Keywords: RSS, Geoparsing, Geocoding, News, Czech TV.

Introduction

In 2003 it was estimated that approximately 80% of all information is stored in textual documents, that is, in the form of unstructured data. The extraction and processing of useful information aim at transforming text into structured data and at classifying or indexing it according to selected criteria, usually on the basis of user queries. Substantial support for these processes comes from knowledge organization systems such as thesauri, classification schemes, subject heading systems, and taxonomies within the framework of the semantic web. New solutions require deeper linguistic analysis of text: it is necessary to deal with different word forms and with spelling, to use contextual information, and to build a thematic thesaurus that captures relationships between terms (descriptors).

R. Katarzyniak et al. (Eds.): Semantic Methods, SCI 381, pp. 353–367. springerlink.com © Springer-Verlag Berlin Heidelberg 2011

Automated processing of natural languages is one of the most demanding tasks of artificial intelligence. Keyword analysis, syntactic-semantic analysis and recognition of named entities are usually used in the process of understanding natural language. Named entities are phrases that contain the names of persons, organizations, locations, times and quantities (Erik, 2002). Automatic recognition of named entities is a subject of wide research. Besides English, the research covers various other languages such as German, French, Spanish, Swedish, Greek and Italian. From
the Slavonic languages we can find Polish (Piskorski 2004), Romanian (Cucerzan, Yarowsky 1999), Russian (Popov et al. 2004) and Bulgarian (Da Silva et al. 2004). Processing is usually oriented towards systematically labelling (tagging) recognized named entities in the text (Chowdhury, 2003).

Geographical names represent one type of named entity applied in automated processing of natural languages. Their recognition can use spatial relationships to improve results. The principles of geocoding have been well known in geographical information systems for more than 20 years (Aronoff 1989). The original idea is to obtain the geographical location of data by matching individual parts of postal addresses and interpolating the location from the range of house numbers. Geoparsing (or geotagging) deals with unstructured texts and must overcome the uncertainty connected with the way geographical information is written in the text. Geoparsing is still more widespread in English-speaking environments. Usually countries or selected cities are located. Geoparsing that recognizes more detailed locations is still infrequent (Lee, Lee 2005).

The easiest way of monitoring news media is to explore their RSS channels, because most media provide such internet channels. The news from different providers is usually organized into sections, such as economy, national news, world news, sports, culture, regional news etc.

The objective of the study is to identify a suitable (flexible and efficient) method of geoparsing geographical named entities in Czech (or in other languages where word endings change) from RSS media channels, and to evaluate the geographical and temporal balance of news, because any deviation from a regular distribution may bias the media image of particular localities.

Geocoding, Geoparsing and RSS

Geocoding identifies occurrences of geographical entities in structured location references such as postal addresses, and assigns appropriate geographical coordinates to them. The process
usually compares sections of the given address with data in a reference layer (such as a set of municipalities or a set of streets). Geocoding services are currently provided by many companies such as Microsoft, Google, and Yahoo. The services accept addresses for geocoding and return answers containing geographical coordinates, country, town, ZIP, street, number, etc. The answer may also contain ancillary information such as accuracy or warnings.

Geoparsing assigns geographic identifiers to textual words and phrases in documents with unstructured content. Geocoding uses only structured information, while geoparsing is based on the processing of unstructured text, where geographical information is usually scattered across different places and combines quantitative values (like "10 km from") and qualitative values (names of locations, administrative units). Besides classic textual documents, other forms of media may also be geoparsed (e.g. audio content).

Basically, geoparsing consists of two main steps:
• Entity extraction (separating character strings that match the search text),
• Geotagging (selecting appropriate geographical identifiers for the identified phrases and resolving different ambiguities).

The results of geoparsing may be joined with the original text to create new documents suitable for geographical applications (Caldwell, 2009).

Geoparsing is used for various purposes. Beaman and Barry (2003) describe the application of these methods to streamline and automate the acquisition of biogeographic data. The multi-step successive process includes pre-processing of the text for language, locale or project-specific anomalies, and phrase analysis. Text parsing and pattern matching involve detecting feature types (e.g. National Park, Island), place names and their inter-relationships, calculating geographic offsets (e.g. 2.5 km WNW of something) and recording. Geoparsing of disease alerts (Keller et al. 2008) using the gazetteer
approach and neural networks is applied to improve the georeferencing capability of the HealthMap server (www.healthmap.org).

Geoparsing of Czech texts is still quite rare because of problems with varying word endings. A web prototype for geoparsing of the Liberec Region (http://geoparser.kraj-lbc.cz) was presented by Košková and Kafka (2009).

RSS (Really Simple Syndication) is a family of Web feed formats used to publish frequently updated works, such as blog entries, news headlines, audio, and video, in a standardized format. An RSS document (also referred to as a "feed", "web feed", or "channel") includes full or summarized text, plus metadata such as publishing dates and authorship. A web feed (or news feed) is a data format used for providing users with frequently updated content (http://en.wikipedia.org/wiki/RSS_(file_format)). An RSS document usually contains the headlines and text of news items and publishes them at a unique URI. In its internal form, an RSS document is an XML document (RSS Specifications, 2011).

GeoRSS represents a geographical extension of RSS. A GeoRSS document is created by adding spatial information to RSS. The main advantages of GeoRSS are that it makes geoparsing conceptually unnecessary and probably provides more accurate localization. It is usually applied to special events such as floods, wildfires or earthquakes, where an accurate geographical location is crucial.

Evaluation and Selection of RSS Channels

The objective of the study is to evaluate the geographical distribution of Czech news media. First, an extended list of available RSS channels providing media news was created. Next, the RSS channels had to be evaluated and selected using appropriate qualitative criteria. It is hard to find any recommendation concerning a relevant evaluation of the quality of RSS channels; usually either the maximum number of available channels (e.g. all collected RSS feeds (Sia, Cho 2007)) or a selection based on popularity is applied
(e.g. popularity in Gmail's "web clips" and Bloglines' "most popular feeds" (Jun, Ahamad 2006)). We propose our own set of criteria.

All channels were evaluated using the following criteria: validity of RSS, change of address or structure, number of news items in the channel, average number of news items per day, content, domicile, special settings, GeoRSS, quality of formatting, and metadata. Every criterion was ranked on a numerical scale according to the level of satisfaction. Explanations of the criteria follow:

• Validity of RSS. We applied the "Feed Validation Service" provided by W3C (http://validator.w3.org/feed/). A valid RSS channel is scored according to whether the validator issues only small recommendations or large ones. Invalid channels with a small number of errors are scored lower, and occurrences of important errors limit the evaluation further.

• Change of address or structure. Unfortunately, the RSS specification still does not provide a tool for announcing upcoming changes of the address. In the test period we recognized a change of address for one channel (denik_kultura). The change of address was accompanied by a change of structure (seven new sections). In several other cases we recognized changes of structure; for example, the CT24 channels extended their content with multimedia elements.

• Number of news items in the channel. The criterion reflects the desired situation with an optimal number of news items (not too large but also not too small). A large number of news items causes a long processing time, the occurrence of out-of-date news, or server overloading. Too small a number of news items may demand frequent refreshing of the content and increases the risk of data loss (when consumers read the channel with inadequate frequency). An analysis of 245 news channels (Jun, Ahamad 2006) indicates that the average item counts of the news channels range from 10 to 126 and that 65% of the channels have a fixed entry count. We recommend an optimal range of 20 to 50 news items available in the channel. Lower or higher numbers
are penalized by a lower evaluation of this criterion.

• Average number of news items per day. A higher average number of news items represents a better channel providing more information. The number of news items oscillates, so an average calculated over a long testing interval is recommended. The evaluation should take into account the natural differences in the number of news items according to the thematic focus of the channel. Channels oriented towards travelling or culture usually provide significantly less news than regional or national channels.

• Content. The criterion evaluates the structure and length of the element that contains a part (usually the first part) of the news, often called the "perex". The evaluation covers the way the perex is published, the length of the perex, and the delivery of the whole content of the news (some channels such as CT24 provide the full content of the news in the feed).

• Domicile. A domicile usually represents a place of residence; in our case it means the basic location of the news. The presence of a domicile facilitates and refines the geoparsing of the news. Channels may also differ in the quality of the domicile indication.

• Special settings. The criterion reflects various necessary modifications that have to be made before testing. For example, the CT24 channels require security strings to be sent. Such modifications complicate the use of a channel and are therefore penalized in the evaluation.

• GeoRSS. The best situation for the georeferencing of news is when geoparsing is superfluous because direct geotags are present in the text. The text may contain tags for the point localization of objects or events, or for line segments (e.g. roads with a traffic jam). The criterion evaluates the presence of such geotags in the text and also their reliability (in some cases a news item contains geotags for only some of the localities mentioned in the text).

• Quality of formatting. The criterion evaluates the quality of
formatting rules inside the text. Occurrences of various special characters (such as quotation marks, apostrophes, ampersands, hyphens) which complicate text processing are also taken into account.

• Metadata. The criterion addresses the presence and quality of metadata. Elements giving the minimal update interval or the date and time of the last update are especially valuable for processing.

The total evaluation is a multicriteria evaluation. The weight assigned to each criterion expresses the significance of that criterion in the total evaluation. To find appropriate values of the weights in the system of 10 criteria, the analytic hierarchy process (AHP) was applied. This technique was described by Saaty (1994) and Saaty and Vargas (2001). The final suitability of each RSS channel is expressed by a weighted linear combination (WLC), which combines the criteria scores with the criteria weights. The usability of the RSS channels (Tab. 1) varies between 2.005 ("novinky-cestovani", travelling) and 3.645 (CT24 national news). According to the results, we decided to select three channels for further processing: CT24_domaci (CT24 national news), ct24_regionalni (regional) and ct24_kultura (cultural).
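The weighted linear combination used in the evaluation of the RSS channels can be illustrated with a small Python sketch. Everything concrete in it is an assumption made for illustration: the three criteria, the pairwise comparison matrix, the channel names and the scores are hypothetical and are not taken from the study. The weights are approximated with the row geometric mean method, a common simplification of the AHP eigenvector computation, which is not necessarily the procedure the authors used.

```python
from math import prod


def ahp_weights(pairwise):
    """Approximate AHP priority weights by the row geometric mean method.

    pairwise[i][j] states how much more important criterion i is than
    criterion j; the matrix is expected to be reciprocal.
    """
    n = len(pairwise)
    geo_means = [prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]  # normalized to sum to 1


def wlc_score(scores, weights):
    """Weighted linear combination: sum of score * weight over all criteria."""
    return sum(s * w for s, w in zip(scores, weights))


# Hypothetical comparison of three illustrative criteria
# (for example validity, content, metadata); not the study's matrix.
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical criterion scores for two made-up channels.
channels = {
    "channel_a": [4, 3, 5],
    "channel_b": [2, 5, 3],
}

# Rank the channels by their total suitability, best first.
ranking = sorted(
    channels,
    key=lambda name: wlc_score(channels[name], weights),
    reverse=True,
)

for name in ranking:
    print(name, round(wlc_score(channels[name], weights), 3))
```

Because the weights are normalized to sum to one, the total score stays within the range of the individual criterion scores, which is consistent with the reported usability values lying between 2.005 and 3.645.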