KNOWLEDGE MINING USING INTELLIGENT AGENTS

Advances in Computer Science and Engineering: Texts, Vol. 6

editors
Satchidananda Dehuri, Fakir Mohan University, India
Sung-Bae Cho, Yonsei University, Korea

Imperial College Press

Published by
Imperial College Press
57 Shelton Street, Covent Garden, London WC2H 9HE

Distributed by
World Scientific Publishing Co. Pte. Ltd.
Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

KNOWLEDGE MINING USING INTELLIGENT AGENTS
Advances in Computer Science and Engineering: Texts, Vol. 6
Copyright © 2011 by Imperial College Press
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.

ISBN-13 978-1-84816-386-7
ISBN-10 1-84816-386-X

Typeset by Stallion Press
Email: enquiries@stallionpress.com
Printed in Singapore

Advances in Computer Science and Engineering: Texts
Editor-in-Chief: Erol Gelenbe (Imperial College)
Advisory Editors: Manfred Broy (Technische Universitaet Muenchen), Gérard Huet (INRIA)

Published
Vol. 1: Computer System Performance Modeling in Perspective: A Tribute to the Work of Professor Kenneth C. Sevcik
edited by E. Gelenbe (Imperial College London, UK)
Vol. 2: Residue Number Systems: Theory and Implementation by A. Omondi (Yonsei University, South Korea) and B. Premkumar (Nanyang Technological University, Singapore)
Vol. 3: Fundamental Concepts in Computer Science edited by E. Gelenbe (Imperial College London, UK) and J.-P. Kahane (Université de Paris Sud - Orsay, France)
Vol. 4: Analysis and Synthesis of Computer Systems (2nd Edition) by Erol Gelenbe (Imperial College, UK) and Isi Mitrani (University of Newcastle upon Tyne, UK)
Vol. 5: Neural Nets and Chaotic Carriers (2nd Edition) by Peter Whittle (University of Cambridge, UK)
Vol. 6: Knowledge Mining Using Intelligent Agents edited by Satchidananda Dehuri (Fakir Mohan University, India) and Sung-Bae Cho (Yonsei University, Korea)

PREFACE

The primary motivation for adopting intelligent agents in knowledge mining is to provide researchers, students, and decision/policy makers with insight into emerging techniques and their possible hybridization that can be used for dredging, capturing, distributing, and utilizing knowledge in a domain of interest, e.g., business, engineering, and science. Knowledge mining using intelligent agents explores the concept of knowledge discovery processes and in turn enhances decision-making capability through the use of intelligent agents like ants, flocking birds, termites, honey bees, wasps, etc. This book blends two distinct disciplines: data mining and the knowledge discovery process on the one hand, and intelligent-agent-based computing (swarm intelligence + computational intelligence) on the other. The aim is to provide readers with an integrated set of concepts and techniques for understanding the rather recent yet pivotal task of knowledge discovery, and to show the practical utility of these techniques in intrusion detection, software engineering, design of alloy steels, etc.
Several advances in computer science have been brought together under the title of knowledge discovery and data mining. Techniques range from simple pattern searching to advanced data visualization. Since our aim is to extract knowledge from various scientific domains using intelligent agents, our approach is best characterized as “knowledge mining”.

In Chapter 1 we highlight intelligent agents and their usage in various domains of interest, with a gamut of data, to extract domain-specific knowledge. Additionally, we discuss the fundamental tasks of knowledge discovery in databases (KDD) and a few well-developed mining methods based on intelligent agents.

Wu and Banzhaf in Chapter 2 discuss the use of evolutionary computation in knowledge discovery from databases, using intrusion detection systems as an example. The discussion centers on the role of evolutionary algorithms (EAs) in achieving the two high-level primary goals of data mining, prediction and description: in particular, classification and regression tasks for prediction and clustering tasks for description. The use of EAs for feature selection in the pre-processing step is also discussed. Another goal of this chapter is to show how basic elements of EAs, such as representations, selection schemes, evolutionary operators, and fitness functions, have to be adapted to extract accurate and useful patterns from data in different data mining tasks.

Natural evolution is the process of optimizing the characteristics and architecture of living beings on earth. Evolving the optimal characteristics and architectures of living beings is possibly the most complex problem being optimized on earth since time immemorial. The evolutionary technique, though it seems very slow, is one of the most powerful tools for optimization, especially when all the existing traditional techniques fail. Chapter 3, contributed by
Misra et al., presents how these evolutionary techniques can be used to generate the optimal architecture and characteristics of different machine learning techniques. The two types of networks considered for evolution in this chapter are the artificial neural network and the polynomial network. Though much research has been conducted on the evolution of artificial neural networks, research on the evolution of polynomial networks is still at an early stage. Hence, evolving these two networks and mining knowledge for classification problems is the main attraction of this chapter.

A multi-objective optimization approach is used by Chen et al. in Chapter 4 to address the alloy design problem, which concerns finding optimal processing parameters and the corresponding chemical compositions to achieve certain pre-defined mechanical properties of alloy steels. Neuro-fuzzy modelling is used to establish the property prediction models for the multi-objective optimal design approach, which is implemented using Particle Swarm Optimization (PSO). PSO, which is inspired by intelligent agents such as flocking birds, is used as the search algorithm because its population-based approach fits well with the needs of multi-objective optimization. An evolutionary adaptive PSO algorithm is introduced to improve the performance of the standard PSO. Based on the established tensile strength and impact toughness prediction models, the proposed optimization algorithm has been successfully applied to the optimal design of heat-treated alloy steels. Experimental results show that the algorithm can locate the constrained optimal solutions quickly and provide useful and effective knowledge for alloy steel design.

Dehuri and Tripathy present a hybrid adaptive particle swarm optimization (HAPSO)/Bayesian classifier to construct an intelligent and more compact intrusion detection system (IDS)
in Chapter 5. An IDS plays a vital role in detecting various kinds of attacks in a computer system or network. The primary goal of the proposed method is to maximize detection accuracy while simultaneously minimizing the number of attributes, which inherently reduces the complexity of the system. The proposed method exhibits an improved capability to eliminate spurious features from huge amounts of data, aiding researchers in identifying those features that are solely responsible for achieving high detection accuracy. Experimental results demonstrate that the hybrid intelligent method can play a major role in detecting attacks intelligently.

Today, networking of computing infrastructures across geographical boundaries has made it possible to perform various operations effectively irrespective of application domain. At the same time, the growing misuse of this connectivity in the form of network intrusions has jeopardized the security of both the data that are transacted over the network and the data maintained in data stores. Research is in progress to detect such security threats and protect the data from misuse. A huge volume of data on intrusion is available which can be analyzed to understand different attack scenarios and devise appropriate counter-measures. The DARPA KDDcup’99 intrusion data set is a widely used data source which depicts many intrusion scenarios for analysis. This data set can be mined to acquire adequate knowledge about the nature of intrusions, so that one can develop strategies to deal with them. In Chapter 6 Panda and Patra discuss the use of different knowledge mining techniques to elicit sufficient information that can be effectively used to build intrusion detection systems.

Fukuyama et al. present a particle swarm optimization for multi-objective optimal operational planning of energy plants in Chapter 7. The optimal operational planning problem can be formulated as a mixed-integer nonlinear optimization problem. An energy management system called
FeTOP, which utilizes the presented method, is also introduced. FeTOP has actually been introduced and operated at three factories of one of the automobile companies in Japan and realized a 10% energy reduction.

In Chapter 8, Jagadev et al. discuss the feature selection problems of knowledge mining. Feature selection has been the focus of interest for quite some time and much work has been done. It is in demand in application areas where high-dimensional datasets with tens or hundreds of thousands of variables are available. This survey is a comprehensive overview of many existing methods from the 1970s to the present. The strengths and weaknesses of different methods are explained, and methods are categorized according to generation procedures and evaluation functions. The future research directions outlined in this chapter may attract many researchers who are new to this area.

Chapter 9 presents a hybrid approach for solving classification problems on large data. Misra et al. used three important neuro and evolutionary computing techniques, namely polynomial neural network, fuzzy system, and particle swarm optimization, to design a classifier. The objective of designing such a classifier model is to overcome some of the drawbacks of existing systems: to obtain a model that takes less time to develop, to give better classification accuracy, to select the optimal set of features required for designing the classifier, and to discard less important and redundant features from consideration. Over and above, the model remains comprehensive and easy for users to understand.

Traditional software testing methods involve large amounts of manual work, which is expensive. Software testing effort can be significantly reduced by automating the testing process. A key component in any automatic software testing environment is the test data generator. As test data
generation is treated as an optimization problem, genetic algorithms have been used successfully to automatically generate an optimal set of test cases for the software under test. Chapter 10 describes a framework that automatically generates an optimal set of test cases to achieve path coverage of an arbitrary program.

We take this opportunity to thank all the contributors for agreeing to write for this book. We gratefully acknowledge the technical support of Mr. Harihar Kalia and the financial support of the BK21 project, Yonsei University, Seoul, South Korea.

S. Dehuri and S.-B. Cho

CONTENTS

Preface

1. Theoretical Foundations of Knowledge Mining and Intelligent Agent
   S. Dehuri and S.-B. Cho
2. The Use of Evolutionary Computation in Knowledge Discovery: The Example of Intrusion Detection Systems
   S. X. Wu and W. Banzhaf
3. Evolution of Neural Network and Polynomial Network
   B. B. Misra, P. K. Dash and G. Panda
4. Design of Alloy Steels Using Multi-Objective Optimization
   M. Chen, V. Kadirkamanathan and P. J. Fleming
5. An Extended Bayesian/HAPSO Intelligent Method in Intrusion Detection System
   S. Dehuri and S. Tripathy
6. Mining Knowledge from Network Intrusion Data Using Data Mining Techniques
   M. Panda and M. R. Patra
7. Particle Swarm Optimization for Multi-Objective Optimal Operational Planning of Energy Plants
   Y. Fukuyama, H. Nishida and Y. Todaka