Published papers

Part of the document Ứng dụng mạng HTM và mạng ngữ nghĩa để nhận diện đối tượng phức trong ảnh (Pages 71 - 111)

Based on the two research results on building HTM network models:

- The OBN-SBN network model: we published a paper at the international conference KES 2010, Maryland, USA, in 2010, titled "Applying HTM-Based System to Recognize Object in Visual Attention".

- The OBN-HSBN network model: we had a paper accepted at the international conference KICSS 2010, held in Thailand in 2010, titled "Applying HTM-Based System and Semantic Network to Recognize Object in Visual Attention".

References

[1] M. Negnevitsky, Artificial Intelligence – A Guide to Intelligent Systems, 2nd Edition, pp. 20-21.

[2] J. Hawkins and S. Blakeslee, "On Intelligence", Times Books, 2004.

[3] H.B Le and T.T Tran, "Recognizing objects in images using visual attention schema", New Directions in Intelligent Interactive Multimedia Systems and Services - 2, vol. 226/2009, pp. 129-144, 07/2009.

[4] D. George, How the brain might work: A Hierarchical and Temporal Model for Learning and Recognition, pp. 5, 06/2008.

[5] Numenta, Inc., "Zeta1 Algorithms Reference", Version 1.5, Patents pending, http://www.numenta.com/, 18/12/2008.

[6] S. Frintrop, E. Rome and H. I. Christensen, "Computational Visual Attention Systems and their Cognitive Foundations: A Survey", ACM Transactions on Applied Perception (TAP), 1/2010.

[7] T. Tran, "Ứng dụng mô hình tập trung để nhận diện đối tượng trong ảnh" (Applying an attention model to recognize objects in images), Master's thesis, University of Science (ĐHKHTN), 2009.

[8] L. Itti and C. Koch, "Computational Modelling of Visual Attention", Nature reviews. Neuroscience, Vol. 2, No. 3, pp. 194-203, 03/2001.

[9] M.C. Mozer and S.P. Vecera, “Object- and space-based attention”, Neurobiology of attention (pp. 130-134), 2005.

[10] M. I. Posner, "Orienting of attention", Q. J. Exp. Psychol, 1980.

[11] C. W. Eriksen and Y.-Y. Yeh, "Allocation of attention in the visual field", J. Exp. Psychol. Hum. Percept. Perform. 11, 1985.

[12] J. Duncan , "Selective attention and the organization of visual information". J. Exp. Psychol. Gen. 113, 1984.

[13] S. P. Vecera and M. J. Farah, "Does visual attention select objects or locations?", J. Exp. Psychol. Gen. 123, pp. 146–160, 1994.

[14] G. D. Logan, "The CODE theory of Visual Attention: An Integration of Space-Based and Object-Based Attention", Psychological Review, Vol. 103(4), pp. 603-649, Oct 1996.

[15] N. Lavie and J. Driver, "On the spatial extent of attention in object-based visual selection", Percept. Psychophys. 58, pp. 1238–1251, 1996.

[16] R. Desimone and J. Duncan, "Neural mechanisms of selective visual attention", Annual Reviews of Neuroscience 18, 193–222, 1995.

[17] A. M. Treisman and G. Gelade, "A feature integration theory of attention", Cognitive Psychology 12, 97–136, 1980.

[18] M. Mozer and S.Sitton, "In Attention", (ed. Pashler, H.),pp. 341–393, 1996.

[19] K. Schill, E. Umkehrer, S. Beinlich, G. Krieger and C. Zetzsche, "Scene analysis with saccadic eye movements: top-down and bottom-up modeling", J. Electronic Imaging.

[20] G. Deco and J. Zihl, "A neurodynamical model of visual attention: Feedback enhancement of spatial resolution in a hierarchical system", J. Comp. Neurosci.

[21] L. W. Stark and Y. S. Choi, in "Visual Attention and Cognition" (eds Zangemeister, W. H., Stiehl, H. S. & Freska, C.), pp. 3–69, Elsevier Science B. V., Amsterdam, 1996.

[22] J. Doremalen and L. Boves, "Spoken Digit Recognition using a Hierarchical Temporal Memory".

[23] Y. J. Hall and R. E. Poplin, "Using Numenta's hierarchical temporal memory to recognize CAPTCHAs", [S.l.], 2007.

[24] B. A. Bobier and M. Wirth, "Content-based image retrieval using hierarchical temporal memory ", Proceeding of the 16th ACM international conference on Multimedia, pp. 925-928, 2008.

[25] T.Kapuscinski and M. Wysocki, "Using Hierarchical Temporal Memory for Recognition of Signed Polish Words", Computer Recognition Systems 3, vol. 57/2009, pp. 355-362, May 2009.

[26] S. Russell and P. Norvig, Artificial Intelligence – A Modern Approach, 2nd Edition, pp. 30, 2003.

[27] D. Anderson and G. McNeill, "Artificial Neural Networks Technology", 08-1992.

[28] C. Pennachin and B. Goertzel, Artificial General Intelligence, Springer, 2006.

[29] Turing Test, http://en.wikipedia.org/wiki/Turing_test

[30] What is AI, http://www.alanturing.net/turing_archive/pages/Reference%20Articles/What%20is%20AI.html

[31] http://www.psych.utoronto.ca/users/reingold/courses/ai/turing.html

[32] http://en.wikipedia.org/wiki/Blocks_world

[33] http://sites.google.com/site/narswang/home/agi-introduction

Appendix

In this section, we present the content of the two papers contributed by this thesis.

1/ The OBN-SBN network model: we published a paper at the international conference KES 2010, Maryland, USA, in 2010, titled "Applying HTM-Based System to Recognize Object in Visual Attention".

Applying HTM-based System to Recognize Object in Visual Attention

Hoai-Bac Le1, Anh-Phuong Pham2 and Thanh-Thang Tran3

Abstract. In our previous work [2], we presented a model of visual attention in which space-based attention happens prior to object-based attention, using a Hierarchical Temporal Memory (HTM) system. In this paper, we propose a novel model applying an alternative flow of visual attention in which object-based attention happens earlier than space-based attention. The new approach is able to recognize multiple objects, while the previous one only identifies a single object in an image. Besides, moving the object around the image's centre is applied to improve object identification at any position in the image. The experiments as well as results for identifying one object and two separated objects in a multi-object image are presented.

Keywords: Image Processing; Visual Attention; Space-based and Object-based Attention; Hierarchical Temporal Memory.

1 Introduction

The attention mechanism in almost every visual system aims to limit processing to the important information relevant to behaviors or visual tasks [4]. It is well known from behavioral studies that there are two complementary modes of selection: space-driven and object-driven. Advocates of space-based attention argue that attention selects regions of space independent of the objects they contain. Attention is like a spotlight illuminating a region of space. Objects that fall within the beam are processed; objects that fall outside it are not. Advocates of object-based attention argue that attention selects objects rather than regions of space. Selection is spatial because objects necessarily occupy regions of space, but objects, rather than the regions themselves, are the things that are selected [3]. Many studies show that both of these attention modes coexist and influence each other, with object-driven selection happening either earlier or later than space-driven selection.

Our previous model [2] used space-then-object. That is, object-based effects occur within the focus of spatial attention. Basically, it is a schema combining a Hierarchical Temporal Memory Space-based Network (HTM-SBN) and Hierarchical Temporal Memory Object-based Networks (HTM-OBNs) for object recognition. The HTM-SBN is trained to identify several highly possible objects. For each object, there is an associated HTM-OBN which is trained to recognize parts of the corresponding object. When an object is presented, its full image is applied to the HTM-SBN to identify several candidates. The HTM-OBNs of these candidate objects are then applied for recognizing individual parts. The average result of all parts is then used as the recognition value of that object. The object with the highest value is considered as the final output.

1,2 Faculty of Information Technology, University of Science, HCM City, Vietnam
3 Vocation Department, Ton Duc Thang University, HCM City, Vietnam

We point out two problems as well as solutions of our previous model as follows:

Problem 1. How to identify an object in a particular trained image if the object is moved to an arbitrary position in the image?

Basically, the system is able to recognize a trained image in which the object is located at a particular position. However, if the object is moved to another position in the image, the system is unable to recognize it unless it has been trained at that position.

The solution is to move the object to the location nearest the trained one. Firstly, the HTM networks are trained on images in which the identified object is positioned at the centre of the image. Next, when a testing image is presented, the unidentified object is segmented out and moved around the centre position of the image within a predefined radius. Each position-created image is identified using the HTM networks. Finally, the output is the one with the highest recognition value.
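The search described above can be sketched as follows. This is a minimal sketch under assumptions: the pixel-set representation, both helper names, and the choice of offsets {-radius, 0, +radius} are ours, and the scoring function merely stands in for a trained HTM network.

```python
from itertools import product

def position_created_images(segment, radius):
    """Shift a centred binary segment (a set of (x, y) foreground
    pixels) by every offset in {-radius, 0, +radius} along both axes,
    producing 3 x 3 = 9 position-created images."""
    offsets = (-radius, 0, radius)
    return [{(x + dx, y + dy) for (x, y) in segment}
            for dx, dy in product(offsets, offsets)]

def best_recognition(images, score):
    """Identify every position-created image with a scoring function
    (standing in for an HTM network) and keep the highest value."""
    return max(score(img) for img in images)
```

With this search, training only needs centred examples; the test-time shifts bring an off-centre object back near its trained position.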

Problem 2. How to identify multiple objects in an image? For instance, an image contains a chair and a table concurrently.

The object-driven ability is able to find candidate parts based on well-known trained ones. When an object is presented, it is segmented into many individual parts based on color. Each part is identified through the HTM-OBNs to find the best candidate parts with high probability. Then, they are combined with each other to create available objects. Finally, the objects are identified in space using their own HTM-SBNs.

In this paper, we propose a new system based on the above solutions. The new approach is able to identify multiple objects at any position in an image.

The remainder of this paper is organized as follows. In Section 2, we present the way to generate the training and testing image sets and to train the HTM-SBNs and HTM-OBNs. Section 3 introduces and explains our new approach. In Section 4, we describe some experiments as well as results. We then discuss limitations and extensions in Section 5. Finally, we present related work and the conclusion.

2 Image set and HTM-based networks

For the training and testing images, we assume that they have been pre-processed so that the object's parts are colored differently. In other words, they are created from solid-colored parts. We will mention the way to convert a natural object into our assumed form in the discussion section. We use the centralization-rotating method [2] to create the training and testing image sets for each object. The HTM network is applied as the training and inference system for object recognition.

In this section, we present following items:

- Generating the training and testing image sets.

- Training the HTM-SBNs and HTM-OBNs.

2.1 Generating training and testing image set

The total number of objects in the system is four. It consists of “Chair”, “Table”, “Computer” and “Telephone” whose parts are colored differently. Each object is in a 64×64 image.

An object has all or a few of its own parts, as shown in Table 1. The object is considered multiple part-based if it has more than one associated part, and single part-based if it has only one unique part.

Table 1. List of multiple part-based and single part-based for each object.

Object     | Multiple part-based        | Single part-based
-----------|----------------------------|-------------------
Computer   | Case + Monitor             | Case
           | Case + Keyboard            | Keyboard
           | Monitor + Keyboard         | Monitor
           | Case + Monitor + Keyboard  |
Chair      | Face + 4 Legs              | Face
           | Face + Back                | Back
           | Back + 4 Legs              | Leg1 (Front-Left)
           | Face + Back + 4 Legs       | Leg2 (Front-Right)
           |                            | Leg3 (Back-Left)
           |                            | Leg4 (Back-Right)
Table      | Face + 4 Legs              | Face
           |                            | Leg1 (Front-Left)
           |                            | Leg2 (Front-Right)
           |                            | Leg3 (Back-Left)
           |                            | Leg4 (Back-Right)
Telephone  | Hand + Base                | Hand
           | Hand + Button              | Base
           | Base + Button              | Button
           | Hand + Base + Button       |

For each multiple part-based and single part-based combination, we place it in 3D space and use the rotating method at centralization [2] to create the image set. Particularly, the object is rotated 360° around the Oy axis while the camera is concurrently moved from 0° to 45° on the xOy plane. Each generated image is regarded as a timing feature of the object.

The output contains 200 continuous frames. We divide the output into training and testing image sets: the half of the pictures with even indices are the training ones, while the others are the testing ones.
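The even/odd split can be sketched as follows; a minimal sketch assuming the frames arrive as an ordered list:

```python
def split_frames(frames):
    """Split the rotation-generated frame sequence: even-indexed
    frames form the training set, odd-indexed frames the testing set."""
    return frames[0::2], frames[1::2]
```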

For the training image set, all objects, including multiple part-based and single part-based ones, are moved to the centre position of the image and converted to binary. We use these binary images as input to train the HTM-based networks.

2.2 Training HTM-SBNs and HTM-OBNs

Basically, the HTM-SBN and HTM-OBN networks have the same structure, using a Hierarchical Temporal Memory (HTM) network as the learning and inference system. The HTM-SBN and HTM-OBN are considered as space-driven and object-driven attention respectively. The HTM-OBN is used to identify individual parts of an object, while the HTM-SBN is used to recognize part-based combinations of an object.

For training the HTM networks, input images are selected from the training image set. The training images are multiple part-based for the HTM-SBN and single part-based for the HTM-OBN.

Each object has one associated HTM-OBN and HTM-SBN. When an image is tested using the HTM-based network, the output is a prediction vector. Each element in the vector includes a belief value and an element name. The correctly identified object output is the element with the highest belief.
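Selecting the highest-belief element from a prediction vector can be sketched as follows, assuming (as an illustration only) that the vector is represented as (element name, belief) pairs:

```python
def identify(prediction_vector):
    """The network output is a list of (element name, belief) pairs;
    the recognized element is the one with the highest belief."""
    return max(prediction_vector, key=lambda element: element[1])
```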

For HTM-OBN, assume that object O has k parts and the output of the network is v = {p_1, p_2, ..., p_k}. The correctly identified part for an inputted part P is calculated by:

    Belief_OBN_O(P) = max(p_t), t = 1..k    (1)

For HTM-SBN, assume that object O has k part-based combinations and the output of the network is v = {c_1, c_2, ..., c_k}. The correctly identified combination for an inputted combination C is calculated by:

    Belief_SBN_O(C) = max(c_t), t = 1..k    (2)

3 A new approach

Assume that a multi-object image consists of many differently connected solid-colored parts. We use object-driven visual attention to find the best identified candidate parts through the HTM-OBNs. Next, all these candidates are combined with each other to create available objects. Each object is then identified using its own HTM-SBN. The output of the system is a vector v = {O_1, O_2, ..., O_m}; m is the number of objects involved in the system.

We choose m = 4 because there are 4 testing objects as described in Section 2.1. The new approach has two differences in comparison with the previous model [2]:

- Using object-then-space instead of space-then-object in visual attention. That is, the system focuses on identifying individual parts prior to the whole object in space. Therefore, the system is able to recognize multiple objects in an image.

- Before an object is presented to the HTM networks, it is pre-processed by being moved around the centre position. All position-created images are then identified using the HTM networks to find the most correct one as the output.

When a color image is presented to the model, it is passed through the following phases.

Phase 1. Pre-processing images

An inputted image is segmented into many parts based on color. The parts' colors are then converted to binary. Next, we move the parts around the centre position within a predefined radius RADIUS_OBN.

We select RADIUS_OBN=2. So, a sample of “Monitor” segment generates 9 position-created images as shown in Fig. 1.

Fig. 1. Moving “Monitor” segment around centre position for position-created images with RADIUS_OBN=2.

Phase 2. Identifying object's parts

We identify segments based on their position-created images. They are passed through all HTM-OBNs to find a list of the best candidates.

    Value_Oi(Segment) = max{OBN_i(Seg_j), j = 1..9}, i = 1..m    (3)

We sort the list based on belief value in decreasing order.

    Value(Segment) = {Value_1 ≥ Value_2 ≥ ... ≥ Value_m}    (4)

With a predefined parameter TOP_OBN_PARTS (N), the top N candidate parts of the segment are given as output:

    Output(Segment) = {Value_1, Value_2, ..., Value_N}    (5)
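Eqs. (3)-(5) can be sketched together as follows; modelling each HTM-OBN as a plain (part name, scoring function) pair is an assumption made only for illustration:

```python
def top_candidate_parts(segment_images, obns, top_n):
    """For every HTM-OBN -- here a (part name, scoring function)
    pair -- take the best belief over the segment's position-created
    images (Eq. 3), sort the values in decreasing order (Eq. 4),
    and keep the top TOP_OBN_PARTS candidates (Eq. 5)."""
    values = [(name, max(score(img) for img in segment_images))
              for name, score in obns]
    values.sort(key=lambda v: v[1], reverse=True)
    return values[:top_n]
```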

Phase 3. Building objects

We build available objects based on all different combinations among the candidate parts. Then, these part-based combinations are moved around the centre position within a predefined radius RADIUS_SBN. We select RADIUS_SBN=2, so the number of position-created images of an input is 9.
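The combination step can be sketched as below. Treating an object candidate as any combination of two or more parts is our assumption, since the text only says "all different combinations among candidate parts":

```python
from itertools import combinations

def build_objects(candidate_parts):
    """Form every combination of two or more candidate parts as a
    potential part-based object to be tested in space."""
    objects = []
    for r in range(2, len(candidate_parts) + 1):
        objects.extend(combinations(candidate_parts, r))
    return objects
```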

Phase 4. Identifying objects

Each object has an associated HTM-SBN. In this phase, position-created combinations of object O are passed through its own HTM-SBN to find the best one in space. Assume that object O has k candidate part-based combinations; the correctly identified one is calculated by:

    Value(Object_O) = max{SBN_O(C_i^j), i = 1..k, j = 1..9}    (6)

Assume that vector v = {O_1, O_2, ..., O_m} is the output of the system. Each element is the object value for a particular object. So, v is:

    v = {Value(O_1), Value(O_2), ..., Value(O_m)}    (7)

We sort v based on element belief value in decreasing order:

    v = {v_1 ≥ v_2 ≥ ... ≥ v_m}    (8)

With a predefined parameter TOP_SBN_OBJECTS (N), we consider the top N elements in vector v as the correctly identified objects for an inputted image I:

    Output(I) = {v_1, v_2, ..., v_N}    (9)

4 Experiments

We present two experiments and their results: identifying one object and two separated objects in 128×128 images. Testing images are randomly selected from the testing and training image sets. These images are multiple part-based whose identifying object has all of its solid-colored parts. Next, the object is placed at a random position in the testing image.

The correct percentage of an object is calculated as the number of correctly identified images over the number of inputted testing images. Then, we calculate the average correct percentage over the whole testing and training image sets.
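The evaluation metric can be sketched as follows; representing per-image outcomes as booleans is an assumed encoding for illustration:

```python
def correct_percentage(outcomes):
    """Per-object accuracy: correctly identified images over the
    number of inputted testing images, in percent."""
    return 100.0 * sum(outcomes) / len(outcomes)

def average_percentage(per_object):
    """Average correct percentage over all objects in an image set."""
    return sum(per_object) / len(per_object)
```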

At the end of this section, we discuss the evaluation method and compare the model with our previous one [2]. We configure the parameters used in all experiments as shown in Table 2.

Table 2. List of parameters used in all experiments

Parameter Value

RADIUS_OBN 2

RADIUS_SBN 2

TOP_OBN_PARTS 2

TOP_SBN_OBJECTS 1 or 2 (the number of correctly identified objects output by the system)

4.1 Experiment 1

Subjects. Identifying one object.

Procedures.

The value of TOP_SBN_OBJECTS parameter is configured to one. When an image is presented, the system returns one identified object name. A sample of testing images in this experiment is shown in Fig. 2.

Fig. 2. Sample of testing “Chair” images.

Result.

                    Chair   Table   Computer   Telephone   Avg. Perc.
Testing Image Set    96%    100%     100%       100%        99%
Training Image Set   83%     96%     100%       100%        94.7%

4.2 Experiment 2
