(Master's thesis) Indexing and retrieval of images by content and by geographic location

INTRODUCTION

MOTIVATION

The global climate is changing rapidly, leading to an increase in complex natural disasters worldwide that cause significant damage to human life. Effective coordination among rescue teams can greatly reduce the extent of destruction and the number of fatalities. A robust decision support system for post-disaster situations is therefore essential.

Numerous studies address content-based image retrieval, but few explore the simultaneous use of image content and geographic information. Existing research mainly targets tourism applications, whereas the IDEA project is a new application: decision support for rescue operations.

OBJECTIVES

The objectives of this internship are to:

• Build a database of images of different disasters.

• Simulate geographic information for these images.

• Organize the image information in two spaces, visual content and geographic information, and combine the two spaces to analyze each image and determine the urgency level of each situation.

• Propose a way to determine an urgency level for each disaster based on the proximity of similar situations and on the importance of the monuments around each disaster.

• Provide scenarios for IDEA that identify and describe situations in which searching for images by combining location and image content is of interest, then run tests based on these scenarios to verify and validate the model.

CONTRIBUTION

By combining the fire images collected by BUI The Quang, a student of the 14th promotion of IFI, with internet images under Creative Commons licenses, I built an image database for IDEA covering five types of disasters: fires, damaged buildings, damaged roads, injuries, and floods. Each category contains between 300 and 350 images. I also simulated geographic location information for these images by assigning latitude and longitude coordinates, either randomly or by proximity groups.

During this internship, we structured the data with SR-tree partitioning to organize both the visual content descriptors and the external geographic descriptors of the images. This structure speeds up the search for similar images by content and the computation of proximity between situations in geographic space, because it avoids exhaustive comparison with every element of the database. Furthermore, when assigning an urgency level to each disaster image, we consider not only the proximity of similar situations but also the symbolic descriptions of nearby landmarks (such as hospitals, houses, buildings, and schools) commonly used in Geographic Information Systems (GIS).

INTERNSHIP ENVIRONMENT

This internship was carried out at the Laboratory of Computer Science, Image, and Interaction (L3I) at the University of La Rochelle, France, as part of the IDEA project (Images of natural Disasters from robot Exploration in urban Areas), funded by the STC-Asia program (MAE/CNRS/INRIA). The project is a collaboration between IFI and the University of La Rochelle and aims to develop a decision support system that uses image processing and computer vision in post-disaster scenarios. It focuses on four main themes: urban localization using image clues, assessing damage to buildings and infrastructure, detecting the condition of human victims, and image-based decision support. This internship contributes mainly to the last theme.

STATE OF THE ART

MULTIDIMENSIONAL INDEXING

The main multidimensional indexing techniques aim to group the base descriptors and enclose them in cells that are easy to manipulate (a hierarchy).

This approach streamlines the search: only the most relevant groups, or packets, of descriptors are examined, so the search never has to consider every descriptor in the database. In the end, we work only with the descriptors contained in the selected packets.

There are two broad categories of cell-creation techniques: data partitioning and space partitioning.

Data partitioning techniques build cells based on the distribution of the descriptors and their relative proximity in space. This category includes B-trees, the R-tree family, SS-trees, SR-trees, and X-trees. The B-tree is a balanced tree structure introduced by Bayer and McCreight in 1972.

A B-tree is a balanced tree structure that organizes data along a single dimension. Each internal node other than the root in a B-tree of order m has between m/2 and m child nodes. All leaves are at the same level and store the data.

A node with k children holds k-1 elements in ascending order; these are the separator values that partition the values of its children along the chosen axis.

A search starts at the root, which makes it possible to check whether an element exists in the database. However, this structure is not suited to finding the elements close to a given input, because each internal node holds separator values along a single axis. The R-tree (Rectangle-tree) family addresses this limitation.

The R-tree family includes three structures, the R-tree, R+-tree, and R*-tree, which index spatial objects using minimum bounding rectangles. In multidimensional indexing, multidimensional bounding rectangles, or hyper-rectangles, are used. The approach relies on a hierarchy of hyper-rectangles, which may overlap, that reflects the data distribution through a balanced tree, with the actual data stored at the leaf level.

An R-tree has the following properties [5]:

• For a leaf, the bounding rectangle is the minimum rectangle that encloses the data vectors of that node. Each leaf holds at most M and at least m data elements, where m ≤ M/2.

• Every non-leaf node other than the root has between m and M children; its bounding rectangle is the minimum rectangle that encloses the bounding rectangles of its children.

• The root node has at least two children unless it is a leaf. The bounding rectangle of the root covers all data vectors in the database.

• All leaves are at the same level.

A bounding rectangle is defined by two points S(s1,s2,…,sn) and T(t1,t2,…,tn); every element X(x1,x2,…,xn) in the rectangle satisfies si ≤ xi ≤ ti for all i ∈ [1,n].

To retrieve all data within a given rectangle Q, the search starts at the root and descends into every child node whose bounding rectangle intersects Q. This continues down to the leaves, where all data elements that fall inside Q are returned.
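
The range search described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the thesis: the names `Rect`, `Node`, and `range_search` are hypothetical, and the tree is built by hand rather than by an insertion algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Point = Sequence[float]


@dataclass
class Rect:
    s: Tuple[float, ...]  # lower corner S
    t: Tuple[float, ...]  # upper corner T

    def contains(self, x: Point) -> bool:
        # s_i <= x_i <= t_i must hold in every dimension
        return all(si <= xi <= ti for si, xi, ti in zip(self.s, x, self.t))

    def intersects(self, other: "Rect") -> bool:
        # two rectangles intersect if their intervals overlap in every dimension
        return all(a_s <= b_t and b_s <= a_t
                   for a_s, a_t, b_s, b_t in zip(self.s, self.t, other.s, other.t))


@dataclass
class Node:
    rect: Rect
    children: List["Node"] = field(default_factory=list)  # internal nodes only
    points: List[Point] = field(default_factory=list)     # leaves only


def range_search(node: Node, q: Rect) -> List[Point]:
    """Return every stored point that lies inside the query rectangle q."""
    if not node.rect.intersects(q):
        return []  # the whole subtree can be skipped
    if not node.children:  # leaf: test the data vectors themselves
        return [p for p in node.points if q.contains(p)]
    found: List[Point] = []
    for child in node.children:
        found.extend(range_search(child, q))
    return found
```

Subtrees whose bounding rectangle does not intersect Q are never opened, which is where the saving over a sequential scan comes from.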

An R-tree is built incrementally by inserting vectors one at a time. The insertion algorithm starts at the root and descends to find the appropriate leaf for the new element. At each node, it selects the child whose bounding rectangle needs the least enlargement to include the new element. If the chosen leaf is full, it is split into two leaves while minimizing the total area of the two new bounding rectangles; the split can be computed exhaustively or with heuristics of quadratic or linear cost. Throughout the insertion, the bounding rectangle of each node is created and updated so that it remains the smallest rectangle covering all elements in that node's subtree.

The R-tree is designed for range (interval) queries and is well suited to indexing spatial data. It makes the search more efficient because only the child nodes whose bounding rectangles intersect the query interval are considered, rather than every element in the database. However, as the number of dimensions increases, the bounding rectangles become more likely to intersect one another, to the point that a sequential scan may become more efficient.

The R+-tree and R*-tree are designed to improve search efficiency by minimizing the overlap between bounding rectangles. The R+-tree does this by subdividing overlapping rectangles into smaller ones until no overlap remains, which may increase the height of the tree but reduces the number of subtrees to visit. The R*-tree minimizes not only the overlap but also the volume of the rectangles: before splitting a full node, it reinserts some of that node's children in an attempt to find better positions for them. The R*-tree is recognized as the most successful structure in the R-tree family. Experiments have demonstrated that the SR-tree can be used effectively to organize multidimensional and spatial data.

The SS-tree is a similarity indexing structure that groups feature vectors according to their similarity to one another. The similarity measure used is the Euclidean distance, which applies when all dimensions of the feature vector carry equal weight.

The structure of the SS-tree is similar to that of the R-tree, but the bounding rectangle of each node is replaced by a bounding sphere defined by a center and a radius. Data is stored at the leaf level. The center of a bounding sphere is the centroid of all elements in that node's subtree. At the leaf level, the bounding sphere encloses the elements of the leaf, with a radius equal to the distance from the center to the farthest point. In internal nodes, the bounding sphere encloses the bounding spheres of all child nodes, so its radius is always greater than or equal to the distance from the center to the farthest point in the subtree.
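
As a small illustration of the leaf-level rule above (centroid as center, distance to the farthest point as radius), here is a minimal Python sketch. The function name `leaf_bounding_sphere` is hypothetical and the points are plain coordinate tuples.

```python
import math


def leaf_bounding_sphere(points):
    """Bounding sphere of a leaf: centroid of its points and distance to the farthest one."""
    dim = len(points[0])
    # the center is the centroid (coordinate-wise mean) of the points
    center = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
    # the radius reaches the farthest point from that center
    radius = max(math.dist(center, p) for p in points)
    return center, radius
```

For the four corners of a 2 x 2 square, this gives the center of the square and half its diagonal as the radius.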

The SS-tree uses a reinsertion mechanism similar to that of the R*-tree to reduce the overlap between bounding spheres and their volumes. Experimental results indicate that the SS-tree outperforms the R*-tree in similarity search on high-dimensional data. The SR-tree (Sphere/Rectangle Tree) was introduced in 1997.

The SR-tree combines the R*-tree and SS-tree structures: the region of each node is the intersection of its bounding rectangle and its bounding sphere. Each node of the SR-tree has:

• A bounding rectangle that covers all elements (vectors) in the subtree of the node. This rectangle is determined as in the R-tree.

• A bounding sphere that covers all elements in the subtree of the node. This sphere is determined as in the SS-tree.

RELATED WORK

SnapToTell is a multimedia information system for tourists that uses images captured with mobile phones together with location data. It provides information about a site of interest while the user is traveling.

If you come across a lake, monument, sculpture, or any other site of interest and want to know more about it, you take a photo with your mobile phone and send it to a SnapToTell service provider, which replies with detailed information about the site as a multimedia message (MMS) or a text message.

The system uses the geographic location of the user's mobile phone to establish the user's context. The position alone does not reveal the user's intent, however, because the monument of interest may be several hundred meters away. Once the context is known, the system searches its database for images visually similar to the query, considering only images near the user's location. The database gathers images of all the famous sites of a country, such as Singapore, with each site represented by several images taken from various distances and viewpoints.

Figure 7 - Client/server architecture of the SnapToTell system

SnapToTell uses a client/server architecture in which the client is a mobile phone with a camera and MMS support. The user sends the server a request containing either an image of a scene or a vector describing its visual content, such as a color histogram. The SnapToTell server receives the image together with the user's geographic location, which is provided by the mobile network operator. Based on this position, the server searches its database for the visually closest image among those located around the mobile device.

The SnapToTell image database covers various scenes of Singapore. Each image carries GPS information organized hierarchically into three levels: Singapore is divided into several zones, each zone contains several locations, and each location corresponds to several scenes. Every scene is represented by photographs taken from different distances, viewpoints, and lighting conditions, along with textual and/or audio descriptions that are returned to the user with each query.

Figure 8 - Hierarchical localization of the scenes of Singapore

When processing a query, the system first uses the geographic location information to reduce the number of images to search. It then finds the visually closest image to identify the scene. A threshold is applied to decide whether a scene actually corresponds to the query image.

The visual feature vector of an image here is its color histogram.
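
The two-stage query described above (filter by location, then pick the visually closest image and apply a threshold) can be sketched as follows. This is only an illustration under stated assumptions: the source does not say which geographic distance, histogram distance, search radius, or threshold SnapToTell uses, so the haversine distance, the L1 histogram distance, and the default values of `radius_km` and `threshold` are all hypothetical choices, as are the function and field names.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def l1_distance(h1, h2):
    """L1 distance between two normalized color histograms (assumed measure)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def identify_scene(query_hist, query_pos, database, radius_km=1.0, threshold=0.5):
    """Return the scene of the closest nearby image, or None if nothing matches."""
    # Stage 1: keep only the images located near the user's position.
    nearby = [e for e in database
              if haversine_km(*query_pos, *e["pos"]) <= radius_km]
    if not nearby:
        return None
    # Stage 2: visually closest image among the remaining candidates.
    best = min(nearby, key=lambda e: l1_distance(query_hist, e["hist"]))
    # Reject the match if even the best candidate is too different.
    if l1_distance(query_hist, best["hist"]) > threshold:
        return None
    return best["scene"]
```

Each database entry is assumed to be a dictionary with a `scene` label, a `pos` pair (latitude, longitude), and a `hist` list; filtering by location first keeps the histogram comparison limited to a handful of candidates.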

MobiLog (Mobile Blogging Automation) is a platform that partially automates data entry for blogs written on mobile devices, and TraveLog is an application that runs on this platform. TraveLog uses the results of the Snap2Tell system discussed above.

The system lets users photograph interesting scenes while traveling and create a blog directly from their mobile phones. It automatically enriches their posts with contextual information such as the time, location, and weather, with personal details such as the user's name and date of birth, and with descriptions of the images.

Figure 9 - Screenshots of blog composition on a mobile phone

The context in which an image was taken is derived from the timestamp, the GPS location of the mobile device, and weather data obtained from relevant servers. The user's personal information comes from their profile. To describe the scene, the TraveLog server sends the user's uploaded images to the Snap2Tell server. Using the keywords in the description returned by Snap2Tell, the system searches for relevant websites through web search engines and lists them in the blog. The system also tries to improve the quality of the images in the blog, either by histogram equalization or by replacing low-resolution images with higher-resolution images of the same scene from Snap2Tell, possibly taken from different viewpoints and under different lighting. The following figure shows an example of a blog generated by TraveLog.

Figure 10 - Example of a blog produced by the TraveLog system

SR-TREE

INSERTION INTO THE SR-TREE

The algorithm for inserting a node into an SR-tree is based on that of the SS-tree.

Insertion starts by adding the node to be inserted to the list of nodes awaiting reinsertion. The nodes in this list are then inserted one by one until the list is empty.

The algorithm for inserting any node N from the list into the SR-tree is as follows:

• If the SR-tree contains only a root node (an empty structure), create an entry for the root node, add N to this entry, and update the root node.

• Otherwise, descend the tree to find a new parent for N. At each level, the subtree chosen is the one whose center is closest to N.

• While descending, update the point count (w), the rectangle (R), and the sphere (S) of each node visited. Once the parent of N has been found:

o If the parent is not full, add N as one of its children.

o If the parent is full and has not yet been reinserted, reinsert a portion of its children (for example, 30%). If it has already been reinserted, split its children into two groups to create two new parents. The split is made along the dimension with the highest variance, and the split point is chosen to minimize the sum of the variances on either side. The new parent that is closer to the grandparent is kept, and the other must be reinserted.
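
The split rule described above (highest-variance dimension, then the cut that minimizes the sum of the two variances) can be sketched as follows. This is a minimal illustration, not the thesis code: the function name `split_entries` and the `min_fill` parameter are hypothetical, and each child is represented only by the coordinates of its center.

```python
from statistics import pvariance


def split_entries(centers, min_fill=1):
    """Split child centers into two groups along the highest-variance dimension.

    Returns (axis, left_group, right_group). min_fill is the minimum number of
    entries allowed on each side (assumed parameter).
    """
    dim = len(centers[0])
    # 1. Pick the dimension along which the centers vary the most.
    axis = max(range(dim), key=lambda i: pvariance([c[i] for c in centers]))
    ordered = sorted(centers, key=lambda c: c[axis])

    # 2. Try every admissible cut and keep the one with the smallest
    #    sum of variances on the two sides.
    def cost(cut):
        left = [c[axis] for c in ordered[:cut]]
        right = [c[axis] for c in ordered[cut:]]
        return pvariance(left) + pvariance(right)

    cut = min(range(min_fill, len(ordered) - min_fill + 1), key=cost)
    return axis, ordered[:cut], ordered[cut:]
```

With two well-separated clusters along one axis, the chosen cut falls between the clusters, so each new parent receives one cluster.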

The bounding rectangle and bounding sphere of a node are created and updated while entries are inserted into the SR-tree. The region of a node is the intersection of its bounding rectangle and its bounding sphere, and the two are updated as follows:

• To update the bounding rectangle: a bounding rectangle is defined by two points S and T, where every coordinate of S is less than or equal to the corresponding coordinate of any point in the region, and every coordinate of T is greater than or equal to it. It suffices to adjust S and T so that the new rectangle covers the rectangles of the child nodes (assume there are n children):

$$ s_i = \min_{1 \le k \le n} C_k.s_i, \qquad t_i = \max_{1 \le k \le n} C_k.t_i $$

where k is the index of child C_k, i is the index of the dimension, and C_k.s_i and C_k.t_i are respectively the i-th coordinates of the points S and T of the bounding rectangle of child C_k.

• To update the bounding sphere:

o The center x(x_1, x_2, …, x_D) of the bounding sphere is computed as:

$$ x_i = \frac{\sum_{k=1}^{n} C_k.x_i \cdot C_k.w}{\sum_{k=1}^{n} C_k.w} $$

where C_k.x_i is the i-th coordinate of the center of child C_k and C_k.w is the number of points in the subtree of C_k.

o The radius r of the bounding sphere is computed as:

$$ r = \min(d_s, d_r), \qquad d_s = \max_{1 \le k \le n} \bigl( \lVert x - C_k.x \rVert + C_k.r \bigr), \qquad d_r = \max_{1 \le k \le n} \mathrm{MAXDIST}(x, C_k.R) $$

where C_k.x and C_k.r are the center and radius of the bounding sphere of child C_k, and C_k.R is the bounding rectangle of the same child. MAXDIST(p, R) is the maximum distance between a point p and the rectangle R defined by S and T, and can be computed as:

$$ \mathrm{MAXDIST}(p, R) = \sqrt{\sum_{i=1}^{D} \max\bigl(|p_i - s_i|,\ |t_i - p_i|\bigr)^2} $$

MAXDIST(p, R) is always greater than or equal to the distance between p and any point of the rectangle R. Here d_s is the maximum distance from the center of the parent node to the bounding spheres of its children, and d_r is the maximum distance from the center of the parent node to the bounding rectangles of its children. The SS-tree computes the radius r from d_s alone, whereas the SR-tree takes the minimum of d_s and d_r. Consequently, the radius in the SR-tree can be smaller than in the SS-tree.
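
The update of a parent's rectangle, center, and radius from its children can be sketched in Python as follows. This is a minimal illustration under assumptions: children are plain dictionaries with the hypothetical keys `s`, `t` (rectangle corners), `x` (sphere center), `r` (sphere radius), and `w` (point count), and the function names `maxdist` and `update_region` are not taken from the source.

```python
import math


def maxdist(p, s, t):
    """Largest distance from point p to any point of the rectangle [s, t]."""
    return math.sqrt(sum(max(abs(pi - si), abs(ti - pi)) ** 2
                         for pi, si, ti in zip(p, s, t)))


def update_region(children):
    """Compute the bounding rectangle and bounding sphere of a parent node."""
    dim = len(children[0]["x"])
    # Rectangle: coordinate-wise min of the lower corners, max of the upper corners.
    s = tuple(min(c["s"][i] for c in children) for i in range(dim))
    t = tuple(max(c["t"][i] for c in children) for i in range(dim))
    # Center: centroid of the child centers, weighted by their point counts.
    w = sum(c["w"] for c in children)
    x = tuple(sum(c["x"][i] * c["w"] for c in children) / w for i in range(dim))
    # Radius: the smaller of the sphere-based and rectangle-based bounds.
    d_s = max(math.dist(x, c["x"]) + c["r"] for c in children)
    d_r = max(maxdist(x, c["s"], c["t"]) for c in children)
    return {"s": s, "t": t, "x": x, "r": min(d_s, d_r), "w": w}
```

For two children whose rectangles are 2 x 2 squares placed side by side with a gap, the rectangle-based bound d_r is smaller than the sphere-based bound d_s, so the SR-tree radius is tighter than the one an SS-tree would store.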

DELETION FROM THE SR-TREE

Deletion from the SR-tree works as in the R-tree [5]. To delete an element E from an SR-tree, the steps are:

• Starting from the root, descend to locate the leaf L that contains E. During the descent, visit every child node whose region, defined by the bounding rectangle and the bounding sphere, contains E.

• If the record E is found in a leaf L of the tree, remove E from that leaf.

• After removing E, if L holds fewer than mL elements (mL being the minimum number of elements in a leaf), delete L and reinsert all its elements. If L still holds mL or more elements, update the regions of all nodes from L up to the root.

SEARCHING THE SR-TREE

The SR-tree supports several types of queries: searching for a given vector, finding the k nearest neighbors of a given point, and finding the data vectors that lie within a given radius of a point.

The search for a vector in the SR-tree starts at the root and descends, level by level, into the nodes whose region contains the vector. The search ends either when the vector is found in a leaf or when no nodes remain to be examined.

The search for the k nearest neighbors of a given point proceeds as follows:

• First, k arbitrary points are selected as the initial k nearest neighbors.

• The distance between the query point and the farthest of these k points then guides the search toward the child nodes whose region overlaps the range of these k points.

The minimum distance between the query point p and the region of a child C_k, which is the intersection of its bounding rectangle and its bounding sphere, is computed as:

$$ d = \max\Bigl( \max\bigl(0,\ \lVert p - C_k.x \rVert - C_k.r\bigr),\ \mathrm{MINDIST}(p, C_k.R) \Bigr) $$

where C_k.x and C_k.r are the center and radius of the bounding sphere of C_k, C_k.R is the bounding rectangle of child C_k, and MINDIST(p, R) is the minimum distance between the point p and the rectangle R.

This distance can be computed as in [16]:

$$ \mathrm{MINDIST}(p, R) = \sqrt{\sum_{i=1}^{n} |p_i - r_i|^2} \quad \text{with} \quad r_i = \begin{cases} s_i & \text{if } p_i < s_i \\ t_i & \text{if } p_i > t_i \\ p_i & \text{otherwise} \end{cases} $$

where n is the number of dimensions and s_i and t_i are the coordinates of the points S and T that define R. The distance MINDIST(p, R) given by this formula is always less than or equal to the distance between p and any point of the rectangle R.

• For each child C_k, if the minimum distance d from the query point to the region of that child exceeds the distance between p and the farthest of the current k nearest neighbors, the search does not need to continue in that child. The remaining children are searched depth-first, starting with those with the smallest minimum distance d.
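
The pruned depth-first search described above can be sketched in Python as follows. This is a minimal illustration under assumptions, not the thesis implementation: nodes are dictionaries with the hypothetical keys `s`, `t`, `x`, `r`, plus `points` for leaves or `children` for internal nodes, and instead of starting from k arbitrary points the sketch fills its candidate list with the first k points it meets, which has the same effect.

```python
import heapq
import math


def mindist(p, s, t):
    """Smallest distance from point p to the rectangle [s, t] (0 if p is inside)."""
    return math.sqrt(sum(max(si - pi, 0.0, pi - ti) ** 2
                         for pi, si, ti in zip(p, s, t)))


def region_mindist(p, node):
    """Lower bound on the distance from p to the rectangle/sphere intersection."""
    to_sphere = max(0.0, math.dist(p, node["x"]) - node["r"])
    return max(to_sphere, mindist(p, node["s"], node["t"]))


def _search(node, p, k, best):
    # best is a max-heap of (-distance, point) holding the k closest points so far
    if "points" in node:  # leaf: compare against the stored vectors
        for q in node["points"]:
            d = math.dist(p, q)
            if len(best) < k:
                heapq.heappush(best, (-d, q))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, q))
        return
    # visit children in increasing order of their minimum distance
    ranked = sorted((region_mindist(p, c), i) for i, c in enumerate(node["children"]))
    for d, i in ranked:
        if len(best) == k and d > -best[0][0]:
            break  # this child and all later ones cannot contain a closer point
        _search(node["children"][i], p, k, best)


def k_nearest(root, p, k):
    """Return the k stored points closest to p, nearest first."""
    best = []
    _search(root, p, k, best)
    return [q for _, q in sorted((-neg, q) for neg, q in best)]
```

Because the children are ranked by their lower bound d, the loop can stop at the first child whose bound exceeds the current k-th best distance.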

To find data vectors within a proximity radius r from a specific point p, the search focuses on child nodes that overlap with the sphere centered at point p with radius r.

Chapter 4 – Information retrieval system based on dual information: image content and geographic location

This internship aims to develop an information retrieval model that combines image content and geographic location to identify emergency situations in a city. The model assesses the urgency level from the proximity of similar events. This chapter describes our approach to building such a system.

The work focuses on two aspects of an image: its visual content and its geographic location. Each image is described by an internal descriptor, an n-dimensional vector that represents its visual content, and by an external descriptor that gives its geographic coordinates as GPS data, namely longitude and latitude. The geographic information of the images is simulated by the program.

We also assume that the input of the system is a set of images arriving at a given moment. Emergency situations are therefore identified, and assigned an urgency level, on the basis of the images that are valid at that moment.

Identifying emergency situations relies on content-based image retrieval, which is the subject of NGUYEN Nhu Van's thesis and which we reuse here. Rather than searching for an effective internal descriptor of visual content, we use the internal descriptors of that system, such as color histograms or bags of visual words. During this internship, we address the problem of structuring images in two spaces, visual content and geographic location, and of manipulating these two spaces to identify emergency situations and assign them an urgency level.

Several factors must be considered when assessing the urgency level of an incident. The first is the nature of the disaster, since a fire and a flood do not have the same urgency, and the second is its scale, for example a large or a small fire. The proximity of incidents and the presence of similar disasters nearby also play a role, as does the importance of nearby structures such as hospitals, homes, and schools. Evaluating all these factors requires the expertise of disaster response professionals such as firefighters. Moreover, determining urgency from visual cues alone, such as the size of a fire, requires careful analysis. In this study, we assume that the urgency level is determined mainly by the geographic proximity of similar events and by the nature of the structures near the incident.

For the system we are building, we consider five types of disasters: fire, injury, damaged buildings, damaged roads, and flooding, all given the same urgency level. Current Geographic Information Systems (GIS) can represent city data as multiple layers, each layer being a shapefile that contains one type of information, such as houses, roads, or lakes. We assume that there is a shapefile of polygon objects identifying the monuments of the city, each monument belonging to one of four categories: house, hospital, large building, or school.

In an emergency, several people may photograph the same incident, such as a fire, and upload their images to a server. To simplify the problem, we assume that each emergency is represented by a single image, so that two distinct images always depict two separate incidents.

To identify emergency situations in a city, we build a learning database of images showing the different types of emergencies: fire, injury, damaged buildings, damaged roads, and flooding. Each image is annotated with its emergency type. When an image of unknown type arrives, we search the learning database for the k images visually closest to it. Counting the occurrences of each emergency type among these k nearest images gives the type of the incoming image. For instance, if the k nearest images comprise 10 fire images, 5 injury images, 6 damaged buildings, 4 damaged roads, and 5 flooding images, the input image is classified as "fire". When two types are tied, for example 15 fire images and 15 damaged buildings among the 30 nearest images, the tie is broken by the order in which the first image of each type appears in the list.
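
The voting rule described above, including the tie-break on first appearance, can be sketched in a few lines of Python. The function name `classify` and the label strings are hypothetical; the input is assumed to be the list of labels of the k nearest images, ordered from closest to farthest.

```python
from collections import Counter


def classify(neighbor_labels):
    """Majority vote over the labels of the k nearest images (closest first).

    Ties are broken in favor of the type whose first image appears earliest
    in the list.
    """
    if not neighbor_labels:
        raise ValueError("at least one neighbor is required")
    counts = Counter(neighbor_labels)
    top = max(counts.values())
    # Walk the list in order of visual proximity: the first label that
    # reaches the top count wins.
    for label in neighbor_labels:
        if counts[label] == top:
            return label
```

With the counts from the example in the text (10 fire, 5 injury, 6 damaged buildings, 4 damaged roads, 5 flooding), the function returns the fire label regardless of how the 30 neighbors are ordered.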

Our system therefore distinguishes two image databases:

• A learning image database containing the images that are already annotated with one of the five disaster types.

• A current image database containing the images received by the system at a given moment. The disaster type of each image in this database must be identified, and an urgency level assigned to each one.

The images must be organized in two distinct spaces: the internal descriptors of visual content and the external descriptors of geographic location. The SR-tree is chosen for both spaces because it improves query performance on high-dimensional data (the internal image descriptors) and is also well suited to structuring geographic data. It speeds up both the search for similar images in the content space and the computation of proximity between events in the geographic space. The data of the system is structured accordingly.

INFORMATION RETRIEVAL SYSTEM BASED ON A DUAL

ANALYSIS OF RESULTS

Title	Indexing and Retrieval of Images by Content and by Geographic Location
Author	Lai Hien Phuong
Supervisors	Jean-Marc Ogier, Alain Boucher, Nguyen Nhu Van
School	Universite de La Rochelle
Major	Master's in Computer Science, Artificial Intelligence & Multimedia option
Type	End-of-studies thesis
Year of publication	2009
City	Hanoi

Pages	52