COLLECTIVE APPROACHES FOR THE MULTI-AGENT PATROLLING PROBLEM

Institut de la Francophonie pour l'Informatique
Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), UMR 7503
Master INTELLIGENCE ARTIFICIELLE ET MULTIMÉDIA, 2nd year, RESEARCH specialization
Academic year 2005-2007

Thesis presented by CHU Hoang Nam
Internship carried out at LORIA, INRIA project MaIA
Supervisors:
• Olivier SIMONIN, Associate Professor (Maître de Conférences, Université Henri Poincaré, Nancy 1)
• François CHARPILLET, Research Director (Directeur de Recherche, INRIA)
Vandœuvre-lès-Nancy, September 2007

Acknowledgments

First of all, I would like to thank Olivier Simonin and François Charpillet in particular for supervising me during these six months. I thank them for their warm welcome, their advice and encouragement, their constant support, and the freedom of research they allowed me.

I would also like to thank Alexis Drogoul for introducing me to this internship, and for trusting and encouraging me from the very beginning of my work.

My sincere thanks also go to all the professors of the Institut de la Francophonie pour l'Informatique (IFI) for guiding me throughout my studies at IFI.

I thank all the members of the MaIA team for their wonderful welcome, their kindness, and a particularly pleasant working atmosphere. Thanks to Cédric, Jamal, Yoann, Ilham and Arnaud for their kindness and warmth, to Geoffray for his humorous language lessons, and to Rodolphe and Nazim for their valuable advice.

Many thanks to my classmates of promotion XI for their friendship and their help since the beginning of my studies at IFI.

Finally, thanks to my parents and my friends for their support and encouragement at every moment.

Table of contents

Acknowledgments
Table of contents
List of figures
Introduction
1 The multi-agent patrolling problem
  1.1 Evaluation criteria
  1.2 Environment
  1.3 Previous work
2 Approach by reactive multi-agent systems
  2.1 Collective intelligence (swarm intelligence)
  2.2 Digital pheromone
  2.3 EVAP: a model based on pheromone evaporation
  2.4 CLInG: a model based on information propagation
3 Performance comparison between EVAP and CLInG
  3.1 Simulation and analysis
  3.2 Discussion
    3.2.1 Complexity
    3.2.2 Exploration and patrolling
    3.2.3 Advantages and drawbacks of the methods
4 The energy problem in patrolling
  4.1 Energy limitation
  4.2 MARKA: a collective model based on the construction of a numerical potential field
    4.2.1 Agent behavior
    4.2.2 Algorithm
    4.2.3 Estimation of self-sufficiency
  4.3 TANKER: a collective self-organized approach for optimizing tanker positions
    4.3.1 Attractive and repulsive forces
    4.3.2 Model behavior (algorithm)
5 Performance of MARKA and TANKER
  5.1 Simple task
  5.2 Dynamic task
  5.3 Advantages and drawbacks of the models
Conclusions
Perspectives
Bibliography
Appendices
  Swarm Approaches for the Patrolling Problem, Information Propagation vs Pheromone Evaporation

List of figures

Figure 1: "Discrete" space and "continuous" space
Figure 2: Propagated idleness
Figure 3: Topologies studied
Figure 4: Obstacle-free topology, agents, 1000 iterations
Figure 5: Obstacle-free topology, average IGI
Figure 6: Corridor-rooms topology, agent, 4000 iterations
Figure 7: 6-rooms topology, agents, 2000 iterations
Figure 8: 6-rooms topology, average IGI
Figure 9: EVAP and CLInG, Map E
Figure 10: The environment marking process
Figure 11: Formation of the gradient of the numerical fields
Figure 12: Attraction guides the Tanker to the barycenter of the requests
Figure 13: Repulsion keeps Tankers A and B at a distance
Figure 14: Diffusion in a discrete environment
Figure 15: MARKA and TANKER, agents, 4000 iterations
Figure 16: Environment setup
Figure 17: MARKA and TANKER, groups, 4000 iterations
Figure 18: Illustration of TANKER

Introduction

This internship was carried out as part of the research master's degree in computer science, artificial intelligence and multimedia, artificial intelligence option. It took place at the LORIA laboratory (UMR 7503, Nancy) within the INRIA MaIA team, under the supervision of Olivier SIMONIN, Chargé de Recherche INRIA, and François CHARPILLET, Directeur de Recherche INRIA and scientific head of the MaIA team.

The multi-agent patrolling problem consists in making agents travel across a territory so that its different parts are visited as often as possible. This problem was introduced by Ramalho et al. in [8] and was first addressed with classical multi-agent algorithms.

In this internship at LORIA, we address the multi-agent patrolling and exploration problem with a swarm intelligence approach. More precisely, the internship is devoted to the study of reactive multi-agent algorithms whose goal is to patrol and explore an unknown environment. A further objective is to integrate energy limitation into the patrolling problem and to propose an algorithm that allows the agents to coordinate their patrolling and recharging activities.

The report is divided into four parts. The first introduces the multi-agent patrolling problem and previous work. The second presents collective intelligence and two algorithms based on this approach, EVAP and CLInG, for the patrolling problem. The third compares the performance of these two algorithms. Finally, the last part is devoted to the energy problem in patrolling.

1 The multi-agent patrolling problem

According to the Petit Larousse dictionary, a patrol is "an intelligence, surveillance or liaison mission entrusted to a military (air, land or naval) or police unit; the term also designates the unit itself". According to the Oxford dictionary, "patrolling is the act of walking or travelling around an area, at regular interval, in order to protect or to supervise it".

The multi-agent patrolling problem consists in deploying a set of agents, generally fixed in number, in order to visit the strategic places of a region at regular intervals. It typically arises in video games [8] [10], when a team of virtual creatures has to patrol a given territory, in some Internet applications, in the movement of a team of robots, in the surveillance of a place or building to defend it against intrusion, etc.

Despite its usefulness and scientific interest, multi-agent patrolling has been studied only recently. In [8], one of the first works, Machado et al. proposed the first notions and evaluated different agent architectures for this problem.

Placing ourselves in this configuration of the problem, we believe that swarm intelligence approaches can prove particularly relevant. They generally rely on marking the environment, which provides an indirect means of communication and computation among the agents.

The following subsections present the criteria for evaluating the performance of a patrolling strategy, and the types of environment and their representation.

1.1 Evaluation criteria

Patrolling an environment efficiently, possibly a dynamic one, requires the delay between two visits to the same place to be minimal. All the work on patrolling strategies assumes that the environment is known, two-dimensional, and reducible to a graph G(V, E) (V is the set of nodes to visit, E the set of edges defining the valid paths between nodes).

Several criteria can be used to evaluate the quality of a patrolling strategy. We use those based on node idleness, which can be computed at the level of a node or at the level of the graph. The following criteria are introduced in [8]:

• Instantaneous Node Idleness (INI): number of time steps during which a node has remained unvisited, called idleness in the rest of this report. Computed for each node.
• Instantaneous Graph Idleness (IGI): average of the Instantaneous Node Idleness over all nodes at a given instant. Computed at the graph level.
• Average Graph Idleness (AvgI): average of IGI over n time steps. Computed at the graph level.
• Instantaneous Worst Idleness (IWI): largest INI observed at a given time step, called maximal idleness or worst idleness in the rest of this report. Computed at the graph level.

1.2 Environment

Previous work uses two types of environment in multi-agent patrolling models: "discrete" space and "continuous" space.

"Discrete" space
The "discrete" space consists of a set of nodes to visit and is represented as a graph G(V, E), with V and E defined as above. This representation suits patrolling between places of interest.

"Continuous" space
The "continuous" space represents an area to cover, such as a room or a building. It can be modeled as a grid in which each cell represents either a place to visit or an inaccessible place (wall, obstacle) (cf. Figure 1).

Figure 1: "Discrete" space and "continuous" space

Prior knowledge of the environment is also an important condition in the patrolling problem: it influences both the choice of the patrolling algorithm and its performance.

Known environment
The agents have prior knowledge of the environment. A cognitive architecture is therefore suited to this type of environment. The agents can work offline, for example by memorizing the map or planning an optimal route before executing the task [4] [1].

Unknown environment
The patrolling task is executed without knowledge of the environment. The agents must then perform two tasks: explore the environment and patrol it. In this setting, one can use reactive agents, which may either learn or rely on techniques based on marking the environment.

In this internship, we concentrate on patrolling in an unknown environment, i.e. the graph representing the environment is not available. The space explored by the agents is represented as a matrix of cells, each of which is either:
• free
• occupied by an agent
• inaccessible (an obstacle, a wall, ...)

1.3 Previous work

The patrolling problem has been addressed in recent years with centralized, heuristic or distributed approaches, but always with a graph representation of the environment (a node being a predetermined place to visit, an edge a path linking two nodes), and therefore necessarily with prior knowledge of the environment. Various works rely on graph traversal algorithms, often derived from the traveling salesman problem [1]. A solution based on ant colony optimization (ACO algorithms) is given in [4], but it still requires prior knowledge of the environment in the form of a graph. The same holds for learning-based techniques, which search for an optimal multi-agent route computed offline, i.e. before the task is executed in the environment considered. Consequently, such a technique cannot adapt to an online change of the problem, such as a modification of the environment topology or the addition or loss of agents.

Another limitation of these solutions is combinatorial explosion when the graph becomes large (several hundred nodes) or when the number of deployed agents grows. Yet many concrete applications today involve patrolling over vast spaces, known or unknown, with a large number of agents (drones deployed to watch a strategic place, surveillance of buildings by mobile robots, etc.).
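The idleness criteria of Section 1.1 (INI, IGI, AvgI, IWI) can be made concrete on the grid representation of Section 1.2. The following is a minimal sketch, not the implementation used in this report: the class `IdlenessGrid` and its method names are hypothetical, and it assumes that every accessible cell starts with idleness 0 and that a cell is reset to 0 at any time step where an agent stands on it.

```python
"""Idleness criteria (INI, IGI, AvgI, IWI) on a grid of cells.

Illustrative sketch only: the class and method names are assumptions,
not taken from the report or from any library.
"""
from typing import Dict, Iterable, List, Set, Tuple

Cell = Tuple[int, int]  # (row, column)


class IdlenessGrid:
    """Tracks the idleness of every accessible cell of a rows x cols grid."""

    def __init__(self, rows: int, cols: int, obstacles: Iterable[Cell] = ()) -> None:
        blocked: Set[Cell] = set(obstacles)
        # Obstacle cells can never be visited, so they are left out of all criteria.
        self.idleness: Dict[Cell, int] = {
            (r, c): 0  # assumption: idleness starts at 0 everywhere
            for r in range(rows)
            for c in range(cols)
            if (r, c) not in blocked
        }
        if not self.idleness:
            raise ValueError("the grid needs at least one accessible cell")
        self.igi_history: List[float] = []

    def step(self, agent_cells: Iterable[Cell]) -> None:
        """Advance one time step: visited cells drop to 0, the others age by 1."""
        visited = set(agent_cells)
        for cell in self.idleness:
            self.idleness[cell] = 0 if cell in visited else self.idleness[cell] + 1
        self.igi_history.append(self.igi())

    def ini(self, cell: Cell) -> int:
        """Instantaneous Node Idleness of one cell."""
        return self.idleness[cell]

    def igi(self) -> float:
        """Instantaneous Graph Idleness: mean INI over accessible cells."""
        return sum(self.idleness.values()) / len(self.idleness)

    def iwi(self) -> int:
        """Instantaneous Worst Idleness: largest INI at the current step."""
        return max(self.idleness.values())

    def avg_idleness(self) -> float:
        """Average Graph Idleness: mean IGI over the steps simulated so far."""
        if not self.igi_history:
            raise ValueError("no time step has been simulated yet")
        return sum(self.igi_history) / len(self.igi_history)


if __name__ == "__main__":
    # One agent sweeping a 1 x 3 corridor from left to right.
    grid = IdlenessGrid(rows=1, cols=3)
    for position in [(0, 0), (0, 1), (0, 2)]:
        grid.step([position])
        print(position, "IGI =", grid.igi(), "IWI =", grid.iwi())
    print("AvgI =", grid.avg_idleness())
```

In the corridor sweep above, IGI is 2/3 after the first step and 1 after the second and third, so AvgI over the three steps is 8/9, while IWI reaches 2. Keeping obstacle cells out of the dictionary is a design choice of this sketch: it ensures that cells which can never be visited do not inflate the averages.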
there is no redundancy in the positions of the tankers. They distribute themselves among the groups thanks to the repulsive force they exert on one another.

Figure 18: Illustration of TANKER

5.3 Advantages and drawbacks of the models

MARKA is an effective solution to the energy problem. One advantage of MARKA is its ability to estimate an agent's self-sufficiency, which is very important in this problem. In addition, the algorithm can operate in an environment with obstacles. However, it still has difficulty discovering the recharging stations.

TANKER, in contrast, performs better than MARKA on dynamic tasks (multiple groups in different regions). A notable feature of TANKER is the ability of the tankers to follow the agents, to move towards an optimal position, and to distribute themselves among the groups of agents. The limitation of TANKER is its poor ability to adapt to complex environments (with obstacles, walls, ...).

Conclusions

This internship studied the collective intelligence approach to the multi-agent patrolling problem in an unknown environment. A further objective was to integrate energy limitation into the patrolling problem and to propose an algorithm that allows the agents to coordinate their patrolling and recharging activities.

We carried out comparative tests between the CLInG model [16] and EVAP. This experimental study showed the interest of a collective intelligence approach:
• Simple but effective: the agents' behavior is simple, yet the results obtained are surprisingly good.
• Convergence towards a stable performance.
• Robustness: the system is able to reorganize itself to adapt to different patrolling configurations.
• Self-organization.

This study is presented in more detail in a paper accepted at the IEEE ICTAI 2007 conference (International Conference on Tools with Artificial Intelligence).

We also proposed two models, MARKA and TANKER, to handle the patrolling problem with energy limitation. The good performance exhibited by TANKER shows that it is an effective model for the dynamic, multi-group version of the energy supply problem. These results support the thesis of Moujahed et al.

Perspectives

One objective for continuing this work on the TANKER and MARKA models is real-world experimentation using WIFIBots. However, the transition from simulation to reality remains a major obstacle at present. The robustness of TANKER and MARKA and their adaptation to perturbations need to be studied in more depth in order to evaluate the behavior of the robots. In addition, a performance comparison with centralized solutions would be interesting and useful.

I find that both models, MARKA and TANKER, exhibit characteristics suited to the context of the SCOUT project (Survey of Catastrophes and Observation in Urban Territories) in Vietnam. It will therefore be necessary to continue studying the adaptation of these models to the context of this project.

Bibliography

[1] A. Almeida, G. Ramalho, H. Santana, P. Tedesco, T. Menezes, V. Corruble, Y. Chevaleyre, Recent Advances on Multi-Agent Patrolling, Proceedings of the 17th Brazilian Symposium on Artificial Intelligence, pp. 474-483, 2004.
[2] E. Bonabeau, M. Dorigo, G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press, 1999.
[3] Y. Chevaleyre, Le Problème Multiagent de la Patrouille, in Annales du LAMSADE No. 4, 2005. http://www.lamsade.dauphine.fr/~chevaley/papers/anna_patro.pdf
[4] Y. Chevaleyre, Theoretical analysis of multi-agent patrolling problem, Proceedings of the IEEE/WIC/ACM International Conference on Intelligent Agent Technology, pp. 302-308, 2004.
[5] H.N. Chu, A. Glad, O. Simonin, F. Sempé, A. Drogoul, F. Charpillet, Swarm approaches for the patrolling problem, information propagation vs pheromone evaporation, IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2007.
[6] O. Gérard, J.-N. Patillon, F. d'Alché-Buc, Discharge Prediction of Rechargeable Batteries with Neural Networks, in Integrated Computer-Aided Engineering, Volume 6, pp. 41-52, 1999.
[7] F. Lauri, F. Charpillet, Ant Colony Optimization applied to the Multi-Agent Patrolling Problem, IEEE Swarm Intelligence Symposium, 2006.
[8] K. Kouzoubov, D. Austin, Autonomous Recharging for Mobile Robotics, Proceedings of Australian Conference on Robotics and Automation, 2002.
[9] A. Machado, G. Ramalho, J.-D. Zucker, A. Drogoul, Multi-Agent Patrolling: an Empirical Analysis of Alternative Architectures, Proceedings of Multi-Agent Based Simulation, pp. 155-170, 2002.
[10] A. Machado, A. Almeida, G. Ramalho, J.-D. Zucker, A. Drogoul, Multi-Agent Movement Coordination in Patrolling, in 3rd International Conference on Computers and Games, 2002.
[11] A. Muñoz-Meléndez, F. Sempé, A. Drogoul, Sharing a charging station without explicit communication in collective robotics, Proceedings of the 7th International Conference on Simulation of Adaptive Behavior on From Animals to Animats, pp. 383-384, 2002.
[12] S. Moujahed, O. Simonin, A. Koukam, Location Problems Optimization by a Self-Organizing Multiagent Approach, in MAGS International Journal on Multiagent and Grid Systems (IOS Press), Special Issue on Engineering Environments For Multiagent Systems, 2007.
[13] L. Panait, S. Luke, A pheromone-based utility model for collaborative foraging, Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp. 36-43, 2004.
[14] H.V. Parunak, M. Purcell, R. O'Connell, Digital Pheromones for Autonomous Coordination of Swarming UAV's, Proceedings of AIAA First Technical Conference and Workshop on Unmanned Aerospace Vehicles, Systems, and Operations, 2002.
[15] H. Santana, G. Ramalho, V. Corruble, B. Ratitch, Multi-Agent Patrolling with Reinforcement Learning, Proceedings of the 3rd International Joint Conference on Autonomous Agents and Multi-Agent Systems, pp. 1122-1129, 2004.
[16] F. Sempé, Auto-organisation d'une collectivité de robots: application à l'activité de patrouille en présence de perturbations, PhD Thesis, Université Paris VI, 2004.
[17] F. Sempé, A. Drogoul, Adaptive Patrol for a Group of Robots, Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems, Las Vegas, Nevada, pp. 2865-2869, 2003.
[18] M. Silverman, D.M. Nies, B. Jung, G.S. Sukhatme, Staying alive: A docking station for autonomous robot recharging, Proceedings of the IEEE International Conference on Robotics and Automation, Washington D.C., pp. 1050-1055, 2002.
[19] O. Simonin, F. Charpillet, E. Thierry, Collective construction of numerical potential fields for the foraging problem, INRIA Rapport de Recherche, 2007.
[20] I.A. Wagner, M. Lindenbaum, A.M. Bruckstein, Distributed Covering by Ant-Robots using Evaporating Traces, IEEE Transactions on Robotics and Automation, vol. 15, n. 5, pp. 918-933, 1999.
[21] P. Zebrowski, R.T. Vaughan, Recharging Robot Teams: A Tanker Approach, Proceedings of the International Conference on Advanced Robotics, pp. 803-810, 2003.

Appendices

Swarm Approaches for the Patrolling Problem, Information Propagation vs Pheromone Evaporation

Hoang-Nam Chu1,2,
Arnaud Glad1, Olivier Simonin1, François Sempé2, Alexis Drogoul3,2, François Charpillet1

1 MAIA, INRIA Lorraine, Campus scientifique, BP 239, 54506 Vandœuvre-lès-Nancy, France
2 Institut Francophone pour l'Informatique, Hanoï, Vietnam
3 IRD - Institut de Recherche pour le Développement, Bondy, France
{hoangnam.chu, arnaud.glad, olivier.simonin, francois.charpillet}@loria.fr, francois@ifi.edu.vn, drogoul@mac.com

Abstract

This paper deals with the multi-agent patrolling problem in unknown environments using two collective approaches that exploit environmental dynamics. After specifying performance criteria, we define a first algorithm based only on the evaporation of a pheromone dropped by reactive agents (EVAP). We then present the CLInG model [17], proposed in 2003, which introduces the diffusion of the idleness of the areas to visit. We systematically compare by simulation the performance of these two models on environments of growing complexity. The analysis is supplemented by a comparison with the theoretical optimum performance, which allows us to identify the topologies for which each method is best suited.

Keywords: multi-agent patrolling, reactive multi-agent systems, digital pheromones

1 Introduction

Patrolling consists in deploying a set of agents (robots) in an environment in order to visit all the accessible places regularly [5]. This problem has been studied in recent years with centralized, heuristic and distributed approaches, but always within a discrete representation of the environment, i.e. a graph. A vertex is a predetermined place that should be visited and an edge is a valid path linking two places. Various works based on graph search algorithms have thus been proposed, often derived from the traveling salesman problem (cf. [1] for a presentation of these techniques and their comparison). For instance, Lauri and Charpillet [4] proposed a solution relying on ACO algorithms (ant colony optimization), which requires a representation of the environment as a graph. There are also approaches based on learning techniques (e.g. [8]). They compute an optimal multi-agent path offline, which is then carried out online in the environment considered. Consequently, this type of solution cannot adapt by itself to online changes of the problem or environment, such as variations in the number of agents or moving obstacles. Moreover, these approaches are subject to combinatorial explosion when the graph becomes large (several hundred nodes) or when the number of deployed agents increases. Yet many concrete applications today involve patrolling large spaces, known or unknown, with a significant number of agents (drones deployed to supervise a strategic place, patrolling of buildings by mobile robots, etc.).

To deal with such a configuration of the problem (unknown environments), we think that swarm intelligence can be an efficient approach. It is generally based on the marking of the environment, inspired by ants' pheromone deposits, which provides an indirect means of computation and communication among the agents [2]. These digital pheromones rely on two processes computed by the environment: the diffusion and the evaporation of information (the quantity of pheromone). Diffusion propagates the information to neighboring cells, while evaporation gradually removes it.

Sempé et al. [10] proposed in 2003 an algorithm named CLInG that exploits the propagation of information, a process close to diffusion, and showed the interest of an approach based on an active environment. However, this approach appears relatively expensive, as it relies on propagation and counting processes in the environment (idleness evaluation). We therefore propose in this article another algorithm, the EVAP model, based only on the evaporation of a digital pheromone laid down by the agents. Our objective is to compare these two collective techniques, which both rely on an active environment but differ in complexity, in order to better understand the functioning and the performance of these stigmergic principles.

The article is organized as follows. Section 2 defines the patrolling problem and gives the performance criteria. Section 3 presents the EVAP model and Section 4 the CLInG model. Section 5 presents experiments with both models, which are compared and analyzed on environments of increasing complexity. In Section 6 we synthesize these results to draw out the interest of the two approaches. Finally, the article ends with a conclusion and some perspectives.

2 The multi-agent patrolling problem

Patrolling consists in deploying several agents, generally in a fixed number, so that they periodically visit the strategic places of an area. It aims at obtaining reliable information, seeking objects, watching over places in order to defend them against intrusion, etc. An efficient patrol in an environment (possibly dynamic) requires the delay between two visits to the same place to be minimal. Related work on multi-agent patrolling strategies considers that the environment is known, two-dimensional, and reducible to a graph G(V, E) (V is the set of nodes to be visited, E the set of arcs defining the valid paths between nodes).

Several criteria may be used to evaluate the efficiency of a patrolling strategy. We adopt those based on node idleness, which corresponds to the time elapsed since the last visit by an agent. The following criteria, relative to idleness and presented in [5], can be calculated either at the node level or at the graph level:

• Instantaneous Node Idleness (INI): number of time steps elapsed while the node is not visited. This criterion is computed for every node.
• Instantaneous Graph Idleness (IGI): average of the Instantaneous Node Idleness of all nodes at a given time. This criterion is computed at the graph level.
• Instantaneous Worst Idleness (IWI): highest INI at a given time. This criterion is computed at the graph level.

As emphasized in the introduction, this paper tackles the multi-agent patrolling problem in unknown environments, i.e. environments for which we do not have a graph of the areas to visit. We therefore consider the space to patrol as a vast empty grid whose granularity is defined by the area an agent can perceive at a given time. More precisely, the environment is a matrix where each cell may be free, occupied by an agent, or unreachable (obstacle/wall). Two agents cannot occupy the same cell at the same moment. Patrolling is simulated with a constant time step during which every agent performs one elementary action: moving to one of the four neighboring cells. The definition of idleness presented above for the nodes of a graph is retained and applied to each cell that must be visited.

This paper focuses on the generic patrolling problem, which consists in minimizing the average idleness of the accessible cells. We also seek to minimize the worst idleness.

3 EVAP: model based on deposit and evaporation of information

3.1 Evaporation process

The swarm intelligence principle, inspired by the study of social insects [2], is based on the collective organization of the agents through their indirect communications in the environment. These communications rely on the deposit and the diffusion of a chemical substance, which allows agents to cooperate through the environment. The corresponding computational model, called digital pheromones, requires an active environment that supports both the evaporation and the diffusion processes. This indirect means of communication is particularly interesting for tasks in initially unknown environments (e.g. foraging [10], path-planning [14], coverage, exploration).

We therefore propose a new algorithm for multi-agent patrolling in unknown environments, named EVAP, which relies on the evaporation of a pheromone dropped by the agents. The model uses only the evaporation process: the remaining quantity of pheromone in a cell serves as an indicator of the time elapsed since the last visit to that cell (i.e. its idleness). We thus define the agents' behavior as a descent of the pheromone gradient, i.e. moving towards the cells containing the least pheromone.

The evaporation of the pheromone in a cell is expressed by the following geometric sequence on the quantity $q_n$:

$q_{n+1} = q_n (1 - \rho)$

This process creates an oriented gradient that follows the chronology of cell visits. It only requires $\rho \in (0, 1)$ and $q_0 > 0$. Since $q_n$ decreases for any value of $\rho$ in $(0, 1)$, the ordering of the cells by remaining pheromone does not depend on $\rho$, so the model is independent of this parameter. In the experiments, $\rho = 0.001$.

3.2 Algorithm

The gradient-following behaviour leads agents to explore the cells visited longest ago (or never visited). An agent perceives its four neighbouring cells (noted Neighbourhood in the algorithms), for which it can read the pheromone quantity. It then moves towards the one containing the minimum value and drops the quantity Qmax. If several neighbouring cells contain the same amount of pheromone, the agent chooses randomly among them. However, to avoid overly erratic trajectories, if the current direction can be kept, the agent chooses to maintain it with probability p.

EVAP Agent ALGORITHM
    m ← min(PheroQ(Neighborhood))
    For each cell c of Neighborhood Do
        If PheroQ(c) = m Then
            NeighborsList ← NeighborsList + c
        EndIf
    EndFor
    nextCell ← cell to which we are heading
    If nextCell ∈ NeighborsList and random(1) < p Then
        moveTo(nextCell)
    Else
        moveTo(random(NeighborsList))
    EndIf
    DropPheromones(Qmax)

EVAP Environment ALGORITHM
    For each cell c of environment Do
        If PheroQ(c) > 0 Then
            ComputeEvapPhero(c)
        EndIf
    EndFor

The EVAP model can be seen as an extension of the algorithm of Yanovski et al. [12] to grid environments. The algorithm in [12] uses a pheromone that is dropped on the edges of a given graph, which must be both oriented and Eulerian. The principle of the EVAP agent behavior is the same as the gradient descent described in [12].

3.3 Interest of the swarm intelligence

In an early work on the graph coverage problem, Wagner et al. [11] emphasized the drawbacks of reactive approaches that mark visited nodes. They showed that sub-optimal exploration can occur for certain topologies when only one agent is used. This problem comes from the very limited perception of the agents and can be explained as follows. Suppose that an agent has visited three consecutive nodes of a graph while dropping pheromones, and now faces a choice between two not yet explored nodes. A random behavior may clearly lead to the "wrong" choice: the initial marking has created a gradient that forces the agent to cross already visited cells before going back and continuing its exploration task. This phenomenon may occur every time a similar choice arises.

One way to avoid this kind of problem is to give the agents a perception that extends beyond the immediate nodes, in order to avoid paths returning to already visited cells. For this purpose, we will see that CLInG uses the environment to attract agents towards the not yet visited nodes. For EVAP, we keep a limited perception, as we assume that cooperation between agents can reduce the problem. If two agents visit, within a short delay, a node that requires a choice between two possible routes, and the first one makes the "bad" choice, the second one will necessarily make the good choice, i.e. continue the exploration, because it will follow a not yet explored path.

We notice that the more agents there are, the more this drawback is attenuated (see Section 5). The experiments conducted in Section 5 show that this limit exists only for specific topologies. In those cases, CLInG proves to be a good solution.

4 CLInG: model based on information diffusion

4.1 Approach based on idleness diffusion

Sempé et al. [16] [10] proposed a patrolling algorithm which supposes that the agents are reactive (as in EVAP) and that the environment computes two pieces of information:
- the idleness of each cell
- the diffusion of the maximal idleness values

At each iteration, the environment computes the idleness of each accessible cell by incrementing its value by one. The idleness of a cell is reset to zero when it is visited. The originality of the CLInG algorithm is to add a second piece of information to the environment through the diffusion of the maximum idleness. This propagation creates a second gradient that guides the agents to the cells of interest (those visited longest ago). More formally, a cell i carries a propagated idleness $OP_i$ in addition to its individual idleness $O_i$. The gradient created by the propagated idleness is shared by the whole collectivity, cf. Figure 1.

Figure 1: Propagated idleness

The propagated idleness of a cell depends on the propagated idleness of its neighbors and on its individual idleness. It is computed by a function that takes into account the idleness and the presence of agents on the way:

$OP_i = \max\left[ O_i, \max_j f(i, j) \right]$

with j ranging over the neighboring cells of i, and the propagation function:

$f(i, j) = \begin{cases} OP_j - \alpha - \beta I(j) & \text{if } OP_j - \alpha - \beta I(j) \geq OP_{min} \\ OP_{min} & \text{if } OP_j > OP_{min} \\ OP_j - 1 & \text{otherwise} \end{cases}$

$\alpha$ is the coefficient of propagation. Its value is large (for example 30 in the experiments) in order to create a short-distance gradient that does not risk attracting all the agents to a single cell of maximum idleness. $I(j)$ is the interception function, which stops a propagation whenever it meets an agent: $I(j)$ equals 1 if there is an agent in cell j, and 0 otherwise. This factor also limits the gathering of agents coming from the same way (the order of magnitude of $\beta$ is 10), cf. details in [16] [10]. $OP_{min}$ is a threshold that ensures that the propagated idleness always forms a gradient and remains positive, given that the individual idleness is always positive.

The behavior of each agent consists in following the gradient of maximal idleness (cf. illustration in Figure 1). This approach is dual to the previous algorithm (EVAP), but this time the information in the neighboring cells can come from more distant cells. The propagation of the maximum idleness exploits the inherent properties of the environment and transforms objective information into subjective information that the agents can use directly. The algorithm thus provides an organization among the agents according to the distribution of the idleness in the environment.

4.2 Algorithm

CLInG Agent ALGORITHM
    m ← max(Propagated_I(NeighboringCells))
    For each cell c of NeighboringCells Do
        If Propagated_I(c) = m Then
            ListNeighbors ← ListNeighbors + c
        End If
    End For
    moveTo(Random(ListNeighbors))
    Idleness(currentCell) ← 0

CLInG Environment ALGORITHM
    For each cell c of Environment Do
        Calculate Idleness(c)
    End For
    For each cell c of Environment Do
        Calculate Propagated_I(c)
    End For

5 Simulations and analysis

5.1 Methodology

The simulations were implemented in NetLogo. The simulation conditions, perceptions and moves are strictly the same for the two algorithms. The experiments were performed in environments of growing complexity, taken or adapted from [1] and [18], cf. the figure of experimental topologies (obstacles in black). Topology A is an open-field environment, in which agents are free to move in any direction. Environment B is a spiral forming a corridor with dead ends. Environment C constrains the environment with randomly generated obstacle cells (we used a density of 20%, without isolated cells). Environment D represents a corridor overlooking rooms. Finally, E presents rooms with nested entries, which we name 6-rooms. More generally, we define the n-rooms problem as n rooms with nested entries.

Figure: Experimental topologies

We tested the algorithms with different
population sizes, systematically doubling the number of agents: 1, 2, 4, 8, 16, 32 and 64. We aim at evaluating the performance and collective skills of both models. Every simulation is run for 3000 iterations (4000 for environments D and E) and repeated 10 times to compute mean values.

5.2 Simple environments

Non-obstacle environment (Map A)

The performance of the agents during the first 1000 iterations, in a 20x20-cell environment (agents are initially randomly located), is plotted as the average idleness and the worst idleness of both studied methods (CLInG and EVAP). Both average idleness values quickly become stable and identical and, moreover, very close to the theoretical optimum value (represented by the horizontal line).

Figure: Non-obstacle topology, 1000 iterations

The theoretical optimum idleness values are calculated as follows. Let c be the number of accessible cells of the environment. Considering one agent moving to a new cell at each iteration, it visits all cells in c-1 iterations. Thus, the idleness of the departure cell reaches c-1. For n agents, the optimum maximum idleness is (c/n)-1. The optimum average idleness is therefore ((c/n)-1)/2 (as idleness values are linearly distributed between 0 and the optimum value).

Figure: Non-obstacle topology, average IGI

A second plot gives the average idleness of both methods as the number of agents varies. In both cases, doubling the number of agents improves the performance linearly. For each configuration, the values obtained almost correspond to the theoretical optimum value. It is clear that CLInG and EVAP have the same effectiveness for this topology, except in the single-agent case. This result is also verified with bigger environments.

Spiral environment (Map B)

The average and worst idleness values of the agents in a 20x20-cell spiral environment are plotted over 3000 iterations (agents are initially randomly located). Regarding the worst idleness, EVAP is slightly better even though it does not become stable. As in the previous topology, the average idleness values are very close to the optimum value and relatively stable.

Figure: Spiral topology, 3000 iterations

The plot of the average idleness against the number of agents shows that EVAP is slightly more efficient than CLInG for a small number of agents (up to 4). This is clear for one and two agents, where EVAP almost attains the theoretical optimum value. It is important to note that, for EVAP, we obtained this optimum performance for p=1, where p is the probability of keeping the current direction when there is a choice between several cells with an equal quantity of pheromone. Figure 7.a shows the regularity of the pheromone deposit (the gradient) and, as a consequence, the optimum paths followed by the agents. This emergent solution is equivalent to the unique cycle strategy proposed in [1][3] for patrolling in a graph.

Figure: Spiral topology, average IGI

The gradients in Figure 7.b show the idleness propagation in CLInG (the highest values are the lightest). These gradients are not regular and may generate noise in the agents' search (cf. section 6).

Figure: Map B (screenshot). a) Pheromone values of EVAP (max is light). b) Propagated idleness of CLInG (max is light).

Environment with obstacle density (Map C)

We measured the performance of both models with agents in a 20x20-cell environment having 20% obstacles (agents are initially randomly located). Concerning the average idleness, both methods are close to the theoretical optimum value, with a slight advantage for the CLInG algorithm. This advantage appears clearly in the plot of the worst idleness measures. However, the gap shrinks when the number of agents increases. In the following sub-section, we consider more complex topologies in
order to identify those that create the largest gap between the two methods.

Figure: 20% obstacles topology, average IWI

5.3 Complex environments

Corridor-rooms (Map D)

We now study the behavior of both models in complex environments composed of several rooms. We start with the map D topology (agents always start at the entrance of the corridor: the room at the bottom). The performance in average and worst idleness on map D is plotted for the particular case of a single agent. This case shows that the average and worst idleness of EVAP converge to the optimum performance at the same time. CLInG is slightly less competitive and does not stabilize to a constant performance. When we increase the population size, however, the performance of the two methods is identical and remains close to the theoretical optimum values, without being stable.

Figure: Corridor-rooms topology, single agent, 4000 iterations

Globally, the simulations with this topology do not show a real difference in performance between the two models. On the other hand, we show below that "imbricate" rooms constitute a topology that discriminates between them.

Imbricate rooms (Map E)

Figure 10 presents the performance in average and worst idleness of agents in a 20x20-cell environment consisting of six imbricate rooms (cf. topology in Figure 11). The agents are all initially located at the same place (bottom right corner of the environment).

Figure 10: 6-rooms topology, 2000 iterations

Over 2000 iterations, two distinct phases appear clearly for the EVAP model. We observe in Figure 10 that, up to iteration 830, the average idleness, as well as the worst idleness, is higher than for CLInG. This can be explained by the fact that agents go through a first phase of exploration, which consists in getting into the rooms for the first time (cf. Figure 11.a). The second phase, which consists in revisiting the rooms, is more efficient because the pheromone
leads the agents directly to the most distant rooms, those visited longest ago (cf. Figure 11.b).

The difficulty for EVAP lies at the doors separating two rooms. An agent that explores a room and arrives close to a door has an equal probability of continuing the room exploration or entering the next room. If it chooses to continue the exploration, it risks not passing close to the door again and therefore ignoring the unvisited room. We find here the problem identified by Wagner [18], due to the local vision of agents, of the choice between two nodes of equal interest (cf. details in section 3.3). CLInG does not suffer from this problem because the unexplored rooms propagate a strong idleness, which ensures that agents close to a door are attracted by the unvisited room. This propagation allows agents to get into the rooms following an optimum way from the first exploration (cf. Figure 11.c).

Figure 11: EVAP and CLInG, Map E (screenshot)

As we mentioned in section 3.3, the problem of the choice between several nodes, for the EVAP model, is reduced in proportion to the number of agents. The measures presented in Figure 12 verify this hypothesis (variation of the number of agents for an identical configuration).

Figure 12: 6-rooms topology, average IGI

Concerning the EVAP model, it is interesting to focus on the exploration phase, which corresponds to the formation of global information. Indeed, the order in which the rooms are explored, induced by the topology, builds a pheromone gradient that is necessarily oriented following this order. So, once the gradient is formed, an agent gets into the rooms efficiently while systematically generating the inverse gradient. That allows it to alternate getting in and getting out in an optimum way. This explains why the system converges to a stable performance.

Discussion

6.1 Model complexity

The objective of the previous section was to measure the interest of introducing the propagation of information compared to a model that only uses the evaporation process. Indeed, CLInG can be defined as the EVAP algorithm (idleness playing the role of the pheromone) augmented with the diffusion of information through the environment. We identified topologies where CLInG proves to be more effective thanks to this propagation of information. However, this process has a cost. The difference in complexity between the algorithms lies in the computation performed by the environment at each iteration. More precisely, in EVAP, for c cells containing pheromone, it takes c evaporation operations (eq. 1). Thus, for an n × n cell environment, it takes at most $n^2$ evaporation operations. The CLInG environment is more costly, since it needs $n^2$ idleness computations plus $n^2$ propagation operations. In practice, CLInG proves to be twice as costly in execution time.

6.2 Exploration and patrol

The simulations carried out on complex environments (corridor and imbricate rooms) revealed two operating phases, in particular for EVAP. Initially, the system performs a first exploration of the environment; then it switches abruptly to a more stable and effective behavior. These two phases also exist for CLInG but, generally, the first phase is shorter than for EVAP. This is due to the attraction induced by the not-yet-explored areas (see, for example, Figure 10). It is interesting to note that, in both methods, the system self-organizes via the marking of the environment and converges towards an effective patrol (possibly optimal, see EVAP on map B).

6.3 Advantages and drawbacks of models

EVAP is an effective solution for environments of average complexity (non-obstacle, spiral, corridor-rooms). We have shown that the deposit and evaporation of a pheromone guarantee a low average idleness. On the other hand, considering the worst idleness, CLInG generally proves to be more efficient than EVAP, except for spiral and corridor topologies when the number of agents is low. More
generally, a surprise of this study is the good performance, even optimal, for single-agent patrolling. It shows that marking the environment can constitute a good solution for single-agent problems, while guaranteeing scalability in the number of agents. Indeed, the average idleness decreases linearly with the number of agents, and adding agents makes the exploration phase more efficient.

The study has shown that CLInG is more efficient than EVAP on complex environments composed of imbricate rooms, particularly if the number of agents is low. The difference in performance gradually shrinks when the number of agents increases. This is a consequence of using a swarm approach (see Figure 12 and section 3.3).

The information propagated in the CLInG model naturally generates complexity in the system. While attracting agents towards zones of interest, this process can also generate noise, for example by attracting several agents to the same cell. This explains why CLInG does not stabilize to a stationary global behavior (as observed on the spiral topology).

Conclusion

In this article, we have tackled the multi-agent patrolling problem in unknown environments. The CLInG algorithm [16], one of the rare proposals, exploits the marking of the environment and the diffusion of information. We proposed a simpler model (EVAP) exploiting only the evaporation of a pheromone (no diffusion). The experimental study on environments of growing complexity has shown that the propagation of information is efficient only with some environment topologies. For this reason, we have identified the N-imbricate-rooms problem, which defines a new case study for this problem. We have shown that the EVAP model is less efficient in the initial exploration phase but then converges to an average performance identical to that of CLInG. This difference in performance is reduced when the population increases, showing the collective nature of the
proposed models. The propagation process thus proves to be an accelerator of the exploration phase of multi-agent patrolling. Moreover, this experimental study has indicated that these algorithms could constitute a competitive solution for the single-agent patrolling problem.

As perspectives of this work, we plan to study the role of the CLInG parameters in more depth and to add variants with energy limitation and dynamic obstacles. Finally, we will implement EVAP in a multi-robot simulator to evaluate the model on more realistic environments.

References

[1] A. Almeida, G. Ramalho, H. Santana, P. Tedesco, T. Menezes, V. Corruble, Y. Chevaleyre et al., "Recent Advances on Multi-Agent Patrolling", Proceedings of the 17th Brazilian Symposium on Artificial Intelligence, pp. 474-483, 2004.
[2] E. Bonabeau, M. Dorigo, G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press, 1999.
[3] Y. Chevaleyre, "The Patrolling Problem", Technical Report, Paris 9, http://www.lamsade.dauphine.fr/~chevaley/papers/anna_patro.pdf, 2003.
[4] F. Lauri, F. Charpillet, "Ant Colony Optimization applied to the Multi-Agent Patrolling Problem", IEEE Swarm Intelligence Symposium, 2006.
[5] A. Machado, G. Ramalho, J.-D. Zucker, A. Drogoul, "Multi-Agent Patrolling: an Empirical Analysis of Alternative Architectures", Proceedings of Multi-Agent Based Simulation, pp. 155-170, 2002.
[6] L. Panait, S. Luke, "A pheromone-based utility model for collaborative foraging", Proceedings AAMAS, pp. 36-43, 2004.
[7] H.V. Parunak, M. Purcell, R. O'Connell, "Digital Pheromones for Autonomous coordination of swarming UAV's", Proceedings of AIAA First Technical Conference and Workshop on Unmanned Aerospace Vehicles, Systems, and Operations, 2002.
[8] H. Santana, G. Ramalho, V. Corruble, R. Bohdana, "Multi-Agent Patrolling with Reinforcement Learning", Proceedings of the 3rd International Joint Conference on Autonomous Agents and Multi-Agent Systems, pp. 1122-1129, 2004.
[9] F. Sempé, "Auto-organisation d'une collectivité de robots: application à l'activité de patrouille en présence de perturbations", PhD Thesis, Université Paris VI, 2004.
[10] F. Sempé, A. Drogoul, "Adaptive Patrol for a Group of Robots", Proceedings of the 2003 IEEE/RSJ, Intelligence Robots and Systems, Las Vegas, Nevada, pp. 2865-2869, 2003.
[11] I.A. Wagner, M. Lindenbaum, A.M. Bruckstein, "Distributed Covering by Ant-Robots using Evaporating Traces", IEEE Transactions on Robotics and Automation, vol. 15, n. 5, pp. 918-933, 1999.
[12] V. Yanovski, I.A. Wagner, A.M. Bruckstein, "A Distributed Ant Algorithm for Efficiently Patrolling a Network", Algorithmica, vol. 73, pp. 165-186, 2003.
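
As a complement to the EVAP agent and environment algorithms of section 3.2, the following is a minimal Python sketch of one way they could be written. It is only an illustration under stated assumptions, not the NetLogo implementation used for the experiments: the grid is assumed to be a dictionary mapping accessible cells to pheromone quantities, the deposit value `Q_MAX = 1.0` is an arbitrary choice (the text only names it Qmax), evaporation is assumed to run before each move, and the function names `evaporate`, `neighborhood` and `evap_step` are hypothetical.

```python
import random

RHO = 0.001   # evaporation rate quoted in the experiments
Q_MAX = 1.0   # assumed deposit value; the text only names it Qmax


def evaporate(phero, rho=RHO):
    """Environment step: q <- q * (1 - rho) for every marked cell (eq. 1)."""
    for cell, q in phero.items():
        if q > 0:
            phero[cell] = q * (1 - rho)


def neighborhood(pos, phero):
    """Accessible cells among the four neighbours of pos."""
    x, y = pos
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [c for c in candidates if c in phero]


def evap_step(pos, heading, phero, p=1.0, rng=random):
    """Agent step: go to a least-marked neighbour, then drop Q_MAX there."""
    cells = neighborhood(pos, phero)
    if not cells:                      # isolated cell: stay and re-mark it
        phero[pos] = Q_MAX
        return pos, heading
    m = min(phero[c] for c in cells)
    candidates = [c for c in cells if phero[c] == m]
    ahead = (pos[0] + heading[0], pos[1] + heading[1])
    if ahead in candidates and rng.random() < p:
        nxt = ahead                    # keep the current direction
    else:
        nxt = rng.choice(candidates)   # random choice among the minima
    phero[nxt] = Q_MAX
    return nxt, (nxt[0] - pos[0], nxt[1] - pos[1])


if __name__ == "__main__":
    # Hypothetical 1 x 5 corridor, one agent starting at the left end.
    phero = {(x, 0): 0.0 for x in range(5)}
    pos, heading = (0, 0), (1, 0)
    phero[pos] = Q_MAX
    for _ in range(8):
        evaporate(phero)
        pos, heading = evap_step(pos, heading, phero)
        print(pos)
```

In this toy corridor with p = 1, the agent walks to the far end, turns around at the dead end and walks back, since the cell it marked most recently always holds more pheromone than the one it marked earlier. Lower values of p only matter when several neighbouring cells tie for the minimum.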
