Introduction
This Master’s project introduces a distributed multi-strategy learning methodology that integrates case-based reasoning, enabling agents to engage in both individual learning through environmental observation and cooperative learning via neighbor interactions. Acknowledging that cooperative learning incurs higher costs due to communication and processing demands, the methodology implements a cautious utility-based adaptive mechanism to effectively merge both learning approaches. Key components of the project include an interaction protocol for information exchange, a neighbor selection strategy, and a chronological casebase. Additionally, our research highlights that agents exhibit varying learning behaviors depending on their environmental context, all within the framework of a real-time distributed multi-agent system (MAS).
Recent research has begun to bridge the gap between Machine Learning (ML) and agent-based systems, highlighting the significance of learning and adaptation as key features of intelligence. Autonomy in agents necessitates the capability to make independent decisions, which requires equipping them with suitable tools for decision-making. Given the unpredictability of dynamic environments, it is impossible for designers to anticipate every scenario an agent may face, making adaptability essential, particularly in multi-agent systems. Thus, learning plays a vital role in enhancing the autonomy of agents.
Multi-agent systems (MAS) introduce the challenge of distributed learning, where multiple agents learn independently to achieve a shared objective. Existing algorithms are primarily designed for single-agent learning, necessitating significant modifications for effective application in a multi-agent context. Successful distributed learning relies on cooperation and communication among agents, a topic currently under extensive investigation by MAS researchers. This emerging focus highlights the potential for collaboration between multi-agent systems and machine learning (ML), suggesting that both fields could greatly benefit from each other's advancements.
Our project incorporates case-based learning within a distributed multi-agent system, where each agent engages in both individual and cooperative learning. Individual learning enables agents to develop their casebase based on personal experiences and perceptions, fostering their unique areas of specialization. In contrast, cooperative learning promotes knowledge sharing through interactions among agents, enhancing the overall learning process within the system.
Motivation
We introduce a distributed multi-strategy learning methodology utilizing case-based reasoning (CBR) within a multi-agent environment. This approach enables agents to enhance their problem-solving abilities, particularly when they struggle to find satisfactory solutions independently. However, cooperative learning can incur high costs and risks due to the additional communication and coordination required, potentially rendering it ineffective. Furthermore, since each agent learns based on its unique experiences and perspectives, solutions may not be universally applicable across agents facing similar challenges. The introduction of external knowledge could increase processing costs without necessarily improving solution quality. This concern aligns with findings in [2], which highlighted that excessive role specialization in multi-agent systems could hinder overall team performance when agents share experiences from differing roles.
To enhance team performance in multi-agent systems, our project utilizes a cautious utility-based adaptive mechanism that integrates both cooperative and individual learning. We aim to determine whether this combined learning approach yields better performance than relying solely on individual learning. Specifically, we investigate:
1. The effects of cooperative learning in subsequent individual learning,
2. The roles of cooperative learning in agents of different initial knowledge,
3. The feasibility of our multi-strategy learning methodology, and
4. Agents’ adaptability to different problem domains/environments.
Problem Domain
In a multi-agent system designed for multi-sensor target tracking and adaptive CPU reallocation, agents operate in a noisy environment simulated by a Java-based program called RADSIM. Each agent, equipped with identical capabilities, controls a sensor positioned uniquely to search and detect environmental changes. Upon detecting a moving target, agents collaborate with at least two neighbors to form a tracking coalition, which can lead to CPU resource shortages due to increased activity demands. To mitigate these shortages, agents must establish a CPU coalition when they recognize a deficit in processing power.
Agents operate within a dynamic and unpredictable environment, often possessing incomplete and partially inaccurate knowledge of the world. To succeed, they must quickly identify and respond to significant changes; otherwise, they risk missing opportunities or encountering negative outcomes. However, the limitations of their resources, such as restricted CPU speed, memory, and sensor capabilities, complicate the development of multi-agent systems and the intelligence of the agents within them.
Brief Description of Our Approach
Our methodology integrates a utility-based adaptive mechanism that combines cooperative and individual learning, along with an interaction protocol for information exchange and a chronological casebase. In a multi-agent environment, agents negotiate to collaborate on real-time tasks like multi-sensor target tracking and CPU resource allocation, each employing a dynamically generated negotiation strategy derived from case-based reasoning (CBR). Each agent maintains two casebases, one for initiating negotiations and another for responding. When faced with a negotiation challenge, an agent captures the environment parameters to create a problem description and searches its casebases for the most relevant case, adapting the negotiation strategy accordingly. After negotiations conclude, the agent assesses whether to learn from the new case based on its usage history, initiating a cooperative learning process if it identifies a problematic case. This cautious approach ensures that only cases with poor performance histories are critically evaluated, promoting effective cooperative learning among agents.
Organization of Report
This report is structured into several key chapters: the introduction, background, methodology, experiments and results, and conclusions with future work. The methodology chapter outlines a utility-based adaptive mechanism that integrates two distinct learning strategies. Following this, the experiments and results chapter, the most extensive section of the report, details our complex experimental setup and presents significant findings. We conducted two experiments: the first examines the interplay between individual and cooperative learning in a real-time multi-agent environment, while the second investigates agent learning across varying environments. The report concludes with insights and directions for future research.
Background
Distributed Learning
Distributed learning has become an increasingly important paradigm in ML due to the proliferation of multi-agent systems and distributed problem-solving environments.
In general, distributed learning deals with decomposing a learning task into subtasks so that each agent is responsible for a part of the learning.
In a study, distributed learning is presented as an AI-driven solution that merges inductive learning algorithms with meta-learning techniques to create precise classification models for detecting electronic fraud. Inductive learning algorithms identify anomalous behaviors across distributed datasets, while meta-learning methods synthesize this knowledge into advanced classification models. This collaborative approach enables financial organizations to share models without revealing sensitive data, enhancing cross-institutional fraud protection. Additionally, it offers scalability to accommodate the increasing size and number of databases, making it a flexible and effective strategy for combating fraudulent transactions.
A meta-learning approach is introduced as a technique to enhance prediction accuracy by combining results from multiple learning algorithms applied to various training datasets. This report outlines several strategies for integrating independently learned classifiers, resulting in an overall classifier that includes both the individual classifiers and a meta-classifier derived from the meta-learning strategy. Notably, these strategies are independent of the specific learning algorithms employed. Preliminary experiments conducted on two molecular biology sequence analysis datasets show promising results, highlighting the potential of machine learning techniques to improve automated knowledge discovery systems.
Agent-based Knowledge Discovery introduces an innovative approach to data mining across distributed databases by leveraging techniques from Distributed AI and Machine Learning. Software agents, equipped with learning algorithms, mine local databases and collaborate to integrate the acquired knowledge, ultimately presenting the findings to users. This work examines the new software agent language, Agent-K, and the application of first-order learning techniques in data mining. However, it does not address key aspects such as the interaction between agents and the integration of knowledge.
Mutually supervised learning in multi-agent systems enhances agent performance by enabling them to learn from each other's knowledge and strategic behaviors. In dynamic environments, agents can effectively respond to unexpected events by generalizing insights gained during training. The work outlines various learning rules where each agent serves as a teacher to its partner, facilitating a learning process through examples from a sample space. This collaborative learning approach allows agents to apply concepts learned from their instructors, reducing the need for repetitive coordination in similar problem-solving scenarios and minimizing real-time communication by utilizing established coordination concepts.
A novel distributed method for analyzing heterogeneous spatial databases allows for learning relationships without requiring all data to be centralized. This approach effectively handles large volumes of spatial data spread across multiple sites by independently applying a spatial clustering algorithm to identify similar regions. The method involves transferring convex hulls of these clusters for integration, enabling the construction of local regression models that can be shared among data sites. The proposed technique demonstrates both computational efficiency and accuracy compared to traditional methods that rely on centralized data availability.
Research in distributed learning has typically been applied to network routing [23], knowledge discovery and data mining [7][24], neural networks [12][25], and so on.
Distributed learning involves breaking down a learning task into smaller subtasks, allowing each agent to take responsibility for a specific part of the overall learning process. However, the agents in such systems often lack the ability to enhance the overall performance of the entire system.
Cooperative Learning
Coordination is a key characteristic of multi-agent systems, defined by the ability to minimize unnecessary activities, reduce resource conflicts, and prevent issues like livelock and deadlock while ensuring safety. Cooperation occurs among non-antagonistic agents, whereas negotiation involves competitive or self-interested agents. Successful cooperation requires each agent to maintain an understanding of other agents and anticipate future interactions, highlighting the importance of sociability among agents.
Sugawara and Lesser developed a cooperative learning methodology for agents to enhance coordination strategies. This approach includes adjusting goal ratings and messages, reordering operations and communications, reallocating tasks to idle agents, and utilizing results from other agents. The primary motivation for this learning process is to identify issues such as insufficient information, improper ratings, and excessive incoming messages.
In their study, Chan and Stolfo introduced strategies for cooperative learning that involve combiners and arbiters. Combiners aim to merge predictions from multiple base classifiers by understanding the correlation between these predictions and the correct outcomes. Conversely, arbiters offer a more informed prediction when the base classifiers yield differing results.
The article proposes the integration of short-term and long-term learning techniques to improve search efficiency in multi-agent design systems by enabling agents to learn about non-local requirements during local search processes. The short-term learning technique allows agents to gather and utilize constraining information from agent communication to enhance problem-solving within the same instance. In contrast, the long-term technique classifies and indexes problem instances to retrieve relevant constraining information for new challenges. Experimental results indicate that conflict-driven short-term learning significantly enhances search outcomes, while agents can leverage their past experiences to predict potential conflicts, reducing the need for communication feedback typically required in short-term learning scenarios.
Learning in a dynamic, uncertain, multi-agent setting is a challenging task. One line of work discusses techniques for transforming noisy run-time activities into effective procedures for future problem-solving. This conversion process creates cooperative procedures that are stored in a resource known as collective memory. The authors illustrate how this memory allows agents to learn cooperative strategies that extend beyond their initial planners. Additionally, experimental results are presented, showing that collective memory significantly enhances agent performance.
The objective of [12] is to enhance learning speed and accuracy by utilizing multiple neural networks of the same architecture that simultaneously execute the standard back-propagation algorithm. Each identical network is assigned a subset of training examples, allowing them to periodically communicate and collaborate on mastering the complete training set. In instances where a network becomes trapped in a local minimum, the other networks assist in overcoming this challenge. The findings indicate that collaborative learning among a few networks leads to faster results compared to individual learning efforts.
Cooperative learning among agents does not always outperform individual learning, as highlighted by Ishida's research on bi-directional searches. His findings indicate that the processing overhead associated with message exchanges can diminish the effectiveness of collaborative efforts, suggesting that individual learning may sometimes be more advantageous.
Case-Based Reasoning (CBR) vs Case-Based Learning (CBL)
In recent years, case-based reasoning (CBR) has evolved from a niche research area to a widely recognized field within Artificial Intelligence. CBR distinguishes itself from traditional Knowledge-Based Systems by focusing on specific knowledge derived from previous problem-solving experiences, rather than relying solely on generalized knowledge. This approach enables the resolution of new problems by identifying and reusing similar past cases. Additionally, CBR promotes incremental learning, as each solved problem contributes new experiences that can be applied to future challenges.
The process involved in CBR can be represented by a schematic cycle (see Figure 2). Aamodt and Plaza [1] have described CBR as a cyclical process comprising the four REs, sketched in code after the list:
1. RETRIEVE the most similar case(s);
2. REUSE the case(s) to attempt to solve the problem;
3. REVISE the proposed solution if necessary; and
4. RETAIN the new solution as a part of a new case.
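As an illustration of the cycle, here is a minimal, self-contained C++ sketch of the four REs. The Case struct, the similarity function, and the adaptation and evaluation stand-ins are hypothetical placeholders; they are not the interfaces of the ANTS software described later in this report.

    #include <string>
    #include <vector>

    // Hypothetical case representation: a problem, a solution, and an outcome.
    struct Case {
        std::string problem;
        std::string solution;
        bool successful = false;
    };

    // Placeholder similarity measure: exact match scores 1, anything else 0.
    static int similarity(const Case& c, const std::string& problem) {
        return c.problem == problem ? 1 : 0;
    }

    // One pass through the four REs for a new problem (illustrative only).
    Case solveWithCBR(std::vector<Case>& casebase, const std::string& problem) {
        Case candidate;
        candidate.problem = problem;

        if (!casebase.empty()) {
            // 1. RETRIEVE: find the stored case most similar to the new problem.
            const Case* best = &casebase.front();
            for (const Case& c : casebase)
                if (similarity(c, problem) > similarity(*best, problem)) best = &c;

            // 2. REUSE: adapt the retrieved solution (adaptation step omitted).
            candidate.solution = best->solution;
        }

        // 3. REVISE: apply the proposed solution and record whether it worked.
        candidate.successful = true;   // stand-in for evaluating the real outcome

        // 4. RETAIN: keep the new experience for future problems.
        casebase.push_back(candidate);
        return candidate;
    }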
A case is a contextual knowledge piece that encapsulates an experience, encompassing valuable lessons from the past and the relevant context for their application. Typically, a case includes a problem that illustrates the circumstances at the time it occurred, a solution that outlines the resolution to that problem, and an outcome that reflects the world’s state following the case.
In this report, we will explore essential techniques in case-based reasoning, including case indexing, case storage, case retrieval, and case adaptation, providing a comprehensive overview of each method.
Case-based learning differs significantly from traditional case-based reasoning (CBR). In CBR, the casebase is viewed as a complete set of cases, allowing individuals to retrieve the best case for problem-solving without the capacity to learn new cases. Conversely, case-based learning treats the casebase as potentially empty or incomplete at the outset. This approach enables the casebase to learn and evolve over time, continuously refining and expanding its collection of cases through the learning process.
Case-based reasoning systems have typically excelled in high-level reasoning for problems represented through discrete, symbolic formats. However, real-world scenarios like autonomous robotic navigation are more effectively represented using continuous models. These domains necessitate ongoing performance, including continuous sensor-motor interactions and real-time adaptation and learning. A novel approach to continuous case-based reasoning has been introduced, focusing on its application for the dynamic selection, modification, and acquisition of robotic behaviors within autonomous navigation systems.
The utility problem of the casebase, as discussed in [19], arises when attempts to enhance system performance inadvertently lead to degradation. This issue is primarily attributed to the slowdown of traditional memory systems as the volume of stored items grows. Unconstrained learning algorithms can overwhelm the system, resulting in a performance decline that surpasses the benefits gained from individual learned rules. To address this challenge, effective strategies include limiting learning to items with exceptionally high utility and reducing the extent of memory that is searched.
Research in distributed and cooperative case-based reasoning (CBR) highlights the advantages of treating corporate memories as distributed case libraries, which can enhance resource discovery and leverage previous expertise. Techniques such as Negotiated Retrieval enable the assembly of "case pieces" from various resources, while federated peer learning introduces cooperation modes like DistCBR and COLCBR, allowing agents to utilize peer expertise for local tasks. However, our methodology differs significantly; instead of viewing corporate memories as distributed libraries, each agent maintains a small casebase and selects the most beneficial neighbor to "borrow" a case, streamlining the process of knowledge sharing and collaboration.
Auction-Based Retrieval (ABR) is a novel method for distributed case retrieval inspired by auction dynamics in electronic trading. This approach is particularly relevant for agent-mediated systems, where each agent operates from its own casebase and pursues individual interests while collaborating with others to address new challenges. A key challenge in Cooperative Case-Based Reasoning (CoopCBR) is coordinating case retrieval across multiple casebases. Auctions serve as effective market institutions, providing well-established coordination mechanisms for participants with limited information and diverse objectives. The primary goal of ABR is to identify the optimal case from the submitted bids.
Multi-Agent Learning
Distributed learning encompasses both multi-agent and non-multi-agent learning. Multi-agent systems exhibit significant complexity in their structure and function, requiring agents to develop not only individual learning capabilities but also the ability to coordinate activities, learn from interactions with other agents, and communicate effectively. When agents learn collaboratively from one another, it is termed cooperative learning, whereas learning derived from personal experiences and observations is referred to as individual learning.
In recent years, multi-agent systems (MAS) have garnered significant attention within the AI community, focusing on the autonomous, rational, and flexible behaviors of entities like software programs and robots. These systems are crucial for effective interaction and coordination across various domains, including robotics and information management. Given the unpredictability of potential situations agents may face, it is essential for them to learn and adapt to their environments, highlighting the importance of learning in enhancing agents’ autonomy. Consequently, research in agent and multi-agent systems should prioritize the development of learning algorithms, which have predominantly been designed from a single-agent perspective.
While these algorithms can be viewed as facilitating individual learning, they do not qualify as multi-agent learning if an agent's learning process is neither influenced by nor impacts neighboring agents.
A multi-agent system exemplifies how agents utilize learning modules to evaluate various hypotheses with limited individual knowledge. Through individual learning, agents enhance their understanding by receiving confidence messages from peers, facilitating cooperative learning. The primary objective of this system is for agents to collaboratively converge on the most accurate hypothesis based on the information available to them.
Reinforcement learning techniques are effectively utilized in multi-agent systems, where agents employing Q-learning algorithms can enhance their learning speed through cooperation. By exchanging information such as partial solutions and action plans, these collaborative agents outperform those that operate independently.
Methodology
Design of the Methodology
Agents engage in real-time negotiations for tasks like multi-sensor target tracking and CPU resource allocation, enhancing their negotiation skills within resource-constrained environments. Each negotiation is framed as a case, where the task represents the problem, the strategy serves as the solution, and the outcome is the result. Our approach is argumentative, allowing agents to take on the roles of initiator or responder; the initiator seeks to persuade the responder to allocate resources or assist with tasks, while the responder assesses requests based on its constraints. Each agent maintains two casebases corresponding to these roles, retrieving the most relevant case during negotiations to adapt its strategy. After concluding a negotiation, agents document the outcome and decide whether to incorporate the new case into their casebases, as illustrated in our methodology.
Figure 3 The CBR module, the negotiation task, and the learning modules in our methodology for an agent
The project builds upon previously designed components, including the Initiating and Responding Casebases, the negotiation task, the CBR Module, the Negotiation Module, and the Individual Learning Module. Our research specifically targets the development of the Cooperative Learning Module, which is supported by adding new attributes and methods to the Initiating CB (casebase), the Responding CB (casebase), the CBR Module, and the Individual Learning Module.
3.1.1 Chronological Casebases and Usage History
We implement a chronological casebase where each case is assigned a time-of-birth, indicating its creation date, and a time-of-membership, marking when it was added to the casebase. All initial or pre-existing cases share the same time-of-birth and time-of-membership. To manage these pre-existing cases, we employ a case-forgetting mechanism, detailed in section 3.1.2. Additionally, when a foreign case is imported into a local casebase, it may have an earlier time-of-birth compared to its time-of-membership.
We analyze the usage history of each case, as detailed in Table 1, to assess its performance. An agent determines whether a case is underperforming, which may lead to its replacement or removal from consideration. If a case is identified as problematic, a cooperative learning process is initiated to facilitate its replacement.
Table 1 The usage history that an agent profiles for each case
_timesOfUsed: the number of times the case has been used
_timesOfSuccessUsed: the number of times the case has been used in a successful negotiation
_timesOfIncurNewCase: the number of times the usage of the case has led to a new case getting added to the casebase
_timesOfRequest: the number of times the case has been designated as a problematic case, i.e., with very low utility
_timeStamp: the last time that the case was used, or the time when the case was added
Here are some heuristics we use in tandem with the chronological casebase:
H1 (Currency): If a case has not been used in a long time, then this case is more likely to be replaced.
H2 (Evolution): With everything else equal, an old case is more likely to be replaced than a young case.
H3 (Usefulness): If a case’s _timesOfSuccessUsed is significantly small, then the case is more likely to be replaced.
H4 (Solution Quality I): If a case has a high _timesOfUsed but a low _timesOfSuccessUsed, then it is a problematic case.
H5 (Solution Quality II): If a case has a low _timesOfSuccessUsed and a high _timesOfIncurNewCase, then the solution of this case is probably not suitable for the problems encountered by the agent and it is a problematic case.
All of these five heuristics have been implemented in our project.
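To make the heuristics concrete, the following C++ sketch shows how the Table 1 counters might feed the problematic-case test (H4 and H5) and a replacement preference score (H1, H2, and H3). The UsageHistory struct mirrors Table 1, but the thresholds, weights, and function names are illustrative assumptions, not values taken from the ANTS code.

    #include <ctime>

    // Mirrors the usage-history attributes of Table 1 (illustrative only).
    struct UsageHistory {
        int timesOfUsed = 0;
        int timesOfSuccessUsed = 0;
        int timesOfIncurNewCase = 0;
        int timesOfRequest = 0;
        std::time_t timeStamp = 0;      // last use, or time of membership
        std::time_t timeOfBirth = 0;    // creation time (chronological casebase)
    };

    // H4 and H5: a frequently used case with a low success rate and a high
    // incurrence rate is flagged as problematic (thresholds are assumptions).
    bool isProblematic(const UsageHistory& h) {
        if (h.timesOfUsed < 5) return false;   // too little evidence to judge
        double successRate = double(h.timesOfSuccessUsed) / h.timesOfUsed;
        double incurRate   = double(h.timesOfIncurNewCase) / h.timesOfUsed;
        return successRate < 0.2 && incurRate > 0.5;
    }

    // H1-H3: a higher score marks a better candidate for replacement.
    double replacementScore(const UsageHistory& h, std::time_t now) {
        double idle    = double(now - h.timeStamp);            // H1: currency
        double age     = double(now - h.timeOfBirth);          // H2: evolution
        double useless = 1.0 / (1.0 + h.timesOfSuccessUsed);   // H3: usefulness
        return idle + 0.1 * age + 100.0 * useless;             // weights are illustrative
    }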
Upon completing a negotiation, the agent incorporates new cases into the casebase if they enhance its diversity, forming the foundation of our individual learning strategy. When the casebase reaches a predetermined size, the agent evaluates the possibility of replacing an existing case with the new addition, facilitating both incremental and refinement learning. For a deeper understanding, readers can refer to [22]. The individual learning process is illustrated in Figure 4, while refinement learning utilizes heuristics H1, H2, and H3.
Figure 4 The incremental and refinement learning features of the individual learning strategy
Here we briefly describe the expected learning curve of an agent’s casebase. Figure 5 shows the graph of the learning curve.
Figure 5 The expected learning curve of an agent’s casebase (four phases)
Figure 5 illustrates four distinct learning phases, with Phase 1 identified as the non-selective learning phase. During this phase, the agent focuses on acquiring a variety of useful cases to enhance the quality of its casebase, without concern for the unused, pre-existing cases.
As more cases are added to the casebase, and when the number of cases in the casebase reaches a threshold (i.e., MAX_CASE_BASE_SIZE), the learning transitions into Phase 2.
Case forgetting occurs when unused, pre-existing cases are removed from the casebase, leading the agent into Phase 3, known as the refinement or precision learning phase. In this phase, since the cases in the casebase can no longer be forgotten, the agent must determine whether to replace them with superior cases or simply add new ones. This methodical approach enables the agent to gradually acquire additional knowledge about the system. Eventually, as the agent encounters most problems in the environment, it enters Phase 4, the convergence phase, where learning becomes infrequent. However, we have yet to analyze our experiments to confirm that agents exhibit this anticipated learning curve, which will be addressed in our future research.
If an agent repeatedly struggles to negotiate effectively in specific scenarios, it should seek improved negotiation strategies, which underpins the cooperative learning approach in our methodology. Our cooperative learning process involves profiling a case, triggering collaborative efforts, selecting neighboring agents for assistance, exchanging cases, and adapting solutions before implementation. Importantly, this cooperative learning occurs independently from the immediate problem-solving process, as negotiation tasks require prompt resolution without interference from the learning phase.
As previously mentioned, we have adhered to a cautious approach to cooperative learning:
1. The agent evaluates the case to determine whether it is problematic. To designate a case as problematic, we use heuristics H4 and H5: a (frequently used) case is problematic if it has a low success rate (_timesOfSuccessUsed/_timesOfUsed) and a high incurrence rate (_timesOfIncurNewCase/_timesOfUsed). That means the case addresses an appropriate problem but does not provide a satisfactory solution. If a case has a very low success rate, it has not been useful; and if a case has been used on several occasions and its "adapted" versions have been stored into the casebase after their application, then those adapted versions have been significantly different from the original case. In other words, the case has a good problem scope, but its solution is not good. This is also why the agent can ask for help from other agents.
2. The agent only requests help from another agent that it thinks is good at this kind of problem. The idea is to approach neighbors who have initiated successful negotiations with the current agent, in the hope that the agent may learn how those neighbors have been able to be successful. Each agent keeps a profile of every neighbor that documents the negotiation relationships between the two agents [22]. Among the profiled parameters is _helpRate, the percentage of times that the agent has agreed to a request from a particular neighbor. The agent selects the neighbor with the highest _helpRate to ask for help. This increases the chance that the neighbor may have a better solution than the agent's.
3. The agent adapts the foreign case before adopting it into its casebase. The adaptation is based on the difference between the foreign case and the problematic case, and on the success rate of the foreign case. At the same time, the usage history parameters of the new case are reset, as sketched below.
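The following is a minimal C++ sketch of steps 2 and 3: picking the neighbor with the highest _helpRate, then resetting the usage history of the adapted foreign case before it enters the local casebase. The NeighborProfile and ForeignCase types and the adaptation body are placeholders; only the _helpRate-based selection and the counter reset follow the description above.

    #include <string>
    #include <vector>

    struct NeighborProfile {
        std::string id;
        double helpRate = 0.0;   // fraction of past requests this neighbor granted
    };

    // Step 2: ask the neighbor that has been most helpful in past negotiations.
    const NeighborProfile* selectHelper(const std::vector<NeighborProfile>& neighbors) {
        const NeighborProfile* best = nullptr;
        for (const NeighborProfile& n : neighbors)
            if (!best || n.helpRate > best->helpRate) best = &n;
        return best;   // nullptr if the agent has no neighbors
    }

    // Step 3: adapt the foreign case and reset its usage history before storing it.
    struct ForeignCase {
        std::string problem, solution;
        int timesOfUsed = 0, timesOfSuccessUsed = 0, timesOfIncurNewCase = 0;
    };

    ForeignCase adaptForeignCase(ForeignCase incoming) {
        // The solution would be adjusted here using the difference between the
        // foreign case and the problematic case, and the foreign case's success
        // rate (adaptation details omitted).
        incoming.timesOfUsed = incoming.timesOfSuccessUsed = incoming.timesOfIncurNewCase = 0;
        return incoming;
    }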
Figure 6 The cooperative learning strategy: usage history profiling, utility-based trigger, neighbor selection, interaction protocol, and case adaptation before learning
The case exchange operates through a structured interaction protocol where agent A initiates a learning process by sending a CASE_REQUEST message containing a problematic case to agent B. In response, agent B retrieves the most similar case from its casebase and replies with a CASE_RESPONSE message that includes both cases. This protocol enables agent A to submit multiple requests for various problematic cases simultaneously, streamlining the internal management of these cases. However, it also increases communication costs, particularly when dealing with large cases.
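Below is a sketch of the two messages, assuming a simple struct-based wire format and a sendMessage hand-off to the communication department; the struct, enum, and function names are illustrative assumptions rather than the actual ANTS message API.

    #include <string>

    // Hypothetical wire format: a CASE_RESPONSE carries both the problematic case
    // (so the requester can match the reply to its request) and the offered case.
    enum class MsgType { CASE_REQUEST, CASE_RESPONSE };

    struct CaseMessage {
        MsgType type;
        std::string sender, receiver;
        std::string problematicCase;   // serialized problematic case
        std::string offeredCase;       // filled in by the responder
    };

    // Assumed transport primitive; in ANTS, messages are relayed by the
    // communication department (see section 4.3).
    void sendMessage(const CaseMessage& msg) { (void)msg; /* hand-off stub */ }

    // Agent A: start cooperative learning for one problematic case.
    void requestCase(const std::string& self, const std::string& helper,
                     const std::string& problematicCase) {
        sendMessage({MsgType::CASE_REQUEST, self, helper, problematicCase, ""});
    }

    // Agent B: reply with the most similar case from its own casebase.
    void onCaseRequest(const CaseMessage& req, const std::string& mostSimilarLocalCase) {
        CaseMessage resp = req;
        resp.type = MsgType::CASE_RESPONSE;
        resp.sender = req.receiver;
        resp.receiver = req.sender;
        resp.offeredCase = mostSimilarLocalCase;   // both cases travel back together
        sendMessage(resp);
    }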
Innovations
After introducing the detailed methodology we employ in this project, we list the innovations we have made for this project to clarify our contributions:
Time-of-birth of Each Case
Time-of-membership of Each Case
Usage History for Each Case
Five Heuristics (H1, H2, H3, H4, and H5) used to evaluate the performance of each case in the casebase and to judge whether cooperative learning is needed
Utility-Based Cooperative Trigger (judged by H4 and H5)
Case Hierarchy (explained in the following paragraph)
Neighbor Selection (based on how helpful the neighbor was to the agent)
Case Adaptation (the adaptation of a foreign case from another agent before it is stored into the casebase)
In section 3.1, most innovations are discussed, with the exception of the case hierarchy, which supports the _timesOfSuccessUsed and _timesOfIncurNewCase counts used by heuristics H4 and H5. During standard Case-Based Reasoning (CBR) operations, an agent facing a negotiation task retrieves the most relevant case, adapting its solution as the negotiation strategy. After the negotiation concludes, the agent documents the outcome, creating a new case that includes the negotiation task, the adapted solution, and the result; this new case is termed the child case, while the original is the parent case. If the negotiation is successful, the _timesOfSuccessUsed of the parent case is incremented. Additionally, if the new case is sufficiently diverse, the _timesOfIncurNewCase of the parent case is also increased. The parent case vector maintains the relationship between child and parent cases, allowing for easy retrieval of the parent case when a child case is identified.
Problem Application
Our methodology is versatile and can be utilized across various problem domains. This section highlights its application in real-time scenarios, including CPU reallocation and target tracking, as well as in situations involving incomplete information during multi-agent negotiations.
A real-time system relies on both the logical correctness of computations and the timeliness of their results; failure to meet timing constraints leads to system failure. Therefore, it is crucial to ensure that these timing constraints are consistently met, necessitating predictable system behavior. Achieving a high degree of utilization while adhering to timing requirements is also desirable. Effective system design should account for safe and correct behavior, allowing for some delays without resulting in failure. Thus, even if a computation is slightly late, the system can still function effectively.
Our methodology can be applied in real-time domains due to the following characteristics:
1. The CBR Module is able to come up with solutions to problems based on past experience. This means that "good-enough, soon-enough" solutions can be found by referring to old, similar cases. The system need not spend too much time searching the whole problem space.
2. The casebases are normally of small size, so that the evaluation, retrieval, adaptation, and learning of cases can be done in a short period of time.
The Cooperative Learning Module enhances real-time applications by providing valuable external experiences, thereby saving time when the Case-Based Reasoning (CBR) Module is still acquiring expertise. When satisfactory solutions are not available in the casebases, this module effectively integrates foreign knowledge to expedite the problem-solving process.
In our study, we focus on two primary tasks: CPU-reallocation and target-tracking within a multi-agent system. The objective of the agents is to effectively coordinate and collaborate to ensure accurate real-time tracking of multiple fast-moving targets. To achieve this, agents engage in negotiation strategies, seeking to persuade at least two other agents to either assist in tracking or to allocate their resources. This negotiation must occur swiftly to meet strict time constraints. Our proposed methodology enhances the efficiency of retrieving these negotiation strategies, ultimately improving the overall effectiveness of the target-tracking task.
The CPU-reallocation task is a real-time process that differs from target-tracking in its collaborative nature, requiring only two agents for execution. While it must be performed quickly to meet time constraints, it uniquely incorporates the CPU as a resource involved in the negotiation between agents. This necessitates a careful balancing act to optimize CPU resource allocation within the agent itself.
3.3.2 Domains with Incomplete Information for Multi-Agent Negotiations
Our methodology can also be applied in domains with incomplete information for multi-agent negotiations.
In multi-agent systems, effective problem-solving often hinges on collaborative agents satisfying resource-use constraints. Negotiation is a crucial method for agents to reach consensus on the allocation of limited resources. This negotiation process adapts to evolving evaluation criteria and constraints. By utilizing our CBR Module and Individual Learning Module, agents can learn, choose, and implement negotiation strategies to secure the resources necessary for reasoning and task execution.
In a multi-agent system, each agent possesses only a limited perspective of the environment, compounded by the presence of noise, which results in incomplete information. To address this challenge, our Cooperative Learning Module enhances knowledge acquisition by enabling agents to learn from one another, thereby creating a more comprehensive understanding of the world.
In environments with incomplete information, agents utilize both the Case-Based Reasoning (CBR) Module and the Cooperative Learning Module to adapt effectively. Each case in the casebase offers a detailed representation of past negotiations, encompassing the agent's perspective on its environment, partners, and itself. This knowledge enables the agent to develop and refine negotiation parameters that shape its behavior and strategies. When the CBR Module does not yield satisfactory negotiation strategies, the agent turns to the Cooperative Learning Module to gain insights from other agents, enhancing its negotiation capabilities.
Implementation
Implementation of Methodology
This section outlines the implementation of our methodology, divided into several key areas. The first focuses on the foundational ANTS project that serves as the basis for our work. The second covers the integration of usage history and chronological casebases. The remaining areas describe the processes involved in case sharing.
4.1.1 Autonomous Negotiating Teams (ANTS) Project
Life is a series of negotiations, as individuals rarely achieve their desires immediately. This concept is mirrored in DARPA's initiative Autonomous Negotiating Teams (ANTS), where every military component, be it a brigade, soldier, or aircraft, functions within a computer network. These entities negotiate resources, evaluate capabilities, strategize, and execute actions autonomously, while also competing for available tasks and operational support.
Investigators Costas Tsatsoulis and Douglas Niehaus from the University of Kansas are utilizing a case-based, reflective negotiation (CRN) problem-solving approach in their ANTS project, which combines case-based reasoning (CBR) with utility theory to enhance virtual reality scenarios.
In a critical scenario where U.S. planes are approaching an enemy missile poised to launch a warhead imminently, each aircraft possesses the capability to neutralize the threat. However, factors such as proximity to the target and available fuel vary among them. Consequently, the planes' onboard computers initiate negotiations to determine which aircraft can deliver the most effective response in the shortest time possible.
Investigators have developed problem-solving computations that enable ANTS to assess their own success probabilities, utilities, and priorities, as well as those of other negotiating ANTS. These computations allow ANTS to evaluate negotiation strategies aimed at achieving their common objective of timely missile destruction. According to Costas Tsatsoulis, "We view the retrieval of a negotiation strategy as a decision problem," integrating Case-Based Reasoning (CBR) with utility theory to effectively address challenges involving uncertain, incomplete, or missing information.
This problem-solving research may one day be applied to non-military situations.
In a manufacturing setting, equipment, through computers, may negotiate for its own parts from suppliers. Computers supporting emergency response teams may negotiate for supplies, vehicles, and medical personnel.
The ANTS software enables agents within a multi-agent system to utilize case-based reasoning for effective negotiation. Prior to initiating negotiations, each agent retrieves the most relevant case from its casebase and adapts the previous solution to address the current issue. Similarly, the responding agent also accesses its casebase to formulate its negotiation strategy. ANTS is developed in the C++ programming language and employs an object-oriented design.
Our research centers on the Cooperative Learning Module, as outlined in section 3.1 and depicted in Figure 6. To enhance this module, we have integrated additional attributes and methods into the original Initiating Casebase, Responding Casebase, and CBR Module. Furthermore, we have implemented new methods to improve the processes of case-request and case-response message passing. Detailed implementations are provided below.
4.1.2 Usage History and Chronological Casebases
The ANTS software incorporates various classes for implementing case-based reasoning and case-based learning, all located within the CBR subdirectory. This class hierarchy is visually represented in Figure 7.
Figure 7 The class hierarchy that had been implemented in CBR for the original ANTS software (including InitiatingInput, InitiatingOutput, RespondingInput, and RespondingOutput)
The class Agent implements the basic agent. It has an object of the class CaseBaseManager, which serves as the department for executing case-based reasoning and manages two distinct casebases: one implemented by the class InitiatingCaseBase and the other by the class RespondingCaseBase. The InitiatingCaseBase class comprises a vector of InitiatingCase objects, while the RespondingCaseBase class contains a vector of RespondingCase objects. Each InitiatingCase includes an InitiatingInput object, an InitiatingOutput object, and an enumerated variable _outcome; similarly, each RespondingCase contains a RespondingInput object, a RespondingOutput object, and its own enumerated variable _outcome. InitiatingInput, InitiatingOutput, RespondingInput, and RespondingOutput invoke methods or use const attributes of the classes UtilCase, UtilTos, and UtilToken.
Figure 8 presents the updated class hierarchy of CBR for our project, featuring enhancements and new additions to the original structure shown in Figure 7. The classes marked with a star indicate those that have been improved or newly introduced. Details regarding these new classes and improvements will be elaborated upon in the subsequent sections.
Figure 8 The new class hierarchy of CBR in our project (starred classes, e.g., ParentInitiatingCase*, InitiatingCase*, RespondingCase*, and ParentRespondingCase*, are enhanced or newly added)
In a negotiation, there are two parties: the initiating side and the responding side, each managing its own casebase, initiating and responding, respectively. Our implementation treats both casebases symmetrically, ensuring that any changes made to the initiating casebase are mirrored in the responding casebase. Consequently, we focus on describing the enhancements and new classes for the initiating casebase only, as the responding casebase follows the same principles and processes.
Five new attributes, _timesofUsed, _timesofSuccessUsed, _timesofIncurNewCase, _timesofRequest, and _timeStamp (see Table 1 of chapter 3), are added to the class InitiatingCase. To get, increase, or reset these attributes, several new methods are added to the class InitiatingCase as well.
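A sketch of how these additions might look in the class declaration; the accessor and mutator names below are illustrative guesses, not the exact method names used in the ANTS code.

    #include <ctime>

    // Illustrative excerpt of class InitiatingCase with the five new attributes.
    class InitiatingCase {
    public:
        int  timesOfUsed() const     { return _timesofUsed; }
        void incrementUsed()         { ++_timesofUsed; _timeStamp = std::time(nullptr); }
        void incrementSuccessUsed()  { ++_timesofSuccessUsed; }
        void incrementIncurNewCase() { ++_timesofIncurNewCase; }
        void incrementRequest()      { ++_timesofRequest; }
        void resetUsageHistory() {
            _timesofUsed = _timesofSuccessUsed = _timesofIncurNewCase = _timesofRequest = 0;
        }

    private:
        // The five attributes added for chronological casebases and usage history.
        int _timesofUsed = 0;
        int _timesofSuccessUsed = 0;
        int _timesofIncurNewCase = 0;
        int _timesofRequest = 0;
        std::time_t _timeStamp = 0;
        // ... existing members: InitiatingInput, InitiatingOutput, _outcome ...
    };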
In the CBR directory, a new class, ParentInitiatingCase, is added. This class has an object of class InitiatingInput and an object of class InitiatingCase. In the class InitiatingCaseBase, the method InitiatingCase * retrieveBestCase(InitiatingCase * newCase) retrieves the most suitable case for the current problem. Each time a case is retrieved from the casebase, its attribute _timesofUsed is incremented by one. Additionally, both the current problem and the best-matched case are stored in the ParentInitiatingCaseBase, which is implemented as a vector of objects of the class ParentInitiatingCase. The best case is considered the parent case of the current problem.
After the negotiation ends, a new case comes up. A method in the class CaseBaseManager stores the new case: when the casebase size is below the MAX_CASE_BASE_SIZE of 30, storeCase(InitiatingCase * newCase) is used, and the agent engages in incremental learning; when the casebase has reached its maximum capacity, storeCaseReplace(InitiatingCase * newCase) is used to replace an existing case with the new one, and refinement learning is activated (see section 3.1.2). Upon storing a new case related to the current problem, the agent evaluates the _outcome attribute of the new case to determine whether the negotiation was successful. If successful, the agent retrieves the parent case from the ParentInitiatingCaseBase and increments its _timesofSuccessUsed attribute by one. Additionally, if the new case is added to the casebase, the parent case's _timesofIncurNewCase attribute is also increased by one; otherwise, _timesofIncurNewCase remains unchanged. Finally, the tuple of the current problem and its parent case is deleted from the ParentInitiatingCaseBase.
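The following self-contained C++ sketch summarizes that flow under simplifying assumptions: the stand-in InitiatingCase struct, the always-keep storage policy, and the choice of which case to replace are placeholders, while MAX_CASE_BASE_SIZE = 30 and the counter updates follow the text above.

    #include <cstddef>
    #include <vector>

    // Minimal stand-in for the ANTS InitiatingCase (illustrative only).
    struct InitiatingCase {
        bool successfulOutcome = false;
        int  timesOfSuccessUsed = 0;
        int  timesOfIncurNewCase = 0;
    };

    const std::size_t MAX_CASE_BASE_SIZE = 30;

    // Returns true if the new case was kept (added or used as a replacement).
    bool storeOrReplace(std::vector<InitiatingCase>& casebase, const InitiatingCase& c) {
        if (casebase.size() < MAX_CASE_BASE_SIZE) {   // incremental learning
            casebase.push_back(c);
            return true;
        }
        // Refinement learning: replace a case chosen by heuristics H1-H3
        // (here simply the first one, as a placeholder).
        casebase.front() = c;
        return true;
    }

    // Bookkeeping on the parent case once the negotiation has ended.
    void updateParent(InitiatingCase& parent, const InitiatingCase& child, bool childStored) {
        if (child.successfulOutcome) ++parent.timesOfSuccessUsed;
        if (childStored)             ++parent.timesOfIncurNewCase;
        // The (problem, parent) tuple would then be removed from the
        // ParentInitiatingCaseBase (not modeled in this sketch).
    }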
Measurement Collection
RADSIM is a Java-based simulator designed to emulate radar sensors, targets, and environmental conditions, including communication elements like noisy and delayed messages. Developed by Rome Laboratory for the ANTS Challenge Problem, RADSIM features agents with limited CPU resources, where negotiation processes among these agents can lead to varying outcomes or potential failures under diverse conditions.
We evaluate the results of each case based on the utility values assigned to their negotiation outcomes, as outlined in Table 2. To simplify the calculations, we utilize a utility scale ranging from 0 to 10. A code sketch encoding these values follows the list below.
Table 2 Utility of each outcome for a case

    Outcome               Utility
    N_SUCCESS                10
    N_CHANNEL_JAMMED          6
    N_ABORTED                 5
    N_OUT_OF_TIME             4
    N_OUT_OF_RESOURCES        3
    N_REJECTED                2
    N_UNKNOWN                 0
1. If the negotiation outcome is N_SUCCESS, the negotiation between agents succeeded and the negotiation strategy is of high utility, so we assign the highest value, 10, to the outcome N_SUCCESS.
2. If the negotiation outcome is N_CHANNEL_JAMMED, the negotiation failed because the communication channel was jammed. This does not mean the negotiation strategy is of low utility, so we assign the value 6 to N_CHANNEL_JAMMED, lower than N_SUCCESS.
3. If the negotiation outcome is N_ABORTED, the negotiation failed because one side of the negotiation aborted it, either because it had more important things to do or because the requested task/resource was no longer available. However, it is still possible that the negotiation strategy was persuasive, so we assign the value 5 to N_ABORTED.
4. If the negotiation cannot be concluded within the allotted time, we assign the value 4 to the outcome N_OUT_OF_TIME. The negotiation strategy is somewhat useful, but agreement could not be reached in time.
5. If the negotiation fails because the agent does not have enough resources to continue it, we assign the value 3 to the outcome N_OUT_OF_RESOURCES. It has a lower utility value than N_OUT_OF_TIME because the agent lacked the physical resources to continue the negotiation, even given enough time.
6. If the negotiation fails because the other side rejects it, we assign the value 2 to the outcome N_REJECTED. The negotiation strategy has low utility and the other side refused to continue; however, since both sides still negotiated for a while, the strategy retains some utility, so we assign 2 rather than 0.
7. N_UNKNOWN is the default outcome and should never appear as an actual negotiation outcome. In this case, we assign the value 0 to a negotiation strategy with the N_UNKNOWN outcome.
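A small C++ helper that encodes Table 2; the outcome names follow the constants above, while the enum wrapper and the function name are illustrative.

    // Utility values of negotiation outcomes (Table 2).
    enum class Outcome {
        N_SUCCESS, N_CHANNEL_JAMMED, N_ABORTED, N_OUT_OF_TIME,
        N_OUT_OF_RESOURCES, N_REJECTED, N_UNKNOWN
    };

    int outcomeUtility(Outcome o) {
        switch (o) {
            case Outcome::N_SUCCESS:          return 10;
            case Outcome::N_CHANNEL_JAMMED:   return 6;
            case Outcome::N_ABORTED:          return 5;
            case Outcome::N_OUT_OF_TIME:      return 4;
            case Outcome::N_OUT_OF_RESOURCES: return 3;
            case Outcome::N_REJECTED:         return 2;
            case Outcome::N_UNKNOWN:          return 0;
        }
        return 0;  // unreachable, keeps compilers happy
    }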
We used two main parameters to evaluate the casebases: utility and diversity. The average utility of a casebase is

    averageUtility = (1/n) * sum_{i=1..n} utility(case_i),

where n is the size of the casebase and utility(case_i) is the utility assigned to the outcome of case i (Table 2). The diversity measure of a casebase is compiled from the pairwise differences of the cases in the casebase, i.e.,

    diversity = sum_{i=1..n} sum_{j=i+1..n} computeDifference(case_i, case_j).

Three kinds of slopes, sizeSlope, diffSlope, and utilSlope, are also used to evaluate the casebases. Each slope is the change of the corresponding measure between the first and last points of a run, e.g.,

    sizeSlope = (size_last - size_first) / (t_last - t_first),

with diffSlope and utilSlope computed analogously for diversity and average utility. The first values represent the starting point of learning in the casebases, while the last values indicate the endpoint of learning. By analyzing these slopes, we can assess the rates of learning. A high-quality casebase is characterized by both high utility and significant diversity, i.e., its cases lead to high-utility outcomes and are widely varied in nature.
The statistical information of a casebase is computed in the class CaseBaseManager, which includes a method compileStat() that calculates the size, average utility, and average diversity of the casebase. Whenever a negotiation process concludes, the computed statistics are logged for later analysis.
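A sketch of what such a routine might compute, using the average-utility and pairwise-difference definitions above; the struct, the function-pointer parameter, and any normalization of the diversity sum are assumptions rather than the actual CaseBaseManager interface.

    #include <cstddef>
    #include <vector>

    struct CaseStat { std::size_t size; double avgUtility; double diversity; };

    // utilities[i] is the Table 2 utility of case i's outcome; diff(i, j) is the
    // pairwise difference used by the diversity measure (both assumed given).
    CaseStat compileStat(const std::vector<double>& utilities,
                         double (*diff)(std::size_t, std::size_t)) {
        CaseStat s{utilities.size(), 0.0, 0.0};
        if (s.size == 0) return s;

        for (double u : utilities) s.avgUtility += u;
        s.avgUtility /= static_cast<double>(s.size);      // (1/n) * sum of utilities

        for (std::size_t i = 0; i < s.size; ++i)           // sum over all pairs i < j;
            for (std::size_t j = i + 1; j < s.size; ++j)   // a normalization may be applied
                s.diversity += diff(i, j);
        return s;
    }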
Interface with Other Modules in the System
The agent operates as a functional organization comprised of various departments, each possessing distinct capabilities. These departments are designed to interact with one another, facilitating the effective operation of the organization. Importantly, their structure was established prior to the initiation of our project.
The CaseBaseManager class oversees the management of the agent's two casebases and interacts with various departments within the agent system. These departments include the communication, profile, negotiation, and think departments, all of which are depicted in Figure 10, showcasing their relationships with the CaseBaseManager.
Figure 10 Interface with other modules in the system
The negotiation department is the most important one that interacts with the CaseBaseManager. The negotiation department is in charge of negotiations between two agents. The CaseBaseManager is responsible for extracting negotiation strategies from the casebases and delivering them to the negotiation department. Once the negotiation concludes, the CaseBaseManager stores any newly generated negotiation strategy back into the casebases, provided it demonstrates sufficient diversity.
The communication department serves as the central hub for agent interactions, facilitating message exchanges between agents. When an agent wishes to communicate with another, the message is first directed to the communication department, which functions like a switch center, relaying the information to the intended recipient. This streamlined process ensures efficient communication among agents.
The CaseBaseManager and the communication department share a similar process in handling messages. When one agent's CaseBaseManager sends a CASE message, whether a CASE_REQUEST or a CASE_RESPONSE, the communication department is responsible for relaying this message to the CaseBaseManager of the other agent.
Each agent's profile department maintains detailed information about other agents, enabling more effective interactions and collaborations. As discussed in section 4.1.3, the CaseBaseManager utilizes the _helpRate of other agents within the profile department to assess their helpfulness during past negotiations, allowing agents to identify the most supportive collaborators.
The think department serves as the intelligent hub of an agent, conducting feasibility studies and assessing the utility values of potential neighbors to facilitate negotiations. When the CaseBaseManager encounters a problematic case, the think department identifies the most suitable neighbor agent to provide assistance, ensuring optimal performance outcomes. This strategic selection process enables the CaseBaseManager to effectively leverage support from the right agent.
Discussion of Results
Overall Experiment Strategy
We conduct two sets of experiments, Comprehensive Experiment A (CEA) and Comprehensive Experiment B (CEB). We carry out CEA to study:
1. The effects of individual learning in subsequent cooperative learning,
2. The roles of cooperative learning in agents of different initial knowledge, and
3. The feasibility of our multi-strategy learning methodology.
We perform CEB to investigate the effects of the environment (different problem domains) on the agents’ learning.
Here is our experimental setup:
1. Our application is multi-sensor target tracking and distributed CPU re-allocation. Each agent lives in a noisy environment (simulated by a Java-based program called RADSIM, refer to section 4.2) and controls a sensor with three radar sectors. For our experiment, we use four agents: A1, A2, A3, and A4. Each agent can activate the sensor to search-and-detect the environment. When an agent detects a moving target, it tries to form a tracking coalition. A tracking coalition requires at least three members or agents, because at least three radars must work together to fully triangulate and keep track of the moving target. Thus, the initiating agent will try to recruit at least two neighbors (agents that are physically close and can communicate directly) to help out. The distributed CPU re-allocation problem is to re-allocate CPU resources among the agents. When an agent detects a CPU shortage (usage greater than allocated), it will ask its neighbors for help.
2. The RADSIM simulator provides the communication channels among the agents. All messages go through RADSIM. Each agent has a communication department (refer to section 4.3). If agent A wants to communicate with agent B, its communication department sends messages to RADSIM; then the communication department of agent B retrieves these messages from RADSIM.
3. The tracking module is also a part of RADSIM. It provides actual target locations and speed. Each target has a tracking module. Each agent has an interface to communicate with the tracking module in order to receive measurements of the moving target and perform triangulations to obtain the measured location of the target. In our experiments, there is only one target, moving in a pre-determined, square-like route within the four sensors.
Comprehensive Experiment Set A (CEA)
The goal of CEA is to understand the individual and cooperative learning of our agents. We want to see the impact of each learning behavior by analyzing the results.
We conducted four experimental sets in CEA to explore the impact of varying initial casebase sizes on learning outcomes, as detailed in Table 3. According to section 3.1.2, the MAX_CASE_BASE_SIZE is set at 30, indicating that case replacement begins once this threshold is reached. However, this limit is flexible, allowing for the addition of valuable and diverse cases without necessitating the removal of existing useful ones. Additionally, each agent operates with two distinct casebases: one for its role as the initiating agent and another for its role as the responding agent.
Table 3 Experiment sets. For example, in ES1, every agent has 16 cases in its casebase, and so on.
There are four typical combinations of initial casebase sizes for the four agents:
1. All four agents have exactly the same initial casebase size. We want to observe how agents learn when given exactly the same casebase size. Since the MAX_CASE_BASE_SIZE is 30, we set the initial casebase size to 16 so that agents can learn some new knowledge before the casebase size reaches 30.
2. Only one agent has a very small initial casebase, while the other three agents have bigger casebases. We want to observe how the agent with the very small casebase learns differently from the other three agents, and whether knowledge transfers from the agents with bigger casebases to the agent with the small casebase. We choose size 2 for the agent with the very small casebase and 16 for the other three agents.
3. Only one agent has a very big initial casebase (but still smaller than 30) and the other three agents have smaller casebases. We want to observe how the agent with the very big casebase learns after it reaches size 30, and how it learns differently from the other three agents with smaller initial casebases. We choose 28 for the agent with the very big casebase and 16 for the other three agents.
4. The initial casebase size is 2 for agent 1, 10 for agent 2, 20 for agent 3, and 28 for agent 4. We want to observe the learning rate and learning behavior of agents with different initial casebase sizes. Agents should learn differently based on the different initial knowledge in their casebases.
Figure 11 The experiment setup of CEA (for each experiment set, two sub-experiments: Exp1 and Exp2)
Further, for each experiment set, we had two sub-experiments (see Figure 11): (1) combine-first-combine-later (Exp1), and (2) individual-first-combine-later (Exp2).
“Combine” means that agents perform both types of learning (cooperative and individual); “individual” means that agents perform only individual learning.
We assume that each agent starts with the initial casebase C_i. The design of the first sub-experiment, Exp1 (combine-first-combine-later), consists of the following steps:
1. Turn on the cooperative learning mechanism and the individual learning mechanism,
2. Run the system (all components) for around 1 million milliseconds (after running the experiments several times, we found that 1 million milliseconds is enough to collect the results needed for further analysis),
3. Collect the casebases after the run, C_f,
4. Compute statistical information on both C_i and C_f,
5. Create a new casebase by manually removing all unused, pre-existing cases from C_f,
6. Turn on the cooperative learning mechanism and the individual learning mechanism,
7. Run the system (all components) for around 1 million milliseconds with the new casebase,
8. Collect the casebases after the run, C_f.
The design of the second sub-experiment, Exp2 (individual-first-combine-later), consists of the following steps (a control-flow sketch covering both sub-experiments follows this list):
1. Turn off the cooperative learning mechanism, but turn on the individual learning mechanism,
2. Run the system (all components) for around 1 million milliseconds,
3. Collect the casebases after the run, C_f,
4. Compute statistical information on both C_i and C_f,
5. Create a new casebase by manually removing all unused, pre-existing cases from C_f,
6. Turn on the cooperative learning mechanism and the individual learning mechanism,
7. Run the system (all components) for around 1 million milliseconds with the new casebase,
8. Collect the casebases after the run, C_f.
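Viewed together, Exp1 and Exp2 are the same two-stage procedure with cooperative learning switched off in the first stage of Exp2. The sketch below captures only this control flow; the class ExperimentDriver, the runStage placeholder and its printouts are hypothetical stand-ins for starting RADSIM, collecting C_f and computing statistics, not the project's actual API.

/** Control-flow sketch of the CEA sub-experiments (Exp1: combine-first; Exp2: individual-first). */
public class ExperimentDriver {
    static final long RUN_MILLIS = 1_000_000L;  // roughly 1 million ms per stage, as in the steps above

    static void runStage(boolean cooperative, boolean individual) {
        // Placeholder: configure the learning mechanisms, run all components,
        // collect C_f and compute statistics on C_i and C_f.
        System.out.printf("Stage: cooperative=%b, individual=%b, duration=%d ms%n",
                cooperative, individual, RUN_MILLIS);
    }

    static void runSubExperiment(boolean cooperativeInFirstStage) {
        runStage(cooperativeInFirstStage, true);  // stage 1 on the initial casebase C_i
        // Between stages: manually remove all unused, pre-existing cases from C_f to form the new casebase.
        runStage(true, true);                     // stage 2 always uses both learning mechanisms
    }

    public static void main(String[] args) {
        runSubExperiment(true);   // Exp1: combine-first-combine-later
        runSubExperiment(false);  // Exp2: individual-first-combine-later
    }
}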
This section presents the first experiment set (ES1) of CEA and is divided into three parts. The first part gives the initial casebase sizes of the agents. The second part analyzes the experimental results. The third part draws conclusions about ES1.
All agents start with the same initial casebase size: C_i contains 16 cases.
This section discusses the analysis of the experiment results for ES1. The analysis focuses on:
1. The comparison of C_i and C_f between the first stages of the two sub-experiments of ES1: experiment 1 (Exp1) and experiment 2 (Exp2).
2. The comparison of utility and diversity gains between the first stages of Exp1 and Exp2.
3. The impact of different initial casebases on the learning behavior of the agents in the second stages of Exp1 and Exp2.
This section analyzes the differences between C_i and C_f in Exp1 and Exp2, aiming to determine whether cooperative learning introduces more diversity into the casebase than individual learning does. A casebase consists of cases, each comprising three components: input, output, and outcome (as detailed in section 4.1.2). The input describes the current problem faced by the agent, the output describes the negotiation strategy employed, and the outcome records the result of the negotiation. Here we focus solely on the input, as it reflects the range of problems agents encounter and their ability to adapt to evolving challenges.
The input is defined by a collection of problem descriptors, which differ between the initiating casebase and the responding casebase. The initiating casebase includes the number of tasks the agent is currently handling (#task), the number of partners the agent is open to communicating with (#partner), the target's speed (speed), and the time allotted for the negotiation (timeAllowed). The responding casebase includes the number of tasks the agent is performing (#task), the agent's current CPU usage (cpuUsage), its power (power), the maximum CPU percentage it is prepared to relinquish (maxCPUGiveUp), and the time allotted for the negotiation (timeAllowed).
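To make the case structure concrete, here is a minimal sketch of an initiating-casebase case with the descriptors listed above, together with helpers for the average and standard deviation of one descriptor across a casebase (the statistics reported in Table 4). The class and field names are illustrative only, and the population standard deviation is an assumption; the project's real case representation is richer.

import java.util.List;

/** Sketch of a case in the initiating casebase: input (problem descriptors), output, outcome. */
public class InitiatingCase {
    // Input: problem descriptors named in the text above.
    int numTasks;        // #task: tasks the agent is currently handling
    int numPartners;     // #partner: partners the agent is open to communicating with
    double speed;        // target speed
    double timeAllowed;  // time allotted for the negotiation
    // Output and outcome are simplified to strings here.
    String negotiationStrategy;  // output
    String negotiationOutcome;   // outcome

    /** Average of one descriptor across a casebase (the "ave" columns). */
    static double average(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    /** Standard deviation (population form assumed) of one descriptor (the "std dev" columns). */
    static double stdDev(List<Double> values) {
        double mean = average(values);
        double sumSq = values.stream().mapToDouble(v -> (v - mean) * (v - mean)).sum();
        return values.isEmpty() ? 0.0 : Math.sqrt(sumSq / values.size());
    }

    public static void main(String[] args) {
        List<Double> speeds = List.of(3.0, 5.0, 4.0);
        System.out.printf("ave=%.3f, std dev=%.3f%n", average(speeds), stdDev(speeds));
    }
}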
Table 4 presents C_i and C_f from the two sub-experiments for the initiating casebases, looking only at the problem descriptors. For clarity of comparison, we report the average (ave) and standard deviation (std dev) of each problem descriptor across all cases in the casebase.
Table 4 Initial and final casebases after non-selective learning in the two sub-experiments (first stage). Exp1 uses both the cooperative and individual learning mechanisms, while Exp2 uses only the individual learning mechanism.
(Table 4 columns: for each agent A1–A4, the Exp1 and Exp2 results.)
Based on Table 4, we observe the following:
1. The average of the #task parameter shifts for all agents' casebases, while its standard deviation remains almost the same. Since the coverage (standard deviation) of the parameter is unchanged, the agents have adapted dynamically to the environment and some pre-existing cases have been replaced.
2. The average speed value shifts down noticeably, while the coverage (standard deviation) remains almost the same. Again, this indicates that the agents have adapted dynamically to the environment and some pre-existing cases have been replaced.
3. The average of the #partner parameter remains almost unchanged, but its coverage (standard deviation) becomes smaller. That is, we observe a focusing of the parameter that increases its resolution in the casebase.
4. The average of the timeAllowed parameter remains the same, but its coverage (standard deviation) becomes larger after learning. This means that the problem space expands and the diversity of the casebase increases.
5. Comparing the results of the two sub-experiments, we do not observe significant differences between them; that is, combined learning and individual-only learning do not differ greatly. In general, however, combined learning tends to result in higher coverage than individual-only learning, which implies that cooperative learning is able to introduce more diversity into the casebase, as expected.
Table 5 shows C_i and C_f of the two sub-experiments for the responding casebases, again looking only at the problem descriptors.
Comprehensive Experiment Set B (CEB)
In this section we explore Comprehensive Experiment Set B (CEB), focusing on how learning outcomes vary across different scenarios. Within our domain, two types of coalitions arise: multi-sensor target tracking and CPU re-allocation. A tracking coalition requires a minimum of three agents to monitor a moving target, while a CPU coalition needs only one neighboring agent to assist. The tracking task is not only time-consuming but also highly time-sensitive, requiring prompt coalition formation so that the target remains within the sensors' coverage area. Consequently, managing negotiations in tracking scenarios is more complex and prone to failures, often caused by external environmental factors rather than by the negotiation strategies themselves.
In the CEB, we conducted four distinct experiment sets: Experiment Set 1 (ES1), Experiment Set 2 (ES2), Experiment Set 3 (ES3), and Experiment Set 4 (ES4), each focusing on different combinations of coalitions formed among agents.
ES1: A CPU coalition happens much more often than a tracking coalition.
ES2: Both CPU coalitions and tracking coalitions have similar numbers of occurrences.
ES3: A tracking coalition happens much more often than a CPU coalition.
ES4: Agents A1 and A2 have many more CPU coalitions than tracking coalitions; agents A3 and A4 have many more tracking coalitions than CPU coalitions.
We selected these four experiment sets to analyze how learning outcomes vary across environments. In ES1 the agents operate in a milder setting that favors the formation of CPU coalitions, while ES3 presents a harsher, tracking-dominated environment. ES2 offers a balanced environment, and in ES4 two agents experience a milder setting while the other two face a harsher one.
In CEB, each agent begins with an identical initial casebase of size 16, and we run only one experiment per set. Unlike CEA, which pursues three more complex objectives involving the effects of individual and cooperative learning, CEB is simpler: it observes agent learning behavior in varied environments. By keeping the same initial casebase and employing both cooperative and individual learning, the design yields clearer insight into how different environments influence agent learning. Each CEB experiment consists of the following steps:
1. Turn on the cooperative and individual learning mechanisms,
2. Run the system (all components) for around 1 million milliseconds,
3. Collect the casebases after the run, C_f.
Table 31 presents the number of negotiations (#Neg) attempted by agents within both CPU (cpu) and Tracking (trac) coalitions, categorized into two distinct sections: initiating casebase (INI) and responding casebase (RES).
Table 31 Actual numbers of negotiations in ES1, ES2, ES3 and ES4
This section discusses the analysis of the experiment results for ES1, in which a CPU coalition happens much more often than a tracking coalition. The analysis focuses on:
1. The learning behavior of agents in ES1 in terms of utility, diversity and learning rate.
2. The negotiation outcome analysis for agents.
We begin with general observations on Experiment Set 1 (ES1), where CPU coalitions happen much more often than tracking coalitions. Table 32 shows the average numbers collected for ES1.
Table 32 Utility and difference gains for ES1
                    INI CPU             RES CPU              INI TRACKING        RES TRACKING
# learnings         80.7500   5.2500    65.0000   12.2500    16.5000   7.0000    7.5000   1.7500
ave utility gain     0.0408   0.4139     0.1732    0.1459     0.1439   0.5707    1.2131   1.4341
ave diff gain       (values missing in source)
From Table 32, we can observe:
1. Both cooperative and individual learning happen more often for CPU coalitions than for tracking coalitions.
2. The ratio of cooperative learning occurrences to individual learning occurrences is higher in tracking-related cases than in CPU-related cases for the initiating casebase. This is probably due to the lower success rates of tracking-related negotiations: with a lower success rate, there are more problematic cases to trigger the cooperative learning mechanism.
3. In general, learning has a higher impact on the utility gain in tracking-related cases. Cooperative learning also has a higher average utility gain than individual learning in tracking-related cases, and both individual and cooperative learning introduced more utility into tracking-related casebases than into CPU-related ones. This implies that the role of learning is more important (in terms of utility, i.e., the quality of the solution) in tracking-related tasks than in CPU-related tasks, possibly because a tracking coalition is hard to form and, once formed, brings high utility into the casebases.
4. Both cooperative learning and individual learning bring more diversity into CPU-related casebases. This implies that the role of learning is more important (in terms of the learning rate and the problem coverage) in CPU-related tasks than in tracking-related ones, possibly because the agents are more adapted to CPU-related tasks than to tracking-related tasks.
We now report on the analysis of negotiation outcomes for CEB. There are various possible negotiation outcomes (see Table 2 in chapter 4): N_SUCCESS, N_CHANNEL_JAMMED, N_OUT_OF_TIME, N_OUT_OF_RESOURCES, N_ABORTED, and N_REJECTED. We examine the statistics of these outcomes from both the initiating and the responding perspectives. The data for ES1, in which CPU coalitions were significantly more prevalent than tracking coalitions, are shown in Figures 12 and 13 and reveal distinct patterns in the negotiation results.
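Because the outcome types are a fixed set, they map naturally onto an enumeration. The sketch below simply lists the six outcomes named above; the actual representation in the agent code may differ.

/** Possible negotiation outcomes named in the text (see Table 2 in chapter 4). */
public enum NegotiationOutcome {
    N_SUCCESS,
    N_CHANNEL_JAMMED,
    N_OUT_OF_TIME,
    N_OUT_OF_RESOURCES,
    N_ABORTED,
    N_REJECTED  // responding agent refuses to negotiate after a negative feasibility check (an "outright rejection")
}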
Figure 12 The percentages of different initiating negotiation outcomes in ES1
Figure 13 The percentages of different responding negotiation outcomes in ES1
1. The diversity of negotiation outcomes is generally greater on the initiating side of a negotiation than on the responding side. This is because the initiating agent bears more responsibility for managing a negotiation than the responding agent: an initiating agent monitors all of its concurrent negotiations for each coalition it is trying to form, and the failure of one negotiation within the coalition greatly affects what the agent does with the remaining negotiations of the same coalition. This implies that learning on the initiating side would be better in terms of diversity and coverage, while learning on the responding side would be more focused and detailed. We aim to study this in the future.
2. There were occurrences of “outright rejections.” An outright rejection occurs when the responding agent refuses to negotiate because its feasibility study of the requested task or resource (put forth by the initiating agent) is negative. From the point of view of the responding agent, it never negotiates; from the point of view of the initiating agent, it has initiated a negotiation. Looking at Figures 12 and 13, we see that the impact of such rejections appears mainly on the initiating side, indicating that initiating agents do face outright rejections. Comparing tracking tasks to CPU re-allocation tasks, tracking negotiations lead to a higher frequency of outright rejections because of their resource-intensive nature: an agent already engaged in tracking a target cannot practically accept a request to track an additional target with different sensors at the same time, so rejections occur more often in tracking-related negotiations. This highlights a crucial point: these negotiation failures stem from the dynamic activities of the agents rather than from their strategies, which diminishes the effectiveness of learning in this context. Future research will examine the feasibility of predicting the activities of other agents.
3. Different coalitions give rise to different distributions of negotiation outcomes. For both initiating and responding negotiations, the percentage of successful outcomes for a CPU request is higher than that for a tracking request, for the reason described at the beginning of section 5.3: CPU-related coalitions are easier to form. For the same reason, the N_OUT_OF_RESOURCES outcome appears on the initiating side for tracking requests, but initiating negotiations for CPU requests show no N_OUT_OF_RESOURCES outcomes.
This section discusses the analysis of the experiment results for ES2, in which CPU coalitions and tracking coalitions have similar numbers of occurrences. The analysis focuses on:
1. The learning behavior of agents in ES2 in terms of utility, diversity and learning rate.
2. The negotiation outcome analysis for agents.
We begin with general observations on Experiment Set 2 (ES2), where the numbers of CPU-related and tracking-related negotiations are similar. Table 33 shows the average numbers tallied for ES2.
1. The ratio of cooperative learning occurrences to individual learning occurrences is the same as what we observe in ES1.
2. In terms of utility gains, we generally observe the same pattern as in ES1: learning has a higher impact on the utility gain in tracking-related cases.
3. Cooperative learning introduces more cases into CPU-related casebases than into tracking-related casebases, when normalized by the total number of cases learned, and it also brings more diversity into CPU-related casebases. This implies that the role of cooperative learning is more important (in terms of the learning rate and the problem coverage) in CPU-related tasks than in tracking-related ones.
Table 33 Utility and difference gains for ES2
                    INI CPU             RES CPU              INI TRACKING        RES TRACKING
ave utility gain     0.1407   0.1778     0.3471    0.3665     0.0951   0.4449    0.5439   0.5580
ave diff gain        0.0167   0.0130     0.0529    0.0073     0.0535   0.0042    0.0536   0.0063
size slope           0.1420   0.1438     0.3216    0.2625     0.2829   0.2073    0.2043   0.1146
diff slope           0.0166   0.0143     0.0473    0.0376     0.0489   0.0315    0.0419   0.0167
utility slope        0.1254   0.0615     0.3323    0.2191     0.1396   0.0763    0.5035   0.2796
Figures 14 and 15 show the percentages of the outcomes in ES2 (where CPU and tracking coalitions occurred roughly as frequently).
Figure 14 The percentages of different initiating negotiation outcomes in ES2
Figure 15 The percentages of different responding negotiation outcomes in ES2
1. The diversity of negotiation outcomes is greater on the initiating side of the negotiations than on the responding side. This is the same as what happened in ES1, and for the same reason explained there.
Future Work and Conclusions
Future Work
In section 3.1.2, we talk about the expected learning curve of an agent’s casebase.
That learning curve is structured into four phases, but we have not yet analyzed our experiments to confirm that the agents follow it. We plan to conduct this analysis in the future, since understanding the learning curve is key to understanding an agent's casebase and its learning behavior in greater depth.
In section 4.1.3, the COOPERATIVE_TRIGGER parameter, which ranges from 0 to 1.0 and is currently set to 0.5, serves as a threshold on success rates and new-case occurrences. This value can be adjusted to the situation; for instance, it can be lowered to reduce the frequency of cooperative learning when cooperative learning is costly, particularly in difficult scenarios with noisy or delayed communication. Future work will investigate tuning this parameter to achieve more favorable outcomes as needed.
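As an illustration of how such a threshold might gate cooperative learning, the sketch below compares a recent success rate against COOPERATIVE_TRIGGER. The predicate itself is an assumption made for illustration; the exact condition, and how new-case occurrences enter it, is defined in section 4.1.3.

/** Illustrative gate for cooperative learning based on the COOPERATIVE_TRIGGER threshold. */
public class CooperativeTriggerSketch {
    // Threshold in [0, 1.0]; the project currently uses 0.5 (section 4.1.3).
    static double COOPERATIVE_TRIGGER = 0.5;

    /**
     * Assumed predicate: trigger cooperative learning when the recent negotiation success
     * rate falls below the threshold. Lowering COOPERATIVE_TRIGGER would then make
     * cooperative learning rarer, e.g. when communication is noisy or delayed.
     */
    static boolean shouldLearnCooperatively(double recentSuccessRate) {
        return recentSuccessRate < COOPERATIVE_TRIGGER;
    }

    public static void main(String[] args) {
        System.out.println(shouldLearnCooperatively(0.3));  // true with the default 0.5
        System.out.println(shouldLearnCooperatively(0.8));  // false
    }
}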
In section 5.3.1.1.2, we note that the diversity of negotiation outcomes is typically greater on the initiating side than on the responding side, since the initiator bears more responsibility for managing negotiations. This suggests that learning on the initiating side is more varied and extensive, while learning on the responding side is more focused and detailed. Future research will aim to clarify how different environments affect the agents' learning. We also note that negotiations often fail not because of the strategies used but because of the dynamic activities of the agents, which diminishes the effectiveness of learning in ES1 of CEB; we intend to explore the potential for predicting the activities of other agents in future studies to improve the agents' learning.
Conclusions
Chapter 3 outlines our methodology, which integrates a cautious utility-based adaptive mechanism to blend cooperative and individual learning It includes an interaction protocol for effective information solicitation and exchange, as well as the concept of a chronological casebase.
Chapter 4 provides an in-depth exploration of our methodology's implementation, focusing on key components such as usage history, chronological casebases, and the sharing of cases through cooperative triggers, neighbor selection, and message passing Additionally, we outline the processes for measurement collection and detail the interfaces between the Case-Based Reasoning (CBR) module and other modules within our agent system.
Chapter 5 describes the experiments and the analyses of the experiment results. It includes discussions of the overall experiment strategy, comprehensive experiment set A, and comprehensive experiment set B.
Our project aims to enhance agents' problem-solving capabilities by implementing a distributed multi-strategy learning methodology rooted in case-based reasoning (CBR) within a multi-agent environment While this cooperative learning approach has potential, it may incur significant communication and coordination costs that could hinder its efficiency and timeliness Additionally, since each agent learns based on its unique experiences and perspectives, solutions developed by one agent may not be transferable to others facing similar challenges Furthermore, incorporating external knowledge can introduce risks, potentially increasing processing costs without necessarily improving the quality of solutions.
To prevent cooperative learning from degrading team performance in the multi-agent system, we employ the cautious utility-based adaptive mechanism to combine cooperative learning and individual learning in our project.
This project aims to investigate whether a combination of cooperative and individual learning enhances performance in multi-agent systems compared to individual learning alone Our research objectives include examining the impact of cooperative learning on subsequent individual learning, exploring the roles of cooperative learning among agents with varying initial knowledge, assessing the feasibility of our multi-strategy learning methodology, and evaluating agents' adaptability across different problem domains and environments.
We conclude that cooperative learning brings more diversity and utility than individual learning, and that the environment does affect the learning behavior of agents.
Finally, the paper “Combining Individual and Cooperative Learning for Multi-agent Negotiations,” written based on our project, has been accepted by the Second International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS'03).
Table 1 Utility and difference gains for Agent A2, performing both learning mechanisms in ES2 of CEA
Initiating Casebase    Individual    Coop      Ave
#new cases             11            1         12.0000
avg util gain          0.0618        0.1842    0.0715
max util gain          0.5556        0.2906    0.5556
min util gain          -0.1005       0.0952    -0.1005

Responding Casebase    Individual    Coop      Ave
#new cases             13            1         14.0000
avg util gain          0.0792        0.205     0.1107
max util gain          0.9357        0.5555    0.9357
min util gain          -0.6667       0         -0.6667

#util gain             (remaining rows not recovered)