This repository presents a meta-learning approach for reinforcement learning (RL) environments, leveraging Graph Neural Networks (GNNs) to enable dynamic adaptability. The project emphasizes multi-agent setups where agents collaboratively learn optimal policies, focusing on flexibility, shared information, and environment-aware strategies.
The project aims to equip RL agents with the ability to adapt to varying task difficulties and dynamic interactions using meta-learning techniques and GNNs. Agents interact in a simulated city-like grid, taking on distinct objectives and utilizing shared information for optimized decision-making.
- Meta-Learning: Dynamically adjusts task difficulty so that success and failure rates converge toward a 50/50 balance.
- Graph Neural Networks: Model agent relationships, enabling enhanced real-time adaptability.
- Multi-Agent Policies: Provide specialized strategies for distinct agent roles.
- Dynamic Environment: Adjusts parameters like agent count and resources to ensure evolving difficulty.
- Shared Policemen Policy: Unifies strategies across agents for improved coordination.
The simulation involves a grid-based city environment where:
- MrX: Operates as the target agent, focusing on evasion.
- Policemen: Cooperatively work to track and capture MrX.
- Difficulty Parameter: Modifies agent capabilities and resources to fine-tune task complexity.
The outer loop adjusts task difficulty through:
- Collecting and analyzing performance data from multiple episodes.
- Balancing success and failure rates to maintain a stable learning environment.
- Embedding difficulty adjustments as a learnable parameter directly into the environment (see the sketch after this list).
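A minimal sketch of what one outer-loop update could look like, assuming a small RewardWeightNet (mentioned in the training-loop walkthrough below) that maps an environment configuration to reward weights. The loss construction and all helper names here are illustrative, not the repository's exact code:

```python
import torch
import torch.nn as nn

class RewardWeightNet(nn.Module):
    """Illustrative stand-in: maps an environment configuration vector to reward weights."""
    def __init__(self, config_dim: int, num_weights: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(config_dim, 64), nn.ReLU(), nn.Linear(64, num_weights)
        )

    def forward(self, config_vec: torch.Tensor) -> torch.Tensor:
        return self.net(config_vec)

def outer_loop_step(net, optimizer, config_vec, capture_rate, target=0.5):
    """One meta-update toward a 50/50 outcome.

    `capture_rate` is the policemen's empirical win rate over the evaluation
    episodes (a plain float). The predicted weights are regressed toward a
    hypothetical target that is raised when the policemen win too rarely and
    lowered when they win too often.
    """
    weights = net(config_vec)
    with torch.no_grad():
        # Shift the weights proportionally to how far we are from the target rate.
        target_weights = weights * (1.0 + (target - capture_rate))
    loss = nn.functional.mse_loss(weights, target_weights)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```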
GNNs enhance the system by:
- Spatial and Temporal Encoding: Capturing dynamic relationships among agents.
- State Sharing: Facilitating coordinated strategies across multiple agents.
- Policy Adaptability: Supporting flexible decision-making through graph-based message passing (illustrated below).
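For illustration, a per-node Q-value network built from PyTorch Geometric message-passing layers could look like the sketch below. This is an assumption about the general shape of such a model, not the repository's exact GNNModel:

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv

class QValueGNN(nn.Module):
    """Sketch of a graph network that outputs one Q-value per node.

    Node features could encode agent positions and resources; edges follow the
    city graph. Layer sizes are illustrative.
    """
    def __init__(self, in_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.q_head = nn.Linear(hidden_dim, 1)

    def forward(self, x, edge_index):
        h = torch.relu(self.conv1(x, edge_index))   # message passing, hop 1
        h = torch.relu(self.conv2(h, edge_index))   # message passing, hop 2
        return self.q_head(h).squeeze(-1)           # one Q-value per node
```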
- MrX Policy: Optimized to maximize evasion success.
- Policemen Policy: Shared across all policemen to promote efficient collaboration and coordination (see the parameter-sharing sketch below).
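Parameter sharing of this kind usually boils down to every policeman acting through the same policy object. A toy illustration (the linear layers stand in for the actual GNN policies, and the agent names are made up):

```python
import torch.nn as nn

# Illustration only: one policy network instance is reused by every policeman,
# so all pursuers share (and jointly update) the same weights, while MrX keeps
# a separate network of his own.
shared_police_policy = nn.Linear(16, 4)   # stand-in for the real GNN policy
mrx_policy = nn.Linear(16, 4)

policies = {"MrX": mrx_policy}
for name in ("Police_0", "Police_1", "Police_2"):
    policies[name] = shared_police_policy  # same object => shared parameters
```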
- main.py
  - Contains the main entry point (train and evaluate functions) and the central training loop.
  - Sets up the command-line arguments, loads configurations, initializes the logger, environment, and agents.
  - Implements the logic for either training or evaluating the RL agents based on arguments.
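A simplified sketch of what the entry point does. The flag names, the experiment/<name>/config.yml path, and the final print are assumptions; see main.py and the run scripts for the real interface:

```python
import argparse
import yaml

def parse_args():
    parser = argparse.ArgumentParser(description="Train or evaluate the GNN agents")
    parser.add_argument("--experiment", required=True, help="experiment folder name")
    parser.add_argument("--evaluate", action="store_true",
                        help="run evaluation instead of training")
    return parser.parse_args()

def load_config(experiment: str) -> dict:
    # Assumed layout: experiment/<name>/config.yml
    with open(f"experiment/{experiment}/config.yml") as f:
        return yaml.safe_load(f)

if __name__ == "__main__":
    args = parse_args()
    config = load_config(args.experiment)
    # In the real main.py this is where the logger, environment, and agents are
    # built, and either train(...) or evaluate(...) is called based on the flags.
    print("loaded experiment config keys:", sorted(config))
```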
- logger.py
  - Defines the Logger class for handling logging to console, file, TensorBoard, and Weights & Biases.
  - Manages logging metrics, weights, and model artifacts.
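A trimmed-down stand-in for such a logger, combining console output, TensorBoard, and optional Weights & Biases (the real Logger class offers more, e.g. weight and artifact logging):

```python
import os
from torch.utils.tensorboard import SummaryWriter
import wandb

class SimpleLogger:
    """Minimal sketch of a combined console / TensorBoard / W&B logger."""

    def __init__(self, log_dir: str, use_wandb: bool = False, run_name: str = "run"):
        self.writer = SummaryWriter(log_dir=log_dir)
        self.use_wandb = use_wandb
        if use_wandb:
            # W&B also reads WANDB_PROJECT / WANDB_ENTITY / WANDB_API_KEY from the environment.
            wandb.init(project=os.environ.get("WANDB_PROJECT"),
                       entity=os.environ.get("WANDB_ENTITY"),
                       name=run_name)

    def log_metric(self, name: str, value: float, step: int):
        print(f"[{step}] {name} = {value:.4f}")
        self.writer.add_scalar(name, value, step)
        if self.use_wandb:
            wandb.log({name: value}, step=step)

    def close(self):
        self.writer.close()
        if self.use_wandb:
            wandb.finish()
```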
- Enviroment/base_env.py
  - Declares an abstract base class (BaseEnvironment) for custom environments using PettingZoo’s ParallelEnv.
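In spirit, the base class pins down the PettingZoo parallel interface that concrete environments must fill in; a minimal sketch of that pattern (not the file's exact contents):

```python
from pettingzoo import ParallelEnv

class BaseEnvironment(ParallelEnv):
    """Sketch of an abstract parallel multi-agent environment."""

    def reset(self, seed=None, options=None):
        """Return initial observations (and infos) keyed by agent name."""
        raise NotImplementedError

    def step(self, actions):
        """Apply a joint action dict; return observations, rewards,
        terminations, truncations, and infos keyed by agent name."""
        raise NotImplementedError
```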
- Enviroment/graph_layout.py
  - Contains a custom ConnectedGraph class for creating random connected graphs with optional extra edges and weights.
  - Provides graph sampling logic (e.g., Prim’s algorithm to ensure connectivity).
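The core idea can be sketched as follows. Instead of Prim's algorithm, this toy version guarantees connectivity by growing a random attachment tree and then adds extra weighted edges on top; all names and defaults are illustrative:

```python
import random
import networkx as nx

def random_connected_graph(num_nodes, extra_edges=0, max_weight=10, seed=None):
    """Build a random connected, weighted graph.

    Connectivity is guaranteed by first growing a random tree (each new node
    attaches to a node already in the graph); extra edges are then added
    between random non-adjacent node pairs.
    """
    rng = random.Random(seed)
    nodes = list(range(num_nodes))
    rng.shuffle(nodes)

    graph = nx.Graph()
    graph.add_node(nodes[0])
    for i in range(1, num_nodes):
        parent = rng.choice(nodes[:i])  # attach to the existing tree
        graph.add_edge(parent, nodes[i], weight=rng.randint(1, max_weight))

    # Cap the extra edges at the number of remaining free node pairs.
    max_extra = num_nodes * (num_nodes - 1) // 2 - (num_nodes - 1)
    added = 0
    while added < min(extra_edges, max_extra):
        u, v = rng.sample(range(num_nodes), 2)
        if not graph.has_edge(u, v):
            graph.add_edge(u, v, weight=rng.randint(1, max_weight))
            added += 1
    return graph
```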
- Enviroment/yard.py
  - Implements CustomEnvironment, which inherits from BaseEnvironment.
  - Manages environment reset, step logic, agent positions, reward calculations, rendering, and graph observations.
- RLAgent/base_agent.py
  - Declares an abstract BaseAgent class defining the interface (select_action, update, etc.) for all RL agents.
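The shape of that interface, sketched (the real base class declares further hooks beyond these two):

```python
from abc import ABC, abstractmethod

class BaseAgent(ABC):
    """Sketch of the common RL-agent interface."""

    @abstractmethod
    def select_action(self, observation):
        """Choose an action given the current observation."""

    @abstractmethod
    def update(self, *args, **kwargs):
        """Update the policy from collected experience."""
```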
- RLAgent/gnn_agent.py
  - Defines GNNAgent, a DQN-like agent using a GNN (GNNModel) to compute Q-values for graph nodes.
  - Handles experience replay, epsilon-greedy action selection, and network updates.
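The replay and action-selection pieces of a DQN-style agent typically look like the sketch below. The buffer layout, function names, and action-masking scheme are assumptions layered on top of the Q-value GNN sketched earlier:

```python
import random
from collections import deque

import torch

class ReplayBuffer:
    """Minimal experience replay: store transitions, sample uniform mini-batches."""
    def __init__(self, capacity: int = 10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size: int):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

def select_action_epsilon_greedy(q_net, node_features, edge_index,
                                 valid_nodes, epsilon: float) -> int:
    """Pick a target node: random among valid nodes w.p. epsilon, else argmax Q.

    `q_net` is assumed to map (node_features, edge_index) to one Q-value per
    node; `valid_nodes` are the node indices the agent may actually move to.
    """
    if random.random() < epsilon:
        return random.choice(valid_nodes)
    with torch.no_grad():
        q_values = q_net(node_features, edge_index)      # shape: [num_nodes]
    masked = torch.full_like(q_values, float("-inf"))
    masked[valid_nodes] = q_values[valid_nodes]           # forbid invalid moves
    return int(masked.argmax().item())
```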
The training loop in main.py proceeds as follows:
- Initialize the logger, network(s), optimizers, and hyperparameters.
- For each epoch:
  - Randomly choose an environment configuration (number of agents, money, etc.).
  - Forward pass through the RewardWeightNet to compute reward weights for the environment.
  - For each episode:
    - Reset the environment and get the initial state.
    - While not done:
      - Build the GNN input (create_graph_data) and pick actions for MrX and the policemen using the GNN agents.
      - Call env.step(actions), compute rewards/terminations, and update the agents.
  - Evaluate performance over num_eval_episodes, compute the target difficulty, and backpropagate the loss in RewardWeightNet.
  - Log metrics and proceed to the next epoch.
Clone the repository and install the required dependencies using apptainer:
git clone https://github.com/elte-collective-intelligence/Mechanism-Design.git
cd Mechanism-Design
./build.sh
This should build the apptainer image.
If you want to use wandb to log your experiments, don't forget to set the following environment variables:
- WANDB_PROJECT
- WANDB_ENTITY
- WANDB_API_KEY
- In the experiment folder, create a folder with the name of your experiment.
- Add a config.yml file to it with the required configuration (see the existing experiments for examples).
Start the training with one experiment:
./run.sh name-of-experiment
Start the training with every experiment defined in the experiment folder:
./run_all_experiments.sh
If you want to evaluate the policies with visualized graphs, add the following option:
evaluate=True
We welcome contributions! To contribute:
- Fork the repository.
- Create a feature branch.
- Submit a pull request with detailed descriptions of changes.
This project is licensed under the MIT License. See the LICENSE
file for details.