Spaces:
Sleeping
Sleeping
| title: AI Agent UN - Multi-Agent Simulation Framework | |
| emoji: 🏛️ | |
| colorFrom: blue | |
| colorTo: indigo | |
| sdk: gradio | |
| sdk_version: 4.44.0 | |
| app_file: app.py | |
| pinned: false | |
| license: mit | |
|  | |
| # AI Agent United Nations: Multi-Agent Simulation Framework | |
| A structured system for simulating international diplomatic decision-making using 195 AI agents with constrained JSON outputs. | |
| ## System Overview | |
| This is an experimental framework demonstrating: | |
| - **Multi-agent coordination** across 195 independent AI agents | |
| - **Structured output constraints** with strict JSON schema validation | |
| - **Generic prompt templates** producing country-specific behaviors | |
| - **Task execution model** for running resolutions through all agents | |
| ### High-Level Concept | |
| ```mermaid | |
| graph TB | |
| subgraph "Input Layer" | |
| RES[UN Resolution Text] | |
| end | |
| subgraph "Agent Layer - 195 Independent Agents" | |
| A1[Agent: USA<br/>System Prompt] | |
| A2[Agent: China<br/>System Prompt] | |
| A3[Agent: Russia<br/>System Prompt] | |
| ADOT[...] | |
| A195[Agent: Tuvalu<br/>System Prompt] | |
| end | |
| subgraph "LLM Processing" | |
| LLM[Claude 3.5 Sonnet<br/>Structured JSON Output] | |
| end | |
| subgraph "Output Layer" | |
| V1[Vote: yes<br/>Statement: ...] | |
| V2[Vote: no<br/>Statement: ...] | |
| V3[Vote: yes<br/>Statement: ...] | |
| VDOT[...] | |
| V195[Vote: yes<br/>Statement: ...] | |
| end | |
| subgraph "Aggregation" | |
| AGG[Combined Results<br/>Vote Counts + All Statements] | |
| end | |
| RES --> A1 | |
| RES --> A2 | |
| RES --> A3 | |
| RES --> ADOT | |
| RES --> A195 | |
| A1 --> LLM | |
| A2 --> LLM | |
| A3 --> LLM | |
| ADOT --> LLM | |
| A195 --> LLM | |
| LLM --> V1 | |
| LLM --> V2 | |
| LLM --> V3 | |
| LLM --> VDOT | |
| LLM --> V195 | |
| V1 --> AGG | |
| V2 --> AGG | |
| V3 --> AGG | |
| VDOT --> AGG | |
| V195 --> AGG | |
| style RES fill:#6366f1 | |
| style LLM fill:#8b5cf6 | |
| style AGG fill:#22c55e | |
| style A1 fill:#f59e0b | |
| style A2 fill:#f59e0b | |
| style A3 fill:#f59e0b | |
| style A195 fill:#f59e0b | |
| ``` | |
| ## System Architecture | |
| ```mermaid | |
| graph TB | |
| subgraph Input | |
| M[Motion Text<br/>tasks/motions/] | |
| C[Country List<br/>195 UN Members] | |
| end | |
| subgraph "Agent Processing" | |
| SP[System Prompt<br/>Generic Template] | |
| UP[User Prompt<br/>+ Motion Text] | |
| LLM[Claude 3.5 Sonnet<br/>Temperature: 0.7] | |
| end | |
| subgraph "Output Validation" | |
| JSON[JSON Parser] | |
| V[Schema Validator] | |
| E[Error Handler] | |
| end | |
| subgraph Results | |
| AGG[Aggregated Results] | |
| META[Metadata] | |
| FILE[JSON Output File] | |
| end | |
| M --> UP | |
| C --> SP | |
| SP --> LLM | |
| UP --> LLM | |
| LLM --> JSON | |
| JSON --> V | |
| V --> E | |
| E --> AGG | |
| AGG --> META | |
| META --> FILE | |
| style LLM fill:#6366f1 | |
| style JSON fill:#22c55e | |
| style V fill:#f59e0b | |
| style FILE fill:#8b5cf6 | |
| ``` | |
| ## Agent Processing Flow | |
| ```mermaid | |
| sequenceDiagram | |
| participant CLI as CLI Runner | |
| participant Agent as Country Agent | |
| participant LLM as Claude 3.5 | |
| participant Val as Validator | |
| participant Store as Storage | |
| CLI->>Agent: Load system prompt | |
| CLI->>Agent: Send motion text | |
| Agent->>LLM: System + User Prompt | |
| LLM->>Agent: Raw text response | |
| Agent->>Val: Parse JSON | |
| alt Valid JSON | |
| Val->>Val: Check schema | |
| alt Valid Schema | |
| Val->>Store: Save vote + statement | |
| else Invalid Schema | |
| Val->>Store: Save as abstain + error | |
| end | |
| else Invalid JSON | |
| Val->>Store: Save as abstain + error | |
| end | |
| Store->>CLI: Continue to next country | |
| ``` | |
| ## Core Components | |
| ### 1. Agent System Prompts | |
| ```mermaid | |
| graph LR | |
| subgraph "Generic Template" | |
| T[Template Structure] | |
| end | |
| subgraph "Variables" | |
| CN[Country Name] | |
| P5[P5 Status] | |
| end | |
| subgraph "195 Agents" | |
| US[United States] | |
| CN2[China] | |
| RU[Russia] | |
| DOT[...] | |
| TV[Tuvalu] | |
| end | |
| T --> CN | |
| T --> P5 | |
| CN --> US | |
| CN --> CN2 | |
| CN --> RU | |
| CN --> DOT | |
| CN --> TV | |
| style T fill:#6366f1 | |
| style US fill:#22c55e | |
| style CN2 fill:#22c55e | |
| style RU fill:#22c55e | |
| style TV fill:#22c55e | |
| ``` | |
| - 195 country-specific agents (one per UN member state) | |
| - Generic template structure (identical for all countries) | |
| - Only country name and P5 status differ between prompts | |
| - AI infers policy positions from training data | |
| ### 2. Structured Output Schema | |
| ```json | |
| { | |
| "vote": "yes" | "no" | "abstain", | |
| "statement": "Brief explanation (2-4 sentences)" | |
| } | |
| ``` | |
| ### 3. Validation Pipeline | |
| ```mermaid | |
| graph TD | |
| A[LLM Response] --> B{Valid JSON?} | |
| B -->|Yes| C{Has vote field?} | |
| B -->|No| ERR1[Error: Parse Failure] | |
| C -->|Yes| D{Has statement field?} | |
| C -->|No| ERR2[Error: Missing Vote] | |
| D -->|Yes| E{Vote is yes/no/abstain?} | |
| D -->|No| ERR3[Error: Missing Statement] | |
| E -->|Yes| SUCCESS[Save Response] | |
| E -->|No| ERR4[Error: Invalid Vote] | |
| ERR1 --> DEFAULT[Save as Abstain + Error Flag] | |
| ERR2 --> DEFAULT | |
| ERR3 --> DEFAULT | |
| ERR4 --> DEFAULT | |
| style SUCCESS fill:#22c55e | |
| style DEFAULT fill:#f59e0b | |
| style ERR1 fill:#ef4444 | |
| style ERR2 fill:#ef4444 | |
| style ERR3 fill:#ef4444 | |
| style ERR4 fill:#ef4444 | |
| ``` | |
| ### 4. Model Configuration | |
| - **Primary:** Claude 3.5 Sonnet (claude-3-5-sonnet-20241022) | |
| - **Temperature:** 0.7 (balance consistency + variation) | |
| - **Max tokens:** 800 per response | |
| - **Provider:** Anthropic API | |
| ## What This Tests | |
| - **LLM Geopolitical Knowledge**: How well models understand different countries' foreign policies | |
| - **Structured Outputs**: Consistency in producing valid JSON under constraints | |
| - **Multi-Agent Systems**: Coordinating hundreds of independent AI agents | |
| - **Prompt Engineering**: Generic templates yielding specific behaviors | |
| - **Error Handling**: Graceful degradation when agents produce invalid outputs | |
| ## Technical Implementation | |
| ### Execution Flow | |
| ```mermaid | |
| graph TD | |
| START[Start Simulation] --> LOAD_MOTION[Load Motion Text<br/>tasks/motions/motion_id.md] | |
| LOAD_MOTION --> LOAD_COUNTRIES[Load Country List<br/>195 UN Members] | |
| LOAD_COUNTRIES --> LOOP_START{For Each Country} | |
| LOOP_START -->|Country 1-195| LOAD_PROMPT[Load System Prompt<br/>agents/representatives/country/] | |
| LOAD_PROMPT --> BUILD_USER[Build User Prompt<br/>Motion + Instructions] | |
| BUILD_USER --> API_CALL[API Call to Claude<br/>System + User Prompt] | |
| API_CALL --> PARSE[Parse JSON Response] | |
| PARSE --> VALIDATE[Validate Schema] | |
| VALIDATE -->|Valid| STORE[Store Result] | |
| VALIDATE -->|Invalid| ERROR[Store Error + Abstain] | |
| STORE --> LOOP_START | |
| ERROR --> LOOP_START | |
| LOOP_START -->|All Done| AGGREGATE[Aggregate Results] | |
| AGGREGATE --> CALC_STATS[Calculate Vote Summary] | |
| CALC_STATS --> ADD_META[Add Metadata<br/>model, timestamp, etc] | |
| ADD_META --> SAVE_TIME[Save Timestamped File<br/>motion_id_timestamp.json] | |
| SAVE_TIME --> SAVE_LATEST[Save Latest File<br/>motion_id_latest.json] | |
| SAVE_LATEST --> END[Complete] | |
| style API_CALL fill:#6366f1 | |
| style VALIDATE fill:#f59e0b | |
| style STORE fill:#22c55e | |
| style ERROR fill:#ef4444 | |
| style END fill:#8b5cf6 | |
| ``` | |
| ### Command Line Interface | |
| ```bash | |
| # Run simulation | |
| python scripts/run_motion.py 01_gaza_ceasefire_resolution | |
| # With specific model | |
| python scripts/run_motion.py 01_gaza_ceasefire_resolution --model claude-3-5-sonnet-20241022 | |
| # Test with sample | |
| python scripts/run_motion.py 01_gaza_ceasefire_resolution --sample 5 | |
| ``` | |
| ### Output Structure | |
| ```mermaid | |
| graph LR | |
| subgraph "JSON Output" | |
| ROOT[Root Object] | |
| META[Metadata] | |
| VOTES[Votes Array] | |
| end | |
| subgraph "Metadata Fields" | |
| ID[motion_id] | |
| TS[timestamp] | |
| MODEL[model] | |
| TOTAL[total_votes] | |
| SUMMARY[vote_summary] | |
| end | |
| subgraph "Vote Summary" | |
| YES[yes: count] | |
| NO[no: count] | |
| ABS[abstain: count] | |
| end | |
| subgraph "Individual Votes" | |
| V1[Vote 1: Country, vote, statement] | |
| V2[Vote 2: Country, vote, statement] | |
| V3[...] | |
| V195[Vote 195: Country, vote, statement] | |
| end | |
| ROOT --> META | |
| ROOT --> VOTES | |
| META --> ID | |
| META --> TS | |
| META --> MODEL | |
| META --> TOTAL | |
| META --> SUMMARY | |
| SUMMARY --> YES | |
| SUMMARY --> NO | |
| SUMMARY --> ABS | |
| VOTES --> V1 | |
| VOTES --> V2 | |
| VOTES --> V3 | |
| VOTES --> V195 | |
| style ROOT fill:#8b5cf6 | |
| style META fill:#6366f1 | |
| style VOTES fill:#22c55e | |
| ``` | |
| ## Case Study: Gaza Ceasefire Resolution | |
| The Space includes a case study demonstrating the system with a Gaza ceasefire resolution voted on by all 195 agents. | |
| ### Results Overview | |
| ```mermaid | |
| pie title Vote Distribution (195 Countries) | |
| "Yes" : 190 | |
| "No" : 3 | |
| "Abstain" : 2 | |
| ``` | |
| **Key Statistics:** | |
| - **Yes:** 190 countries (97.4%) | |
| - **No:** 3 countries (1.5%) | |
| - **Abstain:** 2 countries (1.0%) | |
| This serves as a concrete example of the framework in action, showing how generic prompts + model knowledge produce diverse, country-specific diplomatic responses. | |
| ## Research Applications | |
| - Testing LLM knowledge of international relations | |
| - Evaluating structured output consistency | |
| - Studying emergent behavior in multi-agent systems | |
| - Educational demonstrations of diplomatic complexity | |
| ## Limitations | |
| This is a simulation for research and education: | |
| - AI positions based on training data, not actual policies | |
| - Does NOT predict real government decisions | |
| - Should NOT be considered authoritative | |
| - Real diplomacy involves classified information and human judgment | |
| ## Open Source | |
| All code, prompts, and data available on GitHub: | |
| - Repository: https://github.com/danielrosehill/AI-Agent-UN | |
| - System Prompts: https://github.com/danielrosehill/AI-Agent-UN/tree/main/agents/representatives | |
| - Execution Script: https://github.com/danielrosehill/AI-Agent-UN/blob/main/scripts/run_motion.py | |
| --- | |
| Built with Gradio | Powered by Anthropic Claude | |