---
title: AI Agent UN - Multi-Agent Simulation Framework
emoji: 🏛️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

![AI Agent UN Banner](images/banner.jpg)

# AI Agent United Nations: Multi-Agent Simulation Framework

A structured system for simulating international diplomatic decision-making using 195 AI agents with constrained JSON outputs.

## System Overview

This is an experimental framework demonstrating:

- **Multi-agent coordination** across 195 independent AI agents
- **Structured output constraints** with strict JSON schema validation
- **Generic prompt templates** producing country-specific behaviors
- **Task execution model** for running resolutions through all agents

### High-Level Concept

```mermaid
graph TB
    subgraph "Input Layer"
        RES[UN Resolution Text]
    end

    subgraph "Agent Layer - 195 Independent Agents"
        A1[Agent: USA<br/>System Prompt]
        A2[Agent: China<br/>System Prompt]
        A3[Agent: Russia<br/>System Prompt]
        ADOT[...]
        A195[Agent: Tuvalu<br/>System Prompt]
    end

    subgraph "LLM Processing"
        LLM[Claude 3.5 Sonnet<br/>Structured JSON Output]
    end

    subgraph "Output Layer"
        V1[Vote: yes<br/>Statement: ...]
        V2[Vote: no<br/>Statement: ...]
        V3[Vote: yes<br/>Statement: ...]
        VDOT[...]
        V195[Vote: yes<br/>Statement: ...]
    end

    subgraph "Aggregation"
        AGG[Combined Results<br/>Vote Counts + All Statements]
    end

    RES --> A1
    RES --> A2
    RES --> A3
    RES --> ADOT
    RES --> A195

    A1 --> LLM
    A2 --> LLM
    A3 --> LLM
    ADOT --> LLM
    A195 --> LLM

    LLM --> V1
    LLM --> V2
    LLM --> V3
    LLM --> VDOT
    LLM --> V195

    V1 --> AGG
    V2 --> AGG
    V3 --> AGG
    VDOT --> AGG
    V195 --> AGG

    style RES fill:#6366f1
    style LLM fill:#8b5cf6
    style AGG fill:#22c55e
    style A1 fill:#f59e0b
    style A2 fill:#f59e0b
    style A3 fill:#f59e0b
    style A195 fill:#f59e0b
```

## System Architecture

```mermaid
graph TB
    subgraph Input
        M[Motion Text<br/>tasks/motions/]
        C[Country List<br/>195 UN Members]
    end

    subgraph "Agent Processing"
        SP[System Prompt<br/>Generic Template]
        UP[User Prompt<br/>+ Motion Text]
        LLM[Claude 3.5 Sonnet<br/>Temperature: 0.7]
    end

    subgraph "Output Validation"
        JSON[JSON Parser]
        V[Schema Validator]
        E[Error Handler]
    end

    subgraph Results
        AGG[Aggregated Results]
        META[Metadata]
        FILE[JSON Output File]
    end

    M --> UP
    C --> SP
    SP --> LLM
    UP --> LLM
    LLM --> JSON
    JSON --> V
    V --> E
    E --> AGG
    AGG --> META
    META --> FILE

    style LLM fill:#6366f1
    style JSON fill:#22c55e
    style V fill:#f59e0b
    style FILE fill:#8b5cf6
```

## Agent Processing Flow

```mermaid
sequenceDiagram
    participant CLI as CLI Runner
    participant Agent as Country Agent
    participant LLM as Claude 3.5
    participant Val as Validator
    participant Store as Storage

    CLI->>Agent: Load system prompt
    CLI->>Agent: Send motion text
    Agent->>LLM: System + User Prompt
    LLM->>Agent: Raw text response
    Agent->>Val: Parse JSON

    alt Valid JSON
        Val->>Val: Check schema
        alt Valid Schema
            Val->>Store: Save vote + statement
        else Invalid Schema
            Val->>Store: Save as abstain + error
        end
    else Invalid JSON
        Val->>Store: Save as abstain + error
    end

    Store->>CLI: Continue to next country
```

## Core Components

### 1. Agent System Prompts

```mermaid
graph LR
    subgraph "Generic Template"
        T[Template Structure]
    end

    subgraph "Variables"
        CN[Country Name]
        P5[P5 Status]
    end

    subgraph "195 Agents"
        US[United States]
        CN2[China]
        RU[Russia]
        DOT[...]
        TV[Tuvalu]
    end

    T --> CN
    T --> P5
    CN --> US
    CN --> CN2
    CN --> RU
    CN --> DOT
    CN --> TV

    style T fill:#6366f1
    style US fill:#22c55e
    style CN2 fill:#22c55e
    style RU fill:#22c55e
    style TV fill:#22c55e
```

- 195 country-specific agents (one per UN member state)
- Generic template structure (identical for all countries)
- Only country name and P5 status differ between prompts
- AI infers policy positions from training data

### 2. Structured Output Schema

```json
{
  "vote": "yes" | "no" | "abstain",
  "statement": "Brief explanation (2-4 sentences)"
}
```

### 3. Validation Pipeline

```mermaid
graph TD
    A[LLM Response] --> B{Valid JSON?}
    B -->|Yes| C{Has vote field?}
    B -->|No| ERR1[Error: Parse Failure]
    C -->|Yes| D{Has statement field?}
    C -->|No| ERR2[Error: Missing Vote]
    D -->|Yes| E{Vote is yes/no/abstain?}
    D -->|No| ERR3[Error: Missing Statement]
    E -->|Yes| SUCCESS[Save Response]
    E -->|No| ERR4[Error: Invalid Vote]

    ERR1 --> DEFAULT[Save as Abstain + Error Flag]
    ERR2 --> DEFAULT
    ERR3 --> DEFAULT
    ERR4 --> DEFAULT

    style SUCCESS fill:#22c55e
    style DEFAULT fill:#f59e0b
    style ERR1 fill:#ef4444
    style ERR2 fill:#ef4444
    style ERR3 fill:#ef4444
    style ERR4 fill:#ef4444
```

### 4. Model Configuration

- **Primary:** Claude 3.5 Sonnet (claude-3-5-sonnet-20241022)
- **Temperature:** 0.7 (balance consistency + variation)
- **Max tokens:** 800 per response
- **Provider:** Anthropic API

## What This Tests

- **LLM Geopolitical Knowledge**: How well models understand different countries' foreign policies
- **Structured Outputs**: Consistency in producing valid JSON under constraints
- **Multi-Agent Systems**: Coordinating hundreds of independent AI agents
- **Prompt Engineering**: Generic templates yielding specific behaviors
- **Error Handling**: Graceful degradation when agents produce invalid outputs

## Technical Implementation

### Execution Flow

```mermaid
graph TD
    START[Start Simulation] --> LOAD_MOTION[Load Motion Text<br/>tasks/motions/motion_id.md]
    LOAD_MOTION --> LOAD_COUNTRIES[Load Country List<br/>195 UN Members]
    LOAD_COUNTRIES --> LOOP_START{For Each Country}
    LOOP_START -->|Country 1-195| LOAD_PROMPT[Load System Prompt<br/>agents/representatives/country/]
    LOAD_PROMPT --> BUILD_USER[Build User Prompt<br/>Motion + Instructions]
    BUILD_USER --> API_CALL[API Call to Claude<br/>System + User Prompt]
    API_CALL --> PARSE[Parse JSON Response]
    PARSE --> VALIDATE[Validate Schema]
    VALIDATE -->|Valid| STORE[Store Result]
    VALIDATE -->|Invalid| ERROR[Store Error + Abstain]
    STORE --> LOOP_START
    ERROR --> LOOP_START
    LOOP_START -->|All Done| AGGREGATE[Aggregate Results]
    AGGREGATE --> CALC_STATS[Calculate Vote Summary]
    CALC_STATS --> ADD_META[Add Metadata<br/>model, timestamp, etc]
    ADD_META --> SAVE_TIME[Save Timestamped File<br/>motion_id_timestamp.json]
    SAVE_TIME --> SAVE_LATEST[Save Latest File<br/>motion_id_latest.json]
    SAVE_LATEST --> END[Complete]

    style API_CALL fill:#6366f1
    style VALIDATE fill:#f59e0b
    style STORE fill:#22c55e
    style ERROR fill:#ef4444
    style END fill:#8b5cf6
```

### Command Line Interface

```bash
# Run simulation
python scripts/run_motion.py 01_gaza_ceasefire_resolution

# With specific model
python scripts/run_motion.py 01_gaza_ceasefire_resolution --model claude-3-5-sonnet-20241022

# Test with sample
python scripts/run_motion.py 01_gaza_ceasefire_resolution --sample 5
```

### Output Structure

```mermaid
graph LR
    subgraph "JSON Output"
        ROOT[Root Object]
        META[Metadata]
        VOTES[Votes Array]
    end

    subgraph "Metadata Fields"
        ID[motion_id]
        TS[timestamp]
        MODEL[model]
        TOTAL[total_votes]
        SUMMARY[vote_summary]
    end

    subgraph "Vote Summary"
        YES[yes: count]
        NO[no: count]
        ABS[abstain: count]
    end

    subgraph "Individual Votes"
        V1[Vote 1: Country, vote, statement]
        V2[Vote 2: Country, vote, statement]
        V3[...]
        V195[Vote 195: Country, vote, statement]
    end

    ROOT --> META
    ROOT --> VOTES
    META --> ID
    META --> TS
    META --> MODEL
    META --> TOTAL
    META --> SUMMARY
    SUMMARY --> YES
    SUMMARY --> NO
    SUMMARY --> ABS
    VOTES --> V1
    VOTES --> V2
    VOTES --> V3
    VOTES --> V195

    style ROOT fill:#8b5cf6
    style META fill:#6366f1
    style VOTES fill:#22c55e
```

## Case Study: Gaza Ceasefire Resolution

The Space includes a case study demonstrating the system with a Gaza ceasefire resolution voted on by all 195 agents.

### Results Overview

```mermaid
pie title Vote Distribution (195 Countries)
    "Yes" : 190
    "No" : 3
    "Abstain" : 2
```

**Key Statistics:**

- **Yes:** 190 countries (97.4%)
- **No:** 3 countries (1.5%)
- **Abstain:** 2 countries (1.0%)

This serves as a concrete example of the framework in action, showing how generic prompts + model knowledge produce diverse, country-specific diplomatic responses.
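
The validation rules and the vote summary described in this README can be sketched in a few lines of Python. This is a minimal illustration, not the code in `scripts/run_motion.py`: the function names `validate_response` and `summarize`, the error labels, and the record layout are all assumptions made for the example. It applies the checks in the order shown in the validation pipeline, stores any failure as an abstain with an error flag, and then reproduces the case-study percentages from the raw counts.

```python
import json
from collections import Counter

VALID_VOTES = ("yes", "no", "abstain")


def validate_response(raw_text):
    """Hypothetical validator: failures are recorded as abstain plus an error label."""

    def fallback(reason):
        return {"vote": "abstain", "statement": "", "error": reason}

    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return fallback("parse_failure")
    if not isinstance(data, dict) or "vote" not in data:
        return fallback("missing_vote")
    if "statement" not in data:
        return fallback("missing_statement")
    vote = data["vote"]
    if not isinstance(vote, str) or vote not in VALID_VOTES:
        return fallback("invalid_vote")
    return {"vote": vote, "statement": data["statement"], "error": None}


def summarize(records):
    """Hypothetical summary: count and percentage (one decimal) per vote option."""
    counts = Counter(record["vote"] for record in records)
    total = len(records)
    return {
        vote: {
            "count": counts[vote],
            "percent": round(100 * counts[vote] / total, 1) if total else 0.0,
        }
        for vote in VALID_VOTES
    }


# Rebuild the case-study tally from its published counts.
case_study = (
    [{"vote": "yes"}] * 190 + [{"vote": "no"}] * 3 + [{"vote": "abstain"}] * 2
)
summary = summarize(case_study)
print(summary["yes"])      # {'count': 190, 'percent': 97.4}
print(summary["no"])       # {'count': 3, 'percent': 1.5}
print(summary["abstain"])  # {'count': 2, 'percent': 1.0}
```

Because failed responses fall back to abstain, an abstain count in this sketch can include agents whose output was invalid; keeping the `error` field on each record lets those cases be separated from deliberate abstentions.
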
## Research Applications

- Testing LLM knowledge of international relations
- Evaluating structured output consistency
- Studying emergent behavior in multi-agent systems
- Educational demonstrations of diplomatic complexity

## Limitations

This is a simulation for research and education:

- AI positions are based on training data, not actual policies
- Does NOT predict real government decisions
- Should NOT be considered authoritative
- Real diplomacy involves classified information and human judgment

## Open Source

All code, prompts, and data are available on GitHub:

- Repository: https://github.com/danielrosehill/AI-Agent-UN
- System Prompts: https://github.com/danielrosehill/AI-Agent-UN/tree/main/agents/representatives
- Execution Script: https://github.com/danielrosehill/AI-Agent-UN/blob/main/scripts/run_motion.py

---

Built with Gradio | Powered by Anthropic Claude