Spaces:

danielrosehill
/

Agent-UN

Sleeping

File size: 9,950 Bytes

---
title: AI Agent UN - Multi-Agent Simulation Framework
emoji: 🏛️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

![AI Agent UN Banner](images/banner.jpg)

# AI Agent United Nations: Multi-Agent Simulation Framework

A structured system for simulating international diplomatic decision-making using 195 AI agents with constrained JSON outputs.

## System Overview

This is an experimental framework demonstrating:
- **Multi-agent coordination** across 195 independent AI agents
- **Structured output constraints** with strict JSON schema validation
- **Generic prompt templates** producing country-specific behaviors
- **Task execution model** for running resolutions through all agents

### High-Level Concept

```mermaid
graph TB
    subgraph "Input Layer"
        RES[UN Resolution Text]
    end

    subgraph "Agent Layer - 195 Independent Agents"
        A1[Agent: USA<br/>System Prompt]
        A2[Agent: China<br/>System Prompt]
        A3[Agent: Russia<br/>System Prompt]
        ADOT[...]
        A195[Agent: Tuvalu<br/>System Prompt]
    end

    subgraph "LLM Processing"
        LLM[Claude 3.5 Sonnet<br/>Structured JSON Output]
    end

    subgraph "Output Layer"
        V1[Vote: yes<br/>Statement: ...]
        V2[Vote: no<br/>Statement: ...]
        V3[Vote: yes<br/>Statement: ...]
        VDOT[...]
        V195[Vote: yes<br/>Statement: ...]
    end

    subgraph "Aggregation"
        AGG[Combined Results<br/>Vote Counts + All Statements]
    end

    RES --> A1
    RES --> A2
    RES --> A3
    RES --> ADOT
    RES --> A195

    A1 --> LLM
    A2 --> LLM
    A3 --> LLM
    ADOT --> LLM
    A195 --> LLM

    LLM --> V1
    LLM --> V2
    LLM --> V3
    LLM --> VDOT
    LLM --> V195

    V1 --> AGG
    V2 --> AGG
    V3 --> AGG
    VDOT --> AGG
    V195 --> AGG

    style RES fill:#6366f1
    style LLM fill:#8b5cf6
    style AGG fill:#22c55e
    style A1 fill:#f59e0b
    style A2 fill:#f59e0b
    style A3 fill:#f59e0b
    style A195 fill:#f59e0b
```

## System Architecture

```mermaid
graph TB
    subgraph Input
        M[Motion Text<br/>tasks/motions/]
        C[Country List<br/>195 UN Members]
    end

    subgraph "Agent Processing"
        SP[System Prompt<br/>Generic Template]
        UP[User Prompt<br/>+ Motion Text]
        LLM[Claude 3.5 Sonnet<br/>Temperature: 0.7]
    end

    subgraph "Output Validation"
        JSON[JSON Parser]
        V[Schema Validator]
        E[Error Handler]
    end

    subgraph Results
        AGG[Aggregated Results]
        META[Metadata]
        FILE[JSON Output File]
    end

    M --> UP
    C --> SP
    SP --> LLM
    UP --> LLM
    LLM --> JSON
    JSON --> V
    V --> E
    E --> AGG
    AGG --> META
    META --> FILE

    style LLM fill:#6366f1
    style JSON fill:#22c55e
    style V fill:#f59e0b
    style FILE fill:#8b5cf6
```

## Agent Processing Flow

```mermaid
sequenceDiagram
    participant CLI as CLI Runner
    participant Agent as Country Agent
    participant LLM as Claude 3.5
    participant Val as Validator
    participant Store as Storage

    CLI->>Agent: Load system prompt
    CLI->>Agent: Send motion text
    Agent->>LLM: System + User Prompt
    LLM->>Agent: Raw text response
    Agent->>Val: Parse JSON
    alt Valid JSON
        Val->>Val: Check schema
        alt Valid Schema
            Val->>Store: Save vote + statement
        else Invalid Schema
            Val->>Store: Save as abstain + error
        end
    else Invalid JSON
        Val->>Store: Save as abstain + error
    end
    Store->>CLI: Continue to next country
```

## Core Components

### 1. Agent System Prompts

```mermaid
graph LR
    subgraph "Generic Template"
        T[Template Structure]
    end

    subgraph "Variables"
        CN[Country Name]
        P5[P5 Status]
    end

    subgraph "195 Agents"
        US[United States]
        CN2[China]
        RU[Russia]
        DOT[...]
        TV[Tuvalu]
    end

    T --> CN
    T --> P5
    CN --> US
    CN --> CN2
    CN --> RU
    CN --> DOT
    CN --> TV

    style T fill:#6366f1
    style US fill:#22c55e
    style CN2 fill:#22c55e
    style RU fill:#22c55e
    style TV fill:#22c55e
```

- 195 country-specific agents (one per UN member state)
- Generic template structure (identical for all countries)
- Only country name and P5 status differ between prompts
- AI infers policy positions from training data

### 2. Structured Output Schema

```json
{
  "vote": "yes" | "no" | "abstain",
  "statement": "Brief explanation (2-4 sentences)"
}
```

### 3. Validation Pipeline

```mermaid
graph TD
    A[LLM Response] --> B{Valid JSON?}
    B -->|Yes| C{Has vote field?}
    B -->|No| ERR1[Error: Parse Failure]
    C -->|Yes| D{Has statement field?}
    C -->|No| ERR2[Error: Missing Vote]
    D -->|Yes| E{Vote is yes/no/abstain?}
    D -->|No| ERR3[Error: Missing Statement]
    E -->|Yes| SUCCESS[Save Response]
    E -->|No| ERR4[Error: Invalid Vote]

    ERR1 --> DEFAULT[Save as Abstain + Error Flag]
    ERR2 --> DEFAULT
    ERR3 --> DEFAULT
    ERR4 --> DEFAULT

    style SUCCESS fill:#22c55e
    style DEFAULT fill:#f59e0b
    style ERR1 fill:#ef4444
    style ERR2 fill:#ef4444
    style ERR3 fill:#ef4444
    style ERR4 fill:#ef4444
```

### 4. Model Configuration

- **Primary:** Claude 3.5 Sonnet (claude-3-5-sonnet-20241022)
- **Temperature:** 0.7 (balance consistency + variation)
- **Max tokens:** 800 per response
- **Provider:** Anthropic API

## What This Tests

- **LLM Geopolitical Knowledge**: How well models understand different countries' foreign policies
- **Structured Outputs**: Consistency in producing valid JSON under constraints
- **Multi-Agent Systems**: Coordinating hundreds of independent AI agents
- **Prompt Engineering**: Generic templates yielding specific behaviors
- **Error Handling**: Graceful degradation when agents produce invalid outputs

## Technical Implementation

### Execution Flow

```mermaid
graph TD
    START[Start Simulation] --> LOAD_MOTION[Load Motion Text<br/>tasks/motions/motion_id.md]
    LOAD_MOTION --> LOAD_COUNTRIES[Load Country List<br/>195 UN Members]
    LOAD_COUNTRIES --> LOOP_START{For Each Country}

    LOOP_START -->|Country 1-195| LOAD_PROMPT[Load System Prompt<br/>agents/representatives/country/]
    LOAD_PROMPT --> BUILD_USER[Build User Prompt<br/>Motion + Instructions]
    BUILD_USER --> API_CALL[API Call to Claude<br/>System + User Prompt]
    API_CALL --> PARSE[Parse JSON Response]
    PARSE --> VALIDATE[Validate Schema]
    VALIDATE -->|Valid| STORE[Store Result]
    VALIDATE -->|Invalid| ERROR[Store Error + Abstain]
    STORE --> LOOP_START
    ERROR --> LOOP_START

    LOOP_START -->|All Done| AGGREGATE[Aggregate Results]
    AGGREGATE --> CALC_STATS[Calculate Vote Summary]
    CALC_STATS --> ADD_META[Add Metadata<br/>model, timestamp, etc]
    ADD_META --> SAVE_TIME[Save Timestamped File<br/>motion_id_timestamp.json]
    SAVE_TIME --> SAVE_LATEST[Save Latest File<br/>motion_id_latest.json]
    SAVE_LATEST --> END[Complete]

    style API_CALL fill:#6366f1
    style VALIDATE fill:#f59e0b
    style STORE fill:#22c55e
    style ERROR fill:#ef4444
    style END fill:#8b5cf6
```

### Command Line Interface

```bash
# Run simulation
python scripts/run_motion.py 01_gaza_ceasefire_resolution

# With specific model
python scripts/run_motion.py 01_gaza_ceasefire_resolution --model claude-3-5-sonnet-20241022

# Test with sample
python scripts/run_motion.py 01_gaza_ceasefire_resolution --sample 5
```

### Output Structure

```mermaid
graph LR
    subgraph "JSON Output"
        ROOT[Root Object]
        META[Metadata]
        VOTES[Votes Array]
    end

    subgraph "Metadata Fields"
        ID[motion_id]
        TS[timestamp]
        MODEL[model]
        TOTAL[total_votes]
        SUMMARY[vote_summary]
    end

    subgraph "Vote Summary"
        YES[yes: count]
        NO[no: count]
        ABS[abstain: count]
    end

    subgraph "Individual Votes"
        V1[Vote 1: Country, vote, statement]
        V2[Vote 2: Country, vote, statement]
        V3[...]
        V195[Vote 195: Country, vote, statement]
    end

    ROOT --> META
    ROOT --> VOTES
    META --> ID
    META --> TS
    META --> MODEL
    META --> TOTAL
    META --> SUMMARY
    SUMMARY --> YES
    SUMMARY --> NO
    SUMMARY --> ABS
    VOTES --> V1
    VOTES --> V2
    VOTES --> V3
    VOTES --> V195

    style ROOT fill:#8b5cf6
    style META fill:#6366f1
    style VOTES fill:#22c55e
```

## Case Study: Gaza Ceasefire Resolution

The Space includes a case study demonstrating the system with a Gaza ceasefire resolution voted on by all 195 agents.

### Results Overview

```mermaid
pie title Vote Distribution (195 Countries)
    "Yes" : 190
    "No" : 3
    "Abstain" : 2
```

**Key Statistics:**
- **Yes:** 190 countries (97.4%)
- **No:** 3 countries (1.5%)
- **Abstain:** 2 countries (1.0%)

This serves as a concrete example of the framework in action, showing how generic prompts + model knowledge produce diverse, country-specific diplomatic responses.

## Research Applications

- Testing LLM knowledge of international relations
- Evaluating structured output consistency
- Studying emergent behavior in multi-agent systems
- Educational demonstrations of diplomatic complexity

## Limitations

This is a simulation for research and education:
- AI positions based on training data, not actual policies
- Does NOT predict real government decisions
- Should NOT be considered authoritative
- Real diplomacy involves classified information and human judgment

## Open Source

All code, prompts, and data available on GitHub:

- Repository: https://github.com/danielrosehill/AI-Agent-UN
- System Prompts: https://github.com/danielrosehill/AI-Agent-UN/tree/main/agents/representatives
- Execution Script: https://github.com/danielrosehill/AI-Agent-UN/blob/main/scripts/run_motion.py

---

Built with Gradio | Powered by Anthropic Claude