File size: 9,950 Bytes
c266ed1
f209cc2
 
c266ed1
f209cc2
c266ed1
 
 
 
 
 
8c1f582
24d65a0
 
f209cc2
8c1f582
f209cc2
8c1f582
f209cc2
8c1f582
f209cc2
 
 
 
 
8c1f582
3478ac2
8c1f582
3478ac2
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
8c1f582
f209cc2
 
 
 
8c1f582
3478ac2
 
f209cc2
 
 
 
 
 
8c1f582
3478ac2
8c1f582
3478ac2
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
8c1f582
f209cc2
8c1f582
f209cc2
 
 
 
 
8c1f582
f209cc2
8c1f582
3478ac2
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
8c1f582
f209cc2
 
 
8c1f582
f209cc2
 
8c1f582
f209cc2
 
 
8c1f582
3478ac2
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
8c1f582
f209cc2
8c1f582
3478ac2
 
 
 
 
 
 
 
 
 
 
 
 
8c1f582
f209cc2
8c1f582
f209cc2
8c1f582
f209cc2
 
 
 
8c1f582
f209cc2
 
 
 
 
 
 
 
 
 
 
 
 
 
 
8c1f582
c266ed1
8c1f582
f209cc2
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
---
title: AI Agent UN - Multi-Agent Simulation Framework
emoji: 🏛️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

![AI Agent UN Banner](images/banner.jpg)

# AI Agent United Nations: Multi-Agent Simulation Framework

A structured system for simulating international diplomatic decision-making using 195 AI agents with constrained JSON outputs.

## System Overview

This is an experimental framework demonstrating:
- **Multi-agent coordination** across 195 independent AI agents
- **Structured output constraints** with strict JSON schema validation
- **Generic prompt templates** producing country-specific behaviors
- **Task execution model** for running resolutions through all agents

### High-Level Concept

```mermaid
graph TB
    subgraph "Input Layer"
        RES[UN Resolution Text]
    end

    subgraph "Agent Layer - 195 Independent Agents"
        A1[Agent: USA<br/>System Prompt]
        A2[Agent: China<br/>System Prompt]
        A3[Agent: Russia<br/>System Prompt]
        ADOT[...]
        A195[Agent: Tuvalu<br/>System Prompt]
    end

    subgraph "LLM Processing"
        LLM[Claude 3.5 Sonnet<br/>Structured JSON Output]
    end

    subgraph "Output Layer"
        V1[Vote: yes<br/>Statement: ...]
        V2[Vote: no<br/>Statement: ...]
        V3[Vote: yes<br/>Statement: ...]
        VDOT[...]
        V195[Vote: yes<br/>Statement: ...]
    end

    subgraph "Aggregation"
        AGG[Combined Results<br/>Vote Counts + All Statements]
    end

    RES --> A1
    RES --> A2
    RES --> A3
    RES --> ADOT
    RES --> A195

    A1 --> LLM
    A2 --> LLM
    A3 --> LLM
    ADOT --> LLM
    A195 --> LLM

    LLM --> V1
    LLM --> V2
    LLM --> V3
    LLM --> VDOT
    LLM --> V195

    V1 --> AGG
    V2 --> AGG
    V3 --> AGG
    VDOT --> AGG
    V195 --> AGG

    style RES fill:#6366f1
    style LLM fill:#8b5cf6
    style AGG fill:#22c55e
    style A1 fill:#f59e0b
    style A2 fill:#f59e0b
    style A3 fill:#f59e0b
    style A195 fill:#f59e0b
```

## System Architecture

```mermaid
graph TB
    subgraph Input
        M[Motion Text<br/>tasks/motions/]
        C[Country List<br/>195 UN Members]
    end

    subgraph "Agent Processing"
        SP[System Prompt<br/>Generic Template]
        UP[User Prompt<br/>+ Motion Text]
        LLM[Claude 3.5 Sonnet<br/>Temperature: 0.7]
    end

    subgraph "Output Validation"
        JSON[JSON Parser]
        V[Schema Validator]
        E[Error Handler]
    end

    subgraph Results
        AGG[Aggregated Results]
        META[Metadata]
        FILE[JSON Output File]
    end

    M --> UP
    C --> SP
    SP --> LLM
    UP --> LLM
    LLM --> JSON
    JSON --> V
    V --> E
    E --> AGG
    AGG --> META
    META --> FILE

    style LLM fill:#6366f1
    style JSON fill:#22c55e
    style V fill:#f59e0b
    style FILE fill:#8b5cf6
```

## Agent Processing Flow

```mermaid
sequenceDiagram
    participant CLI as CLI Runner
    participant Agent as Country Agent
    participant LLM as Claude 3.5
    participant Val as Validator
    participant Store as Storage

    CLI->>Agent: Load system prompt
    CLI->>Agent: Send motion text
    Agent->>LLM: System + User Prompt
    LLM->>Agent: Raw text response
    Agent->>Val: Parse JSON
    alt Valid JSON
        Val->>Val: Check schema
        alt Valid Schema
            Val->>Store: Save vote + statement
        else Invalid Schema
            Val->>Store: Save as abstain + error
        end
    else Invalid JSON
        Val->>Store: Save as abstain + error
    end
    Store->>CLI: Continue to next country
```

## Core Components

### 1. Agent System Prompts

```mermaid
graph LR
    subgraph "Generic Template"
        T[Template Structure]
    end

    subgraph "Variables"
        CN[Country Name]
        P5[P5 Status]
    end

    subgraph "195 Agents"
        US[United States]
        CN2[China]
        RU[Russia]
        DOT[...]
        TV[Tuvalu]
    end

    T --> CN
    T --> P5
    CN --> US
    CN --> CN2
    CN --> RU
    CN --> DOT
    CN --> TV

    style T fill:#6366f1
    style US fill:#22c55e
    style CN2 fill:#22c55e
    style RU fill:#22c55e
    style TV fill:#22c55e
```

- 195 country-specific agents (one per UN member state)
- Generic template structure (identical for all countries)
- Only country name and P5 status differ between prompts
- AI infers policy positions from training data

### 2. Structured Output Schema

```json
{
  "vote": "yes" | "no" | "abstain",
  "statement": "Brief explanation (2-4 sentences)"
}
```

### 3. Validation Pipeline

```mermaid
graph TD
    A[LLM Response] --> B{Valid JSON?}
    B -->|Yes| C{Has vote field?}
    B -->|No| ERR1[Error: Parse Failure]
    C -->|Yes| D{Has statement field?}
    C -->|No| ERR2[Error: Missing Vote]
    D -->|Yes| E{Vote is yes/no/abstain?}
    D -->|No| ERR3[Error: Missing Statement]
    E -->|Yes| SUCCESS[Save Response]
    E -->|No| ERR4[Error: Invalid Vote]

    ERR1 --> DEFAULT[Save as Abstain + Error Flag]
    ERR2 --> DEFAULT
    ERR3 --> DEFAULT
    ERR4 --> DEFAULT

    style SUCCESS fill:#22c55e
    style DEFAULT fill:#f59e0b
    style ERR1 fill:#ef4444
    style ERR2 fill:#ef4444
    style ERR3 fill:#ef4444
    style ERR4 fill:#ef4444
```

### 4. Model Configuration

- **Primary:** Claude 3.5 Sonnet (claude-3-5-sonnet-20241022)
- **Temperature:** 0.7 (balance consistency + variation)
- **Max tokens:** 800 per response
- **Provider:** Anthropic API

## What This Tests

- **LLM Geopolitical Knowledge**: How well models understand different countries' foreign policies
- **Structured Outputs**: Consistency in producing valid JSON under constraints
- **Multi-Agent Systems**: Coordinating hundreds of independent AI agents
- **Prompt Engineering**: Generic templates yielding specific behaviors
- **Error Handling**: Graceful degradation when agents produce invalid outputs

## Technical Implementation

### Execution Flow

```mermaid
graph TD
    START[Start Simulation] --> LOAD_MOTION[Load Motion Text<br/>tasks/motions/motion_id.md]
    LOAD_MOTION --> LOAD_COUNTRIES[Load Country List<br/>195 UN Members]
    LOAD_COUNTRIES --> LOOP_START{For Each Country}

    LOOP_START -->|Country 1-195| LOAD_PROMPT[Load System Prompt<br/>agents/representatives/country/]
    LOAD_PROMPT --> BUILD_USER[Build User Prompt<br/>Motion + Instructions]
    BUILD_USER --> API_CALL[API Call to Claude<br/>System + User Prompt]
    API_CALL --> PARSE[Parse JSON Response]
    PARSE --> VALIDATE[Validate Schema]
    VALIDATE -->|Valid| STORE[Store Result]
    VALIDATE -->|Invalid| ERROR[Store Error + Abstain]
    STORE --> LOOP_START
    ERROR --> LOOP_START

    LOOP_START -->|All Done| AGGREGATE[Aggregate Results]
    AGGREGATE --> CALC_STATS[Calculate Vote Summary]
    CALC_STATS --> ADD_META[Add Metadata<br/>model, timestamp, etc]
    ADD_META --> SAVE_TIME[Save Timestamped File<br/>motion_id_timestamp.json]
    SAVE_TIME --> SAVE_LATEST[Save Latest File<br/>motion_id_latest.json]
    SAVE_LATEST --> END[Complete]

    style API_CALL fill:#6366f1
    style VALIDATE fill:#f59e0b
    style STORE fill:#22c55e
    style ERROR fill:#ef4444
    style END fill:#8b5cf6
```

### Command Line Interface

```bash
# Run simulation
python scripts/run_motion.py 01_gaza_ceasefire_resolution

# With specific model
python scripts/run_motion.py 01_gaza_ceasefire_resolution --model claude-3-5-sonnet-20241022

# Test with sample
python scripts/run_motion.py 01_gaza_ceasefire_resolution --sample 5
```

### Output Structure

```mermaid
graph LR
    subgraph "JSON Output"
        ROOT[Root Object]
        META[Metadata]
        VOTES[Votes Array]
    end

    subgraph "Metadata Fields"
        ID[motion_id]
        TS[timestamp]
        MODEL[model]
        TOTAL[total_votes]
        SUMMARY[vote_summary]
    end

    subgraph "Vote Summary"
        YES[yes: count]
        NO[no: count]
        ABS[abstain: count]
    end

    subgraph "Individual Votes"
        V1[Vote 1: Country, vote, statement]
        V2[Vote 2: Country, vote, statement]
        V3[...]
        V195[Vote 195: Country, vote, statement]
    end

    ROOT --> META
    ROOT --> VOTES
    META --> ID
    META --> TS
    META --> MODEL
    META --> TOTAL
    META --> SUMMARY
    SUMMARY --> YES
    SUMMARY --> NO
    SUMMARY --> ABS
    VOTES --> V1
    VOTES --> V2
    VOTES --> V3
    VOTES --> V195

    style ROOT fill:#8b5cf6
    style META fill:#6366f1
    style VOTES fill:#22c55e
```

## Case Study: Gaza Ceasefire Resolution

The Space includes a case study demonstrating the system with a Gaza ceasefire resolution voted on by all 195 agents.

### Results Overview

```mermaid
pie title Vote Distribution (195 Countries)
    "Yes" : 190
    "No" : 3
    "Abstain" : 2
```

**Key Statistics:**
- **Yes:** 190 countries (97.4%)
- **No:** 3 countries (1.5%)
- **Abstain:** 2 countries (1.0%)

This serves as a concrete example of the framework in action, showing how generic prompts + model knowledge produce diverse, country-specific diplomatic responses.

## Research Applications

- Testing LLM knowledge of international relations
- Evaluating structured output consistency
- Studying emergent behavior in multi-agent systems
- Educational demonstrations of diplomatic complexity

## Limitations

This is a simulation for research and education:
- AI positions based on training data, not actual policies
- Does NOT predict real government decisions
- Should NOT be considered authoritative
- Real diplomacy involves classified information and human judgment

## Open Source

All code, prompts, and data available on GitHub:

- Repository: https://github.com/danielrosehill/AI-Agent-UN
- System Prompts: https://github.com/danielrosehill/AI-Agent-UN/tree/main/agents/representatives
- Execution Script: https://github.com/danielrosehill/AI-Agent-UN/blob/main/scripts/run_motion.py

---

Built with Gradio | Powered by Anthropic Claude