Spaces:

danielrosehill
/

Agent-UN

Sleeping

App Files Files Community

Agent-UN / README.md

danielrosehill

Add banner image to README

24d65a0 2 months ago

preview code

raw

history blame contribute delete

9.95 kB

	---
	title: AI Agent UN - Multi-Agent Simulation Framework
	emoji: 🏛️
	colorFrom: blue
	colorTo: indigo
	sdk: gradio
	sdk_version: 4.44.0
	app_file: app.py
	pinned: false
	license: mit
	---

	![AI Agent UN Banner](images/banner.jpg)

	# AI Agent United Nations: Multi-Agent Simulation Framework

	A structured system for simulating international diplomatic decision-making using 195 AI agents with constrained JSON outputs.

	## System Overview

	This is an experimental framework demonstrating:
	- Multi-agent coordination across 195 independent AI agents
	- Structured output constraints with strict JSON schema validation
	- Generic prompt templates producing country-specific behaviors
	- Task execution model for running resolutions through all agents

	### High-Level Concept

	```mermaid
	graph TB
	subgraph "Input Layer"
	RES[UN Resolution Text]
	end

	subgraph "Agent Layer - 195 Independent Agents"
	A1[Agent: USA<br/>System Prompt]
	A2[Agent: China<br/>System Prompt]
	A3[Agent: Russia<br/>System Prompt]
	ADOT[...]
	A195[Agent: Tuvalu<br/>System Prompt]
	end

	subgraph "LLM Processing"
	LLM[Claude 3.5 Sonnet<br/>Structured JSON Output]
	end

	subgraph "Output Layer"
	V1[Vote: yes<br/>Statement: ...]
	V2[Vote: no<br/>Statement: ...]
	V3[Vote: yes<br/>Statement: ...]
	VDOT[...]
	V195[Vote: yes<br/>Statement: ...]
	end

	subgraph "Aggregation"
	AGG[Combined Results<br/>Vote Counts + All Statements]
	end

	RES --> A1
	RES --> A2
	RES --> A3
	RES --> ADOT
	RES --> A195

	A1 --> LLM
	A2 --> LLM
	A3 --> LLM
	ADOT --> LLM
	A195 --> LLM

	LLM --> V1
	LLM --> V2
	LLM --> V3
	LLM --> VDOT
	LLM --> V195

	V1 --> AGG
	V2 --> AGG
	V3 --> AGG
	VDOT --> AGG
	V195 --> AGG

	style RES fill:#6366f1
	style LLM fill:#8b5cf6
	style AGG fill:#22c55e
	style A1 fill:#f59e0b
	style A2 fill:#f59e0b
	style A3 fill:#f59e0b
	style A195 fill:#f59e0b
	```

	## System Architecture

	```mermaid
	graph TB
	subgraph Input
	M[Motion Text<br/>tasks/motions/]
	C[Country List<br/>195 UN Members]
	end

	subgraph "Agent Processing"
	SP[System Prompt<br/>Generic Template]
	UP[User Prompt<br/>+ Motion Text]
	LLM[Claude 3.5 Sonnet<br/>Temperature: 0.7]
	end

	subgraph "Output Validation"
	JSON[JSON Parser]
	V[Schema Validator]
	E[Error Handler]
	end

	subgraph Results
	AGG[Aggregated Results]
	META[Metadata]
	FILE[JSON Output File]
	end

	M --> UP
	C --> SP
	SP --> LLM
	UP --> LLM
	LLM --> JSON
	JSON --> V
	V --> E
	E --> AGG
	AGG --> META
	META --> FILE

	style LLM fill:#6366f1
	style JSON fill:#22c55e
	style V fill:#f59e0b
	style FILE fill:#8b5cf6
	```

	## Agent Processing Flow

	```mermaid
	sequenceDiagram
	participant CLI as CLI Runner
	participant Agent as Country Agent
	participant LLM as Claude 3.5
	participant Val as Validator
	participant Store as Storage

	CLI->>Agent: Load system prompt
	CLI->>Agent: Send motion text
	Agent->>LLM: System + User Prompt
	LLM->>Agent: Raw text response
	Agent->>Val: Parse JSON
	alt Valid JSON
	Val->>Val: Check schema
	alt Valid Schema
	Val->>Store: Save vote + statement
	else Invalid Schema
	Val->>Store: Save as abstain + error
	end
	else Invalid JSON
	Val->>Store: Save as abstain + error
	end
	Store->>CLI: Continue to next country
	```

	## Core Components

	### 1. Agent System Prompts

	```mermaid
	graph LR
	subgraph "Generic Template"
	T[Template Structure]
	end

	subgraph "Variables"
	CN[Country Name]
	P5[P5 Status]
	end

	subgraph "195 Agents"
	US[United States]
	CN2[China]
	RU[Russia]
	DOT[...]
	TV[Tuvalu]
	end

	T --> CN
	T --> P5
	CN --> US
	CN --> CN2
	CN --> RU
	CN --> DOT
	CN --> TV

	style T fill:#6366f1
	style US fill:#22c55e
	style CN2 fill:#22c55e
	style RU fill:#22c55e
	style TV fill:#22c55e
	```

	- 195 country-specific agents (one per UN member state)
	- Generic template structure (identical for all countries)
	- Only country name and P5 status differ between prompts
	- AI infers policy positions from training data

	### 2. Structured Output Schema

	```json
	{
	"vote": "yes" \| "no" \| "abstain",
	"statement": "Brief explanation (2-4 sentences)"
	}
	```

	### 3. Validation Pipeline

	```mermaid
	graph TD
	A[LLM Response] --> B{Valid JSON?}
	B -->\|Yes\| C{Has vote field?}
	B -->\|No\| ERR1[Error: Parse Failure]
	C -->\|Yes\| D{Has statement field?}
	C -->\|No\| ERR2[Error: Missing Vote]
	D -->\|Yes\| E{Vote is yes/no/abstain?}
	D -->\|No\| ERR3[Error: Missing Statement]
	E -->\|Yes\| SUCCESS[Save Response]
	E -->\|No\| ERR4[Error: Invalid Vote]

	ERR1 --> DEFAULT[Save as Abstain + Error Flag]
	ERR2 --> DEFAULT
	ERR3 --> DEFAULT
	ERR4 --> DEFAULT

	style SUCCESS fill:#22c55e
	style DEFAULT fill:#f59e0b
	style ERR1 fill:#ef4444
	style ERR2 fill:#ef4444
	style ERR3 fill:#ef4444
	style ERR4 fill:#ef4444
	```

	### 4. Model Configuration

	- Primary: Claude 3.5 Sonnet (claude-3-5-sonnet-20241022)
	- Temperature: 0.7 (balance consistency + variation)
	- Max tokens: 800 per response
	- Provider: Anthropic API

	## What This Tests

	- LLM Geopolitical Knowledge: How well models understand different countries' foreign policies
	- Structured Outputs: Consistency in producing valid JSON under constraints
	- Multi-Agent Systems: Coordinating hundreds of independent AI agents
	- Prompt Engineering: Generic templates yielding specific behaviors
	- Error Handling: Graceful degradation when agents produce invalid outputs

	## Technical Implementation

	### Execution Flow

	```mermaid
	graph TD
	START[Start Simulation] --> LOAD_MOTION[Load Motion Text<br/>tasks/motions/motion_id.md]
	LOAD_MOTION --> LOAD_COUNTRIES[Load Country List<br/>195 UN Members]
	LOAD_COUNTRIES --> LOOP_START{For Each Country}

	LOOP_START -->\|Country 1-195\| LOAD_PROMPT[Load System Prompt<br/>agents/representatives/country/]
	LOAD_PROMPT --> BUILD_USER[Build User Prompt<br/>Motion + Instructions]
	BUILD_USER --> API_CALL[API Call to Claude<br/>System + User Prompt]
	API_CALL --> PARSE[Parse JSON Response]
	PARSE --> VALIDATE[Validate Schema]
	VALIDATE -->\|Valid\| STORE[Store Result]
	VALIDATE -->\|Invalid\| ERROR[Store Error + Abstain]
	STORE --> LOOP_START
	ERROR --> LOOP_START

	LOOP_START -->\|All Done\| AGGREGATE[Aggregate Results]
	AGGREGATE --> CALC_STATS[Calculate Vote Summary]
	CALC_STATS --> ADD_META[Add Metadata<br/>model, timestamp, etc]
	ADD_META --> SAVE_TIME[Save Timestamped File<br/>motion_id_timestamp.json]
	SAVE_TIME --> SAVE_LATEST[Save Latest File<br/>motion_id_latest.json]
	SAVE_LATEST --> END[Complete]

	style API_CALL fill:#6366f1
	style VALIDATE fill:#f59e0b
	style STORE fill:#22c55e
	style ERROR fill:#ef4444
	style END fill:#8b5cf6
	```

	### Command Line Interface

	```bash
	# Run simulation
	python scripts/run_motion.py 01_gaza_ceasefire_resolution

	# With specific model
	python scripts/run_motion.py 01_gaza_ceasefire_resolution --model claude-3-5-sonnet-20241022

	# Test with sample
	python scripts/run_motion.py 01_gaza_ceasefire_resolution --sample 5
	```

	### Output Structure

	```mermaid
	graph LR
	subgraph "JSON Output"
	ROOT[Root Object]
	META[Metadata]
	VOTES[Votes Array]
	end

	subgraph "Metadata Fields"
	ID[motion_id]
	TS[timestamp]
	MODEL[model]
	TOTAL[total_votes]
	SUMMARY[vote_summary]
	end

	subgraph "Vote Summary"
	YES[yes: count]
	NO[no: count]
	ABS[abstain: count]
	end

	subgraph "Individual Votes"
	V1[Vote 1: Country, vote, statement]
	V2[Vote 2: Country, vote, statement]
	V3[...]
	V195[Vote 195: Country, vote, statement]
	end

	ROOT --> META
	ROOT --> VOTES
	META --> ID
	META --> TS
	META --> MODEL
	META --> TOTAL
	META --> SUMMARY
	SUMMARY --> YES
	SUMMARY --> NO
	SUMMARY --> ABS
	VOTES --> V1
	VOTES --> V2
	VOTES --> V3
	VOTES --> V195

	style ROOT fill:#8b5cf6
	style META fill:#6366f1
	style VOTES fill:#22c55e
	```

	## Case Study: Gaza Ceasefire Resolution

	The Space includes a case study demonstrating the system with a Gaza ceasefire resolution voted on by all 195 agents.

	### Results Overview

	```mermaid
	pie title Vote Distribution (195 Countries)
	"Yes" : 190
	"No" : 3
	"Abstain" : 2
	```

	Key Statistics:
	- Yes: 190 countries (97.4%)
	- No: 3 countries (1.5%)
	- Abstain: 2 countries (1.0%)

	This serves as a concrete example of the framework in action, showing how generic prompts + model knowledge produce diverse, country-specific diplomatic responses.

	## Research Applications

	- Testing LLM knowledge of international relations
	- Evaluating structured output consistency
	- Studying emergent behavior in multi-agent systems
	- Educational demonstrations of diplomatic complexity

	## Limitations

	This is a simulation for research and education:
	- AI positions based on training data, not actual policies
	- Does NOT predict real government decisions
	- Should NOT be considered authoritative
	- Real diplomacy involves classified information and human judgment

	## Open Source

	All code, prompts, and data available on GitHub:

	- Repository: https://github.com/danielrosehill/AI-Agent-UN
	- System Prompts: https://github.com/danielrosehill/AI-Agent-UN/tree/main/agents/representatives
	- Execution Script: https://github.com/danielrosehill/AI-Agent-UN/blob/main/scripts/run_motion.py

	---

	Built with Gradio \| Powered by Anthropic Claude