Spaces:

danielrosehill
/

Agent-UN

Sleeping

danielrosehill Claude commited on Oct 9

Commit

3478ac2

1 Parent(s): f209cc2

Add visual flowcharts and diagrams to README

Enhanced README with Mermaid diagrams for better visual understanding:

New diagrams added:
1. High-Level Concept - multi-agent overview showing all 195 agents
2. System Architecture - complete data flow from input to output
3. Agent Processing Flow - sequence diagram of execution
4. Agent System Prompts - template to agents visualization
5. Validation Pipeline - decision tree for error handling
6. Execution Flow - detailed loop through 195 countries
7. Output Structure - JSON result organization
8. Case Study Results - pie chart of vote distribution

Visual improvements:
- Color-coded nodes (blue=input, purple=processing, green=success, orange=validation, red=errors)
- Clear subgraphs for logical grouping
- Arrows showing data flow
- Consistent styling throughout

Makes the technical architecture immediately understandable through visuals rather than text-only descriptions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Files changed (1) hide show

README.md +305 -23

README.md CHANGED Viewed

@@ -22,17 +22,188 @@ This is an experimental framework demonstrating:
 - **Generic prompt templates** producing country-specific behaviors
 - **Task execution model** for running resolutions through all agents
-## Architecture
-### Core Components
-**Agent System Prompts**
 - 195 country-specific agents (one per UN member state)
 - Generic template structure (identical for all countries)
 - Only country name and P5 status differ between prompts
 - AI infers policy positions from training data
-**Structured Output Schema**
 ```json
 {
   "vote": "yes" | "no" | "abstain",
@@ -40,17 +211,39 @@ This is an experimental framework demonstrating:
 }
 ```
-**Task Execution**
-- Python CLI for running simulations
-- Sequential processing of all 195 agents
-- JSON validation and error handling
-- Aggregated results with metadata
-**Model Configuration**
-- Primary: Claude 3.5 Sonnet (claude-3-5-sonnet-20241022)
-- Temperature: 0.7
-- Max tokens: 800 per response
-- Provider: Anthropic API
 ## What This Tests
@@ -62,14 +255,40 @@ This is an experimental framework demonstrating:
 ## Technical Implementation
-**Execution Flow:**
-1. Load motion text from `tasks/motions/`
-2. Load 195 country agents
-3. For each agent: system prompt + user prompt → JSON response
-4. Validate and aggregate responses
-5. Save results with metadata
-**Command Line Interface:**
 ```bash
 # Run simulation
 python scripts/run_motion.py 01_gaza_ceasefire_resolution
@@ -81,11 +300,74 @@ python scripts/run_motion.py 01_gaza_ceasefire_resolution --model claude-3-5-son
 python scripts/run_motion.py 01_gaza_ceasefire_resolution --sample 5
 ```
-## Case Study
 The Space includes a case study demonstrating the system with a Gaza ceasefire resolution voted on by all 195 agents.
-**Results:** 190 Yes, 3 No, 2 Abstain
 This serves as a concrete example of the framework in action, showing how generic prompts + model knowledge produce diverse, country-specific diplomatic responses.

 - **Generic prompt templates** producing country-specific behaviors
 - **Task execution model** for running resolutions through all agents
+### High-Level Concept
+```mermaid
+graph TB
+    subgraph "Input Layer"
+        RES[UN Resolution Text]
+    end
+    subgraph "Agent Layer - 195 Independent Agents"
+        A1[Agent: USA<br/>System Prompt]
+        A2[Agent: China<br/>System Prompt]
+        A3[Agent: Russia<br/>System Prompt]
+        ADOT[...]
+        A195[Agent: Tuvalu<br/>System Prompt]
+    end
+    subgraph "LLM Processing"
+        LLM[Claude 3.5 Sonnet<br/>Structured JSON Output]
+    end
+    subgraph "Output Layer"
+        V1[Vote: yes<br/>Statement: ...]
+        V2[Vote: no<br/>Statement: ...]
+        V3[Vote: yes<br/>Statement: ...]
+        VDOT[...]
+        V195[Vote: yes<br/>Statement: ...]
+    end
+    subgraph "Aggregation"
+        AGG[Combined Results<br/>Vote Counts + All Statements]
+    end
+    RES --> A1
+    RES --> A2
+    RES --> A3
+    RES --> ADOT
+    RES --> A195
+    A1 --> LLM
+    A2 --> LLM
+    A3 --> LLM
+    ADOT --> LLM
+    A195 --> LLM
+    LLM --> V1
+    LLM --> V2
+    LLM --> V3
+    LLM --> VDOT
+    LLM --> V195
+    V1 --> AGG
+    V2 --> AGG
+    V3 --> AGG
+    VDOT --> AGG
+    V195 --> AGG
+    style RES fill:#6366f1
+    style LLM fill:#8b5cf6
+    style AGG fill:#22c55e
+    style A1 fill:#f59e0b
+    style A2 fill:#f59e0b
+    style A3 fill:#f59e0b
+    style A195 fill:#f59e0b
+```
+## System Architecture
+```mermaid
+graph TB
+    subgraph Input
+        M[Motion Text<br/>tasks/motions/]
+        C[Country List<br/>195 UN Members]
+    end
+    subgraph "Agent Processing"
+        SP[System Prompt<br/>Generic Template]
+        UP[User Prompt<br/>+ Motion Text]
+        LLM[Claude 3.5 Sonnet<br/>Temperature: 0.7]
+    end
+    subgraph "Output Validation"
+        JSON[JSON Parser]
+        V[Schema Validator]
+        E[Error Handler]
+    end
+    subgraph Results
+        AGG[Aggregated Results]
+        META[Metadata]
+        FILE[JSON Output File]
+    end
+    M --> UP
+    C --> SP
+    SP --> LLM
+    UP --> LLM
+    LLM --> JSON
+    JSON --> V
+    V --> E
+    E --> AGG
+    AGG --> META
+    META --> FILE
+    style LLM fill:#6366f1
+    style JSON fill:#22c55e
+    style V fill:#f59e0b
+    style FILE fill:#8b5cf6
+```
+## Agent Processing Flow
+```mermaid
+sequenceDiagram
+    participant CLI as CLI Runner
+    participant Agent as Country Agent
+    participant LLM as Claude 3.5
+    participant Val as Validator
+    participant Store as Storage
+    CLI->>Agent: Load system prompt
+    CLI->>Agent: Send motion text
+    Agent->>LLM: System + User Prompt
+    LLM->>Agent: Raw text response
+    Agent->>Val: Parse JSON
+    alt Valid JSON
+        Val->>Val: Check schema
+        alt Valid Schema
+            Val->>Store: Save vote + statement
+        else Invalid Schema
+            Val->>Store: Save as abstain + error
+        end
+    else Invalid JSON
+        Val->>Store: Save as abstain + error
+    end
+    Store->>CLI: Continue to next country
+```
+## Core Components
+### 1. Agent System Prompts
+```mermaid
+graph LR
+    subgraph "Generic Template"
+        T[Template Structure]
+    end
+    subgraph "Variables"
+        CN[Country Name]
+        P5[P5 Status]
+    end
+    subgraph "195 Agents"
+        US[United States]
+        CN2[China]
+        RU[Russia]
+        DOT[...]
+        TV[Tuvalu]
+    end
+    T --> CN
+    T --> P5
+    CN --> US
+    CN --> CN2
+    CN --> RU
+    CN --> DOT
+    CN --> TV
+    style T fill:#6366f1
+    style US fill:#22c55e
+    style CN2 fill:#22c55e
+    style RU fill:#22c55e
+    style TV fill:#22c55e
+```
 - 195 country-specific agents (one per UN member state)
 - Generic template structure (identical for all countries)
 - Only country name and P5 status differ between prompts
 - AI infers policy positions from training data
+### 2. Structured Output Schema
 ```json
 {
   "vote": "yes" | "no" | "abstain",
 }
 ```
+### 3. Validation Pipeline
+```mermaid
+graph TD
+    A[LLM Response] --> B{Valid JSON?}
+    B -->|Yes| C{Has vote field?}
+    B -->|No| ERR1[Error: Parse Failure]
+    C -->|Yes| D{Has statement field?}
+    C -->|No| ERR2[Error: Missing Vote]
+    D -->|Yes| E{Vote is yes/no/abstain?}
+    D -->|No| ERR3[Error: Missing Statement]
+    E -->|Yes| SUCCESS[Save Response]
+    E -->|No| ERR4[Error: Invalid Vote]
+    ERR1 --> DEFAULT[Save as Abstain + Error Flag]
+    ERR2 --> DEFAULT
+    ERR3 --> DEFAULT
+    ERR4 --> DEFAULT
+    style SUCCESS fill:#22c55e
+    style DEFAULT fill:#f59e0b
+    style ERR1 fill:#ef4444
+    style ERR2 fill:#ef4444
+    style ERR3 fill:#ef4444
+    style ERR4 fill:#ef4444
+```
+### 4. Model Configuration
+- **Primary:** Claude 3.5 Sonnet (claude-3-5-sonnet-20241022)
+- **Temperature:** 0.7 (balance consistency + variation)
+- **Max tokens:** 800 per response
+- **Provider:** Anthropic API
 ## What This Tests
 ## Technical Implementation
+### Execution Flow
+```mermaid
+graph TD
+    START[Start Simulation] --> LOAD_MOTION[Load Motion Text<br/>tasks/motions/motion_id.md]
+    LOAD_MOTION --> LOAD_COUNTRIES[Load Country List<br/>195 UN Members]
+    LOAD_COUNTRIES --> LOOP_START{For Each Country}
+    LOOP_START -->|Country 1-195| LOAD_PROMPT[Load System Prompt<br/>agents/representatives/country/]
+    LOAD_PROMPT --> BUILD_USER[Build User Prompt<br/>Motion + Instructions]
+    BUILD_USER --> API_CALL[API Call to Claude<br/>System + User Prompt]
+    API_CALL --> PARSE[Parse JSON Response]
+    PARSE --> VALIDATE[Validate Schema]
+    VALIDATE -->|Valid| STORE[Store Result]
+    VALIDATE -->|Invalid| ERROR[Store Error + Abstain]
+    STORE --> LOOP_START
+    ERROR --> LOOP_START
+    LOOP_START -->|All Done| AGGREGATE[Aggregate Results]
+    AGGREGATE --> CALC_STATS[Calculate Vote Summary]
+    CALC_STATS --> ADD_META[Add Metadata<br/>model, timestamp, etc]
+    ADD_META --> SAVE_TIME[Save Timestamped File<br/>motion_id_timestamp.json]
+    SAVE_TIME --> SAVE_LATEST[Save Latest File<br/>motion_id_latest.json]
+    SAVE_LATEST --> END[Complete]
+    style API_CALL fill:#6366f1
+    style VALIDATE fill:#f59e0b
+    style STORE fill:#22c55e
+    style ERROR fill:#ef4444
+    style END fill:#8b5cf6
+```
+### Command Line Interface
 ```bash
 # Run simulation
 python scripts/run_motion.py 01_gaza_ceasefire_resolution
 python scripts/run_motion.py 01_gaza_ceasefire_resolution --sample 5
 ```
+### Output Structure
+```mermaid
+graph LR
+    subgraph "JSON Output"
+        ROOT[Root Object]
+        META[Metadata]
+        VOTES[Votes Array]
+    end
+    subgraph "Metadata Fields"
+        ID[motion_id]
+        TS[timestamp]
+        MODEL[model]
+        TOTAL[total_votes]
+        SUMMARY[vote_summary]
+    end
+    subgraph "Vote Summary"
+        YES[yes: count]
+        NO[no: count]
+        ABS[abstain: count]
+    end
+    subgraph "Individual Votes"
+        V1[Vote 1: Country, vote, statement]
+        V2[Vote 2: Country, vote, statement]
+        V3[...]
+        V195[Vote 195: Country, vote, statement]
+    end
+    ROOT --> META
+    ROOT --> VOTES
+    META --> ID
+    META --> TS
+    META --> MODEL
+    META --> TOTAL
+    META --> SUMMARY
+    SUMMARY --> YES
+    SUMMARY --> NO
+    SUMMARY --> ABS
+    VOTES --> V1
+    VOTES --> V2
+    VOTES --> V3
+    VOTES --> V195
+    style ROOT fill:#8b5cf6
+    style META fill:#6366f1
+    style VOTES fill:#22c55e
+```
+## Case Study: Gaza Ceasefire Resolution
 The Space includes a case study demonstrating the system with a Gaza ceasefire resolution voted on by all 195 agents.
+### Results Overview
+```mermaid
+pie title Vote Distribution (195 Countries)
+    "Yes" : 190
+    "No" : 3
+    "Abstain" : 2
+```
+**Key Statistics:**
+- **Yes:** 190 countries (97.4%)
+- **No:** 3 countries (1.5%)
+- **Abstain:** 2 countries (1.0%)
 This serves as a concrete example of the framework in action, showing how generic prompts + model knowledge produce diverse, country-specific diplomatic responses.