9 Hours, 199 Rounds, Zero Errors: Three Local NemoClaw AI Agents Migrated COBOL to Python Using Only Persistent Memory

Community Article Published March 18, 2026


I pointed three identical local AI agents at 50,000 lines of real COBOL banking code. They ran autonomously for nearly 9 hours — reading source files, debating architecture, writing Python, and storing 402 shared memories. Context per agent never exceeded 9K tokens. Speed never degraded. Not a single error.

This isn't a theoretical paper. Below is every number, every architectural decision, and actual code output. All inference ran on local GPUs. Zero cloud API calls. The total cost was electricity.


The Problem Nobody Talks About

Multi-agent AI frameworks are everywhere in 2026. OpenClaw has 250K+ GitHub stars. NVIDIA shipped NemoClaw at GTC. CrewAI, AutoGen, LangGraph — pick your flavor. They all work great for 15-minute demos.

Now try running three agents on a genuinely hard problem for 9 hours.

What happens? Context fills up. The conversation history that every framework stuffs into each API call grows linearly with every round. By round 30, you're sending 50K+ tokens of history per agent per turn. By round 60, you're hitting context limits. By round 100, you're either truncating history (losing decisions), paying astronomical API bills, or watching speed collapse to single-digit tokens per second as KV cache bloats.

I know this because I watched it happen. My first orchestrator (v2) used a sliding context window — keep the last N messages, evict older ones. It worked for 10 rounds. By round 25, the Architect was running at 2 tokens/second. By round 30, agents were contradicting their own decisions from 20 minutes earlier because those decisions had been evicted from context.

The fundamental flaw: using the context window as long-term memory.

The context window is working memory. It's the whiteboard in a meeting room. You don't photocopy every whiteboard from every meeting you've ever had and staple it to today's agenda. You write important decisions in a notebook and bring the notebook.

That notebook is AgentAZAll.


The Architecture: Memory-First, Not Context-First

The v3 orchestrator that ran this experiment follows one rule: only the last round goes into context. Everything else is a tool call away.

Here's what each agent sees at the start of every turn:

[System prompt — role definition, available tools, ~800 tokens]
[Phase instruction — current migration phase, ~200 tokens]
[Last round's messages — what each agent said in the previous round, ~2-6K tokens]

That's it. No conversation history. No memory injection. No sliding window. The agent's context is ~3-9K tokens on every single turn, whether it's round 1 or round 199.
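A per-turn prompt of this shape is small enough to sketch in a few lines. The function name and message layout below are hypothetical, not the actual orchestrator code:

```python
def build_turn_context(system_prompt, phase_instruction, last_round_messages):
    """Assemble the full context for one agent turn.

    Only the previous round's messages are included -- no rolling
    history, so context size stays bounded regardless of round number.
    """
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "system", "content": phase_instruction},
    ]
    for speaker, text in last_round_messages:
        messages.append({"role": "user", "content": f"[{speaker}] {text}"})
    return messages

# Round 199 looks exactly like round 1: same structure, same size class.
ctx = build_turn_context(
    "You are the Developer. Tools: recall, remember, read_file, write_file.",
    "Phase 7: Integration & Testing.",
    [("Architect", "Validate the reconciliation script against TRN_AMT."),
     ("Reviewer", "COACTUPC mapping confirmed against the copybook.")],
)
```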

"But how does the agent know what happened in round 47?"

It doesn't — unless it asks. Every agent has access to two tools:

recall(query="")       # Returns the full memory index — titles + summaries
recall(query="auth")   # Returns memories matching "auth"
remember(text="...", title="...")  # Stores a new memory

These aren't prompt injections. They're tool calls — the agent decides when to recall, what to query, and what to store. The LLM's own intelligence drives the memory access pattern.

The Developer agent at round 150 needs to know the database schema decision from round 3? It calls recall(query="database") and gets back the Architect's stored memory:

Selected PostgreSQL as the target database engine for the migration
due to its full ACID compliance, support for precise NUMERIC types,
and robust relational features, which are essential for accurately
mapping COBOL COMP/COMP-3 financial fields.

No context wasted carrying this through 150 rounds. Recalled on demand. Used once. Discarded from context. The knowledge persists in the filesystem forever.
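For an OpenAI-compatible server like llama-server, the two memory tools would be exposed as standard function schemas. A hedged sketch (the descriptions and exact schema text are assumptions, not AgentAZAll's actual definitions):

```python
# Hypothetical OpenAI-style tool schemas for the two memory tools.
MEMORY_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "recall",
            "description": "Search stored memories. An empty query returns the full index.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": [],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "remember",
            "description": "Store a durable memory for later recall.",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {"type": "string"},
                    "title": {"type": "string"},
                },
                "required": ["text", "title"],
            },
        },
    },
]
```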

Why This Works

The key insight is that most of the conversation history is irrelevant to the current turn. An agent at round 150 doesn't need to re-read the debate from round 12 about whether to use PostgreSQL vs MySQL. It needs the conclusion of that debate — a 50-word summary. AgentAZAll memories are those conclusions.

The numbers prove it:

Metric                  Context-First (v2)         Memory-First (v3)
Context at round 10     ~15K tokens                ~4K tokens
Context at round 50     ~60K+ tokens (overflow)    ~5K tokens
Context at round 199    impossible                 ~6K tokens
Speed at round 10       95 t/s                     98 t/s
Speed at round 50       2 t/s (collapsed)          104 t/s
Speed at round 199      n/a (never reached)        101 t/s

Context-first doesn't just degrade gradually. It collapses. Memory-first stays flat.


The Experiment

Source Material: AWS CardDemo

Not a toy. AWS CardDemo is a real open-source COBOL/CICS credit card management system:

  • 29 COBOL programs — online transaction processing via CICS
  • 29 copybooks — shared data structures (the COBOL equivalent of header files)
  • 21 BMS maps — terminal screen definitions
  • 55 JCL batch jobs — scheduled processing
  • ~50,000 lines of authentic mainframe code

This is what $50M COBOL migration contracts look like. EXEC CICS READ file operations. PIC 9(09) COMP-3 packed decimal fields. 88-level condition names. Pseudo-conversational transaction patterns. VSAM keyed file access. The real thing.

The Agents

Three identical models. Three distinct roles. Role-based tool access.

Agent      Model                        Active Params   GPU Pair                  Port   Tools
Architect  Nemotron-3-Nano-30B-A3B Q8   3B              A6000 + A5000             8200   recall, remember, read_file, list_files
Developer  Nemotron-3-Nano-30B-A3B Q8   3B              RTX 3090 + RTX 3090       8201   recall, remember, read_file, list_files, write_file, run_python
Reviewer   Nemotron-3-Nano-30B-A3B Q8   3B              A5000 + Quadro RTX 8000   8202   recall, remember, read_file, list_files

Critical design decision: only the Developer can write files. The Architect designs, the Reviewer validates, the Developer implements. This prevents the chaos of multiple agents overwriting each other's work.

All three run NVIDIA's Nemotron-3-Nano-30B-A3B — a Mixture-of-Experts model with 30 billion total parameters but only 3 billion active per inference step. This gives you big-model reasoning at small-model speed. Each instance runs on a pair of GPUs via llama.cpp with tensor splitting, flash attention, and Q8_0 KV cache for maximum context quality.


The Hardware

An AMD EPYC server with 8 GPUs (242GB total VRAM):

  • GPU 0: RTX A6000 (49GB)
  • GPU 1-2: RTX 3090 (24GB each)
  • GPU 3: RTX A5000 (24GB)
  • GPU 4: Quadro RTX 8000 (46GB)
  • GPU 5-7: RTX A5000 (24GB each)

Three model instances, six GPUs used, ~120GB VRAM allocated. All running llama.cpp compiled from source with CUDA 12.6. No Ollama, no LM Studio, no abstractions — raw llama-server with explicit CUDA_VISIBLE_DEVICES per instance for deterministic GPU pinning.

The Topic Configuration

Seven migration phases mapped across 160 planned rounds, with coherence probes every ~20 rounds:

  1. Discovery & Inventory (Rounds 1–20) — Analyze all programs, map dependencies
  2. Data Layer Migration (Rounds 21–45) — Copybooks → SQLAlchemy + PostgreSQL
  3. Authentication (Rounds 46–60) — CICS signon → FastAPI + JWT
  4. Account Operations (Rounds 61–85) — The monster: COACTUPC.cbl (4,236 lines)
  5. Credit Card & Transactions (Rounds 86–115) — Financial precision, audit trails
  6. Batch Processing & Reports (Rounds 116–145) — JCL → Python schedulers
  7. Integration & Testing (Rounds 146–160) — Reconciliation, cutover strategy

After round 160, the orchestrator entered "Open" mode — no more phase directives, agents self-direct. They ran until round 199 before being stopped.
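The phase schedule above is simple to express as data. A hypothetical sketch of how such a topic configuration could be looked up per round (the table mirrors the phases above; the function name is illustrative):

```python
# Phase table mirroring the topic configuration above.
PHASES = [
    (range(1, 21),    "Discovery & Inventory"),
    (range(21, 46),   "Data Layer Migration"),
    (range(46, 61),   "Authentication"),
    (range(61, 86),   "Account Operations"),
    (range(86, 116),  "Credit Card & Transactions"),
    (range(116, 146), "Batch Processing & Reports"),
    (range(146, 161), "Integration & Testing"),
]

def phase_for(round_no):
    """Return the active phase directive, or None in 'Open' mode (>160)."""
    for rounds, name in PHASES:
        if round_no in rounds:
            return name
    return None
```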


The Results

The Final Scorecard

Metric                    Value
Total rounds              199
Runtime                   8 hours 46 minutes
Python files produced     52
Lines of Python written   2,543
Total output size         236 KB
Memories stored           402
Tool calls executed       1,599
Completion tokens         3,510,906 (3.5 million)
Prompt tokens             20,788,680 (20.8 million)
Errors                    0
Context per agent         2–9K tokens (never exceeded)
Cloud API cost            $0.00

Per-Agent Breakdown

Agent      Completion Tokens   Avg Speed   Tool Calls   Files Written
Architect  1,048,820           97 tok/s    379          0 (designs only)
Developer  1,267,555           137 tok/s   606          153 write_file calls → 52 unique files
Reviewer   1,194,531           104 tok/s   614          0 (validates only)

The Developer made 153 write_file calls to produce 52 unique files. That's iteration — writing a model, getting feedback from the Reviewer, revising it, improving it. Exactly what a human developer does.


Speed: Start vs. End

The whole point of memory-first architecture is that speed doesn't degrade. Here's the proof:

Round 1:

Architect:  96 t/s
Developer: 147 t/s
Reviewer:  114 t/s

Round 199:

Architect:  89 t/s
Developer: 128 t/s
Reviewer:  105 t/s

A ~10% decrease over 9 hours — entirely attributable to the growing recall("") index (which returns all memory titles). The memory index grew from 0 to ~30KB over the run. That's it. No catastrophic degradation. No KV cache bloat. No context overflow.

Compare this to the v2 orchestrator where the Architect went from 96 t/s to 2 t/s in 25 rounds. Memory-first isn't an optimization — it's the difference between a system that works and one that doesn't.



What They Actually Built

This isn't just discussion. The agents produced a real project structure with working Python code. Here's the full output tree:

output/
├── main.py                                    # FastAPI application entry point
├── scheduler.py                               # Job scheduler (JCL replacement)
├── batch_record_builder.py                    # Batch record construction
│
├── models/
│   ├── customer.py                            # SQLAlchemy from CUSTREC copybook
│   ├── account.py                             # From CVACT01Y copybook
│   ├── user_security.py                       # Auth/security model
│   ├── transaction_detail.py                  # From CVTRA01Y copybook
│   ├── billing_statement.py                   # From CVTRA02Y
│   ├── audit_log.py                           # Audit trail
│   ├── transaction_category_balance.py        # Aggregation model
│   └── signon_request.py                      # Auth request model
│
├── services/
│   ├── account_update.py                      # Core account ops (from COACTUPC)
│   ├── account_update_validator.py            # 14-method validator skeleton
│   ├── transaction_manager.py                 # Transaction lifecycle
│   └── pseudo_conversational_validator.py     # CICS pattern → stateless HTTP
│
├── repositories/
│   ├── account_repo.py                        # Account CRUD
│   ├── transaction_detail_repo.py             # Transaction CRUD
│   └── acct/acpt_persistence.py               # CICS READ/WRITE semantics (9.8KB)
│
├── api/auth.py                                # FastAPI auth endpoints
├── app/
│   ├── routes/auth.py                         # Sign-on route (from COSGN00C)
│   ├── schemas/auth.py                        # Pydantic request/response
│   ├── models/user_security.py                # ORM model
│   ├── api/auth.py                            # Alternative auth endpoint
│   ├── enums/
│   │   ├── account_status.py                  # From 88-level conditions
│   │   ├── card_aid_condition.py              # Card AID conditions
│   │   └── user_type.py                       # User type enum
│   ├── config.py                              # Application configuration
│   └── security.py                            # JWT + BCrypt utilities
│
├── schemas/
│   ├── auth.py                                # Auth Pydantic schemas
│   └── transaction_category_balance.py        # Balance schemas
│
├── enums/
│   ├── account_status.py                      # Account status enum
│   └── credit_limit.py                        # Credit limit enum
│
├── mappings/
│   ├── account_mapping.py                     # COBOL→Python field maps
│   └── cosgn00c_mapping.py                    # Sign-on program mapping
│
├── constants/error_codes.py                   # COBOL status → HTTP codes
├── exceptions/account_exceptions.py           # Custom exceptions
├── utils/decimal_utils.py                     # COMP-3 → Decimal conversion
│
├── tasks/batch_processor.py                   # Batch processing (JCL replacement)
│
├── scripts/
│   ├── load_account.py                        # Data migration script
│   ├── verify_account_schema.py               # Schema verification
│   └── reconcile_outputs.py                   # COBOL↔Python output comparison (12KB)
│
├── tests/
│   ├── test_batch_performance.py              # Performance benchmarks
│   ├── test_batch_layout.py                   # Output format validation
│   ├── test_batch_rollback.py                 # Rollback scenario tests
│   ├── test_scheduler_cron.py                 # Scheduler tests
│   └── test_decimal_precision.py              # Financial precision tests
│
├── alembic/versions/
│   ├── 2026_03_17_01_create_user_security.py  # User security migration
│   ├── 2026_03_17_02_create_account_table.py  # Account table migration
│   └── 2026_03_17_04_create_transaction_detail_table.py
│
└── auth/security.py                           # BCrypt utilities

Code Quality: Not Just Boilerplate

The agents didn't produce generic scaffolding. They read COBOL source files, analyzed field definitions in copybooks, and produced Python code that maps specific COBOL constructs to their Python equivalents.

Example: Customer model derived from CUSTREC copybook

The COBOL copybook defines fields like:

05 CUST-ID                    PIC 9(09).
05 CUST-FIRST-NAME            PIC X(25).
05 CUST-FICO-CREDIT-SCORE     PIC 9(03).
05 CUST-DOB-YYYYMMDD          PIC X(10).
05 CUST-PRI-CARD-HOLDER-IND   PIC X(01).

The Developer agent produced a SQLAlchemy model with a from_cobol_record() classmethod that correctly maps:

  • PIC 9(09) → String(9) (preserving leading zeros, not Integer!)
  • PIC X(25) → String(25) with rstrip() for trailing COBOL spaces
  • PIC 9(03) → SmallInteger for the FICO score
  • PIC X(10) YYYYMMDD → Date with proper slice-based parsing
  • PIC X(01) Y/N indicator → Boolean with explicit 'Y' comparison

This isn't generic ORM code. This is domain-aware migration that understands COBOL's fixed-width string semantics, leading-zero preservation, and date encoding conventions.
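The mapping rules can be illustrated without the ORM. A dependency-free sketch of the same logic (the field offsets, the back-to-back layout, and the YYYY-MM-DD date format are illustrative assumptions; the actual file is a SQLAlchemy model):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Customer:
    cust_id: str          # PIC 9(09) -> str, leading zeros preserved
    first_name: str       # PIC X(25) -> trailing COBOL spaces stripped
    fico_score: int       # PIC 9(03) -> int
    dob: date             # PIC X(10) -> slice-parsed Date
    primary_holder: bool  # PIC X(01) -> explicit 'Y' comparison

    @classmethod
    def from_cobol_record(cls, rec: str) -> "Customer":
        # Illustrative offsets: the listed fields laid out back-to-back.
        return cls(
            cust_id=rec[0:9],                  # keep as string, not int
            first_name=rec[9:34].rstrip(),     # strip space padding
            fico_score=int(rec[34:37]),
            dob=date(int(rec[37:41]), int(rec[42:44]), int(rec[45:47])),
            primary_holder=rec[47:48] == "Y",
        )
```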

Example: Authentication route derived from COSGN00C

The COBOL program COSGN00C handles CICS terminal sign-on with status codes like WS-RESP-CD = 13 (user not found) and WS-RESP-CD = 2 (invalid password). The agents produced a FastAPI route that maps these directly:

# COBOL WS-RESP-CD = 13 → "User not found"
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND,
                    detail="User not found")

# COBOL WS-RESP-CD = 2/15 → "Invalid credential"
raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED,
                    detail="Invalid password")

The docstring even references the original COBOL field names: "payload.user_id corresponds to the COBOL field WS-USER-ID (X(08))."
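The status translation itself reduces to a small pure function. A hedged sketch (the success code 0, the code 99 fallback, and the exact message strings are assumptions, not the agents' actual code):

```python
from http import HTTPStatus

# Hypothetical mapping of COSGN00C CICS response codes to HTTP status
# codes; only 13 and 2/15 appear in the source material above.
RESP_CD_TO_HTTP = {
    0:  (HTTPStatus.OK, "Sign-on successful"),
    13: (HTTPStatus.NOT_FOUND, "User not found"),
    2:  (HTTPStatus.UNAUTHORIZED, "Invalid password"),
    15: (HTTPStatus.UNAUTHORIZED, "Invalid password"),
}

def map_signon_response(ws_resp_cd: int):
    """Translate a COBOL WS-RESP-CD into an (HTTP status, detail) pair."""
    return RESP_CD_TO_HTTP.get(
        ws_resp_cd,
        (HTTPStatus.INTERNAL_SERVER_ERROR, f"Unmapped WS-RESP-CD {ws_resp_cd}"),
    )
```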

Example: CICS persistence semantics

The acpt_persistence.py (9.8KB — the largest generated file) implements an AcctUpcPersistence class that mirrors CICS READ/WRITE/REWRITE semantics. It inspects a ws_change_has_occurred flag before committing — directly translating the COBOL working-storage pattern where a flag variable gates whether the record update actually executes. COBOL COMP-3 packed decimal fields are converted to Python Decimal with ROUND_HALF_UP quantization — the correct financial rounding mode.
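A COMP-3 decoder in this style fits in a few lines. A minimal sketch of the unpacking, not the agents' actual decimal_utils.py:

```python
from decimal import Decimal, ROUND_HALF_UP

def unpack_comp3(raw: bytes, scale: int = 2) -> Decimal:
    """Decode a COBOL COMP-3 packed-decimal field into a Decimal.

    Each byte holds two BCD digits; the final nibble is the sign
    (0xC/0xF positive, 0xD negative). `scale` is the implied number
    of decimal places from the PIC clause (e.g. PIC S9(7)V99 -> 2).
    """
    nibbles = []
    for byte in raw:
        nibbles.append((byte >> 4) & 0x0F)
        nibbles.append(byte & 0x0F)
    sign_nibble = nibbles.pop()              # last nibble is the sign
    sign = "-" if sign_nibble == 0x0D else ""
    value = Decimal(sign + "".join(str(d) for d in nibbles)).scaleb(-scale)
    # ROUND_HALF_UP quantization -- the financial rounding mode.
    return value.quantize(Decimal(10) ** -scale, rounding=ROUND_HALF_UP)
```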

Example: Self-auditing reconciliation

Without being explicitly told to, the agents produced reconcile_outputs.py (12KB) — a script that runs both the legacy COBOL batch and the new Python batch, compares outputs field-by-field on TRN_ID, TRN_AMT, and TRN_STATUS, and generates a PDF diff report highlighting discrepancies in red. This is exactly what a real migration cutover requires, and the agents decided to build it themselves.


The Memory Architecture in Detail

402 Memories, 0 Conflicts

Every memory is a text file on the filesystem. No database. No vector store. No embedding index. Just directories:

agents/
├── architect/data/.../remember/
│   ├── customer-model-mapping.txt
│   ├── phase2_decisions_summary.txt
│   ├── auth-decisions.txt
│   ├── batch-pytest-modules-plan.txt
│   ├── full-migration-decisions.txt
│   └── ... (77 files)
├── developer/data/.../remember/
│   ├── cosgn00c-overview.txt
│   ├── db-choice.txt
│   ├── user-security-model-create.txt
│   └── ... (93 files)
└── reviewer/data/.../remember/
    ├── customer-record-field-mappings.txt
    ├── cics-transaction-mapping.txt
    ├── phase1_dependency_graph_initial.txt
    ├── shadow-mode-audit-tables.txt
    ├── cutover-success-criteria.txt
    └── ... (154 files)

The Reviewer stored 154 memories — nearly double the Architect's 77. This makes sense: the Reviewer's role is to cross-reference every decision and implementation against the COBOL source. It needs to remember more context to do its job. The memory distribution emerged naturally from the agents' roles, not from any engineered bias.

You can cat any memory. You can grep across them. You can git commit them. If AgentAZAll disappeared tomorrow, you'd still have 402 perfectly readable text files containing every architectural decision the agents made.
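The storage model is simple enough to sketch in full. An illustrative store, not AgentAZAll's implementation — the class name, file naming, and substring matching are all assumptions:

```python
from pathlib import Path

class MemoryStore:
    """Flat-file memory store: one .txt per memory, title as filename."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def remember(self, title: str, text: str) -> Path:
        path = self.root / f"{title}.txt"
        path.write_text(text, encoding="utf-8")
        return path

    def recall(self, query: str = "") -> dict:
        """Empty query -> full index; otherwise title/body substring match."""
        hits = {}
        for f in sorted(self.root.glob("*.txt")):
            body = f.read_text(encoding="utf-8")
            if not query or query.lower() in (f.stem + " " + body).lower():
                hits[f.stem] = body
        return hits
```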

A Real Memory: Cross-Agent Knowledge Flow

Here's customer-record-field-mappings.txt from the Reviewer's memory store — stored around round 5:

Mapping of COBOL customer record fields from CUSTREC.cpy to Python types
for the new PostgreSQL data layer:

- CUST-ID               PIC 9(09)   -> int
- CUST-FIRST-NAME       PIC X(25)   -> str (max 25)
- CUST-MIDDLE-NAME      PIC X(25)   -> str (max 25)
- CUST-LAST-NAME        PIC X(25)   -> str (max 25)
- CUST-ADDR-LINE-1      PIC X(50)   -> str (max 50)
...
- CUST-FICO-CREDIT-SCORE PIC 9(03)  -> int (3-digit score)
- FILLER                PIC X(168)  -> ignored/skip

And here's full-migration-decisions.txt from the Architect, stored around round 60:

Decision: auth-schemas-create: Create Pydantic schemas SignOnRequest and
  SignOnResponse in output/app/schemas/auth.py.
Decision: bcrypt-utilities-create: Add BCrypt utilities hash_password and
  verify_password to output/auth/security.py.
Decision: customer-model-add-pwd-hash: Extend Customer model with column
  cust_pwd_hash: SAString(64).
Decision: signon-endpoint-implementation: Implement POST /auth/signon
  endpoint that validates payload, looks up Customer, verifies password,
  and maps COBOL WS-RESP-CD to HTTP status codes.

These are the architectural breadcrumbs that let the Developer at round 100 still know exactly what data types to use and which endpoints to implement — without any of this occupying context during rounds 6–99.

How Knowledge Flows Between Agents

The orchestrator doesn't inject one agent's memories into another. Each agent has its own memory store. But because all three agents share the same AgentAZAll root directory, recall("") returns the merged index across all agents.

This means:

  1. Round 3: Architect stores "Selected PostgreSQL for VSAM migration due to ACID compliance and NUMERIC types"
  2. Round 15: Developer calls recall(query="database"), gets back the Architect's decision
  3. Round 15: Developer writes customer.py using String(9) for CUST-ID — preserving leading zeros per the Architect's insight
  4. Round 16: Reviewer calls recall(query="customer"), sees both the Architect's schema decision and the Developer's implementation note
  5. Round 16: Reviewer reads the generated customer.py, validates it against the COBOL copybook, stores a confirmation memory

No agent ever saw the full conversation history. No context was wasted carrying messages from round 3 through rounds 4–14. The knowledge was stored once, recalled when needed, and the context stayed lean.
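The merged index across per-agent stores can be sketched as a single directory scan. The layout below (agent/remember/*.txt under a shared root) is illustrative, not AgentAZAll's exact structure:

```python
from pathlib import Path

def merged_index(root):
    """Collect every agent's memory titles into one shared index.

    Mirrors the behaviour described above: stores are per-agent,
    but an empty recall query sees all of them.
    """
    index = {}
    for path in sorted(Path(root).glob("*/remember/*.txt")):
        agent = path.parts[-3]            # e.g. "architect"
        index.setdefault(agent, []).append(path.stem)
    return index
```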


Why Mixture-of-Experts Matters Here

NVIDIA's Nemotron-3-Nano-30B-A3B is a Mixture-of-Experts model: 30 billion total parameters, but only 3 billion active per inference step. The router selects which expert sub-networks to activate based on the input.

For multi-agent orchestration, this has three concrete advantages:

1. Speed

With only 3B parameters active, inference is fast: 97–137 tok/s sustained across three concurrent instances. A dense 30B model would run at ~15–25 tok/s on the same hardware. Over 199 rounds × 3 agents, the cumulative time savings are enormous — this run took 8h 46m. With a dense 30B it would have taken 40+ hours.

2. Memory Efficiency

Three instances of a dense 30B model would require ~180GB VRAM (60GB each in Q8). Three instances of the MoE 30B require ~120GB — because the non-active experts don't participate in the forward pass. This is the difference between fitting on our hardware and not fitting.

3. Quality-at-Speed

The 3B active parameters still route through 30B total knowledge. The model's understanding of COBOL, SQLAlchemy, FastAPI, financial precision, and CICS semantics is that of a 30B model. The generated code quality confirms this — correct COMP-3 handling, proper financial rounding modes, accurate COBOL field-name references in docstrings.


What We Didn't Do (And What Comes Next)

Honest Limitations

The agents analyzed 11 of 29 COBOL programs in depth. They produced working code for authentication, account management, transactions, batch processing, and data migration. They did not cover the entire codebase — programs like billing (COBIL00C), admin (COADM01C), reporting (CORPT00C), and several user management screens were not reached.

The agents also entered a loop in the final ~15 "Open" rounds, repeatedly re-reading COACTUPC.cbl without producing new files. Without a project manager agent to detect stalls and redirect work, they exhausted their momentum.

What a Production Setup Adds

The orchestrator was built as minimal scaffolding — three agents, round-robin turns, phase directives. A production migration would add:

  • Project Manager agent — maintains a task board, assigns specific programs to specific rounds, detects stalls and forces phase advancement
  • QA agent — runs the generated tests, reports failures back to the Developer, triggers fix cycles
  • Watchdog agent — monitors for loops (same file read 3+ times without new output) and redirects work to unprocessed programs
  • Integration agent — ensures cross-module imports resolve, runs the full test suite, validates the project builds

What This Actually Proved

The point was never to complete the entire migration with a minimal 3-agent scaffold. It was to prove that:

  1. Memory-first architecture works at scale — 402 memories, 9 hours, context flat
  2. Local MoE models are production-viable — 97–137 t/s sustained, zero errors
  3. Cross-agent knowledge sharing works — Architect designs, Developer recalls and implements, Reviewer validates
  4. Tool-calling agents produce real artifacts — 52 Python files, not just conversation
  5. The protocol doesn't degrade — round 199 runs at the same speed as round 1

These are the receipts. The infrastructure works. Now it scales.


How AgentAZAll Works

AgentAZAll is an open-source Python package (AGPL-3.0). Core dependencies: zero (Python stdlib only).

pip install agentazall

Memory

# Store a memory
agentazall remember --text "PostgreSQL chosen for VSAM migration" --title "db-choice"

# Recall all memories
agentazall recall

# Recall with query
agentazall recall --query "database"

Memories are text files in a date-organized directory structure. They survive restarts, context resets, and model swaps. An agent running Nemotron today can recall memories stored by a Qwen agent last week.

Identity & State

agentazall whoami                    # Who am I?
agentazall doing                     # What was I working on?
agentazall doing --set "Migrating COACTUPC account update logic"
agentazall note handoff --set "..."  # Save detailed state for next session

Ed25519 cryptographic keypairs. Messages are PGP-style signed. Identity is verifiable.

Communication — Three Transports, One Interface

Transport Protocol Best For
AgentTalk HTTPS REST Modern setups, zero config
Email SMTP + IMAP Universal compatibility
FTP FTP/FTPS File-heavy workflows

All three are self-hostable and interchangeable. Switch transports by changing one config line. Free public relay included — zero-knowledge, RAM-only, messages auto-delete on retrieval.

Integration with OpenClaw / NemoClaw

One SKILL.md file teaches the agent framework to use AgentAZAll's CLI:

mkdir -p ~/.openclaw/skills/agentazall/
# Drop SKILL.md → agent gains remember/recall/send/inbox tools

Full tutorial: OpenClaw/NemoClaw Integration


Reproduce This

Everything is open source. The orchestrator, topic configuration, and all results are published.

# Clone
git clone https://github.com/cronos3k/AgentAZAll.git
cd AgentAZAll/examples/multi-agent-discussion

# Install
pip install agentazall

# Get the COBOL source
git clone https://github.com/aws-samples/aws-mainframe-modernization-carddemo.git carddemo

# Configure your models in orchestrator_v3.py
# Start llama-server instances on your GPUs
# Run
python orchestrator_v3.py --topic carddemo-migration

Scale down to a single GPU with a 7B model. Scale up to a cluster with 70B models. The architecture doesn't change — only the speed and quality.

Minimum Requirements

  • Python 3.10+
  • pip install agentazall
  • llama.cpp with CUDA (or any OpenAI-compatible local inference server)
  • One GPU with 8GB+ VRAM (for a small model)
  • The CardDemo COBOL repo (Apache 2.0)

The Number That Matters Most

If you take one thing from this post:

24.3 million tokens processed. Context never exceeded 9K.

That ratio — 24 million tokens of total work flowing through a 9K-token window — is only possible because the context window stopped pretending to be memory, and actual memory took over.

The context window is not your notebook. It's your desk. Keep your desk clean. Put your notes in the filing cabinet. Pull them out when you need them.

AgentAZAll is the filing cabinet.

402 memories. 52 files. 199 rounds. 9 hours. Zero errors. $0.00.


Links


Every number in this post comes from a single uninterrupted run on 2026-03-17/18 on an AMD EPYC server with 8 GPUs. The full log, all 52 Python files, and all 402 memory files are available for download: carddemo-agentazall-results.zip (413KB). Run it yourself.

Built by Gregor Koch. Powered by NVIDIA Nemotron, llama.cpp, and the conviction that AI agents deserve better than a context window for a brain.

tags:

  • AgentAZAll
  • OpenClaw
  • NemoClaw
  • NVIDIA
  • Nemotron
  • multi-agent
  • local-llm
  • persistent-memory
  • agentic-ai
  • llama.cpp
  • self-hosted
  • COBOL
  • migration
  • MoE
  • enterprise
