8bit-threshold-computer

A Turing-complete CPU implemented entirely as threshold logic gates, with 8-bit and 32-bit ALU support.

Every logic gate is a threshold neuron: output = 1 if (Σ wᵢxᵢ + b) ≥ 0 else 0

8-bit CPU:   8,290,134 params (full) / 32,397 params (pure ALU)
32-bit ALU:  202,869 params (1KB scratch memory)

What Is This?

A complete processor where every operation, from Boolean logic to arithmetic to control flow, is implemented using only weighted sums and step functions. No traditional gates.

Component 8-bit CPU 32-bit ALU
Registers 4 × 8-bit N/A (pure computation)
Memory 0B–64KB configurable 1KB scratch
ALU 16 ops @ 8-bit ADD, SUB, MUL, DIV, CMP, bitwise, shifts
Precision 0–255 0–4,294,967,295
Flags Z, N, C, V Carry/overflow
Control Full ISA Stateless

Turing complete. The 8-bit CPU is verified with loops, conditionals, recursion, and self-modification. The 32-bit ALU extends arithmetic to practical ranges (0–4B) where 8-bit (0–255) is insufficient.


Execution Model

A self-contained, autonomous computational machine:

  • Pure tensor computation: State in, state out
  • Frozen circuits: Integer weights, Heaviside activation
  • ACT execution: Internal loop until HALT
  • No external orchestration: One forward pass = complete program execution

            ┌─────────────────────────────┐
            │      Initial State          │
            │  [PC|Regs|Flags|Memory...]  │
            └─────────────┬───────────────┘
                          ▼
            ┌─────────────────────────────┐
            │                             │
            │   Threshold Circuit Layer   │
            │                             │
            │  ┌───────────────────────┐  │
            │  │   Fetch: PC → Instr   │  │
            │  ├───────────────────────┤  │
            │  │   Decode: Opcode/Ops  │  │
            │  ├───────────────────────┤  │
            │  │   Execute: ALU/Mem    │  │
            │  ├───────────────────────┤  │
            │  │   Writeback: Results  │  │
            │  ├───────────────────────┤  │
            │  │   PC Update           │  │
            │  └───────────┬───────────┘  │
            │              │              │
            │         ┌────▼────┐         │
            │         │ HALTED? │         │
            │         └────┬────┘         │
            │              │              │
            │         no ──┴── yes        │
            │          │       │          │
            │          ▼       ▼          │
            │       [loop]   [exit]       │
            │                             │
            └─────────────┬───────────────┘
                          ▼
            ┌─────────────────────────────┐
            │       Final State           │
            │  [PC|Regs|Flags|Memory...]  │
            └─────────────────────────────┘

Instruction Set

Opcode Mnemonic Operation
0x0 ADD R[d] = R[a] + R[b]
0x1 SUB R[d] = R[a] - R[b]
0x2 AND R[d] = R[a] & R[b]
0x3 OR R[d] = R[a] | R[b]
0x4 XOR R[d] = R[a] ^ R[b]
0x5 SHL R[d] = R[a] << 1
0x6 SHR R[d] = R[a] >> 1
0x7 MUL R[d] = R[a] * R[b]
0x8 DIV R[d] = R[a] / R[b]
0x9 CMP flags = R[a] - R[b]
0xA LOAD R[d] = M[addr]
0xB STORE M[addr] = R[s]
0xC JMP PC = addr
0xD Jcc PC = addr if cond (imm8[2:0]: 0=Z,1=NZ,2=C,3=NC,4=N,5=P,6=V,7=NV)
0xE CALL push PC; PC = addr
0xF HALT stop execution

Design Principles

  1. Autonomy: Machine runs without external logic
  2. Purity: forward(state) β†’ state', no side effects
  3. Transparency: All weights inspectable, all operations traceable
  4. Universality: Turing complete, runs arbitrary programs

Background

Threshold Logic

A threshold gate computes a Boolean function by taking a weighted sum of binary inputs and comparing it to a threshold. If the sum meets or exceeds the threshold, the output is 1; otherwise, 0. This can be expressed as a neuron with Heaviside step activation: output = 1 if (Σ wᵢxᵢ + b) ≥ 0 else 0, where the weights wᵢ and bias b are integers.

Threshold gates are strictly more powerful than standard Boolean gates. A single threshold gate can compute any linearly separable Boolean function; this includes AND, OR, NAND, NOR, and many others that require multiple levels of conventional gates. Functions that are not linearly separable (such as XOR or parity) require multiple threshold gates arranged in layers.
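
For example, 3-input majority (1 when at least two inputs are 1) is linearly separable, so a single gate with weights [1, 1, 1] and bias -2 computes it. A quick stand-alone check:

import torch

def threshold_gate(x, w, b):
    """Single threshold neuron: 1 if w·x + b >= 0 else 0."""
    return int(torch.dot(x, w) + b >= 0)

# 3-input majority as one gate: weights [1, 1, 1], bias -2
w, b = torch.ones(3), -2.0
for bits in [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]:
    x = torch.tensor(bits, dtype=torch.float32)
    print(f"MAJ{bits} = {threshold_gate(x, w, b)}")   # 0, 1, 1, 1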

Historical Development

Warren McCulloch and Walter Pitts introduced the threshold neuron model in 1943, proving that networks of such neurons could compute any Boolean function. This work preceded both the perceptron and modern neural networks, establishing the theoretical foundation for neural computation.

The 1960s saw significant development in threshold logic synthesis. Researchers including Saburo Muroga, Robert McNaughton, and Michael Dertouzos developed algebraic methods for determining whether a Boolean function could be implemented as a single threshold gate, and if so, how to calculate appropriate weights. This work produced systematic techniques for threshold gate design but focused on individual gates rather than complete systems.

Frank Rosenblatt's Mark I Perceptron (1957-1960) implemented threshold neurons in hardware using potentiometers for weights, but it was a pattern classifier that learned its weights through training; the final weight configurations were not published. Bernard Widrow's ADALINE and MADALINE systems (1960-1963) similarly used adaptive threshold elements with weights learned via the LMS algorithm.

Hava Siegelmann and Eduardo Sontag proved in the 1990s that recurrent neural networks are Turing complete. Their construction, however, relied on continuous sigmoid activation functions with infinite precision, not the discrete step function used in threshold logic. Other theoretical work on neural Turing machines and differentiable computers followed similar patterns: proving computational universality using continuous, differentiable activations suitable for gradient-based training.

Neuromorphic Hardware

Modern neuromorphic processors implement large arrays of configurable threshold-like neurons in silicon:

Intel Loihi (2017) provides 128 neuromorphic cores with programmable synaptic weights, spike-based communication, and on-chip learning. The architecture supports integer weights and configurable neuron dynamics.

IBM TrueNorth (2014) integrates one million neurons and 256 million synapses in a 4096-core array. Each neurosynaptic core implements 256 neurons with configurable weights and thresholds. The chip was designed as an alternative to von Neumann architecture rather than an implementation of one.

BrainChip Akida (2021) targets edge deployment with event-based processing and integer weights. The architecture supports standard neural network operations mapped onto neuromorphic primitives.

SpiNNaker (University of Manchester) uses ARM processor cores to simulate spiking neural networks at scale. The platform has hosted various neural models but is simulation-based rather than native neuromorphic silicon.

Despite the availability of these platforms, published work has focused on neural network inference, sensory processing, and pattern recognition. A 2024 paper demonstrated basic logic gates, adders, and decoders on SpiNNaker and Dynap-SE1, describing this as "a first step toward the construction of a spiking computer"; the implementation lacked instruction fetch, program counter, memory systems, and control logic.

This Implementation

The weights in this repository implement a complete 8-bit computer: registers, ALU with 16 operations, status flags, conditional branching, subroutine calls, stack operations, and memory access. Every component is built from threshold neurons with integer weights. The weight configurations are published in safetensors format for direct loading and deployment.


Circuit Categories

Category Circuits Examples
Boolean 9 AND, OR, NOT, NAND, NOR, XOR, XNOR, IMPLIES, BIIMPLIES
Arithmetic 18 Half/full adder, 2/4/8-bit ripple carry, comparators
ALU 3 8-bit ALU, control decoder, flag computation
Combinational 10 MUX (2:1, 4:1, 8:1), DEMUX, encoders, decoders
Control Flow 16 JMP, conditional jumps, CALL, RET, PUSH, POP
Error Detection 11 Parity (XOR tree), checksum, CRC, Hamming
Modular 11 Divisibility by 2-12 (multi-layer for non-powers-of-2)
Threshold 13 k-of-n gates, majority, minority, exactly-k
Pattern 10 Popcount, leading/trailing ones, symmetry
Memory 3 N-bit addr decoder, 2^N×8 read mux, write cells (configurable, packed)

Usage

import torch
from safetensors.torch import load_file

tensors = load_file("neural_computer8.safetensors")

def heaviside(x):
    return (x >= 0).float()

# AND gate: fires when both inputs are 1
w = tensors['boolean.and.weight']  # [1, 1]
b = tensors['boolean.and.bias']    # [-2]

for a, b_in in [(0,0), (0,1), (1,0), (1,1)]:
    inp = torch.tensor([a, b_in], dtype=torch.float32)
    out = heaviside(inp @ w + b)
    print(f"AND({a}, {b_in}) = {int(out.item())}")

State Tensor Layout

All multi-bit fields are MSB-first (index 0 is the most-significant bit).

[ PC[N] | IR[16] | R0[8] R1[8] R2[8] R3[8] | FLAGS[4] | SP[N] | CTRL[4] | MEM[2^N][8] ]

Where N = address bits (configurable: 0-16).

Flags are ordered as: Z, N, C, V.

Control bits are ordered as: HALT, MEM_WE, MEM_RE, RESERVED.

Memory Profile Addr Bits Memory Size State Bits
Full CPU 16 64KB 524,376
Reduced 12 4KB 32,856
Scratchpad 8 256B 2,104
Registers 4 16B 184
Pure ALU 0 0B 56
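
As a bookkeeping sketch, field offsets can be derived directly from the layout above. The helper below is illustrative only; offsets follow the documented field order, and the reduced profiles in the table may pack state slightly differently.

def state_layout(addr_bits: int) -> dict:
    """Bit slices for each field of the state vector, in the documented MSB-first order."""
    fields = [("PC", addr_bits), ("IR", 16),
              ("R0", 8), ("R1", 8), ("R2", 8), ("R3", 8),
              ("FLAGS", 4), ("SP", addr_bits), ("CTRL", 4),
              ("MEM", (2 ** addr_bits) * 8)]
    layout, offset = {}, 0
    for name, width in fields:
        layout[name] = slice(offset, offset + width)
        offset += width
    layout["total_bits"] = offset
    return layout

print(state_layout(16)["total_bits"])   # 524376 (Full CPU row above)
print(state_layout(0)["total_bits"])    # 56 (Pure ALU row above)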

Instruction Encoding (16-bit)

All instruction bits are MSB-first.

15..12  11..10  9..8  7..0
opcode  rd      rs    imm8

Interpretation:

  • R-type: rd = rd op rs (imm8 ignored).
  • I-type: rd = op rd, imm8 (rs ignored).
  • Address-extended: LOAD, STORE, JMP, JZ, CALL consume the next word as a 16-bit address (big-endian). imm8 is reserved, and the PC skips 4 bytes when the jump is not taken.
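
A small packing helper makes the field order concrete. The opcode map follows the Instruction Set table; the function itself is illustrative and not part of the repository's tooling.

# Minimal encoder sketch for the 16-bit format above (fields MSB-first).
OPCODES = {"ADD": 0x0, "SUB": 0x1, "AND": 0x2, "OR": 0x3, "XOR": 0x4,
           "SHL": 0x5, "SHR": 0x6, "MUL": 0x7, "DIV": 0x8, "CMP": 0x9,
           "LOAD": 0xA, "STORE": 0xB, "JMP": 0xC, "Jcc": 0xD,
           "CALL": 0xE, "HALT": 0xF}

def encode(mnemonic: str, rd: int = 0, rs: int = 0, imm8: int = 0) -> int:
    """Pack opcode[15:12] | rd[11:10] | rs[9:8] | imm8[7:0] into one word."""
    return (OPCODES[mnemonic] << 12) | ((rd & 0x3) << 10) | ((rs & 0x3) << 8) | (imm8 & 0xFF)

print(hex(encode("ADD", rd=1, rs=2)))   # 0x600  -> R1 = R1 + R2
print(hex(encode("HALT")))              # 0xf000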

Verification

python eval.py
python threshold_cpu.py

Verification Status

Category Status Notes
Boolean gates Exhaustively tested All 2^n input combinations
Arithmetic Exhaustively tested Full 8-bit range
ALU Exhaustively tested All operations, all inputs
Control flow Exhaustively tested Branch/jump conditions
Threshold Exhaustively tested k-of-n, majority, etc.
Modular (mod 3,5,6,7,9,10,11,12) Exhaustively tested Multi-layer, hand-constructed
Parity Exhaustively tested XOR tree, hand-constructed
Modular (mod 2,4,8) Exhaustively tested Single-layer, trivial

The modular arithmetic circuits for non-powers-of-2 and the parity circuits were hand-constructed because:

  • Divisibility by 3, 5, etc. is not linearly separable in binary
  • 8-bit parity (XOR of all bits) requires a tree of XOR gates

All circuits pass exhaustive testing over their full input domains.


Tensor Naming Convention

{category}.{circuit}[.{layer}][.{component}].{weight|bias}

Examples:
  boolean.and.weight
  boolean.xor.layer1.neuron1.weight
  arithmetic.ripplecarry8bit.fa7.ha2.sum.layer1.or.weight
  modular.mod5.layer2.eq3.weight
  error_detection.paritychecker8bit.stage2.xor1.layer1.nand.bias

Memory circuits are stored as packed tensors to keep the safetensors header size manageable
(e.g., `memory.addr_decode.weight`, `memory.read.and.weight`, `memory.write.and_old.weight`).
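
As a quick way to see the convention in practice, categories can be tallied straight from the tensor names (assuming the full 8-bit CPU file used in the Usage section above):

from collections import Counter
from safetensors.torch import load_file

tensors = load_file("neural_computer8.safetensors")

# The first dotted component is the category (boolean, arithmetic, alu, memory, ...)
counts = Counter(name.split(".")[0] for name in tensors)
for category, n in sorted(counts.items()):
    print(f"{category:16s} {n:5d} tensors")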

Hardware Compatibility

All weights are integers. All activations are Heaviside step. Designed for:

  • Intel Loihi β€” Neuromorphic research chip
  • IBM TrueNorth β€” 1M neuron chip
  • BrainChip Akida β€” Edge neuromorphic processor

LLM Integration

The threshold circuits can be embedded into transformer MLP layers to give LLMs exact arithmetic capability.

For LLM integration, use --memory-profile none to generate a pure ALU model (~32K params) without memory circuits.

Core Thesis

Standard LLMs fail at arithmetic because they are interpolators: they approximate functions over training distributions rather than compute exact results. A 360M-parameter model trained on internet text has seen "127 + 128 = 255" rarely, if ever, so it guesses based on pattern matching.

We solve this by embedding frozen, proven-correct arithmetic circuits directly into the transformer's MLP layers. The circuits use threshold logic (weighted sums + step activation), which is structurally compatible with neural network layers. We train only the interface layers that learn to:

  1. Extract operands from token embeddings
  2. Route computation through the circuits
  3. Inject results back into the residual stream

The model learns call dispatch, not arithmetic. The arithmetic is already solved.

Target Model: SmolLM2-360M-Instruct

We use HuggingFace's SmolLM2-360M-Instruct as our base model. See llm_integration/SMOLLM2_ARCHITECTURE.md for the complete technical analysis.

Property Value
Parameters 361.82M
Hidden Dimension 960 (matches extractor input)
Layers 32 transformer blocks
Attention 15 query heads, 5 KV heads (GQA)
MLP SwiGLU (960→2560→960)
Position Encoding RoPE (theta=100k, max 8192)

Key insight: The hidden dimension of 960 exactly matches our extractor requirements; no projection layer is needed.

Tokenization: Digits are tokenized individually ("47 + 86" → ['4', '7', ' +', ' ', '8', '6']), with digit token IDs following token_id = 32 + digit_value. This enables position-based operand extraction.

Hidden State Extraction: Layer 31 (final, pre-LM-head) provides well-normalized representations (std=1.34) ideal for bit extraction. All 33 hidden state outputs are available (embedding + 32 layers).

Architecture

Standard MLP block with parallel circuit path:

x ──┬── MLP path ────────────────┬── + ── output
    │                            │
    └── BitExtractor ── Circuit ─┴── BitInjector
                          │
                       Router (learned weighting)

Augmented MLP forward pass:

def forward(self, x):  # x: [batch, seq, d_model=960]
    # Original MLP path (unchanged)
    mlp_out = self.down_proj(silu(self.gate_proj(x)) * self.up_proj(x))

    # Circuit path (new)
    a_bits, b_bits = self.bit_extractor(x)       # [batch, seq, 8] each
    result_bits, carry = self.circuits.add_8bit(a_bits, b_bits)
    flags = self.compute_flags(result_bits, carry)
    circuit_delta = self.bit_injector(result_bits, flags)

    # Routing
    route_weights = self.router(x)  # [batch, seq, 2] softmax

    # Combine
    return mlp_out + route_weights[..., 1:2] * circuit_delta
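
For a concrete sense of the shapes involved, the routing combine at the end of that forward pass can be exercised stand-alone (the random tensors here are placeholders, not real model outputs):

import torch

batch, seq, d_model = 2, 16, 960
mlp_out       = torch.randn(batch, seq, d_model)   # original MLP path
circuit_delta = torch.randn(batch, seq, d_model)   # injected circuit result
route_logits  = torch.randn(batch, seq, 2)         # router output before softmax

route_weights = torch.softmax(route_logits, dim=-1)          # [batch, seq, 2]
output = mlp_out + route_weights[..., 1:2] * circuit_delta   # [batch, seq, 960]
print(output.shape)   # torch.Size([2, 16, 960])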

Threshold Logic Fundamentals

A threshold gate computes:

output = 1  if  (Σ wᵢxᵢ + b) ≥ 0
         0  otherwise

Example gates:

AND: w=[1,1], b=-2
  AND(0,0) = H(-2) = 0
  AND(1,1) = H(0)  = 1

OR: w=[1,1], b=-1
  OR(0,1) = H(0) = 1
  OR(1,1) = H(1) = 1

XOR: requires 2 layers (not linearly separable)
  Layer 1: OR + NAND
  Layer 2: AND

Full adder = 2 half-adders + carry OR, ~4 threshold layers. 8-bit ripple carry = 8 chained full adders, ~32 threshold layers.
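
To make the two-layer XOR construction concrete, here is a stand-alone sketch; the OR and AND weights match the examples in this section, and the NAND weights are the standard single-gate choice (w = [-1, -1], b = 1):

import torch

def gate(x, w, b):
    """Single threshold neuron: Heaviside(w·x + b)."""
    return (torch.dot(x, torch.tensor(w, dtype=torch.float32)) + b >= 0).float()

def xor(a, b):
    x = torch.tensor([a, b], dtype=torch.float32)
    or_out   = gate(x, [1.0, 1.0], -1.0)    # OR:   w=[1,1], b=-1
    nand_out = gate(x, [-1.0, -1.0], 1.0)   # NAND: w=[-1,-1], b=1
    hidden = torch.stack([or_out, nand_out])
    return gate(hidden, [1.0, 1.0], -2.0)   # AND:  w=[1,1], b=-2

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(f"XOR({a}, {b}) = {int(xor(a, b))}")   # 0, 1, 1, 0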

Interface Layers (Trainable)

Extractor - Extracts operands and operation from LLM hidden states:

class Extractor(nn.Module):
    """Attention pooling + per-bit extraction networks."""

    def __init__(self, hidden_dim=960):
        super().__init__()
        self.attention_pool = AttentionPooling(hidden_dim, num_heads=4)
        self.a_extractor = MultiHeadBitExtractor(hidden_dim)  # 8 separate bit networks
        self.b_extractor = MultiHeadBitExtractor(hidden_dim)
        self.op_router = nn.Sequential(
            nn.Linear(hidden_dim, 256), nn.GELU(),
            nn.Linear(256, 6)  # 6 operations
        )

    def forward(self, hidden_states, attention_mask):
        pooled = self.attention_pool(hidden_states, attention_mask)  # (batch, 960)
        a_bits, _ = self.a_extractor(pooled)  # (batch, 8)
        b_bits, _ = self.b_extractor(pooled)  # (batch, 8)
        op_logits = self.op_router(pooled)    # (batch, 6)
        return a_bits, b_bits, op_logits

MultiHeadBitExtractor - 8 specialized networks, one per bit:

class MultiHeadBitExtractor(nn.Module):
    def __init__(self, hidden_dim=960):
        super().__init__()
        self.bit_extractors = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden_dim, 128), nn.GELU(), nn.Linear(128, 1))
            for _ in range(8)
        ])

    def forward(self, x):
        logits = torch.cat([ext(x) for ext in self.bit_extractors], dim=-1)
        soft = torch.sigmoid(logits)
        hard = heaviside_ste(logits)
        return hard - soft.detach() + soft, logits  # STE

AttentionPooling - Learns which token positions matter:

class AttentionPooling(nn.Module):
    """CLS-token style pooling with learned attention."""

    def __init__(self, hidden_dim=960, num_heads=4):
        super().__init__()
        self.cls_token = nn.Parameter(torch.randn(1, 1, hidden_dim) * 0.02)
        self.query = nn.Linear(hidden_dim, hidden_dim)
        self.key = nn.Linear(hidden_dim, hidden_dim)
        self.value = nn.Linear(hidden_dim, hidden_dim)

Trainable Parameters

For SmolLM2-360M (hidden_dim=960):

Component Parameters Description
AttentionPooling ~3.7M 4-head attention over sequence
MultiHeadBitExtractor (×2) ~245K each 8 per-bit MLPs for A and B
OpRouter ~246K 960→256→6 MLP
Extractor Total ~4.4M Full extraction module

Alternative architectures:

  • PositionExtractor: ~1.5M (position-specific, no attention)
  • DigitExtractor: ~1.2M (predicts digits 0-9 instead of bits)

With --unfreeze_layers 4: Adds ~39.3M trainable params (top 4 transformer layers).

Gradient Flow

Heaviside has zero gradient almost everywhere, so we use a Straight-Through Estimator (STE):

class HeavisideSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return (x >= 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # pass through unchanged
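
The heaviside_ste call used in MultiHeadBitExtractor above is presumably just a thin functional wrapper around this autograd Function; a minimal version would be:

def heaviside_ste(x):
    # Hard 0/1 values in the forward pass, identity gradient in the backward pass.
    return HeavisideSTE.apply(x)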

Training Strategy

  1. Data: Random 8-bit arithmetic problems (operands 0-255, 6 operations)
  2. Loss: Multi-component BCE + CE
    • result_loss: BCE on output bits vs expected
    • a_loss, b_loss: BCE on extracted bits vs ground truth (2Γ— weight)
    • op_loss: CE on operation classification
  3. Optimizer: AdamW, lr=3e-4, gradient clipping at 1.0
  4. Curriculum: Epoch-based range expansion (0-9 → 0-99 → 0-255)
  5. Batching: 256-4096 samples per batch (VRAM-dependent)

# Example training commands
python train.py --mode router --epochs 100                    # Sanity check
python train.py --mode llm --epochs 100 --batch_size 256      # Frozen LLM
python train.py --mode llm --unfreeze_layers 4 --batch_size 4096  # Fine-tune top layers
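
A sketch of how the multi-component loss above could be assembled; the components and the 2x operand weighting come from the list, while tensor names, shapes, and the logits-vs-probabilities choices here are assumptions:

import torch.nn.functional as F

def training_loss(pred_result, a_logits, b_logits, op_logits,
                  true_result, true_a, true_b, true_op):
    """BCE on result and extracted operand bits (operand terms weighted 2x), CE on the op."""
    # pred_result: circuit output bits in [0, 1]; true_op: class indices for the 6 operations.
    result_loss = F.binary_cross_entropy(pred_result.clamp(1e-6, 1 - 1e-6), true_result)
    a_loss = F.binary_cross_entropy_with_logits(a_logits, true_a)
    b_loss = F.binary_cross_entropy_with_logits(b_logits, true_b)
    op_loss = F.cross_entropy(op_logits, true_op)
    return result_loss + 2.0 * (a_loss + b_loss) + op_loss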

Inference

At inference, Heaviside is a true step function, with no approximation. If the Extractor correctly identifies the operands, the circuit will output the correct result.

Target Performance

Condition Configuration Accuracy
Control Vanilla SmolLM2-360M 11.90%
Circuits only Ground truth bits 100.00%
Experimental LLM + Extractor + Circuits Target: 100%

The interface generalizes to all 65,536 8-bit additions once trained: no memorization, the circuits compute.

LLM Integration: Proof of Concept (In Progress)

Before proceeding with architectural extensions, we are validating the core thesis: that frozen threshold circuits can provide exact arithmetic capability to language models that otherwise fail at computation.

Baseline Evaluation

We evaluated SmolLM2-360M-Instruct on randomized 8-bit arithmetic using a generous answer extraction protocol. The model was prompted with a system message instructing it to output only numeric answers, and we accepted any correct number found in the output (first number, last number, or word-to-number conversion).

Operation SmolLM2-360M Accuracy Notes
Addition (A + B) 35.92% Best performance, still fails 2/3
Subtraction (A - B) 17.72% Poor handling of borrowing
Multiplication (A × B) 1.25% Near-total failure
Greater Than (A > B) 14.37% Often echoes expression
Less Than (A < B) 4.31% Often echoes expression
Equality (A == B) 0.28% Near-total failure
Overall Fitness 11.90% 238/2000 correct

Methodology: 2000 randomized test cases with operands uniformly sampled from [0, 255]. Ground truth computed as 8-bit arithmetic (matching the threshold circuit specification). Batch size 64, greedy decoding (temperature=0).

Key Observations:

  • Multiplication accuracy (1.25%) is essentially random guessing over the output space
  • Comparison operations fail because the model often echoes the expression rather than evaluating it
  • Even addition, the simplest operation, fails nearly two-thirds of the time on 8-bit operands
  • Performance degrades sharply as operand magnitude increases (edge cases like 127+128 are almost never correct)

These results establish the control condition for our experiment.

Experimental Design

Condition Model Configuration Target Fitness
Control Vanilla SmolLM2-360M-Instruct 11.90% (measured)
Experimental SmolLM2-360M + Frozen ThresholdALU + Trained Interface 100%

The experimental condition adds:

  1. BitEncoder (trainable): Projects hidden states β†’ 24 bits (3 Γ— 8-bit operands)
  2. OpRouter (trainable): Selects which circuit to activate based on context
  3. BitDecoder (trainable): Projects 8-bit result β†’ hidden state delta
  4. ThresholdALU (frozen): The verified circuits from this repository

Training Signal: The fitness function itself. We do not provide answer supervision; the model must learn to correctly encode operands and route to circuits such that the frozen circuits produce correct outputs. This is possible because the circuits are proven correct; the interface layers need only learn the encoding/routing protocol.

Success Criterion: If the experimental condition achieves 100% fitness on randomized arithmetic while the control remains at ~12%, this demonstrates:

  1. The frozen threshold circuits provide exact computation
  2. Neural interface layers can learn to use discrete computational substrates
  3. Small language models can achieve perfect arithmetic via architectural augmentation rather than scale

Progress

Stage 1: Circuit Validation - COMPLETE

The frozen threshold circuits achieve 100% accuracy when given correctly formatted bit inputs:

Test Result
DirectCircuitModel (ground truth bits) 100.00% on 10,000 random cases
All operations (ADD, SUB, MUL, GT, LT, EQ) 100.00% each

This confirms the circuits compute correctly. However, this was already established by eval.py.

Stage 2: LLM Baseline - COMPLETE

SmolLM2-360M-Instruct baseline on randomized 8-bit arithmetic:

Operation Accuracy
Addition 35.92%
Subtraction 17.72%
Multiplication 1.25%
Comparisons 0.28–14.37%
Overall 11.90%

Head-to-head on 50 random cases: SmolLM2 got 7/50 (14%), circuits got 50/50 (100%).

Stage 3: LLM Integration - IN PROGRESS

The challenge: train an interface that extracts operands and operations from natural language (not from pre-formatted bit inputs).

"47 + 86"
    ↓
[Language Model / Extractor]
    ↓
[a_bits, b_bits, op_logits]
    ↓
[Frozen threshold circuits]
    ↓
[Result bits] → 133

SmolLM2 Approach (llm_integration/):

Initial experiments used SmolLM2-360M-Instruct as the language understanding backbone.

Mode Description Status
--mode router Train OpRouter with ground truth bits 100% achieved
--mode interface Train BitEncoder + OpRouter Ready
--mode llm Train from LLM hidden states Explored

LLM Mode Options:

  • --unfreeze_layers N: Fine-tune top N transformer layers
  • --extract_layer N: Extract from intermediate layer (-1 = final)
  • --position_extract: Position-specific extraction (uses token positions)
  • --digit_pred: Predict digits (0-9) instead of bits

Extraction Architectures (model.py):

  • Extractor: Attention pooling + per-bit MLPs
  • PositionExtractor: Position-aware (operand A from positions 0-2, B from 5-7)
  • DigitExtractor: Predicts 3 digits per operand, converts to bits
  • HybridExtractor: Digit lookup + MLP fallback for word inputs

Curriculum Learning: Training progresses 0-9 → 0-99 → 0-255 over epochs.

Observations: SmolLM2 integration proved challenging: the 360M parameters of pre-trained representations are largely irrelevant to arithmetic parsing, VRAM requirements are high, and gradients conflict between the frozen circuits and the pre-trained weights.

Pivot: From-Scratch Extractor

Given that the task is fundamentally simple (parse (a, b, op) from structured text), a lightweight purpose-built model may be more appropriate than adapting a general LLM.

"one thousand plus two thousand"
    ↓
[Char-level tokenizer: ~40 tokens]
    ↓
[Small transformer: ~1-5M params]
    ↓
[3 heads: a_value, b_value, op_idx]
    ↓
[Frozen 32-bit threshold circuits]
    ↓
3000

Design principles:

  • Minimal Python: All parsing logic learned in weights, not hardcoded
  • Character-level input: No word tokenization; model learns "forty seven" = 47
  • From-scratch training: No pre-trained weights to conflict with
  • 32-bit target: Practical arithmetic range (0–4,294,967,295)

Planned architecture:

  • Vocab: ~40 chars (a-z, 0-9, space, operators)
  • Embedding: 40 × 128d
  • Encoder: 2-3 transformer layers
  • Output heads: a_classifier, b_classifier, op_classifier
  • Total: ~1-5M params (vs 360M for SmolLM2)

This approach treats the problem as what it is: a structured parsing task where the frozen circuits handle all computation. The extractor need only learn the mapping from text to operands; no world knowledge is required.
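
A minimal sketch of what such an extractor could look like; the dimensions follow the planned-architecture bullets above, while the class name, pooling, and the choice to predict 32 bits per operand (so the output can feed the frozen 32-bit circuits directly) are assumptions:

import torch
import torch.nn as nn

class CharExtractor(nn.Module):
    """Char-level encoder with three heads: operand A, operand B, and operation."""

    def __init__(self, vocab_size=40, d_model=128, n_layers=2, n_ops=6, n_value_bits=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, dim_feedforward=256,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.a_head = nn.Linear(d_model, n_value_bits)   # 32 bits of operand A
        self.b_head = nn.Linear(d_model, n_value_bits)   # 32 bits of operand B
        self.op_head = nn.Linear(d_model, n_ops)

    def forward(self, char_ids):                 # [batch, seq] of character IDs
        h = self.encoder(self.embed(char_ids))   # [batch, seq, d_model]
        pooled = h.mean(dim=1)                   # simple mean pooling over characters
        return self.a_head(pooled), self.b_head(pooled), self.op_head(pooled)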

Proof of Concept Scope

  • 32-bit operands (0–4,294,967,295)
  • Six operations: ADD, SUB, MUL, GT, LT, EQ
  • Structured input: Digits ("1000 + 2000") and number words ("one thousand plus two thousand")

Current Status:

  • Circuit validation: Complete (100% on 8-bit operations)
  • 32-bit circuits: Built and tested (adder verified on 1M+2M=3M, etc.)
  • LLM baseline: Measured (11.90% - establishes control condition)
  • SmolLM2 integration: Infrastructure complete, training explored
  • From-scratch extractor: Design phase

Extension Roadmap

Completed

  1. 32-bit operations (0–4,294,967,295) - Full 32-bit ALU implemented via the --bits 32 flag:

    • 32-bit ripple carry adder (32 chained full adders) - verified
    • 32-bit subtractor (NOT + adder with carry-in)
    • 32-bit multiplication (1024 partial product ANDs)
    • 32-bit division (32 restoring stages)
    • 32-bit comparators (GT, LT, GE, LE, EQ)
    • 32-bit bitwise ops (AND, OR, XOR, NOT)
    • 32-bit shifts (SHL, SHR), INC, DEC, NEG

    Known issue: Single-layer 32-bit comparators use weights up to 2³¹, which exceeds float32 mantissa precision (24 bits). Comparisons between large numbers differing only in low bits may fail (see the short demonstration after this list). Fix planned: cascaded byte-wise comparison (compare MSB first, if equal compare next byte, etc.).

  2. 3-operand addition (15 + 27 + 33 = 75) - arithmetic.add3_8bit chains two 8-bit ripple carry stages. 16 full adders, 144 gates, 240 test cases verified.

  3. Order of operations (5 + 3 × 2 = 11) - arithmetic.expr_add_mul computes A + (B × C) using shift-add multiplication then addition. 64 AND gates + 64 full adders, 73 test cases verified.
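
A short, circuit-independent demonstration of the float32 precision issue noted under item 1:

import torch

# float32 carries a 24-bit significand, so distinct integers near 2^31 collapse
# to the same representable value. A single-layer comparator whose weighted sum
# must distinguish them therefore cannot.
hi = torch.tensor(2**31,     dtype=torch.float32)
lo = torch.tensor(2**31 - 1, dtype=torch.float32)
print(hi.item(), lo.item(), bool(hi == lo))   # 2147483648.0 2147483648.0 True

# The effect starts just above 24 bits: 2^24 + 1 already rounds to 2^24.
print(torch.tensor(2**24 + 1, dtype=torch.float32).item())   # 16777216.0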

Planned

  1. Cascaded 32-bit comparators - Replace the single-layer weighted comparison with a multi-layer byte-wise cascade. Each byte comparison uses 8-bit weights (max 128), well within float32 precision. Hardware-accurate and extensible to 64-bit, 128-bit, etc.

  2. Parenthetical expressions ((5 + 3) × 2 = 16) - Explicit grouping overrides precedence. The parser must recognize parentheses and build the correct tree. Evaluation proceeds innermost-out.

  3. Multi-operation chains (a + b - c × d) - Sequential dispatch through multiple circuits with intermediate result routing. Requires state management in the interface layers.

  4. Floating point arithmetic - IEEE 754-style with separate circuits for mantissa and exponent. ADD: align exponents, add mantissas, renormalize. MUL: add exponents, multiply mantissas.

  5. Full CPU integration - Enable memory access circuits for stateful computation. Allows multi-step algorithms executed entirely within threshold logic.


Build Tool

Output filenames are auto-generated from configuration:

Format: neural_{alu|computer}{BITS}[_{MEMORY}].safetensors

Examples:
  neural_alu8.safetensors               # 8-bit, no memory
  neural_alu32.safetensors              # 32-bit, no memory
  neural_computer8.safetensors          # 8-bit, full memory (default)
  neural_computer32.safetensors         # 32-bit, full memory
  neural_computer8_small.safetensors    # 8-bit, 1KB memory
  neural_computer32_small.safetensors   # 32-bit, 1KB memory
  neural_computer8_addr12.safetensors   # 8-bit, custom 4KB (2^12 bytes)

# 8-bit CPU (default)
python build.py --apply all                                # -> neural_computer8.safetensors
python build.py -m none --apply all                        # -> neural_alu8.safetensors
python build.py -m scratchpad --apply all                  # -> neural_computer8_scratchpad.safetensors

# 16-bit ALU
python build.py --bits 16 --apply all                      # -> neural_computer16.safetensors
python build.py --bits 16 -m none --apply all              # -> neural_alu16.safetensors

# 32-bit ALU
python build.py --bits 32 -m small --apply all             # -> neural_computer32_small.safetensors
python build.py --bits 32 -m none --apply all              # -> neural_alu32.safetensors

# Custom address width
python build.py --bits 16 -a 6 --apply all                 # -> neural_computer16_addr6.safetensors

Bit widths (--bits):

Width Range Use Case
8 0–255 Full CPU, legacy
16 0–65,535 Extended arithmetic
32 0–4,294,967,295 Practical arithmetic

Memory profiles (-m):

Profile Size Addr Bits Filename Suffix Params Use Case
none 0B - (uses alu) ~32K Pure ALU
registers 16B 4 _registers ~34K Minimal state
scratchpad 256B 8 _scratchpad ~63K 8-bit scratch
small 1KB 10 _small ~123K 32-bit scratch
reduced 4KB 12 _reduced ~549K Small programs
full 64KB 16 (none) ~8.29M Full CPU

Custom address width (-a N): Memory size = 2^N bytes, suffix = _addrN


Citation

@misc{8bit-threshold-computer,
  title={8bit-threshold-computer: A Turing-Complete Threshold Logic CPU},
  author={Norton, Charles},
  year={2026},
  howpublished={Hugging Face},
  url={https://huggingface.co/phanerozoic/8bit-threshold-computer}
}

License

MIT


References

  1. McCulloch & Pitts (1943). "A Logical Calculus of Ideas Immanent in Nervous Activity"
  2. Muroga (1971). "Threshold Logic and Its Applications"
  3. Siegelmann & Sontag (1995). "On the Computational Power of Neural Nets"
  4. Bengio et al. (2013). "Estimating or Propagating Gradients Through Stochastic Neurons"
  5. Ma et al. (2024). "The Era of 1-bit LLMs" (BitNet b1.58)
  6. HuggingFace (2024). "SmolLM2: Small Language Models" - Model Card
  7. Vaswani et al. (2017). "Attention Is All You Need" - Transformer architecture
  8. Su et al. (2021). "RoFormer: Enhanced Transformer with Rotary Position Embedding" - RoPE