MALM-165M: Memory-Augmented Language Model

A 165M parameter Memory-Augmented Language Model (MALM) for semantic code search, trained on CodeParrot.

Quick Start

# Install dependencies
pip install mlx huggingface_hub numpy

# Download model
huggingface-cli download codelion/malm-165m --local-dir ./malm-165m

# Run semantic search
python malm-165m/inference.py --query "function that sorts a list"

Example output:

Query: function that sorts a list
------------------------------------------------------------

1. array_sort (score: 0.9526)
   Signature: array_sort(col)
   Docstring: Collection function: sorts the input array in ascending order...

2. sort_array (score: 0.7707)
   Signature: sort_array(col, asc)
   Docstring: Collection function: sorts the input array in ascending or descending order...

Python API

from huggingface_hub import snapshot_download
from pathlib import Path
import sys

# Download and import
model_path = snapshot_download("codelion/malm-165m")
sys.path.insert(0, model_path)

from inference import load_model, search_functions

# Load model
model, tokenizer, functions, config = load_model(Path(model_path))
print(f"Loaded {len(functions)} functions")

# Search
results = search_functions(
    model, tokenizer, functions,
    query="connect to database",
    top_k=5
)

for name, signature, docstring, score in results:
    print(f"{name}: {score:.4f}")

Model Description

MALM combines a transformer with learned memory retrieval for semantic code search:

  1. Query encoder - Encodes natural language queries into embeddings
  2. Value encoder - Encodes function signatures/docstrings
  3. Retrieval - Attention-based lookup from query to memory (sketched after this list)
  4. Memory bank - 2000 Python functions from CodeParrot
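
The retrieval step can be pictured as scaled dot-product attention from a single query embedding over the memory bank. The following is a minimal NumPy sketch of that pattern; the array names, shapes, and scoring details are illustrative assumptions, not the actual MALM implementation.

import numpy as np

d_model = 768          # from config.json
num_functions = 2000   # size of the memory bank

# q: query embedding from the query encoder, shape (d_model,)
# M: value-encoder embeddings of the memory bank, shape (num_functions, d_model)
q = np.random.randn(d_model).astype(np.float32)
M = np.random.randn(num_functions, d_model).astype(np.float32)

# Scaled dot-product scores from the query to every memory slot,
# followed by a softmax over the memory bank
scores = M @ q / np.sqrt(d_model)
weights = np.exp(scores - scores.max())
weights /= weights.sum()

top_k = np.argsort(-weights)[:5]   # indices of the best-matching functions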

Why not mlx-lm?

MALM uses a memory-augmented architecture different from standard LLMs:

  • Separate query and value encoders for retrieval
  • Requires a memory bank of functions
  • Inference is retrieval-based, not autoregressive generation

This architecture doesn't fit the standard mlx-lm generate workflow, so we provide a custom inference script instead.

Architecture

Component                   Parameters
Embedding                   11.1M
Position Embedding          0.1M
Query Encoder (4 layers)    28.4M
Value Encoder (4 layers)    28.4M
Decoder (12 layers)         85.1M
Output Projection           11.1M
Total                       ~165M

Configuration

{
  "vocab_size": 14407,
  "d_model": 768,
  "n_heads": 12,
  "n_layers": 12,
  "n_query_layers": 4,
  "max_seq_len": 128,
  "num_parameters": 165123656,
  "num_functions": 2000
}
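
Since config.json ships with the snapshot, these values can be read directly. A minimal sketch, assuming the ./malm-165m download path from the Quick Start:

import json
from pathlib import Path

# Load the model configuration from the downloaded snapshot
config = json.loads((Path("./malm-165m") / "config.json").read_text())
print(config["d_model"], config["n_layers"], config["num_functions"])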

Files

File             Description
model.npz        Model weights (MLX-compatible NumPy format)
config.json      Model configuration
tokenizer.json   Tokenizer vocabulary
functions.json   Memory bank of 2000 Python functions
inference.py     Standalone inference script
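
The memory bank in functions.json can be inspected directly. The sketch below assumes each entry carries the name/signature/docstring fields shown in the search results; adjust to the actual schema if it differs.

import json
from pathlib import Path

# Load the memory bank shipped alongside the weights
functions = json.loads((Path("./malm-165m") / "functions.json").read_text())
print(f"{len(functions)} functions in the memory bank")
print(functions[0])   # expected to expose name / signature / docstring fields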

Training

Trained on CodeParrot with a focus on Python function retrieval:

  • Encodes natural language queries into embedding space
  • Learns semantic similarity between queries and function signatures (see the sketch after this list)
  • Uses attention-based retrieval over a memory bank
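
One common way to learn this kind of query-to-function similarity is a contrastive (InfoNCE-style) objective over matched query/function pairs. The sketch below illustrates that general pattern in NumPy; it is an assumption for illustration, not the actual MALM training code.

import numpy as np

def info_nce_loss(query_emb, func_emb, temperature=0.07):
    # query_emb, func_emb: (batch, d_model); row i of each is a matched pair
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    f = func_emb / np.linalg.norm(func_emb, axis=1, keepdims=True)
    logits = (q @ f.T) / temperature           # pairwise cosine similarities
    labels = np.arange(len(q))                 # positives lie on the diagonal
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[labels, labels].mean()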

Related Work

Part of the HashHop project exploring long-context evaluation and memory-augmented architectures.

License

Apache 2.0
