# MALM-165M: Memory-Augmented Language Model
A 165M parameter Memory-Augmented Language Model (MALM) for semantic code search, trained on CodeParrot.
## Quick Start

```bash
# Install dependencies
pip install mlx huggingface_hub numpy

# Download model
huggingface-cli download codelion/malm-165m --local-dir ./malm-165m

# Run semantic search
python malm-165m/inference.py --query "function that sorts a list"
```
Example output:
```text
Query: function that sorts a list
------------------------------------------------------------
1. array_sort (score: 0.9526)
   Signature: array_sort(col)
   Docstring: Collection function: sorts the input array in ascending order...
2. sort_array (score: 0.7707)
   Signature: sort_array(col, asc)
   Docstring: Collection function: sorts the input array in ascending or descending order...
```
## Python API

```python
from huggingface_hub import snapshot_download
from pathlib import Path
import sys

# Download the model files and make inference.py importable
model_path = snapshot_download("codelion/malm-165m")
sys.path.insert(0, model_path)
from inference import load_model, search_functions

# Load model, tokenizer, and the function memory bank
model, tokenizer, functions, config = load_model(Path(model_path))
print(f"Loaded {len(functions)} functions")

# Search the memory bank with a natural language query
results = search_functions(
    model, tokenizer, functions,
    query="connect to database",
    top_k=5,
)

for name, signature, docstring, score in results:
    print(f"{name}: {score:.4f}")
```
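The returned tuples can be post-processed however you like; for example, keeping only matches above a score threshold (the `0.8` cutoff below is purely illustrative, not a recommended value):

```python
# Keep only high-scoring matches; the 0.8 cutoff is illustrative only.
confident = [(name, signature) for name, signature, docstring, score in results if score > 0.8]
for name, signature in confident:
    print(f"{name} -> {signature}")
```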
## Model Description

MALM combines a transformer with learned memory retrieval for semantic code search:

- **Query encoder** - encodes natural language queries into embeddings
- **Value encoder** - encodes function signatures/docstrings
- **Retrieval** - attention-based lookup from the query over the memory (see the sketch below)
- **Memory bank** - 2000 Python functions from CodeParrot
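To make the retrieval step concrete, here is a minimal NumPy sketch of an attention-style lookup over a memory bank. The shapes come from the configuration below; the random vectors stand in for the query-encoder and value-encoder outputs, and the scaled dot-product/softmax scoring is an illustrative assumption, not the exact implementation in `inference.py`.

```python
import numpy as np

# Illustrative stand-ins for encoder outputs (the real ones come from the
# query encoder and value encoder transformers).
d_model, num_functions = 768, 2000
rng = np.random.default_rng(0)
query_emb = rng.normal(size=(d_model,))              # encoded query
memory = rng.normal(size=(num_functions, d_model))   # encoded functions (memory bank)

# Attention-style lookup: scaled dot-product scores, softmax over memory slots.
scores = memory @ query_emb / np.sqrt(d_model)
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# Top-5 memory slots by retrieval weight.
for idx in np.argsort(-weights)[:5]:
    print(idx, weights[idx])
```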
## Why not mlx-lm?
MALM uses a memory-augmented architecture different from standard LLMs:
- Separate query and value encoders for retrieval
- Requires a memory bank of functions
- Inference is retrieval-based, not autoregressive generation
This architecture doesn't fit the `mlx-lm` `generate` workflow, so we provide the standalone `inference.py` script instead.
## Architecture
| Component | Parameters |
|---|---|
| Embedding | 11.1M |
| Position Embedding | 0.1M |
| Query Encoder (4 layers) | 28.4M |
| Value Encoder (4 layers) | 28.4M |
| Decoder (12 layers) | 85.1M |
| Output Projection | 11.1M |
| Total | ~165M |
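The per-component counts line up with a back-of-the-envelope estimate from the configuration below, assuming standard transformer blocks with a 4x feed-forward expansion (biases and layer norms are omitted, which presumably accounts for the small gap to the reported 165,123,656):

```python
# Rough parameter estimate from config.json values (assumes 4x FFN expansion).
vocab_size, d_model, max_seq_len = 14407, 768, 128

embedding = vocab_size * d_model            # ~11.1M
pos_embedding = max_seq_len * d_model       # ~0.1M
per_layer = 12 * d_model ** 2               # ~7.1M: 4*d^2 attention + 8*d^2 feed-forward
query_encoder = 4 * per_layer               # ~28.3M
value_encoder = 4 * per_layer               # ~28.3M
decoder = 12 * per_layer                    # ~84.9M
output_projection = vocab_size * d_model    # ~11.1M

total = (embedding + pos_embedding + query_encoder
         + value_encoder + decoder + output_projection)
print(f"~{total / 1e6:.1f}M parameters")    # ~163.8M, close to the reported ~165M
```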
## Configuration

```json
{
  "vocab_size": 14407,
  "d_model": 768,
  "n_heads": 12,
  "n_layers": 12,
  "n_query_layers": 4,
  "max_seq_len": 128,
  "num_parameters": 165123656,
  "num_functions": 2000
}
```
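If you only need the metadata, `config.json` can be read straight from the download directory (the path here matches the Quick Start `--local-dir`):

```python
import json
from pathlib import Path

# Load the model configuration from the Quick Start download location.
config = json.loads((Path("./malm-165m") / "config.json").read_text())
print(config["d_model"], config["n_layers"], config["num_functions"])
```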
## Files

| File | Description |
|---|---|
| `model.npz` | Model weights (MLX-compatible NumPy format) |
| `config.json` | Model configuration |
| `tokenizer.json` | Tokenizer vocabulary |
| `functions.json` | Memory bank of 2000 Python functions |
| `inference.py` | Standalone inference script |
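To verify a local download, you can list the artifacts in the directory used by the Quick Start command:

```python
from pathlib import Path

# Print each downloaded file and its size (assumes the ./malm-165m
# directory from the Quick Start download step).
for path in sorted(Path("./malm-165m").iterdir()):
    print(f"{path.name}: {path.stat().st_size:,} bytes")
```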
## Training
Trained on CodeParrot with a focus on Python function retrieval:
- Encodes natural language queries into embedding space
- Learns semantic similarity between queries and function signatures
- Uses attention-based retrieval over a memory bank
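The card doesn't specify the exact loss. Purely as an illustration of how query/function similarity can be learned, here is a generic in-batch contrastive objective with random arrays standing in for encoder outputs; this is a sketch of one common approach, not the confirmed MALM training recipe:

```python
import numpy as np

def in_batch_contrastive_loss(query_emb, func_emb, temperature=0.07):
    """Generic InfoNCE-style loss: query i should best match function i."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    f = func_emb / np.linalg.norm(func_emb, axis=1, keepdims=True)
    logits = q @ f.T / temperature                      # (batch, batch) similarities
    # Numerically stable log-softmax over each row.
    m = logits.max(axis=1, keepdims=True)
    log_probs = logits - (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True)))
    n = len(q)
    return -log_probs[np.arange(n), np.arange(n)].mean()

rng = np.random.default_rng(0)
print(in_batch_contrastive_loss(rng.normal(size=(8, 768)), rng.normal(size=(8, 768))))
```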
## Related Work
Part of the HashHop project exploring long-context evaluation and memory-augmented architectures.
## License
Apache 2.0