My research focuses on deep reasoning with small language models, Transformer architecture innovation, and knowledge distillation for efficient alignment and transfer.
We deleted the Embedding Layer

Collins-Embedding-3M (NoesisLab/Collins-Embedding-3M)

Most "small" models are just giant vocab tables in a trench coat. Collins-3M changes that: using 2-universal hashing and Chernoff-bound noise suppression, we've collapsed the embedding space into a fixed O(1) hash map.

* STSB: 0.7114 (beating many 100M+-parameter models)
* Size: 3M parameters (edge-ready, IoT-ready)
* Tech: randomized sign-hashing + RoPE positional injection

Built by NoesisLab
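As a rough sketch of how an embedding table can be replaced by a fixed hash map: the post doesn't publish Collins-3M's implementation, so the sizes, hash family, and function names below are illustrative assumptions in the style of classic feature hashing, not the model's actual code.

```python
# Illustrative sketch only, assuming Randomized Sign-Hashing works like
# classic feature hashing: a token's vector is an average of signed rows
# from a small fixed table, selected by 2-universal hashes. All constants
# and names here are hypothetical, not Collins-Embedding-3M's real config.
import numpy as np

rng = np.random.default_rng(0)
NUM_BUCKETS, DIM, NUM_HASHES = 4096, 64, 3   # fixed O(1) table, not vocab-sized

table = rng.standard_normal((NUM_BUCKETS, DIM)).astype(np.float32)

# 2-universal family: h(x) = ((a*x + b) mod p) mod m, with p a large prime.
P = 2_147_483_647
bucket_params = [(int(rng.integers(1, P)), int(rng.integers(0, P))) for _ in range(NUM_HASHES)]
sign_params = [(int(rng.integers(1, P)), int(rng.integers(0, P))) for _ in range(NUM_HASHES)]

def embed(token_id: int) -> np.ndarray:
    """Map any token id to a vector by averaging signed hash-bucket rows.
    Independent hashes make collision noise concentrate around zero
    (a Chernoff-style argument), so no per-token table is needed."""
    vec = np.zeros(DIM, dtype=np.float32)
    for (a, b), (c, d) in zip(bucket_params, sign_params):
        bucket = ((a * token_id + b) % P) % NUM_BUCKETS
        sign = 1.0 if ((c * token_id + d) % P) % 2 == 0 else -1.0
        vec += sign * table[bucket]
    return vec / NUM_HASHES

v = embed(123456)            # works for arbitrary ids: no vocab-sized matrix
assert v.shape == (DIM,)
```

The memory win is the point: the table is `NUM_BUCKETS × DIM` regardless of vocabulary size, and unseen token ids still get a deterministic vector.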
If you've ever tried to compare GPT-5.2 and Claude Opus 4.6 side by side, you've probably hit the same wall: the official Hugging Face leaderboard only tracks open-source models, so the most widely used AI systems simply aren't there. ALL Bench fixes that by bringing closed-source models, open-weight models, and, uniquely, all four teams under South Korea's national sovereign AI program into a single leaderboard. Thirty-one frontier models, one consistent scoring scale.

Scoring works differently here too. Most leaderboards skip benchmarks a model hasn't submitted, which lets models game their ranking by withholding results. ALL Bench treats every missing entry as zero and divides the total by ten, so there's no advantage in hiding your weak spots. The ten core benchmarks span reasoning (GPQA Diamond, AIME 2025, HLE, ARC-AGI-2), coding (SWE-bench Verified, LiveCodeBench), and instruction-following (IFEval, BFCL).

The standout is FINAL Bench, the world's only benchmark measuring whether a model can catch and correct its own mistakes. It reached rank five in global dataset popularity on Hugging Face in February 2026 and has been covered by Seoul Shinmun, Asia Economy, IT Chosun, and Behind.

Nine interactive charts let you explore everything from composite-score rankings and a full heatmap to an open-vs-closed scatter plot. Operational metrics like context window, output speed, and pricing are included alongside benchmark scores. All data is sourced from the Artificial Analysis Intelligence Index v4.0, arXiv technical reports, Chatbot Arena ELO ratings, and the Korean Ministry of Science and ICT's official evaluation results. Updated monthly.
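The anti-gaming rule above is simple enough to sketch directly. The benchmark names follow the post; "benchmark-10" is a stand-in because only nine of the ten core benchmarks are named, and the function is a minimal illustration, not ALL Bench's actual pipeline.

```python
# Minimal sketch of the scoring rule described above: any core benchmark a
# model has not submitted counts as zero, and the composite always divides
# by the full count of ten. "benchmark-10" is a hypothetical placeholder
# for the unnamed tenth benchmark.
CORE_BENCHMARKS = [
    "GPQA Diamond", "AIME 2025", "HLE", "ARC-AGI-2",   # reasoning
    "SWE-bench Verified", "LiveCodeBench",             # coding
    "IFEval", "BFCL",                                  # instruction-following
    "FINAL Bench", "benchmark-10",                     # self-correction + stand-in
]

def composite_score(reported: dict[str, float]) -> float:
    """Missing entries count as 0 with a fixed divisor of 10, so
    withholding a weak result can only lower a model's composite."""
    return sum(reported.get(b, 0.0) for b in CORE_BENCHMARKS) / len(CORE_BENCHMARKS)

full = {b: 60.0 for b in CORE_BENCHMARKS}
withheld = {b: s for b, s in full.items() if b != "HLE"}   # hide one score

print(composite_score(full), composite_score(withheld))    # 60.0 54.0
```

Because the divisor never shrinks, dropping a benchmark strictly reduces the composite, which is exactly the incentive the leaderboard wants.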
Today we're publicly releasing Kanon 2 Enricher, and with it an entirely new class of AI model that we're calling a hierarchical graphitization model. This is fundamentally different from both universal extraction models and generative models.
As a hierarchical graphitization model, Kanon 2 Enricher natively outputs a knowledge graph rather than tokens, which makes it architecturally incapable of hallucinating or inventing text that wasn't present in the input.
What that enables in practice is unlike any other model or ML architecture on the market:
• No hallucinations. It cannot hallucinate: all references and links are stored as spans, meaning exact character offsets anchored to the original text.
• Hierarchical segmentation, not just extraction. It deconstructs a document's full nested hierarchy, down to chapters, sections, clauses, schedules, signatures, and even individual sentences, and classifies each span with dozens of contextual features.
• Entity extraction, disambiguation, and linking. It resolves what references actually point to, then links entities, citations, and cross-references into a single coherent graph.
• Graph-first efficiency. Small enough to run locally on a consumer PC with sub-second latency, and it stays reliable on long documents.
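The span-anchoring idea in the bullets above can be sketched as a tiny data contract. The `Span` class and its fields are hypothetical, not Kanon 2 Enricher's real schema; the point is that a graph node stores only character offsets into the source, so any text a consumer renders is a verbatim slice of the input.

```python
# Hypothetical sketch of a span-anchored output node, illustrating why
# offset-based output cannot invent text: the only way to produce a string
# is to slice the original document.
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    start: int    # inclusive character offset into the original document
    end: int      # exclusive character offset
    label: str    # e.g. "clause", "citation", "cross-reference"

    def text(self, source: str) -> str:
        # Quote, never generate: the output is always a substring of the input.
        return source[self.start:self.end]

doc = "Clause 4.2: The Supplier shall notify the Buyer as set out in Clause 7.1."
i = doc.index("Clause 7.1")
ref = Span(start=i, end=i + len("Clause 7.1"), label="cross-reference")

assert ref.text(doc) == "Clause 7.1"
```

Linking then amounts to connecting `Span` nodes by edges (reference → target), so every edge endpoint remains verifiable against the source text.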
🔥 UPGRADE in Kai: 30B Scaling! 🔥

NoesisLab/Kai-30B-Instruct

We are incredibly excited to announce that the Kai-30B-Instruct model and its official Space are now LIVE! 🚀

If you've been following the journey from Kai-0.35B to Kai-3B, you know we're rethinking how models reason. Tired of verbose, slow Chain-of-Thought (CoT) outputs that flood your screen with self-talk? So are we.

Kai-30B-Instruct scales up our Adaptive Dual-Search Distillation (ADS) framework. By bridging classical A* heuristic search with continuous gradient descent, we use an information-theoretic log-barrier to prune high-entropy reasoning paths during training. The result? Pure implicit reasoning: the model executes structured logic, arithmetic carries, and branch selections as a reflex in a single forward pass, with no external scaffolding required.

At 3B, we observed a phase transition where the model achieved "logical crystallization". Now, at 30B, we are giving the ADS regularizer the representational capacity it needs to tackle higher-order symbolic abstractions and complex reasoning tasks.

🧪 Test Kai yourself in our new Space: NoesisLab/Kai-30B-Instruct
📦 Model weights: NoesisLab/Kai-30B-Instruct

Bring your hardest math, logic, and coding benchmarks. We invite the community to stress-test the limits of the penalty wall! 🧱🔥
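The ADS objective itself is not published, so as a rough illustration only, here is what a generic entropy log-barrier of the kind alluded to above could look like: the penalty diverges as a reasoning step's predictive entropy approaches a cap, which is one way a training loss can hard-prune high-entropy paths. The names `entropy_cap` and `weight` are hypothetical.

```python
# Rough, hedged illustration of an entropy log-barrier penalty; this is NOT
# the actual ADS loss. Steps whose next-token entropy nears the cap incur an
# exploding penalty (the "penalty wall"); steps at or past it are infeasible.
import numpy as np

def entropy(p: np.ndarray) -> float:
    """Shannon entropy of a probability vector, in nats."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def log_barrier_penalty(p: np.ndarray, entropy_cap: float, weight: float = 0.1) -> float:
    """-weight * log(cap - H(p)): tiny for confident steps, infinite at the cap."""
    h = entropy(p)
    if h >= entropy_cap:              # infeasible region: path is pruned
        return float("inf")
    return float(-weight * np.log(entropy_cap - h))

sharp = np.array([0.97, 0.01, 0.01, 0.01])   # confident next-step distribution
flat = np.array([0.25, 0.25, 0.25, 0.25])    # maximally uncertain distribution

# Confident steps pay almost nothing; uncertain steps hit the wall.
print(log_barrier_penalty(sharp, entropy_cap=1.0))   # small positive value
print(log_barrier_penalty(flat, entropy_cap=1.0))    # inf
```

Added to a distillation loss, a term like this shapes training toward low-entropy, single-pass reasoning rather than sampled chain-of-thought, which matches the behavior the post describes.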