Train search agents on your corpus with reinforcement learning
Send us your corpus. We run the full KARL training pipeline—agentic data synthesis, OAPL reinforcement learning, parallel inference—and give you back a model that outperforms frontier LLMs on your data. Built on our open-source framework.
The managed service for knowledge agent training
We pair KARL's reinforcement learning pipeline with your document corpus. Within days, you get a trained model with side-by-side evals proving it outperforms standard RAG and frontier LLMs on your data.
Send us your corpus
SEC filings, internal docs, research papers, legal contracts—any text. We ingest, chunk, embed, and index it.
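The ingest step above can be sketched in a few lines. This is a toy illustration of fixed-size overlapping chunking only; the actual chunk size, overlap, and embedding model used by the managed service are not specified here.

```python
# Toy sketch of the ingest step: split a document into overlapping
# character windows before embedding and indexing. The size/overlap
# values are illustrative, not the service's real parameters.

def chunk(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "x" * 1200
chunks = chunk(doc)
print(len(chunks))  # 3 windows: [0:500], [400:900], [800:1200]
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.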
We run the full pipeline
Agentic QA synthesis generates training data from your docs. OAPL reinforcement learning trains the model on successful search strategies. Automatic GPU provisioning.
You get a trained agent
A LoRA adapter that turns any base model into a domain expert on your corpus. Benchmark report included. Deploy anywhere with vLLM.
Why not just RAG?
Standard RAG retrieves once and hopes for the best.
KONASH trains your model to search iteratively, reason across documents, and know when to search again.
The model doesn't memorize your documents. It learns how to search them—what queries to issue, when to refine, how to synthesize evidence from multiple sources. This generalizes to new questions and even new corpora.
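The search-refine-stop loop described above can be sketched as follows. The retriever, the query-refinement rule, and the stopping criterion are all toy stand-ins; in KONASH these behaviors are what the RL training actually learns.

```python
# Sketch of an iterative search loop: issue a query, inspect evidence,
# refine, and stop when enough evidence is gathered. `search`, the
# refinement rule, and the stopping rule are illustrative stand-ins
# for learned behavior.

CORPUS = {
    "q3 revenue": ["10-Q: revenue was $4.2B"],
    "q3 revenue guidance": ["earnings call: guidance raised to $4.5B"],
}

def search(query: str) -> list[str]:
    return CORPUS.get(query, [])

def answer(question: str, max_steps: int = 3) -> list[str]:
    evidence, query = [], question
    for _ in range(max_steps):
        evidence += search(query)
        if len(evidence) >= 2:        # learned stopping criterion (toy)
            break
        query = query + " guidance"   # learned query refinement (toy)
    return evidence

print(answer("q3 revenue"))
```

The contrast with single-shot RAG is the loop itself: the agent can recover from a weak first query instead of answering from whatever it retrieved first.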
The pipeline
- Agentic QA Synthesis: Explores the corpus, generates grounded questions
- OAPL Training: Off-policy RL on successful search trajectories
- Token Masking: Trains search strategy, not text reproduction
- Parallel Thinking: N rollouts + aggregation at inference
- Value-Guided Search: Learned value model guides tree search
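The token-masking step can be illustrated with a toy loss mask: tokens the agent generated (queries, reasoning, answers) contribute to the training loss, while tokens copied in from retrieved documents are masked out. The role labels below are illustrative, not KONASH's actual trajectory format.

```python
# Toy sketch of token masking: train on the agent's own tokens,
# mask tokens that came from retrieved documents so the model learns
# search strategy rather than reproducing corpus text.

def loss_mask(tokens: list[tuple[str, str]]) -> list[int]:
    """1 = include token in the loss, 0 = masked (retrieved text)."""
    return [0 if role == "retrieved" else 1 for role, _ in tokens]

trajectory = [
    ("agent", "<search>"), ("agent", "q3 revenue"),
    ("retrieved", "revenue was $4.2B"),
    ("agent", "<answer>"), ("agent", "$4.2B"),
]
print(loss_mask(trajectory))  # [1, 1, 0, 1, 1]
```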
Open source framework. Managed infrastructure.
The full KONASH framework is Apache 2.0. The managed service handles the GPUs, orchestration, and optimization so you can focus on your data.
Open Source
FREE: Run the full pipeline yourself. pip install konash
- ✓Full training pipeline (synthesis → rollouts → OAPL)
- ✓CLI + Python API
- ✓Any Together AI model
- ✓Pre-built benchmark indexes
- ✓Apache 2.0 license
Managed Training
Everything in open source, plus we handle the infrastructure.
- ✓Automatic H100 provisioning across 20+ cloud providers
- ✓Managed training runs with live monitoring
- ✓Corpus analysis and training optimization
- ✓Side-by-side benchmark reports
- ✓Dedicated support; deployment in days, not weeks
Results
GLM 4.5 Air on FinanceBench (150 SEC filing questions). The KARL paper reports 76% after RL training—KONASH implements the same pipeline.
| Mode | Accuracy |
|---|---|
| Base model (single rollout) | 48% |
| + Parallel Thinking (N=3) | 51% |
| + OAPL Training (2 iterations) | 76% |
Source: KARL paper (Databricks, 2026). Nugget-based evaluation, Appendix D.1.
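The parallel-thinking gain in the table can be sketched with the simplest possible aggregator: run N independent rollouts and majority-vote the answers. This is an illustrative assumption; the aggregation step used by KARL/KONASH may be more sophisticated than a plain vote.

```python
from collections import Counter

# Sketch of parallel thinking at inference: N independent rollouts,
# then aggregate. Majority voting is one simple aggregator, used here
# purely for illustration.

def aggregate(rollout_answers: list[str]) -> str:
    return Counter(rollout_answers).most_common(1)[0][0]

answers = ["$4.2B", "$4.5B", "$4.2B"]  # N=3 rollouts
print(aggregate(answers))  # $4.2B
```

Even a plain vote filters out a single unlucky rollout, which is consistent with the 48% to 51% lift at N=3 in the table above.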
Supported Models
- GLM 4.5 Air: Default. The KARL base model.
- Qwen3 80B-A3B: MoE. Good value.
- Llama 3.3 70B Turbo: Dense. General-purpose.
- DeepSeek R1: Reasoning-focused.
- Any Together AI model: Enter any model ID.
Supported Corpora
- BrowseComp-Plus: 67K articles. Pre-built index.
- FinanceBench: 150 SEC filing questions.
- QAMPARI: 250K+ encyclopedic chunks.
- FreshStack: Technical documentation.
- Your documents: .txt .md .pdf .json .html .py +
Updates
What we're building and shipping.
KONASH v0.4.0
RELEASE: Full training pipeline with agentic QA synthesis, rollout generation, pass-rate filtering, OAPL training, and parallel thinking inference. CLI with interactive setup wizard.
FinanceBench baseline
BENCHMARK: GLM 4.5 Air achieves 48% single-rollout accuracy and 51% with parallel thinking (N=3), establishing the baseline for OAPL improvement.
Cloud GPU provisioning
INFRA: Automatic H100 provisioning via Shadeform across 20+ providers. An OAPL gradient step costs ~$0.50.
KARL paper published
RESEARCH: Databricks publishes KARL (Knowledge Agents via Reinforcement Learning). RL-trained agents match or exceed frontier models on grounded reasoning. KONASH development begins.
Ready to train on your corpus?
Use the open-source framework yourself, or let us handle the infrastructure and deliver a trained model in days.