Kanon encoder family
Small, specialized legal encoder models — embedder, reranker, enricher, and classifier — each doing a single thing exceptionally well, with zero generative hallucinations.
The family
One family. Many specialists.
Kanon is a family of small yet highly accurate legal AI models for classifying, extracting information from, and assessing the similarity of legal documents — contracts, cases, legislation, textbooks, or anything else. Because every Kanon model is a non-generative encoder, the whole family is architecturally incapable of producing generative hallucinations.
Generations
Successive frontier models
Each generation expands the context window and improves accuracy. Each task-specific model is fine-tuned on top of one generation.
Kanon 3
Coming soon: a single successor to every task-specific Kanon model, able to vectorize, classify, segment, and enrich data all at once.
Kanon 2
A frontier legal encoder pretrained on billions of legal tokens, with a 16,384-token context window. Its shared representations power the Embedder, Reranker and Enricher.
Kanon
The original Isaacus legal encoder architecture, with a 512-token context window. It still powers the Answer Extractor and the Universal Classifier.
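To make the difference between a 512-token and a 16,384-token context window concrete, here is a minimal sketch of naive fixed-size windowing with overlap. It is an illustration only: the function name `split_into_windows` and the `overlap` parameter are hypothetical, the integers stand in for real tokens (tokenization is model-specific), and this is not how semchunk or the Isaacus API splits text.

```python
def split_into_windows(tokens, window_size, overlap=0):
    """Split a token list into windows of at most `window_size` tokens.

    Hypothetical helper for illustration; consecutive windows share
    `overlap` tokens so context is not cut off abruptly at a boundary.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must be in [0, window_size)")

    step = window_size - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + window_size])
        # Stop once the current window already reaches the end of the input.
        if start + window_size >= len(tokens):
            break
    return windows


# Stand-in "document" of 1,200 tokens (integers, not real token IDs).
toy_tokens = list(range(1200))

# A 512-token window needs several passes over the document...
small_windows = split_into_windows(toy_tokens, window_size=512, overlap=64)

# ...while a 16,384-token window covers it in one.
large_windows = split_into_windows(toy_tokens, window_size=16384)
```

Under these assumptions the same document needs three overlapping windows at 512 tokens and a single window at 16,384 tokens, which is the practical effect of the larger context window.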
Task-specific fine-tunes
Modes
Each mode is a fine-tune of a generation’s shared encoder, specialized for one capability. These are the model IDs you call by name in the API.
Kanon 2 Embedder
An embedding model that turns legal text into vectors capturing meaning for semantic search, clustering, and RAG retrieval.
Kanon 2 Embedder is a top-of-the-line model, outperforming large frontier embedding models. It's the fastest of all commercial models on MLEB.
Thanks to its extreme parameter efficiency, Kanon 2 Embedder sets the new Pareto frontier in balancing inference time with legal information retrieval performance. Kanon 2 Embedder is also privacy and security friendly — none of your data is used to train our models by default — and can be self-hosted for enterprises with heightened security or reliability requirements.
Use Kanon 2 Embedder for transforming documents, code, and knowledge bases into vector representations optimized for retrieval and search.
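As a rough illustration of how embedding vectors support semantic search, the sketch below ranks documents by cosine similarity to a query vector. The three-dimensional vectors and document names are made-up placeholders, not Kanon 2 Embedder output, and the helper names are hypothetical; in practice the vectors would come from the embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as similar to nothing.
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (assumption, not model output).
toy_docs = {
    "termination_clause": [0.9, 0.1, 0.0],
    "governing_law_clause": [0.1, 0.9, 0.1],
    "payment_terms": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]

ranking = rank_by_similarity(toy_query, toy_docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

The same ranking step applies to clustering and RAG retrieval: only the source of the vectors changes.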
Kanon 2 Reranker
A reranking model that reorders retrieved legal documents by their true relevance to a query.
Kanon 2 Reranker excels at scoring the relevance of laws, decisions, contracts, evidence, and other legal documents to queries. It's the most powerful reranker for legal RAG applications, being customised to suit this purpose.
When paired with our state-of-the-art legal embedding model, Kanon 2 Embedder, as a fast, affordable, and accurate first-stage retriever, Kanon 2 Reranker delivers the top legal information retrieval performance on Legal RAG Bench and better performance on the Massive Legal Embedding Benchmark (MLEB).
Uniquely, Kanon 2 Reranker boasts first-class support for documents of any length powered by our semchunk semantic chunking library.
Use Kanon 2 Reranker for ranking retrieved legal documents by semantic relevance, surfacing the most useful information for search, retrieval, and RAG applications.
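The retrieve-then-rerank pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions: `retrieve_then_rerank` is a hypothetical helper, and the two toy scorers (word overlap and Jaccard similarity) are simple stand-ins for a first-stage embedder and a reranker, not the Kanon models or the Isaacus API.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    documents: List[str],
    first_stage: Scorer,
    reranker: Scorer,
    shortlist_size: int = 2,
) -> List[Tuple[str, float]]:
    """Shortlist with a cheap scorer, then reorder the shortlist with a better one."""
    shortlist = sorted(
        documents, key=lambda doc: first_stage(query, doc), reverse=True
    )[:shortlist_size]
    rescored = [(doc, reranker(query, doc)) for doc in shortlist]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def toy_overlap(query: str, doc: str) -> float:
    """Stand-in first stage: number of shared words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def toy_jaccard(query: str, doc: str) -> float:
    """Stand-in reranker: shared words divided by all distinct words."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0


toy_documents = [
    "termination for force majeure events",
    "termination notice period and fees payable on early exit",
    "governing law of this agreement",
]
results = retrieve_then_rerank(
    "force majeure termination", toy_documents, toy_overlap, toy_jaccard
)
```

The design point is that the expensive scorer only sees the shortlist, so the first stage can stay fast while the second stage concentrates on precision.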
Kanon 2 Enricher
An enrichment model that turns unstructured legal documents into structured knowledge graphs.
Kanon 2 Enricher belongs to an entirely new class of AI models known as hierarchical graphitization models, transforming unstructured documents of any length into rich, highly structured knowledge graphs with sub-second latency.
In addition to extracting entities referenced within documents, it can also disambiguate entities and link them together, as well as fully deconstruct the structural hierarchy of documents.
Because it natively outputs knowledge graphs rather than tokens, Kanon 2 Enricher is architecturally incapable of producing the types of hallucinations suffered by general-purpose generative models. Its graph-first architecture is small enough to run locally on a consumer PC while still outperforming frontier LLMs.
Use Kanon 2 Enricher for turning unstructured legal documents into structured legal knowledge that powers research, compliance, due diligence, and intelligent legal applications.
Kanon Answer Extractor
An extractive question answering model that pulls answers straight from legal documents.
Kanon Answer Extractor excels at pulling out answers to questions from legal documents, whether it’s as simple as 'What is the governing law of this contract?' or as complex as 'What doctrine did the judge rely on in making their decision?'.
Because answers are taken directly and exclusively from users’ inputs, Kanon Answer Extractor is immune to the sort of hallucinations that typically plague generative models. It is also significantly faster, more precise, and more cost-effective than generative models by virtue of being specialized for a single, domain-specific task.
Use Kanon Answer Extractor for extracting precise, source-grounded answers from contracts, legislation, case law, and evidence for legal research, review, and analysis.
Kanon Universal Classifier
A zero-shot classifier that labels legal documents with no training data required.
The world’s most accurate and efficient universal legal classifiers of their size, Kanon Universal Classifier and Kanon Universal Classifier Mini can take a statement like “this clause entitles one to terminate an agreement in the event of circumstances beyond their reasonable control” and evaluate it against thousands of documents in mere seconds, producing startlingly accurate confidence scores — no fine-tuning necessary.
Kanon Universal Classifier and Kanon Universal Classifier Mini punch far above their weight, achieving 6% and 12% better performance, respectively, than their closest general-purpose counterparts.
Use Kanon Universal Classifier and Kanon Universal Classifier Mini for building document tagging, compliance screening, and legal intelligence systems without training custom models.
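For document tagging, the confidence scores a zero-shot classifier produces are typically turned into labels by applying a threshold. The sketch below shows one way that step might look. The scores, document names, `tag_documents` helper, and 0.5 threshold are all made up for illustration; they are not output from, or defaults of, Kanon Universal Classifier.

```python
def tag_documents(scores, threshold=0.5):
    """Return the IDs of documents whose confidence meets the threshold.

    `scores` maps a document ID to a confidence in [0, 1] that the
    document supports the statement being tested. Results are ordered
    from most to least confident.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    matches = [(doc_id, s) for doc_id, s in scores.items() if s >= threshold]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in matches]


# Hypothetical confidence scores for the statement
# "this clause entitles one to terminate an agreement in the event of
# circumstances beyond their reasonable control".
toy_scores = {
    "clause_12_force_majeure": 0.97,
    "clause_3_payment": 0.04,
    "clause_18_termination_for_convenience": 0.41,
    "clause_21_acts_of_god": 0.88,
}

tagged = tag_documents(toy_scores, threshold=0.5)
```

Raising the threshold trades recall for precision, which is usually the first setting to tune when screening for compliance.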