Embedding
Search legal text by meaning, not just words
Overview
The retrieval layer for legal AI.
Embedding turns legal text into vectors that capture meaning. Use it to power semantic search, clustering, recommendations, document comparison, and RAG systems that can find the right clause, case, authority, or passage even when the user does not know the exact words to search for.
Why it matters
Legal retrieval that feels less brittle
Kanon 2 Embedder gives legal engineers and product builders a fast, accurate, and private way to connect messy legal questions with the right underlying text.
Find the law by meaning, not keywords
Turn cases, contracts, legislation, regulations, textbooks, and internal know-how into embeddings that retrieve the passages lawyers actually meant to find.
Fast enough for real products
Kanon 2 Embedder is 30% quicker than Text Embedding 3 Small, OpenAI’s fastest current-generation embedder, and is the fastest commercial model on the Massive Legal Embedding Benchmark (MLEB), so retrieval does not become the slow part of your workflow.
Built for confidential legal data
Your data is not used to train our models by default, and Kanon 2 Embedder can be self-hosted for teams with heightened security, reliability, or deployment requirements.
How teams use it
From prototype to production retrieval
Build the semantic layer behind legal search, research copilots, contract review tools, knowledge systems, and RAG applications.
Semantic legal search
Build search that understands legal meaning across matters, repositories, clauses, citations, issues, and authorities.
Keyword search is brittle in law. The same issue can be described with different phrases across cases, clauses, regulations, correspondence, and internal notes. Embeddings make those relationships searchable by meaning.
Kanon 2 Embedder turns legal text into vectors that can be stored in a vector database and compared against user queries, document chunks, or other passages. That gives legal engineers a strong retrieval layer for search, recommendations, clustering, and RAG.
For vibe coders, the core loop is simple: chunk your documents, embed the chunks, embed the query, retrieve the closest matches, and send the strongest context to the rest of your application.
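As a rough illustration, that loop can be sketched in a few lines of Python. Everything named here is an assumption made for the example: `chunk_text`, `toy_embed`, `cosine`, and `retrieve` are hypothetical helpers, not part of any Isaacus SDK, and cosine similarity is assumed as the comparison measure. `toy_embed` is a placeholder that only captures word overlap, so in a real application you would swap it for a call to Kanon 2 Embedder and store the vectors in a vector database rather than a Python list.

```python
import hashlib
import math
from typing import Callable, List, Tuple

Vector = List[float]


def chunk_text(text: str, max_words: int = 40) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text: str, dims: int = 64) -> Vector:
    """PLACEHOLDER embedder (assumption): hashed bag-of-words, L2-normalised.

    It reflects shared words only, not meaning. Replace it with a real
    embedding model call in an actual system.
    """
    vec = [0.0] * dims
    for word in text.lower().split():
        token = word.strip(".,;:()\"'")
        if not token:
            continue
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(
    query: str,
    chunks: List[str],
    embed: Callable[[str], Vector],
    top_k: int = 3,
) -> List[Tuple[float, str]]:
    """Embed the chunks and the query, then return the top_k closest chunks."""
    chunk_vectors = [embed(chunk) for chunk in chunks]
    query_vector = embed(query)
    scored = [(cosine(query_vector, vec), chunk) for vec, chunk in zip(chunk_vectors, chunks)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    document = (
        "The tenant must pay rent on the first day of each month. "
        "The supplier shall indemnify the buyer against third-party claims. "
        "Either party may terminate this agreement on thirty days notice."
    )
    chunks = chunk_text(document, max_words=12)
    for score, chunk in retrieve("who must indemnify the buyer", chunks, toy_embed, top_k=2):
        print(f"{score:.3f}  {chunk}")
```

The `embed` argument is passed in deliberately, so the same retrieval code keeps working when the placeholder is replaced by a real model. The chunks returned by `retrieve` are what you would hand to the rest of your application as context.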
Kanon 2 Embedder
The legal embedding model behind the capability, built to balance retrieval quality, speed, and deployment flexibility.
Kanon 2 Embedder ranks first on the Massive Legal Embedding Benchmark (MLEB), beating OpenAI’s Text Embedding 3 Large by 9% while also being 30% quicker than Text Embedding 3 Small.
Its parameter efficiency makes it practical for production retrieval systems where latency, cost, and accuracy all matter. You can use it to power semantic search, clustering, document comparison, and retrieval-augmented generation without defaulting to a general-purpose model.
It also fits real legal infrastructure. Kanon 2 Embedder can be used through Isaacus and can be self-hosted for enterprises with strict security, reliability, or data-residency requirements.
Private retrieval infrastructure
Use embeddings with legal data without giving up control over confidentiality, deployment, or reliability.
Legal products often start with sensitive material: client documents, filings, deal rooms, knowledge bases, contracts, regulatory guidance, and research work product. The retrieval layer has to respect that from day one.
Isaacus does not use your data to train its models by default. For teams with stricter requirements, Kanon 2 Embedder can also be self-hosted, including in enterprise environments where model inference must stay inside controlled infrastructure.
That makes embedding a safer foundation for legal AI systems: the model can sit close to the data, the data can stay where it belongs, and the product can still retrieve the right context when users ask messy legal questions.