Embedding

Search legal text by meaning, not just words

Overview

The retrieval layer for legal AI.

Embedding turns legal text into vectors that capture meaning. Use it to power semantic search, clustering, recommendations, document comparison, and RAG systems that can find the right clause, case, authority, or passage even when the user does not know the exact words to search for.

Find the law by meaning, not keywords

Turn cases, contracts, legislation, regulations, textbooks, and internal know-how into embeddings that retrieve the passages lawyers actually meant to find.

Fast enough for real products

Kanon 2 Embedder is 30% quicker than OpenAI’s fastest current-generation embedder and is the fastest commercial model on MLEB, so retrieval does not become the slow part of your workflow.

Built for confidential legal data

Your data is not used to train our models by default, and Kanon 2 Embedder can be self-hosted for teams with heightened security, reliability, or deployment requirements.

Semantic legal search

Build search that understands legal meaning across matters, repositories, clauses, citations, issues, and authorities.

Keyword search is brittle in law. The same issue can be described with different phrases across cases, clauses, regulations, correspondence, and internal notes. Embeddings make those relationships searchable by meaning.
Kanon 2 Embedder turns legal text into vectors that can be stored in a vector database and compared against user queries, document chunks, or other passages. That gives legal engineers a strong retrieval layer for search, recommendations, clustering, and RAG.
For vibe coders, the core loop is simple: chunk your documents, embed the chunks, embed the query, retrieve the closest matches, and send the strongest context to the rest of your application.
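
The core loop above can be sketched in a few lines of Python. This is a minimal illustration, not the Isaacus API: `embed` here is a hypothetical placeholder (a hashed bag-of-words that only matches shared words, not meaning), and `chunk`, `retrieve`, and the sample texts are made up for the example. In a real system you would replace the body of `embed` with a call to your embedding model and keep the vectors in a vector database rather than a Python list.

```python
import math
import re
import zlib

DIM = 256  # size of the toy vectors; a real model defines its own dimension


def embed(text: str) -> list[float]:
    """Placeholder embedder (assumption): hashed bag-of-words, unit length.

    It captures word overlap only. Swap in a real embedding model here.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def chunk(document: str, max_words: int = 40) -> list[str]:
    """Split a document into fixed-size word windows (the simplest chunker)."""
    words = document.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity because vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(query: str, index: list[tuple[str, list[float]]], k: int = 2):
    """Embed the query and return the k closest chunks as (score, text)."""
    q = embed(query)
    scored = [(cosine(q, vec), text) for text, vec in index]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Made-up sample passages, for illustration only.
documents = [
    "Either party may end this agreement by giving thirty days written notice.",
    "The recipient must keep all confidential information secret.",
    "This agreement is governed by the laws of New South Wales.",
]

# 1. chunk, 2. embed the chunks
index = [(c, embed(c)) for doc in documents for c in chunk(doc)]

# 3. embed the query, 4. retrieve the closest matches
for score, text in retrieve("Can either party end the agreement early?", index):
    print(f"{score:.3f}  {text}")
```

The retrieved chunks are what you would pass on as context to the rest of the application, for example a RAG prompt. Fixed-size word windows are the simplest chunking choice; splitting on clauses or sections is often a better fit for legal text.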

Purpose-built for legal retrieval across cases, contracts, legislation, regulations, and other legal materials.

Read the embedding docs →
Build a legal RAG app →

Kanon 2 Embedder

The legal embedding model behind the capability, built to balance retrieval quality, speed, and deployment flexibility.

Kanon 2 Embedder ranks first on the Massive Legal Embedding Benchmark (MLEB), beating OpenAI’s Text Embedding 3 Large by 9% while also being 30% quicker than Text Embedding 3 Small.
Its parameter efficiency makes it practical for production retrieval systems where latency, cost, and accuracy all matter. You can use it to power semantic search, clustering, document comparison, and retrieval-augmented generation without defaulting to a general-purpose model.
It also fits real legal infrastructure. Kanon 2 Embedder can be used through Isaacus and can be self-hosted for enterprises with strict security, reliability, or data-residency requirements.
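
As one illustration of the document comparison use case mentioned above, the sketch below aligns the clauses of two contracts by embedding similarity and flags clauses with no close counterpart. Everything in it is an assumption made for the example: the three-dimensional vectors are hand-written stand-ins for real model output, `align_clauses` is a hypothetical helper, and the 0.8 threshold is arbitrary and would need tuning on real data.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def align_clauses(doc_a: dict, doc_b: dict, threshold: float = 0.8):
    """For each clause in doc_a, find its closest clause in doc_b.

    Returns (clause_a, best_match_or_None, score) tuples. The match is None
    when the best score falls below the threshold (an arbitrary cut-off).
    """
    results = []
    for name_a, vec_a in doc_a.items():
        best_name, best_score = None, -1.0
        for name_b, vec_b in doc_b.items():
            score = cosine(vec_a, vec_b)
            if score > best_score:
                best_name, best_score = name_b, score
        if best_score < threshold:
            best_name = None  # no clause in doc_b is close enough
        results.append((name_a, best_name, round(best_score, 3)))
    return results


# Hand-written toy vectors standing in for clause embeddings (assumption).
template = {
    "confidentiality": [0.9, 0.1, 0.0],
    "termination": [0.1, 0.9, 0.1],
    "governing law": [0.0, 0.1, 0.9],
}
counterparty_draft = {
    "non-disclosure": [0.8, 0.2, 0.1],
    "ending the agreement": [0.2, 0.8, 0.0],
}

for clause, match, score in align_clauses(template, counterparty_draft):
    status = match if match else "NO CLOSE MATCH"
    print(f"{clause:16} -> {status} ({score})")
```

With these toy vectors, the differently worded clauses pair up and the governing law clause is flagged as missing from the draft, which is the kind of signal a reviewer would want surfaced.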

Read Introducing Kanon 2 Embedder →
Read the embedding docs →

Private retrieval infrastructure

Use embeddings with legal data without giving up control over confidentiality, deployment, or reliability.

Legal products often start with sensitive material: client documents, filings, deal rooms, knowledge bases, contracts, regulatory guidance, and research work product. The retrieval layer has to respect that from day one.
Isaacus does not use your data to train its models by default. For teams with stricter requirements, Kanon 2 Embedder can also be self-hosted, including in enterprise environments where model inference must stay inside controlled infrastructure.
That makes embedding a safer foundation for legal AI systems: the model can sit close to the data, the data can stay where it belongs, and the product can still retrieve the right context when users ask messy legal questions.

Read about Isaacus on AWS Marketplace →
Read the embedding docs →
