Simple, transparent, and affordable legal AI.
The Isaacus API has no upfront costs and no hidden fees. You pay only for what you use.
Plans
Pay as you go
API / Cloud
Self-serve, per-token pricing. You pay only for what you use.
- No upfront costs and no hidden fees
- $100 in free credits for new users
- Per-token pricing — all amounts in USD
Advanced
Enterprise
Private deployments, volume discounts, or an alternative pricing model.
- Private air-gapped deployments
- Finetuning on your own data
- Azure on request
Per-token prices
Usage of our models is charged based on the number of tokens inputted into them. All amounts are in USD.
| Model | Price per 1M tokens | Price per token |
|---|---|---|
| kanon-2-embedder | $0.35 | $0.00000035 |
| kanon-2-reranker | $0.35 | $0.00000035 |
| kanon-universal-classifier | $1.00 | $0.000001 |
| kanon-answer-extractor | $1.50 | $0.0000015 |
| kanon-2-enricher | $3.50 | $0.0000035 |
These prices apply only to the cloud-hosted Isaacus API. The pricing for our air-gapped Amazon SageMaker models is publicly available on AWS Marketplace.
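The sketch below shows how the per-token prices combine with a billed token count. The `estimate_cost_usd` helper and its price dictionary are hypothetical and are not part of any Isaacus SDK. The figures exclude taxes.

```python
# Cloud API prices from the table above, in USD per 1M billed tokens.
PRICE_PER_MILLION_USD = {
    "kanon-2-embedder": 0.35,
    "kanon-2-reranker": 0.35,
    "kanon-universal-classifier": 1.00,
    "kanon-answer-extractor": 1.50,
    "kanon-2-enricher": 3.50,
}


def estimate_cost_usd(model: str, billed_tokens: int) -> float:
    """Estimate the pre-tax cost of a number of billed tokens on a given model."""
    return billed_tokens * PRICE_PER_MILLION_USD[model] / 1_000_000


print(f"${estimate_cost_usd('kanon-2-embedder', 10_000_000):.2f}")  # $3.50
print(f"${estimate_cost_usd('kanon-2-enricher', 250_000):.3f}")  # $0.875
```

The billed token count should include the boilerplate and other overhead described in the fine print, not only the tokens you send.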
What each model does
- Embedding (kanon-2-embedder): the most accurate legal embedding model on the Massive Legal Embedding Benchmark (MLEB).
- Reranking (kanon-2-reranker): the most accurate legal reranker on Legal RAG Bench.
- Universal classification (kanon-universal-classifier): our most powerful universal classification model.
- Extractive question answering (kanon-answer-extractor): our base answer extractor, designed to balance precision with throughput.
- Enrichment (kanon-2-enricher): the first enrichment and hierarchical graphitization model.
- Open source (semchunk): our semantic chunking algorithm is free, open-source, and available on GitHub.
The fine print
Credits, and how we calculate the number of tokens you're charged for.
All new users of the Isaacus API receive $100 in free credits as soon as they add a payment method to their Isaacus Platform account. These credits must be redeemed within two months.
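For a sense of scale, the hypothetical helper below estimates how many billed tokens a credit balance covers at a given price per 1M tokens. It ignores taxes and assumes credits are drawn down at the standard prices listed in the per-token price table.

```python
def tokens_covered_by_credit(credit_usd: float, price_per_million_usd: float) -> int:
    """Return roughly how many billed tokens a credit balance covers at a given price."""
    return int(credit_usd / price_per_million_usd * 1_000_000)


# $100 of free credits at the listed prices for two models.
print(tokens_covered_by_credit(100, 0.35))  # 285714285 (kanon-2-embedder)
print(tokens_covered_by_credit(100, 3.50))  # 28571428 (kanon-2-enricher)
```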
Startups can also access US$50k in free credits for their first four months along with a 50% discount on model usage for the first twelve months by applying to the Isaacus Startup Program.
If you're an academic or non-profit, please reach out, as we may be able to allocate additional credits for your use case.
These prices may change in the future, but we will always notify you before new pricing goes into effect.
These prices do not include any taxes that we may be required to collect by law in your particular jurisdiction — such taxes will be applied automatically to your invoices.
Isaacus charges for API calls based on the number of tokens that actually get inputted into a model, which is not necessarily the same as the number of tokens inputted into an API endpoint.
Note that the number of tokens that get inputted into a universal classifier may differ from the number of tokens inputted into an API endpoint, for example, due to the addition of boilerplate tokens, as explained below.
The first difference between the number of tokens inputted into an API endpoint and the number of tokens inputted into a model is that boilerplate tokens can be added to inputs after they are received by the API endpoint.
Boilerplate tokens are typically, but not always, used to structure inputs into the format the model expects. The table below shows the number of boilerplate tokens added to inputs for each of our models.
| Model | Boilerplate tokens | Description |
|---|---|---|
| kanon-2-enricher | 2 | Inputs are formatted as <|startoftext|>{text}<|endoftext|>. |
| kanon-2-embedder | 2 – 13 | Queries use a retrieval query wrapper (13 tokens), documents a retrieval passage wrapper (12 tokens), and all other texts <|startoftext|>{text}<|endoftext|> (2 tokens). |
| kanon-2-reranker | 3 | Queries are formatted alongside texts as query + text pairs. |
| kanon-answer-extractor | 3 | Queries are formatted alongside texts as query + text pairs. |
| kanon-universal-classifier | 3 | Statements are formatted alongside texts as text + statement pairs. |
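The sketch below applies the boilerplate counts from the table to a few unchunked inputs. The helper names are hypothetical. It also assumes, as the table's description of query + text pairs suggests, that each pair is a separate model input, so the query (or statement) is counted once per text.

```python
# Boilerplate tokens per model input for models that take pairs, from the table above.
PAIR_MODEL_BOILERPLATE = {
    "kanon-2-reranker": 3,
    "kanon-answer-extractor": 3,
    "kanon-universal-classifier": 3,
}
# For kanon-2-embedder, the wrapper depends on the kind of text being embedded.
EMBEDDER_BOILERPLATE = {"query": 13, "document": 12, "other": 2}


def embedder_tokens(text_token_counts: list[int], kind: str) -> int:
    """Approximate billed tokens for embedding several texts of the same kind."""
    return sum(count + EMBEDDER_BOILERPLATE[kind] for count in text_token_counts)


def pair_tokens(model: str, query_tokens: int, text_token_counts: list[int]) -> int:
    """Approximate billed tokens when one query or statement is paired with each text.

    Assumes each pair is a separate model input and that no chunking takes place.
    """
    boilerplate = PAIR_MODEL_BOILERPLATE[model]
    return sum(query_tokens + count + boilerplate for count in text_token_counts)


print(embedder_tokens([100, 250, 80], "document"))  # 466
print(pair_tokens("kanon-2-reranker", 20, [100, 250, 80]))  # 499
```

In the reranking case, the 20-token query is billed three times, once for each document it is paired with.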
When an answer extractor or universal classifier receives an input longer than the maximum input length the model can process in a single go, we will, unless chunking is disabled, automatically split that input into smaller chunks and process each chunk separately.
We use our own semchunk algorithm to chunk inputs so that chunks are unlikely to cut off in the middle of an important sentence or paragraph.
Although semchunk is a deterministic algorithm, it can still be difficult to predict exactly how many chunks will be created for any given input, because the algorithm is designed to create chunks that are as semantically meaningful as possible rather than chunks of a fixed size.
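To illustrate why chunk counts are hard to predict, here is a deliberately simplified recursive splitter in roughly the same spirit: it prefers to split on paragraph breaks, then line breaks, then spaces, and greedily packs pieces up to the chunk size. This is not semchunk itself, and `count_words` is a hypothetical stand-in for a real tokenizer.

```python
# Separators tried in order, from most to least semantically meaningful.
SEPARATORS = ("\n\n", "\n", " ")


def count_words(text: str) -> int:
    """Stand-in token counter: one token per whitespace-separated word."""
    return len(text.split())


def split_recursively(text: str, chunk_size: int, separators: tuple[str, ...] = SEPARATORS) -> list[str]:
    """Pack pieces split on the coarsest separator, recursing into pieces that are too large."""
    if count_words(text) <= chunk_size:
        return [text.strip()] if text.strip() else []
    if not separators:
        # Nothing left to split on, so fall back to fixed-size word windows.
        words = text.split()
        return [" ".join(words[i : i + chunk_size]) for i in range(0, len(words), chunk_size)]
    separator, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(separator):
        candidate = f"{current}{separator}{piece}" if current else piece
        if count_words(candidate) <= chunk_size:
            current = candidate  # The piece still fits in the chunk being built.
            continue
        if current.strip():
            chunks.append(current.strip())
        if count_words(piece) <= chunk_size:
            current = piece  # Start a new chunk with this piece.
        else:
            chunks.extend(split_recursively(piece, chunk_size, finer))
            current = ""
    if current.strip():
        chunks.append(current.strip())
    return chunks


sample = "one two three\n\nfour five\n\nsix seven eight nine ten"
print(len(split_recursively(sample, 5)))  # 2
print(len(split_recursively(sample, 4)))  # 4
```

Shrinking the chunk size by a single word doubles the number of chunks here. The first two paragraphs can no longer share a chunk, and the third has to be split at the word level.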
The default chunk size is the model's maximum input length less overhead. That overhead includes not only boilerplate tokens but also, if a model that takes a query as input is being used, the number of tokens in the longest statement in that query. For every chunk that is created, the number of tokens inputted into a model increases by the number of tokens in that chunk plus the number of boilerplate tokens added to it.
Additionally, if the chunks are being passed to a model that also takes a statement as input, such as a universal classifier, that statement has to be added to each chunk. This increases the number of tokens inputted into the model by the number of tokens in the statement multiplied by the number of chunks. Using a chunk overlap ratio also increases the number of tokens and often the number of chunks inputted into a model.
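The chunking overhead described above can be sketched as follows. The helper names are hypothetical. The 512-token maximum input length, the 20-token statement, and the split of a 1,200-token input into three chunks are illustrative figures, not published model limits or real semchunk output. No chunk overlap is used.

```python
def default_chunk_size(max_input_length: int, boilerplate_tokens: int, longest_statement_tokens: int = 0) -> int:
    """Room left for text in each chunk once the per-chunk overhead is reserved."""
    return max_input_length - boilerplate_tokens - longest_statement_tokens


def chunked_input_tokens(chunk_token_counts: list[int], boilerplate_tokens: int, statement_tokens: int = 0) -> int:
    """Billed tokens for a chunked input: each chunk also carries boilerplate and the statement."""
    return sum(count + boilerplate_tokens + statement_tokens for count in chunk_token_counts)


# Hypothetical figures: 512-token limit, 3 boilerplate tokens, 20-token statement.
print(default_chunk_size(512, 3, 20))  # 489
print(chunked_input_tokens([489, 489, 222], 3, 20))  # 1269
```

Here 1,200 tokens of text are billed as 1,269 tokens, because each of the three chunks carries 23 tokens of overhead.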
When using the Isaacus Query Language (IQL), the number of tokens inputted into a model is multiplied by the number of statements in your query. This is simply because each statement in your query has to be evaluated separately, with the results of each statement then combined to form the final output. If your query has only a single statement, the number of tokens inputted into a model will be no different from passing that statement with IQL disabled.
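A minimal sketch of this multiplication for an unchunked input. The helper name and the token counts are hypothetical.

```python
def iql_input_tokens(text_tokens: int, statement_token_counts: list[int], boilerplate_tokens: int) -> int:
    """Billed tokens when each IQL statement is evaluated separately against the same unchunked text."""
    return sum(text_tokens + boilerplate_tokens + statement for statement in statement_token_counts)


# Hypothetical: a 1,000-token text, 3 boilerplate tokens, statements of 10 and 14 tokens.
print(iql_input_tokens(1_000, [10, 14], 3))  # 2030
print(iql_input_tokens(1_000, [10], 3))  # 1013, the same as one statement without IQL
```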
When you invoke an IQL template, that template is later transformed into a model-specific, Isaacus-optimized query. That query is then passed to the model, and you are charged for the number of tokens in that query. A template's underlying query may contain multiple statements, in which case the number of tokens inputted into a model is multiplied by the number of statements in that query. Currently, however, all of our templates use only a single statement.
The number of tokens and statements in a template's queries can change at any time without notice, so we recommend checking the template documentation for the most up-to-date information.
The following Python function can be used to approximate the number of tokens that will be inputted into a model, typically to within a couple dozen tokens, though absolutely no warranties or guarantees are made as to its reliability.
```python
import math


def approximate_number_of_input_tokens_for_input(
    number_of_tokens_in_text: int,
    number_of_boilerplate_tokens: int,
    # Chunking-specific parameters (only applicable if chunking is enabled).
    chunk_size: int | None = None,
    chunk_overlap_ratio: float | None = None,
    # Statement-specific parameters (only applicable if using IQL).
    number_of_tokens_in_longest_statement: int | None = None,
    average_number_of_tokens_in_statements: float | None = None,
    number_of_statements: int | None = None,
) -> int:
    statement_parameters = (
        number_of_tokens_in_longest_statement,
        average_number_of_tokens_in_statements,
        number_of_statements,
    )

    # The statement-specific parameters must either all be provided or all be omitted.
    if len({parameter is None for parameter in statement_parameters}) != 1:
        raise ValueError("You can either provide all of the statement-specific parameters or none of them.")

    if number_of_tokens_in_longest_statement is None:
        # Without IQL, no statement tokens are added and the input is evaluated once.
        number_of_tokens_in_longest_statement = 0
        average_number_of_tokens_in_statements = 0
        number_of_statements = 1

    if (chunk_size is None) != (chunk_overlap_ratio is None):
        raise ValueError("You can either provide both chunk_size and chunk_overlap_ratio or neither of them.")

    if chunk_size is None:
        number_of_chunks = 1
    else:
        # Each chunk must leave room for the boilerplate tokens and the longest statement.
        effective_chunk_size = chunk_size - number_of_boilerplate_tokens - number_of_tokens_in_longest_statement

        if effective_chunk_size <= 0:
            raise ValueError("chunk_size must exceed the boilerplate tokens plus the longest statement.")

        # Overlapping chunks increase the (approximate) number of chunks proportionally.
        number_of_chunks = math.ceil(number_of_tokens_in_text / effective_chunk_size) * (1 + chunk_overlap_ratio)

    # Each (chunk, statement) pair is one model input: the chunk's share of the text,
    # the boilerplate tokens, and a statement (averaged across all statements).
    approximate_number_of_input_tokens = (
        (
            (number_of_tokens_in_text / number_of_chunks)
            + number_of_boilerplate_tokens
            + average_number_of_tokens_in_statements
        )
        * number_of_statements
        * number_of_chunks
    )

    # Round up so that the estimate is a whole number of tokens, matching the return type.
    return math.ceil(approximate_number_of_input_tokens)
```