Collections

Count tokens for texts

Count the number of tokens in the provided texts using the BGE M3 tokenizer. This is useful for checking if texts will fit within the embedding model's token limit (8,192 tokens per text) before sending them for embedding.

POST /v2/collections/token-count

x-api-key <token>

Authenticate using API Key in request header

In: header

Request Body

application/json

texts* array<string>

List of texts to count tokens for
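As a rough illustration of the request described above, the following Python sketch builds the POST request using only the standard library. The helper names `build_token_count_request` and `count_tokens` are hypothetical and not part of any official client; the URL is taken from the curl example on this page, and the `x-api-key` header follows the authentication note above.

```python
import json
import urllib.request

# URL taken from the curl example on this page.
API_URL = "https://api.bem.ai/v2/collections/token-count"


def build_token_count_request(texts, api_key):
    """Build (but do not send) the POST request. Hypothetical helper."""
    body = json.dumps({"texts": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # API key goes in the request header
        },
    )


def count_tokens(texts, api_key):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_token_count_request(texts, api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Separating request construction from sending keeps the payload easy to inspect without making a network call.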

Response Body

application/json

curl -X POST "https://api.bem.ai/v2/collections/token-count" \
  -H "Content-Type: application/json" \
  -H "x-api-key: <token>" \
  -d '{
    "texts": [
      "Hello, world!",
      "This is another text to count tokens for."
    ]
  }'

{
  "token_counts": [
    {
      "index": 0,
      "token_count": 4,
      "exceeds_limit": false,
      "char_count": 13
    },
    {
      "index": 1,
      "token_count": 10,
      "exceeds_limit": false,
      "char_count": 43
    }
  ],
  "total_tokens": 14,
  "max_token_limit": 8192,
  "texts_exceeding_limit": 0
}
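
One way to act on this response before embedding is to separate the texts that fit from those that do not. The sketch below is a minimal example, assuming that each `index` refers to the position of the text in the request's `texts` array; `split_by_limit` is a hypothetical helper name, and the response dict is an abbreviated copy of the sample shown above.

```python
def split_by_limit(texts, response):
    """Return (fits, too_long) based on each entry's exceeds_limit flag.

    Assumes entry["index"] is the position of the text in the request.
    """
    fits, too_long = [], []
    for entry in response["token_counts"]:
        text = texts[entry["index"]]
        if entry["exceeds_limit"]:
            too_long.append(text)
        else:
            fits.append(text)
    return fits, too_long


texts = ["Hello, world!", "This is another text to count tokens for."]

# Abbreviated copy of the sample response on this page.
sample_response = {
    "token_counts": [
        {"index": 0, "token_count": 4, "exceeds_limit": False},
        {"index": 1, "token_count": 10, "exceeds_limit": False},
    ],
    "total_tokens": 14,
    "max_token_limit": 8192,
    "texts_exceeding_limit": 0,
}

fits, too_long = split_by_limit(texts, sample_response)
print(len(fits), len(too_long))
```

Texts that land in `too_long` would need to be shortened or split before being sent for embedding.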