Pricing consists of two components: a platform subscription fee and a model usage charge, which is billed per token. The model usage charge is the same across all subscription tiers, and a LoRA is priced the same as its base model. The platform currently supports the following models:

  • Llama3-8B-Instruct
  • Llama3-70B-Instruct
  • Mixtral-8x7B-Instruct
  • Mixtral-8x22B-Instruct


+ model usage charge*
  • $10 lifetime model usage credit
  • Inference rate limit of 10 requests per minute
  • 1 active LoRA deployment


+ model usage charge*
  • $100 monthly model usage credits*
  • Inference rate limit of 100 requests per minute
  • Up to 10 active LoRA deployments


+ model usage charge*
  • $250 monthly model usage credits*
  • Inference rate limit of 250 requests per minute
  • Up to 50 active LoRA deployments


  • Unlimited LoRAs
  • Unlimited rate limit
  • Custom base model
  • Private VPC deployment

Model Usage Price

Model size                                Price per 1M input tokens    Price per 1M output tokens
Up to 8B                                  $0.2                         $0.2
8.1B to 16B (including Mixtral 8x7B)      $0.5                         $0.5
16.1B and up (including Mixtral 8x22B)    $1.2                         $1.2
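
As a rough illustration of how the table translates into a charge, the following sketch multiplies the token count by the per-million price for the model's size tier. The function and dictionary names are hypothetical. The Mixtral tier placements come from the table, while the Llama3 placements are an assumption based on their parameter counts (8B and 70B).

```python
from decimal import Decimal

# USD per 1M tokens, copied from the table above.
# Input and output tokens cost the same within each tier.
PRICE_PER_MILLION = {
    "up_to_8b": Decimal("0.2"),
    "8.1b_to_16b": Decimal("0.5"),    # includes Mixtral 8x7B
    "16.1b_and_up": Decimal("1.2"),   # includes Mixtral 8x22B
}

# Tier per supported model. Mixtral entries follow the table;
# Llama3 entries are ASSUMED from parameter count.
MODEL_TIER = {
    "Llama3-8B-Instruct": "up_to_8b",
    "Llama3-70B-Instruct": "16.1b_and_up",
    "Mixtral-8x7B-Instruct": "8.1b_to_16b",
    "Mixtral-8x22B-Instruct": "16.1b_and_up",
}


def usage_charge(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the model usage charge in USD for one model's token usage.

    A LoRA is looked up under its base model's name, since both are
    priced the same.
    """
    price = PRICE_PER_MILLION[MODEL_TIER[model]]
    total_tokens = input_tokens + output_tokens
    return price * total_tokens / Decimal(1_000_000)


if __name__ == "__main__":
    # 2M input + 0.5M output tokens on a 70B model: 2.5 * $1.2
    print(usage_charge("Llama3-70B-Instruct", 2_000_000, 500_000))
```

Decimal is used instead of float so that small per-token amounts add up without rounding drift.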

Frequently Asked Questions

What is the model usage charge?
The model usage charge is a per-token fee for using models, including both LoRAs and base models. The rate depends on the number of parameters in the base model, as listed in the table above.
What is the difference between the lifetime credit and the monthly credit?
Monthly credits reset at the beginning of each month, and any unused balance does not carry over. Lifetime credits never expire, but they are also never replenished.
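
The difference between the two credit types can be sketched in a few lines. This is only an illustration: it assumes credits offset model usage charges dollar for dollar, which the page implies but does not spell out, and both function names are hypothetical.

```python
from decimal import Decimal


def apply_credit(charge: Decimal, credit_balance: Decimal):
    """Offset a usage charge with credit (ASSUMED dollar-for-dollar).

    Returns (amount_due, remaining_credit).
    """
    used = min(charge, credit_balance)
    return charge - used, credit_balance - used


def balance_at_month_start(remaining: Decimal, kind: str,
                           monthly_allowance: Decimal = Decimal("0")) -> Decimal:
    """Return the credit balance at the start of the next month."""
    if kind == "monthly":
        # Unused balance is dropped; the allowance is granted afresh.
        return monthly_allowance
    if kind == "lifetime":
        # Never expires, but is never topped up either.
        return remaining
    raise ValueError(f"unknown credit kind: {kind!r}")
```

For example, $30 of usage against a $100 monthly credit leaves nothing due and $70 unused; at the start of the next month the balance is $100 again, not $170.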
How is the requests-per-minute inference rate limit calculated?
RPM (requests per minute) counts the generation requests received in each minute, and the count resets every minute.
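
A counter that resets every minute is commonly called a fixed-window rate limiter. The sketch below shows one possible client-side model of that behavior, for example to avoid sending requests that would be rejected. The class name is hypothetical, and aligning windows to clock minutes is an assumption; the service's actual window boundaries are not documented here.

```python
import time


class FixedWindowLimiter:
    """Allow up to `limit_per_minute` requests in each one-minute window."""

    def __init__(self, limit_per_minute: int, clock=time.time):
        self.limit = limit_per_minute
        self.clock = clock      # injectable so the limiter can be tested
        self.window = None      # index of the current one-minute window
        self.count = 0          # requests counted in that window

    def allow(self) -> bool:
        # ASSUMPTION: windows are aligned to whole minutes of the clock.
        current_window = int(self.clock() // 60)
        if current_window != self.window:
            # A new minute has started, so the counter resets.
            self.window = current_window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

With a limit of 10, the eleventh call to `allow()` within the same minute returns `False`, and calls succeed again once the next minute begins.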
What is the billing cycle?
The billing cycle follows the calendar month, from the first day of the month to the last. Invoices are generated on the last day of each month, and the first invoice is prorated based on the subscription start date.
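
To give a feel for the first invoice, the sketch below prorates a subscription fee by the number of days remaining in the start month. Both the day-based formula and the $30 fee in the example are assumptions for illustration only; the exact proration method is not stated here.

```python
import calendar
from datetime import date
from decimal import Decimal, ROUND_HALF_UP


def prorated_fee(monthly_fee: Decimal, start: date) -> Decimal:
    """Prorate a monthly fee from `start` to the end of that calendar month.

    ASSUMPTION: proration is by day, and the start day itself is billed.
    """
    days_in_month = calendar.monthrange(start.year, start.month)[1]
    days_billed = days_in_month - start.day + 1
    amount = monthly_fee * days_billed / days_in_month
    # Round to whole cents.
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # Hypothetical $30 fee, subscription started on 16 June (30-day month).
    print(prorated_fee(Decimal("30"), date(2024, 6, 16)))
```

Under these assumptions, a subscription that starts on the 16th of a 30-day month is billed for 15 of 30 days, or half the monthly fee.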
Where should I get started?
You can follow the quick start guide to deploy your first LoRA. Please contact us if you have any questions.

Ready to start?

Deploy and serve your first fine-tuned LLM in 1 minute for free!

