
Service Token Units (STUs)

One simple unit for every AI workload.

STUs unify usage across LLM tokens, speech-to-text, text-to-speech, and embeddings—so you can plan, compare, and scale with a single meter. Priced for high-throughput, low-latency inference on SCX.ai's sovereign, energy-efficient AI factory.

What is an STU?

An STU (Service Token Unit) is our common billing unit across AI workloads. Instead of juggling separate meters for text tokens, audio minutes, and vectors, you consume STUs. Each workload has a published conversion—so you can mix models and modalities while keeping your budget predictable.

STUs also power clear plan limits: every plan includes a pool of STUs. If you need more, you add top-ups. If you prefer not to top up, we throttle (we don't silently overcharge).

How STUs map to workloads

These are the current public conversions on SCX.ai. (We review them as models evolve to keep pricing fair and simple.)

| Workload (band/type) | Unit measured | STU per unit | What 1 STU buys |
| --- | --- | --- | --- |
| LLM – Band-L (efficient/light models) | 1M text tokens | 1.00 STU | 1.0M tokens |
| LLM – Band-S (standard/70B-class) | 1M text tokens | 2.00 STU | 0.5M tokens |
| LLM – Band-P (premium/large or MoE) | 1M text tokens | 8.00 STU | 0.125M tokens |
| STT – Realtime | 1 hour audio | 0.90 STU | ~1.11 h |
| STT – Batch (Standard) | 1 hour audio | 0.40 STU | 2.50 h |
| STT – Batch (Turbo) | 1 hour audio | 0.10 STU | 10.0 h |
| TTS – Standard | 1M characters | 0.25 STU | 4.0M chars |
| TTS – Neural | 1M characters | 1.60 STU | 0.625M chars |
| TTS – Expressive | 1M characters | 5.00 STU | 0.20M chars |
| Embeddings – Small | 1M tokens | 0.033 STU | ~30.3M tokens |
| Embeddings – Large | 1M tokens | 0.217 STU | ~4.61M tokens |
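The "What 1 STU buys" column is simply the reciprocal of the STU-per-unit rate. A minimal sketch of that lookup, with rates copied from the table above (dictionary keys and function names are illustrative, not an SCX.ai API):

```python
# Published STU conversion rates (STU per unit), copied from the table above.
# Units are 1M text tokens, 1 hour of audio, or 1M characters per workload type.
STU_PER_UNIT = {
    "llm_band_l": 1.00,      # per 1M text tokens
    "llm_band_s": 2.00,      # per 1M text tokens
    "llm_band_p": 8.00,      # per 1M text tokens
    "stt_realtime": 0.90,    # per hour of audio
    "stt_batch_std": 0.40,   # per hour of audio
    "stt_batch_turbo": 0.10, # per hour of audio
    "tts_standard": 0.25,    # per 1M characters
    "tts_neural": 1.60,      # per 1M characters
    "tts_expressive": 5.00,  # per 1M characters
    "emb_small": 0.033,      # per 1M tokens
    "emb_large": 0.217,      # per 1M tokens
}

def units_per_stu(workload: str) -> float:
    """How many billing units (1M tokens, audio hours, 1M chars) 1 STU buys."""
    return 1.0 / STU_PER_UNIT[workload]

print(units_per_stu("llm_band_s"))           # prints 0.5 -> 0.5M Band-S tokens
print(round(units_per_stu("emb_small"), 1))  # prints 30.3 -> ~30.3M tokens
```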
Bands primer
  • Band-L: small/efficient models for high-QPS apps.

  • Band-S: ~70B-class general models (balanced quality/cost).

  • Band-P: larger or advanced models (e.g., MoE or frontier-class). Band-P is available on Growth (with unlock) and Enterprise.

Why STUs?

Simplify your AI billing and planning

  • Predictable: One meter across modalities means simple forecasts and clean CFO conversations. Plans include an STU pool; if you exceed it, add top-ups. Prefer not to? We throttle rather than auto-overcharge.

  • Fair: Workloads consume STUs in proportion to the compute they require. A heavier model or neural TTS uses more STUs than a lightweight model or batch pipeline, which keeps economics aligned to real costs.

  • Flexible: Switch models or mix workloads without changing SKUs. As your application evolves (LLM + STT + TTS + embeddings), STUs let you keep a single budget line.

Quick example

Sample monthly usage

50M Band-L tokens, 5M Band-S tokens, 100h STT Batch, 2M TTS chars, 20M embeddings

  • Band-L: 50M × 1.00/1M = 50.00 STU

  • Band-S: 5M × 2.00/1M = 10.00 STU

  • STT Batch (Standard): 100 h × 0.40 = 40.00 STU

  • TTS Standard: 2M × 0.25/1M = 0.50 STU

  • Embeddings Small: 20M × 0.033/1M = 0.66 STU

  • Total ≈ 101.16 STU

Recommendation: Starter includes 400 STU → plenty of headroom for this workload.
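The arithmetic in the sample month can be reproduced directly from the published rates. A small sketch (variable names are illustrative; unit size is 1,000,000 for per-1M meters and 1 for per-hour meters):

```python
# Sample monthly usage from the example above: (label, amount, STU rate, unit size).
usage = [
    ("Band-L tokens",       50_000_000, 1.00,  1_000_000),
    ("Band-S tokens",        5_000_000, 2.00,  1_000_000),
    ("STT Batch Std hours",        100, 0.40,          1),
    ("TTS Std characters",   2_000_000, 0.25,  1_000_000),
    ("Embedding tokens",    20_000_000, 0.033, 1_000_000),
]

# Each line item: amount divided into billing units, times the STU rate.
total = sum(amount * rate / unit for _, amount, rate, unit in usage)
print(f"Total: {total:.2f} STU")  # prints "Total: 101.16 STU"
```

With a Starter plan's 400 STU pool, this workload uses roughly a quarter of the included allowance.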

Included STUs by plan

Choose the plan that fits your workload

  • Starter: 400 STU. Best for pilots and early launches.

  • Growth (Popular): 5,000 STU. Scale to meaningful production; Band-P unlock available.

  • Enterprise: 15,000 STU. Large workloads, highest caps, Band-P native.

Add Top-Ups anytime (100 / 1,000 / 3,000 STU sizes) or enable Reserved Throughput (Growth) to raise concurrency and tokens-per-minute.

How throttling works

Transparent usage management
  • No automatic overage charges.

  • If you run out of STUs and don't top up, we throttle requests to your plan's safe baseline rate.

  • You can remove throttling instantly by purchasing a top-up or reducing traffic.
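The page doesn't publish the throttle mechanics; one common way to implement a "safe baseline rate" is a token bucket that only engages once the STU pool is exhausted. A hypothetical sketch, not SCX.ai's actual implementation (class name, baseline rate, and pool accounting are all assumptions):

```python
import time

class BaselineThrottle:
    """Hypothetical token bucket: while the STU pool has headroom, requests
    pass freely; once it is exhausted, admission is limited to a per-plan
    baseline rate instead of incurring overage charges."""

    def __init__(self, stu_pool: float, baseline_rps: float):
        self.stu_pool = stu_pool          # remaining plan STUs
        self.baseline_rps = baseline_rps  # requests/sec once throttled
        self.bucket = baseline_rps        # burst capacity = 1s of baseline
        self.last = time.monotonic()

    def top_up(self, stus: float) -> None:
        """Purchasing a top-up lifts the throttle instantly."""
        self.stu_pool += stus

    def allow(self, stu_cost: float) -> bool:
        if self.stu_pool >= stu_cost:     # pool has headroom: no throttle
            self.stu_pool -= stu_cost
            return True
        # Pool exhausted: refill the bucket at the baseline rate and admit
        # only what the baseline allows (never auto-overcharge).
        now = time.monotonic()
        self.bucket = min(self.baseline_rps,
                          self.bucket + (now - self.last) * self.baseline_rps)
        self.last = now
        if self.bucket >= 1.0:
            self.bucket -= 1.0
            return True
        return False
```

In this sketch a `top_up()` call restores normal admission on the very next request, matching the "remove throttling instantly" behaviour described above.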


Ready to Calculate Your Usage?

Use our calculator to see how many STUs your workload needs and get a personalised plan recommendation.
