SCX.ai Business Manager Primer
A practical guide to deploying production-grade AI with Australian Sovereignty, predictable cost, and measurable efficiency.
1. The AI Stack in Plain English
Think in three layers:
What you buy (Services)
- Inference-as-a-Service: APIs that turn prompts into answers. Pay per token.
- RAG Vault: Managed retrieval to ground answers in your docs.
- Fine-tuning: LoRA adapters to add tone/skills.
- GLP: Real-time filtering for safety & leakage.
- Secure Tool Runner: Safe execution of tool/DB calls.
What runs it (Models)
- Foundation models (Llama, Gemma, GPT-OSS): do the reasoning.
- Agents: Models + rules + tools.
- Embeddings: Numeric fingerprints for search.
- RAG: Fetches snippets for grounded answers.
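To make "numeric fingerprints" concrete: an embedding turns each document into a vector, and search ranks documents by how close their vectors are to the query's. A minimal sketch in pure Python (the toy 3-dimensional vectors and document names are made up for illustration; real embedding models emit hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "fingerprints"; real models use hundreds of dimensions.
docs = {
    "leave policy":   [0.9, 0.1, 0.0],
    "expense policy": [0.8, 0.2, 0.1],
    "cafeteria menu": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # embedding of "How much annual leave do I get?"

# Rank documents by similarity to the query vector (what RAG retrieval does).
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked[0])  # → leave policy
```

This nearest-vector lookup is the core of what the RAG Vault does before snippets are handed to the model.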
What it runs on (Chips)
- Accelerators (SambaNova RDUs): for high throughput.
- GPUs: For flexibility.
- Facilities: Australian Sovereign Cloud (for compliance).
Why this matters: You get production-grade AI with Australian Sovereignty, predictable cost, and measurable efficiency (tokens/kWh), without building a data centre or hiring a research lab.
2. How Workflows Actually Run
Request pipeline: Auth → PII Redaction → Sovereignty Check → Local Retrieve → Locked Version → Safety → Deterministic Egress → Audit Bus (≤100 ms p95 latency)
The Lifecycle of a Prompt
- Your app calls SCX.ai with a prompt (and RAG context).
- GLP pre-filter removes PII/secrets and blocks injection.
- The router picks an approved model under policy.
- The model answers; if needed, it retrieves via RAG or calls the Secure Tool Runner.
- GLP post-filter checks the response for compliance.
- Return answer to app; everything logged with version/model.
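The lifecycle above can be sketched end to end. Every function and heuristic below is an illustrative placeholder, not the actual SCX.ai API:

```python
def handle_prompt(prompt: str, context: list[str]) -> dict:
    """Illustrative sketch of the prompt lifecycle; NOT the real SCX.ai API."""
    audit = []

    # 1. GLP pre-filter: redact PII/secrets, block injection (crude stand-in).
    cleaned = prompt.replace("SECRET", "[REDACTED]")
    audit.append({"stage": "glp_pre", "redacted": cleaned != prompt})

    # 2. Router picks an approved model under policy (toy length heuristic).
    model = "small" if len(cleaned) < 200 else "standard"
    audit.append({"stage": "route", "model": model})

    # 3. Model answers, grounded in the retrieved RAG snippets.
    answer = f"[{model}] answer grounded in {len(context)} snippets"

    # 4. GLP post-filter: compliance check on the response.
    audit.append({"stage": "glp_post", "passed": True})

    # 5. Return to caller; the audit trail records model and stages.
    return {"answer": answer, "audit": audit}

result = handle_prompt("Summarise SECRET project status", ["status.pdf#p1"])
print(result["audit"])
```

The point of the sketch: every stage appends to the audit trail, so the final answer is always traceable to the model and filters that produced it.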
Key metrics:
- Speed: p95 latency
- Quality: accept rate
- Cost: $ / 1M tokens
- ESG: tokens/kWh & gCO₂e
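p95 latency means the time under which 95% of requests complete, so a few slow outliers don't hide a fast typical experience. A short sketch of computing it from raw timings (nearest-rank method; sample values are made up):

```python
import math

def p95(latencies_ms):
    """95th percentile: 95% of requests complete at or below this latency."""
    ordered = sorted(latencies_ms)
    # Nearest-rank method: take the value at the 95% position.
    index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return ordered[index]

# 100 samples: 95 fast requests plus 5 slow outliers.
samples = [40] * 95 + [400] * 5
print(p95(samples))  # → 40, despite the 400 ms outliers
```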
3. Making Good Commercial Decisions
- Right-size models: small for classification, standard for reasoning, premium only when truly necessary.
- Estimate cost: (avg input + retrieved context + output tokens) × requests/month → tokens/month → cost.
- Buy reserved throughput for peak hours; throttle or queue the rest.
- Cache frequent retrievals and common answers to cut spend and latency.
- Use LoRA when prompts/RAG aren't enough; treat adapters like software releases.
- Report tokens/kWh alongside $/1M tokens to align finance and ESG.
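The caching advice is cheap to act on: wrapping a retrieval call in a memoising cache means repeated questions never pay for embedding and search twice. A minimal sketch using Python's standard library (the `retrieve` function is a stand-in, not a real SCX.ai call):

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def retrieve(query: str) -> str:
    """Stand-in for an expensive RAG retrieval; cached per unique query."""
    # In production this would hit the RAG Vault and pay for embedding + search.
    return f"snippets for: {query}"

retrieve("leave policy")           # miss: does the expensive work
retrieve("leave policy")           # hit: served from cache, no extra spend
print(retrieve.cache_info().hits)  # → 1
```

In a real deployment you would also bound cache lifetime, since stale snippets can undermine grounded answers.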
Relative cost by model tier:
- Light: 1×
- Standard: 3.5×
- Premium: 9×
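The token formula and the tier multipliers combine into a simple monthly estimator. The base $/1M-token rate below is a made-up placeholder, not SCX.ai pricing:

```python
# Relative cost multipliers from the tiers above.
TIER_MULTIPLIER = {"light": 1.0, "standard": 3.5, "premium": 9.0}
BASE_RATE_PER_1M = 1.00  # placeholder $/1M tokens for the light tier, NOT real pricing

def monthly_cost(avg_input, avg_context, avg_output, requests_per_month, tier):
    """(avg input + retrieved context + output) × requests → tokens/month → $."""
    tokens_per_request = avg_input + avg_context + avg_output
    tokens_per_month = tokens_per_request * requests_per_month
    rate = BASE_RATE_PER_1M * TIER_MULTIPLIER[tier]
    return tokens_per_month / 1_000_000 * rate

# Example: 500 input + 1,500 RAG context + 300 output tokens, 200k requests/month.
print(f"${monthly_cost(500, 1500, 300, 200_000, 'standard'):.2f}")  # → $1610.00
```

Note how retrieved context dominates the per-request token count here, which is why caching frequent retrievals cuts spend so directly.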
4. Sovereign Security & Compliance
- Identity (OIDC/SAML)
- Keys (KMS/HSM)
- Policy Engine
- Model Registry
- Audit Logs
- GLP Pre/Post Filters
- Sovereign RAG
- Model Endpoint
- Tool Runner
- Deterministic Egress
- Control vs Data Plane: Identity and policy are separate from execution.
- Deterministic Egress: Outputs return only to caller; tools allow-listed only.
- GLP Guardrails: Enforced on input/output; logged with reason codes.
- Version-locked: Signed artifacts, reproducible builds, one-click rollback.
- Audit by default: Immutable logs tie answer to model/policy/tools.
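Deterministic egress with allow-listed tools can be illustrated in a few lines. The tool names and log shape here are hypothetical, chosen only to show the pattern:

```python
# Hypothetical allow-list: the only endpoints the Tool Runner may invoke.
ALLOWED_TOOLS = {"crm.lookup", "docs.search"}
audit_log: list[dict] = []  # stand-in for the immutable audit bus

def run_tool(tool: str, args: dict) -> str:
    """Refuse any tool call that is not explicitly allow-listed, and log why."""
    if tool not in ALLOWED_TOOLS:
        audit_log.append({"tool": tool, "allowed": False, "reason": "not_on_allowlist"})
        raise PermissionError(f"tool '{tool}' is not allow-listed")
    audit_log.append({"tool": tool, "allowed": True})
    return f"result of {tool}"

run_tool("docs.search", {"q": "leave policy"})             # permitted
try:
    run_tool("external.webhook", {"url": "attacker.site"})  # blocked before egress
except PermissionError as err:
    print(err)
```

The key property: the denial is logged with a reason code before the exception is raised, so blocked attempts are auditable, not silent.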
5. What Your Team Needs to Do
Nominate an AI product owner (owns outcomes and KPIs).
Appoint a data steward (owns corpus quality, chunking, retention).
Involve security early (GLP rules, egress lists, key handling).
Start with one use case (60–90 days to 'boringly good' production).
Publish SLOs & dashboards (latency, cost, grounded answers).
Plan rollbacks (prompts and models), then practice them.
6. Quick Wins by Industry
Glossary (Business-Focused)
Before you deploy, be ready to answer:
- What outcome and KPI in 90 days?
- Small, standard, or premium model, and why?
- Which corpus, chunking plan, and filters?
- Which GLP rules (PII, secrets, jailbreaks)?
- Which endpoints, credentials, and allow-lists?
- What p95 requirement and peak plan?
- Tokens/month, $/1M tokens, STUs, cache strategy?
- Version lock, rollback, and audit export plan?
- Tokens/kWh and gCO₂e/token reporting?
Data remains in AU
Ready to Deploy Sovereign AI?
Start your journey with Australian-hosted, production-grade AI infrastructure today.