SCX.ai Business Manager Primer

A practical guide to deploying production-grade AI with Australian Sovereignty, predictable cost, and measurable efficiency.

1. The AI Stack in Plain English

Think in three layers:

Layer 1: Services. The Interface (What you buy)
API Surface
  • Inference APIs
  • RAG Vault
  • Fine-Tuning
  • GLP Protection
  • Tool Runner
  • Capacity

Layer 2: Models. The Brains (What runs it)
Intelligence
  • Foundation Models: Llama, GPT-OSS, Mistral
  • Agents: Reasoning + Tool Use
  • Embeddings: Vector Search
  • RAG Engine: Context Retrieval

Layer 3: Infrastructure. The Metal (Sovereign Compute)
AU Hardware
  • Accelerators: RDUs / LPUs
  • Sovereign DC: NSW / SA

What you buy (Services)

  • Inference-as-a-Service: APIs that turn prompts into answers. Pay per token.
  • RAG Vault: Managed retrieval to ground answers in your docs.
  • Fine-tuning: LoRA adapters to add tone/skills.
  • GLP: Real-time filtering for safety & leakage.
  • Secure Tool Runner: Safe execution of tool/DB calls.

What runs it (Models)

  • Foundation models (Llama, Gemma, GPT-OSS): Do the reasoning.
  • Agents: Models + rules + tools.
  • Embeddings: Numeric fingerprints for search.
  • RAG: Fetches snippets for grounded answers.
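
To make "numeric fingerprints for search" concrete, here is a minimal sketch of embedding-based retrieval. The three-dimensional vectors and document names are made up for illustration; real embeddings come from an embedding model and have hundreds or thousands of dimensions. Nothing here reflects SCX.ai's actual API.

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (assumption: a real system would compute these
# with an embedding model rather than hard-code them).
DOCS = {
    "leave policy": [0.9, 0.1, 0.0],
    "expense claims": [0.1, 0.9, 0.1],
    "security handbook": [0.0, 0.2, 0.9],
}

def retrieve(query_vec, docs, k=1):
    """Return the names of the k documents most similar to the query."""
    ranked = sorted(docs, key=lambda name: cosine(query_vec, docs[name]), reverse=True)
    return ranked[:k]

# A query whose fingerprint sits closest to the leave policy document.
query = [0.8, 0.2, 0.1]
top = retrieve(query, DOCS, k=2)
print(top)  # ['leave policy', 'expense claims']
```

RAG then passes the text of the top-ranked snippets to the model alongside the prompt, so the answer is grounded in your documents rather than in the model's general knowledge.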

What it runs on (Chips)

  • Accelerators (SambaNova RDUs): For high throughput.
  • GPUs: For flexibility.
  • Facilities: Australian Sovereign Cloud (for compliance).

Why this matters: You get production-grade AI with Australian Sovereignty, predictable cost, and measurable efficiency (tokens/kWh), without building a data centre or hiring a research lab.

2. How Workflows Actually Run

Client/App (Auth) → GLP Pre-Filter (PII Redaction) → AU Router (Sovereignty Check) → RAG & Tools (Local Retrieve) → Model (Locked Version) → GLP Post-Filter (Safety)

Result: Deterministic Egress → Audit Bus (≤100ms p95 Latency)

The Lifecycle of a Prompt

  1. Your app calls SCX.ai with a prompt (and RAG context).
  2. GLP pre-filter removes PII/secrets and blocks injection.
  3. The router picks an approved model under policy.
  4. The model answers; if needed, it retrieves via RAG or calls the Secure Tool Runner.
  5. GLP post-filter checks the response for compliance.
  6. The answer returns to your app; every step is logged with the model and version used.
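
The six steps above can be sketched as a small pipeline. This is an illustrative toy under stated assumptions, not SCX.ai's implementation: the model names, the single email regex, the injection marker, and the `call_model` stand-in are all hypothetical, and retrieval is simplified to a list of snippets passed in by the caller.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rules: a production filter covers far more than email addresses.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
INJECTION_MARKERS = ("ignore previous instructions",)
APPROVED_MODELS = {"classify": "light-v1", "reason": "standard-v3"}  # made-up names

@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, **event):
        self.entries.append(event)

def pre_filter(prompt):
    """Step 2: block obvious injection attempts, then redact PII."""
    if any(marker in prompt.lower() for marker in INJECTION_MARKERS):
        raise ValueError("blocked: possible prompt injection")
    return EMAIL.sub("[REDACTED_EMAIL]", prompt)

def route(task):
    """Step 3: only models on the approved list can be selected."""
    return APPROVED_MODELS[task]

def call_model(model, prompt, context):
    """Step 4: stand-in for the real model call (assumption)."""
    return f"[{model}] answer based on {len(context)} snippet(s)"

def post_filter(answer):
    """Step 5: re-check the response before it leaves the platform."""
    return EMAIL.sub("[REDACTED_EMAIL]", answer)

def handle(prompt, task, context, log):
    """Steps 1 to 6 end to end, with the model version logged."""
    clean = pre_filter(prompt)
    model = route(task)
    answer = post_filter(call_model(model, clean, context))
    log.record(model=model, prompt=clean, answer=answer)
    return answer

log = AuditLog()
answer = handle(
    "Summarise the leave policy for jo@example.com",
    "reason",
    ["Leave policy, section 2"],
    log,
)
print(answer)
print(log.entries[0]["prompt"])
```

Note that the audit entry stores the redacted prompt, so the log itself does not become a second copy of the PII the pre-filter removed.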
What you measure

  • Speed: p95 Latency
  • Quality: Accept Rate
  • Cost: $ / 1M tokens
  • ESG: Tokens/kWh & gCO₂e
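
As a rough illustration of how these four numbers could be computed from raw usage data, here is a short sketch. All sample figures (latencies, accept counts, spend, energy) are invented, and the nearest-rank percentile method is one common convention, not necessarily the one your dashboard uses.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile: 95% of requests were at least this fast."""
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]

def accept_rate(accepted, total):
    """Share of answers that users accepted without rework."""
    return accepted / total

def cost_per_million(spend, tokens):
    """Dollars per one million tokens processed."""
    return spend * 1_000_000 / tokens

def tokens_per_kwh(tokens, energy_kwh):
    """Efficiency: tokens processed per kilowatt-hour consumed."""
    return tokens / energy_kwh

# Hypothetical sample: 19 fast requests and one slow outlier.
latencies = [60 + i for i in range(19)] + [400]

speed = p95(latencies)                          # 78 (ms); the outlier sits above p95
quality = accept_rate(172, 200)                 # 0.86
cost = cost_per_million(36.0, 12_000_000)       # 3.0 ($ per 1M tokens)
efficiency = tokens_per_kwh(12_000_000, 8.0)    # 1,500,000 tokens per kWh
```

The p95 figure is preferred over the average because a single slow request barely moves it, while a sustained slowdown does.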

3. Making Good Commercial Decisions

  1. Match model to task: Small models for classification. Standard for reasoning. Premium only when truly necessary.
  2. Budget by workflow: (Avg input + retrieved context + output) × requests/month → tokens/month → cost.
  3. Reserve peaks: Buy reserved throughput for peak hours; throttle or queue the rest.
  4. Cache wins: Cache frequent retrievals and common answers to cut spend and latency.
  5. Fine-tune sparingly: Use LoRA when prompts/RAG aren't enough; treat adapters like software releases.
  6. Track efficiency: Report tokens/kWh alongside $ / 1M tokens to align finance and ESG.
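
The "budget by workflow" formula can be worked through in a few lines. The token counts, request volume, and price below are hypothetical placeholders, and the sketch assumes a single blended price per million tokens; some providers price input and output tokens differently.

```python
def monthly_tokens(avg_input, avg_context, avg_output, requests_per_month):
    """(Avg input + retrieved context + output) x requests/month."""
    return (avg_input + avg_context + avg_output) * requests_per_month

def monthly_cost(tokens, price_per_million):
    """Tokens/month converted to dollars at a blended $ / 1M tokens rate."""
    return tokens * price_per_million / 1_000_000

# Hypothetical support Q&A workflow: 300 input, 1,200 context, 500 output tokens.
tokens = monthly_tokens(300, 1200, 500, requests_per_month=50_000)
cost = monthly_cost(tokens, price_per_million=2.50)

print(tokens)  # 100000000
print(cost)    # 250.0
```

In this made-up example the retrieved context is the largest share of each request (1,200 of 2,000 tokens), which is why trimming or caching retrievals is often the first cost lever to pull.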

Token Cost Composition (chart): share of input, context (RAG), and output tokens for three workflows: Support Q&A, Policy Lookup, and Email Draft.

Model Pricing Multipliers

  • Light: 1x
  • Standard: 3.5x
  • Premium: 9x
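
To show what the multipliers mean for a budget, the sketch below applies them to the same monthly volume. It assumes the multipliers are relative to the Light tier's price, which is an interpretation of the figures above; the $1.00 base price, the 100M-token volume, and the traffic mix are invented for illustration.

```python
# Multipliers from the primer; treating Light as the 1x baseline is an assumption.
MULTIPLIERS = {"light": 1.0, "standard": 3.5, "premium": 9.0}

def tier_cost(tokens, light_price_per_million, tier):
    """Monthly cost if all tokens run on a single tier."""
    return tokens / 1_000_000 * light_price_per_million * MULTIPLIERS[tier]

def blended_multiplier(mix):
    """Effective multiplier when traffic is split across tiers (shares sum to 1)."""
    return sum(share * MULTIPLIERS[tier] for tier, share in mix.items())

# Hypothetical: 100M tokens/month at $1.00 per 1M tokens on the Light tier.
costs = {tier: tier_cost(100_000_000, 1.00, tier) for tier in MULTIPLIERS}
print(costs)  # {'light': 100.0, 'standard': 350.0, 'premium': 900.0}

# Hypothetical mix: most traffic on Light, a little on Premium.
mix = {"light": 0.70, "standard": 0.25, "premium": 0.05}
blend = blended_multiplier(mix)
```

With that mix the effective multiplier is roughly 2x rather than 9x, which is the commercial case for matching the model to the task instead of defaulting to Premium.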

4. Sovereign Security & Compliance

Control Plane
  • Identity (OIDC/SAML)
  • Keys (KMS/HSM)
  • Policy Engine
  • Model Registry
  • Audit Logs

Data Plane
  • GLP Pre/Post Filters
  • Sovereign RAG
  • Model Endpoint
  • Tool Runner
  • Deterministic Egress

  • Control vs Data Plane: Identity and policy are separate from execution.
  • Deterministic Egress: Outputs return only to caller; tools allow-listed only.
  • GLP Guardrails: Enforced on input/output; logged with reason codes.
  • Version-locked: Signed artifacts, reproducible builds, one-click rollback.
  • Audit by default: Immutable logs tie answer to model/policy/tools.
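
One common way to make a log tamper-evident is hash chaining, where each entry includes a hash of the entry before it. The sketch below illustrates that general technique only; it is not a description of how SCX.ai stores its audit logs, and the record fields (`model`, `policy`, `tools`) are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry

def _digest(prev_hash, record):
    """SHA-256 over the previous hash plus a canonical JSON form of the record."""
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

def append_entry(chain, record):
    """Add a record that commits to everything logged before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "record": record,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, record),
    })

def verify(chain):
    """Recompute every hash; any edited or removed entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True

chain = []
append_entry(chain, {"model": "standard-v3", "policy": "p-12", "tools": []})
append_entry(chain, {"model": "light-v1", "policy": "p-12", "tools": ["crm.lookup"]})
print(verify(chain))  # True

chain[0]["record"]["model"] = "premium-v9"  # simulate tampering
print(verify(chain))  # False
```

Because each entry ties an answer to the model, policy, and tools used, an auditor can both reconstruct what happened and confirm that the record has not been altered since.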

Australian Alignment
  • IRAP Assessed
  • ISO 27001
  • Australian Privacy Principles
  • Data Sovereignty
  • Essential Eight

5. What Your Team Needs to Do

  1. Nominate an AI product owner (owns outcomes and KPIs).
  2. Appoint a data steward (owns corpus quality, chunking, retention).
  3. Involve security early (GLP rules, egress lists, key handling).
  4. Start with one use case (60–90 days to 'boringly good' production).
  5. Publish SLOs & dashboards (latency, cost, grounded answers).
  6. Plan rollbacks (prompts and models), then practice them.

6. Quick Wins by Industry

  • Financial Services: Trust deed review, ESG scanning, adviser assistants (RAG + LoRA tone).
  • Government: Citizen agents with grounded answers, secure document processing, grants/fraud triage.
  • Healthcare: Clinical summarisation, coding assistance, device telemetry analysis (strict GLP, PHI handling).
  • AI-first SaaS: Multi-tenant RAG, tool-rich agents, continuous LoRA updates; ship features faster.

One-Page Decision Checklist
Key questions for your first 90 days.

  • Use case: What outcome and KPI in 90 days?
  • Model choice: Small, standard, or premium, and why?
  • RAG: Which corpus, chunking plan, and filters?
  • Guardrails: GLP rules (PII, secrets, jailbreaks).
  • Tools: Which endpoints, credentials, allow-lists?
  • Latency target: p95 requirement and peak plan.
  • Budget: Tokens/month, $/1M tokens, STUs, cache.
  • Governance: Version lock, rollback, audit export.
  • ESG: Tokens/kWh and gCO₂e/token reporting.
  • Sovereignty: Data remains in AU.
