# FoundationModels: Quick Reference Cheatsheet

The scannable companion to the full iOS 26 FoundationModels reference — key types, patterns, and things not to do.

- category: Reference
- date: 2026-03-01
- reading-time: 8 min read
- excerpt: Everything you need to know about FoundationModels on one page — key types, session patterns, token budget formula, and the 10 anti-patterns. Links into the full reference for every section.

## Table of Contents

- The 15-Part Index
- Key Types
- Session Init Variants
- respond() vs streamResponse()
- Token Budget Formula
- @Generable vs Raw String
- The AnyObject? Pattern
- Context Engineering — 4 Patterns
- The 10 Anti-Patterns
- Minimum Viable Service Pattern
- Availability Cases at a Glance

---

> This is the fast version. For full detail on any section, follow the deep-dive links to the [complete reference](/writing/foundation-models-reference).

---

## The 15-Part Index

| Part | Covers | One thing to know | Deep dive |
|------|--------|-------------------|-----------|
| 1 | Availability & Setup | `.modelNotReady` is transient — model is downloading, not missing | [→](/writing/foundation-models-reference#part-1-availability--setup) |
| 2 | Sessions & Basic Prompting | `respond()` returns `Response` — always unwrap `.content` | [→](/writing/foundation-models-reference#part-2-sessions--basic-prompting) |
| 3 | Prompt Engineering | Short beats long (`<200 words`); explicit rules beat prose descriptions | [→](/writing/foundation-models-reference#part-3-prompt-engineering-for-on-device-models) |
| 4 | `@Generable` | Macro generates a structured output schema; `@Guide` adds constraints | [→](/writing/foundation-models-reference#part-4-guided-generation-generable) |
| 5 | Streaming | `streamResponse()` → `AsyncSequence`; use `.collect()` to finalise | [→](/writing/foundation-models-reference#part-5-streaming) |
| 6 | Generation Options | `temperature: nil` or `0.0–0.2` for correction tasks; higher for creative | [→](/writing/foundation-models-reference#part-6-generation-options) |
| 7 | Tool Calling | Pre-fetch if always needed; define as `Tool` only for conditional data | [→](/writing/foundation-models-reference#part-7-tool-calling) |
| 8 | Token Budget | All inputs + output share one fixed window (~4,096 tokens) | [→](/writing/foundation-models-reference#part-8-token-budget) |
| 9 | The Transcript | New session per call for stateless tasks — don't accumulate history | [→](/writing/foundation-models-reference#part-9-the-transcript) |
| 10 | Failure Modes | `normalise()` should never throw — return raw input on any failure | [→](/writing/foundation-models-reference#part-10-failure-modes--graceful-degradation) |
| 11 | Testing | `@Generable` types are unit-testable without the model (memberwise init) | [→](/writing/foundation-models-reference#part-11-testing) |
| 12 | Use Cases | 10 concrete patterns: BJJ, recipes, journaling, commits, triage... | [→](/writing/foundation-models-reference#part-12-example-use-cases) |
| 13 | Quick Reference | Full type table + anti-patterns (see below) | [→](/writing/foundation-models-reference#part-13-quick-reference--anti-patterns) |
| 14 | Context Engineering | 4,096 tokens ≈ 3,000 words, shared. Select, don't dump | [→](/writing/foundation-models-reference#part-14-context-engineering-for-on-device-ai) |
| 15 | Advanced Patterns | `call()` runs `@concurrent` — hop to `@MainActor` for state access | [→](/writing/foundation-models-reference#part-15-advanced-patterns) |

---

## Key Types

| Type | Purpose |
|------|---------|
| `SystemLanguageModel` | Singleton entry point — `.default`, `.availability`, `.isAvailable` |
| `SystemLanguageModel.Availability` | `.available` / `.unavailable(reason)` — always handle `@unknown default` |
| `LanguageModelSession` | One conversation thread. Stateful — holds a `Transcript` |
| `Instructions` | System prompt — set once at session creation, not per turn |
| `Prompt` | Per-turn user input to the model |
| `Response` | Wrapper around the typed output — **always** access `.content`, never the wrapper itself |
| `ResponseStream` | `AsyncSequence` of `Snapshot` values for streaming |
| `GenerationOptions` | `temperature`, `maximumResponseTokens`, `SamplingMode` |
| `@Generable` | Macro — synthesises a guided-generation schema for a struct or enum |
| `@Guide` | Macro on `@Generable` properties — description + constraints |
| `GenerationGuide` | Constraint type: `.range()`, `.count()`, `.pattern()` |
| `Transcript` | Linear history: `.instructions`, `.prompt`, `.response`, `.toolCalls`, `.toolOutput` |
| `Tool` | Protocol — `Arguments` (Generable), `Output` (PromptRepresentable), `call()` |
| `SystemLanguageModel.TokenUsage` | `.tokenCount` — measure cost before injection |

---

## Session Init Variants

```swift
// Minimal — no tools, inline instructions
LanguageModelSession {
    "Correct BJJ terminology. kimora→Kimura, half card→Half Guard."
}

// Explicit model
LanguageModelSession(model: SystemLanguageModel.default) {
    "..."
}

// With tools
LanguageModelSession(tools: [PositionLookupTool()]) {
    "..."
}

// Resume from saved transcript
LanguageModelSession(model: .default, tools: [], transcript: savedTranscript)
```

---

## `respond()` vs `streamResponse()`

| | `respond()` | `streamResponse()` |
|--|------------|-------------------|
| Returns | `Response` | `ResponseStream` |
| Best for | Background processing, pipelines | Live UI with typing effect |
| Partial results | No | Yes — `snapshot.content` returns `PartiallyGenerated` |
| Finalise stream | N/A | `.collect()` → `Response` |

**Rule:** if the output is going directly into a pipeline or SwiftData model, use `respond()`. If the user sees it appear on screen as it generates, use `streamResponse()`.

---

## Token Budget Formula

```
Total window ≈ 4,096 tokens ≈ 3,000 words

instructions + tool definitions + transcript history + prompt + response
```

All five compete for the same pool. Response tokens are consumed from the same window as input tokens — a 500-token response leaves 3,596 tokens for everything else.

**Measure before injecting:**

```swift
let cost = try await model.tokenUsage(for: instructions).tokenCount
let window = await model.contextSize // back-deployed via @backDeployed
```

---

## `@Generable` vs Raw `String`

**Use `@Generable` when:**

- You need multiple structured fields
- Output must be parsed or processed programmatically
- You want compile-time guarantees on shape
- You need constraints (`@Guide`) on values

**Use raw `String` when:**

- Output is prose for display to the user
- You're summarising or generating a paragraph
- You're streaming a typing effect

---

## The `AnyObject?` Pattern (Availability Without `@available` Spread)

The problem: adding a `@State private var session: LanguageModelSession?` forces `@available(iOS 26, *)` onto the whole view. The fix: use `AnyObject?` as the declared type and cast inside `#available` guards.

```swift
// In your view — no @available annotation needed on the view itself
@State private var normalisationService: AnyObject?
// In .onAppear or .task
if #available(iOS 26, *) {
    normalisationService = TranscriptNormalisationService()
}

// At call site
if #available(iOS 26, *),
   let service = normalisationService as? TranscriptNormalisationService {
    let result = await service.normalise(rawText)
}
```

---

## Context Engineering — 4 Patterns

When app data is too large to inject directly:

| Pattern | When to use | How |
|---------|-------------|-----|
| **Select, Don't Dump** | Data is queryable | SwiftData predicate — fetch only relevant rows |
| **Layered Injection** | Hierarchical data | Inject summaries; load detail on demand via tools |
| **Two-Step Compression** | Large corpus, summary needed | Call 1 summarises → Call 2 reasons with summary |
| **Pre-Summarise at Write Time** | Rich entities with stable detail | Generate + store an AI summary when the entity is saved; reuse forever |

---

## The 10 Anti-Patterns

**1. Accessing `response` instead of `response.content`**
`respond()` returns `Response`, not `T`. Always unwrap `.content`.

**2. Storing `LanguageModelSession` persistently when you don't need history**
For stateless tasks (normalisation, extraction, classification), create a new session per call. History accumulates and eventually overflows the context window.

**3. Too many tools**
Each tool definition consumes ~50–100 tokens whether called or not. Keep to 3–5 per session. Split into multiple focused sessions if you have more.

**4. Calling `isAvailable` / `checkAvailability()` in the hot path**
Availability doesn't change mid-session. Check once at service init; cache the result.

**5. High temperature for structured / correction tasks**
`@Generable` correction types need `nil` or `temperature: 0.0–0.2`. High temperature produces creatively varied — and wrong — output.

**6. Long, elaborate instructions modelled on frontier-model prompts**
The on-device model is ~3B parameters. Instructions over ~200 words dilute the signal. Short, explicit rules outperform discursive prose every time.

**7. Not testing the fallback path**
On most devices today, Apple Intelligence is unavailable. Your non-AI path is the primary experience for most users. Test it as thoroughly as the AI path.

**8. Using FoundationModels for regex-solvable tasks**
If the task is a known, fixed pattern (extract a UUID, validate an email, format a date), use a deterministic function. LLM overhead — latency, availability, complexity — is waste.

**9. Propagating `@available(iOS 26, *)` to SwiftUI views**
Adding `@available` to a `@State` property forces the whole view to require iOS 26. Use the `AnyObject?` pattern instead.

**10. Treating `.modelNotReady` as permanent**
`.modelNotReady` means the model is downloading. It is transient. Show "not available right now" and retry on next app launch. Do not display a permanent "unsupported" message.

---

## Minimum Viable Service Pattern

The production-safe wrapper — never throws, falls back silently:

```swift
import FoundationModels

@available(iOS 26, *)
@MainActor
final class TranscriptNormalisationService {

    private func makeSession() -> LanguageModelSession {
        LanguageModelSession {
            """
            You are a BJJ transcript corrector. Fix misrecognised terms only.
            Common corrections: kimora→Kimura, half card→Half Guard, arm bar→armbar.
            Vocabulary: Kimura, Triangle, Armbar, Half Guard, Full Guard, Mount, Back Control.
            Return the corrected transcript and the BJJ terms found.
            """
        }
    }

    /// Never throws. Returns raw input unchanged on any failure.
    func normalise(_ rawTranscript: String) async -> NormalisedTranscript {
        guard !rawTranscript.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty else {
            return NormalisedTranscript(normalisedText: rawTranscript, extractedTerms: [])
        }
        guard SystemLanguageModel.default.isAvailable else {
            return NormalisedTranscript(normalisedText: rawTranscript, extractedTerms: [])
        }
        do {
            let session = makeSession()
            let result = try await session.respond(
                to: Prompt { rawTranscript },
                generating: NormalisedTranscript.self
            )
            return result.content
        } catch {
            return NormalisedTranscript(normalisedText: rawTranscript, extractedTerms: [])
        }
    }
}
```

---

## Availability Cases at a Glance

| Case | Meaning | What to do |
|------|---------|------------|
| `.available` | Ready to use | Create session, proceed |
| `.unavailable(.deviceNotEligible)` | Hardware doesn't support Apple Intelligence | Show permanent alternative UI; remove AI option |
| `.unavailable(.appleIntelligenceNotEnabled)` | User hasn't enabled it in Settings | Optionally prompt user; respect their choice |
| `.unavailable(.modelNotReady)` | Model weights downloading | Show "not available right now"; retry on next launch |

---

*Full 15-part reference: [iOS 26 FoundationModels: Comprehensive Swift/SwiftUI Reference](/writing/foundation-models-reference)*
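---

One loose end: the Minimum Viable Service above returns a `NormalisedTranscript`, which this cheatsheet never defines. A minimal sketch of what that `@Generable` type might look like: the property names match the service code, but the `@Guide` descriptions are illustrative assumptions, not copied from the full reference.

```swift
import FoundationModels

@available(iOS 26, *)
@Generable
struct NormalisedTranscript {
    // Assumed guide text — tune the descriptions to your own domain.
    @Guide(description: "The transcript with misrecognised BJJ terms corrected, otherwise unchanged.")
    var normalisedText: String

    @Guide(description: "Canonical BJJ terms found in the transcript, e.g. Kimura, Half Guard.")
    var extractedTerms: [String]
}
```

As Part 11 notes, the macro leaves the struct unit-testable without the model: the memberwise initialiser lets you construct fixtures directly, so the empty-input, unavailable, and error fallback paths of `normalise(_:)` can be asserted against without Apple Intelligence present.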