Foundation Models: Apple's On-Device AI for Developers
26 min read
With iOS 26, Apple finally opens the doors to its embedded language model for third-party developers. The Foundation Models Framework allows you to integrate artificial intelligence capabilities directly into your applications, without remote servers, without inference costs, and with complete user data privacy.
This guide explores the new API in depth: from simple text generation to complex Tool Calling architectures, by way of the type-safe guided generation that makes this framework shine.
What you will learn
This article covers the entire Foundation Models Framework:
- Configuration and model availability verification
- Language sessions and text generation
- Guided generation with the @Generable and @Guide macros
- Tool Calling to extend model capabilities
- Error handling and security guardrails
- Building a complete conversational assistant in SwiftUI
Prerequisites and compatibility
Supported devices
The Foundation Models Framework relies on Apple Intelligence, available only on devices equipped with a sufficiently powerful Neural Engine:
| Platform | Compatible devices |
|---|---|
| iPhone | iPhone 15 Pro and later |
| iPad | iPad with M1 chip and later |
| Mac | Mac with Apple Silicon (M1+) |
| Vision Pro | All models |
Required configuration
Three conditions must be met to use the framework:
- A compatible device (see the table above)
- Apple Intelligence enabled in Settings
- The model assets downloaded and ready
The model (~3 billion parameters) weighs approximately 3 GB and downloads automatically once Apple Intelligence is activated. This download happens in the background and may take several minutes depending on connection speed.
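These checks map directly onto the availability API. A minimal sketch, switching over the cases exposed by SystemLanguageModel (the print messages are illustrative):

```swift
import FoundationModels

func checkModelAvailability() {
    let model = SystemLanguageModel.default

    switch model.availability {
    case .available:
        print("Model ready to use")
    case .unavailable(.deviceNotEligible):
        print("This device does not support Apple Intelligence")
    case .unavailable(.appleIntelligenceNotEnabled):
        print("Ask the user to enable Apple Intelligence in Settings")
    case .unavailable(.modelNotReady):
        print("Model still downloading; try again later")
    case .unavailable(let reason):
        print("Unavailable: \(reason)")
    }
}
```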
Getting started: LanguageModelSession
Simple session
The LanguageModelSession class is the main entry point for the framework. It manages communication with the model and maintains conversation history:
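A minimal session might look like this (the prompt is illustrative):

```swift
import FoundationModels

let session = LanguageModelSession()

// respond(to:) suspends until the full response has been generated.
let response = try await session.respond(
    to: "Summarize the plot of The Count of Monte Cristo in two sentences."
)
print(response.content)
```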
This approach waits for the model to generate its entire response before returning. For long responses, the user may feel the application is unresponsive.
Response streaming
For a smooth user experience, use streamResponse which returns an AsyncThrowingStream:
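A sketch of the streaming variant, assuming each streamed element is a snapshot of the text generated so far:

```swift
import FoundationModels

let session = LanguageModelSession()
let stream = session.streamResponse(
    to: "Write a short haiku about autumn."
)

// Each element is a cumulative snapshot of the response so far.
for try await partial in stream {
    print(partial)
}
```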
Streaming allows displaying text character by character, exactly like modern chat interfaces. The user sees the response being built in real time.
System instructions
Instructions define the model's behavior and personality for the entire session duration:
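For example, a session could be configured with instructions like these (the persona is illustrative):

```swift
import FoundationModels

let session = LanguageModelSession(
    instructions: """
        You are a cooking assistant.
        Keep answers under three sentences.
        Never suggest recipes containing peanuts.
        """
)
```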
Instructions are more powerful than a simple prompt prefix. The model considers them as priority directives and applies them consistently throughout the conversation.
Guided generation: @Generable and @Guide
Guided generation is the flagship feature of the Foundation Models Framework. It lets you obtain structured responses as native Swift types, with no fragile JSON parsing.
The @Generable macro
Annotate your structures with @Generable so the model can instantiate them directly:
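A minimal sketch with a hypothetical Recipe type, requested via the generating: parameter of respond:

```swift
import FoundationModels

@Generable
struct Recipe {
    let name: String
    let ingredients: [String]
    let preparationMinutes: Int
}

let session = LanguageModelSession()
let response = try await session.respond(
    to: "Generate a simple vegetarian pasta recipe.",
    generating: Recipe.self
)

// response.content is a fully typed Recipe, no JSON parsing needed.
let recipe = response.content
print(recipe.name)
```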
The model uses "constrained decoding" to guarantee that the output respects the defined structure exactly. No malformed JSON, no missing fields.
Constraints with @Guide
The @Guide macro allows refining generation behavior for each property:
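A sketch combining several constraints on a hypothetical GameCharacter type (the constraint spellings follow the table below):

```swift
import FoundationModels

@Generable
struct GameCharacter {
    @Guide(description: "Character name")
    let name: String

    @Guide(.range(1...100))
    let age: Int

    @Guide(.anyOf(["warrior", "mage", "rogue"]))
    let archetype: String

    @Guide(.count(3...5))
    let skills: [String]
}
```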
Available constraints include:
| Constraint | Usage | Example |
|---|---|---|
| description | Semantic guide | "Character name" |
| .range() | Numeric range | .range(1...100) |
| .anyOf() | Enumerated values | .anyOf(["A", "B", "C"]) |
| .count() | Array size | .count(3...5) |
| .regex() | String pattern | .regex(/[A-Z]{2}-\d{4}/) |
Property order
The order of property declaration is crucial. The model generates values sequentially, from top to bottom:
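The idea can be sketched with a hypothetical ArticleAnalysis type, where the declaration order deliberately puts the topics first:

```swift
import FoundationModels

@Generable
struct ArticleAnalysis {
    // Generated first: the model identifies the topics...
    @Guide(description: "The main topics covered by the article")
    let mainTopics: [String]

    // ...then writes the summary with those topics already in context.
    @Guide(description: "A short summary grounded in the topics above")
    let summary: String
}
```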
If you place summary before mainTopics, the model cannot rely on identified topics to build the summary.
Generation with streaming
Combine guided generation and streaming for optimal UX:
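A sketch, assuming a small @Generable Recipe type; each streamed element is a partially filled snapshot:

```swift
import FoundationModels

@Generable
struct Recipe {
    let name: String
    let ingredients: [String]
}

let session = LanguageModelSession()
let stream = session.streamResponse(
    to: "Generate a simple vegetarian pasta recipe.",
    generating: Recipe.self
)

for try await partial in stream {
    // partial is a Recipe.PartiallyGenerated: every property is
    // optional and fills in as generation progresses.
    if let name = partial.name {
        print("Name so far: \(name)")
    }
}
```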
The PartiallyGenerated type exposes each property as optional, allowing you to display values as they are generated.
Tool Calling: Extending model capabilities
The embedded model has limited knowledge (its training data cuts off in 2023) and no access to external data. Tool Calling fills these gaps by giving the model access to your own APIs.
Anatomy of a Tool
A Tool implements the Tool protocol with the following elements:
- description: guides the model on when to use the tool
- Arguments: a @Generable and Sendable struct defining the expected parameters
- func call(): the implementation, returning a String (or any PromptRepresentable type)
The name is auto-generated from the struct/class name and can be omitted.
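A minimal tool following that shape; the weather lookup is stubbed and would be replaced by your own API call:

```swift
import FoundationModels

struct WeatherTool: Tool {
    let description = "Fetches the current weather for a given city"

    @Generable
    struct Arguments {
        @Guide(description: "The city name, e.g. Paris")
        let city: String
    }

    func call(arguments: Arguments) async throws -> String {
        // Stubbed value; replace with a real weather API call.
        let temperature = 21
        return "It is currently \(temperature)°C in \(arguments.city)."
    }
}
```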
Registration and usage
Tools are passed to the session at creation:
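A sketch, assuming a hypothetical WeatherTool type conforming to Tool:

```swift
import FoundationModels

let session = LanguageModelSession(
    tools: [WeatherTool()],
    instructions: "You are a travel assistant. Use the tools when you need live data."
)

// The model decides on its own whether and when to call the tool.
let response = try await session.respond(
    to: "What is the weather like in Lyon right now?"
)
print(response.content)
```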
Complete example: Tool with HealthKit
Here's a realistic example integrating HealthKit for health data:
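A sketch of such a tool; it assumes HealthKit read authorization for step count has already been requested elsewhere in the app:

```swift
import FoundationModels
import HealthKit

struct StepCountTool: Tool {
    let description = "Returns the user's total step count over recent days"

    private let store = HKHealthStore()

    @Generable
    struct Arguments {
        @Guide(description: "Number of past days to include, 1 for today only", .range(1...30))
        let days: Int
    }

    func call(arguments: Arguments) async throws -> String {
        let stepType = HKQuantityType(.stepCount)
        let start = Calendar.current.date(
            byAdding: .day, value: -arguments.days, to: .now
        )!
        let samplePredicate = HKSamplePredicate.quantitySample(
            type: stepType,
            predicate: HKQuery.predicateForSamples(withStart: start, end: .now)
        )

        // Sums all step samples over the interval (async HealthKit API).
        let descriptor = HKStatisticsQueryDescriptor(
            predicate: samplePredicate,
            options: .cumulativeSum
        )
        let steps = try await descriptor.result(for: store)?
            .sumQuantity()?
            .doubleValue(for: .count()) ?? 0

        return "The user took \(Int(steps)) steps over the last \(arguments.days) day(s)."
    }
}
```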
Tool chaining
The model can call multiple tools in sequence to answer a complex request:
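An illustrative sequence, assuming hypothetical StepCountTool and WeatherTool types conforming to Tool:

```swift
import FoundationModels

let session = LanguageModelSession(
    tools: [StepCountTool(), WeatherTool()]
)

// The model may first call StepCountTool, then WeatherTool,
// before composing its final answer from both results.
let response = try await session.respond(
    to: "Based on my activity today and the weather in Paris, should I go for a run this evening?"
)
print(response.content)
```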
The framework automatically handles inserting tool results into the conversation transcript.
Error handling
The Foundation Models Framework defines specific errors via GenerationError:
| Error | Description |
|---|---|
| guardrailViolation | Content blocked by safety filters |
| exceededContextWindowSize | Conversation exceeding 4096 tokens |
| rateLimited | Too many requests, or app in background |
| unsupportedLanguageOrLocale | Language not supported |
| concurrentRequests | Attempt to send while a generation is in progress |
| assetsUnavailable | Model deleted or insufficient disk space |
| unsupportedGuide | @Guide with an unsupported pattern |
| decodingFailure | Failed to deserialize the Generable type |
| refusal | Model refuses to respond (explanation available via .explanation) |
Guardrails in detail
Apple applies strict and non-disableable safety filters. Certain content systematically triggers a violation:
- Violent or explicit content
- Requests related to illegal activities
- Certain sensitive political topics
- Critical medical or legal information
These guardrails are sometimes overly restrictive (a news article mentioning a death may be blocked). Apple is working to reduce these false positives, so report them via Feedback Assistant.
Limitations to know
Model characteristics
| Aspect | Value | Implication |
|---|---|---|
| Parameters | ~3 billion | Less powerful than GPT-4 or Claude |
| Context window | 4096 tokens | Limited conversation length |
| Quantization | 2-bit | Optimized for memory, not precision |
| RAM used | ~3 GB | Impact on memory-hungry apps |
What the model does NOT do well
Apple recommends avoiding these use cases:
- Code generation: Inconsistent and often incorrect results
- Mathematical calculations: Frequent errors on complex operations
- Recent factual knowledge: Training data until 2023
- General conversations: Not designed as a generalist chatbot
The model excels however for:
- Summarization and entity extraction
- Short creative content generation
- Classification and tagging
- Structured text understanding
Performance optimization
Model prewarming
Initial model loading takes a few seconds. Use prewarm() to load the model ahead of time. This method is synchronous and doesn't throw: it returns immediately, and loading happens in the background:
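A minimal sketch:

```swift
import FoundationModels

let session = LanguageModelSession(
    instructions: "You are a helpful assistant."
)

// Returns immediately; the weights load in the background so the
// first respond(to:) call starts faster.
session.prewarm()
```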
Call prewarm() as soon as you anticipate model usage (e.g., when user opens the chat screen).
Memory management
With ~3 GB of RAM used by the model, monitor memory footprint:
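One pragmatic pattern (a sketch, not an official recommendation): keep the session alive only while the feature is on screen, so the memory can be reclaimed elsewhere:

```swift
import FoundationModels

@MainActor
final class AssistantController {
    private var session: LanguageModelSession?

    // Create (and prewarm) the session only when the feature appears.
    func featureDidAppear() {
        let session = LanguageModelSession()
        session.prewarm()
        self.session = session
    }

    // Drop the reference when the feature disappears so the model's
    // memory can be released under pressure.
    func featureDidDisappear() {
        session = nil
    }
}
```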
Limiting Generable properties
Each property of a @Generable type is generated sequentially. More properties mean slower generation:
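For example, with two hypothetical movie types, the trimmed-down version generates noticeably faster:

```swift
import FoundationModels

// Slower: six properties, each generated in sequence.
@Generable
struct MovieFull {
    let title: String
    let synopsis: String
    let director: String
    let cast: [String]
    let year: Int
    let genres: [String]
}

// Faster: only what the card UI actually displays.
@Generable
struct MovieCard {
    let title: String
    let synopsis: String
}
```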
Design your Generable types with only the properties you will actually display.
Complete example: Mini Conversational Chat
Here's a complete implementation of a conversational assistant with SwiftUI and iOS 26:
Data model
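A minimal sketch of the message model (names are illustrative):

```swift
import Foundation

struct ChatMessage: Identifiable, Equatable {
    enum Role { case user, assistant }

    let id = UUID()
    let role: Role
    var text: String
}
```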
ViewModel with managed session
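A sketch of the view model, assuming a ChatMessage value type with role and text, and treating each streamed element as the cumulative text so far:

```swift
import FoundationModels
import Observation

@Observable
@MainActor
final class ChatViewModel {
    var messages: [ChatMessage] = []
    var isGenerating = false
    var errorMessage: String?

    private let session = LanguageModelSession(
        instructions: "You are a friendly, concise assistant."
    )

    var isModelAvailable: Bool {
        if case .available = SystemLanguageModel.default.availability {
            return true
        }
        return false
    }

    func prewarm() {
        session.prewarm()
    }

    func send(_ text: String) async {
        messages.append(ChatMessage(role: .user, text: text))
        messages.append(ChatMessage(role: .assistant, text: ""))
        isGenerating = true
        defer { isGenerating = false }

        do {
            // Update the last message with each cumulative snapshot.
            for try await partial in session.streamResponse(to: text) {
                messages[messages.count - 1].text = partial
            }
        } catch {
            errorMessage = error.localizedDescription
        }
    }
}
```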
Main Chat view
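A sketch of the view layer, assuming a hypothetical ChatViewModel exposing messages, isGenerating, isModelAvailable, prewarm(), and send(_:):

```swift
import SwiftUI

struct ChatView: View {
    @State private var viewModel = ChatViewModel()
    @State private var input = ""

    var body: some View {
        VStack(spacing: 0) {
            if viewModel.isModelAvailable {
                ScrollView {
                    ForEach(viewModel.messages) { message in
                        Text(message.text)
                            .padding(8)
                            .frame(
                                maxWidth: .infinity,
                                alignment: message.role == .user ? .trailing : .leading
                            )
                    }
                }

                HStack {
                    TextField("Ask something", text: $input)
                        .textFieldStyle(.roundedBorder)
                    Button("Send") {
                        let text = input
                        input = ""
                        Task { await viewModel.send(text) }
                    }
                    .disabled(input.isEmpty || viewModel.isGenerating)
                }
                .padding()
            } else {
                ContentUnavailableView(
                    "Apple Intelligence unavailable",
                    systemImage: "exclamationmark.triangle"
                )
            }
        }
        .onAppear { viewModel.prewarm() }
    }
}
```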
This implementation handles:
- Model availability verification
- Real-time response streaming
- Complete error handling
- Conversation history maintained by session
- Prewarming for optimal response times
Going further
The Foundation Models Framework opens considerable possibilities for iOS applications. It doesn't replace cloud models for complex tasks, but it excels at quality-of-life features that work offline and respect privacy.