Foundation Models: Apple's On-Device AI for Developers

With iOS 26, Apple finally opens the doors to its embedded language model for third-party developers. The Foundation Models Framework allows you to integrate artificial intelligence capabilities directly into your applications, without remote servers, without inference costs, and with complete user data privacy.

This guide explores the new API in depth: from simple text generation to complex architectures with Tool Calling, by way of the type-safe guided generation that makes this framework shine.

What you will learn

This article covers the entire Foundation Models Framework:

    • Configuration and model availability verification
    • Language sessions and text generation
    • Guided generation with @Generable and @Guide macros
    • Tool Calling to extend model capabilities
    • Error handling and security guardrails
    • Building a complete conversational assistant in SwiftUI

Prerequisites and compatibility

Supported devices

The Foundation Models Framework relies on Apple Intelligence, available only on devices equipped with a sufficiently powerful Neural Engine:

Platform   | Compatible devices
iPhone     | iPhone 15 Pro and later
iPad       | iPad with M1 chip and later
Mac        | Mac with Apple Silicon (M1+)
Vision Pro | All models

Required configuration

Three conditions must be met to use the framework:

    • A compatible device (see the table above)
    • Apple Intelligence enabled in Settings
    • The model assets downloaded to the device

The model (~3 billion parameters) weighs approximately 3 GB and downloads automatically after Apple Intelligence activation. This download happens in the background and may take several minutes depending on connection speed.
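These conditions map onto the SystemLanguageModel.availability API. A minimal check, sketched against the unavailability reasons Apple documents (deviceNotEligible, appleIntelligenceNotEnabled, modelNotReady), could look like this:

```swift
import FoundationModels

// Check availability before exposing any AI feature in the UI.
let model = SystemLanguageModel.default

switch model.availability {
case .available:
    print("Ready to generate")
case .unavailable(.deviceNotEligible):
    print("This device does not support Apple Intelligence")
case .unavailable(.appleIntelligenceNotEnabled):
    print("Ask the user to enable Apple Intelligence in Settings")
case .unavailable(.modelNotReady):
    print("Model is still downloading; retry later")
case .unavailable(let reason):
    print("Unavailable: \(reason)")
}
```

Checking availability up front lets you hide or degrade the feature gracefully instead of failing at the first request.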

Getting started: LanguageModelSession

Simple session

The LanguageModelSession class is the main entry point for the framework. It manages communication with the model and maintains conversation history:
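A minimal session can be as simple as the following sketch (the prompt text is illustrative):

```swift
import FoundationModels

// A session with default configuration: no instructions, no tools.
let session = LanguageModelSession()

// respond(to:) suspends until the full response has been generated.
let response = try await session.respond(to: "Suggest three names for a hiking app.")
print(response.content)
```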

This approach waits for the model to generate its entire response before returning. For long responses, the user may feel the application is unresponsive.

Response streaming

For a smooth user experience, use streamResponse, which returns an asynchronous stream of response snapshots:
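A sketch of streaming, assuming each element of the stream is a snapshot of the text generated so far:

```swift
import FoundationModels

let session = LanguageModelSession()

// Each element is a snapshot of the response generated so far,
// so the UI can simply replace its displayed text on every iteration.
for try await partial in session.streamResponse(to: "Write a short haiku about the ocean.") {
    print(partial)
}
```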

Streaming lets you display the text as it arrives, exactly like modern chat interfaces. The user sees the response being built in real time.

System instructions

Instructions define the model's behavior and personality for the entire session duration:
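For example (the instruction wording is illustrative):

```swift
import FoundationModels

// Instructions apply to every exchange in this session.
let session = LanguageModelSession(instructions: """
    You are a friendly cooking assistant.
    Answer concisely and only about cooking topics.
    Always reply in the user's language.
    """)

let response = try await session.respond(to: "How do I make a béchamel?")
print(response.content)
```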

Instructions are more powerful than a simple prompt prefix. The model considers them as priority directives and applies them consistently throughout the conversation.

Guided generation: @Generable and @Guide

Guided generation is the flagship feature of the Foundation Models Framework. It allows obtaining structured responses as native Swift types, without fragile JSON parsing.

The @Generable macro

Annotate your structures with @Generable so the model can instantiate them directly:
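A sketch of guided generation, using a hypothetical Recipe type:

```swift
import FoundationModels

@Generable
struct Recipe {
    var name: String
    var ingredients: [String]
    var preparationMinutes: Int
}

let session = LanguageModelSession()

// The generating: parameter asks the model to produce this exact type.
let response = try await session.respond(
    to: "Generate a simple pasta recipe.",
    generating: Recipe.self
)

let recipe = response.content   // A fully typed Recipe, no JSON parsing.
print(recipe.name)
```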

The model uses "constrained decoding" to guarantee that the output respects the defined structure exactly. No malformed JSON, no missing fields.

Constraints with @Guide

The @Guide macro allows refining generation behavior for each property:
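A sketch combining a few of these constraints on a hypothetical GameCharacter type:

```swift
import FoundationModels

@Generable
struct GameCharacter {
    @Guide(description: "Character name")
    var name: String

    // Constrain the generated integer to this range.
    @Guide(.range(1...100))
    var level: Int

    // Restrict the value to a fixed set of options.
    @Guide(.anyOf(["warrior", "mage", "rogue"]))
    var characterClass: String
}
```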

Available constraints include:

Constraint  | Usage             | Example
description | Semantic guide    | "Character name"
.range()    | Numeric range     | .range(1...100)
.anyOf()    | Enumerated values | .anyOf(["A", "B", "C"])
.count()    | Array size        | .count(3...5)
.regex()    | String pattern    | .regex(/[A-Z]{2}-\d{4}/)

Property order

The order of property declaration is crucial. The model generates values sequentially, from top to bottom:
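For example, a sketch of an analysis type ordered so the topics are produced before the summary that depends on them:

```swift
import FoundationModels

@Generable
struct ArticleAnalysis {
    // Generated first: the model identifies the topics…
    var mainTopics: [String]

    // …then writes a summary that can build on them.
    var summary: String
}
```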

If you place summary before mainTopics, the model cannot rely on identified topics to build the summary.

Generation with streaming

Combine guided generation and streaming for optimal UX:
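A sketch, assuming a hypothetical TravelPlan type; the stream's elements are the PartiallyGenerated companion type:

```swift
import FoundationModels

@Generable
struct TravelPlan {
    var destination: String
    var activities: [String]
}

let session = LanguageModelSession()

let stream = session.streamResponse(
    to: "Plan a weekend in Lisbon.",
    generating: TravelPlan.self
)

for try await partial in stream {
    // partial is TravelPlan.PartiallyGenerated:
    // every property is optional until it has been produced.
    if let destination = partial.destination {
        print("Destination: \(destination)")
    }
    if let activities = partial.activities {
        print("Activities so far: \(activities)")
    }
}
```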

The PartiallyGenerated type exposes each property as optional, allowing you to display values as they are generated.

Tool Calling: Extending model capabilities

The embedded model has limited knowledge (training data until 2023) and no access to external data. Tool Calling fills these gaps by giving it access to your APIs.

Anatomy of a Tool

A Tool implements the Tool protocol with the following elements:

    • description: guides the model on when to use the tool
    • Arguments: @Generable and Sendable struct defining expected parameters
    • func call(): implementation returning a String (or any PromptRepresentable type)

The name is auto-generated from the struct/class name and can be omitted.
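Putting these elements together, a minimal weather tool might look like this sketch (the city parameter and returned temperature are illustrative; a real implementation would call a weather API):

```swift
import FoundationModels

struct WeatherTool: Tool {
    // The name ("weatherTool") is derived from the type;
    // the description tells the model when this tool is relevant.
    let description = "Get the current temperature for a given city."

    @Generable
    struct Arguments {
        @Guide(description: "City name, e.g. Paris")
        var city: String
    }

    func call(arguments: Arguments) async throws -> String {
        // Placeholder result; fetch from a real weather service here.
        "It is 21 °C in \(arguments.city)."
    }
}
```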

Registration and usage

Tools are passed to the session at creation:
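For example, with a small hypothetical TimeTool:

```swift
import Foundation
import FoundationModels

struct TimeTool: Tool {
    let description = "Returns the device's current date and time."

    @Generable
    struct Arguments {}   // No parameters needed.

    func call(arguments: Arguments) async throws -> String {
        Date.now.formatted(date: .abbreviated, time: .shortened)
    }
}

let session = LanguageModelSession(
    tools: [TimeTool()],
    instructions: "You are a scheduling assistant."
)

// The model decides on its own when (and whether) to invoke a tool.
let response = try await session.respond(to: "What time is it right now?")
print(response.content)
```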

Complete example: Tool with HealthKit

Here's a realistic example integrating HealthKit for health data:
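A sketch of such a tool (it assumes HealthKit read authorization for step counts has already been granted elsewhere in the app):

```swift
import FoundationModels
import HealthKit

struct StepCountTool: Tool {
    let description = "Read the user's step count for today from HealthKit."

    @Generable
    struct Arguments {}   // No parameters needed.

    let store = HKHealthStore()

    func call(arguments: Arguments) async throws -> String {
        let stepType = HKQuantityType(.stepCount)
        let startOfDay = Calendar.current.startOfDay(for: .now)
        let predicate = HKQuery.predicateForSamples(withStart: startOfDay, end: .now)

        // Bridge the callback-based statistics query into async/await.
        let steps: Double = try await withCheckedThrowingContinuation { continuation in
            let query = HKStatisticsQuery(
                quantityType: stepType,
                quantitySamplePredicate: predicate,
                options: .cumulativeSum
            ) { _, statistics, error in
                if let error {
                    continuation.resume(throwing: error)
                } else {
                    let sum = statistics?.sumQuantity()?.doubleValue(for: .count()) ?? 0
                    continuation.resume(returning: sum)
                }
            }
            store.execute(query)
        }

        return "The user has taken \(Int(steps)) steps today."
    }
}
```

The model receives the returned string as the tool's result and weaves it into its answer.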

Tool chaining

The model can call multiple tools in sequence to answer a complex request:

The framework automatically handles inserting tool results into the conversation transcript.

Error handling

The Foundation Models Framework defines specific errors via GenerationError:

Error                       | Description
guardrailViolation          | Content blocked by safety filters
exceededContextWindowSize   | Conversation exceeding 4096 tokens
rateLimited                 | Too many requests or app in background
unsupportedLanguageOrLocale | Language not supported
concurrentRequests          | Attempt to send during ongoing generation
assetsUnavailable           | Model deleted or insufficient disk space
unsupportedGuide            | @Guide with unsupported pattern
decodingFailure             | Failed to deserialize Generable type
refusal                     | Model refuses to respond (explanation available via .explanation)
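Handled in a do/catch, this could look like the following sketch (it assumes the error type is nested as LanguageModelSession.GenerationError):

```swift
import FoundationModels

let session = LanguageModelSession()

do {
    let response = try await session.respond(to: "Summarize my notes.")
    print(response.content)
} catch let error as LanguageModelSession.GenerationError {
    switch error {
    case .guardrailViolation:
        print("Blocked by the safety filters.")
    case .exceededContextWindowSize:
        // Start a new session, optionally seeding it with a
        // summary of the previous transcript.
        print("Conversation too long for the context window.")
    case .rateLimited:
        print("Too many requests; try again shortly.")
    default:
        print("Generation failed: \(error.localizedDescription)")
    }
}
```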

Guardrails in detail

Apple applies strict safety filters that cannot be disabled. Certain content systematically triggers a violation:

    • Violent or explicit content
    • Requests related to illegal activities
    • Certain sensitive political topics
    • Critical medical or legal information

These guardrails are sometimes too restrictive (a news article about a death may be blocked). Apple is working to improve these false positives — report them via Feedback Assistant.

Limitations to know

Model characteristics

Aspect         | Value       | Implication
Parameters     | ~3 billion  | Less powerful than GPT-4 or Claude
Context window | 4096 tokens | Limited conversation length
Quantization   | 2-bit       | Optimized for memory, not precision
RAM used       | ~3 GB       | Impact on memory-hungry apps

What the model does NOT do well

Apple recommends avoiding these use cases:

    • Code generation: Inconsistent and often incorrect results
    • Mathematical calculations: Frequent errors on complex operations
    • Recent factual knowledge: Training data until 2023
    • General conversations: Not designed as a generalist chatbot

The model excels however for:

    • Summarization and entity extraction
    • Short creative content generation
    • Classification and tagging
    • Structured text understanding

Performance optimization

Model prewarming

Initial model loading takes a few seconds. Use prewarm() to anticipate. This method is synchronous and doesn't throw — it returns immediately and loading happens in the background:
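A minimal sketch:

```swift
import FoundationModels

let session = LanguageModelSession(
    instructions: "You summarize the user's notes."
)

// Synchronous and non-throwing: returns immediately while the
// model assets load in the background.
session.prewarm()
```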

Call prewarm() as soon as you anticipate model usage (e.g., when user opens the chat screen).

Memory management

With ~3 GB of RAM used by the model, monitor memory footprint:
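One mitigation (an assumption on my part, not an official recommendation) is to scope the session's lifetime to the feature that needs it, so per-session state can be released when the feature goes off screen:

```swift
import FoundationModels

@MainActor
final class AssistantController {
    private var session: LanguageModelSession?

    func featureDidAppear() {
        session = LanguageModelSession()
        session?.prewarm()
    }

    func featureDidDisappear() {
        // Dropping the reference releases the session's state;
        // the system manages the model assets themselves.
        session = nil
    }
}
```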

Limiting Generable properties

Each property of a @Generable type is generated sequentially. More properties mean slower generation:
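For example, with two hypothetical variants of the same card type:

```swift
import FoundationModels

// Focused type: two properties, two sequential generation steps.
@Generable
struct ProductCard {
    var title: String
    var tagline: String
}

// Heavier type: every extra field adds generation time,
// even if the UI never displays it.
@Generable
struct ProductCardVerbose {
    var title: String
    var tagline: String
    var longDescription: String
    var seoKeywords: [String]
    var internalNotes: String
}
```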

Design your Generable types with only the properties you will actually display.

Complete example: Mini Conversational Chat

Here's a complete implementation of a conversational assistant with SwiftUI and iOS 26:

Data model
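A simple message type is enough to drive the chat UI (a sketch):

```swift
import Foundation

struct ChatMessage: Identifiable, Equatable {
    enum Role {
        case user, assistant
    }

    let id = UUID()
    let role: Role
    var text: String   // Mutable so streaming can update it in place.
}
```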

ViewModel with managed session
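A sketch of the ViewModel: it assumes a ChatMessage type with a role and mutable text, and that each streamed element is the cumulative text generated so far.

```swift
import FoundationModels
import Observation

@Observable
@MainActor
final class ChatViewModel {
    var messages: [ChatMessage] = []
    var errorMessage: String?

    var isAvailable: Bool {
        if case .available = SystemLanguageModel.default.availability {
            return true
        }
        return false
    }

    // The session keeps the conversation history for us.
    private let session = LanguageModelSession(
        instructions: "You are a concise, friendly assistant."
    )

    func prewarm() {
        session.prewarm()
    }

    func send(_ text: String) async {
        messages.append(ChatMessage(role: .user, text: text))
        messages.append(ChatMessage(role: .assistant, text: ""))
        do {
            // Overwrite the last message with each snapshot
            // so the reply grows on screen in real time.
            for try await partial in session.streamResponse(to: text) {
                messages[messages.count - 1].text = partial
            }
        } catch {
            errorMessage = error.localizedDescription
        }
    }
}
```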

Main Chat view
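A sketch of the view, assuming an observable ChatViewModel that exposes messages, isAvailable, prewarm(), and send(_:):

```swift
import SwiftUI

struct ChatView: View {
    @State private var viewModel = ChatViewModel()
    @State private var input = ""

    var body: some View {
        VStack {
            if !viewModel.isAvailable {
                ContentUnavailableView(
                    "Apple Intelligence unavailable",
                    systemImage: "exclamationmark.triangle"
                )
            } else {
                ScrollView {
                    ForEach(viewModel.messages) { message in
                        Text(message.text)
                            .frame(maxWidth: .infinity,
                                   alignment: message.role == .user ? .trailing : .leading)
                            .padding(.horizontal)
                    }
                }
                HStack {
                    TextField("Ask something…", text: $input)
                        .textFieldStyle(.roundedBorder)
                    Button("Send") {
                        let text = input
                        input = ""
                        Task { await viewModel.send(text) }
                    }
                    .disabled(input.isEmpty)
                }
                .padding()
            }
        }
        // Start loading the model as soon as the screen appears.
        .onAppear { viewModel.prewarm() }
    }
}
```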

This implementation handles:

    • Model availability verification
    • Real-time response streaming
    • Complete error handling
    • Conversation history maintained by session
    • Prewarming for optimal response times

Going further

The Foundation Models Framework opens considerable possibilities for iOS applications. It doesn't replace cloud models for complex tasks, but it excels at quality-of-life features that work offline and respect privacy.

Official resources

Integration examples