CaptureKit
On Apple platforms, capturing audio and video looks straightforward on the surface. In practice, as soon as you move past the trivial case (microphone + M4A file), API fragmentation complicates everything: AVCaptureSession for the camera, AVAudioEngine for the mic, ScreenCaptureKit for macOS screen capture, ReplayKit on iOS, VideoToolbox for H.264 encoding, AudioToolbox for AAC… Each component has its own API surface, its own threading constraints, its own buffer formats.
While building the Atelier Socle streaming ecosystem — HLSKit, RTMPKit, SRTKit, IcecastKit — the most fundamental building block was missing: capturing the signal. Every transport needed encoded audio buffers, compressed video frames, consistent permission management, level monitoring. And every app had to reimplement the same capture logic, with the same pitfalls.
CaptureKit unifies all of it. A single CaptureSession orchestrates sources, encoders and outputs. A single StreamingPipeline connects capture to any transport via a StreamingTransport protocol. All built with Swift 6.2 strict concurrency — actors, AsyncStream, Sendable everywhere — no Combine, no external dependencies, with native support for macOS 14, iOS 17 and visionOS 1.
What CaptureKit does
CaptureKit covers the entire capture path: from the hardware source to the encoded output. Audio and video sources, hardware encoders, file or streaming outputs, real-time metering, device discovery, permission management — all within a unified API.
9 audio sources: MicrophoneSource, SystemAudioSource, LineInSource, BluetoothAudioSource, AggregateAudioSource, FileAudioSource, VoIPAudioSource, ToneSource (programmable generator), SilenceSource
10 video sources: CameraSource, ExternalCameraSource, ScreenCaptureSource (ScreenCaptureKit + ReplayKit + Broadcast Extension), CinematicCameraSource, SpatialCameraSource (MV-HEVC visionOS), MultiCameraSource, FileVideoSource, TestPatternSource, BlackSource, ColorSource
6 audio codecs: AAC (LC/HE v1/HE v2/ELD/xHE), ALAC, Opus, FLAC, MP3, PCM, with AudioToolbox hardware acceleration
6 video codecs: H.264, HEVC, ProRes, AV1, MV-HEVC, JPEG, with VideoToolbox hardware acceleration
8 outputs: FileOutput (MP4, MOV, M4A, CAF, WAV, AIFF, FLAC with rotation), CallbackOutput, PixelBufferOutput, SampleBufferOutput, PreviewOutput, AudioPreviewOutput, TeeOutput (duplication), NullOutput
Transport-agnostic streaming: StreamingPipeline with audioOnly, videoOnly and muxed modes, StreamingTransport protocol to plug in RTMP, HLS, SRT, Icecast or any other transport
Real-time metering: AudioMeter with peak/RMS, EBU R128 loudness (momentary, short-term, integrated), true peak, waveform data for visualization
25+ presets: Twitch, YouTube, Facebook, Instagram, TikTok, podcast, radio, broadcast, screen recording, spatial video, archive, ProRes, Dolby Atmos
Device discovery: DeviceDiscovery with AsyncStream monitoring of audio and video connect/disconnect events
Permission management: unified PermissionManager for microphone, camera, screen recording with built-in SwiftUI views
Dynamic control: hot-swap sources, adjust bitrate, force keyframes, pause/resume during capture
Swift 6.2 strict concurrency: actors for all stateful types, Sendable everywhere, async/await end-to-end, zero Combine, zero dependencies
Quick start
Record microphone audio as AAC to an M4A file:
Installation
Via Swift Package Manager, add the dependency to your Package.swift:
Then add it to your target:
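A manifest covering both steps might look like the sketch below. The repository path matches the project's GitHub page (atelier-socle/swift-capture-kit); the version requirement, the product name CaptureKit and the package/target name MyApp are assumptions to check against the actual release.

```swift
// swift-tools-version: 6.2
import PackageDescription

let package = Package(
    name: "MyApp", // hypothetical package name
    platforms: [.macOS(.v14), .iOS(.v17), .visionOS(.v1)],
    dependencies: [
        // Version requirement is an assumption; pin to the release you actually use.
        .package(url: "https://github.com/atelier-socle/swift-capture-kit", from: "1.0.0")
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            dependencies: [
                // Product name assumed to be "CaptureKit".
                .product(name: "CaptureKit", package: "swift-capture-kit")
            ]
        )
    ]
)
```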
Supported platforms
| Platform | Minimum version | Specifics |
|---|---|---|
| macOS | 14+ | ScreenCaptureKit, aggregate devices, system audio, USB/Thunderbolt external cameras |
| iOS / iPadOS | 17+ | ReplayKit, Broadcast Extension, USB-C cameras |
| visionOS | 1+ | MV-HEVC spatial video, SpatialCameraSource |
The Source → Encoder → Output architecture
CaptureKit is organized around three foundational protocols that compose freely. Each source produces an AsyncStream of raw buffers, each encoder transforms those buffers into compressed data, and each output consumes the encoded data. CaptureSession orchestrates everything.
| Type | Role |
|---|---|
| CaptureSession | Main orchestrator (actor): sources, encoders, outputs, state, events, statistics |
|  | Capture protocols: 9 audio implementations, 10 video |
|  | Encoding protocols: 6 audio codecs, 6 video codecs |
|  | Output protocol: 8 implementations (file, callback, preview, tee, null…) |
| StreamingPipeline | Unified capture → encode → send pipeline (actor) |
| StreamingTransport | Transport-agnostic protocol for live streaming |
| AudioMeter | Real-time peak/RMS/loudness metering (actor) |
| DeviceDiscovery | Audio/video device monitoring (actor) |
| PermissionManager | Unified permission handling (actor) |
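To make the composition concrete, here is a minimal, self-contained sketch of the source → encoder → output idea built on AsyncStream. Every name in it (SketchSource, SketchEncoder, SketchOutput, run, and the toy implementations) is a hypothetical stand-in, not CaptureKit's actual API; it only illustrates how three small protocols can be wired together by an orchestration loop.

```swift
// Hypothetical stand-ins; none of these names are CaptureKit's actual API.
struct RawBuffer: Sendable { let samples: [Float] }
struct EncodedPacket: Sendable { let bytes: [UInt8] }

protocol SketchSource: Sendable {
    func buffers() -> AsyncStream<RawBuffer>
}

protocol SketchEncoder: Sendable {
    func encode(_ buffer: RawBuffer) async -> EncodedPacket
}

protocol SketchOutput: Sendable {
    func write(_ packet: EncodedPacket) async
}

// A source that emits a fixed number of silent buffers, then finishes.
struct SilentSource: SketchSource {
    let count: Int
    func buffers() -> AsyncStream<RawBuffer> {
        AsyncStream { continuation in
            for _ in 0..<count {
                continuation.yield(RawBuffer(samples: [0, 0, 0, 0]))
            }
            continuation.finish()
        }
    }
}

// A toy "encoder" that quantizes each Float sample in -1...1 to one byte.
struct ByteEncoder: SketchEncoder {
    func encode(_ buffer: RawBuffer) async -> EncodedPacket {
        EncodedPacket(bytes: buffer.samples.map { UInt8(max(0, min(255, ($0 + 1) * 127.5))) })
    }
}

// An output that collects packets in actor-isolated state.
actor CollectingOutput: SketchOutput {
    private(set) var packets: [EncodedPacket] = []
    func write(_ packet: EncodedPacket) async { packets.append(packet) }
}

// The orchestration step: pull from the source, encode, push to the output.
func run(source: some SketchSource, encoder: some SketchEncoder, output: some SketchOutput) async {
    for await buffer in source.buffers() {
        let packet = await encoder.encode(buffer)
        await output.write(packet)
    }
}
```

Because each stage only sees the protocol of its neighbor, swapping the silent source for a real one or the collecting output for a file writer would not change the loop in run.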
Audio capture
Nine audio sources cover every use case — from the built-in microphone to a programmable tone generator. Each source is configured with an AudioSourceConfiguration and produces an AsyncStream<AudioBuffer>. MicrophoneSource exposes automatic gain control, echo cancellation, noise suppression and voice isolation.
AudioSourceConfiguration provides ready-to-use presets:
| Preset | Sample Rate | Channels | Bit Depth | Use case |
|---|---|---|---|---|
|  | 48 kHz | 2 | Float32 | General purpose |
|  | 48 kHz | 2 | Float32 | Broadcast (10ms buffer) |
|  | 96 kHz | 2 | Float32 | Hi-Res audio |
|  | 16 kHz | 1 | Float32 | VoIP / voice chat |
|  | 48 kHz | 2 | Float32 | Spatial audio |
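As a quick sanity check on what a "10ms buffer" means at these settings, the arithmetic can be sketched as follows. BufferGeometry is a hypothetical helper written for this example, not a CaptureKit type, and it assumes interleaved samples.

```swift
// Hypothetical helper, not part of CaptureKit.
struct BufferGeometry {
    let sampleRate: Int      // Hz
    let channels: Int
    let bytesPerSample: Int  // 4 for Float32

    // Frames contained in a buffer of the given duration (rounded to the nearest frame).
    func frames(forMilliseconds ms: Int) -> Int {
        (sampleRate * ms + 500) / 1000
    }

    // Bytes needed for an interleaved buffer of that duration.
    func bytes(forMilliseconds ms: Int) -> Int {
        frames(forMilliseconds: ms) * channels * bytesPerSample
    }
}

let broadcast = BufferGeometry(sampleRate: 48_000, channels: 2, bytesPerSample: 4)
print(broadcast.frames(forMilliseconds: 10)) // 480
print(broadcast.bytes(forMilliseconds: 10))  // 3840
```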
Video capture
CameraSource handles built-in and external cameras with full control — zoom, torch, focus mode, exposure, white balance, stabilization, depth data, and photo capture during recording.
Video configuration presets:
| Preset | Resolution | FPS | Specifics |
|---|---|---|---|
|  | 1080p | 30 | SDR, BT.709 |
|  | 720p | 30 | Standard stabilization |
|  | 1080p | 60 | Full HD broadcast |
|  | UHD 4K | 30 | HDR10, BT.2020 |
|  | 1080p | 24 | Display P3, cinematic stabilization |
|  | spatial | 30 | MV-HEVC visionOS |
Screen capture
ScreenCaptureSource unifies three backends under a single API. On macOS, ScreenCaptureKit provides display, window, application or region capture. On iOS, ReplayKit captures the current app, while Broadcast Extension enables full system capture via an App Group.
Hardware encoding
CaptureKit exposes 12 codecs (6 audio and 6 video), each implemented as an actor with configuration presets. Video encoders use VideoToolbox for hardware acceleration; audio encoders go through AudioToolbox.
| Codec | Type | Presets |
|---|---|---|
| AAC (LC/HE v1/HE v2/ELD/xHE) | Audio |  |
| ALAC | Audio | Apple Lossless |
| Opus | Audio | Interactive codec |
| FLAC | Audio | Free Lossless |
| MP3 | Audio |  |
| PCM / WAV | Audio | Uncompressed |
| H.264 | Video |  |
| HEVC | Video | Main, Main 10, with HDR support |
| ProRes | Video | Proxy, LT, Standard, HQ, 4444, 4444 XQ |
| AV1 | Video | Profile 0, 1, 2 |
| MV-HEVC | Video | visionOS spatial video |
| JPEG | Video | Still image capture |
File recording
FileOutput records to 7 container formats with automatic rotation by duration and/or size. Metadata (title, artist, album) is embedded in the file.
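Rotation "by duration and/or size" boils down to a check that fires when either configured limit is reached. The sketch below shows that logic in isolation; RotationPolicy and segmentName are hypothetical names invented for this example, and FileOutput's real options may be shaped differently.

```swift
// Hypothetical rotation policy; CaptureKit's own FileOutput options may differ.
struct RotationPolicy {
    var maxDuration: Double?  // seconds; nil disables the duration limit
    var maxBytes: Int?        // nil disables the size limit

    // True when either configured limit has been reached.
    func shouldRotate(elapsed: Double, bytesWritten: Int) -> Bool {
        if let maxDuration, elapsed >= maxDuration { return true }
        if let maxBytes, bytesWritten >= maxBytes { return true }
        return false
    }
}

// Builds a numbered file name such as "take-003.m4a" for the given segment index.
func segmentName(base: String, index: Int, ext: String) -> String {
    let digits = String(index)
    let padded = String(repeating: "0", count: max(0, 3 - digits.count)) + digits
    return "\(base)-\(padded).\(ext)"
}
```

A writer would call shouldRotate after each write and, when it returns true, close the current file and open the next segment name.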
Live streaming
StreamingPipeline is the centerpiece for live streaming. It orchestrates capture, encoding and delivery in a unified pipeline, with three operating modes: audio only, video only, or muxed audio+video. In muxed mode, the pipeline waits for the first video keyframe, extracts parameter sets (SPS/PPS for H.264, VPS/SPS/PPS for HEVC), sends them via sendConfiguration, then starts the interleaved audio+video stream with a shared monotonic clock.
The StreamingTransport protocol is deliberately minimal — four methods to implement:
Concrete implementations — RTMP, HLS, SRT, Icecast bridges — live in the consuming app, keeping CaptureKit transport-agnostic. The demo app includes working bridges for all four protocols in the Atelier Socle ecosystem.
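The muxed start-up sequence described above (drop frames until the first keyframe, send the parameter sets once, then forward everything) can be sketched as a small gate. Only the name sendConfiguration comes from the text; Frame, SketchTransport, MuxedGate and LogTransport are hypothetical, the real StreamingTransport protocol is not reproduced here, and the shared monotonic clock is left out.

```swift
// Hypothetical types for illustration; only `sendConfiguration` is named in the article.
enum Frame: Sendable {
    case audio(timestamp: Double)
    case video(timestamp: Double, isKeyframe: Bool, parameterSets: [[UInt8]])
}

protocol SketchTransport {
    mutating func sendConfiguration(_ parameterSets: [[UInt8]])
    mutating func send(_ frame: Frame)
}

// Drops everything until the first video keyframe, sends its parameter sets once,
// then forwards audio and video frames in arrival order.
struct MuxedGate<T: SketchTransport> {
    var transport: T
    private(set) var started = false

    init(transport: T) { self.transport = transport }

    mutating func push(_ frame: Frame) {
        if !started {
            guard case let .video(_, isKeyframe, parameterSets) = frame, isKeyframe else {
                return // still waiting for the first keyframe
            }
            transport.sendConfiguration(parameterSets)
            started = true
        }
        transport.send(frame)
    }
}

// Recording transport used to observe the gate's behavior.
struct LogTransport: SketchTransport {
    var log: [String] = []
    mutating func sendConfiguration(_ parameterSets: [[UInt8]]) {
        log.append("config(\(parameterSets.count))")
    }
    mutating func send(_ frame: Frame) {
        switch frame {
        case .audio: log.append("audio")
        case .video(_, let isKeyframe, _): log.append(isKeyframe ? "keyframe" : "video")
        }
    }
}

var gate = MuxedGate(transport: LogTransport())
gate.push(.audio(timestamp: 0.00))
gate.push(.video(timestamp: 0.00, isKeyframe: false, parameterSets: []))
gate.push(.video(timestamp: 0.03, isKeyframe: true, parameterSets: [[0x67], [0x68]]))
gate.push(.audio(timestamp: 0.04))
print(gate.transport.log) // ["config(2)", "keyframe", "audio"]
```

Gating on the keyframe matters because a decoder cannot start on a delta frame; anything sent before the first keyframe and its parameter sets would be undecodable on the receiving side.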
RTMP integration
An RTMP bridge connects CaptureKit to RTMPKit for broadcasting to Twitch, YouTube, Facebook and other RTMP platforms:
Icecast integration
For audio streaming to Icecast/SHOUTcast, the IcecastKit bridge sends AAC or MP3 in audioOnly mode:
Audio metering
AudioMeter provides real-time audio levels with five analysis modes and waveform data for visualization. It accepts buffers from any source and exposes two AsyncStreams — one for levels, one for waveforms.
| Preset | Mode | Waveform | Use case |
|---|---|---|---|
|  |  | - | Professional broadcast |
|  |  | - | Podcast recording |
|  |  | - | EBU R128 compliance |
|  |  |  | Voice messages |
|  |  |  | DAW audio editing |
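For readers unfamiliar with the two simplest measurements, peak and RMS over one block of samples can be computed as in the sketch below. It is an independent illustration, not AudioMeter's implementation: Levels and measure are hypothetical names, the peak is a plain sample peak (not oversampled true peak), and no K-weighting or gating as required by EBU R128 is applied.

```swift
import Foundation

// Hypothetical helper, independent of CaptureKit's AudioMeter.
struct Levels {
    let peakDB: Double
    let rmsDB: Double
}

// Computes sample peak and RMS for one block of samples, in dBFS,
// where full scale (|x| = 1.0) maps to 0 dB. Silence is floored at floorDB.
func measure(_ samples: [Float], floorDB: Double = -120) -> Levels {
    guard !samples.isEmpty else { return Levels(peakDB: floorDB, rmsDB: floorDB) }
    var peak = 0.0
    var sumOfSquares = 0.0
    for sample in samples {
        let x = Double(sample)
        peak = max(peak, abs(x))
        sumOfSquares += x * x
    }
    let rms = (sumOfSquares / Double(samples.count)).squareRoot()

    // Converts a linear amplitude to decibels, clamped to the floor.
    func toDB(_ linear: Double) -> Double {
        linear > 0 ? max(floorDB, 20 * log10(linear)) : floorDB
    }
    return Levels(peakDB: toDB(peak), rmsDB: toDB(rms))
}
```

A square wave at half amplitude, for instance, gives the same value for both measurements (about -6.02 dBFS), while a sine wave of the same peak would read roughly 3 dB lower on RMS.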
Capture presets
25+ presets cover common use cases, from streaming platforms to professional workflows. Each preset automatically configures codec, bitrate, resolution and frame rate.
| Category | Presets | Details |
|---|---|---|
| Streaming |  | H.264 + AAC, customizable resolution/FPS |
| Podcast |  | AAC 128k or ALAC lossless |
| Radio |  | AAC, MP3 or Opus for web radio |
| Broadcast |  | HEVC 1080p, UHD 4K, ProRes HQ |
| Spatial |  | MV-HEVC visionOS, immersive audio |
| Screen Recording |  | Optimized screen capture |
| Low Bandwidth |  | Limited connections |
| Archive |  | High-quality archival |
Device discovery
DeviceDiscovery monitors audio and video device connections and disconnections in real time via an AsyncStream<DeviceChangeEvent>. Each device exposes its capabilities, connection type and supported formats.
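Consuming such an event stream typically means folding connect/disconnect events into a view of what is currently attached. The sketch below shows that pattern with a hypothetical SketchDeviceEvent enum and a finite stream standing in for hardware notifications; the real DeviceChangeEvent may carry different cases and payloads.

```swift
// Hypothetical event shape; CaptureKit's DeviceChangeEvent may differ.
enum SketchDeviceEvent: Sendable {
    case connected(id: String)
    case disconnected(id: String)
}

// Folds a stream of connect/disconnect events into the set of attached device IDs.
// Returns once the stream finishes.
func attachedDevices(from events: AsyncStream<SketchDeviceEvent>) async -> Set<String> {
    var attached: Set<String> = []
    for await event in events {
        switch event {
        case .connected(let id): attached.insert(id)
        case .disconnected(let id): attached.remove(id)
        }
    }
    return attached
}

// A finite stream standing in for live hardware notifications.
let (stream, continuation) = AsyncStream.makeStream(of: SketchDeviceEvent.self)
continuation.yield(.connected(id: "usb-mic"))
continuation.yield(.connected(id: "webcam"))
continuation.yield(.disconnected(id: "usb-mic"))
continuation.finish()

let devices = await attachedDevices(from: stream)
print(devices) // ["webcam"]
```

In an app the stream would never finish, so the loop would live in a long-running Task that updates UI state on each event instead of returning a final set.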
Permission management
PermissionManager unifies permission checking and requesting for 6 resource types. Status changes are exposed via an AsyncStream<PermissionChange>:
Dynamic session control
During capture, the session allows hot-swapping sources, adjusting bitrates, forcing keyframes and monitoring 17 event types — from state changes to dropped frames to transport quality.
visionOS spatial video
On visionOS, SpatialCameraSource captures MV-HEVC (Multiview HEVC) to produce stereoscopic spatial video, viewable natively on Apple Vision Pro. MVHEVCEncoder handles encoding with appropriate layouts.
Architecture
The library is organized into clearly separated modules, each with a well-defined responsibility:
Tests
The test suite covers the entire library with unit, integration, end-to-end and showcase tests. Showcase tests illustrate recommended usage patterns.
| Metric | Value |
|---|---|
| Tests | 1,763 |
| Test files | 219 |
| Showcase tests | 8 suites (audio, video, encoding, outputs, streaming, presets, formats, errors) |
| E2E tests | 2 suites (audio pipeline, streaming pipeline) |
| Integration tests | 6 suites (lifecycle, routing, dynamic control, events) |
| CI/CD | GitHub Actions + CodeCov |
Documentation
The DocC documentation covers 17 guides:
| Guide | Content |
|---|---|
| Getting Started | First project with CaptureKit |
| Audio Capture | Audio sources, configuration, formats |
| Video Capture | Video sources, cameras, presets |
| Screen Capture | ScreenCaptureKit, ReplayKit, Broadcast Extension |
| Encoding | Audio/video codecs, configuration, presets |
| Outputs Guide | FileOutput, rotation, callbacks, preview |
| Streaming Guide | StreamingPipeline, modes, transport |
| Streaming Integration | RTMP, HLS, SRT, Icecast bridges |
| Metering | AudioMeter, peak/RMS, EBU R128 loudness, waveforms |
| Device Discovery | Device monitoring |
| Permissions Guide | PermissionManager, SwiftUI views |
| Formats Guide | SampleRate, ChannelLayout, VideoResolution, FrameRate |
| Presets Guide | 25+ presets by category |
| Error Handling | CaptureError, 26 typed error cases |
| Platform Compatibility | macOS/iOS/visionOS differences |
| Testing Guide | Mocks, testing patterns |
| CaptureKit | Architecture overview |
Under the hood
| Metric | Value |
|---|---|
| Swift files (Sources) | 213 |
| Public types | 196 |
| Protocols | 20+ |
| Actors | 40+ |
| Audio sources | 9 |
| Video sources | 10 |
| Audio codecs | 6 |
| Video codecs | 6 |
| Outputs | 8 |
| Container formats | 7 |
| Capture presets | 25+ |
| DocC guides | 17 |
| Tests | 1,763 |
| Runtime dependencies | 0 |
Ecosystem
CaptureKit is the capture building block of the Atelier Socle streaming ecosystem. It integrates with all four transport libraries via the StreamingTransport protocol:
HLSKit — HTTP Live Streaming, M3U8 manifests, fMP4 packaging, spatial video
RTMPKit — RTMP client and server, Enhanced RTMP v2, 10 platforms
SRTKit — Pure Swift SRT transport, AES encryption, FEC, bonding
IcecastKit — Icecast/SHOUTcast client, adaptive bitrate, multi-destination
CaptureKit (this library) — Unified media capture
Links
GitHub - atelier-socle/swift-capture-kit: Unified media capture, encoding & streaming for Apple platforms — every source, every codec, zero dependencies. Pure Swift 6.2.