CaptureKit

Published on March 12, 2026 · 20 min

Founder & Swift Tech Lead


On Apple platforms, capturing audio and video looks straightforward on the surface. In practice, as soon as you move past the trivial case (microphone in, M4A file out), API fragmentation makes everything complex: AVCaptureSession for the camera, AVAudioEngine for the mic, ScreenCaptureKit for macOS screen capture, ReplayKit on iOS, VideoToolbox for H.264 encoding, AudioToolbox for AAC, and so on. Each component has its own API surface, its own threading constraints and its own buffer formats.

While building the Atelier Socle streaming ecosystem — HLSKit, RTMPKit, SRTKit, IcecastKit — the most fundamental building block was missing: capturing the signal. Every transport needed encoded audio buffers, compressed video frames, consistent permission management, level monitoring. And every app had to reimplement the same capture logic, with the same pitfalls.

CaptureKit unifies all of it. A single CaptureSession orchestrates sources, encoders and outputs. A single StreamingPipeline connects capture to any transport via a StreamingTransport protocol. The library is built with Swift 6.2 strict concurrency (actors, AsyncStream, Sendable everywhere), uses no Combine and no external dependencies, and natively supports macOS 14, iOS 17 and visionOS 1.

What CaptureKit does

CaptureKit covers the entire capture path: from the hardware source to the encoded output. Audio and video sources, hardware encoders, file or streaming outputs, real-time metering, device discovery, permission management — all within a unified API.

9 audio sources — MicrophoneSource, SystemAudioSource, LineInSource, BluetoothAudioSource, AggregateAudioSource, FileAudioSource, VoIPAudioSource, ToneSource (programmable generator), SilenceSource
10 video sources — CameraSource, ExternalCameraSource, ScreenCaptureSource (ScreenCaptureKit + ReplayKit + Broadcast Extension), CinematicCameraSource, SpatialCameraSource (MV-HEVC visionOS), MultiCameraSource, FileVideoSource, TestPatternSource, BlackSource, ColorSource
6 audio codecs — AAC (LC/HE v1/HE v2/ELD/xHE), ALAC, Opus, FLAC, MP3, PCM — with AudioToolbox hardware acceleration
6 video codecs — H.264, HEVC, ProRes, AV1, MV-HEVC, JPEG — with VideoToolbox hardware acceleration
8 outputs — FileOutput (MP4, MOV, M4A, CAF, WAV, AIFF, FLAC with rotation), CallbackOutput, PixelBufferOutput, SampleBufferOutput, PreviewOutput, AudioPreviewOutput, TeeOutput (duplication), NullOutput
Transport-agnostic streaming — StreamingPipeline with audioOnly, videoOnly and muxed modes, StreamingTransport protocol to plug in RTMP, HLS, SRT, Icecast or any other transport
Real-time metering — AudioMeter with peak/RMS, EBU R128 loudness (momentary, short-term, integrated), true peak, waveform data for visualization
25+ presets — Twitch, YouTube, Facebook, Instagram, TikTok, podcast, radio, broadcast, screen recording, spatial video, archive, ProRes, Dolby Atmos
Device discovery — DeviceDiscovery with AsyncStream monitoring of audio and video connect/disconnect events
Permission management — unified PermissionManager for microphone, camera, screen recording with built-in SwiftUI views
Dynamic control — hot-swap sources, adjust bitrate, force keyframes, pause/resume during capture
Swift 6.2 strict concurrency — actors for all stateful types, Sendable everywhere, async/await end-to-end, zero Combine, zero dependencies

Quick start

Record microphone audio as AAC to an M4A file:

Installation

Install via Swift Package Manager: add the package as a dependency in your Package.swift, then add it to your target's dependencies.
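As a rough sketch, a consuming manifest could look like the following. The repository URL, the version and the product name are placeholders, not values confirmed by this article; substitute the real ones from the CaptureKit repository. The platform minimums match the ones stated above (macOS 14, iOS 17, visionOS 1).

```swift
// swift-tools-version: 6.2
import PackageDescription

let package = Package(
    name: "MyCaptureApp", // hypothetical consumer package
    platforms: [
        // Minimum versions stated in the article.
        .macOS(.v14), .iOS(.v17), .visionOS(.v1)
    ],
    dependencies: [
        // Placeholder URL and version: replace with the real repository and release tag.
        .package(url: "https://example.com/your-org/CaptureKit.git", from: "1.0.0")
    ],
    targets: [
        .executableTarget(
            name: "MyCaptureApp",
            dependencies: [
                // Assumes the library product is named "CaptureKit".
                .product(name: "CaptureKit", package: "CaptureKit")
            ]
        )
    ]
)
```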

Supported platforms

Platform	Minimum version	Specifics
macOS	14+	ScreenCaptureKit, aggregate devices, system audio, USB/Thunderbolt external cameras
iOS / iPadOS	17+	ReplayKit, Broadcast Extension, USB-C cameras
visionOS	1+	MV-HEVC spatial video, SpatialCameraSource

The Source → Encoder → Output architecture

CaptureKit is organized around three foundational protocols that compose freely. Each source produces an AsyncStream of raw buffers, each encoder transforms those buffers into compressed data, and each output consumes the encoded data. CaptureSession orchestrates everything.

Type	Role
`CaptureSession`	Main orchestrator (actor) — sources, encoders, outputs, state, events, statistics
`AudioSource` / `VideoSource`	Capture protocols — 9 audio implementations, 10 video
`AudioEncoderProtocol` / `VideoEncoderProtocol`	Encoding protocols: 6 audio codecs, 6 video codecs
`CaptureOutput`	Output protocol — 8 implementations (file, callback, preview, tee, null…)
`StreamingPipeline`	Unified capture → encode → send pipeline (actor)
`StreamingTransport`	Transport-agnostic protocol for live streaming
`AudioMeter`	Real-time peak/RMS/loudness metering (actor)
`DeviceDiscovery`	Audio/video device monitoring (actor)
`PermissionManager`	Unified permission handling (actor)
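To make the composition concrete, here is a minimal, self-contained sketch of the Source → Encoder → Output idea using only the Swift standard library. Every type below (`SketchSource`, `SketchEncoder`, `SketchOutput`, `RampSource` and so on) is a hypothetical stand-in written for illustration; none of them is CaptureKit's actual declaration, and the "encoder" merely truncates samples to bytes.

```swift
// Hypothetical stand-ins for illustration; not CaptureKit's real types.
struct RawBuffer: Sendable { let samples: [Float] }
struct EncodedPacket: Sendable, Equatable { let bytes: [UInt8] }

protocol SketchSource: Sendable {
    func buffers() -> AsyncStream<RawBuffer>
}
protocol SketchEncoder: Sendable {
    func encode(_ buffer: RawBuffer) async -> EncodedPacket
}
protocol SketchOutput: Sendable {
    func write(_ packet: EncodedPacket) async
}

/// Emits `bufferCount` tiny buffers, then finishes the stream.
struct RampSource: SketchSource {
    let bufferCount: Int
    func buffers() -> AsyncStream<RawBuffer> {
        AsyncStream<RawBuffer> { continuation in
            for i in 0..<bufferCount {
                continuation.yield(RawBuffer(samples: [Float(i), Float(i) + 0.5]))
            }
            continuation.finish()
        }
    }
}

/// Stand-in for a real codec: clamps each sample to 0...255 and stores it as one byte.
struct TruncatingEncoder: SketchEncoder {
    func encode(_ buffer: RawBuffer) async -> EncodedPacket {
        EncodedPacket(bytes: buffer.samples.map { UInt8(max(0, min(255, $0))) })
    }
}

/// Collects packets in memory; the actor serializes concurrent writes.
actor CollectingOutput: SketchOutput {
    private(set) var packets: [EncodedPacket] = []
    func write(_ packet: EncodedPacket) { packets.append(packet) }
}

/// Plays the orchestrator role: pull from the source, encode, hand to the output.
func runSession(source: some SketchSource,
                encoder: some SketchEncoder,
                output: some SketchOutput) async {
    for await buffer in source.buffers() {
        await output.write(await encoder.encode(buffer))
    }
}

let output = CollectingOutput()
await runSession(source: RampSource(bufferCount: 3),
                 encoder: TruncatingEncoder(),
                 output: output)
print(await output.packets.count) // prints 3
```

Because each stage only sees a protocol, a source, encoder or output can be swapped without touching the other two, which is the property the architecture relies on.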

Audio capture

Nine audio sources cover every use case — from the built-in microphone to a programmable tone generator. Each source is configured with an AudioSourceConfiguration and produces an AsyncStream<AudioBuffer>. MicrophoneSource exposes automatic gain control, echo cancellation, noise suppression and voice isolation.

AudioSourceConfiguration provides ready-to-use presets:

Preset	Sample Rate	Channels	Bit Depth	Use case
`.default`	48 kHz	2	Float32	General purpose
`.broadcast`	48 kHz	2	Float32	Broadcast (10ms buffer)
`.highResolution`	96 kHz	2	Float32	Hi-Res audio
`.voiceChat`	16 kHz	1	Float32	VoIP / voice chat
`.spatialAudio`	48 kHz	2	Float32	Spatial audio
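The table's numbers translate directly into buffer sizes. The sketch below does the arithmetic for a given duration, assuming interleaved samples; `SketchAudioFormat` is a hypothetical helper, not part of AudioSourceConfiguration.

```swift
/// Hypothetical helper for buffer-size arithmetic (interleaved samples assumed).
struct SketchAudioFormat {
    let sampleRate: Double   // Hz
    let channels: Int
    let bytesPerSample: Int  // 4 for Float32

    /// Number of frames needed to cover `ms` milliseconds of audio.
    func frames(forMilliseconds ms: Double) -> Int {
        Int((sampleRate * ms / 1000).rounded())
    }

    /// Number of bytes for the same duration across all channels.
    func bytes(forMilliseconds ms: Double) -> Int {
        frames(forMilliseconds: ms) * channels * bytesPerSample
    }
}

// 48 kHz stereo Float32 with a 10 ms buffer, as in the `.broadcast` row.
let broadcast = SketchAudioFormat(sampleRate: 48_000, channels: 2, bytesPerSample: 4)
print(broadcast.frames(forMilliseconds: 10)) // prints 480
print(broadcast.bytes(forMilliseconds: 10))  // prints 3840
```

The same calculation for the 16 kHz mono `.voiceChat` row gives 160 frames and 640 bytes per 10 ms.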

Video capture

CameraSource handles built-in and external cameras with full control — zoom, torch, focus mode, exposure, white balance, stabilization, depth data, and photo capture during recording.

Video configuration presets:

Preset	Resolution	FPS	Specifics
`.default`	1080p	30	SDR, BT.709
`.broadcast720p`	720p	30	Standard stabilization
`.broadcast1080p60`	1080p	60	Full HD broadcast
`.pro4K`	UHD 4K	30	HDR10, BT.2020
`.cinematic`	1080p	24	Display P3, cinematic stabilization
`.spatialVideo`	spatial	30	MV-HEVC visionOS

Screen capture

ScreenCaptureSource unifies three backends under a single API. On macOS, ScreenCaptureKit provides display, window, application or region capture. On iOS, ReplayKit captures the current app, while Broadcast Extension enables full system capture via an App Group.

Hardware encoding

CaptureKit exposes 12 codecs (6 audio and 6 video), each implemented as an actor with configuration presets. Video encoders use VideoToolbox for hardware acceleration; audio encoders go through AudioToolbox.

Codec	Type	Presets
AAC (LC/HE v1/HE v2/ELD/xHE)	Audio	`.voice` (64k mono), `.podcast` (128k stereo), `.musicHQ` (256k VBR), `.streamingLowBandwidth` (HE 64k), `.lowLatency` (ELD 32k)
ALAC	Audio	Apple Lossless
Opus	Audio	Interactive codec
FLAC	Audio	Free Lossless
MP3	Audio	`.standard` (192k), `.highQuality` (320k), `.webRadio` (128k Icecast)
PCM / WAV	Audio	Uncompressed
H.264	Video	`.streaming720p` (Main 2.5Mbps), `.streaming1080p` (High 4.5Mbps), `.lowLatency` (Baseline CAVLC), `.archive` (High 20Mbps)
HEVC	Video	Main, Main 10 — HDR support
ProRes	Video	Proxy, LT, Standard, HQ, 4444, 4444 XQ
AV1	Video	Profile 0, 1, 2
MV-HEVC	Video	visionOS spatial video
JPEG	Video	Still image capture

File recording

FileOutput records to 7 container formats with automatic rotation by duration and/or size. Metadata (title, artist, album) is embedded in the file.
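Rotation by duration and/or size boils down to a small decision made before each write. The following sketch shows one way to express it; `RotationPolicy` and `SegmentTracker` are hypothetical names, and FileOutput's real configuration may differ.

```swift
/// Hypothetical rotation rule: either limit can be disabled by leaving it nil.
struct RotationPolicy {
    var maxDuration: Double?  // seconds
    var maxBytes: Int?

    func shouldRotate(elapsed: Double, bytesWritten: Int) -> Bool {
        if let maxDuration, elapsed >= maxDuration { return true }
        if let maxBytes, bytesWritten >= maxBytes { return true }
        return false
    }
}

/// Tracks which segment (file) each write belongs to.
struct SegmentTracker {
    let policy: RotationPolicy
    private(set) var segmentIndex = 0
    private(set) var segmentStart: Double = 0
    private(set) var segmentBytes = 0

    init(policy: RotationPolicy) { self.policy = policy }

    /// Records a write at timestamp `time` (seconds) and returns its segment index.
    /// The check runs before the write, so a segment may exceed `maxBytes` by one write.
    mutating func record(bytes: Int, at time: Double) -> Int {
        if policy.shouldRotate(elapsed: time - segmentStart, bytesWritten: segmentBytes) {
            segmentIndex += 1
            segmentStart = time
            segmentBytes = 0
        }
        segmentBytes += bytes
        return segmentIndex
    }
}

var tracker = SegmentTracker(policy: RotationPolicy(maxDuration: 60, maxBytes: 1_000))
print(tracker.record(bytes: 600, at: 0))  // prints 0
print(tracker.record(bytes: 600, at: 1))  // prints 0 (limit is checked before the write)
print(tracker.record(bytes: 100, at: 2))  // prints 1 (size limit reached)
print(tracker.record(bytes: 100, at: 62)) // prints 2 (duration limit reached)
```

For video, a real implementation would also want to rotate only on a keyframe so that each file starts decodable; this sketch leaves that out.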

Live streaming

StreamingPipeline is the centerpiece for live streaming. It orchestrates capture, encoding and delivery in a single pipeline, with three operating modes: audio only, video only, or muxed audio+video. In muxed mode, the pipeline waits for the first video keyframe, extracts the parameter sets (SPS/PPS for H.264, VPS/SPS/PPS for HEVC), sends them via sendConfiguration, then starts the interleaved audio+video stream on a shared monotonic clock.
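The keyframe and parameter-set step can be illustrated on raw H.264 data. The sketch below assumes an Annex B byte stream (NAL units separated by `00 00 01` or `00 00 00 01` start codes) and is not how CaptureKit itself is known to do it: VideoToolbox normally delivers length-prefixed NAL units, with parameter sets available from the format description. All function and type names here are hypothetical.

```swift
/// H.264 NAL unit types that matter at stream start-up.
enum H264NALType: UInt8 { case idrSlice = 5, sps = 7, pps = 8 }

/// Splits an Annex B byte stream on 00 00 01 / 00 00 00 01 start codes.
func splitAnnexB(_ data: [UInt8]) -> [[UInt8]] {
    var starts: [(codeStart: Int, payloadStart: Int)] = []
    var i = 0
    while i + 2 < data.count {
        if data[i] == 0, data[i + 1] == 0, data[i + 2] == 1 {
            // A 4-byte start code has one extra leading zero byte.
            let codeStart = (i > 0 && data[i - 1] == 0) ? i - 1 : i
            starts.append((codeStart: codeStart, payloadStart: i + 3))
            i += 3
        } else {
            i += 1
        }
    }
    var units: [[UInt8]] = []
    for (n, start) in starts.enumerated() {
        let end = n + 1 < starts.count ? starts[n + 1].codeStart : data.count
        if start.payloadStart < end {
            units.append(Array(data[start.payloadStart..<end]))
        }
    }
    return units
}

struct KeyframeInfo: Equatable {
    var sps: [UInt8]?
    var pps: [UInt8]?
    var containsIDR = false
    /// True once everything needed to start a muxed stream is present.
    var isReadyToStream: Bool { sps != nil && pps != nil && containsIDR }
}

/// Classifies the NAL units of one access unit by the low 5 bits of the header byte.
func inspectH264AccessUnit(_ data: [UInt8]) -> KeyframeInfo {
    var info = KeyframeInfo()
    for unit in splitAnnexB(data) {
        guard let header = unit.first,
              let type = H264NALType(rawValue: header & 0x1F) else { continue }
        switch type {
        case .sps: info.sps = unit
        case .pps: info.pps = unit
        case .idrSlice: info.containsIDR = true
        }
    }
    return info
}

// Synthetic access unit: SPS, PPS, then an IDR slice (payload bytes are made up).
let accessUnit: [UInt8] = [0, 0, 0, 1, 0x67, 0x42, 0xC0, 0x1E,
                           0, 0, 0, 1, 0x68, 0xCE, 0x3C, 0x80,
                           0, 0, 1, 0x65, 0x88, 0x84]
let info = inspectH264AccessUnit(accessUnit)
print(info.isReadyToStream) // prints true
```

For HEVC the idea is the same, but the NAL unit type sits in bits 1 to 6 of the first header byte (`(header >> 1) & 0x3F`), with 32, 33 and 34 identifying VPS, SPS and PPS.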

The StreamingTransport protocol is deliberately minimal — four methods to implement:
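For orientation, here is a sketch of what a four-method transport contract could look like, paired with an in-memory implementation that is handy in tests. Only the name `sendConfiguration` comes from the text above; the protocol name `SketchTransport`, the other three method names and all signatures are guesses made for illustration, not the library's actual API.

```swift
/// Hypothetical four-method transport contract; only `sendConfiguration`
/// is a name mentioned in the article, the rest is assumed.
protocol SketchTransport: Sendable {
    func connect() async throws
    func sendConfiguration(_ parameterSets: [[UInt8]]) async throws
    func send(_ payload: [UInt8], timestamp: Double) async throws
    func disconnect() async
}

enum SketchTransportError: Error { case notConnected }

/// Records everything it is asked to send instead of touching the network.
actor InMemoryTransport: SketchTransport {
    private(set) var isConnected = false
    private(set) var configuration: [[UInt8]] = []
    private(set) var sentTimestamps: [Double] = []

    func connect() { isConnected = true }

    func sendConfiguration(_ parameterSets: [[UInt8]]) throws {
        guard isConnected else { throw SketchTransportError.notConnected }
        configuration = parameterSets
    }

    func send(_ payload: [UInt8], timestamp: Double) throws {
        guard isConnected else { throw SketchTransportError.notConnected }
        sentTimestamps.append(timestamp)
    }

    func disconnect() { isConnected = false }
}

let transport = InMemoryTransport()
await transport.connect()
try await transport.sendConfiguration([[0x67, 0x42], [0x68, 0xCE]])
try await transport.send([0x65, 0x88], timestamp: 0.0)
try await transport.send([0x41, 0x9A], timestamp: 1.0 / 30.0)
await transport.disconnect()
print(await transport.sentTimestamps.count) // prints 2
```

Keeping the contract this small is what lets RTMP, HLS, SRT or Icecast bridges live outside the capture library: each bridge only has to map these calls onto its own protocol.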

Concrete implementations — RTMP, HLS, SRT, Icecast bridges — live in the consuming app, keeping CaptureKit transport-agnostic. The demo app includes working bridges for all four protocols in the Atelier Socle ecosystem.

RTMP integration

An RTMP bridge connects CaptureKit to RTMPKit for broadcasting to Twitch, YouTube, Facebook and other RTMP platforms:

Icecast integration

For audio streaming to Icecast/SHOUTcast, the IcecastKit bridge sends AAC or MP3 in audioOnly mode:

Audio metering

AudioMeter provides real-time audio levels with five analysis modes and waveform data for visualization. It accepts buffers from any source and exposes two AsyncStreams — one for levels, one for waveforms.

Preset	Mode	Waveform	Use case
`.broadcast`	`.full`	—	Professional broadcast
`.podcast`	`.peakAndRMS`	—	Podcast recording
`.loudnessCompliance`	`.loudness`	—	EBU R128 compliance
`.voiceMessage`	`.peak`	`.message`	Voice messages
`.dawEditing`	`.full`	`.dawHighRes`	DAW audio editing
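The two simplest measurements, sample peak and RMS, are easy to show in isolation. The sketch below computes both in dBFS for a block of Float samples; `LevelReading` and `measure` are hypothetical names, not AudioMeter's API. EBU R128 loudness and true peak need more machinery (K-weighting and gating, oversampling) and are not covered here.

```swift
import Foundation

/// Hypothetical result type: levels in dB relative to full scale (1.0).
struct LevelReading {
    let peakDB: Float
    let rmsDB: Float
}

/// Computes sample peak and RMS for one block, clamped at `floorDB` for silence.
func measure(_ samples: [Float], floorDB: Float = -120) -> LevelReading {
    guard !samples.isEmpty else {
        return LevelReading(peakDB: floorDB, rmsDB: floorDB)
    }
    var peak: Float = 0
    var sumOfSquares: Float = 0
    for sample in samples {
        peak = max(peak, abs(sample))
        sumOfSquares += sample * sample
    }
    let rms = (sumOfSquares / Float(samples.count)).squareRoot()

    // 20 * log10(x) converts a linear amplitude to decibels.
    func toDB(_ linear: Float) -> Float {
        linear > 0 ? max(floorDB, 20 * log10(linear)) : floorDB
    }
    return LevelReading(peakDB: toDB(peak), rmsDB: toDB(rms))
}

// A full-scale square wave has peak and RMS both at 0 dBFS.
let square = measure([1, -1, 1, -1])
print(square.peakDB, square.rmsDB) // prints 0.0 0.0
```

Halving the amplitude lowers both readings by roughly 6 dB, which is a quick sanity check for any meter implementation.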

Capture presets

25+ presets cover common use cases, from streaming platforms to professional workflows. Each preset automatically configures codec, bitrate, resolution and frame rate.

Category	Presets	Details
Streaming	`twitch()`, `youtube()`, `facebook()`, `instagram()`, `tiktok()`	H.264 + AAC, customizable resolution/FPS
Podcast	`podcastAudio()`, `podcastAudioHQ()`, `podcastVideo()`, `podcastLossless()`	AAC 128k or ALAC lossless
Radio	`radioLive()`, `webRadioMP3()`, `webRadioOpus()`	AAC, MP3 or Opus for web radio
Broadcast	`broadcastHD()`, `broadcast4K()`, `proResRecording()`	HEVC 1080p, UHD 4K, ProRes HQ
Spatial	`spatialVideo()`, `dolbyAtmos()`	MV-HEVC visionOS, immersive audio
Screen Recording	`screenRecording()`, `screenRecording4K()`, `screenRecordingRetina()`	Optimized screen capture
Low Bandwidth	`lowBandwidth()`, `voiceOnly()`	Limited connections
Archive	`archiveLossless()`, `archive4K()`	High-quality archival

Device discovery

DeviceDiscovery monitors audio and video device connections and disconnections in real time via an AsyncStream<DeviceChangeEvent>. Each device exposes its capabilities, connection type and supported formats.
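The consuming side of such an event stream typically folds connect and disconnect events into a current device list. Here is a minimal sketch of that pattern with standard-library `AsyncStream`; `SketchDevice`, `SketchDeviceChange` and `currentDevices(after:)` are hypothetical stand-ins, not the library's DeviceChangeEvent API.

```swift
// Hypothetical stand-ins for illustration; not CaptureKit's real types.
struct SketchDevice: Sendable, Equatable {
    let id: String
    let name: String
}

enum SketchDeviceChange: Sendable, Equatable {
    case connected(SketchDevice)
    case disconnected(id: String)
}

/// Consumes events until the stream finishes and returns the devices still connected.
func currentDevices(after events: AsyncStream<SketchDeviceChange>) async -> [String: SketchDevice] {
    var devices: [String: SketchDevice] = [:]
    for await event in events {
        switch event {
        case .connected(let device):
            devices[device.id] = device
        case .disconnected(let id):
            devices[id] = nil
        }
    }
    return devices
}

// Simulate a USB microphone being plugged in and removed while a camera stays attached.
let (events, continuation) = AsyncStream<SketchDeviceChange>.makeStream()
continuation.yield(.connected(SketchDevice(id: "usb-mic", name: "USB Microphone")))
continuation.yield(.connected(SketchDevice(id: "cam-1", name: "External Camera")))
continuation.yield(.disconnected(id: "usb-mic"))
continuation.finish()

let devices = await currentDevices(after: events)
print(devices.keys.sorted()) // prints ["cam-1"]
```

In an app the loop would usually run for the lifetime of a view or session and update UI state on each event rather than returning once at the end.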

Permission management

PermissionManager unifies permission checking and requesting for 6 resource types. Status changes are exposed via an AsyncStream<PermissionChange>:

Dynamic session control

During capture, the session allows hot-swapping sources, adjusting bitrates, forcing keyframes and monitoring 17 event types — from state changes to dropped frames to transport quality.

visionOS spatial video

On visionOS, SpatialCameraSource captures MV-HEVC (Multiview HEVC) to produce stereoscopic spatial video, viewable natively on Apple Vision Pro. MVHEVCEncoder handles encoding with appropriate layouts.

Architecture

The library is organized into clearly separated modules, each with a well-defined responsibility:

Tests

The test suite covers the entire library with unit, integration, end-to-end and showcase tests. Showcase tests illustrate recommended usage patterns.

Metric	Value
Tests	1,763
Test files	219
Showcase tests	8 suites (audio, video, encoding, outputs, streaming, presets, formats, errors)
E2E tests	2 suites (audio pipeline, streaming pipeline)
Integration tests	6 suites (lifecycle, routing, dynamic control, events)
CI/CD	GitHub Actions + CodeCov

Documentation

The DocC documentation covers 17 guides:

Guide	Content
Getting Started	First project with CaptureKit
Audio Capture	Audio sources, configuration, formats
Video Capture	Video sources, cameras, presets
Screen Capture	ScreenCaptureKit, ReplayKit, Broadcast Extension
Encoding	Audio/video codecs, configuration, presets
Outputs Guide	FileOutput, rotation, callbacks, preview
Streaming Guide	StreamingPipeline, modes, transport
Streaming Integration	RTMP, HLS, SRT, Icecast bridges
Metering	AudioMeter, peak/RMS, EBU R128 loudness, waveforms
Device Discovery	Device monitoring
Permissions Guide	PermissionManager, SwiftUI views
Formats Guide	SampleRate, ChannelLayout, VideoResolution, FrameRate
Presets Guide	25+ presets by category
Error Handling	CaptureError, 26 typed error cases
Platform Compatibility	macOS/iOS/visionOS differences
Testing Guide	Mocks, testing patterns
CaptureKit	Architecture overview

Under the hood

Metric	Value
Swift files (Sources)	213
Public types	196
Protocols	20+
Actors	40+
Audio sources	9
Video sources	10
Audio codecs	6
Video codecs	6
Outputs	8
Container formats	7
Capture presets	25+
DocC guides	17
Tests	1,763
Runtime dependencies	0

Ecosystem

CaptureKit is the capture building block of the Atelier Socle streaming ecosystem. It integrates with all four transport libraries via the StreamingTransport protocol:

HLSKit — HTTP Live Streaming, M3U8 manifests, fMP4 packaging, spatial video
RTMPKit — RTMP client and server, Enhanced RTMP v2, 10 platforms
SRTKit — Pure Swift SRT transport, AES encryption, FEC, bonding
IcecastKit — Icecast/SHOUTcast client, adaptive bitrate, multi-destination
CaptureKit (this library) — Unified media capture