CaptureKit

On Apple platforms, capturing audio and video seems straightforward on the surface. In practice, as soon as you move past the trivial case — microphone to M4A file — API fragmentation makes everything complex: AVCaptureSession for the camera, AVAudioEngine for the mic, ScreenCaptureKit for macOS screen capture, ReplayKit on iOS, VideoToolbox for H.264 encoding, AudioToolbox for AAC… Each component has its own API surface, its own threading constraints, its own buffer formats.

While building the Atelier Socle streaming ecosystem — HLSKit, RTMPKit, SRTKit, IcecastKit — the most fundamental building block was missing: capturing the signal. Every transport needed encoded audio buffers, compressed video frames, consistent permission management, level monitoring. And every app had to reimplement the same capture logic, with the same pitfalls.

CaptureKit unifies all of it. A single CaptureSession orchestrates sources, encoders and outputs. A single StreamingPipeline connects capture to any transport via a StreamingTransport protocol. All built with Swift 6.2 strict concurrency — actors, AsyncStream, Sendable everywhere — no Combine, no external dependencies, with native support for macOS 14, iOS 17 and visionOS 1.

What CaptureKit does

CaptureKit covers the entire capture path: from the hardware source to the encoded output. Audio and video sources, hardware encoders, file or streaming outputs, real-time metering, device discovery, permission management — all within a unified API.

  • 9 audio sources — MicrophoneSource, SystemAudioSource, LineInSource, BluetoothAudioSource, AggregateAudioSource, FileAudioSource, VoIPAudioSource, ToneSource (programmable generator), SilenceSource

  • 10 video sources — CameraSource, ExternalCameraSource, ScreenCaptureSource (ScreenCaptureKit + ReplayKit + Broadcast Extension), CinematicCameraSource, SpatialCameraSource (MV-HEVC visionOS), MultiCameraSource, FileVideoSource, TestPatternSource, BlackSource, ColorSource

  • 6 audio codecs — AAC (LC/HE v1/HE v2/ELD/xHE), ALAC, Opus, FLAC, MP3, PCM — with AudioToolbox hardware acceleration

  • 6 video codecs — H.264, HEVC, ProRes, AV1, MV-HEVC, JPEG — with VideoToolbox hardware acceleration

  • 8 outputs — FileOutput (MP4, MOV, M4A, CAF, WAV, AIFF, FLAC with rotation), CallbackOutput, PixelBufferOutput, SampleBufferOutput, PreviewOutput, AudioPreviewOutput, TeeOutput (duplication), NullOutput

  • Transport-agnostic streaming — StreamingPipeline with audioOnly, videoOnly and muxed modes, StreamingTransport protocol to plug in RTMP, HLS, SRT, Icecast or any other transport

  • Real-time metering — AudioMeter with peak/RMS, EBU R128 loudness (momentary, short-term, integrated), true peak, waveform data for visualization

  • 25+ presets — Twitch, YouTube, Facebook, Instagram, TikTok, podcast, radio, broadcast, screen recording, spatial video, archive, ProRes, Dolby Atmos

  • Device discovery — DeviceDiscovery with AsyncStream monitoring of audio and video connect/disconnect events

  • Permission management — unified PermissionManager for microphone, camera, screen recording with built-in SwiftUI views

  • Dynamic control — hot-swap sources, adjust bitrate, force keyframes, pause/resume during capture

  • Swift 6.2 strict concurrency — actors for all stateful types, Sendable everywhere, async/await end-to-end, zero Combine, zero dependencies

Quick start

Record microphone audio as AAC to an M4A file:
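A minimal sketch of that flow, assuming initializer and method names inferred from the types this README describes (addSource, start and stop are illustrative, not confirmed API):

```swift
import CaptureKit

// Illustrative sketch — method names are assumptions, not confirmed API.
let session = CaptureSession()

let microphone = MicrophoneSource(configuration: .default)
let encoder = AACEncoder(configuration: .podcast)  // 128k stereo preset
let output = FileOutput(url: URL(fileURLWithPath: "recording.m4a"), format: .m4a)

try await session.addSource(microphone, encoder: encoder, output: output)
try await session.start()

// ... capture runs ...

try await session.stop()
```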

Installation

Via Swift Package Manager, add the dependency to your Package.swift:
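A sketch of the dependency entry — the version number is illustrative; check the repository for the latest release:

```swift
dependencies: [
    // Version is an assumption — pin to the latest tagged release.
    .package(url: "https://github.com/atelier-socle/swift-capture-kit", from: "1.0.0")
]
```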

Then add it to your target:
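For example (target name is a placeholder):

```swift
.target(
    name: "MyApp",
    dependencies: [
        .product(name: "CaptureKit", package: "swift-capture-kit")
    ]
)
```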

Supported platforms

| Platform | Minimum version | Specifics |
| --- | --- | --- |
| macOS | 14+ | ScreenCaptureKit, aggregate devices, system audio, USB/Thunderbolt external cameras |
| iOS / iPadOS | 17+ | ReplayKit, Broadcast Extension, USB-C cameras |
| visionOS | 1+ | MV-HEVC spatial video, SpatialCameraSource |

The Source → Encoder → Output architecture

CaptureKit is organized around three foundational protocols that compose freely. Each source produces an AsyncStream of raw buffers, each encoder transforms those buffers into compressed data, and each output consumes the encoded data. CaptureSession orchestrates everything.

| Type | Role |
| --- | --- |
| CaptureSession | Main orchestrator (actor) — sources, encoders, outputs, state, events, statistics |
| AudioSource / VideoSource | Capture protocols — 9 audio implementations, 10 video |
| AudioEncoderProtocol / VideoEncoderProtocol | Encoding protocols — 6 audio codecs, 6 video codecs |
| CaptureOutput | Output protocol — 8 implementations (file, callback, preview, tee, null…) |
| StreamingPipeline | Unified capture → encode → send pipeline (actor) |
| StreamingTransport | Transport-agnostic protocol for live streaming |
| AudioMeter | Real-time peak/RMS/loudness metering (actor) |
| DeviceDiscovery | Audio/video device monitoring (actor) |
| PermissionManager | Unified permission handling (actor) |

Audio capture

Nine audio sources cover every use case — from the built-in microphone to a programmable tone generator. Each source is configured with an AudioSourceConfiguration and produces an AsyncStream<AudioBuffer>. MicrophoneSource exposes automatic gain control, echo cancellation, noise suppression and voice isolation.
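As a sketch — the stream property name and start method are assumptions, not confirmed API:

```swift
// Illustrative sketch of consuming a microphone source.
let microphone = MicrophoneSource(configuration: .voiceChat)  // 16 kHz mono preset

try await microphone.start()
for await buffer in microphone.buffers {
    // Each AudioBuffer carries PCM samples with format and timestamp.
    analyze(buffer)
}
```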

AudioSourceConfiguration provides ready-to-use presets:

| Preset | Sample rate | Channels | Bit depth | Use case |
| --- | --- | --- | --- | --- |
| .default | 48 kHz | 2 | Float32 | General purpose |
| .broadcast | 48 kHz | 2 | Float32 | Broadcast (10 ms buffer) |
| .highResolution | 96 kHz | 2 | Float32 | Hi-Res audio |
| .voiceChat | 16 kHz | 1 | Float32 | VoIP / voice chat |
| .spatialAudio | 48 kHz | 2 | Float32 | Spatial audio |

Video capture

CameraSource handles built-in and external cameras with full control — zoom, torch, focus mode, exposure, white balance, stabilization, depth data, and photo capture during recording.
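A hedged sketch of that control surface — the method names below are assumptions based on the capabilities listed above:

```swift
// Illustrative only — control method names are assumptions.
let camera = CameraSource(configuration: .broadcast1080p60)
try await camera.start()

try await camera.setZoom(2.0)
try await camera.setFocusMode(.continuousAutoFocus)

// Capture a still photo without interrupting the recording.
let photo = try await camera.capturePhoto()
```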

Video configuration presets:

| Preset | Resolution | FPS | Specifics |
| --- | --- | --- | --- |
| .default | 1080p | 30 | SDR, BT.709 |
| .broadcast720p | 720p | 30 | Standard stabilization |
| .broadcast1080p60 | 1080p | 60 | Full HD broadcast |
| .pro4K | UHD 4K | 30 | HDR10, BT.2020 |
| .cinematic | 1080p | 24 | Display P3, cinematic stabilization |
| .spatialVideo | spatial | 30 | MV-HEVC visionOS |

Screen capture

ScreenCaptureSource unifies three backends under a single API. On macOS, ScreenCaptureKit provides display, window, application or region capture. On iOS, ReplayKit captures the current app, while Broadcast Extension enables full system capture via an App Group.
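A sketch of how the unified API might be used per platform — the configuration and target names here are assumptions:

```swift
#if os(macOS)
// macOS: ScreenCaptureKit backend (target cases are illustrative).
let screen = ScreenCaptureSource(configuration: .init(target: .mainDisplay))
#else
// iOS: ReplayKit captures the current app; a Broadcast Extension
// plus App Group is needed for system-wide capture.
let screen = ScreenCaptureSource(configuration: .init(target: .currentApp))
#endif
try await screen.start()
```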

Hardware encoding

CaptureKit exposes 12 codecs — 6 audio and 6 video — each implemented as an actor with configuration presets. Video encoders use VideoToolbox for hardware acceleration, audio encoders go through AudioToolbox.
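For example, starting from a preset and overriding one field — the configuration type and property names are assumptions:

```swift
// Sketch: preset first, then a targeted override.
var config = H264Encoder.Configuration.streaming1080p  // High profile, 4.5 Mbps
config.bitrate = 6_000_000                             // bump to 6 Mbps
let encoder = H264Encoder(configuration: config)
```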

| Codec | Type | Presets |
| --- | --- | --- |
| AAC (LC/HE v1/HE v2/ELD/xHE) | Audio | .voice (64k mono), .podcast (128k stereo), .musicHQ (256k VBR), .streamingLowBandwidth (HE 64k), .lowLatency (ELD 32k) |
| ALAC | Audio | Apple Lossless |
| Opus | Audio | Interactive codec |
| FLAC | Audio | Free Lossless |
| MP3 | Audio | .standard (192k), .highQuality (320k), .webRadio (128k Icecast) |
| PCM / WAV | Audio | Uncompressed |
| H.264 | Video | .streaming720p (Main 2.5 Mbps), .streaming1080p (High 4.5 Mbps), .lowLatency (Baseline CAVLC), .archive (High 20 Mbps) |
| HEVC | Video | Main, Main 10 — HDR support |
| ProRes | Video | Proxy, LT, Standard, HQ, 4444, 4444 XQ |
| AV1 | Video | Profile 0, 1, 2 |
| MV-HEVC | Video | visionOS spatial video |
| JPEG | Video | Still image capture |

File recording

FileOutput records to 7 container formats with automatic rotation by duration and/or size. Metadata (title, artist, album) is embedded in the file.
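A hedged sketch of a rotating recording — the rotation and metadata parameter names are assumptions:

```swift
// Sketch: rotate every 30 minutes or 2 GB, whichever comes first.
let output = FileOutput(
    url: URL(fileURLWithPath: "show.mp4"),
    format: .mp4,
    rotation: .init(maxDuration: .seconds(1_800), maxSize: 2_000_000_000),
    metadata: .init(title: "Evening Show", artist: "Atelier Socle")
)
```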

Live streaming

StreamingPipeline is the centerpiece for live. It orchestrates capture, encoding and delivery in a unified pipeline. Three operating modes — audio only, video only, or muxed audio+video. In muxed mode, the pipeline waits for the first video keyframe, extracts parameter sets (SPS/PPS for H.264, VPS/SPS/PPS for HEVC), sends them via sendConfiguration, then starts the interleaved audio+video stream with a shared monotonic clock.

The StreamingTransport protocol is deliberately minimal — four methods to implement:
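A plausible shape for it — only sendConfiguration is named in the text above; the other three method names and the payload types are assumptions:

```swift
// Plausible sketch of the four-method transport protocol.
public protocol StreamingTransport: Sendable {
    func connect() async throws
    func sendConfiguration(_ configuration: StreamConfiguration) async throws
    func send(_ data: EncodedMediaData) async throws
    func disconnect() async
}
```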

Concrete implementations — RTMP, HLS, SRT, Icecast bridges — live in the consuming app, keeping CaptureKit transport-agnostic. The demo app includes working bridges for all four protocols in the Atelier Socle ecosystem.

RTMP integration

An RTMP bridge connects CaptureKit to RTMPKit for broadcasting to Twitch, YouTube, Facebook and other RTMP platforms:
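A hypothetical bridge shape — all RTMPKit API names here are assumptions; see RTMPKit's own documentation for the real client API:

```swift
// Hypothetical adapter from RTMPKit's client to CaptureKit's transport protocol.
struct RTMPTransportBridge: StreamingTransport {
    let client: RTMPClient
    let url: String  // e.g. "rtmp://live.twitch.tv/app/<stream key>"

    func connect() async throws { try await client.connect(to: url) }
    func sendConfiguration(_ configuration: StreamConfiguration) async throws {
        try await client.sendMetadata(configuration)  // SPS/PPS parameter sets
    }
    func send(_ data: EncodedMediaData) async throws { try await client.publish(data) }
    func disconnect() async { await client.disconnect() }
}
```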

Icecast integration

For audio streaming to Icecast/SHOUTcast, the IcecastKit bridge sends AAC or MP3 in audioOnly mode:
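Sketched under the same caveat — the pipeline and IcecastKit bridge names are assumptions:

```swift
// Hypothetical audio-only pipeline to an Icecast mount point.
let transport = IcecastTransportBridge(
    host: "icecast.example.com", mountPoint: "/live", password: "hackme"
)
let pipeline = StreamingPipeline(mode: .audioOnly)
try await pipeline.start(
    source: MicrophoneSource(configuration: .broadcast),
    encoder: MP3Encoder(configuration: .webRadio),  // 128k, per the codec table
    transport: transport
)
```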

Audio metering

AudioMeter provides real-time audio levels with five analysis modes and waveform data for visualization. It accepts buffers from any source and exposes two AsyncStreams — one for levels, one for waveforms.
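A sketch of the two-stream pattern — the stream and field names are assumptions:

```swift
// Illustrative sketch: observe levels on one task, feed buffers on another.
let meter = AudioMeter(preset: .broadcast)
let microphone = MicrophoneSource(configuration: .broadcast)

Task {
    for await level in meter.levels {
        print("peak \(level.peak) dBFS, integrated \(level.integratedLoudness) LUFS")
    }
}

try await microphone.start()
for await buffer in microphone.buffers {
    await meter.process(buffer)
}
```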

| Preset | Mode | Waveform | Use case |
| --- | --- | --- | --- |
| .broadcast | .full | | Professional broadcast |
| .podcast | .peakAndRMS | | Podcast recording |
| .loudnessCompliance | .loudness | | EBU R128 compliance |
| .voiceMessage | .peak | .message | Voice messages |
| .dawEditing | .full | .dawHighRes | DAW audio editing |

Capture presets

25+ presets cover common use cases, from streaming platforms to professional workflows. Each preset automatically configures codec, bitrate, resolution and frame rate.
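For instance — the customization parameters shown are assumptions:

```swift
// Sketch: one preset call configures codec, bitrate, resolution and FPS.
let configuration = CapturePreset.twitch(resolution: .hd1080, frameRate: 60)
let session = CaptureSession(configuration: configuration)
try await session.start()
```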

| Category | Presets | Details |
| --- | --- | --- |
| Streaming | twitch(), youtube(), facebook(), instagram(), tiktok() | H.264 + AAC, customizable resolution/FPS |
| Podcast | podcastAudio(), podcastAudioHQ(), podcastVideo(), podcastLossless() | AAC 128k or ALAC lossless |
| Radio | radioLive(), webRadioMP3(), webRadioOpus() | AAC, MP3 or Opus for web radio |
| Broadcast | broadcastHD(), broadcast4K(), proResRecording() | HEVC 1080p, UHD 4K, ProRes HQ |
| Spatial | spatialVideo(), dolbyAtmos() | MV-HEVC visionOS, immersive audio |
| Screen recording | screenRecording(), screenRecording4K(), screenRecordingRetina() | Optimized screen capture |
| Low bandwidth | lowBandwidth(), voiceOnly() | Limited connections |
| Archive | archiveLossless(), archive4K() | High-quality archival |

Device discovery

DeviceDiscovery monitors audio and video device connections and disconnections in real time via an AsyncStream<DeviceChangeEvent>. Each device exposes its capabilities, connection type and supported formats.
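As a sketch — the stream property and event case names are assumptions:

```swift
// Illustrative sketch: react to hot-plug events.
let discovery = DeviceDiscovery()
for await event in discovery.changes {
    switch event {
    case .connected(let device):
        print("Connected: \(device.name)")
    case .disconnected(let device):
        print("Disconnected: \(device.name)")
    }
}
```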

Permission management

PermissionManager unifies permission checking and requesting for 6 resource types. Status changes are exposed via an AsyncStream<PermissionChange>:
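A hedged sketch of the request flow — method and case names are assumptions:

```swift
// Illustrative sketch: request both permissions before starting a session.
let permissions = PermissionManager()
let micStatus = await permissions.request(.microphone)
let camStatus = await permissions.request(.camera)
guard micStatus == .granted, camStatus == .granted else {
    // Surface the built-in SwiftUI permission views instead.
    return
}
```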

Dynamic session control

During capture, the session allows hot-swapping sources, adjusting bitrates, forcing keyframes and monitoring 17 event types — from state changes to dropped frames to transport quality.
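For example — all three method names below are assumptions, not confirmed API:

```swift
// Sketch of in-flight session control.
try await session.swapSource(to: ExternalCameraSource(configuration: .default))
try await session.setVideoBitrate(3_000_000)  // drop to 3 Mbps on a poor network
try await session.forceKeyframe()             // e.g. when a new viewer joins
```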

visionOS spatial video

On visionOS, SpatialCameraSource captures MV-HEVC (Multiview HEVC) to produce stereoscopic spatial video, viewable natively on Apple Vision Pro. MVHEVCEncoder handles encoding with appropriate layouts.

Architecture

The library is organized into clearly separated modules, each with a well-defined responsibility:
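The module list itself is not reproduced here; a plausible layout, inferred purely from the feature areas above (every directory name below is an assumption), would be:

```
Sources/CaptureKit/
├── Session/      CaptureSession, state, events, statistics
├── Sources/      audio and video sources
├── Encoding/     AudioToolbox / VideoToolbox encoders
├── Outputs/      file, callback, preview, tee, null
├── Streaming/    StreamingPipeline, StreamingTransport
├── Metering/     AudioMeter
├── Devices/      DeviceDiscovery
└── Permissions/  PermissionManager, SwiftUI views
```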

Tests

The test suite covers the entire library with unit, integration, end-to-end and showcase tests. Showcase tests illustrate recommended usage patterns.

| Metric | Value |
| --- | --- |
| Tests | 1,763 |
| Test files | 219 |
| Showcase tests | 8 suites (audio, video, encoding, outputs, streaming, presets, formats, errors) |
| E2E tests | 2 suites (audio pipeline, streaming pipeline) |
| Integration tests | 6 suites (lifecycle, routing, dynamic control, events) |
| CI/CD | GitHub Actions + CodeCov |

Documentation

The DocC documentation covers 17 guides:

| Guide | Content |
| --- | --- |
| Getting Started | First project with CaptureKit |
| Audio Capture | Audio sources, configuration, formats |
| Video Capture | Video sources, cameras, presets |
| Screen Capture | ScreenCaptureKit, ReplayKit, Broadcast Extension |
| Encoding | Audio/video codecs, configuration, presets |
| Outputs Guide | FileOutput, rotation, callbacks, preview |
| Streaming Guide | StreamingPipeline, modes, transport |
| Streaming Integration | RTMP, HLS, SRT, Icecast bridges |
| Metering | AudioMeter, peak/RMS, EBU R128 loudness, waveforms |
| Device Discovery | Device monitoring |
| Permissions Guide | PermissionManager, SwiftUI views |
| Formats Guide | SampleRate, ChannelLayout, VideoResolution, FrameRate |
| Presets Guide | 25+ presets by category |
| Error Handling | CaptureError, 26 typed error cases |
| Platform Compatibility | macOS/iOS/visionOS differences |
| Testing Guide | Mocks, testing patterns |
| CaptureKit | Architecture overview |

Under the hood

| Metric | Value |
| --- | --- |
| Swift files (Sources) | 213 |
| Public types | 196 |
| Protocols | 20+ |
| Actors | 40+ |
| Audio sources | 9 |
| Video sources | 10 |
| Audio codecs | 6 |
| Video codecs | 6 |
| Outputs | 8 |
| Container formats | 7 |
| Capture presets | 25+ |
| DocC guides | 17 |
| Tests | 1,763 |
| Runtime dependencies | 0 |

Ecosystem

CaptureKit is the capture building block of the Atelier Socle streaming ecosystem. It integrates with all four transport libraries via the StreamingTransport protocol:

  • HLSKit — HTTP Live Streaming, M3U8 manifests, fMP4 packaging, spatial video

  • RTMPKit — RTMP client and server, Enhanced RTMP v2, 10 platforms

  • SRTKit — Pure Swift SRT transport, AES encryption, FEC, bonding

  • IcecastKit — Icecast/SHOUTcast client, adaptive bitrate, multi-destination

  • CaptureKit (this library) — Unified media capture

Links

GitHub - atelier-socle/swift-capture-kit: Unified media capture, encoding & streaming for Apple platforms — every source, every codec, zero dependencies. Pure Swift 6.2.
