HLSKit
Where It Comes From
When you listen to a podcast or watch a streaming video, there's a protocol behind it you never see: HLS. HTTP Live Streaming, invented by Apple in 2009, has become the de facto standard for audio and video delivery on the web. Your browser, your phone, your Apple TV — everything speaks HLS.
The principle is simple: you split a media file into small segments of a few seconds, write a manifest (an .m3u8 file) that lists these segments in order, and the player downloads them one by one via HTTP. No specialized server, no exotic protocol — just standard HTTP, with all its benefits: CDN, caching, HTTPS.
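Concretely, a minimal VOD media playlist for three six-second segments looks like this (file names are illustrative):

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:6.0,
segment0.ts
#EXTINF:6.0,
segment1.ts
#EXTINF:6.0,
segment2.ts
#EXT-X-ENDLIST
```

The player fetches this file, then requests each listed segment in order over plain HTTP.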
While working on PodcastFeedMaker and the server infrastructure PodcastFeedVapor, an obvious need emerged: handling the media files themselves. Not just RSS metadata, but the actual audio and video files — segmenting them, encoding them, encrypting them, validating them. All from pure Swift, with no external dependency in the lib's core, compatible with macOS, iOS, and Linux.
I looked for a Swift library that did this. None existed. FFmpeg wrappers, yes. HLS players, yes. But a complete pipeline — from manifest parsing to segment encryption through cloud transcoding — in native Swift? Nothing.
HLSKit was born from that void.
The first version covered the end-to-end VOD pipeline: parsing, generation, segmentation, transcoding, encryption. The second added cloud transcoding — Cloudflare, AWS, Mux. But the most ambitious piece was still missing: live. Not a placeholder, not a wrapper — a real real-time pipeline capable of taking an audio or video stream, encoding it, segmenting it, broadcasting it, and serving it in Low-Latency HLS with multi-destination push. That's what HLSKit 0.3.0 does.
0.4.0 pushes the ambition even further. The live pipeline becomes intelligent — real-time transport quality monitoring, automatic bitrate adaptation, multi-destination health dashboard. Stereoscopic MV-HEVC spatial video for Apple Vision Pro is natively supported. IMSC1 subtitles (W3C TTML) are parsed, rendered and segmented into fMP4. And EXT-X-DEFINE variable substitution enables CDN templating without modifying source playlists. This is HLSKit 0.4.0 — Transport Intelligence & Spatial Computing.
0.5.0 fixes a critical bug in the video fMP4 initialization segment — the pre_defined field of the VisualSampleEntry was 4 bytes instead of 2 (ISO 14496-12 §12.1.3), which shifted the avcC box and caused silent rejections by Safari, AVPlayer and FFmpeg. It also introduces an async segmentTransform API in IncrementalSegmenter, allowing actors to be used in segment transformation closures instead of manual locks.
0.6.0 brings full HEVC/H.265 support in the live fMP4 segmenter — hev1 sample entry with hvcC box (ISO 14496-15 §8.3), Video Parameter Set, profile and level parsed from SPS NALUs. B-frame support is added via compositionTimeOffset in EncodedFrame, for codecs where display order differs from decode order. Audio-video sync in VideoSegmenter is fixed through integer-based duration accumulation (Int64 ticks) instead of floating-point, eliminating drift errors that could cause TARGETDURATION to exceed the configured value. This is HLSKit 0.6.0 — HEVC & Precision.
What HLS Actually Is
Before diving into the lib, a quick detour on the protocol for those who've never touched HLS. If you already know it, skip to the next section.
HLS works in three stages. First, you take a media file (MP4, MOV, audio…) and segment it — you split it into 4 to 10 second chunks. Each chunk is an independent file, downloadable via HTTP.
Next, you write a manifest — a text file with the .m3u8 extension — that lists the segments in order with their duration. It's the player's roadmap.
Finally, for multi-quality (adaptive bitrate), you create a master playlist that points to multiple media playlists: one for 360p, one for 720p, one for 1080p. The player automatically chooses the quality based on available bandwidth.
In live, the principle is the same — but segments are produced in real time instead of being pre-split. The manifest is updated with each new segment, and old ones exit a sliding window. Low-Latency HLS goes even further: segments are split into smaller parts and the player can request them before the complete segment is even finished.
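An illustrative master playlist with three variants (bandwidths, codec strings, and URIs are made up):

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360,CODECS="avc1.4d401e,mp4a.40.2"
360p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720,CODECS="avc1.4d401f,mp4a.40.2"
720p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
1080p.m3u8
```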
A master playlist is just a short text file. Each EXT-X-STREAM-INF describes a variant: bandwidth, resolution, codecs. The player reads this file, evaluates its connection, and loads the right playlist. It's that simple — and that complex when you want to do it properly, because RFC 8216, which specifies HLS, spans 50 pages of very precise rules.
What HLSKit Does
HLSKit covers the HLS pipeline end to end — VOD and live. Not just one piece — the entire path, from source file to encrypted segment ready to be served, or from microphone stream to CDN in real time. All in pure Swift, Sendable end to end, with zero external dependencies in the core lib.
Parser — Complete M3U8 manifest reading with typed models for master playlists, media playlists, variants, segments, and all Low-Latency HLS extensions
Generator — Spec-compliant manifest production, with an imperative API and a `@resultBuilder` DSL for declarative playlist building
Validator — Conformance checking against RFC 8216 and Apple HLS rules, with 3 severity levels: error, warning, info
fMP4 Segmenter — MP4 file segmentation into fragmented MP4 segments with initialization segment and auto-generated playlist, H.264 (`avc1`/`avcC`) and HEVC (`hev1`/`hvcC`) support
MPEG-TS Segmenter — Segmentation into MPEG-TS segments for compatibility with legacy players
Byte-range — Byte-range segmentation mode: one file, multiple logical segments
Apple Transcoding — Hardware-accelerated encoding via Apple VideoToolbox (macOS, iOS)
FFmpeg Transcoding — Cross-platform transcoding with quality presets and multi-variant output
Cloud Transcoding — Delegation to Cloudflare Stream, AWS MediaConvert, or Mux — same `Transcoder` protocol, zero local GPU required
AES-128 Encryption — Full segment encryption in AES-128-CBC with key rotation
SAMPLE-AES Encryption — Sample-level encryption for video NAL units and ADTS audio frames
Key Management — Key generation, IV derivation (RFC 8216), and key file I/O
MP4 Inspection — MP4 box reading, track analysis, sample table parsing
Live Pipeline — Complete real-time stream orchestration: source → encoding → segmentation → playlist → push
Low-Latency HLS — Partial segments, `CAN-BLOCK-RELOAD`, `CAN-SKIP-UNTIL`, delta updates, `EXT-X-PRELOAD-HINT`
Multi-destination Push — Segment delivery to one or more HTTP endpoints, with DI transport support for RTMP, SRT and Icecast
Live Metadata — Real-time injection of SCTE-35, DATE-RANGE, ID3, HLS interstitials
Live Recording — Simultaneous recording during streaming, live-to-VOD conversion with automatic chaptering
I-Frame Playlists — `EXT-X-I-FRAMES-ONLY` playlist generation for trick play and thumbnails
Audio Processing — Format conversion, LUFS loudness measurement, silence detection, channel mixing, normalization
Spatial Audio — Dolby Atmos, AC-3, E-AC-3, multi-channel 5.1/7.1.4, Hi-Res audio 96/192 kHz
HDR & Ultra-resolution — HDR10, Dolby Vision, HLG, VIDEO-RANGE signaling, 4K/8K support
Live DRM — FairPlay Streaming with per-segment key rotation, session keys
Accessibility — CEA-608/708 closed captions, live WebVTT subtitles, audio description
Resilience — Redundant streams, content steering, gap signaling, automatic failover
CLI — 10 command-line commands for common HLS workflows, including `live`, `iframe`, `imsc1` and `mvhevc`
Strict concurrency — All public types are `Sendable`, Swift 6.2 strict concurrency throughout
Transport Contract v2 — `QualityAwareTransport`, `AdaptiveBitrateTransport`, `RecordingTransport` — quality monitoring, automatic ABR, multi-destination health dashboard
MV-HEVC Spatial Video — Stereoscopic packaging for Apple Vision Pro with `MVHEVCPackager`, Dolby Vision Profile 8/20, `REQ-VIDEO-LAYOUT`
IMSC1 Subtitles — W3C TTML parsing, rendering, fMP4 segmentation with `IMSC1Parser`, `IMSC1Renderer`, `IMSC1Segmenter`
Variable Substitution — `EXT-X-DEFINE` with NAME/VALUE, IMPORT, QUERYPARAM for CDN templating
Video Projections — `REQ-VIDEO-LAYOUT` with 360°, 180°, Apple Immersive Video via `VideoLayoutDescriptor`
The VOD Pipeline in 30 Seconds
HLSEngine is the facade that orchestrates VOD operations. In a few lines, you parse a manifest, validate its conformance, and segment a file. No complex config, no boilerplate:
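A sketch of what those three operations look like — the method names and signatures here are assumptions for illustration, not verbatim HLSKit API:

```swift
import HLSKit

let engine = HLSEngine()

// Parse an M3U8 manifest into typed models (method names are illustrative).
let manifest = try await engine.parseManifest(at: URL(fileURLWithPath: "master.m3u8"))

// Validate the manifest against RFC 8216 and Apple HLS rules.
let report = try await engine.validate(manifest)

// Segment an MP4 into fMP4 segments, an init segment, and a playlist.
let result = try await engine.segment(URL(fileURLWithPath: "episode.mp4"), format: .fmp4)
```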
Three operations, three lines of code each. The parser returns typed Swift models. The validator returns a structured report. The segmenter returns the segments, the generated playlist, and the initialization segment. No magic strings, no casting — idiomatic Swift.
The Live Pipeline
This is the big addition in 0.3.0. LivePipeline orchestrates a complete real-time stream: a media source (microphone, camera, file) feeds an encoder, which produces encoded frames, which are segmented on the fly, assembled into a continuously updated playlist, and pushed to one or more destinations.
The whole thing is composable. Each pipeline stage is an independent component injected via LivePipelineComponents — you assemble exactly what you need.
The pipeline emits events (LivePipelineEvent) and real-time statistics (LivePipelineStatistics) via AsyncStream. You can monitor the number of segments produced, actual bitrate, buffer health, encoded frames per second — all in real time, without polling.
The state machine (LivePipelineState) manages transitions: idle → starting → running → stopping → stopped. A LivePipelineSummary is produced on stop with the total duration, segment count, and bytes written.
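Assembling and observing a pipeline could look roughly like the following — component initializers and property names are assumptions, shown only to convey the composition model:

```swift
import HLSKit

// Hypothetical assembly — each stage is injected via LivePipelineComponents.
let pipeline = LivePipeline(
    configuration: LivePipelineConfiguration(/* preset or custom config */),
    components: LivePipelineComponents(
        encoder: AudioEncoder(),
        segmenter: AudioSegmenter(),
        playlist: SlidingWindowPlaylist(),
        pusher: HTTPPusher(endpoint: URL(string: "https://cdn.example.com/live/")!)
    )
)

try await pipeline.start()
for await event in pipeline.events {   // AsyncStream of LivePipelineEvent
    print(event)                       // segments produced, bitrate, buffer health…
}
```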
Live Presets
To simplify getting started, LivePipelineConfiguration offers preconfigured presets for common use cases. Each preset configures encoding, segmentation, playlist, and scenario-specific options:
Audio:
| Preset | Description |
|---|---|
|  | AAC 128 kbps, 48 kHz stereo, 6s MPEG-TS, sliding window (5), -16 LUFS |
|  | AAC 256 kbps, 4s fMP4, sliding window (8), LL-HLS with 1s parts |
|  | AAC 320 kbps, 4s fMP4, event playlist (full set replay), recording |
|  | AAC 320 kbps, 4s fMP4, sliding window (10), DVR 6h, recording |
|  | AAC 48 kbps mono 22 kHz, 10s MPEG-TS, sliding window (3) — voice over weak connections |
|  | AAC 128 kbps, 6s fMP4, sliding window (6), -16 LUFS, PROGRAM-DATE-TIME |
|  | AAC 192 kbps, 6s fMP4, -23 LUFS (EBU R 128), DVR 2h, recording |
|  | AAC 128 kbps, 6s fMP4, event playlist (no segment eviction), recording |
Video:
| Preset | Description |
|---|---|
|  | 1920×1080 30fps 4 Mbps + AAC 128 kbps, 6s fMP4, LL-HLS (0.5s parts) |
|  | 1280×720 30fps 2 Mbps, 4s fMP4, full LL-HLS (0.33s, preload hints, delta, blocking) |
|  | 1920×1080 30fps 4 Mbps, 6s fMP4, sliding window (5) — add destinations via pipeline |
|  | 3840×2160 30fps 15 Mbps + AAC 192 kbps, 6s fMP4, LL-HLS (0.5s parts) |
|  | 3840×2160 30fps 15 Mbps, 4s fMP4, full LL-HLS (0.33s, preload, delta, blocking) |
|  | 1280×720 30fps 1.5 Mbps, 6s fMP4, -16 LUFS, recording — interviews, talking heads |
|  | 1920×1080 30fps 4 Mbps, 6s fMP4, DVR 4h, LL-HLS (0.5s), recording |
|  | 1280×720 15fps 1 Mbps + AAC 96 kbps, 6s fMP4, event playlist, recording |
Pro — Spatial Audio, HDR, DRM, Accessibility:
| Preset | Description |
|---|---|
|  | AAC 128 kbps + E-AC-3 384 kbps Dolby Atmos 5.1, stereo fallback |
|  | AAC 256 kbps + ALAC lossless 96 kHz/24-bit |
|  | 1920×1080 HDR10, HEVC Main10, SDR fallback |
|  | 3840×2160 Dolby Vision Profile 8, HEVC |
|  | 7680×4320 HEVC Main10 |
|  | FairPlay CBCS, key rotation every 10 segments |
|  | FairPlay + Widevine + PlayReady, rotation every 10 segments |
|  | MV-HEVC stereoscopic for Apple Vision Pro, 1080p (or 4K with a dedicated preset) |
|  | CEA-708 EN/ES + audio description EN + WebVTT |
|  | Atmos 5.1 + Dolby Vision 4K + FairPlay + CEA-708 (EN/ES/FR) + audio desc + recording |
Low-Latency HLS
Classic HLS has a latency of 15 to 30 seconds — the time to accumulate several complete segments. Low-Latency HLS (LL-HLS) reduces this to under 2 seconds by splitting segments into smaller parts (PartialSegment) and allowing the player to request them before the segment is complete.
HLSKit implements the full LL-HLS spec. LLHLSManager orchestrates the production of partial segments and announces server capabilities (CAN-BLOCK-RELOAD, PART-HOLD-BACK). BlockingPlaylistHandler handles CAN-BLOCK-RELOAD — the player makes a blocking request and the server responds only once the next part is ready. DeltaUpdateGenerator produces delta playlists (EXT-X-SKIP) to reduce refresh bandwidth.
The beauty of the system is that LL-HLS is an optional add-on to the existing live pipeline. You enable low-latency components via LivePipelineComponents, and the pipeline handles the rest — partial segments, preload hints, server control, everything is coordinated automatically.
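In the generated media playlist, low latency shows up as a handful of extra tags — an illustrative excerpt (durations and URIs are made up):

```
#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.0,CAN-SKIP-UNTIL=24.0
#EXT-X-PART-INF:PART-TARGET=0.5
#EXT-X-PART:DURATION=0.5,URI="seg42.part0.mp4",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.5,URI="seg42.part1.mp4"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="seg42.part2.mp4"
```

The player can request `seg42.part2.mp4` before the full segment 42 even exists — that's where the sub-2-second latency comes from.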
Multi-destination Push
A live stream is useless if it stays on your disk. SegmentPusher sends segments to one or more destinations as they are produced. HTTPPusher does HTTP PUT to any endpoint — a CDN, an Nginx server, an S3 bucket.
For more advanced cases, the architecture is open by design. The RTMPTransport, SRTTransport, and IcecastTransport protocols are defined in HLSKit, but concrete implementations are provided by separate libs — swift-rtmp-kit, swift-srt-kit, swift-icecast-kit. It's clean dependency injection: HLSKit stays zero-dependency, your app imports the transports it needs.
MultiDestinationPusher handles parallel delivery. BandwidthMonitor tracks actual bandwidth per destination. If one destination fails, the others continue — no single point of failure.
Real-time Metadata
A live stream without metadata is a blind pipe. HLSKit injects metadata into the stream during production — without interrupting the pipeline.
SCTE-35 for ad breaks: SCTE35Marker inserts EXT-X-CUE-OUT / EXT-X-CUE-IN signals at precise moments. DATE-RANGE for temporal events — chapter start, program change, news alert. ID3 for timed metadata — title, artist, album art synchronized with audio. Interstitials for HLS breaks — ad pause or inserted content with EXT-X-ASSET-URI and resume configuration.
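An ad break injected mid-stream surfaces in the playlist as tags like these (IDs, dates, and the SCTE-35 payload are illustrative):

```
#EXT-X-DATERANGE:ID="ad-break-1",START-DATE="2025-03-01T20:15:00.000Z",PLANNED-DURATION=30.0,SCTE35-OUT=0xFC302000
#EXT-X-CUE-OUT:30.0
#EXTINF:6.0,
seg100.ts
#EXT-X-CUE-IN
```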
Recording and Live-to-VOD
While a live stream is being broadcast, SimultaneousRecorder records everything in parallel — segments and metadata — to a local directory. When the stream stops, LiveToVODConverter transforms the recording into a standard VOD playlist with automatic chapters.
Chaptering is handled by AutoChapterGenerator — SCTE-35 cut points or DATE-RANGEs become chapters in the VOD version. Your 2-hour live stream becomes a chaptered VOD file in a single operation.
I-Frame Playlists
IFramePlaylistGenerator produces EXT-X-I-FRAMES-ONLY playlists — playlists containing only keyframes, used for trick play (fast forward, rewind) and thumbnail generation. ThumbnailExtractor extracts thumbnails at regular intervals for player visual timelines.
Audio Processing
The audio module offers five indispensable tools for professional broadcast streams.
AudioFormatConverter converts between formats: MP3 → M4A, WAV → AAC, FLAC → ALAC. LoudnessMeter measures integrated loudness in LUFS according to the EBU R128 standard — essential for broadcast compliance. SilenceDetector identifies silence ranges in an audio stream, useful for automatic chaptering or signal loss detection in live. ChannelMixer handles channel mixing — stereo to mono, 5.1 to stereo, automatic upmix. And AudioNormalizer applies loudness normalization to target standards (-16 LUFS for podcasts, -23 LUFS for broadcast EBU R128).
Spatial Audio, HDR and Hi-Res
HLSKit speaks the language of professional formats.
On the audio side: Dolby Atmos (via Dolby Digital Plus JOC), AC-3, E-AC-3, multi-channel 5.1 and 7.1.4, Hi-Res audio at 96 or 192 kHz, 24 or 32 bits, with ALAC and FLAC lossless. Generated HLS manifests include CHANNELS attributes and properly configured alternative audio renditions.
On the video side: HDR10, Dolby Vision, HLG, with VIDEO-RANGE signaling in manifests. Resolution support goes up to 8K, with CODECS attributes that precisely reflect the encoded profiles. Everything is transparent to the pipeline — you configure capabilities in LivePipelineConfiguration, and manifests are generated with the right attributes.
Live DRM
For protected live content, HLSKit implements FairPlay Streaming integration with per-segment key rotation. Each segment can have its own key, and EncryptionKey allows the player to download the key only once for an entire session. The DRM + LL-HLS combination is supported — partial segments inherit the key from their parent segment.
Accessibility
Accessibility is not a bonus — it's a legal obligation for many broadcasters. HLSKit generates CLOSED-CAPTIONS tags for CEA-608 and CEA-708, live WebVTT subtitle tracks, and audio description renditions. Produced manifests are compliant with Apple's accessibility requirements for App Store distribution.
Transport Contract v2
Major addition in 0.4.0: the live pipeline is now aware of its transport quality. Three protocols define the contract between HLSKit and the transport layers — QualityAwareTransport for real-time quality monitoring, AdaptiveBitrateTransport for ABR recommendations, and RecordingTransport for transport-side local recording.
TransportAwarePipelinePolicy configures the pipeline's behavior in response to transport signals: automatic bitrate adjustment, minimum quality threshold, and ABR responsiveness. TransportHealthDashboard aggregates health across all destinations in real time — healthy, degraded, and failed destination counts, with a worst-case overall grade.
Five quality levels (excellent, good, fair, poor, critical) are derived from the transport score (0.0 to 1.0). Three ABR responsiveness levels are available: conservative (3 consecutive recommendations before adjustment), responsive (2) and immediate (1). Companion transports — swift-rtmp-kit, swift-srt-kit, swift-icecast-kit — implement these protocols with their native metrics: RTT and packet loss for SRT (with SMPTE 2022 FEC and multi-link bonding), stream statistics for Icecast (with 6 authentication modes including digest, bearer and shoutcastV2), enhanced RTMP capabilities for RTMP. On the capture side, swift-capture-kit provides a transport-agnostic StreamingPipeline that feeds these transports with hardware-encoded audio and video.
MV-HEVC Spatial Video
Apple Vision Pro speaks a specific video language: MV-HEVC (Multi-View HEVC), a stereoscopic format where left and right views are encoded in a single HEVC stream with multiview extensions. HLSKit 0.4.0 natively supports this format — from HEVC sample packaging to HLS manifest signaling.
MVHEVCSampleProcessor extracts NAL units from an HEVC stream, identifies parameter sets (VPS, SPS, PPS), and parses SPS profiles. MVHEVCPackager creates fMP4 segments — init segment with spatial boxes, media segments with properly encapsulated samples. SpatialVideoConfiguration provides presets: visionProStandard (1080p stereo), visionProHighQuality (4K stereo), dolbyVisionStereo (4K Dolby Vision Profile 20).
IMSC1 Subtitles
IMSC1 (Internet Media Subtitles and Captions) is the W3C profile of TTML used for subtitles in professional HLS workflows. HLSKit 0.4.0 implements the complete pipeline: TTML XML parsing, rendering, and fMP4 segmentation for HLS delivery.
IMSC1Parser parses a TTML document and returns an IMSC1Document with typed subtitles (begin/end/text), percentage-positioned regions, and styles (font, size, color, alignment, outline). IMSC1Renderer serializes the document back to valid TTML. IMSC1Segmenter produces fMP4 segments — init segment with subtitle track metadata, media segments with encapsulated temporal cues.
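A minimal TTML document of the kind such a parser consumes looks like this (content is illustrative):

```xml
<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body>
    <div>
      <p begin="00:00:01.000" end="00:00:04.000">Hello, world.</p>
      <p begin="00:00:05.000" end="00:00:08.500">Subtitles via IMSC1.</p>
    </div>
  </body>
</tt>
```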
The parser handles timecodes in HH:MM:SS.mmm and HH:MM:SS:FF formats. Parsing errors are typed: invalidXML, missingTTElement, invalidTimecode, missingLanguage — no silent crashes, no partial data.
Variable Substitution
HLS variable substitution (EXT-X-DEFINE) allows templating manifests without modifying them for each deployment. It's the standard mechanism for CDN templating — one source manifest, values injected at runtime.
HLSKit supports all three forms defined by the spec: NAME/VALUE (inline definition), IMPORT (import from a parent manifest), and QUERYPARAM (extraction from a URL parameter). The VariableResolver resolves {$variable} references in URIs and attributes, with a strict mode that detects undefined variables.
The HLS validator automatically checks variables: undefined references, duplicate definitions, IMPORT without multivariant context. Manifest generation includes EXT-X-DEFINE tags when VariableDefinition objects are attached to the playlist.
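In a playlist, the three forms of EXT-X-DEFINE look like this (names and values are illustrative):

```
#EXT-X-DEFINE:NAME="cdn-host",VALUE="edge1.example.com"
#EXT-X-DEFINE:IMPORT="session-id"
#EXT-X-DEFINE:QUERYPARAM="token"
#EXTINF:6.0,
https://{$cdn-host}/seg0.mp4?token={$token}
```

Each `{$variable}` reference is resolved at playback time — the source manifest never changes per deployment.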
Video Projections
The REQ-VIDEO-LAYOUT tag signals to the HLS player the type of video content and its projection. It's essential for spatial video (Apple Vision Pro) and immersive video (360°, 180°).
VideoLayoutDescriptor combines a VideoChannelLayout (mono, stereo) and a VideoProjection (rectilinear, equirectangular, half-equirectangular, Apple Immersive Video). Presets cover common cases:
| Preset | REQ-VIDEO-LAYOUT Value | Usage |
|---|---|---|
|  |  | Standard stereoscopic video |
|  |  | Classic 2D video |
|  |  | 360° equirectangular video |
|  |  | 180° stereo for Apple Vision Pro |
|  |  | Apple Immersive Video |
Parsing and Generating Manifests
The parser reads any M3U8 manifest — master playlists, media playlists, and all HLS v7+ extensions including Low-Latency HLS. It returns a Manifest that is either .master(MasterPlaylist) or .media(MediaPlaylist). Every variant, every segment, every tag is modeled by a typed Swift struct.
For generation, two approaches. The imperative API if you're building playlists dynamically:
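A sketch of the imperative style — type and method names here are assumptions, not verbatim API:

```swift
import HLSKit

// Hypothetical imperative construction of a media playlist.
var playlist = MediaPlaylist(targetDuration: 6)
playlist.segments.append(MediaSegment(uri: "seg0.mp4", duration: 6.0))
playlist.segments.append(MediaSegment(uri: "seg1.mp4", duration: 6.0))
let m3u8 = try ManifestGenerator().generate(.media(playlist))
```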
And the @resultBuilder DSL if you prefer declarative syntax:
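Roughly in this spirit — the builder component names below are illustrative, not the library's actual DSL vocabulary:

```swift
import HLSKit

// Hypothetical @resultBuilder DSL for the same playlist.
let playlist = MediaPlaylist {
    TargetDuration(6)
    Segment(uri: "seg0.mp4", duration: 6.0)
    Segment(uri: "seg1.mp4", duration: 6.0)
}
```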
Both approaches produce spec-compliant M3U8. The generator handles formatting, tags, attributes, and all the serialization details that RFC 8216 mandates.
RFC 8216 Validation
A manifest can be syntactically valid but semantically wrong. HLSValidator checks playlists against two rule sets: RFC 8216 (the IETF standard) and Apple HLS rules (stricter on certain points). Each violation is classified as error, warning, or info.
The structured report tells you exactly what's wrong and why. No generic "invalid playlist" message — each rule has an identifier, a message, and a severity level. You know what's blocking before deploying, not after.
Segmentation
This is the core of the HLS pipeline: taking an MP4 file and splitting it into segments ready to be served. HLSKit offers two output formats — each with its own advantages.
Fragmented MP4 (fMP4) is the modern format recommended by Apple. Each segment is an independent MP4 fragment, preceded by an initialization segment (init.mp4) containing the track metadata. It's the most efficient format, the one used by modern CDNs.
MPEG-TS is the historical format. Each segment is self-contained with its own metadata — heavier, but compatible with absolutely all players, including the oldest ones.
A third mode, byte-range, allows splitting a file into logical segments without physically duplicating it. Segments are byte ranges within a single file — useful when storage is constrained.
In live, two additional segmenters handle real-time: AudioSegmenter for pure audio streams and VideoSegmenter for video. They consume encoded frames on the fly and produce segments as soon as the target duration is reached, without waiting for the stream to end.
The segmenter reads the MP4 file at the box level (the structural blocks of the ISO BMFF format), analyzes the sample tables to find optimal cut points (keyframes), and produces segments aligned on sample boundaries. No audio glitches, no missing frames — the cut is surgical.
For non-ISOBMFF formats (MP3, WAV, FLAC), HLSKit automatically detects the incompatibility and transcodes the file to M4A before segmenting — total transparency for the caller.
HEVC fMP4 & B-Frames (0.6.0)
The live fMP4 segmenter (CMAFWriter) now supports HEVC/H.265 natively. When the VideoConfig codec is .h265, the initialization segment generates an hev1 sample entry with an hvcC box conforming to ISO 14496-15 §8.3 — profile, tier, level, and the three NALU arrays (VPS, SPS, PPS) parsed automatically from the stream parameters.
B-frame support is added via compositionTimeOffset in EncodedFrame. Codecs that reorder frames (HEVC, H.264 High Profile) produce frames whose decode order differs from display order — the compositionTimeOffset (PTS − DTS) encodes this difference in each trun box of the fMP4 segment. When nil, PTS == DTS is assumed (no reordering).
Duration accumulation in the live segmenter now uses integers (Int64 ticks + Int32 timescale) instead of Double. When the timescale is constant — which is the case 99% of the time in live — addition is done in integer ticks, and conversion to seconds only happens for comparison with the target duration. Result: no more floating-point drift that could cause TARGETDURATION to exceed the configured value after several hours of streaming.
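The effect is easy to demonstrate outside HLSKit. This self-contained sketch (not library code) accumulates one million AAC-frame durations both ways; the floating-point sum typically ends up a hair off the exact value, while the tick sum is exact by construction:

```swift
// One AAC frame is 1024 samples; at 48 kHz that's 1024/48000 s,
// which is not exactly representable as a Double.
let timescale: Int64 = 48_000
let frameTicks: Int64 = 1024

var doubleSum = 0.0
var tickSum: Int64 = 0
for _ in 0..<1_000_000 {                       // roughly 6 hours of audio
    doubleSum += Double(frameTicks) / Double(timescale)  // drifts slowly
    tickSum += frameTicks                                // exact integer math
}
let exact = Double(tickSum) / Double(timescale) // convert once, at the end
print(exact - doubleSum)                        // accumulated rounding error
```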
Async segmentTransform (0.5.0)
IncrementalSegmenter now accepts an async segmentTransform closure. Before 0.5.0, this closure was synchronous — forcing the use of locks (NSLock, @unchecked Sendable) for muxing operations that touch actors. The closure is now @Sendable (LiveSegment, [EncodedFrame]) async -> LiveSegment, allowing direct actor calls. Synchronous closures are still accepted — the migration is non-breaking.
Local Transcoding
Segmenting a file is good. But often, you also need to encode it — change the resolution, codec, bitrate. HLSKit offers two local transcoders, both conforming to the Transcoder protocol.
AppleTranscoder uses VideoToolbox, Apple's low-level framework for hardware encoding. It leverages your Mac's or iPhone's dedicated chips to encode H.264/HEVC much faster than software — with minimal power consumption. Available only on Apple platforms.
FFmpegTranscoder wraps the FFmpeg binary installed on the system. Cross-platform, Linux-compatible, supports virtually all existing codecs. It's the default server choice.
Both implement the same Transcoder protocol. Your calling code doesn't change — you can switch between Apple and FFmpeg by changing a single line. Quality presets (.p360, .p480, .p720, .p1080, .p2160, .audioOnly) are shared.
For multi-variant (adaptive bitrate), transcodeVariants() encodes to multiple qualities in a single pass and generates the master playlist automatically.
Automatic source content detection (video or audio-only) adjusts the preset accordingly — no manual configuration needed.
Cloud Transcoding
On a server, you have neither GPU nor FFmpeg. Installing FFmpeg on a minimal cloud instance is possible but not always desirable — it complicates deployment, consumes CPU, and doesn't scale.
ManagedTranscoder solves this problem by delegating transcoding to a cloud service. Cloudflare Stream, AWS MediaConvert, or Mux — you choose the provider, the lib handles everything: uploading the source file, creating the job, polling the status, downloading the result.
Most importantly: ManagedTranscoder implements the same Transcoder protocol as local transcoders. Your calling code doesn't know — and doesn't need to know — whether transcoding happens on your machine or in a datacenter 5,000 km away.
| Provider | Authentication | Ideal For |
|---|---|---|
| Cloudflare Stream | API token (Bearer) | Zero egress bandwidth cost, global CDN |
| AWS MediaConvert | Access key + secret (SigV4) | Enterprise, existing AWS infrastructure |
| Mux | Token ID + secret (Basic Auth) | Simplest API, automatic adaptive bitrate |
Streaming Upload/Download
A 500 MB video file — you don't want to load it all into RAM at once. Streamed upload and download send and receive data directly from disk, never loading the complete file into memory. The progress callback reports progress granularly through 5 phases: upload (0-30%), job creation (30%), polling (30-80%), download (80-95%), complete (100%).
Job Lifecycle
Under the hood, each cloud transcoding operation follows a precise lifecycle: queued → processing → completed | failed | cancelled. ManagedTranscodingJob encapsulates this state with a jobID, an assetID, encoding progress, output URLs when complete, and an error message when not.
Polling is configurable: interval between checks (pollingInterval, default 5 seconds) and global timeout (timeout, default 1 hour). By default, cloud assets are deleted after download (cleanupAfterDownload = true) to avoid residual storage costs.
Encryption
HLS supports two encryption modes to protect content — and HLSKit implements both.
AES-128 encrypts each segment entirely with AES-128-CBC. It's the standard mode, supported by all players. Simple, robust, with possible key rotation per segment.
SAMPLE-AES is finer-grained: it encrypts at the individual sample level — NAL units for H.264 video, ADTS frames for AAC audio. The container remains readable (headers, metadata), only the media content is encrypted. This is the mode used by DRMs like FairPlay.
KeyManager generates cryptographically secure AES-128 keys and derives IVs according to RFC 8216 (segment sequence number in big-endian on 16 bytes). It also handles reading and writing key files — the 16-byte binary file that the player downloads to decrypt segments.
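The IV rule itself is small enough to sketch standalone — this is not KeyManager's code, just the RFC 8216 derivation it implements (media sequence number as a big-endian integer, left-padded to 16 bytes):

```swift
import Foundation

// RFC 8216: absent an explicit IV attribute, the IV is the segment's
// media sequence number, big-endian, zero-padded to 128 bits.
func defaultIV(mediaSequence: UInt64) -> [UInt8] {
    var iv = [UInt8](repeating: 0, count: 16)   // 16 zero bytes
    withUnsafeBytes(of: mediaSequence.bigEndian) { bytes in
        for (i, b) in bytes.enumerated() { iv[8 + i] = b }  // low 8 bytes
    }
    return iv
}

print(defaultIV(mediaSequence: 7).map { String(format: "%02x", $0) }.joined())
```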
MP4 Inspection
Before segmenting or transcoding a file, you need to know what it contains. MP4BoxReader parses the ISO BMFF container structure — the famous "boxes" (or "atoms" in QuickTime vocabulary) that organize an MP4 file.
MP4InfoParser extracts useful information: audio/video tracks, codecs, resolution, duration, bitrate, timescale. All without decoding the media — just reading the container metadata.
Going deeper, SampleTableParser reads the container's sample tables: the tables that map each sample (video frame, audio packet) to its position in the file, its timestamp, and its size. It's thanks to these tables that the segmenter knows exactly where to cut — on a keyframe, without breaking the timeline.
The CLI
10 commands for common HLS workflows, directly from the terminal. Handy for scripting, CI/CD, or simply testing quickly without writing code.
| Command | What It Does |
|---|---|
|  | Inspect an MP4 or M3U8 file (tracks, codec, duration, segments) |
|  | Split an MP4 into fMP4 or MPEG-TS segments |
|  | Transcode to one or more HLS variants |
|  | Validate a manifest against RFC 8216 and Apple rules |
|  | Encrypt HLS segments with AES-128 or SAMPLE-AES |
|  | Parse or generate M3U8 manifests (2 subcommands) |
| `live` | Full live pipeline: start, stop, stats, convert-to-vod, metadata |
| `iframe` | Generate I-Frame playlists for trick play and thumbnails |
| `imsc1` | IMSC1 subtitle pipeline: parse, render, segment (3 subcommands) |
| `mvhevc` | MV-HEVC spatial video: package, info (2 subcommands) |
Architecture
The lib is split into functional modules, each responsible for one stage of the pipeline. Zero external dependencies in the core HLSKit — only the CLI uses swift-argument-parser for argument parsing.
Each module can be used independently. Only need the parser? Import HLSKit and use ManifestParser. Need to segment without transcoding? MP4Segmenter is standalone. Need a live audio pipeline? Assemble AudioEncoder + AudioSegmenter + SlidingWindowPlaylist + HTTPPusher in a LivePipeline. Everything is tied together by two facades: HLSEngine for VOD, and LivePipeline for real-time.
Tests
5,165 tests across 617 suites — models, parser, generator, validator, segmenter, transcoder, encryption, container, transport, engine, encoder, live segmenter, live playlist, LL-HLS, push, metadata, recording, I-Frame, audio, spatial audio, HDR, DRM, accessibility, resilience, pipeline, subtitles, spatial video, CLI, end-to-end. 94.66% coverage. Zero XCTest — 100% Swift Testing (import Testing).
Showcase tests serve as executable documentation: every public API has at least one test showing how to use it. The code examples in DocC and in this article are taken from these tests — what's written here has been compiled and executed.
| Category | Focus |
|---|---|
| Model | Type conformances, Codable round-trip, HLS models |
| Parser | Master/media playlists, LL-HLS, byte-range, encryption tags |
| Generator | M3U8 output, builder DSL, tag writing |
| Validator | RFC 8216, Apple HLS rules, severity levels |
| Segmenter | fMP4, MPEG-TS, byte-range, config, playlist generation |
| Transcoder | Quality presets, Apple/FFmpeg/Managed availability, multi-variant |
| Encryption | AES-128, SAMPLE-AES, key management, round-trip |
| Container | MP4 box reading, sample tables, init/media segment writing |
| Transport | TS packets, PAT/PMT, PES, ADTS/AnnexB conversion |
| Engine | HLSEngine facade, segmentation, encryption, manifest operations |
| Encoder | Real-time AAC AudioEncoder, H.264/HEVC VideoEncoder |
| Live Segmenter | AudioSegmenter, VideoSegmenter, CMAFWriter |
| Live Playlist | SlidingWindowPlaylist, EventPlaylist, DVRBuffer |
| Low-Latency | LLHLSManager, BlockingPlaylistHandler, DeltaUpdateGenerator |
| Push | HTTPPusher, MultiDestinationPusher, BandwidthMonitor |
| Metadata | SCTE-35, DATE-RANGE, ID3, interstitials |
| Recording | SimultaneousRecorder, LiveToVODConverter, AutoChapterGenerator |
| I-Frame | IFramePlaylistGenerator, ThumbnailExtractor |
| Audio | AudioFormatConverter, LoudnessMeter, SilenceDetector |
| Spatial Audio | Dolby Atmos, multi-channel, Hi-Res, renditions |
| HDR | HDR10, Dolby Vision, HLG, VIDEO-RANGE, ultra-resolution |
| DRM | FairPlay live, key rotation, session keys |
| Accessibility | CEA-608/708, WebVTT, audio description |
| Resilience | Redundant streams, content steering, gap signaling |
| Subtitles | IMSC1Parser, IMSC1Renderer, IMSC1Segmenter, TTML round-trip |
| Spatial | MVHEVCPackager, MVHEVCSampleProcessor, SpatialVideoConfiguration |
| Pipeline | LivePipeline, components, presets, statistics, transport monitoring |
| Showcase | Public API demonstrations (executable documentation) |
| CLI | 10 commands, argument parsing, integration |
| EndToEnd | Cross-feature integration scenarios |
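A showcase test in that spirit generates a manifest, parses it back, and asserts the round-trip. The generate/parse functions below are stand-ins, not HLSKit's ManifestGenerator and ManifestParser API, and the real suite expresses this with Swift Testing's @Test and #expect rather than plain assertions.

```swift
import Foundation

// Stand-in model and generate/parse pair for a minimal media playlist.
// Illustrative only; HLSKit's real types are richer than this.
struct MediaSegment: Equatable { let uri: String; let duration: Double }

func generate(targetDuration: Int, segments: [MediaSegment]) -> String {
    var lines = ["#EXTM3U", "#EXT-X-VERSION:7", "#EXT-X-TARGETDURATION:\(targetDuration)"]
    for s in segments {
        lines.append("#EXTINF:\(s.duration),")
        lines.append(s.uri)
    }
    lines.append("#EXT-X-ENDLIST")
    return lines.joined(separator: "\n")
}

func parse(_ m3u8: String) -> [MediaSegment] {
    var result: [MediaSegment] = []
    var pendingDuration: Double?
    for line in m3u8.split(separator: "\n") {
        if line.hasPrefix("#EXTINF:") {
            // "#EXTINF:6.0," — the duration precedes the first comma.
            let value = line.dropFirst("#EXTINF:".count).split(separator: ",").first ?? ""
            pendingDuration = Double(value)
        } else if !line.hasPrefix("#"), let duration = pendingDuration {
            // A non-tag line after EXTINF is the segment URI.
            result.append(MediaSegment(uri: String(line), duration: duration))
            pendingDuration = nil
        }
    }
    return result
}
```

Because the example is a test, it doubles as documentation: the assertion pins down exactly what the round-trip guarantees.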
Installation
Requirements
- Swift 6.2+ with strict concurrency
- Lib: macOS 14+, iOS 17+, tvOS 17+, watchOS 10+, visionOS 1+
- CLI: macOS 14+, Linux (Ubuntu 22.04+)
- Zero external dependencies in the core lib (swift-argument-parser for CLI only)
Swift Package Manager
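A minimal Package.swift sketch for adding the dependency. The package URL comes from the project's GitHub link and the product name HLSKit from the import statements in this article; verify both against the repository before relying on them.

```swift
// swift-tools-version: 6.2
import PackageDescription

let package = Package(
    name: "MyStreamingApp", // hypothetical consumer package
    platforms: [.macOS(.v14), .iOS(.v17)],
    dependencies: [
        // Pin to the 0.4.0 line described in this article.
        .package(url: "https://github.com/atelier-socle/swift-hls-kit.git", from: "0.4.0")
    ],
    targets: [
        .executableTarget(
            name: "MyStreamingApp",
            dependencies: [.product(name: "HLSKit", package: "swift-hls-kit")]
        )
    ]
)
```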
Documentation
Complete documentation is available in DocC, integrated with the package. 33 guides cover each pipeline stage — VOD, live, spatial video, subtitles, transport v2 — with executable examples.
| Guide | Content |
|---|---|
| Getting Started | Installation, first workflow, builder DSL, live example |
| Manifest Parsing | ManifestParser, TagParser, AttributeParser, error handling |
| Manifest Generation | ManifestGenerator, TagWriter, builder DSL, LL-HLS models |
| Validating Manifests | HLSValidator, rule sets, severity levels, reports |
| Segmenting Media | MP4Segmenter, TSSegmenter, byte-range, auto-transcode non-ISOBMFF |
| Transcoding Media | Quality presets, Apple/FFmpeg transcoders, multi-variant, auto detection |
| Cloud Transcoding | ManagedTranscoder, Cloudflare/AWS/Mux providers, streaming upload |
| Encrypting Segments | AES-128, SAMPLE-AES, KeyManager, key rotation |
| HLSEngine | High-level facade for end-to-end VOD workflows |
| CLI Reference | 10 commands with options, examples, JSON config |
| Live Streaming | Live pipeline overview, architecture, use cases |
| Live Encoding | MediaSource, AudioEncoder, VideoEncoder, MultiBitrateEncoder |
| Live Segmentation | LiveSegmenter, AudioSegmenter, VideoSegmenter, CMAFWriter |
| Live Playlists | LivePlaylistManager, DVRBuffer, sliding window, event playlist |
| Low-Latency HLS | LLHLSManager, BlockingPlaylistHandler, DeltaUpdateGenerator, partial segments |
| Segment Pushing | HTTPPusher, multi-destination, transport DI (RTMP/SRT/Icecast) |
| Live Metadata | SCTE-35, DATE-RANGE, ID3, interstitials, real-time injection |
| Live Recording | SimultaneousRecorder, live-to-VOD, automatic chaptering |
| I-Frame Playlists | IFramePlaylistGenerator, ThumbnailExtractor, trick play |
| Audio Processing | Format conversion, LUFS loudness, silence detection |
| Spatial Audio | Dolby Atmos, AC-3, multi-channel, Hi-Res 96/192 kHz |
| HDR Video | HDR10, Dolby Vision, HLG, VIDEO-RANGE, 4K/8K |
| Live DRM | FairPlay live, key rotation, session keys |
| Accessibility & Resilience | CEA-608/708, WebVTT, failover, gap signaling |
| Live Presets | LivePipeline presets, configuration, statistics, lifecycle |
| Transport Contracts v2 | Quality monitoring, ABR, RTMP/SRT/Icecast v2 |
| Transport-Aware Pipeline | Pipeline integration with transport quality |
| Variable Substitution | EXT-X-DEFINE, CDN templating, validation |
| IMSC1 Subtitles Guide | TTML parse, render, fMP4 segmentation |
| Spatial Video Guide | MV-HEVC packaging for Apple Vision Pro |
| Video Projection Specifiers | REQ-VIDEO-LAYOUT, 360°, Apple Immersive |
| Testing Guide | Test suite, mock server, CLI scenarios |
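One of the guides above covers EXT-X-DEFINE variable substitution: a playlist declares NAME/VALUE pairs, and the client replaces every {$name} occurrence before resolving URIs, which is what makes CDN templating possible without rewriting the source playlist. The mechanism can be sketched as follows; this is a naive illustration of the tag's behavior, not HLSKit's implementation.

```swift
import Foundation

// Collect NAME="..."/VALUE="..." pairs from EXT-X-DEFINE tags, then replace
// every {$name} occurrence in the playlist text. A deliberately minimal sketch:
// it ignores the spec's IMPORT/QUERYPARAM forms and does no validation.
func substituteVariables(in playlist: String) -> String {
    var definitions: [String: String] = [:]
    for line in playlist.split(separator: "\n") where line.hasPrefix("#EXT-X-DEFINE:") {
        let attributes = line.dropFirst("#EXT-X-DEFINE:".count)
        var name: String?
        var value: String?
        for attribute in attributes.split(separator: ",") {
            let parts = attribute.split(separator: "=", maxSplits: 1)
            guard parts.count == 2 else { continue }
            let unquoted = parts[1].trimmingCharacters(in: CharacterSet(charactersIn: "\""))
            if parts[0] == "NAME" { name = unquoted }
            else if parts[0] == "VALUE" { value = unquoted }
        }
        if let name, let value { definitions[name] = value }
    }
    var result = playlist
    for (name, value) in definitions {
        result = result.replacingOccurrences(of: "{$\(name)}", with: value)
    }
    return result
}
```

With a definition like NAME="cdn", VALUE="https://edge.example.com" (a hypothetical host), the segment URI {$cdn}/seg0.m4s resolves to a full edge URL, so the same source playlist can be retargeted per CDN by changing one tag.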
References
HLSKit builds on 31 specifications and standards, from RFC 8216 and the LL-HLS extension to CMAF, SCTE-35, FairPlay, and W3C TTML/IMSC1.
Under the Hood
Swift 6.2 — Strict concurrency, all types Sendable, cross-platform thread-safe withLockedState<T>
5,165 tests — 617 suites, 94.66% overall coverage, 100% Swift Testing
33 DocC articles — Complete documentation with executable examples for VOD, live, spatial video and subtitles
6 platforms — macOS, iOS, tvOS, watchOS, visionOS, Linux
~30 modules — Parsing, generation, validation, segmentation, transcoding, encryption, live pipeline, IMSC1 subtitles, MV-HEVC spatial video, transport v2
31 industry standards — RFC 8216, LL-HLS, CMAF, SCTE-35, Dolby Atmos, HDR10, FairPlay, CEA-608/708, EBU R128, W3C TTML/IMSC1…
Zero dependencies — Pure Swift + Foundation in the core
Apache 2.0 — Permissive open-source license with SPDX headers
Links
GitHub - atelier-socle/swift-hls-kit: Enterprise-grade pure Swift HLS library — parse, segment, transcode, encrypt, stream live with LL-HLS, MV-HEVC spatial video, IMSC1 subtitles & transport-aware ABR. Cross-platform, RFC 8216 compliant