HLSKit
Where It Comes From
When you listen to a podcast or watch a streaming video, there's a protocol behind it you never see: HLS. HTTP Live Streaming, invented by Apple in 2009, has become the de facto standard for audio and video delivery on the web. Your browser, your phone, your Apple TV — everything speaks HLS.
The principle is simple: you split a media file into small segments of a few seconds, write a manifest (an .m3u8 file) that lists these segments in order, and the player downloads them one by one via HTTP. No specialized server, no exotic protocol — just standard HTTP, with all its benefits: CDN, caching, HTTPS.
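Concretely, a minimal VOD media playlist for three six-second segments looks like this (file names are illustrative):

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:6.0,
segment0.ts
#EXTINF:6.0,
segment1.ts
#EXTINF:6.0,
segment2.ts
#EXT-X-ENDLIST
```

The player fetches this file, then requests each listed segment in order over plain HTTP.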
While working on PodcastFeedMaker and the server infrastructure PodcastFeedVapor, an obvious need emerged: handling the media files themselves. Not just RSS metadata, but the actual audio and video files — segmenting them, encoding them, encrypting them, validating them. All from pure Swift, with no external dependency in the lib's core, compatible with macOS, iOS, and Linux.
I looked for a Swift library that did this. None existed. FFmpeg wrappers, yes. HLS players, yes. But a complete pipeline — from manifest parsing to segment encryption through cloud transcoding — in native Swift? Nothing.
HLSKit was born from that void.
The first version covered the end-to-end VOD pipeline: parsing, generation, segmentation, transcoding, encryption. The second added cloud transcoding — Cloudflare, AWS, Mux. But the most ambitious piece was still missing: live. Not a placeholder, not a wrapper — a real real-time pipeline capable of taking an audio or video stream, encoding it, segmenting it, broadcasting it, and serving it in Low-Latency HLS with multi-destination push. That's what HLSKit 0.3.0 does.
0.4.0 pushes the ambition even further. The live pipeline becomes intelligent — real-time transport quality monitoring, automatic bitrate adaptation, multi-destination health dashboard. Stereoscopic MV-HEVC spatial video for Apple Vision Pro is natively supported. IMSC1 subtitles (W3C TTML) are parsed, rendered and segmented into fMP4. And EXT-X-DEFINE variable substitution enables CDN templating without modifying source playlists. This is HLSKit 0.4.0 — Transport Intelligence & Spatial Computing.
0.5.0 fixes a critical bug in the video fMP4 initialization segment — the pre_defined field of the VisualSampleEntry was 4 bytes instead of 2 (ISO 14496-12 §12.1.3), which shifted the avcC box and caused silent rejections by Safari, AVPlayer and FFmpeg. It also introduces an async segmentTransform API in IncrementalSegmenter, allowing actors to be used in segment transformation closures instead of manual locks.
0.6.0 brings full HEVC/H.265 support in the live fMP4 segmenter — hev1 sample entry with hvcC box (ISO 14496-15 §8.3), Video Parameter Set, profile and level parsed from SPS NALUs. B-frame support is added via compositionTimeOffset in EncodedFrame, for codecs where display order differs from decode order. Audio-video sync in VideoSegmenter is fixed through integer-based duration accumulation (Int64 ticks) instead of floating-point, eliminating drift errors that could cause TARGETDURATION to exceed the configured value. This is HLSKit 0.6.0 — HEVC & Precision.
What HLS Actually Is
Before diving into the lib, a quick detour on the protocol for those who've never touched HLS. If you already know it, skip to the next section.
HLS works in three stages. First, you take a media file (MP4, MOV, audio…) and segment it — you split it into 4 to 10 second chunks. Each chunk is an independent file, downloadable via HTTP.
Next, you write a manifest — a text file with the .m3u8 extension — that lists the segments in order with their duration. It's the player's roadmap.
Finally, for multi-quality (adaptive bitrate), you create a master playlist that points to multiple media playlists: one for 360p, one for 720p, one for 1080p. The player automatically chooses the quality based on available bandwidth.
In live, the principle is the same — but segments are produced in real time instead of being pre-split. The manifest is updated with each new segment, and old ones exit a sliding window. Low-Latency HLS goes even further: segments are split into smaller parts and the player can request them before the complete segment is even finished.
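An illustrative master playlist with three variants (bandwidths, codec strings, and URIs are made up):

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360,CODECS="avc1.4d401e,mp4a.40.2"
360p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720,CODECS="avc1.4d401f,mp4a.40.2"
720p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
1080p.m3u8
```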
A master playlist is just a short text file. Each EXT-X-STREAM-INF describes a variant: bandwidth, resolution, codecs. The player reads this file, evaluates its connection, and loads the right playlist. It's that simple — and that complex when you want to do it properly, because RFC 8216, which specifies HLS, spans 50 pages of very precise rules.
What HLSKit Does
HLSKit covers the HLS pipeline end to end — VOD and live. Not just one piece — the entire path, from source file to encrypted segment ready to be served, or from microphone stream to CDN in real time. All in pure Swift, Sendable end to end, with zero external dependencies in the core lib.
Parser — Complete M3U8 manifest reading with typed models for master playlists, media playlists, variants, segments, and all Low-Latency HLS extensions
Generator — Spec-compliant manifest production, with an imperative API and a `@resultBuilder` DSL for declarative playlist building
Validator — Conformance checking against RFC 8216 and Apple HLS rules, with 3 severity levels: error, warning, info
fMP4 Segmenter — MP4 file segmentation into fragmented MP4 segments with initialization segment and auto-generated playlist, H.264 (`avc1`/`avcC`) and HEVC (`hev1`/`hvcC`) support
MPEG-TS Segmenter — Segmentation into MPEG-TS segments for compatibility with legacy players
Byte-range — Byte-range segmentation mode: one file, multiple logical segments
Apple Transcoding — Hardware-accelerated encoding via Apple VideoToolbox (macOS, iOS)
FFmpeg Transcoding — Cross-platform transcoding with quality presets and multi-variant output
Cloud Transcoding — Delegation to Cloudflare Stream, AWS MediaConvert, or Mux — same `Transcoder` protocol, zero local GPU required
AES-128 Encryption — Full segment encryption in AES-128-CBC with key rotation
SAMPLE-AES Encryption — Sample-level encryption for video NAL units and ADTS audio frames
Key Management — Key generation, IV derivation (RFC 8216), and key file I/O
MP4 Inspection — MP4 box reading, track analysis, sample table parsing
Live Pipeline — Complete real-time stream orchestration: source → encoding → segmentation → playlist → push
Low-Latency HLS — Partial segments, `CAN-BLOCK-RELOAD`, `CAN-SKIP-UNTIL`, delta updates, `EXT-X-PRELOAD-HINT`
Multi-destination Push — Segment delivery to one or more HTTP endpoints, with DI transport support for RTMP, SRT and Icecast
Live Metadata — Real-time injection of SCTE-35, DATE-RANGE, ID3, HLS interstitials
Live Recording — Simultaneous recording during streaming, live-to-VOD conversion with automatic chaptering
I-Frame Playlists — `EXT-X-I-FRAMES-ONLY` playlist generation for trick play and thumbnails
Audio Processing — Format conversion, LUFS loudness measurement, silence detection, channel mixing, normalization
Spatial Audio — Dolby Atmos, AC-3, E-AC-3, multi-channel 5.1/7.1.4, Hi-Res audio 96/192 kHz
HDR & Ultra-resolution — HDR10, Dolby Vision, HLG, VIDEO-RANGE signaling, 4K/8K support
Live DRM — FairPlay Streaming with per-segment key rotation, session keys
Accessibility — CEA-608/708 closed captions, live WebVTT subtitles, audio description
Resilience — Redundant streams, content steering, gap signaling, automatic failover
CLI — 10 command-line commands for common HLS workflows, including `live`, `iframe`, `imsc1` and `mvhevc`
Strict concurrency — All public types are `Sendable`, Swift 6.2 strict concurrency throughout
Transport Contract v2 — `QualityAwareTransport`, `AdaptiveBitrateTransport`, `RecordingTransport` — quality monitoring, automatic ABR, multi-destination health dashboard
MV-HEVC Spatial Video — Stereoscopic packaging for Apple Vision Pro with `MVHEVCPackager`, Dolby Vision Profile 8/20, `REQ-VIDEO-LAYOUT`
IMSC1 Subtitles — W3C TTML parsing, rendering, fMP4 segmentation with `IMSC1Parser`, `IMSC1Renderer`, `IMSC1Segmenter`
Variable Substitution — `EXT-X-DEFINE` with NAME/VALUE, IMPORT, QUERYPARAM for CDN templating
Video Projections — `REQ-VIDEO-LAYOUT` with 360°, 180°, Apple Immersive Video via `VideoLayoutDescriptor`
The VOD Pipeline in 30 Seconds
HLSEngine is the facade that orchestrates VOD operations. In a few lines, you parse a manifest, validate its conformance, and segment a file. No complex config, no boilerplate:
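A sketch of what those three operations look like — the method names and signatures here are assumptions for illustration, not verbatim HLSKit API:

```swift
import HLSKit

let engine = HLSEngine()

// Parse an M3U8 manifest into typed models (method names are illustrative).
let manifest = try await engine.parseManifest(at: URL(fileURLWithPath: "master.m3u8"))

// Validate the manifest against RFC 8216 and Apple HLS rules.
let report = try await engine.validate(manifest)

// Segment an MP4 into fMP4 segments, an init segment, and a playlist.
let result = try await engine.segment(URL(fileURLWithPath: "episode.mp4"), format: .fmp4)
```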
Three operations, three lines of code each. The parser returns typed Swift models. The validator returns a structured report. The segmenter returns the segments, the generated playlist, and the initialization segment. No magic strings, no casting — idiomatic Swift.
The Live Pipeline
This is the big addition in 0.3.0. LivePipeline orchestrates a complete real-time stream: a media source (microphone, camera, file) feeds an encoder, which produces encoded frames, which are segmented on the fly, assembled into a continuously updated playlist, and pushed to one or more destinations.
The whole thing is composable. Each pipeline stage is an independent component injected via LivePipelineComponents — you assemble exactly what you need.
The pipeline emits events (LivePipelineEvent) and real-time statistics (LivePipelineStatistics) via AsyncStream. You can monitor the number of segments produced, actual bitrate, buffer health, encoded frames per second — all in real time, without polling.
The state machine (LivePipelineState) manages transitions: idle → starting → running → stopping → stopped. A LivePipelineSummary is produced on stop with the total duration, segment count, and bytes written.
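Assembling and observing a pipeline could look roughly like the following — component initializers and property names are assumptions, shown only to convey the composition model:

```swift
import HLSKit

// Hypothetical assembly — each stage is injected via LivePipelineComponents.
let pipeline = LivePipeline(
    configuration: LivePipelineConfiguration(/* preset or custom config */),
    components: LivePipelineComponents(
        encoder: AudioEncoder(),
        segmenter: AudioSegmenter(),
        playlist: SlidingWindowPlaylist(),
        pusher: HTTPPusher(endpoint: URL(string: "https://cdn.example.com/live/")!)
    )
)

try await pipeline.start()
for await event in pipeline.events {   // AsyncStream of LivePipelineEvent
    print(event)                       // segments produced, bitrate, buffer health…
}
```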
Live Presets
To simplify getting started, LivePipelineConfiguration offers preconfigured presets for common use cases. Each preset configures encoding, segmentation, playlist, and scenario-specific options:
Audio:
| Preset | Description |
|---|---|
|  | AAC 128 kbps, 48 kHz stereo, 6s MPEG-TS, sliding window (5), -16 LUFS |
|  | AAC 256 kbps, 4s fMP4, sliding window (8), LL-HLS with 1s parts |
|  | AAC 320 kbps, 4s fMP4, event playlist (full set replay), recording |
|  | AAC 320 kbps, 4s fMP4, sliding window (10), DVR 6h, recording |
|  | AAC 48 kbps mono 22 kHz, 10s MPEG-TS, sliding window (3) — voice over weak connections |
|  | AAC 128 kbps, 6s fMP4, sliding window (6), -16 LUFS, PROGRAM-DATE-TIME |
|  | AAC 192 kbps, 6s fMP4, -23 LUFS (EBU R 128), DVR 2h, recording |
|  | AAC 128 kbps, 6s fMP4, event playlist (no segment eviction), recording |
Video:
| Preset | Description |
|---|---|
|  | 1920×1080 30fps 4 Mbps + AAC 128 kbps, 6s fMP4, LL-HLS (0.5s parts) |
|  | 1280×720 30fps 2 Mbps, 4s fMP4, full LL-HLS (0.33s, preload hints, delta, blocking) |
|  | 1920×1080 30fps 4 Mbps, 6s fMP4, sliding window (5) — add destinations via pipeline |
|  | 3840×2160 30fps 15 Mbps + AAC 192 kbps, 6s fMP4, LL-HLS (0.5s parts) |
|  | 3840×2160 30fps 15 Mbps, 4s fMP4, full LL-HLS (0.33s, preload, delta, blocking) |
|  | 1280×720 30fps 1.5 Mbps, 6s fMP4, -16 LUFS, recording — interviews, talking heads |
|  | 1920×1080 30fps 4 Mbps, 6s fMP4, DVR 4h, LL-HLS (0.5s), recording |
|  | 1280×720 15fps 1 Mbps + AAC 96 kbps, 6s fMP4, event playlist, recording |
Pro — Spatial Audio, HDR, DRM, Accessibility:
| Preset | Description |
|---|---|
|  | AAC 128 kbps + E-AC-3 384 kbps Dolby Atmos 5.1, stereo fallback |
|  | AAC 256 kbps + ALAC lossless 96 kHz/24-bit |
|  | 1920×1080 HDR10, HEVC Main10, SDR fallback |
|  | 3840×2160 Dolby Vision Profile 8, HEVC |
|  | 7680×4320 HEVC Main10 |
|  | FairPlay CBCS, key rotation every 10 segments |
|  | FairPlay + Widevine + PlayReady, rotation every 10 segments |
|  | MV-HEVC stereoscopic for Apple Vision Pro, 1080p (or 4K with a dedicated preset) |
|  | CEA-708 EN/ES + audio description EN + WebVTT |
|  | Atmos 5.1 + Dolby Vision 4K + FairPlay + CEA-708 (EN/ES/FR) + audio desc + recording |
Low-Latency HLS
Classic HLS has a latency of 15 to 30 seconds — the time to accumulate several complete segments. Low-Latency HLS (LL-HLS) reduces this to under 2 seconds by splitting segments into smaller parts (PartialSegment) and allowing the player to request them before the segment is complete.
HLSKit implements the full LL-HLS spec. LLHLSManager orchestrates the production of partial segments and announces server capabilities (CAN-BLOCK-RELOAD, PART-HOLD-BACK). BlockingPlaylistHandler handles CAN-BLOCK-RELOAD — the player makes a blocking request and the server responds only once the next part is ready. DeltaUpdateGenerator produces delta playlists (EXT-X-SKIP) to reduce refresh bandwidth.
The beauty of the system is that LL-HLS is an optional add-on to the existing live pipeline. You enable low-latency components via LivePipelineComponents, and the pipeline handles the rest — partial segments, preload hints, server control, everything is coordinated automatically.
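In the generated media playlist, low latency shows up as a handful of extra tags — an illustrative excerpt (durations and URIs are made up):

```
#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.0,CAN-SKIP-UNTIL=24.0
#EXT-X-PART-INF:PART-TARGET=0.5
#EXT-X-PART:DURATION=0.5,URI="seg42.part0.mp4",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.5,URI="seg42.part1.mp4"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="seg42.part2.mp4"
```

The player can request `seg42.part2.mp4` before the full segment 42 even exists — that's where the sub-2-second latency comes from.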
Multi-destination Push
A live stream is useless if it stays on your disk. SegmentPusher sends segments to one or more destinations as they are produced. HTTPPusher does HTTP PUT to any endpoint — a CDN, an Nginx server, an S3 bucket.
For more advanced cases, the architecture is open by design. The RTMPTransport, SRTTransport, and IcecastTransport protocols are defined in HLSKit, but concrete implementations are provided by separate libs — swift-rtmp-kit, swift-srt-kit, swift-icecast-kit. It's clean dependency injection: HLSKit stays zero-dependency, your app imports the transports it needs.
MultiDestinationPusher handles parallel delivery. BandwidthMonitor tracks actual bandwidth per destination. If one destination fails, the others continue — no single point of failure.
Real-time Metadata
A live stream without metadata is a blind pipe. HLSKit injects metadata into the stream during production — without interrupting the pipeline.
SCTE-35 for ad breaks: SCTE35Marker inserts EXT-X-CUE-OUT / EXT-X-CUE-IN signals at precise moments. DATE-RANGE for temporal events — chapter start, program change, news alert. ID3 for timed metadata — title, artist, album art synchronized with audio. Interstitials for HLS breaks — ad pause or inserted content with EXT-X-ASSET-URI and resume configuration.
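An ad break injected mid-stream surfaces in the playlist as tags like these (IDs, dates, and the SCTE-35 payload are illustrative):

```
#EXT-X-DATERANGE:ID="ad-break-1",START-DATE="2025-03-01T20:15:00.000Z",PLANNED-DURATION=30.0,SCTE35-OUT=0xFC302000
#EXT-X-CUE-OUT:30.0
#EXTINF:6.0,
seg100.ts
#EXT-X-CUE-IN
```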
Recording and Live-to-VOD
While a live stream is being broadcast, SimultaneousRecorder records everything in parallel — segments and metadata — to a local directory. When the stream stops, LiveToVODConverter transforms the recording into a standard VOD playlist with automatic chapters.
Chaptering is handled by AutoChapterGenerator — SCTE-35 cut points or DATE-RANGEs become chapters in the VOD version. Your 2-hour live stream becomes a chaptered VOD file in a single operation.
I-Frame Playlists
IFramePlaylistGenerator produces EXT-X-I-FRAMES-ONLY playlists — playlists containing only keyframes, used for trick play (fast forward, rewind) and thumbnail generation. ThumbnailExtractor extracts thumbnails at regular intervals for player visual timelines.
Audio Processing
The audio module offers five indispensable tools for professional broadcast streams.
AudioFormatConverter converts between formats: MP3 → M4A, WAV → AAC, FLAC → ALAC. LoudnessMeter measures integrated loudness in LUFS according to the EBU R128 standard — essential for broadcast compliance. SilenceDetector identifies silence ranges in an audio stream, useful for automatic chaptering or signal loss detection in live. ChannelMixer handles channel mixing — stereo to mono, 5.1 to stereo, automatic upmix. And AudioNormalizer applies loudness normalization to target standards (-16 LUFS for podcasts, -23 LUFS for broadcast EBU R128).
Spatial Audio, HDR and Hi-Res
HLSKit speaks the language of professional formats.
On the audio side: Dolby Atmos (via Dolby Digital Plus JOC), AC-3, E-AC-3, multi-channel 5.1 and 7.1.4, Hi-Res audio at 96 or 192 kHz, 24 or 32 bits, with ALAC and FLAC lossless. Generated HLS manifests include CHANNELS attributes and properly configured alternative audio renditions.
On the video side: HDR10, Dolby Vision, HLG, with VIDEO-RANGE signaling in manifests. Resolution support goes up to 8K, with CODECS attributes that precisely reflect the encoded profiles. Everything is transparent to the pipeline — you configure capabilities in LivePipelineConfiguration, and manifests are generated with the right attributes.
Live DRM
For protected live content, HLSKit implements FairPlay Streaming integration with per-segment key rotation. Each segment can have its own key, and EncryptionKey allows the player to download the key only once for an entire session. The DRM + LL-HLS combination is supported — partial segments inherit the key from their parent segment.
Accessibility
Accessibility is not a bonus — it's a legal obligation for many broadcasters. HLSKit generates CLOSED-CAPTIONS tags for CEA-608 and CEA-708, live WebVTT subtitle tracks, and audio description renditions. Produced manifests are compliant with Apple's accessibility requirements for App Store distribution.
Transport Contract v2
Major addition in 0.4.0: the live pipeline is now aware of its transport quality. Three protocols define the contract between HLSKit and the transport layers — QualityAwareTransport for real-time quality monitoring, AdaptiveBitrateTransport for ABR recommendations, and RecordingTransport for transport-side local recording.
TransportAwarePipelinePolicy configures the pipeline's behavior in response to transport signals: automatic bitrate adjustment, minimum quality threshold, and ABR responsiveness. TransportHealthDashboard aggregates health across all destinations in real time — healthy, degraded, and failed destination counts, with a worst-case overall grade.
Five quality levels (excellent, good, fair, poor, critical) are derived from the transport score (0.0 to 1.0). Three ABR responsiveness levels are available: conservative (3 consecutive recommendations before adjustment), responsive (2) and immediate (1). Companion transports — swift-rtmp-kit, swift-srt-kit, swift-icecast-kit — implement these protocols with their native metrics: RTT and packet loss for SRT (with SMPTE 2022 FEC and multi-link bonding), stream statistics for Icecast (with 6 authentication modes including digest, bearer and shoutcastV2), enhanced RTMP capabilities for RTMP. On the capture side, swift-capture-kit provides a transport-agnostic StreamingPipeline that feeds these transports with hardware-encoded audio and video.
MV-HEVC Spatial Video
Apple Vision Pro speaks a specific video language: MV-HEVC (Multi-View HEVC), a stereoscopic format where left and right views are encoded in a single HEVC stream with multiview extensions. HLSKit 0.4.0 natively supports this format — from HEVC sample packaging to HLS manifest signaling.
MVHEVCSampleProcessor extracts NAL units from an HEVC stream, identifies parameter sets (VPS, SPS, PPS), and parses SPS profiles. MVHEVCPackager creates fMP4 segments — init segment with spatial boxes, media segments with properly encapsulated samples. SpatialVideoConfiguration provides presets: visionProStandard (1080p stereo), visionProHighQuality (4K stereo), dolbyVisionStereo (4K Dolby Vision Profile 20).
IMSC1 Subtitles
IMSC1 (Internet Media Subtitles and Captions) is the W3C profile of TTML used for subtitles in professional HLS workflows. HLSKit 0.4.0 implements the complete pipeline: TTML XML parsing, rendering, and fMP4 segmentation for HLS delivery.
IMSC1Parser parses a TTML document and returns an IMSC1Document with typed subtitles (begin/end/text), percentage-positioned regions, and styles (font, size, color, alignment, outline). IMSC1Renderer serializes the document back to valid TTML. IMSC1Segmenter produces fMP4 segments — init segment with subtitle track metadata, media segments with encapsulated temporal cues.
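A minimal TTML document of the kind such a parser consumes looks like this (content is illustrative):

```xml
<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body>
    <div>
      <p begin="00:00:01.000" end="00:00:04.000">Hello, world.</p>
      <p begin="00:00:05.000" end="00:00:08.500">Subtitles via IMSC1.</p>
    </div>
  </body>
</tt>
```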
The parser handles timecodes in HH:MM:SS.mmm and HH:MM:SS:FF formats. Parsing errors are typed: invalidXML, missingTTElement, invalidTimecode, missingLanguage — no silent crashes, no partial data.
Variable Substitution
HLS variable substitution (EXT-X-DEFINE) allows templating manifests without modifying them for each deployment. It's the standard mechanism for CDN templating — one source manifest, values injected at runtime.
HLSKit supports all three forms defined by the spec: NAME/VALUE (inline definition), IMPORT (import from a parent manifest), and QUERYPARAM (extraction from a URL parameter). The VariableResolver resolves {$variable} references in URIs and attributes, with a strict mode that detects undefined variables.
The HLS validator automatically checks variables: undefined references, duplicate definitions, IMPORT without multivariant context. Manifest generation includes EXT-X-DEFINE tags when VariableDefinition objects are attached to the playlist.
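In a playlist, the three forms of EXT-X-DEFINE look like this (names and values are illustrative):

```
#EXT-X-DEFINE:NAME="cdn-host",VALUE="edge1.example.com"
#EXT-X-DEFINE:IMPORT="session-id"
#EXT-X-DEFINE:QUERYPARAM="token"
#EXTINF:6.0,
https://{$cdn-host}/seg0.mp4?token={$token}
```

Each `{$variable}` reference is resolved at playback time — the source manifest never changes per deployment.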
Video Projections
The REQ-VIDEO-LAYOUT tag signals to the HLS player the type of video content and its projection. It's essential for spatial video (Apple Vision Pro) and immersive video (360°, 180°).
VideoLayoutDescriptor combines a VideoChannelLayout (mono, stereo) and a VideoProjection (rectilinear, equirectangular, half-equirectangular, Apple Immersive Video). Presets cover common cases:
| Preset | REQ-VIDEO-LAYOUT Value | Usage |
|---|---|---|
|  |  | Standard stereoscopic video |
|  |  | Classic 2D video |
|  |  | 360° equirectangular video |
|  |  | 180° stereo for Apple Vision Pro |
|  |  | Apple Immersive Video |
Parsing and Generating Manifests
The parser reads any M3U8 manifest — master playlists, media playlists, and all HLS v7+ extensions including Low-Latency HLS. It returns a Manifest that is either .master(MasterPlaylist) or .media(MediaPlaylist). Every variant, every segment, every tag is modeled by a typed Swift struct.
For generation, two approaches. The imperative API if you're building playlists dynamically:
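A sketch of the imperative style — type and method names here are assumptions, not verbatim API:

```swift
import HLSKit

// Hypothetical imperative construction of a media playlist.
var playlist = MediaPlaylist(targetDuration: 6)
playlist.segments.append(MediaSegment(uri: "seg0.mp4", duration: 6.0))
playlist.segments.append(MediaSegment(uri: "seg1.mp4", duration: 6.0))
let m3u8 = try ManifestGenerator().generate(.media(playlist))
```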
And the @resultBuilder DSL if you prefer declarative syntax:
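Roughly in this spirit — the builder component names below are illustrative, not the library's actual DSL vocabulary:

```swift
import HLSKit

// Hypothetical @resultBuilder DSL for the same playlist.
let playlist = MediaPlaylist {
    TargetDuration(6)
    Segment(uri: "seg0.mp4", duration: 6.0)
    Segment(uri: "seg1.mp4", duration: 6.0)
}
```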
Both approaches produce spec-compliant M3U8. The generator handles formatting, tags, attributes, and all the serialization details that RFC 8216 mandates.
RFC 8216 Validation
A manifest can be syntactically valid but semantically wrong. HLSValidator checks playlists against two rule sets: RFC 8216 (the IETF standard) and Apple HLS rules (stricter on certain points). Each violation is classified as error, warning, or info.
The structured report tells you exactly what's wrong and why. No generic "invalid playlist" message — each rule has an identifier, a message, and a severity level. You know what's blocking before deploying, not after.
Segmentation
This is the core of the HLS pipeline: taking an MP4 file and splitting it into segments ready to be served. HLSKit offers two output formats — each with its own advantages.
Fragmented MP4 (fMP4) is the modern format recommended by Apple. Each segment is an independent MP4 fragment, preceded by an initialization segment (init.mp4) containing the track metadata. It's the most efficient format, the one used by modern CDNs.
MPEG-TS is the historical format. Each segment is self-contained with its own metadata — heavier, but compatible with absolutely all players, including the oldest ones.
A third mode, byte-range, allows splitting a file into logical segments without physically duplicating it. Segments are byte ranges within a single file — useful when storage is constrained.
In live, two additional segmenters handle real-time: AudioSegmenter for pure audio streams and VideoSegmenter for video. They consume encoded frames on the fly and produce segments as soon as the target duration is reached, without waiting for the stream to end.
The segmenter reads the MP4 file at the box level (the structural blocks of the ISO BMFF format), analyzes the sample tables to find optimal cut points (keyframes), and produces segments aligned on sample boundaries. No audio glitches, no missing frames — the cut is surgical.
For non-ISOBMFF formats (MP3, WAV, FLAC), HLSKit automatically detects the incompatibility and transcodes the file to M4A before segmenting — total transparency for the caller.
HEVC fMP4 & B-Frames (0.6.0)
The live fMP4 segmenter (CMAFWriter) now supports HEVC/H.265 natively. When the VideoConfig codec is .h265, the initialization segment generates an hev1 sample entry with an hvcC box conforming to ISO 14496-15 §8.3 — profile, tier, level, and the three NALU arrays (VPS, SPS, PPS) parsed automatically from the stream parameters.
B-frame support is added via compositionTimeOffset in EncodedFrame. Codecs that reorder frames (HEVC, H.264 High Profile) produce frames whose decode order differs from display order — the compositionTimeOffset (PTS − DTS) encodes this difference in each trun box of the fMP4 segment. When nil, PTS == DTS is assumed (no reordering).
Duration accumulation in the live segmenter now uses integers (Int64 ticks + Int32 timescale) instead of Double. When the timescale is constant — which is the case 99% of the time in live — addition is done in integer ticks, and conversion to seconds only happens for comparison with the target duration. Result: no more floating-point drift that could cause TARGETDURATION to exceed the configured value after several hours of streaming.
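The effect is easy to demonstrate outside HLSKit. This self-contained sketch (not library code) accumulates one million AAC-frame durations both ways; the floating-point sum typically ends up a hair off the exact value, while the tick sum is exact by construction:

```swift
// One AAC frame is 1024 samples; at 48 kHz that's 1024/48000 s,
// which is not exactly representable as a Double.
let timescale: Int64 = 48_000
let frameTicks: Int64 = 1024

var doubleSum = 0.0
var tickSum: Int64 = 0
for _ in 0..<1_000_000 {                       // roughly 6 hours of audio
    doubleSum += Double(frameTicks) / Double(timescale)  // drifts slowly
    tickSum += frameTicks                                // exact integer math
}
let exact = Double(tickSum) / Double(timescale) // convert once, at the end
print(exact - doubleSum)                        // accumulated rounding error
```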
Async segmentTransform (0.5.0)
IncrementalSegmenter now accepts an async segmentTransform closure. Before 0.5.0, this closure was synchronous — forcing the use of locks (NSLock, @unchecked Sendable) for muxing operations that touch actors. The closure is now @Sendable (LiveSegment, [EncodedFrame]) async -> LiveSegment, allowing direct actor calls. Synchronous closures are still accepted — the migration is non-breaking.
Local Transcoding
Segmenting a file is good. But often, you also need to encode it — change the resolution, codec, bitrate. HLSKit offers two local transcoders, both conforming to the Transcoder protocol.
AppleTranscoder uses VideoToolbox, Apple's low-level framework for hardware encoding. It leverages your Mac's or iPhone's dedicated chips to encode H.264/HEVC much faster than software — with minimal power consumption. Available only on Apple platforms.
FFmpegTranscoder wraps the FFmpeg binary installed on the system. Cross-platform, Linux-compatible, supports virtually all existing codecs. It's the default server choice.
Both implement the same Transcoder protocol. Your calling code doesn't change — you can switch between Apple and FFmpeg by changing a single line. Quality presets (.p360, .p480, .p720, .p1080, .p2160, .audioOnly) are shared.
For multi-variant (adaptive bitrate), transcodeVariants() encodes to multiple qualities in a single pass and generates the master playlist automatically.
Automatic source content detection (video or audio-only) adjusts the preset accordingly — no manual configuration needed.
Cloud Transcoding
On a server, you have neither GPU nor FFmpeg. Installing FFmpeg on a minimal cloud instance is possible but not always desirable — it complicates deployment, consumes CPU, and doesn't scale.
ManagedTranscoder solves this problem by delegating transcoding to a cloud service. Cloudflare Stream, AWS MediaConvert, or Mux — you choose the provider, the lib handles everything: uploading the source file, creating the job, polling the status, downloading the result.
Most importantly: ManagedTranscoder implements the same Transcoder protocol as local transcoders. Your calling code doesn't know — and doesn't need to know — whether transcoding happens on your machine or in a datacenter 5,000 km away.
| Provider | Authentication | Ideal For |
|---|---|---|
| Cloudflare Stream | API token (Bearer) | Zero egress bandwidth cost, global CDN |
| AWS MediaConvert | Access key + secret (SigV4) | Enterprise, existing AWS infrastructure |
| Mux | Token ID + secret (Basic Auth) | Simplest API, automatic adaptive bitrate |
Streaming Upload/Download
A 500 MB video file — you don't want to load it all into RAM at once. Streamed upload and download send and receive data directly from disk, never loading the complete file into memory. The progress callback reports progress granularly through 5 phases: upload (0-30%), job creation (30%), polling (30-80%), download (80-95%), complete (100%).
Job Lifecycle
Under the hood, each cloud transcoding operation follows a precise lifecycle: queued → processing → completed | failed | cancelled. ManagedTranscodingJob encapsulates this state with a jobID, an assetID, encoding progress, output URLs when complete, and an error message when not.
Polling is configurable: interval between checks (pollingInterval, default 5 seconds) and global timeout (timeout, default 1 hour). By default, cloud assets are deleted after download (cleanupAfterDownload = true) to avoid residual storage costs.
Encryption
HLS supports two encryption modes to protect content — and HLSKit implements both.
AES-128 encrypts each segment entirely with AES-128-CBC. It's the standard mode, supported by all players. Simple, robust, with possible key rotation per segment.
SAMPLE-AES is finer-grained: it encrypts at the individual sample level — NAL units for H.264 video, ADTS frames for AAC audio. The container remains readable (headers, metadata), only the media content is encrypted. This is the mode used by DRMs like FairPlay.
KeyManager generates cryptographically secure AES-128 keys and derives IVs according to RFC 8216 (segment sequence number in big-endian on 16 bytes). It also handles reading and writing key files — the 16-byte binary file that the player downloads to decrypt segments.
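The IV rule itself is small enough to sketch standalone — this is not KeyManager's code, just the RFC 8216 derivation it implements (media sequence number as a big-endian integer, left-padded to 16 bytes):

```swift
import Foundation

// RFC 8216: absent an explicit IV attribute, the IV is the segment's
// media sequence number, big-endian, zero-padded to 128 bits.
func defaultIV(mediaSequence: UInt64) -> [UInt8] {
    var iv = [UInt8](repeating: 0, count: 16)   // 16 zero bytes
    withUnsafeBytes(of: mediaSequence.bigEndian) { bytes in
        for (i, b) in bytes.enumerated() { iv[8 + i] = b }  // low 8 bytes
    }
    return iv
}

print(defaultIV(mediaSequence: 7).map { String(format: "%02x", $0) }.joined())
```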
MP4 Inspection
Before segmenting or transcoding a file, you need to know what it contains. MP4BoxReader parses the ISO BMFF container structure — the famous "boxes" (or "atoms" in QuickTime vocabulary) that organize an MP4 file.
MP4InfoParser extracts useful information: audio/video tracks, codecs, resolution, duration, bitrate, timescale. All without decoding the media — just reading the container metadata.
Going deeper, SampleTableParser reads the container's sample tables: the tables that map each sample (video frame, audio packet) to its position in the file, its timestamp, and its size. It's thanks to these tables that the segmenter knows exactly where to cut — on a keyframe, without breaking the timeline.
The CLI
10 commands for common HLS workflows, directly from the terminal. Handy for scripting, CI/CD, or simply testing quickly without writing code.
| Command | What It Does |
|---|---|
|  | Inspect an MP4 or M3U8 file (tracks, codec, duration, segments) |
|  | Split an MP4 into fMP4 or MPEG-TS segments |
|  | Transcode to one or more HLS variants |
|  | Validate a manifest against RFC 8216 and Apple rules |
|  | Encrypt HLS segments with AES-128 or SAMPLE-AES |
|  | Parse or generate M3U8 manifests (2 subcommands) |
| `live` | Full live pipeline: start, stop, stats, convert-to-vod, metadata |
| `iframe` | Generate I-Frame playlists for trick play and thumbnails |
| `imsc1` | IMSC1 subtitle pipeline: parse, render, segment (3 subcommands) |
| `mvhevc` | MV-HEVC spatial video: package, info (2 subcommands) |
Architecture
The lib is split into functional modules, each responsible for one stage of the pipeline. Zero external dependencies in the core HLSKit — only the CLI uses swift-argument-parser for argument parsing.
Each module can be used independently. Only need the parser? Import HLSKit and use ManifestParser. Need to segment without transcoding? MP4Segmenter is standalone. Need a live audio pipeline? Assemble AudioEncoder + AudioSegmenter + SlidingWindowPlaylist + HTTPPusher in a LivePipeline. Everything is tied together by two facades: HLSEngine for VOD, and LivePipeline for real-time.
Tests
5,165 tests across 617 suites — models, parser, generator, validator, segmenter, transcoder, encryption, container, transport, engine, encoder, live segmenter, live playlist, LL-HLS, push, metadata, recording, I-Frame, audio, spatial audio, HDR, DRM, accessibility, resilience, pipeline, subtitles, spatial video, CLI, end-to-end. 94.66% coverage. Zero XCTest — 100% Swift Testing (import Testing).
Showcase tests serve as executable documentation: every public API has at least one test showing how to use it. The code examples in DocC and in this article are taken from these tests — what's written here has been compiled and executed.
| Category | Focus |
|---|---|
| Model | Type conformances, Codable round-trip, HLS models |
| Parser | Master/media playlists, LL-HLS, byte-range, encryption tags |
| Generator | M3U8 output, builder DSL, tag writing |
| Validator | RFC 8216, Apple HLS rules, severity levels |
| Segmenter | fMP4, MPEG-TS, byte-range, config, playlist generation |
| Transcoder | Quality presets, Apple/FFmpeg/Managed availability, multi-variant |
| Encryption | AES-128, SAMPLE-AES, key management, round-trip |
| Container | MP4 box reading, sample tables, init/media segment writing |
| Transport | TS packets, PAT/PMT, PES, ADTS/AnnexB conversion |
| Engine | HLSEngine facade, segmentation, encryption, manifest operations |
| Encoder | Real-time AAC AudioEncoder, H.264/HEVC VideoEncoder |
| Live Segmenter | AudioSegmenter, VideoSegmenter, CMAFWriter |
| Live Playlist | SlidingWindowPlaylist, EventPlaylist, DVRBuffer |
| Low-Latency | LLHLSManager, BlockingPlaylistHandler, DeltaUpdateGenerator |
| Push | HTTPPusher, MultiDestinationPusher, BandwidthMonitor |
| Metadata | SCTE-35, DATE-RANGE, ID3, interstitials |
| Recording | SimultaneousRecorder, LiveToVODConverter, AutoChapterGenerator |
| I-Frame | IFramePlaylistGenerator, ThumbnailExtractor |
| Audio | AudioFormatConverter, LoudnessMeter, SilenceDetector |
| Spatial Audio | Dolby Atmos, multi-channel, Hi-Res, renditions |
| HDR | HDR10, Dolby Vision, HLG, VIDEO-RANGE, ultra-resolution |
| DRM | FairPlay live, key rotation, session keys |
| Accessibility | CEA-608/708, WebVTT, audio description |
| Resilience | Redundant streams, content steering, gap signaling |
| Subtitles | IMSC1Parser, IMSC1Renderer, IMSC1Segmenter, TTML round-trip |
| Spatial | MVHEVCPackager, MVHEVCSampleProcessor, SpatialVideoConfiguration |
| Pipeline | LivePipeline, components, presets, statistics, transport monitoring |
| Showcase | Public API demonstrations (executable documentation) |
| CLI | 10 commands, argument parsing, integration |
| EndToEnd | Cross-feature integration scenarios |
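A showcase test in that spirit generates a manifest, parses it back, and asserts the round-trip. The generate/parse functions below are stand-ins, not HLSKit's ManifestGenerator and ManifestParser API, and the real suite expresses this with Swift Testing's @Test and #expect rather than plain assertions.

```swift
import Foundation

// Stand-in model and generate/parse pair for a minimal media playlist.
// Illustrative only; HLSKit's real types are richer than this.
struct MediaSegment: Equatable { let uri: String; let duration: Double }

func generate(targetDuration: Int, segments: [MediaSegment]) -> String {
    var lines = ["#EXTM3U", "#EXT-X-VERSION:7", "#EXT-X-TARGETDURATION:\(targetDuration)"]
    for s in segments {
        lines.append("#EXTINF:\(s.duration),")
        lines.append(s.uri)
    }
    lines.append("#EXT-X-ENDLIST")
    return lines.joined(separator: "\n")
}

func parse(_ m3u8: String) -> [MediaSegment] {
    var result: [MediaSegment] = []
    var pendingDuration: Double?
    for line in m3u8.split(separator: "\n") {
        if line.hasPrefix("#EXTINF:") {
            // "#EXTINF:6.0," — the duration precedes the first comma.
            let value = line.dropFirst("#EXTINF:".count).split(separator: ",").first ?? ""
            pendingDuration = Double(value)
        } else if !line.hasPrefix("#"), let duration = pendingDuration {
            // A non-tag line after EXTINF is the segment URI.
            result.append(MediaSegment(uri: String(line), duration: duration))
            pendingDuration = nil
        }
    }
    return result
}
```

Because the example is a test, it doubles as documentation: the assertion pins down exactly what the round-trip guarantees.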
Installation
Requirements
- Swift 6.2+ with strict concurrency
- Lib: macOS 14+, iOS 17+, tvOS 17+, watchOS 10+, visionOS 1+
- CLI: macOS 14+, Linux (Ubuntu 22.04+)
- Zero external dependencies in the core lib (swift-argument-parser for CLI only)
Swift Package Manager
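A minimal Package.swift sketch for adding the dependency. The package URL comes from the project's GitHub link and the product name HLSKit from the import statements in this article; verify both against the repository before relying on them.

```swift
// swift-tools-version: 6.2
import PackageDescription

let package = Package(
    name: "MyStreamingApp", // hypothetical consumer package
    platforms: [.macOS(.v14), .iOS(.v17)],
    dependencies: [
        // Pin to the 0.4.0 line described in this article.
        .package(url: "https://github.com/atelier-socle/swift-hls-kit.git", from: "0.4.0")
    ],
    targets: [
        .executableTarget(
            name: "MyStreamingApp",
            dependencies: [.product(name: "HLSKit", package: "swift-hls-kit")]
        )
    ]
)
```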
Documentation
Complete documentation is available in DocC, integrated with the package. 33 guides cover each pipeline stage — VOD, live, spatial video, subtitles, transport v2 — with executable examples.
| Guide | Content |
|---|---|
| Getting Started | Installation, first workflow, builder DSL, live example |
| Manifest Parsing | ManifestParser, TagParser, AttributeParser, error handling |
| Manifest Generation | ManifestGenerator, TagWriter, builder DSL, LL-HLS models |
| Validating Manifests | HLSValidator, rule sets, severity levels, reports |
| Segmenting Media | MP4Segmenter, TSSegmenter, byte-range, auto-transcode non-ISOBMFF |
| Transcoding Media | Quality presets, Apple/FFmpeg transcoders, multi-variant, auto detection |
| Cloud Transcoding | ManagedTranscoder, Cloudflare/AWS/Mux providers, streaming upload |
| Encrypting Segments | AES-128, SAMPLE-AES, KeyManager, key rotation |
| HLSEngine | High-level facade for end-to-end VOD workflows |
| CLI Reference | 10 commands with options, examples, JSON config |
| Live Streaming | Live pipeline overview, architecture, use cases |
| Live Encoding | MediaSource, AudioEncoder, VideoEncoder, MultiBitrateEncoder |
| Live Segmentation | LiveSegmenter, AudioSegmenter, VideoSegmenter, CMAFWriter |
| Live Playlists | LivePlaylistManager, DVRBuffer, sliding window, event playlist |
| Low-Latency HLS | LLHLSManager, BlockingPlaylistHandler, DeltaUpdateGenerator, partial segments |
| Segment Pushing | HTTPPusher, multi-destination, transport DI (RTMP/SRT/Icecast) |
| Live Metadata | SCTE-35, DATE-RANGE, ID3, interstitials, real-time injection |
| Live Recording | SimultaneousRecorder, live-to-VOD, automatic chaptering |
| I-Frame Playlists | IFramePlaylistGenerator, ThumbnailExtractor, trick play |
| Audio Processing | Format conversion, LUFS loudness, silence detection |
| Spatial Audio | Dolby Atmos, AC-3, multi-channel, Hi-Res 96/192 kHz |
| HDR Video | HDR10, Dolby Vision, HLG, VIDEO-RANGE, 4K/8K |
| Live DRM | FairPlay live, key rotation, session keys |
| Accessibility & Resilience | CEA-608/708, WebVTT, failover, gap signaling |
| Live Presets | LivePipeline presets, configuration, statistics, lifecycle |
| Transport Contracts v2 | Quality monitoring, ABR, RTMP/SRT/Icecast v2 |
| Transport-Aware Pipeline | Pipeline integration with transport quality |
| Variable Substitution | EXT-X-DEFINE, CDN templating, validation |
| IMSC1 Subtitles Guide | TTML parse, render, fMP4 segmentation |
| Spatial Video Guide | MV-HEVC packaging for Apple Vision Pro |
| Video Projection Specifiers | REQ-VIDEO-LAYOUT, 360°, Apple Immersive |
| Testing Guide | Test suite, mock server, CLI scenarios |
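One of the guides above covers EXT-X-DEFINE variable substitution: a playlist declares NAME/VALUE pairs, and the client replaces every {$name} occurrence before resolving URIs, which is what makes CDN templating possible without rewriting the source playlist. The mechanism can be sketched as follows; this is a naive illustration of the tag's behavior, not HLSKit's implementation.

```swift
import Foundation

// Collect NAME="..."/VALUE="..." pairs from EXT-X-DEFINE tags, then replace
// every {$name} occurrence in the playlist text. A deliberately minimal sketch:
// it ignores the spec's IMPORT/QUERYPARAM forms and does no validation.
func substituteVariables(in playlist: String) -> String {
    var definitions: [String: String] = [:]
    for line in playlist.split(separator: "\n") where line.hasPrefix("#EXT-X-DEFINE:") {
        let attributes = line.dropFirst("#EXT-X-DEFINE:".count)
        var name: String?
        var value: String?
        for attribute in attributes.split(separator: ",") {
            let parts = attribute.split(separator: "=", maxSplits: 1)
            guard parts.count == 2 else { continue }
            let unquoted = parts[1].trimmingCharacters(in: CharacterSet(charactersIn: "\""))
            if parts[0] == "NAME" { name = unquoted }
            else if parts[0] == "VALUE" { value = unquoted }
        }
        if let name, let value { definitions[name] = value }
    }
    var result = playlist
    for (name, value) in definitions {
        result = result.replacingOccurrences(of: "{$\(name)}", with: value)
    }
    return result
}
```

With a definition like NAME="cdn", VALUE="https://edge.example.com" (a hypothetical host), the segment URI {$cdn}/seg0.m4s resolves to a full edge URL, so the same source playlist can be retargeted per CDN by changing one tag.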
References
HLSKit builds on 31 specifications and standards, from RFC 8216 and the LL-HLS extension to CMAF, SCTE-35, FairPlay, and W3C TTML/IMSC1.
Under the Hood
Swift 6.2 — Strict concurrency, all types Sendable, cross-platform thread-safe withLockedState<T>
5,165 tests — 617 suites, 94.66% overall coverage, 100% Swift Testing
33 DocC articles — Complete documentation with executable examples for VOD, live, spatial video and subtitles
6 platforms — macOS, iOS, tvOS, watchOS, visionOS, Linux
~30 modules — Parsing, generation, validation, segmentation, transcoding, encryption, live pipeline, IMSC1 subtitles, MV-HEVC spatial video, transport v2
31 industry standards — RFC 8216, LL-HLS, CMAF, SCTE-35, Dolby Atmos, HDR10, FairPlay, CEA-608/708, EBU R128, W3C TTML/IMSC1…
Zero dependencies — Pure Swift + Foundation in the core
Apache 2.0 — Permissive open-source license with SPDX headers
Links
GitHub - atelier-socle/swift-hls-kit: Enterprise-grade pure Swift HLS library — parse, segment, transcode, encrypt, stream live with LL-HLS, MV-HEVC spatial video, IMSC1 subtitles & transport-aware ABR. Cross-platform, RFC 8216 compliant