HLSKit

Where It Comes From

When you listen to a podcast or watch a streaming video, there's a protocol behind it you never see: HLS. HTTP Live Streaming, invented by Apple in 2009, has become the de facto standard for audio and video delivery on the web. Your browser, your phone, your Apple TV — everything speaks HLS.

The principle is simple: you split a media file into small segments of a few seconds, write a manifest (an .m3u8 file) that lists these segments in order, and the player downloads them one by one via HTTP. No specialized server, no exotic protocol — just standard HTTP, with all its benefits: CDN, caching, HTTPS.

While working on PodcastFeedMaker and the server infrastructure PodcastFeedVapor, an obvious need emerged: handling the media files themselves. Not just RSS metadata, but the actual audio and video files — segmenting them, encoding them, encrypting them, validating them. All from pure Swift, with no external dependency in the lib's core, compatible with macOS, iOS, and Linux.

I looked for a Swift library that did this. None existed. FFmpeg wrappers, yes. HLS players, yes. But a complete pipeline — from manifest parsing to segment encryption through cloud transcoding — in native Swift? Nothing.

HLSKit was born from that void.

The first version covered the end-to-end VOD pipeline: parsing, generation, segmentation, transcoding, encryption. The second added cloud transcoding — Cloudflare, AWS, Mux. But the most ambitious piece was still missing: live. Not a placeholder, not a wrapper — an actual real-time pipeline capable of taking an audio or video stream, encoding it, segmenting it, broadcasting it, and serving it in Low-Latency HLS with multi-destination push. That's what HLSKit 0.3.0 does.

0.4.0 pushes the ambition even further. The live pipeline becomes intelligent — real-time transport quality monitoring, automatic bitrate adaptation, multi-destination health dashboard. Stereoscopic MV-HEVC spatial video for Apple Vision Pro is natively supported. IMSC1 subtitles (W3C TTML) are parsed, rendered and segmented into fMP4. And EXT-X-DEFINE variable substitution enables CDN templating without modifying source playlists. This is HLSKit 0.4.0 — Transport Intelligence & Spatial Computing.

0.5.0 fixes a critical bug in the video fMP4 initialization segment — the pre_defined field of the VisualSampleEntry was 4 bytes instead of 2 (ISO 14496-12 §12.1.3), which shifted the avcC box and caused silent rejections by Safari, AVPlayer and FFmpeg. It also introduces an async segmentTransform API in IncrementalSegmenter, allowing actors to be used in segment transformation closures instead of manual locks.

0.6.0 brings full HEVC/H.265 support in the live fMP4 segmenter — hev1 sample entry with hvcC box (ISO 14496-15 §8.3), Video Parameter Set, profile and level parsed from SPS NALUs. B-frame support is added via compositionTimeOffset in EncodedFrame, for codecs where display order differs from decode order. Audio-video sync in VideoSegmenter is fixed through integer-based duration accumulation (Int64 ticks) instead of floating-point, eliminating drift errors that could cause TARGETDURATION to exceed the configured value. This is HLSKit 0.6.0 — HEVC & Precision.

What HLS Actually Is

Before diving into the lib, a quick detour on the protocol for those who've never touched HLS. If you already know it, skip to the next section.

HLS works in three stages. First, you take a media file (MP4, MOV, audio…) and segment it — you split it into 4 to 10 second chunks. Each chunk is an independent file, downloadable via HTTP.

Next, you write a manifest — a text file with the .m3u8 extension — that lists the segments in order with their duration. It's the player's roadmap.
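For example, a minimal VOD media playlist — segment names and durations here are invented for illustration:

```
#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:6.0,
segment0.ts
#EXTINF:6.0,
segment1.ts
#EXT-X-ENDLIST
```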

Finally, for multi-quality (adaptive bitrate), you create a master playlist that points to multiple media playlists: one for 360p, one for 720p, one for 1080p. The player automatically chooses the quality based on available bandwidth.

In live, the principle is the same — but segments are produced in real time instead of being pre-split. The manifest is updated with each new segment, and old ones exit a sliding window. Low-Latency HLS goes even further: segments are split into smaller parts and the player can request them before the complete segment is even finished.
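Here is what a minimal master playlist looks like — bandwidths and codec strings are illustrative values:

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360,CODECS="avc1.4d401e,mp4a.40.2"
360p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720,CODECS="avc1.64001f,mp4a.40.2"
720p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
1080p.m3u8
```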

A master playlist is nothing more than a short text file. Each EXT-X-STREAM-INF entry describes a variant: bandwidth, resolution, codecs. The player reads the file, evaluates its connection, and loads the matching media playlist. It's that simple — and that complex when you want to do it properly, because RFC 8216, the document that specifies HLS, runs to some 50 pages of very precise rules.

What HLSKit Does

HLSKit covers the HLS pipeline end to end — VOD and live. Not just one piece — the entire path, from source file to encrypted segment ready to be served, or from microphone stream to CDN in real time. All in pure Swift, Sendable end to end, with zero external dependencies in the core lib.

  • Parser — Complete M3U8 manifest reading with typed models for master playlists, media playlists, variants, segments, and all Low-Latency HLS extensions

  • Generator — Spec-compliant manifest production, with an imperative API and a @resultBuilder DSL for declarative playlist building

  • Validator — Conformance checking against RFC 8216 and Apple HLS rules, with 3 severity levels: error, warning, info

  • fMP4 Segmenter — MP4 file segmentation into fragmented MP4 segments with initialization segment and auto-generated playlist, H.264 (avc1/avcC) and HEVC (hev1/hvcC) support

  • MPEG-TS Segmenter — Segmentation into MPEG-TS segments for compatibility with legacy players

  • Byte-range — Byte-range segmentation mode: one file, multiple logical segments

  • Apple Transcoding — Hardware-accelerated encoding via Apple VideoToolbox (macOS, iOS)

  • FFmpeg Transcoding — Cross-platform transcoding with quality presets and multi-variant output

  • Cloud Transcoding — Delegation to Cloudflare Stream, AWS MediaConvert, or Mux — same Transcoder protocol, zero local GPU required

  • AES-128 Encryption — Full segment encryption in AES-128-CBC with key rotation

  • SAMPLE-AES Encryption — Sample-level encryption for video NAL units and ADTS audio frames

  • Key Management — Key generation, IV derivation (RFC 8216), and key file I/O

  • MP4 Inspection — MP4 box reading, track analysis, sample table parsing

  • Live Pipeline — Complete real-time stream orchestration: source → encoding → segmentation → playlist → push

  • Low-Latency HLS — Partial segments, CAN-BLOCK-RELOAD, CAN-SKIP-UNTIL, delta updates, EXT-X-PRELOAD-HINT

  • Multi-destination Push — Segment delivery to one or more HTTP endpoints, with DI transport support for RTMP, SRT and Icecast

  • Live Metadata — Real-time injection of SCTE-35, DATE-RANGE, ID3, HLS interstitials

  • Live Recording — Simultaneous recording during streaming, live-to-VOD conversion with automatic chaptering

  • I-Frame Playlists — EXT-X-I-FRAMES-ONLY playlist generation for trick play and thumbnails

  • Audio Processing — Format conversion, LUFS loudness measurement, silence detection, channel mixing, normalization

  • Spatial Audio — Dolby Atmos, AC-3, E-AC-3, multi-channel 5.1/7.1.4, Hi-Res audio 96/192 kHz

  • HDR & Ultra-resolution — HDR10, Dolby Vision, HLG, VIDEO-RANGE signaling, 4K/8K support

  • Live DRM — FairPlay Streaming with per-segment key rotation, session keys

  • Accessibility — CEA-608/708 closed captions, live WebVTT subtitles, audio description

  • Resilience — Redundant streams, content steering, gap signaling, automatic failover

  • CLI — 10 command-line commands for common HLS workflows, including live, iframe, imsc1 and mvhevc

  • Strict concurrency — All public types are Sendable, Swift 6.2 strict concurrency throughout

  • Transport Contract v2 — QualityAwareTransport, AdaptiveBitrateTransport, RecordingTransport — quality monitoring, automatic ABR, multi-destination health dashboard

  • MV-HEVC Spatial Video — Stereoscopic packaging for Apple Vision Pro with MVHEVCPackager, Dolby Vision Profile 8/20, REQ-VIDEO-LAYOUT

  • IMSC1 Subtitles — W3C TTML parsing, rendering, fMP4 segmentation with IMSC1Parser, IMSC1Renderer, IMSC1Segmenter

  • Variable Substitution — EXT-X-DEFINE with NAME/VALUE, IMPORT, QUERYPARAM for CDN templating

  • Video Projections — REQ-VIDEO-LAYOUT with 360°, 180°, Apple Immersive Video via VideoLayoutDescriptor

The VOD Pipeline in 30 Seconds

HLSEngine is the facade that orchestrates VOD operations. In a few lines, you parse a manifest, validate its conformance, and segment a file — with no complex configuration and no boilerplate.
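A sketch of what those three operations might look like. The method names (`parse`, `validate`, `segment(contentsOf:format:)`) and the `m3u8String` / `inputURL` values are assumptions based on the description above, not verbatim API:

```swift
import HLSKit

// Illustrative sketch — method names are assumptions, not the exact HLSKit API.
let engine = HLSEngine()

// 1. Parse an M3U8 manifest into typed models.
let manifest = try engine.parse(m3u8String)

// 2. Validate it against RFC 8216 and Apple HLS rules.
let report = engine.validate(manifest)

// 3. Segment an MP4 into fMP4 segments, an init segment, and a playlist.
let result = try await engine.segment(contentsOf: inputURL, format: .fmp4)
```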

Three operations, three lines of code each. The parser returns typed Swift models. The validator returns a structured report. The segmenter returns the segments, the generated playlist, and the initialization segment. No magic strings, no casting — idiomatic Swift.

The Live Pipeline

This is the big addition in 0.3.0. LivePipeline orchestrates a complete real-time stream: a media source (microphone, camera, file) feeds an encoder, which produces encoded frames, which are segmented on the fly, assembled into a continuously updated playlist, and pushed to one or more destinations.

The whole thing is composable. Each pipeline stage is an independent component injected via LivePipelineComponents — you assemble exactly what you need.

The pipeline emits events (LivePipelineEvent) and real-time statistics (LivePipelineStatistics) via AsyncStream. You can monitor the number of segments produced, actual bitrate, buffer health, encoded frames per second — all in real time, without polling.

The state machine (LivePipelineState) manages transitions: idle → starting → running → stopping → stopped. A LivePipelineSummary is produced on stop with the total duration, segment count, and bytes written.

Live Presets

To simplify getting started, LivePipelineConfiguration offers preconfigured presets for common use cases. Each preset configures encoding, segmentation, playlist, and scenario-specific options:

Audio:

  • .podcastLive — AAC 128 kbps, 48 kHz stereo, 6s MPEG-TS, sliding window (5), -16 LUFS

  • .webradio — AAC 256 kbps, 4s fMP4, sliding window (8), LL-HLS with 1s parts

  • .djMix — AAC 320 kbps, 4s fMP4, event playlist (full set replay), recording

  • .djMixWithDVR — AAC 320 kbps, 4s fMP4, sliding window (10), DVR 6h, recording

  • .lowBandwidth — AAC 48 kbps mono 22 kHz, 10s MPEG-TS, sliding window (3) — voice over weak connections

  • .applePodcastLive — AAC 128 kbps, 6s fMP4, sliding window (6), -16 LUFS, PROGRAM-DATE-TIME

  • .broadcast — AAC 192 kbps, 6s fMP4, -23 LUFS (EBU R 128), DVR 2h, recording

  • .eventRecording — AAC 128 kbps, 6s fMP4, event playlist (no segment eviction), recording

Video:

  • .videoLive — 1920×1080 30fps 4 Mbps + AAC 128 kbps, 6s fMP4, LL-HLS (0.5s parts)

  • .lowLatencyVideo — 1280×720 30fps 2 Mbps, 4s fMP4, full LL-HLS (0.33s, preload hints, delta, blocking)

  • .videoSimulcast — 1920×1080 30fps 4 Mbps, 6s fMP4, sliding window (5) — add destinations via pipeline

  • .video4K — 3840×2160 30fps 15 Mbps + AAC 192 kbps, 6s fMP4, LL-HLS (0.5s parts)

  • .video4KLowLatency — 3840×2160 30fps 15 Mbps, 4s fMP4, full LL-HLS (0.33s, preload, delta, blocking)

  • .podcastVideo — 1280×720 30fps 1.5 Mbps, 6s fMP4, -16 LUFS, recording — interviews, talking heads

  • .videoLiveWithDVR — 1920×1080 30fps 4 Mbps, 6s fMP4, DVR 4h, LL-HLS (0.5s), recording

  • .conferenceStream — 1280×720 15fps 1 Mbps + AAC 96 kbps, 6s fMP4, event playlist, recording

Pro — Spatial Audio, HDR, DRM, Accessibility:

  • .spatialAudioLive — AAC 128 kbps + E-AC-3 384 kbps Dolby Atmos 5.1, stereo fallback

  • .hiResLive — AAC 256 kbps + ALAC lossless 96 kHz/24-bit

  • .videoHDR — 1920×1080 HDR10, HEVC Main10, SDR fallback

  • .videoDolbyVision — 3840×2160 Dolby Vision Profile 8, HEVC

  • .video8K — 7680×4320 HEVC Main10

  • .drmProtectedLive — FairPlay CBCS, key rotation every 10 segments

  • .multiDRMLive — FairPlay + Widevine + PlayReady, rotation every 10 segments

  • .spatialVideo() — MV-HEVC stereoscopic for Apple Vision Pro, 1080p (or 4K with dolbyVision: true)

  • .accessibleLive — CEA-708 EN/ES + audio description EN + WebVTT

  • .broadcastPro — Atmos 5.1 + Dolby Vision 4K + FairPlay + CEA-708 (EN/ES/FR) + audio desc + recording

Low-Latency HLS

Classic HLS has a latency of 15 to 30 seconds — the time to accumulate several complete segments. Low-Latency HLS (LL-HLS) reduces this to under 2 seconds by splitting segments into smaller parts (PartialSegment) and allowing the player to request them before the segment is complete.

HLSKit implements the full LL-HLS spec. LLHLSManager orchestrates partial-segment production and announces server capabilities (CAN-BLOCK-RELOAD, PART-HOLD-BACK). BlockingPlaylistHandler implements CAN-BLOCK-RELOAD — the player makes a blocking request and the server responds only when the next part is ready. DeltaUpdateGenerator produces delta playlists (EXT-X-SKIP) to cut refresh bandwidth.
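In a media playlist, these mechanisms surface as a handful of tags — the durations and URIs below are illustrative:

```
#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,CAN-SKIP-UNTIL=12.0,PART-HOLD-BACK=1.0
#EXT-X-PART-INF:PART-TARGET=0.333
#EXT-X-PART:DURATION=0.333,URI="seg12.part0.m4s",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.333,URI="seg12.part1.m4s"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="seg12.part2.m4s"
```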

The beauty of the system is that LL-HLS is an optional add-on to the existing live pipeline. You enable low-latency components via LivePipelineComponents, and the pipeline handles the rest — partial segments, preload hints, server control, everything is coordinated automatically.

Multi-destination Push

A live stream is useless if it stays on your disk. SegmentPusher sends segments to one or more destinations as they are produced. HTTPPusher does HTTP PUT to any endpoint — a CDN, an Nginx server, an S3 bucket.

For more advanced cases, the architecture is open by design. The RTMPTransport, SRTTransport, and IcecastTransport protocols are defined in HLSKit, but concrete implementations are provided by separate libs — swift-rtmp-kit, swift-srt-kit, swift-icecast-kit. It's clean dependency injection: HLSKit stays zero-dependency, your app imports the transports it needs.

MultiDestinationPusher handles parallel delivery. BandwidthMonitor tracks actual bandwidth per destination. If one destination fails, the others continue — no single point of failure.

Real-time Metadata

A live stream without metadata is a blind pipe. HLSKit injects metadata into the stream during production — without interrupting the pipeline.

SCTE-35 for ad breaks: SCTE35Marker inserts EXT-X-CUE-OUT / EXT-X-CUE-IN signals at precise moments. DATE-RANGE for temporal events — chapter start, program change, news alert. ID3 for timed metadata — title, artist, album art synchronized with audio. Interstitials for HLS breaks — ad pause or inserted content with EXT-X-ASSET-URI and resume configuration.

Recording and Live-to-VOD

While a live stream is being broadcast, SimultaneousRecorder records everything in parallel — segments and metadata — to a local directory. When the stream stops, LiveToVODConverter transforms the recording into a standard VOD playlist with automatic chapters.

Chaptering is handled by AutoChapterGenerator — SCTE-35 cut points or DATE-RANGEs become chapters in the VOD version. Your 2-hour live stream becomes a chaptered VOD file in a single operation.

I-Frame Playlists

IFramePlaylistGenerator produces EXT-X-I-FRAMES-ONLY playlists — playlists containing only keyframes, used for trick play (fast forward, rewind) and thumbnail generation. ThumbnailExtractor extracts thumbnails at regular intervals for player visual timelines.

Audio Processing

The audio module offers five indispensable tools for professional broadcast streams.

AudioFormatConverter converts between formats: MP3 → M4A, WAV → AAC, FLAC → ALAC. LoudnessMeter measures integrated loudness in LUFS according to the EBU R128 standard — essential for broadcast compliance. SilenceDetector identifies silence ranges in an audio stream, useful for automatic chaptering or signal loss detection in live. ChannelMixer handles channel mixing — stereo to mono, 5.1 to stereo, automatic upmix. And AudioNormalizer applies loudness normalization to target standards (-16 LUFS for podcasts, -23 LUFS for broadcast EBU R128).

Spatial Audio, HDR and Hi-Res

HLSKit speaks the language of professional formats.

On the audio side: Dolby Atmos (via Dolby Digital Plus JOC), AC-3, E-AC-3, multi-channel 5.1 and 7.1.4, Hi-Res audio at 96 or 192 kHz, 24 or 32 bits, with ALAC and FLAC lossless. Generated HLS manifests include CHANNELS attributes and properly configured alternative audio renditions.

On the video side: HDR10, Dolby Vision, HLG, with VIDEO-RANGE signaling in manifests. Resolution support goes up to 8K, with CODECS attributes that precisely reflect the encoded profiles. Everything is transparent to the pipeline — you configure capabilities in LivePipelineConfiguration, and manifests are generated with the right attributes.

Live DRM

For protected live content, HLSKit implements FairPlay Streaming integration with per-segment key rotation. Each segment can have its own key, and EncryptionKey allows the player to download the key only once for an entire session. The DRM + LL-HLS combination is supported — partial segments inherit the key from their parent segment.

Accessibility

Accessibility is not a bonus — it's a legal obligation for many broadcasters. HLSKit generates CLOSED-CAPTIONS tags for CEA-608 and CEA-708, live WebVTT subtitle tracks, and audio description renditions. Produced manifests are compliant with Apple's accessibility requirements for App Store distribution.

Transport Contract v2

Major addition in 0.4.0: the live pipeline is now aware of its transport quality. Three protocols define the contract between HLSKit and the transport layers — QualityAwareTransport for real-time quality monitoring, AdaptiveBitrateTransport for ABR recommendations, and RecordingTransport for transport-side local recording.

TransportAwarePipelinePolicy configures the pipeline's behavior in response to transport signals: automatic bitrate adjustment, minimum quality threshold, and ABR responsiveness. TransportHealthDashboard aggregates health across all destinations in real time — healthy, degraded, and failed destination counts, with a worst-case overall grade.

Five quality levels (excellent, good, fair, poor, critical) are derived from the transport score (0.0 to 1.0). Three ABR responsiveness levels are available: conservative (3 consecutive recommendations before adjustment), responsive (2) and immediate (1). Companion transports — swift-rtmp-kit, swift-srt-kit, swift-icecast-kit — implement these protocols with their native metrics: RTT and packet loss for SRT (with SMPTE 2022 FEC and multi-link bonding), stream statistics for Icecast (with 6 authentication modes including digest, bearer and shoutcastV2), enhanced RTMP capabilities for RTMP. On the capture side, swift-capture-kit provides a transport-agnostic StreamingPipeline that feeds these transports with hardware-encoded audio and video.

MV-HEVC Spatial Video

Apple Vision Pro speaks a specific video language: MV-HEVC (Multi-View HEVC), a stereoscopic format where left and right views are encoded in a single HEVC stream with multiview extensions. HLSKit 0.4.0 natively supports this format — from HEVC sample packaging to HLS manifest signaling.

MVHEVCSampleProcessor extracts NAL units from an HEVC stream, identifies parameter sets (VPS, SPS, PPS), and parses SPS profiles. MVHEVCPackager creates fMP4 segments — init segment with spatial boxes, media segments with properly encapsulated samples. SpatialVideoConfiguration provides presets: visionProStandard (1080p stereo), visionProHighQuality (4K stereo), dolbyVisionStereo (4K Dolby Vision Profile 20).

IMSC1 Subtitles

IMSC1 (Internet Media Subtitles and Captions) is the W3C profile of TTML used for subtitles in professional HLS workflows. HLSKit 0.4.0 implements the complete pipeline: TTML XML parsing, rendering, and fMP4 segmentation for HLS delivery.

IMSC1Parser parses a TTML document and returns an IMSC1Document with typed subtitles (begin/end/text), percentage-positioned regions, and styles (font, size, color, alignment, outline). IMSC1Renderer serializes the document back to valid TTML. IMSC1Segmenter produces fMP4 segments — init segment with subtitle track metadata, media segments with encapsulated temporal cues.

The parser handles timecodes in HH:MM:SS.mmm and HH:MM:SS:FF formats. Parsing errors are typed: invalidXML, missingTTElement, invalidTimecode, missingLanguage — no silent crashes, no partial data.

Variable Substitution

HLS variable substitution (EXT-X-DEFINE) allows templating manifests without modifying them for each deployment. It's the standard mechanism for CDN templating — one source manifest, values injected at runtime.

HLSKit supports all three forms defined by the spec: NAME/VALUE (inline definition), IMPORT (import from a parent manifest), and QUERYPARAM (extraction from a URL parameter). The VariableResolver resolves {$variable} references in URIs and attributes, with a strict mode that detects undefined variables.
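In playlist form, with illustrative values — a NAME/VALUE definition plus a QUERYPARAM extraction, both referenced with the {$…} syntax:

```
#EXT-X-DEFINE:NAME="cdn-host",VALUE="media.example.com"
#EXT-X-DEFINE:QUERYPARAM="token"
#EXTINF:6.0,
https://{$cdn-host}/seg0.m4s?auth={$token}
```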

The HLS validator automatically checks variables: undefined references, duplicate definitions, IMPORT without multivariant context. Manifest generation includes EXT-X-DEFINE tags when VariableDefinition objects are attached to the playlist.

Video Projections

The REQ-VIDEO-LAYOUT tag signals to the HLS player the type of video content and its projection. It's essential for spatial video (Apple Vision Pro) and immersive video (360°, 180°).

VideoLayoutDescriptor combines a VideoChannelLayout (mono, stereo) and a VideoProjection (rectilinear, equirectangular, half-equirectangular, Apple Immersive Video). Presets cover common cases:

  • .stereo — CH-STEREO — Standard stereoscopic video

  • .mono — CH-MONO — Classic 2D video

  • .video360 — PROJ-EQUI — 360° equirectangular video

  • .immersive180 — CH-STEREO,PROJ-HEQU — 180° stereo for Apple Vision Pro

  • .appleImmersive — CH-STEREO,PROJ-AIV — Apple Immersive Video

Parsing and Generating Manifests

The parser reads any M3U8 manifest — master playlists, media playlists, and all HLS v7+ extensions including Low-Latency HLS. It returns a Manifest that is either .master(MasterPlaylist) or .media(MediaPlaylist). Every variant, every segment, every tag is modeled by a typed Swift struct.

For generation, two approaches. The imperative API if you're building playlists dynamically:
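A sketch under stated assumptions — MediaPlaylist, MediaSegment, and the .media case come from the models described above, while the initializer parameters and the ManifestGenerator name are guesses, not the verbatim API:

```swift
import HLSKit

// Illustrative only — initializer and generator names are assumptions.
var playlist = MediaPlaylist(version: 7, targetDuration: 6)
playlist.segments.append(MediaSegment(uri: "segment0.m4s", duration: 6.0))
playlist.segments.append(MediaSegment(uri: "segment1.m4s", duration: 6.0))
let m3u8 = try ManifestGenerator().generate(.media(playlist))
```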

And the @resultBuilder DSL if you prefer declarative syntax:
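A hedged sketch of what a @resultBuilder DSL of this kind typically looks like — the builder type and element names here are invented for illustration:

```swift
import HLSKit

// Illustrative only — builder and element names are assumptions.
let playlist = MediaPlaylistBuilder {
    TargetDuration(6)
    Segment("segment0.m4s", duration: 6.0)
    Segment("segment1.m4s", duration: 6.0)
    EndList()
}
```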

Both approaches produce spec-compliant M3U8. The generator handles formatting, tags, attributes, and all the serialization details that RFC 8216 mandates.

RFC 8216 Validation

A manifest can be syntactically valid but semantically wrong. HLSValidator checks playlists against two rule sets: RFC 8216 (the IETF standard) and Apple HLS rules (stricter on certain points). Each violation is classified as error, warning, or info.

The structured report tells you exactly what's wrong and why. No generic "invalid playlist" message — each rule has an identifier, a message, and a severity level. You know what's blocking before deploying, not after.

Segmentation

This is the core of the HLS pipeline: taking an MP4 file and splitting it into segments ready to be served. HLSKit offers two output formats — each with its own advantages.

Fragmented MP4 (fMP4) is the modern format recommended by Apple. Each segment is an independent MP4 fragment, preceded by an initialization segment (init.mp4) containing the track metadata. It's the most efficient format, the one used by modern CDNs.

MPEG-TS is the historical format. Each segment is self-contained with its own metadata — heavier, but compatible with absolutely all players, including the oldest ones.

A third mode, byte-range, allows splitting a file into logical segments without physically duplicating it. Segments are byte ranges within a single file — useful when storage is constrained.

In live, two additional segmenters handle real-time: AudioSegmenter for pure audio streams and VideoSegmenter for video. They consume encoded frames on the fly and produce segments as soon as the target duration is reached, without waiting for the stream to end.

The segmenter reads the MP4 file at the box level (the structural blocks of the ISO BMFF format), analyzes the sample tables to find optimal cut points (keyframes), and produces segments aligned on sample boundaries. No audio glitches, no missing frames — the cut is surgical.

For non-ISOBMFF formats (MP3, WAV, FLAC), HLSKit automatically detects the incompatibility and transcodes the file to M4A before segmenting — total transparency for the caller.

HEVC fMP4 & B-Frames (0.6.0)

The live fMP4 segmenter (CMAFWriter) now supports HEVC/H.265 natively. When the VideoConfig codec is .h265, the initialization segment generates an hev1 sample entry with an hvcC box conforming to ISO 14496-15 §8.3 — profile, tier, level, and the three NALU arrays (VPS, SPS, PPS) parsed automatically from the stream parameters.

B-frame support is added via compositionTimeOffset in EncodedFrame. Codecs that reorder frames (HEVC, H.264 High Profile) produce frames whose decode order differs from display order — the compositionTimeOffset (PTS − DTS) encodes this difference in each trun box of the fMP4 segment. When nil, PTS == DTS is assumed (no reordering).
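The arithmetic is simple enough to show in a self-contained sketch. The Frame type below is a stand-in for HLSKit's EncodedFrame, and the timestamps are invented (1024-tick frames, decode order I, P, B):

```swift
// Stand-in for the relevant fields of an encoded frame.
struct Frame {
    let dts: Int64   // decode timestamp, in timescale ticks
    let pts: Int64   // presentation timestamp, in timescale ticks

    // The value written into the trun box: PTS - DTS; nil means no reordering.
    var compositionTimeOffset: Int64? {
        pts == dts ? nil : pts - dts
    }
}

// Decode order I, P, B: the B-frame is decoded last but displayed between I and P.
let i = Frame(dts: 0,    pts: 1024)
let p = Frame(dts: 1024, pts: 3072)
let b = Frame(dts: 2048, pts: 2048)
```

Note that PTS is never smaller than DTS, so the offset is non-negative — the I- and P-frames carry the reordering delay, while the B-frame needs no offset at all.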

Duration accumulation in the live segmenter now uses integers (Int64 ticks + Int32 timescale) instead of Double. When the timescale is constant — which is the case 99% of the time in live — addition is done in integer ticks, and conversion to seconds only happens for comparison with the target duration. Result: no more floating-point drift that could cause TARGETDURATION to exceed the configured value after several hours of streaming.
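The idea in miniature — a sketch with invented types, not HLSKit's internal code: accumulate ticks as Int64 and convert to seconds only at the comparison point.

```swift
// Accumulate frame durations in integer ticks; convert to seconds only to compare.
struct DurationAccumulator {
    let timescale: Int32
    private(set) var ticks: Int64 = 0

    mutating func add(frameTicks: Int64) { ticks += frameTicks }

    var seconds: Double { Double(ticks) / Double(timescale) }
}

// 48 kHz audio, 1024-sample AAC frames: 90 minutes of frames, zero drift.
var acc = DurationAccumulator(timescale: 48_000)
for _ in 0..<253_125 { acc.add(frameTicks: 1024) }   // 253_125 × 1024 ticks
```

Summing the equivalent Double (1024.0 / 48_000.0) a quarter-million times accumulates rounding error; the integer sum is exact by construction.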

Async segmentTransform (0.5.0)

IncrementalSegmenter now accepts an async segmentTransform closure. Before 0.5.0, this closure was synchronous — forcing the use of locks (NSLock, @unchecked Sendable) for muxing operations that touch actors. The closure is now @Sendable (LiveSegment, [EncodedFrame]) async -> LiveSegment, allowing direct actor calls. Synchronous closures are still accepted — the migration is non-breaking.

Local Transcoding

Segmenting a file is good. But often, you also need to encode it — change the resolution, codec, bitrate. HLSKit offers two local transcoders, both conforming to the Transcoder protocol.

AppleTranscoder uses VideoToolbox, Apple's low-level framework for hardware encoding. It leverages your Mac's or iPhone's dedicated chips to encode H.264/HEVC much faster than software — with minimal power consumption. Available only on Apple platforms.

FFmpegTranscoder wraps the FFmpeg binary installed on the system. Cross-platform, Linux-compatible, supports virtually all existing codecs. It's the default server choice.

Both implement the same Transcoder protocol. Your calling code doesn't change — you can switch between Apple and FFmpeg by changing a single line. Quality presets (.p360, .p480, .p720, .p1080, .p2160, .audioOnly) are shared.

For multi-variant (adaptive bitrate), transcodeVariants() encodes to multiple qualities in a single pass and generates the master playlist automatically.

Automatic source content detection (video or audio-only) adjusts the preset accordingly — no manual configuration needed.

Cloud Transcoding

On a server, you have neither GPU nor FFmpeg. Installing FFmpeg on a minimal cloud instance is possible but not always desirable — it complicates deployment, consumes CPU, and doesn't scale.

ManagedTranscoder solves this problem by delegating transcoding to a cloud service. Cloudflare Stream, AWS MediaConvert, or Mux — you choose the provider, the lib handles everything: uploading the source file, creating the job, polling the status, downloading the result.

Most importantly: ManagedTranscoder implements the same Transcoder protocol as local transcoders. Your calling code doesn't know — and doesn't need to know — whether transcoding happens on your machine or in a datacenter 5,000 km away.

  • Cloudflare Stream — API token (Bearer) — Zero egress bandwidth cost, global CDN

  • AWS MediaConvert — Access key + secret (SigV4) — Enterprise, existing AWS infrastructure

  • Mux — Token ID + secret (Basic Auth) — Simplest API, automatic adaptive bitrate

Streaming Upload/Download

A 500 MB video file is not something you want to load into RAM at once. Streamed upload and download move data directly from disk, never holding the complete file in memory. A progress callback reports granular progress across five phases: upload (0-30%), job creation (30%), polling (30-80%), download (80-95%), complete (100%).

Job Lifecycle

Under the hood, each cloud transcoding operation follows a precise lifecycle: queued → processing → completed | failed | cancelled. ManagedTranscodingJob encapsulates this state with a jobID, an assetID, encoding progress, output URLs when complete, and an error message when not.

Polling is configurable: interval between checks (pollingInterval, default 5 seconds) and global timeout (timeout, default 1 hour). By default, cloud assets are deleted after download (cleanupAfterDownload = true) to avoid residual storage costs.

Encryption

HLS supports two encryption modes to protect content — and HLSKit implements both.

AES-128 encrypts each segment entirely with AES-128-CBC. It's the standard mode, supported by all players. Simple, robust, with possible key rotation per segment.

SAMPLE-AES is finer-grained: it encrypts at the individual sample level — NAL units for H.264 video, ADTS frames for AAC audio. The container remains readable (headers, metadata), only the media content is encrypted. This is the mode used by DRMs like FairPlay.

KeyManager generates cryptographically secure AES-128 keys and derives IVs according to RFC 8216 (segment sequence number in big-endian on 16 bytes). It also handles reading and writing key files — the 16-byte binary file that the player downloads to decrypt segments.
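The IV derivation rule is small enough to show directly. A self-contained sketch of the RFC 8216 rule — deriveIV is a hypothetical helper written for this article, not KeyManager's API:

```swift
// RFC 8216 §5.2: absent an explicit IV, the AES-128-CBC IV is the segment's
// media sequence number, big-endian, left-padded with zeros to 16 bytes.
func deriveIV(sequenceNumber: UInt64) -> [UInt8] {
    var iv = [UInt8](repeating: 0, count: 16)
    var n = sequenceNumber
    // Write the 8 bytes of the sequence number into positions 8...15.
    for index in stride(from: 15, through: 8, by: -1) {
        iv[index] = UInt8(n & 0xFF)
        n >>= 8
    }
    return iv
}

// Sequence number 7 → fifteen zero bytes followed by 0x07.
let iv = deriveIV(sequenceNumber: 7)
```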

MP4 Inspection

Before segmenting or transcoding a file, you need to know what it contains. MP4BoxReader parses the ISO BMFF container structure — the famous "boxes" (or "atoms" in QuickTime vocabulary) that organize an MP4 file.

MP4InfoParser extracts useful information: audio/video tracks, codecs, resolution, duration, bitrate, timescale. All without decoding the media — just reading the container metadata.

Going deeper, SampleTableParser reads the container's sample tables: the tables that map each sample (video frame, audio packet) to its position in the file, its timestamp, and its size. It's thanks to these tables that the segmenter knows exactly where to cut — on a keyframe, without breaking the timeline.

The CLI

10 commands for common HLS workflows, directly from the terminal. Handy for scripting, CI/CD, or simply testing quickly without writing code.

  • info — Inspect an MP4 or M3U8 file (tracks, codec, duration, segments)

  • segment — Split an MP4 into fMP4 or MPEG-TS segments

  • transcode — Transcode to one or more HLS variants

  • validate — Validate a manifest against RFC 8216 and Apple rules

  • encrypt — Encrypt HLS segments with AES-128 or SAMPLE-AES

  • manifest — Parse or generate M3U8 manifests (2 subcommands)

  • live — Full live pipeline: start, stop, stats, convert-to-vod, metadata

  • iframe — Generate I-Frame playlists for trick play and thumbnails

  • imsc1 — IMSC1 subtitle pipeline: parse, render, segment (3 subcommands)

  • mvhevc — MV-HEVC spatial video: package, info (2 subcommands)

Architecture

The lib is split into functional modules, each responsible for one stage of the pipeline. Zero external dependencies in the core HLSKit — only the CLI uses swift-argument-parser for argument parsing.

Each module can be used independently. Only need the parser? Import HLSKit and use ManifestParser. Need to segment without transcoding? MP4Segmenter is standalone. Need a live audio pipeline? Assemble AudioEncoder + AudioSegmenter + SlidingWindowPlaylist + HTTPPusher in a LivePipeline. Everything is tied together by two facades: HLSEngine for VOD, and LivePipeline for real-time.

Tests

5,165 tests across 617 suites — models, parser, generator, validator, segmenter, transcoder, encryption, container, transport, engine, encoder, live segmenter, live playlist, LL-HLS, push, metadata, recording, I-Frame, audio, spatial audio, HDR, DRM, accessibility, resilience, pipeline, subtitles, spatial video, CLI, end-to-end. 94.66% coverage. Zero XCTest — 100% Swift Testing (import Testing).

Showcase tests serve as executable documentation: every public API has at least one test showing how to use it. The code examples in DocC and in this article are taken from these tests — what's written here has been compiled and executed.

  • Model: Type conformances, Codable round-trip, HLS models

  • Parser: Master/media playlists, LL-HLS, byte-range, encryption tags

  • Generator: M3U8 output, builder DSL, tag writing

  • Validator: RFC 8216, Apple HLS rules, severity levels

  • Segmenter: fMP4, MPEG-TS, byte-range, config, playlist generation

  • Transcoder: Quality presets, Apple/FFmpeg/Managed availability, multi-variant

  • Encryption: AES-128, SAMPLE-AES, key management, round-trip

  • Container: MP4 box reading, sample tables, init/media segment writing

  • Transport: TS packets, PAT/PMT, PES, ADTS/AnnexB conversion

  • Engine: HLSEngine facade, segmentation, encryption, manifest operations

  • Encoder: Real-time AAC AudioEncoder, H.264/HEVC VideoEncoder

  • Live Segmenter: AudioSegmenter, VideoSegmenter, CMAFWriter

  • Live Playlist: SlidingWindowPlaylist, EventPlaylist, DVRBuffer

  • Low-Latency: LLHLSManager, BlockingPlaylistHandler, DeltaUpdateGenerator

  • Push: HTTPPusher, MultiDestinationPusher, BandwidthMonitor

  • Metadata: SCTE-35, DATE-RANGE, ID3, interstitials

  • Recording: SimultaneousRecorder, LiveToVODConverter, AutoChapterGenerator

  • I-Frame: IFramePlaylistGenerator, ThumbnailExtractor

  • Audio: AudioFormatConverter, LoudnessMeter, SilenceDetector

  • Spatial Audio: Dolby Atmos, multi-channel, Hi-Res, renditions

  • HDR: HDR10, Dolby Vision, HLG, VIDEO-RANGE, ultra-resolution

  • DRM: FairPlay live, key rotation, session keys

  • Accessibility: CEA-608/708, WebVTT, audio description

  • Resilience: Redundant streams, content steering, gap signaling

  • Subtitles: IMSC1Parser, IMSC1Renderer, IMSC1Segmenter, TTML round-trip

  • Spatial: MVHEVCPackager, MVHEVCSampleProcessor, SpatialVideoConfiguration

  • Pipeline: LivePipeline, components, presets, statistics, transport monitoring

  • Showcase: Public API demonstrations (executable documentation)

  • CLI: 10 commands, argument parsing, integration

  • EndToEnd: Cross-feature integration scenarios

Installation

Requirements

  • Swift 6.2+ with strict concurrency

  • Lib: macOS 14+, iOS 17+, tvOS 17+, watchOS 10+, visionOS 1+

  • CLI: macOS 14+, Linux (Ubuntu 22.04+)

  • Zero external dependencies in the core lib (swift-argument-parser for CLI only)

Swift Package Manager
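The repository lives at atelier-socle/swift-hls-kit on GitHub. A manifest along these lines should work — the HLSKit product name matches the import used throughout this article, and the version pin follows the 0.4.0 release; adjust both as needed:

```swift
// swift-tools-version:6.2
import PackageDescription

let package = Package(
    name: "MyApp",
    platforms: [.macOS(.v14)],
    dependencies: [
        // URL from the project's GitHub repository; pin the version you target.
        .package(url: "https://github.com/atelier-socle/swift-hls-kit", from: "0.4.0")
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            dependencies: [.product(name: "HLSKit", package: "swift-hls-kit")]
        )
    ]
)
```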

Documentation

Complete documentation is available in DocC, integrated with the package. 33 guides cover each pipeline stage — VOD, live, spatial video, subtitles, transport v2 — with executable examples.

  • Getting Started: Installation, first workflow, builder DSL, live example

  • Manifest Parsing: ManifestParser, TagParser, AttributeParser, error handling

  • Manifest Generation: ManifestGenerator, TagWriter, builder DSL, LL-HLS models

  • Validating Manifests: HLSValidator, rule sets, severity levels, reports

  • Segmenting Media: MP4Segmenter, TSSegmenter, byte-range, auto-transcode non-ISOBMFF

  • Transcoding Media: Quality presets, Apple/FFmpeg transcoders, multi-variant, auto detection

  • Cloud Transcoding: ManagedTranscoder, Cloudflare/AWS/Mux providers, streaming upload

  • Encrypting Segments: AES-128, SAMPLE-AES, KeyManager, key rotation

  • HLSEngine: High-level facade for end-to-end VOD workflows

  • CLI Reference: 10 commands with options, examples, JSON config

  • Live Streaming: Live pipeline overview, architecture, use cases

  • Live Encoding: MediaSource, AudioEncoder, VideoEncoder, MultiBitrateEncoder

  • Live Segmentation: LiveSegmenter, AudioSegmenter, VideoSegmenter, CMAFWriter

  • Live Playlists: LivePlaylistManager, DVRBuffer, sliding window, event playlist

  • Low-Latency HLS: LLHLSManager, BlockingPlaylistHandler, DeltaUpdateGenerator, partial segments

  • Segment Pushing: HTTPPusher, multi-destination, transport DI (RTMP/SRT/Icecast)

  • Live Metadata: SCTE-35, DATE-RANGE, ID3, interstitials, real-time injection

  • Live Recording: SimultaneousRecorder, live-to-VOD, automatic chaptering

  • I-Frame Playlists: IFramePlaylistGenerator, ThumbnailExtractor, trick play

  • Audio Processing: Format conversion, LUFS loudness, silence detection

  • Spatial Audio: Dolby Atmos, AC-3, multi-channel, Hi-Res 96/192 kHz

  • HDR Video: HDR10, Dolby Vision, HLG, VIDEO-RANGE, 4K/8K

  • Live DRM: FairPlay live, key rotation, session keys

  • Accessibility & Resilience: CEA-608/708, WebVTT, failover, gap signaling

  • Live Presets: LivePipeline presets, configuration, statistics, lifecycle

  • Transport Contracts v2: Quality monitoring, ABR, RTMP/SRT/Icecast v2

  • Transport-Aware Pipeline: Pipeline integration with transport quality

  • Variable Substitution: EXT-X-DEFINE, CDN templating, validation

  • IMSC1 Subtitles Guide: TTML parse, render, fMP4 segmentation

  • Spatial Video Guide: MV-HEVC packaging for Apple Vision Pro

  • Video Projection Specifiers: REQ-VIDEO-LAYOUT, 360°, Apple Immersive

  • Testing Guide: Test suite, mock server, CLI scenarios
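The Variable Substitution guide covers EXT-X-DEFINE; the core mechanism is small enough to sketch here — a standalone illustration of the HLS second-edition (RFC 8216bis) behavior, not HLSKit's implementation:

```swift
import Foundation

// EXT-X-DEFINE variable substitution: NAME/VALUE pairs declared in the
// playlist replace {$name} references in the lines that follow, which is
// what makes CDN templating possible without rewriting segment URIs.
func substituteVariables(in playlist: String) -> String {
    var variables: [String: String] = [:]
    var output: [String] = []
    for line in playlist.split(separator: "\n", omittingEmptySubsequences: false) {
        var text = String(line)
        if text.hasPrefix("#EXT-X-DEFINE:") {
            // Very small attribute scan: extract NAME="…" and VALUE="…".
            let attrs = text.dropFirst("#EXT-X-DEFINE:".count)
            var name: String?, value: String?
            for pair in attrs.split(separator: ",") {
                let parts = pair.split(separator: "=", maxSplits: 1)
                guard parts.count == 2 else { continue }
                let raw = parts[1].trimmingCharacters(in: CharacterSet(charactersIn: "\""))
                if parts[0] == "NAME" { name = raw }
                if parts[0] == "VALUE" { value = raw }
            }
            if let name, let value { variables[name] = value }
        } else {
            // Replace every {$name} reference with its defined value.
            for (name, value) in variables {
                text = text.replacingOccurrences(of: "{$\(name)}", with: value)
            }
        }
        output.append(text)
    }
    return output.joined(separator: "\n")
}

let source = """
#EXTM3U
#EXT-X-DEFINE:NAME="cdn",VALUE="https://edge1.example.com"
#EXTINF:6.0,
{$cdn}/segment0.m4s
"""
let resolved = substituteVariables(in: source)
print(resolved)
```

Swapping the VALUE for another edge host retargets every segment URI in one place — the source playlist never changes.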

References

The specifications and standards HLSKit builds upon — 31 in total.

Under the Hood

  • Swift 6.2 — Strict concurrency, all types Sendable, cross-platform thread-safe with LockedState<T>

  • 5,165 tests — 617 suites, 94.66% overall coverage, 100% Swift Testing

  • 33 DocC articles — Complete documentation with executable examples for VOD, live, spatial video and subtitles

  • 6 platforms — macOS, iOS, tvOS, watchOS, visionOS, Linux

  • ~30 modules — Parsing, generation, validation, segmentation, transcoding, encryption, live pipeline, IMSC1 subtitles, MV-HEVC spatial video, transport v2

  • 31 industry standards — RFC 8216, LL-HLS, CMAF, SCTE-35, Dolby Atmos, HDR10, FairPlay, CEA-608/708, EBU R128, W3C TTML/IMSC1…

  • Zero dependencies — Pure Swift + Foundation in the core

  • Apache 2.0 — Permissive open-source license with SPDX headers

Links

GitHub - atelier-socle/swift-hls-kit: Enterprise-grade pure Swift HLS library — parse, segment, transcode, encrypt, stream live with LL-HLS, MV-HEVC spatial video, IMSC1 subtitles & transport-aware ABR. Cross-platform, RFC 8216 compliant
