The AirPlay 2 sender communicates with receivers through RTSP, as defined in RFC 2326. However, Apple's RTSP implementation has been customized to behave like both an RTSP and an HTTP server.
The request line of an HTTP-like request still ends with RTSP/1.0, e.g. GET /info RTSP/1.0.
An RTSP HTTP-like request can have the following headers:
GET /info RTSP/1.0
X-Apple-ProtocolVersion: 1
Content-Length: 70
Content-Type: application/x-apple-binary-plist
CSeq: 0
DACP-ID: 93842EE68464F5B9
Active-Remote: 1149412352
User-Agent: AirPlay/409.16
A standard RTSP request, by contrast, can have the following headers:
SETUP rtsp://192.168.1.12/15566703517752576217 RTSP/1.0
Content-Length: 819
Content-Type: application/x-apple-binary-plist
CSeq: 1
DACP-ID: 93842EE68464F5B9
Active-Remote: 1149412352
User-Agent: AirPlay/409.16
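The two request shapes above can be told apart while parsing. The following is a minimal sketch, not Apple's implementation: it assumes CRLF line endings as in RFC 2326, and the `is_http_like` heuristic (a request target starting with `/`) is an assumption drawn only from the two examples shown here. Both function names are hypothetical.

```python
def parse_rtsp_request(raw: bytes):
    """Split a raw request into (method, uri, version, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    method, uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are stored lowercased so lookups are case-insensitive.
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers, body


def is_http_like(uri: str) -> bool:
    """Assumption: HTTP-like requests use a plain path, standard ones an rtsp:// URI."""
    return uri.startswith("/")


raw = (
    b"GET /info RTSP/1.0\r\n"
    b"X-Apple-ProtocolVersion: 1\r\n"
    b"CSeq: 0\r\n"
    b"User-Agent: AirPlay/409.16\r\n"
    b"\r\n"
)
method, uri, version, headers, body = parse_rtsp_request(raw)
```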
Header | Description |
---|---|
X-Apple-ProtocolVersion | Protocol version |
Content-Length | Length of the content/body after the headers |
Content-Type | Type of content |
CSeq | Specifies the sequence number for an RTSP request/response pair. Receiver must reply with the same CSeq. Incremented by one in every request. |
DACP-ID | 64-bit value identifying the DACP server (remote control of the sender) |
Active-Remote | Authentication token for the DACP server (remote control of the sender) |
RTP-Info | Sent with FLUSH for RTP synchronization |
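Since the receiver must reply with the same CSeq it received, a response builder can take the sequence number as its first argument. This is a hedged sketch: `build_response` is a hypothetical helper, and the choice to emit Content-Type and Content-Length only when a body is present is an assumption, not something the capture above confirms.

```python
def build_response(cseq: str, body: bytes = b"", content_type: str = "") -> bytes:
    """Build a 200 OK response that echoes the CSeq of the request."""
    lines = ["RTSP/1.0 200 OK", f"CSeq: {cseq}"]
    if body:
        # Assumption: body headers are only sent when there is a body.
        lines.append(f"Content-Type: {content_type}")
        lines.append(f"Content-Length: {len(body)}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + body


reply = build_response("1")
reply_with_body = build_response("2", b"abc", "text/parameters")
```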
The receiver must implement the endpoints described in the following subsections. Their behavior may differ depending on the content type.
GET /info
Sender needs info from the receiver
Content-type: application/x-apple-binary-plist
The sender asks for specific information. The request body is encoded as a binary plist. The sender usually opens the communication by requesting the txtAirPlay qualifier of the receiver with the following binary plist:
{'qualifier': ['txtAirPlay']}
This request actually asks for the TXT record of the _airplay._tcp mDNS service as a binary plist. This also suggests that a receiver can advertise a minimal TXT mDNS record and then declare further information at this stage.
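For experimentation, the qualifier request body can be reproduced with Python's standard `plistlib` module. This is only a sketch of the encoding step; how the receiver lays out its TXT record in the reply is not covered here.

```python
import plistlib

# Encode the qualifier request shown above as a binary plist.
request_body = plistlib.dumps({"qualifier": ["txtAirPlay"]}, fmt=plistlib.FMT_BINARY)

# A receiver would decode the body the same way before answering.
decoded = plistlib.loads(request_body)
```

Binary plists start with the magic bytes `bplist00`, which is a quick way to recognize them in a packet capture.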
No content type
The sender sends a second /info request to ask for additional information. The request has no body. The receiver replies with a binary plist which may contain the following key-value entries:
Key | Type | Description |
---|---|---|
initialVolume | Integer | A value from -144 to 0 corresponding to the initial volume of the receiver [dB] |
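A receiver could build this reply as sketched below. `build_info_reply` is a hypothetical helper; clamping out-of-range values to the documented -144 to 0 dB range is a design assumption rather than observed behavior.

```python
import plistlib


def build_info_reply(initial_volume_db: int) -> bytes:
    """Encode the reply to the body-less /info request as a binary plist."""
    # Assumption: out-of-range values are clamped to the documented range.
    clamped = max(-144, min(0, int(initial_volume_db)))
    return plistlib.dumps({"initialVolume": clamped}, fmt=plistlib.FMT_BINARY)


reply_body = build_info_reply(-20)
```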
POST /auth-setup
POST /fp-setup
POST /pair-setup
POST /pair-verify
POST /command
POST /feedback
Probably a heartbeat to ensure the receiver is alive. Sent until the receiver is disconnected.
POST /audioMode
SETUP
This is the setup request used to establish communication with the receiver and to configure the timing, event, control and data channels between the two. The timing channel is used only with NTP synchronization and is not opened when using PTP.
1) SETUP info and event
The sender communicates generic information about the device, the timing protocol, the timing peers and values related to encryption: ekey, eiv and et (encryption type). timingPeerInfo and timingPeerList are needed if the receiver supports PTP and the sender announces timingProtocol=PTP.
The receiver sets up an event channel (TCP) and communicates its port, together with its timing info. If the receiver declares PTP time synchronization, then timingPort won't be used. If sender and receiver use NTP instead, the receiver must open a timing channel and declare its port in timingPort.
The event channel must be open or the RTSP session won't continue.
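The port negotiation of this first SETUP can be sketched as follows. Only `timingPort` and `timingProtocol` come from the description above; the `eventPort` key name and the `build_setup_reply` helper are assumptions made for illustration, and the ports are passed in rather than opened so that the sketch stays free of real sockets.

```python
import plistlib


def build_setup_reply(request: dict, event_port: int, timing_port: int) -> bytes:
    """Build the receiver's reply to the first SETUP as a binary plist."""
    # "eventPort" is a hypothetical key name for the event channel port.
    reply = {"eventPort": event_port}
    # With PTP the timing channel is not used, so timingPort is left out.
    if request.get("timingProtocol") != "PTP":
        reply["timingPort"] = timing_port
    return plistlib.dumps(reply, fmt=plistlib.FMT_BINARY)


ptp_reply = plistlib.loads(build_setup_reply({"timingProtocol": "PTP"}, 7000, 7001))
ntp_reply = plistlib.loads(build_setup_reply({"timingProtocol": "NTP"}, 7000, 7001))
```

In a real receiver the event port would come from a listening TCP socket that is already open when the reply is sent, since the session does not continue without it.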
2) SETUP control and data
This SETUP request is sent as soon as audio streaming must start. The sender declares the audio format, latencies, its control port and the following parameters:

- audioFormat - the audio format;
- ct - compression type;
- shk - shared encryption key;
- spf - frames per packet.

The key type is used to declare the type of streaming.
Type | Description |
---|---|
96 | General audio - Real time |
103 | General audio - Buffered |
110 | Screen |
120 | Playback |
130 | Remote control |
The key ct stands for compression type.
Compression type | Description |
---|---|
1 | LPCM (Linear Pulse Code Modulation) |
2 | ALAC (Apple Lossless) |
4 | AAC (Advanced Audio Coding) |
8 | AAC ELD (Enhanced Low Delay) |
32 | Opus [1] |
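The two tables above translate directly into lookup dictionaries. The sketch below assumes a stream is described by a flat dictionary holding the type, ct and spf keys; that layout, the `describe_stream` helper and the sample value of 352 frames per packet are illustrative assumptions.

```python
STREAM_TYPES = {
    96: "General audio - Real time",
    103: "General audio - Buffered",
    110: "Screen",
    120: "Playback",
    130: "Remote control",
}

COMPRESSION_TYPES = {
    1: "LPCM",
    2: "ALAC",
    4: "AAC",
    8: "AAC ELD",
    32: "Opus",
}


def describe_stream(stream: dict) -> str:
    """Return a human-readable summary of a stream description."""
    kind = STREAM_TYPES.get(stream.get("type"), "unknown type")
    codec = COMPRESSION_TYPES.get(stream.get("ct"), "unknown compression")
    return f"{kind}, {codec}, {stream.get('spf')} frames per packet"


summary = describe_stream({"type": 96, "ct": 2, "spf": 352})
```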
The receiver prepares its control and data channels (UDP or TCP depending on the type) and communicates the respective ports in controlPort and dataPort. The control channel will receive RTCP [2] packets, while the data channel will carry the actual streaming payload over RTP [3].
The value of audioFormat is encoded as described in section Audio codecs.
SET_PARAMETER
Used to set parameters on the receiver or to send it information, depending on Content-Type.
Content-Type | Body | Description |
---|---|---|
text/parameters | "volume: N" | Volume to set on the receiver |
text/parameters | "progress: X/Y/Z" | Progress of the current track (start/current/end) |
image/jpeg | data | JPEG image of the artwork |
application/x-dmap-tagged | data | Now playing info using DAAP |
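The text/parameters bodies are simple "name: value" lines, so a receiver can parse them as sketched below. Treating the volume as a float and the three progress fields as integers is an assumption, as are the helper names.

```python
def parse_text_parameters(body: bytes) -> dict:
    """Parse a text/parameters body into a name -> value dictionary."""
    params = {}
    for line in body.decode("utf-8").splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        name, _, value = line.partition(":")
        params[name.strip()] = value.strip()
    return params


def parse_progress(value: str):
    """Split "X/Y/Z" into (start, current, end); integer fields are assumed."""
    start, current, end = (int(part) for part in value.split("/"))
    return start, current, end


params = parse_text_parameters(b"volume: -20.5\r\nprogress: 100/200/900\r\n")
volume = float(params["volume"])
progress = parse_progress(params["progress"])
```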
GET_PARAMETER
Get a parameter from the receiver.
SETPEERS
Used to tell the receiver about the other receivers in the multi-room group. The body is a binary plist containing a list of IPv4 and IPv6 addresses of the devices in the group.
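A receiver could split the peer list by address family with the standard `ipaddress` module. The sketch assumes the plist root is an array of address strings, which is an interpretation of the description above; `parse_peers` and the sample addresses are hypothetical.

```python
import ipaddress
import plistlib


def parse_peers(body: bytes):
    """Decode a SETPEERS body and return (ipv4_list, ipv6_list)."""
    v4, v6 = [], []
    # Assumption: the plist root is an array of address strings.
    for text in plistlib.loads(body):
        addr = ipaddress.ip_address(text)
        (v4 if addr.version == 4 else v6).append(str(addr))
    return v4, v6


body = plistlib.dumps(["192.168.1.12", "fe80::1"], fmt=plistlib.FMT_BINARY)
v4, v6 = parse_peers(body)
```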
RECORD
The sender wants to start streaming.
FLUSH
Sent every time the audio streaming is about to start. The RTSP request includes the header RTP-Info: seq=X;rtptime=Y. X is the first RTP sequence number and Y the first RTP timestamp.
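Extracting the two values from the RTP-Info header takes only a few lines. This is a sketch with a hypothetical `parse_rtp_info` helper and made-up sample numbers.

```python
def parse_rtp_info(value: str):
    """Parse "seq=X;rtptime=Y" into (first_sequence_number, first_timestamp)."""
    fields = dict(
        item.strip().split("=", 1) for item in value.split(";") if "=" in item
    )
    return int(fields["seq"]), int(fields["rtptime"])


seq, rtptime = parse_rtp_info("seq=1234;rtptime=567890")
```

In RTP the sequence number is a 16-bit field and the timestamp a 32-bit field (RFC 3550), so a receiver comparing later packets against these values has to account for wraparound.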
TEARDOWN
Sent when audio is paused or AirPlay is stopped. The body is a binary plist containing the active streams if audio is paused, or is empty if AirPlay is disconnected.
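The pause/disconnect distinction can be sketched as a small classifier. The `streams` key name is an assumption (the description only says the plist contains the active streams), and the sketch treats both an empty body and a plist without streams as a disconnect.

```python
import plistlib


def classify_teardown(body: bytes) -> str:
    """Return "pause" if the body lists active streams, else "disconnect"."""
    if not body:
        return "disconnect"
    payload = plistlib.loads(body)
    # "streams" is a hypothetical key name for the active streams list.
    return "pause" if payload.get("streams") else "disconnect"


pause_body = plistlib.dumps({"streams": [{"type": 96}]}, fmt=plistlib.FMT_BINARY)
empty_plist = plistlib.dumps({}, fmt=plistlib.FMT_BINARY)
```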
[1] Definition of the Opus Audio Codec - https://tools.ietf.org/html/rfc6716
[2] RTP: A Transport Protocol for Real-Time Applications - SR: Sender Report RTCP Packet - https://tools.ietf.org/html/rfc3550#section-6.4.1
[3] RTP: A Transport Protocol for Real-Time Applications - https://tools.ietf.org/html/rfc3550