Security implications of WebRTC (part 2): End to end encryption. Well, no…

The problem

Some time ago I was watching an episode of the VoIP Users Conference about the Jitsi WebRTC video bridge. The video bridge automagically maximizes the video of the currently active speaker without processing the audio of each participant. Instead, it uses an RTP header extension (RFC 6464) to learn the audio level of each participant.
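
To illustrate why this is attractive for a bridge, here is a minimal sketch of active-speaker selection based only on RFC 6464 levels. It is not the Jitsi algorithm. It assumes the bridge has already collected the last few level values per participant. These values range from 0 to 127 in -dBov, so a smaller number means a louder signal. The function name `pickActiveSpeaker` is made up for this example.

```javascript
// Hypothetical sketch: pick the "active speaker" from RFC 6464 audio levels
// alone, without decoding any audio. Levels are in -dBov (0..127), so a
// SMALLER number means a LOUDER signal (127 = digital silence).
function pickActiveSpeaker(recentLevels) {
  let best = null;
  let bestAvg = Infinity;
  for (const [participant, levels] of Object.entries(recentLevels)) {
    if (levels.length === 0) continue; // no audio packets seen yet
    // Average over a short window so one loud packet doesn't switch the view.
    const avg = levels.reduce((sum, l) => sum + l, 0) / levels.length;
    if (avg < bestAvg) {
      bestAvg = avg;
      best = participant;
    }
  }
  return best; // null if nobody has sent audio yet
}

// Example: bob has the smallest -dBov values on average, i.e. is loudest.
console.log(pickActiveSpeaker({ alice: [127, 127, 120], bob: [30, 35, 40], carol: [90, 80, 85] }));
// prints: bob
```

Averaging over a short window is one simple way to avoid flapping between speakers. A real bridge would probably add some hysteresis on top.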

WebRTC media streams are encrypted with SRTP, and the SRTP keys are negotiated end-to-end with DTLS-SRTP. This ensures that any kind of man-in-the-middle attack can be detected (as long as the certificate fingerprints exchanged during signaling are authentic), so the two WebRTC endpoints can be sure that nobody is listening in on their conversation.

This is great, except that SRTP only encrypts the payload of RTP packets. The RTP header, including all RTP header extensions such as the RFC 6464 audio level, is NOT encrypted. This means that anybody who can see your SRTP packets (e.g. your ISP or “some three letter agency recording the whole internet with a datacenter in Utah”) knows when you are speaking, when you are silent, and even whether you have muted your microphone. Considering what can be done with traditional telephony metadata alone, this is a bit scary.
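
To see how much of each packet is exposed, here is a minimal sketch that computes the length of the RTP header for a packet held in a `Uint8Array`. It follows RFC 3550: a 12-byte fixed header, then the CSRC list, then the optional header extension. With plain SRTP, every byte before that offset travels unencrypted. The function name is hypothetical. The sketch ignores RTP padding and the SRTP authentication tag at the end of the packet.

```javascript
// Sketch: length of the RTP header (fixed header + CSRC list + header
// extension). With SRTP, these leading bytes are sent in the clear; only
// the payload that follows them is encrypted.
function rtpHeaderLength(packet) {
  if (packet.length < 12) throw new Error("too short for an RTP header");
  const cc = packet[0] & 0x0f;                    // CSRC count (low 4 bits)
  const hasExtension = (packet[0] & 0x10) !== 0;  // X bit
  let len = 12 + 4 * cc;                          // fixed header + CSRC identifiers
  if (hasExtension) {
    if (packet.length < len + 4) throw new Error("truncated extension header");
    // 16-bit profile, then 16-bit length in 32-bit words (not counting these 4 bytes)
    const words = (packet[len + 2] << 8) | packet[len + 3];
    len += 4 + 4 * words;
  }
  if (len > packet.length) throw new Error("header longer than packet");
  return len;
}
```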

Chrome enables the “ssrc-audio-level” RTP header extension by default for every call (it inserts “a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level” into the SDP offer), although Chrome itself does not use this data from received SRTP packets (__insert_conspiracy_theory_here__).
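
A small sketch like the following can check whether a given SDP negotiates the extension, and under which ID. It assumes the `a=extmap:<id>[/<direction>] <URI>` syntax from RFC 5285. The helper name `findAudioLevelExtId` is hypothetical.

```javascript
// Sketch: return the extmap ID under which an SDP negotiates the RFC 6464
// audio level extension, or null if it is not present.
function findAudioLevelExtId(sdp) {
  const uri = "urn:ietf:params:rtp-hdrext:ssrc-audio-level";
  for (const line of sdp.split(/\r?\n/)) {
    // a=extmap:<value>["/"<direction>] <URI> <extensionattributes>
    const m = /^a=extmap:(\d+)(?:\/\w+)? (\S+)/.exec(line);
    if (m && m[2] === uri) return Number(m[1]);
  }
  return null;
}
```

For the Chrome offer quoted above, this returns 1.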

See for yourself

To verify my crazy claims (remember, I am just somebody on the internet!), just use one of the gazillion WebRTC demos and capture the packets with Wireshark. Share only your microphone, so it's easier to find the SRTP audio session. Then tell Wireshark to decode those UDP packets as RTP (“Decode As…”) and have a look:


The first 2 of the 4 marked bytes (the other 2 are just padding) are 0x10 and 0xff. In 0x10, the high 4 bits are the extension ID (1) and the low 4 bits are the length of the extension data minus one (0, i.e. one data byte). In 0xff, the least significant 7 bits are the audio level expressed in -dBov, and the most significant bit is the voice activity flag. The audio level in this example is -127 dBov, the level of a digitally muted source (I muted my microphone).
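
The same decoding can be done in code. This minimal sketch assumes you already have the bytes that follow the 0xBEDE profile marker and length field in a `Uint8Array`. That layout is the one-byte header format of RFC 5285. The function names are hypothetical.

```javascript
// Sketch: decode RFC 5285 one-byte header extension elements (profile 0xBEDE).
// `ext` holds the extension data AFTER the 0xBEDE profile and length fields.
function parseOneByteExtensions(ext) {
  const elements = [];
  let i = 0;
  while (i < ext.length) {
    const b = ext[i];
    if (b === 0) { i++; continue; }          // padding byte
    const id = b >> 4;                       // high 4 bits: local ID
    const len = (b & 0x0f) + 1;              // low 4 bits: data length minus one
    if (id === 15) break;                    // ID 15 is reserved: stop parsing
    if (i + 1 + len > ext.length) break;     // truncated element
    elements.push({ id, data: ext.subarray(i + 1, i + 1 + len) });
    i += 1 + len;
  }
  return elements;
}

// Sketch: interpret an RFC 6464 audio level byte.
function decodeAudioLevel(byte) {
  return {
    voiceActivity: (byte & 0x80) !== 0,      // V flag (most significant bit)
    dBov: 0 - (byte & 0x7f),                 // low 7 bits: level in -dBov
  };
}
```

Given the four marked bytes from the capture (0x10 0xff 0x00 0x00), `parseOneByteExtensions` returns a single element with ID 1 and data byte 0xff. `decodeAudioLevel(0xff)` then reports a level of -127 dBov with the V flag set.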

How to fix this?

There is an RFC for encrypting RTP header extensions (RFC 6904). Chrome should implement it, or at least not enable RFC 6464 by default. I have created an issue for this on the WebRTC Google Code project (issue 3411).
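
For reference, RFC 6904 signals an encrypted header extension by putting the `urn:ietf:params:rtp-hdrext:encrypt` URI in front of the original URI on the `a=extmap` line. The sketch below rewrites the audio level line into that form. It only illustrates the negotiation syntax. Chrome did not implement RFC 6904 at the time of writing, and the rewrite only has an effect if both endpoints support RFC 6904. The function name `requestEncryptedAudioLevel` is hypothetical.

```javascript
// Hypothetical sketch: ask for the audio level extension to be sent encrypted
// (RFC 6904) instead of in the clear. Rewritten lines have the form
//   a=extmap:<id> urn:ietf:params:rtp-hdrext:encrypt <original URI> [attributes]
// Only meaningful if BOTH endpoints implement RFC 6904.
function requestEncryptedAudioLevel(sdp) {
  const uri = "urn:ietf:params:rtp-hdrext:ssrc-audio-level";
  const eol = sdp.includes("\r\n") ? "\r\n" : "\n";
  return sdp
    .split(eol)
    .map((line) => {
      const m = /^(a=extmap:\d+(?:\/\w+)?) (\S+)(.*)$/.exec(line);
      if (!m || m[2] !== uri) return line; // leave every other line untouched
      return `${m[1]} urn:ietf:params:rtp-hdrext:encrypt ${m[2]}${m[3]}`;
    })
    .join(eol);
}
```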

Given the success I had reporting my STUN gun (issue 2172, still open after more than a year...), I suggest that WebRTC developers fix it today with a bit of SDP mangling in JavaScript. Whenever you get an SDP description, feed it through a function that removes the offending RTP header extension:

/* Remove every SDP line that negotiates the RFC 6464 audio level extension */
function processSdp(sdp_str) {
  var sdp = [];
  var lines = sdp_str.split("\r\n");
  for (var i = 0; i < lines.length; i++) {
    if (lines[i].indexOf("urn:ietf:params:rtp-hdrext:ssrc-audio-level") != -1) {
      /* drop it like it's hot */
    } else {
      /* keep the rest */
      sdp.push(lines[i]);
    }
  }
  return sdp.join("\r\n");
}
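
The same filter can be wrapped so that it works on the `{ type, sdp }` description objects that `RTCPeerConnection` hands out and accepts. It can then be applied to local and remote descriptions alike. This is a minimal sketch. `withoutAudioLevel` and the signaling helper in the comments are hypothetical. Browsers may also restrict which SDP modifications they accept in `setLocalDescription`.

```javascript
// Sketch: strip the RFC 6464 audio level extension from a session
// description object and return a new { type, sdp } object.
function withoutAudioLevel(description) {
  const kept = description.sdp
    .split("\r\n")
    .filter((line) => !line.includes("urn:ietf:params:rtp-hdrext:ssrc-audio-level"));
  return { type: description.type, sdp: kept.join("\r\n") };
}

// Usage in a browser (not runnable outside one):
//   // as the offerer
//   const offer = await pc.createOffer();
//   await pc.setLocalDescription(withoutAudioLevel(offer));
//   sendToPeer(pc.localDescription);            // hypothetical signaling helper
//
//   // as the answerer, filter the incoming offer as well
//   await pc.setRemoteDescription(withoutAudioLevel(remoteOffer));
//   const answer = await pc.createAnswer();
//   await pc.setLocalDescription(withoutAudioLevel(answer));
```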



3 thoughts on “Security implications of WebRTC (part 2): End to end encryption. Well, no…”

  • Justin Uberti

    We have this enabled at the moment, mainly because it is useful for many apps (e.g. MCUs), and the WG hasn’t agreed on a mechanism to turn it on or off. But the final state will be that we will support RFC 6904 (with optional disabling for apps that need it unencrypted), as well as padding of the codec stream to prevent similar information leakage from the codec bitrate when this header info is being encrypted. In the meantime, apps that are concerned about this can use the workaround that you suggest.

    There is a need to support unencrypted energy levels because there are important use cases where having this unencrypted is beneficial for security (e.g. an MCU that can’t decode the media, but needs to know the energy for switching).

    Regarding the STUN gun, that issue has been resolved by adding a STUN rate limiter in Chrome many revisions back.

  • kapejod (post author)

    My workaround only helps with the unencrypted audio levels. A less detailed information leak remains when using VAD/CNG or VBR codecs (with Opus you can even find out whether the microphone is muted). That only leaves G.711 for secure communication. :(

    Regarding your argument about MCUs, I tend to disagree. That argument was valid when SDES was still available. With DTLS-SRTP now mandatory, how could a relaying MCU work without decrypting the streams?

    The STUN rate limiter in Chrome limits STUN traffic to 2 Mbit/s per page. It does not limit individual PeerConnections.
    With a single PeerConnection I can still generate the full 2 Mbit/s, which is 16 times what you could achieve with RFC-compliant ICE credentials.
    That’s still quite usable for DDoS. I have updated my proof-of-concept STUN gun and put it up on GitHub.

  • Justin Uberti

    There are ways to set up multiway DTLS associations such that the MCU doesn’t have access to the DTLS encryption keys. This concept is being actively explored by several MCU providers.

    Thanks for the update on the STUN gun. I am surprised the rate limiter didn’t stop this, but as you have noticed we eventually fixed this issue.