Looking at CDN and Infrastructure Privacy Policies by Pierre Far, Founder at Blockmetry

A handful of the major content distribution networks (CDNs) and other infrastructure providers serve many of the ad tags, analytics trackers, and other assets across the web. Given the prominence of these providers, it is worth looking at their privacy policies more closely.

The important question is whether infrastructure providers are merely passive conduits working strictly on behalf of the vendors whose content they host, or whether they offer extra services, such as user tracking, profiling, and identity matching, based on the visitor data they process as part of the normal serving of vendor tags.

As we’ll discuss below, whether that happens or not is sometimes unclear.

Note: Because of the ambiguity at the heart of these public statements, it is entirely possible that neither of the two examples below is doing any of the user tracking that we suggest is feasible. That’s the point, in a way. Please do not take this article as leveling accusations.

Context

In the simplest arrangement under the GDPR, the website owner is the data controller, the ad tag or analytics vendor is the processor, and the CDNs function as “subprocessors”. The GDPR places requirements on processors when they use subprocessors.
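To make the chain of responsibility concrete, here is a minimal sketch of the arrangement described above, modelled as a small tree that can be walked for an audit. The `Party` class, the `audit_list` function, and all organisation names are hypothetical and only illustrate the structure; they do not describe any real product or contract.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Party:
    """One organisation in the processing chain (all names are hypothetical)."""
    name: str
    role: str  # "controller", "processor", or "subprocessor"
    engages: list[Party] = field(default_factory=list)


def audit_list(party: Party, path: tuple[str, ...] = ()) -> list[tuple[str, str, str]]:
    """Walk the chain depth-first, returning (name, role, chain of engagement) rows."""
    here = path + (party.name,)
    rows = [(party.name, party.role, " > ".join(here))]
    for child in party.engages:
        rows.extend(audit_list(child, here))
    return rows


# Simplest arrangement: website owner -> tag vendor -> CDN.
cdn = Party("Example CDN", "subprocessor")
vendor = Party("Example Analytics Vendor", "processor", engages=[cdn])
site = Party("Example Publisher", "controller", engages=[vendor])

for name, role, chain in audit_list(site):
    print(f"{role:<13} {name:<26} via {chain}")
```

Real deployments are rarely this linear: a single page can load tags from many vendors, each with its own set of infrastructure providers, so the tree widens quickly.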

We at Blockmetry built Personal Data Auditor to help data protection professionals audit and monitor in real time their contracts with processors, which includes auditing their subprocessors.

We’ve found several infrastructure providers that raise privacy questions based on how they publicly communicate their practices and products. We will use only two examples from the ones we’ve found to illustrate the point.

Edgecast CDN

Edgecast CDN is particularly interesting for two reasons:

  1. It is owned by Verizon (branded Verizon Digital Media Services), whose publishing and advertising solutions include the injection of the Unique Identifier Header (UIDH) in HTTP requests going to Verizon companies;
  2. It serves Twitter’s Tweets, Buttons, and other assets embedded on pages across the web (see publish.twitter.com). The default embed code Twitter generates loads the key JavaScript file (widgets.js) from platform.twitter.com, which is served by Edgecast.

Simply due to the large number of publishers worldwide that embed Twitter tags, Edgecast sees a large portion of internet traffic, along with Twitter’s own cookies.

Verizon Digital Media Services’ privacy policies are here, and the three relevant ones appear to be Verizon’s full privacy policy, international policy, and an Edgecast-specific policy. The Edgecast-specific policy is labelled as a “previous policy” or “per applicable services agreement”, which may mean it is out of scope.

All three policies talk about using IP addresses and user-agent strings in a way that implies user profiling. For example:

  1. The Edgecast-specific policy says “EdgeCast uses IP addresses to…track user movement, and to gather broad demographic information for aggregate use”; and
  2. The international policy says, in section 4, “We use this information [defined to include IP addresses and user-agent strings]…to help us deliver more relevant Verizon marketing messages on our websites, on non-Verizon websites. The marketing messages may be delivered by our representatives, via email, or via other Verizon services or devices. This information is also used to tailor the content you see …”.

In addition to the lack of clarity about which privacy policy applies, both quotes above can be interpreted in several ways, including allowing (or not) for user profiling and tracking.

Further, Verizon’s advertising business, Verizon Media (formerly called Oath), is mentioned in some of the privacy policies above. Interestingly, Oath is clear that it does use IP addresses for profiling users. Whether, and how, Edgecast’s use of IP addresses is related to Oath’s use is unclear.

Separately, Twitter itself sets cookies that are scoped to the whole *.twitter.com domain. Edgecast sees these cookies in each HTTP request it serves. None of the privacy policies above explains whether Edgecast ignores them.
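The reason a domain-wide cookie reaches the CDN is the cookie domain-matching rule that browsers apply. The sketch below is a simplified version of that rule as described in RFC 6265; the function name `domain_matches` is ours, and the sketch ignores IP-address hosts as well as the `Path`, `Secure`, and `SameSite` attributes and any browser-level third-party cookie restrictions, all of which also affect whether a cookie is actually sent.

```python
def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 domain-match: the exact host, or any subdomain of it."""
    host = request_host.lower().rstrip(".")
    domain = cookie_domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


# A cookie set with Domain=.twitter.com is eligible for every *.twitter.com host,
# including the host that serves the embed script.
for host in ("platform.twitter.com", "twitter.com", "nottwitter.com", "example.org"):
    print(f"{host:<22} {domain_matches(host, '.twitter.com')}")
```

Under this rule, requests to platform.twitter.com are eligible to carry the same cookies as requests to twitter.com itself, regardless of which company operates the servers behind that hostname.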

Twitter’s own privacy policy addresses this concern in a generic way when discussing the sharing of personal data in section 3.2, although it is not clear if Twitter considers Edgecast’s access to its cookies as “sharing” or not.

Google Public Key Infrastructure

This one is a bit more circumstantial, but worth exploring as a real-world example of a well-known concern.

Google operates its own TLS/SSL certificate authority (CA), called Google PKI, that is used by many (perhaps all) Google websites for their TLS certificates. As is normal for a CA, it operates an OCSP responder that browsers can use to check the validity of TLS certificates they encounter. OCSP privacy concerns are not new, but Google PKI offers a particularly concerning situation.

To begin, some example Google services that use Google PKI:

  1. www.google.com;
  2. fonts.googleapis.com (*.googleapis.com) and fonts.gstatic.com (*.gstatic.com) which, together, serve Google Fonts and other Google-served assets;
  3. cdn.ampproject.org, which serves Google’s Accelerated Mobile Pages (AMP) assets; and
  4. Google ads and trackers including DoubleClick and Google Analytics.
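Readers who want to check which CA issued the certificate for a given hostname can do so with Python’s standard library. The following is a minimal sketch: the helper names `fetch_peer_cert`, `issuer_field`, and `ocsp_urls` are ours, and the `sample` dictionary is hand-written illustrative data shaped like the output of `ssl.SSLSocket.getpeercert()`, not a real certificate.

```python
from __future__ import annotations

import socket
import ssl


def fetch_peer_cert(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Open a TLS connection and return the validated peer certificate as a dict."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def issuer_field(cert: dict, key: str = "organizationName") -> str | None:
    """Pull one attribute out of the nested 'issuer' tuples in the certificate dict."""
    for rdn in cert.get("issuer", ()):
        for name, value in rdn:
            if name == key:
                return value
    return None


def ocsp_urls(cert: dict) -> tuple:
    """Return the OCSP responder URLs listed in the certificate, if any."""
    return tuple(cert.get("OCSP", ()))


# Illustrative data only; the issuer and URL below are made up.
sample = {
    "issuer": (
        (("countryName", "XX"),),
        (("organizationName", "Example Trust Services"),),
        (("commonName", "Example CA 1"),),
    ),
    "OCSP": ("http://ocsp.example.test",),
}
print(issuer_field(sample), ocsp_urls(sample))
```

With network access, calling `issuer_field(fetch_peer_cert("fonts.googleapis.com"))` would report the issuing organisation for that host at the time of the check. Certificates are reissued regularly, so the result may change over time.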

Google PKI uses Google’s company-wide privacy policy, the same one that applies to Google’s ad operations. This means that, for browsers that use OCSP to check certificate validity, the user is subject to Google’s privacy policy and all it enables around tracking.

Importantly, Google certainly has the means to combine OCSP server log data (which includes the IP address, and any other HTTP request headers they choose to log) with other personal data that it knows about its users; it’s all Google’s data after all. Without public commitments to organizational and technical measures that prevent such combination, it is possible that OCSP checks can be used to track users as they browse non-Google pages that embed Google-served assets, even if those pages do not include Google Analytics or Google ads.
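To show why this is a feasibility concern rather than a far-fetched one, here is a minimal sketch of how little processing it takes to turn a responder log into a per-IP timeline. Everything in it is hypothetical: the log rows, the serial numbers, the serial-to-hostname map, and the `timeline_by_ip` function are invented for illustration and do not describe what Google or any other CA actually logs or does. Note also that an OCSP request identifies the certificate being checked, not the page that embedded the asset.

```python
from collections import defaultdict

# Hypothetical OCSP responder log rows: (timestamp, client IP, certificate serial).
ocsp_log = [
    ("2019-01-01T10:00:00", "203.0.113.7", "0xA1"),
    ("2019-01-01T10:00:05", "198.51.100.9", "0xB2"),
    ("2019-01-01T10:03:00", "203.0.113.7", "0xB2"),
]

# Hypothetical map from certificate serial to the hostname pattern it covers.
serial_to_host = {"0xA1": "*.googleapis.com", "0xB2": "*.gstatic.com"}


def timeline_by_ip(log, serials):
    """Group log rows by client IP, returning time-ordered (timestamp, host) pairs."""
    timeline = defaultdict(list)
    for ts, ip, serial in sorted(log):  # ISO timestamps sort chronologically
        timeline[ip].append((ts, serials.get(serial, "unknown")))
    return dict(timeline)


for ip, events in timeline_by_ip(ocsp_log, serial_to_host).items():
    print(ip, events)
```

Joining such a timeline to any other dataset keyed by IP address would be one more dictionary lookup, which is why explicit public commitments matter more here than technical difficulty.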

This exact question has come up before with Google Fonts. Google addresses it in the Fonts FAQ by stating some technical measures to limit data collection and tracking, and falls back on the Google-wide privacy policy. This public reassurance does not appear to be sufficient for some developers wanting to use Google Fonts.

Recommendations

Consumers and businesses implicitly trust infrastructure providers, and most of the privacy conversation around ad and tracker blocking has centered on the ad and tracking vendors, not the underlying infrastructure providers that serve them. However, as shown above, this trust is potentially misplaced.

Infrastructure providers: The most important recommendation for infrastructure providers is to remove all ambiguity about the potential for tracking.

Infrastructure providers have to be absolutely clear about how personal data that passes through their networks is handled and how it is shared. One way to achieve this is to have a platform-specific privacy policy that is easily findable by consumers.

To make the same point another way, depending on a single privacy policy that covers all of the different activities of the business (as Verizon and Google do) creates uncertainty and goes against the GDPR’s principles of purpose limitation, data minimisation, and transparency.

Website owners, as the data controllers, also need to publicly document answers to the types of questions raised above. After all, it is their responsibility to audit and document.