A handful of major content distribution networks (CDNs) and other infrastructure providers serve many of the ad tags, analytics trackers, and other assets across the web. Given the prominence of these providers, it is worth looking at their privacy policies more closely.
The important question is whether infrastructure providers are merely passive conduits working strictly on behalf of the vendors whose content they host, or whether they offer extra services, such as user tracking, profiling, and identity matching, based on the visitor data they process in the normal course of serving vendor tags.
As we’ll discuss below, whether that happens or not is sometimes unclear.
Note: Because of the ambiguity at the heart of these public statements, it is entirely possible that neither of the two examples below is doing any of the user tracking that we suggest is feasible – that’s the point, in a way. Please do not take this article as leveling accusations.
In the simplest arrangement under the GDPR, the website owner is the data controller, the ad tag or analytics vendor is the processor, and the CDNs function as “subprocessors”. The GDPR places requirements on processors when they use subprocessors.
We at Blockmetry built Personal Data Auditor to help data protection professionals audit and monitor in real time their contracts with processors, which includes auditing their subprocessors.
We’ve found several infrastructure providers that raise privacy questions based on how they publicly communicate their practices and products. We will use only two examples from the ones we’ve found to illustrate the point.
Edgecast CDN is particularly interesting for two reasons:
- It is owned by Verizon (branded Verizon Digital Media Services), whose publishing and advertising solutions include the injection of the Unique Identifier Header (UIDH) in HTTP requests going to Verizon companies;
Simply due to the large number of publishers worldwide that embed Twitter tags, Edgecast sees a large portion of internet traffic, along with Twitter’s own cookies.
All three policies talk about using IP addresses and user-agent strings in a way that implies user profiling. For example:
- The Edgecast-specific policy says “EdgeCast uses IP addresses to…track user movement, and to gather broad demographic information for aggregate use”; and
- The international policy says, in section 4, “We use this information [defined to include IP addresses and user-agent strings]…to help us deliver more relevant Verizon marketing messages on our websites, on non-Verizon websites. The marketing messages may be delivered by our representatives, via email, or via other Verizon services or devices. This information is also used to tailor the content you see …”.
Further, Verizon’s advertising business, Verizon Media (formerly called Oath), is mentioned in some of the privacy policies above. Interestingly, Oath is clear that it does use IP addresses for profiling users. Whether, and how, Edgecast’s use of IP addresses is related to Oath’s use is unclear.
Separately, Twitter itself sets cookies that are scoped to the whole *.twitter.com domain. Edgecast sees these cookies in each HTTP request it serves. None of the privacy policies above explains whether Edgecast ignores them.
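To see why a CDN serving a Twitter subdomain would receive these cookies, here is a minimal sketch of the domain-matching rule that browsers apply (RFC 6265, section 5.1.3). It is simplified: it ignores IP-address hosts, the Path and Secure attributes, and cookie expiry. The host name static.twitter.com and the cookie names are hypothetical, chosen only for illustration.

```python
def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 domain match (host names only, no IP addresses)."""
    host = request_host.lower().rstrip(".")
    domain = cookie_domain.lower().lstrip(".")
    # A cookie is sent when the host equals the cookie domain
    # or is a subdomain of it.
    return host == domain or host.endswith("." + domain)


def cookies_sent(request_host: str, jar: dict[str, str]) -> list[str]:
    """Return the names of cookies whose domain matches the request host.

    `jar` maps cookie name -> Domain attribute (hypothetical structure).
    """
    return sorted(
        name for name, domain in jar.items()
        if domain_matches(request_host, domain)
    )


# Hypothetical cookie jar: one cookie scoped to the whole twitter.com
# domain, one scoped to an unrelated site.
jar = {"visitor_id": ".twitter.com", "other_site_id": "example.org"}

# Hypothetical asset host under twitter.com, assumed to be served by a CDN.
sent = cookies_sent("static.twitter.com", jar)
print(sent)
```

Under these assumptions, the domain-wide cookie accompanies every request to the asset host, so whoever terminates those requests is technically able to read it. Whether it is logged or used is a policy question the code cannot answer.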
Google Public Key Infrastructure
This one is a bit more circumstantial, but worth exploring as a real-world example of a well-known concern.
Google operates its own TLS/SSL certificate authority (CA), called Google PKI, that is used by many (perhaps all) Google websites for their TLS certificates. As is normal for a CA, it operates an OCSP responder that browsers can use to check the validity of TLS certificates they encounter. OCSP privacy concerns are not new, but Google PKI offers a particularly concerning situation.
To begin, some example Google services that use Google PKI:
- fonts.googleapis.com (*.googleapis.com) and fonts.gstatic.com (*.gstatic.com) which, together, serve Google Fonts and other Google-served assets;
- cdn.ampproject.org, which serves Google Accelerated Mobile Pages (AMP) assets; and
- Google ads and trackers including DoubleClick and Google Analytics.
Importantly, Google certainly has the means to combine OCSP server log data (which includes the IP address, and any other HTTP request headers they choose to log) with other personal data that it knows about its users – it’s all Google’s data after all. Without public commitments, backed by organizational and technical measures, that such combination will not happen, OCSP checks could be used to track users as they browse non-Google pages that embed Google-served assets, even if those pages do not include Google Analytics or Google ads.
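To make the feasibility argument concrete, the sketch below shows how any OCSP responder operator could, in principle, group status checks by client IP address. It is purely illustrative: the log records, the serial labels, and the serial-to-service mapping are synthetic and hypothetical, and nothing here describes what Google actually logs or does.

```python
from collections import defaultdict

# Synthetic, illustrative records only: (client IP, timestamp, serial queried).
# The IP addresses come from ranges reserved for documentation.
OCSP_LOG = [
    ("203.0.113.7", "2000-01-01T10:05:00", "serial-amp"),
    ("203.0.113.7", "2000-01-01T10:00:00", "serial-fonts"),
    ("198.51.100.2", "2000-01-01T10:06:00", "serial-fonts"),
]

# Hypothetical mapping from certificate serial to the service it covers.
# A CA knows this mapping because it issued the certificates.
SERIAL_TO_SERVICE = {
    "serial-fonts": "*.googleapis.com",
    "serial-amp": "cdn.ampproject.org",
}


def services_by_ip(log, serial_map):
    """Group OCSP checks by client IP, ordered by timestamp."""
    seen = defaultdict(list)
    # ISO 8601 timestamps in the same format sort correctly as strings.
    for ip, timestamp, serial in sorted(log, key=lambda record: record[1]):
        seen[ip].append((timestamp, serial_map.get(serial, "unknown")))
    return dict(seen)


timeline = services_by_ip(OCSP_LOG, SERIAL_TO_SERVICE)
for ip, checks in timeline.items():
    print(ip, checks)
```

Note what the sketch does not show: an OCSP request identifies a certificate, not the page that embedded the asset, so the per-IP timeline reveals which services were contacted and when, rather than a full browsing history. Joining that timeline with other data keyed on the same IP address is the step that would need a public commitment ruling it out.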
Consumers and businesses implicitly trust infrastructure providers, and most of the privacy conversation around ad and tracker blocking has centered on the ad and tracking vendors, not the underlying infrastructure providers that serve them. However, as shown above, this trust is potentially misplaced.
Infrastructure providers: The most important recommendation for infrastructure providers is to remove all ambiguity about the potential for tracking.
Website owners, as the data controllers, also need to publicly document answers to the types of questions raised above. After all, it is their responsibility to audit and document.