What the HiQ vs. LinkedIn Case Means for Automated Web Scraping by James Keenan, Automation and Anonymity Evangelist at Smartproxy


Data plays a prominent role in our lives today, even if we aren’t aware of its presence. There are many complex moral, legal, and philosophical questions about how we gather and use data; not least who actually owns it. After all, if I tell you how tall I am, do you now ‘own’ my height? If you write the number down and sell the information on to someone else, am I entitled to a cut?

A case between data aggregator HiQ and social media platform LinkedIn highlights some of the difficult questions facing data scientists today.

The case in a nutshell

The implications of the litigation between LinkedIn and HiQ are profound, but the case itself is simple enough to understand. It centered on LinkedIn’s invocation of the Computer Fraud and Abuse Act in a cease-and-desist letter to HiQ.

HiQ is a data analytics firm that provides business intelligence based on publicly available data scraped from LinkedIn. Like many businesses today, it depends on access to public-facing data to be able to function. One of the unspoken but very salient questions raised by the case is where the line between public and private data lies.

The data that LinkedIn holds belongs to the company, inasmuch as it is being stored on their systems. However, the data itself consists only of what other people have submitted to LinkedIn. At the time of the case, the data was accessible to anyone who visited LinkedIn. From HiQ’s perspective, this meant that the data on LinkedIn was fair game for scraping. From LinkedIn’s perspective, its terms of service (ToS) prohibited the use of automation tools, and it had a right to enforce those ToS by banning IP addresses associated with scraping.

With a growing number of entities scraping LinkedIn for data, the platform took action to block suspected offenders. One of the businesses caught up in the bans was HiQ. It was able to circumvent the IP ban easily by using proxy services to mask the IP addresses it used for scraping.

LinkedIn responded by sending a cease-and-desist letter to HiQ. They asserted that not only had the firm breached LinkedIn’s ToS, but they had also violated the Computer Fraud and Abuse Act (CFAA), along with some other laws. HiQ responded with a lawsuit seeking an injunction against LinkedIn to prevent them from hindering HiQ’s access to data until the case was resolved.

The precedent

In an opinion published in September 2019, the Ninth Circuit, while stopping short of issuing a definitive ruling, appeared to be leaning towards HiQ’s side. The Ninth Circuit made the significant decision to disregard some of its own prior rulings. This case was far from the first concerning how online services use the CFAA to enforce their own terms of service.

For example, a case back in 2012, United States v. Nosal, led to a ruling from the Ninth Circuit that the CFAA should not be turned “into a sweeping internet-policing mandate.” The CFAA was originally conceived to provide a legal framework for responding to hacking and the court chose to maintain the Act’s focus on hacking when issuing their decision. In that case, it was decided that violating a website’s terms of use would not constitute a violation of the CFAA.

Giving the CFAA a broader focus so that it could be used to enforce a website’s user agreement would have had a chilling effect on the then-nascent data scraping industry. In fact, the potential impact on internet users would have been far-reaching. Just about any internet user could be criminally liable for even minor infractions of a social media service’s ToS. The Ninth Circuit’s ruling in Nosal suggested that its interpretation of the CFAA was relatively narrow and that violations of the Act required more than a ToS violation.

However, two other decisions taken by the Ninth Circuit muddied the waters. One of these was a second decision in the Nosal case. The other was a ruling in an unrelated case, Facebook v. Power Ventures. In the second Nosal ruling, the court held that the term “without authorization” in the CFAA is not limited to circumventing access control using technical methods. A user gaining unauthorized access with legitimate login credentials could still be in violation of the Act.

In the Power Ventures ruling, the court found that even though the data scraper had permission to access Facebook accounts using passwords and scrape data, it continued to do so after Facebook issued a cease-and-desist letter. This put Power Ventures in violation of the CFAA. Facebook had also blocked the IP address Power Ventures had initially used, although Power Ventures’ circumvention of this block was not in itself considered to be a violation.

The ruling

A number of organizations, including the Electronic Frontier Foundation (EFF), have taken a particular interest in the case because it has far-reaching implications for data scraping. The case also presented an opportunity to overturn or limit the impact of the Ninth Circuit’s earlier rulings. The EFF feared that those earlier rulings would have a chilling effect on innovation and web scraping.

In its cease-and-desist letter to HiQ, LinkedIn cited the Power Ventures case as evidence that continuing to access its data would mean HiQ was in violation of the CFAA. HiQ decided to beat LinkedIn to the punch and filed for a preliminary injunction. Despite the earlier Power Ventures ruling, the Ninth Circuit found that HiQ was “likely” to be successful in its claim that automated access to public-facing data was not a violation of the CFAA.

The Ninth Circuit ultimately upheld the preliminary injunction, but there is still potential for the case to come back to court.

What Does This Mean for Data Scraping?

During the case, the EFF filed an amicus brief that emphasized to the court how vital scraping is to a number of industries. Web scraping isn’t just used commercially. It is vital for research and has a number of other beneficial uses.

The Ninth Circuit affirmed that any data that required no authorization to access and was freely available by default was fair game for scraping. As the court pointed out, ‘authorization’ to access data is implicit unless steps are taken to restrict general access.
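To make the public-versus-gated distinction concrete from a scraper’s point of view, here is a minimal Python sketch. It assumes a scraper that looks only at the HTTP response to an anonymous request. The function name `classify_access` and the `LOGIN_HINTS` list are hypothetical and were chosen for illustration; the check is a technical heuristic and not the test the court applied.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical path fragments that suggest a login wall; real sites vary.
LOGIN_HINTS = ("login", "signin", "authwall")


def classify_access(status_code: int, redirect_to: Optional[str] = None) -> str:
    """Roughly classify the response to an anonymous (logged-out) request.

    Returns "public", "gated", or "unknown".
    """
    # 401 and 403 mean the server refused the anonymous request.
    if status_code in (401, 403):
        return "gated"
    # A redirect to a login-style URL suggests an access wall.
    if status_code in (301, 302, 303, 307, 308) and redirect_to:
        path = urlparse(redirect_to).path.lower()
        if any(hint in path for hint in LOGIN_HINTS):
            return "gated"
        return "unknown"
    # A plain 200 to an anonymous request suggests the page is open to all.
    if status_code == 200:
        return "public"
    return "unknown"
```

For instance, `classify_access(200)` treats the page as public, while `classify_access(302, "https://example.com/login")` treats it as gated. Anything the sketch cannot classify is reported as unknown and left for a person to decide.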

The ruling in HiQ v. LinkedIn means that judges in the future will have more leeway. It limits the significance of earlier rulings in the Power Ventures and Nosal cases. In those cases, the court was of the opinion that requiring a login before providing access to data would render it private, not public, data.

This raises another problem, however. Once a user logs in to Facebook, a wealth of otherwise private data becomes easily available without restrictions. LinkedIn appears to have interpreted the court’s ruling as meaning that any and all data that requires a login is private and that LinkedIn can revoke access to it. As a result, LinkedIn is now requiring users to log in before being able to browse the platform.

However, for many people, the most significant finding of the Ninth Circuit was that the CFAA exists to combat hacking and cannot be used as a catch-all tool for enforcing a website’s ToS.

Finally, the case touches on one of the most important data and privacy issues of our time. Who actually owns our personal data? The Ninth Circuit’s ruling would appear to affirm that we own our data. Any platforms we share that data with are merely licensed to use it; they don’t own it outright.


Data scraping is an integral part of the modern internet ecosystem. It isn’t about to go anywhere. LinkedIn’s interest in pursuing HiQ may have more to do with the two companies competing to provide the same services than with any legitimate security or privacy concerns. It is worth noting that the Ninth Circuit listed a number of other potential legal remedies for businesses in LinkedIn’s position. The case will now return to the district court for a trial. A lot of people will be watching developments with great interest.