Biometric information does enjoy a higher level of legal protection than random blog posts and articles. However, the fundamental point of these cases is that users have some reasonable expectation of how their content will be used by private entities when they put it on the internet, and having it absorbed into a private for-profit database is not always among those reasonable expectations. The issue is even more complex with AI models involved, as they may spit some of this information back out in a legally actionable way.
Legal challenges along these lines are broadly expected over the next few years as AI models and products continue to roll out, and some are already in court. OpenAI is facing a class-action lawsuit in Northern California over its freewheeling scraping of the internet, and the ultimate result of that case will apply directly to Google's plans. A separate and more targeted lawsuit was also filed against OpenAI by a group of authors who say the company specifically scraped their protected works. And Microsoft looks to be headed to court over its training of GitHub Copilot, which plaintiffs say helped itself to open source code without honoring the required licensing agreements.
AI models already changing the internet landscape
While these court judgments and precedents will likely take years to shake out, and in turn to prompt clearer regulation of AI models, the deployment of tools like ChatGPT is already causing major ripple effects across the public internet. AI scraping by competitors has been the cited reason for both Twitter's and Reddit's APIs suddenly going entirely pay-to-play. Over the July 4th weekend, Twitter took the extreme additional step of requiring users to log in to view any tweets at all, and then doubled down by throttling all users to just a few hundred posts per day.
AI models like Google's and Bing's new assisted search tools are also roiling the private industries that have been built around search traffic. These tools offer an "AI summary," often scraped from websites, that diverts users from actually visiting the sources the information was taken from. Google in particular faces legal and regulatory threats along antitrust lines in this area, given that it holds an estimated 90% of the search advertising market. The beta version of Google's "Search Generative Experience" tool indicates that AI-generated text will take up the entire first page of search results (combined with advertising links), with actual links to websites pushed "below the fold," requiring users to scroll down and click a "show more" button to view them.