Major Data Broker Exposes 235 Million Social Media Profiles in Data Leak

Social Data, a data broker that appears to have been scraping public social media profiles without the knowledge or consent of the host companies, is the latest organization to get caught with a publicly exposed database. The source of the data leak was a database left accessible without a password, apparently due to a configuration error.

In addition to providing yet another example of how simple cloud configuration errors can turn into massive problems for organizations, this data leak also illustrates the risks of mass scraping performed by the most questionable elements of the data broker market.

Dangerously irresponsible data brokers

The data leak, which contains identical copies of about 235 million social media profiles, was discovered by security researchers with Comparitech. Comparitech has made something of a specialty out of discovering unsecured and improperly configured databases in recent months, including the Adobe Creative Cloud breach in October, eight million sales records from EU ecommerce sites in March and the UFO VPN incident last month.

This particular data leak contains information harvested from public profile pages of Instagram, TikTok and YouTube. These pages can contain full names, links to personal and business websites, email addresses, personal images and videos, the content of posts and information about followers among other items.

These pieces of personal information are not an immediate threat in terms of individual financial crimes or hacking of any personal accounts. However, this is the sort of information that scammers collate into larger “combo files” (often traded on the dark web) and draw on to make attempts at fraud or social engineering look authentic. A data leak of this nature provides a convenient source for materials that otherwise would have taken weeks or months and substantial computing power to harvest.

The users of these services are also frequently not aware that scraping is done by data brokers, and believe that once something has been removed from a profile page it is no longer accessible to the internet in general. Scraping can constitute a privacy violation when it captures and redistributes materials no longer meant to be made public. For all of these reasons, the platforms involved in this data leak (as well as numerous other social media sites) forbid scraping in their terms of service.

Comparitech reports that the data leak seems to originate from a now-defunct data broker called Deep Social, which is believed to have either reformed as or passed assets on to Social Data. Deep Social was banned from using Facebook and Instagram’s marketing APIs in 2018 for scraping, apparently the deathblow for the company. A spokesperson for Social Data defended the practice to Comparitech on the basis of the information being public, without addressing the fact that it violates the terms of service of most major platforms.

Data leaks due to carelessness on the rise in recent years

This is just another in a long string of similar data leaks across roughly the last two years, attributable to either failure to password-protect a database or a misconfiguration creating a vulnerability. Misconfigurations frequently happen due to improperly applied updates, but organizations also sometimes simply believe a “security through obscurity” approach will be good enough to keep an unprotected database out of the public eye.

Groups like Comparitech scan entire IP blocks and make use of resources such as Shodan to locate these data leaks, ideally before the bad guys find them. However, there is usually no way of knowing whether an exposed database has already been exploited before security researchers get to it. That requires the responsible parties to conduct an internal forensic investigation and report the results to the public, something that can take months and that is often not required by any sort of law or regulation. Data brokers are particularly unlikely to do anything that is strictly voluntary.

Though data collection via scraping is technically not illegal, Chloé Messdaghi, VP of Strategy of Point3 Security, points out that the practice could end up falling afoul of the law if the personal information of minors is caught up in these data broker files and exposed online: “Moreover, sites such as Facebook, Tik Tok, Instagram, and YouTube attract minors – can we say for certain that data scrapers are in compliance with the FTC’s Children’s Online Privacy Protection Act (COPPA) mandates? This underscores why everyone needs greater protections online, and especially children.”

This incident also raises the question of how much personal data should really be made public on social media sites, given that data brokers are always vacuuming up this information even if the sites prohibit it. There is no law preventing them from doing so; all that individual sites can do is make it more inconvenient for them by banning them from their various services. Saryu Nayyar, CEO of Gurucul, weighed in on this question: “The challenge for users is how to balance usability with security. We have to assume our information will escape from 3rd parties, so how little information can we expose and still use the social media services we’ve come to rely on? At the very least, it’s worth separating the addresses and information we associate with our critical accounts, such as banking or health, from our strictly social activities. That keeps a compromise of one from leading to a direct compromise of the other.”

What can organizations with overwhelmed IT departments and a mountain of regular patches do to keep up with these potential vulnerabilities? Chris DeRamus, VP of Technology of Cloud Security Practice for Rapid7, suggests that automated scanning and patching are becoming a virtual necessity given the current conditions: “This incident further underscores the importance of investing in automated cloud security solutions, as many breaches are a result of misconfigurations of cloud services that are exploited by an attacker. Companies must employ security tools that are capable of detecting and remediating misconfigurations (such as databases left unsecured without a password) in real time, or better yet – preventing them from ever happening in the first place.”
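
To make the idea of automated misconfiguration detection concrete, the following is a minimal Python sketch of the kind of check such a tool might run against an inventory of database settings. It is an illustration only, not how Rapid7's products or Comparitech's scans work. The DatabaseConfig record, its field names and the sample entries are all hypothetical, and a real tool would pull this information from a cloud provider's API rather than a hand-written list.

```python
from dataclasses import dataclass
from ipaddress import ip_address


@dataclass
class DatabaseConfig:
    """Hypothetical inventory record; field names are assumptions."""
    name: str
    bind_address: str   # address the database listens on
    auth_enabled: bool  # whether a password or other login is required


def is_publicly_reachable(bind_address: str) -> bool:
    """Return True if the listen address could accept outside connections."""
    addr = ip_address(bind_address)
    # 0.0.0.0 or :: means "listen on every interface", including public ones.
    if addr.is_unspecified:
        return True
    # Loopback, private and link-local addresses are not internet-routable.
    return not (addr.is_loopback or addr.is_private or addr.is_link_local)


def find_exposed(configs: list[DatabaseConfig]) -> list[str]:
    """Names of databases that are reachable from outside AND lack a login."""
    return [
        c.name
        for c in configs
        if not c.auth_enabled and is_publicly_reachable(c.bind_address)
    ]


# Made-up sample inventory for illustration.
inventory = [
    DatabaseConfig("profiles-db", "0.0.0.0", auth_enabled=False),
    DatabaseConfig("billing-db", "0.0.0.0", auth_enabled=True),
    DatabaseConfig("dev-cache", "127.0.0.1", auth_enabled=False),
]

print(find_exposed(inventory))  # ['profiles-db']
```

Only the first entry is flagged: it listens on every network interface and requires no password, which is the combination described in this incident. A check like this catches only what the inventory records, so it complements external scanning rather than replacing it.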