Chinese Startup Leaks 318 Million Private Records Obtained Through Data Scraping Facebook, Instagram, and LinkedIn Social Profiles

A Chinese social media management startup leaked over 400GB of personally identifiable information (PII) belonging to social media users, including celebrities and influencers in the US and worldwide. SocialArks obtained the information by data scraping social media networks, a controversial practice that the affected networks prohibit.

The firm describes itself as a “cross-border social media management company dedicated to solving the current problems of brand building, marketing, marketing, social customer management in China’s foreign trade industry.”

More concerning was the presence of private information that the victims had never published on their social profiles. The data leak affected 214 million social media users on Facebook, Instagram, and LinkedIn.

Safety Detectives discovered the exposed data as part of its mission to find vulnerabilities that pose cybersecurity risks to the general public.

Sensitive information exposed from unsecured Elasticsearch database

Safety Detectives discovered the information during a routine IP address check for unsecured databases. It was stored in a misconfigured Elasticsearch database with neither password protection nor encryption. The researchers noted that anybody who knew the server's IP address could have accessed the information.
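For administrators who want to confirm that their own database does not have the same misconfiguration, the check can be sketched in a few lines of Python. This is a minimal illustration and not the method Safety Detectives used: the function names and the example URL are hypothetical. It relies only on the fact that an Elasticsearch node with security enabled answers an unauthenticated request with HTTP 401, while an unsecured node answers with 200.

```python
import urllib.error
import urllib.request


def classify_status(status):
    """Map the HTTP status of an unauthenticated request to a finding."""
    if status == 200:
        return "open"        # data is served without any credentials
    if status in (401, 403):
        return "protected"   # credentials required or access denied
    return "unknown"


def probe(base_url, timeout=5.0):
    """Send one unauthenticated GET to a server you administer (hypothetical helper)."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is still available.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        return "unreachable"
```

Calling `probe("http://localhost:9200")` against a local node (9200 is the default Elasticsearch HTTP port; the address is an assumption) should return `"protected"` once authentication is enabled.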

The head of the Safety Detectives cybersecurity team, Anurag Sen, said that the exposed Elasticsearch database contained 408GB of data comprising 318 million records obtained from the social profiles of 214 million Facebook, Instagram, and LinkedIn users.

Tencent hosted the vulnerable server in Hong Kong. The data on the server was segmented into indices to organize the records obtained from different sources.

SocialArks suffered a similar breach in August 2020, exposing data from 150 million LinkedIn, Facebook, and Instagram social profiles.

Leaked information obtained through data scraping in violation of terms of service

Safety Detectives researchers confirmed that the information was obtained by scraping data from the affected social media platforms. The researchers also noted that the practice is unethical and violates the policies of Facebook, Instagram, and LinkedIn.

Data scraping involves the use of automated bots capable of extracting information from web pages without human interaction. The practice is legal in most cases but can be abused by rogue actors to copy large amounts of information. Some websites have policies prohibiting the practice. Others employ countermeasures such as CAPTCHAs, although scraping bots can sometimes defeat these as well.
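The automated extraction described above can be illustrated with Python's standard library. This is a minimal sketch, not the tooling SocialArks used: the markup in `SAMPLE_PAGE`, the class names, and the `extract_listings` helper are all invented for the example, and it parses an inline string modeled on a job listing rather than fetching any live site.

```python
from html.parser import HTMLParser

# Hypothetical page markup, invented for this illustration.
SAMPLE_PAGE = """
<html><body>
  <div class="listing"><span class="title">Data Analyst</span><span class="city">Berlin</span></div>
  <div class="listing"><span class="title">Site Engineer</span><span class="city">Lagos</span></div>
</body></html>
"""


class ListingParser(HTMLParser):
    """Collects the text of <span class="title"> and <span class="city"> elements."""

    def __init__(self):
        super().__init__()
        self.records = []    # one dict per listing
        self._field = None   # field name of the span currently open, if any

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "div" and attr_map.get("class") == "listing":
            self.records.append({})          # start a new record
        elif tag == "span" and attr_map.get("class") in ("title", "city"):
            self._field = attr_map["class"]  # remember which field we are inside

    def handle_data(self, data):
        # Only keep text that appears inside a tracked span.
        if self._field and self.records:
            self.records[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def extract_listings(html):
    """Return a list of {'title': ..., 'city': ...} dicts found in the markup."""
    parser = ListingParser()
    parser.feed(html)
    return parser.records


listings = extract_listings(SAMPLE_PAGE)
```

The same few dozen lines, pointed at millions of profile pages instead of one sample string, are what turn a harmless technique into bulk collection, which is why sites restrict it in their terms of service.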

Typical legal applications of data scraping include information gathering on booking sites and job portals for analytical purposes.

However, scraping personal information and aggregating it with data from other secure locations is unethical and troubling for social media companies and users.

The possession of highly personalized information could lead to social engineering attacks through specifically crafted, personalized messages. It also creates the possibility of identity theft to commit financial fraud on online banking systems.

Controversial practices such as data scraping leave professional network users with a dilemma: provide the personal information necessary for business and employment, or limit their social profiles to protect their privacy.

Private personally identifiable information leaked from public social profiles

The leaked information allowed anyone to determine the victims' full names, country of residence, workplace, job position, subscriber data, social profile link, and contact information. It also contained profile pictures, Messenger IDs, usernames of other linked social media accounts, number of followers, frequently used hashtags, and number of comments, among other details.

Additionally, the leak revealed personal data for Instagram and LinkedIn users, including phone numbers and email addresses, even for users who never publicly provided such information on their social profiles.

It remains unclear how SocialArks obtained the private data inaccessible through regular data scraping of public social profiles.

In total, the social profiles of 11,651,162 Instagram users and 66,117,839 LinkedIn users were leaked, while 81,551,567 Facebook user profiles were exposed. Another batch containing 55,300,000 Facebook profiles was deleted a few hours after discovery.
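As a quick consistency check, the per-platform figures can be added up and compared with the 214 million affected users reported earlier. The short calculation below assumes that the deleted Facebook batch counts toward that overall figure, which the report does not state explicitly; the variable names are illustrative.

```python
# Per-platform profile counts quoted in the report.
leaked = {
    "Instagram": 11_651_162,
    "LinkedIn": 66_117_839,
    "Facebook": 81_551_567,
}
# Facebook batch deleted a few hours after discovery (assumed to count toward the total).
deleted_facebook_batch = 55_300_000

listed_total = sum(leaked.values())                     # 159,320,568
overall_total = listed_total + deleted_facebook_batch   # 214,620,568

print(f"{listed_total:,} profiles in the remaining indices")
print(f"{overall_total:,} profiles including the deleted batch")
```

Under that assumption the sum comes to roughly 214.6 million, in line with the headline figure of 214 million users.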

SocialArks never responded to the researchers’ messages but secured the database upon notification.