Over 2.6 Million Duolingo User Records Obtained via Data Scraping Published on Hacking Forum

Duolingo user account information obtained via data scraping was recently leaked on an underground hacking forum.

The data includes publicly available names and usernames, private email addresses, phone numbers, experience level, language, learning progress, achievements, social media information, country, role, courses, subscriptions, and account creation date.

Duolingo told Recorded Future’s The Record that threat actors scraped the data from publicly available profile information. The language-learning platform added that its systems were not compromised during the incident.

However, Duolingo failed to explain how threat actors obtained email addresses and other private user information.

Hacking forum selling Duolingo user data for $2

VX Underground reported that threat actors published scraped data of 2.6 million Duolingo users on August 21, 2023, on a hacking forum for 8 credits worth only $2.13.

“Today I have uploaded the Duolingo Scrape for you to download, thanks for reading and enjoy!,” the seller posted on the hacking forum.

In January 2023, a threat actor was selling the same data on the Breached hacking forum for $1,500.

Researchers explained that the private information was obtained by scraping an exposed application programming interface (API). The exposed API allows anyone to submit an email address and confirm whether it is associated with a Duolingo account. When a match is found, the API returns personal data associated with that account.

Consequently, threat actors could use emails leaked in previous data breaches to check whether a given user has a Duolingo account.

Data from those earlier breaches could contain additional information, such as phone numbers, allowing threat actors to execute social engineering and targeted phishing attacks on Duolingo users.

However, Max Gannon, a Senior Cyber Threat Intelligence Analyst at Cofense, believes the scraped data is worthless except for targeted attacks.

“The scraped data doesn’t have much value outside of targeted attacks where the attacker spoofs Duolingo, this is demonstrated by the fact that the dump is now only worth $2.13,” Gannon said. “The only mitigation steps that can be taken are for users of Duolingo to be particularly suspicious of potentially spoofed communications.”

VX Underground also suggested that the information could be used for doxxing, a cyber attack involving publishing someone’s personal information online.

Users on X (formerly Twitter) have been discussing Duolingo’s exposed API since March 2023. The API was still publicly accessible at the time of publication.

“The Duolingo data breach highlights the vulnerabilities posed by poorly secured APIs and the potential for business logic abuse by threat actors,” said Jason Kent, Hacker in Residence, Cequence Security.

“In this case, the breach was not a result of traditional hacking methods but rather the exploitation of an exposed API that had been openly shared since at least March 2023.”
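The enumeration pattern the researchers describe, and one way a service might blunt it, can be illustrated with a minimal Python sketch. Everything here is hypothetical: the in-memory `USERS` store, the field names, and the functions `lookup_exposed` and `lookup_hardened` are assumptions made for illustration and do not reflect Duolingo’s actual API or code. The sketch contrasts a lookup that confirms any email and returns the full record with one that accepts only usernames, returns public fields, and rate-limits each client.

```python
import time
from collections import defaultdict, deque

# Hypothetical in-memory user store; names and fields are illustrative only.
USERS = {
    "alice": {"username": "alice", "name": "Alice", "email": "alice@example.com"},
}
EMAIL_INDEX = {record["email"]: name for name, record in USERS.items()}
PUBLIC_FIELDS = ("username", "name")


def lookup_exposed(email):
    """Risky pattern: any caller can confirm an email and receive the full record."""
    username = EMAIL_INDEX.get(email)
    return USERS.get(username) if username else None


class RateLimiter:
    """Sliding-window limiter: at most `limit` calls per `window` seconds per client."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.calls = defaultdict(deque)

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        recent = self.calls[client_id]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


def lookup_hardened(username, client_id, limiter, now=None):
    """Accepts usernames only, returns public fields only, and rate-limits callers."""
    if not limiter.allow(client_id, now):
        return {"status": 429}
    record = USERS.get(username)
    if record is None:
        return {"status": 404}
    return {"status": 200, "profile": {k: record[k] for k in PUBLIC_FIELDS}}
```

Because the hardened lookup never accepts an email address, a list of emails from an earlier breach cannot be replayed against it to confirm accounts, and the per-client limit slows bulk collection of even the public fields. A real deployment would likely also need authentication and monitoring, which this sketch leaves out.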

Massive data scraping leaks have led to stern legal and regulatory actions

Data scraping incidents have exposed the personal information of billions of users, prompting regulatory action and multiple lawsuits.

In 2022, the Irish Data Protection Commission (DPC) fined Meta €265 million after a data scraping incident exposed 533 million users via the ‘Add Friend’ feature.

The DPC is also investigating a bug that exposed the email addresses and other personal information of over 200 million Twitter users.

In April 2021, the personal information of 150 million LinkedIn users obtained via data scraping was listed for sale on a hacking forum. Shortly after, another trove of 500 million records from the same employment platform was listed on another hacking forum.

Data scraping remains controversial, with website owners obligated to protect users against scraping, even for publicly available information.

“This is yet another example why every online service should take proactive security measures to prevent mass data scraping,” said Roger Grimes, Data-Driven Defense Evangelist at KnowBe4. “This isn’t a research project where some computer security defender noticed an error that needs to be fixed.”

According to an August 2023 joint regulatory statement by the Global Privacy Assembly (GPA), websites should “carefully consider the legality of different types of data scraping in the jurisdictions applicable to them and implement measures to protect against unlawful data scraping.”