
Data Scraping Yields 700 Million LinkedIn Profiles for Sale on Dark Web; About 92% of Platform Users, but Mostly Public Information

Another social media platform API has been abused for data scraping, this time the one belonging to business networking giant LinkedIn. A listing offering 700 million LinkedIn profiles appeared on an underground hacking forum, and reporters with privacy news site Restore Privacy verified that a sample of one million of these profiles was legitimate.

The profiles appear to contain mostly public-facing information, but in some cases they include geolocation data and may also include contact information that was meant to be visible only to authorized contacts on the site. LinkedIn confirmed that its API was used for data scraping at this scale, but says that some of the included information came from outside of the site.

LinkedIn profiles available to interested buyers for $5,000

The dark web post follows an incident in April in which 500 million LinkedIn profiles were offered for sale on a similar forum. LinkedIn denied that data scraping played a role in that incident, claiming that the information was gathered from a variety of other websites. On June 29, LinkedIn issued a statement confirming that the company’s API was used in this more recent incident and that information from the initial leak of 500 million profiles was among the new data for sale.

LinkedIn claims that “no private information” is to be found among the more recent collection of its profiles, but some of what was leaked appears to be items that were not intended for the general public. The sample of one million LinkedIn profiles was found to contain full names, LinkedIn user names and profile URLs, email addresses, phone numbers, physical addresses, geolocation records, gender, and usernames for other social media accounts. The geolocation data likely came from GPS tracking of mobile device users who logged in through the app.

As with all large-scale data scraping of this sort, the primary risk to LinkedIn users is that the information will be used to craft convincing identity theft and phishing attempts. The hacker offered the data for $5,000, but the price of “combination files” such as this usually comes down as they are distributed more widely (and inevitably someone posts them publicly somewhere once the overall value has become very low).

Tim Mackey, Principal Security Strategist for Synopsys Software Integrity Group, expects an increase in API data scraping given that it is easy and can be converted to cash: “From a user’s perspective, there is no difference between a data breach where company servers were hacked and someone misusing an API to obtain their data. Data loss is data loss, and attackers will find the simplest way to obtain the data they need to fund their operations … As successful attacks on infrastructure become more difficult to execute, attackers will naturally shift their focus to abusing legitimate access methods like APIs provided by businesses to access data. Where legitimate users care about terms of service, criminals won’t. This is an important detail for anyone exposing an API on the internet – it’s only a matter of time before your APIs are discovered and abused. So the key question then becomes – how quickly can you detect abnormal usage and take corrective action? The more powerful your API, the more attractive it will be to criminals.”
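Mackey's question about how quickly abnormal usage can be detected can be illustrated with a minimal sketch of one common approach: counting each API client's requests over a sliding time window and flagging clients that exceed a threshold. This is not a description of LinkedIn's systems. The class name, client identifiers, and limits below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


class SlidingWindowMonitor:
    """Flags API clients whose request count in a recent time window exceeds a limit.

    Hypothetical helper for illustration; thresholds are assumptions.
    """

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client_id -> timestamps (in seconds) of that client's recent requests
        self._history = defaultdict(deque)

    def record(self, client_id: str, timestamp: float) -> bool:
        """Record one request; return True if the client is now over the limit."""
        window = self._history[client_id]
        window.append(timestamp)
        # Drop timestamps that have aged out of the window.
        while window and timestamp - window[0] >= self.window_seconds:
            window.popleft()
        return len(window) > self.max_requests


# Example: allow at most 3 requests per 60 seconds per client.
monitor = SlidingWindowMonitor(max_requests=3, window_seconds=60)
flags = [monitor.record("client-a", t) for t in (0, 10, 20, 30)]
print(flags)  # [False, False, False, True]
```

A real deployment would likely combine a check like this with other signals, such as how many distinct profiles a client requests, since a patient scraper can stay under a simple rate limit.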

Does data scraping account for all of the breached items?

It is still unclear how all of the “extra” information that LinkedIn claims came from other sources got into the collection of LinkedIn profiles. A screenshot posted by Restore Privacy of the hacker’s source file shows fields that are not listed in LinkedIn’s documentation of what is generally available through the API, such as “inferred salary.” This raises a couple of possibilities. One is that the hackers may have had access to a higher-level API for perusing LinkedIn profiles that is not available to those outside of the organization. Another is that the data was actually obtained from a third-party data broker, something that LinkedIn would probably not like to admit to.

While data scraping is against LinkedIn’s terms of service, there is no real penalty for it besides the potential banning of a paid marketing account that has access to the API. Since users technically “volunteered” this information to the service, civil penalties are also highly unlikely. For those who trusted LinkedIn with their personal information, the incident creates an added need to be vigilant against scam and social engineering attempts. The most serious information included in the breach is likely the geolocation data, which Tom’s Guide found sometimes referred to specific residential houses.

While geolocation data is worrying (and certainly not what many users would consider “public”), the greatest danger to the average person from this breach is attacks conducted against them via LinkedIn itself. An example of this happened earlier this year, when a threat actor group calling itself “Golden Chickens” used fake LinkedIn job offers to direct unwitting platform users to malware. The group targeted LinkedIn profiles that appeared to be seeking employment, sending users messages about a potential job offer that had a zip file with a trojan attached. The trojan created a backdoor that ran quietly in the background of Windows systems in a way that was very difficult for antivirus software to detect; the group did not directly exploit compromised systems, but instead rented out access to them to other parties as a “malware as a service” scheme. The campaign was highly targeted at workers in the health care sector, and the information leaked from the LinkedIn profiles is well suited to crafting convincing scam messages of this nature.

James McQuiggan, Security Awareness Advocate for KnowBe4, notes studies showing that recent and accurate personal information increases success rates of phishing attempts: “In the past, research has shown that more people fall for phishing emails when it comes to their social media accounts like LinkedIn or Facebook. Users must monitor their email, avoid clicking on any links and visit the actual social media account to determine anything wrong with an account.”

LinkedIn has not shown much indication that it is concerned about abuse of its API. Chris Clements, VP of Solutions Architecture for Cerberus Sentinel, points out that tech platforms do not have a legitimate reason to shrug off API data scraping as an inevitable cost of doing business: “The size and regularity of mass scale data leaks can lend itself to a defeatist attitude about the future of privacy and security online, however, these are problems that can be vastly improved given the right attention and resources. Organizations must adopt a true culture of security to ensure that data users entrust to them remains safe from unintended disclosure. Security must be built into the design of applications with the expectation that any functionality like data export APIs can and will be abused by malicious actors. Even beyond design, all systems and applications should be regularly penetration tested to ensure no mistakes or oversights have been introduced that may expose sensitive data. Continuous monitoring for suspicious behavior is also critical for ensuring that any malicious activities can be caught and stopped before widespread damage has been done.”
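Clements's point about building security into the design of data export functionality can be sketched as an allowlist applied at the API boundary, so that a field is returned only if it has been explicitly approved for the viewer's relationship to the profile owner. The field names and the two visibility tiers below are assumptions made for illustration and do not reflect LinkedIn's actual API or data model.

```python
# Hypothetical field tiers; anything not listed is never exported.
PUBLIC_FIELDS = {"full_name", "headline", "profile_url"}
CONNECTION_FIELDS = PUBLIC_FIELDS | {"email", "phone"}


def visible_profile(profile: dict, viewer_is_connection: bool) -> dict:
    """Return only the fields the viewer is allowed to see."""
    allowed = CONNECTION_FIELDS if viewer_is_connection else PUBLIC_FIELDS
    return {key: value for key, value in profile.items() if key in allowed}


profile = {
    "full_name": "Example User",
    "email": "user@example.com",
    "geolocation": "40.0,-75.0",  # not in any allowlist, so it is never returned
}
print(visible_profile(profile, viewer_is_connection=False))
print(visible_profile(profile, viewer_is_connection=True))
```

The design choice here is default-deny: a newly added field stays private until someone deliberately adds it to an allowlist, rather than leaking until someone remembers to hide it.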