Instagram Data Scraping by HYP3R Raises Privacy Concerns

Until recently, many of the social media privacy concerns that seem to swirl around Facebook on a regular basis never seemed to extend to Instagram, which is owned by Facebook. But all that could be changing as the result of a recent Instagram data scraping case that is attracting a lot of attention from privacy and security experts. A trusted Facebook marketing partner, HYP3R, had been scraping data from Instagram, storing it on its own servers, and then re-packaging all of that social media data for advertisers. The scraped data included physical locations, bio information, and photos – as well as some content (such as Instagram Stories) that was specifically intended to disappear after 24 hours.

As might be imagined, Instagram is facing a firestorm of controversy over the HYP3R case. What makes it even worse is that the scraping apparently took place right under Instagram’s watch. The company applied only very loose safeguards to protect user data, and never checked how its Facebook marketing partners were actually using that data.

The Instagram data scraping business model

In the case of HYP3R, the San Francisco-based company was specifically touting access to a database of high-value consumers to advertisers, and it now appears there’s a good reason why. According to some experts, as much as 90 percent of the data in HYP3R’s database came from Instagram. The company routinely hoovered up as much user data as it could in a short period of time – and then stored all this data indefinitely on its own servers. This practice even extended to Instagram Stories, which are specifically designed to be ephemeral. When people post these Stories to Instagram, the expectation is that this content will soon disappear from the web.

Even more disturbingly, when Instagram sought to restrict access to its data by tweaking its API in 2018, HYP3R looked for a way around these restrictions to view any content it needed. The company found loopholes wherever it could, the most obvious being that the public data on Instagram’s “Locations” pages could be accessed even when logged out of Instagram. This meant that HYP3R could collect data about public locations without logging in – a tactic that came in very handy when it was trying to create geofencing data for advertising partners. For example, say that an advertiser wanted to run a promotion around a certain hotel in a certain city – all HYP3R had to do was find Instagram Locations near that hotel, gather as much public data as possible about those locations, and then repackage and reformat the users’ data for advertisers.

Data scraping and privacy violations

So did HYP3R really do anything wrong? Obviously, Instagram didn’t think so until tech media outlets started poking around and asking questions. After the story started to pick up media traction, Instagram decided to boot HYP3R off its platform, even going so far as to send a “cease and desist” letter to the company in order to block HYP3R from accessing any content or information from the Instagram API. Instagram now says that HYP3R’s actions were not sanctioned and violate the social network’s terms of service. As a result, it has removed HYP3R from its list of trusted marketing partners.

In response, though, HYP3R says that all of the data that it was accessing was viewable publicly by everyone online. In other words, it didn’t require some sort of proprietary access to the Instagram API in order to scrape all this data: everything was accessed publicly. HYP3R, in a statement, said that it remains a company that enables “authentic, delightful marketing that is compliant with consumer privacy regulations and social network terms of service.” But did you really expect the company to say anything else? After all, there are probably thousands of Internet bots doing the exact same sort of thing as HYP3R – it’s just that HYP3R got caught.

What makes the HYP3R case so egregious, however, is the fact that the company’s entire business seems to be based around Instagram data scraping. When advertisers paid big bucks to HYP3R to help them create location-specific advertising campaigns, they probably didn’t realize that HYP3R was just engaging in Instagram data scraping. Instead, they probably assumed that HYP3R had relationships with top influencers, and was only using Instagram data to complement and support a proprietary database of social media profiles.

Facebook, Cambridge Analytica and HYP3R

If the HYP3R case sounds familiar, well, it should. HYP3R’s actions are very reminiscent of how another Facebook partner, Cambridge Analytica, managed to turn an innocuous-sounding social media quiz into a huge data harvesting business involving as many as 87 million Facebook users. Rather than selling all this data to advertisers, as HYP3R did, Cambridge Analytica re-packaged and re-formatted the data for political campaigns. People who had never heard of Cambridge Analytica were having their data used in ways they had never anticipated.

The same type of shady business practice seems to have been going on in this Instagram data scraping case. Once HYP3R had access to the Instagram API, it kept pushing and pushing until it had much more data than Instagram had ever anticipated. Then, once Instagram closed off access to part of its API, HYP3R continued to look for a way to get around Instagram’s lax security barriers. In the process, experts say, HYP3R may have been scraping as many as 1 million Instagram posts every month.

New rules of the road for social media data

If there’s one big takeaway from this Instagram data scraping case, it’s that data collection and brokering are big business these days. While the data created by any individual user (such as a photo of a tourist spot tagged with its location) might seem insignificant, it is the aggregate data that matters here. And this is something that HYP3R understood very well: an advertiser would not be willing to pay for a single user’s data, but would be more than willing to pay for data from thousands or tens of thousands of users, all of them creating content around events, locations or themes.

What’s needed now more than ever is a set of new rules of the road for social media usage. The days of companies like HYP3R engaging in Instagram data scraping with the sole purpose of turning around and re-selling that data to third parties need to end. Perhaps the only way to create these new rules of the road, though, is via comprehensive national privacy legislation that will finally hold social media companies like Instagram responsible for the actions of their partners and prevent companies from scraping public data.