Data leaks are an unfortunate fact of online life. There have been so many in recent years that everyone should expect at least one service to have let them down. In 2016, Uber exposed the details of 57 million users. In 2017, Equifax leaked the personal information of 143 million users, including social security numbers and addresses. JP Morgan Chase was hacked in 2014, releasing data that included the contact information of more than half of US households. This year, Marriott was relieved of over half a billion hotel reservation records. You can see more of the same on Have I Been Pwned.
But I want to talk about Quora’s exposure of the sensitive data of 100 million users, particularly the leak of its password database. Quora is a question and answer site, similar in intent to Stack Exchange or Yahoo Answers, and sitting somewhere between them with regard to the quality of answers and editorial policies. There will be many post-mortems examining the mistakes Quora made, but I will look at what Quora did right.
The leak of 100 million user passwords sounds like a disaster. People often use the same passwords on more than one website. It’s a certainty that many of the passwords that leaked are also used on email accounts, eCommerce stores, bank sites, and healthcare services. That’s good news for criminals, who can use those details for identity theft and all manner of fraud. But Quora’s leak isn’t as good for criminals as you might think. Quora was smart about how passwords were stored and how they handled the leak. There are lessons that companies of all sizes could learn.
Making a hash of password storage
It’s not accurate to say Quora lost 100 million passwords. It is more precise to say that they lost 100 million cryptographic hashes. The passwords didn’t leak because Quora didn’t have the passwords; they had hashes of the passwords. A cryptographic hash is a fixed-length string of letters and numbers generated by a hashing algorithm. The algorithm takes the password as its input and outputs a long string of characters.
Hashing algorithms have some peculiar qualities. The same input always produces the same output. Different inputs produce different outputs for all practical purposes: two passwords could in theory share a hash, but with a well-designed algorithm such collisions are so hard to find that they can be ignored. Small differences in the input lead to big differences in the output. And, most importantly, it’s impossible to reverse the hashing process: you can’t get the password from the hash (this is only mostly true, as we’ll discuss shortly).
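These qualities are easy to see for yourself. The short Python sketch below uses SHA-256 from the standard library purely as an illustration; it is a fast general-purpose hash, not the algorithm Quora used, and the helper name `sha256_hex` is made up for this example.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = sha256_hex("dog")
second = sha256_hex("dog")
similar = sha256_hex("dog1")  # one extra character in the input

# The same input produces the same output.
print(first == second)

# The output has a fixed length (64 hex characters), whatever the input length.
print(len(first), len(sha256_hex("a much longer passphrase than dog")))

# A small change in the input changes most of the output.
differing = sum(a != b for a, b in zip(first, similar))
print(f"{differing} of {len(first)} characters differ")
```

Nothing in the output of `sha256_hex("dog1")` hints that its input was one character away from “dog”.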
To bake a cake, ingredients are mixed and put in a hot oven. It’s easy to make a cake given the right ingredients, baking temperature, and baking time. But once the cake is baked, you can’t reverse the process to get the ingredients back. You can’t make eggs and flour from a cake. Hashing is the same: you can turn a password into a hash, but you can’t turn a hash into the original password.
This behavior is useful because, when users choose a password, a service can run it through a hashing algorithm, store the output, and forget the password. When the user wants to log in, the service runs the password the user entered through the same algorithm, compares the output to the hash stored in the database, and, if they match, the user entered the same password. Storing hashes is the first thing Quora did right.
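The sign-up and login flow described above might look something like the following sketch. Everything here is hypothetical: the `stored_hashes` dictionary stands in for a real database, the function names are invented, and plain SHA-256 is used only to keep the example short. A real service would use a slow, salted algorithm, for the reasons covered in the next two sections.

```python
import hashlib
import hmac

# Hypothetical in-memory "database" mapping usernames to password hashes.
stored_hashes: dict[str, str] = {}


def hash_password(password: str) -> str:
    """Hash a password. SHA-256 is an illustrative stand-in only."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def register(username: str, password: str) -> None:
    """Store only the hash; the password itself is never saved."""
    stored_hashes[username] = hash_password(password)


def login(username: str, password: str) -> bool:
    """Hash the attempt and compare it with the stored hash."""
    stored = stored_hashes.get(username)
    if stored is None:
        return False
    # compare_digest compares in constant time to avoid leaking timing hints.
    return hmac.compare_digest(stored, hash_password(password))


register("alice", "correct horse battery staple")
print(login("alice", "correct horse battery staple"))  # matching password
print(login("alice", "wrong guess"))                   # non-matching password
```

Note that the password never appears in `stored_hashes`, so a copy of that table gives an attacker hashes and nothing else.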
Slower is better
There are many types of hashing algorithm. One of the ways they differ is speed. Some are fast: they produce hashes very quickly, which is to say they don’t use a lot of computational resources to generate the hash. Slow hashing algorithms, by contrast, are deliberately expensive: each hash takes more time and more computational resources to produce. Why does speed matter? Because when criminals have a hash and want a password, they can run dictionaries of potential passwords through the hashing algorithm. The result is a long list of hashes and the passwords that produced them. The criminals can search through the list of hashes and find the associated passwords.
With a fast hashing algorithm, reversing the hashes in this way is easy. It can be done on cheap hardware in a reasonable timeframe. But with a slow hashing algorithm, it would take a very long time to reverse the hashes, perhaps thousands of years. Many services use fast algorithms to hash their passwords, and, if their database of hashes leaks, criminals can work out the passwords.
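To make the attack concrete, here is a toy version of it in Python. The word list and the “leaked” hashes are invented for the example, and SHA-256 stands in for whatever fast algorithm a careless service might use. Real attackers work with lists of many millions of candidate passwords.

```python
import hashlib


def fast_hash(password: str) -> str:
    """A fast, unsalted hash (SHA-256 here, as an illustrative stand-in)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# A tiny, made-up dictionary of candidate passwords.
candidates = ["password", "123456", "dog", "letmein", "qwerty"]

# Hash every candidate once and index the passwords by their hash.
lookup = {fast_hash(word): word for word in candidates}

# Pretend these hashes came from a leaked database.
leaked = [
    fast_hash("dog"),
    fast_hash("qwerty"),
    fast_hash("c0rrect-h0rse-battery"),  # not in the dictionary
]

# "Reversing" a hash is just a lookup; unknown hashes come back as None.
recovered = [lookup.get(h) for h in leaked]
print(recovered)
```

The two dictionary words are recovered instantly, while the password that isn’t in the list stays hidden. The cost of the attack is the cost of hashing the dictionary, which is why the speed of the algorithm matters so much.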
Quora used bcrypt, a slow hashing algorithm. That makes it almost impossible for the hackers to figure out the passwords unless they are prepared to wait until their great-great-great-great-grandchildren are grown up. Using bcrypt is the second thing Quora did right.
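You can get a feel for the difference with a rough timing comparison. bcrypt isn’t part of Python’s standard library, so this sketch swaps in PBKDF2 (`hashlib.pbkdf2_hmac`), a different deliberately slow algorithm, to keep the example self-contained. The iteration count and the fixed salt are assumptions chosen for the demonstration, not recommendations.

```python
import hashlib
import time

password = b"dog"
salt = b"illustrative-salt"  # hypothetical fixed salt, only for this timing demo

# One pass of a fast hash.
start = time.perf_counter()
hashlib.sha256(salt + password).digest()
fast_seconds = time.perf_counter() - start

# A deliberately slow hash: PBKDF2 repeats the underlying hash many times.
ITERATIONS = 200_000  # assumed work factor for this demo
start = time.perf_counter()
slow_digest = hashlib.pbkdf2_hmac("sha256", password, salt, ITERATIONS)
slow_seconds = time.perf_counter() - start

print(f"single SHA-256:       {fast_seconds:.6f} s")
print(f"PBKDF2 x {ITERATIONS}:     {slow_seconds:.6f} s")
```

The exact numbers depend on your hardware, but the slow hash should take many thousands of times longer. A legitimate user pays that cost once per login; an attacker pays it for every single guess.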
Salty hashes are better for you
Even with a slow hashing algorithm, some password hashes can be reversed. If a user chooses “dog” as their password, it can be figured out because “dog” is a dictionary word. It will be among the first passwords run through the algorithm, and attackers already have huge precomputed tables of these easy-to-guess passwords and their hashes. These are commonly known as rainbow tables. Lots of users choose easy-to-guess passwords, but there is a way to confound the criminals and their rainbow tables.
A salt is a random string of letters and numbers added to a password before it’s hashed. Every user’s password is given a different salt. A user chooses the password “dog” and the service adds the salt “32dsheu49” to the end, hashing “dog32dsheu49”. Another user also chooses “dog”, and the service adds “73hbfy374” to the end, creating the string “dog73hbfy374”. Salting makes every hashed string unique, even though the users chose the same simple password. The salts are stored with the password hashes, but that doesn’t make the attacker’s precomputed tables any more useful: the tables don’t contain the salted strings, so every hash has to be attacked separately. And because a small difference in the input makes a big difference in the hash, there’s no way to tell from the hashes that the original passwords were the same. If you’re curious about how salting works in more detail, Dan Arias wrote an excellent introduction on the Auth0 blog. Salting the stored hashes is the third thing Quora did right.
The combination of salted password hashes and a slow hashing algorithm like bcrypt makes it practically impossible for criminals to do anything useful with Quora’s leaked password database.