Given the recent hacking attack on LinkedIn, in which millions of user passwords were stolen and posted on the internet, I felt it appropriate to discuss passwords. In particular, I wanted to help everyone understand how passwords work, hopefully providing some insight into how a good password can be created.
Clearing Up Some Confusion
First, I want to clear up some common misunderstandings regarding passwords. If you are a frequent reader of XKCD, you might recall a comic addressing the issue of password strength:
While this comic is technically correct, it is somewhat misleading. To an unknowing reader, the comic could give the impression that using real words can create a stronger password than using a bunch of gibberish. However, what it fails to explain is that the length of the password is also important.
I tested out this comic’s claim using Steve Gibson’s Haystack Tool, which estimates the time required to crack a password using a brute force approach (meaning the attacker uses a program that simply tries every possible password). After trying the two passwords in the comic, I then created another password by adding random gibberish onto the first password, creating one of equal length to the “better” password. Here is what I found:
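Password || Search Space Size || Online Attack Time
Tr0ub4dor&3 || 5.75 x 10^21 || 1.83 billion centuries
correcthorsebatterystaple || 2.46 x 10^35 || 78.30 billion trillion centuries
Tr0ub4dor&3pha4aeP5aephai2 || 2.66 x 10^51 || 8.47 hundred trillion trillion trillion centuries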
As you can see, the comic is correct in saying that “correcthorsebatterystaple” is a better password than “Tr0ub4dor&3”. But that is simply because the password is longer. If we create a gibberish password of about the same length (26 characters versus 25), such as “Tr0ub4dor&3pha4aeP5aephai2”, we see that at comparable length, a gibberish password is in fact harder to crack than 4 common words put together. It is also harder to remember, but more on that later. For now, let’s dive into why the gibberish is harder to crack.
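The figures above can be roughly reproduced with a short Python sketch. It rests on assumptions that are mine and not stated in this post: the search space counts every password from 1 character up to the full length, mixed passwords draw on the 95 printable ASCII characters, the all-lowercase one draws on 26, and an online attacker manages 1,000 guesses per second. The function names are purely illustrative.

```python
# Assumption: a century is 100 years of 365.25 days.
SECONDS_PER_CENTURY = 100 * 365.25 * 24 * 60 * 60


def search_space(alphabet_size: int, length: int) -> int:
    """Count every password of 1..length characters over the alphabet."""
    return sum(alphabet_size ** n for n in range(1, length + 1))


def centuries_to_exhaust(space: int, guesses_per_second: float = 1000.0) -> float:
    """Time to try every password at the assumed guessing rate."""
    return space / guesses_per_second / SECONDS_PER_CENTURY


# (password, assumed alphabet size)
examples = [
    ("Tr0ub4dor&3", 95),
    ("correcthorsebatterystaple", 26),
    ("Tr0ub4dor&3pha4aeP5aephai2", 95),
]

for password, alphabet in examples:
    space = search_space(alphabet, len(password))
    centuries = centuries_to_exhaust(space)
    print(f"{password}: {space:.2e} combinations, about {centuries:.2e} centuries")
```

The combination counts come out the same as in the table. The attack times land close to the table's values, with small differences depending on how a year is defined.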
When you go to a website and create a password, it is not (or shouldn’t be) stored as-is. It is typically run through a special process called hashing, which is a one-way transformation of the password into something that is unreadable. Two algorithms commonly used to hash passwords are md5 and sha1. The beauty of these tools is that even the slightest change in the password input creates a totally different output, while the length of the output always remains the same. As an example, let’s look at the md5 and sha1 outputs for two similar words:
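Here is a minimal Python sketch of that comparison, using the standard hashlib module. It assumes the two words are “hello” and “Hello”, which is a guess based on the capital “H” mentioned in the text. The helper names are illustrative.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the md5 hash of the text as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def sha1_hex(text: str) -> str:
    """Return the sha1 hash of the text as a hexadecimal string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


for word in ("hello", "Hello"):
    print(word, "md5: ", md5_hex(word))
    print(word, "sha1:", sha1_hex(word))
```

Whatever the input, md5 prints 32 hexadecimal characters and sha1 prints 40.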
The only difference between the two inputs was whether or not the “H” was capitalized, and yet the two outputs have virtually no resemblance to each other except for the length. So using either of these tools would turn any password into a string of gibberish, and if all you have is the output, it is almost impossible to know what input created that string. I say “almost”, though, because of one minor problem: there are databases online that list common passwords alongside their hash outputs, allowing hackers to look up a hash and find the password. Luckily, there is a solution, and it is called salting.
The solution to the hash database problem is salting: adding extra data (the salt) to the password before it is hashed, optionally combined with more than one hashing algorithm. Because the output from the hashing algorithms is always the same length, a web developer can do whatever they want to the password and still get an output of the same length, allowing for easy database storage, while obscuring the password input to the point where it can’t be looked up in a database.
When salting a password, a web developer will manipulate the password before it gets hashed, and possibly again after being hashed. One example salting technique could be as follows:
1. User inputs their password: “hello”
2. The server adds on a random string to get: “hellobeeD2coh”
3. The new string is run through an md5 hash to get: “c7c31c72148b5e49a17f1d3f7ffe4c0c”
4. The output is then run through a sha1 hash to get: “70e945873233f7eddf1f6a6541553d833876615a”
5. The output from that is once again hashed through another sha1 to get: “c26eafd1191b1d4b15efa6b67ab87e8804595232”
The resulting string is then stored in the database. Now, whenever the user logs in, the password itself doesn’t have to be matched to anything. Instead, the entered password is run through this exact salting process, and the server checks whether the outputs match. Because it is based on hashing algorithms, as long as the random string we added on remains the same, the output from the salting process will always be the same. If the output from the password matches what is in the database for that user, we know they have the correct password, and can allow access to the site. This process has many advantages over just storing the password in the database.
1. By hashing the password, it can’t be read by anyone, not even someone with full access to the database. They can see the hash output, but as we’ve seen, that won’t help very much.
2. By salting the password, it can’t be looked up in an md5 or sha1 database, even if the user chooses a really bad password.
3. By using hashing algorithms like md5 or sha1, the length of the output is always the same, making it very easy to store in a database.
4. Also, because the output length is always the same, it is impossible to even tell how long the input was. Whether the input was one letter or fifty, the output will be the same length. The hacker won’t even know how many characters the password was.
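The example salting steps and the login check described above can be sketched in Python as follows. One assumption is mine: each step hashes the hexadecimal text of the previous output. The post does not say how the outputs are chained, so the values this produces may not match the ones listed in the steps. The function names are illustrative.

```python
import hashlib


def salted_hash(password: str, salt: str) -> str:
    """Apply the example process: append salt, then md5, sha1, sha1."""
    salted = password + salt                                    # step 2
    first = hashlib.md5(salted.encode("utf-8")).hexdigest()     # step 3
    second = hashlib.sha1(first.encode("utf-8")).hexdigest()    # step 4
    return hashlib.sha1(second.encode("utf-8")).hexdigest()     # step 5


def check_login(attempt: str, salt: str, stored_hash: str) -> bool:
    """Re-run the same process on the attempt and compare the outputs."""
    return salted_hash(attempt, salt) == stored_hash


SALT = "beeD2coh"  # the random string from the example steps
stored = salted_hash("hello", SALT)  # this is what goes in the database

print(check_login("hello", SALT, stored))  # True
print(check_login("Hello", SALT, stored))  # False

# The stored value is 40 characters whether the input is 1 or 50 characters.
print(len(salted_hash("a", SALT)), len(salted_hash("a" * 50, SALT)))  # 40 40
```

Note that the server never compares passwords directly. It only compares the output of the process.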
Keep in mind that the salting process can be whatever the programmer wants it to be. They can use any combination of hashing algorithms and random gibberish, and as long as they don’t tell anyone what salting process they used, the passwords will be much harder to crack. More advanced systems will even use a separate salt for every user, creating even more difficulty for crackers.
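As a rough illustration of the per-user salt idea, here is a small Python sketch. The in-memory dictionary standing in for a database, the user names, and the single sha1 step are all simplifications of my own, not a description of how any particular site does it.

```python
import hashlib
import secrets


def hash_with_salt(password: str, salt: str) -> str:
    """Hash the password together with the given salt."""
    return hashlib.sha1((password + salt).encode("utf-8")).hexdigest()


def register(users: dict, name: str, password: str) -> None:
    """Create a fresh random salt for this user and store it with the hash."""
    salt = secrets.token_hex(8)
    users[name] = (salt, hash_with_salt(password, salt))


def login(users: dict, name: str, attempt: str) -> bool:
    """Look up the user's own salt and compare the hashes."""
    if name not in users:
        return False
    salt, stored = users[name]
    return hash_with_salt(attempt, salt) == stored


users = {}
register(users, "alice", "hello")
register(users, "bob", "hello")

# Same password, different salts, so the stored hashes differ.
print(users["alice"][1] == users["bob"][1])
print(login(users, "alice", "hello"), login(users, "alice", "Hello"))
```

Because every user has a different salt, a cracker who works out one user's hash learns nothing about another user who happens to have the same password.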
So using what we have seen so far, let’s put it all together and see what makes some passwords better than others, even though they might be the same length.
Now that we have a better understanding of how passwords work, let’s go back to the comic and understand what it is saying. It makes the claim that “correcthorsebatterystaple” is a better password than “Tr0ub4dor&3”, and as I said earlier, that is true, but only because it is longer. If we assume that the attacker is using a brute force attack, we can now see why the random string of about the same length is actually better. It is better because it has more possible combinations to try. Let’s simplify it to understand it better.
If my password was just one lowercase letter, there would be 26 possible passwords. If it was 2 letters, there would be 676 possible combinations, because 26 x 26 = 676, and with three letters, it would be 17,576 possible combinations, and so on. At 8 characters, there would be 208,827,064,576 combinations. But what if we use something besides lowercase letters?
Again, let’s use a password that is one character, but this time it can be lowercase, uppercase, numbers, or an ampersand (&) character. Remember that the hashing algorithms treat uppercase and lowercase differently, so we have 26 lowercase, 26 uppercase, 10 numbers (0-9), and an ampersand, for a total of 63 possible one-character passwords. At two characters, we could do 3,969 different passwords (because 63 x 63 = 3,969), and so on again. But this time, by the time we get to 8 characters, we have 248,155,780,267,521 possible combinations! By adding in more possible characters, the number of possible passwords increases significantly, even for passwords of the same length.
So to sum it up, the comic is somewhat misleading by suggesting that words are better than gibberish. As we’ve seen, the strongest password is one that is both long and contains many different characters. So let’s wrap this up by discussing how to make a good password.
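The arithmetic is just the number of possible characters raised to the power of the password length. A short Python sketch (the function name is illustrative) shows both alphabets side by side:

```python
def combinations(alphabet_size: int, length: int) -> int:
    """Number of passwords of exactly `length` characters."""
    return alphabet_size ** length


for length in (1, 2, 3, 8):
    lowercase_only = combinations(26, length)
    mixed = combinations(63, length)  # 26 + 26 + 10 + the ampersand
    print(f"{length} chars: {lowercase_only:,} vs {mixed:,}")

# 1 chars: 26 vs 63
# 2 chars: 676 vs 3,969
# 3 chars: 17,576 vs 250,047
# 8 chars: 208,827,064,576 vs 248,155,780,267,521
```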
Creating a Good Password
One issue that I have not mentioned much is remembering your password. You could make an argument that complex passwords are worse because they encourage people to write them down, making the passwords vulnerable to physical theft. That is a valid point, particularly in a corporate setting where it is fairly easy for someone to sneak into a cubicle and grab post-it notes off your desk. But there is a way to get the best of both worlds: a password that is easy to remember, but difficult to crack.
One possibility is to use a combination of words and numbers that make sense to you. For example, you could use the date of your wedding and your spouse’s name to get “Jack&Jill1994”, which has 246,278,864,694,166,156,419,903 possible combinations.
I also know someone who speaks Italian, and likes to use an Italian word combined with the English translation. For example, their password could be “HelloCiao”, which has 2,779,905,883,635,712 possible combinations. Not as good as the first example, but still better than all lowercase.
Finally, one example proposed by Steve Gibson is to use something called padding, where you use something easy to remember, then add easy-to-remember gibberish to the end to make it longer. For example, your password could be “:)Aaron:)”. At 9 characters with uppercase, lowercase, and special characters, we get 3,904,305,912,313,344 possible combinations. This is a tough one to crack by brute force, but very easy to remember. As long as you don’t tell anyone your padding technique, this is a very sound method.
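The combination counts for these three example passwords can be checked with a small Python sketch. It follows the counting convention used earlier in this post as I understand it: 26 for lowercase if any is present, 26 for uppercase, 10 for digits, plus one for each distinct symbol that appears. The function names are illustrative.

```python
import string


def alphabet_size(password: str) -> int:
    """Estimate the character pool from the kinds of characters used."""
    size = 0
    if any(c in string.ascii_lowercase for c in password):
        size += 26
    if any(c in string.ascii_uppercase for c in password):
        size += 26
    if any(c in string.digits for c in password):
        size += 10
    # Count each distinct symbol individually, like the lone ampersand.
    symbols = {c for c in password if c not in string.ascii_letters + string.digits}
    return size + len(symbols)


def possible_combinations(password: str) -> int:
    """Pool size raised to the password length."""
    return alphabet_size(password) ** len(password)


for example in ("Jack&Jill1994", "HelloCiao", ":)Aaron:)"):
    print(f"{example}: {possible_combinations(example):,}")
```

This only measures resistance to a blind brute force search over that character pool. It says nothing about an attacker who guesses the pattern.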
By understanding why a good password is good, I hope you can go forward and create tough passwords that are easy to remember. Stay safe out there.
Edit (10/23/13) – Since I wrote this, I have discovered a new solution to the issue of memorizing passwords. It is called LastPass, and it is what I now use for my password management. The free version is plenty for my needs, including my web server management passwords. I highly recommend you try it.