I recently decided to start re-learning PHP, and began by building a security framework for the application I’m developing. While searching for how to implement password authentication in PHP, I noticed that many still neglect “salting” the password at rest. In other words, many believe that creating an MD5 hash of the password is enough. This is simply not the case. The whole reason for salting a password is to add randomness to the resulting password hashes.
The idea behind a hash, or checksum, is to create a consistent way to verify that data was not altered. Hashing has since been adopted as a way to store a password without keeping it in clear text. In other words, a hash is not encryption, but a mathematical representation of the data’s state at the time the hash was created. A hash is not something to be decrypted; it is used as a comparison between the stored password (password at rest) and the password provided for comparison (i.e. the password provided by the end user when logging into the system). Many start using a hash with the idea of “decrypting” the password, when the hash is really used for comparison.
In order for the hash to be of any value, it must always be calculated the same way and return the same value representing the data’s state. So, if a file was created ten years ago and nothing has changed in it since, the hash value of the file today will be what it was ten years ago. The same applies to a password. If my password today is ‘foo’, and I never change it (shame on me), the hash value of ‘foo’ will be the same ten years from now.
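As a minimal sketch of this determinism, assuming PHP's built-in hash() function with SHA-256 (the variable names are illustrative only): the same input always yields the same digest, and a login check is a comparison of digests rather than a decryption.

```php
<?php
// Hashing the same input twice always yields the same digest.
$storedHash = hash('sha256', 'foo');   // computed when the password was set
$loginHash  = hash('sha256', 'foo');   // computed from what the user typed at login

// Verification compares digests; nothing is ever "decrypted".
// hash_equals() does the comparison in constant time.
$matches = hash_equals($storedHash, $loginHash);

// A different input ('Foo' instead of 'foo') produces a different digest.
$wrong = hash_equals($storedHash, hash('sha256', 'Foo'));

var_dump($matches); // bool(true)
var_dump($wrong);   // bool(false)
```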
Now here’s the problem with not salting passwords. Because the hash reliably and consistently returns the same value for unaltered data, if two (or more) people decide to use the password ‘foo’, the hash value for both people’s password will be identical. On the surface, this seems like what we want: a consistent way of representing the data. The problem arises when the password source (e.g. password file, database table where passwords are stored, LDAP directory, etc.) is compromised. An attacker can now hash a list of likely passwords once and compare the results against every stored hash, recovering one or more passwords for the system.
For the sake of discussion, below is a list of possible passwords and their hashes (using the SHA256 algorithm) and a list of passwords obtained from a system:
Current system passwords:
As you can see, a pattern emerges. Several end users have the same password hash, and the password is … ‘password’ for the end users in question. This shows us that no password salting is occurring, and that comparing the hashes of possible passwords with the stored hashes will yield the clear-text password used by each end user. For example, jim is using the clear-text password of ‘bar’, and john is using the clear-text password of ‘foo’.
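The comparison described above can be sketched in a few lines of PHP. This is only an illustration: the hashes are computed on the spot to stand in for a leaked table, and the usernames user1 and user2 are hypothetical placeholders for the end users who chose ‘password’.

```php
<?php
// Stand-in for a leaked table: username => unsalted SHA-256 hash.
// user1 and user2 are hypothetical accounts that share a password.
$leaked = [
    'user1' => hash('sha256', 'password'),
    'user2' => hash('sha256', 'password'),
    'jim'   => hash('sha256', 'bar'),
    'john'  => hash('sha256', 'foo'),
];

// The attacker's list of possible passwords.
$candidates = ['password', 'foo', 'bar'];

// Hash each candidate once and build a lookup table: hash => clear text.
$lookup = [];
foreach ($candidates as $word) {
    $lookup[hash('sha256', $word)] = $word;
}

// Every leaked hash found in the lookup table reveals a clear-text password.
$recovered = [];
foreach ($leaked as $user => $hash) {
    if (isset($lookup[$hash])) {
        $recovered[$user] = $lookup[$hash];
    }
}

print_r($recovered);
```

Note that the attacker hashes each candidate only once, no matter how many accounts are in the table. That reuse is what a per-password salt takes away.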
Now, let’s salt the passwords and see what happens:
Current system passwords:
The pattern we saw earlier is no longer evident. In fact, even knowing the salt used in hashing one password is of no help with any other, because the salt is different for each password generated. A precomputed list of hashes no longer matches anything, so the attacker has to repeat the guessing work for each end user separately. The end users with ‘password’ show different hashes, because different salts were used to generate the hashes. If we did not know from the earlier example that the end users used the same password, we would have no idea they were the same. This is why a salt is used: to add randomness to the generated hashes.
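A minimal sketch of per-password salting in PHP follows. The function names saltedHash and verifySalted are hypothetical, and a single pass of SHA-256 over the salt followed by the password is an assumption made to stay close to the example above, not a recommendation for how to combine them.

```php
<?php
// Hash a password with a fresh random salt; returns [salt, hash].
function saltedHash(string $password): array
{
    $salt = bin2hex(random_bytes(16)); // 16 random bytes, hex-encoded
    return [$salt, hash('sha256', $salt . $password)];
}

// Recompute the hash with the stored salt and compare in constant time.
function verifySalted(string $password, string $salt, string $hash): bool
{
    return hash_equals($hash, hash('sha256', $salt . $password));
}

// Two end users pick the same password but get different salts.
[$saltA, $hashA] = saltedHash('password');
[$saltB, $hashB] = saltedHash('password');

var_dump($hashA === $hashB);                        // the two hashes differ
var_dump(verifySalted('password', $saltA, $hashA)); // correct salt: verifies
var_dump(verifySalted('password', $saltB, $hashA)); // wrong salt: fails
```

The salt is stored next to the hash, because it is needed again to verify a login. In real applications, PHP's built-in password_hash() and password_verify() generate and store a per-password salt for you and use a deliberately slow algorithm, which a single SHA-256 pass is not.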
In summary, first assume that the place where passwords are stored will be compromised. The idea is that we need to slow down the progress of the attacker. Adding randomness to password hashes is one way to do that. In the end, depending upon the time and resources available to an attacker, and how valuable the protected information is, the attacker will find a way to discover clear-text passwords from generated hashes, even with a salt applied. The goal in information security is not 100% protection, but reduction of risk. The risk of discovering clear-text passwords from unsalted hashes is greater than the risk of discovering them from salted hashes.