Collision Based Hashing Algorithm Disclosure

In February 2017, researchers from Google and CWI Amsterdam announced the first practical SHA-1 collision. Even though NIST had already marked this hashing algorithm as deprecated in 2011, it is still widely used.

What are Hash Collisions?

A hash collision happens when two different cleartext values produce the same hash value. Collisions can lead to a wide range of problems, but we won't cover them within this article.

Instead, in this blog post we will take a look at another side effect of collisions: a method that allows you to detect whether or not a website uses weak hash functions, without having access to the source code.

To make it easy to remember, we refer to this method as Collision Based Hashing Algorithm Disclosure.

Example of a Hash Collision

The collision the Google engineers identified allows anybody to create two PDF files, with different content but with the same hash. Let’s take a look at both of the cleartext values Google used:

255044462D312E330A25E2E3CFD30A0A0A312030206F626A0A3C3C2F57696474682032203020522F4865696768742033203020522F547970652034203020522F537562747970652035203020522F46696C7465722036203020522F436F6C6F7253706163652037203020522F4C656E6774682038203020522F42697473506572436F6D706F6E656E7420383E3E0A73747265616D0AFFD8FFFE00245348412D3120697320646561642121212121852FEC092339759C39B1A1C63C4C97E1FFFE017346DC9166B67E118F029AB621B2560FF9CA67CCA8C7F85BA84C79030C2B3DE218F86DB3A90901D5DF45C14F26FEDFB3DC38E96AC22FE7BD728F0E45BCE046D23C570FEB141398BB552EF5A0A82BE331FEA48037B8B5D71F0E332EDF93AC3500EB4DDC0DECC1A864790C782C76215660DD309791D06BD0AF3F98CDA4BC4629B1

255044462D312E330A25E2E3CFD30A0A0A312030206F626A0A3C3C2F57696474682032203020522F4865696768742033203020522F547970652034203020522F537562747970652035203020522F46696C7465722036203020522F436F6C6F7253706163652037203020522F4C656E6774682038203020522F42697473506572436F6D706F6E656E7420383E3E0A73747265616D0AFFD8FFFE00245348412D3120697320646561642121212121852FEC092339759C39B1A1C63C4C97E1FFFE017F46DC93A6B67E013B029AAA1DB2560B45CA67D688C7F84B8C4C791FE02B3DF614F86DB1690901C56B45C1530AFEDFB76038E972722FE7AD728F0E4904E046C230570FE9D41398ABE12EF5BC942BE33542A4802D98B5D70F2A332EC37FAC3514E74DDC0F2CC1A874CD0C78305A21566461309789606BD0BF3F98CDA8044629A1

Both strings are hex encoded. Once decoded to raw bytes and hashed, they produce the same SHA-1 sum:

f92d74e3874587aaf443d1db961d4e26dde13e9c
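The decode-then-hash step can be sketched in Python as follows. This is a minimal illustration, not part of the original research: the helper name sha1_of_hex is hypothetical, and the short input 616263 (the bytes of "abc") stands in for the long collision strings, which can be passed to the same helper. The point it makes is that the decoded bytes are hashed, not the hex text itself.

```python
import hashlib


def sha1_of_hex(hex_string: str) -> str:
    """Decode a hex string to raw bytes and return the SHA-1 hex digest of those bytes."""
    raw = bytes.fromhex(hex_string)
    return hashlib.sha1(raw).hexdigest()


# "616263" is the hex encoding of the ASCII bytes "abc" (a stand-in input).
print(sha1_of_hex("616263"))

# Hashing the hex text itself is a different input, so the digests differ.
print(hashlib.sha1(b"616263").hexdigest() == sha1_of_hex("616263"))
```

Calling sha1_of_hex on each of the two long strings above should print the same digest twice if the strings were copied without errors.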

Introducing Collision Based Hashing Algorithm Disclosure

Hashing Algorithms in Web Applications

When you register with an online service, the majority of websites will hash your password and store the hash in the database. This is good practice since it allows the web application to store your password in a form that does not allow a potential attacker to view it in plain text, should they gain access to the database. However, to be effective, a strong hashing algorithm has to be used. This means that algorithms like SHA-1 and MD5 are not suitable for that kind of application. Nonetheless, developers still often use them to hash passwords.

When you try to log in to the website, the password hash stored in the database is compared to the hash generated on the fly from the password you submit in the login form. Therefore, if the target web application uses the SHA-1 hashing algorithm and we supply our collision strings, the hashes will be the same. This also means that we can log in using two different strings as the password.

By using the same technique in a black box fashion, we can determine whether or not a web application uses a vulnerable hashing algorithm, as explained below.

How Does the Collision Based Hashing Algorithm Disclosure Work?

The Theory Behind the Attack

In theory, it is very simple. Create an account on the web application that you would like to test. As the password, use a string that produces the same hash as another, different string. Once the account is registered, try to log in. This time, supply the other string that produces the same hash as the password. If you manage to log in, it means that the target web application uses the SHA-1 algorithm.

Example of Collision Based Hashing Algorithm Disclosure

Let's assume that when you register a new user on a web application, it uses the cleartext1 string that you supplied as password and hashes it. As seen below, the hashed password would result in the hash abcd (simplified), which is then stored in the database:

hash(cleartext1) == 'abcd'

Note: To keep things simple, we will not take salts into consideration. A salt is a random string that is concatenated with the password to make the hash more resistant to certain types of attacks.

The web application stores the hash generated from that process in the database. When you try to log back into the web application, the same hashing algorithm is applied to the password you supply in the login form. This hash is then compared to the one in the database and if they match you will log in.

So let’s assume that you now use cleartext2 as the password to log in, and when the web application hashes it using the SHA-1 algorithm, the same hash is produced:

hash(cleartext2) == 'abcd'

When the web application compares the hashed password with the hash that was stored in the database, they will match and you will log in.

hash(password) == dbhash
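The register and login flow above can be simulated locally. The following Python sketch rests on stated assumptions: ToyLogin is a hypothetical in-memory stand-in for a web application that stores unsalted hashes, and to keep the listing short it uses the 64-byte MD5 collision pair listed later in this article instead of the longer SHA-1 strings. The principle is the same for either algorithm.

```python
import hashlib

# The MD5 collision pair listed later in this article, hex decoded.
CLEARTEXT1 = bytes.fromhex(
    "4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa200a8284bf36e8e4b55b35f427593d849676da0d1555d8360fb5f07fea2"
)
CLEARTEXT2 = bytes.fromhex(
    "4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa202a8284bf36e8e4b55b35f427593d849676da0d1d55d8360fb5f07fea2"
)


class ToyLogin:
    """Hypothetical application that stores one unsalted hash per user."""

    def __init__(self, algorithm: str) -> None:
        self.algorithm = algorithm
        self.db = {}  # username -> stored hash (the "dbhash")

    def _hash(self, password: bytes) -> str:
        return hashlib.new(self.algorithm, password).hexdigest()

    def register(self, user: str, password: bytes) -> None:
        self.db[user] = self._hash(password)

    def login(self, user: str, password: bytes) -> bool:
        # hash(password) == dbhash
        return self.db.get(user) == self._hash(password)


def accepts_colliding_password(algorithm: str) -> bool:
    """Register with cleartext1, then try to log in with cleartext2."""
    site = ToyLogin(algorithm)
    site.register("tester", CLEARTEXT1)
    return site.login("tester", CLEARTEXT2)


print("md5:", accepts_colliding_password("md5"))
print("sha256:", accepts_colliding_password("sha256"))
```

A successful login with the second string discloses that the simulated application hashes with the algorithm the pair was crafted for. With any other algorithm the two hashes differ and the login fails.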

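However, this method has a few limitations and it won’t work if:

Strict server-side password length restrictions are enforced in the registration and login forms, for example a maximum password length of 20 characters.
There is a whitelist of allowed characters, since the collision strings contain non-printable bytes.
A salt is prepended to the password. An appended salt does not prevent the collision, because both strings leave the hash function in the same internal state, so identical data hashed after them still yields the same result.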
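The difference between a prepended and an appended salt can be checked with a few lines of Python. This is a sketch under assumptions: the salt value is made up, and the check uses the shorter MD5 collision pair listed later in this article instead of the SHA-1 strings, since both algorithms process their input block by block in the same way.

```python
import hashlib

# The MD5 collision pair listed later in this article, hex decoded.
PAIR_A = bytes.fromhex(
    "4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa200a8284bf36e8e4b55b35f427593d849676da0d1555d8360fb5f07fea2"
)
PAIR_B = bytes.fromhex(
    "4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa202a8284bf36e8e4b55b35f427593d849676da0d1d55d8360fb5f07fea2"
)

SALT = b"example-salt"  # hypothetical salt, chosen only for illustration


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


# Salt appended: the colliding strings are hashed first, the salt afterwards.
appended_collides = md5_hex(PAIR_A + SALT) == md5_hex(PAIR_B + SALT)

# Salt prepended: the colliding strings no longer start from the state
# they were crafted for.
prepended_collides = md5_hex(SALT + PAIR_A) == md5_hex(SALT + PAIR_B)

print("appended salt still collides:", appended_collides)
print("prepended salt still collides:", prepended_collides)
```

The pair was crafted for the initial state of the hash function. Data placed in front of it changes that state, while data placed behind it is processed identically for both strings.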

How To Check if a Web Application Uses SHA-1 Hashing Algorithm

Below is a step by step explanation of how you can check if a web application uses the SHA-1 hashing algorithm.

Set up an interception proxy, such as the one in Netsparker Desktop, and configure the web browser to proxy the requests through it.
Register an account on the web application and use a recognizable password such as !!PASS!! so it is easy to find when you intercept the HTTP request.
Edit the registration request in the interception proxy by replacing all occurrences of the !!PASS!! string with the first collision string (converted to URL encoding) from the above example.

[Figure: Interception of the user registration HTTP request with a proxy.]

NOTE: To URL encode the collision strings, you have to place a % character in front of every hex encoded byte. You can use the PHP code below to do it:

// $collisionstring holds one of the hex encoded collision strings from above.
echo implode(array_map(function ($byte) { return '%' . $byte; }, str_split($collisionstring, 2)));

Once you send the request to the web application, it will generate a hash and store it in the database. If the web application uses the SHA-1 algorithm, the hash will be f92d74e3874587aaf443d1db961d4e26dde13e9c.
Now try to log in to the web application, using the !!PASS!! string as the password again.
Intercept the login HTTP request and replace all occurrences of !!PASS!! with the URL encoded version of the second string.
The web application will hash your supplied password and compare it to the stored value in the database. Once again the hash should be f92d74e3874587aaf443d1db961d4e26dde13e9c.

If the web application uses the SHA-1 hashing algorithm, you will be logged in even though you supplied a different value.
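The request editing in the steps above can also be scripted. The Python sketch below shows one way to URL encode a hex string and swap it in for the placeholder password. The function names, the form field names and the short hex value are assumptions made for illustration. In a real test, the full collision strings and the body captured by your proxy would take their place.

```python
from urllib.parse import unquote_to_bytes


def percent_encode_hex(hex_string: str) -> str:
    """Place a % in front of every hex encoded byte, e.g. "4dc9" becomes "%4d%c9"."""
    if len(hex_string) % 2 != 0:
        raise ValueError("hex string must have an even number of characters")
    pairs = [hex_string[i:i + 2] for i in range(0, len(hex_string), 2)]
    return "".join("%" + pair for pair in pairs)


def replace_placeholder(request_body: str, hex_string: str,
                        placeholder: str = "!!PASS!!") -> str:
    """Replace every occurrence of the placeholder with the encoded string."""
    return request_body.replace(placeholder, percent_encode_hex(hex_string))


# Hypothetical intercepted form body; the field names are assumptions.
body = "username=tester&password=!!PASS!!&confirm=!!PASS!!"
print(replace_placeholder(body, "4dc968ff"))

# Round trip: decoding the encoded value gives back the original raw bytes.
print(unquote_to_bytes(percent_encode_hex("4dc968ff")) == bytes.fromhex("4dc968ff"))
```

Depending on the client, the placeholder may itself arrive percent-encoded in the intercepted request (for example as %21%21PASS%21%21), which is why the placeholder argument is left configurable.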

Expected Results

If you do not manage to log in because the passwords do not match, then the web application uses a hashing algorithm other than SHA-1, or one of the limitations listed above applies.

If you manage to log in to the web application, it means that the SHA-1 hashing algorithm is used for password hashing.

Does This Collision Based Hashing Algorithm Disclosure Work for SHA-1 Algorithm Only?

This method will also work with other hashing algorithms that have known collisions, for example MD5. The prerequisites don’t differ much. However, the length restriction is less of a concern, as the known MD5 collisions are not as long: they are just 64 bytes. This might still be too long for some server-side filtering though.

Here is a known MD5 collision which you can use for testing:

4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa200a8284bf36e8e4b55b35f427593d849676da0d1555d8360fb5f07fea2

4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa202a8284bf36e8e4b55b35f427593d849676da0d1d55d8360fb5f07fea2

Both strings will result in the following hash:

008ee33a9d58b51cfeb425b0959121c9
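Before using the pair in a test, it is worth confirming that it survived copying. The Python sketch below decodes both strings, prints their MD5 digests and length, and lists the byte offsets at which they differ. The helper names are hypothetical.

```python
import hashlib

STRING_1 = (
    "4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa200a8284bf36e8e4b55b35f427593d849676da0d1555d8360fb5f07fea2"
)
STRING_2 = (
    "4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa202a8284bf36e8e4b55b35f427593d849676da0d1d55d8360fb5f07fea2"
)


def md5_of_hex(hex_string: str) -> str:
    """Decode the hex string and return the MD5 hex digest of the raw bytes."""
    return hashlib.md5(bytes.fromhex(hex_string)).hexdigest()


def differing_offsets(hex_a: str, hex_b: str) -> list:
    """Return the byte offsets at which the two decoded strings differ."""
    a, b = bytes.fromhex(hex_a), bytes.fromhex(hex_b)
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(md5_of_hex(STRING_1))
print(md5_of_hex(STRING_2))
print(len(bytes.fromhex(STRING_1)), "bytes")
print(differing_offsets(STRING_1, STRING_2))
```

If the two digests printed differ from each other, one of the strings was most likely altered while copying.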

Who Needs to Know about the Collision Based Hashing Algorithm Disclosure?

As a developer of a website you already know which hashing algorithm you use and do not need this test to see if your algorithm is secure or not. Just knowing which hashing algorithm is used also won’t aid an attacker during an attack.

However, there are two scenarios where this is especially useful: during a black box penetration test, where it is not possible to get a look at the source code, and as an additional step to check the authenticity of a database dump.

If a leaked database uses unsalted SHA-1 hashes and this method confirms that indeed SHA-1 is the hashing algorithm used by the website, it can be a very small indicator that the dump might be credible.

The URL Encoded Strings

For easy copying, here are the URL encoded strings for the above check:

SHA-1

String 1

%25%50%44%46%2D%31%2E%33%0A%25%E2%E3%CF%D3%0A%0A%0A%31%20%30%20%6F%62%6A%0A%3C%3C%2F%57%69%64%74%68%20%32%20%30%20%52%2F%48%65%69%67%68%74%20%33%20%30%20%52%2F%54%79%70%65%20%34%20%30%20%52%2F%53%75%62%74%79%70%65%20%35%20%30%20%52%2F%46%69%6C%74%65%72%20%36%20%30%20%52%2F%43%6F%6C%6F%72%53%70%61%63%65%20%37%20%30%20%52%2F%4C%65%6E%67%74%68%20%38%20%30%20%52%2F%42%69%74%73%50%65%72%43%6F%6D%70%6F%6E%65%6E%74%20%38%3E%3E%0A%73%74%72%65%61%6D%0A%FF%D8%FF%FE%00%24%53%48%41%2D%31%20%69%73%20%64%65%61%64%21%21%21%21%21%85%2F%EC%09%23%39%75%9C%39%B1%A1%C6%3C%4C%97%E1%FF%FE%01%73%46%DC%91%66%B6%7E%11%8F%02%9A%B6%21%B2%56%0F%F9%CA%67%CC%A8%C7%F8%5B%A8%4C%79%03%0C%2B%3D%E2%18%F8%6D%B3%A9%09%01%D5%DF%45%C1%4F%26%FE%DF%B3%DC%38%E9%6A%C2%2F%E7%BD%72%8F%0E%45%BC%E0%46%D2%3C%57%0F%EB%14%13%98%BB%55%2E%F5%A0%A8%2B%E3%31%FE%A4%80%37%B8%B5%D7%1F%0E%33%2E%DF%93%AC%35%00%EB%4D%DC%0D%EC%C1%A8%64%79%0C%78%2C%76%21%56%60%DD%30%97%91%D0%6B%D0%AF%3F%98%CD%A4%BC%46%29%B1

String 2

%25%50%44%46%2D%31%2E%33%0A%25%E2%E3%CF%D3%0A%0A%0A%31%20%30%20%6F%62%6A%0A%3C%3C%2F%57%69%64%74%68%20%32%20%30%20%52%2F%48%65%69%67%68%74%20%33%20%30%20%52%2F%54%79%70%65%20%34%20%30%20%52%2F%53%75%62%74%79%70%65%20%35%20%30%20%52%2F%46%69%6C%74%65%72%20%36%20%30%20%52%2F%43%6F%6C%6F%72%53%70%61%63%65%20%37%20%30%20%52%2F%4C%65%6E%67%74%68%20%38%20%30%20%52%2F%42%69%74%73%50%65%72%43%6F%6D%70%6F%6E%65%6E%74%20%38%3E%3E%0A%73%74%72%65%61%6D%0A%FF%D8%FF%FE%00%24%53%48%41%2D%31%20%69%73%20%64%65%61%64%21%21%21%21%21%85%2F%EC%09%23%39%75%9C%39%B1%A1%C6%3C%4C%97%E1%FF%FE%01%7F%46%DC%93%A6%B6%7E%01%3B%02%9A%AA%1D%B2%56%0B%45%CA%67%D6%88%C7%F8%4B%8C%4C%79%1F%E0%2B%3D%F6%14%F8%6D%B1%69%09%01%C5%6B%45%C1%53%0A%FE%DF%B7%60%38%E9%72%72%2F%E7%AD%72%8F%0E%49%04%E0%46%C2%30%57%0F%E9%D4%13%98%AB%E1%2E%F5%BC%94%2B%E3%35%42%A4%80%2D%98%B5%D7%0F%2A%33%2E%C3%7F%AC%35%14%E7%4D%DC%0F%2C%C1%A8%74%CD%0C%78%30%5A%21%56%64%61%30%97%89%60%6B%D0%BF%3F%98%CD%A8%04%46%29%A1

MD5

String 1

%4d%c9%68%ff%0e%e3%5c%20%95%72%d4%77%7b%72%15%87%d3%6f%a7%b2%1b%dc%56%b7%4a%3d%c0%78%3e%7b%95%18%af%bf%a2%00%a8%28%4b%f3%6e%8e%4b%55%b3%5f%42%75%93%d8%49%67%6d%a0%d1%55%5d%83%60%fb%5f%07%fe%a2

String 2

%4d%c9%68%ff%0e%e3%5c%20%95%72%d4%77%7b%72%15%87%d3%6f%a7%b2%1b%dc%56%b7%4a%3d%c0%78%3e%7b%95%18%af%bf%a2%02%a8%28%4b%f3%6e%8e%4b%55%b3%5f%42%75%93%d8%49%67%6d%a0%d1%d5%5d%83%60%fb%5f%07%fe%a2
