Detailed Explanation of PHP Type Juggling Vulnerabilities

This article introduces PHP type juggling vulnerabilities, explains loosely typed code and loose type comparisons, and shows how to avoid or fix the resulting problems.


PHP is often referred to as a ‘loosely typed’ programming language. This means that you don’t have to define the type of any variable you declare. During the comparisons of different variables, PHP will automatically convert the data into a common, comparable type. This makes it possible to compare the number 12 to the string ‘12’ or check whether or not a string is empty by using a comparison like $string == True. This, however, leads to a variety of problems and might even cause security vulnerabilities, as described in this blog post.


The PHP Language Has Its Peculiarities

There are lots of reasons not to like PHP. One of them is its inconsistent naming of built-in functions. This becomes apparent when you look at the functions that are responsible for encoding and decoding data. For one, there is base64_decode or base64_encode with underscores in the function name. However, urldecode and urlencode are named differently, for no apparent reason.

Another thing is that the order of common parameters varies greatly among different functions. Yet again, there are two examples where this is quite obvious:

  • bool in_array ( mixed $needle , array $haystack [, bool $strict = FALSE ] )
  • mixed strpos ( string $haystack , mixed $needle [, int $offset = 0 ] )

You see how both of these functions are described in the PHP manual. While in_array takes data to search as the first parameter, strpos takes it as the second one. When you forget this, it’s easy to make mistakes and write a strpos check that’s almost never true. Those issues are often hard to find and debug. But, by far, this is not the only problem you can encounter when dealing with PHP!
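The argument-order pitfall can be sketched as follows. The strings and variable names are made up for illustration. Note also that strpos returns false on a miss and 0 for a match at the very start of the string, so its result is best checked with !== false.

```php
<?php
$haystack = 'type juggling';
$needle = 'jug';

// Correct order: haystack first, needle second.
$correct = strpos($haystack, $needle);  // position of the match

// Swapped order: looks for the long string inside the short one.
$swapped = strpos($needle, $haystack);  // false, so the check never succeeds

var_dump($correct !== false);
var_dump($swapped !== false);
```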

Type Conversions Made Easy

Since we’ve already found an issue with strpos, let’s see if we can also find one for in_array.

Consider the following PHP code, which can be used to find a value in an array.

Code
$values = array("apple","orange","pear","grape");
var_dump(in_array("apple", $values));
Output
bool(true)

There are four elements in the $values array: “apple”, “orange”, “pear” and “grape”. To determine whether the value “apple” is among them, we pass it to in_array. The function var_dump prints bool(true) as result of this search, since “apple” is indeed a part of the $values array. If we passed the value “cherry” to the function, var_dump would print bool(false) since it’s not found in the array.

Things are going to get a little confusing at this point. It’s pretty obvious what should happen when we search the array for the number 0. Just from looking at the code, you know that there is no 0 in it, and therefore you’d expect the search to return false. But let’s see what actually happens.

Code
$values = array("apple","orange","pear","grape");
var_dump(in_array(0, $values));
Output
bool(true)

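This result is unexpected and it doesn’t make any sense at first glance. Why would PHP inform us, incorrectly, that there is a zero in the array? To understand what’s going on you have to take a look at how PHP compares values. (The outputs shown here are those of PHP versions before 8.0. PHP 8.0 changed how numbers are compared with non-numeric strings, so this particular call returns false there.)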

Type Comparison Basics

What the in_array function does is compare the supplied value with every value in the array until it either finds a matching value (and returns true), or until all the values are compared without a match (and returns false). But, there is a problem. We have passed an integer to PHP and the $values array contains strings. Think of this analogy: comparing strings to integers would be like saying “The sky is exactly as blue as happiness”, which makes no sense at all.

So, in order to compare it, PHP has to convert the data to the same type. When comparing strings and integers, PHP will always attempt to convert the string to an integer. Being able to compare strings to integers is a very convenient feature when you deal with user input, especially for beginners who aren’t as familiar with data types. Imagine a form that allows you to record how many bottles of water you drank throughout the day. A user can simply type the string ‘2 bottles’ into the input field, and PHP would automatically extract the integer ‘2’ from the beginning of the string.

Code
var_dump("2 bottles" == 2);
Output
bool(true)

The == part is the operator that is being used for a loose comparison. We’ll come to that in a minute. Comparing strings to numbers like that makes coding a lot easier, especially for beginners, but it also leads to the mysterious in_array behaviour. It’s still hard to understand why passing ‘0’ to in_array in the above scenario will lead to a match. It turns out that the reason is actually quite simple – and sometimes even exploitable – as we’ll see later.

The in_array function compares each array element to the input and returns true as soon as there is a match. You could write it as the following PHP code:

Code
$values = array("apple","orange","pear","grape");
$input = 0;
foreach($values as $value)
{
    if($input == $value)
    {
        die($value . ' matched');
    }
}
Output
apple matched

The reason why “apple” matches 0 is that PHP tries to convert “apple” to an integer. As seen before in the example with the two bottles, PHP tries to extract a number from the beginning of the string, converts it to an integer, and then compares this to the integer we passed for comparison. But, since there are no leading digits in “apple”, PHP will treat the string as 0. After conversion, the comparison looks like this to PHP:

if(0 == 0)

This kind of behaviour is called an implicit type conversion or ‘type juggling’. If you want to have an exact comparison of two values, this may lead to the undesirable results we’ve seen with in_array.
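The conversion rule can be observed in isolation with an explicit (int) cast, which keeps the leading digits of a string and behaves the same before and after PHP 8.0. This is only a small sketch, and the sample strings are arbitrary.

```php
<?php
// (int) keeps the leading digits of a string and ignores the rest.
var_dump((int) "2 bottles"); // int(2)
var_dump((int) "apple");     // int(0), because there are no leading digits
```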

So what is the solution to this problem?

If we go back to the in_array function, there is an optional third parameter, which is set to false by default:

bool in_array ( mixed $needle , array $haystack [, bool $strict = FALSE ] )

The name of the parameter is $strict, and when it is set to ‘true’, PHP will take the datatype of the values into consideration when comparing them. This means that even if there is a string beginning with (or solely consisting of) ‘0’, it will still fail the comparison when the other value is the integer ‘0’.

Code
$values = array("apple","orange","pear","grape");
var_dump(in_array(0, $values, true));
Output
bool(false)

The following table gives a good overview of the behavior of PHP with loose type comparisons.


Loose Comparison Vulnerabilities in PHP

There are various, subtle vulnerabilities that can arise from loose comparison. This section contains an explanation of some of the more common ones.

Authentication Bypass

It would be easy to assume that the following check lets an attacker bypass authentication:

if($_POST['password'] == "secretpass")

Just passing password=0 in the POST body should be enough to fool PHP, and convert the secret passphrase to ‘0’. However, there is a problem. Array elements in $_GET, $_POST or $_COOKIE are always either strings or arrays, never integers. This means that we would compare the string “0” to the string “secretpass“. Since both are already of the same type, PHP does not need to convert anything, and therefore the comparison will result in false. (There are special cases where PHP will still convert strings to integers, but we will talk about this in the next section.)

There are still several problems with this check. When the input comes from json_decode() or unserialize(), the sender decides whether a value arrives as an integer or as a string. Therefore, this would pass the check:

Input:

{"password": 0}

Code:

$input = json_decode(file_get_contents('php://input'));
if($input->password == "secretpass") {
    // reached with the input above on PHP versions before 8.0
}

Instead of 0, you can even pass true, as (“any_string” == true) succeeds for every non-empty string other than “0”. The same would work when comparing CSRF tokens.
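One way to close this hole is to check the type of the decoded value before comparing it. The sketch below is an illustration only: passwordMatches is a hypothetical helper, and the plaintext secret merely mirrors the example above rather than a recommended way to store passwords.

```php
<?php
// Hypothetical helper: $json is the raw request body, $expected the stored secret.
function passwordMatches(string $json, string $expected): bool
{
    $input = json_decode($json);

    // Reject anything that is not a JSON object with a string "password" member.
    if (!is_object($input) || !isset($input->password) || !is_string($input->password)) {
        return false;
    }

    // hash_equals compares two strings without type juggling and in constant time.
    return hash_equals($expected, $input->password);
}

var_dump(passwordMatches('{"password": 0}', 'secretpass'));            // bool(false)
var_dump(passwordMatches('{"password": true}', 'secretpass'));         // bool(false)
var_dump(passwordMatches('{"password": "secretpass"}', 'secretpass')); // bool(true)
```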

Reduction in Entropy

A common way to ensure that a value isn’t tampered with is using a keyed-hash message authentication code (also known as HMAC) on the server side. An HMAC is the result of hashing operations that contain a secret string and a message. It will be appended to the message that’s being sent to the server, and is used as a signature.

An HMAC is often used as an additional security measure when deserializing input, for example, from cookies, since the dangerous unserialize function can only be called if the message was not tampered with by an unauthorized party. In this scenario, the secret string that’s needed to generate the signature is only known to the server, but not to the user or an attacker.

Often these signatures are generated as illustrated in the PHP code below, even though this does not conform to the way RFC 2104 recommends creating HMACs. This approach has various disadvantages, including being vulnerable to a length extension attack, and therefore should not be used in production in any form. Developers should use PHP’s built-in hash_hmac function to generate this kind of signature instead. The flawed approach looks like this:

$secret = 'secure_random_secret_value';
$hmac = md5($secret . $_POST['message']);
if($hmac == $_POST['hmac'])
    shell_exec($_POST['message']);

The general idea behind this is that shell_exec will only be called if the sender knows the secret value. I mentioned a special case in the Authentication Bypass section, in which two strings are converted to numbers when using loose comparison. This happens when both strings are written in scientific E-notation.

The theory behind this is as follows. Scientific E-notation is used to write very long numbers in a short form. The number 1,000,000, for example, would be written as ‘1e6’. In other words, 1*(10^6) or 1*(10*10*10*10*10*10) or 1*(1000000). This is problematic because hashes, like md5 in our example, are usually written in hexadecimal encoding, which consists of the digits 0-9 and the letters a-f, so a hash can happen to look like a number in E-notation.

After enough attempts, you can therefore generate a hash that begins with ‘0e’ and is followed by digits only. In most cases, this is significantly easier than brute-forcing the secret value. The reason why this is the desired format is that 0*10^n is always 0. So it doesn’t matter which digits follow ‘0e’; the result is always the number 0. That means we can add random characters to our message and always pass the signature ‘0e123’ to the server. Eventually the message hashed with the secret value will match the format ^0e\d+$, and PHP will convert both strings to 0 and pass the check. This can therefore significantly reduce the number of attempts that an attacker needs to brute-force a valid signature.
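To put a rough number on “enough attempts”, the chance that a single hash has this format can be estimated. The sketch assumes that an MD5 digest behaves like 32 independent, uniformly random hexadecimal characters; magicHashProbability is a hypothetical helper name.

```php
<?php
// Rough estimate under the uniform-randomness assumption stated above.
function magicHashProbability(int $hexLength = 32): float
{
    $prefix = (1 / 16) ** 2;                      // first two characters are exactly "0e"
    $digitsOnly = (10 / 16) ** ($hexLength - 2);  // every remaining character is 0-9
    return $prefix * $digitsOnly;
}

$p = magicHashProbability();
printf("probability per attempt: %.3e\n", $p);
printf("expected attempts: about %.0f\n", 1 / $p);
```

Under this assumption the expected number of attempts is on the order of a few hundred million, which is tiny compared with guessing a full 128-bit MD5 value.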

Code
var_dump("0e123" == "0e51217526859264863");
Output
bool(true)
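A safer version of the signature check could look roughly like the sketch below. It uses the built-in hash_hmac and hash_equals functions. The choice of SHA-256 and the isValidSignature helper are assumptions made for this illustration, not part of the original code.

```php
<?php
// Placeholder secret for illustration only; a real key would come from configuration.
$secret = 'secure_random_secret_value';

// Hypothetical helper wrapping hash_hmac and hash_equals.
function isValidSignature(string $message, string $signature, string $secret): bool
{
    $expected = hash_hmac('sha256', $message, $secret);

    // hash_equals compares as strings, so "0e..." values get no numeric treatment.
    return hash_equals($expected, $signature);
}

$message = 'hello';
$good = hash_hmac('sha256', $message, $secret);

var_dump(isValidSignature($message, $good, $secret));   // bool(true)
var_dump(isValidSignature($message, '0e123', $secret)); // bool(false)
```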

Of course, the ‘0e’ format is also a problem in databases with lots of users. Using this format, an attacker can try to log into every single user account with a password whose hash has the mentioned format. If there is a hash with the same format in the database, they will be logged in as the user that the hash belongs to, without needing to know the password.
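For stored passwords, PHP’s built-in password_hash and password_verify functions avoid the issue, since the stored value carries an algorithm prefix and is never compared loosely. A minimal sketch, with a made-up example password:

```php
<?php
// At registration: store only the value produced by password_hash.
$stored = password_hash('correct horse', PASSWORD_DEFAULT);

// At login: password_verify recomputes the hash and compares it as a string.
var_dump(password_verify('correct horse', $stored)); // bool(true)
var_dump(password_verify('240610708', $stored));     // bool(false)
```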

Hashing Algorithm Disclosure

It’s also possible to check whether or not a specific hashing algorithm is in use when loose comparison is used, similar to the Collision Based Hashing Algorithm Disclosure. In this case, a tester has to register with a password whose hash is in the scientific E-notation format. If they are able to log in with a different password whose hash has the same format, they know that the site uses loose comparison and the respective hashing algorithm. For MD5, the strings ‘240610708’ and ‘QNKCDZO’ can be used:

Code

var_dump(md5('240610708') == md5('QNKCDZO'));

Output

bool(true)

Fixing Type Juggling Vulnerabilities in PHP Web Applications

It’s relatively easy to fix Type Juggling vulnerabilities – most of the time. In cases of simple comparisons, we recommend that you use three equals symbols (===) instead of two (==). The same goes for !=, which should be avoided in favor of !==. This makes sure that the type is taken into consideration when comparing the values.
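The difference between the two operators can be seen with a few value pairs. The pairs below were picked for this sketch because they compare as loosely equal on both PHP 7 and PHP 8.

```php
<?php
// Pairs that are loosely equal but not strictly equal.
$pairs = [
    ["1e3", "1000"], // both numeric strings, compared as numbers
    ["10", "1e1"],
    [100, "1e2"],
    ["0", false],
    [null, false],
];

foreach ($pairs as [$a, $b]) {
    printf(
        "%s: loose=%s strict=%s\n",
        json_encode([$a, $b]),
        var_export($a == $b, true),
        var_export($a === $b, true)
    );
}
```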

It’s also advisable to consult the PHP manual to see which functions compare loosely, and whether they have a parameter to enforce strict comparisons.
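For instance, in_array, array_search and array_keys all accept an optional strict flag. The array contents in this sketch are made up for illustration.

```php
<?php
$allowed = ["1", "2", "3"]; // for example, IDs that arrived as strings

var_dump(in_array(1, $allowed));             // bool(true), loose comparison
var_dump(in_array(1, $allowed, true));       // bool(false), the types differ
var_dump(array_search("2", $allowed, true)); // int(1)
var_dump(array_keys($allowed, 3, true));     // empty array, no integer 3 present
```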

And, you should avoid writing code like the following:

if($value) {
    //code
}

Instead specify the desired outcome like so, to be safe:

if($value === true) {
    //code
}

For strict comparisons, the table now looks like the one below, representing more sane and predictable behaviour than that represented in the table at the beginning of this blog post.


Finally, we recommend that you avoid casting both values to the same type before comparing them, as this often has the same effect as the == operator.

Code:

var_dump((int) "1abc" === (int) "1xyz");

Output:

bool(true)
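If the input is supposed to be an integer, one alternative to casting is to validate it first and reject anything that is not a clean integer string. The sketch below uses filter_var with FILTER_VALIDATE_INT; parseIntStrict is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns the integer, or null if $raw is not a clean integer string.
function parseIntStrict(string $raw): ?int
{
    $value = filter_var($raw, FILTER_VALIDATE_INT);
    return $value === false ? null : $value;
}

var_dump(parseIntStrict("1"));    // int(1)
var_dump(parseIntStrict("1abc")); // NULL
var_dump(parseIntStrict("1xyz")); // NULL
```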

The PHP developers have made the decision to sacrifice security for convenience and ease of use. Even though this approach allows inexperienced users to code a website in PHP, it makes it much harder for them to keep it secure. Additionally, it creates unnecessary pitfalls into which even experienced developers might stumble if they are not careful enough.

PHP Type Juggling Vulnerabilities Presentation

In the video below you can see a presentation about PHP Type Juggling on the Security Weekly show. We talked about why developers use weak type comparisons and how to avoid them, and highlighted a few of the unexpected results of PHP Type Juggling.


About the Author

Sven Morgenroth - Senior Security Engineer