Input validation errors: The root of all evil in web application security

Input validation is the first step in checking the type and content of data supplied by a user or application. For web applications, input validation usually means verifying user inputs provided in web forms, query parameters, uploads, and so on. Missing or improper input validation is a major factor in many web security vulnerabilities, including cross-site scripting (XSS) and SQL injection. Let’s see why proper data validation is so important for application security, but also why it cannot be your only line of defense.

What is input validation?

Any system or application that works with input data needs to ensure that it is valid. This applies equally to information provided directly by the user and to data received from other systems. There are many different types and levels of validation, from syntactic validation that checks input types and lengths to semantic validation that ensures supplied values make sense in the application context. For example, when a user enters an email address, syntactic validation means checking the syntax (i.e. the characters and structure) to ensure that it is a well-formed email address, while semantic validation might allow (or exclude) only addresses from specific domains.
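The following Python sketch makes the difference concrete by implementing both levels of email validation. The regular expression is a deliberately simplified syntactic check, because real address syntax is far more permissive. The `ALLOWED_DOMAINS` set and the function names are hypothetical and stand in for whatever domain rule an application might have.

```python
import re

# Deliberately simplified syntactic check: something before a single "@",
# no whitespace, and at least one dot in the domain part.
# Real email address syntax (RFC 5322) is considerably more permissive.
EMAIL_SYNTAX = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# Hypothetical semantic rule: only addresses from these domains are accepted.
ALLOWED_DOMAINS = {"example.com", "example.org"}


def is_syntactically_valid(email: str) -> bool:
    """Check only the characters and structure of the address."""
    return EMAIL_SYNTAX.fullmatch(email) is not None


def is_semantically_valid(email: str) -> bool:
    """Check that a well-formed address also makes sense for this application."""
    if not is_syntactically_valid(email):
        return False
    domain = email.rsplit("@", 1)[1].lower()  # domain names are case-insensitive
    return domain in ALLOWED_DOMAINS


print(is_syntactically_valid("alice@example.net"))  # True: well-formed
print(is_semantically_valid("alice@example.net"))   # False: domain not allowed
print(is_semantically_valid("bob@Example.com"))     # True
print(is_syntactically_valid("not an email"))       # False
```

The semantic check builds on the syntactic one. There is little point asking whether a domain is allowed until you know the value actually has a domain part.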

In web application development, input validation is typically understood as checking the values of web form input fields. This initial client-side validation is performed directly in the browser, but you must also check submitted values on the server side, because attackers can bypass the browser entirely and send arbitrary requests straight to the server.

While you will often see the terms user input or user-controlled input, actually determining all the application inputs that a malicious user could control is not easy. This is why it is good security practice to treat all application inputs as untrusted by default and validate everything. The same principle also applies to data from supposedly trusted systems and users, since attackers may abuse such trust relationships to send dangerous data via a compromised third party.

The consequences of improper input validation

When reading about web vulnerabilities on this blog, you may have noticed that many of the posts have a very similar ending: “To mitigate this vulnerability, make sure you carefully validate all user inputs.” By preventing malicious users from freely entering attack strings, you can reduce your exposure to many injection attacks, including cross-site scripting (XSS), SQL injection, and code injection, which can lead to remote code execution (RCE). If you look at the definition of CWE-20: Improper Input Validation, you will notice that this weakness can precede many others and lead to all sorts of security headaches.

While input validation alone can never prevent all attacks, it can reduce the attack surface and minimize the impact of any attacks that do succeed. Beyond its security implications, data validation is also crucial for software performance, stability, and usability. When processing invalid or corrupt data, an application might return incorrect results, fail to load, or even crash the web server.

Missing or insufficient input validation can also degrade the user experience on other levels. For example, if a registration page fails to detect an incorrect email or phone number, the user may be unable to confirm their account. If invalid data passes validation in the browser and is only caught during server-side validation, users may experience errors or longer load times.

How to ensure proper input validation in web applications

On the client side, validating form fields and other inputs is usually done using JavaScript, either manually or with a dedicated library. Implementing validation is a tedious and error-prone process, so always look for existing validation features before building your own. Most languages and frameworks have built-in validators that make form validation much easier and more reliable. For input data that should match a specific JSON or XML schema, validate it against that schema, especially for API calls, where errors can be harder to troubleshoot.
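The Python sketch below illustrates schema validation by checking a JSON request body against a hypothetical `SIGNUP_SCHEMA`. It is a hand-rolled stand-in and does not implement JSON Schema or any particular library. In a real application you would more likely use an established schema validator, which also handles nested objects, formats, and value ranges.

```python
import json

# Hypothetical expected shape of a sign-up request body: field name -> required type.
# A hand-rolled stand-in for a real schema validator: it only checks that the
# expected fields are present with the right types and that no extra fields appear.
SIGNUP_SCHEMA = {"username": str, "age": int, "newsletter": bool}


def validate_against_schema(raw: str, schema: dict) -> list:
    """Return a list of validation errors (an empty list means the payload is valid)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"malformed JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    errors = []
    for field, expected in schema.items():
        if field not in data:
            errors.append(f"missing field: {field}")
            continue
        value = data[field]
        # bool is a subclass of int in Python, so true/false must not pass as an int
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            errors.append(f"{field}: expected {expected.__name__}")
    # Reject anything the schema does not mention (sorted for a stable error order)
    for field in sorted(data.keys() - schema.keys()):
        errors.append(f"unexpected field: {field}")
    return errors


print(validate_against_schema('{"username": "alice", "age": 30, "newsletter": true}', SIGNUP_SCHEMA))
# []
print(validate_against_schema('{"username": "alice", "age": "30", "admin": true}', SIGNUP_SCHEMA))
# ['age: expected int', 'missing field: newsletter', 'unexpected field: admin']
```

Rejecting unexpected fields such as `admin` is a deliberate choice in this sketch. Silently ignoring unknown fields is also common, but rejecting them makes it harder for a client to slip extra data past the application.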

HTML5 validation features

The HTML5 spec includes built-in form validation features that let you specify validation constraints directly in HTML. These include input field attributes such as required to indicate a required field, type to specify the data type, maxlength to define a maximum length limit, and pattern to specify a regex pattern for valid values. The spec also defines CSS pseudo-classes such as :valid and :invalid so you can easily apply different styles depending on the validation result.

Built-in form validation features in HTML5 are a great place to get started with data validation. With just a few extra attributes in standard HTML elements, you get basic data type and content validation with cross-platform support to save you a lot of work and provide a native user experience. For detailed examples, see the MDN article on client-side form validation.
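Anyone can bypass the browser, so the same constraints also have to be enforced on the server. The following is a minimal Python sketch using hypothetical `FieldRule`, `check_field`, and `validate_form` helpers that mirror the `required`, `maxlength`, and `pattern` attributes. Two approximations should be noted. Python's `re` syntax differs slightly from the JavaScript regex syntax that browsers use for `pattern`. Also, `len()` counts code points, whereas browsers count UTF-16 code units.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FieldRule:
    """Hypothetical server-side counterpart of the HTML5 required, maxlength, and pattern attributes."""
    required: bool = False
    maxlength: Optional[int] = None
    pattern: Optional[str] = None  # as with HTML5, the whole value must match


def check_field(value: str, rule: FieldRule) -> Optional[str]:
    """Return an error message, or None if the value satisfies the rule."""
    if value == "":
        # As in the browser, maxlength and pattern do not apply to an empty value
        return "value is required" if rule.required else None
    # Approximation: len() counts code points, browsers count UTF-16 code units
    if rule.maxlength is not None and len(value) > rule.maxlength:
        return f"longer than {rule.maxlength} characters"
    if rule.pattern is not None and re.fullmatch(rule.pattern, value) is None:
        return "does not match the required pattern"
    return None


def validate_form(form: Dict[str, str], rules: Dict[str, FieldRule]) -> Dict[str, str]:
    """Check every field that has a rule; missing fields are treated as empty."""
    errors = {}
    for name, rule in rules.items():
        error = check_field(form.get(name, ""), rule)
        if error is not None:
            errors[name] = error
    return errors


# Hypothetical rules mirroring, e.g., <input name="username" required maxlength="20" pattern="[A-Za-z0-9_]+">
RULES = {
    "username": FieldRule(required=True, maxlength=20, pattern=r"[A-Za-z0-9_]+"),
    "postcode": FieldRule(pattern=r"[0-9]{5}"),
}

print(validate_form({"username": "alice_1", "postcode": "12345"}, RULES))  # {}
print(validate_form({"postcode": "1234"}, RULES))
# {'username': 'value is required', 'postcode': 'does not match the required pattern'}
```

Keeping the rules in one place, and ideally generating both the HTML attributes and the server-side checks from the same definitions, helps stop client-side and server-side validation from drifting apart.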

Blacklisting vs. whitelisting

Looking at input validation from a security standpoint, it can be tempting to simply disallow anything that you expect to be used in an injection attack. One example of this naïve approach would be to ban apostrophes and semicolons to prevent SQL injection, parentheses to stop malicious users from inserting a JavaScript function, and angle brackets to eliminate the risk of someone entering HTML tags. This is called blacklisting and it’s usually a bad idea because developers cannot hope to anticipate and cover every possible input and attack vector now and in the future. Blacklist-based validation is hard to implement and tedious to maintain while also being easy for attackers to bypass.
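The Python sketch below demonstrates both weaknesses of blacklisting using two hypothetical filters. The character blacklist described above rejects a perfectly legitimate surname. It also accepts a `javascript:` URL payload that calls `alert` with backticks instead of parentheses. A single-pass filter that deletes `<script>` tags can be tricked into reassembling the very tag it removes.

```python
# Naive blacklist from the text: ban apostrophes, semicolons, parentheses, angle brackets
BANNED_CHARACTERS = set("';()<>")


def passes_character_blacklist(value: str) -> bool:
    """Hypothetical filter: accept the value only if it contains no banned character."""
    return not any(ch in BANNED_CHARACTERS for ch in value)


def strip_script_tags(value: str) -> str:
    """Hypothetical single-pass filter that deletes literal <script> and </script> tags."""
    return value.replace("<script>", "").replace("</script>", "")


# A legitimate surname is rejected...
print(passes_character_blacklist("O'Brien"))  # False
# ...while a javascript: URL that calls alert with backticks instead of parentheses passes
print(passes_character_blacklist("javascript:alert`1`"))  # True

# Removing the inner tags reassembles the outer ones
print(strip_script_tags("<scr<script>ipt>alert(1)</scr</script>ipt>"))
# <script>alert(1)</script>
```

Patching these particular holes does not fix the approach. The filter still misses `<SCRIPT>`, event handler attributes, and every other variant nobody thought to list.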

For well-defined inputs such as numbers, dates, or postcodes, it’s much easier and safer to use a whitelist. That way, you can precisely specify permitted values and reject everything else. With HTML5 form validation, you get predefined whitelisting logic in the built-in data type definitions, so if you indicate that a field contains an email address, you already have email validation. If only a handful of values are expected, you can use regular expressions to explicitly whitelist them.
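Here is a Python sketch of a few whitelist checks. The accepted sizes, the five-digit postcode format, and the 1 to 100 quantity range are hypothetical examples. Each function defines exactly what is allowed and rejects everything else.

```python
import re
from datetime import date

# Hypothetical whitelists; [0-9] is used instead of \d because in Python's re
# module \d also matches digits from other scripts (e.g. Arabic-Indic digits)
SIZE = re.compile(r"(?:small|medium|large)")  # a handful of expected values
POSTCODE = re.compile(r"[0-9]{5}")             # assumed five-digit postcode format
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def valid_size(value: str) -> bool:
    return SIZE.fullmatch(value) is not None  # fullmatch: the whole value must match


def valid_postcode(value: str) -> bool:
    return POSTCODE.fullmatch(value) is not None


def valid_quantity(value: str) -> bool:
    """Accept only a whole number from 1 to 100."""
    return re.fullmatch(r"[0-9]{1,3}", value) is not None and 1 <= int(value) <= 100


def valid_date(value: str) -> bool:
    """Accept only an existing calendar date written as YYYY-MM-DD."""
    if ISO_DATE.fullmatch(value) is None:
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2023-02-29
    except ValueError:
        return False
    return True


print(valid_size("medium"), valid_size("medium'; DROP TABLE users; --"))  # True False
print(valid_quantity("42"), valid_quantity("0"), valid_quantity("1e3"))    # True False False
print(valid_date("2024-02-29"), valid_date("2023-02-29"))                  # True False
```

`re.fullmatch` is used rather than `re.match` or `re.search` so that a valid prefix followed by an attack string is not accepted. The date check combines a format whitelist with a semantic check that the date actually exists.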

Whitelisting gets tricky with free-form text fields, where you need some way to allow the vast majority of available characters, potentially in many different alphabets. Unicode character categories can be useful here, for example to allow only letters and numbers from a variety of international scripts. You should also decode input strictly and apply Unicode normalization so that equivalent characters always have the same representation and no invalid characters are present.
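The following minimal Python sketch uses the standard `unicodedata` module and assumes a hypothetical display-name policy. The policy allows letters, combining marks, and numbers from any script, plus spaces, apostrophes, and hyphens, up to 50 characters. Input is normalized to NFC first, so an `e` followed by a combining diaeresis, for example, is stored as the single character `ë`.

```python
import unicodedata
from typing import Optional

# Hypothetical display-name policy: letters (L*), combining marks (M*), and
# numbers (N*) from any script, plus a few punctuation characters.
ALLOWED_CATEGORY_PREFIXES = ("L", "M", "N")
ALLOWED_EXTRAS = {" ", "'", "-"}
MAX_LENGTH = 50


def clean_display_name(raw: str) -> Optional[str]:
    """Return the NFC-normalized name, or None if it breaks the policy."""
    # Normalize first so equivalent sequences get one canonical representation
    name = unicodedata.normalize("NFC", raw)
    if not name or len(name) > MAX_LENGTH:
        return None
    for ch in name:
        # Control characters (Cc), surrogates (Cs), symbols (S*), and most
        # punctuation (P*) fall outside the allowed categories and are rejected
        if not unicodedata.category(ch).startswith(ALLOWED_CATEGORY_PREFIXES) and ch not in ALLOWED_EXTRAS:
            return None
    return name


print(clean_display_name("Zoe\u0308") == "Zo\u00eb")  # True: e + combining diaeresis becomes ë
print(clean_display_name("Анна-Мария"))               # Анна-Мария
print(clean_display_name("<b>Bob</b>"))               # None
```

Combining marks are allowed because scripts such as Devanagari need them for ordinary words. However, they also let users stack many marks on one letter, so a production policy might limit how many can follow a single base character.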

Input validation against XSS

The problems with validating free-form text once again highlight the limitations of input validation in a security context. Despite its importance for web application security, input validation is not and never should be your primary defense against cross-site scripting (XSS). Believing that rejecting angle brackets or script tags will protect you against XSS is asking for trouble. Simply filtering inputs is not enough to prevent cross-site scripting (and, in any case, does not cover all XSS variants), which is why XSS filters have been removed from modern web browsers.

In the case of cross-site scripting and other injection attacks, your main defense is context-aware output encoding to ensure that even if malicious code makes it into the application, it will not be executed. Apart from security, context-aware encoding is also important for usability. To end with a real-life example, if an application user needs to enter <script> in a text field (perhaps because they are writing a blog post about input validation), the application should properly encode these characters and ensure that they are processed correctly and safely in this specific context.
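Here is a minimal sketch of context-aware encoding using Python's standard library. It uses `html.escape` for HTML text and attribute values and `urllib.parse.quote` for a URL query parameter. The function names are hypothetical, and in practice a templating engine with automatic escaping would usually handle this for you. The attribute example uses a payload that contains no angle brackets at all, which is exactly why rejecting `<` and `>` on input is not enough. Other contexts, such as inline JavaScript or CSS, need their own encoding rules that `html.escape` does not provide.

```python
import html
from urllib.parse import quote


def render_comment(text: str) -> str:
    """HTML body context: escape &, <, > (and quotes) so text is displayed, not parsed."""
    return f"<p>{html.escape(text)}</p>"


def render_name_input(value: str) -> str:
    """Quoted HTML attribute context: escaping quotes stops the value breaking out."""
    return f'<input name="name" value="{html.escape(value, quote=True)}">'


def search_url(query: str) -> str:
    """URL query-parameter context: percent-encode instead of HTML-escaping."""
    return "/search?q=" + quote(query, safe="")


print(render_comment("Never trust <script> tags"))
# <p>Never trust &lt;script&gt; tags</p>
print(render_name_input('" onmouseover="alert(1)'))  # no angle brackets needed
# <input name="name" value="&quot; onmouseover=&quot;alert(1)">
print(search_url("Never trust <script> tags"))
# /search?q=Never%20trust%20%3Cscript%3E%20tags
```

The same user text is encoded differently for each destination. That is what "context-aware" means in practice, and it is why encoding belongs at output time rather than at input time.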

For a detailed discussion of input validation in web applications, see the OWASP Input Validation Cheat Sheet.

Zbigniew Banach

About the Author

Zbigniew Banach - Technical Content Lead & Managing Editor

Cybersecurity writer and blog managing editor at Invicti Security. Drawing on years of experience with security, software development, content creation, journalism, and technical translation, he does his best to bring web application security and cybersecurity in general to a wider audience.