Black-box testing and its role in application security

Black-box testing is a key practice in software and security testing, used not only to verify that an application works as expected but also to limit the attack surface exposed to malicious actors. Learn why combining manual black-box testing with automated DAST tools is the most efficient way to ensure application security.


Key takeaways


  • Black-box security testing validates an app’s behavior and verifies that it does not unwittingly give malicious hackers entry points.
  • It relies on testing the app without knowledge of its inner workings, thereby mimicking the actions of both legitimate users and bad actors.
  • When combined with DAST, manual black-box testing for vulnerabilities is an effective way to secure web apps against numerous threats.

Black-box testing is a well-established methodology that IT teams use to verify that an application functions the way it’s supposed to, without any knowledge of its source code or configuration details. The app itself is the black box: testers probe it from the outside, with no view of what happens inside. Black-box testing also plays a prominent role in identifying security issues.

To perform black-box testing, a testing team first studies an application’s requirements and design documents. It then creates a series of tests to make sure the app conforms to them. Suppose an online banking application is designed to warn an account holder when a debit card transaction exceeds a preset limit. Black-box testers would write a test that creates a transaction above the limit. They would then verify that an alert containing the correct information is sent to the account holder.
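
Here is a minimal sketch of that test. It assumes a hypothetical in-memory `BankApp` class that stands in for the real system; an actual tester would drive the real app through its UI or API. The test touches only the public interface and checks the observable result.

```python
from dataclasses import dataclass, field


@dataclass
class BankApp:
    """Hypothetical stand-in for the system under test (not a real API)."""
    debit_limit: float
    alerts: list = field(default_factory=list)

    def debit(self, account_id: str, amount: float) -> None:
        # Requirement being tested: warn the holder when a debit exceeds the limit.
        if amount > self.debit_limit:
            self.alerts.append(
                f"Alert for {account_id}: debit of ${amount:.2f} exceeds "
                f"your limit of ${self.debit_limit:.2f}"
            )


def test_alert_sent_when_debit_exceeds_limit() -> None:
    app = BankApp(debit_limit=500.00)
    app.debit("ACC-1", 750.00)  # drive the app only through its public interface
    assert len(app.alerts) == 1
    # The alert must carry the correct information, not just exist.
    assert "$750.00" in app.alerts[0]
    assert "$500.00" in app.alerts[0]


def test_no_alert_at_limit() -> None:
    app = BankApp(debit_limit=500.00)
    app.debit("ACC-1", 500.00)  # boundary value: "above the limit" excludes equality
    assert app.alerts == []


if __name__ == "__main__":
    test_alert_sent_when_debit_exceeds_limit()
    test_no_alert_at_limit()
    print("positive tests passed")
```

The boundary case (a debit exactly at the limit) is included because requirements phrased as "above a preset limit" are a common source of off-by-one disagreements.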

Thousands of such scenarios are written and run to test complicated apps. Good black-box testing uses valid data to evaluate every single expected action and option on a user’s screen, and carefully verifies the expected results. This type of black-box testing is known as positive testing.

Black-box testing for security

But what happens when an app encounters invalid data or an unexpected situation? Using our banking app example, what if a customer enters a debit transaction for $0.00? Testers will want to see whether the app knows how to handle the situation and what kind of error condition results. For example, will the app crash? Enter negative testing.
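
The sketch below shows a negative test for the $0.00 case and a few other invalid amounts. `submit_debit` and `ValidationError` are hypothetical stand-ins. The important part is the test's oracle:

- Every invalid input should be rejected with a user-facing message.
- Any other exception counts as a crash.

```python
from decimal import Decimal, InvalidOperation


class ValidationError(Exception):
    """Hypothetical exception for bad input the app handles gracefully."""


def submit_debit(raw_amount: str) -> str:
    """Hypothetical stand-in for the app's debit entry point."""
    try:
        amount = Decimal(raw_amount)
    except InvalidOperation:
        # "from None" keeps internal parsing details out of the error chain.
        raise ValidationError("Please enter a valid amount.") from None
    # is_finite() is checked first so NaN never reaches the ordering comparison.
    if not amount.is_finite() or amount <= 0:
        raise ValidationError("The amount must be greater than $0.00.")
    return f"Debit of ${amount:.2f} accepted"


def run_negative_tests() -> dict:
    """Feed invalid inputs to the app and record how it reacts to each."""
    invalid_inputs = ["0.00", "-10", "abc", "", "NaN"]
    results = {}
    for raw in invalid_inputs:
        try:
            submit_debit(raw)
            results[raw] = "accepted"            # a defect: bad input got through
        except ValidationError as err:
            results[raw] = f"rejected: {err}"    # the desired, graceful outcome
        except Exception as err:                 # anything else is a crash
            results[raw] = f"crashed: {type(err).__name__}"
    return results


if __name__ == "__main__":
    for raw, outcome in run_negative_tests().items():
        print(repr(raw), "->", outcome)
```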

Negative testing is especially valuable for security purposes because it emulates a hacker’s view of the app: a black box with vulnerable entry points to be found and attacked. It can be combined with dynamic application security testing (DAST), which crawls running web applications to discover attack vectors and then tests them automatically. Together, manual black-box testing and DAST give IT teams a powerful tool for rolling out new applications that are secure and stable. Because DAST tools can include thousands of built-in security checks, they save a lot of time compared to defining and running purely manual tests. They also fill gaps in test scope.
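
To make the DAST idea concrete, here is a deliberately tiny sketch of what such a tool does at its core. It takes the inputs a crawler discovered, sends an attack payload to each one, and inspects the responses. The `app` function, the `DISCOVERED` map and the single XSS probe are all hypothetical. Real DAST tools work against live HTTP endpoints, with far larger payload sets and smarter detection logic.

```python
import html


def app(path: str, params: dict) -> tuple:
    """Hypothetical running web app: maps (path, params) to (status, body)."""
    if path == "/search":
        # Deliberately vulnerable: echoes the query without HTML-escaping it.
        return 200, f"<p>Results for {params.get('q', '')}</p>"
    if path == "/profile":
        # Safe: escapes user input before writing it into the page.
        return 200, f"<p>Hello {html.escape(params.get('name', ''))}</p>"
    return 404, "Not found"


# What a crawler might have discovered: each path with its input parameters.
DISCOVERED = {"/search": ["q"], "/profile": ["name"]}

# One sample payload; real tools ship thousands of checks like this.
XSS_PROBE = "<script>alert('dast-probe')</script>"


def scan(discovered: dict) -> list:
    """Inject the probe into every discovered input and report reflections."""
    findings = []
    for path, param_names in discovered.items():
        for name in param_names:
            _status, body = app(path, {name: XSS_PROBE})
            # An unescaped copy of the probe in the response suggests reflected XSS.
            if XSS_PROBE in body:
                findings.append(f"Possible reflected XSS: {path}?{name}=")
    return findings


if __name__ == "__main__":
    print(scan(DISCOVERED))
```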

One part of negative testing is ensuring that, when the app receives invalid data, it issues an error message that is helpful to the user yet reveals nothing about the application’s internals. That internal information can be very useful to attackers.
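
One way to automate that check is to scan error responses for tell-tale signs of internal detail. The patterns below are illustrative assumptions, not an exhaustive list. They cover a Java stack frame, a .NET stack frame, and two framework package names.

```python
import re

# Illustrative signatures only; a production scanner would use many more.
LEAK_PATTERNS = {
    # e.g. "at org.example.Foo.bar(Foo.java:42)"
    "Java stack frame": re.compile(r"\bat [\w$.]+\([\w$]+\.java:\d+\)"),
    # e.g. "at Shop.Cart.Add() in C:/src/Cart.cs:line 17"
    ".NET stack frame": re.compile(r"\bat [\w.]+\(.*\) in .+:line \d+"),
    # Framework names that hint at which known vulnerabilities to try.
    "Framework package name": re.compile(r"\b(?:org\.apache\.struts2?|System\.Web)\b"),
}


def find_leaks(response_body: str) -> list:
    """Return the names of any internal-detail signatures found in an error page."""
    return [name for name, pattern in LEAK_PATTERNS.items()
            if pattern.search(response_body)]


if __name__ == "__main__":
    page = ("java.lang.NullPointerException\n"
            "\tat org.apache.struts2.dispatcher.Dispatcher.serviceAction(Dispatcher.java:563)")
    print(find_leaks(page))
```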

Default error messages may include stack traces that run to hundreds of lines. A stack trace lists the chain of calls, and thus the software components, active at the moment the error occurred. This information is intended as a diagnostic resource to help developers locate and fix an issue. For example, a 135-line error dump from a server-hosted Java application, recently posted by an academic, reveals that the system runs on Java. It also shows that the system uses the Struts framework inside a Java EE (Enterprise Edition) container.

These details are a roadmap for a hacker. Such extensive error messages are not unique to Java: Microsoft’s .NET framework can provide equally detailed stack information. In this case, knowing that the app uses the Struts framework would be especially valuable. Struts has had its share of security vulnerabilities. A prepared attacker can look them up and then probe whether an organization has skipped patches or updates, which would inadvertently open a way into its systems.

This is not an idle example. Lax patching was the cause of a major breach at the Equifax credit bureau, disclosed in September 2017. Attackers exploited an unpatched Struts installation through a command injection vulnerability and exposed the personal data of some 143 million people (a figure Equifax later revised upward).

Continuing with the example of overly verbose error messages, black-box testing attempts to verify that no matter the error, internal information about the system isn’t revealed. An appropriate error message would simply state that an error occurred and an action could not continue. It might also ask the user to check their request and try again or provide some other helpful direction.
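
On the application side, the behavior testers are looking for can be sketched as a wrapper. It logs the full stack trace on the server and returns only a generic message to the user. `safe_call` and the incident-reference scheme are hypothetical choices for illustration, not a prescribed pattern.

```python
import logging
import uuid

logger = logging.getLogger("app.errors")

GENERIC_MESSAGE = (
    "Sorry, we could not complete your request. "
    "Please check your input and try again."
)


def safe_call(handler, *args):
    """Run a hypothetical request handler without ever exposing internals on failure."""
    try:
        return {"status": 200, "body": handler(*args)}
    except Exception:
        incident_id = uuid.uuid4().hex[:8]
        # The full stack trace goes to the server-side log, for developers only.
        logger.exception("Unhandled error, incident %s", incident_id)
        # The user sees a generic message plus a reference to quote to support.
        return {"status": 500, "body": f"{GENERIC_MESSAGE} (Reference: {incident_id})"}


if __name__ == "__main__":
    def broken(_request):
        raise RuntimeError("connection refused: db01.internal:5432")

    print(safe_call(broken, {}))
```

The incident reference lets support staff match a user’s report to the detailed log entry without any of that detail reaching the browser.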

In a case like this, supplementing black-box testing with DAST delivers two key benefits. Manual testing would reveal that error messages were exposing crucial information to attackers, while DAST would identify the unpatched Struts implementation.
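
Once a scanner has fingerprinted a component’s version, flagging a known vulnerability is essentially a range check. The sketch below takes its affected ranges from Apache’s S2-045 advisory for CVE-2017-5638, the Struts flaw exploited at Equifax:

- Affected: 2.3.5 through 2.3.31, and 2.5 through 2.5.10.
- Fixed in: 2.3.32 and 2.5.10.1.

Verify these ranges against the advisory before relying on them. The sketch also assumes purely numeric, dot-separated version strings.

```python
def parse_version(version: str) -> tuple:
    """Turn '2.3.31' into (2, 3, 31, 0) so versions compare numerically.

    Assumes numeric, dot-separated components (no suffixes such as '-BETA').
    """
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (4 - len(parts)))  # pad so '2.5' == '2.5.0.0'


# Inclusive affected ranges, as listed in Apache advisory S2-045.
AFFECTED_RANGES = [("2.3.5", "2.3.31"), ("2.5", "2.5.10")]


def affected_by_cve_2017_5638(version: str) -> bool:
    v = parse_version(version)
    return any(parse_version(lo) <= v <= parse_version(hi)
               for lo, hi in AFFECTED_RANGES)


if __name__ == "__main__":
    for found in ["2.3.31", "2.3.32", "2.5.10", "2.5.10.1"]:
        print(found, "vulnerable" if affected_by_cve_2017_5638(found) else "not affected")
```

Padding to four components matters: without it, Python would treat `(2, 5)` as less than `(2, 5, 0)`, and "2.5" would fall outside its own range.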

Black-box testing and the SDLC

Application testing in the software development life cycle (SDLC) falls into two general categories: white-box testing and, as discussed thus far, black-box testing. 

White-box testing depends on knowledge of the system’s code. It includes all verification done by developers, such as unit testing and integration testing. It also includes many of the tests performed by test engineers, such as some types of regression testing. Static application security testing (SAST) tools also fall into this category. All of this testing occupies known, established places in the SDLC, with the goal of preparing an app for functional testing. In later stages, these tests can be complemented by automated black-box testing with DAST, which tests APIs and many other facets of web applications to reveal additional attack vectors.

Functional testing has two primary components: black-box testing and user-acceptance testing (UAT). When these tests are performed varies widely depending on the IT organization and the type of SDLC it uses. For example, an organization that practices agile development might perform UAT frequently but formal black-box testing later in the SDLC and less often. Meanwhile, an organization with extensive requirements and considerable up-front design might do black-box testing before starting a cycle of UAT.

One advantage of black-box testing with regard to its place in the SDLC is that work can begin on designing the tests from the moment requirements have been finalized. 

Another practice, behavior-driven development (BDD), uses both white-box and black-box techniques to run functional tests. BDD aims to specify detailed app behavior up front, in a form that developers can run as part of their routine testing. In BDD, users and stakeholders generally specify the tests in a structured, near-natural-language format, such as the Given/When/Then syntax of Gherkin. BDD tools then map these specifications onto executable developer tests. With BDD, both developers and stakeholders can be confident that the app already fulfills most, if not all, of its known requirements by the time it is ready for black-box testing and UAT.
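
Tools such as Cucumber and behave read scenarios written in Gherkin’s Given/When/Then format and match each line to a step function. The sketch below is a hypothetical, heavily simplified runner that mimics that idea. It is not either tool’s API. Its step functions apply the debit-limit rule inline, whereas a real step would call the application.

```python
import re

# A scenario as a stakeholder might write it, in Given/When/Then form.
SCENARIO = """
Given an account with a debit limit of 500
When the holder makes a debit of 750
Then the holder receives 1 alert
"""

STEPS = []  # registered (compiled pattern, step function) pairs


def step(pattern):
    """Register a function as the implementation of a step phrase."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r"an account with a debit limit of (\d+)")
def given_account(ctx, limit):
    ctx["limit"] = int(limit)
    ctx["alerts"] = []


@step(r"the holder makes a debit of (\d+)")
def when_debit(ctx, amount):
    # Stand-in for calling the real app: apply the limit rule directly.
    if int(amount) > ctx["limit"]:
        ctx["alerts"].append(amount)


@step(r"the holder receives (\d+) alerts?")
def then_alerts(ctx, count):
    assert len(ctx["alerts"]) == int(count), ctx["alerts"]


def run_scenario(text):
    """Match each non-blank line (minus its keyword) to a step and run it."""
    ctx = {}
    for line in filter(None, (raw.strip() for raw in text.splitlines())):
        _keyword, _, phrase = line.partition(" ")
        for pattern, func in STEPS:
            match = pattern.fullmatch(phrase)
            if match:
                func(ctx, *match.groups())
                break
        else:
            raise LookupError(f"No step matches: {line}")
    return ctx


if __name__ == "__main__":
    print(run_scenario(SCENARIO))
```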

Limitations of black-box testing

Black-box testing is a requirement for most organizations that can support a separate team of software testers. Because those testers work from specific scripts, they know exactly what they have tested. In positive testing, this makes it possible to know that the product has been tested comprehensively against its requirements.

However, negative testing offers no such assurance, especially when it comes to security. Hackers are extremely creative in finding small, unexpected vulnerabilities that have escaped app designers’ notice. Such flaws go untested simply because no one knows they exist. Black-box testing might uncover unknown problems, but it can never confirm that every possible vulnerability has been found.
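
Randomized fuzzing is one way to hunt for unknown problems, and it also illustrates this limit. In the sketch below, the hypothetical `parse_amount` has two latent bugs. The fuzzer treats only unexpected exceptions as failures. It is very likely to find the first bug but can never flag the second.

```python
import random


class ValidationError(Exception):
    """The app's deliberate, graceful rejection of bad input."""


def parse_amount(raw: str) -> int:
    """Hypothetical function under test: converts '12.34' into 1234 cents."""
    if not raw or any(c not in "0123456789." for c in raw):
        raise ValidationError("invalid amount")
    parts = raw.split(".")
    cents = parts[1] if len(parts) > 1 else "00"
    # Latent bugs: an empty whole part ('.5') crashes int(), and a second
    # '.' ('1.2.3') is silently accepted.
    return int(parts[0]) * 100 + int((cents + "00")[:2])


def fuzz(trials: int = 500, seed: int = 1234) -> set:
    """Throw random short strings at parse_amount and collect crashing inputs."""
    rng = random.Random(seed)  # fixed seed so a run can be reproduced
    alphabet = "0123456789.x"
    crashes = set()
    for _ in range(trials):
        raw = "".join(rng.choice(alphabet) for _ in range(rng.randint(1, 4)))
        try:
            parse_amount(raw)
        except ValidationError:
            pass                 # rejected as designed
        except Exception:        # anything else is an unhandled failure
            crashes.add(raw)
    return crashes


if __name__ == "__main__":
    print(sorted(fuzz()))
```

`parse_amount("1.2.3")` returns 120 without complaint. Because nothing crashes, the fuzzer stays silent. A clean fuzzing run shows only that these particular inputs found nothing, not that nothing is there.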

Systems with many moving parts, such as enterprise web apps or Internet of Things setups, are particularly difficult to cover comprehensively with negative black-box testing. As a result, IT organizations concerned with security complement manual black-box testing with several forms of dynamic testing, especially DAST. This form of automated testing checks for vulnerabilities in running applications that black-box testing might not catch, and it also verifies systems against newly published product vulnerabilities as they become known. 

As with all things security-related, the best approach involves multiple overlapping forms of testing and monitoring, of which black-box testing is a central element.

About the Author

Andrew Binstock - Contributing Writer

Andrew Binstock (@platypusguy) is a technology analyst. He was formerly the editor in chief of Oracle’s Java Magazine and, earlier, Dr. Dobb’s Journal. He is a frequent contributor to open-source projects. In his spare time, Binstock studies piano – to the distress of his now-former friends and present neighbors.