Investigating CQL injection in Apache Cassandra

Is it possible to perform NoSQL injection attacks using Cassandra Query Language (CQL)? Invicti security researchers investigated CQL injection against applications that use the Apache Cassandra database.

While working on new security checks, our security research team at Invicti looked at the possibilities of automating injection attacks for many types of NoSQL databases. This post summarizes our findings about query language injection attacks against Apache Cassandra and shows why such CQL injections are difficult to perform and automate.

What is Apache Cassandra?

Apache Cassandra is a free, open-source, distributed NoSQL database. The project was open-sourced by Facebook in 2008, became an Apache Incubator project in 2009, and is now one of the most commonly used NoSQL databases. Organizations choose Cassandra for its ability to scale out quickly and seamlessly by adding nodes. It is specifically designed for use cases that require large transaction volumes distributed across different geographical areas.

How is Cassandra different from other databases?

Unlike many other NoSQL databases, Cassandra stores its data in rows and columns, so at first glance, it may appear similar to relational databases. However, Cassandra tables are organized around a primary key made up of a partition key, which determines which partition (and therefore which nodes) stores a row, and optional clustering columns, which determine how rows are sorted within a partition. Cassandra uses partitions to efficiently distribute data across any number of interconnected nodes (instances), with provisions for replicating data across multiple nodes for high availability, disaster recovery, and improved performance.

There are many other interesting aspects of Cassandra database structure, architecture and data modeling – see the official Cassandra documentation to learn more.

What is Cassandra Query Language (CQL)?

The Cassandra Query Language (CQL) is the primary language for communicating with Apache Cassandra databases. A Thrift API was also available in the past but is no longer supported as of Cassandra version 4. In any case, CQL is easier to use than the Thrift API, especially as it is very similar to SQL.

Most of the basic operators, keywords, and identifiers in CQL are similar to those used in SQL. However, since Cassandra has its own data model and structure that is not based on relational tables, CQL has some unique features and keywords, like KEYSPACE or USING TTL. Conversely, some relational clauses are missing from CQL, notably JOIN, FOREIGN KEY, and UNION. 

The importance of CQL client drivers

To connect to a Cassandra database from your application, you need the right client driver. Because new Cassandra versions often bring significant changes to the database and to CQL, many client drivers are available, though most target a single version and are not maintained beyond it. The most commonly used drivers are developed by DataStax (a company that offers a commercial database based on Cassandra), so we used the open-source DataStax drivers for our application security research. Crucially, the client driver also affects security, since it can impose restrictions of its own on top of those enforced by the database.

Exploring CQL injection

Injection attacks are possible wherever unsanitized user input is passed to a back-end system, including a database. Unlike traditional SQL injection, where broadly the same attack techniques work against all relational databases, NoSQL injection requires attacks aimed at specific databases, with MongoDB probably being the most popular target. In principle, the same approach should work against a Cassandra database, using carefully crafted CQL queries to perform CQL injection attacks.

Using CQL injection for unauthenticated access

To test if CQL injection is possible, we created a vulnerable login page that does a simple database lookup using unsanitized user input. The application queries a Cassandra database that has a users table, using the following unsafe CQL query to authenticate users:

SELECT * FROM users WHERE username='[user_input]' AND password='[user_input]' ALLOW FILTERING;

In this case, the following user inputs will be enough to perform a CQL injection attack for an authentication bypass:

  • Username: admin'/*
  • Password: */ and password >'

The resulting query will be:

SELECT * FROM users WHERE username='admin'/*' AND password='*/ and password >'' ALLOW FILTERING;

The query was supposed to look for a valid username and password combination, but the payload comments out the original password check and replaces it with password >'', a condition that any non-empty password satisfies. This query will run successfully, potentially allowing an attacker to log in as admin without knowing the password.
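The following illustrative JavaScript sketch reproduces the injected query above, assuming the vulnerable application builds its CQL by plain string concatenation. The helper names buildLoginQuery and stripBlockComments are hypothetical and are not taken from our test application. The comment stripping is deliberately naive and only shows what the database effectively evaluates once the /* ... */ block is ignored.

```javascript
// Hypothetical vulnerable helper: user input is pasted straight into the CQL text.
function buildLoginQuery(username, password) {
  return "SELECT * FROM users WHERE username='" + username +
    "' AND password='" + password + "' ALLOW FILTERING;";
}

// Naively removes /* ... */ block comments (does not handle markers inside
// string literals); used only to show the effective query.
function stripBlockComments(cql) {
  return cql.replace(/\/\*[\s\S]*?\*\//g, "");
}

// Payloads from the example above.
const injected = buildLoginQuery("admin'/*", "*/ and password >'");
console.log(injected);
// SELECT * FROM users WHERE username='admin'/*' AND password='*/ and password >'' ALLOW FILTERING;

console.log(stripBlockComments(injected));
// SELECT * FROM users WHERE username='admin' and password >'' ALLOW FILTERING;
```

The second line makes the bypass visible: only the username condition and an always-satisfied comparison against the empty string remain.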

This kind of payload can at least confirm that injection is possible, but automating detection and confirmation requires more. Unfortunately, at the moment, there are no known payloads to extract additional information from the table or the database itself. This is mainly due to numerous limitations imposed by the CQL language and client drivers that rule out typical SQL injection techniques.

Why CQL injection is harder than SQL injection

SQL injections are among the oldest and best-researched web attacks, and CQL syntax is very similar to SQL, so it makes sense to try to apply SQLi techniques to CQL injection. As we found out, seemingly minor differences from SQL add up to serious limitations that prevented us from using typical SQL injection techniques. We identified and tested the limitations below in a test environment with Cassandra 4.0.3, the DataStax Python driver 3.25.0, and a Python application based on Flask.

Missing injection-friendly language constructs and functions

  • SQL injections often use the UNION operator to append data from other tables to the original query results (union-based in-band SQL injection). Cassandra is a non-relational database, so there are no JOIN or UNION statements in CQL, making it hard to access other tables.
  • Cassandra has no convenient built-in functions like DATABASE() or USER() to retrieve database information.
  • There is no OR operator in CQL, so we can’t use it to set up always-true conditions – a CQL query like SELECT * FROM table WHERE col1='a' OR col2='b'; will be rejected.
  • Time-based blind SQL injections rely on using SLEEP() or a similar function to induce a delay, but there is no SLEEP() function in CQL, making time-based injections very hard (if not impossible).
  • There are no built-in functions that could be used to send network requests, so there is no easy way to perform out-of-band verification (typically done by listening for DNS resolution requests) as with out-of-band SQL injections.

WHERE clause limitations

  • Columns that are not part of the primary key and have no secondary index cannot be filtered with WHERE clauses unless the query includes ALLOW FILTERING. For example, if column col1 is not part of the primary key and does not have a secondary index, the following CQL query will be rejected by the database: SELECT * FROM table WHERE col1='asd';
  • Only valid column names can be specified in WHERE clauses (unlike in SQL), so trying to add an always-true condition like SELECT * FROM table WHERE column1='a' AND '1'='1'; will not work as there is no column named '1'. This greatly limits the scope of available payloads, especially for boolean-based detection and attempts to make a query return all rows in a table.
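To illustrate why the classic SQL tautologies fail, here is a JavaScript sketch that builds the two standard payloads with a hypothetical concatenating helper (buildLookupQuery). It only constructs the query strings; based on the restrictions described above, Cassandra would reject the first because CQL has no OR operator and the second because '1' is not a column name.

```javascript
// Hypothetical vulnerable helper, same concatenation pattern as the login example.
function buildLookupQuery(username) {
  return "SELECT * FROM users WHERE username='" + username + "';";
}

// Classic SQL always-true payloads.
const orQuery = buildLookupQuery("x' OR '1'='1");
const andQuery = buildLookupQuery("x' AND '1'='1");

console.log(orQuery);  // SELECT * FROM users WHERE username='x' OR '1'='1';
console.log(andQuery); // SELECT * FROM users WHERE username='x' AND '1'='1';
```

In SQL, both strings are valid and the first returns every row; in CQL, both are invalid, so the injection attempt shows up only as an error.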

SELECT clause limitations

  • CQL does not support echo-type queries such as SELECT 'text'; or SELECT 3;, eliminating a useful technique for further exploitation and confirmation.
  • A fundamental SQLi technique is to discard the end of a query by injecting a comment. When trying this with CQL in our application test environment, a query like SELECT * FROM table WHERE user='admin';// AND pass='pass'; fails. Note that this query is syntactically correct and works when directly connected to a Cassandra database. However, our tests found that the application client drivers don’t allow comments at the end of queries. This is another major limitation since we are now restricted to only using valid queries and conditions.

Restrictions on keys

  • If a query restricts clustering key columns but not the partition key, Cassandra rejects it unless the ALLOW FILTERING keyword is added. To make matters worse, ALLOW FILTERING can only be specified at the end of a query, which limits our injection options to the last condition in a WHERE clause (injecting anywhere else would produce an invalid query that the database rejects).
  • Queries are rejected if they skip a clustering key column while restricting a later one. For example, if col3 is a clustering key column that precedes col4 and col5, a query like SELECT * FROM table WHERE col1='a' and col2='b' and col3='c' and col4='d' and col5='e'; would work, but SELECT * FROM table WHERE col1='a' and col2='b' and col4='d' and col5='e'; would be rejected because col3 is missing. In practice, this means an attacker would need to know the table's clustering key columns to prepare a valid payload.
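The clustering key restriction can be approximated in a few lines. The following JavaScript function is a simplified, hypothetical model, not Cassandra's actual validation logic: it ignores ALLOW FILTERING, secondary indexes, partition key checks, and range restrictions, and only flags a restricted clustering column that comes after a skipped one, as in the col3 example above. The table layout in the comment is likewise an assumption.

```javascript
// Simplified model (assumption): flags any restricted clustering column that
// follows an unrestricted one in the declared clustering order.
function clusteringRestrictionIsValid(clusteringColumns, restrictedColumns) {
  const restricted = new Set(restrictedColumns);
  let skippedEarlierColumn = false;
  for (const column of clusteringColumns) {
    if (restricted.has(column)) {
      if (skippedEarlierColumn) return false; // restricted after a gap
    } else {
      skippedEarlierColumn = true;
    }
  }
  return true;
}

// Hypothetical table: PRIMARY KEY ((col1, col2), col3, col4, col5)
const clustering = ["col3", "col4", "col5"];
console.log(clusteringRestrictionIsValid(clustering, ["col3", "col4", "col5"])); // true
console.log(clusteringRestrictionIsValid(clustering, ["col4", "col5"]));         // false: col3 skipped
```

For an attacker, this means a payload that adds or removes conditions on key columns is only accepted if it respects the table's key layout, which is rarely known from the outside.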

No nested or stacked queries

  • CQL does not allow subqueries or other nested statements, so a query like SELECT * FROM table WHERE column=(SELECT column FROM table LIMIT 1); would be rejected. This eliminates many classic SQL injection tricks, especially for boolean-based injections.
  • Client drivers for CQL do not allow stacked queries (multiple queries separated by semicolons). This means we will not be able to run arbitrary queries by stacking them.
  • CQL lets you create user-defined functions (UDF) that can include arbitrary code. In the past, this feature has caused vulnerabilities such as remote code execution via CVE-2021-44521. However, since we cannot run stacked queries, we will not be able to create user-defined functions without directly connecting to the database.

CQL injections in the wild

CQL injection is still a relatively new topic, and while there are some blog posts out there, a lot of the information you will find is outdated or inaccurate. We examined existing posts about potential CQL injection scenarios like the one shown earlier and found that most of the suggested injection payloads don’t work, at least not in our test environment.

As of this writing, no CQL injection vulnerabilities have been found and disclosed in bug bounty programs. There are also no CVEs assigned to CQL injection vulnerabilities (except for the special case of user-defined functions mentioned above). No open-source tools are available for finding CQL injections, and almost all existing NoSQL injection tools are specific to MongoDB injection, with no specific CQL injection payloads.

Mitigating CQL injection vulnerabilities

All injection attacks are made possible by applications directly using unsanitized user-controllable data. As with SQL injection vulnerabilities, you can eliminate the risk of CQL injections by using parameterized queries (prepared statements) that prevent raw user input from making it into database queries. For example, with the DataStax Node.js driver, user input can be passed as bound parameters instead of being concatenated into the query text:

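const cassandra = require('cassandra-driver');
const client = new cassandra.Client({ contactPoints: ['127.0.0.1'], localDataCenter: 'datacenter1', keyspace: 'app' }); // example connection settings
const query = 'SELECT * FROM users WHERE username = ? AND password = ?'; // ? marks bound parameters
const params = [username, password]; // raw user input, sent separately from the query text
client.execute(query, params, { prepare: true })
  .then((result) => console.log(result.rows.length > 0 ? 'login ok' : 'login failed'));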

In addition to using parameterized queries, it is always good practice to validate and filter user inputs, and also to apply context-sensitive encoding depending on how the data is used.
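As a sketch of the input validation mentioned above, the following JavaScript function applies an allowlist to usernames before they reach the database. The pattern, length limits, and the validateUsername name are a hypothetical policy chosen for illustration; validation like this complements bound parameters rather than replacing them.

```javascript
// Hypothetical policy: 3 to 32 characters of letters, digits, '.', '_' or '-'.
const USERNAME_PATTERN = /^[A-Za-z0-9._-]{3,32}$/;

// Returns the input unchanged if it matches the allowlist, otherwise throws.
function validateUsername(input) {
  if (typeof input !== "string" || !USERNAME_PATTERN.test(input)) {
    throw new Error("invalid username");
  }
  return input;
}

console.log(validateUsername("alice_01")); // alice_01
try {
  validateUsername("admin'/*"); // quote and comment markers are not allowed
} catch (err) {
  console.log(err.message); // invalid username
}
```

A value that passes this check should still be sent as a bound parameter, so the query text never depends on user input.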

Is Apache Cassandra secure from injection attacks?

Our tests have shown that, like other NoSQL injection types, CQL injection attacks are technically possible. However, due to the limitations imposed by both the CQL language itself and the client drivers, it is very difficult to perform practically useful CQL injections and exploit them further. Despite the syntactic similarities between SQL and CQL, very few of the standard SQL injection techniques can be successfully used against Cassandra databases.

Other than the very limited example described above, we found no way to perform useful CQL injection attacks or extract data using any of the known techniques in our test environment. This makes Apache Cassandra a relatively safe database choice when it comes to injections, especially if basic secure coding practices are followed in development. However, this is still a new area of study with lots of room for research on bypassing the limitations described here, so new methods for exploiting CQL injection vulnerabilities are likely to emerge eventually.