User controllable charset
Description
This vulnerability occurs when user-supplied input directly controls the character encoding (charset) declared for a web page response. The charset can be declared in the HTTP Content-Type header or in an HTML meta tag. Output filters and escaping routines assume a particular encoding, so an attacker who can change the declared charset can make the browser decode the same bytes differently from how the server-side filter interpreted them. This may allow the attacker to bypass XSS filters, execute malicious scripts, or cause the browser to misinterpret page content in ways that enable injection attacks.
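The mismatch between what a filter sees and what a decoder produces can be illustrated with a small sketch. It uses UTF-7 only because Python ships a codec for it; browsers that follow the WHATWG Encoding Standard do not support UTF-7, so treat this as a demonstration of the class of problem rather than a working exploit. The payload string and variable names are illustrative assumptions.

```python
import html

# Hypothetical attacker input, written in UTF-7 so that it contains
# no literal '<' or '>' characters.
payload = "+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# A filter that escapes HTML metacharacters finds nothing to escape.
filtered = html.escape(payload)

# The bytes the server would send in the response body.
body = filtered.encode("ascii")

# Decoded as UTF-8 the text is inert; decoded as UTF-7 it is markup.
as_utf8 = body.decode("utf-8")
as_utf7 = body.decode("utf-7")

print(as_utf8)
print(as_utf7)
```

The filter ran correctly for the encoding it assumed. The problem only appears once the declared charset, and therefore the decoder, is chosen by someone else.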
Remediation
Implement the following measures to prevent user control over character encoding:
1. Enforce UTF-8 encoding explicitly:
Set the charset in both the HTTP response header and HTML meta tag to UTF-8, without accepting user input:
// Server-side (example in various languages):
// Java/Spring
response.setContentType("text/html; charset=UTF-8");
// PHP
header('Content-Type: text/html; charset=UTF-8');
# Python/Flask
response.headers['Content-Type'] = 'text/html; charset=UTF-8'
// Node.js/Express
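res.setHeader('Content-Type', 'text/html; charset=UTF-8');
2. Add an HTML meta tag declaration early in the head element (the HTML specification requires it to appear within the first 1024 bytes of the document):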
<meta charset="UTF-8">
3. If user-controlled charset is absolutely required:
Implement a strict whitelist of safe character encodings and validate all input:
// Example whitelist validation
const ALLOWED_CHARSETS = ['UTF-8', 'ISO-8859-1'];
function validateCharset(userCharset) {
  // Reject non-string input (e.g. a missing or repeated query parameter)
  if (typeof userCharset !== 'string') {
    return 'UTF-8';
  }
  const normalized = userCharset.trim().toUpperCase();
  if (ALLOWED_CHARSETS.includes(normalized)) {
    return normalized;
  }
  return 'UTF-8'; // Default to UTF-8
}
4. Review and remove any code that accepts charset values from query parameters, POST data, cookies, or HTTP headers controlled by users.
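As a defense-in-depth complement to the steps above, the charset can also be enforced centrally, so that a handler missed during review cannot leak a user-chosen value into the response header. The following is a minimal sketch, assuming a Python WSGI application. The names force_utf8 and vulnerable_app are hypothetical, and the vulnerable handler exists only to show the pattern being neutralized.

```python
from urllib.parse import parse_qs


def force_utf8(app):
    """Wrap a WSGI app so that text/html responses always declare UTF-8."""
    def wrapped(environ, start_response):
        def fixed_start_response(status, headers, exc_info=None):
            fixed = []
            for name, value in headers:
                if name.lower() == "content-type":
                    media_type = value.split(";", 1)[0].strip().lower()
                    if media_type == "text/html":
                        # Discard whatever charset the handler supplied.
                        value = "text/html; charset=UTF-8"
                fixed.append((name, value))
            return start_response(status, fixed, exc_info)
        return app(environ, fixed_start_response)
    return wrapped


def vulnerable_app(environ, start_response):
    # Hypothetical vulnerable handler: charset taken from the query string.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    charset = query.get("charset", ["UTF-8"])[0]
    start_response("200 OK", [("Content-Type", f"text/html; charset={charset}")])
    return [b"<p>hello</p>"]


safe_app = force_utf8(vulnerable_app)
```

This wrapper only rewrites the Content-Type header. It does not re-encode the response body or rewrite meta tags, so the application must still emit UTF-8 bytes, and the vulnerable parameter handling should still be removed at the source.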