How to secure a PHP application against XSS injections

Cross-Site Scripting (XSS) is one of the most common security vulnerabilities that web developers face. It occurs when an attacker is able to inject malicious scripts into web pages viewed by other users. This can lead to a range of issues, including stealing user data, session hijacking, and executing arbitrary actions on behalf of the user.

In this article, I will dive into different contexts where XSS vulnerabilities can occur, and more importantly, how we can defend against them using PHP. From injecting scripts into HTML tag content to manipulating JavaScript event handlers and URL parameters, XSS attacks can take many forms. We’ll look at how untrusted data (such as user input) can be manipulated in each of these contexts, and how encoding and sanitization can help mitigate these risks.

What You Will Learn:
  • Common XSS attack vectors and how attackers exploit them.
  • How to defend against XSS in different contexts, such as HTML, JavaScript, and URLs.
  • PHP best practices for preventing XSS using built-in functions like htmlspecialchars(), urlencode(), and custom encoding strategies.
  • Real-world code examples of XSS attacks and defenses in PHP.
Injection into Tag Content
Code Fragment
<p>$var</p>
Attack Method: A common attack injects a new HTML tag, e.g.:
<p><img src onerror=alert(1)></p>
Defense Method: Convert special characters into HTML entities, e.g.:
htmlspecialchars($var, ENT_QUOTES, 'UTF-8')
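As a minimal sketch of this defense, the snippet below escapes a hostile value before placing it between the <p> tags. The helper name e() and the sample payload are assumptions made for illustration, not part of any library.

```php
<?php
// Hypothetical helper (name chosen for this example): HTML-escape a value for tag content.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Sample attacker input, matching the attack shown above.
$var = '<img src onerror=alert(1)>';

// The angle brackets become &lt; and &gt;, so the browser renders the payload
// as plain text instead of parsing it as a tag.
echo '<p>' . e($var) . '</p>', PHP_EOL;
// prints: <p>&lt;img src onerror=alert(1)&gt;</p>
```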
Injection into Attribute Content
Code Fragment
<p class="$var"></p>
Attack Method: Break out of the class attribute, then either add a new attribute that contains JS or close the tag and inject a new one. Examples:
<p class="" onmouseover=alert(1)></p>
<p class=""><script>alert(1)</script></p>
Defense Method: Convert special characters into HTML entities, e.g.:
htmlspecialchars($var, ENT_QUOTES, 'UTF-8')
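The sketch below applies the same escaping to an attribute value. The helper name attr() and the payload are assumptions for illustration. Note that the attribute is always wrapped in quotes: with an unquoted attribute, a plain space would be enough to start a new attribute, and entity encoding would not help.

```php
<?php
// Hypothetical helper (name chosen for this example): HTML-escape a value
// that will be placed inside a quoted attribute.
function attr(string $value): string
{
    // ENT_QUOTES escapes both " and ', so the value cannot close the attribute.
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Sample attacker input that tries to close class="..." and add an event handler.
$var = '" onmouseover=alert(1) x="';

echo '<p class="' . attr($var) . '"></p>', PHP_EOL;
// prints: <p class="&quot; onmouseover=alert(1) x=&quot;"></p>
```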
Injection into HREF Attribute
Code Fragment
<a href="$var"></a>
Attack Method: Supply a URL that uses the javascript: protocol. The attacker does not even need to break out of the href attribute:
<a href="javascript:alert(1)"></a>
The javascript: protocol allows the attacker to execute JavaScript code by crafting a malicious URL. This URL triggers JavaScript execution upon clicking the link.
Defense Method: Validate the protocol, accept only HTTP/HTTPS URLs, and HTML-encode the URL when writing it into the attribute.
// Validate the URL format
if (filter_var($var, FILTER_VALIDATE_URL)) {
    // Parse the URL to check the protocol
    $urlParts = parse_url($var);
    
    // Check if the protocol is http or https
    if (isset($urlParts['scheme']) && ($urlParts['scheme'] === 'http' || $urlParts['scheme'] === 'https')) {
        // Safe URL: Output the link with the validated href
        // e.g. using htmlspecialchars($var, ENT_QUOTES, 'UTF-8')
    } else {
        // Unsafe URL: Reject the link or handle the error
    }
} else {
    // Invalid URL format
}
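The checks above can be wrapped in a small reusable function. This is only a sketch: the name safeHref() and the sample URLs are assumptions, and returning null for rejected input is one possible design among several.

```php
<?php
// Hypothetical helper: returns the HTML-escaped URL if it is a valid
// http(s) URL, or null if the URL must be rejected.
function safeHref(string $url): ?string
{
    // Reject anything that is not a syntactically valid URL.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }

    // A URL can be syntactically valid and still use a dangerous scheme,
    // so the scheme is checked separately (case-insensitively).
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!is_string($scheme) || !in_array(strtolower($scheme), ['http', 'https'], true)) {
        return null;
    }

    // The URL is written into an HTML attribute, so it is HTML-escaped last.
    return htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
}

foreach (['https://example.com/?a=1&b=2', 'javascript:alert(1)'] as $candidate) {
    $href = safeHref($candidate);
    echo $href === null
        ? "rejected\n"
        : '<a href="' . $href . '">link</a>' . "\n";
}
```

The separate scheme check matters because a string such as javascript://example.com/%0Aalert(1) has the shape of an ordinary URL, so format validation alone should not be relied on.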
Injection into String Inside JS Code
Code Fragment
<script>var comment="$var";</script>
Attack Method: The attacker closes the JavaScript string and appends their own code:
<script>var comment="";alert(1)//"</script>
Many applications try to stop the attacker from closing the string by escaping the quote " as \". If this is the only replacement, it is insufficient: the attacker can prepend a backslash of their own. The input \" then becomes \\", where the first backslash escapes the second, so the quote is no longer escaped and closes the string again. Even when backslashes are escaped correctly, another very common technique still allows custom JS code to run:
<script>var comment="</script><script>alert(1)</script>";</script>
The first script tag contains a naturally occurring syntax error (an unclosed string). However, from the attacker's perspective, this is irrelevant, as the next script tag will execute normally, enabling the use of XSS.
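Defense Method: For the two samples above, htmlspecialchars() does stop the attack, because it turns quotes and angle brackets into HTML entities. It is not sufficient on its own, however: it leaves backslashes and line breaks untouched, and the browser does not decode HTML entities inside a <script> block, so the entities end up in the string verbatim. It is therefore recommended to encode the characters that are significant in HTML or JavaScript, together with everything outside printable ASCII, as JavaScript Unicode escapes (\uXXXX, one UTF-16 code unit each), e.g.:
function filter($input) {
    // Step 1: Escape the characters that are significant in HTML or JavaScript
    // as \uXXXX sequences. strtr() replaces in a single pass, so the backslashes
    // it inserts are not escaped a second time (str_replace() with arrays
    // applies its replacements one after another and would do exactly that).
    $input = strtr($input, [
        '\\' => '\u005C',
        '<'  => '\u003C',
        '>'  => '\u003E',
        '"'  => '\u0022',
        "'"  => '\u0027',
        '&'  => '\u0026',
    ]);

    // Step 2: Encode everything outside printable ASCII (control characters,
    // line breaks, non-ASCII text) as UTF-16 code units. The /u flag makes the
    // pattern match whole UTF-8 characters instead of single bytes.
    // mb_convert_encoding() requires the mbstring extension.
    $input = preg_replace_callback('/[^\x20-\x7E]/u', function ($matches) {
        $utf16 = bin2hex(mb_convert_encoding($matches[0], 'UTF-16BE', 'UTF-8'));
        return '\u' . implode('\u', str_split($utf16, 4));
    }, $input);

    // The result no longer contains any HTML-significant character, so it can
    // be placed inside a <script> string or a quoted HTML attribute.
    // preg_replace_callback() returns null for invalid UTF-8; fall back to ''.
    return $input ?? '';
}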
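An alternative to a hand-written escaper is PHP's built-in json_encode() with the JSON_HEX_TAG, JSON_HEX_AMP, JSON_HEX_APOS and JSON_HEX_QUOT flags, which emits the same kind of \uXXXX escapes. The sketch below is a different technique from filter(), not a drop-in replacement: the helper name jsString() is an assumption, the returned literal already includes its surrounding double quotes, and JSON_THROW_ON_ERROR assumes PHP 7.3 or newer.

```php
<?php
// Hypothetical helper: encode a PHP string as a JavaScript string literal
// (surrounding double quotes included) for use inside a <script> block.
function jsString(string $value): string
{
    // The JSON_HEX_* flags replace <, >, &, ' and " with \uXXXX escapes.
    // JSON_THROW_ON_ERROR raises a JsonException on invalid UTF-8 input
    // instead of silently returning false.
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// Sample attacker input, matching the second attack shown above.
$var = '</script><script>alert(1)</script>';

// No quotes are written around the placeholder: jsString() supplies them.
echo '<script>var comment=' . jsString($var) . ';</script>', PHP_EOL;
```

Because the output contains no literal < or quote characters from the input, the value can neither close the string nor terminate the script element.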
Injection in the onclick Attribute
Code Fragment
<p onclick="change('$var')">Comment</p>
Attack Method: We assume that a change() function exists in the page's JS. At first glance, the standard protection for HTML attributes, escaping special characters, should prevent this attack. However, consider what is rendered when the attacker submits '); alert(1)// and the apostrophe is HTML-escaped:
<p onclick="change('&#39;); alert(1)//')">Comment</p>
Looking at the raw HTML, it may be hard to spot why the XSS succeeds here. The apostrophe has been replaced by &#39;, but this is not sufficient, because the browser decodes all HTML entities in attribute values before handing the value to the JS engine. The JS engine therefore sees the code as:
change(''); alert(1)//')
This example shows that XSS protection cannot be taken lightly. Keep in mind that some attributes have a special meaning: the content of the onclick attribute is treated as JS code, so the attacker does not need to break out of the attribute at all.
Defense Method: Encode the value as a JavaScript string first, and then as an HTML attribute value. See the filter($input) function above.
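The two layers can be sketched as follows, here using json_encode() for the JavaScript layer and htmlspecialchars() for the attribute layer. The helper name onclickChange() is an assumption for illustration, and JSON_THROW_ON_ERROR assumes PHP 7.3 or newer.

```php
<?php
// Hypothetical helper: build the value of an onclick attribute that calls
// change() with one string argument.
function onclickChange(string $value): string
{
    // Inner layer: a JavaScript string literal (double quotes included),
    // with <, >, &, ' and " written as \uXXXX escapes.
    $js = json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );

    // Outer layer: HTML attribute encoding. The browser undoes this layer
    // first, so it has to be applied last.
    return htmlspecialchars('change(' . $js . ')', ENT_QUOTES, 'UTF-8');
}

// Sample attacker input, matching the attack shown above.
$var = "'); alert(1)//";

echo '<p onclick="' . onclickChange($var) . '">Comment</p>', PHP_EOL;
```

After the browser decodes the HTML entities, the apostrophe is still written as \u0027 inside the JS string, so it cannot close the string.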
Injection in the href Attribute within the JS Protocol
Code Fragment
<a href="javascript:change('$var')">CLICK</a>
Attack Method: Here the attacker exploits three nested contexts: the HTML attribute, the URL, and JavaScript. The payload is URL-encoded, which lets it pass filters that only look for quotes or HTML special characters. For example:
<a href="javascript:change('%27);alert(1)//')">CLICK</a>
In this case, `%27` (a percent-encoded apostrophe) is decoded by the browser before the code reaches the JS engine, so the attack executes just as in the onclick example above.
Defense Method: Apply three layers of encoding, in this order:
  • Encode within the JS string.
  • URL-encode.
  • Encode HTML entities.
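The order mirrors the browser's decoding in reverse: the browser first decodes HTML entities, then percent-decodes the URL, and only then passes the result to the JS engine, so the value must still be a harmless JS string after both decoding steps. Example: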
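function applyThreeLayerEncoding($input) {
    // 1. Encode characters for JavaScript strings (quotes and backslashes).
    //    strtr() replaces in a single pass, so the backslashes it inserts are
    //    not escaped again (str_replace() with arrays would re-escape them).
    $encodedForJS = strtr($input, ["'" => '\u0027', '"' => '\u0022', '\\' => '\u005C']);

    // 2. Percent-encode for the URL layer (characters like `;`, `=`, `&`, `%`).
    //    rawurlencode() encodes a space as %20; urlencode() would emit "+",
    //    which is not turned back into a space in a javascript: URL.
    $encodedForURL = rawurlencode($encodedForJS);

    // 3. Encode HTML entities for the attribute value (outermost layer)
    $encodedForHTML = htmlspecialchars($encodedForURL, ENT_QUOTES, 'UTF-8');

    return $encodedForHTML;
}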
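To see why all three layers are needed, the sketch below simulates the browser's decoding steps in PHP and compares HTML encoding alone with the full three-layer encoding. The function names browserDecode() and threeLayers() are assumptions for illustration, and the simulation only approximates what a real browser does.

```php
<?php
// Simulates the two decoding steps a browser applies to the value of
// href="javascript:..." before the JS engine runs it:
// HTML entity decoding first, then percent-decoding.
function browserDecode(string $attributeValue): string
{
    $afterHtml = html_entity_decode($attributeValue, ENT_QUOTES, 'UTF-8');
    return rawurldecode($afterHtml);
}

// Hypothetical compact variant of the three layers, applied from the
// innermost (JS string) to the outermost (HTML attribute).
function threeLayers(string $value): string
{
    $js = strtr($value, [
        '\\' => '\u005C',
        "'"  => '\u0027',
        '"'  => '\u0022',
        "\n" => '\u000A',
        "\r" => '\u000D',
    ]);
    return htmlspecialchars(rawurlencode($js), ENT_QUOTES, 'UTF-8');
}

// Sample attacker input, matching the attack shown above.
$payload = "%27);alert(1)//";

// HTML encoding alone: %27 passes through untouched and is percent-decoded
// into an apostrophe that closes the string.
$weak = "change('" . htmlspecialchars($payload, ENT_QUOTES, 'UTF-8') . "')";
echo browserDecode($weak), PHP_EOL;
// prints: change('');alert(1)//')

// All three layers: the percent sign itself is encoded, so after decoding
// the JS engine receives the payload as one harmless string argument.
$strong = "change('" . threeLayers($payload) . "')";
echo browserDecode($strong), PHP_EOL;
// prints: change('%27);alert(1)//')
```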
