In [Part 1], we configured CloudFront to pass the JA4 fingerprint to our WAF. Now, we need to do the actual work: blocking the bad guys without breaking the site for everyone else.
The Strategy: From General to Concrete
You cannot just block a JA4 hash outright. Every user of a given Chrome version presents the same hash, so blocking one is like blocking an entire IP range: too much collateral damage.
We use a funnel principle:
- General (JA4): Detect the burst of traffic sharing a technical profile.
- Concrete (Scope Down): Narrow that detection to requests that also look like bots (e.g., via User-Agent).

The WAF Rule
Here is a practical Terraform example of a rate-based rule keyed by JA4. Notice the scope_down_statement. This ensures we only count requests against the limit if the User-Agent contains “bot”.
resource "aws_wafv2_web_acl" "main" {
name = "production-waf-ja4-rate-limit"
description = "WAF protection using JA4 fingerprinting"
scope = "CLOUDFRONT" # Must be CLOUDFRONT for edge WAF
provider = aws.us_east_1 # CloudFront WAFs must be in us-east-1
default_action {
allow {}
}
visibility_config {
cloudwatch_metrics_enabled = true
metric_name = "ProductionWAF"
sampled_requests_enabled = true
}
# ----------------------------------------------------------------
# Rule: Rate Limit by JA4 Fingerprint
# ----------------------------------------------------------------
rule {
name = "rate-limit-by-ja4"
priority = 5
action {
# We use count first to test, then switch to block or captcha
count {}
}
statement {
rate_based_statement {
limit = 200 # Threshold: 200 requests per 5 minutes
aggregate_key_type = "CUSTOM_KEYS"
# Optional: 1-minute window for faster reaction (uncomment if needed)
# evaluation_window_sec = 60
# The Custom Key Definition: Aggregate by JA4
custom_key {
ja4_fingerprint {
# Behavior if JA4 cannot be computed (e.g. non-TLS request?)
fallback_behavior = "NO_MATCH"
}
}
# ----------------------------------------------------------------
# Scope Down Statement: "The Noise Filter"
# ----------------------------------------------------------------
# We only apply this rate limit if the User-Agent contains "bot".
# This reduces false positives on legitimate browsers (Chrome/Safari)
# while aggressively targeting declared bots or scripts.
scope_down_statement {
byte_match_statement {
search_string = "bot"
positional_constraint = "CONTAINS"
field_to_match {
single_header {
name = "user-agent"
}
}
text_transformation {
priority = 0
type = "LOWERCASE"
}
}
}
}
}
visibility_config {
cloudwatch_metrics_enabled = true
metric_name = "rate-limit-by-ja4"
sampled_requests_enabled = true
}
}
}
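One step that is easy to forget: the web ACL does nothing until it is associated with a distribution. Here is a minimal sketch of that attachment; the distribution below is deliberately bare-bones (the resource names, origin domain, and managed cache policy are placeholders, and your real distribution from Part 1 will carry more configuration). The detail that trips people up is that for WAFv2, web_acl_id takes the ACL's ARN, not its ID.

# Minimal sketch: associate the web ACL with a CloudFront distribution.
# Origin domain and resource names are placeholders.
resource "aws_cloudfront_distribution" "site" {
  enabled = true

  origin {
    domain_name = "origin.example.com"
    origin_id   = "primary"

    custom_origin_config {
      http_port              = 80
      https_port             = 443
      origin_protocol_policy = "https-only"
      origin_ssl_protocols   = ["TLSv1.2"]
    }
  }

  default_cache_behavior {
    target_origin_id       = "primary"
    viewer_protocol_policy = "redirect-to-https"
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]
    cache_policy_id        = "658327ea-f89d-4fab-a63d-7e88639e58f6" # AWS managed "CachingOptimized"
  }

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    cloudfront_default_certificate = true
  }

  # For WAFv2, this must be the web ACL's ARN (not its ID)
  web_acl_id = aws_wafv2_web_acl.main.arn
}

Run the rule in count mode until you trust the threshold: CloudWatch will show you what would have been blocked, and only then should you flip count {} to block {} or captcha {}.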
The “Boring” Job: Statistical Tuning with Athena
The most critical mistake engineers make is guessing the limit. Is 200 requests right? Or 2000?
If you guess, you will either block nothing or block your CEO. You must measure real traffic. I use Amazon Athena to query WAF logs and calculate the p95 (95th percentile) of traffic bursts per fingerprint.
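All of this assumes WAF logs are actually landing somewhere Athena can read. If logging is not enabled yet, here is a minimal sketch of the wiring, assuming an S3 destination (note that WAF requires the bucket name to start with the aws-waf-logs- prefix):

# Hypothetical log bucket; the name must begin with "aws-waf-logs-"
resource "aws_s3_bucket" "waf_logs" {
  provider = aws.us_east_1
  bucket   = "aws-waf-logs-production-ja4"
}

# Ship the web ACL's logs to that bucket
resource "aws_wafv2_web_acl_logging_configuration" "main" {
  provider                = aws.us_east_1
  resource_arn            = aws_wafv2_web_acl.main.arn
  log_destination_configs = [aws_s3_bucket.waf_logs.arn]
}

Point a Glue/Athena table at that bucket (the waf_logs table in the query below) and you have the raw material for the percentile math.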
Here is the query I use to find “WAF-like” traffic spikes, i.e., bursts from JA4 fingerprints that repeat within a five-minute window:
WITH per_ja4_bin AS (
    -- Requests per JA4 fingerprint in 5-minute buckets
    -- (the WAF log "timestamp" field is epoch milliseconds)
    SELECT
        ja4fingerprint AS ja4,
        from_unixtime(300 * floor("timestamp" / 1000 / 300)) AS time_bin,
        COUNT(*) AS req_count
    FROM waf_logs
    WHERE ja4fingerprint IS NOT NULL
        AND year = '2025' AND month = '11'
    GROUP BY ja4fingerprint, from_unixtime(300 * floor("timestamp" / 1000 / 300))
),
waf_like AS (
    -- Keep only fingerprints that repeat within a bin,
    -- then total their traffic per bin
    SELECT
        time_bin,
        SUM(req_count) AS counted_like_waf
    FROM per_ja4_bin
    WHERE req_count > 1
    GROUP BY time_bin
)
-- 95th percentile across all 5-minute bins
SELECT approx_percentile(counted_like_waf, 0.95) AS p95
FROM waf_like;
This query tells me how big the traffic bursts are for repeated JA4 fingerprints in the busiest time windows. It removes the guesswork.
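With a candidate threshold in hand, I run the inverse check: which fingerprints would actually have tripped it, and how badly? A sketch reusing the same binning logic (the 200 below is whatever your p95 analysis produced):

-- Fingerprints that would have exceeded the candidate limit
WITH per_ja4_bin AS (
    SELECT
        ja4fingerprint AS ja4,
        from_unixtime(300 * floor("timestamp" / 1000 / 300)) AS time_bin,
        COUNT(*) AS req_count
    FROM waf_logs
    WHERE ja4fingerprint IS NOT NULL
        AND year = '2025' AND month = '11'
    GROUP BY ja4fingerprint, from_unixtime(300 * floor("timestamp" / 1000 / 300))
)
SELECT
    ja4,
    COUNT(*)       AS bins_over_limit,
    MAX(req_count) AS worst_burst
FROM per_ja4_bin
WHERE req_count > 200  -- candidate threshold from the p95 analysis
GROUP BY ja4
ORDER BY worst_burst DESC
LIMIT 20;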
Real-World Results
Does it work? Let’s look at the data.
First, here is the traffic on a service without the scope-down statement. It is pure noise. The JA4 hash is too broad, and it’s impossible to set a threshold that separates users from bots.

But when we apply the scope_down_statement (filtering for “bot” User-Agents), the picture changes completely. The noise disappears, and the attack spikes become evident.

Summary
JA4 is not a silver bullet, but it is a powerful tool when combined with other signals. It forces attackers to rotate not just their IPs (which is cheap), but their entire TLS stack (which is expensive).
Implementing this requires patience. You will have to analyze logs and tune thresholds for every specific service. It’s a boring job, but the security payoff is worth it.
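One habit that makes the per-service tuning less painful: do not hard-code the limit. Lift it into a variable so each service's Terraform carries the threshold its own Athena analysis produced (the variable name here is my own):

# Per-service threshold, set from the Athena p95 plus headroom
variable "ja4_rate_limit" {
  description = "Requests per 5-minute window per JA4 fingerprint"
  type        = number
  default     = 200
}

# ...and in the rate_based_statement:
#   limit = var.ja4_rate_limit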
If you want to master AI bot defense workflows, I invite you to join my Udemy course: DevSecOps on AWS: Defend Against LLM Scrapers & Bot Traffic. We dive deep into hands-on labs that turn these defense concepts into a fortress.
