Spam Filter
crate.spam.spamfilter classifies a map of field names to values against a set of heuristics and produces a weighted score. When the score reaches a configurable threshold, the verdict is “spam.”
Import
```d
import crate.spam.spamfilter;

SpamFilterConfig config = {
    threshold: 2.0,
    spamWords: ["viagra", "crypto", "seo-services"],
    disposableDomains: ["mailinator.com", "tempmail.org"],
    spamDomains: ["bad-actor.example"],
};

string[string] fields = [
    "name": "John",
    "email": "user@mailinator.com",
    "message": "Buy cheap viagra now!!!",
];

auto verdict = classify(fields, config);

if (verdict.isSpam) {
    // block or tarpit
}
```
classify
```d
SpamVerdict classify(string[string] fields, SpamFilterConfig config);
```
The field named email is run through the email rule set; every other field is run through the text rule set. Scores from every triggered rule accumulate into SpamVerdict.score.
SpamFilterConfig
| Field | Type | Default | Description |
|---|---|---|---|
| threshold | float | 2.0 | Score at or above which the verdict is spam |
| spamWords | string[] | [] | Case-insensitive substrings to flag in text |
| disposableDomains | string[] | [] | Exact-match email domains to flag |
| spamDomains | string[] | [] | Substrings to flag anywhere in the email |
SpamVerdict
| Field | Type | Description |
|---|---|---|
| score | float | Sum of all triggered rule scores |
| isSpam | bool | score >= config.threshold |
| triggered | RuleResult[] | Each rule that fired, with its weight |
RuleResult is { string rule; float score; string field; }.
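
The relationship between triggered, score, and isSpam can be sketched as follows. This is a minimal illustration, not the library's classify: the two structs mirror the shapes documented above, summarize is a hypothetical helper, and the rule hits in main are written by hand rather than produced by any rule.

```d
import std.stdio : writefln;

// Shapes copied from the tables above.
struct RuleResult { string rule; float score; string field; }
struct SpamVerdict { float score; bool isSpam; RuleResult[] triggered; }

// Hypothetical helper: only the accumulation and threshold step.
SpamVerdict summarize(RuleResult[] triggered, float threshold)
{
    float total = 0.0; // D floats default to NaN, so initialise explicitly
    foreach (r; triggered)
        total += r.score;
    // "At or above" the threshold counts as spam.
    return SpamVerdict(total, total >= threshold, triggered);
}

void main()
{
    // Hand-written hits for illustration only.
    auto hits = [
        RuleResult("DISPOSABLE", 2.0, "email"),
        RuleResult("SPAM_WORDS", 1.0, "message"),
    ];
    auto verdict = summarize(hits, 2.0);
    foreach (r; verdict.triggered)
        writefln("%s fired on %s (+%.1f)", r.rule, r.field, r.score);
    writefln("score=%.1f isSpam=%s", verdict.score, verdict.isSpam);
}
```

Iterating over triggered like this is a convenient way to log why a submission was blocked.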
Text Rules
Applied to every non-email field:
| Rule | Score | Trigger |
|---|---|---|
| RANDOM_CHARS | 1.5 | >85% consonant ratio, or Shannon entropy > 3.5 |
| SPECIAL_CHARS | 1.0 | >30% non-alphanumeric characters (excluding -, ', space) |
| SQL_INJECTION | 2.0 | Classic SQLi substrings (' OR, UNION SELECT, 1=1, -- SELECT, …) |
| HTML_INJECTION | 2.0 | <script, <img, <iframe, javascript:, onerror= |
| CAPITALIZATION | 0.5 | All-caps text (length > 2) |
| NUMBERS_ONLY | 1.0 | Only digits |
| URL | 1.5 | Contains http://, https://, or www. |
| SPAM_WORDS | 1.0 | Contains any config.spamWords substring (case-insensitive) |
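
To make the RANDOM_CHARS trigger concrete, here is a rough sketch of the two measurements it names. All function names are hypothetical stand-ins, and several details are assumptions the table does not state: entropy is computed in bits (log base 2) over bytes, only ASCII letters count toward the consonant ratio, and y is treated as a consonant.

```d
import std.ascii : isAlpha, toLower;
import std.math : log2;
import std.string : representation;

/// Byte-level Shannon entropy; bits per byte is an assumption.
double entropySketch(string text)
{
    if (text.length == 0)
        return 0.0;
    size_t[256] counts; // zero-initialised histogram of byte values
    foreach (b; text.representation)
        counts[b]++;
    double h = 0.0;
    foreach (c; counts)
    {
        if (c == 0)
            continue;
        immutable double p = cast(double) c / text.length;
        h -= p * log2(p);
    }
    return h;
}

/// Share of ASCII letters that are not a, e, i, o, u.
double consonantRatioSketch(string text)
{
    size_t letters, consonants;
    foreach (char ch; text)
    {
        if (!isAlpha(ch))
            continue;
        letters++;
        switch (toLower(ch))
        {
            case 'a', 'e', 'i', 'o', 'u': break;
            default: consonants++;
        }
    }
    return letters == 0 ? 0.0 : cast(double) consonants / letters;
}

/// Literal reading of the table row: either condition is enough.
bool looksRandomSketch(string text)
{
    return consonantRatioSketch(text) > 0.85 || entropySketch(text) > 3.5;
}

void main()
{
    import std.stdio : writefln;
    foreach (s; ["John", "xkcdqrtz", "Buy cheap viagra now!!!"])
        writefln("%-24s ratio=%.2f entropy=%.2f random=%s",
            s, consonantRatioSketch(s), entropySketch(s), looksRandomSketch(s));
}
```

Under this literal reading, an ordinary sentence can also exceed 3.5 bits per byte, so the library may apply further conditions that this page does not describe.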
Email Rules
Applied only to the email field:
| Rule | Score | Trigger |
|---|---|---|
| INVALID_FORMAT | 2.0 | No @, missing TLD, or TLD shorter than 2 characters |
| RESERVED_TLD | 2.0 | TLD is one of test, example, invalid, localhost, local, tst |
| DISPOSABLE | 2.0 | Domain matches config.disposableDomains exactly |
| SPAM_DOMAINS | 2.0 | Email contains any config.spamDomains substring |
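
The email checks above could look roughly like the sketch below. Every function name here is a hypothetical stand-in, not the library's own helper. The sketch also assumes that the domain is whatever follows the last @, that the TLD is whatever follows the last dot of the domain, and that comparisons are case-sensitive; the table does not specify any of these.

```d
import std.algorithm : canFind;
import std.string : indexOf, lastIndexOf;

/// Text after the last '@'; empty when there is no '@'.
string domainOfSketch(string email)
{
    immutable at = email.lastIndexOf('@');
    return at < 0 ? "" : email[at + 1 .. $];
}

/// Text after the last '.' of the domain; empty when there is no dot.
string tldOfSketch(string domain)
{
    immutable dot = domain.lastIndexOf('.');
    return dot < 0 ? "" : domain[dot + 1 .. $];
}

/// INVALID_FORMAT fires when this returns false.
bool validFormatSketch(string email)
{
    if (email.indexOf('@') < 0)
        return false; // no '@'
    // A missing TLD yields "", which also fails the length check.
    return tldOfSketch(domainOfSketch(email)).length >= 2;
}

/// RESERVED_TLD: the list is copied from the table above.
bool reservedTldSketch(string domain)
{
    switch (tldOfSketch(domain))
    {
        case "test", "example", "invalid", "localhost", "local", "tst":
            return true;
        default:
            return false;
    }
}

/// DISPOSABLE: exact match on the whole domain.
bool isDisposableSketch(string domain, string[] list)
{
    return list.canFind(domain);
}

/// SPAM_DOMAINS: substring match anywhere in the address.
bool isSpamDomainSketch(string email, string[] list)
{
    foreach (entry; list)
        if (email.canFind(entry))
            return true;
    return false;
}

void main()
{
    import std.stdio : writeln;
    immutable email = "user@mailinator.com";
    writeln(validFormatSketch(email));
    writeln(isDisposableSketch(domainOfSketch(email), ["mailinator.com"]));
}
```

The difference between the last two functions is the point to notice: a disposableDomains entry must equal the domain, while a spamDomains entry only has to appear somewhere in the address, so a subdomain matches the second kind of entry but not the first.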
Standalone Helpers
These are exported so you can reuse them outside the main classify flow:
| Function | Description |
|---|---|
| isRandomChars(text) | Consonant-ratio + entropy heuristic |
| shannonEntropy(text) | Byte-level Shannon entropy |
| hasExcessiveSpecialChars(text) | >30% special-character ratio |
| hasSqlInjection(text) / hasHtmlInjection(text) | Injection substring scan |
| isAllCaps(text) / isNumbersOnly(text) | Shape heuristics |
| containsUrl(text) | URL presence |
| isValidEmailFormat(email) / extractDomain(email) | Lightweight email parsing |
| isReservedTld(domain) | Checks the reserved-TLD list above |
| isDisposableDomain(domain, list) / isSpamDomain(email, list) | List membership checks |
Use them directly when you want one specific check without the full scoring pass — for example, reject any form submission containing <script before you even run the scorer.
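
As an illustration of that pre-check, the sketch below rejects a submission as soon as any field contains an HTML injection marker. To stay self-contained it uses a local stand-in, htmlInjectionSketch, instead of the exported hasHtmlInjection. The marker list is taken from the HTML_INJECTION row, the case-insensitive comparison is an assumption, and rejectEarly is a hypothetical name.

```d
import std.algorithm : canFind;
import std.string : toLower;

/// Local stand-in for the documented helper; matching case-insensitively is an assumption.
bool htmlInjectionSketch(string text)
{
    static immutable string[] markers =
        ["<script", "<img", "<iframe", "javascript:", "onerror="];
    immutable lowered = text.toLower;
    foreach (m; markers)
        if (lowered.canFind(m))
            return true;
    return false;
}

/// Hypothetical pre-check: true means "reject before scoring".
bool rejectEarly(string[string] fields)
{
    foreach (name, value; fields)
        if (htmlInjectionSketch(value))
            return true;
    return false;
}

void main()
{
    import std.stdio : writeln;
    string[string] fields = [
        "name": "John",
        "message": "hello <script>alert(1)</script>",
    ];
    if (rejectEarly(fields))
        writeln("rejected before scoring");
    else
        writeln("passed to the scorer");
}
```

With the real module imported, the body of the loop would call hasHtmlInjection(value) directly.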