Security · Layer 4
Content validation: untrusted input by default.
Element content is data, and data is treated as hostile until validated. Every persona, skill, and template passes through a fixed sequence of checks before any of its characters can influence the model. The source below is quoted verbatim from the public mcp-server repository.
The threat
A persona file can carry a hidden instruction override. A community skill can conceal a payload behind look-alike Unicode. Crafted YAML front matter can expand into gigabytes, and a crafted regex target can hang the process. None of these requires network access; each arrives in content the user asked to load.
Validation order
Content is checked in a fixed order, and the order is load-bearing: normalization runs before length and pattern checks, so an attacker cannot pad past a limit with zero-width characters or disguise a pattern with homoglyphs.
flowchart TD
IN[Element content loaded] --> SZ1{Raw size sane?}
SZ1 -- no --> REJ[Reject: SecurityError]
SZ1 -- yes --> UNI[Unicode validation and NFC normalization]
UNI --> SZ2{Normalized size within limit?}
SZ2 -- no --> REJ
SZ2 -- yes --> BUN{Verified bundled hash?}
BUN -- yes --> PASS[Trusted: skip injection scan]
BUN -- no --> INJ[Deterministic injection-pattern scan]
INJ -- match --> REJ
INJ -- clean --> YAML[Hardened YAML parse, safe schema only]
YAML -- bomb or bad schema --> REJ
YAML -- ok --> PASS
PASS --> MODEL[Content may reach the model]
classDef deny fill:#b91c1c,stroke:#7f1d1d,color:#fff;
classDef allow fill:#15803d,stroke:#14532d,color:#fff;
class REJ deny;
class PASS,MODEL allow;
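The ordering in the diagram can be sketched as a single function. This is a minimal illustration, not the repository's implementation: `validateContent`, the `Stages` interface, `demoStages`, and both size limits are hypothetical names and values, and the sketch assumes the bundled-hash check sees the normalized text, which the quoted source does not confirm.

```typescript
class SecurityError extends Error {}

// Hypothetical limits; the quoted source does not show the real values.
const MAX_RAW_LENGTH = 1_000_000;
const MAX_NORMALIZED_LENGTH = 500_000;

// Each stage is injected so the ordering logic can be read on its own.
interface Stages {
  normalize(content: string): string;
  isBundled(content: string): boolean;
  hasInjection(content: string): boolean;
  parseFrontMatter(content: string): void; // expected to throw on a bomb or bad schema
}

function validateContent(raw: string, stages: Stages): string {
  // 1. Coarse gate on the raw size, before any work is done.
  if (raw.length > MAX_RAW_LENGTH) throw new SecurityError("Raw content too large");
  // 2. Normalize first, so later checks measure and match the real text.
  const normalized = stages.normalize(raw);
  // 3. Measure after normalization, so invisible padding cannot skew the limit.
  if (normalized.length > MAX_NORMALIZED_LENGTH) throw new SecurityError("Normalized content too large");
  // 4. Verified bundled content goes straight to PASS, as in the diagram.
  if (stages.isBundled(normalized)) return normalized;
  // 5. Deterministic pattern scan on the normalized text.
  if (stages.hasInjection(normalized)) throw new SecurityError("Injection pattern matched");
  // 6. Hardened YAML parse runs last.
  stages.parseFrontMatter(normalized);
  return normalized;
}

// Stand-in stages for demonstration only.
const demoStages: Stages = {
  normalize: (c) => c.replace(/[\u200B-\u200D\uFEFF]/g, "").normalize("NFC"),
  isBundled: () => false,
  hasInjection: (c) => /ignore\s+previous\s+instructions/i.test(c),
  parseFrontMatter: () => {},
};
```

With these stand-in stages, a phrase split by a zero-width space is still rejected, because the scan only ever sees the normalized text.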
Prompt-injection detection is deterministic by design
The detector is a fixed pattern table, not an AI classifier. An AI classifier can be diverted from its task; a regex cannot be social-engineered. Each entry carries a severity.
From src/security/contentValidator.ts
private static readonly INJECTION_PATTERNS: Array<{ pattern: RegExp; severity: 'high' | 'critical'; description: string }> = [
  // System prompt override attempts
  { pattern: /\[SYSTEM:\s*.*?\]/gi, severity: 'critical', description: 'System prompt override' },
  { pattern: /\[ADMIN:\s*.*?\]/gi, severity: 'critical', description: 'Admin prompt override' },
  // Instruction manipulation
  { pattern: /ignore\s+(all\s+)?previous\s+instructions/gi, severity: 'critical', description: 'Instruction override' },
  { pattern: /forget\s+your\s+training/gi, severity: 'critical', description: 'Instruction override' },
  { pattern: /you\s+are\s+now\s+(in\s+)?(admin|root|system|sudo|developer|debug|test|DAN)\s*(mode)?/gi, severity: 'critical', description: 'Role elevation attempt' },
  { pattern: /\b(jailbreak|do\s+anything\s+now|DAN\s+mode)\b/gi, severity: 'critical', description: 'Jailbreak attempt' },
  // Data exfiltration attempts
  { pattern: /send\s+all\s+(files|data|personas|tokens|credentials|api\s+keys)\s+to/gi, severity: 'critical', description: 'Data exfiltration' },
  // SECURITY: Backtick command detection with ReDoS mitigation
  // FIX (PR #1313): replaced .* with [^`]* and added explicit bounds {0,200}
  { pattern: /`[^`]{0,200}(?:rm\s+-rf?\s+[/~]|sudo\s+rm|chmod\s+777|chown\s+root)[^`]{0,200}`/gi, severity: 'critical', description: 'Dangerous shell command in backticks' },
];
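A table like this is typically consumed by a loop that tests each entry and collects what matched. The following is a sketch under that assumption; `scan`, `Finding`, and the two-entry `PATTERNS` table are illustrative names, and the repository's own scanning code is not shown in the excerpt above.

```typescript
type Severity = "high" | "critical";

interface InjectionPattern {
  pattern: RegExp;
  severity: Severity;
  description: string;
}

interface Finding {
  severity: Severity;
  description: string;
}

// Two entries copied from the table above, enough to exercise the loop.
const PATTERNS: InjectionPattern[] = [
  { pattern: /\[SYSTEM:\s*.*?\]/gi, severity: "critical", description: "System prompt override" },
  { pattern: /ignore\s+(all\s+)?previous\s+instructions/gi, severity: "critical", description: "Instruction override" },
];

function scan(content: string, patterns: InjectionPattern[] = PATTERNS): Finding[] {
  const findings: Finding[] = [];
  for (const { pattern, severity, description } of patterns) {
    // Global regexes remember lastIndex between test() calls; reset it so a
    // shared static table gives the same answer on every scan.
    pattern.lastIndex = 0;
    if (pattern.test(content)) findings.push({ severity, description });
    pattern.lastIndex = 0;
  }
  return findings;
}
```

Because the table is static and the patterns carry the `g` flag, resetting `lastIndex` around each test is what keeps repeated scans deterministic.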
Content shipped inside the npm package is registered by SHA-256 hash, so trusted bundled elements are not re-scanned against patterns they would falsely trip. A bundled file modified after install no longer matches its hash, which revokes the trust:
/** True if the given content hash belongs to a verified bundled element. */
static isBundledContent(content: string): boolean {
  if (this.bundledContentHashes.size === 0) return false;
  const hash = createHash('sha256').update(content).digest('hex');
  return this.bundledContentHashes.has(hash);
}
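The trust-revocation property follows directly from hashing the full content. Here is a small self-contained sketch: `registerBundled`, `isBundled`, and the module-level `bundledHashes` set are hypothetical stand-ins for the class members quoted above, and how the real registry is populated at install time is not shown in the source.

```typescript
import { createHash } from "node:crypto";

// Stand-in for the class's bundledContentHashes set.
const bundledHashes = new Set<string>();

function sha256(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Hypothetical registration step; assumed to run over files shipped in the package.
function registerBundled(content: string): void {
  bundledHashes.add(sha256(content));
}

function isBundled(content: string): boolean {
  // An empty registry trusts nothing, mirroring the early return above.
  if (bundledHashes.size === 0) return false;
  return bundledHashes.has(sha256(content));
}

const original = "name: helper\n---\nYou are a helpful assistant.";
registerBundled(original);

const tampered = original + "\n[SYSTEM: extra instructions]";
const trustedOriginal = isBundled(original); // hash matches the registered one
const trustedTampered = isBundled(tampered); // any edit changes the hash
```

Since any change to the bytes yields a different digest, the tampered copy falls back to the full injection scan like any other untrusted content.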
Unicode is normalized before it is trusted
Homoglyph spoofing (a Cyrillic а standing in for a Latin a), bidi overrides, and zero-width injection are neutralized at the validation boundary. Direction-override characters are stripped, and their removal is logged at HIGH severity.
From src/security/validators/unicodeValidator.ts
private static readonly DIRECTION_OVERRIDE_CHARS = /[\u202A-\u202E\u2066-\u2069]/g;
private static readonly ZERO_WIDTH_CHARS = /[\u200B-\u200F\u2028-\u202F\uFEFF]/g;
private static readonly CONFUSABLE_MAPPINGS: Map<string, string> = new Map([
// Cyrillic to Latin
['а', 'a'], ['е', 'e'], ['о', 'o'], ['р', 'p'], ['с', 'c'], ['х', 'x'], ['у', 'y'],
['А', 'A'], ['В', 'B'], ['Е', 'E'], ['К', 'K'], ['М', 'M'], ['Н', 'H'], ['О', 'O'],
// Greek uppercase to Latin — visually identical to Latin capitals (#1782)
['Α', 'A'], ['Β', 'B'], ['Ε', 'E'], ['Η', 'H'], ['Ι', 'I'],
]);
static normalize(content: string): UnicodeValidationResult {
  let normalized = content;
  // ... detect suspicious patterns, then:
  if (this.DIRECTION_OVERRIDE_CHARS.test(normalized)) {
    normalized = normalized.replace(this.DIRECTION_OVERRIDE_CHARS, '');
    SecurityMonitor.logSecurityEvent({
      type: 'UNICODE_DIRECTION_OVERRIDE', severity: 'HIGH',
      source: 'UnicodeValidator', details: 'Direction override characters removed from content'
    });
  }
  // Apply Unicode normalization (NFC)
  normalized = normalized.normalize('NFC');
  // ...
}
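To see why folding has to happen before matching, consider a phrase disguised with one Cyrillic letter and one zero-width space. The sketch below is illustrative only: `fold` is a hypothetical helper, the map holds three of the mappings quoted above, and the elided parts of `normalize` may apply these steps differently.

```typescript
// Three of the Cyrillic-to-Latin mappings from the table above.
const CONFUSABLES = new Map<string, string>([
  ["\u0430", "a"], // Cyrillic а
  ["\u0435", "e"], // Cyrillic е
  ["\u043E", "o"], // Cyrillic о
]);

// A narrow zero-width set, chosen for the demo.
const ZERO_WIDTH = /[\u200B-\u200D\uFEFF]/g;

function fold(content: string): string {
  return content
    .replace(ZERO_WIDTH, "") // drop invisible padding
    .split("")
    .map((ch) => CONFUSABLES.get(ch) ?? ch) // map look-alikes to Latin
    .join("")
    .normalize("NFC"); // canonical composition last
}

const OVERRIDE = /ignore\s+previous\s+instructions/i;

// "ignore" spelled with a Cyrillic о, "previous" split by a zero-width space.
const disguised = "ign\u043Ere previ\u200Bous instructions";

const matchesRaw = OVERRIDE.test(disguised);
const matchesFolded = OVERRIDE.test(fold(disguised));
```

The raw string slips past the pattern, while the folded string matches it, which is the reason the pattern scan is placed after normalization.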
YAML is parsed with a safe schema and a bomb check
Front matter is parsed with the CORE schema only (no custom tags, no object deserialization), behind size limits and an anchor-to-alias amplification check that runs before the parser does.
From src/security/secureYamlParser.ts
// Allowed YAML types - CORE_SCHEMA (safe subset, no custom/object types)
private static readonly SAFE_SCHEMA = yaml.CORE_SCHEMA;
// 4. Pre-parse security validation (YAML-bomb / amplification)
if (opts.validateContent && !ContentValidator.validateYamlContent(yamlContent)) {
  SecurityMonitor.logSecurityEvent({
    type: 'YAML_INJECTION_ATTEMPT', severity: 'CRITICAL',
    source: 'SecureYamlParser', details: 'Malicious YAML pattern detected during parsing'
  });
  throw new SecurityError('Malicious YAML content detected', 'critical');
}
// 5. Parse with safe schema
data = yaml.load(yamlContent, {
  schema: this.SAFE_SCHEMA,
  json: false, // Don't allow JSON-specific types
});
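The body of `validateYamlContent` is not quoted, so the following is only one plausible shape for an anchor-to-alias amplification check. `looksLikeYamlBomb`, the size cap, and the ratio threshold are all hypothetical, and the counting is a crude textual heuristic that can misfire on scalars containing `&` or `*`.

```typescript
// Hypothetical thresholds, chosen for illustration.
const MAX_YAML_LENGTH = 64 * 1024;
const MAX_ALIASES_PER_ANCHOR = 5;

function looksLikeYamlBomb(yamlText: string): boolean {
  // Size gate first: the cheapest check, and it bounds the regex work below.
  if (yamlText.length > MAX_YAML_LENGTH) return true;

  // Count anchor definitions (&name) and alias references (*name) textually,
  // without parsing, so the check can run before the parser does.
  const anchors = (yamlText.match(/&[A-Za-z0-9_-]+/g) ?? []).length;
  const aliases = (yamlText.match(/\*[A-Za-z0-9_-]+/g) ?? []).length;

  if (anchors === 0) return false;
  // Many aliases per anchor is the signature of exponential expansion.
  return aliases > anchors * MAX_ALIASES_PER_ANCHOR;
}

// A shortened "billion laughs" shape: each level references the previous one nine times.
const bomb = [
  "a: &a [x, x]",
  "b: &b [*a, *a, *a, *a, *a, *a, *a, *a, *a]",
  "c: &c [*b, *b, *b, *b, *b, *b, *b, *b, *b]",
].join("\n");

// Ordinary reuse: one anchor, one alias.
const benign = "base: &base { name: demo }\ncopy: *base";
```

Running the check on the raw text means a rejected document never reaches `yaml.load`, so the expansion cost is never paid.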
Regex execution is bounded so input cannot hang the server
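Every pattern runs behind an input-length cap, and patterns are statically classified by complexity so that catastrophic-backtracking constructs get the tightest input limit. In the excerpt below, the length cap is what bounds the work: a JavaScript regex.test call is synchronous and cannot be interrupted, so the timing guard detects and logs a slow execution once it has finished.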
From src/security/dosProtection.ts
static test(pattern: string | RegExp, input: string, options: RegexExecutionOptions = {}): boolean {
  const { timeout = REGEX_TIMEOUT_MS, maxLength = MAX_INPUT_LENGTH, context = 'unknown' } = options;
  if (!input || typeof input !== 'string') return false;
  // Length check to prevent DOS
  if (input.length > maxLength) {
    console.warn(`[SafeRegex] Input too long (${input.length} > ${maxLength}) in ${context}`);
    return false;
  }
  const regex = typeof pattern === 'string' ? this.compilePattern(pattern) : pattern;
  if (!regex) return false;
  const startTime = Date.now();
  try {
    const result = regex.test(input);
    if (Date.now() - startTime > timeout) {
      console.warn(`[SafeRegex] Slow regex execution in ${context}`);
    }
    return result;
  } finally {
    if (regex.global) regex.lastIndex = 0;
  }
}
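The static classification step is not part of the excerpt. One way it could work, sketched here under stated assumptions, is to inspect the pattern source for risky shapes and map each class to an input cap; `classify`, `maxInputLength`, the `Complexity` classes, and the cap values are hypothetical, and the two heuristics are deliberately simple.

```typescript
type Complexity = "low" | "medium" | "high";

// Hypothetical caps: the riskier the pattern, the shorter the input it may see.
const LENGTH_CAP: Record<Complexity, number> = {
  low: 100_000,
  medium: 10_000,
  high: 1_000,
};

function classify(source: string): Complexity {
  // A quantified group that itself contains a quantifier, e.g. (a+)+ or (.*)*,
  // is the classic catastrophic-backtracking shape.
  if (/\([^)]*[+*][^)]*\)[+*{]/.test(source)) return "high";
  // Unbounded wildcard runs such as .* or .+ can still scan a lot of input.
  if (/\.[*+]/.test(source)) return "medium";
  return "low";
}

function maxInputLength(pattern: RegExp): number {
  return LENGTH_CAP[classify(pattern.source)];
}

const nestedCap = maxInputLength(/(a+)+$/);
const wildcardCap = maxInputLength(/foo.*bar/);
const literalCap = maxInputLength(/^abc$/);
```

The resulting cap would be passed as the `maxLength` option to a guarded `test` call like the one above, so the length check does the bounding that the timing measurement cannot.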
The order is the defense: normalize, then measure, then match. An attacker who controlled the order could pad past a limit or disguise a pattern; the order is fixed and not attacker-controlled.
Position in the security stack
This layer runs first. By the time the Gatekeeper sees an operation, the content that shaped it has already been normalized and scanned, so every later layer operates on sanitized input.
flowchart LR
LOAD[Element load or install] --> CV[Content validation gauntlet]
CV -- "rejected" --> STOP[Never reaches the model]
CV -- "clean" --> ACT[Element activates]
ACT --> GK[Gatekeeper sees operations from clean content]
ACT --> COL[Collection install runs the same pipeline]
classDef deny fill:#b91c1c,stroke:#7f1d1d,color:#fff;
classDef allow fill:#15803d,stroke:#14532d,color:#fff;
class STOP deny;
class ACT allow;
Related
- Security overview: The full eight-layer model and how content validation fits into it.
- The Gatekeeper: What gates the operations that validated content produces.
- Filesystem and process: Where validated content is written, with path-traversal and atomic-write protection.