Agent 4 β Security Gate Architecture
Two-Layer Defence
πͺ€Lobster Trap (Primary)
Pattern-based prompt injection detector. Scans every input for known injection signatures, adversarial payloads, and jailbreak attempts. Blocks and quarantines on match.
π§±Offline Detector (Fallback)
Heuristic classifier that runs when the Lobster Trap is unavailable or uncertain. Uses flag-based regex analysis to detect suspicious content without network dependency.
What triggers quarantine
πPrompt injection β Instructions attempting to override AI behaviour
πRole override β Commands trying to change the AI's persona or permissions
πPolicy bypass β Phrases designed to circumvent safety guardrails
π΅οΈSuspicious metadata β Zero-width / invisible characters hiding instructions
βοΈDecision manipulation β Content trying to force a specific approval outcome
Loading security dataβ¦