Security Dashboard

Agent 4 β€” Security Gate Architecture

Two-Layer Defence

πŸͺ€Lobster Trap (Primary)

Pattern-based prompt injection detector. Scans every input for known injection signatures, adversarial payloads, and jailbreak attempts. Blocks and quarantines on match.

🧱Offline Detector (Fallback)

Heuristic classifier that runs when the Lobster Trap is unavailable or uncertain. Uses flag-based regex analysis to detect suspicious content without network dependency.

What triggers quarantine

πŸ’‰Prompt injection β€” Instructions attempting to override AI behaviour
🎭Role override β€” Commands trying to change the AI's persona or permissions
πŸ”“Policy bypass β€” Phrases designed to circumvent safety guardrails
πŸ•΅οΈSuspicious metadata β€” Zero-width / invisible characters hiding instructions
βš™οΈDecision manipulation β€” Content trying to force a specific approval outcome
Loading security data…