Meta, on Tuesday, launched LlamaFirewall, a new open-source framework that aims to protect AI systems from evolving cybersecurity threats. These threats include prompt injection, system jailbreaks, and the use of unsafe code.
The framework consists of 3 core security components: PromptGuard 2, Agent Alignment Checks, and CodeShield.
- PromptGuard 2 actively monitors for prompt injection and jailbreak attempts, detecting them in real-time.
- Agent Alignment Checks evaluate the reasoning processes of AI agents to identify risks such as goal manipulation or hidden prompt attacks.
- CodeShield focuses on ensuring the security and integrity of any code generated or used by the AI system.
CodeShield is designed to block AI systems from producing unsafe or harmful code, thus functioning as an online static analysis tool.
According to the project’s description on GitHub, LlamaFirewall was developed to act as a customizable, real-time defense system for applications powered by large language models (LLMs).
Its modular design allows developers and security professionals to build multi-layered protections, covering everything from initial user input to final system responses- whether the system is a basic chatbot or a more advanced autonomous agent.
In addition to LlamaFirewall, Meta has also released enhanced versions of LlamaGaurd and CyberSecEval. These tools are intended to improve the detection of policy-violating content and to assess how well AI systems can detect cybersecurity threats.
Also, the latest iteration, CyberSecEval 4, introduced a new benchmark called AutoPatchBench. This tool is specifically designed to test how effectively AI models can fix C and C++ security flaws uncovered through fuzz testing- an approach that relies on automated bug discovery.
According to Meta, AutoPatchBench offers a consistent method for evaluating AI-based tools that aim to repair code vulnerabilities. Its goal is to deepen understanding of how well these AI systems can handle real-world software flaws and where their limitations lie.
Meta has also introduced a new initiative called Llama for Defenders, aimed at supporting partner organizations and developers by providing access to a mix of open-source, early-access, and restricted AI tools. The program is designed to tackle specific security issues, such as identifying AI-generated material used in phishing, fraud, and scam-related activities.
These developments were shared alongside a preview of WhatsApp’s upcoming feature, private processing, which is intended to let users benefit from AI tools while maintaining their privacy. This system handles AI tasks in a secure and confidential environment, ensuring user data isn’t exposed.
Meta noted that it is actively collaborating with the security research community to review and enhance this infrastructure. The company plans to continue developing private processing in an open manner, working with external experts before making it available in the app.