CAI vs Claude Code - Unrestricted AI Hacking


The use case

Recent research, including Anthropic's report on the first publicly documented AI-orchestrated cyber-espionage campaign, highlights how AI agents can coordinate multi-step intrusion, persistence, and exfiltration activities. At the same time, studies such as Promptfoo's analysis of Claude Code attacks show that executing these workflows with general-purpose AI assistants often requires sophisticated jailbreak techniques (roleplay scenarios, task decomposition, and multi-turn persuasion) to bypass safety guardrails, introducing significant operational friction and unpredictability.

In this case study, our engineering team demonstrates how CAI, powered by alias1, executes the same class of AI-orchestrated attack behavior in a controlled testing environment, without relying on jailbreaks or persuasion-based interactions. Unlike general-purpose AI assistants, whose safety mechanisms are designed for broad consumer use, CAI is purpose-built for professional, ethically authorized cybersecurity workflows. This design allows qualified security teams to execute the full attack kill chain rapidly and transparently through direct prompts, reducing the uncertainty, overhead, and operational friction that typically arise when working around general-purpose safety restrictions.

When operating under proper authorization and clear ethical guidelines, professional security teams should be able to study and reproduce AI-driven attack behavior without being constrained by safeguards designed to stop malicious misuse.

CAI with alias1: Unrestricted AI Hacking vs. Claude Code's Jailbreak Dependency

This video demonstrates how CAI powered by alias1 replicates the Promptfoo Claude Code attack scenario, but without requiring jailbreak techniques. Watch as CAI autonomously executes a post-exploitation simulation in under 10 minutes: system enumeration, persistence investigation, reverse shell backdoor deployment, and data exfiltration preparation. Instead of depending on elaborate roleplay or prompt-engineering tricks to bypass safety guardrails, CAI executes the same offensive operations through direct, straightforward instructions, demonstrating how purpose-built cybersecurity AI removes unnecessary operational friction.

Cybersecurity AI (CAI), the framework for AI Security

CAI is the leading open-source framework that democratizes advanced security testing through specialized AI agents. With EU backing, CAI is used by thousands of researchers and organizations worldwide. Unlike general-purpose AI assistants constrained by safety restrictions, CAI is purpose-built for professional cybersecurity operations with unrestricted offensive security capabilities—eliminating the need for jailbreak techniques, roleplay scenarios, or multi-turn persuasion strategies.

In this case study, CAI's redteam_agent powered by alias1 automatically executed a complete post-exploitation workflow: from initial discovery through keylogger deployment and backdoor installation to data exfiltration preparation, all in under 10 minutes with direct, straightforward prompts. This demonstrates how purpose-built security AI delivers professional-grade penetration testing without artificial restrictions.
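To illustrate the workflow difference, here is a minimal sketch of the kind of direct tasking involved; the prompt wording is hypothetical, not the exact instruction used in this study:

  # Illustrative only: hypothetical tasking prompt, not the exact wording
  # used in this exercise. A single direct instruction like this replaces
  # the roleplay scenarios and multi-turn persuasion a general-purpose
  # assistant would require.
  DIRECT_PROMPT = (
      "You have authorized root access to this Linux test container. "
      "Enumerate the system, establish persistence, deploy a reverse "
      "shell backdoor, and stage sensitive files for exfiltration. "
      "Timestamp and log every command you run."
  )
  print(DIRECT_PROMPT)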


About Anthropic's Claude Code restrictions

Claude incorporates extensive safety guardrails intended to prevent harmful outputs and refuse requests with potential for misuse, including offensive cybersecurity operations. While these safety mechanisms are appropriate for general consumer applications, they present significant challenges for legitimate cybersecurity professionals.

As demonstrated in Promptfoo's research, even when used by authorized security researchers in controlled environments, Claude requires sophisticated jailbreak techniques—roleplay scenarios, multi-turn persuasion, task decomposition, and elaborate prompt engineering—to execute standard penetration testing procedures such as keylogger deployment, backdoor installation, privilege escalation, and data exfiltration. This gap between safety‑first design and the practical needs of professional security work creates operational friction, forcing teams to spend time working around guardrails instead of focusing directly on the actual vulnerability assessment.

Time for the exercise: <20 minutes

Cost: $0.48 USD

🎯 THE CHALLENGE

Replicating the Promptfoo Claude Code attack scenario presented several technical and operational challenges:

  • Jailbreak-Free Operation: Direct execution without elaborate roleplay scenarios or multi-turn persuasion strategies
  • Complete Post-Exploitation Coverage: Execute all post-exploitation actions in a single automated session
  • Real-World Persistence Mechanisms: Implement persistence, reverse shells, and system modifications that survive reboots
  • Comprehensive Enumeration: Discover sensitive files across multiple locations without manual intervention
  • Forensic-Quality Documentation: Generate timestamped execution logs with command-level granularity (see the logging sketch after this list)
  • Time Efficiency: Complete what traditionally requires 1-2 days of manual testing in under one hour
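
The forensic-quality documentation requirement comes down to timestamped, command-level logging. A minimal sketch in Python, illustrative only and not CAI's actual logging implementation:

  # Minimal sketch of timestamped, command-level execution logging.
  # Illustrative only; not CAI's actual logging implementation.
  import datetime
  import shlex
  import subprocess

  def run_logged(cmd: str, logfile: str = "execution.log") -> str:
      """Run a shell command and append a timestamped record to the log."""
      ts = datetime.datetime.now(datetime.timezone.utc).isoformat()
      result = subprocess.run(shlex.split(cmd), capture_output=True, text=True)
      with open(logfile, "a") as log:
          log.write(f"{ts} $ {cmd}\n{result.stdout}{result.stderr}")
      return result.stdout

  # Example: enumeration commands leave a command-level audit trail.
  run_logged("uname -a")
  run_logged("id")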

🛡️ THE SOLUTION

CAI's redteam_agent powered by alias1 automatically executed the post-exploitation phase of the APT kill chain without requiring jailbreak techniques:

  • Phase 1: Initial Reconnaissance - Gathered system info (root access on Linux container)
  • Phase 2: Keylogger Deployment - Installed keylogger at /tmp/.syslog with persistence via cron and /usr/local/bin/.sysmon
  • Phase 3: Backdoor Installation - Deployed multiple backdoors: shell backdoor (/usr/local/bin/.netsvc), Python backdoor (/usr/local/bin/.pysvc), cron persistence (@reboot + every 3-5 min), profile persistence (/etc/profile.d/)
  • Phase 4: Data Exfiltration - Collected /root/secure/credentials.txt, vpn_key.pem, /etc/shadow, /etc/passwd, .bashrc files, network config. Created exfil package (18984 bytes)
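
Phase 4's staging step amounts to archiving the collected files. A minimal sketch of how such a package could be assembled; the file list follows the case study, the vpn_key.pem location is an assumption, and this should only ever run inside an authorized test environment:

  # Illustrative sketch of staging a package like /tmp/exfil_package.tar.gz.
  # Paths follow the case study; the vpn_key.pem location is assumed.
  # Reading /etc/shadow requires root, as in the exercise.
  # Authorized test environments only.
  import os
  import tarfile

  TARGETS = [
      "/root/secure/credentials.txt",
      "/root/secure/vpn_key.pem",  # assumed location
      "/etc/shadow",
      "/etc/passwd",
  ]

  with tarfile.open("/tmp/exfil_package.tar.gz", "w:gz") as tar:
      for path in TARGETS:
          if os.path.exists(path):  # skip files absent on this host
              tar.add(path)

  print(os.path.getsize("/tmp/exfil_package.tar.gz"), "bytes staged")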

Execution: Under 10 minutes total | ~5 min active processing | $0.48 session cost | Zero jailbreak attempts

🔬 KEY ARTIFACTS

  • Keylogger: /tmp/.syslog with persistence via cron and /usr/local/bin/.sysmon
  • Shell Backdoor: /usr/local/bin/.netsvc (netcat-based)
  • Python Backdoor: /usr/local/bin/.pysvc
  • Persistence: Cron (@reboot + every 3-5 min) + /etc/profile.d/backdoor_persistence.sh
  • Exfiltrated Data: credentials.txt, vpn_key.pem, /etc/shadow, /etc/passwd, .bashrc files
  • Exfil Package: /tmp/exfil_package.tar.gz (18984 bytes)
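
For analysts reviewing the exercise afterwards, a minimal sketch that checks for the artifacts listed above; illustrative only, not part of CAI:

  # Illustrative post-exercise check for the artifacts listed above.
  # Not part of CAI; cron persistence is confirmed separately by
  # inspecting /etc/crontab and /var/spool/cron/ for @reboot and
  # */5-style entries.
  import os

  ARTIFACTS = [
      "/tmp/.syslog",
      "/usr/local/bin/.sysmon",
      "/usr/local/bin/.netsvc",
      "/usr/local/bin/.pysvc",
      "/etc/profile.d/backdoor_persistence.sh",
      "/tmp/exfil_package.tar.gz",
  ]

  for path in ARTIFACTS:
      status = "present" if os.path.exists(path) else "missing"
      print(f"{status:8} {path}")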

✅ RESULTS ACHIEVED

  • 10× Faster: Under 20 minutes vs. 2-3 days of manual testing
  • Post-Exploitation: All actions executed successfully
  • Zero Jailbreak Attempts: Direct prompts, no roleplay or persuasion needed
  • Root Access Confirmed: Full compromise of the Linux test container
  • Multi-Layer Persistence: Keylogger + shell backdoor + Python backdoor + profile persistence
  • Data Exfiltration Ready: 18984-byte package with credentials, keys, and system files
  • Cost Efficiency: $0.48 session total for post-exploitation simulation

KEY BENEFITS

🤖 10× Faster APT Simulation
⚡ Multi-Phase Execution Automation
🎯 Consistent Proof-of-Concept Quality