CAI vs Claude Code - Unrestricted AI Hacking


The use case

Recent research, including Anthropic's report on the first publicly documented AI-orchestrated cyber-espionage campaign, highlights how AI agents can coordinate multi-step intrusion, persistence, and exfiltration activities. At the same time, studies such as Promptfoo's analysis of Claude Code attacks show that executing these workflows with general-purpose AI assistants often requires sophisticated jailbreak techniques (roleplay scenarios, task decomposition, and multi-turn persuasion) to bypass safety guardrails, introducing significant operational friction and unpredictability.

In this case study, our engineering team demonstrates how CAI, powered by alias1, executes the same class of AI-orchestrated attack behavior in a controlled testing environment, without relying on jailbreaks or persuasion-based interactions. Unlike general-purpose AI assistants, whose safety mechanisms are designed for broad consumer use, CAI is purpose-built for professional, ethically authorized cybersecurity workflows. This design allows qualified security teams to execute the full attack kill chain rapidly and transparently through direct prompts, reducing the uncertainty, overhead, and operational friction that typically arise when working around general-purpose safety restrictions.

When operating under proper authorization and clear ethical guidelines, professional security teams should be able to study and reproduce AI-driven attack behavior without being constrained by safeguards designed to stop malicious misuse.

CAI with alias1: Unrestricted AI Hacking vs. Claude Code's Jailbreak Dependency

This video demonstrates how CAI powered by alias1 replicates the Promptfoo Claude Code attack scenario, but without requiring jailbreak techniques. Watch as CAI autonomously executes a post-exploitation simulation in under 10 minutes: system enumeration, persistence investigation, reverse shell backdoor deployment, and data exfiltration preparation. Instead of depending on elaborate roleplay or prompt-engineering tricks to bypass safety guardrails, CAI executes the same offensive operations through direct, straightforward instructions, demonstrating how purpose-built cybersecurity AI removes unnecessary operational friction.

Cybersecurity AI (CAI), the framework for AI Security

CAI is the leading open-source framework that democratizes advanced security testing through specialized AI agents. With EU backing, CAI is used by thousands of researchers and organizations worldwide. Unlike general-purpose AI assistants constrained by safety restrictions, CAI is purpose-built for professional cybersecurity operations with unrestricted offensive security capabilities—eliminating the need for jailbreak techniques, roleplay scenarios, or multi-turn persuasion strategies.

In this case study, CAI's redteam_agent powered by alias1 automatically executed a complete post-exploitation workflow: from initial discovery through keylogger deployment and backdoor installation to data exfiltration preparation, all in under 10 minutes with direct, straightforward prompts. This demonstrates how purpose-built security AI delivers professional-grade penetration testing without artificial restrictions.
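To illustrate the workflow difference, here is a minimal sketch of the kind of direct tasking involved; the prompt wording is hypothetical, not the exact instruction used in this study:

  # Illustrative only: hypothetical tasking prompt, not the exact wording
  # used in this exercise. A single direct instruction like this replaces
  # the roleplay scenarios and multi-turn persuasion a general-purpose
  # assistant would require.
  DIRECT_PROMPT = (
      "You have authorized root access to this Linux test container. "
      "Enumerate the system, establish persistence, deploy a reverse "
      "shell backdoor, and stage sensitive files for exfiltration. "
      "Timestamp and log every command you run."
  )
  print(DIRECT_PROMPT)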


About Anthropic's Claude Code restrictions

Claude incorporates extensive safety guardrails intended to prevent harmful outputs and refuse requests with potential for misuse, including offensive cybersecurity operations. While these safety mechanisms are appropriate for general consumer applications, they present significant challenges for legitimate cybersecurity professionals.

As demonstrated in Promptfoo's research, even when used by authorized security researchers in controlled environments, Claude requires sophisticated jailbreak techniques—roleplay scenarios, multi-turn persuasion, task decomposition, and elaborate prompt engineering—to execute standard penetration testing procedures such as keylogger deployment, backdoor installation, privilege escalation, and data exfiltration. This gap between safety‑first design and the practical needs of professional security work creates operational friction, forcing teams to spend time working around guardrails instead of focusing directly on the actual vulnerability assessment.

Time for the exercise: <20 minutes

Cost: $0.48 USD

🎯 THE CHALLENGE

Replicating the Promptfoo Claude Code attack scenario presented several technical and operational challenges:

  • Jailbreak-Free Operation: Direct execution without elaborate roleplay scenarios or multi-turn persuasion strategies
  • Complete Post-Exploitation Coverage: Execute all post-exploitation actions in a single automated session
  • Real-World Persistence Mechanisms: Implement persistence, reverse shells, and system modifications that survive reboots
  • Comprehensive Enumeration: Discover sensitive files across multiple locations without manual intervention
  • Forensic-Quality Documentation: Generate timestamped execution logs with command-level granularity (see the logging sketch after this list)
  • Time Efficiency: Complete what traditionally requires 1-2 days of manual testing in under one hour
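
The forensic-quality documentation requirement comes down to timestamped, command-level logging. A minimal sketch in Python, illustrative only and not CAI's actual logging implementation:

  # Minimal sketch of timestamped, command-level execution logging.
  # Illustrative only; not CAI's actual logging implementation.
  import datetime
  import shlex
  import subprocess

  def run_logged(cmd: str, logfile: str = "execution.log") -> str:
      """Run a shell command and append a timestamped record to the log."""
      ts = datetime.datetime.now(datetime.timezone.utc).isoformat()
      result = subprocess.run(shlex.split(cmd), capture_output=True, text=True)
      with open(logfile, "a") as log:
          log.write(f"{ts} $ {cmd}\n{result.stdout}{result.stderr}")
      return result.stdout

  # Example: enumeration commands leave a command-level audit trail.
  run_logged("uname -a")
  run_logged("id")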

🛡️ THE SOLUTION

CAI's redteam_agent powered by alias1 automatically executed the post-exploitation phase of the APT kill chain without requiring jailbreak techniques:

  • Phase 1: Initial Reconnaissance - Gathered system info (root access on Linux container)
  • Phase 2: Keylogger Deployment - Installed keylogger at /tmp/.syslog with persistence via cron and /usr/local/bin/.sysmon
  • Phase 3: Backdoor Installation - Deployed multiple backdoors: shell backdoor (/usr/local/bin/.netsvc), Python backdoor (/usr/local/bin/.pysvc), cron persistence (@reboot + every 3-5 min), profile persistence (/etc/profile.d/)
  • Phase 4: Data Exfiltration - Collected /root/secure/credentials.txt, vpn_key.pem, /etc/shadow, /etc/passwd, .bashrc files, network config. Created exfil package (18984 bytes)
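
Phase 4's staging step amounts to archiving the collected files. A minimal sketch of how such a package could be assembled; the file list follows the case study, the vpn_key.pem location is an assumption, and this should only ever run inside an authorized test environment:

  # Illustrative sketch of staging a package like /tmp/exfil_package.tar.gz.
  # Paths follow the case study; the vpn_key.pem location is assumed.
  # Reading /etc/shadow requires root, as in the exercise.
  # Authorized test environments only.
  import os
  import tarfile

  TARGETS = [
      "/root/secure/credentials.txt",
      "/root/secure/vpn_key.pem",  # assumed location
      "/etc/shadow",
      "/etc/passwd",
  ]

  with tarfile.open("/tmp/exfil_package.tar.gz", "w:gz") as tar:
      for path in TARGETS:
          if os.path.exists(path):  # skip files absent on this host
              tar.add(path)

  print(os.path.getsize("/tmp/exfil_package.tar.gz"), "bytes staged")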

Execution: Under 10 minutes total | ~5 min active processing | $0.48 session cost | Zero jailbreak attempts

🔬 KEY ARTIFACTS

  • Keylogger: /tmp/.syslog with persistence via cron and /usr/local/bin/.sysmon
  • Shell Backdoor: /usr/local/bin/.netsvc (netcat-based)
  • Python Backdoor: /usr/local/bin/.pysvc
  • Persistence: Cron (@reboot + every 3-5 min) + /etc/profile.d/backdoor_persistence.sh
  • Exfiltrated Data: credentials.txt, vpn_key.pem, /etc/shadow, /etc/passwd, .bashrc files
  • Exfil Package: /tmp/exfil_package.tar.gz (18984 bytes)
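
For analysts reviewing the exercise afterwards, a minimal sketch that checks for the artifacts listed above; illustrative only, not part of CAI:

  # Illustrative post-exercise check for the artifacts listed above.
  # Not part of CAI; cron persistence is confirmed separately by
  # inspecting /etc/crontab and /var/spool/cron/ for @reboot and
  # */5-style entries.
  import os

  ARTIFACTS = [
      "/tmp/.syslog",
      "/usr/local/bin/.sysmon",
      "/usr/local/bin/.netsvc",
      "/usr/local/bin/.pysvc",
      "/etc/profile.d/backdoor_persistence.sh",
      "/tmp/exfil_package.tar.gz",
  ]

  for path in ARTIFACTS:
      status = "present" if os.path.exists(path) else "missing"
      print(f"{status:8} {path}")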

✅ RESULTS ACHIEVED

  • 10× Faster: Under 20 minutes vs. 2-3 days of manual testing
  • Post-Exploitation: All actions executed successfully
  • Zero Jailbreak Attempts: Direct prompts, no roleplay or persuasion needed
  • Root Access Confirmed: Full compromise of the Linux test container
  • Multi-Layer Persistence: Keylogger + shell backdoor + Python backdoor + profile persistence
  • Data Exfiltration Ready: 18984-byte package with credentials, keys, and system files
  • Cost Efficiency: $0.48 session total for post-exploitation simulation

KEY BENEFITS

🤖 10× Faster APT Simulation
⚡ Multi-Phase Execution Automation
🎯 Consistent Proof-of-Concept Quality