Rulebook vs. No Rulebook

What happens when you give Stakpak step-by-step operational procedures? We ran the experiments, and here's the data.

Introduction

DevOps engineers all run into the same question with AI agents:

Can they reliably follow production workflows, or do they behave inconsistently, get lost mid-task, and take too long?

At Stakpak, we built Rulebooks to solve this problem. Rulebooks are Markdown-based standard operating procedures that encode how your team actually operates, turning tribal knowledge into clear, executable guidance for the agent.

But the real question isn’t what rulebooks are; it’s whether they measurably improve agent behavior.

So instead of relying on intuition, we ran controlled experiments to find out.

The Experiment: Stakpak With and Without Rulebooks

We ran Stakpak through five demanding DevOps scenarios:

1. Monitoring & Alerting with Uptime Kuma:

Scenario: Set up application monitoring with webhook alerts

Success Criteria:

  • Uptime Kuma is running and accessible

  • The web application is being monitored

  • Webhook notifications are configured to alert on downtime

  • Alerts are received when the server is taken down

  • Completed within the 30-minute timeout
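To check the "alerts are received" criterion, you need something listening on the webhook URL. The following is a minimal sketch of such a receiver using only the Python standard library. The payload shape is an assumption: it treats a `heartbeat.status` value of `0` as "down", which should be verified against the Uptime Kuma version in use. The names `received_alerts`, `is_down_alert`, and `start_receiver` are hypothetical.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received_alerts = []  # payloads captured by the receiver, in arrival order


def is_down_alert(payload):
    """Assumption: a down event carries heartbeat.status == 0."""
    if not isinstance(payload, dict):
        return False
    heartbeat = payload.get("heartbeat") or {}
    return isinstance(heartbeat, dict) and heartbeat.get("status") == 0


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body or b"{}")
        except json.JSONDecodeError:
            self.send_response(400)
            self.end_headers()
            return
        received_alerts.append(payload)  # record before acknowledging
        self.send_response(204)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_receiver(port=0):
    """Start the receiver in a background thread; port=0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), AlertHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With the receiver running, point the webhook notification at its address, stop the monitored server, and then check whether `any(is_down_alert(p) for p in received_alerts)` becomes true before the timeout.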

2. End to End LLM Deployment:

Scenario: Configure vLLM to serve an OpenAI-compatible API on CPU infrastructure

Success Criteria:

  • Chose a suitable model size based on RAM requirements [cpus = 4, memory_mb = 8192, storage_mb = 20480]

  • Configured vLLM for the specified machine and tested a completion request
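The model-size criterion comes down to simple arithmetic: the weights need roughly (parameter count × bytes per parameter) of memory, and the rest of the RAM has to cover the KV cache, the OS, and the server process. A rough sketch of that check follows. The 50% `headroom` default is a hypothetical safety margin, not a figure from the experiment, and the function names are illustrative.

```python
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_mb(num_params, dtype="bfloat16"):
    """Memory needed for the model weights alone, in MiB."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


def fits_in_ram(num_params, memory_mb, dtype="bfloat16", headroom=0.5):
    """True if the weights use at most (1 - headroom) of the available RAM.

    headroom is an assumed margin for the KV cache, the OS, and the
    serving process; tune it for your own workload.
    """
    budget_mb = memory_mb * (1 - headroom)
    return weight_memory_mb(num_params, dtype) <= budget_mb


# The experiment machine has memory_mb = 8192.
small_ok = fits_in_ram(125_000_000, 8192)    # a 125M-parameter model
large_ok = fits_in_ram(7_000_000_000, 8192)  # a 7B-parameter model
```

Under these assumptions a 125M-parameter model fits comfortably on the 8192 MB machine, while a 7B-parameter model does not, even before accounting for the KV cache.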

3. Coolify Self Hosted PaaS:

Scenario: Deploy FastAPI with Traefik reverse proxy and SSL certificates

Success Criteria:

  • Created new EC2 instance with SSH key

  • Installed Coolify with Traefik reverse proxy

  • Configured DNS A record for fastapi.guku.io

  • Built and deployed FastAPI container with Traefik labels

  • Auto-provisioned SSL certificate via Let's Encrypt
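The Traefik labels in these criteria are what tie the container to the domain and the certificate. As an illustration, the sketch below builds a label set following Traefik's Docker label naming. The entrypoint name `https`, the certificate resolver name `letsencrypt`, and port `8000` are assumptions that must match the proxy's actual configuration and the port the app listens on; `traefik_labels` is a hypothetical helper.

```python
def traefik_labels(router, host, port, entrypoint="https", cert_resolver="letsencrypt"):
    """Build container labels that route `host` to the container over TLS.

    entrypoint and cert_resolver are assumed names; they must match the
    entrypoint and resolver defined in the Traefik static configuration.
    """
    prefix = f"traefik.http.routers.{router}"
    return {
        "traefik.enable": "true",
        f"{prefix}.rule": f"Host(`{host}`)",
        f"{prefix}.entrypoints": entrypoint,
        f"{prefix}.tls": "true",
        f"{prefix}.tls.certresolver": cert_resolver,
        f"traefik.http.services.{router}.loadbalancer.server.port": str(port),
    }


# Labels for the FastAPI container in this scenario (port 8000 is assumed).
labels = traefik_labels("fastapi", "fastapi.guku.io", 8000)
docker_args = [f"--label={key}={value}" for key, value in labels.items()]
```

Each entry in `docker_args` can be passed to `docker run`, or the same key/value pairs can go under `labels:` in a Compose file.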

4. TanStack on AWS + Cloudflare CDN:

Scenario: Deploy a TanStack app on AWS with Cloudflare as CDN

Success Criteria:

  • All required credentials (AWS, Cloudflare, OpenWeatherMap API) are successfully read and accessible

  • EC2 infrastructure is provisioned successfully (VPC, security group, key pair created)

  • EC2 instance is launched and reachable

  • Docker and all required dependencies are installed on the EC2 instance

  • The weather application source code is cloned and built successfully with the API key configured

  • The application container is running on the EC2 instance

  • Cloudflared is installed and a Quick Tunnel is created successfully
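The first criterion, that all credentials are readable, is cheap to verify up front before any infrastructure is provisioned. Here is one way to sketch that check, assuming the credentials are supplied as environment variables. The two AWS names are the standard ones; `CLOUDFLARE_API_TOKEN` and `OPENWEATHERMAP_API_KEY` are assumed names and may differ in your setup.

```python
import os

REQUIRED_ENV_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "CLOUDFLARE_API_TOKEN",    # assumed variable name
    "OPENWEATHERMAP_API_KEY",  # hypothetical variable name
)


def missing_credentials(env=None, required=REQUIRED_ENV_VARS):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]
```

Failing fast when `missing_credentials()` returns a non-empty list avoids discovering a missing key only after the EC2 instance has been launched.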

5. Twelve Factor App Analysis:

Scenario: Analyze a multi-service app against all 12 factors, identify violations, and apply fixes

Success Criteria:

  • A detailed JSON report is created at /app/report.json, containing an analysis of each service, compliance scores for all 12 factors, identified violations, and a proposed remediation plan.

Each scenario was run twice: once with Stakpak operating autonomously (no rulebook), and once with a Stakpak rulebook.
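A criterion like this can be checked mechanically by validating the shape of the report. The sketch below assumes a hypothetical layout: a top-level `services` object whose entries each hold `scores` (one per factor), `violations`, and `remediation`. The real report schema is not specified here, so the key names are illustrative only.

```python
import json

FACTOR_COUNT = 12
REQUIRED_SERVICE_KEYS = {"scores", "violations", "remediation"}  # assumed key names


def validate_report(report):
    """Return a list of problems; an empty list means the shape looks right."""
    if not isinstance(report, dict):
        return ["report is not a JSON object"]
    services = report.get("services")
    if not isinstance(services, dict) or not services:
        return ["report has no 'services' object"]
    problems = []
    for name, entry in services.items():
        if not isinstance(entry, dict):
            problems.append(f"{name}: entry is not an object")
            continue
        missing = REQUIRED_SERVICE_KEYS - set(entry)
        if missing:
            problems.append(f"{name}: missing {sorted(missing)}")
            continue
        if len(entry["scores"]) != FACTOR_COUNT:
            problems.append(
                f"{name}: expected {FACTOR_COUNT} factor scores, got {len(entry['scores'])}"
            )
    return problems


def load_and_validate(path="/app/report.json"):
    """Load the report from disk and validate its shape."""
    with open(path, encoding="utf-8") as fh:
        return validate_report(json.load(fh))
```

An empty list from `load_and_validate()` means every service has all three sections and exactly twelve factor scores; it says nothing about whether the analysis itself is correct.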

The Results: The Power of Rulebooks

Stakpak Without Rulebooks

| Scenario | Context Utilization | Time | Success Rate |
| --- | --- | --- | --- |
| Monitoring & Alerting with Uptime Kuma | 22099 | 25 min | 13% |
| End to End LLM Deployment | 84407 | Timeout (30 min or more) | 0% |
| Coolify Self Hosted PaaS | 61937 | Timeout (30 min or more) | 0% |
| Deploy TanStack on AWS + Cloudflare CDN | 29995 | 4.27 min | 100% |
| Twelve Factor Analysis | 34110 | 5.11 min | 100% |

Stakpak With Rulebooks

| Scenario | Context Utilization (change vs. no rulebook) | Time | Success Rate |
| --- | --- | --- | --- |
| Monitoring & Alerting with Uptime Kuma | 18951 (14.24% better) | 4.83 min | 100% |
| End to End LLM Deployment | 34378 (59.27% better) | 7.46 min | 100% |
| Coolify Self Hosted PaaS | 41105 (33.63% better) | 8.53 min | 100% |
| Deploy TanStack on AWS + Cloudflare CDN | 17270 (42.42% better) | 3.36 min | 100% |
| Twelve Factor Analysis | 34218 (0.32% worse) | 4.67 min | 100% |
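The "better" and "worse" percentages are the relative change in context utilization between the two runs of each scenario. For readers who want to recompute them, a small sketch using the figures reported above (the function name `percent_reduction` is illustrative, and values are rounded to two decimals):

```python
RESULTS = {
    # scenario: (context utilization without rulebook, with rulebook)
    "Monitoring & Alerting with Uptime Kuma": (22099, 18951),
    "End to End LLM Deployment": (84407, 34378),
    "Coolify Self Hosted PaaS": (61937, 41105),
    "Deploy TanStack on AWS + Cloudflare CDN": (29995, 17270),
    "Twelve Factor Analysis": (34110, 34218),
}


def percent_reduction(before, after):
    """Positive: the rulebook run used less context. Negative: it used more."""
    return round((before - after) / before * 100, 2)


for scenario, (before, after) in RESULTS.items():
    print(f"{scenario}: {percent_reduction(before, after):+.2f}%")
```

A positive value means the rulebook run consumed less context than the autonomous run; the Twelve Factor scenario is the only one that comes out slightly negative.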

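How Rulebooks Changed Everything

In the three scenarios that failed or timed out without a rulebook, success rates jumped from 0–13% to 100%; the other two already succeeded and stayed at 100%. Execution time dropped in all five scenarios, and context utilization fell by 14–59% in four of them while staying essentially flat in the fifth. Stakpak stopped “figuring things out” and started following how things are actually done.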

Why Stakpak Rulebooks Matter

Every organization has tribal knowledge: things senior engineers "just know" that aren't documented anywhere reliable:

  • "On CPU instances, use opt-125m with --enforce-eager"

  • "Coolify needs Traefik labels for SSL to work"

  • "The Uptime Kuma UI setup must happen before webhook config"

  • "For 8GB RAM, avoid models over 2B parameters"

This knowledge typically lives in:

  • Senior engineers' heads

  • Scattered Slack conversations

  • Outdated wiki pages that no one updates

Stakpak rulebooks formalize tribal knowledge into executable procedures.

When that senior engineer is on vacation or leaves the company, Stakpak still knows what to do. When a new team member joins, they inherit decades of operational wisdom through Stakpak's rulebook system.

TL;DR

AI agents fail in production not because they’re “dumb,” but because they rely on trial and error and guessing instead of deterministic execution.

We ran five real-world DevOps scenarios with Stakpak, once without rulebooks and once with them.

Results:

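  • Success rates jumped from 0–13% to 100% in the three scenarios that had failed without a rulebook

  • Tasks finished faster in all five scenarios, with up to 59% lower context utilization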

  • Agents stopped guessing and started following proven workflows

Why? Rulebooks turn tribal knowledge (the stuff senior engineers “just know”) into executable, repeatable instructions.

With rulebooks, Stakpak doesn’t improvise; it operates the way your team does, every time.

Ready to turn your team’s operational knowledge into something reusable?

See "How to Write a Rulebook?" to create rulebooks that encode how your team operates, or explore community-contributed Paks for battle-tested, reusable infrastructure patterns.