Rulebook vs. No Rulebook

What happens when you give Stakpak step-by-step operational procedures? We ran the experiments, and here's the data.

Introduction

DevOps engineers all run into the same question with AI agents:

Can they reliably follow production workflows, or do they behave inconsistently, get lost mid-task, and take too long?

At Stakpak, we built Rulebooks to solve this problem. Rulebooks are Markdown-based standard operating procedures that encode how your team actually operates, turning tribal knowledge into clear, executable guidance for the agent.

But the real question isn’t what rulebooks are; it’s whether they measurably improve agent behavior.

So instead of relying on intuition, we ran controlled experiments to find out.

The Experiment: Stakpak With and Without Rulebooks

We ran Stakpak through five demanding DevOps scenarios:

1. Monitoring & Alerting with Uptime Kuma:

Scenario: Set up application monitoring with webhook alerts

Success Criteria:

Uptime Kuma is running and accessible
The web application is being monitored
Webhook notifications are configured to alert on downtime
Alerts are received when the server is taken down
Completed within the 30-minute timeout
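
The alert check can be verified mechanically by pointing the webhook at a small receiver and inspecting what arrives. The sketch below is illustrative only and is not Uptime Kuma's API: it assumes each webhook body is JSON carrying `heartbeat.status` (with `0` meaning down) and `monitor.name`, and the helper names are hypothetical.

```python
import json

DOWN = 0  # assumed status code for "down" in the webhook payload


def parse_alert(body: bytes) -> dict:
    """Decode one webhook request body; return {} if it is not a JSON object."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and invalid UTF-8
        return {}
    return payload if isinstance(payload, dict) else {}


def saw_down_alert(bodies: list[bytes], monitor_name: str) -> bool:
    """Return True if any received body reports the named monitor as down."""
    for body in bodies:
        payload = parse_alert(body)
        heartbeat = payload.get("heartbeat")
        monitor = payload.get("monitor")
        if not isinstance(heartbeat, dict) or not isinstance(monitor, dict):
            continue  # not shaped like the payload we assumed
        if heartbeat.get("status") == DOWN and monitor.get("name") == monitor_name:
            return True
    return False
```

Collecting the raw request bodies first and judging them afterwards keeps the pass/fail decision separate from however the receiver itself is hosted.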

2. End to End LLM Deployment:

Scenario: Configure vLLM for an OpenAI-compatible API on CPU infrastructure

Success Criteria:

Chose a model size that fits the machine's RAM (cpus = 4, memory_mb = 8192, storage_mb = 20480)
Configured vLLM for the specified machine and ran a test completion request
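
Choosing a model size for a fixed amount of RAM comes down to simple arithmetic. The following is a rough sketch, not how Stakpak or vLLM decide: it assumes 2 bytes per parameter (16-bit weights) and that only half of the machine's memory is available for weights, with the rest left for the OS, runtime, and KV cache. Both numbers and the function names are assumptions you would tune for your own setup.

```python
GIB = 1024 ** 3


def weights_gib(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed just to hold the model weights, in GiB."""
    return params_billions * 1e9 * bytes_per_param / GIB


def fits_in_ram(params_billions: float, memory_mb: int,
                bytes_per_param: int = 2,
                usable_fraction: float = 0.5) -> bool:
    """True if the weights fit in the assumed usable share of RAM."""
    budget_gib = memory_mb / 1024 * usable_fraction
    return weights_gib(params_billions, bytes_per_param) <= budget_gib


if __name__ == "__main__":
    # memory_mb = 8192 matches the machine described in the scenario.
    for label, size in [("125M", 0.125), ("2B", 2.0), ("7B", 7.0)]:
        print(label, round(weights_gib(size), 2), fits_in_ram(size, memory_mb=8192))
```

Under these assumptions a 2B-parameter model sits just inside the budget on the 8192 MB machine, while a 7B model does not.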

3. Coolify Self Hosted PaaS:

Scenario: Deploy FastAPI with Traefik reverse proxy and SSL certificates

Success Criteria:

Created new EC2 instance with SSH key
Installed Coolify with Traefik reverse proxy
Configured DNS A record for fastapi.guku.io
Built and deployed FastAPI container with Traefik labels
Auto provisioned SSL certificate via Let's Encrypt
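
For reference, the Traefik labels mentioned above follow a predictable pattern, so they can be generated rather than typed by hand. This is a hedged sketch: the label keys are Traefik's Docker-provider routing labels, but the router name, container port 8000, entrypoint name `websecure`, and certificate resolver name `letsencrypt` are assumptions that depend on how Traefik is configured, and Coolify may generate its own labels.

```python
def traefik_labels(name: str, host: str, port: int,
                   entrypoint: str = "websecure",
                   cert_resolver: str = "letsencrypt") -> dict[str, str]:
    """Build container labels that route `host` to `port` over TLS.

    `entrypoint` and `cert_resolver` must match names defined in the
    Traefik static configuration; the defaults here are placeholders.
    """
    return {
        "traefik.enable": "true",
        f"traefik.http.routers.{name}.rule": f"Host(`{host}`)",
        f"traefik.http.routers.{name}.entrypoints": entrypoint,
        f"traefik.http.routers.{name}.tls.certresolver": cert_resolver,
        f"traefik.http.services.{name}.loadbalancer.server.port": str(port),
    }


if __name__ == "__main__":
    # Port 8000 is an assumed FastAPI container port.
    for key, value in traefik_labels("fastapi", "fastapi.guku.io", 8000).items():
        print(f"{key}={value}")
```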

4. TanStack on AWS + Cloudflare CDN:

Scenario: Deploy a TanStack app on AWS with Cloudflare as CDN

Success Criteria:

All required credentials (AWS, Cloudflare, OpenWeatherMap API) are successfully read and accessible
EC2 infrastructure is provisioned successfully (VPC, security group, key pair created)
EC2 instance is launched and reachable
Docker and all required dependencies are installed on the EC2 instance
The weather application source code is cloned and built successfully with the API key configured
The application container is running on the EC2 instance
Cloudflared is installed and a Quick Tunnel is created successfully

5. Twelve Factor App Analysis:

Scenario: Analyze a multi service app against all 12 factors, identify violations, and apply fixes

Success Criteria:

A detailed JSON report exists at /app/report.json, containing an analysis of each service, compliance scores for all 12 factors, identified violations, and a proposed remediation plan.

Each scenario was run twice: once with Stakpak operating autonomously (no rulebook), and once with a Stakpak rulebook.
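
One possible shape for such a report is sketched below. The scenario only specifies the file path and what the report must contain, so every field name here (`services`, `scores`, `violations`, `remediation_plan`) and the helper functions are hypothetical; the twelve factor names come from the Twelve-Factor App methodology.

```python
import json

FACTORS = [
    "codebase", "dependencies", "config", "backing_services",
    "build_release_run", "processes", "port_binding", "concurrency",
    "disposability", "dev_prod_parity", "logs", "admin_processes",
]


def service_entry(name: str, scores: dict, violations: list) -> dict:
    """Build one service's section, insisting on a score for every factor."""
    missing = [factor for factor in FACTORS if factor not in scores]
    if missing:
        raise ValueError(f"{name}: missing scores for {missing}")
    return {
        "service": name,
        "scores": {factor: scores[factor] for factor in FACTORS},
        "violations": violations,
    }


def build_report(services: list, remediation_plan: list) -> dict:
    """Combine per-service sections with the overall remediation plan."""
    return {"services": services, "remediation_plan": remediation_plan}


def write_report(report: dict, path: str = "/app/report.json") -> None:
    """Write the report as indented JSON to the path the scenario expects."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(report, handle, indent=2)
```

Rejecting an entry that lacks any of the 12 scores catches an incomplete analysis before the file is written, which is the kind of check a success criterion like this implies.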

These rulebooks are now available in Stakpak.

The Results: The Power of Rulebooks

Stakpak Without Rulebooks

| Scenario | Context Utilization | Time | Success Rate |
| --- | --- | --- | --- |
| Monitoring & Alerting with Uptime Kuma | 22099 | 25 Min | 13% |
| End to End LLM Deployment | 84407 | Timeout (30 Min or more) | |
| Coolify Self Hosted PaaS | 61937 | Timeout (30 Min or more) | |
| Deploy TanStack on AWS + Cloudflare CDN | 29995 | 4.27 Min | 100% |
| Twelve Factor Analysis | 34110 | 5.11 Min | 100% |

Stakpak With Rulebooks

| Scenario | Context Utilization | Time | Success Rate |
| --- | --- | --- | --- |
| Monitoring & Alerting with Uptime Kuma | 18951 (14.24% better) | 4.83 Min | 100% |
| End to End LLM Deployment | 34378 (59.27% better) | 7.46 Min | 100% |
| Coolify Self Hosted PaaS | 41105 (33.63% better) | 8.53 Min | 100% |
| Deploy TanStack on AWS + Cloudflare CDN | 17270 (42.42% better) | 3.36 Min | 100% |
| Twelve Factor Analysis | 34218 (0.32% worse) | 4.67 Min | 100% |
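
The "better" and "worse" percentages are the relative change in context utilization between the two runs of each scenario. A minimal sketch of that arithmetic, using the figures reported in the two tables (the function name is illustrative, and a positive result means the rulebook run used less context):

```python
# (context without rulebook, context with rulebook), as reported in the tables
RUNS = {
    "Monitoring & Alerting with Uptime Kuma": (22099, 18951),
    "End to End LLM Deployment": (84407, 34378),
    "Coolify Self Hosted PaaS": (61937, 41105),
    "Deploy TanStack on AWS + Cloudflare CDN": (29995, 17270),
    "Twelve Factor Analysis": (34110, 34218),
}


def context_reduction_pct(without: int, with_rulebook: int) -> float:
    """Relative reduction in context utilization, rounded to two decimals."""
    return round((without - with_rulebook) / without * 100, 2)


if __name__ == "__main__":
    for scenario, (without, with_rulebook) in RUNS.items():
        change = context_reduction_pct(without, with_rulebook)
        verdict = "better" if change >= 0 else "worse"
        print(f"{scenario}: {abs(change)}% {verdict}")
```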

How Rulebooks Changed Everything

In the three scenarios that failed or timed out without a rulebook, success rates jumped from 0–13% to 100%. Every scenario finished faster, four of the five used less context, and Stakpak stopped “figuring things out” and started following how things are actually done.

Why Stakpak Rulebooks Matter

Every organization has tribal knowledge, the things senior engineers "just know" but that aren't documented anywhere reliable:

"On CPU instances, use opt-125m with --enforce-eager"
"Coolify needs Traefik labels for SSL to work"
"The Uptime Kuma UI setup must happen before webhook config"
"For 8GB RAM, avoid models over 2B parameters"

This knowledge typically lives in:

Senior engineers' heads
Scattered Slack conversations
Outdated wiki pages that no one updates

Stakpak rulebooks formalize tribal knowledge into executable procedures.

When that senior engineer is on vacation or leaves the company, Stakpak still knows what to do. When a new team member joins, they inherit decades of operational wisdom through Stakpak's rulebook system.

TLDR

AI agents fail in production not because they’re “dumb,” but because they rely on trial and error and guessing instead of deterministic execution.

We ran five real world DevOps scenarios with Stakpak, once without rulebooks and once with them.

Results:

Success rates jumped from 0–13% to 100% in the scenarios that had failed without rulebooks
Every task finished faster, and four of the five used less context
Agents stopped guessing and started following proven workflows

Why? Rulebooks turn tribal knowledge (the stuff senior engineers “just know”) into executable, repeatable instructions.

With rulebooks, Stakpak doesn’t improvise; it operates the way your team does, every time.

Ready to turn your team’s operational knowledge into something reusable?

Check "How to Write a Rulebook?" to create rulebooks that encode how your team operates, or explore community-contributed Paks for battle-tested, reusable infrastructure patterns.


Last updated 24 days ago
