# Investigate and Clean Up Unused Cloud Resources

## Overview

By the end of this tutorial, you'll know how to use Stakpak to investigate zombie resources in a live AWS production account: identify every detached volume, idle load balancer, orphaned snapshot, and forgotten instance silently accruing charges, apply the right cleanups safely, and validate that production stays healthy throughout. You'll also configure Stakpak [Autopilot](/docs/how-it-works/autopilot.md) to help detect similar resource sprawl automatically in the future.

{% hint style="info" %}
Stakpak is open source, vendor neutral, and works with any model you choose.
{% endhint %}

## Problem

AWS environments naturally accumulate unused resources over time: detached volumes, old snapshots, idle load balancers, unassociated Elastic IPs, and forgotten S3 buckets.

Finding them is easy. Determining whether they're safe to delete is not.

A resource may look unused, but it could still support a production workload, backup process, or undocumented dependency. Safely cleaning up cloud waste requires connecting usage, ownership, and activity data across your environment before taking action.

## How Stakpak Helps

Instead of manually pulling describe-\* calls, CloudWatch metrics, CloudTrail history, and tag attribution across every resource type and stitching the results together by hand, we can ask Stakpak to investigate the account for us.

Stakpak inspects the live AWS environment, correlates each resource with its real usage signals, traces the relationships between them (instance to volume to snapshot to AMI, load balancer to target group to instance, Elastic IP to network interface), and cross-references everything against the production application catalog to determine what actually supports a running workload and what does not.
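The relationship tracing described above can be pictured as a graph walk. The following is a minimal sketch, not Stakpak's implementation: it assumes the relationships have already been collected into a plain dictionary, and every resource identifier and function name in it is hypothetical.

```python
from collections import deque

# Hypothetical dependency edges: each resource maps to the resources it relies on.
# These identifiers are illustrative only, not real Stakpak or AWS output.
DEPENDS_ON = {
    "alb/api-gateway": ["target-group/api-tg"],
    "target-group/api-tg": ["instance/api-node-1"],
    "instance/api-node-1": ["volume/vol-api-root"],
    "volume/vol-api-root": ["snapshot/snap-api-base"],
    "snapshot/snap-api-base": ["ami/api-golden-image"],
    "eip/old-nat": [],
    "volume/vol-orphan": [],
}


def reachable_from(roots, edges):
    """Return every resource reachable from the roots by following edges (BFS)."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for dep in edges.get(current, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def cleanup_candidates(roots, edges):
    """Resources that no production root depends on, directly or indirectly."""
    all_resources = set(edges)
    for deps in edges.values():
        all_resources.update(deps)
    return sorted(all_resources - reachable_from(roots, edges))
```

Calling `cleanup_candidates(["alb/api-gateway"], DEPENDS_ON)` treats the load balancer as a production catalog entry and returns only the resources that nothing in its dependency chain touches. Anything returned is a candidate for further review, not an automatic deletion.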

Stakpak then classifies every resource as safe to delete, needs review, or keep, with the evidence behind each decision: utilization data, ownership signals, last-access timestamps, and any downstream dependencies it found. It proposes a cleanup plan ordered by savings and risk, applies the safe removals once approved, and validates that production stays healthy throughout.
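To make the three-way classification concrete, here is a small sketch of how such a decision could be expressed in code. It is an illustration under stated assumptions, not Stakpak's actual logic: the `Evidence` fields, the `classify` function, and the 90-day idle threshold are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    SAFE_TO_DELETE = "safe to delete"
    NEEDS_REVIEW = "needs review"
    KEEP = "keep"


@dataclass
class Evidence:
    """Signals gathered for one resource. All fields are illustrative."""
    in_catalog: bool             # matches a known production workload
    has_dependents: bool         # another resource still references it
    days_since_last_access: int  # derived from metrics or audit history
    owner_active: bool           # an identifiable owner is still around


def classify(evidence: Evidence, idle_days: int = 90) -> Verdict:
    """Classify a resource. idle_days=90 is an assumed threshold, not a Stakpak default."""
    # Anything tied to production, or still referenced, is kept.
    if evidence.in_catalog or evidence.has_dependents:
        return Verdict.KEEP
    # Recent activity or an active owner means a human should decide.
    if evidence.days_since_last_access < idle_days or evidence.owner_active:
        return Verdict.NEEDS_REVIEW
    # No production link, no dependents, long idle, nobody to ask.
    return Verdict.SAFE_TO_DELETE
```

The ordering of the checks is the design choice worth noting: the conservative outcomes (`KEEP`, then `NEEDS_REVIEW`) are tested first, so a resource is only marked safe to delete when every signal agrees.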

Then, we'll configure Stakpak [Autopilot](/docs/how-it-works/autopilot.md) to continuously monitor the AWS account and help detect resource sprawl automatically in the future.

## Application

Northstar Commerce is a B2B ecommerce platform running on AWS, with workloads spread across EKS, ECS Fargate, Lambda, and Vercel. The main components are:

* storefront: Customer-facing Next.js app on Vercel.
* api-gateway: Public REST and GraphQL edge on EKS.
* orders-service: Order lifecycle, Go on EKS, backed by Aurora PostgreSQL.
* payments-service: Java on ECS Fargate, integrates with Stripe.
* inventory-worker: Celery workers on EKS draining an SQS queue.
* search-indexer: Rust Lambda keeping OpenSearch in sync.
* admin-console: React SPA on S3 behind CloudFront.

Shared infrastructure includes an EKS cluster, an Aurora cluster, an ElastiCache Redis, an MSK cluster, an OpenSearch domain, ECR, Route 53, ACM, and Secrets Manager.

Primary region is us-east-1, with us-west-2 as a disaster recovery region.

Every workload in the catalog is healthy and serving traffic. None of the recent deploys touched infrastructure.

The application itself is well understood and accounted for, but the AWS account it runs in has accumulated years of side projects, migrations, and experiments that nobody has audited. Anything we find outside of this catalog is a candidate for cleanup, as long as we can prove it isn't quietly supporting one of these workloads.

Now that we understand the app and architecture, we can start investigating the account.

## Step-by-Step Guide

### Prerequisites

1. [Install Stakpak](/docs/get-started/install-stakpak.md)
2. Configure your cloud provider credentials (AWS, for this tutorial)

### Troubleshooting

1. Open Stakpak and ask it to `audit our AWS account for unused and zombie resources.`

Now let's let it do its magic.

<figure><img src="/files/l1FShhEjigbpbAf0IFCi" alt=""><figcaption></figcaption></figure>

Stakpak audited the AWS account for unused and zombie resources across compute, network, storage, IAM, data, and operational categories and found a small but real pool of recurring waste with no business value attached to any of it.

<figure><img src="/files/momOMEZ5a4iS7qM5SDy3" alt=""><figcaption></figcaption></figure>

It identified \~$97/month of avoidable spend spread across 15 zombie resources in us-east-1, none tied to any active application. The signals came from EC2 state checks, EBS volume status, ELB target health, CloudWatch metrics for S3 and Lambda, IAM credential reports, and tag/name pattern analysis (names containing `-OLD`, `-DEPRECATED`, `rakesh-test-`, `marketing-campaign-2022`, or `loadtest-runner-2024-q1`).
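The tag and name pattern analysis mentioned above can be approximated with shell-style globs. This is a hedged sketch using Python's standard `fnmatch` module: the glob forms below are an assumed reading of the patterns in the report, and `SUSPECT_PATTERNS` and `matching_pattern` are hypothetical names, not part of Stakpak.

```python
from fnmatch import fnmatchcase

# Assumed glob forms of the name patterns listed in the audit report.
SUSPECT_PATTERNS = [
    "*-OLD",
    "*-DEPRECATED",
    "rakesh-test-*",
    "*marketing-campaign-2022*",
    "loadtest-runner-2024-q1",
]


def matching_pattern(name, patterns=SUSPECT_PATTERNS):
    """Return the first pattern the resource name matches, or None.

    fnmatchcase is used so that matching stays case-sensitive on every OS.
    """
    for pattern in patterns:
        if fnmatchcase(name, pattern):
            return pattern
    return None
```

A name match is only a hint. A resource called `northstar-mysql-data-OLD` is worth a closer look, but it still needs the usage and dependency evidence described earlier before anyone deletes it.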

Then it:

* Terminated the stopped loadtest-runner-2024-q1 EC2 instance, abandoned since the Q1 2024 load test campaign
* Deleted five unattached EBS volumes totaling 371 GB, including a 200 GB elasticsearch-data-node-3 orphan from the search-v1 deprecation and a 100 GB northstar-mysql-data-OLD volume
* Deregistered the northstar-golden-image-v2-DEPRECATED AMI and removed its backing snapshot
* Released four unassociated Elastic IPs, including old-nat-gateway-eip from a decommissioned NAT and jenkins-static-ip from the Jenkins-to-GHA migration
* Deleted two abandoned ALBs (northstar-internal-tools with an empty target group, and a canary ALB with all targets unhealthy) and the northstar-legacy-clb Classic ELB tied to the deprecated checkout-v1 project
* Removed the unused openclaw-sg security group and its orphan openclaw-key key pair
* Emptied and deleted six zombie S3 buckets including northstar-marketing-campaign-2022, rakesh-test-bucket (employee left), tempdata-export, and northstar-checkout-v1-logs
* Cleaned up two empty CloudWatch log groups (/aws/lambda/feedbackboard-server, /aws/lambda/feedbackboard-warmer) left behind by deleted Lambda functions
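For readers who want to see what one of these checks looks like by hand, the sketch below filters unattached EBS volumes out of a response shaped like the EC2 `DescribeVolumes` output (as returned, for example, by boto3's `describe_volumes`). The sample data and the helper name `unattached_volumes` are made up for illustration, and the function only reads data; it deletes nothing.

```python
def unattached_volumes(describe_volumes_response):
    """Return (volume_id, size_gib) pairs for volumes that are not attached.

    A volume counts as unattached when its State is "available" and its
    Attachments list is empty, following the EC2 DescribeVolumes shape.
    """
    result = []
    for volume in describe_volumes_response.get("Volumes", []):
        if volume.get("State") == "available" and not volume.get("Attachments"):
            result.append((volume["VolumeId"], volume["Size"]))
    return result


# Made-up sample in the DescribeVolumes response shape; the IDs are placeholders.
SAMPLE_RESPONSE = {
    "Volumes": [
        {"VolumeId": "vol-0aaa", "State": "available", "Size": 200, "Attachments": []},
        {
            "VolumeId": "vol-0bbb",
            "State": "in-use",
            "Size": 50,
            "Attachments": [{"InstanceId": "i-0123", "State": "attached"}],
        },
        {"VolumeId": "vol-0ccc", "State": "available", "Size": 100, "Attachments": []},
    ]
}

orphans = unattached_volumes(SAMPLE_RESPONSE)
total_gib = sum(size for _, size in orphans)
```

Being unattached is necessary but not sufficient for deletion. A detached volume may still be the only copy of data someone intends to restore, which is why the audit pairs this check with ownership and last-access evidence.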

After the changes were applied, Stakpak verified that:

* All 15 zombie resources are gone
* No surviving production resources (bastion, api-canary, staging-app, prod-invoices bucket) were impacted
* us-west-2 remained clean (only the default VPC, no workloads)
* Projected monthly waste dropped from \~$97/month to $0, a 100% reduction on identified zombies
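The verification above boils down to two set comparisons: nothing that was deleted should still exist, and nothing that was protected should be missing. A minimal sketch, assuming the post-cleanup inventory is available as a list of resource names (the function name `verify_cleanup` and the inventory format are hypothetical):

```python
def verify_cleanup(remaining, deleted, protected):
    """Compare the post-cleanup inventory against expectations.

    remaining: names of resources that exist after the cleanup
    deleted:   names of resources the cleanup was meant to remove
    protected: names of resources that must survive untouched
    """
    remaining_set = set(remaining)
    still_present = sorted(set(deleted) & remaining_set)
    missing_protected = sorted(set(protected) - remaining_set)
    return {
        "still_present": still_present,
        "missing_protected": missing_protected,
        "ok": not still_present and not missing_protected,
    }


# Example using names from this tutorial; the inventory itself is illustrative.
report = verify_cleanup(
    remaining=["bastion", "api-canary", "staging-app", "prod-invoices"],
    deleted=["loadtest-runner-2024-q1", "openclaw-sg", "rakesh-test-bucket"],
    protected=["bastion", "api-canary", "staging-app", "prod-invoices"],
)
```

Existence checks like this only confirm the inventory. They complement, and do not replace, confirming that the workloads themselves are still healthy.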

Now everything is cleaned up 🥳

Stakpak now asks whether we want to set up [Autopilot](/docs/how-it-works/autopilot.md) to keep zombie resources from piling up again.

{% hint style="info" %}
Stakpak Autopilot monitors your apps 24/7, detects unexpected changes, fixes what’s safe, and only alerts you when it actually matters.
{% endhint %}

### Monitoring

1. First, it asks how often we want to run the checks.

<figure><img src="/files/bCpRQ19DIMLxeeOBX22W" alt=""><figcaption></figcaption></figure>

2. Then it asks whether we want Stakpak to take action.

<figure><img src="/files/KcfxZaNF6Ox3RM4oY2Gi" alt=""><figcaption></figcaption></figure>

3. Then it asks where we want to be alerted.

<figure><img src="/files/KNscsX9SXyMxp120mI2J" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/HBLDnjc1l8qMvPPJSrJ7" alt=""><figcaption></figcaption></figure>

4. And that's it.

<figure><img src="/files/ZbM6mRGBQY5aLYW3R46A" alt=""><figcaption></figcaption></figure>

## Extra Resources

### Related Use Cases

* [Containerize a Python App](/docs/tutorial/containerize-a-python-app.md)
* [Load Test to Optimize Cloud Costs](/docs/tutorial/load-test-to-optimize-cloud-costs.md)
* [Fix Kubernetes Apps That Are Running but Not Reachable](/docs/tutorial/fix-kubernetes-apps-that-are-running-but-not-reachable.md)

and more...

### References

* [Install Stakpak](/docs/get-started/install-stakpak.md)
* [Configure Stakpak](/docs/get-started/configure-stakpak.md)
* [Configuration and credential file settings in the AWS CLI](https://docs.aws.amazon.com/cli/v1/userguide/cli-configure-files.html)
* [Autopilot](/docs/how-it-works/autopilot.md)
* [Handling Secrets](/docs/how-it-works/handling-secrets.md)
* [Warden Guardrails](/docs/how-it-works/warden-guardrails.md)

