
# Fix Kubernetes Persistent Volume Issues

## Overview

In this tutorial, we'll use Stakpak to investigate and fix a Kubernetes storage incident where a stateful application cannot start because Kubernetes cannot provide the persistent storage it requests.

Rather than manually inspecting multiple Kubernetes resources and piecing together events across the cluster, we'll use Stakpak to:

* Investigate the incident
* Identify the root cause
* Apply the fix
* Validate that PostgreSQL becomes healthy again

By the end of this tutorial, you'll know how to use Stakpak to troubleshoot persistent volume issues in Kubernetes and how to configure Stakpak [Autopilot](/docs/how-it-works/autopilot.md) to help detect similar storage-related incidents automatically in the future.

{% hint style="info" %}
Stakpak is open source, vendor neutral, and works with any model you choose.
{% endhint %}

## Problem

You deploy a stateful application to Kubernetes, and everything seems fine at first.

* The manifests apply successfully.
* The StatefulSet exists.
* The PersistentVolumeClaim exists.
* The storage configuration looks correct.

But the database Pod never starts.

You check the workload:

```
kubectl get pods -n orders-prod
```

And the Pod is stuck in Pending.

So you start the usual Kubernetes storage debugging loop:

```
kubectl get pods -n orders-prod
kubectl get pvc -n orders-prod
kubectl get pv
kubectl get storageclass
kubectl describe pod -n orders-prod orders-db-postgresql-0
kubectl describe pvc -n orders-prod data-orders-db-postgresql-0
kubectl get events -n orders-prod --sort-by=.lastTimestamp
```

Now you have to figure out what actually matters.

* Is the claim waiting for a volume?
* Is the volume usable by this workload?
* Is the scheduler blocked by storage constraints?
* Is the StorageClass behaving as expected?

Kubernetes gives you the clues, but you still have to connect them.
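As a rough illustration of what "connecting the clues" means in practice, the sketch below narrows the output of the commands above to the claims that are not bound. The helper name `unbound_claims` is hypothetical, and the sketch assumes the default `kubectl get pvc` column layout, where NAME is the first column and STATUS the second.

```shell
# Print the names of PVCs whose STATUS is anything other than Bound.
# Reads `kubectl get pvc --no-headers` output on stdin.
unbound_claims() {
  awk '$2 != "Bound" { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get pvc -n orders-prod --no-headers | unbound_claims
#
# Then narrow the events to the Pod that is stuck:
#   kubectl get events -n orders-prod \
#     --field-selector involvedObject.name=orders-db-postgresql-0
```

Even with this filtering, you still have to work out why a claim is unbound, which is the part the rest of this tutorial hands to Stakpak.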

## How Stakpak Helps

Instead of manually tracing storage issues across Pods, StatefulSets, PVCs, PVs, StorageClasses, nodes, and events, we can ask Stakpak to investigate the cluster for us.

Stakpak inspects the Kubernetes storage path, connects the signals across the cluster, identifies why the workload cannot start, applies the fix, and validates that the database becomes healthy again.

Then, we’ll configure Stakpak [Autopilot](/docs/how-it-works/autopilot.md) to continuously monitor the cluster and help detect similar persistent volume issues automatically in the future.

## Application

The application is a PostgreSQL database running on Kubernetes for the orders platform.

It runs as a StatefulSet because it needs stable identity and persistent storage across restarts. The database Pod uses a PersistentVolumeClaim to request storage, and Kubernetes must successfully satisfy the PersistentVolumeClaim and make the storage available to the Pod before PostgreSQL can start.

The main components are:

* PostgreSQL StatefulSet: Runs the database Pod.
* PersistentVolumeClaim: Requests storage for the database data directory.
* PersistentVolume: Represents the storage available in the cluster.
* StorageClass: Defines how Kubernetes handles the storage request.
* Service: Provides a stable network identity for the database.
* Namespace: Isolates the application resources in orders-prod.

The normal startup flow is: the StatefulSet is created, its controller creates the PVC and the database Pod, Kubernetes binds the PVC to a compatible PV, mounts the volume, and then starts PostgreSQL.

Now that we understand the app, we can start troubleshooting.

## Step-by-Step Guide

### Prerequisites

1. [Install Stakpak](/docs/get-started/install-stakpak.md)
2. Cloud provider credentials configured

### Troubleshooting

1. Open Stakpak and ask it to `investigate the Kubernetes issue`

Now let's let it do its magic.

<figure><img src="/files/l1FShhEjigbpbAf0IFCi" alt=""><figcaption></figcaption></figure>

Stakpak started investigating the PostgreSQL startup failure and traced the issue through the StatefulSet, PVC, PV, StorageClass, scheduler events, and node placement constraints.

It found that the orders-db-postgresql-0 Pod was stuck in Pending because its PersistentVolumeClaim requested 12Gi, but the only matching PersistentVolume provided just 5Gi.
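To see why this mismatch blocks binding, here is a minimal sketch of the capacity check, assuming a PersistentVolume can only satisfy a claim when its capacity is at least the requested size. The helper names `to_bytes` and `pv_fits_claim` are hypothetical, and the conversion only handles whole numbers with the binary suffixes Ki, Mi, Gi, and Ti.

```shell
# Convert a Kubernetes binary quantity such as 5Gi to bytes.
# Values without a recognized suffix are printed unchanged.
to_bytes() {
  local qty
  local num
  qty="$1"
  num="${qty%%[KMGT]i}"
  case "$qty" in
    *Ki) echo $(( num * 1024 )) ;;
    *Mi) echo $(( num * 1024 * 1024 )) ;;
    *Gi) echo $(( num * 1024 * 1024 * 1024 )) ;;
    *Ti) echo $(( num * 1024 * 1024 * 1024 * 1024 )) ;;
    *)   echo "$qty" ;;
  esac
}

# Succeed only when the PV capacity ($1) covers the PVC request ($2).
pv_fits_claim() {
  [ "$(to_bytes "$1")" -ge "$(to_bytes "$2")" ]
}

# Usage (requires cluster access; resource names are from this tutorial):
#   request=$(kubectl get pvc data-orders-db-postgresql-0 -n orders-prod \
#     -o jsonpath='{.spec.resources.requests.storage}')
#   capacity=$(kubectl get pv orders-db-primary-a \
#     -o jsonpath='{.spec.capacity.storage}')
#   pv_fits_claim "$capacity" "$request" || echo "PV is too small for the claim"
```

With the values from this incident, `pv_fits_claim 5Gi 12Gi` fails, which matches what Stakpak reported.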

Then it:

* Fixed the PersistentVolume capacity from 5Gi to 12Gi
* Updated the source template at manifests/20-persistentvolume.yaml.tpl
* Updated the generated manifest at .generated/20-persistentvolume.yaml
* Applied the corrected PersistentVolume manifest
* Restarted the PostgreSQL Pod so Kubernetes could retry scheduling and volume binding
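If you wanted to repeat the last two steps by hand, they might look roughly like the sketch below. This is an assumption about the equivalent manual commands, not a record of what Stakpak ran: the function name `apply_pv_fix` is hypothetical, and deleting the Pod is only one common way to restart a StatefulSet Pod, relying on the StatefulSet controller to recreate it.

```shell
# Apply the corrected PersistentVolume manifest, then restart the Pod.
# The default manifest path is the generated file named in this tutorial.
apply_pv_fix() {
  local manifest
  manifest="${1:-.generated/20-persistentvolume.yaml}"

  # Apply the PersistentVolume with the corrected capacity.
  kubectl apply -f "$manifest" || return 1

  # Delete the Pod so the StatefulSet controller recreates it and
  # Kubernetes retries scheduling and volume binding.
  kubectl delete pod orders-db-postgresql-0 -n orders-prod
}

# Usage (requires cluster access):
#   apply_pv_fix
```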

After the changes were applied, Stakpak verified that:

* The data-orders-db-postgresql-0 PVC successfully bound to orders-db-primary-a
* The PersistentVolume showed the correct 12Gi capacity
* The orders-db-postgresql-0 Pod scheduled onto the worker node
* The PostgreSQL container became Running and Ready
* pg\_isready confirmed that PostgreSQL was accepting connections on port 5432
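The same checks can be repeated manually. The sketch below assumes the resource names used in this tutorial and that `pg_isready` is available inside the database container; the function name `verify_orders_db` and the 120-second timeout are illustrative choices, not values taken from the Stakpak session.

```shell
# Re-check the storage path after a fix: claim bound, Pod ready, database up.
verify_orders_db() {
  local ns
  local pod
  local pvc
  local phase
  ns="orders-prod"
  pod="orders-db-postgresql-0"
  pvc="data-orders-db-postgresql-0"

  # 1. The claim should report phase Bound.
  phase=$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.status.phase}') || return 1
  if [ "$phase" != "Bound" ]; then
    echo "PVC $pvc is $phase, expected Bound" >&2
    return 1
  fi

  # 2. The Pod should become Ready once the volume is mounted.
  kubectl wait --for=condition=Ready "pod/$pod" -n "$ns" --timeout=120s || return 1

  # 3. PostgreSQL should accept connections on port 5432.
  kubectl exec -n "$ns" "$pod" -- pg_isready -p 5432
}

# Usage (requires cluster access):
#   verify_orders_db && echo "orders database is healthy"
```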

Now everything is working 🥳

Let's ask it to set up Stakpak [Autopilot](/docs/how-it-works/autopilot.md) so we avoid waking up at 3 am because of an incident 🤡

{% hint style="info" %}
Stakpak Autopilot monitors your apps 24/7, detects unexpected changes, fixes what’s safe, and only alerts you when it actually matters.
{% endhint %}

### Monitoring

<figure><img src="/files/x2NzDystnkMbRH2N8tX7" alt=""><figcaption></figcaption></figure>

That's it. Now it won't haunt our nightmares at 3 am.

## Extra Resources

### Related Use Cases

* [Containerize a Python App](/docs/tutorial/containerize-a-python-app.md)
* [Fix Kubernetes CrashLoopBackOff in Minutes](/docs/tutorial/fix-kubernetes-crashloopbackoff-in-minutes.md)
* [Fix Kubernetes Apps That Are Running but Not Reachable](/docs/tutorial/fix-kubernetes-apps-that-are-running-but-not-reachable.md)

and more...

### References

* [Install Stakpak](/docs/get-started/install-stakpak.md)
* [Configure Stakpak](/docs/get-started/configure-stakpak.md)
* [Configuration and credential file settings in the AWS CLI](https://docs.aws.amazon.com/cli/v1/userguide/cli-configure-files.html)
* [Autopilot](/docs/how-it-works/autopilot.md)
* [Handling Secrets](/docs/how-it-works/handling-secrets.md)
* [Warden Guardrails](/docs/how-it-works/warden-guardrails.md)

