Fix Kubernetes Apps That Are Running but Not Reachable
Overview
In this tutorial, we’ll use Stakpak to investigate and fix a Kubernetes incident where the application is unreachable even though the cluster looks healthy.
The Pods are running.
The deployments are healthy.
The ingress exists.
But the application is still down.
Instead of manually jumping between kubectl get, kubectl describe, ingress configs, Services, logs, events, and DNS trying to figure out where traffic is failing, we’ll ask Stakpak to investigate the cluster for us.
Stakpak will inspect the environment, connect the signals across Kubernetes networking layers, identify the root cause, apply the fix, and validate that the application is reachable again.
Then, we’ll set up Stakpak Autopilot to continuously monitor the cluster and handle similar issues automatically next time.
Stakpak is open source, vendor neutral, and works with any model you choose.
Problem
You deploy your application to Kubernetes, and everything seems fine.
The workloads are up. The Pods are running. Kubernetes does not show an obvious application crash.
But users cannot access the application.
The issue looks simple at first.
We try to access the application through the ingress:
curl -i -H 'Host: demo.local' http://127.0.0.1/
Instead of a successful response, the request fails.
So you start the usual Kubernetes debugging loop: kubectl get, kubectl describe, logs, and events, one resource at a time.
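A pass through that loop might look like the sketch below. It is only an illustration: the namespace and Deployment name are placeholder arguments, not values taken from this tutorial's cluster.

```shell
# Sketch of the manual debugging loop as one function.
# Usage: debug_loop <namespace> <deployment>   (both are placeholders)
debug_loop() {
  ns="$1"; deploy="$2"
  kubectl get pods -n "$ns" -o wide                      # Pod status, readiness, IPs
  kubectl get svc,endpoints -n "$ns"                     # Services and their backends
  kubectl get ingress -n "$ns"                           # hosts and backend Services
  kubectl describe ingress -n "$ns"                      # routing rules and events
  kubectl logs "deployment/$deploy" -n "$ns" --tail=50   # recent application logs
  kubectl get events -n "$ns" --sort-by=.lastTimestamp   # most recent events last
}
```

Each command returns a separate slice of cluster state, and none of them says which slice explains the failed request.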
And now you have to figure out what actually matters.
The frustrating part is that many Kubernetes failures look similar from the outside. A failed request could be caused by a problem in many different places:
The Ingress
The Ingress controller
A Service
Pod readiness
Application configuration
Internal DNS
Networking
A recent rollout
A stale local or external route
Kubernetes gives you pieces of the picture, but not the full story.
You still have to trace the request path yourself:
Did the request even reach the cluster?
Did the ingress route traffic correctly?
Did the Service point to the right Pods?
Were healthy endpoints attached?
Was the application actually listening on the expected port?
Was the app failing while talking to another internal service?
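As an example of answering just one of these questions by hand, the sketch below checks whether a Service has any ready endpoints and, if not, prints the Service selector next to the Pod labels for comparison. The function name and its arguments are hypothetical helpers for illustration.

```shell
# Report whether a Service currently has any ready endpoints.
# Usage: service_has_endpoints <namespace> <service>
service_has_endpoints() {
  ns="$1"; svc="$2"
  # Ready Pod IPs behind the Service; empty when no ready Pod matches the selector.
  ips="$(kubectl get endpoints "$svc" -n "$ns" -o jsonpath='{.subsets[*].addresses[*].ip}')"
  if [ -n "$ips" ]; then
    echo "$svc has endpoints: $ips"
  else
    echo "$svc has no ready endpoints; compare the selector with the Pod labels:"
    kubectl get svc "$svc" -n "$ns" -o jsonpath='{.spec.selector}'
    echo
    kubectl get pods -n "$ns" --show-labels
    return 1
  fi
}
```

A non-zero return value means the Service has nothing to route to, which narrows the search to the selector, the Pod labels, or Pod readiness.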
How Stakpak Helps
Instead of manually tracing traffic across ingress rules, Services, endpoints, logs, events, and application configs, we can ask Stakpak to investigate the cluster for us.
Stakpak inspects the Kubernetes networking path, connects the signals across the cluster, identifies where traffic is failing, applies the fix, and validates that the application becomes reachable again.
Then, we’ll configure Stakpak Autopilot to continuously monitor the cluster and automatically detect and fix similar networking issues in the future.
Application
The application is a storefront running on Kubernetes. Users access the storefront through an Ingress route, which sends traffic to the frontend service. The frontend is responsible for rendering the web page and retrieving product data from an internal Catalog API. The Catalog API is not exposed directly to users; it runs inside the cluster and provides the product information that appears on the storefront page.
The main components are:
Frontend: Handles user-facing requests and serves the storefront page.
Catalog API: Provides product data to the frontend through an internal Kubernetes service.
Ingress: Exposes the frontend to users through the demo.local hostname.
Kubernetes Services: Route traffic between the Ingress, frontend, and internal API.
Namespaces: Separate the application and supporting platform resources inside the cluster.
The normal request flow is: a user accesses the storefront, the request enters through the Ingress, traffic is routed to the frontend, the frontend calls the Catalog API, and the page is returned with product data.
Now that we understand the app, we can start troubleshooting.
Step-by-Step Guide
Prerequisites
Cloud provider credentials configured
Troubleshooting
Open Stakpak and give it this prompt:
Investigate why the Kubernetes app is not reachable, even though the Pods are running
Now let's let it do its magic.

Stakpak traced the request path across the cluster and identified multiple networking and configuration issues causing the outage.
It found that the storefront-public Service had no healthy endpoints because the selector did not match the running Pods.
Then it:
Fixed the Service selector from storefront-v2 to storefront
Corrected the Service targetPort from 8081 to web
Fixed the frontend API configuration from catalog-api.platform.svc.cluster.local to catalog-api.apps.svc.cluster.local
Applied the updated Kubernetes manifests
Restarted the frontend deployment so it picked up the updated ConfigMap
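The exact commands Stakpak ran are not shown here. A rough manual equivalent of the Service fix and the restart might look like the sketch below. Several details are assumptions: the namespace is passed in as an argument, the selector label key is taken to be app, the port being fixed is taken to be the first entry in spec.ports, and the frontend Deployment is taken to be named storefront. The ConfigMap change is left out because its name and key are not given.

```shell
# Rough manual equivalent of the Service fix and restart described above.
# Assumptions (not from the source): selector label key "app", the port to fix
# is spec.ports[0], and the frontend Deployment is named "storefront".
# Usage: fix_storefront_service <namespace>
fix_storefront_service() {
  ns="$1"
  # JSON Patch "replace" requires both paths to already exist on the Service.
  kubectl patch service storefront-public -n "$ns" --type=json -p '[
    {"op": "replace", "path": "/spec/selector/app", "value": "storefront"},
    {"op": "replace", "path": "/spec/ports/0/targetPort", "value": "web"}
  ]'
  # Run this only after the ConfigMap has been corrected, so the new Pods
  # pick up the fixed Catalog API hostname.
  kubectl rollout restart deployment/storefront -n "$ns"
  kubectl rollout status deployment/storefront -n "$ns" --timeout=120s
}
```

Pointing targetPort at the named port web instead of a number keeps the Service working if the container's port number changes later.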
After the changes were applied, Stakpak verified that:
The storefront-public Service endpoints were created successfully
The Ingress routed traffic to the correct backend Pods
The application returned HTTP 200 OK
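The same checks can be repeated by hand. The sketch below lists the Service endpoints and re-runs the curl request from earlier in this tutorial, treating anything other than HTTP 200 as a failure. The function name and the namespace argument are assumptions for illustration.

```shell
# Verify the fix: list the Service endpoints and confirm the Ingress returns 200.
# Usage: verify_storefront <namespace>
verify_storefront() {
  ns="$1"
  kubectl get endpoints storefront-public -n "$ns"
  # Same request as before, but print only the HTTP status code.
  code="$(curl -s -o /dev/null -w '%{http_code}' -H 'Host: demo.local' http://127.0.0.1/)"
  if [ "$code" = "200" ]; then
    echo "OK: storefront returned HTTP $code"
  else
    echo "FAIL: storefront returned HTTP $code"
    return 1
  fi
}
```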
Now everything is working 🥳
Let's ask it to set up Stakpak Autopilot so we avoid waking up at 3 am because of an incident 🤡
Stakpak Autopilot monitors your apps 24/7, detects unexpected changes, fixes what’s safe, and only alerts you when it actually matters.
Monitoring

That's it. Now it won't haunt our nightmares at 3 am.