How I Modernized OpenShift Infrastructure to Scale an App
🌍 Situation
The Highway Maintenance Contractor Reporting (HMCR) application is a core platform used by contractors across British Columbia to report on road maintenance activities—everything from pothole fixes to rockfall clearing and winter salting.
When I joined the project, the application was functional but fragile.
- The app worked… most of the time.
- The infrastructure? Cobwebbed YAML, manual patches, and tribal knowledge.
- OpenShift deployments were automated via GitHub Actions, but only skin-deep.
- CI/CD existed, but we were far from modern DevOps.
Contractor adoption increased and the volume of maintenance reports skyrocketed. A steady flow of new hires to the project also grew the internal user base. Both drove up the load on the application. Hence:
We needed HMCR to keep up with modern standards and scale
🎯 Task
My task was threefold:
- Stabilize and understand the existing OpenShift-based deployment pipeline—without breaking production.
- Modernize our infrastructure to support scalability, automation, and observability across multiple environments (`dev`, `test`, `uat`, `prod`).
- Scale the application to meet a growing number of users and the additional load they bring.
🔧 Action
🔍 Step 1: Understand the Processes
I started by dissecting how the HMCR app was deployed:
- Traced GitHub Actions workflows that built Docker images and pushed to OpenShift.
- Analyzed the wiring between `DeploymentConfig`, `BuildConfig`, and `ImageStream` objects.
- Explored route configs, internal TLS, and how our build triggers were set up.
The app was running in OpenShift, but it was held together with duct tape.
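To illustrate the legacy wiring: a `BuildConfig` built the image into an `ImageStream`, and an image-change trigger on the `DeploymentConfig` rolled out new pods whenever a new tag landed. A simplified sketch (names are illustrative):

```yaml
# Legacy OpenShift wiring (simplified): the DeploymentConfig redeploys
# automatically whenever the ImageStream tag it watches is updated.
apiVersion: apps.openshift.io/v1
kind: DeploymentConfig
metadata:
  name: hmcr-api
spec:
  replicas: 2
  triggers:
    - type: ImageChange
      imageChangeParams:
        automatic: true
        containerNames:
          - api
        from:
          kind: ImageStreamTag
          name: hmcr-api:latest
```

Implicit, cluster-side triggers like this are exactly the kind of hidden coupling that made the pipeline hard to reason about.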
🔁 Step 2: Migrate from DeploymentConfig to Kubernetes Deployments
OpenShift's `DeploymentConfig` had limitations and was on the deprecation path. We needed better compatibility with Kubernetes-native tooling and Helm charts. So, I:
- Rewrote all deployment manifests to use `apps/v1` `Deployment` objects
- Replaced `BuildConfig` builds with external image builds via GitHub Actions
- Updated GitHub workflows to push directly to OpenShift's internal image registry
This allowed us to ditch OpenShift-specific legacy components and move toward portable, K8s-native deployment practices.
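For illustration, here is roughly what the migrated manifests looked like (a minimal sketch; the namespace, labels, and resource values are representative, not our exact production config):

```yaml
# Kubernetes-native Deployment replacing the old DeploymentConfig.
# The image is built externally in GitHub Actions and pushed to the
# cluster's internal registry; the hmcr-prod namespace is illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hmcr-api
  labels:
    app: hmcr-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hmcr-api
  template:
    metadata:
      labels:
        app: hmcr-api
    spec:
      containers:
        - name: api
          image: image-registry.openshift-image-registry.svc:5000/hmcr-prod/hmcr-api:latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: "1"
              memory: 512Mi
```

One detail that pays off later: the `Utilization` targets in Step 4's HPA are computed against these resource requests, so every container had to declare them.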
📊 Step 3: Monitor and Observe
I added monitoring and alerting layers:
- Integrated Prometheus + Grafana for pod metrics
- Enabled EFK (Elasticsearch-Fluentd-Kibana) stack for application logs
- Added proactive alerts for CPU/memory thresholds and failing pods
I also cleaned up old image streams and automated stale-object pruning via `oc` and GitHub Actions workflows.
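As a rough sketch, the proactive alerts were `PrometheusRule` objects along these lines (thresholds, alert names, and the namespace are illustrative, not our exact rules):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hmcr-alerts
spec:
  groups:
    - name: hmcr.rules
      rules:
        # Fires when a container keeps restarting (likely crash loop).
        - alert: HmcrPodRestarting
          expr: increase(kube_pod_container_status_restarts_total{namespace="hmcr-prod"}[15m]) > 3
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} is restarting repeatedly"
        # Fires when a container's working set nears its memory limit.
        - alert: HmcrHighMemory
          expr: |
            max by (namespace, pod, container) (container_memory_working_set_bytes{namespace="hmcr-prod", container!="", container!="POD"})
              /
            max by (namespace, pod, container) (kube_pod_container_resource_limits{namespace="hmcr-prod", resource="memory"})
              > 0.9
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Container {{ $labels.container }} is above 90% of its memory limit"
```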
⚖️ Step 4: Implement Autoscaling with HPA
With growing traffic and more contractors using HMCR, I set up a Horizontal Pod Autoscaler (HPA) for our API and background process servers. This was the initial setup:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hmcr-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hmcr-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
```
Here, `http_requests_per_second` is a custom metric from the Prometheus Adapter.
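On the adapter side, a rule like the following (a sketch, assuming the app exposes a counter named `http_requests_total` with `namespace` and `pod` labels) turns the raw counter into the per-pod rate the HPA consumes:

```yaml
# Prometheus Adapter rule config (sketch). Derives http_requests_per_second
# from the http_requests_total counter as a 2-minute per-pod rate.
rules:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^http_requests_total$"
      as: "http_requests_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```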
HPA automatically scaled the pods from 2 up to 8 within seconds during peak usage (e.g., after storms or annual reporting periods), keeping per-pod load stable and scaling back down during quiet hours.
In the interest of reading time, I'll skip the part where I set up session affinity and ensured application statelessness with Redis.
📦 Step 5: Helm All The Things
Manually managing templates across multiple environments was error-prone and time-consuming. An OpenShift application is an interconnected web of interdependent Kubernetes objects, and Helm solves this brilliantly by letting those objects ship together as a single package. I created Helm charts to:
- Template our deployments
- Inject environment-specific values via `values.yaml`
- DRY up configuration for routes, secrets, and storage
Now we could roll out changes across `dev`, `test`, and `prod` using a single, repeatable structure, with a single command:
```bash
helm upgrade --install hmcr-api ./charts/hmcr-api -f hmcr-prod.yaml
```
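The per-environment values files stayed small. A representative sketch (keys and values here are hypothetical, not our actual chart interface):

```yaml
# hmcr-prod.yaml: environment overrides layered onto the chart defaults.
replicaCount: 3
image:
  repository: image-registry.openshift-image-registry.svc:5000/hmcr-prod/hmcr-api
  tag: "stable"            # illustrative tag
route:
  host: hmcr.example.com   # hypothetical hostname
resources:
  requests:
    cpu: 500m
    memory: 512Mi
```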
🐘 Step 6: Automate PostgreSQL Upgrades
I created an automated, namespace-scoped PostgreSQL upgrade script that:
- Backed up existing PVCs
- Restored data post-upgrade
- Provisioned the necessary objects
We now had safe, repeatable, zero-downtime upgrades across all environments.
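In spirit, the flow looked like the condensed sketch below. The real script worked at the PVC level and orchestrated the cutover to avoid downtime; this version uses a logical dump/restore to stay short, and names like the `postgres` labels, the `hmcr` database, and the `hmcr-db` chart are hypothetical:

```bash
#!/usr/bin/env bash
set -euo pipefail
NS="$1"   # target namespace, e.g. hmcr-dev

# 1. Take a logical backup of the running database.
POD=$(oc -n "$NS" get pods -l app=postgres -o jsonpath='{.items[0].metadata.name}')
oc -n "$NS" exec "$POD" -- pg_dump -U postgres -Fc hmcr > "hmcr-${NS}.dump"

# 2. Stop the old version and roll out the new one (fresh PVC and
#    supporting objects are provisioned by the Helm chart).
oc -n "$NS" scale deployment/postgres --replicas=0
helm upgrade --install hmcr-db ./charts/hmcr-db -n "$NS" --set postgres.version=15

# 3. Wait for the upgraded instance, then restore the data into it.
oc -n "$NS" rollout status deployment/postgres
POD=$(oc -n "$NS" get pods -l app=postgres -o jsonpath='{.items[0].metadata.name}')
oc -n "$NS" exec -i "$POD" -- pg_restore -U postgres -d hmcr --clean --if-exists < "hmcr-${NS}.dump"
```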
✅ Result
By the end of this effort:
- HMCR now reliably scales to meet the growing number of users and the load they generate.
- We eliminated config drift between environments and packaged interdependent K8s objects with Helm.
- PostgreSQL upgrades were hands-off and namespace-safe.
- Developers could focus on features, not firefighting YAML bugs.