Dify on OpenShift: Debugging and Keycloak SSO

A Dify deployment for restricted OpenShift: Keycloak SSO at the route, FIPS-compatible dependencies, GitOps startup order, and two-layer debugging across Langfuse traces and Prometheus metrics.

Operators get the same review, observability, and deploy path they already expect from the rest of the platform.

> Restricted OpenShift, FIPS crypto, two observability layers, sync-wave deploys.

0 public registries · FIPS crypto, no exceptions · 2 observability layers

Usage numbers stay inside the fence by design. What's public is the pattern and the constraints it survives.

// what it does

A multi-tenant marketplace where teams build chatbots, agents, and RAG pipelines without writing code: drag and drop components, connect to your data, and deploy.

A visual workflow builder plus a chat interface, running in a self-managed environment with enterprise auth, observability, and self-hosted models.

// what users can build

Chat Apps

Conversational AI with custom system prompts and personas

Workflows

Visual multi-step pipelines with code execution nodes

RAG Apps

Knowledge base Q&A with pgvector semantic search

Agent Apps

Agents that call tools and take multi-step actions under the same SSO and trace path.
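RAG apps rank knowledge-base chunks by vector distance. The sketch below shows, in plain Python, what ordering by pgvector's cosine-distance operator `<=>` computes. It assumes embeddings already exist and uses a hypothetical in-memory corpus in place of the database. In SQL the same ranking reads roughly `ORDER BY embedding <=> :query LIMIT k`, where `embedding` is a hypothetical column name.

```python
import math

# Hypothetical in-memory corpus entry: (doc_id, embedding). Real embeddings come
# from the configured embedding model and have hundreds of dimensions.
Doc = tuple[str, list[float]]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: the quantity pgvector's <=> operator orders by."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: list[float], docs: list[Doc], k: int = 3) -> list[str]:
    """Return the k closest doc ids, like ORDER BY embedding <=> query LIMIT k."""
    ranked = sorted(docs, key=lambda d: cosine_distance(query, d[1]))
    return [doc_id for doc_id, _ in ranked[:k]]
```

pgvector also provides `<->` (L2 distance) and `<#>` (negative inner product). An index is built for one operator class, so the query has to use the matching operator to benefit from it.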

// Dify with Keycloak SSO on OpenShift

OpenID Connect identity, restricted-network deployment, persistence, and observability around the upstream application.

Keycloak is the OpenID Connect provider and holds the client registration. OAuth2-Proxy sits behind the OpenShift route and authenticates traffic before it reaches Dify. The internal certificate chain, proxy configuration, and application routes ship together, so SSO is a property of the deployment rather than a manual step after install.
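Here is a minimal sketch of the proxy side of that arrangement, written as Python that assembles OAuth2-Proxy command-line flags. The hostnames, realm, client ID, upstream service, and CA path are hypothetical placeholders. The issuer path assumes a Keycloak release that serves realms under `/realms/`; older releases used `/auth/realms/`. The flags themselves are standard OAuth2-Proxy options.

```python
# Hypothetical values: realm, hostnames, and CA path are placeholders.
KEYCLOAK_HOST = "keycloak.apps.example.internal"
REALM = "platform"
ROUTE_HOST = "dify.apps.example.internal"


def oauth2_proxy_args(client_id: str, upstream: str, ca_file: str) -> list[str]:
    """Build OAuth2-Proxy flags for a Keycloak OIDC provider in front of Dify.

    Secrets are expected in OAUTH2_PROXY_CLIENT_SECRET and
    OAUTH2_PROXY_COOKIE_SECRET (mounted from a Kubernetes Secret), not in argv.
    """
    issuer = f"https://{KEYCLOAK_HOST}/realms/{REALM}"
    return [
        "--provider=keycloak-oidc",
        f"--oidc-issuer-url={issuer}",
        f"--client-id={client_id}",
        # Must match a redirect URI registered on the Keycloak client.
        f"--redirect-url=https://{ROUTE_HOST}/oauth2/callback",
        # Authenticated traffic is forwarded here (hypothetical Dify service).
        f"--upstream={upstream}",
        # Trust the internal CA when OAuth2-Proxy talks to Keycloak.
        f"--provider-ca-file={ca_file}",
        "--email-domain=*",
        "--http-address=0.0.0.0:4180",
    ]
```

With this split, `--redirect-url` is the value that has to agree with the Keycloak client registration, and `--provider-ca-file` is where the internal certificate chain enters the proxy.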

Standard Dify	This Fork
Basic API keys	Keycloak SSO + OAuth2
Optional PostgreSQL	EDB + pgvector (HA)
No LLM observability	Langfuse (built-in)
Docker Compose	Helm + ArgoCD + sync-waves
Public SaaS	Restricted OpenShift

// Dify authentication request path

Identity is resolved before the request enters the application tier.

 Route → OAuth2-Proxy → Keycloak (SSO)
    ↓
 Dify API + Web  ──trace───▶  Langfuse
    │            ──metric──▶  Prometheus / Grafana
    ↓
 EDB Postgres + pgvector · Redis · ClickHouse

// Dify debugging: trace the model and the platform

A Dify request can fail in the identity, application, model, or infrastructure layer. The debugging path has to cross those boundaries.

Start at the edge. Confirm that the request passed the OpenShift route and that the OAuth2-Proxy and Keycloak exchange completed. Then check whether Dify accepted the request. Langfuse shows the prompt, model call, latency, and response path. Prometheus and Grafana show pod health, HTTP behavior, and database pressure over the same time window.

A model trace with healthy infrastructure points toward prompt, tool, or provider behavior. A missing trace with proxy or pod errors points toward the platform path. Each layer rules out a different class of failure.
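The two rules above can be written as a small decision function. This is an illustrative sketch, not part of Dify or Langfuse. The mixed cases (a trace alongside an unhealthy platform, or no trace on a healthy one) are an assumption added here, because neither layer alone explains them.

```python
from enum import Enum


class Suspect(Enum):
    MODEL = "prompt, tool, or provider behavior"
    PLATFORM = "route, proxy, pod, or database path"
    BOTH = "inspect both layers in the same time window"


def triage(trace_present: bool, platform_healthy: bool) -> Suspect:
    """Map the two observability layers to the failure class worth checking first."""
    if trace_present and platform_healthy:
        # The request reached the model and the cluster looks fine.
        return Suspect.MODEL
    if not trace_present and not platform_healthy:
        # No trace plus proxy/pod errors: the request likely never reached the model path.
        return Suspect.PLATFORM
    # Mixed signals (assumption): neither layer rules the other out.
    return Suspect.BOTH
```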

Check	Healthy signal	If it fails
Route + OAuth2-Proxy	One redirect to Keycloak, then an authenticated session	Check route TLS, proxy logs, and the registered redirect URI
Keycloak	Authorization-code exchange succeeds and the proxy sets a session cookie	Compare client ID, secret, issuer URL, and internal CA trust
Dify API + Web	The request reaches service endpoints and returns the expected application status	Inspect service endpoints, readiness, and API/Web pod logs
Langfuse	The request has a trace with a model span, latency, and response	Check Dify instrumentation and the configured model provider
Prometheus + Grafana	Pods are ready, restarts are stable, and database pressure is normal	Follow the unhealthy pod, HTTP error, or database metric
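The table reads top to bottom because each layer depends on the one before it. If Keycloak fails, every downstream check is meaningless. A hypothetical helper that walks the checks in that order and stops at the first failure might look like this. The probe callables stand in for whatever real checks (HTTP requests, `oc` queries, Langfuse lookups) an operator wires in.

```python
from typing import Callable, Optional

# (layer, probe, what to check if the probe fails). The probes are hypothetical
# stand-ins; the remediation text would come from the table's last column.
Check = tuple[str, Callable[[], bool], str]


def first_failure(checks: list[Check]) -> Optional[tuple[str, str]]:
    """Walk checks edge-first and return (layer, remediation) for the first failure."""
    for layer, probe, if_fails in checks:
        if not probe():
            return layer, if_fails
    # Every layer passed.
    return None
```

Fill `checks` with the table's rows in order, each paired with a real probe. The returned string is the matching "If it fails" entry, so the on-call note and the table stay in sync.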

Layer 1: Langfuse

LLM-specific observability

• Token usage per request

• Cost estimation

• Full request/response traces

• User feedback tracking
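Cost estimation multiplies token usage by a per-token price, counted separately for input and output tokens. Langfuse derives this from its model price definitions. The sketch below only shows the arithmetic, and the prices in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class Price:
    """Hypothetical per-million-token prices; set real values per model."""
    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_tokens: int, price: Price) -> float:
    """Cost of one request from its token usage, priced separately for input and output."""
    return (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    ) / 1_000_000
```

For example, 1,200 input tokens and 300 output tokens at hypothetical prices of 0.50 and 1.50 per million come to (600 + 450) / 1,000,000 = 0.00105.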

Layer 2: Prometheus

Infrastructure observability

• CPU / Memory usage

• Pod health metrics

• HTTP metrics (via OTEL)

• Database connections
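The infrastructure layer can be queried directly through the Prometheus HTTP API (`/api/v1/query`). This minimal sketch assumes kube-state-metrics is scraped and Dify runs in a namespace called `dify`. The Prometheus URL and namespace are hypothetical. The query lists pods whose restart counter increased over the last 15 minutes.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint and namespace. The restart counter comes from kube-state-metrics.
PROMETHEUS = "http://prometheus.example.internal:9090"
RESTARTS = 'increase(kube_pod_container_status_restarts_total{namespace="dify"}[15m])'


def query_url(promql: str) -> str:
    """Instant-query URL for the Prometheus HTTP API."""
    return f"{PROMETHEUS}/api/v1/query?{urlencode({'query': promql})}"


def restarting_pods(response: dict, threshold: float = 0.0) -> list[str]:
    """Pod names whose sample value exceeds the threshold in an instant-vector result."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    return sorted(
        sample["metric"].get("pod", "?")
        for sample in response["data"]["result"]
        if float(sample["value"][1]) > threshold
    )


def fetch(promql: str) -> dict:
    """Run an instant query and return the decoded JSON body."""
    with urlopen(query_url(promql), timeout=10) as resp:
        return json.load(resp)
```

`restarting_pods(fetch(RESTARTS))` returns the pods to follow into Grafana and pod logs. The same pattern fits HTTP error-rate or database-connection queries once the metric names for the exporters in use are known.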

> Trace the model path and the platform path before changing either one.

// sync-wave deployment

ArgoCD sync-waves orchestrate startup order so dependencies are ready before applications arrive.

Wave -1: Secrets, ConfigMaps, internal CA

Wave 0: Keycloak OpenIDClient

Wave 1: EDB PostgreSQL Cluster

Wave 2: Redis StatefulSet

Wave 3: All Application Pods + Routes
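Argo CD reads the wave from the `argocd.argoproj.io/sync-wave` annotation and applies lower waves first. It waits for a wave's resources to report healthy before moving on. The sketch below reproduces only the grouping step, so a chart's ordering can be checked before it reaches the cluster. The resource names in the example are hypothetical.

```python
from collections import defaultdict

WAVE_ANNOTATION = "argocd.argoproj.io/sync-wave"


def sync_waves(manifests: list[dict]) -> list[tuple[int, list[str]]]:
    """Group manifests by sync-wave annotation, lowest wave first.

    A missing annotation counts as wave 0. Argo CD also orders resources
    within a wave (by kind, then name); that step is not modeled here.
    """
    waves: dict[int, list[str]] = defaultdict(list)
    for m in manifests:
        annotations = m.get("metadata", {}).get("annotations", {})
        wave = int(annotations.get(WAVE_ANNOTATION, "0"))
        waves[wave].append(f"{m['kind']}/{m['metadata']['name']}")
    return sorted(waves.items())
```

Take a Secret annotated `-1`, a Redis StatefulSet at `2`, and an unannotated ConfigMap: they come back as waves -1, 0, and 2, in that order. That makes it easy to spot a resource that silently fell into wave 0.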

// the proof pattern

First workload: a self-hosted assistant over the engineering corpus, so people can ask the platform what it already knows.

Documentation, patterns, and decisions become searchable through conversation. Knowledge stays inspectable, reusable, and checkable against the source.

// why this is hard

Disconnected dependencies. Every container image, chart, and dependency has to be mirrored, vendored, and reviewed.
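Mirroring usually means rewriting every public image reference to point at an internal registry. This minimal sketch assumes a hypothetical Harbor project at `harbor.example.internal/mirror` and a layout that keeps the source registry as a path prefix. The defaulting rules follow Docker's reference normalization: no registry means `docker.io`, and single-name images live under `library/`.

```python
MIRROR = "harbor.example.internal/mirror"  # hypothetical Harbor project


def normalize(ref: str) -> tuple[str, str]:
    """Split an image reference into (registry, path) using Docker's defaulting rules."""
    first, sep, rest = ref.partition("/")
    # The first component is a registry only if it looks like a host.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest
    # No explicit registry: Docker Hub, with "library/" for official images.
    path = ref if sep else f"library/{ref}"
    return "docker.io", path


def mirrored(ref: str) -> str:
    """Rewrite a public image reference to the internal mirror."""
    registry, path = normalize(ref)
    return f"{MIRROR}/{registry}/{path}"
```

`mirrored("redis:7")` yields `harbor.example.internal/mirror/docker.io/library/redis:7`. Chart values can then be rewritten mechanically and reviewed as a diff.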

FIPS cryptography. TLS and hashing have to go through the OpenSSL FIPS module, so libraries that bundle their own crypto or call non-approved algorithms break. Most open-source projects assume unrestricted crypto, and Dify was no exception.
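Here is one common example of the problem. Dify's backend is Python, and on a FIPS-mode host, OpenSSL rejects MD5 and other non-approved digests requested for security purposes. For checksums and cache keys, Python 3.9+ lets the caller declare non-security use with `usedforsecurity=False`. This is an illustrative sketch, not necessarily a change this fork made, and `cache_key` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def fips_enabled() -> bool:
    """True if the Linux kernel reports FIPS mode via /proc/sys/crypto/fips_enabled."""
    try:
        return Path("/proc/sys/crypto/fips_enabled").read_text().strip() == "1"
    except OSError:
        # File absent (non-Linux or no crypto sysctl): treat as not in FIPS mode.
        return False


def cache_key(data: bytes) -> str:
    """Non-security checksum that stays usable when MD5 is blocked for security use.

    usedforsecurity=False (Python 3.9+) tells a FIPS-restricted OpenSSL build
    that this digest does not protect anything.
    """
    return hashlib.md5(data, usedforsecurity=False).hexdigest()
```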

OpenShift security context. No root. Arbitrary UIDs. Read-only filesystems. SecurityContextConstraints that block most Docker Hub images out of the box.
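The constraints above map to specific Kubernetes `securityContext` fields. A hypothetical pre-flight lint that flags containers conflicting with them might look like this. It is only a sketch: the SCC admission controller remains the authority and may fill in some fields, such as the UID, itself.

```python
from typing import Optional


def scc_violations(container: dict, pod_sc: Optional[dict] = None) -> list[str]:
    """Flag container settings that conflict with the constraints listed above.

    Container-level securityContext fields override pod-level ones, as in Kubernetes.
    """
    sc = {**(pod_sc or {}), **container.get("securityContext", {})}
    problems = []
    if sc.get("runAsUser") == 0:
        problems.append("runs as UID 0; let OpenShift assign an arbitrary UID")
    if sc.get("privileged"):
        problems.append("privileged container")
    if sc.get("allowPrivilegeEscalation", True):
        problems.append("allowPrivilegeEscalation is not false")
    if sc.get("runAsNonRoot") is not True:
        problems.append("runAsNonRoot is not set")
    if "ALL" not in sc.get("capabilities", {}).get("drop", []):
        problems.append("capabilities do not drop ALL")
    if not sc.get("readOnlyRootFilesystem"):
        problems.append("root filesystem is writable")
    return problems
```

Running this over rendered chart output catches most "works on Docker, fails on OpenShift" images before they hit admission.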

Enterprise SSO. Identity, OAuth2 proxying, and certificate chains have to be treated as first-class platform work.
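Much of the certificate-chain work surfaces when OAuth2-Proxy first contacts Keycloak. This minimal check assumes a hypothetical issuer URL and CA bundle path. It fetches the OpenID Connect discovery document while trusting only the internal CA. It then confirms that the advertised `issuer` matches the configured one exactly, as the OIDC Discovery specification requires.

```python
import json
import ssl
from urllib.request import urlopen


def discovery_problems(expected_issuer: str, doc: dict) -> list[str]:
    """Compare an OIDC discovery document with the issuer the proxy is configured for."""
    problems = []
    # OIDC Discovery requires an exact match between configured and advertised issuer.
    if doc.get("issuer") != expected_issuer:
        problems.append(f"issuer mismatch: {doc.get('issuer')!r} != {expected_issuer!r}")
    for key in ("authorization_endpoint", "token_endpoint", "jwks_uri"):
        url = doc.get(key, "")
        if not url.startswith("https://"):
            problems.append(f"{key} is missing or not HTTPS: {url!r}")
    return problems


def fetch_discovery(issuer: str, ca_file: str) -> dict:
    """Fetch /.well-known/openid-configuration, trusting only the given CA bundle."""
    ctx = ssl.create_default_context(cafile=ca_file)
    url = issuer.rstrip("/") + "/.well-known/openid-configuration"
    with urlopen(url, context=ctx, timeout=10) as resp:
        return json.load(resp)
```

A TLS verification error from `fetch_discovery` points at CA trust. A non-empty list from `discovery_problems` points at the issuer URL itself, such as a trailing slash or a different realm path.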

// stack

Dify · OpenShift · Langfuse · EDB PostgreSQL · pgvector · Keycloak · OAuth2-Proxy · Helm · ArgoCD · OpenTelemetry · Grafana · ClickHouse · Redis · Harbor

// the point

Two-layer observability shows what the model did and what the cluster is doing. Consistency under those constraints is the next problem.

> First you can see the failure. Then you can gate the next one.