/proof/dify-platform
// operable AI applications under constraint
This is a multi-tenant AI application platform adapted for restricted OpenShift, complete with two-layer observability that lets you debug AI failures the same way you'd debug infrastructure failures. Traces. Metrics. Logs. Platform reliability patterns applied to AI delivery.
The platform demonstrates how AI agents can be operated with the same review, observability, and deployment discipline expected from infrastructure in high-consequence environments.
> Fast delivery. Restricted environment. Observable by default.
// what it does
A multi-tenant marketplace where teams can build chatbots, agents, and RAG pipelines without writing code: drag and drop components, connect to your data, and deploy.
Think visual workflow builder plus chat interface, but running in a self-managed environment with enterprise auth, observability, and self-hosted models.
// what users can build
Conversational AI with custom system prompts and personas
Visual multi-step pipelines with code execution nodes
Knowledge base Q&A with pgvector semantic search
Autonomous decision-making with tool orchestration
// fork enhancements
This is the reliability work around Dify: hardened deployment, identity, persistence, and observability for restricted OpenShift.
| Standard Dify | This Fork |
|---|---|
| Basic API keys | Keycloak SSO + OAuth2 |
| Optional PostgreSQL | EDB + pgvector (HA) |
| No LLM observability | Langfuse (built-in) |
| Docker Compose | Helm + ArgoCD + sync-waves |
| Public SaaS | Restricted OpenShift |
// the key to AI reliability
The key insight: AI needs two observability layers. Infrastructure metrics tell you the system is healthy. LLM traces tell you the AI is working correctly. Most platforms give you one or the other. You need both.
LLM-specific observability
- Token usage per request
- Cost estimation
- Full request/response traces
- User feedback tracking
Infrastructure observability
- CPU/Memory usage
- Pod health metrics
- HTTP metrics (via OTEL)
- Database connections
> This pattern applies to any LLM app. Without both layers, you're flying blind.
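A minimal sketch of the kind of record the LLM-specific layer keeps per request. On this platform Langfuse fills that role; the code below is not Langfuse's or Dify's API. `LLMTrace`, `PRICE_PER_1K`, and the prices themselves are hypothetical, chosen only to show data that infrastructure metrics do not carry.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical prices in USD per 1,000 tokens; real values depend on the model.
PRICE_PER_1K = {"prompt": 0.0005, "completion": 0.0015}


@dataclass
class LLMTrace:
    """One request/response record for the LLM observability layer (illustrative)."""
    request_id: str
    prompt: str
    response: str
    prompt_tokens: int
    completion_tokens: int
    user_feedback: Optional[int] = None  # e.g. +1 / -1, attached after the fact

    @property
    def total_tokens(self) -> int:
        """Token usage per request."""
        return self.prompt_tokens + self.completion_tokens

    def estimate_cost(self) -> float:
        """Estimate cost from token counts and the assumed price table."""
        return (
            self.prompt_tokens / 1000 * PRICE_PER_1K["prompt"]
            + self.completion_tokens / 1000 * PRICE_PER_1K["completion"]
        )


trace = LLMTrace(
    request_id="req-1",
    prompt="What is a sync-wave?",
    response="An ordering hint for ArgoCD.",
    prompt_tokens=2000,
    completion_tokens=1000,
)
trace.user_feedback = 1  # user marked the answer as helpful
```

A pod can report healthy CPU, memory, and HTTP status while a record like this shows rising token counts or negative feedback, which is why the two layers are kept separate.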
// sync-wave deployment
ArgoCD sync-waves orchestrate startup order so dependencies are ready before applications arrive.
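The ordering can be sketched as follows. The annotation key `argocd.argoproj.io/sync-wave` is the real one Argo CD reads, and lower waves are applied first. The component names and wave numbers are assumptions for illustration, not the platform's actual manifests, and `resource` and `plan` are hypothetical helpers.

```python
from itertools import groupby
from typing import Optional

SYNC_WAVE = "argocd.argoproj.io/sync-wave"  # annotation key read by Argo CD


def wave_of(manifest: dict) -> int:
    """Return the manifest's sync-wave; a missing annotation counts as wave 0."""
    annotations = manifest.get("metadata", {}).get("annotations", {})
    return int(annotations.get(SYNC_WAVE, "0"))


def plan(manifests: list) -> list:
    """Group resource names by wave, lowest wave first."""
    ordered = sorted(manifests, key=wave_of)  # stable sort keeps input order within a wave
    return [
        (wave, [m["metadata"]["name"] for m in group])
        for wave, group in groupby(ordered, key=wave_of)
    ]


def resource(name: str, wave: Optional[int] = None) -> dict:
    """Build a minimal manifest stub (hypothetical helper for this sketch)."""
    annotations = {} if wave is None else {SYNC_WAVE: str(wave)}
    return {"metadata": {"name": name, "annotations": annotations}}


# Assumed wave assignment: database first, then identity and tracing, then the app.
manifests = [
    resource("dify-api", 2),
    resource("keycloak", 1),
    resource("postgres"),  # no annotation, so wave 0
    resource("langfuse", 1),
]

for wave, names in plan(manifests):
    print(f"wave {wave}: {', '.join(names)}")
```

Argo CD waits for the resources in one wave to be healthy before moving to the next, so the database is up before anything tries to connect to it.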
// the dogfood
The MVP is a self-hosted assistant connected to a curated engineering corpus. Documentation, patterns, examples, and decisions become searchable through conversation.
The goal is practical: make platform and AI-delivery knowledge easier to inspect, reuse, and validate.
> The platform is useful only when the knowledge becomes reachable.
// why this is hard
No public registries. Every container image, chart, and dependency has to be mirrored, vendored, and reviewed.
FIPS cryptography. Standard TLS libraries don't work. OpenSSL FIPS module required. Most open source projects assume you have normal crypto, and Dify was no exception.
OpenShift security context. No root. Arbitrary UIDs. Read-only filesystems. SecurityContextConstraints that block 90% of Docker Hub images out of the box.
Enterprise SSO. Identity, OAuth2 proxying, and certificate chains have to be treated as first-class platform work.
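To make the OpenShift security context constraints above concrete, here is a simplified lint over a container spec. It checks only the constraints named in this section (no root, arbitrary UIDs, read-only filesystems) and is not a reimplementation of OpenShift's SCC admission logic. The field names are standard Kubernetes `securityContext` fields; `scc_violations` is a hypothetical helper.

```python
def scc_violations(container: dict) -> list:
    """List reasons a container spec would likely be rejected in this environment."""
    sc = container.get("securityContext", {})
    problems = []
    if sc.get("runAsUser") == 0:
        problems.append("runs as root (runAsUser: 0)")
    elif "runAsUser" in sc:
        # OpenShift assigns a UID from the namespace's range at admission time.
        problems.append("pins a fixed UID instead of accepting an arbitrary one")
    if sc.get("privileged", False):
        problems.append("privileged container")
    if sc.get("allowPrivilegeEscalation", True):
        problems.append("allowPrivilegeEscalation is not disabled")
    if not sc.get("readOnlyRootFilesystem", False):
        problems.append("root filesystem is writable")
    return problems


# A typical image default: runs as root and writes to its own filesystem.
typical = {"name": "app", "securityContext": {"runAsUser": 0}}

# A spec adapted for the restricted environment.
adapted = {
    "name": "app",
    "securityContext": {
        "runAsNonRoot": True,
        "allowPrivilegeEscalation": False,
        "readOnlyRootFilesystem": True,
    },
}

print(scc_violations(typical))
print(scc_violations(adapted))
```

Images that pass a check like this still need writable paths moved to mounted volumes, which is part of the adaptation work described above.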
// stack
This is not a demo app. It is observable AI infrastructure: the foundation you need before reliability is possible.
Two-layer observability gets you visibility. You can finally see what AI is doing. Making it do the right thing consistently, that's the next problem.
> Step one: make it observable. Step two: make it reliable.