LLM Integration
The AI Janitor uses Large Language Models (LLMs) to safely analyze and remove stale feature flags. This page explains how the analysis works, how we ensure correctness, and how to configure it for your compliance requirements.
Architecture
┌──────────────────────────────────────────────────────────────────┐
│ Analysis Pipeline │
│ │
│ Stale Flag → Provider Selector → Compliance Check → LLM Call │
│ │ │ │
│ ▼ ▼ │
│ Org Policy? Data Redaction? │
│ Budget Left? Audit Logging? │
│ Region Match? Budget Tracking? │
│ │
│ LLM Response → Validation → PR Generation │
│ │ │
│ ▼ │
│ Confidence ≥ 0.85? → Auto-PR │
│ Confidence < 0.85? → Manual Review Needed │
└──────────────────────────────────────────────────────────────────┘
How Analysis Works
1. Multi-File Code Understanding
When analyzing a stale flag, the LLM receives:
- The flag key and metadata (days served, percentage)
- ALL files containing references to the flag
- The full content of each file (context window)
This allows the LLM to understand:
- How the flag is used across the codebase
- Whether the check is in an
if/else,if-only,ternary, orswitch - Which branch to preserve (true branch for 100% true flags)
- Side effects and dependencies
2. Confidence Scoring
Each analysis returns a confidence score (0.0–1.0):
| Score | Meaning | Action |
|---|---|---|
| 0.95–1.0 | Very high confidence | Auto-PR (if enabled) |
| 0.85–0.94 | High confidence | Generate PR, manual review |
| 0.70–0.84 | Medium confidence | Flag for review, manual PR |
| < 0.70 | Low confidence | Don't generate PR, explain why |
3. Validation Step
Before creating a PR, the LLM validates:
- Original code (with flag) vs. cleaned code (without flag)
- Semantic equivalence check
- Reports any issues found
If validation fails, the PR is NOT created and detailed errors are shown.
Provider Selection
How the Provider is Chosen
- Org compliance policy — Check if LLM is allowed
- Per-org provider config — Use the org's approved provider
- Environment variable — Global default from server config
- Hard default — DeepSeek (most cost-effective)
What Happens When a Provider is Unavailable
The system degrades gracefully:
- Retry — Automatically retries 3 times with exponential backoff
- Fallback provider — Try next approved provider
- Regex fallback — Use regex-based analysis (with warning)
- Skip — Flag is marked as "needs manual analysis"
Compliance & Data Privacy
Data Sent to LLM
When analyzing a flag, the following data is sent:
- Source code files containing flag references
- Flag metadata (key, name, days served)
- Language and file paths
What is NOT Sent
- Personal data (names, emails)
- API keys and secrets (redacted by default)
- Customer data not related to flag logic
- Entire repositories — only relevant files
Data Residency
| Provider | Default Region | EU Option | Self-Hosted |
|---|---|---|---|
| DeepSeek | US | ❌ | ✅ |
| OpenAI | US | ✅ (Azure) | ❌ |
| Azure OpenAI | Configurable | ✅ | ✅ (VNet) |
| Self-hosted | Your infra | ✅ | ✅ |
Redaction
Sensitive patterns are automatically redacted before sending to any LLM:
- API keys (OpenAI, AWS, generic)
- Private keys (RSA, DSA, EC)
- Connection strings (PostgreSQL, MySQL, Redis, MongoDB)
- JWTs and bearer tokens
- Passwords and secrets
Custom redaction rules can be added in the compliance settings.
Cost Tracking
| Provider | Input Cost | Output Cost | Per-Flag Estimate |
|---|---|---|---|
| DeepSeek | $0.28/M tokens | $1.10/M tokens | ~$0.001–0.005 |
| GPT-4o-mini | $0.15/M tokens | $0.60/M tokens | ~$0.001–0.003 |
| GPT-4o | $2.50/M tokens | $10.00/M tokens | ~$0.02–0.05 |
Monthly budgets can be set in the compliance settings to prevent unexpected costs.
Supported Languages
The LLM analysis supports all major languages:
- JavaScript / TypeScript (React, Vue, Angular, Node.js)
- Go
- Python
- Java / Kotlin
- Ruby
- C# / .NET
- PHP
- Swift
- Rust
- Scala