ACP vs RAG: Why Deterministic Context Beats Probabilistic Retrieval

Explanation for developers evaluating context approaches, ml/ai engineers

Document Type: Explanation
Audience: Developers evaluating context approaches, ML/AI engineers
Reading Time: 10 minutes


Executive Summary

RAG (Retrieval-Augmented Generation) uses similarity search to find "relevant" context for AI queries. This works well for knowledge retrieval but fails for constraint enforcement, where you need 100% guarantee that critical information is visible.

ACP takes a fundamentally different approach: pre-computed, deterministic context that's always available through exact-match queries—not retrieved by similarity.


Understanding RAG for Code

How RAG Works

┌─────────────────────────────────────────────────────────────────┐
│                    RAG PIPELINE FOR CODE                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  1. CHUNKING                                                     │
│     Split code into overlapping chunks (512-2048 tokens)         │
│     ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐                         │
│     │chunk1│ │chunk2│ │chunk3│ │chunk4│ ...                     │
│     └──────┘ └──────┘ └──────┘ └──────┘                         │
│                                                                  │
│  2. EMBEDDING                                                    │
│     Convert chunks to vectors                                    │
│     [0.23, -0.45, 0.12, ...] per chunk                          │
│                                                                  │
│  3. STORAGE                                                      │
│     Store in vector database (Pinecone, Weaviate, etc.)          │
│                                                                  │
│  4. RETRIEVAL (per query)                                        │
│     Query → embed → similarity search → top-k chunks             │
│                                                                  │
│  5. INJECTION                                                    │
│     Inject retrieved chunks into AI context                      │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

RAG Strengths

StrengthDescription
AutomaticNo manual annotation required
ScalableWorks with large codebases
FlexibleNatural language queries
Semantic understandingFinds conceptually related code

Where RAG Fails for Constraints

Problem 1: Similarity ≠ Constraint Relevance

Query: "Help me improve the authentication flow"
 
Embedding similarity ranking:
┌──────────────────────────────────────────────────────────────┐
│ Rank │ File                    │ Similarity │ Constraint     │
├──────┼─────────────────────────┼────────────┼────────────────┤
│ 1    │ auth/login.ts           │ 0.92       │ normal         │
│ 2    │ auth/register.ts        │ 0.89       │ normal         │
│ 3    │ auth/password-reset.ts  │ 0.87       │ normal         │
│ ...  │ ...                     │ ...        │ ...            │
│ 12   │ auth/session.ts         │ 0.71       │ FROZEN ⚠️      │
│ 15   │ auth/crypto.ts          │ 0.68       │ FROZEN ⚠️      │
└──────────────────────────────────────────────────────────────┘
 
Top-5 retrieval: ✓ Relevant code found
                 ✗ Frozen constraints MISSED

The most semantically similar code isn't the most constrained code. Session management and crypto are foundational concerns with different vocabulary than "authentication flow."

Problem 2: Constraints Are Metadata, Not Content

Consider this frozen file:

// @acp:lock frozen - Security critical, DO NOT modify
export const JWT_SECRET = process.env.JWT_SECRET;
export const TOKEN_EXPIRY = 3600;

The constraint ("frozen - Security critical, DO NOT modify") has no semantic relationship to the content (JWT configuration). Embedding the content doesn't capture the constraint.

Problem 3: Code Embedding Collapse

Code is structurally homogeneous. These functions embed nearly identically:

// All ~0.95 similarity to each other:
async function validateUser(user: User): Promise<boolean> { ... }
async function validatePayment(payment: Payment): Promise<boolean> { ... }
async function validateOrder(order: Order): Promise<boolean> { ... }
async function validateSession(session: Session): Promise<boolean> { ... }
 
// But constraints differ wildly:
// validateUser      → normal
// validatePayment   → FROZEN (PCI compliance)
// validateOrder     → normal  
// validateSession   → FROZEN (security critical)

Problem 4: Chunking Destroys Context

File-level constraints get split across chunks:

Original file:
┌────────────────────────────────────────────────────────────┐
│ // @acp:lock frozen - Security critical                   │
│ // @acp:domain auth                                        │
│ // @acp:owner security-team                                │
│                                                            │
│ export class SessionValidator {                            │
│   // ... 500 lines of code ...                            │
│ }                                                          │
└────────────────────────────────────────────────────────────┘
 
After chunking:
┌────────────────────┐ ┌────────────────────┐ ┌─────────────┐
│ Chunk 1:           │ │ Chunk 2:           │ │ Chunk 3:    │
│ // @acp:lock frozen│ │ validateToken() {  │ │ // helper   │
│ // @acp:domain auth│ │   jwt.verify(...)  │ │ functions   │
│ export class...    │ │   return session;  │ │ ...         │
└────────────────────┘ └────────────────────┘ └─────────────┘
     ▲                       ▲
     │                       │
     Constraint here         Code here
                             (might be retrieved without constraint)

If Chunk 2 is retrieved without Chunk 1, the AI sees the code but not the constraint.

Problem 5: Top-K Misses

Your codebase statistics:
- Total files: 1,000
- Frozen files: 10 (1%)
- Top-k retrieval: 20 chunks
 
For a random query about code modification:
- P(frozen file in top-20): ~2%
- P(frozen file missed): ~98%
 
For constraints, a 98% miss rate is catastrophic.

Problem 6: Per-Request Overhead

Every RAG query incurs:

  • Embedding computation
  • Vector similarity search
  • Context injection

For a coding session with 100 queries, that's 100 retrieval operations—even when constraints haven't changed.


How ACP Solves This

Approach: Minimal Bootstrap + On-Demand Expansion

┌─────────────────────────────────────────────────────────────────┐
│                    ACP APPROACH                                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  1. ANNOTATION (one-time)                                        │
│     Add structured annotations to code                           │
│     // @acp:lock frozen - Security critical                     │
│                                                                  │
│  2. INDEXING (on commit)                                         │
│     acp index → .acp.cache.json                                 │
│     {                                                            │
│       "constraints": {                                           │
│         "by_lock_level": {                                       │
│           "frozen": ["session.ts", "crypto.ts"]                 │
│         }                                                        │
│       }                                                          │
│     }                                                            │
│                                                                  │
│  3. BOOTSTRAP (always injected, ~40 tokens)                      │
│     "ACP active. Frozen files: session.ts, crypto.ts.           │
│      Run `acp primer` for full context."                        │
│                                                                  │
│  4. ON-DEMAND EXPANSION                                          │
│     AI requests more context only when needed:                   │
│     - `acp primer --domain auth` for domain details             │
│     - `acp query '.constraints'` for specific data              │
│     - Variable expansion for symbol details                      │
│                                                                  │
│  Result: 100% constraint visibility, minimal token overhead      │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Key Differences

AspectRAGACP
Query typeSimilarity searchExact-match query
RetrievalProbabilisticDeterministic
Constraint visibility~80% (top-k dependent)100%
Injection modelPer-request retrievalBootstrap + on-demand
Token costO(k) per requestO(1) bootstrap (~40 tok)
Data structureUnstructured chunksStructured JSON
VersioningRequires re-embeddingGit-integrated

The Math: Why Deterministic Matters

RAG Constraint Visibility

P(constraint visible) = P(relevant chunk in top-k)
 
For a 1000-file codebase with 10 frozen files and k=20:
- Best case (frozen files cluster): ~40%
- Worst case (frozen files dispersed): ~2%
- Average case: ~15-20%
 
Even at 80% visibility:
- 10 frozen files × 20% miss rate = 2 frozen files invisible per query
- Over 50 queries = 100 potential constraint violations

ACP Constraint Visibility

P(constraint visible) = 100%
 
All constraints in primer at session start.
All queries return complete, accurate data.
Zero probabilistic uncertainty.

When RAG Makes Sense

RAG is excellent for:

Use CaseWhy RAG Works
Documentation searchFinding relevant docs, not enforcing rules
Code explorationDiscovering similar implementations
Knowledge Q&A"How does X work?" queries
Large corpus searchWhen exhaustive retrieval isn't needed

RAG fails for:

Use CaseWhy RAG Fails
Constraint enforcementMust guarantee visibility
Access controlCan't probabilistically check permissions
Critical metadata80% accuracy is unacceptable
Audit requirementsMust prove constraint was visible

Hybrid Architecture

The optimal approach uses both:

┌─────────────────────────────────────────────────────────────────┐
│                    HYBRID ARCHITECTURE                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────────────┐      ┌─────────────────────┐           │
│  │       ACP           │      │       RAG           │           │
│  │                     │      │                     │           │
│  │  Constraints        │      │  Code Discovery     │           │
│  │  Domains            │      │  Similar Patterns   │           │
│  │  Ownership          │      │  Documentation      │           │
│  │  Architecture       │      │  Knowledge Base     │           │
│  │                     │      │                     │           │
│  │  DETERMINISTIC      │      │  PROBABILISTIC      │           │
│  │  "What are rules?"  │      │  "Find similar"     │           │
│  └─────────────────────┘      └─────────────────────┘           │
│            │                            │                        │
│            └────────────┬───────────────┘                        │
│                         ▼                                        │
│               ┌─────────────────┐                                │
│               │   AI Assistant  │                                │
│               │                 │                                │
│               │  Uses RAG for   │                                │
│               │  exploration    │                                │
│               │                 │                                │
│               │  Respects ACP   │                                │
│               │  for constraints│                                │
│               └─────────────────┘                                │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Example: Constraint-Aware Code Search

User: "Find code similar to our payment validation and help me improve it"
 
1. ACP Bootstrap (always present, ~40 tokens):
   "ACP active. Frozen: payments/processor.ts, payments/gateway.ts.
    Run `acp primer --domain payments` for details."
 
2. RAG Search:
   - Query: "payment validation"
   - Results: processor.ts (frozen), gateway.ts (frozen), utils.ts, billing/validator.ts
 
3. AI Behavior:
   - Already knows from bootstrap that processor.ts and gateway.ts are frozen
   - "I found similar code in processor.ts and gateway.ts, but these are frozen."
   - "I can help you improve utils.ts and billing/validator.ts instead."
   - "If you need to modify the frozen files, contact the payments team."
   
4. AI requests more context (on-demand):
   - Runs: acp primer --domain payments
   - Gets detailed ownership, architecture constraints for informed suggestions

Implementation Comparison

RAG Setup

# Install dependencies
pip install chromadb sentence-transformers
 
# Chunk codebase
python chunk_code.py --input ./src --output ./chunks
 
# Embed and store
python embed_chunks.py --chunks ./chunks --db ./vectordb
 
# Query (per request)
python query.py "help me with auth" --db ./vectordb --top-k 20

Maintenance: Re-embed on every code change.

ACP Setup

# Install CLI
npm install -g @acp-protocol/cli
 
# Add annotations (one-time)
# @acp:lock frozen - Security critical
 
# Index (on commit)
acp index
 
# Query (exact-match)
acp query '.constraints.by_lock_level.frozen'

Maintenance: Re-index on commit (incremental, fast).


Summary

AspectRAGACP
Best forCode discoveryConstraint enforcement
RetrievalProbabilisticDeterministic
Accuracy~80%100%
Query typeNatural languageStructured + NL
SetupComplex (vectors)Simple (CLI)
MaintenanceRe-embed on changeRe-index on change

Recommendation: Use RAG for exploration and discovery. Use ACP for constraints and critical metadata. They complement each other.


Further Reading


This document is part of the ACP Documentation. Report issues