Fincher Labs RAG Agent Implementation Guide (n8n + Cloudflare)
Internal Use Only — Fincher Labs Confidential
Document Control
- Document Title: Fincher Labs RAG Agent Implementation Guide (n8n + Cloudflare)
- Document ID: FL-IMP-002
- Version: 1.0
- Last Updated: 2025-08-11
- Author: Fincher Labs
- Status: Final
- Distribution: Internal (Fincher Labs Staff and Contractors Only)
- Approver: Fincher Labs Founders
Change Log
Version | Date | Author | Changes |
---|---|---|---|
1.0 | 2025-08-11 | Fincher Labs | Initial implementation guide for n8n + Cloudflare RAG, Docusaurus embedding, CI/CD sync, security, and runbook. |
Table of Contents
- 1 Executive Summary
- 2 Architecture Overview
- 3 Cloudflare Platform Setup
- 4 GitHub Content Sync (content/)
- 5 Ingestion Workflow (n8n)
- 6 Retrieval & Answering
- 7 Frontend Chat & Docusaurus Embed
- 8 Security
- 9 CI/CD & Environments
- 10 Observability & Performance
- 11 Failure Modes & Runbook
- 12 Cost & Scaling Notes
- 13 Implementation Steps (Checklist)
- 14 References
1 Executive Summary
This guide describes how to implement a Retrieval-Augmented Generation (RAG) agent for Fincher Labs that indexes the private GitHub repository’s content/
folder and stays continuously up to date. It uses Cloudflare Vectorize for storage, Workers AI for embeddings and responses, AI Gateway for caching and rate limiting, and n8n for the ingestion, retrieval, and chat orchestration. A lightweight chat widget is mounted in Docusaurus so the agent is accessible on our docs site. The design optimizes for speed, safety, and maintainability.
2 Architecture Overview
2.1 Data Flow
- Sync: n8n polls GitHub for changes to content/ (compare API for deltas; tree for full resync).
- Ingest: For changed files, n8n fetches raw content, normalizes Markdown, chunks to ~700–1,000 tokens with 80–120 token overlap.
- Embed: n8n calls Workers AI @cf/baai/bge-m3 to create 1024‑dimensional embeddings.
- Store: n8n upserts chunks + metadata to Cloudflare Vectorize via NDJSON.
- Serve: User asks a question in Docusaurus widget → n8n retrieves matches from Vectorize, optionally re‑ranks, and composes a grounded answer with source links.
2.2 Components
- Cloudflare Vectorize: Vector DB (indexes + metadata indexes; filterable search).
- Cloudflare Workers AI: Embeddings (bge-m3) and a text‑gen model (e.g., Llama 3.1 8B Instruct) for final answer drafting.
- Cloudflare AI Gateway: Caching, rate limits, retries, model fallback; observability.
- n8n: Workflows for ingestion (GitHub → Vectorize) and Q&A (Query → Rerank → Answer).
- Docusaurus: Docs website hosting the chat widget; optional Cloudflare Pages deployment.
3 Cloudflare Platform Setup
3.1 Vectorize (Vector DB)
- Create the index with dimension: 1024 (bge‑m3 dense vectors). Add metadata indexes on docId and path for fast deletes/filters.
- Upsert via an application/x-ndjson stream; each line contains id, values, and metadata.
- Query supports filters on metadata; choose topK (≤ 100; 20 if returning full values/metadata) and return score and metadata.
- Index info exposes processedUpToMutation to safely wait until mutations are searchable.
Sample: create index
curl -X POST "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/vectorize/v2/indexes" -H "Authorization: Bearer $CF_API_TOKEN" -H "Content-Type: application/json" -d '{
"name": "fincher-docs",
"type": "dense",
"dimension": 1024,
"metric": "cosine"
}'
Sample: create metadata indexes
# Create an index on "docId"
curl -X POST "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/vectorize/v2/indexes/fincher-docs/metadata_index/create" -H "Authorization: Bearer $CF_API_TOKEN" -H "Content-Type: application/json" -d '{"propertyName": "docId", "type": "string"}'
# Create an index on "path"
curl -X POST "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/vectorize/v2/indexes/fincher-docs/metadata_index/create" -H "Authorization: Bearer $CF_API_TOKEN" -H "Content-Type: application/json" -d '{"propertyName": "path", "type": "string"}'
3.2 Workers AI (Embeddings & LLM)
- Embeddings: Use @cf/baai/bge-m3 for multilingual, long‑context embeddings (1024‑dimensional, up to ~8k tokens). Batch chunk arrays where possible.
- LLM: Use a Workers AI text model for answer drafting (for example @cf/meta/llama-3.1-8b-instruct or the -fast variant for latency).
Sample: embed with Workers AI REST
curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/ai/run/@cf/baai/bge-m3" -H "Authorization: Bearer $CF_API_TOKEN" -H "Content-Type: application/json" -d '{"text": ["Chunk 1 text...", "Chunk 2 text..."]}'
3.3 AI Gateway (Caching, Rate Limits, Retries)
- Create an AI Gateway and point Workers AI and any external model calls through it for: analytics, caching, rate limiting, automatic retries (up to 5), and fallbacks. Enable caching for identical embedding requests in case of reprocessing.
3.4 Zero Trust Access & Service Tokens
- Protect the n8n instance (and any custom endpoints) behind Cloudflare Access. Issue Service Tokens (Client ID/Secret) for machine‑to‑machine webhook or chat calls.
- If exposing webhooks, configure bypass or service‑auth rules only for the required routes.
3.5 Optional: R2 Object Storage
- If needed, store raw text snapshots or large binary assets in R2. Use pre‑signed URLs for secure, time‑bound upload/download. Keep vector store as the source of truth for search.
4 GitHub Content Sync (content/)
4.1 Authentication
Use either:
- a fine‑grained PAT with Contents: read (and Metadata: read for compare), or
- a GitHub App installation token with Contents: read on the repo.
4.2 Listing & Fetching Files
- List the tree (recursive) to find Markdown under content/.
- Fetch content for changed files; the contents API returns file bodies Base64‑encoded.
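In an n8n Code node, the Base64 body can be decoded in a couple of lines. A minimal sketch, assuming the standard contents‑API fields (content, encoding):

```javascript
// Decode a GitHub contents-API body: `content` is Base64 (often with
// embedded newlines, which Buffer.from tolerates).
function decodeGitHubContent(apiResponse) {
  if (apiResponse.encoding !== "base64") {
    throw new Error(`unexpected encoding: ${apiResponse.encoding}`);
  }
  return Buffer.from(apiResponse.content, "base64").toString("utf8");
}
```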
4.3 Change Detection (fast)
- Use Compare two commits (base...head) to fetch changed files since the last processed commit; paginate if needed.
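As a sketch, a Code node can partition the compare response's files array (fields filename, status, and previous_filename, per the GitHub compare API) into files to re‑ingest and files whose vectors must be deleted. The content/ and .md filters reflect our repo layout:

```javascript
// Partition GitHub compare `files` into Markdown docs to re-ingest and
// paths whose vectors must be removed. Renames do both: drop the old
// path's vectors and re-ingest under the new path.
function partitionCompareFiles(files) {
  const isDoc = (p) => Boolean(p) && p.startsWith("content/") && p.endsWith(".md");
  const reingest = [];
  const remove = [];
  for (const f of files) {
    if (f.status === "removed" && isDoc(f.filename)) {
      remove.push(f.filename);
    } else if (f.status === "renamed") {
      if (isDoc(f.previous_filename)) remove.push(f.previous_filename);
      if (isDoc(f.filename)) reingest.push(f.filename);
    } else if (isDoc(f.filename)) {
      reingest.push(f.filename); // added / modified / changed
    }
  }
  return { reingest, remove };
}
```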
4.4 Full Resync (safe)
- For cold start or schema changes, walk the Git tree recursively; use ETags with If-None-Match for conditional GETs to minimize rate usage.
5 Ingestion Workflow (n8n)
5.1 Split, Normalize, Chunk
- Normalize Markdown (join wrapped lines, fix hyphenation, strip artifacts).
- Chunk to ~700–1,000 tokens with 80–120 token overlap. Store metadata per chunk: docId, path, title, version, lastUpdated (from front‑matter or commit), url (docs site route), hash (content digest), and seq (chunk index).
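A minimal chunker along these lines can run in a Code node. The words≈tokens estimate is a rough assumption (real counts depend on the tokenizer), so treat the numbers as tunable rather than exact:

```javascript
// Greedy chunker: ~targetTokens per chunk with a fixed overlap, using a
// crude words-as-tokens approximation.
function chunkText(text, targetTokens = 850, overlap = 100) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = targetTokens - overlap; // advance per chunk
  for (let start = 0, seq = 0; start < words.length; start += step, seq++) {
    chunks.push({ seq, text: words.slice(start, start + targetTokens).join(" ") });
    if (start + targetTokens >= words.length) break; // last chunk reached the end
  }
  return chunks;
}
```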
5.2 Embeddings (bge-m3)
- Batch text arrays for throughput. Keep request size below provider limits. Capture the returned embedding shape and token counts where available for telemetry.
5.3 Upsert to Vectorize
Send NDJSON lines with stable IDs like "{docId}:{seq}:{hash}" so updates are idempotent. After the stream completes, poll the index info endpoint until processedUpToMutation ≥ the returned mutationId.
Sample: NDJSON upsert
# Each line is a JSON object
{"id":"FL-SG-001:12:ab12","values":[0.12, ... 1024 dims ...],"metadata":{"docId":"FL-SG-001","path":"content/styleguide.md","seq":12,"hash":"ab12","title":"Documentation Styleguide"}}
{"id":"FL-SG-001:13:ab12","values":[0.09, ...],"metadata":{"docId":"FL-SG-001","path":"content/styleguide.md","seq":13,"hash":"ab12"}}
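The NDJSON body is easy to assemble in a Code node. A sketch, assuming each chunk already carries its embedding and seq, reusing the stable ID scheme above:

```javascript
// Build the NDJSON body for a Vectorize upsert. The "{docId}:{seq}:{hash}"
// ID scheme keeps re-upserts idempotent.
function buildUpsertNdjson(docMeta, chunks) {
  return chunks
    .map((chunk) =>
      JSON.stringify({
        id: `${docMeta.docId}:${chunk.seq}:${docMeta.hash}`,
        values: chunk.embedding,
        metadata: { ...docMeta, seq: chunk.seq },
      })
    )
    .join("\n");
}
```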
5.4 Deletion on Rename/Remove
- If a file is deleted or renamed, remove its vectors via delete-by-IDs. To collect the IDs, first query the index with a metadata filter (for example, docId or path) and extract id for each match, then call delete_by_ids in batches. Maintain a stable id scheme (for example, "{docId}:{seq}:{hash}") so you can also reconstruct IDs without querying in emergencies.
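A small helper can turn the filtered query matches into delete_by_ids request bodies. The 100‑IDs‑per‑batch figure here is an assumption; check current Vectorize request limits:

```javascript
// Collect vector IDs from filtered query matches and split them into
// delete_by_ids request bodies.
function deleteBatches(matches, batchSize = 100) {
  const ids = matches.map((m) => m.id);
  const batches = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    batches.push({ ids: ids.slice(i, i + batchSize) });
  }
  return batches;
}
```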
6 Retrieval & Answering
6.1 Vectorize Query + Filter
- Embed the user’s question with bge‑m3 and query with that vector.
- Use metadata filters (for example, only path under content/) to scope results. Choose topK (≤ 100; if returnValues/returnMetadata is true, the effective topK for the full payload is 20). Return score and the stored metadata for citation.
Sample: query with filter
# The "filter" below emulates a prefix match for "content/" via a string
# range: "content0" is the smallest string greater than every "content/..." path.
curl -X POST "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/vectorize/v2/indexes/fincher-docs/query" -H "Authorization: Bearer $CF_API_TOKEN" -H "Content-Type: application/json" -d '{
"vector": [ ...1024 floats... ],
"topK": 12,
"returnValues": false,
"returnMetadata": true,
"filter": {"path": {"$gte": "content/", "$lt": "content0"}}
}'
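The string‑range trick generalizes to any prefix. A sketch of a helper that builds the filter by incrementing the prefix's last character to get the exclusive upper bound:

```javascript
// Build a string-range filter matching any value that starts with `prefix`:
// >= prefix, and < prefix with its last character incremented.
function prefixRange(prefix) {
  const bumped =
    prefix.slice(0, -1) + String.fromCharCode(prefix.charCodeAt(prefix.length - 1) + 1);
  return { $gte: prefix, $lt: bumped };
}
```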
6.2 Optional Reranker
- Rerank the retrieved texts with a cross‑encoder reranker (for example, @cf/baai/bge-reranker-base on Workers AI) to improve answer quality at small cost. Keep the final context under the LLM’s max tokens.
6.3 Compose Final Answer
- Prompt the LLM with the top N chunks (post‑rerank), instructing “answer strictly from the provided context; cite path + title”. Add a safety fallback: if confidence is low or no chunks meet a threshold, respond with “not found in docs”.
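One way to sketch this step in a Code node: filter matches by score, then build the grounded prompt. The 0.6 cutoff is illustrative, and metadata.text assumes the chunk text is stored alongside the other metadata fields (our schema would need to include it):

```javascript
// Filter matches by score and build the grounded prompt; return a fixed
// fallback when nothing clears the threshold.
function composePrompt(question, matches, minScore = 0.6) {
  const usable = matches.filter((m) => m.score >= minScore);
  if (usable.length === 0) return { fallback: "Not found in docs." };
  const context = usable
    .map((m, i) => `[${i + 1}] ${m.metadata.path} (${m.metadata.title})\n${m.metadata.text}`)
    .join("\n\n");
  return {
    prompt:
      "Answer strictly from the provided context; cite path + title.\n\n" +
      `Context:\n${context}\n\nQuestion: ${question}`,
  };
}
```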
7 Frontend Chat & Docusaurus Embed
7.1 n8n Chat Trigger
- Create a Chat Trigger in n8n; set Allowed origins to your docs domain(s). Use a Respond to Chat node to connect to the retrieval workflow. Enable streaming for better UX.
Client snippet (Docusaurus or any site):
<link href="https://cdn.jsdelivr.net/npm/@n8n/chat/style.css" rel="stylesheet" />
<div id="fincher-chat"></div>
<script type="module">
import { createChat } from "https://cdn.jsdelivr.net/npm/@n8n/chat/chat.bundle.es.js";
createChat({
  webhookUrl: "https://YOUR_N8N/chat/YOUR_TRIGGER_ID",
  target: "#fincher-chat",
  allowFileUploads: false,
  enableStreaming: true,
  i18n: { en: { title: "Fincher Labs Assistant" } },
});
</script>
7.2 Docusaurus Integration
In docusaurus.config.js:
export default {
  // ...
  stylesheets: [
    "https://cdn.jsdelivr.net/npm/@n8n/chat/style.css"
  ],
};
The @n8n/chat bundle is an ES module, so load it with a type="module" script where the widget mounts rather than through the scripts array.
Place <div id="fincher-chat"></div> in a homepage or DocLayout component. For Pages deployment or Cloudflare Workers hosting, set the site url and baseUrl accordingly.
8 Security
8.1 n8n Allowed Origins
Lock chat to your docs domain(s). If n8n sits behind Cloudflare Tunnel/Access, ensure the correct Origin header is passed, or rewrite it via a Cloudflare Transform Rule if needed. Prefer Service Tokens for server‑to‑server calls.
8.2 Turnstile Validation
Protect the chat form with Turnstile. On the server side (Cloudflare Pages Function or Worker), validate tokens via the siteverify endpoint before invoking workflow logic. Rate‑limit failures, and cache successes briefly (a few minutes).
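Once siteverify responds, the verdict logic is simple. A sketch using the documented response fields (success, "error-codes", hostname):

```javascript
// Interpret a Turnstile siteverify response and decide whether to invoke
// the workflow. An optional hostname check guards against token replay
// from other sites.
function turnstileVerdict(result, expectedHostname) {
  if (!result.success) {
    return { allow: false, reason: (result["error-codes"] || []).join(",") || "failed" };
  }
  if (expectedHostname && result.hostname !== expectedHostname) {
    return { allow: false, reason: "hostname-mismatch" };
  }
  return { allow: true };
}
```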
9 CI/CD & Environments
- Branching: Production index (fincher-docs) for main. For staging, use fincher-docs-stg and a separate n8n webhook.
- Deploy: Docusaurus to Cloudflare Pages; n8n via Docker on our preferred host with Cloudflare Tunnel.
- Secrets: Store all tokens (GitHub, Cloudflare, Service Tokens) in the platform’s secret manager. Never commit to Git.
10 Observability & Performance
- Route embedding and generation calls through AI Gateway with Caching enabled.
- Set Rate limits sized to expected QPS; enable automatic retries (up to 5) and fallbacks.
- Track hit rate, latency, token volumes, and top queries in the Gateway analytics to optimize chunk sizes and prompt length.
11 Failure Modes & Runbook
- GitHub 304 / ETag flow: On 304, skip processing. If 412/409 with conditional ops, retry without precondition.
- Vectorize upsert/query errors: Backoff and retry. If the index is re‑created, perform a controlled full resync.
- Embedding timeouts: Shorten batches or use Gateway fallback to a secondary embedding model.
- Tunnel/Access issues: Verify service token headers and the Origin rewrite. Temporarily bypass Access for the chat route only, if necessary.
- Widget 4xx: Check n8n Allowed origins and CORS. Confirm the Turnstile server‑side validation path.
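For the backoff‑and‑retry cases above, a full‑jitter schedule is a reasonable default; the base and cap values here are illustrative, not prescribed:

```javascript
// Full-jitter exponential backoff schedule for retrying Vectorize or
// embedding calls: each delay is uniform in [0, min(cap, base * 2^i)).
function backoffDelaysMs(attempts = 5, baseMs = 500, capMs = 30000) {
  const delays = [];
  for (let i = 0; i < attempts; i++) {
    const ceiling = Math.min(capMs, baseMs * 2 ** i);
    delays.push(Math.floor(Math.random() * ceiling));
  }
  return delays;
}
```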
12 Cost & Scaling Notes
- Vectorize: Cost scales with vector count and queries. De‑duplicate by chunk hash to avoid re‑embedding unchanged content.
- Workers AI: Choose -fast variants for interactive chat; cache embeddings via Gateway to reduce repeat costs.
- Gateway: Tune cache TTLs (long for embeddings, short for generation) and enable per‑route rate limits.
13 Implementation Steps (Checklist)
- Cloudflare: Create the Vectorize index (dimension 1024), add metadata indexes (docId, path). Create the AI Gateway and a Workers AI token/binding.
- n8n: Build the ingestion workflow: GitHub → Diff/Tree → Fetch → Normalize → Chunk → Embed (Workers AI) → Vectorize Upsert (NDJSON) → Wait on processedUpToMutation.
- n8n: Build the Q&A workflow: Chat Trigger → Embed question → Vectorize Query (+ filter) → Optional Reranker → LLM Answer (cite path + title) → Respond.
- Security: Put n8n behind Cloudflare Access (Service Token for machine calls). Set Allowed origins. Add Turnstile + server validation.
- Docusaurus: Add the chat stylesheet and script, mount <div id="fincher-chat"></div>, and link to the chat webhook.
- CI/CD: Configure separate staging/prod indexes + webhooks. Add a full‑resync workflow callable on demand.
- Observability: Route through AI Gateway with caching/limits/retries; review analytics weekly.
14 References
- Cloudflare Vectorize
- Cloudflare Workers AI
- Cloudflare AI Gateway
- Cloudflare Zero Trust & Security
- n8n
- GitHub REST API
- Docusaurus