This post describes a question-answering chatbot that retrieves answers from PDF documents using Retrieval-Augmented Generation (RAG). The project is built with FastAPI, LangChain, ChromaDB, and OpenAI. You can find the full source on GitHub: github.com/arunpatilgithub/chatbot-project.

What the Project Does

Instead of relying solely on an LLM's pre-trained knowledge, this chatbot first retrieves the most relevant passages from your own PDF documents and then feeds those passages as context to the LLM. The result is grounded, document-specific answers — even for content the LLM has never seen before.

High-Level Architecture

The system is split into two phases:

Phase 1 — Ingestion

  1. Load PDFs — PDF files are read from the data/ directory using LangChain's PyPDFLoader.
  2. Chunk — Each document is split into overlapping text chunks with RecursiveCharacterTextSplitter to preserve context across boundaries.
  3. Embed — Each chunk is converted into a dense vector using OpenAIEmbeddings.
  4. Store — Vectors are persisted to a local ChromaDB collection in the vector_store_db/ directory. This step only needs to run once (or whenever new documents are added).
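The chunking step is the subtle part of this pipeline: "overlapping" means the tail of one chunk is repeated at the head of the next, so a sentence cut at a boundary still appears whole in at least one chunk. RecursiveCharacterTextSplitter is more sophisticated (it prefers to split on paragraph and sentence boundaries before falling back to raw characters), but the overlap idea can be sketched with a few lines of plain Python:

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into fixed-size chunks where consecutive chunks
    share `overlap` characters, so context isn't lost at boundaries.
    Illustrative only -- the project uses RecursiveCharacterTextSplitter."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    return [
        text[i:i + chunk_size]
        for i in range(0, len(text), step)
        if text[i:i + chunk_size]
    ]
```

With chunk_size=50 and overlap=10, the last 10 characters of each chunk reappear as the first 10 of the next, which is exactly the property that keeps boundary-straddling sentences retrievable.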

Phase 2 — Retrieval & Generation

  1. Receive question — A user sends a query to the FastAPI endpoint GET /question-answering?query=...&session_id=....
  2. Retrieve — ChromaDB performs a similarity search and returns the top-k most relevant chunks.
  3. Prompt — The retrieved chunks are injected into a prompt template as context alongside the user's question and the current session's chat history.
  4. Generate — The prompt is sent to ChatOpenAI (GPT-4 by default, configurable via the GPT_MODEL environment variable) and the response is returned to the caller.
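The "Prompt" step is essentially string assembly: the retrieved chunks become a context block placed ahead of the chat history and the new question. The post doesn't show the project's actual template, so the names and layout below are hypothetical, but the shape is typical of RAG prompts:

```python
def build_prompt(chunks: list[str], history: list[tuple[str, str]], question: str) -> str:
    """Assemble a grounded prompt: retrieved context first, then prior
    turns, then the new question. Field names and wording are illustrative."""
    context = "\n\n".join(chunks)
    turns = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Conversation so far:\n{turns}\n\n"
        f"Question: {question}"
    )
```

Putting the instruction and context before the question is a common convention; it keeps the model focused on the retrieved passages rather than its pre-trained knowledge.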

Repository Layout

Path                 Purpose
data/                Place your PDF files here before running ingestion.
src/                 Application source code (API server, document loader, QA system, vector-store helpers).
models/              Reserved for any local model artifacts.
vector_store_db/     Persisted ChromaDB files generated during ingestion.
tests/               Unit and integration tests.
requirements.txt     Python dependencies.

How to Run

1. Install dependencies

pip install -r requirements.txt

2. Set environment variables

Create a .env file in the project root (or export variables in your shell):

OPENAI_API_KEY=sk-...
GPT_MODEL=gpt-4          # optional, defaults to gpt-4
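Reading these in application code is a one-liner per variable; the fallback below mirrors the "defaults to gpt-4" note (the variable names are the ones from the .env above, but the helper function itself is illustrative, not the project's code):

```python
import os

def get_model_name() -> str:
    """Return the configured model, falling back to "gpt-4"
    when GPT_MODEL is unset -- matching the default described above."""
    return os.getenv("GPT_MODEL", "gpt-4")
```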

3. Add PDFs

Copy one or more PDF files into the data/ directory.

4. Run ingestion

Execute the loader script to chunk, embed, and persist your documents to ChromaDB:

python src/document_loader/loader.py

This populates vector_store_db/. Re-run whenever you add new PDFs.

5. Start the API server

uvicorn src.api.server:app --reload

6. Ask a question

curl "http://localhost:8000/question-answering?query=What+is+a+service+mesh%3F&session_id=user1"

The server returns a JSON response containing the answer and the session's conversation history.
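The post doesn't show the exact response schema, but given that the payload contains the answer and the session's history, it might look something like this (all field names here are hypothetical):

```json
{
  "answer": "A service mesh is ...",
  "session_id": "user1",
  "history": [
    {"question": "What is a service mesh?", "answer": "A service mesh is ..."}
  ]
}
```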

Session & Conversation History

The API keeps an in-memory conversation history keyed by session_id. The previous turns are prepended to each new question so the LLM can maintain context across a multi-turn conversation. Note that because the history lives in process memory, it is lost whenever the server restarts.
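An in-memory store like this is typically just a dict of turn lists keyed by session_id, which is also why it vanishes on restart. A minimal sketch (not the project's actual implementation; `generate` stands in for the retrieval-plus-LLM call):

```python
from collections import defaultdict
from typing import Callable

# Maps session_id -> list of (question, answer) turns.
# Because it lives in process memory, a server restart clears it.
_histories: dict[str, list[tuple[str, str]]] = defaultdict(list)

def answer_with_history(
    session_id: str,
    question: str,
    generate: Callable[[list[tuple[str, str]], str], str],
) -> str:
    """Pass the session's prior turns plus the new question to `generate`,
    then record the new turn so the next question sees it."""
    history = _histories[session_id]
    answer = generate(history, question)
    history.append((question, answer))
    return answer
```

For production use, this dict would typically be swapped for an external store (Redis, a database) so sessions survive restarts and scale across workers.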

Further Reading