This post describes a question-answering chatbot that retrieves answers from PDF documents using Retrieval-Augmented Generation (RAG). The project is built with FastAPI, LangChain, ChromaDB, and OpenAI. You can find the full source on GitHub: github.com/arunpatilgithub/chatbot-project.

What the Project Does

Instead of relying solely on an LLM's pre-trained knowledge, this chatbot first retrieves the most relevant passages from your own PDF documents and then feeds those passages as context to the LLM. The result is grounded, document-specific answers — even for content the LLM has never seen before.

High-Level Architecture

The system is split into two phases:

Phase 1 — Ingestion

  1. Load PDFs — PDF files are read from the data/ directory using LangChain's PyPDFLoader.
  2. Chunk — Each document is split into overlapping text chunks with RecursiveCharacterTextSplitter to preserve context across boundaries.
  3. Embed — Each chunk is converted into a dense vector using OpenAIEmbeddings.
  4. Store — Vectors are persisted to a local ChromaDB collection in the vector_store_db/ directory. This step only needs to run once (or whenever new documents are added).
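The chunking step is the subtle part of this pipeline: "overlapping" means the tail of one chunk is repeated at the head of the next, so a sentence cut at a boundary still appears whole in at least one chunk. RecursiveCharacterTextSplitter is more sophisticated (it prefers to split on paragraph and sentence boundaries before falling back to raw characters), but the overlap idea can be sketched with a few lines of plain Python:

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into fixed-size chunks where consecutive chunks
    share `overlap` characters, so context isn't lost at boundaries.
    Illustrative only -- the project uses RecursiveCharacterTextSplitter."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    return [
        text[i:i + chunk_size]
        for i in range(0, len(text), step)
        if text[i:i + chunk_size]
    ]
```

With chunk_size=50 and overlap=10, the last 10 characters of each chunk reappear as the first 10 of the next, which is exactly the property that keeps boundary-straddling sentences retrievable.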

Phase 2 — Retrieval & Generation

  1. Receive question — A user sends a query to the FastAPI endpoint GET /question-answering?query=...&session_id=....
  2. Retrieve — ChromaDB performs a similarity search and returns the top-k most relevant chunks.
  3. Prompt — The retrieved chunks are injected into a prompt template as context alongside the user's question and the current session's chat history.
  4. Generate — The prompt is sent to ChatOpenAI (GPT-4 by default, configurable via the GPT_MODEL environment variable) and the response is returned to the caller.
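The "Prompt" step is essentially string assembly: the retrieved chunks become a context block placed ahead of the chat history and the new question. The post doesn't show the project's actual template, so the names and layout below are hypothetical, but the shape is typical of RAG prompts:

```python
def build_prompt(chunks: list[str], history: list[tuple[str, str]], question: str) -> str:
    """Assemble a grounded prompt: retrieved context first, then prior
    turns, then the new question. Field names and wording are illustrative."""
    context = "\n\n".join(chunks)
    turns = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Conversation so far:\n{turns}\n\n"
        f"Question: {question}"
    )
```

Putting the instruction and context before the question is a common convention; it keeps the model focused on the retrieved passages rather than its pre-trained knowledge.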

Repository Layout

Path                 Purpose
data/                Place your PDF files here before running ingestion.
src/                 Application source code (API server, document loader, QA system, vector-store helpers).
models/              Reserved for any local model artifacts.
vector_store_db/     Persisted ChromaDB files generated during ingestion.
tests/               Unit and integration tests.
requirements.txt     Python dependencies.

How to Run

1. Install dependencies

pip install -r requirements.txt

2. Set environment variables

Create a .env file in the project root (or export variables in your shell):

OPENAI_API_KEY=sk-...
GPT_MODEL=gpt-4          # optional, defaults to gpt-4
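Reading these in application code is a one-liner per variable; the fallback below mirrors the "defaults to gpt-4" note (the variable names are the ones from the .env above, but the helper function itself is illustrative, not the project's code):

```python
import os

def get_model_name() -> str:
    """Return the configured model, falling back to "gpt-4"
    when GPT_MODEL is unset -- matching the default described above."""
    return os.getenv("GPT_MODEL", "gpt-4")
```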

3. Add PDFs

Copy one or more PDF files into the data/ directory.

4. Run ingestion

Execute the loader script to chunk, embed, and persist your documents to ChromaDB:

python src/document_loader/loader.py

This populates vector_store_db/. Re-run whenever you add new PDFs.

5. Start the API server

uvicorn src.api.server:app --reload

6. Ask a question

curl "http://localhost:8000/question-answering?query=What+is+a+service+mesh%3F&session_id=user1"

The server returns a JSON response containing the answer and the session's conversation history.
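The post doesn't show the exact response schema, but given that the payload contains the answer and the session's history, it might look something like this (all field names here are hypothetical):

```json
{
  "answer": "A service mesh is ...",
  "session_id": "user1",
  "history": [
    {"question": "What is a service mesh?", "answer": "A service mesh is ..."}
  ]
}
```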

Session & Conversation History

The API keeps an in-memory conversation history keyed by session_id. The previous turns are prepended to each new question so the LLM can maintain context across a multi-turn conversation. Note that because the history lives in process memory, it is lost whenever the server restarts.
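An in-memory store like this is typically just a dict of turn lists keyed by session_id, which is also why it vanishes on restart. A minimal sketch (not the project's actual implementation; `generate` stands in for the retrieval-plus-LLM call):

```python
from collections import defaultdict
from typing import Callable

# Maps session_id -> list of (question, answer) turns.
# Because it lives in process memory, a server restart clears it.
_histories: dict[str, list[tuple[str, str]]] = defaultdict(list)

def answer_with_history(
    session_id: str,
    question: str,
    generate: Callable[[list[tuple[str, str]], str], str],
) -> str:
    """Pass the session's prior turns plus the new question to `generate`,
    then record the new turn so the next question sees it."""
    history = _histories[session_id]
    answer = generate(history, question)
    history.append((question, answer))
    return answer
```

For production use, this dict would typically be swapped for an external store (Redis, a database) so sessions survive restarts and scale across workers.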

Further Reading