This post describes a question-answering chatbot that retrieves answers from PDF documents using Retrieval-Augmented Generation (RAG). The project is built with FastAPI, LangChain, ChromaDB, and OpenAI. You can find the full source on GitHub: github.com/arunpatilgithub/chatbot-project.
What the Project Does
Instead of relying solely on an LLM's pre-trained knowledge, this chatbot first retrieves the most relevant passages from your own PDF documents and then feeds those passages as context to the LLM. The result is grounded, document-specific answers — even for content the LLM has never seen before.
High-Level Architecture
The system is split into two phases:
Phase 1 — Ingestion
- Load PDFs — PDF files are read from the `data/` directory using LangChain's `PyPDFLoader`.
- Chunk — Each document is split into overlapping text chunks with `RecursiveCharacterTextSplitter` to preserve context across boundaries.
- Embed — Each chunk is converted into a dense vector using `OpenAIEmbeddings`.
- Store — Vectors are persisted to a local ChromaDB collection in the `vector_store_db/` directory. This step only needs to run once (or whenever new documents are added).
Phase 2 — Retrieval & Generation
- Receive question — A user sends a query to the FastAPI endpoint `GET /question-answering?query=...&session_id=...`.
- Retrieve — ChromaDB performs a similarity search and returns the top-k most relevant chunks.
- Prompt — The retrieved chunks are injected into a prompt template as context alongside the user's question and the current session's chat history.
- Generate — The prompt is sent to `ChatOpenAI` (GPT-4 by default, configurable via the `GPT_MODEL` environment variable) and the response is returned to the caller.
Repository Layout
| Path | Purpose |
|---|---|
| `data/` | Place your PDF files here before running ingestion. |
| `src/` | Application source code (API server, document loader, QA system, vector-store helpers). |
| `models/` | Reserved for any local model artifacts. |
| `vector_store_db/` | Persisted ChromaDB files generated during ingestion. |
| `tests/` | Unit and integration tests. |
| `requirements.txt` | Python dependencies. |
How to Run
1. Install dependencies
pip install -r requirements.txt
2. Set environment variables
Create a .env file in the project root (or export variables in your shell):
OPENAI_API_KEY=sk-...
GPT_MODEL=gpt-4 # optional, defaults to gpt-4
3. Add PDFs
Copy one or more PDF files into the data/ directory.
4. Run ingestion
Execute the loader script to chunk, embed, and persist your documents to ChromaDB:
python src/document_loader/loader.py
This populates vector_store_db/. Re-run whenever you add new PDFs.
5. Start the API server
uvicorn src.api.server:app --reload
6. Ask a question
curl "http://localhost:8000/question-answering?query=What+is+a+service+mesh%3F&session_id=user1"
The server returns a JSON response containing the answer and the session's conversation history.
Session & Conversation History
The API keeps an in-memory conversation history keyed by session_id. Each new question is prepended with the previous turns so the LLM can maintain context across a multi-turn conversation. Note that history is reset whenever the server restarts.
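A minimal version of this in-memory bookkeeping can be sketched as follows; the function and variable names are illustrative, and the repository's actual implementation may structure its history differently:

```python
# Hypothetical sketch of in-memory, per-session conversation history.
from collections import defaultdict

# History keyed by session_id. It lives only in process memory,
# so it is lost whenever the server restarts.
_histories: dict[str, list[tuple[str, str]]] = defaultdict(list)


def build_prompt(session_id: str, question: str) -> str:
    """Prepend previous turns so the LLM keeps multi-turn context."""
    past = "\n".join(f"Q: {q}\nA: {a}" for q, a in _histories[session_id])
    return f"{past}\nQ: {question}" if past else f"Q: {question}"


def record_turn(session_id: str, question: str, answer: str) -> None:
    """Store a completed question/answer pair for the session."""
    _histories[session_id].append((question, answer))
```

Because histories are keyed by `session_id`, two users asking questions concurrently never see each other's turns.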