Epstein Files RAG

Overview / Description

Epstein Files RAG is an open-source AI document-search tool for exploring unsealed Jeffrey Epstein court documents through conversational, retrieval-augmented search. It is aimed at researchers, journalists, and analysts who want to query investigative records in a chat-style interface instead of reading raw filings.

The project has no proprietary dependencies and can run locally via Ollama or against cloud inference through Groq or OpenRouter. An automated ingestion step downloads and indexes a curated dataset from Hugging Face (the Nikity/Epstein-Files collection, in Apache Parquet format), and strict guardrails keep responses within the investigative-document context rather than answering open-ended questions. The stack is LangChain for orchestration, ChromaDB as the vector database, and Streamlit for the interface.

Setup requires Python 3.10 or higher, an optional Ollama install for local models, and API keys for Groq or OpenRouter if using cloud inference; a virtual environment is recommended. By default, the first data chunk is about 0.5 GB and takes roughly 3 to 5 minutes to ingest. There is no license fee; the only costs are for whatever inference or hosting you choose.
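The retrieve-then-answer flow with a scope guardrail can be sketched in a few lines. This is a minimal illustration, not the project's code: it swaps the real stack (LangChain, ChromaDB, an LLM) for a toy bag-of-words cosine similarity so that it runs on the standard library alone. The names `CHUNKS`, `MIN_SCORE`, `retrieve`, and `answer`, the threshold value, and the sample passages are all hypothetical placeholders.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for indexed chunks (placeholder text, not real filings).
CHUNKS = [
    "Deposition transcript discussing flight logs and travel dates.",
    "Court order unsealing exhibits filed in the civil case.",
    "Motion regarding redaction of names in the filed exhibits.",
]

# Assumed values for illustration; a real system would tune these.
STOPWORDS = {"the", "in", "of", "and", "is", "what", "which", "were", "a", "to"}
MIN_SCORE = 0.2  # guardrail: below this similarity, treat the question as out of scope


def tokenize(text):
    """Lowercase, split into alphabetic words, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, k=2):
    """Return the k highest-scoring (score, chunk) pairs for the question."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(c))), c) for c in CHUNKS]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def answer(question):
    """Refuse when nothing relevant is retrieved; otherwise build the model context."""
    hits = [(s, c) for s, c in retrieve(question) if s >= MIN_SCORE]
    if not hits:
        return "Out of scope: no matching passages in the indexed documents."
    return "Context for the model:\n" + "\n".join(c for _, c in hits)
```

Calling `answer` with a question unrelated to the sample passages returns the refusal string, while a question that shares vocabulary with them returns the retrieved context that would be passed to the language model.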

Used For

Researchers and journalists use Epstein Files RAG to query unsealed Epstein court documents conversationally, running it locally with Ollama or via cloud LLMs.
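Choosing between local and cloud inference could look roughly like the sketch below. It is an assumption-laden illustration, not the project's actual configuration logic: the environment variable names `GROQ_API_KEY` and `OPENROUTER_API_KEY`, the priority order, and the function `choose_backend` are hypothetical.

```python
import os


def choose_backend(env=None):
    """Pick an inference backend from environment variables (assumed names).

    Falls back to a local Ollama model when no cloud API key is set.
    """
    env = os.environ if env is None else env
    if env.get("GROQ_API_KEY"):        # assumed variable name
        return "groq"
    if env.get("OPENROUTER_API_KEY"):  # assumed variable name
        return "openrouter"
    return "ollama"
```

Passing a plain dictionary instead of the real environment makes the selection easy to check without touching any credentials.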

Pricing

Plan: Free

  • Free and open-source (no license fee)
  • You pay only for any cloud inference (Groq/OpenRouter) or hosting you choose

Pros & Cons

Pros

  • Conversational RAG search over unsealed Epstein court documents
  • Fully open-source with no proprietary dependencies
  • Runs locally via Ollama or on cloud inference via Groq/OpenRouter
  • Automated ingestion of a curated Hugging Face dataset (Nikity/Epstein-Files)
  • Strict guardrails keep answers within the investigative-document context

Cons

  • Requires developer setup: Python 3.10+ and a virtual environment
  • Cloud inference needs Groq or OpenRouter API keys (potential usage costs)
  • Initial data ingestion downloads about 0.5 GB
  • Scope is limited to one specific document collection

Alternatives

PrivateGPT, AnythingLLM, LlamaIndex, Danswer, Verba
