Open Source Tool by doingnow

Meet ingestAI
Your data, instantly searchable.

Connect any data source — SharePoint, SQL databases, S3, Snowflake, local drives, PDFs, Word, Excel, JSON — and make it conversational with LLM-powered Q&A. One command to ingest. One prompt to find anything.

6+Data Connectors
7File Formats
RAGPowered by OpenAI
100%Local & Private
Architecture

From raw data to conversation in minutes

ingestAI uses a Retrieval-Augmented Generation (RAG) pipeline. Data is parsed, chunked, embedded, and stored locally — then retrieved semantically when you ask a question.

📂
Source
Files · DB · SharePoint · S3 · DW
🔍
Parse & Chunk
Text, tables & metadata extracted
🧠
Embed
OpenAI vector embeddings
🗄️
ChromaDB
Local persistent vector store
💬
Chat UI / API
GPT-powered answers with citations
Features

Everything you need to unlock your data

🔌

Universal Connectors

One-command ingestion from SharePoint, SQL databases, AWS S3, Snowflake, local disks, and mounted network shares — no custom ETL scripts needed.

📄

Multi-Format Parsing

Natively parses PDF, Word (.docx), Excel (.xlsx/.xls), JSON, and images (with EXIF metadata extraction). Tables, text, and properties all captured.

💬

Conversational Q&A

Ask plain-English questions in the browser UI or via REST API. The LLM answers using only your documents and always cites the source file.

🔒

Fully Local & Private

All data lives on your machine. The vector store is a local ChromaDB instance. Only the query and relevant chunks leave your environment (to OpenAI).

CLI-First Design

The ingest.py CLI was built to run unattended — schedule it with cron or a task scheduler for automatic nightly re-indexing of changing data.

🌐

REST API Included

Every function exposed via FastAPI: upload, list, delete, and query endpoints. Integrate ingestAI into any internal tool or workflow in minutes.

Data Sources

Connect nearly any data source

ingestAI ships connectors for the most common enterprise data locations. Optional dependencies are installed only when needed.

💾
Local Disk / NAS
Any OS-mounted path, recursive scan
🔷
SharePoint Online
Microsoft Graph API, app credentials
🗃️
SQL Databases
PostgreSQL · MySQL · MSSQL · SQLite
☁️
AWS S3
Any bucket & prefix, IAM auth
❄️
Snowflake DW
Any SQL query, DictCursor output
📁
Network Shares
SMB · AFP · NFS mounted drives
📄
PDF Documents
Text + tables via pdfplumber
📝
Word / Excel
.docx · .xlsx · .xls with metadata
{}
JSON Files
Nested objects fully serialised
🖼️
Images & Metadata
EXIF, dimensions, format info
Quick Start

Up and running in 5 minutes

Install once, point at your data, then open the browser — or schedule the CLI for nightly re-indexing. No cloud infrastructure required.

  • 01
    Install dependencies
    pip install -r requirements.txt
  • 02
    Add your OpenAI key
    cp .env.example .env → set OPENAI_API_KEY
  • 03
    Ingest your data
    python ingest.py local ~/Documents/reports
  • 04
    Start the web UI
    uvicorn app.main:app --port 8000
ingest.py — CLI reference
# Local folder or any mounted drive
$ python ingest.py local ~/Documents/reports --recursive
$ python ingest.py local /Volumes/NAS/Finance --ext .pdf --ext .xlsx

# SharePoint Online
$ python ingest.py sharepoint \
    --site-url https://company.sharepoint.com/sites/HR \
    --library "Policy Documents"

# Any SQL database
$ python ingest.py database \
    --url "postgresql://user:pass@host/mydb" \
    --query "SELECT id, title, body FROM articles" \
    --text-col body

# AWS S3
$ python ingest.py s3 --bucket corp-docs --prefix legal/

# Snowflake data warehouse
$ python ingest.py snowflake \
    --account myorg.us-east-1 \
    --database PROD_DB \
    --query "SELECT TITLE, CONTENT FROM KNOWLEDGE_BASE"

# List everything indexed
$ python ingest.py list
Use Cases

Built for real enterprise workflows

📋 Policy & Compliance Search

Index SharePoint policy libraries and let employees ask plain-English questions about HR policies, IT security standards, or compliance requirements.

📊 Financial Report Analysis

Ingest quarterly PDFs, Excel models, and analyst notes from a shared drive. Ask questions like "What was our EBITDA trend over the last 3 quarters?"

🏭 Operations Knowledge Base

Pull SOPs, maintenance logs, and incident reports from a warehouse. Field technicians can query procedures without searching through folders.

🔬 Research & Discovery

Ingest research papers, internal wikis, and data exports. Surface cross-document insights that would take hours to find manually.

🛒 Product & Catalogue Data

Load product specs, pricing tables, and customer feedback from a database or S3. Sales teams ask natural-language questions and get instant, cited answers.

🧑‍💼 HR & Onboarding

Make handbooks, benefit guides, and training materials instantly searchable. New hires find answers without waiting for a colleague to respond.

Get Started

Start searching your data today

ingestAI is open-source and runs entirely on your machine. For enterprise deployment, managed hosting, and support, talk to the doingnow team.