Company Intelligence

RAG-based company research assistant

Oleksii Nikiforov

  • Lead Software Engineer at EPAM Systems
  • AI Engineering Coach
  • 10+ years in software development
  • Open Source and Blogging

nikiforovall
Oleksii Nikiforov
nikiforovall.blog

Problem Statement

  • Company research is manual, repetitive, scattered
  • Data lives across websites, Wikipedia, news, Crunchbase
  • Goal: ask questions → get grounded answers

Solution Overview

Two-phase architecture

Phase 1: INGEST (online)          Phase 2: QUERY (offline)
─────────────────────────         ────────────────────────
User: "ingest Figma"              User: "Who are competitors?"
  → Scrape websites               → Hybrid retrieval (Qdrant)
  → Clean → Chunk → Embed         → RRF fusion (dense + BM25)
  → Store in Qdrant               → LLM generates grounded answer

Key: No internet access during query phase — answers from stored knowledge only

Chat — Chat Window


Chat — Q&A with Citations


Chat — Detailed Research


Backoffice — Data Ingestion


Tech Stack

| Layer | Choice | Why |
| --- | --- | --- |
| LLM | Qwen3 8B via Ollama | 32K context, local-only |
| Embeddings | snowflake-arctic-embed-s | 384-dim, fast on CPU |
| Vector Store | Qdrant | Hybrid search, pre-filtering |
| Agent | Pydantic AI + AG-UI | SSE streaming to UI |
| Frontend | CopilotKit + Next.js 15 | React 19, chat UX |
| Orchestration | .NET Aspire | Service discovery, OTel |
| Scraping | Crawl4AI | Async, headless browser |

Phase 1: Data Ingestion

Scrape → Clean → Chunk → Embed → Store

Ingestion Pipeline

Ingest triggered
  → Normalize company name
  → Wipe existing data (idempotent)
  → Scrape all sources (Crawl4AI)
  → Clean HTML → Markdown
  → Semantic chunking (256–512 tokens)
  → Dense embedding (arctic-embed-s, 384-dim)
  → Sparse vectors (BM25 tokenization)
  → Upsert to Qdrant
  → Knowledge base ready
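The pipeline above can be sketched as a thin skeleton; the function names and stage signatures here are assumptions for illustration, not the project's actual API:

```python
# Minimal sketch of the ingestion entry point. Each stage is injected
# so the real implementations (Crawl4AI, fastembed, Qdrant) slot in.
import re

def normalize_company_name(raw: str) -> str:
    """Lowercase, trim, and collapse whitespace so 'Figma ' and 'figma'
    map to the same knowledge-base key, making re-ingest idempotent."""
    return re.sub(r"\s+", " ", raw.strip()).lower()

def ingest(company: str, scrape, clean, chunk, embed, store) -> str:
    """Run scrape -> clean -> chunk -> embed -> store for one company."""
    key = normalize_company_name(company)
    pages = scrape(key)                          # raw HTML per source
    docs = [clean(p) for p in pages]             # HTML -> Markdown
    chunks = [c for d in docs for c in chunk(d)] # semantic chunking
    store(key, [(c, embed(c)) for c in chunks])  # wipe old data + upsert
    return key
```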

Scraping

  • Crawl4AI — async, headless browser, Markdown output
  • Sources: company website, Wikipedia, news articles
  • Bounds: max 20 pages/source, 30s timeout, 1 req/s rate limit
  • Respects robots.txt, English-only filter
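A minimal sketch of enforcing those bounds; the limits mirror the slide, while the injected `fetch` callable stands in for Crawl4AI's async crawl call:

```python
# Enforce the scraping bounds: max 20 pages/source, 30 s timeout,
# 1 req/s rate limit (rps is a parameter so tests can speed it up).
import asyncio

MAX_PAGES_PER_SOURCE = 20
TIMEOUT_S = 30
REQUESTS_PER_SECOND = 1

async def crawl_source(urls, fetch, rps=REQUESTS_PER_SECOND):
    """Fetch at most MAX_PAGES_PER_SOURCE pages sequentially, dropping
    any page that exceeds the timeout rather than stalling the run."""
    results = []
    for url in urls[:MAX_PAGES_PER_SOURCE]:
        try:
            results.append(await asyncio.wait_for(fetch(url), TIMEOUT_S))
        except asyncio.TimeoutError:
            continue  # skip slow pages, keep the pipeline moving
        await asyncio.sleep(1 / rps)  # simple rate limiting
    return results
```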

Chunking & Embedding

  • Semantic chunking — split by headings, then paragraphs, then sentences
  • Target: 256–512 tokens, 50-token overlap
  • Dense vectors: snowflake-arctic-embed-s (384-dim, L2-normalized)
  • Sparse vectors: BM25 tokenization via fastembed
  • Deterministic chunk IDs: sha256(url + chunk_index)
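A rough illustration of heading-first chunking and the deterministic IDs. Token counts are approximated by word counts here; the real pipeline counts tokens and adds the 50-token overlap:

```python
# Sketch: split on headings first, then paragraphs, and derive stable
# chunk IDs so re-ingesting the same page overwrites in place.
import hashlib
import re

def chunk_markdown(md: str, max_words: int = 400) -> list[str]:
    """Heading-first split, then pack paragraphs under the size target."""
    chunks = []
    for sec in re.split(r"(?m)^#{1,6} ", md):
        if not sec.strip():
            continue
        buf: list[str] = []
        for para in sec.split("\n\n"):
            words_in_buf = sum(len(p.split()) for p in buf)
            if buf and words_in_buf + len(para.split()) > max_words:
                chunks.append("\n\n".join(buf))
                buf = []
            buf.append(para)
        if buf:
            chunks.append("\n\n".join(buf))
    return chunks

def chunk_id(url: str, index: int) -> str:
    """Deterministic ID: sha256(url + chunk_index)."""
    return hashlib.sha256(f"{url}{index}".encode()).hexdigest()
```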

Phase 2: RAG Query

Retrieve → Augment → Ground → Cite

Hybrid Retrieval

User query
  → Dense embed + BM25 tokenize
  → Company filter (LLM-inferred from context)
  → Qdrant prefetch: top-20 dense + top-20 sparse
  → RRF fusion (k=60) → top-10
  → Similarity threshold (cosine ≥ 0.45)
  → ≤ 4,000 tokens context → LLM

RRF (Reciprocal Rank Fusion) combines dense and sparse rankings
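RRF itself is only a few lines; this is a plain-Python version of what the fusion step computes (in the project, Qdrant performs the fusion server-side):

```python
# Reciprocal Rank Fusion: each ranking contributes 1 / (k + rank) per
# document; k=60 damps the influence of top ranks, and documents that
# appear in both lists accumulate score from each.
def rrf_fuse(dense_ids, sparse_ids, k=60, top_n=10):
    scores: dict[str, float] = {}
    for ranking in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Note how "b" wins below despite never ranking first in either list: appearing high in both rankings beats topping only one.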

Chat Agent

  • Grounded answers — exclusively from retrieved context
  • Citations — every answer references source URL + title
  • Confidence — "I don't have enough information" when cosine < 0.45
  • Multi-turn — full history within 32K context window
  • Context budget — ≤ 3,000 tokens retrieved context per query
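The grounding gate and context budget can be sketched as follows; the thresholds come from the slide, while the word-count tokenizer is a simplification:

```python
# Grounding gate: drop low-similarity chunks and cap the context budget
# before prompting; an empty context triggers the refusal message.
MIN_COSINE = 0.45
MAX_CONTEXT_TOKENS = 3000

def build_context(hits):
    """hits: list of (cosine_score, chunk_text), best-first.
    Keep chunks above the threshold until the budget is spent."""
    kept, budget = [], MAX_CONTEXT_TOKENS
    for score, text in hits:
        if score < MIN_COSINE:
            continue
        cost = len(text.split())  # stand-in for a real token counter
        if cost > budget:
            break
        kept.append(text)
        budget -= cost
    return kept

def answer_or_refuse(hits):
    ctx = build_context(hits)
    return ctx if ctx else "I don't have enough information"
```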

Phase 3: Orchestration & Observability

.NET Aspire + OpenTelemetry

.NET Aspire

  • Polyglot orchestrator — Python (FastAPI) + Node.js (Next.js) + containers (Ollama, Qdrant)
  • Single dotnet run starts everything
  • Service discovery — connection strings injected automatically
  • Health monitoring via Aspire dashboard

OpenTelemetry

  • Instrumented via Logfire (HTTP/protobuf to Aspire dashboard)
  • Traces for: scraping, chunking, embedding, retrieval, generation
  • GenAI semantic conventions — token usage per LLM call
  • End-to-end distributed tracing across all services

Distributed Traces


Scraper Metrics


Phase 4: RAG Evaluation

Measure retrieval quality before it reaches the LLM

Evaluation Approach

Golden dataset (curated Q&A pairs with expected facts)
  → Ingest raw data into vector store
  → Run each query through retrieval pipeline
  → Check: did retrieved chunks contain expected facts?
  → Compute Hit Rate & Context Recall
  → Assert thresholds in CI

Key: Evaluation runs as an integration test — Aspire starts all services, eval runs end-to-end

Why Aspire for Eval?

  • Clean environment — fresh Qdrant container per test run, no stale data
  • Isolation — all services (Ollama, Qdrant, agent) spun up and torn down automatically
  • No port conflicts — Aspire assigns random ports, test discovers them via service name
  • One command — dotnet test boots the entire stack, runs eval, asserts thresholds
  • Same pipeline — eval exercises the real ingestion + retrieval

Benefit: Confidence that eval results reflect production behavior — not a simulated environment

Metrics

| Metric | What it measures |
| --- | --- |
| Hit Rate | % of queries where at least one expected fact is retrieved |
| Context Recall | Average % of expected facts found per query |

  • Substring matching — each expected fact checked against retrieved chunks
  • No LLM involvement — deterministic, fast
  • We could use LLM-as-judge for semantic matching, but substring is sufficient here
  • Golden dataset: 18 queries, 2–4 reference facts each
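Both metrics follow directly from their definitions; a sketch under the assumption that the golden dataset is a list of (query, expected_facts) pairs:

```python
# Hit Rate and Context Recall via case-insensitive substring matching,
# as described above: deterministic and fast, no LLM in the loop.
def evaluate(dataset, retrieve):
    """dataset: list of (query, expected_facts);
    retrieve(query) returns the retrieved chunk texts."""
    hits, recalls = 0, []
    for query, facts in dataset:
        text = " ".join(retrieve(query)).lower()
        found = sum(1 for fact in facts if fact.lower() in text)
        hits += found > 0                 # at least one fact retrieved
        recalls.append(found / len(facts))  # fraction of facts found
    return {
        "hit_rate": hits / len(dataset),
        "context_recall": sum(recalls) / len(recalls),
    }
```

In CI, the returned dict is what the integration test asserts thresholds against.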

Best Practices for RAG Evaluation

  • Golden datasets — curated Q&A pairs with ground-truth contexts
  • Separate retrieval from generation — measure each independently
  • LLM-as-judge — use strong LLM to score answer faithfulness & relevance
  • Multiple metrics — precision, recall, MRR, NDCG at different K values
  • Regression testing — run eval in CI, fail on quality drops
  • RAGAS framework — standardized metrics for RAG pipelines

Our trade-off: LLM-as-judge requires a fast, capable model — with local Qwen3 8B, substring matching gives reliable signal in seconds vs. minutes

Thank You

Questions?