AI Building Blocks in .NET

RAG Document Intelligence with Microsoft.Extensions.AI & Aspire

Oleksii Nikiforov

  • Lead Software Engineer at EPAM Systems
  • AI Engineering Coach
  • 10+ years in software development
  • Open Source and Blogging

nikiforovall
Oleksii Nikiforov
nikiforovall.blog

Problem Statement

  • AI integrations are tightly coupled to specific providers
  • Switching OpenAI → Ollama → Azure = rewriting code
  • RAG apps need: embeddings + vector store + LLM + evaluation
  • How do you build provider-agnostic AI apps in .NET?

.NET AI Building Blocks

The core abstractions for provider-agnostic AI

The Four Building Blocks

Block Package Purpose
MEAI Microsoft.Extensions.AI IChatClient, IEmbeddingGenerator
VectorData Microsoft.Extensions.VectorData VectorStoreCollection, search
Agent Framework Microsoft.Agents.AI AIAgent, AG-UI protocol
MCP Model Context Protocol Tool & resource discovery

IChatClient — Universal Chat Abstraction

// Register with any provider — one line change to swap
builder.AddOllamaApiClient("ollama-llama3-1").AddChatClient();

// Consume via DI — completely provider-agnostic
IChatClient chatClient = sp.GetRequiredService<IChatClient>();
var response = await chatClient.GetResponseAsync(messages);

Same interface works with OpenAI, Azure OpenAI, Ollama, Anthropic — swap the registration, keep the code

IEmbeddingGenerator — Vector Embeddings

// Register embedding provider
builder.AddOllamaApiClient("ollama-all-minilm").AddEmbeddingGenerator();

// Provider-agnostic embedding generation
IEmbeddingGenerator<string, Embedding<float>> embedder = ...;
var embeddings = await embedder.GenerateAsync(["some text"]);
// → one 384-dimensional Embedding<float> per input

Key: Same abstraction for OpenAI text-embedding-3-small, Ollama all-minilm, Azure, etc.
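What the embedder returns is just a float vector, so "semantically similar" reduces to vector math. A self-contained sketch (plain C#, no provider involved; the toy 4-dimension vectors are made up and stand in for real 384-dimension all-minilm output) of the cosine similarity that vector search builds on:

```csharp
using System;

// Cosine similarity: 1.0 = same direction, ~0 = unrelated, -1 = opposite.
double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot  += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
}

// Toy vectors: "cat" and "kitten" point in similar directions, "invoice" does not.
float[] cat     = [1f, 0.9f, 0.1f, 0f];
float[] kitten  = [0.9f, 1f, 0.2f, 0.1f];
float[] invoice = [0f, 0.1f, 1f, 0.9f];

Console.WriteLine(CosineSimilarity(cat, kitten) > CosineSimilarity(cat, invoice)); // True
```

This is exactly the comparison a vector store runs (at scale, with an index) when it ranks chunks against a query embedding.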

DelegatingChatClient — Middleware Pattern

sealed class LoggingChatClient(IChatClient inner, ILogger logger)
    : DelegatingChatClient(inner)
{
    public override async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken ct = default)
    {
        logger.LogInformation("Chat request: {Count} messages", messages.Count());
        var response = await base.GetResponseAsync(messages, options, ct);
        logger.LogInformation("Chat response: {Tokens} tokens",
            response.Usage?.TotalTokenCount);

        return response;
    }
}

DelegatingChatClient — Composable Pipeline

// Built-in middleware: logging, telemetry, caching

builder.Services.AddChatClient(chatClient)
    .UseOpenTelemetry()
    .UseLogging()
    .UseFunctionInvocation();

Like ASP.NET middleware but for AI calls — logging, caching, retry, telemetry as composable behaviors

VectorStore — Attribute-Driven Data Model

public sealed class DocumentRecord
{
    [VectorStoreKey]
    public Guid Id { get; set; } = Guid.NewGuid();

    [VectorStoreData]
    public string Text { get; set; } = "";

    [VectorStoreData(IsIndexed = true)]
    public string Source { get; set; } = "";

    [VectorStoreVector(Dimensions: 384)]
    public string Embedding => Text;  // auto-embedded!
}

Key: [VectorStoreVector] pointing to a string property = automatic embedding generation at upsert time

VectorStore — Provider Registration

builder.Services.AddSingleton<VectorStoreCollection<Guid, DocumentRecord>>(sp =>
{
    var embedder = sp.GetRequiredService<IEmbeddingGenerator<string, Embedding<float>>>();
    var vectorStore = new QdrantVectorStore(
        sp.GetRequiredService<QdrantClient>(),
        new QdrantVectorStoreOptions { EmbeddingGenerator = embedder });

    return vectorStore
        .GetCollection<Guid, DocumentRecord>("documents");
});

Swap QdrantVectorStore for InMemoryVectorStore, RedisVectorStore, AzureAISearchVectorStore — same interface

Building a RAG Application

PDF → Chunks → Vectors → Grounded Answers

Solution Architecture

Phase 1: INGEST                       Phase 2: QUERY
───────────────────────────           ────────────────────────
Upload PDF document                   "What does the report say about X?"
  → PdfPig extracts text                → Vector search (Qdrant)
  → Recursive chunking (128 tokens)     → Retrieve top-5 chunks
  → Embed (all-minilm, 384-dim)         → LLM generates grounded answer
  → Upsert to Qdrant                    → Citations from stored knowledge

All AI operations go through the building block abstractions: IChatClient, IEmbeddingGenerator, VectorStoreCollection

PDF Extraction → Chunking

// PdfPig — pure .NET, no native deps
var extraction = PdfTextExtractor.Extract(pdfStream);
// → Text with "## Page N" markers, page count

// Recursive chunking with overlap
var chunks = TextChunker.Chunk(extraction.Text);
// Target: 128 tokens | Max: 200 | Overlap: 25
// Separators: paragraphs → lines → sentences → words

Small chunks (128 tokens) improve retrieval precision — overlap ensures context continuity
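The sliding-window-with-overlap idea is easy to sketch. This is not the deck's TextChunker (which counts tokens and splits on paragraph/sentence boundaries first); it is a hypothetical word-based version, just to show how overlap carries context across chunk boundaries:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical word-based chunker: chunkSize words per chunk, and each chunk
// re-reads the last `overlap` words of the previous one (overlap < chunkSize).
List<string> Chunk(string text, int chunkSize = 8, int overlap = 2)
{
    var words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    var chunks = new List<string>();
    for (int start = 0; start < words.Length; start += chunkSize - overlap)
    {
        chunks.Add(string.Join(' ', words.Skip(start).Take(chunkSize)));
        if (start + chunkSize >= words.Length) break; // last window reached
    }
    return chunks;
}

var chunks = Chunk(string.Join(' ', Enumerable.Range(1, 20).Select(i => $"w{i}")));
Console.WriteLine(chunks.Count); // 3
Console.WriteLine(chunks[1]);    // starts at w7 w8, repeating chunk 0's tail
```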

Ingestion Pipeline

var records = chunks.Select((chunk, i) =>
    new DocumentRecord
    {
        Text = chunk,
        FileName = fileName,
        ChunkIndex = i,
        Source = source,
    });

await collection.UpsertAsync(records, ct);
// Embedding generated automatically via VectorStoreVector!

Zero embedding code: VectorStoreCollection generates embeddings at upsert using IEmbeddingGenerator

Document Ingestion


Vector Search Adapter

public static Func<string, CancellationToken,
    Task<IEnumerable<TextSearchResult>>>
    Create(VectorStoreCollection<Guid, DocumentRecord> collection,
           int top = 5) =>
    async (query, ct) =>
    {
        var results = new List<TextSearchResult>();
        await foreach (var result in
            collection.SearchAsync(query, top, cancellationToken: ct))
        {
            results.Add(new TextSearchResult
            {
                Text = result.Record.Text,
                SourceName = FormatSourceName(...),
            });
        }
        return results;
    };

Agentic RAG — TextSearchProvider

var searchFunc = VectorSearchAdapter.Create(collection, top: 5);

var chatAgent = chatClient.AsAIAgent(
    new ChatClientAgentOptions
    {
        Name = "agentic_chat",
        ChatOptions = new ChatOptions { Instructions = SystemInstructions },
        AIContextProviders = [
            new TextSearchProvider(searchFunc, ragOptions)
        ],
    });

app.MapAGUI("/", chatAgent);

TextSearchProvider auto-injects relevant context before each LLM call — retrieval is transparent to the agent

Chat — Q&A with Citations


.NET Aspire Orchestration

One command to start everything

AppHost — Full Stack Orchestration

var ollama = builder.AddOllama("ollama")
    .WithDataVolume()
    .WithLifetime(ContainerLifetime.Persistent);
var llama = ollama.AddModel("llama3.1");
var embedModel = ollama.AddModel("all-minilm");

var qdrant = builder.AddQdrant("qdrant")
    .WithLifetime(ContainerLifetime.Persistent);

var sqlite = builder.AddSqlite("sqlite"); // CommunityToolkit SQLite resource, referenced below

var api = builder.AddProject<Projects.CompanyIntel_Api>("api")
    .WithReference(llama).WithReference(embedModel)
    .WithReference(qdrant).WithReference(sqlite);

builder.AddJavaScriptApp("ui", "../CompanyIntel.UI", "dev")
    .WithEnvironment("AGENT_URL", api.GetEndpoint("http"));

aspire run — starts Ollama + model pulls + Qdrant + SQLite + API + Next.js UI

RAG Evaluation

Measuring quality with LLM-as-Judge

Why Evaluate RAG?

  • RAG quality is invisible without measurement
  • Retrieval failures → hallucinated or irrelevant answers
  • Need to catch regressions before users do
  • Four dimensions of quality:
Dimension Question
Relevance Does the answer address the question?
Coherence Is it well-structured and logical?
Groundedness Is every claim backed by context?
Retrieval Did we find the right chunks?

Microsoft.Extensions.AI.Evaluation

IEvaluator[] evaluators = [
    new RelevanceEvaluator(),
    new CoherenceEvaluator(),
    new GroundednessEvaluator(),
    new RetrievalEvaluator(),
];

  • Scoring: 1–5 scale via LLM-as-Judge
  • Judge LLM reads (question, answer, context) and scores each dimension
  • Returns NumericMetric with value, rating, and reasoning
  • Part of Microsoft.Extensions.AI.Evaluation.Quality package

Relevance (1–5)

Does the response address the user's question?

Score Meaning
5 Directly and completely answers the question
3 Partially relevant, missing key aspects
1 Completely off-topic or irrelevant

  • Judge reads (question, answer) — no context needed
  • Pure question-answer alignment check
  • Catches: wrong topic, incomplete answers, tangential responses

Coherence (1–5)

Is the answer well-organized and readable?

Score Meaning
5 Clear structure, logical flow, easy to follow
3 Somewhat organized but with inconsistencies
1 Disjointed, contradictory, hard to follow

  • Evaluates the answer in isolation — no context needed
  • Catches: repetitive text, broken formatting, contradictions
  • Important for user trust and readability

Groundedness (1–5)

Is every claim supported by the retrieved context?

Score Meaning
5 All claims directly supported by provided context
3 Some claims lack support or are extrapolated
1 Answer fabricates information not in context

Most critical metric for RAG — catches hallucinations. Requires GroundednessEvaluatorContext

Retrieval (1–5)

Are the retrieved chunks useful for answering?

Score Meaning
5 All chunks highly relevant to the question
3 Mix of relevant and irrelevant chunks
1 Retrieved chunks are useless for this question

  • Evaluates the retrieval pipeline, not the LLM answer
  • Requires RetrievalEvaluatorContext with chunk texts
  • Low score → fix chunking, embedding model, or search params
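For intuition only, here is a keyword-overlap stand-in for what RetrievalEvaluator measures; the real evaluator asks the judge LLM to rate the chunks, not string matching. Everything below (ScoreRetrieval, the linear mapping to 1–5) is invented for illustration:

```csharp
using System;
using System.Linq;

// Hypothetical heuristic: share of retrieved chunks that mention any
// query keyword, mapped onto the 1-5 scale the LLM judge also uses.
int ScoreRetrieval(string question, string[] chunks)
{
    var keywords = question.ToLowerInvariant()
        .Split(' ', StringSplitOptions.RemoveEmptyEntries)
        .Where(w => w.Length > 3).ToArray();
    int relevant = chunks.Count(c =>
        keywords.Any(k => c.ToLowerInvariant().Contains(k)));
    double fraction = chunks.Length == 0 ? 0 : (double)relevant / chunks.Length;
    return 1 + (int)Math.Round(fraction * 4); // 0% relevant → 1, 100% → 5
}

var score = ScoreRetrieval("What were Contoso revenues?",
    new[] { "Contoso revenues grew 12% in 2024.", "Unrelated boilerplate text." });
Console.WriteLine(score); // 3: one of two chunks mentions a query keyword
```

A mid-range score like this is the signal to tune top-k, chunk size, or the embedding model before blaming the LLM.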

Testing with Aspire

public class AspireAppFixture : IAsyncLifetime
{
    private DistributedApplication _app = null!;

    public HttpClient ApiClient { get; private set; } = null!;
    public Uri OllamaEndpoint { get; private set; } = null!;

    public async ValueTask InitializeAsync()
    {
        var appHost = await DistributedApplicationTestingBuilder
            .CreateAsync<Projects.CompanyIntel_AppHost>();
        _app = await appHost.BuildAsync();

        await _app.StartAsync();
        await _app.ResourceNotifications.WaitForResourceHealthyAsync("api");
        
        ApiClient = _app.CreateHttpClient("api", "http");
    }
}

Evaluation Test

// 1. Call RAG endpoint
var result = await client.PostAsJsonAsync("/api/chat",
    new { message = question });

// 2. Configure LLM judge
IChatClient judge = new OllamaApiClient(ollamaEndpoint, "llama3.1");
var chatConfig = new ChatConfiguration(judge);

// 3. Provide grounding context
var context = new EvaluationContext[] {
    new GroundednessEvaluatorContext(groundingContext),
    new RetrievalEvaluatorContext(retrievedChunks),
};

// 4. Score each dimension
foreach (var evaluator in evaluators) {
    var evalResult = await evaluator.EvaluateAsync(
        messages, modelResponse, chatConfig, context, ct);
    var metric = evalResult.Get<NumericMetric>(metricName);
    Assert.True(metric.Value >= 3);
}

End-to-End Evaluation Flow

dotnet test

  → Aspire starts Ollama + Qdrant + API (ephemeral)
  → Upload test PDF (contoso-corp.pdf)
  → For each eval scenario:
      → POST /api/chat with question
      → Get (answer, context chunks)
      → Judge LLM scores 4 metrics (1–5)
      → Assert all metrics ≥ 3

  → Aspire tears down everything

Key Takeaways

  1. MEAI abstractions: IChatClient, IEmbeddingGenerator. Write once, swap providers

  2. VectorData attributes: [VectorStoreKey/Data/Vector] + automatic embedding = minimal boilerplate

  3. DelegatingChatClient — middleware for AI calls. Logging, retry, telemetry as pipeline behaviors

  4. Aspire for testing — spin up full RAG stack per test run. Ephemeral containers = clean, reproducible eval

  5. LLM-as-Judge — measure Relevance, Coherence, Groundedness, Retrieval before shipping

Resources

Thank You

Questions?