RAG (Retrieval-Augmented Generation)

Retrieval-Augmented Generation (RAG) is a technique that enhances LLM responses by retrieving relevant information from external knowledge sources. Mindwave provides a flexible, Laravel-native RAG implementation through its Context Discovery system.

Overview

RAG allows your AI applications to answer questions using your application's data, documents, and knowledge bases, not just the model's training data. This enables accurate, up-to-date responses grounded in your actual information.

What is RAG?

Basic Concept: Instead of relying solely on what an LLM learned during training, RAG retrieves relevant information from your knowledge sources and includes it in the prompt, allowing the model to generate responses based on current, accurate data.

The RAG Flow:

mermaid

flowchart LR
    A[User asks a question<br/><em>What is our refund policy?</em>] --> B[System searches<br/>knowledge base]
    B --> C[Context injected<br/>into prompt]
    C --> D[LLM generates<br/>response]

    style A fill:#e1f5ff
    style D fill:#e7f9e7

Flow Steps:

User asks a question → "What is our refund policy?"
System searches knowledge base → Finds relevant policy documents
Context is injected into prompt → Policy text added to the prompt
LLM generates response → Answer based on actual policy
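
The four steps above can be sketched in plain PHP without any Mindwave classes. This is only an illustration under simple assumptions: the helper names (`tokenize`, `retrieve`, `buildPrompt`) are hypothetical, retrieval is a naive word-overlap count rather than a real search index, and the final LLM call is left out.

```php
<?php
// Plain-PHP sketch of the RAG flow; none of these helpers are Mindwave APIs.

/** Split text into lowercase word tokens. */
function tokenize(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

/** Step 2: rank documents by how many distinct query words they contain. */
function retrieve(string $query, array $documents, int $limit): array
{
    $queryWords = array_unique(tokenize($query));
    $scored = [];

    foreach ($documents as $doc) {
        $overlap = count(array_intersect($queryWords, tokenize($doc)));
        if ($overlap > 0) {
            $scored[] = ['content' => $doc, 'score' => $overlap];
        }
    }

    // Highest overlap first
    usort($scored, fn (array $a, array $b) => $b['score'] <=> $a['score']);

    return array_column(array_slice($scored, 0, $limit), 'content');
}

/** Step 3: inject the retrieved text into the prompt. */
function buildPrompt(string $question, array $context): string
{
    return "Answer using only this context:\n- " . implode("\n- ", $context)
        . "\n\nQuestion: {$question}";
}

$documents = [
    'Our refund policy: Full refund within 30 days of purchase',
    'Shipping takes 3-5 business days for domestic orders',
    'We accept Visa, Mastercard, and PayPal',
];

$question = 'What is our refund policy?';        // Step 1: the user's question
$context  = retrieve($question, $documents, 2);  // Step 2: search the knowledge base
$prompt   = buildPrompt($question, $context);    // Step 3: inject context
// Step 4 would send $prompt to an LLM; that call is omitted here.
echo $prompt, PHP_EOL;
```

Only the refund document shares words with the question, so it is the only text injected into the prompt.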

Why Use RAG?

Up-to-Date Information

LLMs have a knowledge cutoff date
RAG provides access to current information
Your data stays fresh without retraining models

Domain-Specific Knowledge

Access proprietary company information
Use specialized technical documentation
Leverage customer support history

Reduced Hallucination

Responses grounded in actual documents
Source attribution for fact-checking
Verifiable information

Cost-Effective

No expensive fine-tuning required
Update knowledge by adding documents
Pay only for retrieval and generation

Quick Start

Basic RAG Example

php

use Mindwave\Mindwave\Facades\Mindwave;
use Mindwave\Mindwave\Context\Sources\TntSearch\TntSearchSource;

// 1. Create a searchable knowledge base
$faqSource = TntSearchSource::fromArray([
    'Our refund policy: Full refund within 30 days of purchase',
    'Shipping takes 3-5 business days for domestic orders',
    'We accept Visa, Mastercard, and PayPal',
]);

// 2. Use context in a prompt
$response = Mindwave::prompt()
    ->section('system', 'Answer customer questions using the FAQ context')
    ->context($faqSource, query: 'refund policy', limit: 3)
    ->section('user', 'What is your refund policy?')
    ->run();

echo $response->content;
// Example output (model responses vary): "We offer a full refund within 30 days of purchase."

Using Brain for Semantic Search

php

use Mindwave\Mindwave\Facades\Mindwave;
use Mindwave\Mindwave\Context\Sources\VectorStoreSource;
use Mindwave\Mindwave\Document\Data\Document;

// 1. Store documents in Brain
$brain = Mindwave::brain();
$brain->consumeAll([
    Document::make('Laravel is a PHP web framework with elegant syntax'),
    Document::make('Eloquent ORM provides intuitive database access'),
    Document::make('Blade is Laravel\'s powerful templating engine'),
]);

// 2. Create vector store source
$vectorSource = VectorStoreSource::fromBrain($brain, name: 'laravel-docs');

// 3. Use semantic search in prompts
$response = Mindwave::prompt()
    ->context($vectorSource, query: 'database queries')  // Finds "Eloquent ORM" semantically
    ->section('user', 'How do I query the database?')
    ->run();

Mindwave's RAG Architecture

Mindwave implements RAG through its Context Discovery system, which consists of three layers:

1. Context Sources

Context sources are searchable knowledge stores:

TNTSearch Source - Full-text keyword search

php

use App\Models\SupportTicket;
use Mindwave\Mindwave\Context\Sources\TntSearch\TntSearchSource;

$source = TntSearchSource::fromEloquent(
    SupportTicket::where('status', 'resolved'),
    fn($ticket) => "Q: {$ticket->question}\nA: {$ticket->answer}"
);

Vector Store Source - Semantic similarity search

php

use Mindwave\Mindwave\Context\Sources\VectorStoreSource;
use Mindwave\Mindwave\Facades\Mindwave;

$vectorSource = VectorStoreSource::fromBrain(
    Mindwave::brain('documentation'),
    name: 'docs'
);

Static Source - In-memory keyword matching

php

use Mindwave\Mindwave\Context\Sources\StaticSource;

$staticSource = StaticSource::fromStrings([
    'Our office hours are 9 AM - 5 PM EST',
    'Support tickets are answered within 24 hours',
]);

Eloquent Source - SQL LIKE search

php

use App\Models\User;
use Mindwave\Mindwave\Context\Sources\EloquentSource;

$eloquentSource = EloquentSource::create(
    User::where('active', true),
    searchColumns: ['name', 'bio'],
    transformer: fn($user) => "{$user->name}: {$user->bio}"
);

2. Context Pipeline

Combine multiple sources with deduplication and ranking:

php

use Mindwave\Mindwave\Context\ContextPipeline;

$pipeline = (new ContextPipeline)
    ->addSource($tntSearchSource)      // Keyword search
    ->addSource($vectorStoreSource)    // Semantic search
    ->addSource($staticSource)         // Static FAQs
    ->deduplicate(true)                // Remove duplicates
    ->rerank(true);                    // Sort by relevance

$results = $pipeline->search('user authentication', limit: 10);
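
To make deduplication and reranking concrete, here is a plain-PHP sketch of what merging results from several sources can look like. It is not Mindwave's implementation: the `mergeResults` helper and the `['content' => ..., 'score' => ...]` result shape are assumptions for illustration, and it treats scores from different sources as directly comparable, which a real pipeline may need to normalize first.

```php
<?php
// Hypothetical result shape: ['content' => string, 'score' => float].

/** Merge results from several sources, drop duplicates, sort by score. */
function mergeResults(array $resultSets, int $limit): array
{
    $best = [];

    foreach ($resultSets as $results) {
        foreach ($results as $item) {
            // Normalize whitespace and case so near-identical text maps to one key.
            $key = md5(strtolower(trim(preg_replace('/\s+/', ' ', $item['content']))));

            // Keep the highest-scoring copy of each duplicate.
            if (!isset($best[$key]) || $item['score'] > $best[$key]['score']) {
                $best[$key] = $item;
            }
        }
    }

    $merged = array_values($best);
    usort($merged, fn (array $a, array $b) => $b['score'] <=> $a['score']);

    return array_slice($merged, 0, $limit);
}

$keywordHits = [
    ['content' => 'Users log in with email and password', 'score' => 0.62],
    ['content' => 'Password resets expire after 60 minutes', 'score' => 0.40],
];
$semanticHits = [
    ['content' => 'users log in with  email and password', 'score' => 0.91],
    ['content' => 'OAuth tokens are refreshed automatically', 'score' => 0.75],
];

$top = mergeResults([$keywordHits, $semanticHits], 3);

foreach ($top as $item) {
    echo $item['score'], '  ', $item['content'], PHP_EOL;
}
```

The login sentence appears in both result sets with different casing and spacing; only its higher-scoring copy survives, so three results remain.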

3. PromptComposer Integration

Context sources integrate seamlessly with PromptComposer:

php

use Mindwave\Mindwave\Facades\Mindwave;

$response = Mindwave::prompt()
    ->section('system', 'You are a helpful assistant', priority: 100)
    ->context($pipeline, priority: 75, limit: 5)  // Auto-managed context
    ->section('user', 'How do I reset my password?', priority: 100)
    ->reserveOutputTokens(500)
    ->fit()  // Automatically fits within token budget
    ->run();

Common RAG Patterns

1. Customer Support Bot

php

use App\Models\SupportTicket;
use Mindwave\Mindwave\Context\Sources\TntSearch\TntSearchSource;
use Mindwave\Mindwave\Facades\Mindwave;

class SupportBot
{
    public function answer(string $question): string
    {
        // Search past support tickets
        $knowledgeBase = TntSearchSource::fromEloquent(
            SupportTicket::where('status', 'resolved')
                ->where('rating', '>=', 4),
            fn($ticket) => "Q: {$ticket->title}\nA: {$ticket->resolution}"
        );

        return Mindwave::prompt()
            ->section('system', 'You are a support agent. Use past tickets to help.')
            ->context($knowledgeBase, limit: 3)
            ->section('user', $question)
            ->run()
            ->content;
    }
}

$bot = new SupportBot();
echo $bot->answer('How do I change my email address?');

2. Document Q&A System

php

use Mindwave\Mindwave\Context\Sources\VectorStoreSource;
use Mindwave\Mindwave\Facades\DocumentLoader;
use Mindwave\Mindwave\Facades\Mindwave;

class DocumentQA
{
    protected $brain;

    public function __construct()
    {
        $this->brain = Mindwave::brain('company-docs');
    }

    public function indexDocuments(): void
    {
        // Load various document types
        $this->brain->consumeAll([
            DocumentLoader::fromPdf('/path/to/employee-handbook.pdf'),
            DocumentLoader::fromUrl('https://docs.company.com/policies'),
            DocumentLoader::fromWord('/path/to/guidelines.docx'),
        ]);
    }

    public function ask(string $question): string
    {
        $vectorSource = VectorStoreSource::fromBrain(
            $this->brain,
            name: 'company-docs'
        );

        return Mindwave::prompt()
            ->section('system', 'Answer based on company documentation')
            ->context($vectorSource, limit: 5)
            ->section('user', $question)
            ->run()
            ->content;
    }
}

3. Multi-Source Knowledge Base

php

use App\Models\FAQ;
use Mindwave\Mindwave\Context\ContextPipeline;
use Mindwave\Mindwave\Context\Sources\{TntSearch\TntSearchSource, VectorStoreSource, StaticSource};
use Mindwave\Mindwave\Facades\Mindwave;

class KnowledgeAssistant
{
    public function answer(string $query): string
    {
        // Source 1: Static company policies
        $policies = StaticSource::fromItems([
            [
                'content' => 'Vacation policy: 15 days PTO per year',
                'keywords' => ['vacation', 'pto', 'time off'],
            ],
        ]);

        // Source 2: FAQ database
        // Pass a query builder, as in the other fromEloquent() examples
        $faq = TntSearchSource::fromEloquent(
            FAQ::query(),
            fn($faq) => "Q: {$faq->question}\nA: {$faq->answer}"
        );

        // Source 3: Semantic document search
        $docs = VectorStoreSource::fromBrain(
            Mindwave::brain('documentation')
        );

        // Combine all sources
        $pipeline = (new ContextPipeline)
            ->addSource($policies)
            ->addSource($faq)
            ->addSource($docs)
            ->deduplicate()
            ->rerank();

        return Mindwave::prompt()
            ->context($pipeline, query: $query, limit: 8)
            ->section('user', $query)
            ->run()
            ->content;
    }
}

Choosing the Right Approach

TNTSearch vs Vector Stores

Use TNTSearch when:

Queries target specific terms or product names
Data is structured
The dataset has fewer than 10,000 documents
Search must be fast and low-cost
Keywords matter more than meaning

Use Vector Stores when:

Queries are phrased in natural language
Results need semantic understanding
The knowledge base is large (more than 10,000 documents)
Multi-language support is needed
You are building conversational interfaces
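
The difference comes from how each approach scores a match. Keyword search looks for shared terms, while a vector store compares embedding vectors, commonly by cosine similarity. The sketch below shows that comparison in plain PHP. The three-dimensional "embeddings" are invented toy values for illustration; real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```php
<?php
/** Cosine similarity between two equal-length numeric vectors. */
function cosineSimilarity(array $a, array $b): float
{
    if (count($a) === 0 || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and the same length');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0; // A zero vector has no direction to compare
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

// Toy vectors, made up for this example only.
$query = [0.9, 0.1, 0.0]; // stands in for "database queries"
$docs = [
    'Eloquent ORM provides intuitive database access' => [0.8, 0.2, 0.1],
    'Blade is Laravel\'s powerful templating engine'  => [0.1, 0.1, 0.9],
];

// array_map() keeps the string keys when given a single array.
$scores = array_map(fn (array $vector) => cosineSimilarity($query, $vector), $docs);
arsort($scores); // Highest similarity first

$bestMatch = array_key_first($scores);
echo $bestMatch, PHP_EOL;
```

The query shares no keywords with the Eloquent sentence beyond "database", yet its vector points in nearly the same direction, which is why semantic search can surface it.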

Hybrid Approach (Best of Both)

php

$pipeline = (new ContextPipeline)
    ->addSource($tntSearchSource)      // Find exact keyword matches
    ->addSource($vectorStoreSource)    // Find semantic matches
    ->deduplicate()                    // Remove overlaps
    ->rerank();                        // Best results first

// Gets both exact matches AND semantically related content
$response = Mindwave::prompt()
    ->context($pipeline, limit: 10)
    ->section('user', 'OAuth authentication flow')
    ->run();

Best Practices

1. Optimize Chunk Sizes

php

use Mindwave\Mindwave\TextSplitters\RecursiveCharacterTextSplitter;

$splitter = new RecursiveCharacterTextSplitter(
    chunkSize: 512,        // ~128 tokens (4 chars per token)
    chunkOverlap: 50       // Overlap for context continuity
);

$chunks = $splitter->splitText($largeDocument);

// Guidelines:
// - 512-1024 chars for semantic search
// - 1000-2000 chars for keyword search
// - 50-100 char overlap between chunks
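
To see how chunk size and overlap interact, the following plain-PHP sketch splits text with a fixed-size sliding window. It is a simplification, not the `RecursiveCharacterTextSplitter` algorithm (which, as its name suggests, splits recursively on separators): the `chunkText` helper is hypothetical and counts bytes with `strlen`, so multibyte text would need the `mb_*` functions instead.

```php
<?php
/** Split text into fixed-size character chunks that overlap by $overlap characters. */
function chunkText(string $text, int $chunkSize, int $overlap): array
{
    if ($chunkSize <= 0 || $overlap < 0 || $overlap >= $chunkSize) {
        throw new InvalidArgumentException('Require 0 <= overlap < chunkSize');
    }

    $step = $chunkSize - $overlap; // How far the window advances each time
    $length = strlen($text);
    $chunks = [];

    for ($start = 0; $start < $length; $start += $step) {
        $chunks[] = substr($text, $start, $chunkSize);

        if ($start + $chunkSize >= $length) {
            break; // This window already reached the end of the text
        }
    }

    return $chunks;
}

$document = str_repeat('a', 1200);
$chunks = chunkText($document, 512, 50);

foreach ($chunks as $i => $chunk) {
    // Rough token estimate using the 4-characters-per-token rule of thumb
    printf("chunk %d: %d chars, ~%d tokens\n", $i, strlen($chunk), (int) ceil(strlen($chunk) / 4));
}
```

With a 512-character window and 50 characters of overlap, the window advances 462 characters per step, so a 1,200-character document yields three chunks of 512, 512, and 276 characters.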

2. Manage Token Budgets

php

Mindwave::prompt()
    // Critical sections (always included)
    ->section('system', $systemPrompt, priority: 100)
    ->section('user', $userQuery, priority: 100)

    // Context (will shrink if needed)
    ->context($source, priority: 75, limit: 10)

    // Examples (removed first if space needed)
    ->section('examples', $examples, priority: 50)

    // Reserve space for response
    ->reserveOutputTokens(1000)

    // Auto-fit to model's context window
    ->fit()
    ->run();
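
The priority idea can be illustrated with a small plain-PHP sketch. It is deliberately simpler than what the comments above describe: instead of shrinking a section, it drops any section that does not fit, and it estimates tokens with the 4-characters-per-token rule of thumb. The `estimateTokens` and `fitSections` helpers and the 1,800-token window are assumptions for this example, not Mindwave APIs.

```php
<?php
/** Rough token estimate using the 4-characters-per-token rule of thumb. */
function estimateTokens(string $text): int
{
    return (int) ceil(strlen($text) / 4);
}

/**
 * Keep sections in priority order until the budget is spent.
 * Sections that do not fit in the remaining budget are dropped.
 */
function fitSections(array $sections, int $contextWindow, int $reservedOutput): array
{
    $budget = $contextWindow - $reservedOutput;

    // Highest priority first; sorting is stable in PHP 8, so ties keep input order.
    uasort($sections, fn (array $a, array $b) => $b['priority'] <=> $a['priority']);

    $kept = [];
    foreach ($sections as $name => $section) {
        $cost = estimateTokens($section['text']);
        if ($cost <= $budget) {
            $kept[] = $name;
            $budget -= $cost;
        }
    }

    return $kept;
}

$sections = [
    'system'   => ['priority' => 100, 'text' => str_repeat('s', 400)],  // ~100 tokens
    'user'     => ['priority' => 100, 'text' => str_repeat('u', 200)],  // ~50 tokens
    'context'  => ['priority' => 75,  'text' => str_repeat('c', 2000)], // ~500 tokens
    'examples' => ['priority' => 50,  'text' => str_repeat('e', 1600)], // ~400 tokens
];

// 1,800-token window minus 1,000 reserved for the response leaves 800 for input.
$kept = fitSections($sections, 1800, 1000);
echo implode(', ', $kept), PHP_EOL;
```

The system, user, and context sections use about 650 of the 800 available tokens, so the lowest-priority examples section (about 400 tokens) is the one left out.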

3. Monitor Costs

php

use Mindwave\Mindwave\Observability\Models\Trace;

// Track daily RAG costs
$cost = Trace::whereDate('created_at', today())
    ->sum('estimated_cost');

echo "Today's RAG cost: \${$cost}";

// Alert on high costs
if ($cost > 10.00) {
    // Send alert or throttle
}

4. Cache Common Queries

php

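use Illuminate\Support\Facades\Cache;
use Mindwave\Mindwave\Facades\Mindwave;

class CachedRAG
{
    // $source is any context source, e.g. a TntSearchSource or VectorStoreSource
    public function __construct(protected $source)
    {
    }

    public function answer(string $question): string
    {
        $cacheKey = 'rag:' . md5($question);

        // Identical questions are served from the cache for 24 hours
        return Cache::remember($cacheKey, now()->addHours(24), function () use ($question) {
            return Mindwave::prompt()
                ->context($this->source, query: $question)
                ->section('user', $question)
                ->run()
                ->content;
        });
    }
}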
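
A key built from the raw question only hits the cache when the text matches byte for byte. One possible refinement, sketched here in plain PHP, is to normalize the question before hashing so that trivial variations share an entry. The `ragCacheKey` helper is hypothetical, and how aggressively to normalize is a judgment call for your application.

```php
<?php
/** Build a cache key that ignores case, extra whitespace, and trailing punctuation. */
function ragCacheKey(string $question): string
{
    $normalized = strtolower(trim($question));
    $normalized = preg_replace('/\s+/', ' ', $normalized); // Collapse runs of whitespace
    $normalized = rtrim($normalized, '?!. ');               // Drop trailing punctuation

    return 'rag:' . md5($normalized);
}

$a = ragCacheKey('What is your refund policy?');
$b = ragCacheKey('  what is your   refund policy ');
$c = ragCacheKey('What is your shipping policy?');

var_dump($a === $b); // Same question, different formatting
var_dump($a === $c); // Different question
```

The first two questions differ only in casing, spacing, and punctuation, so they map to the same key, while the shipping question gets its own.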

5. Test Retrieval Quality

php

use Mindwave\Mindwave\Context\Sources\TntSearch\TntSearchSource;
use Tests\TestCase;

class RAGTest extends TestCase
{
    /** @test */
    public function it_retrieves_relevant_documents()
    {
        $source = TntSearchSource::fromArray([
            'Laravel provides Eloquent ORM',
            'Vue.js is a JavaScript framework',
        ]);

        $results = $source->search('database ORM', limit: 3);

        $this->assertGreaterThan(0, $results->count());
        $this->assertStringContainsString('Eloquent', $results->first()->content);
    }
}

Performance Optimization

Indexing Strategies

Ephemeral Indexes (TNTSearch)

Created per-request
Good for < 10k documents
No management required

Persistent Indexes (Brain/Vector Stores)

Pre-compute embeddings
Scales to millions of documents
Fast searches, higher setup cost

Batch Processing

php

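use App\Models\Document; // Your Eloquent model (assumed name)
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Mindwave\Mindwave\Facades\Mindwave;

class IndexDocumentsJob implements ShouldQueue
{
    use Queueable;

    public function handle(): void
    {
        $brain = Mindwave::brain('documents');

        // chunkById() is used because the loop updates the same `indexed`
        // column the query filters on; plain chunk() can skip rows in that case.
        Document::where('indexed', false)
            ->chunkById(50, function ($documents) use ($brain) {
                foreach ($documents as $doc) {
                    // toMindwaveDocument() is assumed to be defined on your model
                    $brain->consume($doc->toMindwaveDocument());
                    $doc->update(['indexed' => true]);
                }
            });
    }
}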

Rate Limiting

php

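use Illuminate\Cache\RateLimiting\Limit;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\RateLimiter;

// Option A: named limiter, applied to routes with the `throttle:rag-search` middleware
RateLimiter::for('rag-search', function (Request $request) {
    return Limit::perMinute(30)->by($request->user()?->id ?: $request->ip());
});

// Option B: manual check inside a controller action where $request is available
$key = 'rag-search:' . $request->user()->id;

if (RateLimiter::tooManyAttempts($key, 30)) {
    abort(429, 'Too many requests');
}

RateLimiter::hit($key, 60); // Count this request; the counter resets after 60 seconds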

Troubleshooting

Poor Retrieval Quality

Problem: Search returns irrelevant results.

Solutions:

php

// 1. Try semantic search instead of keyword
$vectorSource = VectorStoreSource::fromBrain($brain);

// 2. Combine multiple sources
$pipeline = (new ContextPipeline)
    ->addSource($keywordSource)
    ->addSource($semanticSource);

// 3. Increase result limit
->context($source, limit: 10)  // More results

// 4. Filter by score threshold
$results = $source->search($query, 10)
    ->filter(fn($item) => $item->score > 0.7);

Token Budget Issues

Problem: Context exceeds token limits.

Solutions:

php

// 1. Reduce context limit
->context($source, limit: 3)  // Fewer results

// 2. Set lower priority
->context($source, priority: 50)  // Shrinks first

// 3. Reserve more output tokens
->reserveOutputTokens(1000)

// 4. Use smaller chunks
$splitter = new RecursiveCharacterTextSplitter(chunkSize: 300);

High Costs

Problem: RAG queries are too expensive.

Solutions:

php

// 1. Use cheaper models
->model('gpt-4o-mini')

// 2. Reduce context size
->context($source, limit: 3)

// 3. Cache aggressively
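Cache::remember('rag:' . md5($query), now()->addDay(), $generateAnswer);
// $generateAnswer is a placeholder for a closure that runs the prompt and returns the answer

// 4. Use TNTSearch instead of vector search
// TNTSearch runs locally with no per-query fee; vector search usually needs embedding API calls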

Next Steps

Learn More

Brain - Long-term knowledge storage
Context Discovery - Complete RAG guide
PromptComposer - Token-aware prompts
Tracing - Monitor RAG performance

Deep Dive

TNTSearch Source - Full-text search details
Vector Store Source - Semantic search guide
Context Pipeline - Multi-source aggregation
Custom Sources - Build your own

Examples

Start with a simple example and expand:

Basic FAQ Bot - Use StaticSource with 10-20 FAQs
Support Ticket Search - TNTSearch over support tickets
Document Q&A - Brain with PDFs and docs
Multi-Source Agent - Combine multiple sources with pipeline
Production System - Add caching, monitoring, and optimization
