Testing AI Applications

Testing LLM-powered applications requires different strategies than traditional software testing. This guide covers comprehensive testing approaches for Mindwave applications, from unit tests to end-to-end RAG system validation.

Overview

AI application testing presents unique challenges:

Non-deterministic outputs - LLMs can produce different responses for identical inputs
Complex dependencies - RAG systems involve embeddings, vector stores, and retrieval pipelines
Cost considerations - Real API calls during testing add up quickly
Quality metrics - Traditional assertions don't capture semantic correctness

This guide provides practical patterns for:

Unit Testing - Test individual components in isolation
Integration Testing - Test component interactions without real APIs
Mocking Strategies - Simulate LLM responses efficiently
RAG Testing - Validate retrieval quality and accuracy
Evaluation Metrics - Measure semantic correctness
CI/CD Integration - Automate testing in your pipeline

Testing Strategies

1. Unit Testing with Fake Driver

The fake LLM driver allows fast, deterministic testing without API calls.

Basic Setup

php

<?php

namespace Tests\Feature;

use Tests\TestCase;
use Mindwave\Mindwave\Facades\Mindwave;
use Mindwave\Mindwave\Testing\Fakes\FakeLLM;

class ChatbotTest extends TestCase
{
    protected function setUp(): void
    {
        parent::setUp();

        // Use fake driver for testing
        config(['mindwave-llm.default' => 'fake']);
    }

    /** @test */
    public function it_generates_a_greeting()
    {
        $response = Mindwave::llm()
            ->generateText('Say hello');

        $this->assertNotEmpty($response);
        $this->assertIsString($response);
    }
}

Controlling Fake Responses

php

use Mindwave\Mindwave\Testing\Fakes\FakeLLM;

/** @test */
public function it_summarizes_text()
{
    // Set a specific fake response
    FakeLLM::fake([
        'response' => 'This is a test summary.',
        'usage' => [
            'prompt_tokens' => 50,
            'completion_tokens' => 10,
            'total_tokens' => 60,
        ],
    ]);

    $service = new DocumentSummarizer();
    $summary = $service->summarize('Long document text...');

    $this->assertEquals('This is a test summary.', $summary);
}

Testing Multiple Calls

php

/** @test */
public function it_handles_conversation()
{
    // Queue multiple fake responses
    FakeLLM::fake([
        ['response' => 'Hello! How can I help you?'],
        ['response' => 'I can answer questions about Laravel.'],
        ['response' => 'Laravel is a PHP framework.'],
    ]);

    $chatbot = new Chatbot();

    $response1 = $chatbot->sendMessage('Hi');
    $response2 = $chatbot->sendMessage('What can you do?');
    $response3 = $chatbot->sendMessage('What is Laravel?');

    $this->assertEquals('Hello! How can I help you?', $response1);
    $this->assertEquals('I can answer questions about Laravel.', $response2);
    $this->assertEquals('Laravel is a PHP framework.', $response3);
}
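
The queueing behaviour above can be pictured with a small stand-in class. This is a minimal sketch in plain PHP, not Mindwave's actual `FakeLLM`: the class name `QueuedFakeLlm`, its `prompts` property, and the choice to throw once the queue runs dry are all assumptions made for illustration.

```php
<?php

// Hypothetical stand-in for illustration; this is NOT Mindwave's FakeLLM.
final class QueuedFakeLlm
{
    /** @var list<string> Responses still waiting to be returned. */
    private array $queue;

    /** @var list<string> Prompts received, in call order. */
    public array $prompts = [];

    /** @param list<string> $responses */
    public function __construct(array $responses)
    {
        $this->queue = array_values($responses);
    }

    public function generateText(string $prompt): string
    {
        // Record the prompt so a test can assert on what was sent.
        $this->prompts[] = $prompt;

        if ($this->queue === []) {
            throw new UnderflowException('No fake responses left in the queue.');
        }

        // Hand out responses in the order they were queued.
        return array_shift($this->queue);
    }
}

$llm = new QueuedFakeLlm([
    'Hello! How can I help you?',
    'Laravel is a PHP framework.',
]);

$first = $llm->generateText('Hi');
$second = $llm->generateText('What is Laravel?');
```

Failing loudly on an empty queue is a deliberate choice in this sketch: a test that makes more calls than it queued responses for usually points to an unexpected extra LLM call.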

2. Integration Testing

Test component interactions while mocking expensive external calls.

Testing PromptComposer

php

use Mindwave\Mindwave\Facades\Mindwave;
use Mindwave\Mindwave\Testing\Fakes\FakeLLM;

/** @test */
public function it_builds_prompt_with_context()
{
    FakeLLM::fake();

    $composer = Mindwave::prompt()
        ->section('system', 'You are helpful')
        ->section('context', $this->getSampleContext())
        ->section('user', 'What is Laravel?')
        ->model('gpt-4');

    // Assert prompt structure
    $messages = $composer->toMessages();

    $this->assertCount(3, $messages);
    $this->assertEquals('system', $messages[0]['role']);
    $this->assertEquals('You are helpful', $messages[0]['content']);
}

/** @test */
public function it_fits_prompt_to_token_limit()
{
    FakeLLM::fake();

    $largeContext = str_repeat('Context information. ', 1000);

    $composer = Mindwave::prompt()
        ->section('system', 'You are helpful', priority: 100)
        ->section('context', $largeContext, priority: 50, shrinker: 'truncate')
        ->section('user', 'Question?', priority: 100)
        ->model('gpt-4')
        ->reserveOutputTokens(1000)
        ->fit();

    $tokenCount = $composer->getTokenCount();

    // The original GPT-4 window is 8,192 tokens; reserving 1,000 leaves 7,192,
    // so 7,000 is a conservative upper bound
    $this->assertLessThanOrEqual(7000, $tokenCount);
}
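
What a priority-based fit does can be sketched in plain PHP. The example below is an illustration under stated assumptions, not PromptComposer's real algorithm: `estimateTokens()` and `fitSections()` are hypothetical helpers, tokens are approximated as whitespace-separated words, and truncation simply drops words from the end of the lowest-priority section first.

```php
<?php

/**
 * Crude token estimate: one token per whitespace-separated word.
 * Real tokenizers differ; this is only for illustration.
 */
function estimateTokens(string $text): int
{
    return count(preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY));
}

/**
 * Truncate the lowest-priority sections first until the total fits the budget.
 *
 * @param array<string, array{content: string, priority: int}> $sections
 * @return array<string, string> section name => fitted content
 */
function fitSections(array $sections, int $budget): array
{
    $fitted = array_map(fn (array $s): string => $s['content'], $sections);

    // Visit sections from lowest to highest priority.
    $order = array_keys($sections);
    usort($order, fn (string $a, string $b): int =>
        $sections[$a]['priority'] <=> $sections[$b]['priority']);

    foreach ($order as $name) {
        $excess = array_sum(array_map('estimateTokens', $fitted)) - $budget;
        if ($excess <= 0) {
            break; // Everything fits; leave the remaining sections untouched.
        }

        // Drop just enough words from the end of this section.
        $words = preg_split('/\s+/', trim($fitted[$name]), -1, PREG_SPLIT_NO_EMPTY);
        $keep = max(0, count($words) - $excess);
        $fitted[$name] = implode(' ', array_slice($words, 0, $keep));
    }

    return $fitted;
}

$fitted = fitSections([
    'system'  => ['content' => 'You are helpful', 'priority' => 100],
    'context' => ['content' => str_repeat('Context information. ', 1000), 'priority' => 50],
    'user'    => ['content' => 'Question?', 'priority' => 100],
], 500);

$total = array_sum(array_map('estimateTokens', $fitted));
```

Here only the `context` section is shortened, which is the property the test above checks indirectly through the token count.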

Testing Context Discovery

php

use Mindwave\Mindwave\Context\Sources\TntSearch\TntSearchSource;

/** @test */
public function it_retrieves_relevant_documents()
{
    $source = TntSearchSource::fromArray([
        'Laravel is a PHP web framework',
        'Vue.js is a JavaScript framework',
        'Docker is a containerization platform',
    ]);

    $source->initialize();
    $results = $source->search('PHP framework', limit: 3);

    $this->assertGreaterThan(0, $results->count());
    $this->assertStringContainsString('Laravel', $results->first()->content);
    $this->assertGreaterThan(0.5, $results->first()->score);
}

/** @test */
public function it_deduplicates_pipeline_results()
{
    $source1 = TntSearchSource::fromArray([
        'Laravel is a PHP framework',
        'Laravel provides Eloquent ORM',
    ], name: 'source1');

    $source2 = TntSearchSource::fromArray([
        'Laravel is a PHP framework', // Duplicate
        'Laravel uses Blade templates',
    ], name: 'source2');

    $pipeline = (new ContextPipeline)
        ->addSource($source1)
        ->addSource($source2)
        ->deduplicate(true);

    $results = $pipeline->search('Laravel', limit: 10);

    // Should have 3 unique results, not 4
    $this->assertCount(3, $results);
}
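
The expected count of three follows from merging both result lists and collapsing identical texts. A minimal sketch of that idea in plain PHP is shown below. It assumes results are simple arrays with `content` and `score` keys and that duplicates are detected by normalised text; `mergeAndDeduplicate()` is a hypothetical helper, and the pipeline's actual deduplication strategy may differ.

```php
<?php

/**
 * Merge result lists, keep the best-scoring copy of each distinct text,
 * and return them sorted by score (highest first).
 *
 * @param list<array{content: string, score: float}> ...$resultSets
 * @return list<array{content: string, score: float}>
 */
function mergeAndDeduplicate(array ...$resultSets): array
{
    $best = [];

    foreach (array_merge(...$resultSets) as $result) {
        // Normalise case and whitespace so trivially different copies collide.
        $key = md5(strtolower(preg_replace('/\s+/', ' ', trim($result['content']))));

        if (!isset($best[$key]) || $result['score'] > $best[$key]['score']) {
            $best[$key] = $result;
        }
    }

    $unique = array_values($best);
    usort($unique, fn (array $a, array $b): int => $b['score'] <=> $a['score']);

    return $unique;
}

$merged = mergeAndDeduplicate(
    [
        ['content' => 'Laravel is a PHP framework', 'score' => 0.9],
        ['content' => 'Laravel provides Eloquent ORM', 'score' => 0.7],
    ],
    [
        ['content' => 'Laravel is a PHP framework', 'score' => 0.8], // duplicate
        ['content' => 'Laravel uses Blade templates', 'score' => 0.6],
    ],
);
```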

3. Mocking External APIs

Mock LLM provider APIs for integration tests that need realistic behavior.

Using HTTP Fake

php

use Illuminate\Support\Facades\Http;

/** @test */
public function it_handles_openai_api_response()
{
    Http::fake([
        'api.openai.com/*' => Http::response([
            'id' => 'chatcmpl-test',
            'object' => 'chat.completion',
            'created' => time(),
            'model' => 'gpt-4',
            'choices' => [
                [
                    'index' => 0,
                    'message' => [
                        'role' => 'assistant',
                        'content' => 'This is a test response.',
                    ],
                    'finish_reason' => 'stop',
                ],
            ],
            'usage' => [
                'prompt_tokens' => 20,
                'completion_tokens' => 10,
                'total_tokens' => 30,
            ],
        ], 200),
    ]);

    config(['mindwave-llm.default' => 'openai']);

    $response = Mindwave::llm()
        ->generateText('Test prompt');

    $this->assertEquals('This is a test response.', $response);

    // Verify request was made
    Http::assertSent(function ($request) {
        return $request->url() === 'https://api.openai.com/v1/chat/completions'
            && $request['model'] === 'gpt-4';
    });
}

Mocking Vector Stores

php

use Mindwave\Mindwave\Facades\Mindwave;

/** @test */
public function it_searches_brain_for_documents()
{
    // Use array driver for testing (in-memory)
    config(['mindwave-vectorstore.default' => 'array']);

    $brain = Mindwave::brain('test');

    // Populate with test data
    $brain->consume(Document::make('Laravel provides Eloquent ORM'));
    $brain->consume(Document::make('Vue.js is a progressive framework'));

    $results = $brain->search('ORM', count: 1);

    $this->assertCount(1, $results);
    $this->assertStringContainsString('Eloquent', $results[0]->content());
}

4. Testing RAG Systems

Comprehensive testing for retrieval-augmented generation pipelines.

Testing Retrieval Quality

php

class RAGRetrievalTest extends TestCase
{
    protected TntSearchSource $source;

    protected function setUp(): void
    {
        parent::setUp();

        $this->source = TntSearchSource::fromArray([
            'Laravel Eloquent provides an ActiveRecord ORM implementation',
            'Vue.js uses a virtual DOM for efficient rendering',
            'Docker containers package applications with dependencies',
            'Kubernetes orchestrates containerized applications',
            'Laravel routing allows you to define URL patterns',
        ], name: 'test-docs');

        $this->source->initialize();
    }

    /** @test */
    public function it_retrieves_relevant_documents_for_query()
    {
        $results = $this->source->search('ORM database', limit: 2);

        // Should return Eloquent document with high score
        $this->assertGreaterThan(0, $results->count());
        $this->assertStringContainsString('Eloquent', $results->first()->content);
        $this->assertGreaterThan(0.6, $results->first()->score);
    }

    /** @test */
    public function it_ranks_results_by_relevance()
    {
        $results = $this->source->search('container', limit: 3);

        $scores = $results->pluck('score')->toArray();

        // Scores should be in descending order
        // rsort() sorts in place and returns a bool, so sort a copy and compare
        $sorted = $scores;
        rsort($sorted);
        $this->assertEquals($sorted, $scores);

        // Docker should rank higher than Kubernetes for "container" query
        $dockerResult = $results->firstWhere(fn($r) =>
            str_contains($r->content, 'Docker')
        );

        $this->assertNotNull($dockerResult);
        $this->assertGreaterThan(0.5, $dockerResult->score);
    }

    /** @test */
    public function it_returns_empty_for_irrelevant_query()
    {
        $results = $this->source->search('quantum physics', limit: 5);

        // Either nothing matches, or the best match has a low score
        $topScore = $results->count() > 0 ? $results->first()->score : 0.0;
        $this->assertLessThan(0.3, $topScore);
    }

    protected function tearDown(): void
    {
        $this->source->cleanup();
        parent::tearDown();
    }
}

Testing End-to-End RAG

php

class DocumentQATest extends TestCase
{
    /** @test */
    public function it_answers_questions_from_documents()
    {
        // Setup: Use fake LLM with controlled response
        FakeLLM::fake([
            'response' => 'Eloquent is Laravel\'s ORM that provides an ActiveRecord implementation.',
        ]);

        // Populate knowledge base
        $source = TntSearchSource::fromArray([
            'Laravel Eloquent provides an ActiveRecord ORM implementation',
            'Eloquent allows you to interact with databases using models',
            'Each database table has a corresponding Model class',
        ]);

        // Execute RAG pipeline
        $response = Mindwave::prompt()
            ->section('system', 'Answer based on the provided context.')
            ->context($source, query: 'What is Eloquent?', limit: 2)
            ->section('user', 'What is Eloquent ORM?')
            ->run();

        // Assertions
        $this->assertStringContainsString('Eloquent', $response->content);
        $this->assertStringContainsString('ORM', $response->content);

        // Verify context was injected
        $prompt = Mindwave::prompt()
            ->context($source, query: 'What is Eloquent?', limit: 2)
            ->toMessages();

        $this->assertCount(1, array_filter($prompt, fn($m) =>
            str_contains($m['content'], 'Laravel Eloquent')
        ));
    }
}

5. Snapshot Testing

Test LLM outputs against saved snapshots to detect regressions.

Using Spatie's Snapshot Package

bash

composer require --dev spatie/phpunit-snapshot-assertions

php

use Spatie\Snapshots\MatchesSnapshots;

class LLMOutputTest extends TestCase
{
    use MatchesSnapshots;

    /** @test */
    public function it_generates_consistent_summary()
    {
        FakeLLM::fake([
            'response' => 'Laravel is a PHP web framework with expressive syntax.',
        ]);

        $summarizer = new DocumentSummarizer();
        $summary = $summarizer->summarize($this->getSampleDocument());

        // First run creates snapshot, subsequent runs compare
        $this->assertMatchesSnapshot($summary);
    }

    /** @test */
    public function it_generates_consistent_prompt_structure()
    {
        $composer = Mindwave::prompt()
            ->section('system', 'You are helpful')
            ->section('user', 'Hello')
            ->model('gpt-4');

        $messages = $composer->toMessages();

        // Snapshot the prompt structure
        $this->assertMatchesJsonSnapshot($messages);
    }
}

6. Testing with Real APIs (Sparingly)

Occasionally test with real LLM APIs to validate integration.

Conditional Real API Tests

php

/**
 * @group real-api
 * @group slow
 */
class RealAPITest extends TestCase
{
    protected function setUp(): void
    {
        parent::setUp();

        if (!env('RUN_REAL_API_TESTS')) {
            $this->markTestSkipped('Real API tests disabled');
        }

        if (!config('mindwave-llm.llms.openai.api_key')) {
            $this->markTestSkipped('OpenAI API key not configured');
        }
    }

    /** @test */
    public function it_generates_text_with_real_openai()
    {
        config(['mindwave-llm.default' => 'openai']);

        $response = Mindwave::llm()
            ->model('gpt-3.5-turbo') // Use cheap model
            ->generateText('Say "test successful" and nothing else');

        $this->assertStringContainsString('test', strtolower($response));
        $this->assertLessThan(50, strlen($response)); // Should be short
    }

    /** @test */
    public function it_tracks_real_api_costs()
    {
        $trace = null;

        Event::listen(LlmResponseCompleted::class, function($event) use (&$trace) {
            $trace = $event;
        });

        Mindwave::llm()->generateText('Test');

        $this->assertNotNull($trace);
        $this->assertGreaterThan(0, $trace->costEstimate);
        $this->assertGreaterThan(0, $trace->getTotalTokens());
    }
}

Run real API tests selectively:

bash

# Skip real API tests (default)
vendor/bin/phpunit

# Run only real API tests
RUN_REAL_API_TESTS=true vendor/bin/phpunit --group=real-api

# Exclude slow tests in CI
vendor/bin/phpunit --exclude-group=slow

Evaluation Metrics

1. Semantic Similarity

Test if outputs are semantically correct even with different wording.

php

use Mindwave\Mindwave\Facades\Mindwave;

class SemanticTest extends TestCase
{
    /**
     * Check if two texts are semantically similar using embeddings
     */
    protected function assertSemanticallySimilar(
        string $text1,
        string $text2,
        float $threshold = 0.8
    ): void {
        $embedding1 = Mindwave::embeddings()->embedText($text1);
        $embedding2 = Mindwave::embeddings()->embedText($text2);

        $similarity = $this->cosineSimilarity(
            $embedding1->toArray(),
            $embedding2->toArray()
        );

        $this->assertGreaterThan(
            $threshold,
            $similarity,
            "Texts are not semantically similar (similarity: {$similarity})"
        );
    }

    protected function cosineSimilarity(array $a, array $b): float
    {
        $dotProduct = array_sum(array_map(fn($i, $j) => $i * $j, $a, $b));
        $magnitudeA = sqrt(array_sum(array_map(fn($i) => $i * $i, $a)));
        $magnitudeB = sqrt(array_sum(array_map(fn($i) => $i * $i, $b)));

        // Guard against zero vectors, which would otherwise divide by zero
        if ($magnitudeA == 0.0 || $magnitudeB == 0.0) {
            return 0.0;
        }

        return $dotProduct / ($magnitudeA * $magnitudeB);
    }

    /** @test */
    public function it_generates_semantically_correct_answer()
    {
        FakeLLM::fake([
            'response' => 'Laravel is a PHP framework for building web applications.',
        ]);

        $answer = $this->askQuestion('What is Laravel?');

        $expectedMeaning = 'Laravel is a PHP web framework';

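        // Different wording with the same meaning should pass. The comparison
        // itself runs through Mindwave::embeddings(), so it needs an embeddings
        // driver that is available in the test environment.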
        $this->assertSemanticallySimilar($answer, $expectedMeaning);
    }
}
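
To get a feel for what a threshold such as 0.8 means, it helps to run cosine similarity on tiny hand-made vectors. The standalone function below is a sketch for experimentation, not part of Mindwave. Treating a zero-magnitude vector as similarity 0.0 is an assumption of this example.

```php
<?php

/**
 * Cosine similarity of two equal-length numeric vectors.
 * Returns 0.0 when either vector has zero magnitude.
 */
function cosineSimilarity(array $a, array $b): float
{
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same length.');
    }

    $a = array_values($a);
    $b = array_values($b);

    $dot = 0.0;
    $sumA = 0.0;
    $sumB = 0.0;

    foreach ($a as $i => $x) {
        $dot += $x * $b[$i];
        $sumA += $x * $x;
        $sumB += $b[$i] * $b[$i];
    }

    if ($sumA == 0.0 || $sumB == 0.0) {
        return 0.0;
    }

    return $dot / (sqrt($sumA) * sqrt($sumB));
}

$parallel   = cosineSimilarity([1, 2, 3], [2, 4, 6]); // same direction, close to 1
$orthogonal = cosineSimilarity([1, 0], [0, 1]);       // unrelated directions
$opposite   = cosineSimilarity([1, 0], [-1, 0]);      // opposite directions
```

Scores near 1 mean the vectors point the same way, 0 means they are unrelated, and negative values mean they point in opposing directions. Real embedding vectors rarely reach the extremes, so thresholds usually need tuning per embedding model.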

2. Response Quality Metrics

php

class ResponseQualityTest extends TestCase
{
    /**
     * Measure response quality using multiple metrics
     */
    protected function assertQualityResponse(string $response): void
    {
        // Length check
        $this->assertGreaterThan(10, strlen($response), 'Response too short');
        $this->assertLessThan(5000, strlen($response), 'Response too long');

        // Repetition check - more than 60% of the words should be unique
        $words = str_word_count($response, 1);
        $uniqueWords = array_unique($words);
        $uniqueRatio = count($words) > 0 ? count($uniqueWords) / count($words) : 0;
        $this->assertGreaterThan(0.6, $uniqueRatio, 'Too much repetition');

        // Basic structure check
        $this->assertMatchesRegularExpression('/[.!?]$/', $response, 'Should end with punctuation');

        // No common refusal or apology phrases
        $this->assertStringNotContainsString('I apologize', $response);
        $this->assertStringNotContainsString('I cannot', $response);
    }

    /** @test */
    public function it_generates_quality_response()
    {
        FakeLLM::fake([
            'response' => 'Laravel is a modern PHP framework that provides elegant syntax and powerful features for web development. It includes routing, ORM, authentication, and much more.',
        ]);

        $response = Mindwave::llm()->generateText('Describe Laravel');

        $this->assertQualityResponse($response);
    }
}
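
The repetition heuristic is easier to calibrate with concrete numbers. The sketch below computes the same unique-word ratio on two short strings. `uniqueWordRatio()` is a hypothetical helper, and lowercasing before counting is an added assumption so that "The" and "the" count as the same word.

```php
<?php

/**
 * Share of distinct words in a text, between 0.0 and 1.0.
 * Returns 0.0 for text without any words.
 */
function uniqueWordRatio(string $text): float
{
    $words = str_word_count(strtolower($text), 1);

    return $words === [] ? 0.0 : count(array_unique($words)) / count($words);
}

// 2 distinct words out of 8
$repetitive = uniqueWordRatio('The cat the cat the cat the cat');

// 6 distinct words out of 6
$varied = uniqueWordRatio('Laravel is a modern PHP framework');
```

Long, legitimate answers naturally repeat common words, so the ratio tends to fall as responses grow. A fixed 0.6 cut-off is a starting point for short outputs rather than a universal rule.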

3. Retrieval Evaluation

php

class RetrievalEvaluationTest extends TestCase
{
    /**
     * Precision@K: What percentage of retrieved docs are relevant?
     */
    protected function precisionAtK(
        array $retrieved,
        array $relevant,
        int $k
    ): float {
        $topK = array_slice($retrieved, 0, $k);
        $relevantInTopK = array_intersect($topK, $relevant);

        return count($topK) > 0 ? count($relevantInTopK) / count($topK) : 0;
    }

    /**
     * Recall@K: What percentage of relevant docs were retrieved?
     */
    protected function recallAtK(
        array $retrieved,
        array $relevant,
        int $k
    ): float {
        $topK = array_slice($retrieved, 0, $k);
        $relevantInTopK = array_intersect($topK, $relevant);

        return count($relevant) > 0 ? count($relevantInTopK) / count($relevant) : 0;
    }

    /** @test */
    public function it_achieves_good_precision_and_recall()
    {
        $source = TntSearchSource::fromArray([
            'Laravel Eloquent ORM',
            'Laravel Routing System',
            'Vue.js Framework',
            'React Framework',
            'Laravel Blade Templates',
        ]);

        $source->initialize();
        $results = $source->search('Laravel', limit: 5);

        $retrieved = $results->pluck('content')->toArray();
        $relevant = [
            'Laravel Eloquent ORM',
            'Laravel Routing System',
            'Laravel Blade Templates',
        ];

        $precision = $this->precisionAtK($retrieved, $relevant, 3);
        $recall = $this->recallAtK($retrieved, $relevant, 3);

        // Precision@3 can only be 0, 1/3, 2/3, or 1, so exceeding 0.8
        // means all three top results must be relevant
        $this->assertGreaterThan(0.8, $precision);

        // At least 66% recall (2 out of 3 relevant docs)
        $this->assertGreaterThan(0.66, $recall);
    }
}
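
A worked example makes the two metrics concrete. The sketch below re-implements them as standalone functions and applies them to a made-up ranking in which one irrelevant document sits at position two. The document IDs are invented for illustration.

```php
<?php

/** Fraction of the top-K retrieved items that are relevant. */
function precisionAtK(array $retrieved, array $relevant, int $k): float
{
    $topK = array_slice($retrieved, 0, $k);

    return $topK === [] ? 0.0 : count(array_intersect($topK, $relevant)) / count($topK);
}

/** Fraction of all relevant items that appear in the top K. */
function recallAtK(array $retrieved, array $relevant, int $k): float
{
    $topK = array_slice($retrieved, 0, $k);

    return $relevant === [] ? 0.0 : count(array_intersect($topK, $relevant)) / count($relevant);
}

// Ranked results for one query; IDs are made up for this example.
$retrieved = ['eloquent', 'vue', 'routing', 'blade', 'react'];
$relevant  = ['eloquent', 'routing', 'blade'];

$p3 = precisionAtK($retrieved, $relevant, 3); // 2 of the top 3 are relevant
$r3 = recallAtK($retrieved, $relevant, 3);    // 2 of 3 relevant docs found
$p5 = precisionAtK($retrieved, $relevant, 5); // 3 of the top 5 are relevant
$r5 = recallAtK($retrieved, $relevant, 5);    // all 3 relevant docs found
```

Raising K from 3 to 5 lifts recall from 2/3 to 1 while precision drops from 2/3 to 3/5, which is the usual trade-off to keep in mind when choosing thresholds for assertions.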

Testing Best Practices

1. Use Test Doubles Appropriately

Good Practice:

php

// Unit test - use fake driver
public function test_service_formats_response()
{
    FakeLLM::fake(['response' => 'Test response']);

    $service = new ChatService();
    $formatted = $service->formatResponse('input');

    $this->assertStringStartsWith('[Bot]:', $formatted);
}

// Integration test - use real driver with mocked HTTP
public function test_openai_integration()
{
    Http::fake([...]);
    config(['mindwave-llm.default' => 'openai']);

    $response = Mindwave::llm()->generateText('test');

    $this->assertNotEmpty($response);
}

Bad Practice:

php

// Don't use real API in unit tests
public function test_service_formats_response()
{
    config(['mindwave-llm.default' => 'openai']); // ❌ Slow and costs money

    $service = new ChatService();
    $formatted = $service->formatResponse('input');

    $this->assertStringStartsWith('[Bot]:', $formatted);
}

2. Test Edge Cases

php

class EdgeCaseTest extends TestCase
{
    /** @test */
    public function it_handles_empty_context()
    {
        FakeLLM::fake(['response' => 'No context provided']);

        $response = Mindwave::prompt()
            ->section('system', 'You are helpful')
            ->context('') // Empty context
            ->section('user', 'Question')
            ->run();

        $this->assertNotEmpty($response->content);
    }

    /** @test */
    public function it_handles_very_long_input()
    {
        $longText = str_repeat('Word ', 10000);

        FakeLLM::fake(['response' => 'Processed long text']);

        $composer = Mindwave::prompt()
            ->section('content', $longText, shrinker: 'truncate')
            ->model('gpt-4')
            ->reserveOutputTokens(500)
            ->fit();

        // Should not exceed token limit
        $this->assertLessThanOrEqual(7500, $composer->getTokenCount());
    }

    /** @test */
    public function it_handles_special_characters()
    {
        $special = "Test with émojis 🚀 and spëcial çhars";

        FakeLLM::fake(['response' => 'Processed special chars']);

        $response = Mindwave::llm()->generateText($special);

        $this->assertNotEmpty($response);
    }

    /** @test */
    public function it_handles_api_failures_gracefully()
    {
        Http::fake([
            'api.openai.com/*' => Http::response(null, 500),
        ]);

        config(['mindwave-llm.default' => 'openai']);

        $this->expectException(\Exception::class);

        Mindwave::llm()->generateText('test');
    }
}

3. Test Cost Tracking

php

class CostTrackingTest extends TestCase
{
    /** @test */
    public function it_tracks_token_usage()
    {
        FakeLLM::fake([
            'response' => 'Test response',
            'usage' => [
                'prompt_tokens' => 100,
                'completion_tokens' => 50,
                'total_tokens' => 150,
            ],
        ]);

        $trace = null;
        Event::listen(LlmResponseCompleted::class, function($event) use (&$trace) {
            $trace = $event;
        });

        Mindwave::llm()->generateText('Test prompt');

        $this->assertEquals(100, $trace->getInputTokens());
        $this->assertEquals(50, $trace->getOutputTokens());
        $this->assertEquals(150, $trace->getTotalTokens());
    }

    /** @test */
    public function it_estimates_costs_correctly()
    {
        config([
            'mindwave-tracing.cost_estimation.enabled' => true,
            'mindwave-tracing.cost_estimation.pricing.openai.gpt-4' => [
                'input' => 0.03,
                'output' => 0.06,
            ],
        ]);

        FakeLLM::fake([
            'response' => 'Test',
            'usage' => [
                'prompt_tokens' => 1000,
                'completion_tokens' => 500,
                'total_tokens' => 1500,
            ],
        ]);

        $trace = null;
        Event::listen(LlmResponseCompleted::class, function($event) use (&$trace) {
            $trace = $event;
        });

        Mindwave::llm()->model('gpt-4')->generateText('Test');

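        // Prices are per 1,000 tokens:
        // (1000 / 1000 * 0.03) + (500 / 1000 * 0.06) = 0.03 + 0.03 = 0.06
        // Compare floats with a tolerance rather than exact equality
        $this->assertEqualsWithDelta(0.06, $trace->costEstimate, 0.0001);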
    }
}
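
The arithmetic in the cost test can be written out as a small function. This is a sketch that assumes prices are quoted per 1,000 tokens, as the calculation above implies. `estimateCost()` is a hypothetical helper, and the prices are the illustrative values from the test configuration, not current provider pricing.

```php
<?php

/**
 * Estimate request cost from token counts and per-1,000-token prices.
 */
function estimateCost(
    int $inputTokens,
    int $outputTokens,
    float $inputPricePer1k,
    float $outputPricePer1k
): float {
    return ($inputTokens / 1000) * $inputPricePer1k
        + ($outputTokens / 1000) * $outputPricePer1k;
}

// Same numbers as the test above: 0.03 + 0.03
$cost = estimateCost(1000, 500, 0.03, 0.06);

// Uneven token counts: 0.03702 + 0.03402
$other = estimateCost(1234, 567, 0.03, 0.06);
```

Because the result is a float, assertions on it are safer with a small tolerance than with exact equality.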

4. Database Testing for Traces

php

use Illuminate\Foundation\Testing\RefreshDatabase;
use Mindwave\Mindwave\Observability\Models\Trace;
use Mindwave\Mindwave\Observability\Models\Span;

class TraceStorageTest extends TestCase
{
    use RefreshDatabase;

    /** @test */
    public function it_stores_traces_in_database()
    {
        config(['mindwave-tracing.database.enabled' => true]);

        FakeLLM::fake(['response' => 'Test']);

        Mindwave::llm()->generateText('Test prompt');

        $this->assertDatabaseCount('mindwave_traces', 1);
        $this->assertDatabaseCount('mindwave_spans', 1);

        $trace = Trace::first();
        $this->assertNotNull($trace->trace_id);
        $this->assertGreaterThan(0, $trace->total_input_tokens);
    }

    /** @test */
    public function it_queries_expensive_traces()
    {
        Trace::factory()->create(['estimated_cost' => 0.05]);
        Trace::factory()->create(['estimated_cost' => 0.15]);
        Trace::factory()->create(['estimated_cost' => 0.25]);

        $expensive = Trace::expensive(0.10)->get();

        $this->assertCount(2, $expensive);
    }
}

CI/CD Integration

GitHub Actions Example

yaml

# .github/workflows/test.yml
name: Tests

on: [push, pull_request]

jobs:
    test:
        runs-on: ubuntu-latest

        services:
            mysql:
                image: mysql:8.0
                env:
                    MYSQL_ROOT_PASSWORD: password
                    MYSQL_DATABASE: mindwave_test
                ports:
                    - 3306:3306
                options: --health-cmd="mysqladmin ping" --health-interval=10s

            qdrant:
                image: qdrant/qdrant:latest
                ports:
                    - 6333:6333

        steps:
            - uses: actions/checkout@v3

            - name: Setup PHP
              uses: shivammathur/setup-php@v2
              with:
                  php-version: '8.2'
                  extensions: mbstring, pdo_mysql
                  coverage: xdebug

            - name: Install Dependencies
              run: composer install --prefer-dist

            - name: Copy Environment
              run: cp .env.ci .env

            - name: Run Migrations
              run: php artisan migrate --force

            - name: Run Tests
              env:
                  DB_CONNECTION: mysql
                  DB_HOST: 127.0.0.1
                  DB_PORT: 3306
                  DB_DATABASE: mindwave_test
                  DB_USERNAME: root
                  DB_PASSWORD: password
                  MINDWAVE_LLM: fake
                  MINDWAVE_VECTORSTORE: qdrant
                  MINDWAVE_QDRANT_HOST: localhost
                  MINDWAVE_QDRANT_PORT: 6333
                  MINDWAVE_TRACING_ENABLED: true
                  MINDWAVE_TRACE_DATABASE: true
              run: vendor/bin/phpunit --coverage-clover coverage.xml

            - name: Upload Coverage
              uses: codecov/codecov-action@v3
              with:
                  files: ./coverage.xml

Laravel Dusk for E2E Testing

php

use Laravel\Dusk\Browser;
use Tests\DuskTestCase;

class ChatbotE2ETest extends DuskTestCase
{
    /**
     * @group e2e
     */
    public function test_user_can_chat_with_bot()
    {
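        // Note: Dusk drives the application in a separate server process, so a
        // fake registered inside this test process may not reach it. If so,
        // configure the fake driver through the Dusk environment instead.
        FakeLLM::fake([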
            ['response' => 'Hello! How can I help you?'],
            ['response' => 'Laravel is a PHP framework.'],
        ]);

        $this->browse(function (Browser $browser) {
            $browser->visit('/chat')
                    ->type('message', 'Hi')
                    ->press('Send')
                    ->waitForText('Hello! How can I help you?')
                    ->type('message', 'What is Laravel?')
                    ->press('Send')
                    ->waitForText('Laravel is a PHP framework');
        });
    }
}

Common Testing Pitfalls

1. Not Cleaning Up TNTSearch Indexes

Problem:

php

/** @test */
public function it_searches_documents()
{
    $source = TntSearchSource::fromArray([...]);
    $results = $source->search('query');

    // ❌ Index file left behind
}

Solution:

php

/** @test */
public function it_searches_documents()
{
    $source = TntSearchSource::fromArray([...]);
    $source->initialize();

    try {
        $results = $source->search('query');
        $this->assertNotEmpty($results);
    } finally {
        $source->cleanup(); // ✅ Always cleanup
    }
}

2. Forgetting to Initialize Sources

Problem:

php

/** @test */
public function it_searches()
{
    $source = TntSearchSource::fromArray([...]);
    $results = $source->search('query'); // ❌ Not initialized
}

Solution:

php

/** @test */
public function it_searches()
{
    $source = TntSearchSource::fromArray([...]);
    $source->initialize(); // ✅ Initialize first
    $results = $source->search('query');
}

3. Testing with Production Credentials

Problem:

bash

# .env.testing
MINDWAVE_OPENAI_API_KEY=sk-proj-real-production-key  # ❌ Dangerous!

Solution:

bash

# .env.testing
MINDWAVE_LLM=fake  # ✅ Use fake driver
MINDWAVE_OPENAI_API_KEY=sk-test-fake-key

4. Not Testing Token Limits

Problem:

php

/** @test */
public function it_processes_large_document()
{
    $huge = str_repeat('word ', 100000);
    $response = Mindwave::llm()->generateText($huge);

    // ❌ May exceed token limit in production
}

Solution:

php

/** @test */
public function it_handles_large_documents()
{
    $huge = str_repeat('word ', 100000);

    $composer = Mindwave::prompt()
        ->section('content', $huge, shrinker: 'truncate')
        ->model('gpt-4')
        ->reserveOutputTokens(500)
        ->fit();

    // ✅ Verify it fits
    $this->assertLessThanOrEqual(7500, $composer->getTokenCount());
}

Troubleshooting Tests

Tests Failing Intermittently

Cause: Race conditions or non-deterministic LLM outputs

Solution:

php

// Use fake driver for deterministic tests
FakeLLM::fake(['response' => 'Consistent response']);

// Or use seeded randomness
config(['mindwave-llm.llms.mistral.random_seed' => 42]);

High Memory Usage in Tests

Cause: Large context or many test iterations

Solution:

php

protected function tearDown(): void
{
    // Clear large objects
    unset($this->largeContext);

    // Cleanup indexes
    if (isset($this->source)) {
        $this->source->cleanup();
    }

    parent::tearDown();
}

Slow Test Suite

Cause: Too many real API calls or expensive operations

Solution:

bash

# Skip slow tests by default
vendor/bin/phpunit --exclude-group=slow,real-api

# Run fast tests in parallel
vendor/bin/paratest --processes=4

Summary

Effective testing strategies for Mindwave applications:

Use the fake driver for fast, deterministic unit tests
Mock external APIs for integration tests
Test RAG pipelines with real retrieval but fake generation
Measure quality with semantic similarity and custom metrics
Test edge cases including errors and limits
Automate in CI/CD with proper isolation
Use real APIs sparingly for critical integration validation

Key Takeaway: Good tests balance speed, reliability, and cost. Favor fakes for unit tests, use mocks for integration tests, and reserve real API tests for critical paths only.
