Testing AI Applications
Testing LLM-powered applications requires different strategies than traditional software testing. This guide covers comprehensive testing approaches for Mindwave applications, from unit tests to end-to-end RAG system validation.
Overview
AI application testing presents unique challenges:
- Non-deterministic outputs - LLMs can produce different responses for identical inputs
- Complex dependencies - RAG systems involve embeddings, vector stores, and retrieval pipelines
- Cost considerations - Real API calls during testing add up quickly
- Quality metrics - Traditional assertions don't capture semantic correctness
This guide provides practical patterns for:
- Unit Testing - Test individual components in isolation
- Integration Testing - Test component interactions without real APIs
- Mocking Strategies - Simulate LLM responses efficiently
- RAG Testing - Validate retrieval quality and accuracy
- Evaluation Metrics - Measure semantic correctness
- CI/CD Integration - Automate testing in your pipeline
Testing Strategies
1. Unit Testing with Fake Driver
The fake LLM driver allows fast, deterministic testing without API calls.
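To make the idea concrete, here is a minimal sketch of what a fake driver does conceptually. `QueuedFakeLlm` is a hypothetical stand-in written for illustration, not Mindwave's `FakeLLM` implementation: it hands back canned responses in order and records the prompts it receives.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for a fake LLM driver (not Mindwave's FakeLLM).
 * It returns queued responses in order and never touches the network.
 */
final class QueuedFakeLlm
{
    /** @var list<string> Responses still waiting to be returned. */
    private array $queue;

    /** @var list<string> Prompts received so far, for later assertions. */
    private array $prompts = [];

    /** @param list<string> $responses */
    public function __construct(array $responses)
    {
        $this->queue = array_values($responses);
    }

    public function generateText(string $prompt): string
    {
        $this->prompts[] = $prompt;

        if ($this->queue === []) {
            // Fail loudly so a test cannot pass on an unplanned call.
            throw new RuntimeException('No fake responses left in the queue.');
        }

        return array_shift($this->queue);
    }

    /** @return list<string> */
    public function recordedPrompts(): array
    {
        return $this->prompts;
    }
}

$llm = new QueuedFakeLlm(['Hello!', 'Laravel is a PHP framework.']);

$first = $llm->generateText('Hi');
$second = $llm->generateText('What is Laravel?');
```

Because the responses are fixed up front, the same test produces the same result on every run, and no API cost is incurred.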
Basic Setup
<?php
namespace Tests\Feature;
use Tests\TestCase;
use Mindwave\Mindwave\Facades\Mindwave;
use Mindwave\Mindwave\Testing\Fakes\FakeLLM;
class ChatbotTest extends TestCase
{
protected function setUp(): void
{
parent::setUp();
// Use fake driver for testing
config(['mindwave-llm.default' => 'fake']);
}
/** @test */
public function it_generates_a_greeting()
{
$response = Mindwave::llm()
->generateText('Say hello');
$this->assertNotEmpty($response);
$this->assertIsString($response);
}
}
Controlling Fake Responses
use Mindwave\Mindwave\Testing\Fakes\FakeLLM;
/** @test */
public function it_summarizes_text()
{
// Set a specific fake response
FakeLLM::fake([
'response' => 'This is a test summary.',
'usage' => [
'prompt_tokens' => 50,
'completion_tokens' => 10,
'total_tokens' => 60,
],
]);
$service = new DocumentSummarizer();
$summary = $service->summarize('Long document text...');
$this->assertEquals('This is a test summary.', $summary);
}
Testing Multiple Calls
/** @test */
public function it_handles_conversation()
{
// Queue multiple fake responses
FakeLLM::fake([
['response' => 'Hello! How can I help you?'],
['response' => 'I can answer questions about Laravel.'],
['response' => 'Laravel is a PHP framework.'],
]);
$chatbot = new Chatbot();
$response1 = $chatbot->sendMessage('Hi');
$response2 = $chatbot->sendMessage('What can you do?');
$response3 = $chatbot->sendMessage('What is Laravel?');
$this->assertEquals('Hello! How can I help you?', $response1);
$this->assertEquals('I can answer questions about Laravel.', $response2);
$this->assertEquals('Laravel is a PHP framework.', $response3);
}
2. Integration Testing
Test component interactions while mocking expensive external calls.
Testing PromptComposer
use Mindwave\Mindwave\Facades\Mindwave;
use Mindwave\Mindwave\Testing\Fakes\FakeLLM;
/** @test */
public function it_builds_prompt_with_context()
{
FakeLLM::fake();
$composer = Mindwave::prompt()
->section('system', 'You are helpful')
->section('context', $this->getSampleContext())
->section('user', 'What is Laravel?')
->model('gpt-4');
// Assert prompt structure
$messages = $composer->toMessages();
$this->assertCount(3, $messages);
$this->assertEquals('system', $messages[0]['role']);
$this->assertEquals('You are helpful', $messages[0]['content']);
}
/** @test */
public function it_fits_prompt_to_token_limit()
{
FakeLLM::fake();
$largeContext = str_repeat('Context information. ', 1000);
$composer = Mindwave::prompt()
->section('system', 'You are helpful', priority: 100)
->section('context', $largeContext, priority: 50, shrinker: 'truncate')
->section('user', 'Question?', priority: 100)
->model('gpt-4')
->reserveOutputTokens(1000)
->fit();
$tokenCount = $composer->getTokenCount();
// GPT-4 has 8K context, minus 1K reserved = 7K max
$this->assertLessThanOrEqual(7000, $tokenCount);
}
Testing Context Discovery
use Mindwave\Mindwave\Context\Sources\TntSearch\TntSearchSource;
/** @test */
public function it_retrieves_relevant_documents()
{
$source = TntSearchSource::fromArray([
'Laravel is a PHP web framework',
'Vue.js is a JavaScript framework',
'Docker is a containerization platform',
]);
$source->initialize();
$results = $source->search('PHP framework', limit: 3);
$this->assertGreaterThan(0, $results->count());
$this->assertStringContainsString('Laravel', $results->first()->content);
$this->assertGreaterThan(0.5, $results->first()->score);
}
/** @test */
public function it_deduplicates_pipeline_results()
{
$source1 = TntSearchSource::fromArray([
'Laravel is a PHP framework',
'Laravel provides Eloquent ORM',
], name: 'source1');
$source2 = TntSearchSource::fromArray([
'Laravel is a PHP framework', // Duplicate
'Laravel uses Blade templates',
], name: 'source2');
$pipeline = (new ContextPipeline)
->addSource($source1)
->addSource($source2)
->deduplicate(true);
$results = $pipeline->search('Laravel', limit: 10);
// Should have 3 unique results, not 4
$this->assertCount(3, $results);
}
3. Mocking External APIs
Mock LLM provider APIs for integration tests that need realistic behavior.
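Mocked provider payloads tend to be long and repetitive, so it can help to build them in one place. The sketch below assumes a hypothetical helper named `fakeChatCompletion()` that returns an array shaped like the chat completion payload used in this guide; only the content and token counts vary per test.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: builds an array shaped like the chat completion
 * payload used in this guide, so tests only state what varies.
 */
function fakeChatCompletion(
    string $content,
    int $promptTokens = 20,
    int $completionTokens = 10,
    string $model = 'gpt-4'
): array {
    return [
        'id' => 'chatcmpl-test',
        'object' => 'chat.completion',
        'created' => 0, // fixed value keeps the payload deterministic
        'model' => $model,
        'choices' => [[
            'index' => 0,
            'message' => ['role' => 'assistant', 'content' => $content],
            'finish_reason' => 'stop',
        ]],
        'usage' => [
            'prompt_tokens' => $promptTokens,
            'completion_tokens' => $completionTokens,
            'total_tokens' => $promptTokens + $completionTokens,
        ],
    ];
}

$payload = fakeChatCompletion('This is a test response.');
```

In a Laravel test the resulting array could be passed to `Http::response($payload, 200)` inside `Http::fake()`.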
Using HTTP Fake
use Illuminate\Support\Facades\Http;
/** @test */
public function it_handles_openai_api_response()
{
Http::fake([
'api.openai.com/*' => Http::response([
'id' => 'chatcmpl-test',
'object' => 'chat.completion',
'created' => time(),
'model' => 'gpt-4',
'choices' => [
[
'index' => 0,
'message' => [
'role' => 'assistant',
'content' => 'This is a test response.',
],
'finish_reason' => 'stop',
],
],
'usage' => [
'prompt_tokens' => 20,
'completion_tokens' => 10,
'total_tokens' => 30,
],
], 200),
]);
config(['mindwave-llm.default' => 'openai']);
$response = Mindwave::llm()
->generateText('Test prompt');
$this->assertEquals('This is a test response.', $response);
// Verify request was made
Http::assertSent(function ($request) {
return $request->url() === 'https://api.openai.com/v1/chat/completions'
&& $request['model'] === 'gpt-4';
});
}
Mocking Vector Stores
use Mindwave\Mindwave\Facades\Mindwave;
/** @test */
public function it_searches_brain_for_documents()
{
// Use array driver for testing (in-memory)
config(['mindwave-vectorstore.default' => 'array']);
$brain = Mindwave::brain('test');
// Populate with test data
$brain->consume(Document::make('Laravel provides Eloquent ORM'));
$brain->consume(Document::make('Vue.js is a progressive framework'));
$results = $brain->search('ORM', count: 1);
$this->assertCount(1, $results);
$this->assertStringContainsString('Eloquent', $results[0]->content());
}
4. Testing RAG Systems
Comprehensive testing for retrieval-augmented generation pipelines.
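One recurring check in retrieval tests is that results come back ordered by score. A small helper can express this; `isSortedDescending()` is a hypothetical name used for illustration. PHP's `rsort()` sorts in place and returns a boolean, so the sketch uses a pairwise comparison instead of relying on a sort function's return value.

```php
<?php

declare(strict_types=1);

/**
 * Returns true when no score is greater than the one before it.
 *
 * @param array<int, float|int> $scores
 */
function isSortedDescending(array $scores): bool
{
    $scores = array_values($scores); // normalise keys to 0..n-1

    for ($i = 1, $n = count($scores); $i < $n; $i++) {
        if ($scores[$i] > $scores[$i - 1]) {
            return false;
        }
    }

    return true;
}

$ranked = isSortedDescending([0.9, 0.7, 0.7, 0.2]);
$unranked = isSortedDescending([0.4, 0.8]);
```

Ties are treated as correctly ordered, which matches how most search backends report equal scores.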
Testing Retrieval Quality
class RAGRetrievalTest extends TestCase
{
protected TntSearchSource $source;
protected function setUp(): void
{
parent::setUp();
$this->source = TntSearchSource::fromArray([
'Laravel Eloquent provides an ActiveRecord ORM implementation',
'Vue.js uses a virtual DOM for efficient rendering',
'Docker containers package applications with dependencies',
'Kubernetes orchestrates containerized applications',
'Laravel routing allows you to define URL patterns',
], name: 'test-docs');
$this->source->initialize();
}
/** @test */
public function it_retrieves_relevant_documents_for_query()
{
$results = $this->source->search('ORM database', limit: 2);
// Should return Eloquent document with high score
$this->assertGreaterThan(0, $results->count());
$this->assertStringContainsString('Eloquent', $results->first()->content);
$this->assertGreaterThan(0.6, $results->first()->score);
}
/** @test */
public function it_ranks_results_by_relevance()
{
$results = $this->source->search('container', limit: 3);
$scores = $results->pluck('score')->toArray();
// Scores should be in descending order
// (rsort() sorts in place and returns a bool, so sort a copy and compare)
$sorted = $scores;
rsort($sorted);
$this->assertEquals($sorted, $scores);
// The Docker document should be returned for the "container" query
$dockerResult = $results->first(fn($r) =>
str_contains($r->content, 'Docker')
);
$this->assertNotNull($dockerResult);
$this->assertGreaterThan(0.5, $dockerResult->score);
}
/** @test */
public function it_returns_empty_for_irrelevant_query()
{
$results = $this->source->search('quantum physics', limit: 5);
// May return nothing; any result that does come back should score low
$topScore = $results->count() > 0 ? $results->first()->score : 0.0;
$this->assertLessThan(0.3, $topScore);
}
protected function tearDown(): void
{
$this->source->cleanup();
parent::tearDown();
}
}
Testing End-to-End RAG
class DocumentQATest extends TestCase
{
/** @test */
public function it_answers_questions_from_documents()
{
// Setup: Use fake LLM with controlled response
FakeLLM::fake([
'response' => 'Eloquent is Laravel\'s ORM that provides an ActiveRecord implementation.',
]);
// Populate knowledge base
$source = TntSearchSource::fromArray([
'Laravel Eloquent provides an ActiveRecord ORM implementation',
'Eloquent allows you to interact with databases using models',
'Each database table has a corresponding Model class',
]);
// Execute RAG pipeline
$response = Mindwave::prompt()
->section('system', 'Answer based on the provided context.')
->context($source, query: 'What is Eloquent?', limit: 2)
->section('user', 'What is Eloquent ORM?')
->run();
// Assertions
$this->assertStringContainsString('Eloquent', $response->content);
$this->assertStringContainsString('ORM', $response->content);
// Verify context was injected
$prompt = Mindwave::prompt()
->context($source, query: 'What is Eloquent?', limit: 2)
->toMessages();
$this->assertCount(1, array_filter($prompt, fn($m) =>
str_contains($m['content'], 'Laravel Eloquent')
));
}
}
5. Snapshot Testing
Test LLM outputs against saved snapshots to detect regressions.
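Snapshots only stay stable if the data being compared is stable. Fields such as IDs and timestamps change on every run, so one option is to replace them with placeholders before snapshotting. The sketch below assumes a hypothetical helper `normalizeForSnapshot()` and assumes that `id` and `created` are the volatile keys; adjust the list to your own payloads.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: replaces values that change on every run with
 * fixed placeholders so a snapshot only captures stable structure.
 *
 * @param array<string|int, mixed> $data
 * @param list<string> $volatileKeys keys assumed to vary between runs
 */
function normalizeForSnapshot(array $data, array $volatileKeys = ['id', 'created']): array
{
    foreach ($data as $key => $value) {
        if (is_string($key) && in_array($key, $volatileKeys, true)) {
            $data[$key] = '<' . $key . '>';
        } elseif (is_array($value)) {
            // Recurse so nested payloads are normalised too.
            $data[$key] = normalizeForSnapshot($value, $volatileKeys);
        }
    }

    return $data;
}

$first = normalizeForSnapshot(['id' => 'chatcmpl-1', 'created' => 1700000000, 'model' => 'gpt-4']);
$second = normalizeForSnapshot(['id' => 'chatcmpl-2', 'created' => 1700000999, 'model' => 'gpt-4']);
```

Both normalised arrays are identical, so a snapshot taken from one run would still match the next.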
Using Spatie's Snapshot Package
composer require --dev spatie/phpunit-snapshot-assertions
use Spatie\Snapshots\MatchesSnapshots;
class LLMOutputTest extends TestCase
{
use MatchesSnapshots;
/** @test */
public function it_generates_consistent_summary()
{
FakeLLM::fake([
'response' => 'Laravel is a PHP web framework with expressive syntax.',
]);
$summarizer = new DocumentSummarizer();
$summary = $summarizer->summarize($this->getSampleDocument());
// First run creates snapshot, subsequent runs compare
$this->assertMatchesSnapshot($summary);
}
/** @test */
public function it_generates_consistent_prompt_structure()
{
$composer = Mindwave::prompt()
->section('system', 'You are helpful')
->section('user', 'Hello')
->model('gpt-4');
$messages = $composer->toMessages();
// Snapshot the prompt structure
$this->assertMatchesJsonSnapshot($messages);
}
}
6. Testing with Real APIs (Sparingly)
Occasionally test with real LLM APIs to validate integration.
Conditional Real API Tests
/**
* @group real-api
* @group slow
*/
class RealAPITest extends TestCase
{
protected function setUp(): void
{
parent::setUp();
if (!env('RUN_REAL_API_TESTS')) {
$this->markTestSkipped('Real API tests disabled');
}
if (!config('mindwave-llm.llms.openai.api_key')) {
$this->markTestSkipped('OpenAI API key not configured');
}
}
/** @test */
public function it_generates_text_with_real_openai()
{
config(['mindwave-llm.default' => 'openai']);
$response = Mindwave::llm()
->model('gpt-3.5-turbo') // Use cheap model
->generateText('Say "test successful" and nothing else');
$this->assertStringContainsString('test', strtolower($response));
$this->assertLessThan(50, strlen($response)); // Should be short
}
/** @test */
public function it_tracks_real_api_costs()
{
$trace = null;
Event::listen(LlmResponseCompleted::class, function($event) use (&$trace) {
$trace = $event;
});
Mindwave::llm()->generateText('Test');
$this->assertNotNull($trace);
$this->assertGreaterThan(0, $trace->costEstimate);
$this->assertGreaterThan(0, $trace->getTotalTokens());
}
}
Run real API tests selectively:
# Skip real API tests (default)
vendor/bin/phpunit
# Run only real API tests
RUN_REAL_API_TESTS=true vendor/bin/phpunit --group=real-api
# Exclude slow tests in CI
vendor/bin/phpunit --exclude-group=slow
Evaluation Metrics
1. Semantic Similarity
Test if outputs are semantically correct even with different wording.
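The usual score behind this kind of check is the cosine similarity between two embedding vectors. As a standalone sketch with toy three-dimensional vectors (real embeddings have far more dimensions), the calculation looks like this. The zero-magnitude guard is an added safety measure, not something the library is known to require.

```php
<?php

declare(strict_types=1);

/**
 * Cosine similarity of two equal-length vectors.
 * Returns 0.0 when either vector has zero magnitude, avoiding division by zero.
 *
 * @param array<int, float|int> $a
 * @param array<int, float|int> $b
 */
function cosineSimilarity(array $a, array $b): float
{
    $a = array_values($a);
    $b = array_values($b);

    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }

    if ($normA === 0.0 || $normB === 0.0) {
        return 0.0;
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

// Toy vectors standing in for embeddings.
$parallel = cosineSimilarity([1, 2, 3], [2, 4, 6]);   // same direction, close to 1
$orthogonal = cosineSimilarity([1, 0, 0], [0, 1, 0]); // unrelated directions, 0
```

Vectors pointing the same way score near 1 regardless of length, which is why a threshold such as 0.8 can accept differently worded answers with the same meaning.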
use Mindwave\Mindwave\Facades\Mindwave;
class SemanticTest extends TestCase
{
/**
* Check if two texts are semantically similar using embeddings
*/
protected function assertSemanticallySimilar(
string $text1,
string $text2,
float $threshold = 0.8
): void {
$embedding1 = Mindwave::embeddings()->embedText($text1);
$embedding2 = Mindwave::embeddings()->embedText($text2);
$similarity = $this->cosineSimilarity(
$embedding1->toArray(),
$embedding2->toArray()
);
$this->assertGreaterThan(
$threshold,
$similarity,
"Texts are not semantically similar (similarity: {$similarity})"
);
}
protected function cosineSimilarity(array $a, array $b): float
{
$dotProduct = array_sum(array_map(fn($i, $j) => $i * $j, $a, $b));
$magnitudeA = sqrt(array_sum(array_map(fn($i) => $i * $i, $a)));
$magnitudeB = sqrt(array_sum(array_map(fn($i) => $i * $i, $b)));
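if ($magnitudeA == 0.0 || $magnitudeB == 0.0) {
return 0.0; // Avoid division by zero for empty or all-zero vectors
}
return $dotProduct / ($magnitudeA * $magnitudeB);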
}
/** @test */
public function it_generates_semantically_correct_answer()
{
FakeLLM::fake([
'response' => 'Laravel is a PHP framework for building web applications.',
]);
$answer = $this->askQuestion('What is Laravel?');
$expectedMeaning = 'Laravel is a PHP web framework';
// Different words, same meaning should pass
$this->assertSemanticallySimilar($answer, $expectedMeaning);
}
}
2. Response Quality Metrics
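A simple repetition signal is the share of distinct words in a response. The sketch below uses a hypothetical helper `uniqueWordRatio()`; unlike a case-sensitive count, it lowercases the text first so that "The" and "the" count as one word.

```php
<?php

declare(strict_types=1);

/**
 * Ratio of distinct words to total words, from 0.0 to 1.0.
 * Lower values mean more repetition. Words are lowercased first,
 * so "The" and "the" count as the same word.
 */
function uniqueWordRatio(string $text): float
{
    $words = str_word_count(strtolower($text), 1);
    if ($words === []) {
        return 0.0;
    }

    return count(array_unique($words)) / count($words);
}

// Six words, all distinct.
$varied = uniqueWordRatio('Laravel provides routing and an ORM');

// Nine words, only three distinct: a typical "stuck in a loop" output.
$looping = uniqueWordRatio('I am sorry I am sorry I am sorry');
```

The varied sentence has a ratio of 1, while the looping one drops to one third, well below a 0.6 threshold.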
class ResponseQualityTest extends TestCase
{
/**
* Measure response quality using multiple metrics
*/
protected function assertQualityResponse(string $response): void
{
// Length check
$this->assertGreaterThan(10, strlen($response), 'Response too short');
$this->assertLessThan(5000, strlen($response), 'Response too long');
// Repetition check - share of distinct words (1.0 means no word repeats)
$words = str_word_count($response, 1);
$uniqueWords = array_unique($words);
$uniqueRatio = count($words) > 0 ? count($uniqueWords) / count($words) : 0;
$this->assertGreaterThan(0.6, $uniqueRatio, 'Too much repetition');
// Basic structure check
$this->assertMatchesRegularExpression('/[.!?]$/', $response, 'Should end with punctuation');
// No common error patterns
$this->assertStringNotContainsString('I apologize', $response);
$this->assertStringNotContainsString('I cannot', $response);
}
/** @test */
public function it_generates_quality_response()
{
FakeLLM::fake([
'response' => 'Laravel is a modern PHP framework that provides elegant syntax and powerful features for web development. It includes routing, ORM, authentication, and much more.',
]);
$response = Mindwave::llm()->generateText('Describe Laravel');
$this->assertQualityResponse($response);
}
}
3. Retrieval Evaluation
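Before wiring these metrics into a test, it helps to work through the numbers by hand. The standalone sketch below uses made-up document IDs. Note that with k = 3, precision can only take the values 0, 1/3, 2/3, or 1, so thresholds should be chosen with that granularity in mind.

```php
<?php

declare(strict_types=1);

/** Fraction of the top-k retrieved items that are relevant. */
function precisionAtK(array $retrieved, array $relevant, int $k): float
{
    $topK = array_slice($retrieved, 0, $k);
    if ($topK === []) {
        return 0.0;
    }

    return count(array_intersect($topK, $relevant)) / count($topK);
}

/** Fraction of all relevant items that appear in the top k. */
function recallAtK(array $retrieved, array $relevant, int $k): float
{
    if ($relevant === []) {
        return 0.0;
    }
    $topK = array_slice($retrieved, 0, $k);

    return count(array_intersect($topK, $relevant)) / count($relevant);
}

// Made-up ranking: two of the top three results are relevant.
$retrieved = ['doc-a', 'doc-x', 'doc-b', 'doc-c'];
$relevant = ['doc-a', 'doc-b', 'doc-c'];

$precision = precisionAtK($retrieved, $relevant, 3); // 2 relevant out of 3 retrieved
$recall = recallAtK($retrieved, $relevant, 3);       // 2 found out of 3 relevant
```

Here both metrics come out at two thirds: one irrelevant document displaced a relevant one from the top three.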
class RetrievalEvaluationTest extends TestCase
{
/**
* Precision@K: What percentage of retrieved docs are relevant?
*/
protected function precisionAtK(
array $retrieved,
array $relevant,
int $k
): float {
$topK = array_slice($retrieved, 0, $k);
$relevantInTopK = array_intersect($topK, $relevant);
return count($topK) > 0 ? count($relevantInTopK) / count($topK) : 0;
}
/**
* Recall@K: What percentage of relevant docs were retrieved?
*/
protected function recallAtK(
array $retrieved,
array $relevant,
int $k
): float {
$topK = array_slice($retrieved, 0, $k);
$relevantInTopK = array_intersect($topK, $relevant);
return count($relevant) > 0 ? count($relevantInTopK) / count($relevant) : 0;
}
/** @test */
public function it_achieves_good_precision_and_recall()
{
$source = TntSearchSource::fromArray([
'Laravel Eloquent ORM',
'Laravel Routing System',
'Vue.js Framework',
'React Framework',
'Laravel Blade Templates',
]);
$source->initialize(); // Build the index before searching
$results = $source->search('Laravel', limit: 5);
$retrieved = $results->pluck('content')->toArray();
$relevant = [
'Laravel Eloquent ORM',
'Laravel Routing System',
'Laravel Blade Templates',
];
$precision = $this->precisionAtK($retrieved, $relevant, 3);
$recall = $this->recallAtK($retrieved, $relevant, 3);
// At least 80% precision (with k = 3 this means all three results are relevant)
$this->assertGreaterThanOrEqual(0.8, $precision);
// At least 66% recall (2 out of 3 relevant docs)
$this->assertGreaterThanOrEqual(0.66, $recall);
}
}
Testing Best Practices
1. Use Test Doubles Appropriately
Good Practice:
// Unit test - use fake driver
public function test_service_formats_response()
{
FakeLLM::fake(['response' => 'Test response']);
$service = new ChatService();
$formatted = $service->formatResponse('input');
$this->assertStringStartsWith('[Bot]:', $formatted);
}
// Integration test - use real driver with mocked HTTP
public function test_openai_integration()
{
Http::fake([...]);
config(['mindwave-llm.default' => 'openai']);
$response = Mindwave::llm()->generateText('test');
$this->assertNotEmpty($response);
}
Bad Practice:
// Don't use real API in unit tests
public function test_service_formats_response()
{
config(['mindwave-llm.default' => 'openai']); // ❌ Slow and costs money
$service = new ChatService();
$formatted = $service->formatResponse('input');
$this->assertStringStartsWith('[Bot]:', $formatted);
}
2. Test Edge Cases
class EdgeCaseTest extends TestCase
{
/** @test */
public function it_handles_empty_context()
{
FakeLLM::fake(['response' => 'No context provided']);
$response = Mindwave::prompt()
->section('system', 'You are helpful')
->context('') // Empty context
->section('user', 'Question')
->run();
$this->assertNotEmpty($response->content);
}
/** @test */
public function it_handles_very_long_input()
{
$longText = str_repeat('Word ', 10000);
FakeLLM::fake(['response' => 'Processed long text']);
$composer = Mindwave::prompt()
->section('content', $longText, shrinker: 'truncate')
->model('gpt-4')
->reserveOutputTokens(500)
->fit();
// Should not exceed token limit
$this->assertLessThanOrEqual(7500, $composer->getTokenCount());
}
/** @test */
public function it_handles_special_characters()
{
$special = "Test with émojis 🚀 and spëcial çhars";
FakeLLM::fake(['response' => 'Processed special chars']);
$response = Mindwave::llm()->generateText($special);
$this->assertNotEmpty($response);
}
/** @test */
public function it_handles_api_failures_gracefully()
{
Http::fake([
'api.openai.com/*' => Http::response(null, 500),
]);
config(['mindwave-llm.default' => 'openai']);
$this->expectException(\Exception::class);
Mindwave::llm()->generateText('test');
}
}
3. Test Cost Tracking
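The expected values in cost tests follow from simple arithmetic on token counts. As a sketch, assuming prices are quoted per 1,000 tokens (as in this guide's worked comment) and using a hypothetical helper `estimateCost()`:

```php
<?php

declare(strict_types=1);

/**
 * Estimates request cost from token counts.
 * ASSUMPTION: prices are quoted per 1,000 tokens.
 */
function estimateCost(
    int $inputTokens,
    int $outputTokens,
    float $inputPricePer1k,
    float $outputPricePer1k
): float {
    return ($inputTokens / 1000) * $inputPricePer1k
        + ($outputTokens / 1000) * $outputPricePer1k;
}

// 1,000 input tokens at 0.03 plus 500 output tokens at 0.06.
$cost = estimateCost(1000, 500, 0.03, 0.06);

// Compare floats with a tolerance rather than strict equality.
$matches = abs($cost - 0.06) < 1e-9;
```

Comparing with a small tolerance keeps the test robust when prices or token counts produce values that are not exactly representable as floats.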
class CostTrackingTest extends TestCase
{
/** @test */
public function it_tracks_token_usage()
{
FakeLLM::fake([
'response' => 'Test response',
'usage' => [
'prompt_tokens' => 100,
'completion_tokens' => 50,
'total_tokens' => 150,
],
]);
$trace = null;
Event::listen(LlmResponseCompleted::class, function($event) use (&$trace) {
$trace = $event;
});
Mindwave::llm()->generateText('Test prompt');
$this->assertEquals(100, $trace->getInputTokens());
$this->assertEquals(50, $trace->getOutputTokens());
$this->assertEquals(150, $trace->getTotalTokens());
}
/** @test */
public function it_estimates_costs_correctly()
{
config([
'mindwave-tracing.cost_estimation.enabled' => true,
'mindwave-tracing.cost_estimation.pricing.openai.gpt-4' => [
'input' => 0.03,
'output' => 0.06,
],
]);
FakeLLM::fake([
'response' => 'Test',
'usage' => [
'prompt_tokens' => 1000,
'completion_tokens' => 500,
'total_tokens' => 1500,
],
]);
$trace = null;
Event::listen(LlmResponseCompleted::class, function($event) use (&$trace) {
$trace = $event;
});
Mindwave::llm()->model('gpt-4')->generateText('Test');
// (1000 / 1000 * 0.03) + (500 / 1000 * 0.06) = 0.03 + 0.03 = 0.06
// Compare floats with a tolerance instead of exact equality
$this->assertEqualsWithDelta(0.06, $trace->costEstimate, 0.0001);
}
}
4. Database Testing for Traces
use Illuminate\Foundation\Testing\RefreshDatabase;
use Mindwave\Mindwave\Observability\Models\Trace;
use Mindwave\Mindwave\Observability\Models\Span;
class TraceStorageTest extends TestCase
{
use RefreshDatabase;
/** @test */
public function it_stores_traces_in_database()
{
config(['mindwave-tracing.database.enabled' => true]);
FakeLLM::fake(['response' => 'Test']);
Mindwave::llm()->generateText('Test prompt');
$this->assertDatabaseCount('mindwave_traces', 1);
$this->assertDatabaseCount('mindwave_spans', 1);
$trace = Trace::first();
$this->assertNotNull($trace->trace_id);
$this->assertGreaterThan(0, $trace->total_input_tokens);
}
/** @test */
public function it_queries_expensive_traces()
{
Trace::factory()->create(['estimated_cost' => 0.05]);
Trace::factory()->create(['estimated_cost' => 0.15]);
Trace::factory()->create(['estimated_cost' => 0.25]);
$expensive = Trace::expensive(0.10)->get();
$this->assertCount(2, $expensive);
}
}
CI/CD Integration
GitHub Actions Example
# .github/workflows/test.yml
name: Tests
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
services:
mysql:
image: mysql:8.0
env:
MYSQL_ROOT_PASSWORD: password
MYSQL_DATABASE: mindwave_test
ports:
- 3306:3306
options: --health-cmd="mysqladmin ping" --health-interval=10s
qdrant:
image: qdrant/qdrant:latest
ports:
- 6333:6333
steps:
- uses: actions/checkout@v3
- name: Setup PHP
uses: shivammathur/setup-php@v2
with:
php-version: '8.2'
extensions: mbstring, pdo_mysql
coverage: xdebug
- name: Install Dependencies
run: composer install --prefer-dist
- name: Copy Environment
run: cp .env.ci .env
- name: Run Migrations
run: php artisan migrate --force
- name: Run Tests
env:
DB_CONNECTION: mysql
DB_HOST: 127.0.0.1
DB_PORT: 3306
DB_DATABASE: mindwave_test
DB_USERNAME: root
DB_PASSWORD: password
MINDWAVE_LLM: fake
MINDWAVE_VECTORSTORE: qdrant
MINDWAVE_QDRANT_HOST: localhost
MINDWAVE_QDRANT_PORT: 6333
MINDWAVE_TRACING_ENABLED: true
MINDWAVE_TRACE_DATABASE: true
run: vendor/bin/phpunit --coverage-clover coverage.xml
- name: Upload Coverage
uses: codecov/codecov-action@v3
with:
files: ./coverage.xml
Laravel Dusk for E2E Testing
use Laravel\Dusk\Browser;
use Tests\DuskTestCase;
class ChatbotE2ETest extends DuskTestCase
{
/**
* @group e2e
*/
public function test_user_can_chat_with_bot()
{
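// Dusk drives the application in a separate process, so the fake driver
// must also be active there (for example MINDWAVE_LLM=fake in the Dusk environment file)
FakeLLM::fake([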
['response' => 'Hello! How can I help you?'],
['response' => 'Laravel is a PHP framework.'],
]);
$this->browse(function (Browser $browser) {
$browser->visit('/chat')
->type('message', 'Hi')
->press('Send')
->waitForText('Hello! How can I help you?')
->type('message', 'What is Laravel?')
->press('Send')
->waitForText('Laravel is a PHP framework');
});
}
}
Common Testing Pitfalls
1. Not Cleaning Up TNTSearch Indexes
Problem:
/** @test */
public function it_searches_documents()
{
$source = TntSearchSource::fromArray([...]);
$results = $source->search('query');
// ❌ Index file left behind
}
Solution:
/** @test */
public function it_searches_documents()
{
$source = TntSearchSource::fromArray([...]);
$source->initialize();
try {
$results = $source->search('query');
$this->assertNotEmpty($results);
} finally {
$source->cleanup(); // ✅ Always cleanup
}
}
2. Forgetting to Initialize Sources
Problem:
/** @test */
public function it_searches()
{
$source = TntSearchSource::fromArray([...]);
$results = $source->search('query'); // ❌ Not initialized
}
Solution:
/** @test */
public function it_searches()
{
$source = TntSearchSource::fromArray([...]);
$source->initialize(); // ✅ Initialize first
$results = $source->search('query');
}
3. Testing with Production Credentials
Problem:
# .env.testing
MINDWAVE_OPENAI_API_KEY=sk-proj-real-production-key # ❌ Dangerous!
Solution:
# .env.testing
MINDWAVE_LLM=fake # ✅ Use fake driver
MINDWAVE_OPENAI_API_KEY=sk-test-fake-key
4. Not Testing Token Limits
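The token-limit check in this pitfall rests on simple budget arithmetic: the context window minus the tokens reserved for output. The sketch below is illustrative only. It assumes an 8,192-token window for the "8K" model, and its hypothetical `truncateToBudget()` treats one whitespace-separated word as one token, which real tokenizers do not.

```php
<?php

declare(strict_types=1);

/** Tokens left for the prompt once output space has been reserved. */
function promptBudget(int $contextWindow, int $reservedOutputTokens): int
{
    return max(0, $contextWindow - $reservedOutputTokens);
}

/**
 * Crude truncation: keeps the first $budget whitespace-separated words.
 * ASSUMPTION: one word is roughly one token. Real tokenizers count
 * differently, so this only illustrates the shape of the check.
 */
function truncateToBudget(string $text, int $budget): string
{
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    return implode(' ', array_slice($words, 0, $budget));
}

// An assumed 8,192-token window with 500 tokens reserved for the answer.
$budget = promptBudget(8192, 500);

$fitted = truncateToBudget(str_repeat('word ', 100000), $budget);
$wordCount = count(explode(' ', $fitted));
```

A test can then assert that the fitted input never exceeds the budget, whatever the size of the original document.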
Problem:
/** @test */
public function it_processes_large_document()
{
$huge = str_repeat('word ', 100000);
$response = Mindwave::llm()->generateText($huge);
// ❌ May exceed token limit in production
}
Solution:
/** @test */
public function it_handles_large_documents()
{
$huge = str_repeat('word ', 100000);
$composer = Mindwave::prompt()
->section('content', $huge, shrinker: 'truncate')
->model('gpt-4')
->reserveOutputTokens(500)
->fit();
// ✅ Verify it fits
$this->assertLessThanOrEqual(7500, $composer->getTokenCount());
}
Troubleshooting Tests
Tests Failing Intermittently
Cause: Race conditions or non-deterministic LLM outputs
Solution:
// Use fake driver for deterministic tests
FakeLLM::fake(['response' => 'Consistent response']);
// Or use seeded randomness
config(['mindwave-llm.llms.mistral.random_seed' => 42]);
High Memory Usage in Tests
Cause: Large context or many test iterations
Solution:
protected function tearDown(): void
{
// Clear large objects
unset($this->largeContext);
// Cleanup indexes
if (isset($this->source)) {
$this->source->cleanup();
}
parent::tearDown();
}
Slow Test Suite
Cause: Too many real API calls or expensive operations
Solution:
# Skip slow tests by default
vendor/bin/phpunit --exclude-group=slow,real-api
# Run fast tests in parallel
vendor/bin/paratest --processes=4
Summary
Effective testing strategies for Mindwave applications:
- Use the fake driver for fast, deterministic unit tests
- Mock external APIs for integration tests
- Test RAG pipelines with real retrieval but fake generation
- Measure quality with semantic similarity and custom metrics
- Test edge cases including errors and limits
- Automate in CI/CD with proper isolation
- Use real APIs sparingly for critical integration validation
Key Takeaway: Good tests balance speed, reliability, and cost. Favor fakes for unit tests, use mocks for integration tests, and reserve real API tests for critical paths only.