Technology Radar

Testcontainers + Ollama for AI Integration Tests

testing
Trial

Running a real local LLM in your integration tests, with Testcontainers starting an Ollama container, gives you realistic, repeatable AI tests without mocking, rate limits, or API costs.

Why It's in Trial

Testing AI code is an unsolved problem for most teams. The options are:

  1. Mock the LLM — fast, but mocks don't behave like real models; misses hallucination, formatting, and latency characteristics
  2. Call real APIs in tests — works, but incurs cost, hits rate limits, requires credentials, and is flaky in CI
  3. Testcontainers + Ollama — runs a real (small) model locally in a Docker container, deterministic enough for integration tests, no external dependencies

Option 3 is increasingly the right answer for testing the integration between your code and AI behaviour.

Setup

Dependency (Spring Boot test):

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-testcontainers</artifactId>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.testcontainers</groupId>
  <artifactId>ollama</artifactId>
  <scope>test</scope>
</dependency>
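For @ServiceConnection to produce Spring AI connection details for an Ollama container, you may also need Spring AI's Testcontainers support module. The artifact coordinate below is an assumption based on the Spring AI 1.x milestones; verify it against your Spring AI version:

```xml
<!-- Assumed coordinate; check your Spring AI version's documentation -->
<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-spring-boot-testcontainers</artifactId>
  <scope>test</scope>
</dependency>
```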

Test class:

@SpringBootTest
@Testcontainers
class CustomerSupportServiceIT {

    @Container
    @ServiceConnection
    static OllamaContainer ollama = new OllamaContainer("ollama/ollama:latest");

    @BeforeAll
    static void pullModel() throws IOException, InterruptedException {
        // Pull a small model once per class: fast to download, ~2 GB on disk
        ollama.execInContainer("ollama", "pull", "llama3.2:3b");
    }

    @Autowired CustomerSupportService service;

    @Test
    void shouldSummariseComplaint() {
        String summary = service.summarise(
            "My order #12345 arrived damaged and I want a refund."
        );
        assertThat(summary).containsIgnoringCase("damaged");
        assertThat(summary).containsIgnoringCase("refund");
    }
}
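Because a real model's wording varies from run to run, assertions should target required keywords rather than exact strings, as the containsIgnoringCase checks above do. A small hypothetical helper (plain Java, no framework) that such a test could use:

```java
import java.util.List;
import java.util.Locale;

public class LlmAssertions {

    // True when every required keyword appears in the text, case-insensitively.
    // Keeps assertions loose enough to survive run-to-run variation in LLM output.
    static boolean mentionsAll(String text, List<String> keywords) {
        String lower = text.toLowerCase(Locale.ROOT);
        return keywords.stream()
            .allMatch(k -> lower.contains(k.toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        String summary = "Customer reports order #12345 arrived damaged and requests a refund.";
        System.out.println(mentionsAll(summary, List.of("damaged", "refund"))); // true
    }
}
```

Checking for the presence of key facts, rather than exact phrasing, is what makes a small local model "deterministic enough" for integration tests.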

The @ServiceConnection annotation wires the Ollama container URL directly into Spring AI's configuration — no manual property setup.
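If @ServiceConnection support for Ollama isn't available in your Spring Boot / Spring AI combination, a fallback sketch is wiring the base URL by hand. The property key spring.ai.ollama.base-url and OllamaContainer's getEndpoint() accessor are assumptions to verify against your versions:

```java
// Fallback wiring without @ServiceConnection: point Spring AI at the container.
// Assumes the Spring AI property key spring.ai.ollama.base-url.
@DynamicPropertySource
static void ollamaProperties(DynamicPropertyRegistry registry) {
    registry.add("spring.ai.ollama.base-url", ollama::getEndpoint);
}
```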

Quarkus Dev Services

Quarkus LangChain4j handles this automatically — when running tests, Quarkus Dev Services spins up an Ollama container and configures the application without any test annotations required. It's zero configuration.
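With Quarkus, the only thing you typically specify is which model the dev service should use. The property name below is assumed from the quarkus-langchain4j-ollama extension; check your extension version's configuration reference:

```properties
# application.properties (assumed property name; verify for your version)
quarkus.langchain4j.ollama.chat-model.model-id=llama3.2:3b
```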

Choosing a Test Model

Model              Size     When to use
llama3.2:3b        ~2 GB    Fast CI tests, basic comprehension checks
llama3.1:8b        ~5 GB    Better quality, acceptable CI time
mistral:7b         ~4 GB    Good instruction following, small footprint
nomic-embed-text   ~274 MB  Embedding-only tests (RAG pipelines)

For CI, a 3B model is ready in ~10 seconds once the image and model are cached, comparable to a Postgres Testcontainer cold start.

Model Caching in CI

The first test run pulls the model (~2 GB), which is slow. Cache the pulled model in CI:

  • GitHub Actions: cache /root/.ollama (the container's model directory) or the container image layer
  • Most CI systems: enable container reuse (set testcontainers.reuse.enable=true in ~/.testcontainers.properties and call .withReuse(true) on the container) to keep it running between test runs
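A sketch of the GitHub Actions side, assuming your tests bind-mount the runner's ~/.ollama into the container (for example via withFileSystemBind("/home/runner/.ollama", "/root/.ollama")); the path, cache key, and step name are illustrative, not canonical:

```yaml
# Hypothetical workflow step: keep pulled Ollama models between CI runs
- name: Cache Ollama models
  uses: actions/cache@v4
  with:
    path: ~/.ollama
    key: ollama-${{ runner.os }}-llama3.2-3b
```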

Key Characteristics

Property                 Value
Requires                 Docker, Java 11+
Spring Boot integration  @ServiceConnection (Spring Boot 3.1+)
Quarkus integration      Automatic via Dev Services
CI caching               Cache Ollama model downloads