Getting Started

Let’s start the journey with Spring AI from a simple application.

Prerequisites

Before writing Spring AI applications, we need to prepare the local development environment. Obviously, we need to have Java installed and configured. We also need to have a large language model (LLM) ready for testing.

Java

Spring AI requires Java 17 as the minimum version. It’s recommended to use an LTS release such as Java 21 or Java 25, so we can leverage the power of virtual threads.

The source code of this book is tested using Java 21 with virtual threads enabled.
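If you want the same setup, virtual threads can be turned on with a single Spring Boot property (available since Spring Boot 3.2). The snippet below is a minimal sketch of that configuration.

spring:
  threads:
    virtual:
      enabled: true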

Spring AI

This book uses Spring AI 1.1.0. Example applications in this book use Maven to manage dependencies.

To simplify dependency management across related modules, the spring-ai-bom can be imported to manage the versions of all Spring AI dependencies.

 1 <dependencyManagement>
 2   <dependencies>
 3     <dependency>
 4       <groupId>org.springframework.ai</groupId>
 5       <artifactId>spring-ai-bom</artifactId>
 6       <version>${spring-ai.version}</version>
 7       <type>pom</type>
 8       <scope>import</scope>
 9     </dependency>
10   </dependencies>
11 </dependencyManagement>
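The BOM above references a ${spring-ai.version} property. One way to define it, assuming the Spring AI 1.1.0 release used throughout this book, is a Maven property in the same pom.xml:

<properties>
  <spring-ai.version>1.1.0</spring-ai.version>
</properties>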

Language Model

A language model is required for development, testing, and production deployments. This language model can run locally or in the cloud, as long as it provides an API endpoint to access its service.

  • To run a model locally, there are many options available, including Ollama, vLLM, and LM Studio.
  • To use a cloud-based model service, you need to create an account and pay for the service based on token usage.

Let’s start with Ollama.

Ollama is a tool to run large language models locally. You can simply download Ollama and install it on your local machine. After installation, you can open a terminal window and use the ollama CLI command to work with it.

There are many models available for use with Ollama; see Ollama’s models page for a full list.

We can use ollama pull to pull a model. Here we are using Qwen3.

1 ollama pull qwen3:0.6b

The size of qwen3:0.6b is only 523MB. It’s good for local development and testing.
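As a quick optional check, the models that have been downloaded locally can be listed with the Ollama CLI:

ollama list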

After the model is pulled, it can be run using ollama run.

1 ollama run qwen3:0.6b

The ollama run command automatically pulls models that are not present locally.

ollama run starts a command-line session with the LLM. You can simply type any text to receive completions from the LLM.

Figure 1. ollama run

By default, Ollama provides its API endpoint at port 11434.
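To confirm that the endpoint is reachable, you can send a request to Ollama’s native generate API. The call below is a minimal sketch using the qwen3:0.6b model pulled earlier.

curl http://localhost:11434/api/generate \
  -d '{"model": "qwen3:0.6b", "prompt": "Hello", "stream": false}'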

Spring Boot Application

The easiest way to create a new Spring AI application is to use Spring Initializr. When adding the project’s dependencies, select Ollama, which enables Spring AI to interact with Ollama, and Spring Web, which is used to create a simple REST API.

Below is the screenshot of Spring Initializr.

Figure 2. Spring Initializr UI

Now we can download the generated project and open it in IntelliJ IDEA.

Adding the Ollama dependency adds the spring-ai-starter-model-ollama starter to the Maven project. This Spring Boot starter creates the necessary beans to work with Spring AI.

1 <dependency>
2   <groupId>org.springframework.ai</groupId>
3   <artifactId>spring-ai-starter-model-ollama</artifactId>
4 </dependency>

Here we need to add an application.yaml file to configure the Spring Boot application, because we want to use the qwen3 model, while Spring AI’s Ollama integration defaults to the mistral model. The property to configure the Ollama model is spring.ai.ollama.chat.options.model.

1 spring:
2   ai:
3     ollama:
4       chat:
5         options:
6           model: "qwen3:0.6b"

Now we add a REST endpoint to chat with the LLM. A ChatClient.Builder instance, provided by the Ollama Spring Boot starter, is injected into the REST controller, and a ChatClient is created from it using the build method. The call chain chatClient.prompt().user(message).call().content() sends a request to the Ollama API endpoint and returns the model’s output.
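A minimal sketch of such a controller is shown below. The class name ChatController, the /chat path, and the message request parameter are illustrative choices, not names mandated by Spring AI.

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ChatController {

  private final ChatClient chatClient;

  // ChatClient.Builder is auto-configured by the Ollama Spring Boot starter
  public ChatController(ChatClient.Builder chatClientBuilder) {
    this.chatClient = chatClientBuilder.build();
  }

  @PostMapping("/chat")
  public String chat(@RequestParam String message) {
    // Send the user message to the model and return the text of the response
    return chatClient.prompt()
        .user(message)
        .call()
        .content();
  }
}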

Now we can start the Spring Boot application. Once the application is started, we can use any REST client tool to interact with the REST API.

Here we use SpringDoc to expose an OpenAPI endpoint and Swagger UI for testing the API.

1 <dependency>
2   <groupId>org.springdoc</groupId>
3   <artifactId>springdoc-openapi-starter-webmvc-ui</artifactId>
4   <version>2.8.9</version>
5 </dependency>

We can open a browser window and navigate to http://localhost:8080/swagger-ui/, then use Swagger UI to try the API.

Below is the result of testing the API using Swagger UI.

Figure 3. Use Swagger UI

Use a Model Service

While Ollama is great for local development and testing, we usually use cloud-based model services for production. All major cloud platforms provide AI models as services, including Google, Amazon, and Microsoft. Spring AI supports major AI model services. Here OpenAI is used as an example.

For Spring Boot, the easiest way is adding the Spring Boot starter dependency. For OpenAI support, the dependency is spring-ai-starter-model-openai.

1 <dependency>
2   <groupId>org.springframework.ai</groupId>
3   <artifactId>spring-ai-starter-model-openai</artifactId>
4 </dependency>

An OpenAI API key is required to use OpenAI services. In the configuration below, the OpenAI API key is read from the environment variable OPENAI_API_KEY.

1 spring:
2   ai:
3     openai:
4       apiKey: ${OPENAI_API_KEY}
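Optionally, a specific chat model can be selected with the spring.ai.openai.chat.options.model property. The snippet below is an example; gpt-4o-mini is an illustrative model name, not a recommendation.

spring:
  ai:
    openai:
      chat:
        options:
          model: gpt-4o-mini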

Consolidate Local and Production Environments

If we use Ollama for local development and OpenAI for production, we would need to add both model dependencies to the project, and these two dependencies would conflict with each other. Instead, we can consolidate on a single model integration: use only the OpenAI model, but with different API endpoints in development and production.

Many model services provide an API that is compatible with OpenAI, and Ollama is one of them. After Ollama is started, this API can be accessed at the base URL http://localhost:11434/v1/.
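As a quick sanity check, the OpenAI-style chat completions endpoint exposed by Ollama can be called directly. The request below is a sketch using the qwen3:0.6b model from earlier.

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3:0.6b", "messages": [{"role": "user", "content": "Hello"}]}'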


Note that Ollama’s OpenAI compatibility is experimental and subject to major adjustments, including breaking changes. Only part of the OpenAI API is supported.

We can use Spring profiles to apply different configurations for different environments. For the development profile (for example, in application-dev.yaml), spring.ai.openai.baseUrl is configured to http://localhost:11434/v1. The API key must still be provided, but Ollama ignores it, so the value can be anything.

1 spring:
2   ai:
3     openai:
4       baseUrl: http://localhost:11434/v1
5       apiKey: ollama

In the production profile application-prod.yaml, spring.ai.openai.baseUrl is configured to https://api.openai.com/v1, which is the endpoint of OpenAI API.

1 spring:
2   ai:
3     openai:
4       baseUrl: https://api.openai.com/v1
5       apiKey: ${OPENAI_API_KEY}

Profiles can be switched using the option -Dspring.profiles.active, e.g. -Dspring.profiles.active=prod.
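For example, when running the packaged application, the profile can be passed as a JVM system property (the jar name below is just a placeholder):

java -Dspring.profiles.active=prod -jar target/my-spring-ai-app.jar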

Depending on whether you want to run models locally, there are two recommendations for setting up the development environment.

Cloud-based Model Services

Cloud-based model services are inexpensive enough for everyday development use. One option is to simply use model services for both development and production. Spring AI provides integration modules for popular model service platforms; we only need to include the corresponding Spring AI module and configure it.

Let’s use Anthropic Claude as an example. In a Spring Boot application, we can add the spring-ai-starter-model-anthropic module as a dependency.

1 <dependency>
2   <groupId>org.springframework.ai</groupId>
3   <artifactId>spring-ai-starter-model-anthropic</artifactId>
4 </dependency>

Then we can configure Anthropic Claude. The prefix of the configuration properties is spring.ai.anthropic. An API key is required and is read from the environment variable ANTHROPIC_API_KEY. The claude-opus-4-0 model is used.

1 spring:
2   ai:
3     anthropic:
4       apiKey: ${ANTHROPIC_API_KEY}
5       chat:
6         options:
7           model: claude-opus-4-0

Use Container

If you want to run models locally, it’s recommended to run them in a container. Container tools like Docker and Podman are already used extensively in development; you may already use containers to run databases, message brokers, and other tools. Running models in a container means you don’t need to install additional tools on your machine.

llama.cpp

A popular choice is llama.cpp. llama.cpp provides an OpenAI-compatible API to interact with the model, and model files can be downloaded from Hugging Face.

In the Docker Compose file below, the model file of Qwen3-0.6B is downloaded from Hugging Face, and then llama.cpp is started to serve this model.

 1 services:
 2   model-runner:
 3     image: ghcr.io/ggml-org/llama.cpp:server
 4     volumes:
 5       - model-files:/models
 6     command:
 7       - "--host"
 8       - "0.0.0.0"
 9       - "--port"
10       - "8080"
11       - "-n"
12       - "512"
13       - "-m"
14       - "/models/Qwen3-0.6B-Q8_0.gguf"
15     ports:
16       - "8180:8080"
17     depends_on:
18       model-downloader:
19         condition: service_completed_successfully
20 
21   model-downloader:
22     image: ghcr.io/alexcheng1982/model-downloader
23     restart: "no"
24     volumes:
25       - model-files:/models
26     command:
27       - "hf"
28       - "download"
29       - "unsloth/Qwen3-0.6B-GGUF"
30       - "Qwen3-0.6B-Q8_0.gguf"
31       - "--local-dir"
32       - "/models"
33 
34 volumes:
35   model-files:

After the containers are started, the model API can be accessed at http://localhost:8180. In Spring AI, we can create a new profile which sets the configuration key spring.ai.openai.baseUrl to http://localhost:8180. The apiKey can be set to anything.

1 spring:
2   ai:
3     openai:
4       baseUrl: http://localhost:8180
5       apiKey: demo

llama.cpp provides a web UI to interact with the model. You can access this UI at http://localhost:8180 using a browser.

Ollama

Ollama can also run in a container, which means we don’t need to install Ollama on the local machine.

In the Docker Compose file below, Ollama is started in a container. Another container is used to pull the qwen3:0.6b model.

 1 services:
 2   ollama:
 3     image: ollama/ollama
 4     container_name: ollama
 5     ports:
 6       - "11434:11434"
 7     volumes:
 8       - ollama:/root/.ollama
 9     restart: unless-stopped
10     healthcheck:
11       test: ["CMD", "curl", "-f", "http://localhost:11434"]
12       interval: 30s
13       timeout: 10s
14       retries: 5
15     command: ["/bin/ollama", "serve"]
16 
17   ollama-pull-qwen3:
18     image: ollama/ollama
19     container_name: ollama-pull-qwen3
20     volumes:
21       - ollama:/root/.ollama
22     depends_on:
23       ollama:
24         condition: service_healthy
25     command: ["/bin/ollama", "pull", "qwen3:0.6b"]
26 
27 volumes:
28   ollama:
29     driver: local
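To bring the stack up and check that the model is available, something like the following can be used (the /api/tags endpoint lists the models Ollama has downloaded locally):

docker compose up -d
curl http://localhost:11434/api/tags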