Getting Started
Let’s start the journey with Spring AI by building a simple AI application.
Prerequisites
Before writing Spring AI applications, we need to prepare the local development environment. Obviously, we need Java installed and configured, and we also need a large language model (LLM) ready for development and testing.
Java
Spring AI requires a minimum Java version of 17. It’s recommended to use an LTS release such as Java 21 or Java 25, so we can leverage the power of virtual threads.
Note: The source code of this book is tested using Java 25 with virtual threads enabled.
Spring AI
This book uses Spring AI 2.0.0-M4. Example applications in this book use Maven to manage dependencies.
To simplify dependency management of Spring AI related modules, the spring-ai-bom dependency can be imported to set versions of Spring AI dependencies. In the code below, set the property spring-ai.version to the version of Spring AI.
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.springframework.ai</groupId>
      <artifactId>spring-ai-bom</artifactId>
      <version>${spring-ai.version}</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
Language Model
A language model is required for development, testing and production deployments. This language model can run locally or on the cloud, as long as it provides an API endpoint to access its service. Spring AI provides built-in support for a large number of language model services.
To use a cloud-based model service, you need to open an account with the service provider and get an API key.
Let’s start with Ollama.
Ollama is a tool to run large language models locally. Simply download Ollama and install it on your local machine. After installation, open a terminal window and use the ollama CLI command to work with it.
There are many models available for use with Ollama, see Ollama’s models page for a full list.
We can use ollama pull to pull a model. Here we are using Qwen3.
ollama pull qwen3:0.6b
After the model is pulled, it can be run using ollama run.
ollama run qwen3:0.6b
ollama run starts a command-line session with the LLM. You can simply type any text to receive completions from the LLM.

Ollama also provides a GUI window to send messages to a model.

By default, Ollama provides its API endpoint at port 11434.
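If you want to verify that the endpoint is up, a quick check from the terminal works (this assumes Ollama is running locally on the default port):

```shell
# Sanity check that the Ollama server is up.
# When the daemon is active, it responds with "Ollama is running".
curl http://localhost:11434
```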
Spring Boot Application
The easiest way to create a new Spring AI application is using Spring Initializr. When adding the project’s dependencies, Ollama should be selected. This enables Spring AI to interact with Ollama. Spring Web is also added to create a simple REST API.
Note: Spring AI 2.x requires Spring Boot 4.x.
Below is the screenshot of Spring Initializr.
[Screenshot: Spring Initializr with the Ollama and Spring Web dependencies selected]
Now we can download the created application and open it using IntelliJ IDEA or other IDEs.
Adding the Ollama dependency includes the spring-ai-starter-model-ollama starter in the Maven project. This Spring Boot starter creates the necessary beans to work with Spring AI.
<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>
Here we need to add an application.yaml file to configure the Spring Boot application. This is because the qwen3 model should be used, while Spring AI’s Ollama integration defaults to the mistral model. The property to configure the Ollama model is spring.ai.ollama.chat.options.model.
spring:
  ai:
    ollama:
      chat:
        options:
          model: "qwen3:0.6b"
Now we add a REST endpoint to chat with an LLM. The ChatClient.Builder instance, provided by the Ollama Spring Boot starter, is injected into the REST controller to create ChatClient instances. A ChatClient is created from this ChatClient.Builder using the build method. The code chatClient.prompt().user(message).call().content() sends a request to the Ollama API endpoint and receives the output.
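The controller described above can be sketched as follows; the class name and the /chat request mapping are illustrative choices, not fixed by the text:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

// A minimal chat endpoint. The ChatClient.Builder bean is
// auto-configured by the Ollama Spring Boot starter.
@RestController
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @PostMapping("/chat")
    public String chat(@RequestBody String message) {
        // Send the user message to the model and return the text output
        return chatClient.prompt()
                .user(message)
                .call()
                .content();
    }
}
```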
Now we can start the Spring Boot application. Once the application is started, we can use any REST client tool to interact with the REST API.
Here we use SpringDoc to expose an OpenAPI endpoint and Swagger UI to test the API.
<dependency>
  <groupId>org.springdoc</groupId>
  <artifactId>springdoc-openapi-starter-webmvc-ui</artifactId>
  <version>3.0.2</version>
</dependency>
We can open a browser window and navigate to http://localhost:8080/swagger-ui/, then use Swagger UI to test the API.
Below is the result of testing the API using Swagger UI.
[Screenshot: testing the chat API in Swagger UI]
Use Model Service
While Ollama is great for local development and testing, we usually use cloud-based model services for production. All major cloud platforms provide AI models as services, including Google, Amazon, and Microsoft. Spring AI supports major AI model services. Here OpenAI is used as an example.
For Spring Boot, the easiest way is to add the Spring Boot starter dependency. For OpenAI support, the dependency is spring-ai-starter-model-openai.
<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>
An OpenAI API key is required to use OpenAI services. In the configuration below, the OpenAI API key is read from the environment variable OPENAI_API_KEY.
spring:
  ai:
    openai:
      apiKey: ${OPENAI_API_KEY}
Consolidate Local and Production Environment
If we use Ollama for local development and OpenAI for production, we would need to add both model dependencies, and the two auto-configured models would conflict with each other. We should consolidate on a single model integration: use only the OpenAI module, but point it at different API endpoints in development and production.
Many model services provide an API that is compatible with OpenAI’s. Ollama also provides this API. After Ollama is started, the API can be accessed from the base URL http://localhost:11434/v1/.
Note: The OpenAI compatibility of Ollama is experimental and subject to major adjustments, including breaking changes. Only part of the OpenAI API is supported.
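As a quick sketch independent of Spring AI (the class and method names here are my own, not from the original text), a plain-Java request against this compatibility endpoint can be built like this:

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds an OpenAI-style chat completion request against any
// OpenAI-compatible base URL (Ollama's is http://localhost:11434/v1).
public class OllamaOpenAiCheck {

    public static HttpRequest chatCompletionRequest(String baseUrl, String model, String prompt) {
        // Naive string interpolation is fine for a sketch;
        // use a JSON library in real code.
        String body = """
                {"model": "%s",
                 "messages": [{"role": "user", "content": "%s"}]}""".formatted(model, prompt);
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/chat/completions"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = chatCompletionRequest(
                "http://localhost:11434/v1", "qwen3:0.6b", "Hello");
        System.out.println(request.uri());
        // With Ollama running locally, send it with java.net.http.HttpClient:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Ollama ignores the API key on this endpoint, so no Authorization header is strictly required here.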
We can use Spring profiles to apply configurations for different environments. For the development profile, spring.ai.openai.baseUrl is configured to http://localhost:11434/v1. An API key is still required by the configuration, but Ollama ignores its value, so it can be set to anything.
spring:
  ai:
    openai:
      baseUrl: http://localhost:11434/v1
      apiKey: ollama
In the production profile application-prod.yaml, spring.ai.openai.baseUrl is configured to https://api.openai.com/v1, which is the endpoint of OpenAI API.
spring:
  ai:
    openai:
      baseUrl: https://api.openai.com/v1
      apiKey: ${OPENAI_API_KEY}
Profiles can be switched using the option -Dspring.profiles.active, e.g. -Dspring.profiles.active=prod to use the prod profile.
Depending on whether you want to run models locally, there are two recommended ways to set up the development environment.
Cloud-based Model Services
Cloud-based model services are often cheap to use. One option is to simply use model services for both development and production. Spring AI provides integration modules for popular model service platforms; we only need to include the Spring AI module and configure it.
Let’s use Anthropic Claude as an example. In a Spring Boot application, we can add the dependency of spring-ai-starter-model-anthropic module.
<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-starter-model-anthropic</artifactId>
</dependency>
Then we can configure Anthropic Claude. The prefix of the configuration properties is spring.ai.anthropic. An API key is required and is read from the environment variable ANTHROPIC_API_KEY. The model claude-opus-4-0 is used.
spring:
  ai:
    anthropic:
      apiKey: ${ANTHROPIC_API_KEY}
      chat:
        options:
          model: claude-opus-4-0
Use Container
If you want to run models locally, it’s recommended to run them in a container. Container tools like Docker and Podman are already used extensively in development; you may already use containers to run databases, message brokers, and other tools. Running models in a container means that you don’t need to install additional tools on your machine.
llama.cpp
A popular choice is using llama.cpp to run models. llama.cpp provides an OpenAI-compatible API to interact with the model. Model files can be downloaded from Hugging Face.
In the Docker compose file below, the model file of Qwen3-0.6B is downloaded from Hugging Face, then llama.cpp is started to serve this model.
services:
  model-runner:
    image: ghcr.io/ggml-org/llama.cpp:server
    volumes:
      - model-files:/models
    command:
      - "--host"
      - "0.0.0.0"
      - "--port"
      - "8080"
      - "-n"
      - "512"
      - "-m"
      - "/models/Qwen3-0.6B-Q8_0.gguf"
    ports:
      - "8180:8080"
    depends_on:
      model-downloader:
        condition: service_completed_successfully

  model-downloader:
    image: ghcr.io/alexcheng1982/model-downloader
    restart: "no"
    volumes:
      - model-files:/models
    command:
      - "hf"
      - "download"
      - "unsloth/Qwen3-0.6B-GGUF"
      - "Qwen3-0.6B-Q8_0.gguf"
      - "--local-dir"
      - "/models"

volumes:
  model-files:
After the container is started, the model API can be accessed at http://localhost:8180. In Spring AI, we can create a new profile that sets the configuration key spring.ai.openai.baseUrl to http://localhost:8180. The apiKey can be set to anything.
spring:
  ai:
    openai:
      baseUrl: http://localhost:8180
      apiKey: demo
Note: llama.cpp provides a web UI to interact with the model.
Ollama
Ollama can also run in a container, which means we don’t need to install Ollama on the local machine.
In the Docker compose file below, Ollama is started in a container. Another container is used to pull the qwen3:0.6b model.
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434"]
      interval: 30s
      timeout: 10s
      retries: 5
    command: ["/bin/ollama", "serve"]

  ollama-pull-qwen3:
    image: ollama/ollama
    container_name: ollama-pull-qwen3
    volumes:
      - ollama:/root/.ollama
    depends_on:
      ollama:
        condition: service_healthy
    command: ["/bin/ollama", "pull", "qwen3:0.6b"]

volumes:
  ollama:
    driver: local
Summary
This chapter introduced Spring AI basics by showing how to set up Java, manage Spring AI dependencies with the BOM, and run a local model using Ollama. It then walked through creating a Spring Boot app (via Spring Initializr) with the Ollama starter, configuring the model in application.yaml, and exposing a chat REST endpoint tested through Swagger UI. Finally, it explained production-oriented setup by using OpenAI-compatible APIs and Spring profiles to switch endpoints (local vs cloud), plus alternatives such as Anthropic services and containerized local model runtimes (llama.cpp or Ollama).

