Spring AI Fine-Tuning Workflow — When and How to Fine-Tune Models
Fine-tuning customizes a foundation model on your domain-specific data so it adopts your vocabulary, tone, and patterns. With Spring AI, you can automate training data preparation, trigger fine-tuning jobs via the OpenAI API, and switch to your fine-tuned model transparently. This tutorial explains when fine-tuning beats prompt engineering and how to implement it.
Fine-Tuning vs Prompt Engineering vs RAG
Approach             Best For                                            Cost
──────────────────────────────────────────────────────────────────────────────────────────────────────────────
Prompt Engineering   Quick customization, few examples                   Zero (tokens only)
RAG                  Dynamic, large, frequently updated knowledge        Medium (embeddings + storage)
Fine-Tuning          Fixed style, domain vocabulary, consistent format   High (training) then Low (inference)
Choose fine-tuning when:
✔ The model must produce a specific output format (legal boilerplate, JSON schemas)
✔ Responses need a consistent tone or brand voice
✔ You want shorter prompts (format and style instructions are learned during training, so you don't need to repeat them)
✔ You have 100+ high-quality example pairs
Do NOT fine-tune when:
✘ You need to inject factual knowledge that changes (use RAG instead)
✘ You have fewer than 50 examples
✘ Your use case changes frequently
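The rules of thumb above can be written down as a small decision helper. This is only a sketch: `ApproachAdvisor`, its `Approach` enum, and the three inputs are hypothetical names chosen for illustration, and the 100-example threshold is taken from the checklist above.

```java
/** Hypothetical helper that encodes the checklist above; not part of Spring AI. */
final class ApproachAdvisor {

    enum Approach { PROMPT_ENGINEERING, RAG, FINE_TUNING }

    /**
     * @param exampleCount     number of curated input/output pairs available
     * @param factsChangeOften true if the knowledge the model needs is updated frequently
     * @param needsFixedFormat true if output must follow a fixed format or brand voice
     * @return the primary approach suggested by the checklist
     */
    static Approach recommend(int exampleCount, boolean factsChangeOften, boolean needsFixedFormat) {
        if (exampleCount < 0) {
            throw new IllegalArgumentException("exampleCount must not be negative");
        }
        // Changing facts belong in a retrieval layer, not in model weights.
        if (factsChangeOften) {
            return Approach.RAG;
        }
        // Fine-tuning pays off for fixed format/voice once enough examples exist.
        if (needsFixedFormat && exampleCount >= 100) {
            return Approach.FINE_TUNING;
        }
        // Otherwise start with the cheapest option.
        return Approach.PROMPT_ENGINEERING;
    }
}
```

The helper returns a single primary approach for simplicity. In practice the options combine, for example RAG layered on top of a fine-tuned model.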
Step 1 — Prepare Training Data
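// Training data JSONL format required by OpenAI
// Each line is a JSON object with a "messages" array
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.stereotype.Service;

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

@Service
public class TrainingDataPreparer {

    // ObjectMapper is thread-safe once configured, so one shared instance is enough.
    // Serializing records requires Jackson 2.12 or later.
    private final ObjectMapper mapper = new ObjectMapper();

    public void writeTrainingFile(List<TrainingSample> samples, Path outputPath)
            throws IOException {
        // Files.newBufferedWriter writes UTF-8 by default
        try (BufferedWriter writer = Files.newBufferedWriter(outputPath)) {
            for (TrainingSample sample : samples) {
                TrainingEntry entry = new TrainingEntry(List.of(
                        new TrainingMessage("system", sample.systemPrompt()),
                        new TrainingMessage("user", sample.userInput()),
                        new TrainingMessage("assistant", sample.expectedOutput())
                ));
                // writeValueAsString emits compact JSON without line breaks,
                // so each sample occupies exactly one line of the file
                writer.write(mapper.writeValueAsString(entry));
                writer.newLine();
            }
        }
        System.out.printf("Wrote %d training samples to %s%n",
                samples.size(), outputPath);
    }
}

record TrainingSample(String systemPrompt, String userInput, String expectedOutput) {}
record TrainingMessage(String role, String content) {}
record TrainingEntry(List<TrainingMessage> messages) {}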
// Example training data for a Java code reviewer
TrainingSample sample1 = new TrainingSample(
"You are a senior Java code reviewer. Identify issues concisely.",
"Review: public String getUserName(User user) { return user.getName(); }",
"Issues: 1. Missing null check on 'user' parameter. Fix: if (user == null) throw new IllegalArgumentException(\"user must not be null\");"
);
// Output JSONL line (wrapped here for readability; in the file each sample is a single line):
{"messages":[
{"role":"system","content":"You are a senior Java code reviewer..."},
{"role":"user","content":"Review: public String getUserName..."},
{"role":"assistant","content":"Issues: 1. Missing null check..."}
]}
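Because example quality matters more than raw count, it can help to drop incomplete and duplicate samples before writing the file. The following is a minimal sketch under stated assumptions: `TrainingSampleFilter` and its `clean` method are hypothetical names, and treating two samples with the same trimmed user input as duplicates is one possible rule, not a requirement of the OpenAI format.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Same shape as the TrainingSample record used in Step 1.
record TrainingSample(String systemPrompt, String userInput, String expectedOutput) {}

/** Hypothetical pre-processing step; not part of Spring AI or the OpenAI SDK. */
final class TrainingSampleFilter {

    /** Returns the samples in original order, minus blanks and repeated user inputs. */
    static List<TrainingSample> clean(List<TrainingSample> samples) {
        Set<String> seenInputs = new HashSet<>();
        List<TrainingSample> kept = new ArrayList<>();
        for (TrainingSample sample : samples) {
            // A pair without an input or an expected answer teaches the model nothing.
            if (isBlank(sample.userInput()) || isBlank(sample.expectedOutput())) {
                continue;
            }
            // Set.add returns false when the trimmed input was already seen.
            if (seenInputs.add(sample.userInput().strip())) {
                kept.add(sample);
            }
        }
        return kept;
    }

    private static boolean isBlank(String value) {
        return value == null || value.isBlank();
    }
}
```

Calling `TrainingSampleFilter.clean(samples)` before `writeTrainingFile(...)` keeps the first occurrence of each input and preserves order.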
Step 2 — Upload and Start Fine-Tuning Job
// Note: Spring AI's OpenAI support targets inference, so the upload and job calls go
// through a separate client. OpenAIClient below stands for your OpenAI SDK client or a
// thin wrapper around the POST /v1/files and POST /v1/fine_tuning/jobs endpoints.
// Treat the method and type names as illustrative and check them against your SDK version.
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.nio.file.Path;

@Service
public class FineTuningService {

    private final OpenAIClient openAiClient;

    public FineTuningService(OpenAIClient openAiClient) {
        this.openAiClient = openAiClient;
    }

    public String uploadTrainingFile(Path trainingFile) throws IOException {
        // Upload file to OpenAI
        FileDetails fileDetails = openAiClient.uploadFile(
                BinaryData.fromFile(trainingFile),
                "fine-tune" // purpose
        );
        System.out.println("Uploaded file ID: " + fileDetails.getId());
        return fileDetails.getId();
    }

    public String startFineTuning(String fileId, String baseModel) {
        FineTuningJob job = openAiClient.createFineTuning(
                new FineTuningOptions()
                        .setModel(baseModel) // "gpt-4o-mini-2024-07-18"
                        .setTrainingFile(fileId)
                        .setHyperparameters(new Hyperparameters()
                                .setNEpochs(3) // number of training passes
                        )
        );
        System.out.println("Fine-tuning job started: " + job.getId());
        System.out.println("Status: " + job.getStatus());
        return job.getId();
    }

    public FineTuningJob checkStatus(String jobId) {
        FineTuningJob job = openAiClient.getFineTuningJob(jobId);
        System.out.printf("Job %s status: %s%n", jobId, job.getStatus());
        // The fine-tuned model ID is only available once the job has succeeded
        if ("succeeded".equals(job.getStatus())) {
            System.out.println("Fine-tuned model: " + job.getFineTunedModel());
        }
        return job;
    }
}
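Since a job passes through states such as queued and running before it finishes, `checkStatus` has to be called repeatedly. A polling loop could look like the sketch below. `JobPoller` and its parameters are hypothetical names; the status source is passed in as a `Supplier<String>` so the loop does not depend on any particular SDK. The terminal states assumed here are `succeeded`, `failed`, and `cancelled`.

```java
import java.time.Duration;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical polling helper; the status source is injected as a Supplier. */
final class JobPoller {

    // Assumed terminal states of a fine-tuning job.
    private static final Set<String> TERMINAL = Set.of("succeeded", "failed", "cancelled");

    /**
     * Calls statusSupplier up to maxAttempts times, sleeping for interval between calls.
     *
     * @return the first terminal status observed
     * @throws IllegalStateException if no terminal status is seen, or the wait is interrupted
     */
    static String waitForTerminalStatus(Supplier<String> statusSupplier,
                                        Duration interval,
                                        int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        String status = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            status = statusSupplier.get();
            if (TERMINAL.contains(status)) {
                return status;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(interval.toMillis());
                } catch (InterruptedException e) {
                    // Restore the interrupt flag so callers can react to it.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for job", e);
                }
            }
        }
        throw new IllegalStateException(
                "Job still '" + status + "' after " + maxAttempts + " checks");
    }
}
```

With the service above, a call might look like `JobPoller.waitForTerminalStatus(() -> fineTuningService.checkStatus(jobId).getStatus(), Duration.ofMinutes(2), 60)`, assuming `getStatus()` returns the status as a string as it does in `checkStatus`.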
Step 3 — Use Fine-Tuned Model in Spring AI
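# application.properties
# After fine-tuning completes, copy the model ID from the job result or the OpenAI dashboard
# e.g., ft:gpt-4o-mini-2024-07-18:my-company:java-reviewer:abc123
# Base model used by default
spring.ai.openai.chat.options.model=gpt-4o-mini
# Fine-tuned model ID read by FineTunedReviewService; empty means "use the base model"
fine.tuned.model=${FINE_TUNED_MODEL_ID:}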
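import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.openai.OpenAiChatOptions;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

@Service
public class FineTunedReviewService {

    private static final String BASE_MODEL = "gpt-4o-mini";

    private final ChatClient fineTunedClient;
    private final ChatClient standardClient;

    public FineTunedReviewService(ChatClient.Builder builder,
            @Value("${fine.tuned.model:}") String fineTunedModelId) {
        // Fine-tuned model for production use. The builder is mutable, so work on a
        // clone() to keep these options from leaking into the standard client.
        this.fineTunedClient = builder.clone()
                .defaultOptions(OpenAiChatOptions.builder()
                        // model(...) is named withModel(...) in early Spring AI milestones
                        .model(fineTunedModelId.isBlank() ? BASE_MODEL : fineTunedModelId)
                        .build())
                .build();
        // Fallback to standard model
        this.standardClient = builder.clone().build();
    }

    public String reviewCode(String code) {
        try {
            // Fine-tuned model needs minimal prompt: format and style are baked in
            return fineTunedClient.prompt()
                    .user("Review: " + code)
                    .call()
                    .content();
        } catch (RuntimeException e) {
            // Fallback: the base model needs explicit instructions to approximate the format
            return standardClient.prompt()
                    .system("You are a senior Java code reviewer. Identify issues concisely.")
                    .user("Review: " + code)
                    .call()
                    .content();
        }
    }
}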
Quality Comparison — Before and After Fine-Tuning
Input code:
public User getUser(int id) {
return userRepository.findById(id);
}
Standard gpt-4o-mini response (needs long system prompt to get right format):
"This method retrieves a user by ID from the repository.
Consider adding null handling..."
Fine-tuned model response (correct format, no system prompt needed):
"Issues: 1. findById() returns Optional<User> but method signature
returns User — will cause compile error. 2. Missing @Transactional.
Fix: return userRepository.findById(id).orElseThrow(() ->
new UserNotFoundException(id));"
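One way to turn "correct format" into a number is to count how many responses follow the expected shape. The sketch below assumes the target format always begins with `Issues: 1. `, as in the example response above. `FormatCompliance` and its regular expression are hypothetical and would need adjusting to your own format.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical format check for the reviewer output shown above. */
final class FormatCompliance {

    // Assumed target shape: the response starts with "Issues: 1. " followed by text.
    private static final Pattern REVIEW_FORMAT =
            Pattern.compile("^Issues: 1\\. .+", Pattern.DOTALL);

    static boolean matchesFormat(String response) {
        return response != null && REVIEW_FORMAT.matcher(response.strip()).matches();
    }

    /** Fraction of responses (0.0 to 1.0) that follow the expected format. */
    static double complianceRate(List<String> responses) {
        if (responses.isEmpty()) {
            return 0.0;
        }
        long compliant = responses.stream()
                .filter(FormatCompliance::matchesFormat)
                .count();
        return (double) compliant / responses.size();
    }
}
```

Running the same inputs through the base model and the fine-tuned model and comparing the two rates gives a simple before/after measure of format adherence. It says nothing about whether the review content is correct.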
Output — Fine-Tuning Job Status
Uploaded file ID: file-abc123xyz
Fine-tuning job started: ftjob-def456uvw
Status: queued
// Poll every few minutes...
Job ftjob-def456uvw status: running
Job ftjob-def456uvw status: running
Job ftjob-def456uvw status: succeeded
Fine-tuned model: ft:gpt-4o-mini-2024-07-18:my-company:java-reviewer:abc123
// Cost estimate for 500 training samples, 3 epochs:
// Training: ~$5-15 (depends on sample length and current pricing)
// Inference: per-token price is typically higher than the base model; savings come from shorter prompts
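The training estimate is simple arithmetic: trained tokens are approximately the tokens in the training file multiplied by the number of epochs, and the bill is that total times the per-token training price. The sketch below assumes that billing model. `TrainingCostEstimator` is a hypothetical name, and the price passed in is a placeholder to be replaced with the current rate from the provider's pricing page.

```java
/** Hypothetical back-of-the-envelope estimator; prices are supplied by the caller. */
final class TrainingCostEstimator {

    /**
     * @param sampleCount                 number of training samples in the JSONL file
     * @param avgTokensPerSample          average tokens per sample (all messages combined)
     * @param epochs                      number of training passes over the file
     * @param usdPerMillionTrainingTokens training price per one million tokens
     * @return estimated training cost in US dollars
     */
    static double estimateUsd(int sampleCount,
                              int avgTokensPerSample,
                              int epochs,
                              double usdPerMillionTrainingTokens) {
        // Use long arithmetic so large datasets do not overflow int.
        long trainedTokens = (long) sampleCount * avgTokensPerSample * epochs;
        return trainedTokens / 1_000_000.0 * usdPerMillionTrainingTokens;
    }
}
```

For example, 500 samples averaging 1,000 tokens each over 3 epochs is 1.5 million trained tokens; at an illustrative rate of $5.00 per million tokens, `estimateUsd(500, 1000, 3, 5.0)` gives 7.5. Both the average length and the rate are made-up inputs, not measured values or actual prices.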
Key Points
- Prepare at least 100 diverse, high-quality training examples — 50 is the absolute minimum and often produces weak results
- Use gpt-4o-mini as the base model for fine-tuning: it's roughly 10x cheaper to fine-tune than gpt-4o, and the performance gap is small for structured output tasks
- Fine-tuning captures format and style reliably but not factual knowledge; if your domain facts change, use RAG on top of your fine-tuned model
- Evaluate the fine-tuned model against a held-out test set (20% of your data) before switching production traffic to it
- Store your training JSONL files in version control — they are the "code" for your fine-tuned model and should be versioned and documented
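The held-out test set mentioned in the key points can be produced with a seeded shuffle, so the same split is reproducible whenever the versioned training file is unchanged. This is a minimal sketch: `HoldoutSplitter`, its nested `Split` record, and the rounding rule for the test size are all illustrative choices.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper that splits samples into training and held-out test sets. */
final class HoldoutSplitter {

    record Split<T>(List<T> train, List<T> test) {}

    /**
     * @param samples      all available samples (not modified)
     * @param testFraction share reserved for testing, e.g. 0.2 for 20%
     * @param seed         fixed seed so the split is reproducible
     */
    static <T> Split<T> split(List<T> samples, double testFraction, long seed) {
        if (testFraction <= 0.0 || testFraction >= 1.0) {
            throw new IllegalArgumentException("testFraction must be between 0 and 1 (exclusive)");
        }
        // Shuffle a copy so the caller's list keeps its order.
        List<T> shuffled = new ArrayList<>(samples);
        Collections.shuffle(shuffled, new Random(seed));

        int testSize = (int) Math.round(shuffled.size() * testFraction);
        List<T> test = List.copyOf(shuffled.subList(0, testSize));
        List<T> train = List.copyOf(shuffled.subList(testSize, shuffled.size()));
        return new Split<>(train, test);
    }
}
```

Only the `train` part should be written to the JSONL file that is uploaded. The `test` part stays out of training and is used to compare the fine-tuned model against the base model before switching production traffic.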