Spring AI Data Enrichment — Enhance Existing Records with AI-Generated Fields
Data enrichment uses AI to add computed fields to your existing database records: product category from description, customer segment from purchase history, article tags from content, or risk score from transaction data. This tutorial shows how to build scalable AI enrichment pipelines that run as background jobs and update records in place.
Common Data Enrichment Patterns
Input Record                      AI-Enriched Fields Added
──────────────────────────────────────────────────────────────
Product(name, description)     →  category, tags, targetAudience
Customer(orders, location)     →  segment, churnRisk, lifetimeValue
Article(title, body)           →  topics, readingLevel, summary
Transaction(amount, merchant)  →  category, isAnomaly, riskScore
JobPosting(description)        →  skills, seniority, jobFamily
Product Enrichment — Add Category and Tags Automatically
import jakarta.persistence.*;
import java.time.LocalDateTime;
import java.util.List;
@Entity
@Table(name = "products")
public class Product {
    @Id @GeneratedValue
    private Long id;
    private String name;
    @Column(columnDefinition = "TEXT")
    private String description;
    // AI-enriched fields (null until enriched)
    private String category;
    private String subcategory;
    @ElementCollection
    private List<String> tags;
    private String targetAudience;  // "professional", "student", "home_user", "enterprise"
    private Integer readingLevel;   // 1-12 (school grade equivalent)
    @Column(nullable = false)
    private boolean enriched = false;
    @Column
    private LocalDateTime enrichedAt;
    // Getters and setters omitted for brevity (the enrichment service relies on them)
}
public record ProductEnrichment(
String category,
String subcategory,
List<String> tags,
String targetAudience,
int readingLevel
) {}
@Service
public class ProductEnrichmentService {
private final ChatClient chatClient;
private final ProductRepository productRepo;
public ProductEnrichmentService(ChatClient.Builder builder,
ProductRepository productRepo) {
this.chatClient = builder
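.defaultOptions(OpenAiChatOptions.builder()
.model("gpt-4o-mini")   // Spring AI 1.0 builder; pre-1.0 milestones used withModel(...)
.temperature(0.0)       // Double in 1.0; minimizes run-to-run variation
.build())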
.build();
this.productRepo = productRepo;
}
public void enrichProduct(Product product) {
ProductEnrichment enrichment = chatClient.prompt()
.system("""
Analyze product data and return enrichment fields.
Category examples: Electronics, Software, Books, Clothing, Tools
Tags: 3-7 relevant keywords
Target audience: professional / student / home_user / enterprise
Reading level: school grade 1-12 based on description complexity
""")
.user("Product name: %s\nDescription: %s"
.formatted(product.getName(), product.getDescription()))
.call()
.entity(ProductEnrichment.class);
product.setCategory(enrichment.category());
product.setSubcategory(enrichment.subcategory());
product.setTags(enrichment.tags());
product.setTargetAudience(enrichment.targetAudience());
product.setReadingLevel(enrichment.readingLevel());
product.setEnriched(true);
product.setEnrichedAt(LocalDateTime.now());
productRepo.save(product);
}
}
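Because enrichProduct stamps enrichedAt on every record it touches, a later pass can select records whose enrichment predates a taxonomy or model change. The plain-Java sketch below shows that selection rule on its own; EnrichedRow and the cutoff parameter are hypothetical stand-ins, not part of the tutorial's code or of Spring Data.

```java
import java.time.LocalDateTime;
import java.util.List;

// Hypothetical, simplified stand-in for the Product entity's enrichment state.
record EnrichedRow(long id, boolean enriched, LocalDateTime enrichedAt) {}

final class ReEnrichmentSelector {
    private ReEnrichmentSelector() {}

    // A row needs (re-)enrichment if it was never enriched, or if its
    // enrichment predates the cutoff (e.g. the date your taxonomy changed).
    static boolean needsEnrichment(EnrichedRow row, LocalDateTime cutoff) {
        return !row.enriched()
                || row.enrichedAt() == null
                || row.enrichedAt().isBefore(cutoff);
    }

    // Returns the ids of rows to send back through the enrichment pipeline.
    static List<Long> selectIds(List<EnrichedRow> rows, LocalDateTime cutoff) {
        return rows.stream()
                .filter(r -> needsEnrichment(r, cutoff))
                .map(EnrichedRow::id)
                .toList();
    }
}
```

In a Spring Data repository you would normally push this rule into the database instead, for example with a derived query such as findByEnrichedFalseOrEnrichedAtBefore(LocalDateTime cutoff); that method name is hypothetical, built from Spring Data's False, Or and Before query-derivation keywords.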
Batch Enrichment Job
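import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@Service
public class BatchEnrichmentJob {
    private static final Logger log = LoggerFactory.getLogger(BatchEnrichmentJob.class);
    private final ProductEnrichmentService enricher;
    private final ProductRepository repo;
    public BatchEnrichmentJob(ProductEnrichmentService enricher, ProductRepository repo) {
        this.enricher = enricher;
        this.repo = repo;
    }
    // Run nightly at 2 AM (requires @EnableScheduling on a configuration class)
    @Scheduled(cron = "0 0 2 * * ?")
    public void enrichUnenrichedProducts() {
        runEnrichment();
    }
    // Shared by the scheduler and the admin endpoint so both use the same throttled loop
    public Map<String, Integer> runEnrichment() {
        List<Product> pending = repo.findByEnrichedFalse();
        log.info("Starting enrichment for {} products", pending.size());
        int success = 0, failed = 0;
        for (Product product : pending) {
            try {
                enricher.enrichProduct(product);
                success++;
            } catch (Exception e) {
                log.warn("Failed to enrich product {}: {}", product.getId(), e.getMessage());
                failed++;
            }
            // Throttle: brief pause every 10 attempts; tune to your account's rate limits
            if ((success + failed) % 10 == 0) {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag and stop early
                    break;
                }
            }
        }
        log.info("Enrichment complete: {} succeeded, {} failed", success, failed);
        return Map.of("succeeded", success, "failed", failed);
    }
}
// @PostMapping is only picked up on a @Controller/@RestController bean, not on a @Service
@RestController
public class EnrichmentAdminController {
    private final BatchEnrichmentJob job;
    public EnrichmentAdminController(BatchEnrichmentJob job) {
        this.job = job;
    }
    // Trigger manually via REST endpoint (blocks until the batch finishes)
    @PostMapping("/admin/enrich-products")
    public Map<String, Integer> triggerEnrichment() {
        return job.runEnrichment();
    }
}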
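Pausing for one second after every ten records bounds the request rate only loosely, because the effective rate also depends on how long each call takes. A sketch of an alternative, assuming a single-threaded loop like the batch job above: space call starts evenly so they stay under a target requests-per-minute. FixedRateThrottle is a hypothetical helper, not a Spring AI class, and the target rate should come from your own OpenAI account limits.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: allows at most `requestsPerMinute` call starts per minute
// by enforcing a fixed minimum interval between starts. Not thread-safe.
final class FixedRateThrottle {
    private final long intervalNanos;
    private long lastStartNanos;
    private boolean started = false;

    FixedRateThrottle(int requestsPerMinute) {
        if (requestsPerMinute <= 0) {
            throw new IllegalArgumentException("requestsPerMinute must be > 0");
        }
        this.intervalNanos = TimeUnit.MINUTES.toNanos(1) / requestsPerMinute;
    }

    // How long the caller must wait before starting the next call at `nowNanos`.
    long delayNanos(long nowNanos) {
        if (!started) return 0;
        long elapsed = nowNanos - lastStartNanos;
        return Math.max(0, intervalNanos - elapsed);
    }

    // Records that a call started at `nowNanos`.
    void markStarted(long nowNanos) {
        lastStartNanos = nowNanos;
        started = true;
    }

    // Blocks until the next call is allowed, then records its start time.
    void acquire() throws InterruptedException {
        long delay = delayNanos(System.nanoTime());
        if (delay > 0) TimeUnit.NANOSECONDS.sleep(delay);
        markStarted(System.nanoTime());
    }
}
```

In the loop you would call throttle.acquire() before each enricher.enrichProduct(product). For multi-threaded or production use, an existing limiter such as Resilience4j's RateLimiter is a sturdier choice than a hand-rolled class.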
Customer Segment Enrichment
public record CustomerSegment(
String segment, // "high_value", "at_risk", "new", "dormant"
double churnRisk, // 0.0-1.0
String recommendation // "offer_discount", "send_newsletter", "win_back_campaign"
) {}
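The comments on CustomerSegment document the expected values, but the record's String and double fields don't enforce them, so a model can return an unlisted segment name or a churnRisk outside 0.0-1.0. A minimal sketch of a post-processing check, assuming you map the string onto your own enum; Segment and ChurnRisk are hypothetical names introduced here, not part of Spring AI.

```java
import java.util.Locale;

// Hypothetical enum mirroring the segment values listed in CustomerSegment's comments.
enum Segment {
    HIGH_VALUE, AT_RISK, NEW, DORMANT, UNKNOWN;

    // Maps the model's string (e.g. "at_risk") onto an enum constant;
    // anything unrecognized, including null, becomes UNKNOWN.
    static Segment parse(String raw) {
        if (raw == null) return UNKNOWN;
        try {
            return valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return UNKNOWN;
        }
    }
}

final class ChurnRisk {
    private ChurnRisk() {}

    // True only for values in the documented 0.0-1.0 range (NaN fails both comparisons).
    static boolean isValid(double raw) {
        return raw >= 0.0 && raw <= 1.0;
    }
}
```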
@Service
public class CustomerEnrichmentService {
    private final ChatClient chatClient;
    public CustomerEnrichmentService(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }
    public CustomerSegment segmentCustomer(Customer customer,
                                           List<Order> recentOrders) {
        String orderSummary = recentOrders.stream()
                .map(o -> "  - %s: $%.2f (%s)".formatted(
                        o.getDate(), o.getAmount(), o.getCategory()))
                .collect(Collectors.joining("\n"));
        // Record comments are not sent to the model, so the allowed values are spelled out here
        return chatClient.prompt()
                .user("""
                        Customer: %s, joined %s, location: %s
                        Recent orders (last 6 months):
                        %s
                        Total lifetime value: $%.2f
                        Days since last purchase: %d
                        Classify this customer and estimate churn risk.
                        segment: one of high_value, at_risk, new, dormant
                        churnRisk: a number from 0.0 to 1.0
                        recommendation: one of offer_discount, send_newsletter, win_back_campaign
                        """.formatted(
                        customer.getName(),
                        customer.getJoinedDate(),
                        customer.getCountry(),
                        orderSummary,
                        customer.getLifetimeValue(),
                        customer.getDaysSinceLastPurchase()))
                .call()
                .entity(CustomerSegment.class);
    }
}
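The tutorial doesn't show how Customer computes getLifetimeValue() and getDaysSinceLastPurchase(). One reasonable approach, sketched below under that assumption, is to derive these numbers in code from the order history so the model only classifies and never has to do arithmetic. OrderLine, CustomerFeatures and CustomerFeatureCalculator are hypothetical types, not the tutorial's Order or Customer entities.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified order shape.
record OrderLine(LocalDate date, BigDecimal amount, String category) {}

record CustomerFeatures(BigDecimal lifetimeValue, long daysSinceLastPurchase) {}

final class CustomerFeatureCalculator {
    private CustomerFeatureCalculator() {}

    // Computes deterministic prompt inputs from the full order history.
    static CustomerFeatures compute(List<OrderLine> allOrders, LocalDate today) {
        // Sum of all order amounts; BigDecimal avoids floating-point drift on money.
        BigDecimal total = allOrders.stream()
                .map(OrderLine::amount)
                .reduce(BigDecimal.ZERO, BigDecimal::add);
        // Days between the most recent order and today; -1 means "no purchases yet".
        long days = allOrders.stream()
                .map(OrderLine::date)
                .max(Comparator.naturalOrder())
                .map(last -> ChronoUnit.DAYS.between(last, today))
                .orElse(-1L);
        return new CustomerFeatures(total, days);
    }
}
```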
Output
// enrichProduct(laptop)
ProductEnrichment[
category="Electronics",
subcategory="Laptops",
tags=["laptop", "windows", "business", "ultrabook", "fast"],
targetAudience="professional",
readingLevel=10
]
// Batch enrichment log:
Starting enrichment for 247 products
Enrichment complete: 244 succeeded, 3 failed
// Customer segmentation
CustomerSegment[
segment="at_risk",
churnRisk=0.72,
recommendation="win_back_campaign"
]
Key Points
- Add an enriched boolean column to every entity you enrich: it enables efficient findByEnrichedFalse() queries and prevents duplicate processing.
- Use temperature=0.0 for enrichment: you want reproducible categorizations, not creative variation. Temperature 0 reduces run-to-run variation but does not guarantee identical output, so still validate what comes back.
- Throttle batch enrichment, for example with a brief sleep every N records. Rate limits for gpt-4o-mini depend on your OpenAI usage tier, and sustained parallel calls will hit them.
- Store an enrichedAt timestamp: it lets you re-enrich records when your category taxonomy changes or model quality improves.
- Validate enrichment output before saving: check that the category is in your allowed list, the tag count is reasonable, and required fields aren't null.
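The validation point can be made concrete with a small checker. This sketch assumes the taxonomy is exactly the example categories and audiences listed in the ProductEnrichmentService system prompt, plus its 3-7 tag and 1-12 reading-level rules; EnrichmentValidator is a hypothetical class, and the ProductEnrichment record is repeated so the block compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Mirrors the tutorial's ProductEnrichment record.
record ProductEnrichment(String category, String subcategory, List<String> tags,
                         String targetAudience, int readingLevel) {}

final class EnrichmentValidator {
    // Assumed taxonomy taken from the system prompt's examples; replace with your own.
    static final Set<String> CATEGORIES =
            Set.of("Electronics", "Software", "Books", "Clothing", "Tools");
    static final Set<String> AUDIENCES =
            Set.of("professional", "student", "home_user", "enterprise");

    private EnrichmentValidator() {}

    // Returns a list of problems; an empty list means the enrichment is safe to save.
    static List<String> validate(ProductEnrichment e) {
        List<String> errors = new ArrayList<>();
        if (e == null) {
            errors.add("enrichment is null");
            return errors;
        }
        // Null checks come first: Set.of(...).contains(null) throws NullPointerException.
        if (e.category() == null || !CATEGORIES.contains(e.category())) {
            errors.add("unknown category: " + e.category());
        }
        int tagCount = e.tags() == null ? 0 : e.tags().size();
        if (tagCount < 3 || tagCount > 7) {
            errors.add("expected 3-7 tags, got " + tagCount);
        }
        if (e.targetAudience() == null || !AUDIENCES.contains(e.targetAudience())) {
            errors.add("unknown target audience: " + e.targetAudience());
        }
        if (e.readingLevel() < 1 || e.readingLevel() > 12) {
            errors.add("reading level out of range: " + e.readingLevel());
        }
        return errors;
    }
}
```

In enrichProduct you would call validate(enrichment) right after .entity(...) and throw if the result is non-empty. The product is then never saved, the batch job counts it as failed, and because enriched stays false it is picked up again on the next run.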