Java9R: Spring AI Data Enrichment — Enhance Existing Records with AI-Generated Fields

Data enrichment uses AI to add computed fields to your existing database records: product category from description, customer segment from purchase history, article tags from content, or risk score from transaction data. This tutorial shows how to build scalable AI enrichment pipelines that run as background jobs and update records in place.

Common Data Enrichment Patterns

Input Record                      AI-Enriched Fields Added
──────────────────────────────────────────────────────────────
Product(name, description)     →  category, tags, targetAudience
Customer(orders, location)     →  segment, churnRisk, lifetimeValue
Article(title, body)           →  topics, readingLevel, summary
Transaction(amount, merchant)  →  category, isAnomaly, riskScore
JobPosting(description)        →  skills, seniority, jobFamily

Product Enrichment — Add Category and Tags Automatically

@Entity
@Table(name = "products")
public class Product {

    @Id @GeneratedValue
    private Long id;

    private String name;

    @Column(columnDefinition = "TEXT")
    private String description;

    // AI-enriched fields (nullable until enriched)
    private String category;
    private String subcategory;

    @ElementCollection
    private List<String> tags;

    private String targetAudience;   // "professional", "student", "home_user", "enterprise"
    private Integer readingLevel;    // 1-12 (school grade equivalent)

    @Column(nullable = false)
    private boolean enriched = false;

    @Column
    private LocalDateTime enrichedAt;
}

public record ProductEnrichment(
        String category,
        String subcategory,
        List<String> tags,
        String targetAudience,
        int readingLevel
) {}

@Service
public class ProductEnrichmentService {

    private final ChatClient chatClient;
    private final ProductRepository productRepo;

    public ProductEnrichmentService(ChatClient.Builder builder,
                                     ProductRepository productRepo) {
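        // Option builder method names depend on the Spring AI version: early
        // milestone builds use withModel/withTemperature, while later releases
        // use model(...) and temperature(...). Adjust to the version you run.
        this.chatClient  = builder
                .defaultOptions(OpenAiChatOptions.builder()
                        .withModel("gpt-4o-mini")
                        .withTemperature(0.0f)
                        .build())
                .build();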
        this.productRepo = productRepo;
    }

    public void enrichProduct(Product product) {
        ProductEnrichment enrichment = chatClient.prompt()
                .system("""
                        Analyze product data and return enrichment fields.
                        Category examples: Electronics, Software, Books, Clothing, Tools
                        Tags: 3-7 relevant keywords
                        Target audience: professional / student / home_user / enterprise
                        Reading level: school grade 1-12 based on description complexity
                        """)
                .user("Product name: %s\nDescription: %s"
                        .formatted(product.getName(), product.getDescription()))
                .call()
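                .entity(ProductEnrichment.class);

        // Guard against an empty or unparseable model response
        if (enrichment == null) {
            throw new IllegalStateException(
                    "No enrichment returned for product " + product.getId());
        }

        product.setCategory(enrichment.category());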
        product.setSubcategory(enrichment.subcategory());
        product.setTags(enrichment.tags());
        product.setTargetAudience(enrichment.targetAudience());
        product.setReadingLevel(enrichment.readingLevel());
        product.setEnriched(true);
        product.setEnrichedAt(LocalDateTime.now());

        productRepo.save(product);
    }
}
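
The enrichedAt timestamp set above also makes it possible to pick records for re-enrichment later, for example after the category taxonomy changes. The following is a minimal sketch of that selection rule in plain Java. ProductStatus and ReEnrichmentSelector are hypothetical names used only for illustration; in a real application the same condition would more likely be expressed as a repository query, which the original does not show.

```java
import java.time.LocalDateTime;
import java.util.List;

class ReEnrichmentSelector {

    // Hypothetical minimal view of a product: only the fields this rule needs.
    record ProductStatus(long id, boolean enriched, LocalDateTime enrichedAt) {}

    /** True when the record was never enriched or was enriched before the cutoff. */
    static boolean needsEnrichment(ProductStatus p, LocalDateTime cutoff) {
        return !p.enriched()
                || p.enrichedAt() == null
                || p.enrichedAt().isBefore(cutoff);
    }

    /** Keeps only the records that should be sent through enrichment again. */
    static List<ProductStatus> selectForEnrichment(List<ProductStatus> all,
                                                   LocalDateTime cutoff) {
        return all.stream()
                .filter(p -> needsEnrichment(p, cutoff))
                .toList();
    }

    public static void main(String[] args) {
        // Assumed date on which the taxonomy changed
        LocalDateTime taxonomyChangedAt = LocalDateTime.of(2024, 6, 1, 0, 0);

        List<ProductStatus> products = List.of(
                new ProductStatus(1, true,  LocalDateTime.of(2024, 5, 1, 2, 0)),
                new ProductStatus(2, true,  LocalDateTime.of(2024, 7, 1, 2, 0)),
                new ProductStatus(3, false, null));

        List<Long> ids = selectForEnrichment(products, taxonomyChangedAt).stream()
                .map(ProductStatus::id)
                .toList();
        System.out.println(ids);   // prints [1, 3]
    }
}
```

Product 1 is selected because it was enriched before the cutoff, product 3 because it was never enriched; product 2 is left alone.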

Batch Enrichment Job

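// @RestController is still a Spring bean, so @Scheduled keeps working;
// @PostMapping would be ignored on a plain @Service.
@RestController
public class BatchEnrichmentJob {

    private final ProductEnrichmentService enricher;
    private final ProductRepository        repo;

    public BatchEnrichmentJob(ProductEnrichmentService enricher,
                              ProductRepository repo) {
        this.enricher = enricher;
        this.repo     = repo;
    }

    // Run nightly at 2 AM (requires @EnableScheduling on a configuration class)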
    @Scheduled(cron = "0 0 2 * * ?")
    public void enrichUnenrichedProducts() {
        List<Product> pending = repo.findByEnrichedFalse();
        System.out.printf("Starting enrichment for %d products%n", pending.size());

        int success = 0, failed = 0;

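        for (Product product : pending) {
            try {
                enricher.enrichProduct(product);
                success++;
            } catch (Exception e) {
                System.err.printf("Failed to enrich product %d: %s%n",
                        product.getId(), e.getMessage());
                failed++;
                continue;
            }

            // Throttle: brief pause every 10 products. The right interval
            // depends on the rate limit of your provider account.
            if (success % 10 == 0) {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();  // keep the interrupt flag and stop the job
                    break;
                }
            }
        }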

        System.out.printf("Enrichment complete: %d succeeded, %d failed%n",
                success, failed);
    }

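    // Trigger manually via REST endpoint
    @PostMapping("/admin/enrich-products")
    public Map<String, Integer> triggerEnrichment() {
        List<Product> pending = repo.findByEnrichedFalse();
        AtomicInteger enriched = new AtomicInteger();  // java.util.concurrent.atomic
        pending.parallelStream()   // unthrottled: only suitable for small batches
                .forEach(product -> {
                    try {
                        enricher.enrichProduct(product);
                        enriched.incrementAndGet();
                    } catch (Exception e) {
                        System.err.printf("Failed to enrich product %d: %s%n",
                                product.getId(), e.getMessage());
                    }
                });

        return Map.of("enriched", enriched.get(),
                      "failed", pending.size() - enriched.get());
    }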
}
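
Sleeping after every tenth success only approximates a rate limit, and the parallel endpoint has no throttling at all. As an alternative, requests can be spaced evenly. The sketch below is a hypothetical helper, RequestPacer, written in plain Java; it is not part of Spring AI, and the requests-per-minute figure passed to it is an assumption you would replace with the limit that applies to your account.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical helper (not part of Spring AI): spaces calls evenly over time.
class RequestPacer {

    private final long minIntervalNanos;
    private final LongSupplier nanoClock;   // injectable so the pacing logic can be tested
    private long nextAllowedNanos;
    private boolean started = false;

    RequestPacer(int requestsPerMinute, LongSupplier nanoClock) {
        if (requestsPerMinute <= 0) {
            throw new IllegalArgumentException("requestsPerMinute must be positive");
        }
        this.minIntervalNanos = TimeUnit.MINUTES.toNanos(1) / requestsPerMinute;
        this.nanoClock = nanoClock;
    }

    RequestPacer(int requestsPerMinute) {
        this(requestsPerMinute, System::nanoTime);
    }

    /** Reserves the next request slot and returns how many nanoseconds to wait for it. */
    synchronized long reserve() {
        long now = nanoClock.getAsLong();
        if (!started) {
            started = true;
            nextAllowedNanos = now;
        }
        long wait = Math.max(0L, nextAllowedNanos - now);
        nextAllowedNanos = Math.max(now, nextAllowedNanos) + minIntervalNanos;
        return wait;
    }

    /** Blocks until the reserved slot arrives. */
    void acquire() throws InterruptedException {
        long wait = reserve();
        if (wait > 0) {
            TimeUnit.NANOSECONDS.sleep(wait);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        RequestPacer pacer = new RequestPacer(600);   // 600 per minute = one slot every 100 ms
        long start = System.nanoTime();
        for (int i = 0; i < 5; i++) {
            pacer.acquire();                          // call this before each enrichProduct(...)
        }
        System.out.printf("5 paced calls took about %d ms%n",
                TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
    }
}
```

In the batch loop, calling acquire() before each enrichProduct(product) would replace the periodic Thread.sleep. Because reserve() is synchronized, one pacer instance could also be shared by several worker threads.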

Customer Segment Enrichment

public record CustomerSegment(
        String segment,         // "high_value", "at_risk", "new", "dormant"
        double churnRisk,       // 0.0-1.0
        String recommendation   // "offer_discount", "send_newsletter", "win_back_campaign"
) {}

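@Service
public class CustomerEnrichmentService {

    private final ChatClient chatClient;

    public CustomerEnrichmentService(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }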

    public CustomerSegment segmentCustomer(Customer customer,
                                           List<Order> recentOrders) {
        String orderSummary = recentOrders.stream()
                .map(o -> "  - %s: $%.2f (%s)".formatted(
                        o.getDate(), o.getAmount(), o.getCategory()))
                .collect(Collectors.joining("\n"));

        return chatClient.prompt()
                .user("""
                      Customer: %s, joined %s, location: %s
                      Recent orders (last 6 months):
                      %s

                      Total lifetime value: $%.2f
                      Days since last purchase: %d

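                      Classify this customer as high_value, at_risk, new, or dormant,
                      estimate churn risk between 0.0 and 1.0, and recommend one action:
                      offer_discount, send_newsletter, or win_back_campaign.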
                      """.formatted(
                        customer.getName(),
                        customer.getJoinedDate(),
                        customer.getCountry(),
                        orderSummary,
                        customer.getLifetimeValue(),
                        customer.getDaysSinceLastPurchase()))
                .call()
                .entity(CustomerSegment.class);
    }
}
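
One detail of the order summary is easy to miss: String.formatted uses the JVM's default locale, so on a server configured for a locale with a decimal comma, $%.2f would render 129.99 as 129,99 in the prompt. The sketch below shows a locale-stable version of the same summary in plain Java. OrderLine and OrderSummaryFormatter are hypothetical stand-ins for the Order entity and its formatting, and the placeholder text for an empty order list is an assumption.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class OrderSummaryFormatter {

    // Hypothetical stand-in for the Order entity used in the tutorial.
    record OrderLine(LocalDate date, double amount, String category) {}

    /** Builds the order lines for the prompt with a fixed locale. */
    static String summarize(List<OrderLine> orders) {
        if (orders.isEmpty()) {
            return "  (no orders in this period)";   // assumed placeholder text
        }
        return orders.stream()
                .map(o -> String.format(Locale.ROOT, "  - %s: $%.2f (%s)",
                        o.date(), o.amount(), o.category()))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        List<OrderLine> orders = List.of(
                new OrderLine(LocalDate.of(2024, 3, 1), 129.99, "Electronics"),
                new OrderLine(LocalDate.of(2024, 4, 15), 40.0, "Books"));
        System.out.println(summarize(orders));
        // prints:
        //   - 2024-03-01: $129.99 (Electronics)
        //   - 2024-04-15: $40.00 (Books)
    }
}
```

Keeping the prompt text independent of server settings means the model sees the same input in every environment, which helps when comparing results between development and production.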

Output

// enrichProduct(laptop)
ProductEnrichment[
  category="Electronics",
  subcategory="Laptops",
  tags=["laptop", "windows", "business", "ultrabook", "fast"],
  targetAudience="professional",
  readingLevel=10
]

// Batch enrichment log:
Starting enrichment for 247 products
Enrichment complete: 244 succeeded, 3 failed

// Customer segmentation
CustomerSegment[
  segment="at_risk",
  churnRisk=0.72,
  recommendation="win_back_campaign"
]

Key Points

Add an enriched boolean column to every entity you enrich: it supports findByEnrichedFalse() queries and prevents duplicate processing
Use temperature=0.0 for enrichment: you want categorizations that are as reproducible as possible, not creative variation (output can still vary slightly between calls)
Throttle batch enrichment with a brief sleep every N records: provider rate limits depend on the model and your account tier, and sustained parallel calls will hit them
Store an enrichedAt timestamp: it lets you re-enrich records when your category taxonomy changes or model quality improves
Validate enrichment output before saving: check that category is in your allowed list, tags count is reasonable, and required fields aren't null
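
The validation point can be sketched as a small check that runs before a record is saved. This is a minimal example in plain Java, not the tutorial's own implementation: EnrichmentValidator is a hypothetical name, the allowed categories and audiences are assumptions copied from the examples in the system prompt, and the 3-7 tag range and 1-12 reading level come from the same prompt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class EnrichmentValidator {

    // Same shape as the ProductEnrichment record in the tutorial.
    record ProductEnrichment(String category, String subcategory, List<String> tags,
                             String targetAudience, int readingLevel) {}

    // Assumed allow-lists, taken from the examples in the system prompt.
    static final Set<String> CATEGORIES =
            Set.of("Electronics", "Software", "Books", "Clothing", "Tools");
    static final Set<String> AUDIENCES =
            Set.of("professional", "student", "home_user", "enterprise");

    /** Returns a list of problems; an empty list means the enrichment can be saved. */
    static List<String> validate(ProductEnrichment e) {
        if (e == null) {
            return List.of("model returned no enrichment");
        }
        List<String> problems = new ArrayList<>();
        // Null checks come first: Set.of(...).contains(null) throws NullPointerException.
        if (e.category() == null || !CATEGORIES.contains(e.category())) {
            problems.add("category not in allowed list: " + e.category());
        }
        if (e.tags() == null || e.tags().size() < 3 || e.tags().size() > 7) {
            problems.add("expected 3-7 tags");
        }
        if (e.targetAudience() == null || !AUDIENCES.contains(e.targetAudience())) {
            problems.add("unknown target audience: " + e.targetAudience());
        }
        if (e.readingLevel() < 1 || e.readingLevel() > 12) {
            problems.add("reading level out of range: " + e.readingLevel());
        }
        return problems;
    }

    public static void main(String[] args) {
        ProductEnrichment good = new ProductEnrichment("Electronics", "Laptops",
                List.of("laptop", "windows", "business", "ultrabook", "fast"),
                "professional", 10);
        ProductEnrichment bad = new ProductEnrichment("Gadgets", null,
                List.of("laptop"), "professional", 10);

        System.out.println(validate(good));   // prints []
        System.out.println(validate(bad).size());   // prints 2
    }
}
```

A record that fails validation could be left with enriched = false so the next batch run retries it, or logged for manual review; which of the two fits depends on how often the model drifts outside the allowed values.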
