System Overview

This system is designed as a modular, multi-tenant evaluation pipeline for sales conversations. Rather than treating LLMs as free-form generators, it treats them as structured evaluators operating inside a controlled, observable workflow.

At a high level, the system transforms raw conversation data into validated, analyzable evaluation artifacts through a sequence of well-defined stages.

High-Level Flow

The end-to-end flow follows five primary steps:

  1. Transcript ingestion
  2. Prompted LLM scoring
  3. Structured JSON validation
  4. Storage and analytics
  5. Evaluation

Each step is independently observable, retryable, and auditable, allowing the system to scale while maintaining evaluation integrity.
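The retry-and-observe pattern described above can be sketched in a few lines of Python. The stage names and the `run_stage` helper are illustrative, not the system's actual API; a real implementation would also persist audit records per attempt:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def run_stage(name, fn, payload, retries=2, delay=0.0):
    """Run one pipeline stage with logging and simple retries.

    Illustrative sketch: each attempt is logged so every stage is
    independently observable and retryable.
    """
    for attempt in range(retries + 1):
        try:
            result = fn(payload)
            log.info("stage=%s attempt=%d status=ok", name, attempt)
            return result
        except Exception as exc:
            log.warning("stage=%s attempt=%d error=%s", name, attempt, exc)
            if attempt == retries:
                raise
            time.sleep(delay)

# Hypothetical stage stubs chained in pipeline order:
def ingest(raw):   return {"transcript": raw}
def score(doc):    return {**doc, "scores": {"situation": 4}}
def validate(doc): return doc  # would raise on invalid output
def store(doc):    return doc  # would write to PostgreSQL

doc = "REP: Hi, thanks for joining."
for stage_name, stage_fn in [("ingest", ingest), ("score", score),
                             ("validate", validate), ("store", store)]:
    doc = run_stage(stage_name, stage_fn, doc)
```

Because each stage is a plain function behind a common wrapper, retries and metrics can be added once rather than per stage.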

Transcript Ingestion

Transcripts currently enter the system exclusively through manual file uploads. Integration with external call platforms (e.g., recordings of sales calls and demos) is planned for future releases.

During ingestion:

  • Raw transcripts are queued for processing
  • A lightweight pre-processing stage normalizes the data into a consistent internal format: speaker turns, timestamps, and metadata (rep, org, call context) are extracted or standardized

Both the raw transcript and the normalized representation are persisted. This ensures traceability, reproducibility, and the ability to re-run scoring logic as prompts evolve.
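A minimal sketch of the normalization step, assuming a simple `SPEAKER: utterance` line format (the `Turn` dataclass and regex are illustrative; the real stage would also carry timestamps and rep/org/call metadata):

```python
import re
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str
    text: str

def normalize(raw: str) -> list[Turn]:
    """Parse 'SPEAKER: utterance' lines into structured speaker turns."""
    turns = []
    for line in raw.strip().splitlines():
        match = re.match(r"^(\w+):\s*(.+)$", line.strip())
        if match:
            turns.append(Turn(speaker=match.group(1), text=match.group(2)))
    return turns

raw = """REP: Thanks for taking the time today.
BUYER: Happy to be here."""
turns = normalize(raw)
```

Persisting both `raw` and `turns` is what makes re-scoring reproducible as prompts evolve.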

Prompted LLM Scoring

Normalized transcripts are processed by a dedicated scoring service.

For each transcript:

  • The system assembles a prompt using:
    • A stable scoring rubric (SPIN dimensions)
    • Organization-specific prompt templates or criteria
    This keeps prompts versioned, reviewable, and aligned with the SPIN framework.
  • The LLM is invoked synchronously or asynchronously, depending on workload and configuration.
  • Latency, token usage, model identity, and prompt version are captured alongside the raw model output.
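The assembly-and-capture flow above can be sketched as follows. The template text, model identity, and version tags are placeholders, and `call_model` stands in for a real LLM client:

```python
import time

SPIN_RUBRIC = ["situation", "problem", "implication", "need_payoff"]

# Hypothetical org-specific template; the real system loads these per tenant.
ORG_TEMPLATE = (
    "Score the following sales transcript on each SPIN dimension "
    "({dimensions}) from 1-5 and return JSON.\n\nTranscript:\n{transcript}"
)

def assemble_prompt(transcript: str) -> str:
    return ORG_TEMPLATE.format(dimensions=", ".join(SPIN_RUBRIC),
                               transcript=transcript)

def score_transcript(transcript: str, call_model) -> dict:
    """Invoke the model and capture metadata alongside the raw output."""
    prompt = assemble_prompt(transcript)
    start = time.monotonic()
    raw_output = call_model(prompt)  # synchronous here; could be queued
    return {
        "raw_output": raw_output,
        "latency_s": time.monotonic() - start,
        "model": "example-model-v1",   # placeholder model identity
        "prompt_version": "v1",        # placeholder prompt version
    }

# Stub model for illustration:
run = score_transcript("REP: Hi...", lambda p: '{"situation": 4}')
```

Capturing the prompt version with every run is what later allows prompt effectiveness and drift to be compared across versions.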

Structured JSON Validation

LLM output is treated as untrusted input until validated.

The system expects all scoring responses to conform to a predefined JSON schema, typically containing:

  • Numeric scores per SPIN dimension
  • Structured coaching feedback
  • Optional rationales or evidence tags

Validation occurs in two layers:

  • Schema validation, ensuring the response is structurally correct
  • Business rule validation, enforcing constraints such as score ranges, required fields, and completeness

If validation fails:

  • The run is marked invalid
  • Error context is recorded for analysis
  • The system may optionally retry with a corrected prompt or route the request to a fallback model

This strict validation layer is what allows the system to safely operationalize LLM-based evaluation.
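A minimal sketch of the two layers in plain Python (a real implementation would likely lean on a schema library such as Pydantic; the field names and 1–5 range here are illustrative):

```python
SPIN_DIMENSIONS = {"situation", "problem", "implication", "need_payoff"}

def validate_schema(payload: dict) -> list[str]:
    """Layer 1: structural checks — required fields and types."""
    errors = []
    if not isinstance(payload.get("scores"), dict):
        errors.append("scores must be an object")
    if not isinstance(payload.get("feedback"), str):
        errors.append("feedback must be a string")
    return errors

def validate_business_rules(payload: dict) -> list[str]:
    """Layer 2: domain constraints — score ranges and completeness."""
    errors = []
    scores = payload.get("scores", {})
    missing = SPIN_DIMENSIONS - scores.keys()
    if missing:
        errors.append(f"missing dimensions: {sorted(missing)}")
    for dim, value in scores.items():
        if not (isinstance(value, int) and 1 <= value <= 5):
            errors.append(f"{dim}: score must be an integer in 1-5")
    return errors

good = {"scores": {"situation": 4, "problem": 3,
                   "implication": 2, "need_payoff": 5},
        "feedback": "Probe implications earlier."}
bad = {"scores": {"situation": 9}, "feedback": ""}

bad_errors = validate_schema(bad) + validate_business_rules(bad)
```

Returning error lists rather than raising on the first failure gives the retry path full error context to record and, optionally, to feed back into a corrected prompt.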

Storage and Analytics

PostgreSQL serves as the system’s source of truth.

Stored entities include:

  • Transcripts and normalized representations
  • Scoring runs and validated results
  • Prompt versions and model metadata
  • Evaluation metrics and audit logs

From this core data, the system derives analytical views used by dashboards and reports, such as:

  • Performance trends over time
  • Per-representative or per-company summaries
  • Prompt effectiveness and drift indicators
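In production these views would be SQL over the PostgreSQL core tables; as a language-level sketch, a per-representative summary can be derived from validated scoring runs like this (the row shape is hypothetical):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical validated scoring-run rows as they might come out of storage.
runs = [
    {"rep": "alice", "scores": {"situation": 4, "problem": 3}},
    {"rep": "alice", "scores": {"situation": 5, "problem": 4}},
    {"rep": "bob",   "scores": {"situation": 2, "problem": 3}},
]

def per_rep_summary(rows):
    """Average each SPIN dimension per representative."""
    buckets = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for dim, score in row["scores"].items():
            buckets[row["rep"]][dim].append(score)
    return {rep: {dim: mean(vals) for dim, vals in dims.items()}
            for rep, dims in buckets.items()}

summary = per_rep_summary(runs)
```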

Evaluation

Evaluation is a first-class concern in the system, not an afterthought.

Online Evaluation:

  • Multiple prompt or model variants can be tested via A/B routing
  • Live metrics such as latency, cost, and accuracy proxies are monitored
  • Feature flags allow rapid rollback if anomalies are detected
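A sketch of deterministic A/B routing with a rollback flag (variant names and the flag are illustrative, not the system's actual configuration):

```python
import hashlib

VARIANTS = ["prompt_v1", "prompt_v2"]  # hypothetical variant names
ROLLBACK = False  # feature flag: when True, all traffic goes to the control

def route_variant(transcript_id: str) -> str:
    """Deterministically assign a transcript to a variant.

    Hashing the ID keeps the assignment stable across retries,
    which keeps A/B metrics comparable between attempts.
    """
    if ROLLBACK:
        return VARIANTS[0]
    digest = hashlib.sha256(transcript_id.encode()).digest()
    return VARIANTS[digest[0] % len(VARIANTS)]

assignment = route_variant("call-1234")
```

Flipping `ROLLBACK` routes everything to the control variant without a redeploy, which is the rapid-rollback behavior the feature flags provide.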

Why FastAPI and Next.js

FastAPI is used for the backend because it:

  • Is async-friendly, which is essential for LLM calls
  • Enforces strong typing through Pydantic models
  • Cleanly separates API routing from worker-style services
  • Integrates well with observability and background processing
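The async point can be illustrated with plain asyncio, standing in for awaited LLM HTTP calls inside a FastAPI handler (`fake_llm_call` and its return shape are hypothetical):

```python
import asyncio

async def fake_llm_call(transcript_id: str) -> dict:
    """Stand-in for an awaited LLM API request."""
    await asyncio.sleep(0.01)  # simulated network latency
    return {"id": transcript_id, "score": 4}

async def score_batch(ids: list[str]) -> list[dict]:
    # Concurrent awaits: the event loop overlaps the slow network
    # waits instead of blocking a thread per call, which is why an
    # async-friendly framework matters for LLM-heavy workloads.
    return await asyncio.gather(*(fake_llm_call(i) for i in ids))

results = asyncio.run(score_batch(["t1", "t2", "t3"]))
```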

Next.js powers the frontend because it:

  • Supports fast server-side rendering for dashboards
  • Enables lightweight API routes for UI-specific needs
  • Provides a strong developer experience for building internal tools and evaluation interfaces

This combination balances performance, maintainability, and iteration speed.


Why PostgreSQL

PostgreSQL was chosen for its:

  • Strong consistency guarantees
  • Native JSON support for storing structured model outputs
  • Mature migration and tooling ecosystem
  • Compatibility with multi-tenant access patterns and auditing requirements