System Overview

This system is designed as a modular, multi-tenant evaluation pipeline for sales conversations. Rather than treating LLMs as free-form generators, it treats them as structured evaluators operating inside a controlled, observable workflow.

At a high level, the system transforms raw conversation data into validated, analyzable evaluation artifacts through a sequence of well-defined stages.

High-Level Flow

The end-to-end flow follows five primary steps:

  1. Transcript ingestion
  2. Prompted LLM scoring
  3. Structured JSON validation
  4. Storage and analytics
  5. Evaluation

Each step is independently observable, retryable, and auditable, allowing the system to scale while maintaining evaluation integrity.
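The retry-and-observe pattern described above can be sketched in a few lines of Python. The stage names and the `run_stage` helper are illustrative, not the system's actual API; a real implementation would also persist audit records per attempt:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def run_stage(name, fn, payload, retries=2, delay=0.0):
    """Run one pipeline stage with logging and simple retries.

    Illustrative sketch: each attempt is logged so every stage is
    independently observable and retryable.
    """
    for attempt in range(retries + 1):
        try:
            result = fn(payload)
            log.info("stage=%s attempt=%d status=ok", name, attempt)
            return result
        except Exception as exc:
            log.warning("stage=%s attempt=%d error=%s", name, attempt, exc)
            if attempt == retries:
                raise
            time.sleep(delay)

# Hypothetical stage stubs chained in pipeline order:
def ingest(raw):   return {"transcript": raw}
def score(doc):    return {**doc, "scores": {"situation": 4}}
def validate(doc): return doc  # would raise on invalid output
def store(doc):    return doc  # would write to PostgreSQL

doc = "REP: Hi, thanks for joining."
for stage_name, stage_fn in [("ingest", ingest), ("score", score),
                             ("validate", validate), ("store", store)]:
    doc = run_stage(stage_name, stage_fn, doc)
```

Because each stage is a plain function behind a common wrapper, retries and metrics can be added once rather than per stage.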

Transcript Ingestion

Transcripts currently enter the system exclusively through manual file uploads. Integration with external call platforms (e.g., recordings of sales calls and demos) is planned for future releases.

During ingestion:

  • Raw transcripts are queued for processing
  • A lightweight pre-processing stage normalizes the data into a consistent internal format: speaker turns, timestamps, and metadata (rep, org, call context) are extracted or standardized

Both the raw transcript and the normalized representation are persisted. This ensures traceability, reproducibility, and the ability to re-run scoring logic as prompts evolve.
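A minimal sketch of the normalization step, assuming a simple `SPEAKER: utterance` line format (the `Turn` dataclass and regex are illustrative; the real stage would also carry timestamps and rep/org/call metadata):

```python
import re
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str
    text: str

def normalize(raw: str) -> list[Turn]:
    """Parse 'SPEAKER: utterance' lines into structured speaker turns."""
    turns = []
    for line in raw.strip().splitlines():
        match = re.match(r"^(\w+):\s*(.+)$", line.strip())
        if match:
            turns.append(Turn(speaker=match.group(1), text=match.group(2)))
    return turns

raw = """REP: Thanks for taking the time today.
BUYER: Happy to be here."""
turns = normalize(raw)
```

Persisting both `raw` and `turns` is what makes re-scoring reproducible as prompts evolve.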

Prompted LLM Scoring

Normalized transcripts are processed by a dedicated scoring service.

For each transcript:

  • The system assembles a prompt using:
    • A stable scoring rubric (SPIN dimensions)
    • Organization-specific prompt templates or criteria
    This keeps prompts versioned, reviewable, and aligned with the SPIN framework.
  • The LLM is invoked synchronously or asynchronously, depending on workload and configuration.
  • Latency, token usage, model identity, and prompt version are captured alongside the raw model output.
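The assembly-and-capture flow above can be sketched as follows. The template text, model identity, and version tags are placeholders, and `call_model` stands in for a real LLM client:

```python
import time

SPIN_RUBRIC = ["situation", "problem", "implication", "need_payoff"]

# Hypothetical org-specific template; the real system loads these per tenant.
ORG_TEMPLATE = (
    "Score the following sales transcript on each SPIN dimension "
    "({dimensions}) from 1-5 and return JSON.\n\nTranscript:\n{transcript}"
)

def assemble_prompt(transcript: str) -> str:
    return ORG_TEMPLATE.format(dimensions=", ".join(SPIN_RUBRIC),
                               transcript=transcript)

def score_transcript(transcript: str, call_model) -> dict:
    """Invoke the model and capture metadata alongside the raw output."""
    prompt = assemble_prompt(transcript)
    start = time.monotonic()
    raw_output = call_model(prompt)  # synchronous here; could be queued
    return {
        "raw_output": raw_output,
        "latency_s": time.monotonic() - start,
        "model": "example-model-v1",   # placeholder model identity
        "prompt_version": "v1",        # placeholder prompt version
    }

# Stub model for illustration:
run = score_transcript("REP: Hi...", lambda p: '{"situation": 4}')
```

Capturing the prompt version with every run is what later allows prompt effectiveness and drift to be compared across versions.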

Structured JSON Validation

LLM output is treated as untrusted input until validated.

The system expects all scoring responses to conform to a predefined JSON schema, typically containing:

  • Numeric scores per SPIN dimension
  • Structured coaching feedback
  • Optional rationales or evidence tags

Validation occurs in two layers:

  • Schema validation, ensuring the response is structurally correct
  • Business rule validation, enforcing constraints such as score ranges, required fields, and completeness

If validation fails:

  • The run is marked invalid
  • Error context is recorded for analysis
  • The system may optionally retry with a corrected prompt or route the request to a fallback model

This strict validation layer is what allows the system to safely operationalize LLM-based evaluation.
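A minimal sketch of the two layers in plain Python (a real implementation would likely lean on a schema library such as Pydantic; the field names and 1–5 range here are illustrative):

```python
SPIN_DIMENSIONS = {"situation", "problem", "implication", "need_payoff"}

def validate_schema(payload: dict) -> list[str]:
    """Layer 1: structural checks — required fields and types."""
    errors = []
    if not isinstance(payload.get("scores"), dict):
        errors.append("scores must be an object")
    if not isinstance(payload.get("feedback"), str):
        errors.append("feedback must be a string")
    return errors

def validate_business_rules(payload: dict) -> list[str]:
    """Layer 2: domain constraints — score ranges and completeness."""
    errors = []
    scores = payload.get("scores", {})
    missing = SPIN_DIMENSIONS - scores.keys()
    if missing:
        errors.append(f"missing dimensions: {sorted(missing)}")
    for dim, value in scores.items():
        if not (isinstance(value, int) and 1 <= value <= 5):
            errors.append(f"{dim}: score must be an integer in 1-5")
    return errors

good = {"scores": {"situation": 4, "problem": 3,
                   "implication": 2, "need_payoff": 5},
        "feedback": "Probe implications earlier."}
bad = {"scores": {"situation": 9}, "feedback": ""}

bad_errors = validate_schema(bad) + validate_business_rules(bad)
```

Returning error lists rather than raising on the first failure gives the retry path full error context to record and, optionally, to feed back into a corrected prompt.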

Storage and Analytics

PostgreSQL serves as the system’s source of truth.

Stored entities include:

  • Transcripts and normalized representations
  • Scoring runs and validated results
  • Prompt versions and model metadata
  • Evaluation metrics and audit logs

From this core data, the system derives analytical views used by dashboards and reports, such as:

  • Performance trends over time
  • Per-representative or per-company summaries
  • Prompt effectiveness and drift indicators
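In production these views would be SQL over the PostgreSQL core tables; as a language-level sketch, a per-representative summary can be derived from validated scoring runs like this (the row shape is hypothetical):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical validated scoring-run rows as they might come out of storage.
runs = [
    {"rep": "alice", "scores": {"situation": 4, "problem": 3}},
    {"rep": "alice", "scores": {"situation": 5, "problem": 4}},
    {"rep": "bob",   "scores": {"situation": 2, "problem": 3}},
]

def per_rep_summary(rows):
    """Average each SPIN dimension per representative."""
    buckets = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for dim, score in row["scores"].items():
            buckets[row["rep"]][dim].append(score)
    return {rep: {dim: mean(vals) for dim, vals in dims.items()}
            for rep, dims in buckets.items()}

summary = per_rep_summary(runs)
```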

Evaluation

Evaluation is a first-class concern in the system, not an afterthought.

Online Evaluation:

  • Multiple prompt or model variants can be tested via A/B routing
  • Live metrics such as latency, cost, and accuracy proxies are monitored
  • Feature flags allow rapid rollback if anomalies are detected
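A sketch of deterministic A/B routing with a rollback flag (variant names and the flag are illustrative, not the system's actual configuration):

```python
import hashlib

VARIANTS = ["prompt_v1", "prompt_v2"]  # hypothetical variant names
ROLLBACK = False  # feature flag: when True, all traffic goes to the control

def route_variant(transcript_id: str) -> str:
    """Deterministically assign a transcript to a variant.

    Hashing the ID keeps the assignment stable across retries,
    which keeps A/B metrics comparable between attempts.
    """
    if ROLLBACK:
        return VARIANTS[0]
    digest = hashlib.sha256(transcript_id.encode()).digest()
    return VARIANTS[digest[0] % len(VARIANTS)]

assignment = route_variant("call-1234")
```

Flipping `ROLLBACK` routes everything to the control variant without a redeploy, which is the rapid-rollback behavior the feature flags provide.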

Why FastAPI and Next.js

FastAPI is used for the backend because it:

  • Is async-friendly, which is essential for LLM calls
  • Enforces strong typing through Pydantic models
  • Cleanly separates API routing from worker-style services
  • Integrates well with observability and background processing
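The async point can be illustrated with plain asyncio, standing in for awaited LLM HTTP calls inside a FastAPI handler (`fake_llm_call` and its return shape are hypothetical):

```python
import asyncio

async def fake_llm_call(transcript_id: str) -> dict:
    """Stand-in for an awaited LLM API request."""
    await asyncio.sleep(0.01)  # simulated network latency
    return {"id": transcript_id, "score": 4}

async def score_batch(ids: list[str]) -> list[dict]:
    # Concurrent awaits: the event loop overlaps the slow network
    # waits instead of blocking a thread per call, which is why an
    # async-friendly framework matters for LLM-heavy workloads.
    return await asyncio.gather(*(fake_llm_call(i) for i in ids))

results = asyncio.run(score_batch(["t1", "t2", "t3"]))
```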

Next.js powers the frontend because it:

  • Supports fast server-side rendering for dashboards
  • Enables lightweight API routes for UI-specific needs
  • Provides a strong developer experience for building internal tools and evaluation interfaces

This combination balances performance, maintainability, and iteration speed.


Why PostgreSQL

PostgreSQL was chosen for its:

  • Strong consistency guarantees
  • Native JSON support for storing structured model outputs
  • Mature migration and tooling ecosystem
  • Compatibility with multi-tenant access patterns and auditing requirements