System Overview
This system is designed as a modular, multi-tenant evaluation pipeline for sales conversations. Rather than treating LLMs as free-form generators, it treats them as structured evaluators operating inside a controlled, observable workflow.
At a high level, the system transforms raw conversation data into validated, analyzable evaluation artifacts through a sequence of well-defined stages.
High-Level Flow
The end-to-end flow follows five primary steps:
- Transcript ingestion
- Prompted LLM scoring
- Structured JSON validation
- Storage and analytics
- Evaluation
Each step is independently observable, retryable, and auditable, allowing the system to scale while maintaining evaluation integrity.
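The "observable, retryable, auditable" property of each step can be made concrete with a small sketch. The stage names mirror the five steps above; the `StageRun` record and its fields are illustrative assumptions, not the system's actual data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class Stage(Enum):
    INGESTION = "transcript_ingestion"
    SCORING = "llm_scoring"
    VALIDATION = "json_validation"
    STORAGE = "storage_analytics"
    EVALUATION = "evaluation"

@dataclass
class StageRun:
    """One observable, retryable unit of work in the pipeline."""
    stage: Stage
    attempt: int = 1
    status: str = "pending"  # pending | succeeded | failed
    started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def retry(run: StageRun) -> StageRun:
    # A retry is a new attempt of the same stage; the failed run is kept,
    # preserving the audit trail.
    return StageRun(stage=run.stage, attempt=run.attempt + 1)

run = StageRun(stage=Stage.SCORING)
run.status = "failed"
second = retry(run)
```

Because each attempt is a separate record, retries never overwrite history, which is what makes the pipeline auditable.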
Transcript Ingestion
Transcripts currently enter the system exclusively through manual file uploads. Integration with external call platforms (e.g. recorded sales calls and demos) is planned for future releases.
During ingestion:
- Raw transcripts are queued for processing
- A lightweight pre-processing stage normalizes the data into a consistent internal format: speaker turns, timestamps, and metadata (rep, org, call context) are extracted or standardized
Both the raw transcript and the normalized representation are persisted. This ensures traceability, reproducibility, and the ability to re-run scoring logic as prompts evolve.
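A minimal sketch of the normalization step, assuming an upload format where each line reads `Speaker: text` (real inputs may also carry timestamps and call metadata):

```python
import re
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str
    text: str

def normalize(raw: str) -> list[Turn]:
    """Parse lines like 'Alice: Hello there' into speaker turns.

    The 'Speaker: text' layout is an assumed upload format; lines that
    do not match it are skipped rather than failing the whole transcript.
    """
    turns = []
    for line in raw.splitlines():
        m = re.match(r"^\s*([^:]+):\s*(.+)$", line)
        if m:
            turns.append(Turn(speaker=m.group(1).strip(), text=m.group(2).strip()))
    return turns

raw = "Rep: Thanks for joining.\nProspect: Happy to be here."
turns = normalize(raw)
```

Persisting both `raw` and the `Turn` list is what allows scoring to be re-run later against the exact same input.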
Prompted LLM Scoring
Normalized transcripts are processed by a dedicated scoring service.
For each transcript:
- The system assembles a prompt using:
- A stable scoring rubric (SPIN dimensions)
- Organization-specific prompt templates or criteria
- The LLM is invoked synchronously or asynchronously, depending on workload and configuration.
- Latency, token usage, model identity, and prompt version are captured alongside the raw model output.
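The assembly and metadata-capture steps above can be sketched as follows. The rubric text, org template, prompt version string, and `call_model` callable are all illustrative stand-ins for the real rubric, template store, and LLM client:

```python
import time
import uuid

RUBRIC = "Score the call on the four SPIN dimensions (1-5 each)."  # stable rubric (illustrative)
ORG_TEMPLATES = {  # hypothetical organization-specific criteria
    "acme": "Weight Need-payoff questions heavily.",
}

def assemble_prompt(org_id: str, transcript_text: str) -> str:
    """Combine the stable rubric with org-specific criteria and the transcript."""
    org_criteria = ORG_TEMPLATES.get(org_id, "")
    return f"{RUBRIC}\n{org_criteria}\n\nTranscript:\n{transcript_text}"

def score(org_id: str, transcript_text: str, call_model) -> dict:
    """Invoke the model and capture run metadata alongside the raw output."""
    prompt = assemble_prompt(org_id, transcript_text)
    start = time.monotonic()
    raw_output = call_model(prompt)
    return {
        "run_id": str(uuid.uuid4()),
        "prompt_version": "spin-v1",  # assumed versioning scheme
        "model": "example-model",
        "latency_s": time.monotonic() - start,
        "raw_output": raw_output,
    }

# A stubbed model call, standing in for the real LLM invocation.
record = score("acme", "Rep: ...", call_model=lambda p: '{"situation": 4}')
```

Capturing `prompt_version` and `model` with every run is what makes results comparable as prompts evolve.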
Structured JSON Validation
LLM output is treated as untrusted input until validated.
The system expects all scoring responses to conform to a predefined JSON schema, typically containing:
- Numeric scores per SPIN dimension
- Structured coaching feedback
- Optional rationales or evidence tags
Validation occurs in two layers:
- Schema validation, ensuring the response is structurally correct.
- Business rule validation, enforcing constraints such as score ranges, required fields, and completeness.
If validation fails:
- The run is marked invalid
- Error context is recorded for analysis
- The system may optionally retry with a corrected prompt or route the request to a fallback model
This strict validation layer is what allows the system to safely operationalize LLM-based evaluation.
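The two validation layers can be sketched with stdlib-only code. A production system would more likely use Pydantic models or a JSON Schema validator; the field names and the 1-5 score range below are illustrative assumptions:

```python
import json

SPIN = ("situation", "problem", "implication", "need_payoff")

def validate(raw: str) -> tuple[bool, list[str]]:
    """Return (is_valid, errors) for a raw scoring response."""
    # Layer 1: schema validation -- the response must be a JSON object
    # with a nested 'scores' object.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return False, [f"invalid JSON: {e}"]
    if not isinstance(data, dict):
        return False, ["response is not a JSON object"]

    errors = []
    scores = data.get("scores")
    if not isinstance(scores, dict):
        errors.append("missing 'scores' object")
    else:
        # Layer 2: business rules -- required fields, score ranges, completeness.
        for dim in SPIN:
            v = scores.get(dim)
            if v is None:
                errors.append(f"missing score: {dim}")
            elif not isinstance(v, (int, float)) or not 1 <= v <= 5:
                errors.append(f"score out of range [1, 5]: {dim}={v}")
    if "feedback" not in data:
        errors.append("missing 'feedback'")
    return (not errors), errors

ok, errs = validate('{"scores": {"situation": 4, "problem": 3, '
                    '"implication": 9, "need_payoff": 2}, "feedback": "..."}')
```

Here the response is structurally valid JSON but fails a business rule (`implication=9`), so the run would be marked invalid with that error recorded.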
Storage and Analytics
PostgreSQL serves as the system’s source of truth.
Stored entities include:
- Transcripts and normalized representations
- Scoring runs and validated results
- Prompt versions and model metadata
- Evaluation metrics and audit logs
From this core data, the system derives analytical views used by dashboards and reports, such as:
- Performance trends over time
- Per-representative or per-company summaries
- Prompt effectiveness and drift indicators
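A per-representative summary view can be derived directly in SQL. The production source of truth is PostgreSQL; sqlite3 is used below only to keep the sketch self-contained, and the table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE scoring_runs (
        rep TEXT,
        org TEXT,
        situation REAL,
        problem REAL,
        implication REAL,
        need_payoff REAL
    )
""")
conn.executemany(
    "INSERT INTO scoring_runs VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("alice", "acme", 4, 3, 4, 5),
        ("alice", "acme", 5, 4, 3, 4),
        ("bob",   "acme", 2, 3, 2, 3),
    ],
)

# A per-representative summary, as a dashboard query might consume it:
# average of the four SPIN dimensions across each rep's scored calls.
rows = conn.execute("""
    SELECT rep,
           ROUND(AVG(situation + problem + implication + need_payoff) / 4, 2)
    FROM scoring_runs
    GROUP BY rep
    ORDER BY rep
""").fetchall()
```

Deriving such views from stored scoring runs, rather than recomputing from raw transcripts, keeps dashboards cheap and reproducible.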
Evaluation
Evaluation is a first-class concern in the system, not an afterthought.
Online Evaluation:
- Multiple prompt or model variants can be tested via A/B routing
- Live metrics such as latency, cost, and accuracy proxies are monitored
- Feature flags allow rapid rollback if anomalies are detected
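A/B routing is often implemented as deterministic hash-based assignment, sketched below. The key shape (`experiment:org:rep`) is an assumption, not the system's actual scheme; the point is that the same rep always hits the same variant, so results stay comparable across calls:

```python
import hashlib

def assign_variant(org_id: str, rep_id: str, variants: list[str],
                   experiment: str) -> str:
    """Deterministically map a rep to a prompt/model variant."""
    key = f"{experiment}:{org_id}:{rep_id}".encode()
    # Hash to a stable bucket index; no per-user state needs to be stored.
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]

v1 = assign_variant("acme", "alice", ["prompt-v1", "prompt-v2"], "spin-ab-1")
v2 = assign_variant("acme", "alice", ["prompt-v1", "prompt-v2"], "spin-ab-1")
```

Rolling back is then just a feature-flag change: route 100% of traffic to the known-good variant without touching assignment logic.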
Why FastAPI and Next.js
FastAPI is used for the backend because it:
- Is async-friendly, which is essential for LLM calls
- Enforces strong typing through Pydantic models
- Cleanly separates API routing from worker-style services
- Integrates well with observability and background processing
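The async-friendliness point is the key one for LLM workloads: model calls are slow and I/O-bound, so overlapping them matters. A stdlib-only sketch (with `asyncio.sleep` standing in for a real async HTTP call to the model provider):

```python
import asyncio

async def call_llm(transcript_id: int) -> str:
    # Stands in for a network call to the model provider; in the real
    # service this would be an async HTTP request awaiting the response.
    await asyncio.sleep(0.01)
    return f"scored:{transcript_id}"

async def score_batch(ids: list[int]) -> list[str]:
    # gather() lets the slow LLM calls run concurrently instead of
    # one at a time -- the reason an async framework matters here.
    return await asyncio.gather(*(call_llm(i) for i in ids))

results = asyncio.run(score_batch([1, 2, 3]))
```

FastAPI endpoints declared `async def` run on this same event loop, so a scoring endpoint can await many model calls without blocking other requests.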
Next.js powers the frontend because it:
- Supports fast server-side rendering for dashboards
- Enables lightweight API routes for UI-specific needs
- Provides a strong developer experience for building internal tools and evaluation interfaces
This combination balances performance, maintainability, and iteration speed.
Why PostgreSQL
PostgreSQL was chosen for its:
- Strong consistency guarantees
- Native JSON support for storing structured model outputs
- Mature migration and tooling ecosystem
- Compatibility with multi-tenant access patterns and auditing requirements