System Overview
This system is designed as a modular, multi-tenant evaluation pipeline for sales conversations. Rather than treating LLMs as free-form generators, it treats them as structured evaluators operating inside a controlled, observable workflow.
At a high level, the system transforms raw conversation data into validated, analyzable evaluation artifacts through a sequence of well-defined stages.
High-Level Flow
The end-to-end flow follows five primary steps:
- Transcript ingestion
- Prompted LLM scoring
- Structured JSON validation
- Storage and analytics
- Evaluation
Each step is independently observable, retryable, and auditable, allowing the system to scale while maintaining evaluation integrity.
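The retry-and-audit behavior described above can be sketched as a small stage runner. This is a minimal illustration, not the system's actual orchestration code. `run_stage`, `run_pipeline`, `StageRecord`, and the fixed `max_attempts` retry policy are hypothetical names and choices, and a production deployment might instead rely on a job queue or workflow engine.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence, Tuple


@dataclass
class StageRecord:
    """Audit entry for one attempt of one pipeline stage (hypothetical shape)."""
    stage: str
    attempt: int
    ok: bool
    duration_s: float
    error: Optional[str] = None


def run_stage(
    name: str,
    fn: Callable[[Any], Any],
    payload: Any,
    audit_log: List[StageRecord],
    max_attempts: int = 3,
) -> Any:
    """Run a single stage, retrying on any exception and logging every attempt."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        start = time.perf_counter()
        try:
            result = fn(payload)
        except Exception as exc:
            last_error = exc
            audit_log.append(
                StageRecord(name, attempt, False, time.perf_counter() - start, repr(exc))
            )
            continue
        audit_log.append(StageRecord(name, attempt, True, time.perf_counter() - start))
        return result
    raise RuntimeError(f"stage {name!r} failed after {max_attempts} attempts") from last_error


def run_pipeline(
    stages: Sequence[Tuple[str, Callable[[Any], Any]]],
    payload: Any,
    audit_log: List[StageRecord],
) -> Any:
    """Run stages in order; each stage's output becomes the next stage's input."""
    for name, fn in stages:
        payload = run_stage(name, fn, payload, audit_log)
    return payload
```

Because every attempt, successful or not, lands in `audit_log`, a failed stage can be inspected and re-run on its own without repeating the stages before it.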
Transcript Ingestion
Transcripts currently enter the system exclusively through manual file uploads. Integration with external call platforms, for example tools that record sales calls and product demos, is planned for a future release.
During ingestion:
- Raw transcripts are queued for processing
- A lightweight pre-processing stage normalizes the data into a consistent internal format: speaker turns, timestamps, and metadata (rep, org, call context) are extracted or standardized
Both the raw transcript and the normalized representation are persisted. This ensures traceability, reproducibility, and the ability to re-run scoring logic as prompts evolve.
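A minimal normalization sketch follows. It assumes a hypothetical raw format of one `[HH:MM:SS] Speaker: text` line per turn, with untimestamped lines treated as continuations of the previous turn. The real upload formats are not specified here, and `normalize`, `Turn`, and `NormalizedTranscript` are illustrative names.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical raw line format, e.g. "[00:01:05] Rep: How do you track pipeline today?"
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s*([^:]+):\s*(.*)$")


@dataclass
class Turn:
    speaker: str
    start_s: int  # offset from the start of the call, in seconds
    text: str


@dataclass
class NormalizedTranscript:
    metadata: Dict[str, str]  # e.g. rep, org, call context
    turns: List[Turn]


def normalize(raw: str, metadata: Dict[str, str]) -> NormalizedTranscript:
    """Convert a raw transcript into speaker turns with second offsets."""
    turns: List[Turn] = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match:
            h, m, s, speaker, text = match.groups()
            offset = int(h) * 3600 + int(m) * 60 + int(s)
            turns.append(Turn(speaker.strip(), offset, text.strip()))
        elif turns:
            # Untimestamped line: treat it as a continuation of the previous turn.
            turns[-1].text += " " + line
        # Untimestamped lines before the first turn are dropped in this sketch.
    return NormalizedTranscript(metadata=dict(metadata), turns=turns)
```

Keeping `raw` untouched and storing the `NormalizedTranscript` separately is what allows scoring to be re-run later against either form.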
Prompted LLM Scoring
Normalized transcripts are processed by a dedicated scoring service.
For each transcript:
- The system assembles a prompt from:
  - A stable scoring rubric (SPIN dimensions: Situation, Problem, Implication, Need-payoff)
  - Organization-specific prompt templates or criteria
- The LLM is invoked synchronously or asynchronously, depending on workload and configuration.
- Latency, token usage, model identity, and prompt version are captured alongside the raw model output.
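The sketch below shows prompt assembly and metadata capture for a synchronous call. `build_prompt`, `score_transcript`, `ScoringRun`, the rubric wording, and the 1 to 5 scale are assumptions. `call_model` stands in for whatever LLM client the system uses and is assumed to return the response text plus input and output token counts.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical rubric text; the real rubric wording and scale are not specified.
SPIN_RUBRIC = (
    "Score the sales rep from 1 to 5 on each SPIN dimension:\n"
    "- situation: questions that establish the buyer's current context\n"
    "- problem: questions that surface difficulties or dissatisfaction\n"
    "- implication: questions that explore the consequences of those problems\n"
    "- need_payoff: questions that get the buyer to state the value of a solution"
)

# Assumed client signature: prompt in, (response text, input tokens, output tokens) out.
ModelFn = Callable[[str], Tuple[str, int, int]]


@dataclass
class ScoringRun:
    model: str
    prompt_version: str
    raw_output: str
    latency_ms: float
    input_tokens: int
    output_tokens: int


def build_prompt(rubric: str, org_criteria: str, transcript_text: str) -> str:
    """Combine the stable rubric, org-specific criteria, and the transcript."""
    return (
        f"{rubric}\n\n"
        f"Organization-specific criteria:\n{org_criteria}\n\n"
        f"Transcript:\n{transcript_text}\n\n"
        "Respond with a single JSON object only."
    )


def score_transcript(
    model_name: str,
    call_model: ModelFn,
    prompt_version: str,
    org_criteria: str,
    transcript_text: str,
    rubric: str = SPIN_RUBRIC,
) -> ScoringRun:
    """Invoke the model synchronously and keep the raw output with its metadata."""
    prompt = build_prompt(rubric, org_criteria, transcript_text)
    start = time.perf_counter()
    text, input_tokens, output_tokens = call_model(prompt)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return ScoringRun(model_name, prompt_version, text, latency_ms, input_tokens, output_tokens)
```

The raw output is kept verbatim on `ScoringRun`, so validation can happen as a separate step and failed runs remain inspectable.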
Structured JSON Validation
LLM output is treated as untrusted input until it has been validated.
The system expects all scoring responses to conform to a predefined JSON schema, typically containing:
- Numeric scores per SPIN dimension
- Structured coaching feedback
- Optional rationales or evidence tags
Validation occurs in two layers:
- Schema validation, ensuring the response is structurally correct
- Business rule validation, enforcing constraints such as score ranges, required fields, and completeness
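The two validation layers can be sketched with the standard library alone. The response shape (`scores`, `feedback`, optional `rationales`), the snake_case dimension names, and the 1 to 5 score range are assumptions for illustration, and the hand-written structural checks stand in for a real JSON Schema validator.

```python
import json
from typing import Any, Dict, List

SPIN_DIMENSIONS = ("situation", "problem", "implication", "need_payoff")
SCORE_MIN, SCORE_MAX = 1, 5  # assumed score range


def schema_errors(data: Any) -> List[str]:
    """Layer 1: structural checks on types and shape."""
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    errors: List[str] = []
    scores = data.get("scores")
    if not isinstance(scores, dict):
        errors.append("'scores' must be an object")
    else:
        for dim, value in scores.items():
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"score for {dim!r} must be a number")
    feedback = data.get("feedback")
    if not isinstance(feedback, list) or not all(isinstance(f, str) for f in feedback):
        errors.append("'feedback' must be a list of strings")
    if "rationales" in data and not isinstance(data["rationales"], dict):
        errors.append("'rationales' must be an object when present")
    return errors


def business_rule_errors(data: Dict[str, Any]) -> List[str]:
    """Layer 2: semantic constraints on structurally valid output."""
    errors: List[str] = []
    scores = data["scores"]
    for dim in SPIN_DIMENSIONS:
        if dim not in scores:
            errors.append(f"missing score for {dim!r}")
        elif not SCORE_MIN <= scores[dim] <= SCORE_MAX:
            errors.append(f"score for {dim!r} out of range")
    unknown = set(scores) - set(SPIN_DIMENSIONS)
    if unknown:
        errors.append(f"unexpected dimensions: {sorted(unknown)}")
    if not data["feedback"]:
        errors.append("feedback must not be empty")
    return errors


def validate_output(raw: str) -> List[str]:
    """Return all error messages; an empty list means the run is valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    errors = schema_errors(data)
    if errors:
        return errors  # business rules assume a well-formed structure
    return business_rule_errors(data)
```

Returning every error message, rather than stopping at the first, gives the failure path richer context to record and to feed back into a corrective retry.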
If validation fails:
- The run is marked invalid
- Error context is recorded for analysis
- The system may optionally retry with a corrected prompt or route the request to a fallback model
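One way to implement this failure path is sketched below: each invalid response is recorded with its errors, the same model gets a corrective retry that includes those errors, and the next model in the list acts as the fallback. `score_with_fallback`, `RunResult`, the retry count, and the corrective-prompt wording are hypothetical, and `validate` can be any function that returns a list of error messages.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class RunResult:
    model: str
    output: str
    valid: bool
    errors: List[str] = field(default_factory=list)  # error context for analysis


def score_with_fallback(
    prompt: str,
    models: Sequence[Tuple[str, Callable[[str], str]]],
    validate: Callable[[str], List[str]],
    retries_per_model: int = 1,
) -> Tuple[Optional[RunResult], List[RunResult]]:
    """Try each model in order; return the first valid result plus every attempt."""
    attempts: List[RunResult] = []
    for name, call in models:
        current_prompt = prompt
        for _ in range(1 + retries_per_model):
            output = call(current_prompt)
            errors = validate(output)
            result = RunResult(name, output, valid=not errors, errors=errors)
            attempts.append(result)  # invalid runs are kept, marked invalid
            if not errors:
                return result, attempts
            # Corrective retry: show the same model what was wrong with its answer.
            error_lines = "\n".join(f"- {e}" for e in errors)
            current_prompt = (
                f"{prompt}\n\nYour previous response was invalid:\n"
                f"{error_lines}\nReturn corrected JSON only."
            )
    return None, attempts  # every model failed; the caller marks the run invalid
```

Returning all attempts, not just the final one, lets the caller persist invalid runs with their error context even when a later attempt succeeds.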
This strict validation layer is what allows the system to safely operationalize LLM-based evaluation.
Storage and Analytics
PostgreSQL serves as the system’s source of truth.
Stored entities include:
- Transcripts and normalized representations
- Scoring runs and validated results
- Prompt versions and model metadata
- Evaluation metrics and audit logs
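A minimal relational layout for these entities is sketched below. It uses Python's built-in `sqlite3` as a self-contained stand-in for PostgreSQL, so the table and column names are assumptions and the types are simplified. In PostgreSQL, columns such as `normalized_json` and `result_json` would more naturally be `JSONB`, and timestamps `TIMESTAMPTZ`.

```python
import sqlite3

# Hypothetical schema; sqlite3 is used only so the sketch runs without a server.
SCHEMA = """
CREATE TABLE transcripts (
    id INTEGER PRIMARY KEY,
    org_id TEXT NOT NULL,
    rep_id TEXT NOT NULL,
    raw_text TEXT NOT NULL,          -- original upload, kept for re-runs
    normalized_json TEXT NOT NULL    -- normalized turns and metadata
);
CREATE TABLE prompt_versions (
    id INTEGER PRIMARY KEY,
    version TEXT NOT NULL UNIQUE,
    template TEXT NOT NULL
);
CREATE TABLE scoring_runs (
    id INTEGER PRIMARY KEY,
    transcript_id INTEGER NOT NULL REFERENCES transcripts(id),
    prompt_version_id INTEGER NOT NULL REFERENCES prompt_versions(id),
    model TEXT NOT NULL,
    latency_ms REAL,
    input_tokens INTEGER,
    output_tokens INTEGER,
    raw_output TEXT NOT NULL,
    is_valid INTEGER NOT NULL,       -- 0 or 1
    result_json TEXT,                -- validated result, NULL when invalid
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE evaluation_metrics (
    id INTEGER PRIMARY KEY,
    prompt_version_id INTEGER NOT NULL REFERENCES prompt_versions(id),
    metric TEXT NOT NULL,
    value REAL NOT NULL,
    computed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY,
    scoring_run_id INTEGER REFERENCES scoring_runs(id),
    event TEXT NOT NULL,
    detail TEXT,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection with foreign keys enforced and the schema created."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FKs off by default
    conn.executescript(SCHEMA)
    return conn
```

Linking every scoring run to both its transcript and its prompt version is what makes re-scoring and prompt-to-prompt comparisons traceable.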
From this core data, the system derives analytical views used by dashboards and reports, such as:
- Performance trends over time
- Per-representative or per-company summaries
- Prompt effectiveness and drift indicators
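The analytical views above can be illustrated with plain aggregation over scored calls. `ScoredCall`, its fields, and the three helpers are hypothetical, and `prompt_shift` is only a crude drift indicator: a difference in mean score between prompt versions is meaningful only when both versions scored the same transcripts, which the stored raw and normalized transcripts make possible through re-runs.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple


@dataclass
class ScoredCall:
    rep_id: str
    month: str           # e.g. "2024-05"
    prompt_version: str
    total_score: float   # e.g. the mean of the SPIN dimension scores


def per_rep_summary(calls: List[ScoredCall]) -> Dict[str, float]:
    """Average score per representative."""
    by_rep: Dict[str, List[float]] = defaultdict(list)
    for c in calls:
        by_rep[c.rep_id].append(c.total_score)
    return {rep: round(mean(scores), 2) for rep, scores in sorted(by_rep.items())}


def monthly_trend(calls: List[ScoredCall], rep_id: str) -> List[Tuple[str, float]]:
    """Average score per month for one representative, in chronological order."""
    by_month: Dict[str, List[float]] = defaultdict(list)
    for c in calls:
        if c.rep_id == rep_id:
            by_month[c.month].append(c.total_score)
    return [(month, round(mean(scores), 2)) for month, scores in sorted(by_month.items())]


def prompt_shift(calls: List[ScoredCall], old: str, new: str) -> float:
    """Mean score under the new prompt version minus the mean under the old one."""
    old_scores = [c.total_score for c in calls if c.prompt_version == old]
    new_scores = [c.total_score for c in calls if c.prompt_version == new]
    return round(mean(new_scores) - mean(old_scores), 2)
```

In the real system these aggregations would more likely live in SQL views over the stored scoring runs; the Python version only shows the shape of each metric.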
The system is intended for three audiences:
- Teams that want scalable, consistent, methodology-aligned coaching without relying exclusively on manual call reviews. The system provides structured scores and coaching signals that support training, performance tracking, and continuous improvement.
- Engineers exploring LLM-as-a-Judge patterns who need a production-grade reference for prompt-based evaluation systems. The project shows how to combine strict contracts, evaluation loops, and observability into a coherent architecture.
- Anyone investigating how LLMs can assess human behavior, reasoning, or performance while maintaining rigor, transparency, and accountability. The system is designed to make evaluation assumptions explicit and measurable.
