QA Calibration
QA Calibration is the process of aligning multiple QA reviewers on how to interpret and apply a quality rubric consistently. In calibration sessions, reviewers independently score the same set of conversations and then compare scores to identify and resolve disagreements. The goal is for a conversation scored by one reviewer to receive the same score, within an agreed tolerance, from any other reviewer. Without that consistency, Internal Quality Score (IQS) data is unreliable and unfair to use for agent coaching or performance management.
Inter-Rater Reliability (IRR) = (Conversations where reviewers agree within 10 percentage points ÷ Total conversations reviewed in the calibration session) × 100
Two scores are "in agreement" if two independent reviewers score the same conversation within 10 percentage points of each other. An IRR above 85% indicates a well-calibrated program. Track IRR per rubric dimension to identify which criteria reviewers consistently interpret differently.
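The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes two reviewers, scores on a 0-100 scale, and that "within 10 percentage points" includes a difference of exactly 10. The function names, the dimension names, and the sample scores are all hypothetical.

```python
from typing import Dict, Sequence, Tuple

AGREEMENT_THRESHOLD = 10.0  # percentage points, per the definition above


def inter_rater_reliability(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    threshold: float = AGREEMENT_THRESHOLD,
) -> float:
    """Return IRR as a percentage for two reviewers' scores (0-100 scale)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("Both reviewers must score the same conversations")
    if not scores_a:
        raise ValueError("At least one conversation is required")
    # Assumption: a difference of exactly `threshold` counts as agreement.
    agreed = sum(1 for a, b in zip(scores_a, scores_b) if abs(a - b) <= threshold)
    return 100 * agreed / len(scores_a)


def irr_by_dimension(
    scores: Dict[str, Tuple[Sequence[float], Sequence[float]]],
) -> Dict[str, float]:
    """Compute IRR separately for each rubric dimension."""
    return {dim: inter_rater_reliability(a, b) for dim, (a, b) in scores.items()}


# Hypothetical calibration session: five conversations, two reviewers.
reviewer_a = [90, 75, 60, 85, 100]
reviewer_b = [85, 50, 65, 80, 88]
overall_irr = inter_rater_reliability(reviewer_a, reviewer_b)

# Hypothetical per-dimension scores for the same five conversations.
per_dimension = irr_by_dimension({
    "empathy": ([80, 90, 70, 100, 60], [85, 95, 75, 90, 65]),
    "accuracy": ([100, 50, 75, 100, 50], [75, 50, 100, 100, 75]),
})
```

In this made-up session the reviewers agree on three of five conversations overall, so `overall_irr` is 60, well short of the 85% target. The per-dimension breakdown shows full agreement on the hypothetical "empathy" criterion and much lower agreement on "accuracy", which is the kind of signal that points to an under-defined criterion.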
Common calibration mistakes:
- Running calibration sessions only at program launch. Calibration drift happens gradually, so run sessions at least monthly, and especially when new reviewers join.
- Calibrating only on easy, clear-cut conversations. Include edge cases and disputed scores; agreement on easy conversations doesn't confirm alignment on the hard ones.
- Treating calibration disagreements as reviewer errors. Disagreements often reveal rubric ambiguity: if two experienced reviewers disagree, the rubric criteria are likely under-defined.
- Not tracking calibration scores over time. An IRR that declines over several months signals that standards are drifting and re-alignment is needed.
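Tracking IRR over time can be as simple as keeping one IRR value per calibration session and checking the trend. The sketch below is one possible check, under stated assumptions: the 85% target comes from the definition above, while the rule "three consecutive declines means drift" and the function name `needs_realignment` are illustrative choices, not an established standard.

```python
from typing import Sequence

WELL_CALIBRATED_IRR = 85.0  # IRR above this is considered well calibrated


def needs_realignment(
    irr_history: Sequence[float],
    target: float = WELL_CALIBRATED_IRR,
    decline_streak: int = 3,  # assumption: 3 consecutive drops signals drift
) -> bool:
    """Flag drift from a chronological list of per-session IRR percentages."""
    if not irr_history:
        return False
    # The latest session is not above the target, so re-align now.
    if irr_history[-1] <= target:
        return True
    # Otherwise look for a sustained decline across the most recent sessions.
    recent = irr_history[-(decline_streak + 1):]
    if len(recent) < decline_streak + 1:
        return False
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))


# Hypothetical monthly IRR values.
steady = needs_realignment([90, 92, 91, 93])      # no sustained decline
drifting = needs_realignment([95, 93, 91, 89])    # above target but falling
below_target = needs_realignment([90, 84])        # latest session under 85%
```

Checking the trend as well as the latest value lets a team act while IRR is still above the target, before scores become unreliable for coaching.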