Fig. 2.
Decision-study predictions of generalizability coefficients for relative (classification) decisions as a function of the number of evaluated clinical encounters. Horizontal lines mark the 80% and 90% reliability thresholds for reference. Raw absolute scores are expected to reach 90% reliability after 30 evaluated clinical encounters, and raw peer-relative scores after 47. By comparison, Z-transformed absolute scores are expected to reach 90% reliability after 57 evaluated clinical encounters, and Z-transformed peer-relative scores after 55.
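As a rough illustration of how such projections are read, the sketch below computes a relative generalizability coefficient for a simplified single-facet (person by encounter) design, where the coefficient is person variance divided by person variance plus relative-error variance averaged over n encounters. The single-facet design, the function names, and the variance ratio of 4.0 are assumptions made for illustration only; they are not the design or the estimates behind this figure.

```python
import math


def g_coefficient(n_encounters, error_to_person_ratio):
    """Projected relative G coefficient for a single-facet design.

    error_to_person_ratio is the relative-error variance divided by the
    person (true-score) variance for a single encounter.
    """
    return 1.0 / (1.0 + error_to_person_ratio / n_encounters)


def encounters_needed(target, error_to_person_ratio):
    """Smallest whole number of encounters projected to reach `target`."""
    # Solve target = 1 / (1 + ratio / n) for n, then round up.
    # The small epsilon guards against floating-point overshoot.
    exact_n = target / (1.0 - target) * error_to_person_ratio
    return max(math.ceil(exact_n - 1e-9), 1)


# Hypothetical variance ratio, chosen for illustration (not from the study).
ratio = 4.0

for n in (10, 20, 40, 60):
    print(n, round(g_coefficient(n, ratio), 3))

print("encounters for 80%:", encounters_needed(0.80, ratio))
print("encounters for 90%:", encounters_needed(0.90, ratio))
```

Under this simplified model the coefficient rises steeply at first and then flattens, which is why moving from 80% to 90% reliability requires a disproportionately larger number of encounters.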