Methodology & References
1. Statistical Tests Used
For Continuous Data (0-10):
We use the Intraclass Correlation Coefficient (ICC), specifically the two-way random effects, absolute agreement, average-measures model, written ICC(2,k) in Shrout & Fleiss notation. This model is widely regarded as the gold standard for medical OSCEs because it accounts for both systematic rater bias and random error.
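As a sketch, ICC(2,k) can be computed from the two-way ANOVA decomposition of an n-subjects × k-raters score matrix. The function name is illustrative, and the demonstration matrix is the well-known example from Shrout & Fleiss (1979):

```python
import numpy as np

def icc_2k(scores: np.ndarray) -> float:
    """ICC(2,k): two-way random effects, absolute agreement, average of k raters.

    scores: (n_subjects, k_raters) matrix of continuous ratings.
    """
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)   # per-subject means
    col_means = scores.mean(axis=0)   # per-rater means

    # Two-way ANOVA mean squares (no replication).
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_err = ((scores - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # subjects
    msc = ss_cols / (k - 1)               # raters
    mse = ss_err / ((n - 1) * (k - 1))    # residual

    # Shrout & Fleiss (1979), average-measures absolute agreement.
    return (msr - mse) / (msr + (msc - mse) / n)

# Demonstration data from Shrout & Fleiss (1979): 6 subjects, 4 raters.
ratings = np.array([[9, 2, 5, 8],
                    [6, 1, 3, 2],
                    [8, 4, 6, 8],
                    [7, 1, 2, 6],
                    [10, 5, 6, 9],
                    [6, 2, 4, 7]])
print(round(icc_2k(ratings), 2))  # 0.62
```

Note the denominator's (msc - mse)/n term: it is what penalizes systematic rater bias, which a consistency-only ICC such as ICC(3,k) would ignore.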
For Pass/Fail Data (Categorical):
We use Fleiss' Kappa, which extends Cohen's Kappa to more than two raters. Unlike simple percent agreement, Kappa corrects for the agreement that would be expected by chance alone.
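An illustrative sketch of the computation (the helper name and the pass/fail counts are hypothetical), following Fleiss (1971): tabulate, for each subject, how many raters chose each category, then compare mean observed agreement against chance agreement from the marginal proportions:

```python
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """Fleiss' kappa for n subjects each rated by the same k raters.

    counts: (n_subjects, n_categories) matrix; counts[i, j] is the number
    of raters who placed subject i in category j. Each row sums to k.
    """
    n, _ = counts.shape
    k = counts[0].sum()  # raters per subject (assumed constant)

    # Per-subject observed agreement P_i, then its mean P-bar.
    p_i = ((counts ** 2).sum(axis=1) - k) / (k * (k - 1))
    p_bar = p_i.mean()

    # Chance agreement from the marginal category proportions.
    p_j = counts.sum(axis=0) / (n * k)
    p_e = (p_j ** 2).sum()

    return (p_bar - p_e) / (1 - p_e)

# Hypothetical pass/fail counts: 3 candidates, 5 raters each.
counts = np.array([[5, 0],   # all five raters: pass
                   [0, 5],   # all five raters: fail
                   [5, 0]])  # all five raters: pass
print(fleiss_kappa(counts))  # 1.0 (perfect agreement)
```

With unanimous raters the statistic reaches 1.0; values near 0 indicate agreement no better than chance.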
2. Hawk vs. Dove Bias
This visual analysis computes each rater's z-score deviation: how far the rater's mean score sits from the cohort mean (or expert benchmark), in standard-deviation units.
- Hawk (Strict): Negative deviation (scores lower than mean/expert).
- Dove (Lenient): Positive deviation (scores higher than mean/expert).
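The classification above can be sketched as follows. This is a minimal illustration assuming each rater's mean score is standardized against the spread of rater means; the ±1.0 cutoff and the function name are illustrative choices, not fixed by the method:

```python
import statistics

def classify_raters(scores_by_rater: dict[str, list[float]],
                    cutoff: float = 1.0) -> dict[str, str]:
    """Label each rater hawk/dove/neutral by the z-score of their mean score."""
    means = {r: statistics.fmean(s) for r, s in scores_by_rater.items()}
    mu = statistics.fmean(means.values())
    sigma = statistics.pstdev(means.values())  # spread of rater means

    labels = {}
    for rater, m in means.items():
        z = (m - mu) / sigma
        if z <= -cutoff:
            labels[rater] = "hawk"      # strict: scores below the mean
        elif z >= cutoff:
            labels[rater] = "dove"      # lenient: scores above the mean
        else:
            labels[rater] = "neutral"
    return labels

# Hypothetical ratings on the 0-10 scale.
print(classify_raters({"A": [3, 4, 5], "B": [7, 8, 6], "C": [5, 6, 5.5]}))
# {'A': 'hawk', 'B': 'dove', 'C': 'neutral'}
```

To benchmark against an expert instead of the cohort mean, substitute the expert's mean score for mu while keeping the same standardization.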
3. References
- Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin.
- Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin.
- Kottner, J., et al. (2011). Guidelines for Reporting Reliability and Agreement Studies (GRRAS). International Journal of Nursing Studies.