The American Academy of Sleep Medicine (AASM) Inter-scorer Reliability program provides a unique opportunity to compare a large number of scorers with varied levels of experience and to determine their agreement in sleep stage scoring. The objective is to examine areas of disagreement to inform future revisions of the AASM Manual for the Scoring of Sleep and Associated Events.
The sample included 9 record fragments, 1,800 epochs and more than 3,200,000 scoring decisions. More than 2,500 scorers, most with 3 or more years of experience, participated. The analysis determined agreement with the score chosen by the majority of scorers.
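As an illustration of the agreement measure described above, the following minimal Python sketch computes percent agreement with the majority score. The function name `majority_agreement`, the toy data, and the tie-breaking behavior (the first stage encountered wins) are assumptions made for this example; they are not taken from the program's actual analysis.

```python
from collections import Counter

def majority_agreement(scores_by_epoch):
    """Return (per_epoch, overall) agreement with the majority score.

    scores_by_epoch: one list of stage labels per epoch, one label per scorer.
    per_epoch: list of (majority_stage, fraction_of_scorers_agreeing).
    overall: agreeing decisions divided by all scoring decisions.
    Ties go to the stage seen first (an assumption of this sketch).
    """
    per_epoch = []
    agreeing = 0
    decisions = 0
    for epoch_scores in scores_by_epoch:
        # Most frequent stage for this epoch and how many scorers chose it
        stage, count = Counter(epoch_scores).most_common(1)[0]
        per_epoch.append((stage, count / len(epoch_scores)))
        agreeing += count
        decisions += len(epoch_scores)
    return per_epoch, agreeing / decisions

# Hypothetical toy data: 3 epochs, 4 scorers each
toy_epochs = [
    ["W", "W", "N1", "W"],      # majority W, 3 of 4 agree
    ["N1", "N2", "N2", "N2"],   # majority N2, 3 of 4 agree
    ["N2", "N2", "N2", "N2"],   # majority N2, 4 of 4 agree
]
per_epoch, overall = majority_agreement(toy_epochs)
print(per_epoch)
print(round(overall * 100, 1))  # 10 of 12 decisions agree
```

Overall agreement here is pooled across all scoring decisions, so an epoch scored by more scorers carries more weight. Averaging the per-epoch fractions instead would be an equally plausible reading of the description.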
Sleep stage agreement averaged 82.6%. Agreement was highest for stage R sleep, with stages N2 and W approaching the same level. Agreement was 67.4% for stage N3 sleep and was lowest, at 63.0%, for stage N1. Scorers had particular difficulty with the last epoch of stage W before sleep onset, the first epoch of stage N2 after stage N1, and the first epoch of stage R after stage N2. Discrimination between stages N2 and N3 was also difficult for scorers.
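A per-stage breakdown of this kind could be computed as in the sketch below. It assumes each epoch is assigned to the stage chosen by the majority of scorers and that agreement is pooled within each stage; the abstract does not specify this grouping, and the function name `agreement_by_stage` and the data are hypothetical.

```python
from collections import Counter, defaultdict

def agreement_by_stage(scores_by_epoch):
    """Pool agreement with the majority score, grouped by majority stage.

    Assumption of this sketch: an epoch "belongs" to its majority stage,
    and ties go to the stage seen first.
    """
    agreeing = defaultdict(int)
    decisions = defaultdict(int)
    for epoch_scores in scores_by_epoch:
        stage, count = Counter(epoch_scores).most_common(1)[0]
        agreeing[stage] += count
        decisions[stage] += len(epoch_scores)
    return {stage: agreeing[stage] / decisions[stage] for stage in decisions}

# Hypothetical toy data, not drawn from the study
toy_epochs = [
    ["N2", "N2", "N3", "N2"],        # majority N2, 3 of 4 agree
    ["N3", "N3", "N2", "N2", "N3"],  # majority N3, 3 of 5 agree
    ["R", "R", "R", "R"],            # majority R, 4 of 4 agree
    ["N2", "N2", "N2", "N2"],        # majority N2, 4 of 4 agree
]
by_stage = agreement_by_stage(toy_epochs)
for stage, value in sorted(by_stage.items()):
    print(stage, round(value * 100, 1))
```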
These findings suggest that, with current rules, inter-scorer agreement in a large group is approximately 83%, a level similar to that reported for agreement between expert scorers. Agreement in the scoring of stages N1 and N3 sleep was low. Modifications to the scoring rules that target sleep stage transitions may improve agreement.
A commentary on this article appears in this issue on page 89.
Rosenberg RS; Van Hout S. The American Academy of Sleep Medicine inter-scorer reliability program: sleep stage scoring. J Clin Sleep Med 2013;9(1):81–87.