Eyetracking is commonly used to investigate attentional bias. Although some studies have examined the internal consistency of eyetracking measures, data on their test–retest reliability and agreement when used to investigate attentional bias are scarce. This study reports the test–retest reliability, measurement error, and internal consistency of 12 commonly used outcome measures thought to reflect different components of attentional bias: overall attention, early attention, and late attention.

Healthy participants completed a preferential-looking eyetracking task that involved the presentation of threatening words (sensory words, general threat words, and affective words) and nonthreatening words. We used intraclass correlation coefficients (ICCs) to measure test–retest reliability (ICC > .70 indicates adequate reliability). ICC(2,1) values ranged from –.31 to .71. Reliability varied according to the outcome measure and threat word category. Sensory words had a lower mean ICC (.08) than either affective words (.32) or general threat words (.29). A longer exposure time was associated with higher test–retest reliability. All of the outcome measures except second-run dwell time showed low measurement error (<6%). Most of the outcome measures showed high internal consistency (α > .93). We discuss recommendations for improving the reliability of eyetracking tasks in future research.
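To make the three statistics concrete, the following Python sketch computes them for a small score matrix. It assumes one row per participant and one column per session (for the ICC and measurement error) or per trial (for Cronbach's α). ICC(2,1) is computed in its two-way random-effects, absolute-agreement, single-measurement form from the ANOVA mean squares. The abstract does not state how measurement error was expressed as a percentage, so `sem_percent` assumes one common convention: the standard error of measurement, taken as the square root of the residual mean square, divided by the grand mean. All function names are hypothetical, and this is not the study's analysis code.

```python
from math import sqrt
from statistics import mean, variance


def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    data: one row per participant, one column per session (e.g. [test, retest]).
    Returns (icc, residual mean square, grand mean).
    """
    n = len(data)     # number of participants
    k = len(data[0])  # number of sessions
    grand = mean(x for row in data for x in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(row[j] for row in data) for j in range(k)]

    # Partition the total sum of squares into participants, sessions, residual.
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Session variance stays in the denominator, so systematic shifts lower the ICC.
    icc = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    return icc, ms_err, grand


def sem_percent(data):
    """Assumed convention: SEM = sqrt(residual mean square), as % of the grand mean."""
    _, ms_err, grand = icc_2_1(data)
    return 100 * sqrt(ms_err) / grand


def cronbach_alpha(items):
    """items: one row per participant, one column per trial or item."""
    k = len(items[0])
    item_vars = [variance([row[j] for row in items]) for j in range(k)]
    total_var = variance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Because ICC(2,1) measures absolute agreement, a shift common to all participants between sessions lowers it. For example, `icc_2_1([[1, 2], [2, 3], [3, 4]])[0]` evaluates to about 0.67 even though every participant keeps the same rank. That value would fall below the .70 threshold used above.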


eyetracking, reliability, attentional bias, preferential looking, threat

