Interobserver Agreement Test

The differences between the two measured SUVmax values in Study 1 were all less than one in absolute value, except for patients 3, 5, 10, 23, and 26 (Additional File 2). The estimated mean difference between readings 1 and 2 was 0.43 (95% CI: −0.02 to 0.88; Table 1), and the Bland-Altman limits of agreement were −2.03 and 2.89. In the Bland-Altman plot (Figure 2, upper panel), the variance of the differences appeared fairly homogeneous across the range of measured values, but an upward trend with the average measurement was visible. This trend, however, appeared to be driven by a single outlier: after its removal, the trend in the Bradley-Blackwood regression line disappeared (Figure 2, lower panel). Removing this outlier also roughly halved the estimated mean difference between readings (0.24) and narrowed the Bland-Altman limits of agreement (−1.13 and 1.60). Kalantri et al. assessed the accuracy and reliability of pallor as a tool for detecting anemia [5]. They concluded that "clinical assessment of pallor can rule out, and modestly rule in, severe anemia." However, the interobserver agreement for detecting pallor was very poor (kappa values of 0.07 for conjunctival pallor and 0.20 for tongue pallor), which makes pallor an unreliable sign for diagnosing anemia.
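To make the quantities reported above concrete, here is a minimal sketch of how the mean difference, the Bland-Altman limits of agreement, and the trend check (differences regressed on averages, in the spirit of the Bradley-Blackwood approach) can be computed. The readings are hypothetical placeholders, not the Study 1 data.

```python
import numpy as np

# Hypothetical paired SUVmax readings (illustrative only, not Study 1 data)
reading1 = np.array([2.1, 3.4, 5.6, 4.2, 7.9, 12.5])
reading2 = np.array([2.3, 3.1, 6.8, 4.0, 7.5, 10.9])

diff = reading1 - reading2            # paired differences
avg  = (reading1 + reading2) / 2      # per-pair averages (x-axis of the plot)

bias = diff.mean()                    # estimated mean difference
sd   = diff.std(ddof=1)               # SD of the paired differences
loa  = (bias - 1.96 * sd, bias + 1.96 * sd)  # Bland-Altman limits of agreement

# Trend check: slope of the differences regressed on the averages
slope, intercept = np.polyfit(avg, diff, 1)

print(f"bias={bias:.2f}, LoA=({loa[0]:.2f}, {loa[1]:.2f}), trend slope={slope:.2f}")
```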

Zaki R, Bulgiba A, Ismail R, Ismail NA. Statistical methods used to test for agreement of medical instruments measuring continuous variables in method comparison studies: a systematic review. PLoS ONE. 2012;7(5):e37908.

The simplest setting for applying variance component analysis (VCA) to agreement assessment in PET studies is the assessment of intra-observer variability by examining the differences between repeated measurements (Bradley EL, Blackwood LG. Comparing paired data: a simultaneous test for means and variances. The American Statistician. 1989;43(4):234–5). In agreement studies that focus solely on the differences between measurements, as in our Study 1, the data are best presented using Bland-Altman plots, possibly augmented by log-transforming the original data and by accommodating heteroscedasticity and/or trends over the range of measurement [10, 20].
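As a hedged illustration of the log-transformation mentioned above (a sketch, not the original authors' analysis): limits of agreement computed on the log scale back-transform into ratio limits, which is useful when the scatter of the differences grows with the magnitude of measurement. The readings are invented for illustration.

```python
import numpy as np

# Illustrative paired readings whose disagreement grows with magnitude
reading1 = np.array([2.1, 3.4, 5.6, 4.2, 7.9, 12.5])
reading2 = np.array([2.3, 3.1, 6.8, 4.0, 7.5, 10.9])

log_diff = np.log(reading1) - np.log(reading2)   # differences on the log scale
bias, sd = log_diff.mean(), log_diff.std(ddof=1)

# Back-transformed limits: for ~95% of pairs, reading1/reading2 falls in here
ratio_loa = np.exp([bias - 1.96 * sd, bias + 1.96 * sd])
print(f"ratio limits of agreement: {ratio_loa[0]:.2f} to {ratio_loa[1]:.2f}")
```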

In Study 1, we observed the duality between the Bland-Altman limits of agreement on the one hand and the corresponding repeatability coefficient (RC) on the other. Indeed, several authors of recent agreement studies have defined the repeatability coefficient (or reproducibility coefficient) as 1.96 times the standard deviation of the paired differences [21–25], which is algebraically equal to 2.77 times the within-subject standard deviation in simple settings such as our Study 1, since the standard deviation of the difference between two independent readings is √2 times the within-subject standard deviation (1.96 × √2 ≈ 2.77). Lodge et al. defined the RC as 2.77 times the within-subject standard deviation [26].

Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res. 1999;8(2):135–60.

Agreement between measurements refers to the degree of concordance between two (or more) sets of measurements. Statistical methods for testing agreement are used to assess inter-rater variability or to decide whether one technique for measuring a variable can substitute for another. In this article, we examine statistical measures of agreement for different types of data and discuss how they differ from measures of correlation.
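The equivalence of the two RC definitions can be checked numerically. Below is a small simulation sketch (sample size and SDs are arbitrary assumptions, not Study 1 values) showing that 1.96 times the SD of the paired differences matches 2.77 times the within-subject SD.

```python
import numpy as np

rng = np.random.default_rng(0)
n, s_w = 100_000, 1.0                 # number of subjects, within-subject SD
true = rng.normal(10, 3, n)           # subject-specific true values
r1 = true + rng.normal(0, s_w, n)     # reading 1
r2 = true + rng.normal(0, s_w, n)     # reading 2

# Subject effects cancel in the difference, whose SD is sqrt(2) * s_w
rc_from_diffs = 1.96 * (r1 - r2).std(ddof=1)   # 1.96 * sqrt(2) * s_w
rc_from_sw    = 2.77 * s_w
print(rc_from_diffs, rc_from_sw)      # both approximately 2.77
```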

We regard repeatability as an agreement measure rather than a reliability measure (see Appendix), whereas the ICC is sometimes used as a measure of repeatability [38, 39]. Since the ICC depends strongly on the between-subject variability and can produce high values for heterogeneous patient groups [30, 31], it should be reserved for assessing reliability. We hope to contribute to a more consistent use of these terms in the future, in line with published guidelines for reporting reliability and agreement studies [6]. Moreover, we suspect that the greatest challenge is achieving a clear understanding of the question a researcher wants to answer before conducting an agreement or reliability study.
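As a hedged illustration of why the ICC rewards heterogeneity, the sketch below evaluates the standard one-way variance-components form of the ICC, σ_b²/(σ_b² + σ_w²), at invented values: the same measurement error yields a much higher ICC in a more heterogeneous group.

```python
def icc(sigma_b: float, sigma_w: float) -> float:
    """One-way variance-components ICC: between-subject variance as a
    fraction of total variance."""
    return sigma_b**2 / (sigma_b**2 + sigma_w**2)

# Identical measurement error (sigma_w), different patient heterogeneity
print(icc(sigma_b=1.0, sigma_w=0.5))  # homogeneous group  -> 0.80
print(icc(sigma_b=4.0, sigma_w=0.5))  # heterogeneous group -> 0.98
```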