Reliability sequential testing

1/8/2024

Armitage (1954, 1958) and Bross (1952, 1958) introduced the use of sequential methods in the medical field. However, the early development of group sequential methods came from Pocock (1977), O’Brien and Fleming (1979), and Lan and DeMets (1983). Because sequential testing is widely used in clinical trials, it is natural to adopt and extend these methods in the design and analysis of reliability studies to reduce sample size and study cost. Multistage group sequential designs have been proposed for reliability studies that evaluate measurement error with the one-way ANOVA model. Under one-way ANOVA, the sums of squares used to estimate the intraclass correlation coefficient possess independent increments, which simplifies the calculation of stopping boundaries (Liu, Schisterman, and Wu (2006)). In this paper, we develop multistage testing procedures using two-way ANOVA for hypotheses concerning the intraclass correlation coefficient in an inter-rater reliability study.
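The role of independent increments in setting stopping boundaries can be illustrated with a small simulation. The sketch below is not the procedure of any of the cited papers; it assumes a generic test statistic built from independent standard normal increments under the null hypothesis, and it estimates by Monte Carlo the critical constants for a constant (Pocock-type) boundary and a decreasing (O'Brien-Fleming-type) boundary. The function name, the number of looks, and the simulation size are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_boundary_constants(n_looks=5, alpha=0.05, n_sim=200_000, seed=0):
    """Estimate two-sided group sequential critical constants by simulation.

    Assumes equally spaced looks and independent N(0, 1) increments under H0.
    Returns (c_pocock, c_obf).
    """
    rng = np.random.default_rng(seed)
    # One independent standard normal increment per stage and simulated study.
    increments = rng.standard_normal((n_sim, n_looks))
    k = np.arange(1, n_looks + 1)
    # Standardized cumulative statistic at look k: Z_k = S_k / sqrt(k).
    z = np.abs(np.cumsum(increments, axis=1) / np.sqrt(k))
    # Pocock-type rule: reject at look k if |Z_k| >= c_pocock (same at every look).
    c_pocock = np.quantile(z.max(axis=1), 1 - alpha)
    # O'Brien-Fleming-type rule: reject at look k if |Z_k| >= c_obf * sqrt(K / k).
    c_obf = np.quantile((z * np.sqrt(k / n_looks)).max(axis=1), 1 - alpha)
    return float(c_pocock), float(c_obf)


if __name__ == "__main__":
    n_looks = 5
    c_pocock, c_obf = simulate_boundary_constants(n_looks=n_looks)
    for look in range(1, n_looks + 1):
        obf_bound = c_obf * np.sqrt(n_looks / look)
        print(f"look {look}: Pocock {c_pocock:.3f}, O'Brien-Fleming {obf_bound:.3f}")
```

The constant boundary spends the error rate evenly across looks, while the decreasing boundary is very strict early and close to the fixed-sample critical value at the final look. Because the increments are independent, the joint distribution of the stagewise statistics is easy to simulate or integrate, which is the simplification referred to above.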

Inter-rater reliability refers to the reproducibility of the raters' measurements when they are repeated at random on the same subject or specimen under the same conditions, by the same rater or different ones, either simultaneously or at several time points. A common measure of the reliability of measurements is the intraclass correlation coefficient, with larger values indicating a higher level of consistency. There are various versions of the intraclass correlation coefficient, derived from different statistical models. Shrout and Fleiss (1979) proposed a set of guidelines for choosing the appropriate model in reliability studies. In actual inter-rater reliability studies, multiple raters usually evaluate multiple subjects, and the two-way analysis of variance (ANOVA) model is considered the appropriate analytical model. Moreover, when the raters can be considered a random sample from the target population of raters, the two-way random effects model can be applied (Fleiss (1999); Zou and McDermott (1999); Cappelleri and Ting (2003); Tian and Cappelleri (2003)).

For a fixed sample design, there are two main approaches to determining the sample size. One is to find the sample size at which the required power of the statistical test is met (Donner and Eliasziw (1987); Walter, Eliasziw, and Donner (1998)), while the other is to assure the precision of estimation (Shoukri, Asyali, and Walter (2003); Bonett (2002); Saito et al.).

Sequential methods were introduced in response to demands for more efficient testing of anti-aircraft gunnery during World War II, culminating in Wald’s development of the sequential probability ratio test, which had an immediate impact on weapons testing (Wald (1947); Siegmund (1985); Lai (2004)). Gradually, in studies involving human subjects, specialized statistical methods were called for to balance the ethical and financial advantages of stopping a study early against the risk of an incorrect conclusion (Jennison and Turnbull (2000)).

Inter-rater reliability studies are conducted to investigate the reproducibility and level of agreement of assessments among different raters. The analysis of reliability is a common feature of research and practice, since most measurements involve measurement error, particularly those made by humans. Measurement error can seriously affect statistical analysis and interpretation; it therefore becomes important to assess the amount of such error by calculating a reliability index (Shrout and Fleiss (1979)).
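One such reliability index is the intraclass correlation coefficient for a single rating under the two-way random effects model, the form Shrout and Fleiss (1979) label ICC(2,1). The sketch below computes it from the two-way ANOVA mean squares, assuming a complete subjects-by-raters table with one rating per cell. The function name and the small ratings matrix are hypothetical and serve only as an illustration.

```python
import numpy as np


def icc_two_way_random(ratings):
    """Single-rating ICC under a two-way random effects model, ICC(2,1).

    `ratings` is an n-by-k array: rows are subjects, columns are raters,
    with exactly one rating per cell (no missing values).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    # Sums of squares for the two-way layout without replication.
    ss_subjects = k * np.sum((x.mean(axis=1) - grand_mean) ** 2)
    ss_raters = n * np.sum((x.mean(axis=0) - grand_mean) ** 2)
    ss_total = np.sum((x - grand_mean) ** 2)
    ss_error = ss_total - ss_subjects - ss_raters
    # Mean squares: between subjects, between raters, residual.
    bms = ss_subjects / (n - 1)
    jms = ss_raters / (k - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


if __name__ == "__main__":
    # Illustrative data: 6 subjects each rated once by 4 raters.
    example = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    print(f"ICC(2,1) = {icc_two_way_random(example):.3f}")
```

In this example the raters differ systematically in their mean ratings, and the between-rater mean square enters the denominator, so the coefficient is lowered by rater disagreement in level as well as by residual error.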
