Readings in ROC Analysis, with Emphasis on Medical Applications

Some papers appear more than once because they belong to multiple classifications.

Background

Egan JP. Signal detection theory and ROC analysis. New York: Academic Press, 1975.
Fryback DG, Thornbury JR. The efficacy of diagnostic imaging. Med Decis Making 1991; 11: 88.
Green DM, Swets JA. Signal detection theory and psychophysics. New York, NY: Wiley, 1966.
Griner PF, Mayewski RJ, Mushlin AI, Greenland P. Selection and interpretation of diagnostic tests and procedures: principles and applications. Annals Int Med 1981; 94: 553.
International Commission on Radiation Units and Measurements. Medical imaging: the assessment of image quality (ICRU Report 54). Bethesda, MD: ICRU, 1996.
Lusted LB. Signal detectability and medical decision-making. Science 1971; 171: 1217.
McNeil BJ, Adelstein SJ. Determining the value of diagnostic and screening tests. J Nucl Med 1976; 17: 439.
McNeil BJ, Keeler E, Adelstein SJ. Primer on certain elements of medical decision making. New Engl J Med 1975; 293: 211.
Metz CE, Wagner RF, Doi K, Brown DG, Nishikawa RM, Myers KJ. Toward consensus on quantitative assessment of medical imaging systems. Med Phys 1995; 22: 1057-1061.
National Council on Radiation Protection and Measurements. An introduction to efficacy in diagnostic radiology and nuclear medicine (NCRP Commentary 13). Bethesda, MD: NCRP, 1995.
Robertson EA, Zweig MH, Van Steirteghem AC. Evaluating the clinical efficacy of laboratory tests. Am J Clin Path 1983; 79: 78.
Swets JA, Pickett RM, Whitehead SF, et al. Assessment of diagnostic technologies. Science 1979; 205:753–759.
Swets JA, Pickett RM. Evaluation of diagnostic systems: methods from signal detection theory. New York, NY: Academic Press, 1982.
Wagner RF, Metz CE, Campbell G. Assessment of medical imaging systems and computer aids: a tutorial review. Acad Radiol 2007; 14: 723–748.
Zweig MH, Campbell G. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clinical Chemistry 1993; 39: 561. [Erratum published in Clinical Chemistry 1993; 39: 1589.]

General
Books

Pepe MS. The statistical evaluation of medical tests for classification and prediction. Oxford; New York: Oxford University Press, 2004.
Zhou X-H, Obuchowski NA, McClish DK. Statistical methods in diagnostic medicine. New York, NY: Wiley-Interscience, 2002.

Articles

Hanley JA. Receiver operating characteristic (ROC) methodology: the state of the art. Critical Reviews in Diagnostic Imaging 1989; 29: 307.
International Commission on Radiation Units and Measurements. Receiver Operating Characteristic Analysis in Medical Imaging (ICRU Report 79). J ICRU 2008; 8:1–62.
King JL, Britton CA, Gur D, Rockette HE, Davis PL. On the validity of the continuous and discrete confidence rating scales in receiver operating characteristic studies. Invest Radiol 1993; 28: 962.
Metz CE. Basic principles of ROC analysis. Seminars in Nucl Med 1978; 8: 283.
Metz CE. ROC methodology in radiologic imaging. Invest Radiol 1986; 21: 720.
Metz CE. Some practical issues of experimental design and data analysis in radiological ROC studies. Invest Radiol 1989; 24: 234.
Metz CE. Evaluation of CAD methods. In Computer-Aided Diagnosis in Medical Imaging (K Doi, H MacMahon, ML Giger and KR Hoffmann, eds.). Amsterdam: Elsevier Science (Excerpta Medica International Congress Series, Vol. 1182), pp. 543-554, 1999.
Metz CE. Fundamental ROC analysis. In: Handbook of Medical Imaging, Vol. 1: Physics and Psychophysics (J Beutel, H Kundel and R Van Metter, eds.). Bellingham, WA: SPIE Press, 2000, pp. 751-769.
Metz CE. Receiver operating characteristic (ROC) analysis: a tool for quantitative evaluation of observer performance and imaging systems. JACR 2006; 3: 413-422.
Metz CE, Shen J-H. Gains in accuracy from replicated readings of diagnostic images: prediction and assessment in terms of ROC analysis. Med Decis Making 1992; 12: 60.
Rockette HE, Gur D, Metz CE. The use of continuous and discrete confidence judgments in receiver operating characteristic studies of diagnostic imaging techniques. Invest Radiol 1992; 27: 169.
Swets JA. ROC analysis applied to the evaluation of medical imaging techniques. Invest Radiol 1979; 14: 109.
Swets JA. Indices of discrimination or diagnostic accuracy: their ROCs and implied models. Psychol Bull 1986; 99: 100.
Swets JA. Measuring the accuracy of diagnostic systems. Science 1988; 240: 1285.
Swets JA. Signal detection theory and ROC analysis in psychology and diagnostics: collected papers. Mahwah, NJ: Lawrence Erlbaum Associates, 1996.
Swets JA, Pickett RM. Evaluation of diagnostic systems: methods from signal detection theory. New York: Academic Press, 1982.
Wagner RF, Beiden SV, Metz CE. Continuous vs. categorical data for ROC analysis: Some quantitative considerations. Academic Radiol 2001; 8: 328.

Bias

Begg CB, Greenes RA. Assessment of diagnostic tests when disease verification is subject to selection bias. Biometrics 1983; 39: 207.
Begg CB, McNeil BJ. Assessment of radiologic tests: control of bias and other design considerations. Radiology 1988; 167: 565.
Gray R, Begg CB, Greenes RA. Construction of receiver operating characteristic curves when disease verification is subject to selection bias. Med Decis Making 1984; 4: 151.
Ransohoff DF, Feinstein AR. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. New Engl J Med 1978; 299: 926.

Curve Fitting

Dorfman DD, Alf E. Maximum likelihood estimation of parameters of signal detection theory and determination of confidence intervals — rating method data. J Math Psych 1969; 6: 487.
Dorfman DD, Berbaum KS, Metz CE, Lenth RV, Hanley JA, Dagga HA. Proper ROC analysis: the bigamma model. Academic Radiol 1997; 4: 138.
Dorfman DD, Berbaum KS. A contaminated binormal model for ROC data: Part II. A formal model. Acad Radiol 2000; 7:427-437.
Grey DR, Morgan BJT. Some aspects of ROC curve-fitting: normal and logistic models. J Math Psych 1972; 9: 128.
Hanley JA. The robustness of the “binormal” assumptions used in fitting ROC curves. Med Decis Making 1988; 8: 197.
Lloyd CJ. Estimation of a convex ROC curve. Stat Prob Lett 2002; 59: 99–111.
Metz CE, Herman BA, Shen J-H. Maximum-likelihood estimation of ROC curves from continuously-distributed data. Stat Med 1998; 17: 1033.
Metz CE, Pan X. “Proper” binormal ROC curves: theory and maximum-likelihood estimation. J Math Psych 1999; 43: 1.
Ogilvie J, Creelman CD. Maximum likelihood estimation of receiver operating characteristic curve parameters. J Math Psych 1968; 5: 377-391.
Pan X, Metz CE. The “proper” binormal model: parametric ROC curve estimation with degenerate data. Academic Radiol 1997; 4: 380.
Pesce LL, Metz CE. Reliable and computationally efficient maximum-likelihood estimation of “proper” binormal ROC curves. Acad Radiol 2007; 14: 814-829.
Swensson RG. Unified measurement of observer performance in detecting and localizing target objects on images. Med Phys 1996; 23: 1709.
Swets JA. Form of empirical ROCs in discrimination and diagnostic tasks: implications for theory and measurement of performance. Psychol Bull 1986; 99: 181.

Statistics

Multi-Case statistical analysis: only case variation considered
Agresti A. A survey of models for repeated ordered categorical response data. Statistics in Medicine 1989; 8: 1209.
Bamber D. The area above the ordinal dominance graph and the area below the receiver operating graph. J Math Psych 1975; 12: 387.
Bandos AI, Rockette HE, Gur D. A permutation test sensitive to differences in areas for comparing ROC curves from a paired design. Statistics in Medicine 2005; 24: 2873-2893.
DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988; 44: 837.
Hajian-Tilaki KO, Hanley JA. Comparison of three methods for estimating the standard error of the area under the curve in ROC analysis of quantitative data. Academic Radiol 2002; 9: 1278-1285.
Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143: 29.
Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 1983; 148: 839.
Jiang Y, Metz CE, Nishikawa RM. A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology 1996; 201: 745.
Ma G, Hall WJ. Confidence bands for receiver operating characteristic curves. Med Decis Making 1993; 13: 191.
McClish DK. Analyzing a portion of the ROC curve. Med Decis Making 1989; 9: 190.
McClish DK. Determining a range of false-positive rates for which ROC curves differ. Med Decis Making 1990; 10: 283.
McNeil BJ, Hanley JA. Statistical approaches to the analysis of receiver operating characteristic (ROC) curves. Med Decis Making 1984; 4: 137.
Metz CE. Statistical analysis of ROC data in evaluating diagnostic performance. In: Multiple regression analysis: applications in the health sciences (D Herbert and R Myers, eds.). New York: American Institute of Physics, 1986, pp. 365.
Metz CE. Quantification of failure to demonstrate statistical significance: the usefulness of confidence intervals. Invest Radiol 1993; 28: 59.
Metz CE, Herman BA, Roe CA. Statistical comparison of two ROC curve estimates obtained from partially-paired datasets. Med Decis Making 1998; 18: 110.
Metz CE, Kronman HB. Statistical significance tests for binormal ROC curves. J Math Psych 1980; 22: 218.
Metz CE, Wang P-L, Kronman HB. A new approach for testing the significance of differences between ROC curves measured from correlated data. In: Information processing in medical imaging (F Deconinck, ed.). The Hague: Nijhoff, 1984, p. 432.
Thompson ML, Zucchini W. On the statistical analysis of ROC curves. Statistics in Medicine 1989; 8: 1277.
Wieand S, Gail MH, James BR, James KL. A family of nonparametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika 1989; 76: 585.
Zhou XH, Gatsonis CA. A simple method for comparing correlated ROC curves using incomplete data. Statistics in Medicine 1996; 15: 1687-1693.
Multi-Reader Multi-Case statistical analysis
Bandos AI, Rockette HE, Gur D. A permutation test for comparing ROC curves in multireader studies. Academic Radiol 2006; 13: 414-420.
Beiden SV, Wagner RF, Campbell G. Components-of-variance models and multiple-bootstrap experiments: an alternative method for random-effects, receiver operating characteristic analysis. Academic Radiol 2000; 7: 341.
Beiden SV, Wagner RF, Campbell G, Metz CE, Jiang Y. Components-of-variance models for random-effects ROC analysis: The case of unequal variance structures across modalities. Academic Radiol. 2001; 8: 605.
Beiden SV, Wagner RF, Campbell G, Chan H-P. Analysis of uncertainties in estimates of components of variance in multivariate ROC analysis. Academic Radiol. 2001; 8: 616.
Dorfman DD, Berbaum KS, Metz CE. ROC rating analysis: generalization to the population of readers and cases with the jackknife method. Invest Radiol 1992; 27: 723.
Dorfman DD, Berbaum KS, Lenth RV, Chen Y-F, Donaghy BA. Monte Carlo validation of a multireader method for receiver operating characteristic discrete rating data: factorial experimental design. Academic Radiol 1998; 5: 591.
Dorfman DD, Metz CE. Multi-reader multi-case ROC analysis: comments on Begg’s commentary. Academic Radiol 1995; 2 (Supplement 1): S76.
Gallas BD. One-shot estimate of MRMC variance: AUC. Academic Radiol 2006; 13: 353-362.
Hillis SL, Obuchowski NA, Schartz KM, Berbaum KS. A comparison of the Dorfman-Berbaum-Metz and Obuchowski-Rockette methods for receiver operating characteristic (ROC) data. Stat Med 2005; 24:1579-1607.
Hillis SL, Berbaum KS. Monte Carlo validation of the Dorfman-Berbaum-Metz method using normalized pseudovalues and less data-based model simplification. Academic Radiology 2005; 12:1534-1541.
Hillis SL, Berbaum KS. Power estimation for the Dorfman-Berbaum-Metz method. Academic Radiol 2004; 11: 1260-1273.
Obuchowski NA. Multireader, multimodality receiver operating characteristic curve studies: hypothesis testing and sample size estimation using an analysis of variance approach with dependent observations. Academic Radiol 1995; 2 [Supplement 1]: S22.
Obuchowski NA. Sample size calculations in studies of test accuracy. Stat Methods Med Res 1998; 7: 371.
Obuchowski NA, Beiden SV, Berbaum KS, et al. Multireader, multicase receiver operating characteristic analysis: An empirical comparison of five methods. Academic Radiol 2004; 11: 980-995.
Rockette HE, Obuchowski N, Metz CE, Gur D. Statistical issues in ROC curve analysis. Proc SPIE 1990; 1234: 111.
Roe CA, Metz CE. The Dorfman-Berbaum-Metz method for statistical analysis of multi-reader, multi-modality ROC data: validation by computer simulation. Academic Radiol 1997; 4: 298.
Roe CA, Metz CE. Variance-component modeling in the analysis of receiver operating characteristic index estimates. Academic Radiol 1997; 4: 587.
Regression analysis of ROC curves
Pepe MS. The statistical evaluation of medical tests for classification and prediction. Oxford; New York: Oxford University Press, 2004.
Pepe MS, Cai TX. The analysis of placement values for evaluating discriminatory measures. Biometrics 2004; 60: 528-535.
Toledano A, Gatsonis CA. Regression analysis of correlated receiver operating characteristic data. Academic Radiol 1995; 2 [Supplement 1]: S30.
Toledano AY, Gatsonis C. Ordinal regression methodology for ROC curves derived from correlated data. Statistics in Medicine 1996; 15: 1807.
Toledano AY, Gatsonis C. GEEs for ordinal categorical data: arbitrary patterns of missing responses and missingness in a key covariate. Biometrics 1999; 55: 488.
Tosteson A, Begg C. A general regression methodology for ROC curve estimation. Med Decis Making 1988; 8: 204.

Relationships with Cost/Benefit Analysis

Halpern EJ, Alpert M, Krieger AM, Metz CE, Maidment AD. Comparisons of ROC curves on the basis of optimal operating points. Academic Radiology 1996; 3: 245-253.
Metz CE. Basic principles of ROC analysis. Seminars in Nucl Med 1978; 8: 283-298.
Metz CE, Starr SJ, Lusted LB, Rossmann K. Progress in evaluation of human observer visual detection performance using the ROC curve approach. In: Information Processing in Scintigraphy (C Raynaud and AE Todd-Pokropek, eds.). Orsay, France: Commissariat à l’Energie Atomique, Département de Biologie, Service Hospitalier Frédéric Joliot, 1975, p. 420.
Phelps CE, Mushlin AI. Focusing technology assessment. Med Decis Making 1988; 8: 279.
Sainfort F. Evaluation of medical technologies: a generalized ROC analysis. Med Decis Making 1991; 11: 208.
Wagner RF, Beam CA, Beiden SV. Reader variability in mammography and its implications for expected utility over the population of readers and cases. Med Decis Making 2004; 24: 561-572.

Generalizations

Anastasio MA, Kupinski MA, Nishikawa RM. Optimization and FROC analysis of rule-based detection schemes using a multiobjective approach. IEEE Trans Med Imaging 1998; 17: 1089.
Bunch PC, Hamilton JF, Sanderson GK, Simmons AH. A free response approach to the measurement and characterization of radiographic observer performance. Proc SPIE 1977; 127: 124.
Chakraborty DP. Maximum likelihood analysis of free-response receiver operating characteristic (FROC) data. Med Phys 1989; 16: 561.
Chakraborty DP, Winter LHL. Free-response methodology: alternate analysis and a new observer-performance experiment. Radiology 1990; 174: 873.
Chakraborty DP, Berbaum KS. Observer studies involving detection and localization: Modeling, analysis and validation. Medical Physics 2004; 31:2313-2330.
Chakraborty DP. A search model and figure of merit for observer data acquired according to the free-response paradigm. Phys. Med. Biol. 2006; 51:3449-3462.
Chakraborty DP. ROC Curves predicted by a model of visual search. Phys. Med. Biol. 2006; 51:3463-3482.
Edwards DC, Metz CE. Evaluating Bayesian ANN estimates of ideal observer decision variables by comparison with identity functions. Proc. SPIE 5749: 174-182, 2005.
Edwards DC, Metz CE. Optimization of an ROC hypersurface constructed only from an observer’s within-class sensitivities. Proc. SPIE 6146: 61460A1-61460A7, 2006.
Edwards DC, Metz CE. Analysis of proposed three-class classification decision rules in terms of the ideal observer decision rule. J. Math. Psych. (in press), 2006.
Egan JP, Greenberg GZ, Schulman AI. Operating characteristics, signal detection, and the method of free response. J Acoust Soc Am 1961; 33: 993.
Hajian-Tilaki KO, Hanley JA, Joseph L, et al. Extension of receiver operating characteristic analysis to data concerning multiple signal detection tasks. Academic Radiol 1997; 4: 222-229.
Metz CE, Starr SJ, Lusted LB. Observer performance in detecting multiple radiographic signals: prediction and analysis using a generalized ROC approach. Radiology 1976; 121: 337.
Obuchowski NA, Lieber ML, Powell KA. Data analysis for detection and localization of multiple abnormalities with application to mammography. Academic Radiol 2000; 7: 516-525.
Starr SJ, Metz CE, Lusted LB, Goodenough DJ. Visual detection and localization of radiographic images. Radiology 1975; 116: 533.
Swensson RG. Unified measurement of observer performance in detecting and localizing target objects on images. Med Phys 1996; 23: 1709.

Papers related specifically to our Current Software
ROCKIT

Dorfman DD, Alf E. Maximum likelihood estimation of parameters of signal detection theory and determination of confidence intervals — rating method data. J Math Psych 1969; 6: 487.
Metz CE, Herman BA, Shen J-H. Maximum-likelihood estimation of ROC curves from continuously-distributed data. Stat Med 1998; 17: 1033.
Metz CE, Herman BA, Roe CA. Statistical comparison of two ROC curve estimates obtained from partially-paired datasets. Med Decis Making 1998; 18: 110.
Metz CE. Statistical analysis of ROC data in evaluating diagnostic performance. In: Multiple regression analysis: applications in the health sciences (D Herbert and R Myers, eds.). New York: American Institute of Physics, 1986, pp. 365.
Metz CE. Quantification of failure to demonstrate statistical significance: the usefulness of confidence intervals. Invest Radiol 1993; 28: 59.

LABMRMC & MRMC

Dorfman DD, Berbaum KS, Metz CE. ROC rating analysis: generalization to the population of readers and cases with the jackknife method. Invest Radiol 1992; 27: 723.
Dorfman DD, Metz CE. Multi-reader multi-case ROC analysis: comments on Begg’s commentary. Academic Radiol 1995; 2 (Supplement 1): S76.
Roe CA, Metz CE. The Dorfman-Berbaum-Metz method for statistical analysis of multi-reader, multi-modality ROC data: validation by computer simulation. Academic Radiol 1997; 4: 298.
Roe CA, Metz CE. Variance-component modeling in the analysis of receiver operating characteristic index estimates. Academic Radiol 1997; 4: 587.
Hillis SL, Obuchowski NA, Schartz KM, Berbaum KS. A comparison of the Dorfman-Berbaum-Metz and Obuchowski-Rockette methods for receiver operating characteristic (ROC) data. Stat Med 2005; 24:1579-1607.
Hillis SL, Berbaum KS. Monte Carlo validation of the Dorfman-Berbaum-Metz method using normalized pseudovalues and less data-based model simplification. Academic Radiology 2005; 12:1534-1541.

LABROC4

Metz CE, Herman BA, Shen J-H. Maximum-likelihood estimation of ROC curves from continuously-distributed data. Stat Med 1998; 17: 1033.

ROCPWR

Metz CE, Wang P-L, Kronman HB. A new approach for testing the significance of differences between ROC curves measured from correlated data. In: Information processing in medical imaging (F Deconinck, ed.). The Hague: Nijhoff, 1984, p. 432.

PROPROC

Pan X, Metz CE. The “proper” binormal model: parametric ROC curve estimation with degenerate data. Academic Radiol 1997; 4: 380.
Metz CE, Pan X. “Proper” binormal ROC curves: theory and maximum-likelihood estimation. J Math Psych 1999; 43: 1.
Pesce LL, Metz CE. Reliable and computationally efficient maximum-likelihood estimation of “proper” binormal ROC curves. Acad Radiol 2007; 14:814–829.
