Reporting on Statistical Methods To Adjust for Confounding: A Cross-Sectional Survey
Marcus Müllner, MD; Hugh Matthews, BSc, MBBS; and Douglas G. Altman, DSc
Background: The use of complex statistical models to adjust for confounding is common in medical research.
Objective: To determine the frequency and adequacy of adjustment for confounding in medical articles.
Design: Cross-sectional survey.
Setting: 34 scientific medical journals with a high impact factor.
Measurements: Frequency of reporting on methods used to adjust for confounding in 537 original research articles published in January 1998.
Results: Of the 537 articles, 169 specified that adjustment for confounding was used. In 1 paper in 10, it was unclear which statistical method was used or for which variables adjustment was made. In 45% of papers, it was not clear how multicategory or continuous variables were treated in the analysis. Inadequate reporting was less frequent if an author was affiliated with a department of statistics, epidemiology, or public health and if articles were published in journals with a high impact factor.
Conclusions: Details of methods used to adjust for confounding are frequently not reported in original research articles.
Ann Intern Med. 2002;136:122-126. www.annals.org
For author affiliations, current addresses, and contributions, see end of text.

Discovering the determinants of disease is often not straightforward, particularly if the disease or the risk factor is rare or not easily recognized (1). In case–control, cohort, and other nonrandomized studies, the groups being compared are likely to vary with respect to several demographic, clinical, and other characteristics. In randomized trials, authors often try to demonstrate that the observed treatment effect is not explained by some difference in baseline characteristics (2). The definition of confounding is that there are alternative explanations for an observed association between a risk factor and a health outcome. This may occur when one or more of these demographic or clinical characteristics are associated with one another and with the outcome of interest (3). Social class, for example, is known to be associated with cardiovascular risk factors, such as smoking, serum cholesterol level, and leisure physical activity, as well as with mortality. This difference in demographic and clinical characteristics may account for about half of the excess coronary and all-cause mortality in blue-collar workers compared with white-collar workers [...]

If confounding cannot be avoided at the design stage of a study, disentangling the web of causation is often difficult, and more or less complex statistical methods are needed. Readers of published scientific articles need to know whether and how the authors appropriately adjusted for confounding (5).

In this cross-sectional study, we sought to determine how frequently adjustment is reported in medical scientific articles and whether reporting is sufficiently de- [...]

Journal Selection
Scientific medical journals were included if they were published in English, were available in the British Medical Association’s library, and had an impact factor (6) that placed them in the highest 20% of journals within their medical specialty. We excluded review journals and journals specializing in statistics, epidemiology, and [...]

Article Selection
Two of the authors independently assessed all January 1998 issues of the selected journals to identify full-length original research articles. Short reports, scientific letters, case reports, and review articles, and animal studies [...]

Data Collection
We assessed whether adjustment for baseline variables or confounding was performed and whether the paper specified which method was used, the variables for which adjustment was made, and how these variables
122 15 January 2002 Annals of Internal Medicine Volume 136 • Number 2 www.annals.org
Reporting on Statistical Methods To Adjust for Confounding Brief Communication
were handled in the analysis (that is, whether it was specified how continuous and multicategory variables were entered into the statistical model). Inappropriate reporting was defined as not reporting or insufficiently reporting on one or more of these points. For simplicity, we assumed that all multiple regression analyses were done to adjust for confounders, even though in some cases they were done to identify prognostic variables.

Each paper was independently assessed by two of the authors. Agreement between the two authors was good to very good (κ = 0.61 to 0.96). Disagreement was usually caused by oversight rather than differing opinions. In case of disagreement or uncertainty, the third author (a senior statistician) was consulted.

Statistical Analysis
The univariate association between inappropriate reporting and several variables (for example, impact factor [quartiles]; at least one of the authors being affiliated with a department of statistics, epidemiology, or public health; and number of authors [1 or 2, 3 to 5, 6 to 10, or >10]) was assessed by using chi-square tests or chi-square tests for trend. We used multiple logistic regression to investigate simultaneously variables that showed an association with inappropriate reporting. Data were processed by using Excel 97 software (Microsoft Corp., Redmond, Washington) and Stata, release 6 (Stata Corp., College Station, Texas).

Thirty-four journals met our inclusion criteria (Appendix). The median impact factor of these journals in 1996 was 4.26 (interquartile range, 2.64 to 5.74). We identified 537 articles that fulfilled our inclusion criteria (Figure), of which 169 (32%) reported adjustment for confounding or baseline differences.

Figure. Articles assessed for reporting of adjustment for confounding or baseline differences and whether adjustment was reported in the methods or results section.

Reporting of Methods
Of the 169 articles, 152 (90%) appropriately reported methods to adjust for confounding (Figure), 7 reported no method, and 10 mentioned but did not
adequately specify a method. In these 10 articles, the authors used the phrases “multiple regression” or “mul- [...]

Of articles that specified the method, multiple logistic regression analysis was the most frequently used (n = 76), followed by multiple Cox proportional hazards models (n = 43), multiple linear regression (n = 33), and other methods (n = 31), including stratified analysis, partial correlation, direct and indirect standardization, mixed-effect modeling, and age-adjusted z-score. Some papers included more than one method of adjusted analysis, but each paper was counted only once in [...]

Reporting of Variables
Of the 169 articles that reported adjustment for confounding, 154 (91%) clearly specified variables for which adjustment was made, and 93 (55%) clearly stated how variables were handled (Figure). Only 93 articles (55%) met criteria for appropriate reporting: 51 articles had one inadequacy, 17 articles had two inadequacies, and 8 articles had three inadequacies.

Table. Association between Inadequate Reporting of Adjustment for Confounders and Study Characteristics in 169 Original Research Articles
Columns: Characteristic; Studies with Inadequate Reporting, n/n (%); Relative Risk (95% CI)
Row: At least one author affiliated with department of statistics, epidemiology, or public health [...]
* Chi-square test for comparing two proportions or test for trend.

Examples of Inappropriate Reporting
A randomized, controlled trial investigated whether surfactant administered to preterm infants reduces the incidence of severe complications. The authors provided great detail to demonstrate that treatment and control groups were comparable at baseline. In a table, the authors report “baseline adjusted incidence of pulmonary complications.” However, the reader is not told why adjustment was needed to present a main end point, how this adjustment was performed, or which of the 14 or more baseline variables were included (7).

A randomized, controlled trial compared two antibiotics for the treatment of gonorrhea. The authors stated that “After correction for baseline abnormalities there was no significant difference in laboratory abnormalities.” However, they did not indicate which baseline abnormalities were meant and how this correction was [...]

A randomized, controlled trial compared balsalazide with mesalamine in patients with acute ulcerative colitis. The authors stated that “Logistic regression techniques were used to identify prognostic factors significantly associated with remission.” The reader is told which variables were finally significantly associated with the outcome but not which variables were entered into the [...]

Breathing patterns and respiratory muscle performance measures during weaning were examined in a
case series of 17 patients receiving prolonged mechanical ventilation. In the methods section, the authors state that “adjusted means were calculated for the variables presenting a group effect,” but the reader is not told how or where this calculation was performed (10).

Determinants of Inappropriate Reporting
In 64 of the 169 articles (38%), at least one author was a methodologist (that is, he or she was affiliated with a department of statistics, epidemiology, or public health). Among papers with a methodologist-author, the rate of any inappropriate reporting was about half that of papers without a methodologist-author (Table). The rate of inappropriate reporting tended to decrease as the journal impact factor increased, but this effect was largely due to a lower rate of inappropriate reporting in the journals in the highest quartile of impact factor. The number of authors was not associated with inadequate [...]

A multiple logistic regression model with any inappropriate reporting as the dependent variable and methodologist-author and impact factor (quartiles) as predictor variables showed that these two effects were largely independent (results not shown). Among the journals in the lower three quartiles of impact factor, 12 of 42 (29%) articles with a methodologist-author and 56 of 90 (62%) without a methodologist-author had inappropriate reporting. In contrast, among the journals in the top quartile of impact factor, the rate of inappropriate reporting was similar among articles with and without a methodologist-author (3 of 15 [20%] articles vs. 5 of 22 [23%] articles, respectively). However, because the number of papers was small and this split by impact factor was not prespecified, P values are not presented.

DISCUSSION
Statistical methods are often misused, and poorly presenting them leaves the reader unable to critically interpret the findings of an original research study (11, 12). Some studies use a selection procedure to reduce the number of variables for which adjustment is needed to those that are statistically significant. In such cases, authors should report all variables considered in addition to those for which adjustment was actually made.

Not reporting whether variables are treated as continuous or as categorical data may make a difference in the interpretation of the results. Transformation to fulfill the assumption of a normal distribution may or may not have been performed (13). Likewise, for categorical variables, the definition and number of categories are needed. Assuming a linear association when it is nonlinear may mask an association (14), having too few and broad categories may lead to considerable residual confounding, and having too many categories may reduce [...]

The reasons for these shortcomings may be manifold. It is possible that all the necessary information was present before peer review and was omitted during the publication process. However, errors and omissions are more likely the consequence of a system failure at many levels (17), including that of the authors, reviewers, stat- [...]

Although data analysis may be correct despite inappropriate reporting, such reporting leaves readers unable to assess whether the data were processed appropriately. Having a methodologist as an author seems to have a “protective” effect, which is in accordance with the findings of an earlier study (15). Why articles published in journals with a very high impact factor have a lower rate of inappropriate reporting remains a matter of speculation. It may relate to the fact that these journals more frequently use statistical reviewers than do lower-rank- [...]

We suggest that readers, authors, referees, and editors try to assess whether original articles state which statistical method was used to adjust for confounders, for which variables adjustment was performed, and the way in which the variables were handled in the analysis.

APPENDIX
The following journals were included in the study: American Journal of Cardiology, American Journal of Medicine, American Journal of Obstetrics and Gynecology, Anaesthesia, Anesthesiology, Annals of Internal Medicine, Archives of Dermatology, Archives of Internal Medicine, BMJ, Blood, Brain, British Journal of Anaesthesia, British Journal of Cancer, British Journal of Dermatology, British Journal of Obstetrics and Gynaecology, British Journal of Surgery, Circulation, Critical Care Medicine, Gastroenterology, Gut, JAMA, Journal of the American College of Cardiology, Journal of Gerontology, Journal of the National Cancer Institute, Journal of Pediatrics, Journal of Rheumatology, Kidney International, The Lancet, New England Journal of Medicine, Neurology, Pediatrics, Thorax, Thrombosis and Haemostasis, and Transplantation.
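As a reading aid, the stratified counts reported under Determinants of Inappropriate Reporting can be converted into relative risks. The sketch below is illustrative only. The function name `relative_risk` is hypothetical, and the confidence interval uses the standard log-scale approximation, which is an assumption here because the article does not state how its intervals were computed. The article deliberately presents no P values for this split because it was not prespecified, so the output should be read descriptively.

```python
import math


def relative_risk(events_a, total_a, events_b, total_b, z=1.96):
    """Return (RR, lower, upper) for group A versus group B.

    The interval uses the usual standard error of log(RR); this choice
    is an assumption and is not taken from the article.
    """
    rr = (events_a / total_a) / (events_b / total_b)
    se_log_rr = math.sqrt(
        1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Counts as reported in the text: articles with inappropriate reporting
# among those with a methodologist-author (group A) and without (group B).
strata = {
    "lower three quartiles": (12, 42, 56, 90),
    "top quartile": (3, 15, 5, 22),
}

results = {name: relative_risk(*counts) for name, counts in strata.items()}

for name, (rr, lower, upper) in results.items():
    print(f"{name}: RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The point estimates are about 0.46 in the lower three quartiles and 0.88 in the top quartile, which matches the article's description of an apparent methodologist effect that largely disappears in the highest-impact journals.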
From BMJ, London, and ICRF Medical Statistics Group, Oxford, [...]

Acknowledgments: The authors thank the BMJ staff, particularly Richard Smith, for providing the environment that enabled this research [...]

Requests for Single Reprints: Marcus Müllner, MD, Universitätsklinik für Notfallmedizin, Allgemeines Krankenhaus Wien, Währinger Gürtel 18-20/6D, A-1090 Vienna, Austria; e-mail, [email protected].

Current Author Addresses: Dr. Müllner: Universitätsklinik für Notfallmedizin, Allgemeines Krankenhaus Wien, Währinger Gürtel 18-20/6D, [...]
Mr. Matthews: Sandbanks, Graveney, Faversham, Kent ME13 9DJ, [...]
Dr. Altman: ICRF Medical Statistics Group, Centre for Statistics in Medicine, Institute of Health Sciences, Old Road, Headington, Oxford [...]

Author Contributions: Conception and design: M. Müllner, H. Matthews [...]
Analysis and interpretation of the data: M. Müllner, D.G. Altman.
Drafting of the article: M. Müllner, H. Matthews, D.G. Altman.
Critical revision of the article for important intellectual content: M. Müllner, H. Matthews, D.G. Altman.
Final approval of the article: M. Müllner, H. Matthews, D.G. Altman.
Statistical expertise: M. Müllner, D.G. Altman.
Administrative, technical, or logistic support: M. Müllner.
Collection and assembly of data: M. Müllner, H. Matthews.

References
1. Hill AB. The environment and disease: association or causation? Journal of the Royal Society of Medicine. 1965;58:295-300.
2. Altman DG. Adjustment for covariate imbalance. In: Armitage P, Colton T, eds. Encyclopaedia of Biostatistics. New York: Wiley; 1998:1000-5.
3. Hennekens CH, Buring JE, Mayrent SH. Epidemiology in Medicine. Boston: [...]
4. Pekkanen J, Tuomilehto J, Uutela A, Vartiainen E, Nissinen A. Social class, health behaviour, and mortality among men and women in eastern Finland. [...]
5. Uniform requirements for manuscripts submitted to biomedical journals. International Committee of Medical Journal Editors. Ann Intern Med. 1997;126:34-47. [PMID: 8992922]
6. Garfield E. How can impact factors be improved? BMJ. 1996;313:411-3.
7. Lotze A, Mitchell BR, Bulas DI, Zola EM, Shalwitz RA, Gunkel JH. Multicenter study of surfactant (beractant) use in the treatment of term infants with severe respiratory failure. Survanta in Term Infants Study Group. J Pediatr. 1998; [...]
8. Jones RB, Schwebke J, Thorpe EM Jr, Dalu ZA, Leone P, Johnson RB. Randomized trial of trovafloxacin and ofloxacin for single-dose therapy of gonorrhea. Trovafloxacin Gonorrhea Study Group. Am J Med. 1998;104:28-32.
9. Green JR, Lobo AJ, Holdsworth CD, Leicester RJ, Gibson JA, Kerr GD, et al. Balsalazide is more effective and better tolerated than mesalamine in the treatment of acute ulcerative colitis. The Abacus Investigator Group. Gastroenterology. [...]
10. Capdevila X, Perrigault PF, Ramonatxo M, Roustan JP, Peray P, d’Athis F, et al. Changes in breathing pattern and respiratory muscle performance parameters during difficult weaning. Crit Care Med. 1998;26:79-87. [PMID: 9428547]
11. Bender R, Grouven U. Logistic regression models used in medical research are poorly presented [Letter]. BMJ. 1996;313:628. [PMID: 8806274]
12. Khan KS, Chien PF, Dwarakanath LS. Logistic regression models in obstetrics and gynecology literature. Obstet Gynecol. 1999;93:1014-20. [PMID: ...]
13. Bland JM, Altman DG. Transforming data. BMJ. 1996;312:770. [PMID: ...]
14. Katz MH. Multivariable Analysis: A Practical Guide for Clinicians. New [...]
15. Altman DG, De Stavola BL, Love SB, Stepniewska KA. Review of survival analyses published in cancer journals. Br J Cancer. 1995;72:511-8. [PMID: 7640241]
16. Brenner H. A potential pitfall in control of covariates in epidemiologic studies. Epidemiology. 1998;9:68-71. [PMID: 9430271]
17. Reason J. Human error: models and management. BMJ. 2000;320:768-70. [PMID: 10720363]
18. Godlee F, Gale CR, Martyn CN. Effect on the quality of peer review of blinding reviewers and asking them to sign their reports: a randomized controlled trial. JAMA. 1998;280:237-40. [PMID: 9676667]
19. Goodman SN, Altman DG, George SL. Statistical reviewing policies of medical journals: caveat lector? J Gen Intern Med. 1998;13:753-6. [PMID: ...]

© 2002 American College of Physicians–American Society of Internal Medicine