November 2010
Peter Congdon. Bayesian Statistical Modelling. 2nd ed. Wiley; 2007. Description: A fairly technical book, but what book about Bayesian methods is not? The first chapter provides a detailed explanation of Markov Chain Monte Carlo. The remaining chapters provide some very sophisticated examples. Excerpt: “Bayesian methods combine the evidence from the data at hand with previous quantitative knowledge to analyse practical problems in a wide range of areas. The calculations were previously complex, but it is now possible to routinely apply Bayesian methods due to advances in computing technology and the use of new sampling methods for estimating parameters. Such developments together with the availability of freeware such as WINBUGS and R have facilitated a rapid growth in the use of Bayesian methods, allowing their application in many scientific disciplines, including applied statistics, public health research, medical science, the social sciences and economics. Following the success of the first edition, this reworked and updated book provides an accessible approach to Bayesian computing and analysis, with an emphasis on the principles of prior selection, identification and the interpretation of real data sets. The second edition: * Provides an integrated presentation of theory, examples, applications and computer algorithms. * Discusses the role of Markov Chain Monte Carlo methods in computing and estimation. * Includes a wide range of interdisciplinary applications, and a large selection of worked examples from the health and social sciences. * Features a comprehensive range of methodologies and modelling techniques, and examines model fitting in practice using Bayesian principles. * Provides exercises designed to help reinforce the reader’s knowledge and a supplementary website containing data sets and relevant programs. Bayesian Statistical Modelling is ideal for researchers in applied statistics, medical science, public health and the social sciences, who will benefit greatly from the examples and applications featured. The book will also appeal to graduate students of applied statistics, data analysis and Bayesian methods, and will provide a great source of reference for both researchers and students.” Peter D. Congdon. Applied Bayesian Hierarchical Methods. Chapman and Hall/CRC; 2010. Excerpt: “The use of Markov chain Monte Carlo (MCMC) methods for estimating hierarchical models involves complex data structures and is often described as a revolutionary development. An intermediate-level treatment of Bayesian hierarchical models and their applications, Applied Bayesian Hierarchical Methods demonstrates the advantages of a Bayesian approach to data sets involving inferences for collections of related units or variables and in methods where parameters can be treated as random collections. Emphasizing computational issues, the book provides examples of the following application settings: meta-analysis, data structured in space or time, multilevel and longitudinal data, multivariate data, nonlinear regression, and survival time data. For the worked examples, the text mainly employs the WinBUGS package, allowing readers to explore alternative likelihood assumptions, regression structures, and assumptions on prior densities. It also incorporates BayesX code, which is particularly useful in nonlinear regression. To demonstrate MCMC sampling from first principles, the author includes worked examples using the R package.
Through illustrative data analysis and attention to statistical computing, this book focuses on the practical implementation of Bayesian hierarchical methods. It also discusses several issues that arise when applying Bayesian techniques in hierarchical and random effects models.” Raymond J. Carroll, David Ruppert. Transformation and Weighting in Regression. 1st ed. Chapman and Hall/CRC; 1988. Description: This is a bit dated, but it has some interesting ideas, like transforming both sides of the equation to fix heteroscedasticity while still maintaining linearity. Excerpt: “This monograph provides a careful review of the major statistical techniques used to analyze regression data with nonconstant variability and skewness. The authors have developed statistical techniques–such as formal fitting methods and less formal graphical techniques–that can be applied to many problems across a range of disciplines, including pharmacokinetics, econometrics, biochemical assays, and fisheries research. While the main focus of the book is on data transformation and weighting, it also draws upon ideas from diverse fields such as influence diagnostics, robustness, bootstrapping, nonparametric data smoothing, quasi-likelihood methods, errors-in-variables, and random coefficients. The authors discuss the computation of estimates and give numerous examples using real data. The book also includes an extensive treatment of estimating variance functions in regression.”
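The transform-both-sides idea that makes the Carroll and Ruppert book interesting is easy to see with simulated data. The R sketch below is purely illustrative and is not code from the book; the multiplicative error model, the 1/x^2 weights, and all variable names are my own assumptions, chosen only to contrast weighting with transformation.

  # Illustrative sketch (not from Carroll & Ruppert): errors grow with the mean.
  set.seed(1)
  x <- runif(200, 1, 10)
  y <- 2 * x * exp(rnorm(200, sd = 0.3))    # multiplicative error, so spread increases with the mean

  fit.ols <- lm(y ~ x)                       # ordinary least squares ignores the unequal variance
  fit.wls <- lm(y ~ x, weights = 1 / x^2)    # weighting: downweight the high-variance observations
  fit.tbs <- lm(log(y) ~ log(x))             # transform both sides: same relationship, stabilized variance

  plot(fitted(fit.ols), resid(fit.ols))      # residuals fan out
  plot(fitted(fit.tbs), resid(fit.tbs))      # residuals show a roughly constant spread

On the log scale the assumed model becomes log(y) = log(2) + log(x) + error, so transforming both sides keeps the original relationship intact while removing the heteroscedasticity.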
October 2010
Mubashir Arain, Michael Campbell, Cindy Cooper, Gillian Lancaster. What is a pilot or feasibility study? A review of current practice and editorial policy. BMC Medical Research Methodology. 2010;10(1):67. Abstract: “BACKGROUND: In 2004, a review of pilot studies published in seven major medical journals during 2000-01 recommended that the statistical analysis of such studies should be either mainly descriptive or focus on sample size estimation, while results from hypothesis testing must be interpreted with caution. We revisited these journals to see whether the subsequent recommendations have changed the practice of reporting pilot studies. We also conducted a survey to identify the methodological components in registered research studies which are described as ‘pilot’ or ‘feasibility’ studies. We extended this survey to grant-awarding bodies and editors of medical journals to discover their policies regarding the function and reporting of pilot studies. METHODS: Papers from 2007-08 in seven medical journals were screened to retrieve published pilot studies. Reports of registered and completed studies on the UK Clinical Research Network (UKCRN) Portfolio database were retrieved and scrutinized. Guidance on the conduct and reporting of pilot studies was retrieved from the websites of three grant giving bodies and seven journal editors were canvassed. RESULTS: 54 pilot or feasibility studies published in 2007-8 were found, of which 26 (48%) were pilot studies of interventions and the remainder feasibility studies. The majority incorporated hypothesis-testing (81%), a control arm (69%) and a randomization procedure (62%). Most (81%) pointed towards the need for further research. Only 8 out of 90 pilot studies identified by the earlier review led to subsequent main studies. Twelve studies which were interventional pilot/feasibility studies and which included testing of some component of the research process were identified through the UKCRN Portfolio database. There was no clear distinction in use of the terms ‘pilot’ and ‘feasibility’. Five journal editors replied to our entreaty. In general they were loathe to publish studies described as ‘pilot’. CONCLUSION: Pilot studies are still poorly reported, with inappropriate emphasis on hypothesis-testing. Authors should be aware of the different requirements of pilot studies, feasibility studies and main studies and report them appropriately. Authors should be explicit as to the purpose of a pilot study. The definitions of feasibility and pilot studies vary and we make proposals here to clarify terminology.” [Accessed October 25, 2010]. Available at: http://www.biomedcentral.com/1471-2288/10/67.
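A small sketch of the kind of analysis Arain et al. favor for pilot studies, namely description and estimation rather than hypothesis testing. All of the numbers below are invented, and the 5-point difference and 80% power target in the follow-on calculation are assumptions for illustration, not recommendations from the paper.

  consented <- 18; approached <- 30            # hypothetical recruitment figures from a pilot
  binom.test(consented, approached)$conf.int   # exact 95% CI for the consent rate: describe, don't test

  pilot.sd <- 12                               # outcome SD observed in the pilot (made up)
  power.t.test(delta = 5, sd = pilot.sd, power = 0.80, sig.level = 0.05)   # size the later main trial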
September 2010
R B D’Agostino. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med. 1998;17(19):2265-2281. Abstract: “In observational studies, investigators have no control over the treatment assignment. The treated and non-treated (that is, control) groups may have large differences on their observed covariates, and these differences can lead to biased estimates of treatment effects. Even traditional covariance analysis adjustments may be inadequate to eliminate this bias. The propensity score, defined as the conditional probability of being treated given the covariates, can be used to balance the covariates in the two groups, and therefore reduce this bias. In order to estimate the propensity score, one must model the distribution of the treatment indicator variable given the observed covariates. Once estimated the propensity score can be used to reduce bias through matching, stratification (subclassification), regression adjustment, or some combination of all three. In this tutorial we discuss the uses of propensity score methods for bias reduction, give references to the literature and illustrate the uses through applied examples.” [Accessed October 11, 2010]. Available at: http://www.ncbi.nlm.nih.gov/pubmed/9802183. M M Joffe, P R Rosenbaum. Invited commentary: propensity scores. Am. J. Epidemiol. 1999;150(4):327-333. Abstract: “The propensity score is the conditional probability of exposure to a treatment given observed covariates. In a cohort study, matching or stratifying treated and control subjects on a single variable, the propensity score, tends to balance all of the observed covariates; however, unlike random assignment of treatments, the propensity score may not also balance unobserved covariates. The authors review the uses and limitations of propensity scores and provide a brief outline of associated statistical theory. They also present a new result of using propensity scores in case-cohort studies.” [Accessed October 11, 2010]. Available at: http://stat.wharton.upenn.edu/~rosenbap/AJEpropen.pdf. Kevin L. Delucchi. Sample Size Estimation in Research With Dependent Measures and Dichotomous Outcomes. Am J Public Health. 2004;94(3):372-377. Abstract: “I reviewed sample estimation methods for research designs involving nonindependent data and a dichotomous response variable to examine the importance of proper sample size estimation and the need to align methods of sample size estimation with planned methods of statistical analysis. Examples and references to published literature are provided in this article. When the method of sample size estimation is not in concert with the method of planned analysis, poor estimates may result. The effects of multiple measures over time also need to be considered. Proper sample size estimation is often overlooked. Alignment of the sample size estimation method with the planned analysis method, especially in studies involving nonindependent data, will produce appropriate estimates.” Available at: http://ajph.aphapublications.org/cgi/content/full/94/3/372. Patrick Vandewalle, Jelena Kovacevic, Martin Vetterli. Reproducible Research. Excerpt: “Welcome on this site about reproducible research. This site is intended to gather a lot of information and useful links about reproducible research. As the authors (Patrick Vandewalle, Jelena Kovacevic and Martin Vetterli) are all doing research in signal/image processing, that will also be the main focus of this site.” [Accessed October 5, 2010]. 
Available at: http://reproducibleresearch.net. A Caveman. The invited review - or, my field, from my standpoint, written by me using only my data and my ideas, and citing only my publications. J Cell Sci. 2000;113(18):3125-3126. Comment: The title is better than any summary I could write. [Accessed September 27, 2010]. Available at: http://jcs.biologists.org/cgi/content/abstract/113/18/3125. Susan A. Peters. Engaging with the Art and Science of Statistics. Mathematics Teacher. 2010;103(7):496. Abstract: “Statistics uses scientific tools but also requires the art of flexible and creative reasoning.” [Accessed September 24, 2010]. Available at: http://www.nctm.org/eresources/view_media.asp?article_id=9145. Gillian D. Sanders, Lurdes Inoue, Gregory Samsa, Shalini Kulasingam, David Matchar. Use of Bayesian Techniques in Randomized Clinical Trials: A CMS Case Study. Excerpt: “We provide a basic tutorial on Bayesian statistics and the possible uses of such statistics in clinical trial design and analysis. We conducted a synthesis of existing published research focusing on how Bayesian techniques can modify inferences that affect policy-level decisionmaking. Noting that subgroup analysis is a particularly fruitful application of Bayesian methodology, and an area of particular interest to CMS, we focused our efforts there rather on the design of such trials. We used simulation studies and a case study of patient-level data from eight trials to explore Bayesian techniques in the CMS decisional context in the clinical domain of the prevention of sudden cardiac death and the use of the implantable cardioverter defibrillator (ICD). We combined knowledge gained through the literature review, simulation studies, and the case study to provide findings concerning the use of Bayesian approaches specific to the CMS context.” [Accessed September 24, 2010]. Available at: http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/use_of_bayesian.html. Office for Human Research Protections, U.S. Department of Health & Human Services. Quality Improvement Activities Frequently Asked Questions. Excerpt: “Protecting human subjects during research activities is critical and has been at the forefront of HHS activities for decades. In addition, HHS is committed to taking every appropriate opportunity to measure and improve the quality of care for patients. These two important goals typically do not intersect, since most quality improvement efforts are not research subject to the HHS protection of human subjects regulations. However, in some cases quality improvement activities are designed to accomplish a research purpose as well as the purpose of improving the quality of care, and in these cases the regulations for the protection of subjects in research (45 CFR part 46) may apply.” [Accessed September 24, 2010]. Available at: http://www.hhs.gov/ohrp/qualityfaq.html. R J Lilford, D Braunholtz. For Debate: The statistical basis of public policy: a paradigm shift is overdue. BMJ. 1996;313(7057):603 -607. Excerpt: “The recent controversy over the increased risk of venous thrombosis with third generation oral contraceptives illustrates the public policy dilemma that can be created by relying on conventional statistical tests and estimates: case-control studies showed a significant increase in risk and forced a decision either to warn or not to warn. 
Conventional statistical tests are an improper basis for such decisions because they dichotomise results according to whether they are or are not significant and do not allow decision makers to take explicit account of additional evidence—for example, of biological plausibility or of biases in the studies. A Bayesian approach overcomes both these problems. A Bayesian analysis starts with a ‘prior’ probability distribution for the value of interest (for example, a true relative risk)—based on previous knowledge—and adds the new evidence (via a model) to produce a ‘posterior’ probability distribution. Because different experts will have different prior beliefs sensitivity analyses are important to assess the effects on the posterior distributions of these differences. Sensitivity analyses should also examine the effects of different assumptions about biases and about the model which links the data with the value of interest. One advantage of this method is that it allows such assumptions to be handled openly and explicitly. Data presented as a series of posterior probability distributions would be a much better guide to policy, reflecting the reality that degrees of belief are often continuous, not dichotomous, and often vary from one person to another in the face of inconclusive evidence.” [Accessed September 24, 2010]. Available at: http://www.bmj.com/content/313/7057/603.short. Laurence Freedman. Bayesian statistical methods. BMJ. 1996;313(7057):569-570. Excerpt: “In this week’s BMJ, Lilford and Braunholtz (p 603) explain the basis of Bayesian statistical theory.1 They explore its use in evaluating evidence from medical research and incorporating such evidence into policy decisions about public health. When drawing inferences from statistical data, Bayesian theory is an alternative to the frequentist theory that has predominated in medical research over the past half century.” [Accessed September 24, 2010]. Available at: http://www.bmj.com/content/313/7057/569.short. U.S. Food and Drug Administration. Guidance for Sponsors, Clinical Investigators, and IRBs: Data Retention When Subjects Withdraw from FDA-Regulated Clinical Trials. Excerpt: “This guidance is intended for sponsors, clinical investigators and institutional review boards (IRBs). It describes the Food and Drug Administration’s (FDA) longstanding policy that already-accrued data, relating to individuals who cease participating in a study, are to be maintained as part of the study data. This pertains to data from individuals who decide to discontinue participation in a study, who are withdrawn by their legally authorized representative, as applicable, or who are discontinued from participation by the clinical investigator. This policy is supported by the statutes and regulations administered by FDA as well as ethical and quality standards applicable to clinical research. Maintenance of these records includes, as with all study records, safeguarding the privacy and confidentiality of the subject’s information.” [Accessed September 22, 2010]. Available at: http://www.fda.gov/downloads/RegulatoryInformation/Guidances/UCM126489.pdf. Office for Human Research Protections. Guidance on Withdrawal of Subjects from Research: Data Retention and Other Related Issues. Excerpt: “This document applies to non-exempt human subjects research conducted or supported by HHS.
It clarifies that when a subject chooses to withdraw from (i.e., discontinue his or her participation in) an ongoing research study, or when an investigator terminates a subject’s participation in such a research study without regard to the subject’s consent, the investigator may retain and analyze already collected data relating to that subject, even if that data includes identifiable private information about the subject. For HHS-conducted or supported research that is regulated by the Food and Drug Administration (FDA), FDA’s guidance on this issue also should be consulted.” [Accessed September 22, 2010]. Available at: http://www.hhs.gov/ohrp/policy/subjectwithdrawal.html. R. L. Glass. A letter from the frustrated author of a journal paper. Journal of Systems and Software. 2000;54(1):1. Excerpt: “Editor’s Note: It seems appropriate, in this issue of JSS containing the findings of our annual Top Scholars/Institutions study, to pay tribute to the persistent authors who make a journal like this, and a study like that, possible. In their honor, we dedicate the following humorous, anonymously-authored, letter!” [Accessed September 22, 2010]. Available at: http://dx.doi.org/10.1016/S0164-1212(00)00020-0. Amy Harmon. New Drugs Stir Debate on Rules of Clinical Trials. The New York Times. 2010. Excerpt: “Controlled trials have for decades been considered essential for proving a drug’s value before it can go to market. But the continuing trial of the melanoma drug, PLX4032, has ignited an anguished debate among oncologists about whether a controlled trial that measures a drug’s impact on extending life is still the best method for evaluating hundreds of genetically targeted cancer drugs being developed.” [Accessed September 20, 2010]. Available at: http://www.nytimes.com/2010/09/19/health/research/19trial.html. Springer, PlanetMath. StatProb: The Encyclopedia Sponsored by Statistics and Probability Societies. Excerpt: “StatProb: The Encyclopedia Sponsored by Statistics and Probability Societies combines the advantages of traditional wikis (rapid and up-to-date publication, user-generated development, hyperlinking, and a saved history) with traditional publishing (quality assurance, review, credit to authors, and a structured information display). All contributions have been approved by an editorial board determined by leading statistical societies; the editorial board members are listed on the About page. All encyclopedia entries are written in LaTeX. All of the entries are automatically cross-referenced and the entire corpus is kept updated in real-time. Anyone can view articles. To submit a new article or propose a change in an existing article, you must create an account. It takes only a minute, so sign up!” [Accessed September 15, 2010]. Available at: http://statprob.com/. Isaac Asimov. The Relativity of Wrong. Originally published in The Skeptical Inquirer, Vol. 14 No. 1, Fall 1989, pages 35-44. Excerpt: “I received a letter the other day. It was handwritten in crabbed penmanship so that it was very difficult to read. Nevertheless, I tried to make it out just in case it might prove to be important. In the first sentence, the writer told me he was majoring in English literature, but felt he needed to teach me science. (I sighed a bit, for I knew very few English Lit majors who are equipped to teach me science, but I am very aware of the vast state of my ignorance and I am prepared to learn as much as I can from anyone, so I read on.)” [Accessed September 13, 2010].
Available at: http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm. Committee for Medicinal Products for Human Use. Guideline on the choice of the non-inferiority margin. Excerpt: “Many clinical trials comparing a test product with an active comparator are designed as non-inferiority trials. The term ‘non-inferiority’ is now well established, but if taken literally could be misleading. The objective of a non-inferiority trial is sometimes stated as being to demonstrate that the test product is not inferior to the comparator. However, only a superiority trial can demonstrate this. In fact a noninferiority trial aims to demonstrate that the test product is not worse than the comparator by more than a pre-specified, small amount. This amount is known as the non-inferiority margin, or delta (Δ).” [Accessed September 13, 2010]. Available at: http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500003636.pdf. Committee for Proprietary Medicinal Products. Points to consider on switching between superiority and non-inferiority. Br J Clin Pharmacol. 2001;52(3):223-228. Excerpt: “A number of recent applications have led to CPMP discussions concerning the interpretation of superiority, noninferiority and equivalence trials. These issues are covered in ICH E9 (Statistical Principles for Clinical Trials). There is further relevant material in the Step 2 draft of ICH E10 (Choice of Control Group) and in the CPMP Note for Guidance on the Investigation of Bioavailability and Bioequivalence. However, the guidelines do not address some specific difficulties that have arisen in practice. In broad terms, these difficulties relate to switching from one design objective to another at the time of analysis.” Available at: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2014556/. Sage Foundation. Sage: Open Source Mathematics Software. Abstract: “Sage is a free open-source mathematics software system licensed under the GPL. It combines the power of many existing open-source packages into a common Python-based interface. Mission: Creating a viable free open source alternative to Magma, Maple, Mathematica and Matlab.” [Accessed September 8, 2010]. Available at: http://www.sagemath.org/. Alex Zolot. useR! 2010: Work with R on Amazon’s Cloud. Abstract: “Usage of R is often constrained by available memory and/or cpu power. Cloud computing allows users to get as much resources as necessary in any specific moment. The tutorial will cover software tools and procedures that are useful to manage R applications on Amazon’s Elastic Compute Cloud (EC2) and Simple Storage Service (S3) cloud services.” [Accessed September 8, 2010]. Available at: http://user2010.org/tutorials/Zolot.html.
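The non-inferiority margin described in the Committee for Medicinal Products for Human Use guideline earlier in this section translates into a simple decision rule: conclude non-inferiority only if the confidence interval for the treatment difference stays above -Δ. A minimal R sketch follows; the counts and the 0.10 margin are invented for illustration, and the guideline itself stresses that the margin must be justified for each individual trial.

  delta <- 0.10                              # assumed non-inferiority margin on the risk-difference scale
  test.events <- 140; test.n <- 160          # hypothetical successes on the test product
  ctrl.events <- 142; ctrl.n <- 160          # hypothetical successes on the active comparator

  ci <- prop.test(c(test.events, ctrl.events), c(test.n, ctrl.n))$conf.int
  ci                                         # 95% CI for the difference (test minus comparator)
  ci[1] > -delta                             # TRUE would support a claim of non-inferiority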
August 2010
Iain Hrynaszkiewicz. A call for BMC Research Notes contributions promoting best practice in data standardization, sharing and publication. BMC Research Notes. 2010;3(1):235. Abstract: “BMC Research Notes aims to ensure that data files underlying published articles are made available in standard, reusable formats, and the journal is calling for contributions from the scientific community to achieve this goal. Educational Data Notes included in this special series should describe a domain-specific data standard and provide an example data set with the article, or a link to data that are permanently hosted elsewhere. The contributions should also provide some evidence of the data standard’s application and preparation guidance that could be used by others wishing to conduct similar experiments. The journal is also keen to receive contributions on broader aspects of scientific data sharing, archiving, and open data.” [Accessed September 3, 2010]. Available at: http://www.biomedcentral.com/content/3/1/235. Iain Hrynaszkiewicz. BMC Research Notes – adding value to your data. Posted on the BioMed Central Blog, Thursday, September 2, 2010. Excerpt: “Support for scientific data sharing is gathering more and more support in 2010, so rather than ‘why share data?’ the question now is ‘how?’. Making data available in readily interpretable formats is vital to realising its value in driving new knowledge discovery, and BMC Research Notes today launches a new initiative aimed at promoting best practice in sharing and publishing data, with a focus on standardized, re-useable formats.” [Accessed September 3, 2010]. Available at: http://blogs.openaccesscentral.com/blogs/bmcblog/entry/bmc_research_notes_wants_your. Celia Brown, Richard Lilford. The stepped wedge trial design: a systematic review. BMC Medical Research Methodology. 2006;6(1):54. Abstract: “BACKGROUND: Stepped wedge randomised trial designs involve sequential roll-out of an intervention to participants (individuals or clusters) over a number of time periods. By the end of the study, all participants will have received the intervention, although the order in which participants receive the intervention is determined at random. The design is particularly relevant where it is predicted that the intervention will do more good than harm (making a parallel design, in which certain participants do not receive the intervention unethical) and/or where, for logistical, practical or financial reasons, it is impossible to deliver the intervention simultaneously to all participants. Stepped wedge designs offer a number of opportunities for data analysis, particularly for modelling the effect of time on the effectiveness of an intervention. This paper presents a review of 12 studies (or protocols) that use (or plan to use) a stepped wedge design. One aim of the review is to highlight the potential for the stepped wedge design, given its infrequent use to date. METHODS: Comprehensive literature review of studies or protocols using a stepped wedge design. Data were extracted from the studies in three categories for subsequent consideration: study information (epidemiology, intervention, number of participants), reasons for using a stepped wedge design and methods of data analysis. RESULTS: The 12 studies included in this review describe evaluations of a wide range of interventions, across different diseases in different settings.
However the stepped wedge design appears to have found a niche for evaluating interventions in developing countries, specifically those concerned with HIV. There were few consistent motivations for employing a stepped wedge design or methods of data analysis across studies. The methodological descriptions of stepped wedge studies, including methods of randomisation, sample size calculations and methods of analysis, are not always complete. CONCLUSION: While the stepped wedge design offers a number of opportunities for use in future evaluations, a more consistent approach to reporting and data analysis is required.” [Accessed September 1, 2010]. Available at: http://www.biomedcentral.com/1471-2288/6/54. Michael A Hussey, James P Hughes. Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials. 2007;28(2):182-191. Abstract: “Cluster randomized trials (CRT) are often used to evaluate therapies or interventions in situations where individual randomization is not possible or not desirable for logistic, financial or ethical reasons. While a significant and rapidly growing body of literature exists on CRTs utilizing a “parallel” design (i.e. I clusters randomized to each treatment), only a few examples of CRTs using crossover designs have been described. In this article we discuss the design and analysis of a particular type of crossover CRT - the stepped wedge - and provide an example of its use.” [Accessed September 1, 2010]. Available at: http://faculty.washington.edu/peterg/Vaccine2006/articles/HusseyHughes.2007.pdf. Keith A. McGuinness. Of rowing boats, ocean liners and tests of the ANOVA homogeneity of variance assumption. Austral Ecology. 2008;27(6):681-688. Abstract: “One of the assumptions of analysis of variance (ANOVA) is that the variances of the groups being compared are approximately equal. This assumption is routinely checked before doing an analysis, although some workers consider ANOVA robust and do not bother and others avoid parametric procedures entirely. Two of the more commonly used heterogeneity tests are Bartlett’s and Cochran’s, although, as for most of these tests, they may well be more sensitive to violations of the ANOVA assumptions than is ANOVA itself. Simulations were used to examine how well these two tests protected ANOVA against the problems created by variance heterogeneity. Although Cochran’s test performed a little better than Bartlett’s, both tests performed poorly, frequently disallowing perfectly valid analyses. Recommendations are made about how to proceed, given these results.” [Accessed August 19, 2010]. Available at: http://onlinelibrary.wiley.com/doi/10.1111/j.1442-9993.2002.tb00217.x/abstract. Patricia Keith-Spiegel, Joan Sieber, Gerald P. Koocher. Responding to Research Wrongdoing : A User Friendly Guide. Excerpt: “Every once in awhile a product comes along that is destined to make a difference. This Guide is such a product. Informed by data generated through surveys and interviews involving more than 2,000 scientists, the Guide gives voice to those researchers willing, some with eagerness and others with relief, to share their stories publicly in their own words. There are stories from scientists who want to do the right thing, but are unsure how to go about it or concerned about negative consequences for them or their junior colleagues. There are accounts from researchers who took action, and are keen to share their successful strategies with others. 
On the flip side, there are those who hesitated and now lament not having guidance that might have altered the course of past events.” [Accessed August 14, 2010]. Available at: http://www.ethicsresearch.com/images/RRW_7-17-10.pdf. Gerald P. Koocher, Patricia Keith-Spiegel. Peers nip misconduct in the bud. Nature. 2010;466(7305):438-440. Excerpt: “What do researchers do when they suspect a colleague of cutting corners, not declaring a conflict of interest, neglecting proper oversight of research assistants or ‘cooking’ data? In one study1, almost all said that they would personally intervene if they viewed an act as unethical, especially if it seemed minor and the offender had no history of infractions.” [Accessed August 14, 2010]. Available at: http://www.ethicsresearch.com/images/Nature_Opinion_-_Koocher_Keith-Spiegel.pdf. M. Castillo. Authorship and Bylines. American Journal of Neuroradiology. 2009;30(8):1455-1456. Excerpt: “From the ancient Greeks to Shakespeare, the question of authorship often arises. The issue of appropriate article authorship has always been of special interest to editors of scientific journals. In the biomedical sciences, as the complexity and funding of published studies increases, so does the length of the byline. Although a previous American Journal of Neuroradiology Editor-in-Chief already addressed this issue, I think it is time to revisit it.1 From my own experience, articles can be categorized according to the number of authors as follows: fewer than 2 authors (Editorials, Commentaries, Letters), fewer than 5 authors (Case Reports and Technical Notes), 5–10 authors (retrospective full-length articles), 10–15 (prospective, often grant-funded articles), more than 15 authors (reports of task forces, white papers, etc). Among so many authors, it is not uncommon to find individuals whose contributions are minimal and many times questionable. Who actually did enough work to be listed as an author? In other words, who can claim ownership rights in a particular intellectual property?” [Accessed August 14, 2010]. Available at: http://www.ajnr.org/cgi/reprint/ajnr.A1636v1.pdf. R A Parker. Estimating the value of an internal biostatistical consulting service. Stat Med. 2000;19(16):2131-2145. Abstract: “Biostatistical consulting is a service business. Although a consulting biostatistician's goal is long-term collaborative relationships with investigators, this is the same as the long-term goal of any business: having a group of contented, satisfied customers. In this era of constrained resources, we must be able to demonstrate that the benefit a biostatistical consulting group provides to its organization exceeds its actual cost to the institution. In this paper, I provide both a theoretical framework for assessing the value of a biostatistical service and provide an ad hoc method to value the contribution of a biostatistical service to a grant. Using the methods described, our biostatistics group returns more than $6 for each dollar spent on institutional support in 1998.” [Accessed August 14, 2010]. Available at: http://www.ncbi.nlm.nih.gov/pubmed/10931516. Richard Horton, Richard Smith. Time to redefine authorship. BMJ. 1996;312(7033):723. Excerpt: “Physicists do it by the hundred; scientists do it in groups; fiction writers mostly alone. And medical researchers?
Rarely now do they write papers alone, and the number of authors on papers is increasing steadily.1 Under pressure from molecular biologists, the National Library of Medicine in Washington is planning to list not just the first six authors in Index Medicus but the first 24 plus the last author.2 Notions of authorship are clearly in the eye of the beholder, and many authors on modern papers seem to have made only a minimal contribution.3 4 5 Few authors on modern multidisciplinary medical papers fit the 19th century notion of taking full responsibility for every word and thought included, and yet the cumbersome definition of authorship produced by the International Committee of Medical Journal Editors (the Vancouver Group) is based on that concept.6 The definition produced by editors seems to be out of touch with what is happening in the real world of research, and researchers and editors need to consider a new definition. The BMJ, Lancet, University of Nottingham, and Locknet (a network to encourage research into peer review7) are therefore organising a one day meeting on 6 June in Nottingham to consider the need for a new definition. All the members of the Vancouver Group will be there, and everybody is welcome.” [Accessed August 14, 2010]. Available at: http://www.bmj.com/cgi/content/full/312/7033/723. R A Parker, N G Berman. Criteria for authorship for statisticians in medical papers. Stat Med. 1998;17(20):2289-2299. We organize a statistician’s potential scientific and intellectual contributions to a medical study into three types of activities relating to design, implementation and analysis. For each type, we describe high-level, mid-level and low-level contributions. Using this framework, we develop a point system to assess whether authorship is justified. Although we recommend discussion and resolution of authorship issues early in the course of any project, our system is especially useful when this has not been done. [Accessed August 14, 2010]. Available at: http://www.ncbi.nlm.nih.gov/pubmed/9819828. LiquidPub. Liquid Publications: Scientific Publications meet the Web. Excerpt: “The LiquidPub project proposes a paradigm shift in the way scientific knowledge is created, disseminated, evaluated and maintained. This shift is enabled by the notion of Liquid Publications, which are evolutionary, collaborative, and composable scientific contributions. Many Liquid Publication concepts are based on a parallel between scientific knowledge artifacts and software artifacts, and hence on lessons learned in (agile, collaborative, open source) software development, as well as on lessons learned from Web 2.0 in terms of collaborative evaluation of knowledge artifacts.” [Accessed August 10, 2010]. Available at: http://project.liquidpub.org/.
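Simulations of the kind run in the McGuinness article earlier in this section are easy to set up in R. The sketch below is a toy version written for this page, not his code: it checks how often Bartlett's test rejects when the group variances really are equal, and computes Cochran's C by hand because base R supplies only Bartlett's test. The group sizes, number of simulations, and 0.05 cutoff are arbitrary choices.

  set.seed(42)
  k <- 4; n <- 10                                    # four groups of ten, equal variances (null is true)
  reject <- replicate(2000, {
    y <- rnorm(k * n)
    g <- gl(k, n)
    bartlett.test(y, g)$p.value < 0.05               # does Bartlett's test flag heterogeneity?
  })
  mean(reject)                                       # should sit near the nominal 0.05

  y <- rnorm(k * n); g <- gl(k, n)                   # one data set for Cochran's C
  v <- tapply(y, g, var)
  max(v) / sum(v)                                    # Cochran's C: largest variance over the sum of variances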
July 2010
Anup Malani, Tomas J. Philipson. Push for more trials may hurt patients. Washington Examiner. 2010. Excerpt: “U.S. pharmaceutical companies are increasingly going abroad to conduct clinical trials required by the FDA. Recently, the Department of Health and Human Services released a report suggesting that the FDA lacks the resources to adequately monitor these foreign trials. Four of every five new drugs sold in the U.S. are tested in foreign trials, and the FDA inspects less than one in 10 of these. This is half the rate of inspection for domestic trials.” [Accessed July 27, 2010]. Available at: http://www.washingtonexaminer.com/opinion/columns/Push-for-more-clinical-trials-may-hurt-patients-1002114-98875969.html. H. Gilbert Welch, Lisa M. Schwartz, Steven Woloshin. The exaggerated relations between diet, body weight and mortality: the case for a categorical data approach. CMAJ. 2005;172(7):891-895. Excerpt: “Multivariate analysis has become a major statistical tool for medical research. It is most commonly used for adjustment — the process of correcting the main effect for multiple variables that confound the relation between exposure and outcome in an observational study. Any apparent relation between estrogen replacement and dementia, for example, should be adjusted for socioeconomic status, a variable that is known to relate both to access (and thus the likelihood of having received estrogen) and to measures of cognitive function (and thus the likelihood of being diagnosed with dementia). The capacity to account for numerous variables (e.g., income, education and insurance status) simultaneously constitutes a major advance in the ability of researchers to estimate the true effect of the exposure of interest. But this advance has come at a cost: the actual relation between exposure and outcome is increasingly opaque to readers, researchers and editors alike.” [Accessed July 26, 2010]. Available at: http://www.ecmaj.com/cgi/content/full/172/7/891. Phil Ender. Centering (ED230B/C). Excerpt: “Centering a variable involves subtracting the mean from each of the scores, that is, creating deviation scores. Centering can be done two ways; 1) centering using the grand mean and 2) centering using group means, which is also known as context centering.” [Accessed July 26, 2010]. Available at: http://www.gseis.ucla.edu/courses/ed230bc1/notes4/center.html. MediciGlobal. L2FU - Lost to Follow Up. Excerpt: “Patient drop outs in a clinical trial costs your company money. It can cost you the integrity of your study too! If it’s important to recover patients lost from your clinical trial, you’ve come to the right place. Here, you’ll read how L2FU’s services can help you and how to begin finding patients today!” [Accessed July 26, 2010]. Available at: http://www.l2fu.com. Steve Miller. Biostatistics, Open Source and BI – an Interview with Frank Harrell. Description: This article, published in Information Management Online, February 25, 2009, offers a nice interview with Frank Harrell, a leading proponent of modern statistical methods. Excerpt: “My correspondence with Frank provided the opportunity to ask him to do an interview for the OpenBI Forum. He graciously accepted, turning around deft responses to my sometimes ponderous questions in very short order. What follows is text for our questions and answer session. I trust that readers will learn as much from Frank’s responses as I did.” [Accessed July 19, 2010]. Available at: http://www.information-management.com/news/10015023-1.html.
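The two forms of centering in the Ender notes above take one line each in R. The sketch below uses a made-up data frame purely for illustration; "context centering" here just means subtracting each group's own mean rather than the overall mean.

  set.seed(7)
  d <- data.frame(group = gl(3, 20), x = rnorm(60, mean = 50, sd = 10))   # made-up grouped data

  d$x.grand <- d$x - mean(d$x)                        # grand-mean centering
  d$x.group <- d$x - ave(d$x, d$group, FUN = mean)    # group-mean (context) centering

  tapply(d$x.group, d$group, mean)                    # each group mean is now (numerically) zero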
Karyn Heavner, Carl Phillips, Igor Burstyn, Warren Hare. Dichotomization: 2 x 2 (x2 x 2 x 2…) categories: infinite possibilities. BMC Medical Research Methodology. 2010;10(1):59. Abstract: “BACKGROUND: Consumers of epidemiology may prefer to have one measure of risk arising from analysis of a 2-by-2 table. However, reporting a single measure of association, such as one odds ratio (OR) and 95% confidence interval, from a continuous exposure variable that was dichotomized withholds much potentially useful information. Results of this type of analysis are often reported for one such dichotomization, as if no other cutoffs were investigated or even possible. METHODS: This analysis demonstrates the effect of using different theory and data driven cutoffs on the relationship between body mass index and high cholesterol using National Health and Nutrition Examination Survey data. The recommended analytic approach, presentation of a graph of ORs for a range of cutoffs, is the focus of most of the results and discussion. RESULTS: These cutoff variations resulted in ORs between 1.1 and 1.9. This allows investigators to select a result that either strongly supports or provides negligible support for an association; a choice that is invisible to readers. The OR curve presents readers with more information about the exposure disease relationship than a single OR and 95% confidence interval. CONCLUSION: As well as offering results for additional cutoffs that may be of interest to readers, the OR curve provides an indication of whether the study focuses on a reasonable representation of the data or outlier results. It offers more information about trends in the association as the cutoff changes and the implications of random fluctuations than a single OR and 95% confidence interval.” [Accessed July 19, 2010]. Available at: http://www.biomedcentral.com/1471-2288/10/59. Chris Corcoran, Louise Ryan, Pralay Senchaudhuri, et al. An Exact Trend Test for Correlated Binary Data. Biometrics. 2001;57(3):941-948. Abstract: “The problem of testing a dose-response relationship in the presence of exchangeably correlated binary data has been addressed using a variety of models. Most commonly used approaches are derived from likelihood or generalized estimating equations and rely on large-sample theory to justify their inferences. However, while earlier work has determined that these methods may perform poorly for small or sparse samples, there are few alternatives available to those faced with such data. We propose an exact trend test for exchangeably correlated binary data when groups of correlated observations are ordered. This exact approach is based on an exponential model derived by Molenberghs and Ryan (1999) and Ryan and Molenberghs (1999) and provides natural analogues to Fisher’s exact test and the binomial trend test when the data are correlated. We use a graphical method with which one can efficiently compute the exact tail distribution and apply the test to two examples.” [Accessed July 16, 2010]. Available at: http://dx.doi.org/10.1111/j.0006-341X.2001.00941.x. Casey Olives, Marcello Pagano. Bayes-LQAS: classifying the prevalence of global acute malnutrition. Emerging Themes in Epidemiology. 2010;7(1):3. Abstract: “Lot Quality Assurance Sampling (LQAS) applications in health have generally relied on frequentist interpretations for statistical validity. Yet health professionals often seek statements about the probability distribution of unknown parameters to answer questions of interest. 
The frequentist paradigm does not pretend to yield such information, although a Bayesian formulation might. This is the source of an error made in a recent paper published in this journal. Many applications lend themselves to a Bayesian treatment, and would benefit from such considerations in their design. We discuss Bayes-LQAS (B-LQAS), which allows for incorporation of prior information into the LQAS classification procedure, and thus shows how to correct the aforementioned error. Further, we pay special attention to the formulation of Bayes Operating Characteristic Curves and the use of prior information to improve survey designs. As a motivating example, we discuss the classification of Global Acute Malnutrition prevalence and draw parallels between the Bayes and classical classifications schemes. We also illustrate the impact of informative and non-informative priors on the survey design. Results indicate that using a Bayesian approach allows the incorporation of expert information and/or historical data and is thus potentially a valuable tool for making accurate and precise classifications.” [Accessed July 16, 2010]. Available at: http://www.ete-online.com/content/7/1/3. Sylvia Sudat, Elizabeth Carlton, Edmund Seto, Robert Spear, Alan Hubbard. Using variable importance measures from causal inference to rank risk factors of schistosomiasis infection in a rural setting in China. Epidemiologic Perspectives & Innovations. 2010;7(1):3. Abstract: “BACKGROUND: Schistosomiasis infection, contracted through contact with contaminated water, is a global public health concern. In this paper we analyze data from a retrospective study reporting water contact and schistosomiasis infection status among 1011 individuals in rural China. We present semi-parametric methods for identifying risk factors through a comparison of three analysis approaches: a prediction-focused machine learning algorithm, a simple main-effects multivariable regression, and a semi-parametric variable importance (VI) estimate inspired by a causal population intervention parameter. RESULTS: The multivariable regression found only tool washing to be associated with the outcome, with a relative risk of 1.03 and a 95% confidence interval (CI) of 1.01-1.05. Three types of water contact were found to be associated with the outcome in the semi-parametric VI analysis: July water contact (VI estimate 0.16, 95% CI 0.11-0.22), water contact from tool washing (VI estimate 0.88, 95% CI 0.80-0.97), and water contact from rice planting (VI estimate 0.71, 95% CI 0.53-0.96). The July VI result, in particular, indicated a strong association with infection status - its causal interpretation implies that eliminating water contact in July would reduce the prevalence of schistosomiasis in our study population by 84%, or from 0.3 to 0.05 (95% CI 78%-89%). CONCLUSIONS: The July VI estimate suggests possible within-season variability in schistosomiasis infection risk, an association not detected by the regression analysis. Though there are many limitations to this study that temper the potential for causal interpretations, if a high-risk time period could be detected in something close to real time, new prevention options would be opened. Most importantly, we emphasize that traditional regression approaches are usually based on arbitrary pre-specified models, making their parameters difficult to interpret in the context of real-world applications. 
Our results support the practical application of analysis approaches that, in contrast, do not require arbitrary model pre-specification, estimate parameters that have simple public health interpretations, and apply inference that considers model selection as a source of variation.” [Accessed July 16, 2010]. Available at: http://www.epi-perspectives.com/content/7/1/3. C. Elizabeth McCarron, Eleanor Pullenayegum, Lehana Thabane, Ron Goeree, Jean-Eric Tarride. The importance of adjusting for potential confounders in Bayesian hierarchical models synthesising evidence from randomised and non-randomised studies: an application comparing treatments for abdominal aortic aneurysms. BMC Medical Research Methodology. 2010;10(1):64. Abstract: “BACKGROUND: Informing health care decision making may necessitate the synthesis of evidence from different study designs (e.g., randomised controlled trials, non-randomised/observational studies). Methods for synthesising different types of studies have been proposed, but their routine use requires development of approaches to adjust for potential biases, especially among non-randomised studies. The objective of this study was to extend a published Bayesian hierarchical model to adjust for bias due to confounding in synthesising evidence from studies with different designs. METHODS: In this new methodological approach, study estimates were adjusted for potential confounders using differences in patient characteristics (e.g., age) between study arms. The new model was applied to synthesise evidence from randomised and non-randomised studies from a published review comparing treatments for abdominal aortic aneurysms. We compared the results of the Bayesian hierarchical model adjusted for differences in study arms with: 1) unadjusted results, 2) results adjusted using aggregate study values and 3) two methods for downweighting the potentially biased non-randomised studies. Sensitivity of the results to alternative prior distributions and the inclusion of additional covariates were also assessed. RESULTS: In the base case analysis, the estimated odds ratio was 0.32 (0.13,0.76) for the randomised studies alone and 0.57 (0.41,0.82) for the non-randomised studies alone. The unadjusted result for the two types combined was 0.49 (0.21,0.98). Adjusted for differences between study arms, the estimated odds ratio was 0.37 (0.17,0.77), representing a shift towards the estimate for the randomised studies alone. Adjustment for aggregate values resulted in an estimate of 0.60 (0.28,1.20). The two methods used for downweighting gave odd ratios of 0.43 (0.18,0.89) and 0.35 (0.16,0.76), respectively. Point estimates were robust but credible intervals were wider when using vaguer priors. CONCLUSIONS: Covariate adjustment using aggregate study values does not account for covariate imbalances between treatment arms and downweighting may not eliminate bias. Adjustment using differences in patient characteristics between arms provides a systematic way of adjusting for bias due to confounding. Within the context of a Bayesian hierarchical model, such an approach could facilitate the use of all available evidence to inform health policy decisions.” [Accessed July 14, 2010]. Available at: http://www.biomedcentral.com/1471-2288/10/64. Julie Weed. Factory Efficiency Comes to the Hospital. The New York Times. 2010. 
Excerpt: “The program, called ‘continuous performance improvement,’ or C.P.I., examines every aspect of patients’ stays at the hospital, from the time they arrive in the parking lot until they are discharged, to see what could work better for them and their families. Last year, amid rising health care expenses nationally, C.P.I. helped cut Seattle Children’s costs per patient by 3.7 percent, for a total savings of $23 million, Mr. Hagan says. And as patient demand has grown in the last six years, he estimates that the hospital avoided spending $180 million on capital projects by using its facilities more efficiently. It served 38,000 patients last year, up from 27,000 in 2004, without expansion or adding beds.” [Accessed July 13, 2010]. Available at: http://www.nytimes.com/2010/07/11/business/11seattle.html. Katharine Barnard, Louise Dent, Andrew Cook. A systematic review of models to predict recruitment to multicentre clinical trials. BMC Medical Research Methodology. 2010;10(1):63. Abstract: “BACKGROUND: Less than one third of publicly funded trials managed to recruit according to their original plan often resulting in request for additional funding and/or time extensions. The aim was to identify models which might be useful to a major public funder of randomised controlled trials when estimating likely time requirements for recruiting trial participants. The requirements of a useful model were identified as usability, based on experience, able to reflect time trends, accounting for centre recruitment and contribution to a commissioning decision. METHODS: A systematic review of English language articles using MEDLINE and EMBASE. Search terms included: randomised controlled trial, patient, accrual, predict, enrol, models, statistical; Bayes Theorem; Decision Theory; Monte Carlo Method and Poisson. Only studies discussing prediction of recruitment to trials using a modelling approach were included. Information was extracted from articles by one author, and checked by a second, using a pre-defined form. RESULTS: Out of 326 identified abstracts, only 8 met all the inclusion criteria. Of these 8 studies examined, there are five major classes of model discussed: the unconditional model, the conditional model, the Poisson model, Bayesian models and Monte Carlo simulation of Markov models. None of these meet all the pre-identified needs of the funder. CONCLUSIONS: To meet the needs of a number of research programmes, a new model is required as a matter of importance. Any model chosen should be validated against both retrospective and prospective data, to ensure the predictions it gives are superior to those currently used.” [Accessed July 11, 2010]. Available at: http://www.biomedcentral.com/1471-2288/10/63. John F. Hall. Journeys in Survey Research - Home. Excerpt: “Welcome to this new resource for researchers, students and others doing, or learning about, survey research and the analysis of survey data. You will find here a wealth of materials drawn from my 45 years of doing and teaching survey research.” [Accessed July 9, 2010]. Available at: http://surveyresearch.weebly.com/. Kristin L. Carman, Maureen Maurer, Jill Mathews Yegian, et al. Evidence That Consumers Are Skeptical About Evidence-Based Health Care. Health Aff. 2010;29(7):1400-1406. Abstract: “We undertook focus groups, interviews, and an online survey with health care consumers as part of a recent project to assist purchasers in communicating more effectively about health care evidence and quality.
Most of the consumers were ages 18-64; had health insurance through a current employer; and had taken part in making decisions about health insurance coverage for themselves, their spouse, or someone else. We found many of these consumers' beliefs, values, and knowledge to be at odds with what policy makers prescribe as evidence-based health care. Few consumers understood terms such as “medical evidence” or “quality guidelines.” Most believed that more care meant higher-quality, better care. The gaps in knowledge and misconceptions point to serious challenges in engaging consumers in evidence-based decision making.” [Accessed July 8, 2010]. Available at: http://content.healthaffairs.org/cgi/content/abstract/29/7/1400. Statistics Without Borders. Home - Statistics Without Borders. Excerpt: “Statistics Without Borders (SWB) is an apolitical organization under the auspices of the American Statistical Association, comprised entirely of volunteers, that provides pro bono statistical consulting and assistance to organizations and government agencies in support of these organizations' not-for-profit efforts to deal with international health issues (broadly defined). Our vision is to achieve better statistical practice, including statistical analysis and design of experiments and surveys, so that international health projects and initiatives are delivered more effectively and efficiently.” [Accessed July 7, 2010]. Available at: http://community.amstat.org/AMSTAT/StatisticsWithoutBorders/Home/Default.aspx. Ross Prentice. Invited Commentary: Ethics and Sample Size–Another View. Am. J. Epidemiol. 2005;161(2):111-112. Excerpt: “In their article entitled, “Ethics and Sample Size,” Bacchetti et al. (1) provide a spirited justification, based on ethical considerations, for the conduct of clinical trials that may have little potential to provide powerful tests of therapeutic or public health hypotheses. This perspective is somewhat surprising given the longstanding encouragement by clinical trialists and bioethicists in favor of large trials (2–4). Heretofore, the defenders of smaller trials have essentially argued only that small, underpowered trials need not be unethical if well conducted given their contribution to intervention effect estimation and their potential contribution to meta-analyses (5, 6). However, Bacchetti et al. evidently go further on the basis of certain risk-benefit considerations, and they conclude: “In general, ethics committees and others concerned with the protection of research subjects need not consider whether a study is too small…. Indeed, a more legitimate ethical issue regarding sample size is whether it is too large” (1, p. 108).” [Accessed July 7, 2010]. Available at: http://aje.oxfordjournals.org. Peter Bacchetti, Leslie E. Wolf, Mark R. Segal, Charles E. McCulloch. Bacchetti et al. Respond to "Ethics and Sample Size–Another View". Am. J. Epidemiol. 2005;161(2):113. Excerpt: “We thank Dr. Prentice (1) for taking the time to respond to our article (2). We explain here why we do not believe that he has provided a meaningful challenge to our argument. We see possible objections related to unappealing implications, use of power to measure value, implications for series of trials, how value per participant is calculated, and participants’ altruistic satisfaction.” [Accessed July 7, 2010]. Available at: http://aje.oxfordjournals.org. Mitchell H. Katz. Multivariable Analysis: A Primer for Readers of Medical Research. Annals of Internal Medicine. 2003;138(8):644-650.
Abstract: “Many clinical readers, especially those uncomfortable with mathematics, treat published multivariable models as a black box, accepting the author’s explanation of the results. However, multivariable analysis can be understood without undue concern for the underlying mathematics. This paper reviews the basics of multivariable analysis, including what multivariable models are, why they are used, what types exist, what assumptions underlie them, how they should be interpreted, and how they can be evaluated. A deeper understanding of multivariable models enables readers to decide for themselves how much weight to give to the results of published analyses.” [Accessed July 7, 2010]. Available at: http://www.annals.org/content/138/8/644.abstract. Peter Bacchetti, Jacqueline Leung. Sample Size Calculations in Clinical Research : Anesthesiology. Anesthesiology. 2002;97(4):1028-1029. Excerpt: “We write to make the case that the practice of providing a priori sample size calculations, recently endorsed in an Anesthesiology editorial, is in fact undesirable. Presentation of confidence intervals serves the same purpose, but is superior because it more accurately reflects the actual data, is simpler to present, addresses uncertainty more directly, and encourages more careful interpretation of results.” [Accessed July 7, 2010]. Available at: http://journals.lww.com/anesthesiology/Fulltext/2002/10000/Sample_Size_Calculations_in_Clinical_Research.50.aspx. Peter Bacchetti. Current sample size conventions: Flaws, harms, and alternatives. BMC Medicine. 2010;8(1):17. Abstract: “BACKGROUND: The belief remains widespread that medical research studies must have statistical power of at least 80% in order to be scientifically sound, and peer reviewers often question whether power is high enough. DISCUSSION: This requirement and the methods for meeting it have severe flaws. Notably, the true nature of how sample size influences a study’s projected scientific or practical value precludes any meaningful blanket designation of <80% power as “inadequate”. In addition, standard calculations are inherently unreliable, and focusing only on power neglects a completed study’s most important results: estimates and confidence intervals. Current conventions harm the research process in many ways: promoting misinterpretation of completed studies, eroding scientific integrity, giving reviewers arbitrary power, inhibiting innovation, perverting ethical standards, wasting effort, and wasting money. Medical research would benefit from alternative approaches, including established value of information methods, simple choices based on cost or feasibility that have recently been justified, sensitivity analyses that examine a meaningful array of possible findings, and following previous analogous studies. To promote more rational approaches, research training should cover the issues presented here, peer reviewers should be extremely careful before raising issues of “inadequate” sample size, and reports of completed studies should not discuss power. SUMMARY: Common conventions and expectations concerning sample size are deeply flawed, cause serious harm to the research process, and should be replaced by more rational alternatives.” [Accessed July 7, 2010]. Available at: http://www.biomedcentral.com/1741-7015/8/17. Peter Bacchetti. Peer review of statistics in medical research: the other problem. BMJ. 2002;324(7348):1271-1273. 
Excerpt: “The process of peer review before publication has long been criticised for failing to prevent the publication of statistics that are wrong, unclear, or suboptimal. 1 2 My concern here, however, is not with failing to find flaws, but with the complementary problem of finding flaws that are not really there. My impression as a collaborating and consulting statistician is that spurious criticism of sound statistics is increasingly common, mainly from subject matter reviewers with limited statistical knowledge. Of the subject matter manuscript reviews I see that raise statistical issues, perhaps half include a mistaken criticism. In grant reviews unhelpful statistical comments seem to be a near certainty, mainly due to unrealistic expectations concerning sample size planning. While funding or publication of bad research is clearly undesirable, so is preventing the funding or publication of good research. Responding to misguided comments requires considerable time and effort, and poor reviews are demoralising—a subtler but possibly more serious cost.” [Accessed July 7, 2010]. Available at: http://www.bmj.com/cgi/content/full/324/7348/1271. Peter Bacchetti, Leslie E. Wolf, Mark R. Segal, Charles E. McCulloch. Ethics and Sample Size. Am. J. Epidemiol. 2005;161(2):105-110. Abstract: “The belief is widespread that studies are unethical if their sample size is not large enough to ensure adequate power. The authors examine how sample size influences the balance that determines the ethical acceptability of a study: the balance between the burdens that participants accept and the clinical or scientific value that a study can be expected to produce. The average projected burden per participant remains constant as the sample size increases, but the projected study value does not increase as rapidly as the sample size if it is assumed to be proportional to power or inversely proportional to confidence interval width. This implies that the value per participant declines as the sample size increases and that smaller studies therefore have more favorable ratios of projected value to participant burden. The ethical treatment of study participants therefore does not require consideration of whether study power is less than the conventional goal of 80% or 90%. Lower power does not make a study unethical. The analysis addresses only ethical acceptability, not optimality; large studies may be desirable for other than ethical reasons.” [Accessed July 7, 2010]. Available at: http://aje.oxfordjournals.org/cgi/content/abstract/161/2/105. Larry Goldbetter, Susan E. Davis, Paul J. MacArthur. Have You Received an E-Book Contract Amendment? | NWU - National Writers Union. Excerpt: “Writers across the country are receiving letters from HarperCollins, Random House, and other publishers asking them to sign e-book amendments to their book contracts. When reviewing an e-book amendment, there are several things you should consider.” [Accessed July 7, 2010]. Available at: https://nwu.org/have-you-received-e-book-contract-amendment%3F. Kilem Li Gwet. Research Papers on Inter-Rater Reliability Estimation. Excerpt: “Below are some downloadable research papers published by Dr. Gwet on Inter-Rater Reliability. They are all in PDF format.” [Accessed July 7, 2010]. Available at: http://www.agreestat.com/research_papers.html. David L DeMets, Thomas R Fleming, Frank Rockhold, et al. Liability issues for data monitoring committee members. Clinical Trials. 2004;1(6):525-531. 
Abstract: “In randomized clinical trials, a data monitoring committee (DMC) is often appointed to review interim data to determine whether there is early convincing evidence of intervention benefit, lack of benefit or harm to study participants. Because DMCs bear serious responsibility for participant safety, their members may be legally liable for their actions. Despite more than three decades of experiences with DMCs, the issues of liability and indemnification have yet to receive appropriate attention from either government or industry sponsors. In industry-sponsored trials, DMC members are usually asked to sign an agreement delineating their responsibilities and operating procedures. While these agreements may include language on indemnification, such language sometimes protects only the sponsor rather than the DMC members. In government-sponsored trials, there has been even less structure, since typically there are no signed agreements regarding DMC activities. This paper discusses these issues and suggests sample language for indemnification agreements to protect DMC members. This type of language should be included in DMC charters and in all consulting agreements signed by DMC members.” [Accessed July 6, 2010]. Available at: http://ctj.sagepub.com/cgi/content/abstract/1/6/525.
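The Barnard et al. review above lists the Poisson model among the five classes of recruitment prediction models. As a purely illustrative sketch (not code from the review; the centre count, accrual rate, and recruitment target below are invented), here is how a simple Monte Carlo projection of multicentre Poisson accrual might look in R:

# Illustrative simulation of multicentre recruitment under a simple Poisson model.
# All inputs are hypothetical: 20 centres, an average of 1.5 recruits per centre
# per month, and a target of 600 participants.
set.seed(123)
n_centres <- 20
rate_per_centre <- 1.5   # mean recruits per centre per month
target <- 600
n_sims <- 10000

months_needed <- replicate(n_sims, {
  total <- 0
  months <- 0
  while (total < target) {
    months <- months + 1
    total <- total + sum(rpois(n_centres, rate_per_centre))
  }
  months
})

# Median projected recruitment period and a 90% prediction interval
quantile(months_needed, c(0.05, 0.50, 0.95))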
June 2010
Jonathan J. Shuster. Empirical vs natural weighting in random effects meta-analysis. Statistics in Medicine. 2010;29(12):1259-1265. Abstract: “This article brings into serious question the validity of empirically based weighting in random effects meta-analysis. These methods treat sample sizes as non-random, whereas they need to be part of the random effects analysis. It will be demonstrated that empirical weighting risks substantial bias. Two alternate methods are proposed. The first estimates the arithmetic mean of the population of study effect sizes per the classical model for random effects meta-analysis. We show that anything other than an unweighted mean of study effect sizes will risk serious bias for this targeted parameter. The second method estimates a patient level effect size, something quite different from the first. To prevent inconsistent estimation for this population parameter, the study effect sizes must be weighted in proportion to their total sample sizes for the trial. The two approaches will be presented for a meta-analysis of a nasal decongestant, while at the same time will produce counter-intuitive results for the DerSimonian-Laird approach, the most popular empirically based weighted method. It is concluded that all past publications based on empirically weighted random effects meta-analysis should be revisited to see if the qualitative conclusions hold up under the methods proposed herein. It is also recommended that empirically based weighted random effects meta-analysis not be used in the future, unless strong cautions about the assumptions underlying these analyses are stated, and at a minimum, some form of secondary analysis based on the principles set forth in this article be provided to supplement the primary analysis. Copyright © 2009 John Wiley & Sons, Ltd.” [Accessed June 29, 2010]. Available at: http://dx.doi.org/10.1002/sim.3607. David J. Hand. Evaluating diagnostic tests: The area under the ROC curve and the balance of errors. Statistics in Medicine. 2010;29(14):1502-1510. Abstract: “Because accurate diagnosis lies at the heart of medicine, it is important to be able to evaluate the effectiveness of diagnostic tests. A variety of accuracy measures are used. One particularly widely used measure is the AUC, the area under the receiver operating characteristic (ROC) curve. This measure has a well-understood weakness when comparing ROC curves which cross. However, it also has the more fundamental weakness of failing to balance different kinds of misdiagnoses effectively. This is not merely an aspect of the inevitable arbitrariness in choosing a performance measure, but is a core property of the way the AUC is defined. This property is explored, and an alternative, the H measure, is described. Copyright © 2010 John Wiley & Sons, Ltd.” [Accessed June 16, 2010]. Available at: http://dx.doi.org/10.1002/sim.3859. Steve Shiboski. Table of Calculators for Survival Outcomes. Description: This webpage highlights several different programs for power calculations for survival analysis. It includes a Java applet by Marc Bacsafra and SAS macros by Joanna Shih. [Accessed June 16, 2010]. Available at: http://cct.jhsph.edu/javamarc/index.htm. David A. Schoenfeld. Considerations for a parallel trial where the outcome is a time to failure. Description: This web page calculates power for a survival analysis. You need to specify the accrual interval, the follow-up interval, the median time to failure in the group with the smallest time to failure. 
You then specify two of the following three items: power, total number of patients, and the minimal detectable hazard ratio. In an exponential model the last term is equivalent to the ratio of median survival times. [Accessed June 16, 2010]. Available at: http://hedwig.mgh.harvard.edu/sample_size/time_to_event/para_time.html. K Akazawa, T Nakamura, Y Palesch. Power of logrank test and Cox regression model in clinical trials with heterogeneous samples. Stat Med. 1997;16(5):583-597. Abstract: “This paper evaluates the loss of power of the simple and stratified logrank tests due to heterogeneity of patients in clinical trials and proposes a flexible and efficient method of estimating treatment effects adjusting for prognostic factors. The results of the paper are based on the analyses of survival data from a large clinical trial which includes more than 6000 cancer patients. Major findings from the simulation study on power are: (i) for a heterogeneous sample, such as advanced cancer patients, a simple logrank test can yield misleading results and should not be used; (ii) the stratified logrank test may suffer some power loss when many prognostic factors need to be considered and the number of patients within stratum is small. To address the problems due to heterogeneity, the Cox regression method with a special hazard model is recommended. We illustrate the method using data from a gastric cancer clinical trial.” [Accessed June 16, 2010]. Available at: http://www3.interscience.wiley.com/journal/9725/abstract. Ian Campbell. Two-by-two Methods. Excerpt: “This page expands on the methods section published in the paper: Campbell Ian, 2007, Chi-squared and Fisher-Irwin tests of two-by-two tables with small sample recommendations, Statistics in Medicine, 26, 3661 - 3675.” [Accessed June 14, 2010]. Available at: http://www.iancampbell.co.uk/twobytwo/methods.htm. P Peduzzi, J Concato, E Kemper, T R Holford, A R Feinstein. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol. 1996;49(12):1373-1379. Abstract: “We performed a Monte Carlo study to evaluate the effect of the number of events per variable (EPV) analyzed in logistic regression analysis. The simulations were based on data from a cardiac trial of 673 patients in which 252 deaths occurred and seven variables were cogent predictors of mortality; the number of events per predictive variable was (252/7 =) 36 for the full sample. For the simulations, at values of EPV = 2, 5, 10, 15, 20, and 25, we randomly generated 500 samples of the 673 patients, chosen with replacement, according to a logistic model derived from the full sample. Simulation results for the regression coefficients for each variable in each group of 500 samples were compared for bias, precision, and significance testing against the results of the model fitted to the original sample. For EPV values of 10 or greater, no major problems occurred. For EPV values less than 10, however, the regression coefficients were biased in both positive and negative directions; the large sample variance estimates from the logistic model both overestimated and underestimated the sample variance of the regression coefficients; the 90% confidence limits about the estimated values did not have proper coverage; the Wald statistic was conservative under the null hypothesis; and paradoxical associations (significance in the wrong direction) were increased. 
Although other factors (such as the total number of events, or sample size) may influence the validity of the logistic model, our findings indicate that low EPV can lead to major problems.” [Accessed June 14, 2010]. Available at: http://www.ncbi.nlm.nih.gov/pubmed/8970487. Steffen Mickenautsch. Systematic reviews, systematic error and the acquisition of clinical knowledge. BMC Medical Research Methodology. 2010;10(1):53. Abstract: “BACKGROUND: Since its inception, evidence-based medicine and its application through systematic reviews, has been widely accepted. However, it has also been strongly criticised and resisted by some academic groups and clinicians. One of the main criticisms of evidence-based medicine is that it appears to claim to have unique access to absolute scientific truth and thus devalues and replaces other types of knowledge sources. DISCUSSION: The various types of clinical knowledge sources are categorised on the basis of Kant’s categories of knowledge acquisition, as being either ‘analytic’ or ‘synthetic’. It is shown that these categories do not act in opposition but rather, depend upon each other. The unity of analysis and synthesis in knowledge acquisition is demonstrated during the process of systematic reviewing of clinical trials. Systematic reviews constitute comprehensive synthesis of clinical knowledge but depend upon plausible, analytical hypothesis development for the trials reviewed. The dangers of systematic error regarding the internal validity of acquired knowledge are highlighted on the basis of empirical evidence. It has been shown that the systematic review process reduces systematic error, thus ensuring high internal validity. It is argued that this process does not exclude other types of knowledge sources. Instead, amongst these other types it functions as an integrated element during the acquisition of clinical knowledge. CONCLUSIONS: The acquisition of clinical knowledge is based on the interaction between analysis and synthesis. Systematic reviews provide the highest form of synthetic knowledge acquisition in terms of achieving internal validity of results. In that capacity it informs the analytic knowledge of the clinician but does not replace it.” [Accessed June 14, 2010]. Available at: http://www.biomedcentral.com/1471-2288/10/53. Beth Woods, Neil Hawkins, David Scott. Network meta-analysis on the log-hazard scale, combining count and hazard ratio statistics accounting for multi-arm trials: A tutorial. BMC Medical Research Methodology. 2010;10(1):54. Abstract: “BACKGROUND: Data on survival endpoints are usually summarised using either hazard ratio, cumulative number of events, or median survival statistics. Network meta-analysis, an extension of traditional pairwise meta-analysis, is typically based on a single statistic. In this case, studies which do not report the chosen statistic are excluded from the analysis which may introduce bias. METHODS: In this paper we present a tutorial illustrating how network meta-analyses of survival endpoints can combine count and hazard ratio statistics in a single analysis on the hazard ratio scale. We also describe methods for accounting for the correlations in relative treatment effects (such as hazard ratios) that arise in trials with more than two arms. Combination of count and hazard ratio data in a single analysis is achieved by estimating the cumulative hazard for each trial arm reporting count data. 
Correlation in relative treatment effects in multi-arm trials is preserved by converting the relative treatment effect estimates (the hazard ratios) to arm-specific outcomes (hazards). RESULTS: A worked example of an analysis of mortality data in chronic obstructive pulmonary disease (COPD) is used to illustrate the methods. The data set and WinBUGS code for fixed and random effects models are provided. CONCLUSIONS: By incorporating all data presentations in a single analysis, we avoid the potential selection bias associated with conducting an analysis for a single statistic and the potential difficulties of interpretation, misleading results and loss of available treatment comparisons associated with conducting separate analyses for different summary statistics.” [Accessed June 14, 2010]. Available at: http://www.biomedcentral.com/1471-2288/10/54. Osamu Komori, Shinto Eguchi. A boosting method for maximizing the partial area under the ROC curve. BMC Bioinformatics. 2010;11(1):314. Abstract: “BACKGROUND: The receiver operating characteristic (ROC) curve is a fundamental tool to assess the discriminant performance for not only a single marker but also a score function combining multiple markers. The area under the ROC curve (AUC) for a score function measures the intrinsic ability for the score function to discriminate between the controls and cases. Recently, the partial AUC (pAUC) has been paid more attention than the AUC, because a suitable range of the false positive rate can be focused according to various clinical situations. However, existing pAUC-based methods only handle a few markers and do not take nonlinear combination of markers into consideration. RESULTS: We have developed a new statistical method that focuses on the pAUC based on a boosting technique. The markers are combined componentially for maximizing the pAUC in the boosting algorithm using natural cubic splines or decision stumps (single-level decision trees), according to the values of markers (continuous or discrete). We show that the resulting score plots are useful for understanding how each marker is associated with the outcome variable. We compare the performance of the proposed boosting method with those of other existing methods, and demonstrate the utility using real data sets. As a result, we have much better discrimination performances in the sense of the pAUC in both simulation studies and real data analysis. CONCLUSIONS: The proposed method addresses how to combine the markers after a pAUC-based filtering procedure in high dimensional setting. Hence, it provides a consistent way of analyzing data based on the pAUC from maker selection to marker combination for discrimination problems. The method can capture not only linear but also nonlinear association between the outcome variable and the markers, about which the nonlinearity is known to be necessary in general for the maximization of the pAUC. The method also puts importance on the accuracy of classification performance as well as interpretability of the association, by offering simple and smooth resultant score plots for each marker.” [Accessed June 14, 2010]. Available at: http://www.biomedcentral.com/1471-2105/11/314. Luis Carlos Silva-Aycaguer, Patricio Suarez-Gil, Ana Fernandez-Somoano. The null hypothesis significance test in health sciences research (1995-2006): statistical analysis and interpretation. BMC Medical Research Methodology. 2010;10(1):44. 
Abstract: “BACKGROUND: The null hypothesis significance test (NHST) is the most frequently used statistical method, although its inferential validity has been widely criticized since its introduction. In 1988, the International Committee of Medical Journal Editors (ICMJE) warned against sole reliance on NHST to substantiate study conclusions and suggested supplementary use of confidence intervals (CI). Our objective was to evaluate the extent and quality in the use of NHST and CI, both in English and Spanish language biomedical publications between 1995 and 2006, taking into account the International Committee of Medical Journal Editors recommendations, with particular focus on the accuracy of the interpretation of statistical significance and the validity of conclusions. METHODS: Original articles published in three English and three Spanish biomedical journals in three fields (General Medicine, Clinical Specialties and Epidemiology - Public Health) were considered for this study. Papers published in 1995-1996, 2000-2001, and 2005-2006 were selected through a systematic sampling method. After excluding the purely descriptive and theoretical articles, analytic studies were evaluated for their use of NHST with P-values and/or CI for interpretation of statistical “significance” and “relevance” in study conclusions. RESULTS: Among 1,043 original papers, 874 were selected for detailed review. The exclusive use of P-values was less frequent in English language publications as well as in Public Health journals; overall such use decreased from 41 % in 1995-1996 to 21% in 2005-2006. While the use of CI increased over time, the “significance fallacy” (to equate statistical and substantive significance) appeared very often, mainly in journals devoted to clinical specialties (81%). In papers originally written in English and Spanish, 15% and 10%, respectively, mentioned statistical significance in their conclusions. CONCLUSIONS: Overall, results of our review show some improvements in statistical management of statistical results, but further efforts by scholars and journal editors are clearly required to move the communication toward ICMJE advices, especially in the clinical setting, which seems to be imperative among publications in Spanish.” [Accessed June 14, 2010]. Available at: http://www.biomedcentral.com/1471-2288/10/44. Rolf Groenwold, Maroeska Rovers, Jacobus Lubsen, Geert van der Heijden. Subgroup effects despite homogeneous heterogeneity test results. BMC Medical Research Methodology. 2010;10(1):43. Abstract: “BACKGROUND: Statistical tests of heterogeneity are very popular in meta-analyses, as heterogeneity might indicate subgroup effects. Lack of demonstrable statistical heterogeneity, however, might obscure clinical heterogeneity, meaning clinically relevant subgroup effects. METHODS: A qualitative, visual method to explore the potential for subgroup effects was provided by a modification of the forest plot, i.e., adding a vertical axis indicating the proportion of a subgroup variable in the individual trials. Such a plot was used to assess the potential for clinically relevant subgroup effects and was illustrated by a clinical example on the effects of antibiotics in children with acute otitis media. RESULTS: Statistical tests did not indicate heterogeneity in the meta-analysis on the effects of amoxicillin on acute otitis media (Q=3.29, p=0.51; I2=0%; T2=0). 
Nevertheless, in a modified forest plot, in which the individual trials were ordered by the proportion of children with bilateral otitis, a clear relation between bilaterality and treatment effects was observed (which was also found in an individual patient data meta-analysis of the included trials: p-value for interaction 0.021). CONCLUSIONS: A modification of the forest plot, by including an additional (vertical) axis indicating the proportion of a certain subgroup variable, is a qualitative, visual, and easy-to-interpret method to explore potential subgroup effects in studies included in meta-analyses.” [Accessed June 14, 2010]. Available at: http://www.biomedcentral.com/1471-2288/10/43. Karin Velthove, Hubert Leufkens, Patrick Souverein, Rene Schweizer, Wouter van Solinge. Testing bias in clinical databases: methodological considerations. Emerging Themes in Epidemiology. 2010;7(1):2. Abstract: “BACKGROUND: Laboratory testing in clinical practice is never a random process. In this study we evaluated testing bias for neutrophil counts in clinical practice by using results from requested and non-requested hematological blood tests. METHODS: This study was conducted using data from the Utrecht Patient Oriented Database, a unique clinical database as it contains physician requested data, but also data that are not requested by the physician, but measured as result of requesting other hematological parameters. We identified adult patients, hospitalized in 2005 with at least two blood tests during admission, where requests for general blood profiles and specifically for neutrophil counts were contrasted in scenario analyses. Possible effect modifiers were diagnosis and glucocorticoid use. RESULTS: A total of 567 patients with requested neutrophil counts and 1,439 patients with non-requested neutrophil counts were analyzed. The absolute neutrophil count at admission differed with a mean of 7.4 × 10^9/l for requested counts and 8.3 × 10^9/l for non-requested counts (p-value <0.001). This difference could be explained for 83.2% by the occurrence of cardiovascular disease as underlying disease and for 4.5% by glucocorticoid use. CONCLUSION: Requests for neutrophil counts in clinical databases are associated with underlying disease and with cardiovascular disease in particular. The results from our study show the importance of evaluating testing bias in epidemiological studies obtaining data from clinical databases.” [Accessed June 14, 2010]. Available at: http://www.ete-online.com/content/7/1/2. Physicians for Human Rights. The Torture Reports. Excerpt: “Experiments in Torture is the first report to reveal evidence indicating that CIA medical personnel allegedly engaged in the crime of illegal experimentation after 9/11, in addition to the previously disclosed crime of torture. In their attempt to justify the war crime of torture, the CIA appears to have committed another alleged war crime—illegal experimentation on prisoners.” [Accessed June 10, 2010]. Available at: http://phrtorturepapers.org/. Scott Aberegg, D Roxanne Richards, James O’Brien. Delta inflation: a bias in the design of randomized controlled trials in critical care medicine. Critical Care. 2010;14(2):R77. Abstract: “INTRODUCTION: Mortality is the most widely accepted outcome measure in randomized controlled trials of therapies for critically ill adults, but most of these trials fail to show a statistically significant mortality benefit. The reasons for this are unknown. 
METHODS: We searched five high impact journals (Annals of Internal Medicine, British Medical Journal, JAMA, The Lancet, New England Journal of Medicine) for randomized controlled trials comparing mortality of therapies for critically ill adults over a ten year period. We abstracted data on the statistical design and results of these trials to compare the predicted delta (delta; the effect size of the therapy compared to control expressed as an absolute mortality reduction) to the observed delta to determine if there is a systematic overestimation of predicted delta that might explain the high prevalence of negative results in these trials. RESULTS: We found 38 trials meeting our inclusion criteria. Only 5/38 (13.2%) of the trials provided justification for the predicted delta. The mean predicted delta among the 38 trials was 10.1% and the mean observed delta was 1.4% (P<0.0001), resulting in a delta-gap of 8.7%. In only 2/38 (5.3%) of the trials did the observed delta exceed the predicted delta and only 7/38 (18.4%) of the trials demonstrated statistically significant results in the hypothesized direction; these trials had smaller delta-gaps than the remainder of the trials (delta-gap 0.9% versus 10.5%; P<0.0001). For trials showing non-significant trends toward benefit greater than 3%, large increases in sample size (380% - 1100%) would be required if repeat trials use the observed delta from the index trial as the predicted delta for a follow-up study. CONCLUSIONS: Investigators of therapies for critical illness systematically overestimate treatment effect size (delta) during the design of randomized controlled trials. This bias, which we refer to as “delta inflation”, is a potential reason that these trials have a high rate of negative results.” [Accessed June 9, 2010]. Available at: http://ccforum.com/content/14/2/R77.
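The delta inflation result above is easier to appreciate with a quick sensitivity calculation. The following R sketch uses the base function power.prop.test and an assumed 30% control-arm mortality (an illustrative value, not one taken from the paper) to show how the required sample size grows when the true absolute risk reduction is closer to the observed 1.4% than to the predicted 10%:

# Sample size per arm for a two-arm mortality comparison at 80% power and
# two-sided alpha of 0.05, using base R. The 30% control-arm mortality is an
# assumed illustrative value, not a figure from the paper.
p_control <- 0.30

# Design based on a predicted delta of 10 percentage points
power.prop.test(p1 = p_control, p2 = p_control - 0.10, power = 0.80, sig.level = 0.05)

# Design based on an observed delta of about 1.4 percentage points;
# this requires more than fifty times as many patients per arm.
power.prop.test(p1 = p_control, p2 = p_control - 0.014, power = 0.80, sig.level = 0.05)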
May 2010
Alex Guazzelli, Michael Zeller, Wen-Ching Lin, Graham Williams. PMML: An Open Standard for Sharing Models. The R Journal. 2009;1(1):60-65. Excerpt: “The PMML package exports a variety of predictive and descriptive models from R to the Predictive Model Markup Language (Data Mining Group, 2008). PMML is an XML-based language and has become the de facto standard to represent not only predictive and descriptive models, but also data pre- and post-processing. In so doing, it allows for the interchange of models among different tools and environments, mostly avoiding proprietary issues and incompatibilities.” [Accessed May 29, 2010]. Available at: http://journal.r-project.org/2009-1/RJournal_2009-1_Guazzelli+et+al.pdf. Wim Van Biesen, Francis Verbeke, Raymond Vanholder. An infallible recipe? A story of cinnamon, soufflé and meta-analysis. Nephrol. Dial. Transplant. 2008;23(9):2729-2732. Excerpt: “Meta-analyses certainly do have their place in scientific research. Like herbs, if used in the correct dish, and not too much or too often, they can give that extra bit of flavour that turns “food” into a “delicious dish”. However, meta-analyses are like cinnamon: very tasteful in small quantities and in the right dish, but if you use them too much or in the wrong dish, it ruins all other flavours and you get nausea. Just as for the cinnamon, it requires skills and insight to know when and how to use a meta-analysis.” [Accessed May 27, 2010]. Available at: http://ndt.oxfordjournals.org/cgi/content/full/23/9/2729. Committee on Strategies for Small-Number-Participant Clinical Research Trials, Board on Health Sciences Policy. Small Clinical Trials: Issues and Challenges. Washington, D.C.: The National Academies Press; 2001. Abstract: “Scientific research has a long history of using well-established, well documented, and validated methods for the design, conduct, and analysis of clinical trials. A study design that is considered appropriate includes sufficient sample size (n) and statistical power and proper control of bias to allow a meaningful interpretation of the results. Whenever feasible, clinical trials should be designed and performed so that they have adequate statistical power. However, when the clinical context does not provide a sufficient number of research participants for a trial with adequate statistical power but the research question has great clinical significance, research can still proceed under certain conditions. Small clinical trials might be warranted for the study of rare diseases, unique study populations (e.g., astronauts), individually tailored therapies, in environments that are isolated, in emergency situations, and in instances of public health urgency. Properly designed trials with small sample sizes may provide substantial evidence of efficacy and are especially appropriate in particular situations. However, the conclusions derived from such studies may require careful consideration of the assumptions and inferences, given the small number of participants. Bearing in mind the statistical power, precision, and validity limitations of trials with small sample sizes, there are innovative design and analysis approaches that can improve the quality of such trials. A number of trial designs especially lend themselves to use in studies with small sample sizes, including one subject (n-of-1) designs, sequential designs, “within-subject” designs, decision analysis-based designs, ranking and selection designs, adaptive designs, and risk-based allocation designs. 
Data analysis for trials with small numbers of participants in particular must be focused. In general, certain types of analyses are more amenable to studies with small numbers of participants, including sequential analysis, hierarchical analysis, Bayesian analysis, decision analysis, statistical prediction, meta-analysis, and risk-based allocation. Because of the constraints of conducting research with small sample sizes, the committee makes recommendations in several areas: defining the research question, tailoring the study design by giving careful consideration to alternative methods, clarifying sample characteristics and methods for the reporting of results of clinical trials with small sample sizes, performing corroborative analyses to evaluate the consistency and robustness of the results of clinical trials with small sample sizes, and exercising caution in the interpretation of the results before attempting to extrapolate or generalize the findings of clinical trials with small sample sizes. The committee also recommends that more research be conducted on the development and evaluation of alternative experimental designs and analysis methods for trials with small sample sizes.” Available at: http://www.nap.edu/catalog.php?record_id=10078. C David Naylor. Meta-analysis and the meta-epidemiology of clinical research. BMJ. 1997;315:617-9. Excerpt: “This week’s BMJ contains a pot-pourri of materials that deal with the research methodology of meta-analysis. Meta-analysis in clinical research is based on simple principles: systematically searching out, and, when possible, quantitatively combining the results of all studies that have addressed a similar research question. Given the information explosion in clinical research, the logic of basing research reviews on systematic searching and careful quantitative compilation of study results is incontrovertible. However, one aspect of meta-analysis as applied to randomised trials has always been controversial1 2—combining data from multiple studies into single estimates of treatment effect.” [Accessed May 19, 2010]. Available at: http://www.bmj.com/cgi/content/extract/315/7109/617. ST Brookes, E Whitley, TJ Peters, et al. Subgroup analyses in randomised controlled trials: quantifying the risks of false-positives and false-negatives. Excerpt: “Subgroup analyses are common in randomised controlled trials (RCTs). There are many easily accessible guidelines on the selection and analysis of subgroups but the key messages do not seem to be universally accepted and inappropriate analyses continue to appear in the literature. This has potentially serious implications because erroneous identification of differential subgroup effects may lead to inappropriate provision or withholding of treatment.” [Accessed May 19, 2010]. Available at: http://www.hta.ac.uk/execsumm/summ533.shtml. C Bartlett, L Doyal, S Ebrahim, et al. The causes and effects of socio-demographic exclusions from clinical trials. Excerpt: “The exclusion from trials of people likely to be in need of or to benefit from an intervention could compromise the trials’ generalisability. We investigated the exclusion of women, older people and minority ethnic groups, focusing on two drug exemplars, statins and non-steroidal anti-inflammatory drugs (NSAIDs).” [Accessed May 19, 2010]. Available at: http://www.hta.ac.uk/execsumm/summ938.shtml. Leon Bax, Noriaki Ikeda, Naohito Fukui, et al. More Than Numbers: The Power of Graphs in Meta-Analysis. Am. J. Epidemiol. 2009;169(2):249-255. 
Abstract: “In meta-analysis, the assessment of graphs is widely used in an attempt to identify or rule out heterogeneity and publication bias. A variety of graphs are available for this purpose. To date, however, there has been no comparative evaluation of the performance of these graphs. With the objective of assessing the reproducibility and validity of graph ratings, the authors simulated 100 meta-analyses from 4 scenarios that covered situations with and without heterogeneity and publication bias. From each meta-analysis, the authors produced 11 types of graphs (box plot, weighted box plot, standardized residual histogram, normal quantile plot, forest plot, 3 kinds of funnel plots, trim-and-fill plot, Galbraith plot, and L’Abbe plot), and 3 reviewers assessed the resulting 1,100 plots. The intraclass correlation coefficients (ICCs) for reproducibility of the graph ratings ranged from poor (ICC = 0.34) to high (ICC = 0.91). Ratings of the forest plot and the standardized residual histogram were best associated with parameter heterogeneity. Association between graph ratings and publication bias (censorship of studies) was poor. Meta-analysts should be selective in the graphs they choose for the exploration of their data.” [Accessed May 19, 2010]. Available at: http://aje.oxfordjournals.org/cgi/content/abstract/169/2/249. Ylian Liem, John Wong, MG Myriam Hunink, Frank de Charro, Wolfgang Winkelmayer. Propensity scores in the presence of effect modification: A case study using the comparison of mortality on hemodialysis versus peritoneal dialysis. Emerging Themes in Epidemiology. 2010;7(1):1. Abstract: “Purpose: To control for confounding bias from non-random treatment assignment in observational data, both traditional multivariable models and more recently propensity score approaches have been applied. Our aim was to compare a propensity score-stratified model with a traditional multivariable-adjusted model, specifically in estimating survival of hemodialysis (HD) versus peritoneal dialysis (PD) patients. METHODS: Using the Dutch End-Stage Renal Disease Registry, we constructed a propensity score, predicting PD assignment from age, gender, primary renal disease, center of dialysis, and year of first renal replacement therapy. We developed two Cox proportional hazards regression models to estimate survival on PD relative to HD, a propensity score-stratified model stratifying on the propensity score and a multivariable-adjusted model, and tested several interaction terms in both models. RESULTS: The propensity score performed well: it showed a reasonable fit, had a good c-statistic, calibrated well and balanced the covariates. The main-effects multivariable-adjusted model and the propensity score-stratified univariable Cox model resulted in similar relative mortality risk estimates of PD compared with HD (0.99 and 0.97, respectively) with fewer significant covariates in the propensity model. After introducing the missing interaction variables for effect modification in both models, the mortality risk estimates for both main effects and interactions remained comparable, but the propensity score model had nearly as many covariates because of the additional interaction variables. CONCLUSION: Although the propensity score performed well, it did not alter the treatment effect in the outcome model and lost its advantage of parsimony in the presence of effect modification.” [Accessed May 18, 2010]. Available at: http://www.ete-online.com/content/7/1/1. David L Streiner. 
Missing data and the trouble with LOCF. Evidence Based Mental Health. 2008;11(1):3-5. Excerpt: “Missing data are the bane of all clinical research. With the possible exception of the CAPRIE trial, in which the investigators went to extraordinary lengths that enabled them to follow up 99.8% of their 19 000 participants over two years, it is highly unusual for a study to end with complete data on all subjects. There are many reasons for this: a person may omit an item on a questionnaire or refuse to complete it entirely; a vial of blood may be dropped or the analyser fail to function one day; or a participant may not appear for his or her appointment. Longitudinal studies (those that follow participants over time) can be subject to all of these mishaps, but now the problem is magnified in that they could happen at each of the assessment sessions; in addition to which, participants may drop out of the study entirely before all the data are collected. Furthermore, the more sophisticated, multivariable statistical techniques that use two or more variables in the same analysis, such as multiple regression or factor analysis, make the problem even worse, in that most of them require complete data for all of the subjects. If a person is missing one variable out of the, say, 10 that are being analysed, then that subject is dropped entirely from the analysis. Simulations have shown that if as little as 10% of the data is missing, as many as 60% of the subjects could be eliminated.” [Accessed May 6, 2010]. Available at: http://ebmh.bmj.com/content/11/1/3.2.short. Fred Andersen, Torgeir Engstad, Bjorn Straume, et al. Recruitment methods in Alzheimer’s disease research: general practice versus population based screening by mail. BMC Medical Research Methodology. 2010;10(1):35. Abstract: “BACKGROUND: In Alzheimer’s disease (AD) research patients are usually recruited from clinical practice, memory clinics or nursing homes. Lack of standardised inclusion and diagnostic criteria is a major concern in current AD studies. The aim of the study was to explore whether patient characteristics differ between study samples recruited from general practice and from a population based screening by mail within the same geographic areas in rural Northern Norway. METHODS: An interventional study in nine municipalities with 70000 inhabitants was designed. Patients were recruited from general practice or by population based screening of cognitive function by mail. We sent a questionnaire to 11807 individuals ≥65 years of age of whom 3767 responded. Among these, 438 individuals whose answers raised a suspicion of cognitive impairment were invited to extended cognitive testing and a clinical examination. Descriptive statistics, chi-square, independent sample t-test and analyses of covariance adjusted for possible confounders were used. RESULTS: The final study samples included 100 patients recruited by screening and 87 from general practice. Screening through mail recruited younger and more self-reliant male patients with a higher MMSE sum score, whereas older women with more severe cognitive impairment were recruited from general practice. Adjustment for age did not alter the statistically significant differences of cognitive function, self-reliance and gender distribution between patients recruited by screening and from general practice. CONCLUSIONS: Different recruitment procedures of individuals with cognitive impairment provided study samples with different demographic characteristics. 
Initial cognitive screening by mail, preceding extended cognitive testing and clinical examination may be a suitable recruitment strategy in studies of early stage AD. Registration: ClinicalTrial.gov Identifier: NCT00443014” [Accessed May 6, 2010]. Available at: http://www.biomedcentral.com/1471-2288/10/35. Michel Chavance, Sylvie Escolano, Monique Romon, et al. Latent variables and structural equation models for longitudinal relationships: an illustration in nutritional epidemiology. BMC Medical Research Methodology. 2010;10(1):37. Abstract: “BACKGROUND: The use of structural equation modeling and latent variables remains uncommon in epidemiology despite its potential usefulness. The latter was illustrated by studying cross-sectional and longitudinal relationships between eating behavior and adiposity, using four different indicators of fat mass. METHODS: Using data from a longitudinal community-based study, we fitted structural equation models including two latent variables (respectively baseline adiposity and adiposity change after 2 years of follow-up), each being defined, by the four following anthropometric measurement (respectively by their changes): body mass index, waist circumference, skinfold thickness and percent body fat. Latent adiposity variables were hypothesized to depend on a cognitive restraint score, calculated from answers to an eating-behavior questionnaire (TFEQ-18), either cross-sectionally or longitudinally. RESULTS: We found that high baseline adiposity was associated with a 2-year increase of the cognitive restraint score and no convincing relationship between baseline cognitive restraint and 2-year adiposity change could be established. CONCLUSIONS: The latent variable modeling approach enabled presentation of synthetic results rather than separate regression models and detailed analysis of the causal effects of interest. In the general population, restrained eating appears to be an adaptive response of subjects prone to gaining weight more than as a risk factor for fat-mass increase.” [Accessed May 6, 2010]. Available at: http://www.biomedcentral.com/1471-2288/10/37. Jonathan Graffy, Peter Bower, Elaine Ward, et al. Trials within trials? Researcher, funder and ethical perspectives on the practicality and acceptability of nesting trials of recruitment methods in existing primary care trials. BMC Medical Research Methodology. 2010;10(1):38. Abstract: “BACKGROUND: Trials frequently encounter difficulties in recruitment, but evidence on effective recruitment methods in primary care is sparse. A robust test of recruitment methods involves comparing alternative methods using a randomized trial, ‘nested’ in an ongoing ‘host’ trial. There are potential scientific, logistical and ethical obstacles to such studies. METHOD: Telephone interviews were undertaken with four groups of stakeholders (funders, principal investigators, trial managers and ethics committee chairs) to explore their views on the practicality and acceptability of undertaking nested trials of recruitment methods. These semi-structured interviews were transcribed and analyzed thematically. RESULTS: Twenty people were interviewed. Respondents were familiar with recruitment difficulties in primary care and recognised the case for ‘nested’ studies to build an evidence base on effective recruitment strategies. However, enthusiasm for this global aim was tempered by the challenges of implementation. 
Challenges for host studies included increasing complexity and management burden; compatibility between the host and nested study; and the impact of the nested study on trial design and relationships with collaborators. For nested recruitment studies, there were concerns that host study investigators might have strong preferences, limiting the nested study investigators' control over their research, and also concerns about sample size which might limit statistical power. Nested studies needed to be compatible with the main trial and should be planned from the outset. Good communication and adequate resources were seen as important. CONCLUSIONS: Although research on recruitment was welcomed in principle, the issue of which study had control of key decisions emerged as critical. To address this concern, it appeared important to align the interests of both host and nested studies and to reduce the burden of hosting a recruitment trial. These findings should prove useful in devising a programme of research involving nested studies of recruitment interventions.” [Accessed May 6, 2010]. Available at: http://www.biomedcentral.com/1471-2288/10/38. Kuna Gupta, Jyotsna Gupta, Sukhdeep Singh. Surrogate Endpoints: How Reliable Are They? 2010. Excerpt: “Surrogate endpoints offer three main advantages to clinical studies: The study becomes simpler. Since surrogates are usually measures of symptoms or laboratory biomarkers, they make it easier to quantify comparisons. The study becomes shorter. It generally takes less time to see the effect of an intervention on a surrogate than on the final clinical outcome, especially if the surrogate marks an intermediate point in the disease process. The study becomes less expensive. Since the study duration is shorter, the cost decreases. Measurement of the surrogate may be less costly than measurement of the true outcome. In addition, waiting for a clinical outcome may involve more medical care for sicker patients.” [Accessed May 3, 2010]. Available at: http://www.firstclinical.com/journal/2010/1005_Surrogate.pdf. Peter B. Gilkey. Questionaire. Excerpt: “You are no doubt aware that the number of questionnaires circulated is rapidly increasing, whereas the length of the working day has at best remained constant. In order to resolve the problem presented by this trend, I find it necessary to restrict my replies to questionnaires to those questioners who first establish their bona fide by completing the following questionnaire. Please fill it out and return it to me electronically. This will help me compile a profile of people who compile profiles.” [Accessed May 1, 2010]. Available at: http://www.uoregon.edu/~gilkey/dirhumor/questionaire.html. Gary Wolf. The Data-Driven Life. The New York Times. 2010. Excerpt: “And yet, almost imperceptibly, numbers are infiltrating the last redoubts of the personal. Sleep, exercise, sex, food, mood, location, alertness, productivity, even spiritual well-being are being tracked and measured, shared and displayed. On MedHelp, one of the largest Internet forums for health information, more than 30,000 new personal tracking projects are started by users every month. Foursquare, a geo-tracking application with about one million users, keeps a running tally of how many times players “check in” at every locale, automatically building a detailed diary of movements and habits; many users publish these data widely. 
Nintendo’s Wii Fit, a device that allows players to stand on a platform, play physical games, measure their body weight and compare their stats, has sold more than 28 million units.” [Accessed May 1, 2010]. Available at: http://www.nytimes.com/2010/05/02/magazine/02self-measurement-t.html. EuSpRIG. European Spreadsheet Risks Interest Group - spreadsheet risk management and solutions conference. Excerpt: “EuSpRIG is the largest source of information on practical methods for introducing into organisations processes and methods to inventory, test, correct, document, backup, archive, compare and control the legions of spreadsheets that support critical corporate infrastructure.” [Accessed May 1, 2010]. Available at: http://www.eusprig.org/. Laura Rosen, Michal Ben Noach, Elliot Rosenberg. Missing the forest (plot) for the trees? A critique of the systematic review in tobacco control. BMC Medical Research Methodology. 2010;10(1):34. Abstract: “BACKGROUND: The systematic review (SR) lies at the core of evidence-based medicine. While it may appear that the SR provides a reliable summary of existing evidence, standards of SR conduct differ. The objective of this research was to examine systematic review (SR) methods used by the Cochrane Collaboration (“Cochrane”) and the Task Force on Community Preventive Services (“the Guide”) for evaluation of effectiveness of tobacco control interventions. METHODS: We searched for all reviews of tobacco control interventions published by Cochrane (4th quarter 2008) and the Guide. We recorded design rigor of included studies, data synthesis method, and setting. RESULTS: About a third of the Cochrane reviews and two thirds of the Guide reviews of interventions in the community setting included uncontrolled trials. Most (74%) Cochrane reviews in the clinical setting, but few (15%) in the community setting, provided pooled estimates from RCTs. Cochrane often presented the community results narratively. The Guide did not use inferential statistical approaches to assessment of effectiveness. CONCLUSIONS: Policy makers should be aware that SR methods differ, even among leading producers of SRs and among settings studied. The traditional SR approach of using pooled estimates from RCTs is employed frequently for clinical but infrequently for community-based interventions. The common lack of effect size estimates and formal tests of significance limit the contribution of some reviews to evidence-based decision making. Careful exploration of data by subgroup, and appropriate use of random effects models, may assist researchers in overcoming obstacles to pooling data.” [Accessed May 1, 2010]. Available at: http://www.biomedcentral.com/1471-2288/10/34.
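Several of the meta-analysis entries above, and Shuster's critique in the June list, turn on how study results are pooled under a random effects model. For readers who want to see the arithmetic those discussions presuppose, here is a minimal base-R sketch of the standard DerSimonian-Laird estimator; the effect sizes and variances are invented for illustration:

# Minimal DerSimonian-Laird random effects pooling; the data are invented.
# yi are study effect estimates (for example, log odds ratios) and vi their
# within-study variances.
yi <- c(-0.35, -0.10, -0.45, 0.05, -0.20)
vi <- c(0.04, 0.09, 0.06, 0.12, 0.05)

wi <- 1 / vi                                  # inverse-variance (fixed effect) weights
mu_fe <- sum(wi * yi) / sum(wi)               # fixed effect pooled estimate
Q <- sum(wi * (yi - mu_fe)^2)                 # Cochran's Q heterogeneity statistic
k <- length(yi)
tau2 <- max(0, (Q - (k - 1)) / (sum(wi) - sum(wi^2) / sum(wi)))  # DL between-study variance

wi_re <- 1 / (vi + tau2)                      # random effects weights
mu_re <- sum(wi_re * yi) / sum(wi_re)         # random effects pooled estimate
se_re <- sqrt(1 / sum(wi_re))
c(estimate = mu_re, lower = mu_re - 1.96 * se_re, upper = mu_re + 1.96 * se_re)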
April 2010
H. Gilbert Welch, William C. Black. Overdiagnosis in Cancer. J. Natl. Cancer Inst. 2010:djq099. Abstract: “This article summarizes the phenomenon of cancer overdiagnosis–the diagnosis of a “cancer” that would otherwise not go on to cause symptoms or death. We describe the two prerequisites for cancer overdiagnosis to occur: the existence of a silent disease reservoir and activities leading to its detection (particularly cancer screening). We estimated the magnitude of overdiagnosis from randomized trials: about 25% of mammographically detected breast cancers, 50% of chest x-ray and/or sputum-detected lung cancers, and 60% of prostate-specific antigen-detected prostate cancers. We also review data from observational studies and population-based cancer statistics suggesting overdiagnosis in computed tomography-detected lung cancer, neuroblastoma, thyroid cancer, melanoma, and kidney cancer. To address the problem, patients must be adequately informed of the nature and the magnitude of the trade-off involved with early cancer detection. Equally important, researchers need to work to develop better estimates of the magnitude of overdiagnosis and develop clinical strategies to help minimize it.” [Accessed April 28, 2010]. Available at: http://jnci.oxfordjournals.org/cgi/content/abstract/djq099v1. Elisabeth Bumiller. We Have Met the Enemy and He Is PowerPoint. The New York Times. April 26, 2010. Excerpt: “Like an insurgency, PowerPoint has crept into the daily lives of military commanders and reached the level of near obsession. The amount of time expended on PowerPoint, the Microsoft presentation program of computer-generated charts, graphs and bullet points, has made it a running joke in the Pentagon and in Iraq and Afghanistan.” [Accessed April 27, 2010]. Available at: http://www.nytimes.com/2010/04/27/world/27powerpoint.html. Rip Stauffer. Some Problems with Attribute Charts | Quality Digest. Excerpt: “While p- and np- charts can be very useful, and I highly recommend them when the conditions are correct, they aren’t always the best charts to use, and should be used with some caution. There are a few inherent problems that seem to crop up a lot. This article will illustrate a couple of the foibles observed over many years of wrangling with these interesting charts.” [Accessed April 5, 2010]. Available at: http://www.qualitydigest.com/inside/quality-insider-article/some-problems-attribute-charts.html
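For readers unfamiliar with the attribute charts that Stauffer discusses, the conventional p-chart places three-sigma limits around the pooled proportion of nonconforming units. The sketch below uses made-up subgroup counts and a constant subgroup size; it shows only the textbook calculation, which rests on the binomial assumptions the article urges caution about:

# Textbook p-chart limits with made-up data: nonconforming units in ten
# subgroups of a constant (assumed) size of 200.
defectives <- c(12, 9, 15, 7, 11, 14, 10, 8, 13, 9)
n <- 200

p <- defectives / n                                   # subgroup proportions to plot
p_bar <- sum(defectives) / (length(defectives) * n)   # pooled proportion (centre line)

ucl <- p_bar + 3 * sqrt(p_bar * (1 - p_bar) / n)      # upper 3-sigma limit
lcl <- max(0, p_bar - 3 * sqrt(p_bar * (1 - p_bar) / n))  # lower limit, floored at zero

round(c(centre = p_bar, LCL = lcl, UCL = ucl), 4)
# Points outside these limits signal special-cause variation, provided the
# binomial assumptions behind the chart actually hold.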
March 2010
Jeremy Genovese. The Ten Percent Solution. Anatomy of an Education Myth. Excerpt: “For many years, versions of a claim that students remember ‘10% of what they read, 20% of what they hear, 30% of what they see, 50% of what they see and hear, and 90% of what they do’ have been widely circulated among educators. The source of this claim, however, is unknown and its validity is questionable. It is an educational urban legend that suggests a willingness to accept assertions about instructional strategies without empirical support.” [Accessed March 25, 2010]. Available at: http://www.skeptic.com/eskeptic/10-03-24/#feature.

Don Zimmerman. Devilish Dictionary for Statisticians. Description: This webpage offers some irreverent definitions of statistical terms, akin to Ambrose Bierce’s The Devil’s Dictionary. They are all very cynical and very funny. Here’s an example: “Sample–a rag-tag, bob-tailed bunch of atypical misfits who have volunteered to participate in an experiment.” [Accessed March 25, 2010]. Available at: mypage.direct.ca/z/zimmerma/devilsdictionary.htm.

Brian L. Joiner, Sue Reynard, Yukihiro Ando. Fourth generation management. McGraw-Hill Professional; 1994. Excerpt: “I knew that it was important to find better ways to do things and to eliminate waste and inefficiencies; that data could shed light on murky situations; that people needed to work together. But it took another 20 years working with large companies and small, with government, service, and manufacturing organizations, with top managers, with operators on the shop floor, before I had a good understanding of how all these pieces fit into a system of management that brings rapid learning and rapid improvement. It’s a system I’ve come to call 4th Generation Management.” Available at: http://books.google.com/books?id=E99OVbYUmhEC.
February 2010
Julie Rehmeyer. Florence Nightingale: The Passionate Statistician. Science News. 2008. Excerpt: “When Florence Nightingale arrived at a British hospital in Turkey during the Crimean War, she found a nightmare of misery and chaos. Men lay crowded next to each other in endless corridors. The air reeked from the cesspool that lay just beneath the hospital floor. There was little food and fewer basic supplies. By the time Nightingale left Turkey after the war ended in July 1856, the hospitals were well-run and efficient, with mortality rates no greater than civilian hospitals in England, and Nightingale had earned a reputation as an icon of Victorian women. Her later and less well-known work, however, saved far more lives. She brought about fundamental change in the British military medical system, preventing any such future calamities. To do it, she pioneered a brand-new method for bringing about social change: applied statistics.” [Accessed February 23, 2010]. Available at: http://www.sciencenews.org/view/generic/id/38937/title/Math_Trek__Florence_Nightingale_The_passionate_statistician.

Gareth Watts, Splunk Inc. jQuery Sparklines. Excerpt: “This jQuery plugin generates sparklines (small inline charts) directly in the browser using data supplied either inline in the HTML, or via javascript. The plugin is compatible with most modern browsers and has been tested with Firefox 2+, Safari 3+, Opera 9, Google Chrome and Internet Explorer 6, 7 & 8. Each example displayed below takes just 1 line of HTML or javascript to generate. The plugin was written by Gareth Watts for Splunk Inc and released under the New BSD License.” [Accessed February 10, 2010]. Available at: http://omnipotent.net/jquery.sparkline/.

Douglas G Altman. Confidence intervals for the number needed to treat. BMJ. 1998;317(7168):1309-1312. Excerpt: “The number needed to treat is a useful way of reporting results of randomised clinical trials. When the difference between the two treatments is not statistically significant, the confidence interval for the number needed to treat is difficult to describe. Sensible confidence intervals can always be constructed for the number needed to treat. Confidence intervals should be quoted whenever a number needed to treat value is given” [Accessed February 8, 2010]. Available at: http://www.bmj.com/cgi/content/full/317/7168/1309. A short numerical sketch of this interval calculation appears after the last entry for this month.

J. A. C. Sterne, I. R. White, J. B. Carlin, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009;338(jun29 1):b2393-b2393. Excerpt: “Missing data are unavoidable in epidemiological and clinical research but their potential to undermine the validity of research results has often been overlooked in the medical literature. This is partly because statistical methods that can tackle problems arising from missing data have, until recently, not been readily accessible to medical researchers. However, multiple imputation–a relatively flexible, general purpose approach to dealing with missing data–is now available in standard statistical software, making it possible to handle missing data semiroutinely. Results based on this computationally intensive method are increasingly reported, but it needs to be applied carefully to avoid misleading conclusions.” [Accessed February 8, 2010]. Available at: http://www.bmj.com/cgi/data/bmj.b2393/DC1/1.

Anonymous. Statistical Graphics and more.
Excerpt: “Statistical Graphics, Data Visualization, Visual Analytics, Data Analysis, Data Mining, User Interfaces - you name it” [Accessed February 5, 2010]. Available at: http://www.theusrus.de/blog/.

Jon Peck. SPSS Inside-Out: Tips & Tricks for Statisticians to Work Better, Smarter, and Faster. Excerpt: “Welcome to the SPSS Inside-Out blog - Tips & Tricks for Statisticians to Work Better, Smarter, and Faster.” [Accessed February 5, 2010]. Available at: http://insideout.spss.com/.

Edzard Ernst. How Much of CAM is Based on Research Evidence? eCAM. 2009:nep044. Abstract: “The aim of this article is to provide a preliminary estimate of how much CAM is evidence-based. For this purpose, I calculated the percentage of 685 treatment/condition pairings evaluated in the ‘Desktop Guide to Complementary and Alternative Medicine’ which were supported by sound data. The resulting figure was 7.4%. For a range of reasons, it might be a gross over-estimate. Further investigations into this subject are required to arrive at more representative figures.” [Accessed February 4, 2010]. Available at: http://ecam.oxfordjournals.org/cgi/content/abstract/nep044v1.

Doug Smith. But who’s counting? The million-billion mistake is among the most common in journalism. But why? Excerpt: “The difference between a million and a billion is a number so vast that it would seem nearly impossible to confuse the two. Take pennies. At the website of the Mega Penny Project, you can see that a million pennies stack up to be about the size of a filing cabinet. A billion would be about the size of five school buses. Or take real estate. A home in a nice part of Los Angeles might cost a million dollars. A billion dollars would buy the whole neighborhood. But journalists can’t seem to keep the two numbers straight. Committed as we are to getting the smallest details right, we seem hopelessly prone to writing “million” when, in fact, we mean “billion.”” [Accessed February 4, 2010]. Available at: http://www.latimes.com/news/opinion/commentary/la-oe-smith31-2010jan31,0,2185811.story.

Clinical Evidence. How much do we know? Excerpt: “So what can Clinical Evidence tell us about the state of our current knowledge? What proportion of commonly used treatments are supported by good evidence, what proportion should not be used or used only with caution, and how big are the gaps in our knowledge? Of around 2500 treatments covered 13% are rated as beneficial, 23% likely to be beneficial, 8% as trade off between benefits and harms, 6% unlikely to be beneficial, 4% likely to be ineffective or harmful, and 46%, the largest proportion, as unknown effectiveness (see figure 1).” [Accessed February 4, 2010]. Available at: http://clinicalevidence.bmj.com/ceweb/about/knowledge.jsp.

Cochrane Collaboration. The Cochrane Collaboration estimates that only “10% to 35% of medical care is based on RCTs”. On what information is this estimate based? Excerpt: “The Cochrane Collaboration has not actually conducted research to determine this estimate; it is possible that the estimate of 10-35% comes from the following passage in a chapter by Kerr L White entitled ‘Archie Cochrane’s legacy: an American perspective’ in the book ‘Non-random Reflections on Health Services Research: on the 25th anniversary of Archie Cochrane’s Effectiveness and Efficiency’. This book (published by the BMJ Publishing Group) was edited by Alan Maynard and Iain Chalmers.
Iain was formerly Director of the UK Cochrane Centre, and the driving force behind the establishment of The Cochrane Collaboration; he knew Archie Cochrane well.” [Accessed February 4, 2010]. Available at: http://www.cochrane.org/docs/faq.htm#q20.

John P. A. Ioannidis. Contradicted and Initially Stronger Effects in Highly Cited Clinical Research. JAMA. 2005;294(2):218-228. Abstract: “Context: Controversy and uncertainty ensue when the results of clinical research on the effectiveness of interventions are subsequently contradicted. Controversies are most prominent when high-impact research is involved. Objectives: To understand how frequently highly cited studies are contradicted or find effects that are stronger than in other similar studies and to discern whether specific characteristics are associated with such refutation over time. Design: All original clinical research studies published in 3 major general clinical journals or high-impact-factor specialty journals in 1990-2003 and cited more than 1000 times in the literature were examined. Main Outcome Measure: The results of highly cited articles were compared against subsequent studies of comparable or larger sample size and similar or better controlled designs. The same analysis was also performed comparatively for matched studies that were not so highly cited. Results: Of 49 highly cited original clinical research studies, 45 claimed that the intervention was effective. Of these, 7 (16%) were contradicted by subsequent studies, 7 others (16%) had found effects that were stronger than those of subsequent studies, 20 (44%) were replicated, and 11 (24%) remained largely unchallenged. Five of 6 highly-cited nonrandomized studies had been contradicted or had found stronger effects vs 9 of 39 randomized controlled trials (P = .008). Among randomized trials, studies with contradicted or stronger effects were smaller (P = .009) than replicated or unchallenged studies although there was no statistically significant difference in their early or overall citation impact. Matched control studies did not have a significantly different share of refuted results than highly cited studies, but they included more studies with “negative” results. Conclusions: Contradiction and initially stronger effects are not unusual in highly cited research of clinical interventions and their outcomes. The extent to which high citations may provoke contradictions and vice versa needs more study. Controversies are most common with highly cited nonrandomized studies, but even the most highly cited randomized trials may be challenged and refuted over time, especially small ones.” [Accessed February 4, 2010]. Available at: http://jama.ama-assn.org/cgi/content/abstract/294/2/218.

Ann Evensen, Rob Sanson-Fisher, Catherine D’Este, Michael Fitzgerald. Trends in publications regarding evidence practice gaps: A literature review. Implementation Science. 2010;5(1):11. Abstract: “BACKGROUND: Well-designed trials of strategies to improve adherence to clinical practice guidelines are needed to close persistent evidence-practice gaps. We studied how the number of these trials is changing with time, and to what extent physicians are participating in such trials. METHODS: This is a literature-based study of trends in evidence-practice gap publications over 10 years and participation of clinicians in intervention trials to narrow evidence-practice gaps.
We chose nine evidence-based guidelines and identified relevant publications in the PubMed database from January 1998 to December 2007. We coded these publications by study type (intervention versus non-intervention studies). We further subdivided intervention studies into those for clinicians and those for patients. Data were analyzed to determine if observed trends were statistically significant. RESULTS: We identified 1,151 publications that discussed evidence-practice gaps in nine topic areas. There were 169 intervention studies that were designed to improve adherence to well-established clinical guidelines, averaging 1.9 studies per year per topic area. Twenty-eight publications (34%; 95% CI: 24% - 45%) reported interventions intended for clinicians or health systems that met Effective Practice and Organization of Care (EPOC) criteria for adequate design. The median consent rate of physicians asked to participate in these well-designed studies was 60% (95% CI, 25% to 69%). CONCLUSIONS: We evaluated research publications for nine evidence-practice gaps, and identified small numbers of well-designed intervention trials and low rates of physician participation in these trials.” [Accessed February 4, 2010]. Available at: http://www.implementationscience.com/content/5/1/11.

Glen Spielmans, Peter Parry. From Evidence-based Medicine to Marketing-based Medicine: Evidence from Internal Industry Documents. Journal of Bioethical Inquiry. Abstract: “While much excitement has been generated surrounding evidence-based medicine, internal documents from the pharmaceutical industry suggest that the publicly available evidence base may not accurately represent the underlying data regarding its products. The industry and its associated medical communication firms state that publications in the medical literature primarily serve marketing interests. Suppression and spinning of negative data and ghostwriting have emerged as tools to help manage medical journal publications to best suit product sales, while disease mongering and market segmentation of physicians are also used to efficiently maximize profits. We propose that while evidence-based medicine is a noble ideal, marketing-based medicine is the current reality.” [Accessed February 3, 2010]. Available at: http://freepdfhosting.com/ebaef05bfe.pdf.

Wei-Jiun Lin, Huey-Miin Hsueh, James J. Chen. Power and sample size estimation in microarray studies. BMC Bioinformatics. 2010;11(1):48. Abstract: “BACKGROUND: Before conducting a microarray experiment, one important issue that needs to be determined is the number of arrays required in order to have adequate power to identify differentially expressed genes. This paper discusses some crucial issues in the problem formulation, parameter specifications, and approaches that are commonly proposed for sample size estimation in microarray experiments. Common methods for sample size estimation are formulated as the minimum sample size necessary to achieve a specified sensitivity (proportion of detected truly differentially expressed genes) on average at a specified false discovery rate (FDR) level and specified expected proportion (pi1) of the truly differentially expressed genes in the array. Unfortunately, the probability of detecting the specified sensitivity in such a formulation can be low. We formulate the sample size problem as the number of arrays needed to achieve a specified sensitivity with 95% probability at the specified significance level.
A permutation method using a small pilot dataset to estimate sample size is proposed. This method accounts for correlation and effect size heterogeneity among genes. RESULTS: A sample size estimate based on the common formulation, to achieve the desired sensitivity on average, can be calculated using a univariate method without taking the correlation among genes into consideration. This formulation of sample size problem is inadequate because the probability of detecting the specified sensitivity can be lower than 50%. On the other hand, the needed sample size calculated by the proposed permutation method will ensure detecting at least the desired sensitivity with 95% probability. The method is shown to perform well for a real example dataset using a small pilot dataset with 4-6 samples per group. CONCLUSIONS: We recommend that the sample size problem should be formulated to detect a specified proportion of differentially expressed genes with 95% probability. This formulation ensures finding the desired proportion of true positives with high probability. The proposed permutation method takes the correlation structure and effect size heterogeneity into consideration and works well using only a small pilot dataset.” [Accessed February 1, 2010]. Available at: http://www.biomedcentral.com/1471-2105/11/48.
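Returning to the Altman paper on confidence intervals for the number needed to treat, cited earlier this month, the calculation he argues for takes only a few lines. This minimal Python sketch uses made-up event counts: it computes the absolute risk reduction, a conventional 95% confidence interval for it, and then inverts the interval limits to obtain the interval for the NNT, which is the general approach Altman describes.

    # Minimal sketch of a confidence interval for the number needed to treat (NNT),
    # found by inverting the confidence interval for the absolute risk reduction.
    # The event counts are hypothetical and assume the point estimate favours treatment.
    import math

    events_treated, n_treated = 12, 100   # events and sample size, treated group
    events_control, n_control = 24, 100   # events and sample size, control group

    p_t = events_treated / n_treated
    p_c = events_control / n_control
    arr = p_c - p_t                       # absolute risk reduction

    se = math.sqrt(p_t * (1 - p_t) / n_treated + p_c * (1 - p_c) / n_control)
    arr_lo, arr_hi = arr - 1.96 * se, arr + 1.96 * se

    print(f"ARR = {arr:.3f} (95% CI {arr_lo:.3f} to {arr_hi:.3f}); NNT = {1 / arr:.1f}")

    if arr_lo > 0:
        # The ARR interval excludes zero, so the NNT interval is simply the
        # reciprocals of the limits (the larger ARR limit gives the smaller NNT).
        print(f"95% CI for NNT: {1 / arr_hi:.1f} to {1 / arr_lo:.1f}")
    else:
        # The ARR interval spans zero: the NNT interval passes through infinity,
        # running from a number needed to benefit out to a number needed to harm.
        print(f"95% CI for NNT: NNTB {1 / arr_hi:.1f} to infinity to NNTH {1 / -arr_lo:.1f}")

With these made-up numbers the risk difference of 0.12 gives an NNT of about 8, with a 95% interval from roughly 4 to 68. Altman's central point is the second branch: when the risk difference is not statistically significant, the interval has to be reported as running from a number needed to benefit through infinity to a number needed to harm, rather than being suppressed.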
January 2010
Dariusz Leszczynski, Zhengping Xu. Mobile phone radiation health risk controversy: the reliability and sufficiency of science behind the safety standards. Health Research Policy and Systems. 2010;8(1):2. Abstract: “There is ongoing discussion whether the mobile phone radiation causes any health effects. The International Commission on Non-Ionizing Radiation Protection, the International Committee on Electromagnetic Safety and the World Health Organization are assuring that there is no proven health risk and that the present safety limits protect all mobile phone users. However, based on the available scientific evidence, the situation is not as clear. The majority of the evidence comes from in vitro laboratory studies and is of very limited use for determining health risk. Animal toxicology studies are inadequate because it is not possible to “overdose” microwave radiation, as it is done with chemical agents, due to simultaneous induction of heating side-effects. There is a lack of human volunteer studies that would, in unbiased way, demonstrate whether human body responds at all to mobile phone radiation. Finally, the epidemiological evidence is insufficient due to, among others, selection and misclassification bias and the low sensitivity of this approach in detection of health risk within the population. This indicates that the presently available scientific evidence is insufficient to prove reliability of the current safety standards. Therefore, we recommend to use precaution when dealing with mobile phones and, whenever possible and feasible, to limit body exposure to this radiation. Continuation of the research on mobile phone radiation effects is needed in order to improve the basis and the reliability of the safety standards.” [Accessed February 1, 2010]. Available at: http://www.health-policy-systems.com/content/8/1/2.

O Thomas, L Thabane, J Douketis, et al. Industry funding and the reporting quality of large long-term weight loss trials. Int J Obes. 2008;32(10):1531-1536. Description: This article does not have full free text available, so I can only comment on the abstract. It appears that industry funded studies tend to adhere more closely to the CONSORT reporting guidelines. I suspect that peer-reviewers are more cautious with industry funded studies and demand more detailed reporting of results. The conclusion in the abstract, “Our findings suggest that the efforts to improve reporting quality be directed to all obesity RCTs, irrespective of funding source,” seems to suggest that peer reviewers need to hold unfunded studies to the same standards as funded studies. [Accessed January 26, 2010]. Available at: http://dx.doi.org/10.1038/ijo.2008.137.

Harriette G. C. Van Spall, Andrew Toren, Alex Kiss, Robert A. Fowler. Eligibility Criteria of Randomized Controlled Trials Published in High-Impact General Medical Journals: A Systematic Sampling Review. JAMA. 2007;297(11):1233-1240. Abstract: “Context: Selective eligibility criteria of randomized controlled trials (RCTs) are vital to trial feasibility and internal validity. However, the exclusion of certain patient populations may lead to impaired generalizability of results. Objective: To determine the nature and extent of exclusion criteria among RCTs published in major medical journals and the contribution of exclusion criteria to the representation of certain patient populations.
Data Sources and Study Selection: The MEDLINE database was searched for RCTs published between 1994 and 2006 in certain general medical journals with a high impact factor. Of 4827 articles, 283 were selected using a series technique. Data Extraction: Trial characteristics and the details regarding exclusions were extracted independently. All exclusion criteria were graded independently and in duplicate as either strongly justified, potentially justified, or poorly justified according to previously developed and pilot-tested guidelines. Data Synthesis: Common medical conditions formed the basis for exclusion in 81.3% of trials. Patients were excluded due to age in 72.1% of all trials (60.1% in pediatric populations and 38.5% in older adults). Individuals receiving commonly prescribed medications were excluded in 54.1% of trials. Conditions related to female sex were grounds for exclusion in 39.2% of trials. Of all exclusion criteria, only 47.2% were graded as strongly justified in the context of the specific RCT. Exclusion criteria were not reported in 12.0% of trials. Multivariable analyses revealed independent associations between the total number of exclusion criteria and drug intervention trials (risk ratio, 1.35; 95% confidence interval, 1.11-1.65; P = .003) and between the total number of exclusion criteria and multicenter trials (risk ratio, 1.26; 95% confidence interval, 1.06-1.52; P = .009). Industry-sponsored trials were more likely to exclude individuals due to concomitant medication use, medical comorbidities, and age. Drug intervention trials were more likely to exclude individuals due to concomitant medication use, medical comorbidities, female sex, and socioeconomic status. Among such trials, justification for exclusions related to concomitant medication use and comorbidities were more likely to be poorly justified. Conclusions: The RCTs published in major medical journals do not always clearly report exclusion criteria. Women, children, the elderly, and those with common medical conditions are frequently excluded from RCTs. Trials with multiple centers and those involving drug interventions are most likely to have extensive exclusions. Such exclusions may impair the generalizability of RCT results. These findings highlight a need for careful consideration and transparent reporting and justification of exclusion criteria in clinical trials.” [Accessed January 15, 2010]. Available at: http://jama.ama-assn.org/cgi/content/abstract/297/11/1233.

David Leonhardt. Making Health Care Better. The New York Times. November 8, 2009. Description: This article profiles Brent James, chief quality officer at Intermountain Health Care, and his pioneering efforts to rigorously apply evidence based medicine principles. It highlights some of the quality improvement initiatives at Intermountain and documents the resistance to change among many doctors at Intermountain. [Accessed January 14, 2010]. Available at: http://www.nytimes.com/2009/11/08/magazine/08Healthcare-t.html.

Patrick Burns. R Relative to Statistical Packages: Comment 1 on Technical Report Number 1 (Version 1.0) Strategically using General Purpose Statistics Packages: A Look at Stata, SAS and SPSS. Excerpt: “The technical report Strategically using General Purpose Statistics Packages: A Look at Stata, SAS and SPSS focuses on comparing strengths and weaknesses of SAS, SPSS and Stata. There is a section on R, which some have suspected damns R with faint praise. In particular, R is characterized as hard to learn.
Finally there are sections on a number of very specialized pieces of statistical software. The primary purpose of this comment is to provide an alternative view of the role that R has in the realm of statistical software.” [Accessed January 14, 2010]. Available at: http://www.ats.ucla.edu/stat/technicalreports/Number1/R_relative_statpack.pdf.

Michael N. Mitchell. Strategically using General Purpose Statistics Packages: A Look at Stata, SAS and SPSS. Abstract: “This report describes my experiences using general purpose statistical software over 20 years and for over 11 years as a statistical consultant helping thousands of UCLA researchers. I hope that this information will help you make strategic decisions about statistical software – the software you choose to learn, and the software you choose to use for analyzing your research data.” [Accessed January 14, 2010]. Available at: http://www.ats.ucla.edu/stat/technicalreports/number1_editedFeb_2_2007/ucla_ATSstat_tr1_1.1_0207.pdf.

G. David Garson. StatNotes: Topics in Multivariate Analysis. Description: This is a general purpose textbook, written in discrete sections in html format. It covers more than just multivariate analysis. [Accessed January 14, 2010]. Available at: http://faculty.chass.ncsu.edu/garson/PA765/statnote.htm.

Robert Muenchen. R-SAS-SPSS Add-on Module Comparison. Excerpt: “R has over 3,000 add-on packages, many containing multiple procedures, so it can do most of the things that SAS and SPSS can do and quite a bit more. The table below focuses only on SAS and SPSS products and which of them have counterparts in R. As a result, some categories are extremely broad (e.g. regression) while others are quite narrow (e.g. conjoint analysis). This table does not contain the hundreds of R packages that have no counterparts in the form of SAS or SPSS products. There are many important topics (e.g. mixed models) offered by all three that are not listed because neither SAS Institute nor IBM’s SPSS Company sell a product focused just on that.” [Accessed January 14, 2010]. Available at: http://r4stats.com/add-on-modules.

G. David Garson. Reliability Analysis: Statnotes, from North Carolina State University, Public Administration Program. Excerpt: “Researchers must demonstrate instruments are reliable since without reliability, research results using the instrument are not replicable, and replicability is fundamental to the scientific method. Reliability is the correlation of an item, scale, or instrument with a hypothetical one which truly measures what it is supposed to. Since the true instrument is not available, reliability is estimated in one of four ways: 1. Internal consistency: Estimation based on the correlation among the variables comprising the set (typically, Cronbach’s alpha). 2. Split-half reliability: Estimation based on the correlation of two equivalent forms of the scale (typically, the Spearman-Brown coefficient). 3. Test-retest reliability: Estimation based on the correlation between two (or more) administrations of the same item, scale, or instrument for different times, locations, or populations, when the two administrations do not differ on other relevant variables (typically, the Spearman Brown coefficient). 4. Inter-rater reliability: Estimation based on the correlation of scores between/among two or more raters who rate the same item, scale, or instrument (typically, intraclass correlation, of which there are six types discussed below).
These four reliability estimation methods are not necessarily mutually exclusive, nor need they lead to the same results. All reliability coefficients are forms of correlation coefficients, but there are multiple types discussed below, representing different meanings of reliability and more than one might be used in a single research setting.” [Accessed January 1, 2010]. Available at: http://faculty.chass.ncsu.edu/garson/PA765/reliab.htm.
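Two of the reliability estimates in Garson's list are simple enough to compute directly. The following minimal Python sketch uses a small made-up matrix of item scores (six respondents, four items); it computes Cronbach's alpha from the item and total-score variances, and a split-half estimate stepped up with the Spearman-Brown formula. The statistics.correlation function requires Python 3.10 or later.

    # Minimal sketch of two reliability estimates: Cronbach's alpha and
    # Spearman-Brown split-half reliability. The item scores are made up.
    import statistics as st

    scores = [          # rows = respondents, columns = items (hypothetical data)
        [4, 5, 4, 3],
        [2, 3, 3, 2],
        [5, 5, 4, 5],
        [3, 3, 2, 3],
        [4, 4, 5, 4],
        [1, 2, 2, 1],
    ]
    k = len(scores[0])  # number of items

    item_vars = [st.variance(col) for col in zip(*scores)]
    total_var = st.variance([sum(row) for row in scores])
    alpha = k / (k - 1) * (1 - sum(item_vars) / total_var)   # Cronbach's alpha

    # Split-half: correlate two half-scales, then step up with Spearman-Brown.
    half1 = [row[0] + row[2] for row in scores]   # items 1 and 3
    half2 = [row[1] + row[3] for row in scores]   # items 2 and 4
    r = st.correlation(half1, half2)
    split_half = 2 * r / (1 + r)                  # Spearman-Brown prophecy formula

    print(f"Cronbach's alpha = {alpha:.2f}")
    print(f"Split-half reliability (Spearman-Brown) = {split_half:.2f}")

As the excerpt notes, these estimates need not agree: alpha summarizes the average inter-item relationship across all items, while the split-half figure depends on how the items happen to be divided into halves.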
You can find an earlier version of this page on my website.