2017-03-21 MedSci (MedSci original)
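To this end, a team of statisticians, including Carnegie Mellon University's Robert E. Kass, wrote the paper “Ten Simple Rules for Effective Statistical Practice,” published in the “Ten Simple Rules” series of PLOS Computational Biology. The goal is to help the research community, especially scientists who are not statistical experts and research teams without dedicated statisticians, understand how to avoid statistical pitfalls and unsound statistical reasoning.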
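“As researchers, our central work and daily task is to communicate with data, deciphering what the data mean for the questions we want to answer,” wrote Kass, a professor of statistics and machine learning who is also interim director of the Center for the Neural Basis of Cognition.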
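The rules were published online on June 9, 2016, and have already drawn a great deal of attention. With more than 37,000 page views so far, the paper is among the 20 most-viewed articles in the series, which comprises 60 papers in total. Michael J. Tarr, head of Carnegie Mellon University's Department of Psychology, is not at all surprised by the attention.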
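① Statistical methods should enable data to answer scientific questions
② Signals always come with noise
③ Plan ahead, really ahead
④ Worry about data quality
⑤ Statistical analysis is more than a set of computations
⑥ Keep it simple
⑦ Provide assessments of variability
⑧ Check your assumptions
⑨ When possible, replicate
⑩ Make your analysis reproducible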
A summary of the 10 rules:
#1 – Statistical Methods Should Enable Data to Answer Scientific Questions
Collaborating with statisticians is often most helpful early in an investigation because inexperienced users of statistics often focus on which technique to use to analyze data, rather than considering all of the ways the data may answer the underlying scientific question.
#2 – Signals Always Come With Noise
Variability comes in many forms, but it is crucial to understand when it is good and when it is noise in order to express uncertainty. It also helps to identify likely sources of systematic error.
#3 – Plan Ahead, Really Ahead
Asking questions at the design stage can save headaches at the analysis stage. Careful data collection also can greatly simplify analysis and make it more rigorous.
#4 – Worry About Data Quality
When it comes to data analysis, “garbage in produces garbage out.” The complexity of modern data collection requires many assumptions about the function of technology, often including data pre-processing technology, which can have profound effects that can easily go unnoticed.
#5 – Statistical Analysis Is More Than a Set of Computations
Statistical software provides tools to assist analyses, not to define them. The scientific context is critical, and the key to principled statistical analysis is to bring analytical methods into close correspondence with scientific questions.
#6 – Keep it Simple
Simplicity trumps complexity. Large numbers of measurements, interactions among explanatory variables, nonlinear mechanisms of action, missing data, confounding, sampling biases and other factors can require an increase in model complexity. But keep in mind that a good design, implemented well, can often allow simple methods of analysis to produce strong results.
#7 – Provide Assessments of Variability
A basic purpose of statistical analysis is to help assess uncertainty, often in the form of a standard error or confidence interval, and one of the great successes of statistical modeling and inference is that it can provide estimates of standard errors from the same data that produce estimates of the quantity of interest. When reporting results, it is essential to supply some notion of statistical uncertainty.
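As a minimal sketch of the idea in Rule 7, the snippet below estimates a mean and, from the same data, its standard error and an approximate 95% confidence interval. The function name `mean_with_uncertainty`, the sample values, and the normal approximation (z = 1.96) are assumptions made for illustration; they do not come from the paper.

```python
import math
import statistics

def mean_with_uncertainty(values, z=1.96):
    """Return (mean, standard_error, (ci_low, ci_high)) for a sample.

    Assumption: a normal approximation with z = 1.96 gives roughly 95%
    coverage. For small samples a t quantile would be more appropriate.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    se = statistics.stdev(values) / math.sqrt(n)
    return mean, se, (mean - z * se, mean + z * se)

# Hypothetical measurements, invented purely for illustration.
sample = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3]
mean, se, (lo, hi) = mean_with_uncertainty(sample)
print(f"mean = {mean:.3f}, SE = {se:.3f}, approx. 95% CI = ({lo:.3f}, {hi:.3f})")
```

Reporting the interval alongside the point estimate is one simple way to supply the "notion of statistical uncertainty" the rule asks for.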
#8 – Check Your Assumptions
Widely available statistical software makes it easy to perform analyses without careful attention to inherent assumptions, and this risks inaccurate, or even misleading, results. It is therefore important to understand the assumptions embodied in the methods and to do whatever possible to understand and assess those assumptions.
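One way to act on Rule 8 is to look at residuals after fitting a model. The sketch below fits a straight line to deliberately curved, made-up data and counts sign runs in the residuals; very few runs hint that the linearity assumption is violated. The helper names (`fit_line`, `residuals`, `sign_runs`) and the data are hypothetical, and the run count is a crude diagnostic rather than a formal test.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def residuals(xs, ys):
    """Observed minus fitted values for the least-squares line."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def sign_runs(values):
    """Count runs of consecutive residuals that share the same sign."""
    signs = [v > 0 for v in values]
    return 1 + sum(a != b for a, b in zip(signs, signs[1:]))

# Made-up data with an obvious curve, so a straight line is the wrong model.
xs = list(range(10))
ys = [x ** 2 for x in xs]
res = residuals(xs, ys)
print("residual sign runs:", sign_runs(res))
```

Here the residuals are positive at both ends and negative in the middle, a systematic pattern that a well-specified model would not leave behind.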
#9 – When Possible, Replicate!
Ideally, replication is performed by an independent investigator. The scientific results that stand the test of time are those that get confirmed across a variety of different, but closely related, situations. In many contexts, complete replication is very difficult or impossible, as in large-scale experiments such as multi-center clinical trials. In those cases, a minimum standard would be to follow Rule 10.
#10 – Make Your Analysis Reproducible
Given the same set of data, together with a complete description of the analysis, it should be possible to reproduce the tables, figures and statistical inferences. The ability to reproduce findings improves dramatically when researchers are very systematic about the steps in the analysis, share the data and code used to produce the results, and follow accepted statistical best practices.
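A small sketch of what "systematic" can look like in practice: fix the random seed, fingerprint the input data, and record both next to the result so a rerun can be compared exactly. The function names, the seed value, and the data are assumptions for illustration only; the paper does not prescribe a particular tool or workflow.

```python
import hashlib
import json
import random
import statistics

def data_fingerprint(values):
    """SHA-256 digest of the data, so a rerun can confirm identical input."""
    payload = json.dumps(values).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def bootstrap_se(values, n_resamples=1000, seed=12345):
    """Bootstrap standard error of the mean with a fixed, recorded seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    means = [
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    ]
    return statistics.stdev(means)

# Hypothetical data, invented for illustration.
data = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3]
report = {
    "data_sha256": data_fingerprint(data),
    "seed": 12345,
    "n_resamples": 1000,
    "bootstrap_se": bootstrap_se(data),
}
print(json.dumps(report, indent=2))
```

Because the seed and the data digest are stored with the result, anyone holding the same data and script can check that they obtain the same number.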
In addition to Kass, the co-authors are Johns Hopkins University’s Brian S. Caffo, North Carolina State University’s Marie Davidian, Harvard University’s Xiao-Li Meng, Bin Yu of the University of California, Berkeley, and Nancy Reid of the University of Toronto.
“I am a big believer in the value of identifying major ideas in statistics, and stating them clearly and concisely,” Kass said. “The 10 simple rules series is terrific, having proven its worth as a format for high-level scientific concepts. This article was pretty hard work, but we had a great team and I was extremely happy with the result.”