This blog is related to the book "The Search for Certainty. On the Clash of Science and Philosophy of Probability" by Krzysztof Burdzy (World Scientific, 2009). See the book description and sample chapters, and buy the book online.
April 10, 2010
Statistical time capsules
Statisticians may help their colleagues working in the distant future evaluate currently used statistical methods. One way to do that would be to include explicit isolated predictions in published results of statistical analysis. Typical results of a statistical analysis include a large number of probabilistic statements, implicit or explicit. A good example is the posterior distribution in Bayesian analysis. A posterior distribution amounts to a large number of probability values. A prediction is an event of high probability. One should choose one or at most a handful of predictions from among all events of high probability present in the results of each case of statistical analysis. The prediction should be concerned with an event that is significant to the users of statistics. It should also be an event whose occurrence (or non-occurrence) is likely to be known in the future. Some events (or their complements), such as war or death, are likely to be unambiguously observable. Some other events, such as the value of a physical quantity falling outside a confidence interval, may be observable in the future because of technological progress.
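To make the idea of picking a single prediction out of a posterior distribution concrete, here is a minimal sketch in Python. Everything specific in it is an assumption made for illustration and is not taken from the book: the binomial model, the uniform prior on a grid, the made-up data (30 successes in 100 trials), the 0.99 threshold, and the function names `grid_posterior` and `pick_prediction`. It computes a discrete posterior and then selects one high-probability event, namely the narrowest interval that carries at least 99% of the posterior mass.

```python
def grid_posterior(successes, trials, grid_size=1001):
    """Discrete posterior for a success probability under a uniform grid prior.

    Hypothetical helper: the binomial model and the grid are illustrative
    assumptions, not a method prescribed by the post.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Binomial likelihood up to a constant; the constant cancels on normalizing.
    weights = [p ** successes * (1 - p) ** (trials - successes) for p in grid]
    total = sum(weights)
    return grid, [w / total for w in weights]


def pick_prediction(grid, posterior, level=0.99):
    """Return (low, high, mass) for the narrowest grid interval whose
    posterior probability is at least `level`."""
    best = None
    mass = 0.0
    lo = 0
    for hi in range(len(grid)):
        mass += posterior[hi]
        # Drop points from the left while the interval keeps enough mass.
        while mass - posterior[lo] >= level:
            mass -= posterior[lo]
            lo += 1
        if mass >= level:
            width = grid[hi] - grid[lo]
            if best is None or width < best[2]:
                best = (grid[lo], grid[hi], width, mass)
    return best[0], best[1], best[3]


# Made-up data for illustration: 30 successes observed in 100 trials.
grid, posterior = grid_posterior(successes=30, trials=100)
low, high, mass = pick_prediction(grid, posterior, level=0.99)
print(f"Prediction: the true rate lies between {low:.3f} and {high:.3f} "
      f"(posterior probability {mass:.4f})")
```

The printed sentence is the kind of single, explicit statement that could be placed in a report, provided the quantity in question is likely to become observable later.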
Inserting a prediction in a statistical report would be similar to other activities in which people leave information for future generations or for other, perhaps unknown, people. Examples include time capsules, to be opened hundreds of years from now, that will help future historians and archaeologists. They also include name tags worn by patients at hospitals and by military personnel, to help identify people in case of death.
Explicit predictions inserted in statistical reports would help future generations of statisticians to evaluate various statistical methods. Statisticians can already revisit old statistical reports and compare them to currently known facts. But a typical statistical report contains a multitude of implicit predictions, so the analysis of past predictions may be biased by the present statistician's choice of which past predictions to confront with the currently known facts. Inserting a single explicit prediction in a statistical report would remove a bit of subjectivity from the analysis of the results of a single case of statistical analysis.
I frankly do not see the point in this physical implementation of prediction based on statistical techniques. Under the assumption that the model is correct, expectations can be computed, hence expected errors can be produced, without having to wait for a long range of physical outcomes to calibrate the method under scrutiny. If the model is incorrect, all methods eventually fail.
I should have said that my original blog post was related to Section 6.4 of my book, titled "Experimental statistics - a missing science". Perhaps I should have used the expression "archaeological statistics" rather than "experimental statistics". I got the idea for "experimental" from "experimental physics".
A good way to explain the idea of Section 6.4 is to consider DDT - an infamous pesticide. DDT had been investigated in laboratories and tested in practice before being widely used in agriculture. But it took a long time before its true environmental impact was recognized.
Applied statistics is embedded in applied science because applied scientists use statistics to analyze their data. For this reason, the time that it may take to see the real impact of applied statistical methods is on the same scale as the time that it takes to understand the consequences of specific methods of applied science. There is nothing wrong with short-term checks of statistical methods, such as computer simulations. But everybody understands that a computer simulation of a nuclear explosion is not the same as a nuclear explosion. Once again, the DDT case is a good real-life example.
It would be very tedious for statistical archaeologists to go over thousands of applications of statistics published 30 or 50 years ago and compare them to currently available knowledge in a given field of science. So my proposal for the statistical time capsule is a crude but potentially helpful idea to simplify the process for future statistical archaeologists. A time capsule would allow them to check only one clear prediction per case of applied statistical analysis.