Andrew Gelman and Christian Hennig; summarised by Finn Lindgren
22/05/2017
Personal decision making cannot be avoided in statistical data analysis, and for want of approaches to justify such decisions, the pursuit of objectivity degenerates easily to a pursuit to merely appear objective.
Scientists whose methods are branded as subjective have the awkward choice of either saying, “No, we are really objective,” or else embracing the subjective label and turning it into a principle. The temptation is high to avoid this dilemma by hiding researcher degrees of freedom from the public unless they can be made to appear “objective.”
Such attitudes about objectivity and subjectivity can be an obstacle to good practice in data analysis and its communication, and we believe that researchers can be guided in a better way by a list of more specific scientific virtues when choosing and justifying their approaches.
One problem is that the terms “objective” and “subjective” are loaded with so many associations and are often used in a mixed descriptive/normative way.
For example, a statistical method that does not require the specification of any tuning parameters is objective in a descriptive sense (it does not require decisions by the individual scientist).
Often this is presented as an advantage of the method without further discussion, implying objectivity as a norm, […] [but] the analyst must make the decision of whether to use an auto-tuned approach in a setting where its inferences do not appear to make sense.
The frequentist interpretation of probability is objective in the sense that it locates probabilities in an objective world that exists independently of the observer, but the definition of these probabilities requires a subjective definition of a reference set.
Some proponents of frequentism consider its objectivity (in the sense of impersonality, conditional on the definition of the reference set) as a virtue, but this property is ultimately only descriptive; it does not imply on its own that such probabilities indeed exist in the objective world, nor that they are a worthwhile target for scientific inquiry.
The confusion [about the subjectivity of Bayesian prior distributions] arises from two directions: first, prior distributions are not necessarily any more subjective than other aspects of a statistical model; indeed, in many applications priors can be and are estimated from data frequencies […].
Second, somewhat arbitrary choices come into many aspects of statistical models, Bayesian and otherwise, and therefore we think it is a mistake to consider the prior distribution as the exclusive gate at which subjectivity enters a statistical procedure.
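To make the first point concrete, here is a minimal sketch of a prior estimated from data frequencies: a crude empirical-Bayes moment-matching fit of a Beta prior to binomial counts. The data, the number of units, and the moment-matching recipe are illustrative assumptions for this sketch, not part of the original text.

```python
import numpy as np

# Hypothetical data: successes y[i] out of n[i] trials for many similar units
# (e.g., rates across groups). All numbers here are purely illustrative.
rng = np.random.default_rng(1)
n = np.full(50, 200)                        # 200 trials per unit
true_rates = rng.beta(4.0, 16.0, size=50)   # unknown in a real application
y = rng.binomial(n, true_rates)

# Crude moment-matching fit of a Beta(a, b) prior to the observed frequencies:
# the prior is estimated from data frequencies rather than elicited as a belief.
# (This simple version ignores the binomial sampling noise in each frequency.)
p = y / n
m, v = p.mean(), p.var(ddof=1)
s = m * (1 - m) / v - 1        # a + b for a Beta with mean m and variance v
a, b = m * s, (1 - m) * s

# The fitted prior shrinks each raw frequency toward the overall mean.
posterior_mean = (y + a) / (n + a + b)
print(f"Estimated prior: Beta({a:.1f}, {b:.1f})")
print("Raw vs shrunken estimate, first unit:",
      round(p[0], 3), round(posterior_mean[0], 3))
```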
It is […] seen as desirable that any required data-analytic decisions or tuning are performed in an objective manner, either determined somehow from the data or justified by some kind of optimality argument.
On the other hand, practitioners must apply their subjective [judgement] in the choice of what method to use, what assumptions to invoke, and what data to include in their analyses. Even using “no need for tuning” as a criterion for method selection, or prioritizing, for example, bias or mean squared error, is a subjective decision.
In the context of statistical analysis, a key aspect of objectivity is therefore a process of transparency, in which the choices involved are justified based on external, potentially verifiable sources or at least transparent considerations (ideally accompanied by sensitivity analyses if such considerations leave alternative options open), a sort of “paper trail” leading from external information, through modeling assumptions and decisions about statistical analysis, all the way to inferences and decision recommendations.
The current push by some journals to share data and computer code, and the advent of tools such as GitHub and version control to better organize code and projects, go in this direction.
Transparency also comprises spelling out explicit and implicit assumptions about the data production, some of which may be unverifiable.
Science aims at stable consensus […], which is one reason that the current crisis of non-replication is taken so seriously in psychology (Yong, 2012). Transparency contributes to this building of consensus by allowing scholars to trace the sources and information used in statistical reasoning […]. Furthermore, scientific consensus, as far as it deserves to be called “objective,” requires rationales, clear arguments, and motivation, along with elucidation of how this relates to already existing knowledge.
Following generally accepted rules and procedures counters the dependence of results on the personalities of individual researchers, although there is always a danger that such generally accepted rules and procedures are inappropriate or suboptimal for the specific situation at hand.
For such reasons, one might question the inclusion of consensus as a virtue. Its importance stems from the impossibility of accessing observer-independent reality, which means that exchange between observers is necessary to establish what can be taken as real and stable. Consensus cannot be enforced; as a virtue it refers to behavior that facilitates consensus.
In any case, consensus can only be achieved if researchers attempt to be impartial by taking competing perspectives into account, not favoring pre-chosen hypotheses, and being open to criticism. In the context of epidemiology, Greenland (2012) proposes transparency and neutrality as replacements for objectivity.
As statisticians we are concerned with making general statements based on systematized observations, and this makes correspondence to observed reality a core concern regarding objectivity. This is not meant to imply that empirical statements about observations are the only meaningful ones that can be made about reality;
we think that scientific theories that cannot be verified (but can potentially be falsified) by observations are meaningful thought constructs, particularly because observations are never “pure” and truly independent of thought constructs. Certainly in some cases the measurements, i.e., the observations the statistician deals with, require critical scrutiny before discussing any statistical analysis of them […].
A counterproductive implication of the idea that science should be “objective” is that there is a tendency in the communication of statistical analyses to either avoid or hide decisions that cannot be made in an automatic, seemingly “objective” fashion by the available data. Given that all observations of reality depend on the perspective of an observer, interpreting science as striving for a unique (“objective”) perspective is illusory. Multiple perspectives are a reality to be reckoned with and should not be hidden.
Furthermore, by avoiding personal decisions, researchers often waste opportunities to adapt their analyses appropriately to the context, the specific background and their specific research aims, and to communicate their perspective more clearly. Therefore we see the acknowledgment of multiple perspectives and context dependence as virtues, making clearer in which sense subjectivity can be productive and helpful.
To connect with the other part of our proposal, the recognition of different perspectives should be done in a transparent way. We should not say we set a tuning parameter to 2.5 (say) just because that is our belief. Rather, we should justify the choice by explaining clearly how it supports the research aims.
[…] often it can be argued that even then conscious tuning or specification of a prior distribution comes with benefits compared to using default methods, whose main attraction is often that seemingly “subjective” decisions can be avoided.
To consider an important example, regularization requires such decisions. Default priors on regression coefficients are used to express the belief that coefficients are typically close to zero, and from a non-Bayesian perspective, lasso shrinkage can be interpreted as encoding an external assumption of sparsity.
Sparsity assumptions can be connected to an implicit or explicit model in which problems are in some sense being sampled from some distribution or probability measure of possible situations; see Section 5.5. This general perspective (which can be seen as Bayesian with an implicit prior on states of nature, or classical with an implicit reference set for the evaluation of statistical procedures) provides a potential basis to connect choices to experience; at least it makes transparent what kind of view of reality is encoded in the choices.
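To spell out the lasso–prior connection mentioned above as a sketch (a standard correspondence, stated here in one conventional form rather than as the authors' own formulation): for a Gaussian regression model $y \mid X, \beta, \sigma^2 \sim \mathrm{N}(X\beta, \sigma^2 I)$ with independent Laplace priors $p(\beta_j) \propto \exp(-\lvert\beta_j\rvert/b)$, the posterior mode coincides with the lasso estimate,

$$
\hat{\beta} \;=\; \arg\max_{\beta}\Big\{\log p(y \mid X,\beta,\sigma^2) + \textstyle\sum_j \log p(\beta_j)\Big\}
\;=\; \arg\min_{\beta}\Big\{\tfrac{1}{2}\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_1\Big\},
\qquad \lambda = \sigma^2/b,
$$

so the sparsity-encouraging penalty and the prior encode the same external assumption in different vocabularies.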
Stability can refer to reproducibility of conclusions on new data, or to alternative analyses of the same data that make different choices regarding, for example, tuning constants, Bayesian priors, transformations, resampling, or the removal of outliers, or that even use completely different methodology, as long as these aim at investigating the same issue (alternative analyses that can be interpreted as doing something essentially different cannot be expected to deliver similar results). On the most basic (but not always trivial) level, the same analysis of the same data should be replicable by different researchers. In statistical theory, the study of basic variability under an assumed parametric model can be augmented by robustness against various violations of the model assumptions and by Bayesian sensitivity analysis.
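As a minimal illustration of this kind of stability check (the data and the trimming rules below are arbitrary choices made only for the sketch), one can re-run the same simple analysis under alternative defensible decisions and report how much the conclusion moves:

```python
import numpy as np

# Hypothetical skewed sample with a few extreme values; purely illustrative.
rng = np.random.default_rng(2)
x = np.concatenate([rng.lognormal(0.0, 0.5, size=200),
                    rng.lognormal(0.0, 2.5, size=5)])

# Re-run "the same" analysis (estimating a typical value) under alternative,
# equally defensible choices: different outlier-trimming rules and estimators.
for trim in (0.0, 0.01, 0.05):
    lo, hi = np.quantile(x, [trim, 1 - trim])
    kept = x[(x >= lo) & (x <= hi)]
    print(f"trim={trim:.2f}:  mean={kept.mean():.3f}  median={np.median(kept):.3f}")

# Large swings across rows would signal that conclusions hinge on these choices,
# in which case the sensitivity itself should be reported.
```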
For another example, the binomial-data confidence interval based on (y + 2)/(n + 4) gives better coverage than the classical interval based on y/n (Agresti and Coull, 1998). Whereas the latter has a straightforward justification, the former is based on trading interval width against conservatism and involves some approximation and simplification, which the authors justify by the fact that the resulting formula can be presented in elementary courses.
Debating whether this is more subjective than the classical approach, and whether this is a problem, is not helpful. Similarly, when comparing Bayesian estimates of public opinion using multilevel regression and poststratification to taking raw survey means (which indeed correspond to Bayesian analyses under unreasonable flat priors), it is irrelevant which is considered more subjective.
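A small simulation makes the coverage comparison concrete; the sample size, true proportion, and nominal level below are illustrative choices for this sketch, not values taken from the original discussion:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, z = 10, 0.1, 1.96          # small n and small true proportion p (illustrative)
y = rng.binomial(n, p, size=100_000)

# Classical (Wald) interval based on y/n
p_hat = y / n
wald_half = z * np.sqrt(p_hat * (1 - p_hat) / n)
wald_cover = np.mean((p_hat - wald_half <= p) & (p <= p_hat + wald_half))

# Adjusted interval based on (y + 2)/(n + 4) ("add two successes and two failures")
p_tilde = (y + 2) / (n + 4)
adj_half = z * np.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
adj_cover = np.mean((p_tilde - adj_half <= p) & (p <= p_tilde + adj_half))

print(f"Wald coverage:     {wald_cover:.3f}")   # typically well below 0.95 here
print(f"Adjusted coverage: {adj_cover:.3f}")    # typically much closer to 0.95
```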
In summary, the virtues proposed to replace “objectivity” and “subjectivity” are:

Transparency: choices and assumptions in the analysis are spelled out and justified by external, potentially verifiable sources or at least transparent considerations, leaving a traceable “paper trail” from information to inferences.
Consensus: behavior that facilitates a stable scientific consensus, relating the analysis to existing knowledge through clear rationales, arguments, and motivation.
Impartiality: taking competing perspectives into account, not favoring pre-chosen hypotheses, and remaining open to criticism.
Correspondence to observable reality: statistical statements are connected to, and can in principle be checked against, systematized observations.
Awareness of multiple perspectives and awareness of context dependence: acknowledging that analyses depend on the observer's perspective, the specific context, and the research aims, and communicating this openly rather than hiding it.
Investigation of stability: examining whether conclusions hold up on new data and under alternative reasonable analysis choices.