Dimensions

Honesty in Statistics

Stan Miller

"You can prove anything with statistics." But can you really? Perhaps a more accurate expression would be: "You can assert anything with statistics." Proving something and asserting something are two different claims.

After the most recent U.S. presidential election, supporters of the winning candidate claimed with fervor that more votes were cast for their man than were cast for either of the other two candidates. Conclusion: the country was strongly behind the winner. Opposition supporters claimed that more than half of the votes were cast for candidates other than the victor; that is, the winner received less than 50% of all votes. Conclusion: more than half the country did not support the winner. So, we must discern carefully.

Although the term "statistics" usually means a collection of data, numerical information, or facts, mathematical statistics is the science that carefully analyzes such data and examines the inferences involved in that process. Though cultures dating back to the ancient Babylonians and Egyptians gathered and tabulated data on trading commodities, wealth, and population, we see little in the way of statistical inference until the modern formulation of probability theory in the early nineteenth century.

One of the first to push the field of statistics from mere data tabulation to the broader realm of inference was the German mathematician Carl Friedrich Gauss, for whom the Gaussian, or normal, probability distribution (the bell-shaped curve) is named. Many of the mathematical tools used by modern statisticians, including statistical "confidence intervals" and hypothesis testing, are based on the Gaussian distribution.

In a sense, statistical tools provide a means to measure and characterize a population that is too large or complex to understand completely. When someone uses a survey (sample) to estimate the proportion of a population having a given attribute, e.g., the proportion of people favoring a specified social policy or favoring one brand of soft drink, such an estimate is not perfectly accurate but has some error associated with it. This error is caused primarily by limited sampling and by inherent variations in the population under study. We quantify this error by error bounds (sometimes called the "margin of error") or by confidence intervals (CI). A confidence interval is a range of values around the estimate, reported together with a confidence level that is expressed as a percentage. To continue the proportion examples given above, a 95% CI would indicate that in repeated surveys of the population, the true proportion of the population would be contained in the confidence interval in 95% of the surveys, though the exact proportion could be known only by surveying the entire population. Of course, one can reduce the margin of error by increasing the sample size, though with diminishing returns: for a random sample the margin shrinks roughly in proportion to the square root of the sample size, so halving it requires about four times as many respondents.
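To make the margin of error concrete, here is a minimal sketch in Python. It assumes a simple random sample and uses the standard textbook normal-approximation formula for a proportion, which is one common choice and not a method prescribed in this column. The function names and the 52% figure are hypothetical, chosen only for illustration.

```python
import math

# Standard normal critical value for a 95% confidence level.
Z_95 = 1.96


def margin_of_error(p_hat: float, n: int, z: float = Z_95) -> float:
    """Half-width of the normal-approximation interval for a proportion.

    p_hat is the proportion observed in the sample and n is the sample
    size. Assumes a simple random sample that is not too small.
    """
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


def confidence_interval(p_hat: float, n: int, z: float = Z_95) -> tuple[float, float]:
    """Return (low, high) bounds, clipped to the valid range 0..1."""
    moe = margin_of_error(p_hat, n, z)
    return (max(0.0, p_hat - moe), min(1.0, p_hat + moe))


if __name__ == "__main__":
    p_hat = 0.52  # hypothetical: 52% of respondents favor the policy
    for n in (100, 400, 1600):
        low, high = confidence_interval(p_hat, n)
        moe = margin_of_error(p_hat, n)
        print(f"n={n:5d}  margin=+/-{moe:.3f}  CI=({low:.3f}, {high:.3f})")
```

Under these assumptions, each fourfold increase in the sample size cuts the margin of error in half. Note also that the formula says nothing about whether the sample was representative: it measures only the error due to random sampling.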

When we evaluate the results of a public opinion poll, we should ask two questions:

(1) Was a random, representative sample of the population obtained?

(2) Is pertinent information (e.g., sample size and error bounds) provided along with the estimated proportion of those who favored the issue of interest?

Unless both questions receive an affirmative answer, we should be skeptical about the completeness and/or honesty of the statistical analysis. To maintain some degree of honesty, the news media may state that "the poll is not scientific." But if such is the case, then why report the results at all, other than to sell newspapers or gain a larger share of the broadcast audience?

Sometimes, of course, the lack of scientific procedure may be offset by a much larger sample size than is generally used in a scientific study. In politics, for example, a non-scientific call-in survey with a large number of callers may actually come much closer to anticipating election results, because it so closely resembles the election itself.

Those who measure and describe populations by means of statistics ought to be very careful, for "diverse weights and diverse measures are an abomination to the Lord" (Prov. 20:10). Honesty in reporting data and facts requires only that they be reported as objective information. Honesty in making statistical inferences requires following basic mathematical principles and reporting all pertinent assumptions used in deriving the inferences. God has said, "You shall do no injustice in judgment, in measurement of length, weight, or volume. You shall have just balances, just weights" (Lev. 19:35, 36).



________________
Credenda/Agenda Vol. 6, No. 3