Subject: Simpson's paradox

Simpson's paradox refers to the reversal of a relationship when a multi-way
contingency table is collapsed over heterogeneous subgroups.  More
recently, it has come to refer to ANY change of relationship (in magnitude
or sign) following such collapsing.  It can be seen as a special case of
the more general problem of inappropriate cross-level inference, of which the
"ecological fallacy" and the "individualistic fallacy" are also special
cases.  It is also a very old problem, recognised at least as early as Yule
(1903).
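
The reversal can be made concrete with a small sketch.  The counts below are
hypothetical, chosen only for illustration (they come from none of the
articles listed here), as are the names "stratum A", "stratum B", "treated"
and "untreated".  Within each stratum the treated arm has the higher success
rate, yet the sign flips once the strata are collapsed, because the two arms
are spread very unevenly over the strata.

```python
from fractions import Fraction

# Hypothetical (successes, trials) counts per stratum and arm.
strata = {
    "stratum A": {"treated": (8, 10), "untreated": (70, 100)},
    "stratum B": {"treated": (30, 100), "untreated": (2, 10)},
}

def rate(successes, trials):
    """Exact success rate as a fraction."""
    return Fraction(successes, trials)

def collapse(strata):
    """Sum successes and trials over strata, i.e. collapse the table."""
    totals = {}
    for arms in strata.values():
        for arm, (s, n) in arms.items():
            ts, tn = totals.get(arm, (0, 0))
            totals[arm] = (ts + s, tn + n)
    return totals

def treated_minus_untreated(arms):
    """Difference in success rates between the two arms."""
    return rate(*arms["treated"]) - rate(*arms["untreated"])

within = {name: treated_minus_untreated(arms) for name, arms in strata.items()}
collapsed = treated_minus_untreated(collapse(strata))

for name, diff in within.items():
    print(f"{name}: treated - untreated = {float(diff):+.3f}")
print(f"collapsed: treated - untreated = {float(collapsed):+.3f}")
```

With these counts both within-stratum differences are +0.100, while the
collapsed difference is about -0.309.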

An intriguing recent example is provided by Wardrop (1995), who argues that
the widely believed but fallacious "hot-hand" in basketball arises from just
such collapsing.  That is, collapsed over heterogeneous shooters, there is an
apparent "hot-hand", but only at that aggregated level of analysis; with
shooters as the unit of analysis, no such relationship exists (as Gilovich
and Tversky have long maintained).
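
A minimal sketch of how pooling alone can manufacture such an effect,
assuming two hypothetical shooters whose shots are independent with fixed
hit probabilities of 4/5 and 2/5.  These are not Wardrop's data, and the
function name is invented for the illustration.  Neither shooter has a "hot
hand" by construction, yet the pooled record shows a higher hit rate after a
hit than after a miss, simply because shots that follow a hit come
disproportionately from the better shooter.

```python
from fractions import Fraction

# Hypothetical hit probabilities; every shot is independent of the last,
# so no individual shooter has a "hot hand" by construction.
shooters = {"shooter 1": Fraction(4, 5), "shooter 2": Fraction(2, 5)}

def pooled_conditional_rates(hit_probs):
    """Return (P(hit | previous hit), P(hit | previous miss)) after pooling
    equally many pairs of consecutive shots from every shooter."""
    probs = list(hit_probs)
    hit_then_hit = sum(p * p for p in probs)         # P(hit, then hit)
    prev_hit = sum(probs)                            # P(previous shot hit)
    miss_then_hit = sum((1 - p) * p for p in probs)  # P(miss, then hit)
    prev_miss = sum(1 - p for p in probs)            # P(previous shot missed)
    return hit_then_hit / prev_hit, miss_then_hit / prev_miss

after_hit, after_miss = pooled_conditional_rates(shooters.values())
print(f"pooled P(hit | previous hit)  = {float(after_hit):.3f}")
print(f"pooled P(hit | previous miss) = {float(after_miss):.3f}")
```

Here the pooled rate after a hit is 2/3 and after a miss is 1/2, although
for each shooter taken alone the two conditional rates are identical.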

Anil Menon (Syracuse University, School of Engg. and Computer Science)
compiled this reference list and placed it on a "Simpson's Paradox" web-site
he had created.  Unfortunately, the web-site URL has changed (or no longer
exists):

Articles on Simpson's Paradox and Related topics

   Last updated: 19/03/96


   I got interested in Simpson's paradox while studying deception in
   Genetic Algorithms. Here is a list of articles that might be useful.
   Fortunately, John Vokey at the Department of Psychology, University of
   Lethbridge, was kind enough to post most of these references, saving
   me an ascii adventure. I have grouped the bibliography in several
   ways:
     * A Beginner's Guide,
     * Chronologically,
     * Alphabetically.

   Eventually, I may put up a topically organized list as well... Please
   inform me if I have left out any pertinent articles. Some related
   links are:
     * Simple example based on drug tests.
     * A news group discussion (may have been removed).
     * Graphical Methods for Categorical Data.


     _________________________________________________________________


A Beginner's Guide

   I would recommend that the newcomer start off with:
     * Authors : Blyth, C. R.
       Title : On Simpson's paradox and the sure thing principle.
       Source : Journal of the American Statistical Association, 67,
       1972, 364-381.

   For some real-life examples of Simpson's paradox, see Keyfitz's
   classic book.
     * Authors : Keyfitz, N.
       Booktitle : Applied mathematical demography, Wiley, New York, pp.
       385-391, 1977.

   My favorite analysis of Simpson's paradox is the one in Simon and
   Blume's excellent book:
     * Authors : Simon, C. P. and Blume, L.
       Booktitle : Mathematics for Economists, W. W. Norton and Company,
       New York, pp. 368-371, pp. 784-791, 1994.

   They explain it using Don Saari's results. The importance of his work
   in the study of ``social paradoxes'' cannot be over-emphasized. A good
   starting point for Saari's remarkable theorem is:
     * Authors : Saari, D. G.
       Title : The source of some paradoxes from social choice and
       probability.
       Source : Journal of Economic Theory, 41(1), 1-22, 1987.

   Shyam Sunder's paper gives Yuji Ijiri's necessary and sufficient
   condition for Simpson's paradox to occur in the ``simplest possible
   case''. This condition is a special case of Saari's theorem, but is
   particularly clear and simple to use in practice. I had no idea
   accountants worried about such matters.
     * Authors : Sunder, S.
       Title : Simpson's reversal paradox and cost allocation.
       Source : Journal of Accounting Research, 21, 222-233, 1983.

   Finally, I urge the reader to take a look at Vaupel and Yashin's very
   readable paper on the pernicious effects of heterogeneity on
   statistical decision making. It reads like a Stephen King novel (and
   is equally horrifying).
     * Authors : Vaupel, J. W. and Yashin, A. I.
       Title : Heterogeneity's ruses: some surprising effects of
       selection on population dynamics.
       Source : The American Statistician, 39(3), 176-185, 1985.


     _________________________________________________________________

Chronological Bibliography

The 1900's

   Authors : Yule, G. U.
   Title : Notes on the theory of association of attributes in
   statistics.
   Source : Biometrika, 2, 121-134, 1903.

The 1930's

   Authors : Thorndike, E. L.
   Title : On the fallacy of imputing the correlations found for groups
   to individuals or smaller groups composing them.
   Source : American Journal of Psychology, 52, 122-124, 1939.

The 1940's

   Authors : Deming, W. E. and Stephan, F. F.
   Title : On a least squares adjustment of a sampled frequency table
   when the expected marginal totals are known.
   Source : Annals of Mathematical Statistics, 11, 1940, 427-444.

   Authors : Lindquist, E. F.
   Title : Statistical analysis in educational research.
   Source : Boston: Houghton Mifflin, 1940.

   Authors : Deming, W. E.
   Title : Statistical adjustment of data.
   Source : New York: Dover Publications, Inc., 1943.

The 1950's

   Authors : Robinson, W. S.
   Title : Ecological correlations and the behavior of individuals.
   Source : American Sociological Review, 15, 351-357, 1950.

   Authors : Simpson, E. H.
   Title : The interpretation of interaction in contingency tables.
   Source : Journal of the Royal Statistical Society, Ser. B., 13,
   238-241, 1951.

The 1960's

   Authors : Mosteller, F.
   Title : Association and estimation in contingency tables.
   Source : Journal of the American Statistical Association, 63, 1-28,
   1968.

The 1970's

   Authors : Goodman, L. A.
   Title : The multivariate analysis of qualitative data: interactions
   among multiple classifications.
   Source : Journal of the American Statistical Association, 65, 226-256,
   1970.

   Authors : Blyth, C. R.
   Title : On Simpson's paradox and the sure thing principle.
   Source : Journal of the American Statistical Association, 67, 1972,
   364-381.

   Authors : Bickel, P. J., Hammel, E. A., and O'Connell, J. W.
   Title : Sex bias in graduate admissions: Data from Berkeley.
   Source : Science, 187, 1975, 398-404.

   Authors : Bishop, Y. M. M., Fienberg, S. E., & Holland, P. W.
   Title : Discrete multivariate analysis: Theory and practice.
   Source : Cambridge, Massachusetts: The MIT Press, 1975.

   Authors : Gardner, M.
   Title : On the fabric of inductive logic and some probability
   paradoxes.
   Source : Scientific American, 234, 119-124, 1976.

   Authors : Fienberg, S. E.
   Title : The analysis of cross-classified categorical data.
   Source : Cambridge, Massachusetts: The MIT Press, 1977.

   Authors : Keyfitz, N.
   Booktitle : Applied mathematical demography, Wiley, New York, pp.
   385-391, 1977.

   Authors : Knapp, T. R.
   Title : The unit-of-analysis problem in applications of simple
   correlation analysis to educational research.
   Source : Journal of Educational Statistics, 2, 171-186, 1977.

   Authors : Freedman, D., Pisani, R., and Purves, R.
   Title : Statistics.
   Source : W.W. Norton & Company, New York, 1978.

   Authors : Whittemore, A. S.
   Title : Collapsibility of multi-dimensional contingency tables.
   Source : Journal of the Royal Statistical Society, Ser. B., 40,
   328-340, 1978.

The 1980's

   Authors : Hintzman, D. L.
   Title : Simpson's paradox and the analysis of memory retrieval.
   Source : Psychological Review, 87, 398-410, 1980.

   Authors : Flexser, A. J.
   Title : Homogenizing the 2 X 2 contingency table: A method for
   removing dependencies due to subject and item differences.
   Source : Psychological Review, 88, 327-339, 1981.

   Authors : Martin, E.
   Title : Simpson's paradox resolved: A reply to Hintzman.
   Source : Psychological Review, 88, 372-374, 1981.

   Authors : Mantel, N.
   Title : Simpson's paradox in reverse.
   Source : The American Statistician, 36, 395, 1982.

   Authors : Saari, D. G.
   Title : Inconsistencies of weighted summation voting systems.
   Source : Mathematics of Operations Research, 7(4), 479-490, 1982.

   Authors : Shapiro, S. H.
   Title : Collapsing contingency tables -- a geometric approach.
   Source : The American Statistician, 36, 43-46, 1982.

   Authors : Wagner, C. H.
   Title : Simpson's paradox in real life.
   Source : The American Statistician, 36, 46-48, 1982.

   Authors : Kennedy, J. J.
   Title : Analyzing qualitative data. Introductory log-linear analysis
   for behavioral research.
   Source : New York: Praeger Publishers, 1983.

   Authors : Sunder, S.
   Title : Simpson's reversal paradox and cost allocation.
   Source : Journal of Accounting Research, 21, 222-233, 1983.

   Authors : Knapp, T. R.
   Title : Instances of Simpson's paradox.
   Source : College Mathematics Journal, 16, 209-211, 1985.

   Authors : Paik, M.
   Title : A graphic representation of a three-way contingency table:
   Simpson's paradox and correlation.
   Source : The American Statistician, 39, 53-54, 1985.

   Authors : Vaupel, J. W. and Yashin, A. I.
   Title : The deviant dynamics of death in heterogeneous populations.
   Source : Sociological Methodology, Tuma, N. B. (ed), pp. 179-211,
   1985.

   Authors : Vaupel, J. W. and Yashin, A. I.
   Title : Heterogeneity's ruses: some surprising effects of selection on
   population dynamics.
   Source : The American Statistician, 39(3), 176-185, 1985.

   Authors : Cohen, J. E.
   Title : An uncertainty principle in demography and the unisex issue.
   Source : The American Statistician, 40, 1986, 32-39.

   Authors : Saari, D. G.
   Title : The source of some paradoxes from social choice and
   probability.
   Source : Journal of Economic Theory, 41(1), 1-22, 1987.

   Authors : Saari, D. G.
   Title : Symmetry, Voting and Social Choice.
   Source : The Mathematical Intelligencer, 10(3), 32-42, 1988.

   Authors : Kaigh, W. D.
   Title : A category representation paradox.
   Source : The American Statistician, 43(2), 92-97, 1989.

   Authors : Wermuth, N.
   Title : Moderating effects of subgroups in linear models.
   Source : Biometrika, 76, 81-92, 1989.

The 1990's

   Authors : Freehling, J. S.
   Title : Simpson's paradox and database profiling.
   Source : Direct Marketing, 53(5), 26-27, 1990.

   Authors : Haunsperger, D. B. and Saari, D. G.
   Title : The lack of consistency for statistical decision procedures.
   Source : The American Statistician, 45(3), 252-255, 1991.

   Authors : Klay, M. P. and Wesley, L. P.
   Title : Simpson's paradox: a maximum likelihood solution.
   Source : SRI International Technical Report, No. 502, 1-11, 1991.

   Authors : Mittal, Y.
   Title : Homogeneity of subpopulations and Simpson's Paradox.
   Source : Journal of the American Statistical Association, 86(413),
   167-172, 1991.

   Authors : Abramson N. S., Kelsey S. F., Safar P., and Sutton-Tyrrell
   K.
   Title : Simpson's paradox and clinical trials: What you find is not
   necessarily what you prove.
   Source : Annals of Emergency Medicine 21, pp. 1480-1482, 1992.

   Authors : DeBlois, B. M.
   Title : Simpson's Paradox.
   Source : Mathematica Militaris, 3(1), 1992.

   Authors : Mehrez, A., Brown, J. R., and Khouja, M.
   Title : Aggregate efficiency measures and Simpson's paradox.
   Source : Contemporary Accounting Research, 9(1), 329-342, 1992.

   Authors : Rogers, A.
   Title : Heterogeneity and selection in multistate population analysis.
   Source : Demography, 29(1), 31-38, 1992.

   Authors : Gunter, B.
   Title : A trio of statistical double takes.
   Source : Quality Progress, 26(6), 84-86, 1993.

   Authors : Simon, C. P. and Blume, L.
   Booktitle : Mathematics for Economists, W. W. Norton and Company, New
   York, pp. 368-371, pp. 784-791, 1994.

   Authors : Wardrop, R. L.
   Title : Simpson's Paradox and the Hot Hand in Basketball.
   Source : The American Statistician, 49, 24-28, 1995.


     _________________________________________________________________

Alphabetical Bibliography

   Authors : Abramson N. S., Kelsey S. F., Safar P., and Sutton-Tyrrell
   K.
   Title : Simpson's paradox and clinical trials: What you find is not
   necessarily what you prove.
   Source : Annals of Emergency Medicine 21, pp. 1480-1482, 1992.

   Authors : Bickel, P. J., Hammel, E. A., and O'Connell, J. W.
   Title : Sex bias in graduate admissions: Data from Berkeley.
   Source : Science, 187, 1975, 398-404.

   Authors : Bishop, Y. M. M., Fienberg, S. E., & Holland, P. W.
   Title : Discrete multivariate analysis: Theory and practice.
   Source : Cambridge, Massachusetts: The MIT Press, 1975.

   Authors : Blyth, C. R.
   Title : On Simpson's paradox and the sure thing principle.
   Source : Journal of the American Statistical Association, 67, 1972,
   364-381.

   Authors : Cohen, J. E.
   Title : An uncertainty principle in demography and the unisex issue.
   Source : The American Statistician, 40, 1986, 32-39.

   Authors : DeBlois, B. M.
   Title : Simpson's Paradox.
   Source : Mathematica Militaris, 3(1), 1992.

   Authors : Deming, W. E.
   Title : Statistical adjustment of data.
   Source : New York: Dover Publications, Inc., 1943.

   Authors : Deming, W. E. and Stephan, F. F.
   Title : On a least squares adjustment of a sampled frequency table
   when the expected marginal totals are known.
   Source : Annals of Mathematical Statistics, 11, 1940, 427-444.

   Authors : Fienberg, S. E.
   Title : The analysis of cross-classified categorical data.
   Source : Cambridge, Massachusetts: The MIT Press, 1977.

   Authors : Flexser, A. J.
   Title : Homogenizing the 2 X 2 contingency table: A method for
   removing dependencies due to subject and item differences.
   Source : Psychological Review, 88, 327-339, 1981.

   Authors : Freedman, D., Pisani, R., and Purves, R.
   Title : Statistics.
   Source : W.W. Norton & Company, New York, 1978.

   Authors : Freehling, J. S.
   Title : Simpson's paradox and database profiling.
   Source : Direct Marketing, 53(5), 26-27, 1990.

   Authors : Gardner, M.
   Title : On the fabric of inductive logic and some probability
   paradoxes.
   Source : Scientific American, 234, 119-124, 1976.

   Authors : Goodman, L. A.
   Title : The multivariate analysis of qualitative data: interactions
   among multiple classifications.
   Source : Journal of the American Statistical Association, 65, 226-256,
   1970.

   Authors : Gunter, B.
   Title : A trio of statistical double takes.
   Source : Quality Progress, 26(6), 84-86, 1993.

   Authors : Haunsperger, D. B. and Saari, D. G.
   Title : The lack of consistency for statistical decision procedures.
   Source : The American Statistician, 45(3), 252-255, 1991.

   Authors : Hintzman, D. L.
   Title : Simpson's paradox and the analysis of memory retrieval.
   Source : Psychological Review, 87, 398-410, 1980.

   Authors : Kaigh, W. D.
   Title : A category representation paradox.
   Source : The American Statistician, 43(2), 92-97, 1989.

   Authors : Kennedy, J. J.
   Title : Analyzing qualitative data. Introductory log-linear analysis
   for behavioral research.
   Source : New York: Praeger Publishers, 1983.

   Authors : Keyfitz, N.
   Booktitle : Applied mathematical demography, Wiley, New York, pp.
   385-391, 1977.

   Authors : Klay, M. P. and Wesley, L. P.
   Title : Simpson's paradox: a maximum likelihood solution.
   Source : SRI International Technical Report, No. 502, 1-11, 1991.

   Authors : Knapp, T. R.
   Title : The unit-of-analysis problem in applications of simple
   correlation analysis to educational research.
   Source : Journal of Educational Statistics, 2, 171-186, 1977.

   Authors : Knapp, T. R.
   Title : Instances of Simpson's paradox.
   Source : College Mathematics Journal, 16, 209-211, 1985.

   Authors : Lindquist, E. F.
   Title : Statistical analysis in educational research.
   Source : Boston: Houghton Mifflin, 1940.

   Authors : Mantel, N.
   Title : Simpson's paradox in reverse.
   Source : The American Statistician, 36, 395, 1982.

   Authors : Martin, E.
   Title : Simpson's paradox resolved: A reply to Hintzman.
   Source : Psychological Review, 88, 372-374, 1981.

   Authors : Mehrez, A., Brown, J. R., and Khouja, M.
   Title : Aggregate efficiency measures and Simpson's paradox.
   Source : Contemporary Accounting Research, 9(1), 329-342, 1992.

   Authors : Mittal, Y.
   Title : Homogeneity of subpopulations and Simpson's Paradox.
   Source : Journal of the American Statistical Association, 86(413),
   167-172, 1991.

   Authors : Mosteller, F.
   Title : Association and estimation in contingency tables.
   Source : Journal of the American Statistical Association, 63, 1-28,
   1968.

   Authors : Paik, M.
   Title : A graphic representation of a three-way contingency table:
   Simpson's paradox and correlation.
   Source : The American Statistician, 39, 53-54, 1985.

   Authors : Robinson, W. S.
   Title : Ecological correlations and the behavior of individuals.
   Source : American Sociological Review, 15, 351-357, 1950.

   Authors : Rogers, A.
   Title : Heterogeneity and selection in multistate population analysis.
   Source : Demography, 29(1), 31-38, 1992.

   Authors : Saari, D. G.
   Title : Inconsistencies of weighted summation voting systems.
   Source : Mathematics of Operations Research, 7(4), 479-490, 1982.

   Authors : Saari, D. G.
   Title : The source of some paradoxes from social choice and
   probability.
   Source : Journal of Economic Theory, 41(1), 1-22, 1987.

   Authors : Saari, D. G.
   Title : Symmetry, Voting and Social Choice.
   Source : The Mathematical Intelligencer, 10(3), 32-42, 1988.

   Authors : Shapiro, S. H.
   Title : Collapsing contingency tables -- a geometric approach.
   Source : The American Statistician, 36, 43-46, 1982.

   Authors : Simon, C. P. and Blume, L.
   Booktitle : Mathematics for Economists, W. W. Norton and Company, New
   York, pp. 368-371, pp. 784-791, 1994.

   Authors : Simpson, E. H.
   Title : The interpretation of interaction in contingency tables.
   Source : Journal of the Royal Statistical Society, Ser. B., 13,
   238-241, 1951.

   Authors : Sunder, S.
   Title : Simpson's reversal paradox and cost allocation.
   Source : Journal of Accounting Research, 21, 222-233, 1983.

   Authors : Thorndike, E. L.
   Title : On the fallacy of imputing the correlations found for groups
   to individuals or smaller groups composing them.
   Source : American Journal of Psychology, 52, 122-124, 1939.

   Authors : Vaupel, J. W. and Yashin, A. I.
   Title : Heterogeneity's ruses: some surprising effects of selection on
   population dynamics.
   Source : The American Statistician, 39(3), 176-185, 1985.

   Authors : Vaupel, J. W. and Yashin, A. I.
   Title : The deviant dynamics of death in heterogeneous populations.
   Source : Sociological Methodology, Tuma, N. B. (ed), pp. 179-211,
   1985.

   Authors : Wagner, C. H.
   Title : Simpson's paradox in real life.
   Source : The American Statistician, 36, 46-48, 1982.

   Authors : Wardrop, R. L.
   Title : Simpson's Paradox and the Hot Hand in Basketball.
   Source : The American Statistician, 49, 24-28, 1995.

   Authors : Wermuth, N.
   Title : Moderating effects of subgroups in linear models.
   Source : Biometrika, 76, 81-92, 1989.

   Authors : Whittemore, A. S.
   Title : Collapsibility of multi-dimensional contingency tables.
   Source : Journal of the Royal Statistical Society, Ser. B., 40,
   328-340, 1978.

   Authors : Yule, G. U.
   Title : Notes on the theory of association of attributes in
   statistics.
   Source : Biometrika, 2, 121-134, 1903.

--
Dr. John R. Vokey, Associate Professor, Department of Psychology
University of Lethbridge, Lethbridge, Alberta, CANADA  T1K 3M4
mailto:vokey@hg.uleth.ca  http://www.uleth.ca/~vokey