Subject: what is factor vs. component analysis, and what good is either

Siva Ganesh says:
>Date:    Fri, 7 Aug 1998 08:05:22 GMT+12
>From:    Siva Ganesh 
>Subject: Re: Crescent of points in factor analysis.
>
>I don't have a direct answer to your particular question, perhaps I
>will think about it in the next couple of days. But, my question is,
>'did you really do a "FACTOR ANALYSIS" or was it simply a "PRINCIPAL
>COMPONENT ANALYSIS"? I have this notion that factor analysis is not
>really very useful in pure sciences (such as your study).
>I hate the fact that many computer software packages confuse between
>factor analysis and principal component analysis (eg. SPSS, SAS in
>its proc factor (the default is PCA), ...). A simple diagram I use
>for explaining the difference is,
>
>FACTORS ---> DATA (VARIABLES...) ---> PRINCIPAL COMPONENTS
>
>Ganesh.
>---------------------------------------------------

The above posting raises at least two issues; one is

>'did you really do a "FACTOR ANALYSIS" or was it simply a "PRINCIPAL
>COMPONENT ANALYSIS"?
...
>I hate the fact that many computer software packages confuse between
>factor analysis and principal component analysis (eg. SPSS, SAS

and the second is:

>I have this notion that factor analysis is not
>really very useful in pure sciences (such as your study).

Dr. Ganesh is not alone in having such views, and I think this is one reason
why they are worth careful consideration and a careful reply. But I will
argue below that these views arise from an incomplete or mistaken picture of
factor analysis. Let me explain.


(Re: Dr. Ganesh's issue 1) Unfortunately, there is inconsistency across
scientific fields (and even to some extent _within_ some fields) on the
proper definitions of PCA and FA. One reason is that there are really three
rather than two things to be distinguished: (a) a strict Principal
Components Analysis in the mathematical sense; (b) an analysis that starts
with a PCA, then selects a reduced number of components, which are rotated
and interpreted (sometimes this is called factor analysis and sometimes
principal component analysis); and (c) an analysis using a more
complete and formal statistical model that involves fitting parameters
describing the influence of a few latent variables or factors, and also
includes other parameters that explicitly represent or adjust for effects of
random (typically considered measurement) error. Here, random sampling of
cases is assumed, with each case providing a vector of presumably correlated
observations on multiple variables; characteristics of random errors are
also specified.
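
     (A gloss of my own, not part of the original exchange: in standard
notation, the model in (c) is the common factor model. For a p-vector of
observed variables x, k latent factors f, a p x k loading matrix \Lambda,
and a unique/error part e, it reads, in LaTeX notation,

     x = \Lambda f + e, \qquad \mathrm{Cov}(x) = \Lambda \Lambda^{T} + \Psi,

where \Psi = \mathrm{Cov}(e) is diagonal; the diagonal entries of \Psi are
the explicit error parameters just mentioned.)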

     (Note that because of the least-squares properties of truncated PCA,
method (b) above can be considered an ordinary-least-squares fitting of a
statistical model, but one that does not explicitly provide for the (biasing)
effects of error on variances of variables.)

     Conceptually there are important differences between (a), (b), and (c).
However, the procedures (b) and (c) usually give similar results and lead to
the same interpretation. On the other hand, the results of procedure (a)
differ quite strongly from those of (b) and (c), and in any case it is not
appropriate to interpret them in terms of latent variables.

     By the way, to avoid confusion I mention that there is also a procedure
(d) commonly called "confirmatory factor analysis" in which one makes more
assumptions and so can compare the fit of several explicitly stated models, and
which also has explicit treatment of the effects of error, as in (c). But
CFA is quite different in spirit and need not be discussed here (except to
note that it is no more or less "statistically valid" than the common-factor
exploratory method (c)).



(Re: Dr. Ganesh's issue 2) The common use of method (b) rather than (c) in
the "pure" (also known as "hard") sciences is probably due less to special
appropriateness of the method and more to the relative unfamiliarity of
researchers in those areas with Common Factor Analysis. But there is at
least one other possible contributing reason: it is easier to ignore the
difference between (b) and (c) in many hard-science applications where there
are low levels of error in the data.


Some further explanation

1a. PRINCIPAL COMPONENTS ANALYSIS

     Mathematicians have a clear definition of PCA. To keep the discussion
simple I won't give the full definition here but will simply note that it
involves decomposition of a matrix into a sum of (outer products of) vectors
called components, and that these are mutually orthogonal and (hence) have
the property that the first component (or outer product) accounts for the
maximal amount of variance of the matrix being decomposed that could be
reproduced by any single component, the second explains maximal variance
orthogonal to the first, etc. Also, there will be as many components as
there are rows (or equivalently columns) in the original crossproduct
matrix, and the reproduction of the matrix by the sum of these vector
products will be exact.
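
     Here is a small numerical illustration of the above definition; it is
mine, not part of the original posting, and the toy data are made up. A
full PCA in Python/NumPy:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))              # 100 cases, 5 variables (toy)
Xc = X - X.mean(axis=0)                    # center each variable
R = Xc.T @ Xc / (len(Xc) - 1)              # covariance/crossproduct matrix

# The eigendecomposition yields the principal components: eigenvectors
# are mutually orthogonal, and the eigenvalues, sorted in decreasing
# order, are the variances the successive components account for.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# With ALL components retained, the sum of the outer products
# reproduces the original matrix exactly, as stated above.
R_rebuilt = sum(eigvals[i] * np.outer(eigvecs[:, i], eigvecs[:, i])
                for i in range(R.shape[0]))
assert np.allclose(R, R_rebuilt)

Plotting the cases' scores on the first two components (Xc @ eigvecs[:, :2])
gives the kind of unrotated component plot mentioned next.
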
     One common usage of this mathematical PCA in data analysis is when one
plots, for example, the first vs. second (unrotated) components to see what
interesting patterns are revealed.


1b. FACTOR ANALYSIS OR (MODIFIED) PCA

     As soon as you select fewer components and "rotate" them, you are going
beyond the mathematical definition of PCA and are inventing something
different. For example, because of rotation, your components no longer have
pairwise orthogonality (in the sense that pairwise inner products are zero),
and they no longer successively explain maximal amounts of residual
variance; these properties are lost for both "orthogonal" and so-called
"oblique" rotation. Also, because of selecting fewer components, the summed
contributions of the components no longer explain all the variance in the
matrix. Because of these differences, and, even more, because of a different
objective of the analysis (explained below), the modified procedure is often
referred to as "factor analysis". Nonetheless, it is also common in some
quarters (in SPSS etc.) to use the name "Principal Components Analysis
(PCA)" to refer to such a procedure. In this case, "PCA" is often considered
a name for a particular _kind_ of factor analysis, the kind where you
estimate factors by truncating the component matrix and then "rotating" this
reduced set of components. A different name, such as "factor analysis", is
particularly necessary when you truncate and rotate with the intent of
giving each rotated axis a scientifically meaningful name, because here a
scientific model is usually sought or intended--however dimly.
     Scientists in some fields (e.g., Chemistry) consistently use "factor
analysis" to mean this kind of modified PCA, and it is clear that they
intend the factors to have scientific reality. When they are decomposing,
for example, a set of many spectral curves derived from many mixtures of a
few compounds in order to find the latent spectra of the compounds that
were in the mixtures, it is quite clear that their truncation and rotation
are intended to uncover the few rotated factors that are scientifically
generalizable, in fact that correspond to (the spectra of) real physical
things. (For examples of such factor analysis applications, and many other
kinds as well, see, e.g., the book _Factor Analysis in Chemistry_ by
Malinowski, now out in a 2nd edition --but unfortunately it's not in the
building I am currently in, so I can't provide a full citation at this
time.)
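
     To make the distinction concrete, here is a sketch of method (b) in
Python/NumPy. It is my own illustration, not from the original posting:
the varimax routine is the standard SVD-based algorithm, and the toy data
and the choice k = 2 are assumptions made only for the example.

import numpy as np

def varimax(L, n_iter=100, tol=1e-8):
    # Orthogonal (varimax) rotation of a loading matrix toward
    # "simple structure": iteratively solve for the rotation T that
    # maximizes the variance of the squared loadings.
    p, k = L.shape
    T = np.eye(k)
    obj = 0.0
    for _ in range(n_iter):
        Lr = L @ T
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p))
        T = u @ vt
        if s.sum() < obj * (1.0 + tol):
            break
        obj = s.sum()
    return L @ T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
X[:, 3:] += X[:, :3]                       # induce some correlation
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]

k = 2                                      # truncate: keep 2 components
loadings = eigvecs[:, order[:k]] * np.sqrt(eigvals[order[:k]])
rotated = varimax(loadings)

Here rotated.T @ rotated is no longer diagonal, and the rotated axes no
longer successively maximize explained variance: exactly the loss of PCA
properties noted above.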

    The act of rotation has important implications. As just noted, rotation
of Principal Components is often inspired by the desire to go beyond mere
description of the dataset at hand. Each rotated axis is often interpreted
as reflecting an underlying physical or biological process, chemical
component, etc. This type of exploratory factor analysis by modified PCA is
quite different in spirit from mathematical principal component analysis
conceived purely as a data compression and description method.
     For the descriptive PCA, a set of components can be considered a
compressed version of a set of variables, where each component is a weighted
linear combination of the variables. Factors, in the sense discussed above,
are different and should be considered (estimates of hypothetical) latent
variables that affect the measured variables. Because of the way they are
estimated they are not generally exact linear combinations of the variables.

     Thus, the picture that Dr. Ganesh provides is quite appropriate:

> FACTORS ---> DATA (VARIABLES...) ---> PRINCIPAL COMPONENTS

     But given this, shouldn't we ask whether such recovery of information
on latent variables is any less important in the hard or "pure" sciences
than in the "soft" sciences?

     _Treatment of error in method b._ The selection of only the first few
components is often an indirect way of acknowledging the effects of error:
by consigning the remaining components to the trash heap, one is saying that
they reflect random or uninteresting perturbations due to error.


1c. COMMON FACTOR ANALYSIS

     Some would say "but then why stop with modified PCA?"  When the
scientist's intention is to give such a broader (inferential rather than
purely descriptive) scientific meaning to the factors, one might want to
construct a true *statistical model* for the crossproduct (or covariance, or
correlation) matrix.  In such a statistical model, the reduced set of
components becomes a set of statistical parameters estimated as
approximations of the population profiles of latent factors.

     However, good statistical models often include terms that explicitly
describe or provide for the effects of random error. The truncated-
PCA-followed-by-rotation procedure does not do that.  Nonetheless, it can be
argued that some adjustment for error is clearly needed.  In any
crossproduct matrix based on fallible data, the self-correlation of the
error will cause the diagonal cells in the matrix to be "inflated" (biased
upward). This biases (upward) the PCA-based estimates of loadings, since
these inflated diagonal cells are included in the data upon which the PCA
components are based.
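
     A toy numerical check of this bias (my own, with made-up numbers):

import numpy as np

rng = np.random.default_rng(1)
true_scores = rng.normal(size=(200000, 3)) @ rng.normal(size=(3, 3))
noisy = true_scores + rng.normal(scale=0.5, size=true_scores.shape)

C_true = np.cov(true_scores, rowvar=False)
C_obs = np.cov(noisy, rowvar=False)
# The off-diagonal cells of C_obs match those of C_true up to sampling
# noise, but each diagonal cell is inflated by about 0.5**2 = 0.25,
# the variance of the added independent error.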

     A more refined method that tries to adjust for that problem is also
generically referred to as "factor analysis" but is often distinguished from
procedure (b) by the name "Common Factor Analysis". It explicitly provides
for the effects of random error by introducing added parameters called
uniquenesses or, equivalently, communalities (a communality is one minus
the corresponding uniqueness).  The diagonal elements of the crossproduct
(or covariance, or
correlation) matrix are replaced by these added model parameters, which are
estimates of the sizes the diagonals would have had, had there been no error.
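
     As a minimal sketch of one classical way to fit such a model, here is
iterated principal-axis factoring in Python/NumPy (maximum likelihood is
another common estimation method). Again this is my own illustration; the
starting values, iteration count, and toy data are assumptions made for
the example.

import numpy as np

def principal_axis(R, k, n_iter=50):
    # Iterated principal-axis factoring: replace the diagonal of the
    # correlation matrix with communality estimates, refactor, repeat.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))  # initial communalities:
    Rh = R.copy()                               # squared multiple corr.
    for _ in range(n_iter):
        np.fill_diagonal(Rh, h2)                # replace diagonal cells
        eigvals, eigvecs = np.linalg.eigh(Rh)
        top = np.argsort(eigvals)[::-1][:k]
        L = eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))
        h2 = (L ** 2).sum(axis=1)               # updated communalities
    return L, 1.0 - h2                          # loadings, uniquenesses

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
X[:, 3:] += X[:, :3]                            # induce some correlation
R = np.corrcoef(X, rowvar=False)
loadings, uniquenesses = principal_axis(R, k=2)

The loadings obtained this way could then be rotated exactly as in the
method (b) sketch above.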



Three other brief comments:
    2 Classical multiple regression also fails to take into account the
effects of measurement error on (predictor) variables. Although interesting
work has been going on to change this, I think it is fair to say that we
have been able to do a lot of good statistical work with the classical
regression approach. (A toy numerical demonstration of this error effect
appears after these comments.)
    3 Although factor method (b) (factor analysis without error adjustments)
seems clearly biased, it is sometimes claimed that it may have other
benefits. It has been argued that (because of fewer parameters and/or
perhaps for other reasons) the "Principal Components Analysis" method of
obtaining factors is more robust than Common Factor Analysis in cases where
some factor(s) are determined by only a few, and in the worst case by only
two, variables with substantial loadings.
    4 Issues involved in factor rotation, such as differences among factor
rotation criteria, are probably far more important than the provision for
error, or lack thereof, in (b) or (c).  The effect of rotation differences
on the final interpretation can often be much greater.  At the same time,
discussion of the rationale for rotation methods, pro and con, is probably
much less frequent, and the issues are much more often misunderstood
--or simply disregarded-- in current scientific usage of exploratory
factor analysis in many fields. However, some kinds of researchers in the
hard sciences are more acutely aware of the problem and have done more to
try to overcome it (see, for example, the above-mentioned book on FA in
chemistry for some examples).
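
     Referring back to comment 2, here is a toy demonstration (mine, with
made-up numbers) of how error in a predictor attenuates a classical
regression slope:

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100_000)                       # true predictor
y = 1.0 * x + rng.normal(scale=0.5, size=x.size)   # true slope is 1.0
x_obs = x + rng.normal(size=x.size)                # predictor observed
                                                   # with error (var 1)
slope_true = np.polyfit(x, y, 1)[0]     # close to 1.0
slope_obs = np.polyfit(x_obs, y, 1)[0]  # close to 0.5, attenuated by
                                        # var(x) / (var(x) + var(error))

Classical regression, like method (b), simply ignores this error effect;
as noted above, much good work has been done nonetheless.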

Richard A. Harshman, Psychology Dept., University of Western Ontario, London,
Ontario, Canada. (lab) 519-661-3663, (office) 519-661-2111x4675, fax
519-661-3213.