POTENTIAL OF RANKED-SET SAMPLING FOR DISASTER ASSESSMENT

                        Glen Johnson & Dr. Wayne Myers
                      The Pennsylvania State University
                          University Park, PA  16802


ABSTRACT:  Parsimony of field work in forest inventory is even more important
for the disaster assessment context than for regularly scheduled forest survey
scenarios.  Interest thus lies in sample designs which enable rapid assessment
to augment detailed ground work.  Double sampling for stratification is a
relatively familiar strategy for achieving this end, wherein remote sensing or
direct aerial reconnaissance can serve the purpose of stratification.  The
less familiar strategy of ranked set sampling becomes a candidate approach in
cases where rapid assessment extends beyond categorization to ordination.  The
Center for Statistical Ecology and Environmental Statistics at Penn State has
done considerable work in extending the concepts of ranked set sampling and
developing scenarios for application.  Fundamentals of ranked-set sampling are
presented, and scenarios are considered for its application in the context of
inventories mounted in response to catastrophes.

                                 INTRODUCTION

   Remote sensing, geographic information systems (GIS), and global
positioning systems (GPS) are important technologies for rapid assessment of
impacts on natural and cultural resources of catastrophic events.  There are
various conventional strategies for incorporating the information available
from these technologies in quantitative estimation protocols.  Such strategies
often involve using remote sensing and/or GIS for stratification, double
sampling, or multi-stage sampling.  There are many possible stratification
scenarios depending upon the catastrophic context, with most being
conceptually rather straightforward.  Double sampling and multi-stage sampling
typically involve classifications of damage type or severity in the initial
phase or stage of sampling.  Such classifications involve varying degrees of
subjectivity, with the effect of subjectivity seldom being given explicit
consideration.

   The Center for Statistical Ecology and Environmental Statistics (CSEES) at
Penn State University under the directorship of Prof. G. P. Patil has devoted
considerable effort to extending ranked-set sampling (RSS) as formulated
mathematically by Takahasi and Wakimoto (1968) and Dell (1969) to support
application in more general contexts.  Cooperation between CSEES and the
Office for Remote Sensing of Earth Resources (ORSER) at Penn State University
has served to suggest potential applications of RSS to catastrophic impact
assessment using remote sensing and GIS.  This involves several possible
scenarios as alternatives or adjuncts to conventional stratification, double
sampling, and multi-stage sampling.  Although we do not yet have actual case
studies to illustrate the proposed strategies, the Conference on Inventory and
Management in the Context of Catastrophic Events provides an appropriate forum
in which to share these ideas so that they may be considered and tried by
others.  We begin by reviewing the essentials of RSS, and then proceed to
explain some of the scenarios that we have conceived.

                      ESSENTIALS OF RANKED-SET SAMPLING

   The essential advantage of RSS arises from its capability to ensure that
measured samples are well spread over the distribution of the variable of
direct interest.  This advantage is realized through a two-step method of
sample selection.  The first step involves selecting a larger number of
potential samples than one can afford to measure in detail.  The large
number of potential sampling units is then randomly organized into subsets of
(predetermined) fixed size.  The sampling units in each subset are then ranked
relative to other members of the same subset with respect to anticipated value
for the variable of interest.  The basis of anticipation can be anything, even
intuition.  Statistical gains due to RSS will be reduced to the extent that
the ranking proves erroneous, but not entirely eliminated unless the ranking
is no better than a random shuffle.  One unit is then taken in a rank rotation
from each subset for actual measurement; the first rank from the first subset,
the second rank from the next subset, and so on.  When a cycle of rank
selection has been completed, the cycle repeats until there are no more
subsets.  Each rank within a subset is thus represented equally in the
ultimate sample.
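
   As a concrete illustration of the selection protocol just described, the
following is a minimal sketch in Python.  The helper rank_score is
hypothetical; it stands in for whatever basis of anticipation is available,
such as a visual judgment of damage severity.

      import random

      def ranked_set_select(units, rank_score, m, r):
          """Select an RSS ultimate sample of size n = r * m.

          units      : list of r * m * m potential sampling units
          rank_score : function giving the anticipated value, used only for
                       ranking within sets
          m          : set size (and number of ranks)
          r          : number of cycles through the ranks
          """
          assert len(units) == r * m * m, "need r * m * m potential samples"
          pool = list(units)
          random.shuffle(pool)                 # random allocation to sets
          sets = [pool[i * m:(i + 1) * m] for i in range(r * m)]

          ultimate = []                        # (rank, unit) pairs
          for cycle in range(r):               # r cycles through the ranks
              for rank in range(m):            # one set per rank in each cycle
                  judged = sorted(sets[cycle * m + rank], key=rank_score)
                  ultimate.append((rank + 1, judged[rank]))
          return ultimate

   Only the n = r x m units returned would actually be measured on the ground;
the remaining potential samples incur only the cost of selection and ranking.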

   Relative to direct selection of a sample equal in size to the ultimate RSS
sample, the overall gains will depend on both the accuracy of the ranking and
the costs of selecting and ranking potential samples.  Since the cost of
selecting a potential sample is usually minor, the cost of ranking becomes the
major consideration.  RSS is particularly advantageous when ranking errors are
few and the cost of ranking is low relative to the cost of making an actual
measurement.

   Understanding the usual RSS terminology of sets and cycles is key to
understanding the protocol for estimation.  Let m be the number of potential
sampling units in each set (set size).  Since one ultimate sample will come
from each set, it requires m sets to yield an ultimate sample for each rank.
Thus each selection CYCLE through the ranks requires m x m potential samples.
If the selection cycle through the ranks is repeated r times, the ultimate
sample size will be n = r x m. The total number of potential samples needed is
thus r x m x m, and r x m is the number of rankings to be performed.
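
   For example, with a set size of m = 3 and r = 4 cycles, the ultimate sample
size is n = 4 x 3 = 12, which requires 4 x 3 x 3 = 36 potential samples and
4 x 3 = 12 rankings of three units each.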

   The mean of the ultimate sample provides an unbiased estimator of the
population mean.  That is,

   Estimated population mean = (ultimate sample sum)/(m x r)

The pooled sum of squared deviations about rank means provides the numerator
of an unbiased estimate of the variance for the estimated mean.  Thus,

                                 Sum of squared deviations about rank means
   Estimated variance of mean = -------------------------------------------
                                             m x m x r x (r-1)
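
   Both estimators can be computed directly from the ultimate sample once it
is arranged by rank.  A minimal sketch in Python follows, assuming the
measurements are held as m lists of r values, one list per rank; this
arrangement of the data is illustrative, not prescribed.

      def rss_estimates(by_rank):
          """Estimate the population mean and the variance of that estimate.

          by_rank : list of m lists, each holding the r measurements made on
                    the ultimate samples of one rank.
          """
          m = len(by_rank)
          r = len(by_rank[0])

          # Estimated population mean = (ultimate sample sum) / (m x r)
          mean = sum(sum(vals) for vals in by_rank) / (m * r)

          # Pooled sum of squared deviations about the rank means
          ssd = 0.0
          for vals in by_rank:
              rank_mean = sum(vals) / r
              ssd += sum((x - rank_mean) ** 2 for x in vals)

          # Estimated variance of the mean = SSD / (m x m x r x (r-1))
          var_mean = ssd / (m * m * r * (r - 1))
          return mean, var_mean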


   In comparing the relative precision (RP) of RSS to simple random samples
of size (m x r), Takahasi and Wakimoto (1968) have shown that

   1 <= RP <= (m+1)/2

for all continuous distributions with finite variance.  Theoretically, then,
greater efficiency comes with increasing set size m.  In practice, however,
ranking a larger number of items tends to be more difficult and less accurate.
Thus, choice of set size m is determined practically by the nature of the
ranking.
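
   For example, with a set size of m = 4 the ceiling on relative precision is
(4 + 1)/2 = 2.5; even perfect ranking cannot reduce the variance of the
estimated mean by more than a factor of 2.5 relative to a simple random sample
of the same size.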

                   POST-CATASTROPHE RSS SAMPLING SCENARIOS

   Ranked-set sampling could be expedient for exploiting large arrays of
permanent sample plots in order to assess impacts of a catastrophic event
rapidly.  The entire grid of permanent plots might constitute potential
samples, or a subgrid could be used to reduce time and effort.  A "quick
picture" of the situation for the respective potential samples could be
obtained in several ways, with small-format aerial photography or videography
of the potential sample plots from a light plane being likely candidates.
The "quick picture" could be as simple as an aerial reconnaissance flyover
for a verbal description into a tape recorder with separate notation regarding
the starting point for each plot on the tape counter.  After randomly
assigning the potential samples to sets, the "quick picture" information for
the members of each set would be interpreted for ordering relative to severity
of impact.  This could obviate the need for defining damage classes while
still providing an idea of the extent to which damage is localized.  With
damage classes, an approximate "map" of damage distribution could be developed
from the "quick picture" information prior to initiation of field work for
ground measurement of the plots in the ultimate sample.  GPS should be useful
for locating plot vicinities in the flyover.

   Before-and-after satellite image data and/or GIS information could
substitute for a flyover in the above scenario.  Even in the absence of
post-event imagery, indices of susceptibility to damage could be developed
from GIS data layers and used for ranking of the potential sample locations
relative to likelihood of damage.

   When permanent plots are not available, RSS scenarios begin with
determination of potential sample locations either on a grid basis or
individually as pairs of randomly drawn map coordinates.  With potential
sample locations thus determined, alternatives for RSS subscenarios parallel
those for permanent plots.  Without permanent plots, however, any "before"
information for use in ranking must come from remote sensing and/or GIS.
Although "before" information is helpful, the ranking step of RSS can be
carried out entirely in terms of "quick picture" post-event reconnaissance
relative to severity of damage indications.  GPS should be particularly
advantageous in this respect for placing a reconnaissance aircraft "in the
vicinity" of a potential sample.  It is not essential that a potential sample
location be pinpointed exactly in the reconnaissance mission.  Uncertainty
about potential sample locations during reconnaissance may contribute to
ranking error, but ranking error is admissible for RSS.  When potential sample
locations are somewhat uncertain during reconnaissance, one will be doing the
ranking in terms of vicinity effects.  Efficiency of RSS will then increase
with spatial autocorrelation of catastrophic impacts, in contrast to some past
applications of RSS using local sets, for which spatial autocorrelation was
problematic.

                                  DISCUSSION

   A foundational assumption for the above RSS estimators is that all eligible
potential samples have an equal chance of allocation to each set.  Many
documented applications of RSS in other contexts have violated this
assumption, for example by ranking local clusters of plots in vegetation
sampling.  The effect of such violations has been investigated, but need not
be of concern here since the foregoing scenarios are "clean" in this respect.

   The "batch" nature of our proposed scenarios also stands in some contrast
to the typical process of RSS sample selection.  There is a tendency to run
the selection process "depth-first" rather than "breadth first."  A depth-
first approach would choose a set of potential samples, rank them, and
designate an ultimate sample unit before proceeding onward to choice of
further potential samples.  Likewise, each cycle of ultimate selection through
the ranks would be completed before initiating the next cycle.  Such a
selection sequence would usually be quite inconvenient for the context of
catastrophic events.  Completing each type of selection activity in batch
fashion will be much more convenient in the catastrophic context, and
likewise whenever aerial reconnaissance, remote sensing, or GIS is involved.

   Thus, we would first designate all potential samples in terms of map
coordinates.  We would then conduct all aerial reconnaissance, image
acquisition, and/or GIS index formulation.  In the case of digital image
analysis or GIS index formulation, the results for all potential samples would
be dumped to an external file for subsequent handling by other software such
as statistical packages.  Having designated potential sample locations and
acquired ranking information, we would then proceed to randomly organize all
potential samples as sets and develop a file of set membership.  Ranking would
be conducted next, and the respective ranks of potential samples split into
separate files.  We would then systematically sample the respective rank files
with staggered starts and interval corresponding to the set size.  The
resulting files of ultimate samples by rank would then be collated spatially
with rank tags for planning and control of ground survey operations.  The
field records would be sorted by rank tag into separate files for statistical
tabulation/estimation.
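
   A minimal Python sketch of the batch selection step is given below.  The
file handling is omitted; the point is the splitting of ranked potential
samples into per-rank lists and the systematic selection with staggered starts
and an interval equal to the set size.  The data structures are illustrative
only, not tied to any particular software package.

      def batch_rss_selection(sets_ranked, m, r):
          """Batch selection of the ultimate RSS sample from pre-ranked sets.

          sets_ranked : list of r * m sets, each already ordered from lowest
                        to highest anticipated value (ranking done in batch)
          m           : set size
          r           : number of cycles
          Returns a dict mapping rank (1..m) to its r selected units.
          """
          assert len(sets_ranked) == r * m

          # Split the ranked potential samples into per-rank lists (the
          # "rank files"), preserving the order of the sets.
          rank_lists = {rank: [s[rank - 1] for s in sets_ranked]
                        for rank in range(1, m + 1)}

          # Systematic sampling of each rank list: start staggered by rank,
          # interval equal to the set size m, so that each set contributes
          # exactly one ultimate sample.
          ultimate = {rank: rank_lists[rank][rank - 1::m]
                      for rank in range(1, m + 1)}
          return ultimate

   The per-rank lists returned correspond to the separate rank files described
above and feed directly into the two-step tabulation that follows.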

   Tabulation/estimation would proceed in two steps.  The first step would
produce sums and sums of squared deviations for the respective rank files.
The second step would combine the filewise results into overall estimates.
This two-step tabulation/estimation process makes adaptation of existing
statistical software quite simple.  This batch processing approach takes
advantage of the fact that ultimate samples need not be matched by rank
subsequent to the ranking activity itself.  Relative to the usual interleaved
approach, batch processing accommodates comparatively simple macros for
working with different software systems that have mostly not been designed for
interplay.
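
   A sketch of the combining step, consistent with the estimators given
earlier, might look as follows.  It assumes each rank file has already been
reduced in the first step to its count, sum, and sum of squared deviations
about its own mean, and that allocation across ranks is equal.

      def combine_rank_summaries(summaries):
          """Combine per-rank summaries into overall RSS estimates.

          summaries : list of m tuples (r_i, sum_i, ssd_i), one per rank file,
                      giving the count, sum of measurements, and sum of
                      squared deviations about the rank mean.
          """
          m = len(summaries)
          counts, sums, ssds = zip(*summaries)
          r = counts[0]
          assert all(c == r for c in counts), "equal allocation assumed"

          mean = sum(sums) / (m * r)                    # estimated mean
          var_mean = sum(ssds) / (m * m * r * (r - 1))  # variance of the mean
          return mean, var_mean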

   The potential gains due to RSS are somewhat dependent on the form of the
distribution for the variable of interest.  McIntyre (1952) concluded that
efficiency declines with increasing skewness of the population.  Since impacts
of catastrophic events are often localized, the damage distributions will tend
to be skewed and thus less than optimal for RSS.  Counterbalancing this will
be the availability
of some preliminary information on spatial variation arising from the "quick
picture" reconnaissance that can be used in strategic planning.  The non-
optimality of skewed distributions can be addressed by unequal allocation if
estimators are adjusted accordingly.  To optimize unequal allocation, the
frequency of ultimate samples for a rank should be proportional to the
standard deviation of the corresponding order statistic (Neyman allocation).
Halls and Dell (1966) report favorable experience with unequal allocation in a
field study.  Takahasi and Wakimoto (1968) showed that unequal allocation can
yield substantial gains, but can also lead to RSS being worse than simple
random sampling when not done appropriately.
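
   As an illustration of the allocation rule only (the corresponding
adjustment of the estimators is not shown), the following Python sketch
distributes a total of n ultimate samples across the m ranks in proportion to
rough estimates of the order-statistic standard deviations; those estimates
would have to come from prior knowledge or a pilot sample.

      def neyman_rank_allocation(rank_sds, n_total):
          """Allocate n_total ultimate samples across ranks in proportion to
          the estimated standard deviations of the corresponding order
          statistics (rank_sds assumed available, e.g. from a pilot sample).
          """
          total_sd = sum(rank_sds)
          ideal = [n_total * sd / total_sd for sd in rank_sds]

          # Round down, then hand out the shortfall to the ranks with the
          # largest fractional remainders so the allocations sum to n_total.
          alloc = [int(x) for x in ideal]
          by_remainder = sorted(range(len(ideal)),
                                key=lambda i: ideal[i] - alloc[i],
                                reverse=True)
          for i in by_remainder[:n_total - sum(alloc)]:
              alloc[i] += 1
          return alloc

   For instance, standard-deviation estimates of 1, 2, and 3 for three ranks
with n_total = 12 would yield an allocation of 2, 4, and 6 ultimate samples.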

   Patil, Sinha and Taillie (1992) provide an unbiased estimator of the
population variance when the number of cycles is greater than one, which will
be the case for the present context.  Stokes and Sager (1988) have
investigated estimation of the cumulative distribution function from RSS.
The empirical distribution function from RSS is unbiased and more efficient
than its counterpart from simple random sampling, even with imperfect
ranking.  Gore, Patil, Sinha
and Taillie (1992) have also investigated some multivariate considerations in
RSS.

                               LITERATURE CITED

Dell, T. R.  1969.  The theory and some applications of ranked set sampling.
   Ph.D. thesis, Department of Statistics, University of Georgia, Athens,
   Georgia.

Gore, S. D., G. P. Patil, A. K. Sinha and C. Taillie.  1992.  Certain
   multivariate considerations in ranked set sampling and composite sampling
   designs.  Technical report number 92-0806, Center for Statistical Ecology
   and Environmental Statistics, Pennsylvania State University, University
   Park, PA  16802.

Halls, L. K. and T. R. Dell.  1966.  Trial of ranked set sampling for forage
   yields.  Forest Science 12:22-26.

McIntyre, G. A.  1952.  A method for unbiased selective sampling, using ranked
   sets.  Australian Journal of Agricultural Research 3:385-390.

Patil, G. P., A. K. Sinha and C. Taillie.  1992.  Unbiased estimation of the
   population variance using ranked set sampling.  Technical report, Center
   for Statistical Ecology and Environmental Statistics, Pennsylvania State
   University, University Park, PA 16802.

Stokes, S. L. and T. W. Sager.  1988.  Characterization of a ranked set sample
   with application to estimating distribution functions.  Journal of the
   American Statistical Association 83:374-381.

Takahasi, K. and K. Wakimoto.  1968.  On unbiased estimates of the population
   mean based on the sample stratified by means of ordering.  Annals of the
   Institute of Statistical Mathematics 20:1-31.