POTENTIAL OF RANKED-SET SAMPLING FOR DISASTER ASSESSMENT

Glen Johnson & Dr. Wayne Myers
The Pennsylvania State University
University Park, PA 16802

ABSTRACT: Parsimony of field work in forest inventory is even more important for the disaster assessment context than for regularly scheduled forest survey scenarios. Interest thus lies in sample designs which enable rapid assessment to augment detailed ground work. Double sampling for stratification is a relatively familiar strategy for achieving this end, wherein remote sensing or direct aerial reconnaissance can serve the purpose of stratification. The less familiar strategy of ranked set sampling becomes a candidate approach in cases where rapid assessment extends beyond categorization to ordination. The Center for Statistical Ecology and Environmental Statistics at Penn State has done considerable work in extending the concepts of ranked set sampling and developing scenarios for application. Fundamentals of ranked set sampling are presented, and scenarios considered for its application in the context of inventories that are mounted in response to catastrophes.

INTRODUCTION

Remote sensing, geographic information systems (GIS), and global positioning systems (GPS) are important technologies for rapid assessment of impacts on natural and cultural resources of catastrophic events. There are various conventional strategies for incorporating the information available from these technologies in quantitative estimation protocols. Such strategies often involve using remote sensing and/or GIS for stratification, double sampling, or multi-stage sampling. There are many possible stratification scenarios depending upon the catastrophic context, with most being conceptually rather straightforward. Double sampling and multi-stage sampling typically involve classifications of damage type or severity in the initial phase or stage of sampling.
Such classifications involve varying degrees of subjectivity, with the effect of subjectivity seldom being given explicit consideration.

The Center for Statistical Ecology and Environmental Statistics (CSEES) at Penn State University under the directorship of Prof. G. P. Patil has devoted considerable effort to extending ranked-set sampling (RSS) as formulated mathematically by Takahasi and Wakimoto (1968) and Dell (1969) to support application in more general contexts. Cooperation between CSEES and the Office for Remote Sensing of Earth Resources (ORSER) at Penn State University has served to suggest potential applications of RSS to catastrophic impact assessment using remote sensing and GIS. This involves several possible scenarios as alternatives or adjuncts to conventional stratification, double sampling, and multi-stage sampling. Although we do not yet have actual case studies to illustrate the proposed strategies, the Conference on Inventory and Management in the Context of Catastrophic Events provides an appropriate forum in which to share these ideas so that they may be considered and tried by others. We begin by reviewing the essentials of RSS, and then proceed to explain some of the scenarios that we have conceived.

ESSENTIALS OF RANKED-SET SAMPLING

The essential advantage of RSS arises from its capability to ensure that measured samples are well spread over the distribution of the variable that is of direct interest. This advantage is realized through a two-step method of sample selection. The first step involves selection of a larger number of potential samples than one can afford to measure in detail. The large number of potential sampling units is then randomly organized into subsets of (predetermined) fixed size. The sampling units in each subset are then ranked relative to other members of the same subset with respect to anticipated value for the variable of interest. The basis of anticipation can be anything, even intuition.
Statistical gains due to RSS will be reduced to the extent that the ranking proves erroneous, but not entirely eliminated unless the ranking is no better than a random shuffle. One unit is then taken in a rank rotation from each subset for actual measurement: the first rank from the first subset, the second rank from the next subset, and so on. When a cycle of rank selection has been completed, the cycle repeats until there are no more subsets. Each rank within a subset is thus represented equally in the ultimate sample.

Compared with direct selection of a sample equal in size to the ultimate RSS sample, the overall gains will depend on both the accuracy of ranking and the costs of selecting/ranking potential samples. Since the cost of selecting a potential sample is usually minor, the cost of ranking becomes the major consideration. RSS is particularly advantageous when ranking errors are low and the cost of ranking is low relative to the cost of making an actual measurement.

Understanding the usual RSS terminology of sets and cycles is key to understanding the protocol for estimation. Let m be the number of potential sampling units in each set (set size). Since one ultimate sample will come from each set, it requires m sets to yield an ultimate sample for each rank. Thus each selection CYCLE through the ranks requires m x m potential samples. If the selection cycle through the ranks is repeated r times, the ultimate sample size will be n = r x m. The total number of potential samples needed is thus r x m x m, and r x m is the number of rankings to be performed.

The mean of the ultimate sample provides an unbiased estimator of the population mean. That is,

    Estimated population mean = (ultimate sample sum) / (m x r)

The pooled sum of squared deviations about rank means provides the numerator of an unbiased estimate of the variance for the estimated mean.
Thus,

    Estimated variance of mean = (sum of squared deviations about rank means) / (m x m x r x (r-1))

In comparing the relative precision (RP) of RSS to simple random samples of size (m x r), Takahasi and Wakimoto (1968) have shown that

    1 <= RP <= (m+1)/2

for all continuous distributions with finite variance. Theoretically, then, greater efficiency comes with increasing set size m. In practice, however, ranking a larger number of items tends to be more difficult and less accurate. Thus, choice of set size m is determined practically by the nature of the ranking.

POST-CATASTROPHE RSS SAMPLING SCENARIOS

Ranked-set sampling could be expedient for exploiting large arrays of permanent sample plots in order to assess impacts of a catastrophic event rapidly. The entire grid of permanent plots might constitute potential samples, or a subgrid could be used to reduce time and effort. A "quick picture" of the situation for the respective potential samples could be obtained in several ways, with small-format aerial photography or videography of the potential sample plots from a light plane being likely candidates. The "quick picture" could be as simple as an aerial reconnaissance flyover for a verbal description into a tape recorder with separate notation regarding the starting point for each plot on the tape counter. After randomly assigning the potential samples to sets, the "quick picture" information for the members of each set would be interpreted for ordering relative to severity of impact. This could obviate need for definition of damage classes while still providing an idea of the extent to which damage is localized. With damage classes, an approximate "map" of damage distribution could be developed from the "quick picture" information prior to initiation of field work for ground measurement of the plots in the ultimate sample. GPS should be useful for locating plot vicinities in the flyover.
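As a rough illustration of how ranking on "quick picture" information would feed the estimators given earlier, the following Python sketch simulates a grid of plots, ranks each set on a noisy severity score, and computes the estimated mean and its estimated variance. The function names (rss_select, rss_estimate), the simulated damage values, and the noise level of the score are assumptions made purely for illustration; they are not part of any existing software or case study.

```python
import random
from statistics import mean


def rss_select(units, score, m, r, rng):
    """Choose an ultimate ranked-set sample of size m*r.

    units : list of potential sample identifiers
    score : dict mapping unit -> cheap ranking score (hypothetical "quick picture" rating)
    Returns a list of (rank, unit) pairs with rank in 1..m.
    """
    needed = r * m * m
    if len(units) < needed:
        raise ValueError("need at least r*m*m potential samples")
    pool = rng.sample(units, needed)          # random allocation of units to sets
    chosen = []
    for s in range(r * m):                    # one set per ultimate sample
        members = sorted(pool[s * m:(s + 1) * m], key=score.__getitem__)
        rank = s % m                          # rank rotation within each cycle
        chosen.append((rank + 1, members[rank]))
    return chosen


def rss_estimate(measured, m, r):
    """measured: list of (rank, value). Returns (mean, estimated variance of mean)."""
    if r < 2:
        raise ValueError("at least two cycles are needed to estimate the variance")
    by_rank = {k: [] for k in range(1, m + 1)}
    for rank, value in measured:
        by_rank[rank].append(value)
    total = sum(sum(v) for v in by_rank.values())
    # pooled sum of squared deviations about the rank means
    ss = sum(sum((x - mean(v)) ** 2 for x in v) for v in by_rank.values())
    return total / (m * r), ss / (m * m * r * (r - 1))


# Simulated example: skewed "true" damage and a noisy reconnaissance score.
rng = random.Random(1)
damage = {i: rng.expovariate(1.0) for i in range(500)}
quick = {i: damage[i] + rng.gauss(0, 0.5) for i in damage}
sample = rss_select(list(damage), quick, m=3, r=4, rng=rng)
est, var = rss_estimate([(k, damage[u]) for k, u in sample], m=3, r=4)
print(f"estimated mean {est:.3f}, estimated variance of mean {var:.4f}")
```

Only the 12 selected plots are "measured" here (their true damage is looked up), while all 36 potential samples are ranked on the cheap score, which mirrors the cost structure that makes RSS attractive.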
Before-and-after satellite image data and/or GIS information could substitute for a flyover in the above scenario. Even in the absence of post-event imagery, indices of susceptibility to damage could be developed from GIS data layers and used for ranking of the potential sample locations relative to likelihood of damage.

When permanent plots are not available, RSS scenarios begin with determination of potential sample locations either on a grid basis or individually as pairs of randomly drawn map coordinates. With potential sample locations thus determined, alternatives for RSS subscenarios parallel those for permanent plots. Without permanent plots, however, any "before" information for use in ranking must come from remote sensing and/or GIS.

Although "before" information is helpful, the ranking step of RSS can be carried out entirely in terms of "quick picture" post-event reconnaissance relative to severity of damage indications. GPS should be particularly advantageous in this respect for placing a reconnaissance aircraft "in the vicinity" of a potential sample. It is not essential that a potential sample location be pinpointed exactly in the reconnaissance mission. Uncertainty about potential sample locations during reconnaissance may contribute to ranking error, but ranking error is admissible for RSS. When potential sample locations are somewhat uncertain during reconnaissance, one will be doing the ranking in terms of vicinity effects. Efficiency of RSS will then increase with spatial autocorrelation of catastrophic impacts, which is contrary to the problematic nature of spatial autocorrelation for some past applications of RSS using local sets.

DISCUSSION

A foundation assumption for the above RSS estimators is that all eligible potential samples have an equal chance of allocation to each set. Many documented applications of RSS in other contexts have violated this assumption, for example by ranking local clusters of plots in vegetation sampling.
The effect of such violations has been investigated, but need not be of concern here since the foregoing scenarios are "clean" in this respect.

The "batch" nature of our proposed scenarios also stands in some contrast to the typical process of RSS sample selection. There is a tendency to run the selection process "depth-first" rather than "breadth-first." A depth-first approach would choose a set of potential samples, rank them, and designate an ultimate sample unit before proceeding onward to choice of further potential samples. Likewise, each cycle of ultimate selection through the ranks would be completed before initiating the next cycle. Such a selection sequence would usually be quite inconvenient for the context of catastrophic events. Completing each type of selection activity in batch fashion will be much more commodious for the catastrophic context, and likewise whenever aerial reconnaissance, remote sensing, or GIS is involved.

Thus, we would first designate all potential samples in terms of map coordinates. We would then conduct all aerial reconnaissance, image acquisition, and/or GIS index formulation. In the case of digital image analysis or GIS index formulation, the results for all potential samples would be dumped to an external file for subsequent handling by other software such as statistical packages. Having designated potential sample locations and acquired ranking information, we would then proceed to randomly organize all potential samples as sets and develop a file of set membership. Ranking would be conducted next, and the respective ranks of potential samples split into separate files. We would then systematically sample the respective rank files with staggered starts and interval corresponding to the set size. The resulting files of ultimate samples by rank would then be collated spatially with rank tags for planning and control of ground survey operations.
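The batch selection sequence just described can be sketched in a few lines of Python. This is one possible reading of the procedure under stated assumptions: in-memory lists stand in for the external files, the name batch_rss and the record layout are hypothetical, and "staggered starts" is interpreted as starting the systematic pass through the rank-k file at its k-th record.

```python
import random


def batch_rss(locations, index, m, r, seed=0):
    """Batch-style RSS selection.

    locations : list of potential sample ids, already designated by map coordinates
    index     : dict id -> ranking information acquired for every location
    Returns a list of records {"unit", "rank", "set"} for the ultimate sample.
    """
    rng = random.Random(seed)
    n_sets = r * m
    if len(locations) < n_sets * m:
        raise ValueError("need at least r*m*m potential samples")

    # "File" of set membership: random organization of all potential samples into sets.
    shuffled = rng.sample(locations, n_sets * m)
    sets = [shuffled[s * m:(s + 1) * m] for s in range(n_sets)]

    # Rank within each set, then split the ranks into separate "files".
    rank_files = {k: [] for k in range(1, m + 1)}
    for set_no, members in enumerate(sets):
        ordered = sorted(members, key=lambda u: index[u])
        for k, unit in enumerate(ordered, start=1):
            rank_files[k].append((set_no, unit))

    # Systematic sample of each rank file: start staggered by rank, interval = set size.
    ultimate = []
    for k, records in rank_files.items():
        for set_no, unit in records[k - 1::m]:
            ultimate.append({"unit": unit, "rank": k, "set": set_no})
    return ultimate


# Hypothetical example: 100 locations with an arbitrary susceptibility index.
demo_rng = random.Random(7)
susceptibility = {loc: demo_rng.random() for loc in range(100)}
plan = batch_rss(list(susceptibility), susceptibility, m=4, r=3)
```

Because the starts are staggered, every set contributes exactly one ultimate sample and every rank is drawn r times, so the result matches what an interleaved, set-by-set selection would produce.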
The field records would be sorted by rank tag into separate files for statistical tabulation/estimation. Tabulation/estimation would transpire in two steps. The first step would produce sums and sums of squared deviations for the respective rank files. The second step would combine the filewise results into overall estimates. This two-step tabulation/estimation process makes adaptation of existing statistical software quite simple.

This batch processing approach takes advantage of the fact that ultimate samples need not be matched by rank subsequent to the ranking activity itself. Relative to the usual interleaved approach, batch processing accommodates comparatively simple macros for working with different software systems that have mostly not been designed for interplay.

The potential gains due to RSS are somewhat dependent on the form of the distribution for the variable of interest. McIntyre (1952) concluded that efficiency declines with increasing skewness of the population. Since impacts of catastrophic events are often localized, the damage distributions will tend to be less than optimal. Counterbalancing this will be the availability of some preliminary information on spatial variation arising from the "quick picture" reconnaissance that can be used in strategic planning. The non-optimality of skewed distributions can be addressed by unequal allocation if estimators are adjusted accordingly. To optimize unequal allocation, the frequency of ultimate samples for a rank should be proportional to the standard deviation of the corresponding order statistic (Neyman allocation). Halls and Dell (1966) report favorable experience with unequal allocation in a field study. Takahasi and Wakimoto (1968) showed that unequal allocation can yield substantial gains, but can also lead to RSS being worse than simple random sampling when not done appropriately.
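A minimal Python sketch of unequal allocation follows. It assumes that rough standard deviations for each rank are available from some prior source (for example a pilot survey; the paper does not specify one), rounds the proportional allocation by a largest-remainder rule chosen here for convenience, and uses the mean of the rank means as the adjusted estimator, since the plain mean of all measurements is no longer unbiased when ranks are sampled unequally. The function names are hypothetical.

```python
def neyman_allocation(rank_sd, n):
    """Split n ultimate samples among ranks in proportion to rank_sd.

    rank_sd : assumed standard deviations of the order statistics, one per rank
    Rounding uses largest remainders so the allocation sums to n.
    """
    total = sum(rank_sd)
    quotas = [n * s / total for s in rank_sd]
    alloc = [int(q) for q in quotas]
    order = sorted(range(len(quotas)), key=lambda i: quotas[i] - alloc[i], reverse=True)
    for i in order[: n - sum(alloc)]:
        alloc[i] += 1
    return alloc


def unequal_rss_estimate(values_by_rank):
    """values_by_rank: list of lists; entry k holds the measurements for rank k+1.

    Returns (mean of rank means, estimated variance of that mean).
    """
    m = len(values_by_rank)
    if any(len(v) < 2 for v in values_by_rank):
        raise ValueError("each rank needs at least two measurements")
    means = [sum(v) / len(v) for v in values_by_rank]
    estimate = sum(means) / m
    variance = sum(
        sum((x - mu) ** 2 for x in v) / (len(v) - 1) / len(v)
        for v, mu in zip(values_by_rank, means)
    ) / (m * m)
    return estimate, variance


# Hypothetical right-skewed case: the top rank is the most variable, so it gets the most samples.
allocation = neyman_allocation([1.0, 2.0, 3.0], n=12)
```

With equal numbers per rank, this estimator and its variance reduce to the equal-allocation formulas given in the essentials section.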
Patil, Sinha and Taillie (1992) provide an unbiased estimator of the population variance when the number of cycles is greater than one, which will be true for the present context. Stokes and Sager (1988) have investigated estimation of the cumulative distribution function from RSS. The empirical distribution function is unbiased and more efficient than for simple random sampling, even with imperfect ranking. Gore, Patil, Sinha and Taillie (1992) have also investigated some multivariate considerations in RSS.

LITERATURE CITED

Dell, T. R. 1969. The theory and some applications of ranked set sampling. Ph.D. thesis, Department of Statistics, University of Georgia, Athens, Georgia.

Gore, S. D., G. P. Patil, A. K. Sinha and C. Taillie. 1992. Certain multivariate considerations in ranked set sampling and composite sampling designs. Technical report number 92-0806, Center for Statistical Ecology and Environmental Statistics, Pennsylvania State University, University Park, PA 16802.

Halls, L. K. and T. R. Dell. 1966. Trial of ranked set sampling for forage yields. Forest Science 12:22-26.

McIntyre, G. A. 1952. A method for unbiased selective sampling, using ranked sets. Australian Journal of Agricultural Research 3:385-390.

Patil, G. P., A. K. Sinha and C. Taillie. 1992. Unbiased estimation of the population variance using ranked set sampling. Technical report, Center for Statistical Ecology and Environmental Statistics, Pennsylvania State University, University Park, PA 16802.

Stokes, S. L. and T. W. Sager. 1988. Characterization of a ranked set sample with application to estimating distribution functions. Journal of the American Statistical Association 83:374-381.

Takahasi, K. and K. Wakimoto. 1968. On unbiased estimates of the population mean based on the sample stratified by means of ordering. Annals of the Institute of Statistical Mathematics 20:1-31.