Date:    Fri, 30 Apr 1999 08:53:13 EDT
From:    Frank Biasi 
Subject: Re: Jenk's Optimization for Natural Breaks Classification

Comments By: Frank Biasi@Stewardship@TNCERO
Originally To: SMTP@TNCHQ04@Servers[]
Originally From: "Weber, Theodore" 
Original Date:  3/23/1999  12:53 PM
Comments:

I am forwarding a summary on this subject that Ted Weber compiled and posted
on the CONSGIS list.

Frank Biasi
The Nature Conservancy

-------------------------[Original Message]--------------------------

Pete August and Jim Lucht both recommended "Thematic Cartography and
Visualization" by Terry Slocum (Prentice Hall, 1999). Pete August kindly
faxed me a copy of the relevant pages, which I have summarized below.
Slocum mentioned two algorithms for determining optimal
classes developed by Jenks and others: the Jenks-Caspall algorithm, and the
Fisher-Jenks algorithm.

The Jenks-Caspall algorithm is an empirical approach based on minimizing the
sum of absolute deviations about class means. It begins with an arbitrary
set of classes, calculates a total error, and attempts to reduce this error
by moving observations between adjacent classes.
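
To make the idea concrete, here is a rough Python sketch of this kind of
reassignment search. It is only an illustration of the description above,
not Jenks and Caspall's published procedure: the starting classes (roughly
equal counts), the one-observation-at-a-time moves, and the function names
are all my own assumptions.

```python
def class_error(values):
    """Sum of absolute deviations about the class mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values)


def total_error(data, bounds):
    """Total error of a partition; bounds[k] is the start index of class k."""
    edges = bounds + [len(data)]
    return sum(class_error(data[edges[k]:edges[k + 1]])
               for k in range(len(bounds)))


def jenks_caspall_sketch(values, n_classes):
    """Hypothetical helper: local search over class boundaries."""
    data = sorted(values)
    n = len(data)
    # Arbitrary starting classes (assumption: roughly equal counts).
    bounds = [k * n // n_classes for k in range(n_classes)]
    best = total_error(data, bounds)
    improved = True
    while improved:
        improved = False
        for k in range(1, n_classes):
            for step in (-1, 1):
                # Shift one boundary observation to the adjacent class.
                trial = bounds[:]
                trial[k] += step
                upper = bounds[k + 1] if k + 1 < n_classes else n
                if not (bounds[k - 1] < trial[k] < upper):
                    continue  # the move would leave a class empty
                err = total_error(data, trial)
                if err < best:
                    bounds, best = trial, err
                    improved = True
    edges = bounds + [n]
    classes = [data[edges[k]:edges[k + 1]] for k in range(n_classes)]
    return classes, best


classes, error = jenks_caspall_sketch(
    [1, 2, 3, 4, 50, 51, 52, 53, 54, 100], 3)
print(classes, error)
```

Because the search only accepts moves that lower the total error, it stops
at a local minimum, which is not guaranteed to be the best possible
partition.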

The Fisher-Jenks algorithm, in contrast, uses a formula "that guarantees an
optimal solution" (Slocum, 1999). Although Fisher developed the method,
Jenks introduced it to cartographers and is the one credited with "Jenks's
optimal method." Fisher's key observation was that any optimal partition is
made up of optimal partitions of subsets of the data, so its total error is
the sum of their errors. The optimal partition for each subset is the one
with the smallest total error (the sum of absolute deviations about the
class median, or alternatively, the sum of squared deviations about the
class mean). Thus, not every possible partition needs to be evaluated
explicitly; the solutions for subsets are reused. A matrix indexed by (i, j)
is created that holds the sum of absolute deviations about the median for
the ith through jth observations (in sorted order). This matrix gives the
error of any single class, and the class errors are added to derive the
total error of a partition. As in the Jenks-Caspall algorithm, the
partition with the lowest total error is the optimal partition.
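
The following Python sketch shows one way this subset-reuse idea can be
coded as a dynamic program. It is my own reading of the summary, not code
from Slocum or from ArcView; the function name and the table names (err,
best, start) are hypothetical, and the error matrix is built the slow,
obvious way to keep the example short.

```python
from statistics import median


def fisher_jenks_sketch(values, n_classes):
    """Hypothetical helper: optimal classes by dynamic programming."""
    x = sorted(values)
    n = len(x)
    # err[i][j]: sum of absolute deviations about the median of x[i..j].
    err = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            seg = x[i:j + 1]
            m = median(seg)
            err[i][j] = sum(abs(v - m) for v in seg)
    inf = float("inf")
    # best[k][j]: lowest total error for x[0..j] split into k + 1 classes.
    # start[k][j]: index where the last of those classes begins.
    best = [[inf] * n for _ in range(n_classes)]
    start = [[0] * n for _ in range(n_classes)]
    for j in range(n):
        best[0][j] = err[0][j]
    for k in range(1, n_classes):
        for j in range(k, n):
            for i in range(k, j + 1):
                # Optimal partition of x[0..i-1] plus one class x[i..j].
                cand = best[k - 1][i - 1] + err[i][j]
                if cand < best[k][j]:
                    best[k][j] = cand
                    start[k][j] = i
    # Walk back through the recorded start indices to recover the classes.
    classes = []
    j = n - 1
    for k in range(n_classes - 1, -1, -1):
        i = start[k][j]
        classes.append(x[i:j + 1])
        j = i - 1
    classes.reverse()
    return classes, best[n_classes - 1][n - 1]


classes, error = fisher_jenks_sketch(
    [1, 2, 3, 4, 50, 51, 52, 53, 54, 100], 3)
print(classes, error)
```

Swapping the median-based error for the sum of squared deviations about
the class mean gives the other variant mentioned above; only the
calculation of err changes.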

The optimal method (called "natural breaks" in ArcView) is the "best" choice
for grouping similar values together. It can also be used to determine the
appropriate number of classes. The goodness of absolute deviation fit or
goodness of variance fit can be calculated for different numbers of classes
and plotted against the number of classes. The point where the curve
flattens, or where the fit first exceeds a desired threshold (say, 80%),
may be the appropriate number of classes to use. An Avenue script to do
this would be welcome.
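
In the absence of an Avenue script, here is a small Python sketch of the
calculation. It assumes the usual definition of goodness of variance fit,
1 - (within-class squared deviations / total squared deviations), and finds
the optimal within-class error with a simple dynamic program. The function
names and the sample data are made up for illustration.

```python
def sq_dev(values):
    """Sum of squared deviations about the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def min_within_error(x, n_classes):
    """Lowest total within-class squared deviation for sorted x."""
    n = len(x)
    prev = [sq_dev(x[:j + 1]) for j in range(n)]  # one class
    for k in range(1, n_classes):
        cur = [float("inf")] * n
        for j in range(k, n):
            # Best split of x[0..i-1] into k classes plus one class x[i..j].
            cur[j] = min(prev[i - 1] + sq_dev(x[i:j + 1])
                         for i in range(k, j + 1))
        prev = cur
    return prev[n - 1]


def gvf_by_class_count(values, max_classes):
    """Goodness of variance fit for 1..max_classes optimal classes."""
    x = sorted(values)
    total = sq_dev(x)
    return {k: 1 - min_within_error(x, k) / total
            for k in range(1, max_classes + 1)}


def smallest_count_meeting(gvf, threshold=0.8):
    """Fewest classes whose fit reaches the threshold, or None."""
    for k in sorted(gvf):
        if gvf[k] >= threshold:
            return k
    return None


gvf = gvf_by_class_count([1, 2, 3, 4, 50, 51, 52, 53, 54, 100], 4)
for k, fit in gvf.items():
    print(k, round(fit, 3))
print("classes needed for 80%:", smallest_count_meeting(gvf))
```

Plotting the fit values against the number of classes shows where the
curve flattens; the threshold function is just the simpler of the two
rules described above.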

For further details, refer to Slocum, 1999.


original message:

> I occasionally use the "natural breaks" method of classifying themes in
> ArcView, which uses a statistical formula called Jenk's optimization.
> However, I cannot find any material describing this method. Can anyone
> point me to references describing Jenk's optimization, and how it
> compares to other clustering algorithms? Mail me directly, and I will
> summarize.
>
> Thanks,
> Ted Weber
> Maryland Dept. of Natural Resources