From:    "Simon, Steve, PhD" 
Subject: *** FAQ for STAT-L/SCI.STAT.CONSULT ***

With my recent upgrade to Microsoft Office 97, I have found how easy it
is to create and convert files to HTML format.  I have made an attempt
to do this for the FAQ.  David Ronis and I will experiment with the
format over the next couple of months.

I have tried to format the FAQ so that it is easily readable in either
the text version or in the HTML version.  That means that all the
indenting (which was a pain to modify anyway) is now gone.  I have also
tried to put web sites at the beginning of a paragraph so as to avoid
awkward line breaks.  With some web sites being 50-60 characters long,
anyplace but the beginning of a paragraph would almost guarantee an
awkward line break.

I have also condensed the FAQ a bit and combined some of the questions
together.  For example, STATLIB is now listed under interesting web
sites rather than being in a separate question.  The question about
references and the question about books have been combined.

I realize that the new format will be a little less readable than the
previous versions.  If it is a lot less readable, please let me know.
Also, if you have any comments on how I can improve the format, please
send them along.

If you've sent me some suggestions for the FAQ in the past couple of
months, please bear with me as I try to sort out all the formatting
issues.  I hope that things will be back to normal in September or
October.

FAQ for STAT-L/SCI.STAT.CONSULT, July 30, 1997

This FAQ is posted once a month to STAT-L/SCI.STAT.CONSULT. David Ronis
regularly posts this to his web site:

http://www-personal.umich.edu/~dronis/statfaq.htm.

Variations and earlier versions of the FAQ can be found on other sites
on the web. You are welcome to post all or part of this FAQ at your web
site. Please don't modify it without my permission, and please let me
know where you are posting it.

Table of contents

1 What is STAT-L/SCI.STAT.CONSULT?
2 What are other related listserv/usenet groups?
3 How do I know that my message got posted?
4 How do I use LISTSERV to...
5 How do I get the archives of STAT-L/SCI.STAT.CONSULT?
6 Why have I stopped seeing messages?
7 How can I contact the ASA, Biometric Society, or IMS?
8 How can I contact the major statistics software vendors?
9 Where can I find free/shareware statistical software?
10 What statistics resources can be found on the web?
11 What should I do about these "Spams"?
12 What are some of the problems with stepwise regression?
13 What is the answer to the Monty Hall, Envelope, or Birthday problem?
14 Can someone provide me with references and/or books about [topic]?
15 Acknowledgments

1 What is STAT-L/SCI.STAT.CONSULT?

STAT-L and SCI.STAT.CONSULT are a combined LISTSERV/USENET group for the
discussion of statistical consulting issues. Through the magic of
Internet, any message posted on SCI.STAT.CONSULT also appears on STAT-L.
Any message posted on STAT-L appears on SCI.STAT.CONSULT. So you can
follow all the fascinating questions and answers using either system.

We discuss statistical issues of all levels of difficulty, as well as
statistical education, the practice of statistical consulting, and other
related topics. We also like to debate some of the more controversial
issues in Statistics like the validity of the statistical models used in
the Bell Curve book and the pitfalls of stepwise regression models.

Be sure to put your name and e-mail address at the end of your message.
Some people have e-mail systems that strip headers from a message,
making it impossible for them to reply directly to you.

If you have a question about a particular statistics package, you will
probably get a faster and more accurate answer by posting the question
on the list that specializes in a particular package (e.g.,
SAS-L/COMP.SOFT-SYS.SAS or S-NEWS). Refer to the section "How can I
contact the major statistics software vendors?"

We appreciate questions at all levels from beginner to expert. Sometimes,
the beginner questions lead to some interesting discussions as to the
subtle nuances in statistical consulting. If you want advice on how to
analyze some data, please include some context as to what your data
means and what you are trying to investigate. No one can answer a
question well that only says "Listed below is some data. How do I
analyze it?"

Be careful about advice on STAT-L/SCI.STAT.CONSULT. You'll find many
people who are glad to help you, but you must realize the serious
limitations of e-mail. There is no adequate substitute for getting
advice face-to-face with a professional, especially BEFORE collecting
any data and BEFORE performing any experiments. Even the most
experienced and wise Statisticians will be unable to make sense out of a
poorly designed study.

There are three types of messages that we discourage. First, try to
avoid any overly commercial pitches, including posting your resume. On
the other hand, we do like to hear about job openings, especially ones
that list starting salaries so we can bemoan how little we make on our
current jobs. Postings of upcoming conferences are also acceptable.

Second, don't post your homework questions on here, even if you have
permission to do so from your teacher. On the other hand, asking for
recommendations on books for beginners is fine.

Third, while we enjoy a spirited debate, please refrain from flaming and
personal attacks. Although we have occasional lapses, this list has a
generally high level of civility and politeness. Let's keep it that way.

Here's some additional advice from Richard Ulrich for SCI.STAT.CONSULT
folks.

>If you are going to CROSS-POST to several groups, PLEASE send
>just one message in which you LIST THE SEVERAL GROUPS in the
>header.
>i) That way, when someone writes a response, it will show
>up in EACH group where the question could be read, not just
>in one.
>ii) That way, when a person reads with a Threaded-newsreader,
>he will see your message just ONCE, instead of over and over.
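
For those who build their postings with a script, the cross-posting
advice above comes down to one article with a single comma-separated
Newsgroups header. Here is a minimal Python sketch of that idea. The
helper name build_crosspost, the sender address, and the choice of
groups are made up for illustration, and the script only builds the
article; it does not post it.

```python
from email.message import EmailMessage


def build_crosspost(sender, subject, body, newsgroups):
    """Build ONE article addressed to several newsgroups at once."""
    article = EmailMessage()
    article["From"] = sender
    # All groups go in a single header, separated by commas.
    article["Newsgroups"] = ",".join(newsgroups)
    article["Subject"] = subject
    article.set_content(body)
    return article


# Placeholder sender and example groups, for illustration only.
article = build_crosspost(
    "your.name@example.edu",
    "Question about stepwise regression",
    "Here is some context about my data and what I am investigating...",
    ["sci.stat.consult", "sci.stat.edu"],
)
print(article["Newsgroups"])
```

Because there is only one article, a reply shows up in every listed
group, and a threaded newsreader shows the question just once.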

2 What are other related LISTSERV/USENET groups?

http://www.mailbase.ac.uk/lists-k-o/minitab/files/list-of-lists is a
very comprehensive list of statistical lists. It is maintained by Mike
Fuller.

http://www.minitab.com/maillist.htm is another good source of
information.

SPECIAL WARNING!!! Please, please, please note that subscription
requests go to the LISTSERV or MAILBASE address. If you send a
subscription request to the list itself, it will be read by hundreds or
thousands of people, none of whom can get you subscribed. Some of these
people will be annoyed enough at your naivete that they will introduce
you to a concept known as "flaming".

ALBERT-GIFI -- The Albert Gifi mailing list discusses correspondence
analysis, multidimensional scaling, nonlinear multivariate analysis, and
optimal scaling

Subscriptions to: LISTSERV@JULIA.MATH.UCLA.EDU
How to subscribe: subscribe ALBERT-GIFI First-name Last-name
Post messages to: ALBERT-GIFI@JULIA.MATH.UCLA.EDU

ALLSTAT -- Discussions on this list are similar to
STAT-L/SCI.STAT.CONSULT, but there is a decidedly British flavor to
ALLSTAT and a more U.S. flavor to STAT-L/SCI.STAT.CONSULT. This is
particularly noticeable in the postings of meetings. ALLSTAT is a
Mailbase system so it uses a slightly different syntax than the LISTSERV
system.

Subscriptions to: MAILBASE@MAILBASE.AC.UK
How to subscribe: join ALLSTAT First-name Last-name
Post messages to: ALLSTAT@MAILBASE.AC.UK
Web info and FAQ: http://www.stats.gla.ac.uk/allstat/

Note: Contrary to previous information in this FAQ, you must include
your name when subscribing. "Subscribe" can be substituted for "join,"
however. Here are some additional comments from Dr. Stuart Young, the
list owner.

>Note also, that while Allstat does indeed have a "UK flavour"
>it is not a discussion list. It is a "broadcast system" for
>distributing notices. Discussions are not encouraged on the
>list - replies go to the sender, not to the list.

CRSP-L -- Help With Center for Research in Security Prices (CRSP) Data
Bases.

Subscriptions to: LISTSERV@TAMVM1.TAMU.EDU
How to subscribe: sub CRSP-L First-name Last-name
Post messages to: CRSP-L@TAMVM1.TAMU.EDU
Web info and FAQ: http://www-leland.stanford.edu/class/gsb/crsp/CRSP-L/

EDSTAT-L/SCI.STAT.EDU -- Statistics training and education issues.

Subscriptions to: LISTSERV@JSE.STAT.NCSU.EDU
How to subscribe: subscribe EDSTAT-L Firstname Lastname
Post messages to: EDSTAT-L@JSE.STAT.NCSU.EDU

MULTILEVEL -- This list is for people using multilevel analysis
(multilevel modeling; hierarchical data analysis) and any associated
software (e.g. MLn, HLM, VARCL, GENMOD). MULTILEVEL is a MAILBASE system
so it uses a slightly different syntax than the LISTSERV system.

Subscriptions to: MAILBASE@MAILBASE.AC.UK
How to subscribe: subscribe MULTILEVEL first-name last-name
Post messages to: MULTILEVEL@MAILBASE.AC.UK

SCI.STAT.MATH -- A more mathematical flavor can be found on
SCI.STAT.MATH, which, sad to say, is not mirrored to any LISTSERV list.

SEMNET -- SEMNET is an open forum for ideas and questions about the
methodology that includes analysis of covariance structures, path
analysis, and confirmatory factor analysis.

Subscriptions to: LISTSERV@UA1VM.UA.EDU
How to subscribe: sub SEMNET first-name last-name
Post messages to: SEMNET@UA1VM.UA.EDU
Web info and FAQ: http://www.gsu.edu/~mkteer/semfaq.html

3 How do I know that my message got posted?

First of all, be patient. It takes a while for your message to be
posted. Internet is faster than the Post Office, but it isn't always
instantaneous. There's nothing more annoying than seeing the same
messages posted again and again in a half hour time period by people who
are unsure whether their messages got through. Please wait half a day or
more before panicking.

Second, if you are having trouble posting, it is more likely than not a
local problem. Check with your help desk or other local resource.

Third, no matter where you post your message from, if the message gets
through, it will be added to two very nice USENET archives, AltaVista
and DejaNews. Search for your message using the subject line or a
reasonably unique phrase in the message itself. This system is not
instantaneous. Wait half a day or more before searching for your
message. See the section "How do I get the archives of
STAT-L/SCI.STAT.CONSULT?" for the web address and other details about
AltaVista and DejaNews.

Fourth, if you are using SCI.STAT.CONSULT, then you will eventually see
a copy of your message, if it got posted. There are special USENET groups
where you can practice sending test messages (MISC.TEST or ALT.TEST). If
you are a beginner, don't post to SCI.STAT.CONSULT until after you are
comfortable posting to one of these test groups.

You will also see your message if you receive the digest from STAT-L.

If you receive individual messages rather than the digest from STAT-L,
you will not see your own message when it is posted. The presumption is
that you read it when you wrote it, so why would you want to see it
again?

You can change this default in two ways. Send an e-mail to
LISTSERV@VM1.MCGILL.CA with a one line message: SET STAT-L REPRO to
inform STAT-L that you wish it to send you back a copy of any message
you send in. Send a one line message: SET STAT-L ACK to inform STAT-L
that you wish it to send a brief acknowledgment that your message has
been sent to the list. Finally, send a one line message: SET STAT-L
NOREPRO if you want to go back to the default. Please note that all of
these commands go to LISTSERV and not to STAT-L.
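
If you prefer to script your mail, the one-line commands above can be
put together as in the following Python sketch. It is only an
illustration: the helper name build_listserv_command and the sender
address are made up, and the script builds the messages without
sending them.

```python
from email.message import EmailMessage

# Commands go to the LISTSERV address, never to the STAT-L list itself.
LISTSERV_ADDRESS = "LISTSERV@VM1.MCGILL.CA"


def build_listserv_command(sender, command):
    """Return an e-mail whose body is a single LISTSERV command line."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = LISTSERV_ADDRESS
    msg.set_content(command)  # the command is the only line in the body
    return msg


# "your.name@example.edu" is a placeholder; use the address you subscribed with.
for command in ("SET STAT-L REPRO", "SET STAT-L ACK", "SET STAT-L NOREPRO"):
    print(build_listserv_command("your.name@example.edu", command))
    print("-" * 40)
```

The same helper would work for the other SET commands described in
this FAQ, since they all go to the same LISTSERV address.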

Finally, please note that not every question posted on
STAT-L/SCI.STAT.CONSULT gets an answer. No one is getting paid for their
time, so you need to appeal to their curiosity or their altruism. If no
one answered your question, maybe you need to ask the question
differently?

4 How do I use LISTSERV to...

A good resource about LISTSERV can be found at
http://www.sagrelto.com/sagrelto/tutorial/ and a general overview of
LISTSERV versus other systems (e.g., MAILBASE) can be found at
http://www.nekesc.k12.ks.us/cds.html.

...subscribe to STAT-L?

If you are using SCI.STAT.CONSULT, your USENET reader software should
have a menu pick or a command that will allow you to subscribe to
SCI.STAT.CONSULT. Every reader is different, so please consult your help
file or your local computer guru.

To subscribe to STAT-L, send a message to LISTSERV@VM1.MCGILL.CA with a
single line: SUB STAT-L First-name Last-name in the body of the text.
Please be sure that you send the message to LISTSERV@VM1.MCGILL.CA and
not to STAT-L@VM1.MCGILL.CA. If you send your subscription request to
STAT-L, hundreds of people will see your message and none of them will
be able to subscribe you to the list. Some in fact will flame you for
not reading these instructions more carefully.

It's sort of like a newspaper which has a circulation desk and a
letters-to-the-editor desk. If you want to start delivery of the paper,
you send the request to the circulation desk. If you want to start
delivery of STAT-L, you send the request to LISTSERV. Sending a
subscription request
to STAT-L is like sending a letter to the editor that reads "Please
start delivery of the Sunday paper to 1313 Mockingbird Lane".

...get the digest option turned on/off?

If you have no strong preference, the digest option (multiple messages
compiled into a single mailing, usually daily) is less burdensome on
Internet and creates fewer bounced messages for the list administrator
to deal with. The default when you sign up is for the digest option. To
cancel digest format and to receive the list as separate mailings, send
the command SET STAT-L MAIL to LISTSERV@VM1.MCGILL.CA.

To receive the list in digest format, send the command SET STAT-L DIGEST
in the body of a message to LISTSERV@VM1.MCGILL.CA. Again, please be
sure that you send all of these types of messages to
LISTSERV@VM1.MCGILL.CA and not to STAT-L@VM1.MCGILL.CA.

...obtain a list of subscribers to STAT-L?

Send the command REVIEW STAT-L F=MAIL to LISTSERV@VM1.MCGILL.CA or
REVIEW STAT-L BY NAME F=MAIL to sort by name or REVIEW STAT-L BY COUNTRY
F=MAIL to sort by country. This does not include subscribers to
SCI.STAT.CONSULT, as they do not subscribe to the list the same way. I
know of no way to obtain the list of subscribers to SCI.STAT.CONSULT.

...keep my name off of the list of subscribers?

Send a message to LISTSERV@VM1.MCGILL.CA with the line
SET STAT-L CONCEAL YES in the body of the message. To reverse this, send
the command SET STAT-L CONCEAL NO in the body of the message.

...stop mail from STAT-L (temporarily or permanently)?

Send a message to LISTSERV@VM1.MCGILL.CA (again, please don't send the
message to STAT-L@VM1.MCGILL.CA). To signoff permanently, include the
line UNSUBSCRIBE STAT-L in the body of the message. To temporarily
suspend mail, use the line SET STAT-L NOMAIL and when you are ready to
resume reading, use the line SET STAT-L MAIL or SET STAT-L DIGEST
depending on your preference for individual messages versus a daily
digest.

What if my initial signoff command doesn't work?

This happens sometimes, particularly if your e-mail address changes,
even slightly. The key thing to remember here is that only the list
owner can help you with this. Sending a message to STAT-L will not help
much unless the list owner happens to be following STAT-L right at that
moment. I would recommend that you get a list of subscribers and see how
your e-mail address looks to the system (see above for details).

Some mail systems (like ELM) allow you to change the FROM field of a
message. If your mail system supports this, then try sending a message
to LISTSERV and change the FROM field so it looks like it came from the
original address. You could also ask your system administrator to create
a temporary (or permanent) alias name for you for outbound messages
(including the necessary deviant domain part).

If none of the above works, or if it seems too complicated, don't panic.
Every list has a human owner who can go in and unsubscribe you manually.
You can find the e-mail address of the list owner on the same list of
subscribers that you just got (again, see above). When I last checked in
August 1995, the list owner was Michael Walsh of McGill University
(MICHAEL@VM1.MCGILL.CA, 514-398-3680). Send a message directly to the
list owner, explaining your problem. The list owner will manually
unsubscribe you from STAT-L.

5 How do I get the archives of STAT-L/SCI.STAT.CONSULT?

There are several ways to get archives of STAT-L/SCI.STAT.CONSULT. First,
the LISTSERV software for STAT-L maintains monthly archive files back to
1994. Send the command INDEX STAT-L to LISTSERV@VM1.MCGILL.CA to obtain
a listing of these file names. Send the command GET filename filetype
F=MAIL to receive a specific archive file.

You can also search the archives for keywords, but the syntax is a
throwback to mainframe days. Here's an example of how to find statistics
humor in previous postings. Send the following message to
LISTSERV@VM1.MCGILL.CA (not to STAT-L!)

// JOB Echo=No Database Search DD=Rules
//Rules DD *
Search jokes in stat-l Index
/*

This will get you the following output:

>Database STAT-L, 11 hits.
> Index
>Item # Date     Time  Recs  Subject
>------ ----     ----  ----  -------
>002264 94/05/12 20:47   57  Re: anyone know a good stats joke...
>002346 94/05/16 12:42   24  Re: heard any good stats jokes?
>002352 94/05/12 16:42   29  Re: anyone know a good stats joke...
>002374 94/05/17 00:39   34  Re: anyone know a good stats joke...
>002387 94/05/17 17:16   30  Re: anyone know a good stats joke...
>004886 94/10/11 09:36   49  Re: The charge of epistemological naivete
>005643 94/11/07 17:45   59  Re: Political Correctness vs. Offensive topics of +
>005664 94/11/08 11:32   36  Re: Political Correctness vs. Offensive topics of +
>008101 95/03/02 14:58  116  us government censorship to the internet?
>009133 95/04/18 04:56   90  --NEED HELP WITH EVALUATION--
>021605 96/12/23 10:04   48  Re: Farms (STAT-L 21 Dec 1996)

Obviously only some of these are successful hits. For example, any
message with the word "epistemological" in the title can't be humorous.
Send to LISTSERV@VM1.MCGILL.CA the following syntax to get the text of
specific messages:

// JOB Echo=No Database Search DD=Rules
//Rules DD *
Search jokes in stat-l
Print all of 2264 2346 2352 2374 2387
/*

Send the command GET LISTDB MEMO F=MAIL to LISTSERV@UGA.CC.UGA.EDU to
get a full description of LISTSERV search functions (note that
LISTSERV@VM1.MCGILL.CA does not have this file).
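
Since the job syntax is easy to mistype, here is a minimal Python
sketch that assembles the two kinds of search jobs shown above. The
function name build_search_job and its parameters are made up for
illustration. It simply reproduces the job layout used in this FAQ and
prints the text, which you would then paste into a message to
LISTSERV@VM1.MCGILL.CA.

```python
def build_search_job(keyword, list_name, item_numbers=()):
    """Build the text of a LISTSERV database search job.

    With no item numbers, the job asks for an index of matching postings.
    With item numbers, it asks for the full text of those postings.
    """
    lines = [
        "// JOB Echo=No Database Search DD=Rules",
        "//Rules DD *",
    ]
    if item_numbers:
        lines.append(f"Search {keyword} in {list_name}")
        lines.append("Print all of " + " ".join(str(n) for n in item_numbers))
    else:
        lines.append(f"Search {keyword} in {list_name} Index")
    lines.append("/*")
    return "\n".join(lines)


# First ask for the index, then for the items that look promising.
print(build_search_job("jokes", "stat-l"))
print()
print(build_search_job("jokes", "stat-l", [2264, 2346, 2352, 2374, 2387]))
```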

gopher://jse.stat.ncsu.edu/11/othergroups/statl/ is a gopher site that
contains the archives of STAT-L. If you are still using gopher software,
point it to jse.stat.ncsu.edu. This site has archives going back to
1990. In case you were curious, there were 21 messages posted for the
whole month
of January 1990. Volume has picked up a bit since then.

http://www.reference.com also maintains an archive of STAT-L, other
lists, USENET groups, and web discussion groups. I'm not sure how far
back this archive goes.

Finally, archives of USENET messages, including messages for
SCI.STAT.CONSULT, are maintained at two sites:
http://altavista.digital.com, which apparently only goes back a month or
so, and http://www.dejanews.com, going back to March 19, 1995. Follow the
instructions at either site for restricting your search to just one
newsgroup.

Some people may wish to prevent their postings from being added to these
databases. If your posting contains an X-Header looking like
x-no-archive: yes or if you place x-no-archive: yes as the first line of
the body text of your message, then your message will not be archived.
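
Both ways of opting out can be sketched in a few lines of Python. This
is only an illustration under the assumption that you compose your
postings with a script; the two helper names are made up, and whether
a given archive honors the request is up to that archive.

```python
from email.message import EmailMessage


def add_no_archive_header(msg):
    """Option 1: ask archives to skip the posting via a header."""
    msg["X-No-Archive"] = "yes"
    return msg


def add_no_archive_line(body):
    """Option 2: put the request on the first line of the body text."""
    return "x-no-archive: yes\n" + body


# Placeholder posting, for illustration only.
posting = EmailMessage()
posting["Subject"] = "A question I would rather not have archived"
posting.set_content(add_no_archive_line("Here is my question..."))
add_no_archive_header(posting)
print(posting)
```

Either option alone should be enough; the example uses both just to
show them side by side.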

6 Why have I stopped seeing messages?

Nine times out of ten, the problem is at your site. If you aren't
already good friends with the people who administer your Internet
connection, now is a good time to start. These people will know when the
connection is running smoothly and when it is erratic.

Posting a test message to STAT-L/SCI.STAT.CONSULT is not likely to help.
If you aren't seeing normal traffic, what makes you think that you will
see your test message? Also, the people who read your test message are
not in a position to diagnose your problem. Only your new found friends
who run your local Internet connection are in a position to diagnose
your problem.

Your first step is to check one of the USENET archives described above
(AltaVista or DejaNews). If you see messages in either archive that are
more than 48 hours old and which you have not received at your local
site (via either SCI.STAT.CONSULT or STAT-L), then you have a real
problem.

There are some obvious self-diagnostic questions you should ask
yourself. For STAT-L readers, ask yourself if you have received mail
from other Internet sources. If not, then perhaps the problem is bigger
than STAT-L. Also for STAT-L readers, find out if your site has been
bouncing back e-mail recently. The number one cause for not getting
STAT-L mail is that the list administrator noticed a bunch of bounced
e-mail error messages and has de-activated your subscription.

To find out if you've been deactivated, send a message to
LISTSERV@VM1.MCGILL.CA with QUERY STAT-L in the body of the message.
Please make sure you send this to the LISTSERV address and not the
STAT-L address. Within a few hours, you should get a reply showing your
status. If you don't get a response, that's a good sign that the
listserver is down, which would mean that nobody is getting messages
from STAT-L. If you do get a response, here's what it might look like.

Distribution options for Steve Simon , list STAT-L: Ack=
No, Mail= Digests, Files= Yes, Repro= No, Header= Short(BSMTP), Renewal=
Yes, Conceal= No

If your account was de-activated, the response will be

You are not subscribed to the STAT-L list.

or your distribution option will be set to NOMAIL. In either case, work
with your local Internet experts to fix the problem and then either
re-subscribe or set the distribution option back to MAIL.
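
If you want to check the reply with a script, the following Python
sketch pulls the settings out of a reply like the one quoted above. It
assumes the reply looks exactly like the two samples in this section
(the real format may differ), and the function name parse_query_reply
is made up for illustration.

```python
import re


def parse_query_reply(reply):
    """Return a dict of option names to values, or None if not subscribed."""
    text = " ".join(reply.split())  # undo the line wrapping
    if "You are not subscribed" in text:
        return None
    # Options look like "Name= Value" and are separated by commas.
    pairs = re.findall(r"(\w+)=\s*([^,]+)", text)
    return {name: value.strip() for name, value in pairs}


# Sample reply copied from this FAQ.
sample = """Distribution options for Steve Simon , list STAT-L: Ack=
No, Mail= Digests, Files= Yes, Repro= No, Header= Short(BSMTP), Renewal=
Yes, Conceal= No"""

options = parse_query_reply(sample)
if options is None:
    print("Not subscribed: re-subscribe once the local problem is fixed.")
else:
    print("Mail setting:", options["Mail"])
```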

By the way, don't complain to the list owner for de-activating your
account. The typical listowner has to sort through hundreds or thousands
of bounced message reports weekly, and the only way to stop these
bounced message reports is to de-activate accounts. The people who you
need to talk to are your new found friends who maintain your Internet
access.

Failure to receive messages is less common for SCI.STAT.CONSULT readers.
If you are experiencing problems, the obvious thing to look for is
whether any of the newsgroups are getting through. If nothing is getting
through, then you have a local problem. If you get postings from other
newsgroups, then perhaps your server has decided not to carry
SCI.STAT.CONSULT anymore. Either way, you have to talk to your local
Internet experts.

7 How can I contact the ASA, Biometric Society, or IMS?

American Statistical Association
1429 Duke St.
Alexandria, VA 22314-3402
Tel: 703-684-1221
FAX: 703-684-2036
E-M: asainfo@amstat.org
Web: http://www.amstat.org

The International Biometric Society
808 17th Street, NW, Suite 200
Washington, DC 20006-3910
Tel: 202-223-9669
FAX: 202-223-9569
E-M: 75703.1407@compuserve.com
Web: http://www.stat.uga.edu/~lynne/symposium/biometric.html


Institute of Mathematical Statistics
3401 Investment Boulevard, Suite 7
Hayward, CA 94545
Tel: 510-783-8141 (Hazel Lowery)
FAX: 510-783-4131
E-M: HLLIMS@stat.berkeley.edu
Web: http://www.imstat.org

8 How can I contact the major statistics software vendors?

The web site http://www.statistics.com/vendors.html maintained by
Resampling Stats, Inc. has a very nice list of statistics software
vendor information.

Many of these companies have numerous locations and international
distributors. I have only listed corporate headquarters to save space.
If you can, check out the web site to get more detailed information.
Also please bear in mind that mergers and other business activity may
quickly make parts of this list obsolete.

Finally, I need to repeat my earlier plea about listservers. Please,
please, please note that subscription requests go to the LISTSERV or
MAILBASE or MAJORDOMO address.

APTECH SYSTEMS INC. (GAUSS)
Aptech Systems, Inc.
23804 SE Kent-Kangley Road
Maple Valley, WA 98038 USA
Tel: 206-432-7855
FAX: 206-432-7832
Web: http://www.aptech.com/
E-M: support@aptech.com (support) info@aptech.com (sales information)

GAUSS mailing list --
Subscriptions to: MAJORDOMO@ECO.UTEXAS.EDU
How to subscribe: subscribe GAUSSIANS
Post messages to: GAUSSIANS@ECO.UTEXAS.EDU

CIVILIZED SOFTWARE (MLAB)
Civilized Software, Inc.
8120 Woodmont Ave. #250
Bethesda, MD 20815 USA
Tel: 1-301-652-4714
Fax: 1-301-656-1069
Web: http://www.civilized.com
E-M: csi@civilized.com

CONCEPTUAL SOFTWARE INC. (DBMS/COPY)
Conceptual Software Inc.
9660 Hillcroft # 510
Houston, TX 77096 USA
Tel: 713-721-4200
Fax: 713-721-4298
Web: http://www.conceptual.com/
E-M: eroberts@conceptual.com (General Information)
eroberts@conceptual.com (Sales) hfeldman@conceptual.com (Customer
Support)

CYTEL SOFTWARE CORPORATION (StatXact, LogXact, EaSt)
Cytel Software Corporation
675 Massachusetts Ave.
Cambridge, MA 02139 USA
Tel: (617) 661-2011
Fax: (617) 661-4405
Web: http://www.cytel.com
E-M: sales@cytel.com

DATA DESCRIPTION, INC. (DATADESK)
Data Description, Inc.
Box 4555
Ithaca, NY 14853 USA
Tel: (607) 257-1000
FAX: (607) 257-4146
Web: http://www.datadesk.com/datadesk/
E-M: datadesk@datadesk.com

DataMost Corp. (STATMOST)
DataMost Corporation
520 West 9460 South
Sandy, UT 84070 USA
Tel: (801) 255-5008
Fax: (801) 255-5009
Web: http://www.datamost.com/
E-M: techsupp@datamost.com

MATHSOFT (MATHCAD)
MathSoft, Inc.
101 Main Street
Cambridge, MA 02142 USA
Tel: 617 577-1017
Fax: 617 577-8829
Web: http://www.mathsoft.com/
E-M: ideas@mathsoft.com (comments and suggestions) support@mathsoft.com
(Support, US or Canada) help@mathsoft.com (Support outside US/Canada)
sales-info@mathsoft.com (Sales, US or Canada) int-info@mathsoft.com
(Sales outside US/Canada)

MATHWORKS (MATLAB)
The MathWorks, Inc.
24 Prime Park Way
Natick, MA 01760-1500 USA
Tel: (508) 653-1415
Fax: (508) 653-2997
Web: http://www.mathworks.com/home.html
E-M: info@mathworks.com (Sales, pricing, information)
support@mathworks.com (Technical support) bugs@mathworks.com (Bug
reports) suggest@mathworks.com (Product suggestions)
service@mathworks.com (Service)

MINITAB INC.
Minitab Inc.
3081 Enterprise Drive
State College, PA 16801 USA
Tel: 814 238-3280
Fax: 814 238-4383
Web: http://www.minitab.com
E-M: sales@minitab.com

NCSS Statistical Software (NCSS, PASS)
NCSS Statistical Software
329 North 1000 East
Kaysville, Utah 84037 USA
Tel: (800) 898-6109 (801) 546-0445
Fax: (801) 546-3907
Web: http://www.ncss.com
E-M: ncss@ix.netcom.com

PALISADE CORPORATION (@RISK)
Palisade Corporation
31 Decker Road
Newfield, NY 14867 USA
Tel: 607-277-8000 800-432-7475
Fax: 607-277-8001
Web: http://www.palisade.com

RESAMPLING STATS
Resampling Stats
612 N. Jackson St.
Arlington, VA 22201 USA
Tel: 703-522-2713
Fax: 703-522-5846
Web: http://www.statistics.com
E-M: stats@resample.com learning@statistics.com

SAS INSTITUTE (JMP, SAS)
SAS Institute Inc.
SAS Campus Drive
Cary, NC 27513 USA
Tel: 919 677-8000 919 677-8008 (JMP technical support) 919 677-8000, ext
5071 (JMP sales)
Fax: 919 677-8123
Web: http://www.sas.com
ftp: ftp://ftp.sas.com
E-M: corpcom@unx.sas.com (Corporate Communications) sasedu@vm.sas.com
(Education) eurwww@mvs.sas.com (European Offices) pubs@unx.sas.com
(Publications) software@sas.sas.com (Sales and Marketing)
bussol@unx.sas.com (Business Solutions Division) sasblb2@vm.sas.com
(jmp-sales)

JMP mailing list --
Subscriptions to: MAJORDOMO@WUBIO.WUSTL.EDU
How to subscribe: subscribe JMP-L
Post messages to: JMP-L@WUBIOS.WUSTL.EDU

SAS mailing list --
Subscriptions to: LISTSERV@UGA.CC.UGA.EDU
How to subscribe: subscribe SAS-L First-name Last-name
Post messages to: SAS-L@UGA.CC.UGA.EDU

SAS Technical Support News --
Subscriptions to: LISTSERV@VM.SAS.COM
How to subscribe: subscribe TSNEWS-L First-name Last-name
Post messages to: Messages posted by SAS Institute only

SCIENTIFIC CONSULTING INC (PCNONLIN)
E-M: 75450.3171@compuserve.com

SPSS Inc. (BMDP, SPSS, Systat)
SPSS, Inc.
444 North Michigan Avenue
Chicago IL 60611 USA
Tel: 312 329-3410 800 543-2185 312-494-3283 (SYSTAT Technical Support)
Fax: 312/329-3668
BBS: 312/836-1900 (8/N/1)
ftp: ftp.spss.com
E-M: support@spss.com
Web: http://www.spss.com

BMDP mailing list --
Subscriptions to: LISTSERV@VM1.MCGILL.CA
How to subscribe: sub BMDP-L Firstname Lastname
Post messages to: BMDP-L@VM1.MCGILL.CA

SPSS mailing list --
Subscriptions to: LISTSERV@UGA.CC.UGA.EDU
How to subscribe: sub SPSSX-L Firstname Lastname
Post messages to: SPSSX-L@UGA.CC.UGA.EDU

SYSTAT mailing list --
Subscriptions to: LISTSERV@SPSS.COM
How to subscribe: sub SYSTAT-L Firstname Lastname
Post messages to: SYSTAT-L@SPSS.COM

STATA CORPORATION
Stata Corporation
702 University Drive East
College Station, Texas 77840 USA
Tel: 409-696-4600 800-STATA-PC
Fax: 409-696-4601
Web: http://www.stata.com/
E-M: stata@stata.com

STATA mailing list --
Subscriptions to: majordomo@hsphsun2.harvard.edu
How to subscribe: subscribe STATALIST
Post messages to: STATALIST@hsphsun2.HARVARD.EDU

STATISTICAL SCIENCES (S-PLUS)
Statistical Sciences, Suite 500
1700 Westlake Avenue N.
Seattle WA 98109-9891 USA
Tel: (206) 283-8802 (business) (800) 569-0123 (sales)
Fax: (206) 283-6310
Web: http://www.statsci.com/
E-M: sales@statsci.com (Sales) support@statsci.com (Support)
mktg@statsci.com (Marketing)

S-plus mailing list --
Subscriptions to: S-NEWS-REQUEST@UTSTAT.TORONTO.EDU
How to subscribe: subscribe
Post messages to: S-NEWS@UTSTAT.TORONTO.EDU
Also check out the parent company, Mathsoft.

STATISTICS AND EPIDEMIOLOGY RESEARCH CORPORATION (EGRET)
Tel: 206-632-3014
FAX: 206-547-4140
E-M: rhm@ms.washington.edu
Apparently, EGRET has been purchased by Cytel Corporation.

STATSOFT (STATISTICA)
StatSoft, Inc.
2300 East 14th Street
Tulsa, OK 74104-4442 USA
Tel: (918) 749-1119
Fax: (918) 749-2217
Web: http://www.statsoftinc.com
E-M: info@statsoftinc.com

SUDAAN
SUDAAN
Product Coordinator Statistical Software Center
Research Triangle Institute
3040 Cornwallis Road
Research Triangle Park NC 27709-2194 USA
Tel: (919) 541-6602
Fax: (919) 541-7431
Web: http://www.rti.org/patents/sudaan/sudaan.html
E-M: sudaan@rti.org

UNISTAT
Web: http://www.unistat.com

Here is a list of software for experimental design, collated by Bob
Wheeler.

RS/1 software - including RS/Discover (A general purpose statistics
package with extensive experimental design and analysis capability.)
BBN Domain Corp.
150 Cambridge Park Dr.
Cambridge, MA 02140
Tel: 617-873-5000
Fax: 617-873-6153
E-M: jtsullivan@bbn.com
Web: http://www.bbndomain.com/

Design Ease & Design Expert software (Experimental design, analysis, and
training.)
Stat-Ease, Inc.
2021 E. Hennepin Ave., Ste. 191
Minneapolis, MN 55413
Tel: 612-378-9449
Fax: 612-378-2152
E-M: 72103,1436@compuserve.com

ECHIP software (Experimental design, analysis and training for
scientists and engineers.)
ECHIP, Incorporated
724 Yorklyn Road
Hockessin, DE 19707-8733
Tel: 302-239-5429
Fax: 302-239-6227
E-M: support@echip.com

9 Where can I find free/shareware statistical software?

Any search for free/shareware statistical software should start with
Statlib. Other software is arranged alphabetically after the description
of Statlib.

http://lib.stat.cmu.edu/ is the site for Statlib, a system for
distributing statistical software by the web, by electronic mail, and by
ftp. If you do not have web access, send an e-mail to
statlib@lib.stat.cmu.edu with the single line "send index" in the body
of the message. This will give you an index of the general material
available on the statlib server.

http://www.mrc-bsu.cam.ac.uk/bugs/Welcome.html is the home page for
BUGS/CODA. BUGS stands for Bayesian analysis Using Gibbs Sampling. CODA
is a set of S-plus programs to analyze convergence diagnostics of BUGS
output. This software is described in Carlin BP and Louis TA (1996)
"Bayes and Empirical Bayes Methods for Data Analysis" Chapman and Hall,
London.

ftp://plato.la.asu.edu/pub/donlp2 is the ftp site for DONLP2. There have
been recent updates to DONLP2, one of the few high-quality programs for
general nonlinear programming problems available completely free over
the net. There are four different versions (in f77 or f2c/cc, each with
exact or numerical differentiation) and a separate file with three
papers as PostScript files. The user's guide (the README files and
donlp2doc.txt) was last updated on 6-24-96.

ftp://ftp.cdc.gov/pub/epi/epiinfo is the ftp site for Epi-Info/Epi-Map.
Epi-info is a series of computer programs produced by the Centers for
Disease Control and Prevention and the World Health Organization which
provides public-domain software for word processing, database and
statistics work in public health. There is a companion product, Epi-map,
for geographic mapping. Support is available through telephone (404)
728-0545, fax (404) 315-6440 or E-M: EpiInfo@CDC1.CDC.GOV.

http://GKing.Harvard.Edu is the web site for EI/EzI. EI and EzI
implement the statistical methods, graphics, and diagnostics in Gary
King's forthcoming book _Reconstructing Individual Behavior from
Aggregate Data: A Solution to the Ecological Inference Problem_
(Princeton: Princeton University Press, April 1997). EI requires Gauss
(from Aptech Systems) and is platform-independent. EzI does not require
Gauss, but runs only under MS-DOS (or Windows 95 or OS/2), requires at
least 8 MB of memory, and about 2MB of hard disk space.

http://www.psychologie.uni-trier.de:8000/projects/gpower.html is the web
site for GPOWER. GPower is a program for sample size and power
calculations, made available by a group of German cognitive scientists.
It does t-tests, F-tests and Chi-squared tests. It has a handy routine
for effect size calculation. It exists in Mac, Mac+FPU, Powermac and
BC-compatible versions, as well as DOS.

http://www.medent.umontreal.ca/multilevel has information about MLn and
other shareware/freeware software for multilevel analyses.

http://www.compulink.co.uk/~kovcomp/ is the web site for MVSP. MVSP is a
MultiVariate Statistical Package which provides an inexpensive yet easy
means of analysing your data. It calculates principal components,
principal coordinates and correspondence analyses (including detrended
CA), as well as hierarchical cluster analysis using nineteen distance or
similarity measures and seven clustering strategies, and diversity
indices. The program is DOS based and menu-driven; a Windows version is
nearing completion.

http://www.compulink.co.uk/~kovcomp/ is the web site for Oriana. Oriana
for Windows ver. 1.0 calculates the special forms of sample and
inter-sample statistics required for circular data (e.g. directional
data or time of day). Oriana calculates the circular mean, length of the
mean vector, circular standard deviation and standard error, 95% and 99%
confidence limits, and Rayleigh's test of uniformity for each sample in
your data file. Pairs of samples can be compared with Watson's F-test
for two circular means. The overall distributions of two samples can be
compared with Chi-squared tests. The data for each sample can be
summarised with rose diagrams or circular histograms as well as linear
histograms. The individual observations can be shown in raw data plots.
Uniformity plots allow you to assess whether the data depart from a
uniform distribution.

http://www-prophet.bbn.com/ is the web site for Prophet Software.
PROPHET is a UNIX-based workstation software package that gives
researchers a wide range of computing capabilities. One of PROPHET's
greatest assets is its new graphical user interface. Employing the
latest advances in software technology, PROPHET lets you store, analyze
and present Data Tables, Graphs, Statistical Analyses and Mathematical
Modeling, and Sequence Analyses with high-resolution graphics and
multiple windows. Anyone, from the computer-naive to the
computer-sophisticate, can learn to use it quickly and effectively.

http://odin.mdacc.tmc.edu/anonftp is the web site for STPLAN, RANLIST,
WINDOWS, STATTAB, and SURVAN. The MD Anderson Center at the University
of Texas makes available a series of packages for both Mac and DOS which
are basic in terms of interface but well documented. These include:
STPLAN: Sample size and power
RANLIST: Randomization plans for clinical trials
WINDOWS (!): Kernel smoothing of dose-response curves (smoothing of the
relationship between a continuous variable and a binary outcome)
STATTAB: Statistical tables
SURVAN: Survival analysis, including Cox regression

http://forrest.psych.unc.edu/research/ViSta.html is the web site for
VISTA (Visual Statistics System). ViSta is a Visual Statistics system
designed for a wide range of users. It is particularly useful for those
needing to learn statistics, and for their teachers. ViSta is also
designed to be used for research and development in computational and
graphical statistics.

http://www.westat.com is the web site for Westat, developers of
WesVarPC. WesVarPC is a software package developed at Westat, Inc., that
computes estimates and replicate variance estimates from survey data
collected using complex sampling and estimation procedures. This
flexible software supports a wide range of complex sample designs,
including multistage, stratified, and unequal probability samples. The
replicate variance estimates can also reflect a number of estimation
schemes, such as poststratification or ratio estimation. There is a
mailing list, WESVAR-L.
Subscriptions to: listserv@listserv.westat.com
How to subscribe: subscribe WESVAR-L 
Post messages to: WESVAR-L@listserv.westat.com

ftp://ftp.stat.umn.edu/pub/xlispstat is the ftp site for Xlisp-Stat.
Xlisp-Stat is a comprehensive statistical environment based on the XLISP
dialect of LISP. It runs on Amiga, Macintosh, MS-DOS, MS-Windows, and
X11. XLISP-STAT is highly extensible, and many interesting extensions
can be found at Statlib (see above for details about Statlib). There is
a mailing list, stat-lisp-news. At the moment, the list is maintained by
hand.
Subscriptions to: LISTSERV@JULIA.MATH.UCLA.EDU
How to subscribe: Ask to join and include your e-mail address
Post messages to: stat-lisp-news@stat.umn.edu

10 What statistics resources can be found on the web?

This section does not include web sites described in the "How can I
contact the major statistics software vendors?" section or in other
parts of the FAQ. The web is growing and changing rapidly, so it is
impossible for me to compile a comprehensive list. Here are some
interesting sites which have been mentioned on STAT-L/SCI.STAT.CONSULT.
You are welcome to send me other interesting web sites.

http://www.nottingham.ac.uk/~mhzmd/bonf.html A biography of Carlo Emilio
Bonferroni (Michael Dewey).

http://www-leland.stanford.edu/class/gsb/excel2sas.html Excel to SAS and
other data translations.

http://www.rt66.com/~llubet Lloyd's Warehouse of Economic Indicators.

ftp://ftp.sas.com/pub/neural/measurement.html Measurement theory FAQ.

ftp://ftp.sas.com/pub/neural/FAQ.html Neural networks FAQ.

http://www.stat.wisc.edu/statistics/consult/ The ASA Section on
Statistical Consulting.

http://www.interchg.ubc.ca/cacb/power Statistical power analysis
software (Len Thomas).

http://www.execpc.com/~helberg/statistics.html Statistics on the Web
(Clay Helberg).

http://www.isds.duke.edu/stats-sites.html Statistics servers and other
links (The Institute of Statistics and Decision Sciences).

http://www.stat.ucla.edu/textbook/ UCLA Statistics Textbook (interactive
pages using JavaScript, Perl, xlisp-stat, etc.)

http://www.stat.ufl.edu/vlib/statistics.html Virtual Library of
Statistics

http://www.utexas.edu/world/lecture/ World Lecture Hall (Web-based
lectures on many academic topics including Statistics).

Web sites for statistics journals (compiled by Tony Corso)

http://www.ams.org/journals/ American Mathematical Society Journals
http://www.amstat.org/publications/index.html American Statistical
Association Publications
http://www.stat.colostate.edu/annappr The Annals of Applied Probability
http://www.stat.berkeley.edu/users/annstat The Annals of Statistics
http://www.nuff.ox.ac.uk/biometrika Biometrika
http://www.wiwi.hu-berlin.de/~sigbert/cs.html Computational Statistics
http://www.shef.ac.uk/uni/companies/apt/apt2.html Journal of Applied
Probability
http://www.o2.net/~jasr/jasr.html Journal of Applied Statistical
Reasoning
http://www.carfax.co.uk/jas-ad.htm Journal of Applied Statistics
http://www.pitt.edu/~csna/joc.html Journal of Classification
http://fisher.stat.unipg.it/iasc/Misc-stat-journ-JCGS.html Journal of
Computational and Graphical Statistics
http://www.stat.ucla.edu/journals/jebs Journal of Educational and
Behavioral Statistics
http://www.apnet.com/www/journal/mv.htm Journal of Multivariate Analysis
http://www.gbhap.com/journals/718/718-top.htm Journal of Nonparametric
Statistics
http://jscs.stat.vt.edu/JSCS Journal of Statistical Computation and
Simulation
http://www.elsevier.nl/locate/inca/505561 Journal of Statistical
Planning and Inference
http://www.stat.ucla.edu/journals/jss Journal of Statistical Software
http://www2.ncsu.edu/ncsu/pams/stat/info/jse/homepage.html Journal of
Statistics Education
http://interstat.stat.vt.edu/InterStat Interstat - Statistics on the
Internet
http://vision.arc.nasa.gov/publications/Psychometrika.html Psychometrika
http://www.gbhap.com/journals/604/604-top.htm Statistics - Theoretical
and Applied Statistics
http://www.elsevier.nl/inca/publications/store/5/0/5/5/7/3 Statistics &
Probability Letters
http://www.stat.ucla.edu/ims/publications/journals/statsci Statistical
Science Journal
http://www.maths.uq.oz.au/~gks/webguide/journals.html Guide to the Web
for Statisticians: Journals

11 What should I do about these "Spams"?

http://www.cauce.org is a web site for the Coalition Against Unsolicited
Commercial E-mails (CAUCE). Visit this site if you want to do something
constructive to stop spam. This site is lobbying for legislation that
would make junk e-mail illegal, just like junk FAXes were outlawed
recently. In my humble opinion, this seems like the best solution to a
problem that is getting worse and worse over time.

A message distributed across multiple newsgroups or list servers,
usually for commercial purposes, is known as a Spam. Some examples of
Spams that have hit STAT-L/SCI.STAT.CONSULT are the green card lawyers,
information about lonely women in Russia, and blueprints of the original
atom bomb. First, keep in mind that often it is not the original spam
messages that are so conspicuous and potentially intrusive, but rather
the inevitable threads of discussion which seem to result from them.
Please do not complain to STAT-L about a spam. The person who sent the
spam is almost certainly not a subscriber to STAT-L and will not see
your complaint. Other victims of the spam will see your complaint
though, which multiplies the annoying effect of the spam.

There are constructive steps that you can take to discourage a spam but
be assured that hundreds if not thousands of people have probably
already done this on your behalf. You can do nothing and still be
assured that others are looking out for everyone's interests. So the
best course of action is to shrug off the message. You might want to get
in the practice of recognizing a spam by its subject line and deleting
it unread.

If you don't want to ignore the spam, try following the advice given
recently by Michael Palij:

>In a situation such as this I suggest that you send E-mail
>to the postmaster of the machine from which the offending
>E-mail was sent, alerting the postmaster of the E-mail
>message and including a copy of the E-mail message. If
>for some reason postmaster@machine does not work send
>E-mail to root@machine. Don't respond to the person of
>the account that sent the E-mail nor mailbomb. The
>reasons for this are:
>1. The E-mail may have a forged name/account. That is,
>the return address may be bogus or belong to someone who
>has a legitimate account on the specific machine but who
>did not send the E-mail.
>2. Some people, if they want to punish a particular
>person/account or machine, may send out a spam message
>such as the one above, with the expectation that the
>person's account or machine/site will be overwhelmed by
>the reaction (yes, some people will send a copy of a
>coredump or Moby Dick to the offending E-mail address in
>the hope that it will crash the mail program). In this
>way, an innocent person gets hurt because of a set-up.
>3. Notifying the person who has responsibility for the
>machine (i.e., the postmaster or root) will allow that
>person to determine whether one of their real users
>posted the message (and give that person a good talking
>to) or whether their system was hacked and someone posted
>the offending message as a prank/whatever.
>In general, try to stay cool about such occurrences, E-mail
>the postmaster to investigate the situation, and appreciate
>that much more may be going on than you realize.

12 What are some of the problems with stepwise regression?

All of this material is quoted from various e-mails that appeared on
STAT-L/SCI.STAT.CONSULT in 1996. Thanks go to Ira Bernstein, Ronan
Conroy, and Frank Harrell for their detailed explanations and to Richard
Ulrich, who originally compiled these comments. I have done some very
minor editing (mostly adding and changing line breaks) but have tried
to avoid any substantive changes to these well written explanations.

Frank Harrell's comments:

>Here are SOME of the problems with stepwise variable selection.
>
> 1. It yields R-squared values that are badly biased high
> 2. The F and chi-squared tests quoted next to each variable on the
> printout do not have the claimed distribution
> 3. The method yields confidence intervals for effects and predicted
> values that are falsely narrow (See Altman and Andersen Stat in Med)
> 4. It yields P-values that do not have the proper meaning and the
> proper correction for them is a very difficult problem
> 5. It gives biased regression coefficients that need shrinkage
> (the coefficients for remaining variables are too large;
> see Tibshirani, 1996).
> 6. It has severe problems in the presence of collinearity
> 7. It is based on methods (e.g. F tests for nested models) that were
> intended to be used to test pre-specified hypotheses.
> 8. Increasing the sample size doesn't help very much (see
> Derksen and Keselman)
> 9. It allows us to not think about the problem
> 10. It uses a lot of paper
>
>Note that 'all possible subsets' regression does not solve any of these
>problems.
>
>
>References
>----------
>@article{alt89,
>author = "Altman, D. G. and Andersen, P. K.",
>journal = "Statistics in Medicine",
>pages = "771-783",
>title = "Bootstrap investigation of the stability of a {C}ox
> regression model",
>volume = "8",
>year = "1989"
>Shows that stepwise methods yield confidence limits that are far
>too narrow.
>}
>
>@article{der92bac,
>author = {Derksen, S. and Keselman, H. J.},
>journal = {British Journal of Mathematical and Statistical Psychology},
>pages = {265-282},
>title = {Backward, forward and stepwise automated subset selection
>algorithms: {F}requency of obtaining authentic and noise variables},
>volume = {45},
>year = {1992},
>annote = {variable selection}
>Conclusions:
>
>"The degree of correlation between the predictor variables affected
>the frequency with which authentic predictor variables found their way
>into the final model.
>
>The number of candidate predictor variables affected the number of
>noise variables that gained entry to the model.
>
>The size of the sample was of little practical importance in
>determining the number of authentic variables contained in the final
>model.
>
>The population multiple coefficient of determination could be
>faithfully estimated by adopting a statistic that is adjusted by
>the total number of candidate predictor variables rather than the
>number of variables in the final model."
>
>}
>
>@article{roe91pre,
>author = {Roecker, Ellen B.},
>journal = {Technometrics},
>pages = {459-468},
>title = {Prediction error and its estimation for subset--selected
> models},
>volume = {33},
>year = {1991}
>Shows that all-possible regression can yield models that are "too
>small".
>}
>
>@article{man70why,
>author = {Mantel, Nathan},
>journal = {Technometrics},
>pages = {621-625},
>title = {Why stepdown procedures in variable selection},
>volume = {12},
>year = {1970},
>annote = {variable selection; collinearity}
>}
>
>@article{hur90,
>author = "Hurvich, C. M. and Tsai, C. L.",
>journal = "American Statistician",
>pages = "214-217",
>title = "The impact of model selection on inference in linear
> regression",
>volume = "44",
>year = "1990"
>}
>@article{cop83reg,
>author = {Copas, J. B.},
>journal = "Journal of the Royal Statistical Society B",
>pages = {311-354},
>title = {Regression, prediction and shrinkage (with discussion)},
>volume = {45},
>year = {1983},
>annote = {shrinkage; validation; logistic model}
>Shows why the number of CANDIDATE variables and not the number in the
>final model is the number of d.f. to consider.
>}
>
>@article{tib96reg,
>author = {Tibshirani, Robert},
>journal = "Journal of the Royal Statistical Society B",
>pages = {267-288},
>title = {Regression shrinkage and selection via the lasso},
>volume = {58},
>year = {1996},
>annote = {shrinkage; variable selection; penalized MLE; ridge
> regression}
>}

Ira Bernstein's comments:

>I think that there are two distinct questions here: (a) _when_ is
>stepwise selection appropriate and (b) _why_ is it so popular.
>
>Since I have seen some variation in usage of the term "stepwise", I
>define it as any of a number of _data_ driven variable selection
>schemes used in regression and discriminant analysis, among other
>applications. Some, inappropriately IMHO (since there is no official
>body to define "appropriate"), use it to describe what I would call
>hierarchical (_hypothesis_ driven) selection. Like I would assume
>many, I would discourage stepwise selection and encourage
>hierarchical selection. I, of course, assume the researcher does
>not "cheat" by defining his/her "hierarchy" given the data but does
>so by considering alternatives in advance of analysis and,
>preferably, replicates the study (dream on).
>
>I would probably only argue slightly with "never" as an answer to the
>use of stepwise selection since I don't know what knowledge we would
>lose if all papers using stepwise regression were to vanish from
>journals at the same time programs providing their use were to become
>terminally virus-laden. However, I have been in situations that
>looked like "I have good reason to look at variables A, B, and C;
>then look at D, and E, but I have no basis to favor F over G or vice
>versa past that point." Older versions of SPSS (I haven't used newer
>versions since switching to SAS a decade ago) allowed this mixture,
>and I would personally not object to it as long as the strategy were
>defined in advance and made clear to readers.
>
>As to part (b), I think that there are two groups that are inclined
>to favor its usage. One consists of individuals with little formal
>training in data analysis who confuse knowledge of data analysis
>with knowledge of the syntax of SAS, SPSS, etc. They seem to figure
>that "if it's there in a program, it's gotta be good and better than
>actually thinking about what my data might look like". They are
>fairly easy to spot and to condemn in a right-thinking group of
>well-trained data analysts (like ourselves). However, there is also
>a second group who are often well trained (and may be here in this
>group ready to flame me). They believe in statistics uber
>alles--given any properly obtained data base, a suitable computer
>program can objectively make substantive inferences without active
>consideration of the underlying hypotheses. If stepwise selection
>is the parent of this line of blind data analysis, then automatic
>variable respecification in confirmatory factor analysis is the
>child.

Ronan Conroy's comments:

>I am struck by the fact that Judd and McClelland in their excellent
>book "Data Analysis: A Model Comparison Approach" (Harcourt Brace
>Jovanovich, ISBN 0-15-516765-0) devote less than 2 pages to stepwise
>methods. What they do say, however, is worth repeating:
>
>1. Stepwise methods will not necessarily produce the best model if
>there are redundant predictors (common problem).
>
>2. All-possible-subset methods produce the best model for each possible
>number of terms, but larger models need not necessarily be subsets of
>smaller ones, causing serious conceptual problems about the underlying
>logic of the investigation.
>
>3. Models identified by stepwise methods have an inflated risk of
>capitalising on chance features of the data. They frequently fail
>when applied to new datasets. They are rarely tested in this way.
>
>4. Since the interpretation of coefficients in a model depends on the
>other terms included, "it seems unwise," to quote J and McC, "to let
>an automatic algorithm determine the questions we do and do not ask
>about our data". RC adds that stepwise methods abusers frequently
>would rather not think about their data, for reasons that are funny
>to describe over a second Guinness.
>
>5. I quote this last point directly, as it is sane and succinct:
>
>"It is our experience and strong belief that better models and a
>better understanding of one's data result from focussed data
>analysis, guided by substantive theory." (p 204)
>
>They end with a quote from Henderson and Velleman's paper "Building
>multiple regression models interactively". Biometrics 1981;37:391-411
>
>"The data analyst knows more than the computer"
>
>and add
>
>"failure to use that knowledge produces inadequate data analysis."
>
>Personally, I would no more let an automatic routine select my model
>than I would let some best-fit procedure pack my suitcase.

13 What is the answer to the Monty Hall, Envelope, or Birthday problem?

There is a classic probability puzzle, which is called the Monty Hall
problem. Here's a nice description from the rec.puzzles FAQ. "The Monty
Hall problem can be stated as follows: A gameshow host displays three
closed doors. Behind one of the doors is a car. The other two doors have
goats behind them. You are then asked to choose a door. After you have
made your choice, one of the remaining two doors is then opened by the
host (who knows what's behind the doors), revealing a goat. Will
switching your initial guess to the remaining door increase your chances
of guessing the door with the car?"

The general consensus is that the probability of winning the car is 1/3
if you don't switch and 2/3 if you do switch. But there are some
implicit assumptions in this problem that cause a raging debate every
time it appears on STAT-L. For example, the host may be perversely
trying to goad you into a bad switch and reveals a door only when your
current door has a car behind it. There are at least thirty web sites
that discuss this problem. Here are three good sites:

http://www.smartpages.com/faqs/sci-math-faq/montyhall/faq.html SCI.MATH
FAQ
http://www.cs.ruu.nl/wais/html/na-dir/puzzles/archive/decision.html
REC.PUZZLES FAQ
http://www.ram.org/computing/monty_hall.html has a simulation model
based on this problem.

You can also read about this problem in Engel, E. and Venetoulias, A.
(1991). Monty Hall's probability puzzle. Chance, Vol 4, # 2, 6-9. and
Selvin, S. (1975). A problem in probability, in "Letters to the Editor,"
The American Statistician, 29, 67 and 134.
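
The 1/3 versus 2/3 answer can be checked with a short simulation. This
is a minimal sketch that assumes the standard version of the problem:
the host always opens a goat door that you did not pick, choosing at
random when two such doors are available. It says nothing about the
"perverse host" variants mentioned above. The function names and the
seed are my own.

```python
import random


def play(switch, rng):
    """Play one game; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Assumption: the host knows where the car is and always opens a
    # goat door that the contestant did not pick.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=1997):
    """Fraction of simulated games won with the given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print("stay:  ", win_rate(False))
    print("switch:", win_rate(True))
```

Under these assumptions the two win rates should come out close to 1/3
and 2/3.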

The envelope exchange problem goes something like this (again from the
rec.puzzles FAQ). "Someone has prepared two envelopes containing money.
One contains twice as much money as the other. You have decided to pick
one envelope, but then the following argument occurs to you: Suppose my
chosen envelope contains $X, then the other envelope either contains
$X/2 or $2X. Both cases are equally likely, so my expectation if I take
the other envelope is .5 * $X/2 + .5 * $2X = $1.25X, which is higher
than my current $X, so I should change my mind and take the other
envelope. But then I can apply the argument all over again. Something is
wrong here! Where did I go wrong? In a variant of this problem, you are
allowed to peek into the envelope you chose before finally settling on
it. Suppose that when you peek you see $100. Should you switch now?"

Again, there are some subtle assumptions in this problem that cause a
lot of commentary. A good reference to the problem is Christensen, R.
and Utts, J. (1992) "Bayesian Resolution of the 'Exchange Paradox,'" The
American Statistician, 46(4), 274-276. Note also comments in the Letters
to the Editor column in two separate issues of The American Statistician
in 1993 (pages 160, 311).

http://www.cs.ruu.nl/wais/html/na-dir/puzzles/archive/decision.html, the
rec.puzzles FAQ contains a nice discussion of this problem.
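
A quick simulation shows what happens if you switch without peeking. It
is only a sketch, and it needs an assumption the puzzle leaves out: a
distribution for the amounts. Here the smaller amount is drawn uniformly
from a short list that I made up. Under that assumption, blind switching
gains nothing on average, which suggests the trouble lies in the "both
cases are equally likely" step rather than in the arithmetic.

```python
import random


def average_payoffs(trials=200_000, seed=1997):
    """Return (mean payoff if you keep, mean payoff if you switch)."""
    rng = random.Random(seed)
    keep_total = 0.0
    switch_total = 0.0
    for _ in range(trials):
        # Hypothetical prior for the smaller amount (not from the FAQ).
        small = rng.choice([1, 2, 4, 8, 16])
        envelopes = [small, 2 * small]
        rng.shuffle(envelopes)          # you pick one at random
        chosen, other = envelopes
        keep_total += chosen
        switch_total += other
    return keep_total / trials, switch_total / trials


if __name__ == "__main__":
    keep, switch = average_payoffs()
    print("keep:  ", round(keep, 2))
    print("switch:", round(switch, 2))
```

Both averages should land near 1.5 times the mean of the smaller amount
(9.3 for the list used here), not 1.25 times whatever you are holding.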

The birthday problem goes something like this. There are "r" people in
a room. What is the probability that two or more people have the same
birthday?

Assuming uniform probabilities for each birthdate, the probability of a
match is 1 - n!/(n^r * (n-r)!), where n equals the number of days in a
year and r equals the number of people in the group. For n=365 and r=23,
the probability exceeds 0.5. A nice summary of this problem with extensions
into non-uniform birthdates is Nunnikhoven, T.S. (1992) "A Birthday
Problem Solution for Nonuniform Birth Frequencies," The American
Statistician, 46(4), 270-274.
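
The uniform-birthdate formula is easy to evaluate directly. The sketch
below assumes n = 365 equally likely birthdays by default and computes
the probability two ways: as a running product (which avoids huge
factorials) and from the factorial formula itself. The function names
are my own.

```python
from math import factorial, prod


def p_match(r, n=365):
    """P(at least two of r people share a birthday), n equally likely days."""
    if r > n:
        return 1.0  # more people than days guarantees a match
    # P(no match) = (n/n) * ((n-1)/n) * ... * ((n-r+1)/n)
    p_distinct = prod((n - k) / n for k in range(r))
    return 1.0 - p_distinct


def p_match_direct(r, n=365):
    """Same probability from 1 - n!/(n^r * (n-r)!), valid for r <= n."""
    return 1 - factorial(n) / (n**r * factorial(n - r))


if __name__ == "__main__":
    for r in (10, 22, 23, 30, 50):
        print(r, round(p_match(r), 4))
```

With 22 people the probability is still below 0.5; with 23 it is just
above.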

http://pascal.dartmouth.edu/~zhu/applets/Birthday/Birthday.java is a
Java applet for computing these probabilities.

http://www.mste.uiuc.edu/reese/birthday/intro.html has a simulation of
the birthday problem.

14 Can someone provide me with references and/or books about [topic]?

Before you post a question like this, it would be nice if you did a
little work beforehand. The best resource for finding references about a
statistical topic is the Current Index to Statistics Extended Database
(CISED), a CD-ROM with 180,000 references in statistics journals since
1974, with coverage of selected journals dating back as far as 1940.
Many university libraries have this product, and some make it available
to their students through a web browser. Licensing agreements, however,
prevent libraries from making this product available to the general
public. If you want to purchase an individual license, it is available
for as little as $95.
http://www.stat.uchicago.edu/~cis/ is a web site that contains more
information about CISED. Two e-mail contacts at IMS and ASA are
kmkims@stat.berkeley.edu and cised@amstat.org, respectively.

http://www.stat.wisc.edu/statistics/consult/statbook.html is Glen
McPherson's Essential Book List. Back in 1993, Glen McPherson polled the
members of STAT-L/SCI.STAT.CONSULT to create a list of books essential
to anyone in the statistical consulting field. The list is organized by
major topic areas. Brian Yandell has put this list up on his web site.

http://www.stat.wisc.edu/statistics/consult/book.html is another
interesting booklist that can be found at the same web site.

15 Acknowledgments

This list has grown thanks to the small and large contributions of many
people. Part of it was shamelessly stolen from well written messages on
STAT-L. Here is a partial list of people who you should thank for
directly or indirectly contributing to this FAQ: Gary Ash, Kenneth
Benoit, Grant Blank, Jim Box, Benjamin Chan, Ronan Conroy, Tony Corso,
Donald Cram, Byron Davis, Barry DeCicco, Joe Dolgos, Rick Engberg, Emil
Friedman, Mike Fuller, Steve Goodman, Bill Gould, Timothy Green, Duane
Griffin, Clay Helberg, Tim Hesterberg, Charles Kincaid, Warren Kovach,
Jan de Leeuw, Lloyd Lubet, Haiko Luepsen, Hans Mittelmann, Brian
Monsell, John Nash, Jonathan Newman, Michael Palij, Dennis Roberts,
David Ronis, Warren Sarle, Ronald Schoenberg, Russell Schulz, Jim
Steiger, Len Thomas, Richard Ulrich, Vittorio Viaggi, Michael Walsh,
Meredith Warshaw, Bob Wheeler, Will Wheeler, John Whittington, Forest
Young, Sara Young, Stuart Young, Craig Ziegler.

If there are errors in this FAQ, they are probably my fault; it is
difficult to accurately transcribe all of the information I have
received, even with cut and paste. Please send any corrections and
additions. Complaints are appreciated also, but please realize that I am
doing this as a volunteer effort, mostly during lunch breaks and after
work hours.

*** End of FAQ for STAT-L/SCI.STAT.CONSULT ***
Steve Simon, ssimon@cmh.edu, Standard Disclaimer.
Office of Medical Research, Children's Mercy Hospital
2401 Gillham Road, Kansas City, MO  64108
TEL: 816-234-3963 FAX: 855-1703
Vision: The Children's Mercy Hospital commits to providing
quality pediatric medical care with service excellence and
efficiency to everyone we serve.