From: "Simon, Steve, PhD"
Subject: *** FAQ for STAT-L/SCI.STAT.CONSULT ***

With my recent upgrade to Microsoft Office 97, I have found how easy it is to create and convert files to HTML format. I have made an attempt to do this for the FAQ. David Ronis and I will experiment with the format over the next couple of months.

I have tried to format the FAQ so that it is easily readable in either the text version or the HTML version. That means that all the indenting (which was a pain to modify anyway) is now gone. I have also tried to put web sites at the beginning of a paragraph so as to avoid awkward line breaks. With some web sites being 50-60 characters long, anyplace but the beginning of a paragraph would almost guarantee an awkward line break.

I have also condensed the FAQ a bit and combined some of the questions. For example, STATLIB is now listed under interesting web sites rather than in a separate question. The question about references and the question about books have been combined.

I realize that the new format will be a little less readable than the previous versions. If it is a lot less readable, please let me know. Also, if you have any comments on how I can improve the format, please send them along.

If you've sent me some suggestions for the FAQ in the past couple of months, please bear with me as I try to sort out all the formatting issues. I hope that things will be back to normal in September or October.

FAQ for STAT-L/SCI.STAT.CONSULT, July 30, 1997

This FAQ is posted once a month to STAT-L/SCI.STAT.CONSULT. David Ronis regularly posts this to his web site: http://www-personal.umich.edu/~dronis/statfaq.htm.

Variations and earlier versions of the FAQ can be found on other sites on the web. You are welcome to post all or part of this FAQ at your web site. Please don't modify it without my permission, and please let me know where you are posting it.

Table of contents

1 What is STAT-L/SCI.STAT.CONSULT?
2 What are other related LISTSERV/USENET groups?
3 How do I know that my message got posted?
4 How do I use LISTSERV to...
5 How do I get the archives of STAT-L/SCI.STAT.CONSULT?
6 Why have I stopped seeing messages?
7 How can I contact the ASA, Biometric Society, or IMS?
8 How can I contact the major statistics software vendors?
9 Where can I find free/shareware statistical software?
10 What statistics resources can be found on the web?
11 What should I do about these "Spams"?
12 What are some of the problems with stepwise regression?
13 What is the answer to the Monty Hall, Envelope, or Birthday problem?
14 Can someone provide me with references and/or books about [topic]?
15 Acknowledgments

1 What is STAT-L/SCI.STAT.CONSULT?

STAT-L and SCI.STAT.CONSULT are a combined LISTSERV/USENET group for the discussion of statistical consulting issues. Through the magic of Internet, any message posted on SCI.STAT.CONSULT also appears on STAT-L, and any message posted on STAT-L appears on SCI.STAT.CONSULT. So you can follow all the fascinating questions and answers using either system.

We discuss statistical issues of all levels of difficulty, as well as statistical education, the practice of statistical consulting, and other related topics. We also like to debate some of the more controversial issues in Statistics, like the validity of the statistical models used in the book The Bell Curve and the pitfalls of stepwise regression models.

Be sure to put your name and e-mail address at the end of your message. Some people have e-mail systems that strip headers from a message, making it impossible for them to reply directly to you.

If you have a question about a particular statistics package, you will probably get a faster and more accurate answer by posting it on the list that specializes in that package (e.g., SAS-L/COMP.SOFT-SYS.SAS or S-NEWS). Refer to the section "How can I contact the major statistics software vendors?"
We appreciate questions at all levels, from beginner to expert. Sometimes the beginner questions lead to some interesting discussions about the subtle nuances of statistical consulting.

If you want advice on how to analyze some data, please include some context about what your data means and what you are trying to investigate. No one can answer a question well that only says "Listed below is some data. How do I analyze it?"

Be careful about advice on STAT-L/SCI.STAT.CONSULT. You'll find many people who are glad to help you, but you must realize the serious limitations of e-mail. There is no adequate substitute for getting advice face-to-face from a professional, especially BEFORE collecting any data and BEFORE performing any experiments. Even the most experienced and wise Statisticians will be unable to make sense out of a poorly designed study.

There are three types of messages that we discourage. First, try to avoid any overly commercial pitches, including posting your resume. On the other hand, we do like to hear about job openings, especially ones that list starting salaries so we can bemoan how little we make at our current jobs. Postings of upcoming conferences are also acceptable. Second, don't post your homework questions here, even if you have permission to do so from your teacher. On the other hand, asking for recommendations on books for beginners is fine. Third, while we enjoy a spirited debate, please refrain from flaming and personal attacks. Although we have occasional lapses, this list has a generally high level of civility and politeness. Let's keep it that way.

Here's some additional advice from Richard Ulrich for SCI.STAT.CONSULT folks.

>If you are going to CROSS-POST to several groups, PLEASE send
>just one message in which you LIST THE SEVERAL GROUPS in the
>header.
>i) That way, when someone writes a response, it will show
>up in EACH group where the question could be read, not just
>in one.
>ii) That way, when a person reads with a Threaded-newsreader,
>he will see your message just ONCE, instead of over and over.

2 What are other related LISTSERV/USENET groups?

http://www.mailbase.ac.uk/lists-k-o/minitab/files/list-of-lists is a very comprehensive list of statistical lists. It is maintained by Mike Fuller.

http://www.minitab.com/maillist.htm is another good source of information.

SPECIAL WARNING!!! Please, please, please note that subscription requests go to the LISTSERV or MAILBASE address. If you send a subscription request to the list itself, it will be read by hundreds or thousands of people, none of whom can get you subscribed. Some of these people will be annoyed enough at your naivete that they will introduce you to a concept known as "flaming".

ALBERT-GIFI -- The Albert Gifi mailing list discusses correspondence analysis, multidimensional scaling, nonlinear multivariate analysis, and optimal scaling.
Subscriptions to: LISTSERV@JULIA.MATH.UCLA.EDU
How to subscribe: subscribe ALBERT-GIFI First-name Last-name
Post messages to: ALBERT-GIFI@JULIA.MATH.UCLA.EDU

ALLSTAT -- Discussions on this list are similar to STAT-L/SCI.STAT.CONSULT, but there is a decidedly British flavor to ALLSTAT and a more U.S. flavor to STAT-L/SCI.STAT.CONSULT. This is particularly noticeable in the postings of meetings. ALLSTAT is a Mailbase system, so it uses a slightly different syntax than the LISTSERV system.
Subscriptions to: MAILBASE@MAILBASE.AC.UK
How to subscribe: join ALLSTAT First-name Last-name
Post messages to: ALLSTAT@MAILBASE.AC.UK
Web info and FAQ: http://www.stats.gla.ac.uk/allstat/
Note: Contrary to previous information in this FAQ, you must include your name when subscribing. "Subscribe" can be substituted for "join," however.

Here are some additional comments from Dr. Stuart Young, the list owner.

>Note also, that while Allstat does indeed have a "UK flavour"
>it is not a discussion list.
>It is a "broadcast system" for
>distributing notices. Discussions are not encouraged on the
>list - replies go to the sender, not to the list.

CRSP-L -- Help with Center for Research in Security Prices (CRSP) databases.
Subscriptions to: LISTSERV@TAMVM1.TAMU.EDU
How to subscribe: sub CRSP-L First-name Last-name
Post messages to: CRSP-L@TAMVM1.TAMU.EDU
Web info and FAQ: http://www-leland.stanford.edu/class/gsb/crsp/CRSP-L/

EDSTAT-L/SCI.STAT.EDU -- Statistics training and education issues.
Subscriptions to: LISTSERV@JSE.STAT.NCSU.EDU
How to subscribe: subscribe EDSTAT-L First-name Last-name
Post messages to: EDSTAT-L@JSE.STAT.NCSU.EDU

MULTILEVEL -- This list is for people using multilevel analysis (multilevel modeling; hierarchical data analysis) and any associated software (e.g. MLn, HLM, VARCL, GENMOD). MULTILEVEL is a MAILBASE system, so it uses a slightly different syntax than the LISTSERV system.
Subscriptions to: MAILBASE@MAILBASE.AC.UK
How to subscribe: subscribe MULTILEVEL First-name Last-name
Post messages to: MULTILEVEL@MAILBASE.AC.UK

SCI.STAT.MATH -- A more mathematical flavor can be found on SCI.STAT.MATH, which, sad to say, is not mirrored to any LISTSERVer.

SEMNET -- SEMNET is an open forum for ideas and questions about the methodology that includes analysis of covariance structures, path analysis, and confirmatory factor analysis.
Subscriptions to: LISTSERV@UA1VM.UA.EDU
How to subscribe: sub SEMNET First-name Last-name
Post messages to: SEMNET@UA1VM.UA.EDU
Web info and FAQ: http://www.gsu.edu/~mkteer/semfaq.html

3 How do I know that my message got posted?

First of all, be patient. It takes a while for your message to be posted. Internet is faster than the Post Office, but it isn't always instantaneous. There's nothing more annoying than seeing the same messages posted again and again within half an hour by people who are unsure whether their messages got through. Please wait half a day or more before panicking.
Second, if you are having trouble posting, it is more likely than not a local problem. Check with your help desk or other local resource.

Third, no matter where you post your message from, if the message gets through, it will be added to two very nice USENET archives, AltaVista and DejaNews. Search for your message using the subject line or a reasonably unique phrase in the message itself. This system is not instantaneous either. Wait half a day or more before searching for your message. See the section "How do I get the archives of STAT-L/SCI.STAT.CONSULT?" for the web addresses and other details about AltaVista and DejaNews.

Fourth, if you are using SCI.STAT.CONSULT, you will eventually see a copy of your message if it got posted. There are special USENET groups where you can practice sending test messages (MISC.TEST or ALT.TEST). If you are a beginner, don't post to SCI.STAT.CONSULT until you are comfortable posting to one of these test groups.

You will also see your message if you receive the digest from STAT-L. If you receive individual messages rather than the digest from STAT-L, you will not see your own message when it is posted. The presumption is that you read it when you wrote it, so why would you want to see it again?

You can change this default in two ways. Send an e-mail to LISTSERV@VM1.MCGILL.CA with the one-line message

SET STAT-L REPRO

to inform STAT-L that you wish it to send you back a copy of any message you send in. Send the one-line message

SET STAT-L ACK

to inform STAT-L that you wish it to send a brief acknowledgment that your message has been sent to the list. Send the one-line message

SET STAT-L NOREPRO

if you want to go back to the default. Please note that all of these commands go to LISTSERV and not to STAT-L.

Finally, please note that not every question posted on STAT-L/SCI.STAT.CONSULT gets an answer. No one is getting paid for their time, so you need to appeal to their curiosity or their altruism.
If no one answered your question, maybe you need to ask the question differently.

4 How do I use LISTSERV to...

A good resource about LISTSERV can be found at http://www.sagrelto.com/sagrelto/tutorial/, and a general overview of LISTSERV versus other systems (e.g., MAILBASE) can be found at http://www.nekesc.k12.ks.us/cds.html.

...subscribe to STAT-L?

If you are using SCI.STAT.CONSULT, your USENET reader software should have a menu pick or a command that will allow you to subscribe to SCI.STAT.CONSULT. Every reader is different, so please consult your help file or your local computer guru.

To subscribe to STAT-L, send a message to LISTSERV@VM1.MCGILL.CA with the single line

SUB STAT-L First-name Last-name

in the body of the text. Please be sure that you send the message to LISTSERV@VM1.MCGILL.CA and not to STAT-L@VM1.MCGILL.CA. If you send your subscription request to STAT-L, hundreds of people will see your message and none of them will be able to subscribe you to the list. Some, in fact, will flame you for not reading these instructions more carefully.

It's sort of like a newspaper, which has a circulation desk and a letters-to-the-editor desk. If you want to start delivery of the paper, you send your request to the circulation desk. If you want to start delivery of STAT-L, you send the request to LISTSERV. Sending a subscription request to STAT-L is like sending a letter to the editor that reads "Please start delivery of the Sunday paper to 1313 Mockingbird Lane".

...get the digest option turned on/off?

If you have no strong preference, the digest option (multiple messages compiled into a single mailing, usually daily) is less burdensome on Internet and creates fewer bounced messages for the list administrator to deal with. The default when you sign up is the digest option. To cancel digest format and receive the list as separate mailings, send the command SET STAT-L MAIL to LISTSERV@VM1.MCGILL.CA.
To receive the list in digest format, send the command SET STAT-L DIGEST in the body of a message to LISTSERV@VM1.MCGILL.CA. Again, please be sure that you send all of these types of messages to LISTSERV@VM1.MCGILL.CA and not to STAT-L@VM1.MCGILL.CA.

...obtain a list of subscribers to STAT-L?

Send the command REVIEW STAT-L F=MAIL to LISTSERV@VM1.MCGILL.CA. Use REVIEW STAT-L BY NAME F=MAIL to sort by name, or REVIEW STAT-L BY COUNTRY F=MAIL to sort by country. The list does not include subscribers to SCI.STAT.CONSULT, because they do not subscribe the same way. I know of no way to obtain the list of subscribers to SCI.STAT.CONSULT.

...keep my name off of the list of subscribers?

Send a message to LISTSERV@VM1.MCGILL.CA with the line SET STAT-L CONCEAL YES in the body of the message. To reverse this, send the command SET STAT-L CONCEAL NO in the body of the message.

...stop mail from STAT-L (temporarily or permanently)?

Send a message to LISTSERV@VM1.MCGILL.CA (again, please don't send the message to STAT-L@VM1.MCGILL.CA). To sign off permanently, include the line UNSUBSCRIBE STAT-L in the body of the message. To temporarily suspend mail, use the line SET STAT-L NOMAIL. When you are ready to resume reading, use the line SET STAT-L MAIL or SET STAT-L DIGEST, depending on whether you prefer individual messages or a daily digest.

What if my initial signoff command doesn't work?

This happens sometimes, particularly if your e-mail address changes, even slightly. The key thing to remember here is that only the list owner can help you with this. Sending a message to STAT-L will not help much unless the list owner happens to be following STAT-L right at that moment.

I would recommend that you get a list of subscribers and see how your e-mail address looks to the system (see above for details). Some mail systems (like ELM) allow you to change the FROM field of a message.
If your mail system supports this, try sending a message to LISTSERV with the FROM field changed so that it looks like it came from the original address. You could also ask your system administrator to create a temporary (or permanent) alias name for you for outbound messages (including the necessary deviant domain part).

If none of the above works, or if it seems too complicated, don't panic. Every list has a human owner who can go in and unsubscribe you manually. You can find the e-mail address of the list owner on the same list of subscribers that you just got (again, see above). When I last checked in August 1995, the list owner was

* OWNER= MICHAEL@VM1.MCGILL.CA (Michael Walsh, McGill University)
* (514-398-3680)

Send a message directly to the list owner, explaining your problem. The list owner will manually unsubscribe you from STAT-L.

5 How do I get the archives of STAT-L/SCI.STAT.CONSULT?

There are four ways to get archives of STAT-L/SCI.STAT.CONSULT.

First, the LISTSERV software for STAT-L maintains monthly archive files back to 1994. Send the command INDEX STAT-L to LISTSERV@VM1.MCGILL.CA to obtain a listing of these file names. Send the command GET filename filetype F=MAIL to receive a specific archive file.

You can also search the archives for keywords, but the syntax is a throwback to mainframe days. Here's an example of how to find statistics humor in previous postings. Send the following message to LISTSERV@VM1.MCGILL.CA (not to STAT-L!):

// JOB Echo=No
Database Search DD=Rules
//Rules DD *
Search jokes in stat-l
Index
/*

This will get you the following output:

>Database STAT-L, 11 hits.
>
>Index
>Item #  Date     Time  Recs Subject
>------  ----     ----  ---- -------
>002264  94/05/12 20:47   57 Re: anyone know a good stats joke...
>002346  94/05/16 12:42   24 Re: heard any good stats jokes?
>002352  94/05/12 16:42   29 Re: anyone know a good stats joke...
>002374  94/05/17 00:39   34 Re: anyone know a good stats joke...
>002387  94/05/17 17:16   30 Re: anyone know a good stats joke...
>004886  94/10/11 09:36   49 Re: The charge of epistemological naivete
>005643  94/11/07 17:45   59 Re: Political Correctness vs. Offensive topics of +
>005664  94/11/08 11:32   36 Re: Political Correctness vs. Offensive topics of +
>008101  95/03/02 14:58  116 us government censorship to the internet?
>009133  95/04/18 04:56   90 --NEED HELP WITH EVALUATION--
>021605  96/12/23 10:04   48 Re: Farms (STAT-L 21 Dec 1996)

Obviously only some of these are successful hits. For example, any message with the word "epistemological" in the title can't be humorous.

To get the text of specific messages, send the following job to LISTSERV@VM1.MCGILL.CA:

// JOB Echo=No
Database Search DD=Rules
//Rules DD *
Search jokes in stat-l
Print all of 2264 2346 2352 2374 2387
/*

Send the command GET LISTDB MEMO F=MAIL to LISTSERV@UGA.CC.UGA.EDU to get a full description of LISTSERV search functions (note that LISTSERV@VM1.MCGILL.CA does not have this file).

gopher://jse.stat.ncsu.edu/11/othergroups/statl/ is a gopher site that contains the archives of STAT-L. If you are still using gopher software, point it to jse.stat.ncsu.edu. This site has archives going back to 1990. In case you were curious, there were 21 messages posted for the whole month of January 1990. Volume has picked up a bit since then.

http://www.reference.com also maintains an archive of STAT-L, other lists, USENET groups, and web discussion groups. I'm not sure how far back this archive goes.

Finally, archives of USENET messages, including messages for SCI.STAT.CONSULT, are maintained at two sites: http://altavista.digital.com, which apparently only goes back a month or so, and http://www.dejanews.com, which goes back to March 19, 1995. Follow the instructions at either site for restricting your search to just one newsgroup.
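The two LISTSERV database jobs quoted earlier in this section share one fixed template. Only the search word and the last command change: "Index" to list the hits, or "Print all of" followed by item numbers to fetch them. The Python sketch below generates that message body so the mainframe-style lines don't have to be typed by hand. It assumes the job syntax is exactly what the FAQ shows. The function name `build_search_job` is hypothetical, and the sketch does not cover the rest of the LISTSERV search language described in LISTDB MEMO.

```python
def build_search_job(keyword, listname="stat-l", items=None):
    """Return the text of a LISTSERV database search job.

    The layout copies the two jobs quoted in this FAQ: with no item numbers
    the job ends in "Index" (list matching postings); with item numbers it
    ends in "Print all of ..." (send the full text of those postings).
    """
    lines = [
        "// JOB Echo=No",
        "Database Search DD=Rules",
        "//Rules DD *",
        f"Search {keyword} in {listname}",
    ]
    if items:
        # Item numbers come from the "Item #" column of an earlier Index reply;
        # int() drops the leading zeros (002264 -> 2264), as in the FAQ example.
        lines.append("Print all of " + " ".join(str(int(n)) for n in items))
    else:
        lines.append("Index")
    lines.append("/*")
    return "\n".join(lines)


if __name__ == "__main__":
    # Step 1: ask for an index of postings that mention "jokes".
    print(build_search_job("jokes"))
    print()
    # Step 2: ask for the full text of selected hits from that index.
    print(build_search_job("jokes", items=["002264", "002346", "002352", "002374", "002387"]))
```

Paste either result into the body of a plain-text message to LISTSERV@VM1.MCGILL.CA (again, not to STAT-L). Doing the lookup in two steps mirrors the FAQ's example: first see which items match, then ask only for the ones worth reading.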
Some people may wish to prevent their postings from being added to these databases. If your posting contains an X-Header looking like

x-no-archive: yes

or if you place x-no-archive: yes as the first line of the body text of your message, then your message will not be archived.

6 Why have I stopped seeing messages?

Nine times out of ten, the problem is at your site. If you aren't already good friends with the people who administer your Internet connection, now is a good time to start. These people will know when the connection is running smoothly and when it is erratic.

Posting a test message to STAT-L/SCI.STAT.CONSULT is not likely to help. If you aren't seeing normal traffic, what makes you think that you will see your test message? Also, the people who read your test message are not in a position to diagnose your problem. Only your newfound friends who run your local Internet connection are in a position to do that.

Your first step is to check one of the USENET archives described above (AltaVista or DejaNews). If you see messages in either archive that are more than 48 hours old and that you have not received at your local site (via either SCI.STAT.CONSULT or STAT-L), then you have a real problem.

There are some obvious self-diagnostic questions you should ask yourself. For STAT-L readers, ask yourself whether you have received mail from other Internet sources. If not, then perhaps the problem is bigger than STAT-L.

Also for STAT-L readers, find out whether your site has been bouncing back e-mail recently. The number one cause for not getting STAT-L mail is that the list administrator noticed a bunch of bounced e-mail error messages and de-activated your subscription. To find out if you've been de-activated, send a message to LISTSERV@VM1.MCGILL.CA with QUERY STAT-L in the body of the message. Please make sure you send this to the LISTSERV address and not the STAT-L address. Within a few hours, you should get a reply showing your status.
If you don't get a response, that's a good sign that the listserver is down, which would mean that nobody is getting messages from STAT-L. If you do get a response, here's what it might look like:

Distribution options for Steve Simon, list STAT-L:
Ack= No, Mail= Digests, Files= Yes, Repro= No,
Header= Short(BSMTP), Renewal= Yes, Conceal= No

If your account was de-activated, the response will be

You are not subscribed to the STAT-L list.

or your distribution option will be set to NOMAIL. In either case, work with your local Internet experts to fix the problem and then either re-subscribe or set the distribution option back to MAIL.

By the way, don't complain to the list owner about de-activating your account. The typical list owner has to sort through hundreds or thousands of bounced message reports weekly, and the only way to stop these reports is to de-activate accounts. The people you need to talk to are your newfound friends who maintain your Internet access.

Failure to receive messages is less common for SCI.STAT.CONSULT readers. If you are experiencing problems, the obvious thing to check is whether any newsgroups are getting through. If nothing is getting through, then you have a local problem. If you get postings from other newsgroups, then perhaps your server has decided not to carry SCI.STAT.CONSULT anymore. Either way, you have to talk to your local Internet experts.

7 How can I contact the ASA, Biometric Society, or IMS?

American Statistical Association
1429 Duke St.
Alexandria, VA 22314-3402
Tel: 703-684-1221
FAX: 703-684-2036
E-M: asasinfo@amstat.org
Web: http://www.amstat.org

The International Biometric Society
808 17th Street, NW, Suite 200
Washington, DC 20006-3910
Tel: 202-223-9669
FAX: 202-223-9569
E-M: 75703.1407@compuserve.com
Web: http://www.stat.uga.edu/~lynne/symposium/biometric.html

Institute of Mathematical Statistics
3401 Investment Boulevard, Suite 7
Hayward, CA 94545
Tel: 510-783-8141 (Hazel Lowery)
FAX: 510-783-4131
E-M: HLLIMS@stat.berkeley.edu
Web: http://www.imstat.org

8 How can I contact the major statistics software vendors?

The web site http://www.statistics.com/vendors.html, maintained by Resampling Stats, Inc., has a very nice list of statistics software vendor information.

Many of these companies have numerous locations and international distributors. I have only listed corporate headquarters to save space. If you can, check out the web site to get more detailed information. Also, please bear in mind that mergers and other business activity may quickly make parts of this list obsolete.

Finally, I need to repeat my earlier plea about listservers. Please, please, please note that subscription requests go to the LISTSERV or MAILBASE or MAJORDOMO address.

APTECH SYSTEMS INC. (GAUSS)
Aptech Systems, Inc.
23804 SE Kent-Kangley Road
Maple Valley, WA 98038 USA
Tel: 206-432-7855
FAX: 206-432-7832
Web: http://www.aptech.com/
E-M: support@aptech.com (support)
info@aptech.com (sales information)
GAUSS mailing list --
Subscriptions to: MAJORDOMO@ECO.UTEXAS.EDU
How to subscribe: subscribe GAUSSIANS
Post messages to: GAUSSIANS@ECO.UTEXAS.EDU

CIVILIZED SOFTWARE (MLAB)
Civilized Software, Inc.
8120 Woodmont Ave.
#250
Bethesda, MD 20815 USA
Tel: 1-301-652-4714
Fax: 1-301-656-1069
Web: http://www.civilized.com
E-M: csi@civilized.com

CONCEPTUAL SOFTWARE INC. (DBMS/COPY)
Conceptual Software Inc.
9660 Hillcroft #510
Houston, TX 77096
Tel: 713-721-4200
Fax: 713-721-4298
Web: http://www.conceptual.com
E-M: eroberts@conceptual.com (General Information)
eroberts@conceptual.com (Sales)
hfeldman@conceptual.com (Customer Support)

CYTEL SOFTWARE CORPORATION (StatXact, LogXact, EaSt)
Cytel Software Corporation
675 Massachusetts Ave.
Cambridge, MA 02139 USA
Tel: (617) 661-2011
Fax: (617) 661-4405
Web: http://www.cytel.com
E-M: sales@cytel.com

DATA DESCRIPTION, INC. (DATADESK)
Data Description, Inc.
Box 4555
Ithaca, NY 14853 USA
Tel: (607) 257-1000
FAX: (607) 257-4146
Web: http://www.datadesk.com/datadesk/
E-M: datadesk@datadesk.com

DataMost Corp. (STATMOST)
DataMost Corporation
520 West 9460 South
Sandy, UT 84070 USA
Tel: (801) 255-5008
Fax: (801) 255-5009
Web: http://www.datamost.com
E-M: techsupp@datamost.com

MATHSOFT (MATHCAD)
MathSoft, Inc.
101 Main Street
Cambridge, MA 02142 USA
Tel: 617 577-1017
Fax: 617 577-8829
Web: http://www.mathsoft.com
E-M: ideas@mathsoft.com (comments and suggestions)
support@mathsoft.com (Support, US or Canada)
help@mathsoft.com (Support outside US/Canada)
sales-info@mathsoft.com (Sales, US or Canada)
int-info@mathsoft.com (Sales outside US/Canada)

MATHWORKS (MATLAB)
The MathWorks, Inc.
24 Prime Park Way
Natick, MA 01760-1500 USA
Tel: (508) 653-1415
Fax: (508) 653-2997
Web: http://www.mathworks.com/home.html
E-M: info@mathworks.com (Sales, pricing, information)
support@mathworks.com (Technical support)
bugs@mathworks.com (Bug reports)
suggest@mathworks.com (Product suggestions)
service@mathworks.com (Service)

MINITAB INC.
Minitab Inc.
3081 Enterprise Drive
State College, PA 16801 USA
Tel: 814 238-3280
Fax: 814 238-4383
Web: http://www.minitab.com
E-M: sales@minitab.com

NCSS Statistical Software (NCSS, PASS)
NCSS Statistical Software
329 North 1000 East
Kaysville, Utah 84037 USA
Tel: (800) 898-6109
(801) 546-0445
Fax: (801) 546-3907
Web: http://www.ncss.com
E-M: ncss@ix.netcom.com

PALISADE CORPORATION (@RISK)
Palisade Corporation
31 Decker Road
Newfield, NY 14867 USA
Tel: 607-277-8000
800-432-7475
Fax: 607-277-8001
Web: http://www.palisade.com

RESAMPLING STATS
Resampling Stats
612 N. Jackson St.
Arlington, VA 22201 USA
Tel: 703-522-2713
Fax: 703-522-5846
Web: http://www.statistics.com
E-M: stats@resample.com
learning@statistics.com

SAS INSTITUTE (JMP, SAS)
SAS Institute Inc.
SAS Campus Drive
Cary, NC 27513 USA
Tel: 919 677-8000
919 677-8008 (JMP technical support)
919 677-8000, ext 5071 (JMP sales)
Fax: 919 677-8123
Web: http://www.sas.com
ftp: ftp://ftp.sas.com
E-M: corpcom@unx.sas.com (Corporate Communications)
sasedu@vm.sas.com (Education)
eurwww@mvs.sas.com (European Offices)
pubs@unx.sas.com (Publications)
software@sas.sas.com (Sales and Marketing)
bussol@unx.sas.com (Business Solutions Division)
sasblb2@vm.sas.com (jmp-sales)
JMP mailing list --
Subscriptions to: MAJORDOMO@WUBIOS.WUSTL.EDU
How to subscribe: subscribe JMP-L
Post messages to: JMP-L@WUBIOS.WUSTL.EDU
SAS mailing list --
Subscriptions to: LISTSERV@UGA.CC.UGA.EDU
How to subscribe: subscribe SAS-L First-name Last-name
Post messages to: SAS-L@UGA.CC.UGA.EDU
SAS Technical Support News --
Subscriptions to: LISTSERV@VM.SAS.COM
How to subscribe: subscribe TSNEWS-L First-name Last-name
Post messages to: Messages posted by SAS Institute only

SCIENTIFIC CONSULTING INC (PCNONLIN)
E-M: 75450.3171@compuserve.com

SPSS Inc. (BMDP, SPSS, Systat)
SPSS, Inc.
444 North Michigan Avenue
Chicago IL 60611 USA
Tel: 312 329-3410
800 543-2185
312-494-3283 (SYSTAT Technical Support)
Fax: 312/329-3668
BBS: 312/836-1900 (8/N/1)
ftp: ftp.spss.com
E-M: support@spss.com
Web: http://www.spss.com
BMDP mailing list --
Subscriptions to: LISTSERV@VM1.MCGILL.CA
How to subscribe: sub BMDP-L First-name Last-name
Post messages to: BMDP-L@VM1.MCGILL.CA
SPSS mailing list --
Subscriptions to: LISTSERV@UGA.CC.UGA.EDU
How to subscribe: sub SPSSX-L First-name Last-name
Post messages to: SPSSX-L@UGA.CC.UGA.EDU
SYSTAT mailing list --
Subscriptions to: LISTSERV@SPSS.COM
How to subscribe: sub SYSTAT-L First-name Last-name
Post messages to: SYSTAT-L@SPSS.COM

STATA CORPORATION
Stata Corporation
702 University Drive East
College Station, Texas 77840 USA
Tel: 409-696-4600
800-STATA-PC
Fax: 409-696-4601
Web: http://www.stata.com/
E-M: stata@stata.com
STATA mailing list --
Subscriptions to: majordomo@hsphsun2.harvard.edu
How to subscribe: subscribe STATALIST
Post messages to: STATALIST@hsphsun2.HARVARD.EDU

STATISTICAL SCIENCES (S-PLUS)
Statistical Sciences, Suite 500
1700 Westlake Avenue N.
Seattle WA 98109-9891 USA
Tel: (206) 283-8802 (business)
(800) 569-0123 (sales)
Fax: (206) 283-6310
Web: http://www.statsci.com/
E-M: sales@statsci.com (Sales)
support@statsci.com (Support)
mktg@statsci.com (Marketing)
S-PLUS mailing list --
Subscriptions to: S-NEWS-REQUEST@UTSTAT.TORONTO.EDU
How to subscribe: subscribe
Post messages to: S-NEWS@UTSTAT.TORONTO.EDU
Also check out the parent company, MathSoft.

STATISTICS AND EPIDEMIOLOGY RESEARCH CORPORATION (EGRET)
Tel: 206-632-3014
FAX: 206-547-4140
E-M: rhm@ms.washington.edu
Apparently, EGRET has been purchased by Cytel Software Corporation.

STATSOFT (STATISTICA)
StatSoft, Inc.
2300 East 14th Street
Tulsa, OK 74104-4442 USA
Tel: (918) 749-1119
Fax: (918) 749-2217
Web: http://www.statsoftinc.com
E-M: info@statsoftinc.com

SUDAAN
SUDAAN Product Coordinator
Statistical Software Center
Research Triangle Institute
3040 Cornwallis Road
Research Triangle Park NC 27709-2194 USA
Tel: (919) 541-6602
Fax: (919) 541-7431
Web: http://www.rti.org/patents/sudaan/sudaan.html
E-M: sudaan@rti.org

UNISTAT
Web: http://www.unistat.com

Here is a list of software for experimental design, collated by Bob Wheeler.

RS/1 software - including RS/Discover (A general purpose statistics package with extensive experimental design and analysis capability.)
BBN Domain Corp.
150 Cambridge Park Dr.
Cambridge, MA 02140
Tel: 617-873-5000
Fax: 617-873-6153
E-M: jtsullivan@bbn.com
Web: http://www.bbndomain.com/

Design Ease & Design Expert software (Experimental design, analysis, and training.)
Stat-Ease, Inc.
2021 E. Hennepin Ave., Ste. 191
Minneapolis, MN 55413
Tel: 612-378-9449
Fax: 612-378-2152
E-M: 72103.1436@compuserve.com

ECHIP software (Experimental design, analysis and training for scientists and engineers.)
ECHIP, Incorporated
724 Yorklyn Road
Hockessin, DE 19707-8733
Tel: 302-239-5429
Fax: 302-239-6227
E-M: support@echip.com

9 Where can I find free/shareware statistical software?

Any search for free/shareware statistical software should start with Statlib. Other software is arranged alphabetically after the description of Statlib.

http://lib.stat.cmu.edu/ is the site for Statlib, a system for distributing statistical software by the web, electronic mail, and ftp. If you do not have web access, send an e-mail to statlib@lib.stat.cmu.edu with the single line

send index

in the body of the message. This will give you an index of the general material available on the Statlib server.
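Statlib's e-mail interface works the same way as the LISTSERV commands in section 4. The request is plain mail whose body holds nothing but command lines. The Python sketch below builds such a message with the standard library's `email.message.EmailMessage`. The helper name `command_message` and the sender address are hypothetical, and the sketch assumes that only the body lines matter, as the instructions above describe. No subject line is set.

```python
from email.message import EmailMessage


def command_message(sender, server, commands):
    """Build a plain-text e-mail whose body holds one server command per line.

    The body contains only the command lines (no greeting or signature),
    matching the "single line in the body of the message" instructions.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = server
    msg.set_content("\n".join(commands) + "\n")
    return msg


if __name__ == "__main__":
    # Hypothetical sender address; replace with your own.
    request = command_message("you@example.edu", "statlib@lib.stat.cmu.edu", ["send index"])
    print(request.as_string())
```

The same helper covers the LISTSERV requests earlier in this FAQ. For example, `command_message(you, "LISTSERV@VM1.MCGILL.CA", ["SUB STAT-L First-name Last-name"])` produces a subscription request addressed to LISTSERV rather than to STAT-L. To actually send a message, pass it to `smtplib.SMTP(host).send_message(msg)`, where `host` is your local mail server (an assumption about your setup).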
http://www.mrc-bsu.cam.ac.uk/bugs/Welcome.html is the home page for BUGS/CODA. BUGS stands for Bayesian inference Using Gibbs Sampling. CODA is a set of S-PLUS programs for convergence diagnostics and analysis of BUGS output. This software is described in Carlin BP and Louis TA (1996) "Bayes and Empirical Bayes Methods for Data Analysis" Chapman and Hall, London.

ftp://plato.la.asu.edu/pub/donlp2 is the ftp site for DONLP2, one of the few high-quality programs for general nonlinear programming problems available completely free over the net. DONLP2 has been updated recently. There are four versions (Fortran 77 or f2c/C, each with exact or numerical differentiation). A separate file contains three papers as PostScript files, and the user's guide (the README files and the donlp2doc.txt file) was last updated on 6-24-96.

ftp://ftp.cdc.gov/pub/epi/epiinfo is the ftp site for Epi-Info/Epi-Map. Epi-Info is a series of computer programs, produced by the Centers for Disease Control and Prevention and the World Health Organization, that provides public-domain software for word processing, database, and statistics work in public health. A companion product, Epi-Map, handles geographic mapping. Support is available by telephone (404) 728-0545, fax (404) 315-6440, or E-M: EpiInfo@CDC1.CDC.GOV.

http://GKing.Harvard.Edu is the web site for EI/EzI. EI and EzI implement the statistical methods, graphics, and diagnostics in Gary King's forthcoming book _Reconstructing Individual Behavior from Aggregate Data: A Solution to the Ecological Inference Problem_ (Princeton: Princeton University Press, April 1997). EI requires Gauss (from Aptech Systems) and is platform-independent. EzI does not require Gauss, but runs only under MS-DOS (or Windows 95 or OS/2), requires at least 8 MB of memory, and needs about 2 MB of hard disk space.

http://www.psychologie.uni-trier.de:8000/projects/gpower.html is the web site for GPOWER.
GPOWER, a program for sample size and power calculations, is made available by a group of German cognitive scientists. It handles t-tests, F-tests, and chi-squared tests, and it has a handy routine for effect size calculation. It exists in Mac, Mac+FPU, Power Mac, and BC-compatible versions, as well as DOS.

http://www.medent.umontreal.ca/multilevel has information about MLn and other shareware/freeware software for multilevel analysis.

http://www.compulink.co.uk/~kovcomp/ is the web site for MVSP. MVSP is a MultiVariate Statistical Package which provides an inexpensive yet easy means of analysing your data. It calculates principal components, principal coordinates, and correspondence analyses (including detrended CA), as well as hierarchical cluster analysis using nineteen distance or similarity measures and seven clustering strategies, and diversity indices. The program is DOS-based and menu-driven; a Windows version is nearing completion.

http://www.compulink.co.uk/~kovcomp/ is the web site for Oriana. Oriana for Windows ver. 1.0 calculates the special forms of sample and inter-sample statistics required for circular data (e.g. directional data or time of day). Oriana calculates the circular mean, length of the mean vector, circular standard deviation and standard error, 95% and 99% confidence limits, and Rayleigh's test of uniformity for each sample in your data file. Pairs of samples can be compared with the Watson-Williams F-test for two circular means. The overall distributions of two samples can be compared with chi-squared tests. The data for each sample can be summarised with rose diagrams or circular histograms as well as linear histograms. The individual observations can be shown in raw data plots. Uniformity plots allow you to assess whether the data depart from a uniform distribution.

http://www-prophet.bbn.com/ is the web site for Prophet Software.
PROPHET is a UNIX-based workstation software package that gives researchers a wide range of computing capabilities. One of PROPHET's greatest assets is its new graphical user interface. Employing the latest advances in software technology, PROPHET lets you store, analyze and present Data Tables, Graphs, Statistical Analyses and Mathematical Modeling, and Sequence Analyses with high-resolution graphics and multiple windows. Anyone, from the computer-naive to the computer-sophisticate, can learn to use it quickly and effectively.

http://odin.mdacc.tmc.edu/anonftp is the web site for STPLAN, RANLIST, WINDOWS, STATTAB, and SURVAN. The M. D. Anderson Cancer Center at the University of Texas makes available a series of packages for both Mac and DOS which are basic in terms of interface but well documented. These include:
STPLAN: Sample size and power
RANLIST: Randomization plans for clinical trials
WINDOWS (!): Kernel smoothing of dose-response curves (smoothing of the relationship between a continuous variable and a binary outcome)
STATTAB: Statistical tables
SURVAN: Survival analysis, including Cox regression

http://forrest.psych.unc.edu/research/ViSta.html is the web site for ViSta (Visual Statistics System). ViSta is a Visual Statistics system designed for a wide range of users. It is particularly useful for students learning statistics and for their teachers. ViSta is also designed to be used for research and development in computational and graphical statistics.

http://www.westat.com is the web site for Westat, developers of WesVarPC. WesVarPC is a software package developed at Westat, Inc., that computes estimates and replicate variance estimates from survey data collected using complex sampling and estimation procedures. This flexible software supports a wide range of complex sample designs, including multistage, stratified, and unequal probability samples.
The replicate variance estimates can also reflect a number of estimation schemes, such as poststratification or ratio estimation.

WESVAR-L mailing list --
Subscriptions to: listserv@listserv.westat.com
How to subscribe: subscribe WESVAR-L
Post messages to: WESVAR-L@listserv.westat.com

ftp://ftp.stat.umn.edu/pub/xlispstat is the ftp site for Xlisp-Stat. Xlisp-Stat is a comprehensive statistical environment based on the XLISP dialect of LISP. It runs on Amiga, Macintosh, MS-DOS, MS-Windows, and X11. Xlisp-Stat is highly extensible, and many interesting extensions can be found at Statlib (see above for details about Statlib). There is a mailing list, stat-lisp-news, which at the moment is maintained by hand.

stat-lisp-news mailing list --
Subscriptions to: LISTSERV@JULIA.MATH.UCLA.EDU
How to subscribe: Ask to join and include your e-mail address
Post messages to: stat-lisp-news@stat.umn.edu

10 What statistics resources can be found on the web?

This section does not include web sites described in the "How can I contact the major statistics software vendors?" section or in other parts of the FAQ. The web is growing and changing rapidly, so it is impossible for me to compile a comprehensive list. Here are some interesting sites which have been mentioned on STAT-L/SCI.STAT.CONSULT. You are welcome to send me other interesting web sites.

http://www.nottingham.ac.uk/~mhzmd/bonf.html
A biography of Carlo Emilio Bonferroni (Michael Dewey).

http://www-leland.stanford.edu/class/gsb/excel2sas.html
Excel to SAS and other data translations.

http://www.rt66.com/~llubet
Lloyd's Warehouse of Economic Indicators.

ftp://ftp.sas.com/pub/neural/measurement.html
Measurement theory FAQ.

ftp://ftp.sas.com/pub/neural/FAQ.html
Neural networks FAQ.

http://www.stat.wisc.edu/statistics/consult/
The ASA Section on Statistical Consulting.
http://www.interchg.ubc.ca/cacb/power
Statistical power analysis software (Len Thomas).

http://www.execpc.com/~helberg/statistics.html
Statistics on the Web (Clay Helberg).

http://www.isds.duke.edu/stats-sites.html
Statistics servers and other links (The Institute of Statistics and Decision Sciences).

http://www.stat.ucla.edu/textbook/
UCLA Statistics Textbook (interactive pages using JavaScript, Perl, xlisp-stat, etc.).

http://www.stat.ufl.edu/vlib/statistics.html
Virtual Library of Statistics.

http://www.utexas.edu/world/lecture/
World Lecture Hall (Web-based lectures on many academic topics including Statistics).

Web sites for statistics journals (compiled by Tony Corso):

http://www.ams.org/journals/
American Mathematical Society Journals

http://www.amstat.org/publications/index.html
American Statistical Association Publications

http://www.stat.colostate.edu/annappr
The Annals of Applied Probability

http://www.stat.berkeley.edu/users/annstat
The Annals of Statistics

http://www.nuff.ox.ac.uk/biometrika
Biometrika

http://www.wiwi.hu-berlin.de/~sigbert/cs.html
Computational Statistics

http://www.shef.ac.uk/uni/companies/apt/apt2.html
Journal of Applied Probability

http://www.o2.net/~jasr/jasr.html
Journal of Applied Statistical Reasoning

http://www.carfax.co.uk/jas-ad.htm
Journal of Applied Statistics

http://www.pitt.edu/~csna/joc.html
Journal of Classification

http://fisher.stat.unipg.it/iasc/Misc-stat-journ-JCGS.html
Journal of Computational and Graphical Statistics

http://www.stat.ucla.edu/journals/jebs
Journal of Educational and Behavioral Statistics
http://www.apnet.com/www/journal/mv.htm
Journal of Multivariate Analysis

http://www.gbhap.com/journals/718/718-top.htm
Journal of Nonparametric Statistics

http://jscs.stat.vt.edu/JSCS
Journal of Statistical Computation and Simulation

http://www.elsevier.nl/locate/inca/505561
Journal of Statistical Planning and Inference

http://www.stat.ucla.edu/journals/jss
Journal of Statistical Software

http://www2.ncsu.edu/ncsu/pams/stat/info/jse/homepage.html
Journal of Statistics Education

http://interstat.stat.vt.edu/InterStat
Interstat - Statistics on the Internet

http://vision.arc.nasa.gov/publications/Psychometrika.html
Psychometrika

http://www.gbhap.com/journals/604/604-top.htm
Statistics: A Journal of Theoretical and Applied Statistics

http://www.elsevier.nl/inca/publications/store/5/0/5/5/7/3
Statistics & Probability Letters

http://www.stat.ucla.edu/ims/publications/journals/statsci
Statistical Science

http://www.maths.uq.oz.au/~gks/webguide/journals.html
Guide to the Web for Statisticians: Journals

11 What should I do about these "Spams"?

http://www.cauce.org is the web site for the Coalition Against Unsolicited Commercial Email (CAUCE). Visit this site if you want to do something constructive to stop spam. CAUCE is lobbying for legislation that would make junk e-mail illegal, just as junk faxes were outlawed recently. In my humble opinion, this seems like the best solution to a problem that keeps getting worse.

A message distributed across multiple newsgroups or list servers, usually for commercial purposes, is known as a spam. Some examples of spams that have hit STAT-L/SCI.STAT.CONSULT are the green card lawyers, information about lonely women in Russia, and blueprints of the original atom bomb.
First, keep in mind that often it is not the original spam messages that are so conspicuous and potentially intrusive, but rather the inevitable threads of discussion that result from them.

Please do not complain to STAT-L about a spam. The person who sent the spam is almost certainly not a subscriber to STAT-L and will not see your complaint. Other victims of the spam will see your complaint, though, which multiplies the annoying effect of the spam.

There are constructive steps that you can take to discourage a spam, but be assured that hundreds if not thousands of people have probably already taken them on your behalf. You can do nothing and still be assured that others are looking out for everyone's interests. So the best course of action is to shrug off the message. You might want to get in the practice of recognizing a spam by its subject line and deleting it unread.

If you don't want to ignore the spam, try following the advice given recently by Michael Palij:

>In a situation such as this I suggest that you send E-mail
>to the postmaster of the machine from which the offending
>E-mail was sent, alerting the postmaster of the E-mail
>message and including a copy of the E-mail message. If
>for some reason postmaster@machine does not work send
>E-mail to root@machine. Don't respond to the person of
>the account that sent the E-mail nor mailbomb. The
>reasons for this are:
>1. The E-mail may have a forged name/account. That is,
>the return address may be bogus or belong to someone who
>has a legitimate account on the specific machine but who
>did not send the E-mail.
>2. Some people, if they want to punish a particular
>person/account or machine, may send out a spam message
>such as the one above, with the expectation that the
>person's account or machine/site will be overwhelmed by
>the reaction (yes, some people will send a copy of a
>coredump or Moby Dick to the offending E-mail address in
>the hope that it will crash the mail program).
>In this way, an innocent person gets hurt because of a set-up.
>3. Notifying the person who has responsibility for the
>machine (i.e., the postmaster or root) will allow that
>person to determine whether one of their real users
>posted the message (and give that person a good talking
>to) or whether their system was hacked and someone posted
>the offending message as a prank/whatever.
>In general, try to stay cool about such occurrences, E-mail
>the postmaster to investigate the situation, and appreciate
>that much more may be going on than you realize.

12 What are some of the problems with stepwise regression?

All of this material is quoted from various e-mails that appeared on STAT-L/SCI.STAT.CONSULT in 1996. Thanks go to Ira Bernstein, Ronan Conroy, and Frank Harrell for their detailed explanations, and to Richard Ulrich, who originally compiled these comments. I have done some very minor editing (mostly adding and changing line breaks), but have tried to avoid any substantive changes to these well-written explanations.

Frank Harrell's comments:

>Here are SOME of the problems with stepwise variable selection.
>
> 1. It yields R-squared values that are badly biased high
> 2. The F and chi-squared tests quoted next to each variable on the
>    printout do not have the claimed distribution
> 3. The method yields confidence intervals for effects and predicted
>    values that are falsely narrow (See Altman and Andersen Stat in Med)
> 4. It yields P-values that do not have the proper meaning and the
>    proper correction for them is a very difficult problem
> 5. It gives biased regression coefficients that need shrinkage
>    (the coefficients for remaining variables are too large;
>    see Tibshirani, 1996).
> 6. It has severe problems in the presence of collinearity
> 7. It is based on methods (e.g. F tests for nested models) that were
>    intended to be used to test pre-specified hypotheses.
> 8. Increasing the sample size doesn't help very much (see
>    Derksen and Keselman)
> 9. It allows us to not think about the problem
>10. It uses a lot of paper
>
>Note that 'all possible subsets' regression does not solve any of these
>problems.
>
>
>References
>----------
>@article{alt89,
>author = "Altman, D. G. and Andersen, P. K.",
>journal = "Statistics in Medicine",
>pages = "771-783",
>title = "Bootstrap investigation of the stability of a {C}ox
>    regression model",
>volume = "8",
>year = "1989"
>Shows that stepwise methods yield confidence limits that are far
>too narrow.
>}
>
>@article{der92bac,
>author = {Derksen, S. and Keselman, H. J.},
>journal = {British Journal of Mathematical and Statistical Psychology},
>pages = {265-282},
>title = {Backward, forward and stepwise automated subset selection
>algorithms: {F}requency of obtaining authentic and noise variables},
>volume = {45},
>year = {1992},
>annote = {variable selection}
>Conclusions:
>
>"The degree of correlation between the predictor variables affected
>the frequency with which authentic predictor variables found their way
>into the final model.
>
>The number of candidate predictor variables affected the number of
>noise variables that gained entry to the model.
>
>The size of the sample was of little practical importance in
>determining the number of authentic variables contained in the final
>model.
>
>The population multiple coefficient of determination could be
>faithfully estimated by adopting a statistic that is adjusted by
>the total number of candidate predictor variables rather than the
>number of variables in the final model."
>
>}
>
>@article{roe91pre,
>author = {Roecker, Ellen B.},
>journal = {Technometrics},
>pages = {459-468},
>title = {Prediction error and its estimation for subset--selected models},
>volume = {33},
>year = {1991}
>Shows that all-possible regression can yield models that are "too small".
>}
>
>@article{man70why,
>author = {Mantel, Nathan},
>journal = {Technometrics},
>pages = {621-625},
>title = {Why stepdown procedures in variable selection},
>volume = {12},
>year = {1970},
>annote = {variable selection; collinearity}
>}
>
>@article{hur90,
>author = "Hurvich, C. M. and Tsai, C. L.",
>journal = "American Statistician",
>pages = "214-217",
>title = "The impact of model selection on inference in linear regression",
>volume = "44",
>year = "1990"
>}
>@article{cop83reg,
>author = {Copas, J. B.},
>journal = "Journal of the Royal Statistical Society B",
>pages = {311-354},
>title = {Regression, prediction and shrinkage (with discussion)},
>volume = {45},
>year = {1983},
>annote = {shrinkage; validation; logistic model}
>Shows why the number of CANDIDATE variables and not the number in the
>final model is the number of d.f. to consider.
>}
>
>@article{tib96reg,
>author = {Tibshirani, Robert},
>journal = "Journal of the Royal Statistical Society B",
>pages = {267-288},
>title = {Regression shrinkage and selection via the lasso},
>volume = {58},
>year = {1996},
>annote = {shrinkage; variable selection; penalized MLE; ridge regression}
>}

Ira Bernstein's comments:

>I think that there are two distinct questions here: (a) _when_ is
>stepwise selection appropriate and (b) _why_ is it so popular.
>
>Since I have seen some variation in usage of the term "stepwise", I
>define it as any of a number of _data_ driven variable selection
>schemes used in regression and discriminant analysis, among other
>applications. Some, inappropriately IMHO (since there is no official
>body to define "appropriate"), use it to describe what I would call
>hierarchical (_hypothesis_ driven) selection. Like I would assume
>many, I would discourage stepwise selection and encourage
>hierarchical selection.
>I, of course, assume the researcher does
>not "cheat" by defining his/her "hierarchy" given the data but does
>so by considering alternatives in advance of analysis and,
>preferably, replicates the study (dream on).
>
>I would probably only argue slightly with "never" as an answer to the
>use of stepwise selection since I don't know what knowledge we would
>lose if all papers using stepwise regression were to vanish from
>journals at the same time programs providing their use were to become
>terminally virus-laden. However, I have been in situations that
>looked like "I have good reason to look at variables A, B, and C;
>then look at D, and E, but I have no basis to favor F over G or vice
>versa past that point." Older versions of SPSS (I haven't used newer
>versions since switching to SAS a decade ago) allowed this mixture,
>and I would personally not object to it as long as the strategy were
>defined in advance and made clear to readers.
>
>As to part (b), I think that there are two groups that are inclined
>to favor its usage. One consists of individuals with little formal
>training in data analysis who confuse knowledge of data analysis
>with knowledge of the syntax of SAS, SPSS, etc. They seem to figure
>that "if its there in a program, its gotta be good and better than
>actually thinking about what my data might look like". They are
>fairly easy to spot and to condemn in a right-thinking group of
>well-trained data analysts (like ourselves). However, there is also
>a second group who are often well trained (and may be here in this
>group ready to flame me). They believe in statistics uber
>alles--given any properly obtained data base, a suitable computer
>program can objectively make substantive inferences without active
>consideration of the underlying hypotheses. If stepwise selection
>is the parent of this line blind data analysis, then automatic
>variable respecification in confirmatory factor analysis is the
>child.
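Frank Harrell's first point (R-squared badly biased high), together with the Copas and Derksen-Keselman observation that the number of candidate variables rather than the number kept is what matters, can be seen in a small simulation. The following is a minimal sketch in Python (standard library only) under assumptions chosen purely for illustration: pure-noise data, so the true R-squared is zero; n = 50 observations; 20 candidate predictors; and a simplified forward selection that adds whichever variable gives the largest R-squared and stops after a fixed 5 steps instead of using an F-to-enter threshold. The function names are hypothetical helpers, not part of any statistics package.

```python
import random
from statistics import mean


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ols_r2(cols, y):
    """R-squared from an OLS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    X = [[1.0] + [col[r] for col in cols] for r in range(n)]
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yr for row, yr in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)  # normal equations X'X b = X'y
    ybar = sum(y) / n
    sse = sum((yr - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yr in zip(X, y))
    sst = sum((yr - ybar) ** 2 for yr in y)
    return 1.0 - sse / sst


def forward_select(cands, y, steps):
    """Greedy forward selection: at each step add the candidate giving the largest R-squared."""
    chosen = []
    for _ in range(steps):
        remaining = [j for j in range(len(cands)) if j not in chosen]
        best = max(remaining, key=lambda j: ols_r2([cands[i] for i in chosen + [j]], y))
        chosen.append(best)
    return chosen


def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def simulate(n=50, n_candidates=20, steps=5, reps=20, seed=1):
    """y is generated independently of every candidate, so the true R-squared is 0."""
    rng = random.Random(seed)
    r2s, adj_kept, adj_all = [], [], []
    for _ in range(reps):
        y = [rng.gauss(0, 1) for _ in range(n)]
        cands = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_candidates)]
        chosen = forward_select(cands, y, steps)
        r2 = ols_r2([cands[j] for j in chosen], y)
        r2s.append(r2)
        adj_kept.append(adjusted_r2(r2, n, steps))        # penalize only the variables kept
        adj_all.append(adjusted_r2(r2, n, n_candidates))  # penalize every candidate screened
    return mean(r2s), mean(adj_kept), mean(adj_all)


if __name__ == "__main__":
    r2, kept, screened = simulate()
    print(f"mean R-squared of the selected model:      {r2:.3f}")
    print(f"adjusted for the 5 variables kept:         {kept:.3f}")
    print(f"adjusted for all 20 candidates screened:   {screened:.3f}")
```

With these settings the selected model's average R-squared should come out well above the 5/49 (about 0.10) that five predictors chosen without looking at the data would give on average, even though every predictor is noise. Penalizing by all 20 candidates screened, in the spirit of Copas and of Derksen and Keselman, is much harsher than penalizing by the 5 variables kept.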
Ronan Conroy's comments:

>I am struck by the fact that Judd and McClelland in their excellent
>book "Data Analysis: A Model Comparison Approach" (Harcourt Brace
>Jovanovich, ISBN 0-15-516765-0) devote less than 2 pages to stepwise
>methods. What they do say, however, is worth repeating:
>
>1. Stepwise methods will not necessarily produce the best model if there
>are redundant predictors (common problem).
>
>2. All-possible-subset methods produce the best model for each possible
>number of terms, but larger models need not necessarily be subsets of
>smaller ones, causing serious conceptual problems about the underlying
>logic of the investigation.
>
>3. Models identified by stepwise methods have an inflated risk of
>capitalising on chance features of the data. They frequently fail
>when applied to new datasets. They are rarely tested in this way.
>
>4. Since the interpretation of coefficients in a model depends on the
>other terms included, "it seems unwise," to quote J and McC, "to let
>an automatic algorithm determine the questions we do and do not ask
>about our data". RC adds that stepwise methods abusers frequently
>would rather not think about their data, for reasons that are funny
>to describe over a second Guinness.
>
>5. I quote this last point directly, as it is sane and succinct:
>
>"It is our experience and strong belief that better models and a
>better understanding of one's data result from focussed data
>analysis, guided by substantive theory." (p 204)
>
>They end with a quote from Henderson and Velleman's paper "Building
>multiple regression models interactively". Biometrics 1981;37:391-411
>
>"The data analyst knows more than the computer"
>
>and add
>
>"failure to use that knowledge produces inadequate data analysis."
>
>Personally, I would no more let an automatic routine select my model
>than I would let some best-fit procedure pack my suitcase.

13 What is the answer to the Monty Hall, Envelope, or Birthday problem?
There is a classic probability puzzle called the Monty Hall problem. Here's a nice description from the rec.puzzles FAQ:

"The Monty Hall problem can be stated as follows: A gameshow host displays three closed doors. Behind one of the doors is a car. The other two doors have goats behind them. You are then asked to choose a door. After you have made your choice, one of the remaining two doors is then opened by the host (who knows what's behind the doors), revealing a goat. Will switching your initial guess to the remaining door increase your chances of guessing the door with the car?"

The general consensus is that the probability of winning the car is 1/3 if you don't switch and 2/3 if you do switch. This answer assumes that the host always opens a door with a goat behind it and always offers the switch; under those assumptions, switching wins exactly when your first pick was a goat, which happens 2/3 of the time. But these implicit assumptions cause a raging debate every time the problem appears on STAT-L. For example, the host may be perversely trying to goad you into a bad switch and may reveal a door only when your current door has a car behind it.

There are at least thirty web sites that discuss this problem. Here are three good sites:

http://www.smartpages.com/faqs/sci-math-faq/montyhall/faq.html
SCI.MATH FAQ

http://www.cs.ruu.nl/wais/html/na-dir/puzzles/archive/decision.html
REC.PUZZLES FAQ

http://www.ram.org/computing/monty_hall.html has a simulation model based on this problem.

You can also read about this problem in:
Engel, E. and Venetoulias, A. (1991). Monty Hall's probability puzzle. Chance, Vol 4, # 2, 6-9.
Selvin, S. (1975). A problem in probability, in "Letters to the Editor," The American Statistician, 29, 67 and 134.

The envelope exchange problem goes something like this (again from the rec.puzzles FAQ):

"Someone has prepared two envelopes containing money. One contains twice as much money as the other. You have decided to pick one envelope, but then the following argument occurs to you: Suppose my chosen envelope contains $X, then the other envelope either contains $X/2 or $2X.
Both cases are equally likely, so my expectation if I take the other envelope is .5 * $X/2 + .5 * $2X = $1.25X, which is higher than my current $X, so I should change my mind and take the other envelope. But then I can apply the argument all over again. Something is wrong here! Where did I go wrong? In a variant of this problem, you are allowed to peek into the envelope you chose before finally settling on it. Suppose that when you peek you see $100. Should you switch now?"

Again, there are some subtle assumptions in this problem that cause a lot of commentary. A good reference to the problem is Christensen, R. and Utts, J. (1992) "Bayesian Resolution of the 'Exchange Paradox,'" The American Statistician, 46(4), 274-276. Note also comments in the Letters to the Editor column in two separate issues of the American Statistician in 1993 (pages 160, 311).

http://www.cs.ruu.nl/wais/html/na-dir/puzzles/archive/decision.html, the rec.puzzles FAQ, contains a nice discussion of this problem.

The birthday problem goes something like this. There are r people in a room. What is the probability that two or more people have the same birthday? Assuming every birthdate is equally likely, the probability of a match is

1 - n!/((n-r)! * n^r)

where n is the number of days in a year and r is the number of people in the group. With n=365 and r=23, the probability exceeds 0.5.

A nice summary of this problem, with extensions to non-uniform birthdates, is Nunnikhoven, T.S. (1992) "A Birthday Problem Solution for Nonuniform Birth Frequencies," The American Statistician, 46(4), 270-274.

http://pascal.dartmouth.edu/~zhu/applets/Birthday/Birthday.java is a Java applet for computing these probabilities.

http://www.mste.uiuc.edu/reese/birthday/intro.html has a simulation of the birthday problem.

14 Can someone provide me with references and/or books about [topic]?

Before you post a question like this, it would be nice if you did a little searching on your own first.
The best resource for finding references about a statistical topic is the Current Index to Statistics Extended Database (CISED), a CD-ROM with 180,000 references in statistics journals since 1974, with coverage of selected journals dating back as far as 1940. Many university libraries have this product, and some make it available to their students through a web browser. Licensing agreements, however, prevent libraries from making this product available to the general public. If you want to purchase an individual license, it is available for as little as $95.

http://www.stat.uchicago.edu/~cis/ is a web site that contains more information about CISED. Two e-mail contacts at IMS and ASA are kmkims@stat.berkeley.edu and cised@amstat.org, respectively.

http://www.stat.wisc.edu/statistics/consult/statbook.html is Glen McPherson's Essential Book List. Back in 1993, Glen McPherson polled the members of STAT-L/SCI.STAT.CONSULT to create a list of books essential to anyone in the statistical consulting field. The list is organized by major topic areas. Brian Yandell has put this list up on his web site.

http://www.stat.wisc.edu/statistics/consult/book.html is another interesting booklist at the same web site.

15 Acknowledgments

This list has grown thanks to the small and large contributions of many people. Part of it was shamelessly stolen from well-written messages on STAT-L.
Here is a partial list of people whom you should thank for directly or indirectly contributing to this FAQ: Gary Ash, Kenneth Benoit, Grant Blank, Jim Box, Benjamin Chan, Ronan Conroy, Tony Corso, Donald Cram, Byron Davis, Barry DeCicco, Joe Dolgos, Rick Engberg, Emil Friedman, Mike Fuller, Steve Goodman, Bill Gould, Timothy Green, Duane Griffin, Clay Helberg, Tim Hesterberg, Charles Kincaid, Warren Kovach, Jan de Leeuw, Lloyd Lubet, Haiko Luepsen, Hans Mittelmann, Brian Monsell, John Nash, Jonathan Newman, Michael Palij, Dennis Roberts, David Ronis, Warren Sarle, Ronald Schoenberg, Russell Schulz, Jim Steiger, Len Thomas, Richard Ulrich, Vittorio Viaggi, Michael Walsh, Meredith Warshaw, Bob Wheeler, Will Wheeler, John Whittington, Forest Young, Sara Young, Stuart Young, Craig Ziegler.

If there are errors in this FAQ, they are probably my fault; it is difficult to accurately transcribe all of the information I have received, even with cut and paste. Please send any corrections and additions. Complaints are appreciated also, but please realize that I am doing this as a volunteer, mostly during lunch breaks and after work hours.

*** End of FAQ for STAT-L/SCI.STAT.CONSULT ***

Steve Simon, ssimon@cmh.edu, Standard Disclaimer.
Office of Medical Research, Children's Mercy Hospital
2401 Gillham Road, Kansas City, MO 64108
TEL: 816-234-3963 FAX: 855-1703
Vision: The Children's Mercy Hospital commits to providing quality pediatric medical care with service excellence and efficiency to everyone we serve.