You have a lot of choices for how to do your data analysis. I have found that the best option for most people I work with is to use IBM SPSS software. Here are the main reasons why IBM SPSS is your best choice.
Before I list the advantages of IBM SPSS, let me list some alternatives and explain when you might wish to use these alternatives.
Web-based calculators. There is a wide range of free web-based calculators available, and their quality is generally quite good. This is a reasonable alternative if you run the same analysis over and over again and rarely deviate from that routine. These web-based calculators, however, rarely provide a graphical summary of your data. Also, if you switch to a different type of analysis, you have to find a different web-based calculator.
Spreadsheet software. There are several excellent spreadsheet programs, such as Microsoft Excel, that do a fine job with simple descriptive summaries. While these programs also provide some inferential statistics, there are serious problems with the validity of their algorithms. Two websites offer criticisms of spreadsheets in general and Excel in particular. Another problem with spreadsheets is that their graphical displays are generally crude, with poorly chosen default options.
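Criticisms of spreadsheet statistics have often centered on numerical accuracy. As a generic illustration only (this is not a test of any particular version of Excel or any other spreadsheet), the Python sketch below compares the one-pass "calculator" shortcut for the sample variance, a textbook example of a numerically unstable algorithm, with the more stable two-pass formula that works from deviations about the mean. The function names are hypothetical.

```python
def naive_variance(xs):
    """Sample variance via the one-pass shortcut:
    (sum of x^2 - (sum of x)^2 / n) / (n - 1).
    Subtracting two huge, nearly equal numbers loses most significant digits."""
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    return (ss - s * s / n) / (n - 1)


def two_pass_variance(xs):
    """Sample variance computed from deviations about the mean (more stable)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Values with a large offset relative to their spread; the true sample variance is 1.0.
data = [1e9 + 1, 1e9 + 2, 1e9 + 3]

print(two_pass_variance(data))  # → 1.0
print(naive_variance(data))     # → 0.0 with IEEE 754 double precision
```

Both formulas are algebraically identical, so a program's output cannot tell you which one it used; only tests on difficult data like this reveal the difference.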
Database software. If your primary concern is accurate data entry, especially for a complex research project such as a multi-center trial, you should use database software such as Microsoft Access or MySQL. Unfortunately, database software provides little beyond the most elementary statistical summaries, so you will have to pair your database with a different program for data analysis.
R, SAS, and Stata. I am listing only three programs here, but there are at least a dozen programs that are serious competitors to IBM SPSS. These are all excellent programs that share many of the advantages of IBM SPSS. If you are already familiar with one of them, you should keep using it. Their only serious disadvantage is that they are harder for a beginner to learn.
Here are the four reasons why I recommend IBM SPSS over the alternatives listed above.
- Comprehensive data management tools. The most critical part of any data analysis is the initial data entry. If you enter the data the wrong way, you won't be able to analyze it properly. Although there are many ways to enter data, entering it directly into IBM SPSS is often the best choice. IBM SPSS offers a simple spreadsheet format for data entry that is intuitive and easy to start with. More importantly, IBM SPSS provides a broad range of data documentation tools (especially value labels) that will help you ensure consistency in your data entry.
- Excellent graphical display options. Before you start your data analysis, you need to understand how your data behaves. This is best done graphically. IBM SPSS provides scatterplots, boxplots, and histograms that help you see patterns in your data. You shouldn't publish findings based solely on an intuitive interpretation of graphics, of course. Rather, these graphics will provide you with a general framework for understanding your data, so that you will be better able to interpret the complex inferential procedures that follow.
- A broad range of statistical models. Often you will not know at the start of a research project which statistical models are best suited to it. Sometimes you will have a general idea, but the statistical model often changes as you start examining your data. Or you may want to run an alternative analysis as a quality check on the originally planned analysis. IBM SPSS offers a broad range of highly flexible statistical models, most notably the general linear model and a variety of logistic regression models. This lets a single program meet almost all of your data analysis needs. Although a few people might need to supplement IBM SPSS with another program such as R, for most of the people I work with, IBM SPSS will be the only statistical software package they need.
- An easy-to-learn, menu-driven interface. Many of the competing statistical software programs, such as R, SAS, and Stata, are run primarily through a programming language. While a programming language offers some important advantages, it takes much longer to learn. Furthermore, its complexity often discourages you from trying new and different approaches.
You can find an earlier version of this page on my original website.