A macro to execute PROC TTEST for multiple binary grouping variables in SAS (and sorting t-test statistics by their absolute values)
This article is originally published at https://chemicalstatistician.wordpress.com
In SAS, you can perform PROC TTEST for multiple numeric variables in the same procedure. Here is an example using the built-in data set SASHELP.BASEBALL; I will compare the number of at-bats and number of walks between the American League and the National League.
```sas
proc ttest data = sashelp.baseball;
   class League;
   var nAtBat nBB;
   ods select ttests;
run;
```
Here are the resulting tables, the first for nAtBat and the second for nBB.

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 320 | 2.05 | 0.0410 |
Satterthwaite | Unequal | 313.66 | 2.06 | 0.0404 |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 320 | 0.85 | 0.3940 |
Satterthwaite | Unequal | 319.53 | 0.86 | 0.3884 |
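The `ods select ttests;` statement refers to the ODS table named TTests. If you want to check which ODS tables a procedure produces (and what they are called), one optional diagnostic is to wrap the call in ODS TRACE. The table names, TTests among them, are then written to the log. This is a side check, not part of the original workflow.

```sas
/* Write the names of the ODS output objects to the log */
ods trace on;

proc ttest data = sashelp.baseball;
   class League;
   var nAtBat nBB;
run;

/* Stop tracing so later steps do not clutter the log */
ods trace off;
```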
What if you want to perform PROC TTEST for multiple grouping (a.k.a. classification) variables? You cannot put more than one variable in the CLASS statement, so you would have to run PROC TTEST separately for each binary grouping variable. If you put both LEAGUE and DIVISION in the CLASS statement anyway, SAS reports syntax errors, as this log shows.
```
1303  proc ttest
1304     data = sashelp.baseball;
1305     class league division;
                      --------
                      22
                      202
ERROR 22-322: Expecting ;.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
1306     var natbat;
1307     ods select ttests;
1308  run;
```
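Without a macro, the workaround is to repeat the procedure once per grouping variable. A minimal sketch for LEAGUE and DIVISION with a single numeric variable might look like this. Every additional grouping variable means another copy of the step, and that repetition is what the macro in this tutorial automates.

```sas
/* One PROC TTEST call per binary grouping variable */
proc ttest data = sashelp.baseball;
   class League;
   var nAtBat;
   ods select ttests;
run;

proc ttest data = sashelp.baseball;
   class Division;
   var nAtBat;
   ods select ttests;
run;
```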
There is no syntax in PROC TTEST to use multiple grouping variables at the same time, so this tutorial provides a macro to do so. There are several nice features of my macro:
- It allows you to use multiple grouping variables at the same time.
- It sorts the t-test statistics by their absolute values within each grouping variable.
- It shows the name of each continuous variable in the output table, unlike the above output.
Here is its basic skeleton.
- Count the number of grouping variables.
- Use a DO-loop to iterate through each grouping variable.
- Create a macro variable called FIRST to flag the first iteration of the loop; it is used when the results are combined at the end.
- In each iteration, execute PROC TTEST with the ith grouping variable. Produce the output data set containing the results.
- Sort the resulting data set by the absolute value of the t-test statistics in descending order.
- Use PROC SQL to extract the key information:
  - the grouping variable’s name
  - the numeric variable’s name
  - the t-test statistic
  - the absolute value of the t-test statistic
  - the P-value of the 2-sample t-test (assuming unequal variances)
- Create a new data set from the results of the first iteration; use the aforementioned “FIRST” variable to determine if it is the first iteration. Append the results of the subsequent iterations to this data set using PROC APPEND.
- Delete the intermediate data sets that were created within the macro: TTEST_RESULTS1, TTEST_RESULTS2, and TTEST_RESULTS3.
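The first two steps of this skeleton (counting the grouping variables and picking out the ith one) rely on the COUNTW function called through %SYSFUNC and on the %SCAN macro function. The hypothetical macro `%list_words` below isolates just that looping pattern so you can try it on its own; it only writes messages to the log.

```sas
/* Hypothetical demo macro: walk through a space-separated list of names */
%macro list_words(words);
   %local n i word;

   /* count the words in the list (blank is one of COUNTW's default delimiters) */
   %let n = %sysfunc(countw(&words.));

   %do i = 1 %to &n.;
      /* pull out the ith word, using a blank as the delimiter */
      %let word = %scan(&words., &i., %str( ));
      %put Word &i. of &n.: &word.;
   %end;
%mend list_words;

%list_words(League Division)
```

With this call, the log should contain `Word 1 of 2: League` followed by `Word 2 of 2: Division`.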
Here is the code for the macro. Note that you can feed more than one numeric variable into this macro.
```sas
%macro ttest_by_class(ds, class_vars, numeric_vars, output);
   /* keep the loop counter and flags local to the macro */
   %local first num_class_vars i ClassVar;

   /* create a flag that will be used later for appending data sets */
   %let first = 1;

   /* find the number of class variables fed into the macro */
   %let num_class_vars = %sysfunc(countw(&class_vars.));
   %put There are &num_class_vars. grouping variables to process.;

   /* loop through all class variables */
   %do i = 1 %to &num_class_vars.;

      /* extract the ith class variable */
      %let ClassVar = %scan(&class_vars., &i., %str( ));
      %put Starting variable &i. of &num_class_vars., &ClassVar.;

      /* create an output data set containing the results of PROC TTEST; */
      /* suppress printing of output using ODS EXCLUDE ALL */
      ods exclude all;
      proc ttest data = &ds.;
         class &ClassVar.;
         var &numeric_vars.;
         ods output ttests = ttest_results1;
      run;
      ods exclude none;

      /* keep the method that assumes unequal variances and */
      /* calculate the absolute value of the t-test statistics */
      data ttest_results2;
         set ttest_results1;
         if variances = "Unequal";
         abstValue = abs(tValue);
      run;

      /* sort by the absolute value of the t-test statistics in descending order */
      proc sort data = ttest_results2;
         by descending abstValue;
      run;

      /* keep the grouping variable, numeric variable, t-test statistic, */
      /* its absolute value, and the P-value for the ith grouping variable */
      proc sql noprint;
         create table ttest_results3 as
         select "&ClassVar." as Grouping_Variable label = 'Grouping Variable' length = 100 format = $100.,
                Variable as Numeric_Variable label = 'Numeric Variable' length = 100 format = $100.,
                tValue label = "t-Test Statistic" format = 8.4,
                abstValue label = "Absolute Value of t-Test Statistic" format = 8.4,
                Probt as PValue label = "P-Value" format = 8.4
         from ttest_results2;
      quit;

      /* append the data sets as each new result is generated */
      %if &first. %then %do;
         data &output.;
            set ttest_results3;
         run;
         %let first = 0;
      %end;
      %else %do;
         proc append base = &output. data = ttest_results3;
         run;
      %end;
   %end;

   /* delete the intermediate data sets that were created within the macro */
   proc datasets library = work nolist;
      delete ttest_results: ;
   run;
   quit;
%mend ttest_by_class;
```
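Because the macro generates a separate PROC TTEST, DATA step, PROC SORT, and PROC SQL step for every grouping variable, it can help to see the generated code while debugging. One option, sketched here under the assumption that `%ttest_by_class` has already been compiled in your session, is the MPRINT system option. The output name `check_ttests` is just a hypothetical example.

```sas
/* Echo every statement generated by the macro to the log */
options mprint;

%ttest_by_class(sashelp.baseball, League Division, nAtBat nBB, check_ttests);

/* Turn the echoing back off */
options nomprint;
```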
Let’s try it with the SASHELP.BASEBALL data set again!
```sas
%ttest_by_class(sashelp.baseball, League Division, nAtBat nHits nHome nRuns nRBI nBB, baseball_ttests);

proc print data = baseball_ttests noobs label;
run;
```
Here is the output from PROC PRINT.
Grouping Variable | Numeric Variable | t-Test Statistic | Absolute Value of t-Test Statistic | P-Value |
---|---|---|---|---|
League | nHome | 3.2134 | 3.2134 | 0.0014 |
League | nRuns | 2.8408 | 2.8408 | 0.0048 |
League | nRBI | 2.6692 | 2.6692 | 0.0080 |
League | nAtBat | 2.0582 | 2.0582 | 0.0404 |
League | nHits | 1.9186 | 1.9186 | 0.0559 |
League | nBB | 0.8637 | 0.8637 | 0.3884 |
Division | nRBI | 1.6386 | 1.6386 | 0.1023 |
Division | nRuns | 1.5652 | 1.5652 | 0.1186 |
Division | nHits | 1.4758 | 1.4758 | 0.1410 |
Division | nBB | 1.2131 | 1.2131 | 0.2260 |
Division | nAtBat | 0.9442 | 0.9442 | 0.3458 |
Division | nHome | 0.5238 | 0.5238 | 0.6008 |
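Because the results are stored in an ordinary data set, you can filter them further. As a sketch, this prints only the comparisons with P-values below 0.05; the cutoff is an arbitrary choice for illustration, and the step assumes `baseball_ttests` was created by the call above.

```sas
/* Keep only comparisons whose P-value is below an illustrative 0.05 cutoff */
proc print data = baseball_ttests noobs label;
   where PValue < 0.05;
run;
```

Based on the table above, this would print the four League rows for nHome, nRuns, nRBI, and nAtBat.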
Within my macro, notice that I used ODS EXCLUDE ALL to suppress the printed output from PROC TTEST. This is very important, because displaying every table (and any ODS graphics) for each grouping variable can take a long time when there are many variables. Furthermore, I used ODS OUTPUT to save only the one table that I want (TTests) as a data set, which saves time and memory by not keeping the output that I don’t want.
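Here is the same ODS pattern outside the macro, as a minimal sketch. Nothing is displayed while ODS EXCLUDE ALL is in effect, yet the ODS OUTPUT statement still writes the TTests table to a data set (`work.league_tt` is a hypothetical name), which you can print or process afterwards.

```sas
/* Suppress all displayed output, but still capture the TTests table */
ods exclude all;

proc ttest data = sashelp.baseball;
   class League;
   var nAtBat nBB;
   ods output ttests = work.league_tt;
run;

/* Restore displayed output */
ods exclude none;

/* The data set exists even though PROC TTEST printed nothing */
proc print data = work.league_tt noobs;
run;
```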
As I mentioned before, this macro also sorts the results by the absolute values of the t-test statistics, so it is useful even if sorting is your only goal. In fact, you can use it with just one grouping variable and multiple continuous variables, and you will get a nice table of the results that is indexed by the names of the continuous variables.
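For instance, assuming `%ttest_by_class` has been compiled, a call with LEAGUE as the only grouping variable could look like this (`league_ttests` is a hypothetical output name):

```sas
/* One grouping variable, several numeric variables, sorted by |t| */
%ttest_by_class(sashelp.baseball, League, nAtBat nHits nHome nRuns nRBI nBB, league_ttests);

proc print data = league_ttests noobs label;
run;
```

The printed table should contain the six League rows shown earlier, in the same order.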
I thank Cyrus Bradford from SAS Technical Support for his help with the above macro. Although he did not write the exact macro above, he helped me with a very similar macro for a slightly different purpose, and he wrote most of the code. My main contribution was expanding it to allow multiple numeric variables to be fed into the macro.