Lab 3 -- Ridge Regression On SPSS
In situations of high multicollinearity (as evidenced by large variance inflation factors, or VIFs), several alternative non-OLS (Ordinary Least Squares) prediction techniques may be attempted to avoid the resulting large sampling variances of the βs. Ridge regression is one technique proposed for such a situation. These are the instructions for accomplishing at least a basic ridge regression in SPSS.
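As a rough illustration of what a "large VIF" means, here is a minimal Python sketch (outside SPSS) for the simplest case of exactly two predictors, where the VIF of either predictor is 1 / (1 - r²) and r is the correlation between them. The function name and the correlations are hypothetical and are not taken from MRDEMO.sav.

```python
def vif_two_predictors(r12: float) -> float:
    """VIF for either predictor when there are exactly two predictors.

    With two predictors, the R-squared from regressing one on the other
    is r12**2, so VIF = 1 / (1 - r12**2).
    """
    if abs(r12) >= 1.0:
        raise ValueError("predictors are perfectly collinear; VIF is undefined")
    return 1.0 / (1.0 - r12 ** 2)


# Hypothetical correlations between two predictors (not from MRDEMO.sav).
for r in (0.30, 0.70, 0.90, 0.99):
    print(f"r12 = {r:.2f}  ->  VIF = {vif_two_predictors(r):.2f}")
```

In this two-predictor case a VIF below 2.0 corresponds to a predictor correlation below about .71, which gives a feel for how mild the collinearity is when all VIFs are under 2.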
Ridge Regression in SPSS must be run from syntax. SPSS provides a canned syntax file (Ridge regression.sps) that performs Ridge Regression on generic data sets. On individual PC installations, it is copied to the SPSS directory, but in a network installation, as in our computer labs, the file can be difficult to find. You must also create (or edit from the example included here) a second syntax file that gives specific information about your data set (I have called this file mrdemo_Ridge.sps). This second file acts as an "interface" between your data set and the generic SPSS Ridge Regression syntax file: it calls the SPSS file with information specific to your data file, so the file name on its INCLUDE line must match the name of the SPSS file exactly.
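Thus, you need three files. The two necessary syntax files are:
Ridge regression.sps (the generic Ridge Regression syntax file supplied by SPSS)
mrdemo_Ridge.sps (the "interface" file that describes your data set)
You also need a data file amenable to an ordinary regression analysis. For this example, we will use the data file that was used in the Regression lab (MRDEMO.sav).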
Though the location and association of these files can be set up in a variety of ways, we will put all of these files in the SAME DIRECTORY. So your first task is to copy the two "sps" files (note that this is the suffix used by SPSS for syntax files) to a directory. These are available in this Ridge Regression folder (these instructions are in that folder). Next, copy your regression data file (we will use MRDEMO.sav here) to the same directory.
The steps to accomplish the Ridge Regression are:
1. Open the data file (mrdemo.sav) using SPSS,
2. Open (double-click) the syntax file with information about the data set (mrdemo_Ridge.sps),
3. Modify mrdemo_Ridge.sps to fit your data,
4. In the syntax window, click Run/All -- your output should follow.
Step 3 deserves some elaboration. The mrdemo_Ridge.sps file is:
INCLUDE 'Ridge regression.sps'.
RIDGEREG DEP=gpa /ENTER = greq to ar
/START=0 /STOP=1 /INC=0.05.
You need to change two parts of the RIDGEREG command (highlighted in red in the original handout): the dependent variable after DEP= and the predictor list after /ENTER =. Thus, simply change "gpa" to your dependent variable name and "greq to ar" to a list of your predictors. The "to" convention (every variable from the first one named through the last, in file order) can be used, or you may simply list all of the names; here, "greq grev mat ar" is equivalent.
The output (below) shows both the change in the βs (the Ridge "trace") and the change in R² as a function of increasing "k," starting with a k of zero (the ordinary least squares solution) and going up to a k of 1.0 in increments of .05. (The starting, stopping, and increment values can be changed in mrdemo_Ridge.sps.)
R-SQUARE AND BETA COEFFICIENTS FOR ESTIMATED VALUES OF K

      K      RSQ      GREQ      GREV       MAT        AR
 ______   ______  ________  ________  ________  ________
 .00000   .64037   .323476   .211237   .321946   .202255
 .05000   .63999   .309608   .211463   .308098   .206151
 .10000   .63896   .297807   .210701   .296307   .208121
 .15000   .63742   .287519   .209291   .286023   .208795
 .20000   .63547   .278380   .207446   .276887   .208572
 .25000   .63317   .270147   .205308   .268654   .207716
 .30000   .63057   .262644   .202974   .261153   .206408
 .35000   .62771   .255744   .200511   .254254   .204773
 .40000   .62464   .249352   .197967   .247865   .202905
 .45000   .62137   .243394   .195379   .241909   .200869
 .50000   .61794   .237812   .192770   .236331   .198715
 .55000   .61437   .232560   .190161   .231084   .196481
 .60000   .61068   .227601   .187565   .226130   .194194
 .65000   .60688   .222902   .184992   .221437   .191876
 .70000   .60301   .218438   .182450   .216980   .189545
 .75000   .59906   .214188   .179945   .212737   .187213
 .80000   .59505   .210131   .177480   .208689   .184890
 .85000   .59099   .206253   .175059   .204819   .182583
 .90000   .58690   .202538   .172683   .201113   .180299
 .95000   .58278   .198975   .170355   .197559   .178042
 1.0000   .57864   .195553   .168073   .194146   .175816
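For readers curious how a table like the one above arises, here is a minimal Python sketch of standardized ridge weights for the two-predictor case. It assumes the weights solve (Rxx + kI)b = rxy, where Rxx is the predictor correlation matrix and rxy holds the predictor-criterion correlations, and it computes R² as 2b'rxy - b'Rxx b. The correlations are hypothetical and deliberately collinear, the function name is made up for illustration, and nothing here is taken from the SPSS macro itself.

```python
def ridge_two_predictors(r1y, r2y, r12, k):
    """Standardized ridge weights and R-squared for two predictors.

    Solves (Rxx + k*I) b = rxy, where Rxx is the 2x2 predictor
    correlation matrix and rxy holds the predictor-criterion correlations.
    """
    d = 1.0 + k                      # diagonal of Rxx + k*I
    det = d * d - r12 * r12          # determinant of the 2x2 matrix
    b1 = (d * r1y - r12 * r2y) / det
    b2 = (d * r2y - r12 * r1y) / det
    # R-squared implied by arbitrary standardized weights b:
    # 2*b'rxy - b'Rxx b  (this equals b'rxy when k = 0).
    rsq = 2 * (b1 * r1y + b2 * r2y) - (b1 * b1 + b2 * b2 + 2 * b1 * b2 * r12)
    return b1, b2, rsq


# Hypothetical, highly collinear correlations (not from MRDEMO.sav).
R1Y, R2Y, R12 = 0.60, 0.50, 0.95

print("     k      RSQ       B1       B2")
for step in range(0, 21, 4):         # k = 0, .20, .40, ..., 1.0
    k = step * 0.05
    b1, b2, rsq = ridge_two_predictors(R1Y, R2Y, R12, k)
    print(f"{k:6.2f}  {rsq:7.5f}  {b1:7.4f}  {b2:7.4f}")
```

With predictors this collinear, the k = 0 weights are large and opposite in sign, and they settle quickly as k grows while R² falls. That is a much more dramatic trace than the MRDEMO output, where the VIFs are small.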
The idea in ridge regression, at least as originally proposed by Hoerl and Kennard, is one of compromise: we find a small constant, k, that appreciably increases the stability of the weights (equivalently, decreases their variability, which you can see happening in the Ridge Trace) while not decreasing the R² by too much. Hoerl and Kennard proposed a "Ridge Trace" in which the βs are plotted as a function of k. The notion was that we could look at the plot and make some decision about a k that seemed reasonable. At the same time as the βs are "calming down," however, the R² is necessarily decreasing (as the OLS βs guarantee us the maximum R²). So SPSS produces both the Ridge Trace and a plot of R² as a function of k. There are more advanced ways of looking at this, but this is the basic and original method.
The truth is that the data set used here really didn't need ridge regression in the first place: the VIFs were all below 2.0, so we see no drastic improvement in the variance of the βs, although there is also little loss in R². If pressed, we might use 1/F (where F is the F-ratio for the OLS regression), which has been suggested as an analytic method of obtaining k. In the case of these data, this would be k = 1/11.129 ≈ .09, or roughly .1. Looking at the values calculated for us by SPSS, using k = .1 does little damage to the R² (a reduction of only about .00141, from .64037 to .63896) and might be argued to provide some stability to the βs. For your data set, the situation may be different.
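The 1/F arithmetic can be checked with a short Python sketch. It uses the F-ratio quoted in the text and the first few R² values from the output above; the helper name nearest_grid_k is made up for illustration, and snapping k to the nearest value on the .05 grid is just one convenient way to read the table.

```python
# R-squared values copied from the RIDGEREG output above (k -> RSQ);
# only the first few rows are needed for this illustration.
RSQ_BY_K = {0.00: 0.64037, 0.05: 0.63999, 0.10: 0.63896,
            0.15: 0.63742, 0.20: 0.63547}

F_OLS = 11.129  # F-ratio of the OLS regression, as reported in the text


def nearest_grid_k(k, grid):
    """Return the grid value closest to k."""
    return min(grid, key=lambda g: abs(g - k))


k_exact = 1.0 / F_OLS                       # analytic suggestion for k
k_grid = nearest_grid_k(k_exact, RSQ_BY_K)  # closest k that SPSS tabulated
loss = RSQ_BY_K[0.00] - RSQ_BY_K[k_grid]    # R-squared given up versus OLS

print(f"k = 1/F        = {k_exact:.4f}")
print(f"nearest grid k = {k_grid:.2f}")
print(f"R-squared loss = {loss:.5f}")
```

For your own data, replace F_OLS and the R² values with those from your own output.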