Mapping codes to levels for factors imported from SPSS data in R
R, from the R Project, is a great, free statistical software environment that can do many things. However, some tasks can be a bit of a pain to figure out.
For example, if your collaborators have their data in SPSS and they give you a .sav file, you might want to know the mapping from codes (or values) to level names (semantic information) for categorical variables (called "factors" in R). When read.spss converts a labeled variable to a factor, the factor's internal integer codes need not match the original SPSS values, but you might want to know those original values so that you can better interoperate with your SPSS-using collaborators.
Fortunately, the object returned by the read.spss function carries an attribute called label.table that contains this mapping, with one entry per labeled variable.
Here's an example of how to access this mapping. Say you have an SPSS .sav file called data.sav and the categorical variable in question is INCOME.
> #Load the foreign package, read in the .sav file
> library(foreign)
> dat <- read.spss("data.sav")
>
> #Optionally attach the object to access its variables by name (not needed for the mapping itself)
> attach(dat)
>
> #Save the mapping for INCOME in a variable called map
> map <- attr(dat,"label.table")$INCOME
>
> #Access the codes with as.integer()
> as.integer(map)
[1] 97 96 95 94 93 92 91 90 87 86 85 84 83 82 81 80 17 16 15 14
13 12 11 10 9 8 7 6 5 4 3 2 1
>
> #Access the labels with names()
> names(map)
[1] "RF - 75-150K" "RF - gt 75K" "RF - 35-75K" "RF - gt 35K"
"RF - 15-35K" "RF - lt 35K" "RF - lt 15K" "Refused"
"DK - 75-150K" "DK - gt 75K" "DK - 35-75K" "DK - gt 35K"
"DK - 15-35K" "DK - lt 35K" "DK - lt 15K" "Don't know"
"Over 150k" "Exactly 150K" "100-150k" "Exactly 100K"
"75-100K" "Exactly 75K" "50-75K" "Exactly 50K"
"35-50K" "Exactly 35K" "25-35K" "Exactly 25K"
"15-25K" "Exactly 15K" "10-15K" "Exactly 10K"
"Under 10K"
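
Once you have the codes and the labels, it can be handy to put them side by side in a lookup table and use it to translate labels back to the original SPSS codes. The following is only a minimal sketch: the map vector is a small hand-built stand-in for attr(dat,"label.table")$INCOME (five of the code/label pairs shown above), and the names lookup, income and codes are made up for illustration.

```r
# Hypothetical stand-in for attr(dat, "label.table")$INCOME:
# a named vector whose names are the labels and whose values are the codes.
map <- c("Refused" = "90", "Don't know" = "80",
         "10-15K" = "3", "Exactly 10K" = "2", "Under 10K" = "1")

# Build a code/label lookup table, sorted by code
lookup <- data.frame(code = as.integer(map),
                     label = names(map),
                     stringsAsFactors = FALSE)
lookup <- lookup[order(lookup$code), ]
print(lookup)

# An illustrative factor, like the one read.spss would give for INCOME
income <- factor(c("Under 10K", "Refused", "10-15K"), levels = names(map))

# Translate each observation's label back to its original SPSS code
codes <- lookup$code[match(as.character(income), lookup$label)]
print(codes)
```

With a real file you would replace the hand-built map with the one extracted above. Matching on as.character(income) rather than on the factor's internal integers is deliberate, since those internal integers are just positions in the level list and not the SPSS codes.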