TO: Students in CJ 330
FROM: R. B. Taylor
DATE: 11/22/99
RE: Processing your data: 11/22/99 LAB
GOALS today
1. If you have data spread across several spreadsheets, get those data onto one spreadsheet.
2. After you have imported the data into SPSS (covered in the last lab), you want to ADD VARIABLE AND VALUE LABELS so you know what is what.
3. You want to look at the FREQUENCY DISTRIBUTIONS of each variable AND CHECK FOR OUT OF RANGE VALUES.
4. Make corrections as needed.
5. Re-run frequencies
6. Run a HISTOGRAM
7. Run a CLUSTERED BAR CHART
GETTING DATA ONTO ONE SPREADSHEET
Let's say you have your data spread out across three different spreadsheets. Each spreadsheet has different data units coded. The three files are:
al.wks
tom.wks
suzie.wks
* Open one file in Excel, e.g., al.wks.
* Go to SAVE AS and give it a different name, such as BIGTRYAA.wks. Be sure to save it in Lotus format.
* Also open up the two other spreadsheets.
* Go to WINDOW, go to tom.wks, and highlight all the data rows with the mouse (hold and drag). Be sure you get all the columns and all the rows that have data in them - you do NOT want the first row because that has variable names, remember?
* Click EDIT, COPY
* Go to WINDOW, get back to BIGTRYAA.
* Click mouse in FIRST column of FIRST row BELOW where you have data already entered.
* Click EDIT, PASTE. The data should appear.
* REPEAT THE OPERATION for the third spreadsheet file, suzie.wks.
* SAVE BIGTRYAA once all the data are in - you want to be sure to save it in 1-2-3 Lotus format.
Import the data into SPSS - we covered this last lab. Once you have it imported, SAVE the SPSS data file - BIGTRYBB, for example. Be sure to save it on your A: floppy.
VARIABLE AND VALUE LABELS. A variable label is a short text description - 20 to 30 characters - of what your VARIABLE NAME means. You need and want this because it will result in better labeled printout.
* Have SPSS running with your data file open in the window.
* Double click on the variable. The whole column should black out, and a box should come up. Click on LABELS.
* Under variable label, write in a short description.
* For VALUE LABELS you are telling it what each numeric code means. The steps are:
- put number value in
- write in text for label
- click on ADD. When you are all done, you click on CONTINUE.
* If you have any special missing value codes, go to missing values and enter those.
* Click on OK and you are done for that variable.
NOTE: if you are pressed for time, you do not need to enter value labels if you consistently use the values the same way, for example, if 0 = NO and 1=YES, just enter these labels for the first 0/1 variable.
NOTE: there are easier and faster ways to do this through a syntax box. You type the code into the syntax box, highlight it, and run it. Ask me if you want a hint. The code would look like this:
VARIABLE LABELS
  NKILLED 'Number killed in episode'
  NATTACKS 'Number of physical attacks'
  .
VALUE LABELS
  NKILLED 0 'None'
          50 'Fifty OR MORE'
  /GENDER 0 'Male'
          1 'Female'
  .
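The same approach can cover the two shortcuts mentioned above: giving several 0/1 variables the same value labels, and declaring a special missing value code. This is only a sketch. ARREST and WEAPON are made-up variable names, and 99 is a made-up missing value code; substitute whatever you actually used.

```spss
* Give several 0/1 variables the same value labels in one step.
* ARREST and WEAPON are made-up names for illustration.
VALUE LABELS
  ARREST WEAPON 0 'No'
                1 'Yes'.
* Declare a special missing value code (99 is only an example).
MISSING VALUES NKILLED (99).
```

Listing several variables before the values applies the same labels to all of them, so you only type the labels once.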
FREQUENCIES.
HOW TO USE THIS INFORMATION. First you want to check your values. If there are any values that are out of range you will need to go back to your original coding sheet for that case and fix it. VALUES THAT ARE OUT OF RANGE will mess you up later on. For example, if your variable is GENDER and the permissible values are 0 and 1, then a 2 or a 3 is a problem. Second, you can look at the frequency distribution for each variable: how many cases do you have at each value? What percent of cases do you have at each value? What percent of cases fall below a certain value?
* click on Statistics or Analyze (whichever one your version of SPSS shows)
* click on Descriptives
* click on Frequencies
* drag mouse, and highlight all your variables on the left hand side, then click on right pointing arrow so they all end up in right hand box.
* click on Statistics and then click boxes for mean, median, min, max, then click on continue.
* click on ok.
* Lots of numbers will fly by. You now have created a listing file. It LISTS your OUTPUT or RESULTS. Try and print it out. Remember your terminal number.
BUT BEFORE YOU PRINT IT OUT save it to your A: floppy! Better yet, save it on two floppies!
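If you would rather use the syntax box, the frequencies run above can be sketched roughly like this. It assumes your file has the example variables NKILLED, NATTACKS, and GENDER; put your own variable names in their place.

```spss
* Frequency tables plus mean, median, minimum, and maximum.
FREQUENCIES VARIABLES=NKILLED NATTACKS GENDER
  /STATISTICS=MEAN MEDIAN MINIMUM MAXIMUM.
```

The minimum and maximum are the quickest check for out of range values: if GENDER should be 0 or 1 and the maximum is 7, you have a coding error to track down.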
FIXING CODING ERRORS. If you see a value for a variable that is out of range,
* flip back to the data sheet, and looking down the column for the variable, find the offending value.
* making a note of the row number, scroll all the way to the left so you can see what the ID number is for that offending case
* go to the ORIGINAL coding sheet for THAT case. IF a valid value is on the coding sheet, but not in the data file, fix the data file. IF a valid value is NOT on the coding sheet (e.g., permissible values were 0 and 1, and the coding sheet has a 7), you have no choice but to go back to the data file, take out the incorrect value, and leave it empty. SPSS will treat that as a missing value.
AFTER you have fixed all your coding errors for all your variables, SAVE YOUR SPSS FILE AGAIN with a new filename, e.g., BIGTRYCC. Be sure to save it on at least two floppies.
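The same hunt-and-fix steps can be sketched in the syntax box. This is an illustration under several assumptions: your ID variable is named ID, the problem variable is GENDER with permissible values 0 and 1, and cases 17 and 23 are made-up examples of the two situations described above.

```spss
* List the ID of every case with an out of range GENDER value.
TEMPORARY.
SELECT IF (GENDER NE 0 AND GENDER NE 1).
LIST VARIABLES=ID GENDER.

* Made-up case 17 has a valid value of 1 on the coding sheet.
IF (ID = 17) GENDER = 1.
* Made-up case 23 has no valid value on the coding sheet, so make it missing.
IF (ID = 23) GENDER = $SYSMIS.
EXECUTE.

* Save the corrected file under a new name.
SAVE OUTFILE='A:\BIGTRYCC.SAV'.
```

TEMPORARY makes the SELECT IF apply only to the LIST command, so no cases are dropped from your file.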
LOOKING AT A HISTOGRAM.
* Click on Graph
* Click on Histogram
* highlight the variable you want and put it in right hand box.
A chart will be added to your output file. The height of each bar shows you how many cases you have at each value of the variable.
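In the syntax box, a histogram request looks roughly like this. NKILLED is just the example variable from earlier in this memo; use your own.

```spss
* Histogram of one variable.
GRAPH /HISTOGRAM=NKILLED.
```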
LOOKING AT A CLUSTERED BAR CHART
Suppose that you think scores on variable Y differ based on variable X (e.g., crime news is more likely to be the lead story on the 10:00 or 11:00 news than on the 5:00 or 6:00 news). If you have a variable LATENEWS (coded 0 for 5 or 6 and 1 for 10 or 11) you can see how scores on an outcome vary depending on the LATENEWS variable.
* click on Graph, then click on Bar
* click on picture for simple
* click on summaries for groups of cases
* put your splitter variable (LATENEWS) into the category axis box
* click on other summary function
* click on dependent variable (Y)
* click right arrow to put in right hand box
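The menu steps above can be sketched in the syntax box as follows. Y stands for whatever your outcome variable is called, and this assumes the summary you want is the mean of Y within each LATENEWS group. STATION in the second command is a made-up second grouping variable, shown only to illustrate what a clustered version would look like.

```spss
* Simple bar chart: mean of Y for each value of LATENEWS.
GRAPH /BAR(SIMPLE)=MEAN(Y) BY LATENEWS.

* Clustered version: bars for each STATION within each LATENEWS group.
* STATION is a made-up variable name.
GRAPH /BAR(GROUPED)=MEAN(Y) BY LATENEWS BY STATION.
```

If Y is coded 0/1, the mean of Y in each group is the proportion of cases coded 1, which makes the bars easy to compare.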