GUIDE FOR GRADUATE STUDENTS ASSEMBLING EMPIRICAL RESEARCH PAPERS
Ralph's (arbitrary) Rules for Paper Assembly
Ralph B. Taylor
Department of Criminal Justice
(latest revision 2/2/2016)
Comments welcome: tuclasses at gmail.com
This page seeks to provide boiler-plate guidance for assembling and organizing quantitatively-based, empirical research papers of journal submission length. This is probably not helpful for theoretical review papers, or for empirical papers based on qualitative approaches.
It is oriented toward graduate students.
The format suggested here is provided as a guide only.
These suggestions do NOT cover formatting. Details on formatting can be found in many journals. For example the Criminology guidelines can be found at
The guidelines for authors put out by the Journal of Criminal Justice have some interesting and thoughtful points. See:
The body of your paper, including title page and abstract, should be in the range of 25-30 pages of text. Endnotes, references and tables are on top of that. You should keep endnotes to a minimum, using these only to explain technical points not of interest to the general reader, or to describe results not shown in detail. If your paper is 15-20 pages it is probably leaving out a lot of important stuff. Ask someone to look at it.
If you are presenting a rather complex study you may find you need to stick to the main points, and summarize points that are not centrally relevant.
In addition to section headings, use subheadings. In a methods section, for example, you might have subheadings like: participants, procedures, data sources, analysis plan, and variables.
Remember, always use past tense. By the time you get to writing up the results, the study has been done, the data were collected, and the results showed this and that.
DESCRIPTIONS OF SECTIONS
Title page includes paper title and author(s) and their affiliations, followed by contact information. A note at the bottom of the page should acknowledge contributions from those reviewing earlier drafts, should acknowledge funding sources (if appropriate), data sources like ICPSR or elsewhere (if appropriate), and should report if earlier versions were presented at regional or national conferences. The note at the bottom ends with the first author's address and contact information including email address.
Think hard about your title. It should be specific and informative and pique the reader's interest. Have the title link in a memorable way to a question you are asking or a finding that you have produced.
You want to work harder on the abstract than on any other page of the paper. It should start with a key question posed. The body should explain how what you did linked to earlier work, what data sources you used, what your key findings were, and what they mean. All in less than 200 words. A now-deceased former professor, Clinton B. DeSoto, had a line about abstracts that went like this: try and have one line in there which is theoretical and thoughtful and moves the reader beyond the immediate questions of the study, helping him/her see bigger connections. Good advice. I should be more conscientious about following it. Make your abstract good and you will avoid the following problem.
Traditionally the introduction is the hardest section for students to write. The reason it is hardest, I think, is that it asks you to do two challenging things: organize the work in an area by highlighting the main threads pursued and questions asked; and drill down to very specific -- but not nitpicking -- criticisms of the work which has come before. It should be no more than 8-12 pages in length.
It requires, therefore, a lot of thought.
Organizing. You need to think enough about the area to organize it. A series of paragraphs starting
"Abel (2003) found that the widgets were over produced when times were hard..."
"Cain (1999), interestingly, found that widgets are smaller than they used to be..."
"Horatio and Alger (2004) asked a slightly different question..."
will send all your readers scrabbling for the scotch and soda. The above is nothing more than a series, with unclear connections between the different paragraphs, and no sense of leading the reader anywhere.
Once you have thought enough about an area to organize it, then use those ideas as subheadings. Begin each section with an opening point and end with a summary point.
WHAT TYPE OF QUESTION ARE YOU ASKING?
A successful introduction clearly states, early on, the type of question your work addresses and why it is important. The criticisms you seek to avoid from the jaundiced reviewer of the article you submit are: "what is interesting here is not new, and what is new here is not interesting."
There are basically four types of "new and interesting" questions your project (or anybody's) can target.
1. There is a gap. Here is something that people have overlooked. The oversight may be on the outcome side, or it may be on the predictor side. AND the gap is important for at least a couple of reasons.
2. There are one or more flaws to be fixed. People have looked at the effect of A, B and C on Y, but the studies that have been done are incomplete (explain), seriously flawed (explain why), or misleading (explain why), and you are going to fix this (explain how).
3. There are two (or more, but be careful as you expand) theoretical perspectives on how/whether/why A, B and C affect Y. But the work to date has failed to pit these two (or more) theoretical frames against one another to see which one provides a more adequate explanation of the variation in Y. Your study is designed to provide this test of the competing theoretical perspectives while being fair to both of them. This is called a "strong inference" setup. See: Platt, J. R. (1964). Strong inference. Science, 146(3642), 347-353.
4. There is some type of intervention or program or policy that has been implemented, and it is important, and we do not yet know whether it is working or, if it is working, how or how well it is working; therefore we need to evaluate it. My study will do this.
Criticizing. Your introduction must contain some criticism of earlier work, and these criticisms must be specific, sensible, and non-trivial. Say you find that work in this area has failed to consider the effects of a spatially lagged outcome on the predictors of fear of crime. That is an important and cogent point. Point to examples of studies where it has been left out, and where it would have made sense to include it. Explain to the reader how, lacking that information, studies might be misleading.
Pointing out directions the work has not yet explored is part of the picture, and if your study is going to go in new and uncharted directions .... to boldly go .... then also tell the reader why this might be important.
All things methodological. If there are methodological limitations of past studies -- e.g., they forgot to think about error covariances -- and these are relatively trivial points, these can be included as a motivator of your study, but they should not be the main motivator. Beware the study that is primarily methodological. It is unlikely to get accepted. You need theoretical touchstones. Not a lot. But they need to be clear.
Theory, theory, theory. The most important point about your study may be how it elaborates or tests or expands current theory, unless your paper is grounded theorizing, which is rather different. But if it is not grounded theorizing, it is crucial that your study highlight what theory is being addressed, and how your study will add an important piece. At the same time, you want to include some implications for policy and/or practice, if they seem potentially relevant (see below).
Or: Evaluation, evaluation, evaluation. If you are asking a type 4 question then you are likely to be evaluating a program or policy or practice. The most important point of your study may be that something worked, or that some piece of something worked, or something did not work, or something might work but it did not get implemented properly. That said, there are likely to be some theoretical implications for organizations, or implementation science, or something, and those deserve mention too.
Subsections. At the risk of being formulaic, your introduction should contain the following subsections:
An initial page or two which sets the stage. Think of it as the prologue. You want an excellent opening line, an excellent opening paragraph. Not something hackneyed: "Most studies show..." Kevin Wang and I opened a paper with "Urban alleys are often avoided even during daylight hours. This was not always so." This page or two will identify key theoretical ideas, and outline the contribution of your study in broad terms. It can tell the reader what sections follow next in the introduction.
Methods or Data
If your study is primary data analysis, tell the reader about
the sample or the respondents: who are they?
how did you get hold of them?
what specifically were they asked to do?
If your study is secondary data analysis, tell the reader
from what source were these data obtained?
who originally collected these data and for what purposes?
important features of the sample?
If these are administrative data:
who collected them?
for what purposes?
over what period of time?
using what types of categories?
with what types of reliability checks?
If these are survey data:
What was the sampling frame?
What was the sampling design?
When were the data collected, by whom, and under whose auspices?
What was the response rate? Is one of the standardized AAPOR response rates reported? http://www.aapor.org/Response_Rates_An_Overview1.htm
Give the original wording of questions and the response categories.
Explain for each specific variable: what is it and how does it work?
If you are doing multivariate work, there may be a large number of side issues that need to be discussed in this section: skewness of variables, data transforms done if needed, missing data, checks on multicollinearity, and the like.
Every methods section in an empirical paper should reference a table that includes descriptive information for each variable included in the analyses, starting with the outcome. Ideally, for each continuous variable the reader can see at a glance:
* n of cases
* standard deviation
For binary or multinomial variables the table explains what each category in the variable means, what the corresponding numerical code was, and the n and percent in each category.
Such a table is absolutely essential for helping the reader understand what you are doing.
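As a rough illustration of what goes into such a table, the Python sketch below computes the n, mean, standard deviation, minimum and maximum for one continuous variable, and the code, label, n and percent for each category of one binary variable. The variable names and values are invented for illustration only; they do not come from any real data set, and your statistics package will have its own way of producing the same numbers.

```python
from collections import Counter
from statistics import mean, stdev


def describe_continuous(values):
    """Return n, mean, SD, min and max, skipping missing (None) values."""
    valid = [v for v in values if v is not None]
    return {
        "n": len(valid),
        "mean": round(mean(valid), 2),
        "sd": round(stdev(valid), 2),  # sample SD (n - 1 denominator)
        "min": min(valid),
        "max": max(valid),
    }


def describe_categorical(values, labels):
    """Return (code, label, n, percent) for each category code in `labels`."""
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    return [
        (code, label, counts[code], round(100 * counts[code] / len(valid), 1))
        for code, label in labels.items()
    ]


# Hypothetical data, invented purely for illustration.
fear_index = [2.0, 3.5, None, 4.0, 1.5, 3.0]
owner = [1, 0, 0, 1, None, 1]

fear_row = describe_continuous(fear_index)
owner_rows = describe_categorical(owner, {0: "Renter", 1: "Owner"})

print(fear_row)
for row in owner_rows:
    print(row)
```

Note that the n is reported per variable, after missing values are dropped, so the reader can see how much data each variable actually contributes.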
Additional Comments for Graduate Students Doing Stat 2 Papers
You will want to either summarize in your own words, or quote directly from the primary source material, a description of the data collection procedures. People need to be able to understand: what was the sampling frame?; what was the sampling strategy?; what was the response rate?; how was the survey conducted -- telephone or in person for example?; what kinds of sections were there in the survey?
If there are any special analytical things you did before you got started with the analyses, either in terms of missing values, or special recoding, tell the reader about those.
If you have developed an index, tell the reader about each item that went into each index. Tell him/her about Cronbach's alpha. Be as specific as you can about how the index was constructed.
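If it helps to see what goes into Cronbach's alpha, here is a minimal Python sketch of the usual formula: alpha = (k / (k - 1)) * (1 - sum of item variances / variance of the total score), where k is the number of items. The function name and the item scores are hypothetical, and the sketch assumes complete data (no missing responses); in practice you would get alpha from your statistics package.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of scores
    (one score per respondent, same respondent order in every item)."""
    k = len(items)
    if k < 2:
        raise ValueError("An index needs at least two items.")
    # Total score for each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses from five respondents to three items.
item_a = [1, 2, 3, 4, 5]
item_b = [2, 2, 3, 5, 5]
item_c = [1, 3, 3, 4, 4]

alpha = cronbach_alpha([item_a, item_b, item_c])
print(round(alpha, 3))
```

Whatever tool you use, report which items went into the index and how they were combined alongside the alpha itself.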
If you are using the crime data, be absolutely clear about the time period covered, the offense in question, and the rate.
Your table needs to be a formatted, word-processed document, not just a bunch of patched-together SPSS printout. Each table needs to be totally self-sufficient: each variable is clearly and fully explained.
Be sure you can clearly explain the weighting variable; you also might want to report on the range of weights applied.
You want to provide the reader with a rationale for the analyses you use - what are the reasons that HLM is being used here, and why is it better than another approach - do not need a lot here, just a short
Walk the reader through your results. Be specific but not tedious. Tell the reader what is significant, which direction impacts are going, and help the reader interpret these impacts. For example: "The difference between white and nonwhite respondents on the sense of community index, after controlling for other predictors, was .5, with whites reporting significantly weaker sense of community. This contradicts previous literature but aligns with the neighborhood changes taking place in this part of the city during the survey period."
When you are discussing a specific table, tell the reader in the text which table the results can be found in.
If impacts are not statistically significant, they need not be mentioned in the results section unless there is something really really surprising about their being non-significant. Non-significant means essentially zero. Sometimes there is a temptation to make a big deal about things which did NOT come out. The mantra here is: it is always extremely hazardous to make inferences from negative results (null findings). The reason? Because there can be so so many reasons why things did not come out.
You can end this section with a summary of key results.
Discussion - at least 5 pages
A discussion is NOT simply a re-hashing of results. Rather, what it does is look back and look ahead. Again: organize this section. Think about main points and use those to organize your material.
Start with a brief summary of your main findings, if you did not end the results section with such.
Look back. Return to each of the major theories or major questions introduced in the first part of the paper. Revisiting each one: how does each look different in light of the new information you have gathered? Imagine you are looking at characters at the beginning of the 4th act of a play. The 4th act, naturally, comes after the 3rd act, which contains all the dramatic action. Which characters look triumphant ("results seem to provide a robust expansion of the Gromit theory in the following ways")? Which ones look bedraggled ("Although the Wallace theory predicted large impacts of X on Y1 and Y2, those did not emerge here")? How or in what ways is each of these theories altered by the results which have been presented since the introduction?
Look ahead. Given what you know now, what are the next steps? Do NOT just end the discussion with a vague "clarion call for further research." Do NOT just give lists of ways the generalizability could be tested. (In fact, don't mention generalizability at all, because this is an empirical question for future research. See p. 164 in Taylor (1994), Research Methods in Criminal Justice (McGraw Hill).) Instead, be specific, and lay out specific avenues which need to be investigated in the future, and explain why each of these avenues may be important. Imagine you were going to be researching this problem for the next five years. In a nutshell, what will you be pursuing and why?
You also want to honestly acknowledge study limitations. My preference is to list the limitations, but then also remind the reader of the study strengths immediately following.
It is easy to think that the lack of external validity is a study limitation. If you say this, which many commonly do, you will show that you do not understand external validity. External validity is always an empirical question. Which means before you try and see whether or not the results replicate, and before you fail to replicate, there is no limitation. This is an important and widely misunderstood point.
End with a strong summary paragraph.
References should be formatted, preferably using something like Endnote. If you are not using Endnote or Reference Manager you are wasting a lot of time. Learn how to use this tool. Follow the format of the journal to which you are submitting.
Do NOT use Word table templates. These just put in lines which are hard to get out. I STRONGLY recommend using Excel to build the table. The main advantage is you can alter things like how many decimal places to show. It also allows you to get results in easily and in a way you can edit them from there.
If you are using Stata you can get tables sent directly in formatted fashion, with only minor tweaking required after, using outreg2.
Each table should start on a separate page.
Some arbitrary rules:
NEVER re-key results from a table, with one exception. Your table should report anywhere from 1 to 5 p levels if these are being reported: p < .05; p < .01; p < .001 are the most typical. E.g., * = p < .05; ** = p < .01 and so on. You can re-key probability levels from tables so they match these levels. So .038 becomes < .05. NOTE - some journals ask for specific p levels. If they do not, round. It makes your table easier to read.
NEVER report .000 as a probability level. Think about it for a second.
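The two rules above about p levels can be sketched in a few lines of Python. This is only an illustration of the rounding convention, using the three typical thresholds named above; the function names are made up, and the "ns" label for non-significant results is an assumption, since journals differ on how (or whether) to mark those. A printed .000 is just a very small value that the software rounded, so the sketch reports it as < .001 instead.

```python
# Thresholds and star codes follow the convention described above:
# * = p < .05, ** = p < .01, *** = p < .001.
LEVELS = ((0.001, "< .001", "***"), (0.01, "< .01", "**"), (0.05, "< .05", "*"))


def p_label(p):
    """Round an exact p value to a reporting level; never returns '.000'."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    for threshold, label, _stars in LEVELS:
        if p < threshold:
            return label
    return "ns"  # assumed label for non-significant results


def significance_stars(p):
    """Return the star code for an exact p value ('' if not significant)."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    for threshold, _label, stars in LEVELS:
        if p < threshold:
            return stars
    return ""


# Hypothetical p values, as they might appear in software output.
for p in (0.038, 0.004, 0.0000002, 0.21):
    print(p, p_label(p), significance_stars(p))
```

If the journal asks for specific p levels, skip the rounding and report the value as given, but still write a value that prints as .000 as < .001.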
ALWAYS include an informative table title telling the reader in some detail about what is happening.
The first table should have solid descriptive information about respondents and/or key variables: minima, maxima, means and standard deviations at a minimum. If someone else seeks to replicate your study, this is one of the first things they will cross check.
SPECIAL STATA NOTE: There are all kinds of ways you can pull tables from Stata output files, directly into Excel, WITHOUT rekeying. LEARN ABOUT OUTREG2
Nothing special to say about figures here because there is SO much to say - just be sure they are good and legible. You want clear titles, clear legends. Data source should be indicated on the figure legend. If the figure is based on weighted data, that should be indicated as well.
Constructing clear figures that communicate well requires careful thought and attention to detail and decluttering. For a brilliant discussion of the latter see:
"An Economist's Guide to Visualizing Data" by Jonathan Schwabish of the U.S. Congressional Budget Office, downloadable from
Some Specific Additional Comments for Graduate Students Working in the Statistics II Course and Doing Multilevel Papers with Secondary Data Analyses and Doing Maps
If you are going to include maps, be sure there is lots of information on the map so the reader can clearly see: what are the organizing spatial units, what is the region shown, what is the variable being mapped, and what are the groupings used for the variable being mapped.
The mapping program many of you are using seems to have a default option for grouping when you request a choropleth map such that "natural breaks" in the numbers are sought, and used to decide how to define the levels of the variable depicted. NOTE that this will RARELY result in groupings which have an equal number of neighborhoods or police districts in each grouping. For example, under the "natural breaks" option you may have only one neighborhood or police district in the highest group, or only one in the lowest group. The advantage of natural breaks is that there is at least some separation, on the variable, between the neighborhoods/districts in the different groups. So you do not have tiny differences between where one group leaves off and the next begins, in terms of scores on the variable. The disadvantages are twofold. First, the breaks are specific to the data set itself. There is nothing theoretically meaningful about the groupings chosen. Second, it is very unlikely under this scenario that you will obtain roughly equal numbers of neighborhoods or districts in each group. By contrast, if you were mapping the neighborhoods or districts by quartiles, which amounts to four roughly equal groups, the middle two groups would contain about half of your neighborhoods (around 22) or districts (around 11 or 12). This would then correspond to the interquartile range -- scores from the 25th percentile to the 75th percentile -- at the ecological level. This is theoretically a pretty useful description. If you were mapping by quintiles (5 about equal size groups) then the middle three groups would correspond to about 60% of your Level 2 units, which represents, roughly, the mean +/- one standard deviation. This also is theoretically useful. But most importantly: tell the reader which option you have chosen.
Whichever option you do choose, be sure to label the real limits for each interval.
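To make the quartile option concrete, here is a small Python sketch that splits a set of scores into roughly equal-sized groups and reports the lowest and highest observed score in each group, which is the kind of information the map legend should carry. It is not the mapping program's own routine: the function name is made up, the district rates are invented, and reading "real limits" as the observed lower and upper score in each group is an assumption made for this illustration.

```python
from statistics import quantiles


def quantile_groups(values, n_groups=4):
    """Split values into n_groups roughly equal-sized classes.

    Returns a list of (lower, upper, count) tuples, where lower and upper
    are the smallest and largest observed scores in each class.
    """
    cuts = quantiles(values, n=n_groups, method="inclusive")
    groups = [[] for _ in range(n_groups)]
    for v in sorted(values):
        # A score belongs to the class numbered by how many cut points it exceeds.
        idx = sum(v > c for c in cuts)
        groups[idx].append(v)
    return [(g[0], g[-1], len(g)) for g in groups if g]


# Hypothetical rates for 12 police districts, invented for illustration.
rates = [3, 5, 8, 9, 12, 14, 15, 18, 21, 25, 40, 95]

breaks = quantile_groups(rates, n_groups=4)
for lower, upper, count in breaks:
    print(f"{lower} to {upper}: {count} districts")
```

With these made-up rates each quartile holds three districts, even though the top group runs all the way from 25 to 95; a natural-breaks grouping would more likely set the district at 95 apart on its own. Setting n_groups=5 gives quintiles instead.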
Building Your Toolkit
Becoming a skilled social science writer is a lifelong challenge. I cringe when I go back and read either my dissertation or articles written soon after I received my doctorate. They are uniformly horrible. In my graduate program, there was no specific instruction on how to mold research papers, so it took a while to figure this out. Since that time, we have had an explosion of books on topics like "Your dissertation in 15 minutes a day" and the APA manual of style has bloomed from about 100 pages to a massive doorstop.
You need to be reading about how to write well and how to write productively. (Remember my two standards tape?) So get started now.
Just a couple of short things to get you started.
If you are looking for something practical and readable, you MUST read:
Becker, H. S. (2007). Writing for Social Scientists: How to Start and Finish Your Thesis, Book, or Article (2nd Edition). Chicago: University of Chicago Press.
Candace McCoy uses this in the writing course she teaches in the John Jay doctoral program.
Also strongly recommended, for thoughts not only about the content of writing but also about developing a personal (and perhaps social) style of writing, is:
Silvia, P. J. (2007). How to Write a Lot. Washington, DC: American Psychological Association.
Silvia focuses mostly on writing papers in psychology, but many of the same lessons apply about how to organize specific sections of papers.
Finally, although it was written for high school students writing essays, I found the following most helpful:
Payne, L. V. (1969). The Lively Art of Writing. New York: New American Library.
I have pdfed some pages about organizing the opening of an essay and you can find them here: http://www.rbtaylor.net/payne_33_55.pdf