
Chapter 2: Research Process and Research Methods

2.2 Stages in the Sociological Research Process

Sociological research consists of several stages. The researcher must first choose a topic to investigate and then become familiar with prior research on the topic. Once appropriate data are gathered and analyzed, the researcher can then draw appropriate conclusions. This section discusses these various stages of the research process.

Choosing a Research Topic

The first step in the research process is choosing a topic. There are countless topics from which to choose, so how does a researcher go about choosing one? Many sociologists choose a topic based on a theoretical interest they may have. For example, Émile Durkheim’s interest in the importance of social integration motivated his monumental study of suicide. Many sociologists since the 1970s have had a theoretical interest in gender, and this interest has motivated a huge volume of research on the difference that gender makes for behavior, attitudes, and life chances. The link between theory and research lies at the heart of the sociological research process, as it does for other social, natural, and physical sciences. Accordingly, this book discusses many examples of studies motivated by sociologists’ varied theoretical interests.

graphic of female and male figure along with photo of nonbinary individual

Over the course of the past several decades our knowledge of sex, sexuality, sexual orientation and gender has burgeoned. While research teaches us more every day, we are far from understanding these topics in simple or binary terms. Magda Ehlers – Pexels, Woman With White and Blue Floral Face Paint – Cotton Bro

Many sociologists also choose a topic based on a social policy interest they may have. For example, sociologists concerned about poverty have investigated its effects on individuals’ health, educational attainment, and other outcomes during childhood, adolescence, and adulthood. Sociologists concerned about racial prejudice and discrimination have carried out many studies documenting their negative consequences for people of color. As this book emphasizes, the roots of sociology in the United States lie in the use of sociological knowledge to achieve social reform, and many sociologists today continue to engage in numerous research projects because of their social policy interests. The news story discussed earlier in this chapter related to Oregon’s problems with poverty and hunger provides a good example of this type of research.

A third source of inspiration for research topics is personal experience. Like other social scientists (and probably also natural and physical scientists), many sociologists have had various experiences during childhood, adolescence, or adulthood that led them to study a topic from a sociological standpoint. For example, a sociologist whose parents divorced while the sociologist was in high school may become interested in studying the effects of divorce on children. A sociologist who was arrested during college for a political protest may become interested in studying how effective protest might be for achieving the aims of a social movement. A sociologist who acted in high school plays may choose a dissertation during graduate school that focuses on a topic involving social interaction. Although the exact number will never be known, many research studies in sociology are undoubtedly first conceived because personal experience led the author to become interested in the theory or social policy addressed by the study.

Conducting a Literature Review

Whatever topic is chosen, the next stage in the research process is a review of the literature. A researcher who begins a new project typically reads a good number of studies that have already been published on the topic that the researcher wants to investigate. In sociology, most of these studies are published in journals, but many are also published as books. The government and private research organizations also publish reports that researchers consult for their literature reviews.

Regardless of the type of published study, a literature review has several goals. First, the researcher needs to determine that the study they have in mind has not already been done. Second, the researcher needs to determine how the proposed study will add to what is known about the topic of the study. How will the study add to theoretical knowledge of the topic? How will the study improve on the methodology of earlier studies? How will the study aid social policy related to the topic? Typically, a research project must answer at least one of these questions satisfactorily for it to have a chance of publication in a scholarly journal, and a thorough literature review is necessary to determine the new study’s possible contribution. A third goal of a literature review is to see how prior studies were conducted. What research design did they use? From where did their data come? How did they measure key concepts and variables? A thorough literature review enhances the methodology of the researcher’s new study and enables the researcher to correct any possible deficiencies in the methodology of prior studies.

In “the old days,” researchers would conduct a literature review primarily by going to an academic library, consulting a printed index of academic journals, trudging through shelf after shelf of printed journals, and photocopying articles they found or taking notes on index cards. Those days are long gone. Now researchers use any number of electronic indexes and read journal articles online or download a PDF version to read later. Literature reviews are still a lot of work, but the time they take is immeasurably shorter than just a decade ago.

Formulating a Hypothesis

After the literature review has been completed, it is time to formulate the hypothesis that will guide the study. As you might remember from a science class, a hypothesis is a statement of the relationship between two variables concerning the units of analysis the researcher is studying. To understand this definition, we must next define variable and unit of analysis. Let’s start with unit of analysis, which refers to the type of entity a researcher is studying. As we discuss further in a moment, the most common unit of analysis in sociology is a person, but other units of analysis include organizations and geographical locations. A variable is any feature or factor that may differ among the units of analysis that a researcher is studying. Key variables in sociological studies of people as the units of analysis include gender, race and ethnicity, social class, age, and any number of attitudes and behaviors. Whatever unit of analysis is being studied, sociological research aims to test relationships between variables or, more precisely, to test whether one variable affects another variable, and a hypothesis outlines the nature of the relationship that is to be tested.

Figure 2.3 Causal Path for the Independent and Dependent Variable

graphic showing independent variable connected to and leading to the dependent variable

Suppose we want to test the hypothesis that women were more likely than men to have voted for Biden in 2020. The first variable in this hypothesis is gender, whether someone is a woman or a man. The second variable is voting preference (for example, whether someone voted for Biden or Trump). In this example, gender is the independent variable and voting preference is the dependent variable. An independent variable is a variable we think can affect another variable. This other variable is the dependent variable, or the variable we think is affected by the independent variable (see Figure 2.3 “Causal Path for the Independent and Dependent Variable”). When sociological research tests relationships between variables, it normally is testing whether an independent variable affects a dependent variable.


Test Yourself box listing hypotheses and asking you to pick out both the independent and dependent variables. Hypotheses include: the more time spent studying for an exam, the higher the grade earned; the higher the price of a car, the better the quality of the car; the higher the level of education attained, the more a person will earn over their lifetime; the more likes a post gets on social media, the more popular the poster; quitting smoking cigarettes will extend your life; and students who complete a college degree are more likely to have attended preschool.


Many hypotheses in sociology involve variables concerning people, but many also involve variables concerning organizations and geographical locations. As this statement is meant to suggest, sociological research is conducted at different levels, depending on the unit of analysis chosen. As noted earlier, the most common unit of analysis in sociology is the person; this is probably the type of research with which you are most familiar. If we conduct a national poll to see how gender influences voting decisions or how race influences views on the state of the economy, we are studying characteristics, or variables, involving people, and the person is the unit of analysis. Another common unit of analysis in sociology is the organization. Suppose we conduct a study of hospitals to see whether the patient-to-nurse ratio (the number of patients divided by the number of nurses) is related to the average number of days patients stay in the hospital. In this example, the patient-to-nurse ratio and the average number of days patients stay are both characteristics of the hospital, and the hospital is the unit of analysis. A third unit of analysis in sociology is the geographical location, whether it is cities, states, regions of a country, or whole societies. For instance, if a sociologist is interested in regional differences in death rates due to COVID-19, they may track the number of available ventilators, open hospital beds, or even the differential in types of hospitals (e.g., hospitals with and without ICU beds) located in rural versus urban areas. Another sociologist might be interested in how the economic hardships catalyzed by the 2007-2009 Great Recession, which resulted in the closure of many rural hospitals in the U.S., later impacted people in rural communities during the COVID-19 pandemic as they sought medical treatment.

image of sign stating "save our hospital"

Numerous rural hospitals have closed since the 2008 economic recession. Many factors have contributed to this, including aging facilities, low rates of insurance in rural populations and business decisions by corporate owners. Rural hospital closures result in low access to care, especially emergency care. Hyun Namkoong

Measuring Variables and Gathering Data

After the hypothesis has been formulated, the sociologist is ready to begin the actual research. Data must be gathered via one or more of the research designs examined later in this chapter, and variables must be measured. Data can be either quantitative (numerical) or qualitative (nonnumerical). Data gathered through a questionnaire are usually quantitative. The answers a respondent gives to a questionnaire are coded for computer analysis. For example, if a question asks whether respondents consider themselves to be politically conservative, moderate, or liberal, those who answer “conservative” might receive a “1” for computer analysis; those who are “moderate” might receive a “2”; and those who say “liberal” might receive a “3.”
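The coding step described above can be sketched in a few lines of Python. This is only an illustration: the numeric codes follow the example in the text, while the `CODES` mapping, the `code_responses` function, and the sample answers are hypothetical names and data invented for the sketch.

```python
# Numeric codes from the example in the text:
# conservative = 1, moderate = 2, liberal = 3.
CODES = {"conservative": 1, "moderate": 2, "liberal": 3}


def code_responses(answers):
    """Convert text answers to numeric codes.

    Blank or unrecognized answers become None so that they can be
    treated as missing data rather than silently miscoded.
    """
    return [CODES.get(answer.strip().lower()) for answer in answers]


# Hypothetical answers from five respondents (not real survey data).
raw_answers = ["Liberal", "moderate", "conservative", "Moderate", ""]
coded = code_responses(raw_answers)
print(coded)
```

Note that the codes here are only labels for categories; assigning a “3” to liberals does not mean they have three times as much of anything as conservatives.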

Data gathered through observation and/or intensive interviewing, research designs discussed later in this chapter, are usually qualitative. If a researcher interviews college students at length to see what they think about dating violence and how seriously they regard it, the researcher may make simple comparisons, such as “most” of the interviewed students take dating violence very seriously, but without really statistically analyzing the in-depth responses from such a study. Instead, the goal is to make sense of what the researcher observes or of the in-depth statements that people provide to an interviewer and then to relate the major findings to the hypothesis or topic the researcher is investigating.

The measurement of variables is a complex topic and lies far beyond the scope of this discussion. Suffice it to say that accurate measurement of variables is essential in any research project. In a questionnaire, for example, a question should be worded clearly and unambiguously. Take the following question, which has appeared in national surveys: “Do you ever drink more than you think you should?” This question is probably meant to measure whether the respondent has an alcohol problem. But some respondents might answer yes to this question even if they only have a few drinks per year if, for example, they come from a religious background that frowns on alcohol use; conversely, some respondents who drink far too much might answer no because they do not think they drink too much. A researcher who interpreted a yes response from the former respondents as an indicator of an alcohol problem or a no response from the latter respondents as an indicator of no alcohol problem would be in error.

As another example, suppose a researcher hypothesizes that younger couples are happier than older couples. Instead of asking couples how happy they are through a questionnaire, the researcher decides to observe couples as they walk through a shopping mall. Some interesting questions of measurement arise in this study. First, how does the researcher know who is a couple? Second, how sure can the researcher be of the approximate age of each person in the couple? The researcher might be able to distinguish people in their 20s or early 30s from those in their 50s and 60s, but age measurement beyond this gross comparison might often be in error. Third, how sure can the researcher be of the couple’s degree of happiness? Is it possible to determine how happy a couple is by watching them for a few moments in the mall? What exactly does happiness look like, and do all people look this way when they are happy? These and other measurement problems in this study might be so severe that the study should not be done, at least if the researcher hopes to publish it.


After any measurement issues have been resolved, it is time to gather the data. For the sake of simplicity, let’s assume the unit of analysis is the person. A researcher who is doing a study “from scratch” must decide which people to study. Because it is certainly impossible to study everybody, the researcher only studies a sample, or subset of the population of people in whom the researcher is interested. Depending on the purpose of the study, the population of interest varies widely: it can be the adult population of the United States, the adult population of a particular state or city, all young women aged 13–18 in the nation, or countless other variations.

Many researchers who do survey research (discussed in a later section) study people selected for a random sample of the population of interest. In a random sample, everyone in the population (whether it be the whole U.S. population or just the population of a state or city, all the college students in a state or city or all the students at just one college, and so forth) has the same chance of being included in the survey. The ways in which random samples are chosen are too complex to fully discuss here, but suffice it to say that the methods used to determine who is in the sample are equivalent to flipping a coin or rolling some dice. The beauty of a random sample is that it allows us to generalize the results of the sample to the population from which the sample comes. This means that we can be fairly sure of the attitudes of the whole U.S. population by knowing the attitudes of just 400 people randomly chosen from that population.
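A minimal sketch of the idea, assuming the researcher already has a complete list of the population (real surveys rarely do, which is part of why actual sampling procedures are more involved). The population of ID numbers, the `draw_random_sample` function, and the seed value are all hypothetical; the sample size of 400 follows the figure mentioned in the text.

```python
import random


def draw_random_sample(population, n, seed=None):
    """Draw n distinct members of the population.

    random.Random.sample selects without replacement, so every member
    has the same chance of ending up in the sample.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, n)


# Hypothetical population: ID numbers standing in for 10,000 people.
population = list(range(1, 10001))
sample = draw_random_sample(population, 400, seed=42)
print(len(sample), "people selected")
```

The seed is there only so that the sketch gives the same draw each time it is run; leaving it as `None` produces a fresh sample on every run.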

Other researchers use nonrandom samples, in which members of the population do not have the same chance of being included in the study. If you ever filled out a questionnaire after being approached in a shopping mall or on campus, you were likely part of a nonrandom sample. While the results of the study (marketing research or social science research) for which you were interviewed might have been interesting, they could not necessarily be generalized to all students or all people in a state or in the nation because the sample for the study was not random.

image of high school library

High school classes often are used as a convenience sample in sociological and other social science research. Pixabay Pexels

A specific type of nonrandom sample is the convenience sample, which refers to a nonrandom sample that is used because it is relatively quick and inexpensive to obtain. If you ever filled out a questionnaire during a high school or college class, as many students have done, you were very likely part of a convenience sample: a researcher can simply go into class, hand out a survey, and have the data available for coding and analysis within a few minutes. Convenience samples often include students, but they also include other kinds of people. When prisoners are studied, they constitute a convenience sample, because they are not going anywhere. Partly because of this fact, convenience samples are also sometimes called captive-audience samples.

Another specific type of nonrandom sample is the quota sample. In this type of sample, a researcher tries to ensure that the makeup of the sample resembles one or more characteristics of the population as closely as possible. For example, on a campus of 10,000 students where 60% of the students are women and 40% are men, a researcher might decide to study 100 students by handing out a questionnaire to those who happen to be in the student center building on a particular day. If the researcher decides to have a quota sample based on gender, the researcher will select 60 women and 40 men to receive the questionnaire. This procedure might make the sample of 100 students more representative of all the students on campus than if it were not used, but it still does not make the sample entirely representative of all students. The students who happen to be in the student center on a particular day might be very different in many respects from most other students on the campus.
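The student center example can be sketched as follows. The quotas of 60 women and 40 men come from the text; the `quota_sample` function and the stream of passersby are hypothetical, invented only to show the mechanics.

```python
def quota_sample(passersby, quotas):
    """Select people in the order they are encountered until every quota is filled.

    Selection depends on who happens to walk by, not on chance,
    which is why a quota sample is still a nonrandom sample.
    """
    remaining = dict(quotas)  # copy so the caller's quotas are not modified
    selected = []
    for person in passersby:
        group = person["gender"]
        if remaining.get(group, 0) > 0:
            selected.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # every quota has been met
    return selected


# Hypothetical stream of 300 students passing through the student center.
passersby = [
    {"id": i, "gender": "woman" if i % 2 == 0 else "man"} for i in range(300)
]
quotas = {"woman": 60, "man": 40}

chosen = quota_sample(passersby, quotas)
counts = {group: sum(p["gender"] == group for p in chosen) for group in quotas}
print(counts)
```

The sample matches the campus on gender, but nothing in the procedure guarantees it matches on anything else, which is the limitation the text describes.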

As we shall see later when research design is discussed, the choice of a design is very much related to the type of sample that is used. Surveys lend themselves to random samples, for example, while observation studies and experiments lend themselves to nonrandom samples.

Analyzing Data

After all data have been gathered, the next stage is to analyze the data. If the data are quantitative, the analysis will almost certainly use highly sophisticated statistical techniques beyond the scope of this discussion. Many statistical analysis software packages exist for this purpose, and sociologists learn to use one or more of these packages during graduate school. If the data are qualitative, researchers analyze their data (what they have observed and/or what people have told them in interviews) in ways again beyond our scope. Many researchers now use qualitative analysis software that helps them uncover important themes and patterns in the qualitative data they gather. However qualitative or quantitative data are analyzed, it is essential that the analysis be as accurate as possible. To go back to a point just made, this means that variable measurement must also be as accurate as possible, because even expert analysis of inaccurate data will yield inaccurate results. As a phrase from the field of computer science summarizes this problem, “garbage in, garbage out.” Data analysis can be accurate only if the data are accurate to begin with (Blackstone, 2012).

Analysis of variables includes frequency distributions and measures of central tendency. A frequency distribution is a way of summarizing the distribution of responses on a single survey question. Let’s take a look at Figure 2.4: “Frequency Distribution of Older Workers’ Financial Security” that shows us the frequency distribution for just one variable (opinion of financial security) from a survey of older workers. We’ll analyze the item on respondents’ self-reported financial security.

Figure 2.4 Frequency Distribution of Older Workers’ Financial Security

In general, how financially secure would you say you are?

Response                                     Frequency
Not at all secure                            46
Between not at all and moderately secure     43
Moderately secure                            76
Between moderately and very secure           11
Very secure                                  4

Total valid cases = 180
No response = 3


As you can see in the frequency distribution on self-reported financial security, more respondents reported feeling “moderately secure” than any other response category. We also learn from this single frequency distribution that fewer than 10% of respondents reported being in one of the two most secure categories.
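The two claims in the preceding paragraph can be checked with a short calculation. The frequencies below are the ones reported for this survey item in Figure 2.5; the variable names are assumptions made for this sketch.

```python
# Frequencies for the financial security item, as reported in Figure 2.5.
frequencies = {
    "Not at all secure": 46,
    "Between not at all and moderately secure": 43,
    "Moderately secure": 76,
    "Between moderately and very secure": 11,
    "Very secure": 4,
}

total = sum(frequencies.values())  # number of valid cases
percentages = {
    label: 100 * count / total for label, count in frequencies.items()
}

for label, pct in percentages.items():
    print(f"{label}: {frequencies[label]} ({pct:.1f}%)")

# The category chosen by the most respondents.
most_common = max(frequencies, key=frequencies.get)

# Combined share of the two most secure categories.
top_two_share = (
    percentages["Between moderately and very secure"]
    + percentages["Very secure"]
)
print(most_common, f"{top_two_share:.1f}%")
```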

Measures of central tendency tell us what the most common, or average, response is on a question. There are three kinds of measures of central tendency: modes, medians, and means. The mode refers to the most common response given to a question. A median is the middle point in a distribution of responses: half the responses fall above it and half fall below. And, finally, to obtain a mean, one must add the values of all responses on a given variable and then divide that sum by the total number of responses.
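As a rough illustration, the three measures can be computed with Python’s standard `statistics` module. The nine responses below are hypothetical numeric answers invented for the sketch, not data from the survey discussed in this section.

```python
import statistics

# Hypothetical numeric responses from nine respondents.
responses = [1, 2, 2, 3, 3, 3, 4, 5, 5]

mode = statistics.mode(responses)      # the most common response
median = statistics.median(responses)  # the middle value once responses are sorted
mean = statistics.mean(responses)      # sum of all responses divided by their count

print(mode, median, mean)
```

In this small example the mode and median coincide, while the mean is pulled slightly upward by the two highest responses.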

In the previous example of older workers’ self-reported levels of financial security, the appropriate measure of central tendency would be the median, as this is an ordinal-level variable. Ordinal level variables can be rank ordered (for example, 3 = low, 2 = medium, 1 = high) though we cannot calculate a mathematical distance between those attributes. We can simply say that one attribute of an ordinal-level variable is more or less than another attribute. Ordinal-level attributes are also mutually exclusive. Examples of ordinal-level measures include social class, degree of support for policy initiatives, television program rankings, and prejudice. While we can say that one person’s support for some public policy may be more or less than his neighbor’s level of support, we cannot say exactly how much more or less.

If we were to list all responses to the financial security question in order and then choose the middle point in that list, we’d have our median. In Figure 2.5 “Distribution of Responses and Median Value of Workers’ Financial Stability” (below), the value of each response to the financial security question is noted, and the middle point within that range of responses is highlighted.

Figure 2.5 Distribution of Responses and Median Value on Workers’ Financial Stability

Response                                     Value   Frequency   Total value of responses
Not at all secure                            1       46          46
Between not at all and moderately secure     2       43          86
Moderately secure                            3       76          228
Between moderately and very secure           4       11          44
Very secure                                  5       4           20
Total                                                180         424


To find the middle point, we simply divide the number of valid cases by two. The number of valid cases, 180, divided by 2 is 90, so we’re looking for the 90th value in our distribution to discover the median. Because the first 89 responses (46 + 43) fall in the two lowest categories, the 90th response falls in the third. As you can see in Figure 2.5 “Distribution of Responses and Median Value on Workers’ Financial Stability,” that value is 3; thus the median on our financial security question is 3, or “moderately secure.”
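The same procedure can be written as a running total over the frequency table. The frequencies come from Figure 2.5; the numeric values 1 through 5 are assumed here to run from “not at all secure” to “very secure,” which is consistent with the totals reported in the figure. The function name is hypothetical.

```python
# (value, frequency) pairs; values 1-5 are assumed to run from
# "not at all secure" (1) to "very secure" (5).
distribution = [(1, 46), (2, 43), (3, 76), (4, 11), (5, 4)]


def median_from_frequencies(pairs):
    """Return the value that holds the middle case, using the text's N / 2 rule."""
    total = sum(freq for _, freq in pairs)
    target = total / 2  # with 180 valid cases, we look for the 90th case
    cumulative = 0
    for value, freq in sorted(pairs):
        cumulative += freq  # running count of cases at or below this value
        if cumulative >= target:
            return value
    raise ValueError("the distribution has no cases")


median_value = median_from_frequencies(distribution)
print(median_value)
```

This follows the simplified rule used in the text. With an even number of cases, statistical software often averages the two middle cases instead, which gives the same answer here because both fall in the same category.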

Criteria of Causality

An important part of the initial stages of analysis involves determining whether causality exists between correlated variables. A correlation between variables simply means that there is a relationship between variables (or between the behaviors that we are attributing to the variables). For instance, let’s say we had questions about the patterns between the time students spend on the internet and the grades they earn. Our hypothesis could be that the more time students spend on the internet, the lower their grades will be. In order to test this hypothesis, first we’d have to determine causality, or in this case, whether time spent on the internet actually causes grades to go down. Causality refers to the idea that one event, behavior, or belief will result in the occurrence of another, subsequent event, behavior, or belief. In other words, it is about cause and effect.

As researchers analyze their data, they naturally try to determine whether their analysis supports their hypothesis. As noted above, when we test a hypothesis, we want to be able to conclude that an independent variable (time spent on the internet) affects a dependent variable (grades). Four criteria must be satisfied before we can conclude this (see Table 2.1 “Criteria of Causality”).

Table 2.1 Criteria of Causality

Criteria For Causality

1. The independent and dependent variables must be statistically related.

2. The independent variable must precede the dependent variable in time and/or logic.

3. The relationship between the independent and dependent variables must not be spurious.

4. No better explanation exists for the relationship between the independent and dependent variables.


First, the independent variable and the dependent variable must be statistically related. That means that the independent variable makes a statistical difference for where one ranks on the dependent variable. Suppose we hypothesize that age was related to voting preference in the 2020 presidential election. Here age is clearly the independent variable and voting preference the dependent variable. (It is possible for age to affect voting preference, but it is not possible for voting preference to affect age.) Exit poll data indicate that 65% of 18- to 24-year-olds voted for Biden in 2020, while only 47% of those 65 and older voted for him. The two variables are thus statistically related, as younger voters were more likely than older voters to prefer Biden (Cable News Network, 2021).

The second criterion is called the causal order (or chicken-and-egg) problem and reflects the familiar saying that “correlation does not mean causation.” Just because an independent and a dependent variable are related does not automatically mean that the independent variable affects the dependent variable. It might well be that the dependent variable is affecting the independent. To satisfy this criterion, the researcher must be sure that the independent variable precedes the dependent variable in time or in logic. In the example just discussed, age might affect voting preference, but voting preference definitely cannot affect age. However, causal order is not as clear in other hypotheses. For example, suppose we find a statistical relationship between marital happiness and job satisfaction: the happier people are in their marriage, the more satisfied they are with their jobs. Which makes more sense, that having a happy marriage leads you to like your job more, or that being satisfied with your work leads you to have a happier marriage? In this example, causal order is not very clear, and thus the second criterion is difficult to satisfy.

The third criterion involves spuriousness. A relationship between an independent variable and dependent variable is spurious if a third variable accounts for the relationship because it affects both the independent and dependent variables. Although this sounds a bit complicated, an example or two should make it clear. If you did a survey of Americans 18 and older, you would find that people who attend college have worse acne than people who do not attend college. Does this mean that attending college causes worse acne? Certainly not. You would find this statistical relationship only because a third variable, age, affects both the likelihood of attending college and the likelihood of having acne: young people are more likely than older people to attend college, and also more likely, for very different reasons, to have acne. Controlling for age makes it clear that the original relationship between attending college and having acne was spurious, as shown in Figure 2.6 “Diagram of a Spurious Relationship.”

Figure 2.6 Diagram of a Spurious Relationship

Graphic showing age in box, pointing to two other boxes labeled attending college and acne
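To see what “controlling for age” means in practice, consider the sketch below. All of the counts are hypothetical, chosen only so that age drives both college attendance and acne; the `acne_rate` function is likewise invented for the illustration.

```python
# Hypothetical counts: (age group, attends college, has acne) -> number of people.
counts = {
    ("young", True, True): 30, ("young", True, False): 30,
    ("young", False, True): 20, ("young", False, False): 20,
    ("older", True, True): 1, ("older", True, False): 9,
    ("older", False, True): 9, ("older", False, False): 81,
}


def acne_rate(college, age=None):
    """Share with acne among people with the given college status.

    If age is given, only that age group is considered, which is
    what controlling for age amounts to in this simple setting.
    """
    rows = [
        (key, n) for key, n in counts.items()
        if key[1] == college and (age is None or key[0] == age)
    ]
    with_acne = sum(n for key, n in rows if key[2])
    return with_acne / sum(n for _, n in rows)


# Ignoring age: college attendees appear to have more acne.
overall_gap = acne_rate(True) - acne_rate(False)

# Within each age group: the gap disappears.
gap_young = acne_rate(True, "young") - acne_rate(False, "young")
gap_older = acne_rate(True, "older") - acne_rate(False, "older")

print(round(overall_gap, 3), gap_young, gap_older)
```

In these made-up data, the overall difference exists only because the college group contains a much larger share of young people; within each age group, attending college makes no difference to the acne rate.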

In another example, the more fire trucks at a fire, the more damage the fire causes. Does that mean that fire trucks somehow make fires worse, as the familiar saying “too many cooks spoil the broth” might suggest? Of course not! The third variable here is the intensity of the fire: the more intense the fire, the more fire trucks respond to fight it, and the more intense the fire, the more damage it causes. The relationship between the number of fire trucks and damage the fire causes is spurious.

The final criterion of causality is that our explanation for the relationship between the independent and dependent variables is the best explanation. Even if the first three criteria are satisfied, that does not necessarily mean the two variables are in fact causally related. For example, the U.S. crime rate dropped in the early 1980s, and in 1984 the reelection campaign of President Ronald Reagan took credit for this drop. This relationship satisfied the first three criteria: the crime rate fell after President Reagan took office in 1981, the drop in the crime rate could not have affected the election of this president, and there was no apparent third variable that influenced both why Reagan was elected and why the crime rate fell. However, social scientists pointed to another reason that accounted for the crime rate decrease during the 1980s: a drop in the birth rate some 15–20 years earlier, which led to a decrease during the early 1980s in the number of U.S. residents in the high-crime ages of 15–30 (Steffensmeier & Harer, 1991). The relationship between the election of Ronald Reagan and the crime rate drop was thus only a coincidence.

Drawing a Conclusion

Once the data are analyzed, the researcher finally determines whether the data analysis supports the hypothesis that has been tested, considering the criteria of causality just discussed. Whether or not the hypothesis is supported, the researcher (if writing for publication) typically also discusses what the results of the present research imply for both prior and future studies on the topic. If the primary purpose of the project has been to test or refine a particular theory, the conclusion will discuss the implications of the results for this theory. If the primary purpose has been to test or advance social policy, the conclusion will discuss the implications of the results for policy making relevant to the project’s subject matter.


Section 2.2 References

Blackstone, A. (2012). Principles of sociological inquiry: Qualitative and quantitative methods. University of Maine.

Cable News Network. (n.d.). National results 2020 president Exit Polls. CNN. Retrieved from https://edition.cnn.com/election/2020/exit-polls/president/national-results

Steffensmeier, D., & Harer, M. D. (1991). Did crime rise or fall during the Reagan presidency? The effects of an “aging” U.S. population on the nation’s crime rate. Journal of Research in Crime and Delinquency, 28(3), 330–359.

CC licensed content, Shared previously and Adapted:

Barr, Scott, Sarah Hoiland, Shailaja Menon, Cathay Matresse, Florencia Silverira and Rebecca Vonderhaar.  (n.d.) Introduction to sociology. Introduction to Sociology | Simple Book Production. Lumen Learning.  License: CC BY 4.0. License Terms:  Access for free at https://courses.lumenlearning.com/wm-introductiontosociology/.

Conerly, Tonja, Kathleen Holmes, Asha Lal Tamang, Jennifer Hensley, Jennifer L. Trost, Pamela Alcasey, Kate McGonigal, Heather Griffiths, Nathan Keirns, Eric Strayer, Tommy Sadler, Susan Cody-Rydzewski, Gail Scaramuzzo, Sally Vyain, Jeff Bry and Faye Jones. (2021).  Introduction to Sociology 3E. OpenStax. Houston, TX.  License: CC BY 4.0.  License Terms:  Access for free at https://openstax.org/books/introduction-sociology-3e/pages/1-introduction.

Griffiths, Heather, Nathan Keirns, Eric Stayer, Susan Cody-Rydzewski, Gail Scaramuzzo, Tommy Sadler, Sally Vyain, Jeff Bry and Faye Jones.  (2015).  Introduction to Sociology 2E. OpenStax. Houston, TX.  License: CC BY 4.0.  License Terms:  Access for free at https://openstax.org/books/introduction-sociology-2e/pages/1-introduction-to-sociology.

Saylor Foundation.  (2015). Social Problems: Continuity and Change. License:  CC BY-NC-SA 3.0.  License Terms:  Access for free at https://saylordotorg.github.io/text_social-problems-continuity-and-change/.


Exploring Our Social World: The Story of Us by Jean M. Ramirez; Suzanne Latham; Rudy G. Hernandez; and Alicia E. Juskewycz is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
