Different types of quantitative data analysis

Descriptive analysis is the very first stage of data analysis. The objective of descriptive analysis is to summarize the data in meaningful ways in order to suggest potential patterns or trends. Descriptive analysis involves ‘descriptive statistics’, which are methods of summarizing and describing the main features of a dataset. Descriptive statistics are differentiated from inferential statistics, which are methods of making claims about an entire population on the basis of data collected among a portion of the population. Examples of descriptive statistics include measures of frequency, measures of central tendency and measures of dispersion. 

  • Measures of frequency, such as numbers or percentages of cases, tell us how often a certain event, response or profile occurs in the sample.
  • Measures of central tendency, such as the mean, mode or median, tell us which value is most typical or central in the sample.
  • Measures of dispersion, such as the range, variance or standard deviation, tell us how spread out the responses are across a given range. 
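The three families of measures above can be sketched with Python's standard library. The data below are invented purely for illustration (ten hypothetical respondents); the variable names are assumptions and do not come from any of the cited studies.

```python
from collections import Counter
import statistics

# Hypothetical survey responses: years of schooling reported by ten migrants.
years_of_schooling = [8, 12, 12, 10, 16, 12, 9, 14, 12, 15]

# Hypothetical categorical variable: visa category of the same respondents.
visa_category = ["family", "work", "work", "refugee", "work",
                 "family", "refugee", "work", "family", "work"]

# Measures of frequency: counts and percentages per category.
counts = Counter(visa_category)
percentages = {cat: 100 * n / len(visa_category) for cat, n in counts.items()}

# Measures of central tendency.
mean_years = statistics.mean(years_of_schooling)
median_years = statistics.median(years_of_schooling)
mode_years = statistics.mode(years_of_schooling)

# Measures of dispersion (sample variance and sample standard deviation).
value_range = max(years_of_schooling) - min(years_of_schooling)
sample_variance = statistics.variance(years_of_schooling)
sample_sd = statistics.stdev(years_of_schooling)

print("Frequencies:", dict(counts), percentages)
print("Mean, median, mode:", mean_years, median_years, mode_years)
print("Range, variance, SD:", value_range, sample_variance, sample_sd)
```

Note that `statistics.variance` and `statistics.stdev` divide by n - 1 (sample estimates); `statistics.pvariance` and `statistics.pstdev` would be the choice if the data covered an entire population.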

The data can be summarized in tables, for example by presenting cross-tabulations of different variables in your dataset. The data can also be summarized through visualizations such as bar charts, pie charts, box plots, histograms or scatterplots.
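As a rough illustration of a cross-tabulation, the sketch below counts combinations of two categorical variables and prints row percentages. The records are hypothetical and are not taken from Chiswick et al. (2006); `row_percentages` is a helper name introduced here for the example.

```python
from collections import Counter

# Hypothetical records: (visa category, self-rated English proficiency).
records = [
    ("work", "good"), ("work", "good"), ("work", "poor"),
    ("family", "good"), ("family", "poor"), ("family", "poor"),
    ("refugee", "poor"), ("refugee", "poor"), ("refugee", "good"),
    ("work", "good"),
]

# Each cell of the cross-tabulation is the count of one (row, column) pair.
cell_counts = Counter(records)
rows = sorted({visa for visa, _ in records})
cols = sorted({skill for _, skill in records})


def row_percentages(row):
    """Return the percentage of each column category within one row."""
    total = sum(cell_counts[(row, col)] for col in cols)
    return {col: 100 * cell_counts[(row, col)] / total for col in cols}


# Print a small table with one line per visa category.
print(f"{'visa':<10}" + "".join(f"{col:>8}" for col in cols))
for row in rows:
    pct = row_percentages(row)
    print(f"{row:<10}" + "".join(f"{pct[col]:>7.0f}%" for col in cols))
```

Row percentages answer "within each visa category, how are language skills distributed?"; column percentages would answer the reverse question, so the choice depends on which variable is treated as the grouping variable.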

Examples of descriptive statistics

Figure 1: Cross-tabulation of migrants’ English-speaking skills by visa category 

P2C5F1

(Source: Chiswick et al., 2006)

Figure 2: Treemap of reasons for migration

P2C5F2

(IOM, 2023)

Figure 3: Stacked bar chart of migrant deaths by year and region of the world

P2F5F3

(IOM, 2022)

Figure 4: Proportional symbol map of international migrant populations in West & Central Africa

P2C5F4

(IOM, 2021)

Exploratory data analysis is used to see what data can reveal beyond theory-based hypothesis testing (IBM, 2024). It provides an overview of how the variables in your dataset are related and might therefore shed light on unexpected relationships. 

  • Cluster analysis is an exploratory analysis that tries to identify structures within the data by grouping cases into categories that were previously unknown. 
  • Exploratory factor analysis is a dimension-reducing analysis aimed at identifying the underlying structure of the data. For example, migrant integration is often presented as a multidimensional process involving economic, social and political aspects, and the drivers and dynamics of migrant integration are likely to differ across these dimensions. Exploratory factor analysis can show how a broad range of integration indicators are structured into different dimensions based on underlying commonalities. Fajth and Lessard-Phillips’ (2023) factor analysis of 18 common indicators of migrant integration available in the European Social Survey revealed five ‘empirical’ dimensions: economic integration; health; cultural and political integration; minority socialization; and subjective well-being. By reducing integration indicators into dimensions, one might investigate how migrant groups compare across the five dimensions or whether the factors influencing integration depend on the dimension at hand.
  • Latent class analysis is another way of uncovering hidden groups in data. This technique enables researchers to identify groups of respondents, or ‘profiles’, sharing similar patterns of response across multiple indicators. Luthra and colleagues (2016) used latent class analysis to develop a typology of Polish migrants in Western Europe on the basis of migrants’ reasons for migration, prior experience of migration and intentions to stay. They identified six different profiles with distinct combinations of these variables: “traditional circular”; “short-term accumulator”; “committed expat”; “living and learning”; “follower”; and “adventurer”.
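To make the idea of grouping cases into previously unknown categories more concrete, here is a minimal sketch of k-means, one common clustering algorithm among many. It is not the method used in the studies cited above (factor analysis and latent class analysis typically require specialised statistical software). The data, the function names `farthest_first` and `k_means`, and the choice of two clusters are all assumptions made for illustration.

```python
import math

# Hypothetical respondents: (years since arrival, intention-to-stay score 0-10).
# In practice variables are usually standardized before clustering.
points = [(1, 2), (2, 1), (1.5, 3), (2, 2),
          (12, 9), (11, 8), (13, 9.5), (12, 10)]


def farthest_first(points, k):
    """Pick k starting centroids: the first point, then repeatedly the point
    farthest from all centroids chosen so far (a simple deterministic start)."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def k_means(points, k, max_iter=100):
    """Alternate between assigning points to the nearest centroid and
    recomputing each centroid as the mean of its members."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids


labels, centroids = k_means(points, k=2)
print("Cluster labels:", labels)
print("Cluster centres:", centroids)
```

The number of clusters is chosen by the analyst here; in exploratory work it is usually compared across several values and judged against substantive interpretability.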

Correlational analysis establishes a relationship between two variables, referred to as the dependent variable, i.e., the outcome, and the independent variable, i.e., the predictor. Whereas the dependent variable is plotted on the “y axis” or vertical axis (see share of overcrowded households in figure below), the independent variable is plotted on the “x axis” or horizontal axis (see immigrant share of population in figure below). The magnitude, direction and statistical significance of the relationship can be established through bivariate analysis methods, or modelled in a simple linear regression. 
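The magnitude and direction of a bivariate relationship can be sketched as follows. The county-level figures are invented for illustration and are not the Camarota and Zeigler (2020) data; `pearson_r` and `simple_ols` are helper names introduced here. The sketch computes Pearson's correlation coefficient, the slope and intercept of a simple linear regression, and the t statistic that would be compared against a t distribution with n - 2 degrees of freedom to assess statistical significance.

```python
import math
import statistics

# Hypothetical county-level percentages, invented for illustration.
immigrant_share = [2.0, 5.0, 8.0, 12.0, 15.0, 20.0, 25.0, 30.0]    # x axis
overcrowded_share = [1.0, 1.5, 2.5, 3.0, 4.5, 5.0, 7.0, 8.5]       # y axis


def pearson_r(x, y):
    """Pearson correlation: co-variation divided by the product of spreads."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_ols(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


r = pearson_r(immigrant_share, overcrowded_share)
intercept, slope = simple_ols(immigrant_share, overcrowded_share)

# Test statistic for H0: no linear association (df = n - 2).
n = len(immigrant_share)
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"t statistic = {t_stat:.2f} with {n - 2} degrees of freedom")
```

The sign of `r` and `slope` gives the direction of the relationship, the size of `r` its strength, and the slope the expected change in the dependent variable for a one-unit change in the independent variable.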

However, it is important to note that correlation does not imply causality. Various factors must be carefully considered to avoid drawing erroneous conclusions. A correlation may appear significant but could be "spurious," indicating either a coincidental relationship or the influence of an unaccounted third confounding factor. For example, the scatterplot in Figure 1 shows a significant correlation between the share of immigrants in the population and the share of overcrowded households. While a correlation exists between immigration and overcrowding, it is incorrect to infer a causal relationship without considering other relevant variables. Factors such as employment opportunities, rural-urban migration or access to public services may simultaneously influence both immigration and overcrowding, confounding the observed relationship.

These complexities can be addressed through multivariate regression analysis, which estimates the effect of the independent variable on the dependent variable “net of” or “controlling for” the other independent variables included in the model. By incorporating these additional factors, researchers can estimate the association of interest more accurately and reduce the risk of reporting a spurious one. It is also essential to consider reverse causality, that is, the possibility that the relationship runs in the opposite direction. For example, if remittances are sent to cushion adverse economic (or health) shocks, one might find a positive correlation between remittances and poverty (or poor health outcomes) and be tempted to draw the incorrect conclusion that remittances lead to worse outcomes (McKenzie and Sasin, 2007).
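A small simulation can illustrate what “controlling for” a confounder does. In the sketch below, all data are simulated under stated assumptions: a hypothetical confounder `z` drives both `x` and `y`, while `x` has no effect on `y` at all. The adjusted coefficient is obtained by residualization (the Frisch-Waugh-Lovell approach), which yields the same coefficient on `x` as a regression of `y` on both `x` and `z`; the helper names are introduced here for the example.

```python
import random
import statistics


def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def residuals(x, y):
    """Part of y that is not linearly explained by x."""
    intercept, slope = ols(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


random.seed(42)
n = 5000

# Simulated confounder (e.g. local employment opportunities, hypothetical).
z = [random.gauss(0, 1) for _ in range(n)]
# Both variables depend on z; by construction x has NO effect on y.
x = [zi + random.gauss(0, 1) for zi in z]
y = [2 * zi + random.gauss(0, 1) for zi in z]

# Bivariate regression: picks up the shared dependence on z.
_, naive_slope = ols(x, y)

# Controlling for z: remove z from both x and y, then regress what is left.
_, adjusted_slope = ols(residuals(z, x), residuals(z, y))

print(f"Slope ignoring the confounder:     {naive_slope:.2f}")
print(f"Slope controlling for the confounder: {adjusted_slope:.2f}")
```

Under these assumptions the unadjusted slope is clearly positive while the adjusted slope is close to zero, which is the pattern expected from a spurious association. Adjustment only helps for confounders that are measured and included; it does not address reverse causality.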

Figure 1: Scatterplot of the correlation between immigration and overcrowding in United States counties

P2C5F5

(Camarota and Zeigler, 2020)

Experimental analysis establishes a cause-effect relationship between two variables by showing that a change in the independent variable directly leads to a change in the dependent variable. This is usually done in an experiment whereby participants are randomly assigned to different groups, including a control group and one or more treatment groups, where they undergo specific ‘manipulations’ or interventions by the researcher(s). 

  • In their randomized field experiment exploring methods to stimulate migrants’ savings in their home country, Ashraf and colleagues (2015) discovered that migrants place significant value on having greater control over their financial activities in their home country. By offering US-based migrants bank accounts in El Salvador with varying degrees of control, the researchers found that migrants offered accounts solely in their own name were more likely to open an account and deposit savings than those offered accounts in the name of the recipient or held jointly by the migrant and the recipient.
  • Experimental designs are usually constrained by smaller sample sizes and may be difficult to implement in natural settings. Survey experiments, which can be conducted among larger samples of respondents, are therefore an appealing alternative. Hainmueller and colleagues (2015) designed a survey experiment to study discrimination against immigrants in naturalization processes. Swiss respondents were presented with profiles of immigrants and asked to decide on their application for naturalization. The immigrant profiles varied on seven attributes: sex, country of origin, age, years since arrival, education, language skills and integration status. Each attribute could take on various values, which were randomly chosen to form the immigrant profiles presented to respondents. The randomization of the values enabled the researchers to determine which profiles were discriminated against in naturalization processes.
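The logic of random assignment can be sketched with simulated data. Everything below is hypothetical: the participant IDs, the outcome (an amount saved), the baseline distribution and the assumed treatment effect of 50 are invented for illustration and do not reproduce either study described above.

```python
import random
import statistics

random.seed(7)

N = 1000
TRUE_EFFECT = 50.0  # assumed effect of the intervention, for simulation only

# Random assignment: shuffle participant IDs and treat the first half.
participants = list(range(N))
random.shuffle(participants)
treatment_group = set(participants[: N // 2])

# Simulate outcomes: a noisy baseline plus the effect for treated participants.
outcomes = {}
for pid in range(N):
    baseline = random.gauss(300, 40)
    outcomes[pid] = baseline + (TRUE_EFFECT if pid in treatment_group else 0.0)

treated = [outcomes[pid] for pid in range(N) if pid in treatment_group]
control = [outcomes[pid] for pid in range(N) if pid not in treatment_group]

# Because assignment was random, the groups are comparable on average, so the
# difference in mean outcomes estimates the average treatment effect.
estimated_effect = statistics.mean(treated) - statistics.mean(control)
standard_error = (statistics.variance(treated) / len(treated)
                  + statistics.variance(control) / len(control)) ** 0.5

print(f"Estimated effect: {estimated_effect:.1f} (SE {standard_error:.1f})")
```

Randomization is what licenses the causal reading: any other characteristic that influences the outcome is, in expectation, balanced across the treatment and control groups.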

Table 1: Different types of quantitative data analysis (UNHCR, 2016)

Exploratory

  • Description: Aims to discover the data and identify potential patterns, signals, and stories that are to be confirmed. It also assesses the relevance, completeness and reliability of the data. Exploration helps to understand not just what the data covers, but also what it represents, what seems wrong and what is potentially missing.
  • Example: Conduct preliminary interviews with a small sample of refugees and migrants to identify the key factors that affected their decisions to leave, as part of efforts to identify the components of a larger study.

Descriptive

  • Description: Aims to summarize and compare the data to answer basic questions regarding who, what, when, why and how. It generalizes the data through categories and aggregation, describes, compares, and seeks patterns, anomalies and trends.
  • Example: Tabulate how many refugees and migrants are arriving where and from which countries over time.

Explanatory

  • Description: Aims to connect and relate the data, to answer the why question. It identifies relationships, associations, correlations and other connections between data to develop plausible explanations and identify underlying processes, drivers and factors.
  • Example: Conduct logistic regression on the probability of being a victim of sex and gender-based violence during the migrant journey according to various socioeconomic, geographic and cultural variables, based on a sample survey of refugees and migrants arriving by boat.

Interpretive

  • Description: Aims to identify implications and conclusions. It moves beyond findings towards drawing and evaluating conclusions based on the strength of the evidence, argumentation and context.
  • Example: Conduct a meta-analysis of research on the impact of allowing refugees and asylum-seekers to seek work in different countries, assess the quality of the evidence available, and make evidence-informed policy recommendations.

Anticipatory

  • Description: Aims to predict, forecast and ascertain the likelihood of future trends, scenarios and outcomes based on current and historical data.
  • Example: Cross-tabulate historical data on the impact of weather and political events on the rate of arrivals in a particular location to anticipate future arrivals.
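The explanatory example in Table 1 mentions logistic regression, which models the probability of a binary outcome. The sketch below fits a one-predictor logistic regression to simulated data by plain gradient ascent on the log-likelihood. It is only a teaching illustration under stated assumptions: the data, the "true" coefficients and the function name `fit_logistic` are invented here, and applied work would normally rely on a statistical package that also reports standard errors.

```python
import math
import random


def sigmoid(t):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(x, y, learning_rate=0.5, iterations=500):
    """Estimate intercept b0 and coefficient b1 by gradient ascent on the
    average log-likelihood of a one-predictor logistic regression."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(iterations):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            error = yi - sigmoid(b0 + b1 * xi)  # observed minus predicted
            g0 += error
            g1 += error * xi
        b0 += learning_rate * g0 / n
        b1 += learning_rate * g1 / n
    return b0, b1


random.seed(1)
n = 2000
TRUE_B0, TRUE_B1 = -1.0, 1.0  # assumed values used only to simulate data

# Simulated standardized predictor (e.g. a hypothetical risk-exposure score).
x = [random.gauss(0, 1) for _ in range(n)]
# Binary outcome drawn with probability given by the logistic model.
y = [1 if random.random() < sigmoid(TRUE_B0 + TRUE_B1 * xi) else 0 for xi in x]

b0, b1 = fit_logistic(x, y)
odds_ratio = math.exp(b1)

print(f"Intercept: {b0:.2f}, coefficient: {b1:.2f}, odds ratio: {odds_ratio:.2f}")
```

The exponentiated coefficient is read as an odds ratio: the multiplicative change in the odds of the outcome for a one-unit increase in the predictor, holding nothing else constant in this single-predictor sketch.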