Indian Millennial: Quantile Selection in the Gender Pay Gap

The provided sources outline a novel semiparametric methodology designed to estimate the selection-corrected gender wage gap across the entire wage distribution. This approach addresses the issue of non-random selection into employment, where individuals with higher earning potential are more likely to work full-time, potentially biasing observed wage gap measurements.

Core Methodological Framework

The methodology builds upon the Roy model of labor supply, where individuals choose to work if their potential wage exceeds their reservation wage. The identification of the selection-corrected wage distribution relies on three primary components:

Instrumental Variables (IV): The method uses instruments ($W$) that explain variation in the latent outcome (potential wages) but do not directly affect the selection mechanism once those latent wages and other observed characteristics are held fixed.
Rank Invariance: In the context of the Roy model, this is interpreted as a condition where the instrument provides no additional information about an individual's position (rank) in the reservation wage distribution, conditional on their potential wage.
Semiparametric Flexibility: Unlike the classical Heckman correction or other parametric models, this approach does not impose specific functional forms on the selection probability or the distribution of unobserved factors.
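
The three components can be written out as a short derivation. This is a sketch in notation assumed here rather than taken from the sources: $Y^*$ is the potential wage, $R$ the reservation wage, $D$ the full-time indicator, $X$ the observed characteristics, and $Y$ the wage that is observed only when $D = 1$.

$$
D = \mathbf{1}\{Y^* > R\}, \qquad Y = Y^* \ \text{whenever } D = 1
$$

$$
P(D = 1 \mid Y^*, X, W) = P(D = 1 \mid Y^*, X) =: \frac{1}{g(Y^*, X)}
$$

$$
E\big[D\, g(Y, X) \mid X, W\big] = E\big[P(D = 1 \mid Y^*, X, W)\, g(Y^*, X) \mid X, W\big] = 1
$$

The second line is the exclusion restriction: once $Y^*$ and $X$ are fixed, $W$ carries no further information about selection. The third line follows by iterated expectations and involves only observable quantities, because $D\, g(Y, X) = D\, g(Y^*, X)$. It is a conditional moment restriction of the kind that could identify $g$ without a parametric selection model, provided $W$ shifts the distribution of $Y^*$ sufficiently.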

The Three-Step Estimation Procedure

The sources detail a specific algorithm (Algorithm 1) to implement this methodology:

Estimation of Inverse Selection Probabilities: The inverse selection probability function, $g(Y,X)$, is estimated via a flexible B-spline estimator using a conditional moment restriction. This step uses the instrument ($W$) to help identify the selection mechanism without a parametric model.
Imposing Shape Restrictions: Because $g$ is the inverse of a selection probability, it cannot be smaller than 1. To stabilize the weights, the unconstrained estimates are projected onto a cone of functions bounded below by 1, so that the implied selection probability $1/\hat{g}(Y,X)$ stays between 0 and 1.
Weighted Quantile Regression: Finally, the conditional quantile function of the latent outcome is estimated using inverse probability weighting (IPW). The weights ($D \hat{g}(Y,X)$) account for the non-random selection, allowing for the recovery of the wage distribution as if the entire population were employed full-time.
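
The logic of the three steps can be illustrated with a small discrete toy problem. This is a sketch under stated assumptions, not the authors' Algorithm 1: the wage and the instrument each take three values, there are no covariates, population probabilities are used in place of sample estimates, and a linear solve stands in for the B-spline estimator. All names and numbers (`p_y_given_w`, `p_select`, `p_w`) are hypothetical.

```python
import numpy as np

# Hypothetical population: rows index the instrument W, columns the potential wage Y*.
# Each row is P(Y* = y | W = w); the rows differ, so W shifts the wage distribution.
p_y_given_w = np.array([
    [0.6, 0.3, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.6],
])

# Selection depends on Y* only (the exclusion restriction): P(D = 1 | Y* = y).
# This vector is unknown to the analyst and is used here only to generate "data".
p_select = np.array([0.2, 0.5, 0.8])

# What the analyst can observe: P(D = 1, Y = y | W = w).
observed = p_y_given_w * p_select  # broadcasts p_select across rows

# Step 1: the moment restriction sum_y g(y) * P(D = 1, Y = y | W = w) = 1 for all w
# is a linear system in the three unknown values of g.
g_hat = np.linalg.solve(observed, np.ones(3))

# Step 2: shape restriction. For function values, projecting onto {g >= 1}
# amounts to a pointwise maximum with 1 (a no-op here because g_hat >= 1 already).
g_proj = np.maximum(g_hat, 1.0)

# Step 3: inverse probability weighting. Reweighting the selected wage distribution
# by g recovers the latent wage distribution P(Y* = y).
p_w = np.array([1 / 3, 1 / 3, 1 / 3])      # hypothetical marginal of W
p_selected_joint = p_w @ observed           # P(D = 1, Y = y)
p_latent = g_proj * p_selected_joint        # P(Y* = y)

print("estimated g:", g_hat)
print("recovered latent distribution:", p_latent)
```

In this toy setting the solve returns the exact inverse selection probabilities because the observed matrix is invertible. With continuous wages and sampling noise the inversion is far less stable, which is one reason a projection step such as Step 2 is useful.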

The "Initial Wage" as an Instrument

In the empirical application to German administrative data, the authors use an individual’s initial wage (their earliest recorded wage) as the primary instrument. The methodological justification is that early-career wages capture persistent individual heterogeneity—such as ability, motivation, and social skills—that shapes long-run earning potential but does not directly dictate the current employment decision once current potential wages are accounted for. To satisfy the exclusion restriction and avoid issues with recent economic shocks or state dependence, the authors require this initial wage to be from the distant past (at least two years prior to the observation period).

Methodology in the Context of Existing Literature

This approach distinguishes itself from other common methods in the field:

Versus MAR (Missing at Random): Traditional MAR methods assume selection depends only on observed characteristics. The authors demonstrate that MAR can lead to over-adjustment or bias when unobserved selection (selection on unobservables) is present, which their IV-based method accounts for.
Versus Parametric Heckman/JEE Models: While models like those by Arellano and Bonhomme or Yu et al. also address quantile selection, they often rely on specific parametric assumptions about the selection process or on different exclusion restrictions (like out-of-work benefit income). The proposed semiparametric IV method is more robust to model misspecification because it does not impose a parametric form on the selection process.
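
The contrast with MAR can be illustrated by simulation. The sketch below is not drawn from the sources: the data-generating process, the variable names, and the `weighted_quantile` helper are all hypothetical. Selection depends on the latent wage itself, so weights built from the observed covariate alone (MAR) leave bias behind. The "oracle" weights use the true selection probability, which is available only because the data are simulated; in the sources this quantity is estimated with the instrument.

```python
import numpy as np

def weighted_quantile(values, weights, tau):
    """Smallest value whose normalized cumulative weight reaches tau."""
    order = np.argsort(values)
    v = np.asarray(values)[order]
    w = np.asarray(weights)[order]
    cdf = np.cumsum(w) / np.sum(w)
    idx = min(np.searchsorted(cdf, tau), len(v) - 1)
    return v[idx]

rng = np.random.default_rng(1)
n = 200_000

educ = rng.integers(0, 2, size=n)        # observed binary covariate
ability = rng.normal(size=n)             # unobserved heterogeneity
log_wage = 3.8 + 0.4 * educ + 0.5 * ability

# Selection on unobservables: the probability of full-time work rises with the wage.
p_select = 1.0 / (1.0 + np.exp(-2.0 * (log_wage - 4.0)))
selected = rng.random(n) < p_select

# MAR weights: inverse of the selection rate within each education cell.
rate_by_educ = np.array([selected[educ == k].mean() for k in (0, 1)])
mar_weights = 1.0 / rate_by_educ[educ[selected]]

# Oracle weights: inverse of the true selection probability (simulation only).
oracle_weights = 1.0 / p_select[selected]

pop_median = np.quantile(log_wage, 0.5)
mar_median = weighted_quantile(log_wage[selected], mar_weights, 0.5)
oracle_median = weighted_quantile(log_wage[selected], oracle_weights, 0.5)

print(f"population median: {pop_median:.3f}")
print(f"MAR-weighted median: {mar_median:.3f}")
print(f"oracle-weighted median: {oracle_median:.3f}")
```

In this design the MAR weights fix the education mix but not the selection within each education cell, so the MAR-weighted median stays above the population median. The example shows only one direction of MAR bias; it does not reproduce the over-adjustment the sources also mention.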

The empirical application described in the sources uses German administrative social security records (SIAB) to analyze the gender wage gap. Rather than comparing means, it applies the semiparametric IV methodology to account for non-random selection into full-time employment across the entire wage distribution.

Data and Methodology Overview

The study focuses on a 2017 cross-section of German nationals aged 25–50. The researchers analyze gross daily wages in full-time employment, accounting for the fact that a large share of the population, including 52% of women, works part-time or is not employed.

The core of the empirical strategy is the use of the "initial wage" (the earliest recorded wage in an individual's history) as an instrumental variable. The authors argue that this instrument captures persistent individual heterogeneity—such as ability and motivation—that affects long-term earnings potential without directly influencing current labor supply decisions once current potential wages are controlled for.

Key Findings on Selection Patterns

The application reveals that both men and women are positively selected into full-time employment, meaning those who work full-time generally have higher potential wages than those who do not. However, the intensity of this selection varies dramatically by gender and education:

Women and the "Bottom" of the Distribution: For women, positive selection is most pronounced at the lower and median quantiles, particularly among those with low education (no vocational training). When correcting for this, estimated wages for the broader female population are significantly lower than observed wages, causing the gender pay gap at the 25th percentile to widen.
Men and the "Top" of the Distribution: For men, the strongest selection effects are found among the highly educated at the upper quantiles. This suggests that highly productive men are disproportionately represented in top full-time roles. Because this male selection is so strong at the top, correcting for it actually narrows the gender pay gap at higher quantiles.

Contextual Significance

The sources highlight that this empirical application helps reconcile conflicting findings in existing literature.

Widening vs. Narrowing Gaps: By showing that selection varies by quantile, the authors explain why some studies (like Maasoumi and Wang) find widening selection-corrected gaps, while others (like Arellano and Bonhomme) find them narrowing.
Critique of MAR: The application demonstrates that traditional Missing at Random (MAR) assumptions—which assume selection only depends on observed traits—can lead to over-adjustment or biased estimates. Their IV-based method provides a more representative picture of the potential wage distribution by accounting for unobserved attributes like ambition and productivity.

Ultimately, the empirical application shows that gender wage gaps persist across all education levels and quantiles, but that their magnitude depends heavily on gender-specific selection patterns that vary across the wage distribution.

The key findings in the sources highlight that non-random selection into employment significantly distorts observed gender wage gaps, but the nature of this distortion varies dramatically across the wage distribution and between genders.

General Selection Patterns

The research establishes that both men and women are positively selected into full-time employment. This means individuals with higher potential earnings are more likely to work full-time, while those with lower potential wages are more often part-time or non-employed. Consequently, simply looking at the wages of full-time workers overstates the overall wage levels for the entire population.
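
A short simulation illustrates why positive selection inflates observed wages. It is a minimal sketch assuming a textbook Roy model with independent, normally distributed potential and reservation wages; the distributions and parameter values are hypothetical and not taken from the sources.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical latent log wages and reservation wages (independent draws).
potential_wage = rng.normal(loc=4.0, scale=0.5, size=n)
reservation_wage = rng.normal(loc=4.0, scale=0.5, size=n)

# Roy-model rule: work full-time only if the potential wage exceeds the reservation wage.
full_time = potential_wage > reservation_wage

taus = np.array([0.25, 0.50, 0.75])
population_q = np.quantile(potential_wage, taus)          # everyone, including non-workers
observed_q = np.quantile(potential_wage[full_time], taus)  # full-time workers only

for tau, pop, obs in zip(taus, population_q, observed_q):
    print(f"tau={tau:.2f}  population={pop:.3f}  full-time only={obs:.3f}")
```

Every quantile of the full-time sample lies above the corresponding population quantile, which is the sense in which wages of full-time workers overstate wage levels for the whole population.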

Heterogeneous Selection by Gender and Quantile

The study uncovers two distinctive, gender-specific patterns of selection:

Women at the Lower Tail: Selection effects are strongest for women at the lower end and median of the wage distribution. For this group, correcting for selection reveals that potential wages are much lower than observed wages, as those with the lowest earning potential are the most likely to be absent from the full-time workforce.
Men at the Upper Tail: For men, the strongest positive selection occurs among the highly educated at the top of the distribution. This suggests that highly productive men are disproportionately represented in top-tier full-time roles, driven by factors like career-oriented incentives and ambition.

Impact on the Gender Wage Gap

Accounting for these selection patterns leads to significant revisions of the gender pay gap at different points in the distribution:

Widening at the Bottom: Because positive selection is so pronounced for women at lower quantiles, correcting for it causes the gender wage gap to increase in the lower half of the distribution. For example, the gap at the 25th percentile increases by roughly 2 percentage points after correction.
Narrowing at the Top: Conversely, the strong positive selection among highly productive men at the upper end means that correcting for their selection narrows the gender wage gap at higher quantiles.
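
The direction of these revisions follows from a simple identity. The notation is assumed here for illustration and does not appear in the sources: $Q_j^{\text{obs}}(\tau)$ and $Q_j^{\text{corr}}(\tau)$ are the observed and selection-corrected $\tau$-quantiles of log wages for gender $j \in \{m, f\}$, and $s_j(\tau) = Q_j^{\text{obs}}(\tau) - Q_j^{\text{corr}}(\tau)$ is the size of the correction, which is nonnegative under positive selection.

$$
\Delta^{\text{obs}}(\tau) = Q_m^{\text{obs}}(\tau) - Q_f^{\text{obs}}(\tau)
$$

$$
\Delta^{\text{corr}}(\tau) = \big(Q_m^{\text{obs}}(\tau) - s_m(\tau)\big) - \big(Q_f^{\text{obs}}(\tau) - s_f(\tau)\big) = \Delta^{\text{obs}}(\tau) + s_f(\tau) - s_m(\tau)
$$

The corrected gap exceeds the observed gap whenever $s_f(\tau) > s_m(\tau)$, as at the lower quantiles where selection among women dominates, and falls below it whenever $s_m(\tau) > s_f(\tau)$, as at the upper quantiles where selection among men dominates.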

Findings by Education Level

The sources emphasize that the role of selection differs by educational attainment:

Low Education: The gender wage gap at the median for those with no vocational training increases from 3.6% to 10.7% once selection is corrected, a rise of 7.1 percentage points.
High Education: For university graduates, the selection-corrected gap is slightly lower than the uncorrected gap (dropping by about 2 percentage points at the median) because the correction for high-earning men is more substantial than for women in the same group.

Persistence of the Gap

A central finding is that while selection correction changes the magnitude of the disparity, the gender wage gap persists across all education levels and every quantile. Men consistently out-earn women regardless of whether selection is accounted for or which estimation method is used.

The provided sources position their research within a rich history of labor economics and statistical modeling, specifically addressing the challenge of non-random selection into employment when measuring the gender pay gap.

Foundations and the Selection Problem

The literature on the gender pay gap traditionally acknowledges that employment is not random; individuals with higher earning potential are more likely to work full-time. Early foundational work by Gronau and Heckman applied the Roy model to labor supply decisions, suggesting that individuals choose to work only if their potential wage exceeds their reservation wage.

Classical Mean-Based Corrections

Historically, most studies focused on average wages and addressed selection using the classical Heckman correction. These models typically relied on specific exclusion restrictions to identify the selection equation: variables that affect the decision to work but not the wage itself. Common instruments included:

Family characteristics: Number and age of children (e.g., Mulligan and Rubinstein).
Policy variation: Differences in the tax and transfer system (e.g., Blundell et al.).

However, the sources note that these classical restrictions are often difficult to justify, particularly for men.

Evolution to Quantile Selection Models

As the literature evolved, researchers recognized that selection patterns and their effects could vary across the wage distribution, leading to the development of quantile selection models.

Arellano and Bonhomme: Proposed a framework for selection correction across the entire wage distribution using out-of-work benefit income as an instrument. They found positive selection for both genders and reported that the selection-corrected gender wage gap was actually smaller than the uncorrected one in the UK.
Maasoumi and Wang: Found the opposite in the US, showing that selection-corrected gaps had increased over time across various quantiles.
Alternative Approaches: Some researchers developed bounds to address non-random selection without requiring point identification (Blundell et al., Honoré and Hu). Others used imputation-based methods relying on the Missing at Random (MAR) assumption, which assumes selection depends only on observed characteristics (Olivetti and Petrongolo, Blau et al.).

Limitations of Existing Methodologies

The sources identify several gaps in the existing literature that their methodology aims to fill:

Parametric Restrictions: Recent contributions to quantile selection (Zhang and Wang, Yu et al.) rely on parametric assumptions about the selection mechanism, which can lead to model misspecification.
MAR Assumption Biases: The sources argue that MAR-based imputation methods often over-adjust or produce biased estimates because they ignore unobserved factors like ambition and productivity that influence both wages and the decision to work.
Reconciling Conflict: The current study’s findings help rationalize the conflicting results between Maasoumi and Wang (widening gaps) and Arellano and Bonhomme (narrowing gaps) by showing that selection patterns vary significantly by education level and quantile.

Wednesday, April 15, 2026