BLS OEWS: The Federal Database Behind Wage Statistics for 830 Occupations Across the US Economy
The Bureau of Labor Statistics Occupational Employment and Wage Statistics (OEWS) survey is the federal government's primary source for wage benchmarking across the US labor market. Every year it publishes mean wages, median wages, and wage percentiles from the 10th through the 90th for about 830 occupations. Those estimates cover roughly 150 million wage-and-salary jobs across every state, metropolitan area, and nonmetropolitan region in the country.
What OEWS Is and Why It Exists
Employers, policymakers, researchers, and workers all need a reliable answer to the same question: what does this job pay? The OEWS program exists to provide that answer at scale. Originally called the Occupational Employment Statistics (OES) survey, it was renamed in 2021 to Occupational Employment and Wage Statistics to better reflect the fact that wages — not just headcounts — are the product most users care about.
The survey covers non-farm wage-and-salary workers employed in the fifty states, the District of Columbia, Puerto Rico, Guam, and the US Virgin Islands. It excludes the self-employed, agricultural workers, private household workers, unpaid family workers, and members of the armed forces. Federal government workers are included, with some caveats around data confidentiality. The result is an annual snapshot of the wage structure of the US economy that no private data provider can replicate at comparable geographic depth.
Each set of annual estimates draws on approximately 1.1 million establishments sampled over a rolling three-year window. BLS surveys one semiannual panel each May and another each November, and each establishment is surveyed once during the three-year cycle. The estimates for a given year combine the six most recent panels and carry a May reference date. They are published roughly eleven months later, in the following spring. A November panel therefore first enters the estimates for the following May. This rolling design keeps the sample large enough to support estimates at the metropolitan statistical area level for hundreds of occupations simultaneously.
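To make the rolling design concrete, the helper below lists the six panels that would feed a given May estimate. It assumes one May and one November panel per year over a three-year window, as described above. The function name `contributing_panels` is hypothetical; BLS does not publish such a tool.

```python
def contributing_panels(ref_year: int) -> list[str]:
    """List the six semiannual panels assumed to feed the May `ref_year` estimates.

    Assumes the three-year rolling design: the May panel of the reference
    year plus the five preceding panels, alternating May and November.
    """
    panels = []
    year, month = ref_year, "May"
    for _ in range(6):
        panels.append(f"{month} {year}")
        # Step back one panel: May -> previous November, November -> same-year May
        if month == "May":
            month, year = "November", year - 1
        else:
            month = "May"
    return panels


print(contributing_panels(2023))
```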
The key statistics published for each occupation-area combination are:

- total employment;
- mean hourly and annual wages;
- hourly and annual wages at the 10th, 25th, 50th (median), 75th, and 90th percentiles.

Annual figures are derived from hourly ones on a 2,080-hour work year (40 hours × 52 weeks). For occupations whose workers typically do not work full time year-round, such as teachers, BLS publishes annual wages only. Where an estimate fails BLS quality standards or could reveal an individual employer's data, the cell is suppressed.
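The 2,080-hour convention is plain arithmetic, and the same factor converts in both directions. Below is a minimal sketch; the helper names are hypothetical.

```python
HOURS_PER_YEAR = 2080  # 40 hours x 52 weeks, the full-time year used for annualizing


def annualize(hourly_wage: float) -> float:
    """Convert an hourly wage to its annual equivalent."""
    return hourly_wage * HOURS_PER_YEAR


def hourly_from_annual(annual_wage: float) -> float:
    """Invert the conversion, e.g. to read an annual figure as an hourly rate."""
    return annual_wage / HOURS_PER_YEAR


print(annualize(50.00))             # 104000.0
print(hourly_from_annual(208_000))  # 100.0
```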
The SOC System: How 830 Occupations Are Defined
OEWS uses the 2018 edition of the Standard Occupational Classification (SOC) system, the federal taxonomy maintained by the Office of Management and Budget. The SOC is hierarchical: 23 major groups break into 98 minor groups, which break into 459 broad occupations, which break into 867 detailed occupations at the six-digit code level. OEWS covers 830 of those detailed occupations. It omits agricultural, private household, and military categories that fall outside the survey's scope.
The major groups span the full range of economic activity. In code order, the first nine are:

- Management (11-0000)
- Business and Financial Operations (13-0000)
- Computer and Mathematical (15-0000)
- Architecture and Engineering (17-0000)
- Life, Physical, and Social Science (19-0000)
- Community and Social Service (21-0000)
- Legal (23-0000)
- Educational Instruction and Library (25-0000)
- Arts, Design, Entertainment, Sports, and Media (27-0000)

Healthcare Practitioners and Technical occupations (29-0000) and Healthcare Support occupations (31-0000) form separate major groups, reflecting their very different wage and training profiles. The remaining groups are:

- Protective Service (33-0000)
- Food Preparation and Serving Related (35-0000)
- Building and Grounds Cleaning and Maintenance (37-0000)
- Personal Care and Service (39-0000)
- Sales and Related (41-0000)
- Office and Administrative Support (43-0000)
- Farming, Fishing, and Forestry (45-0000)
- Construction and Extraction (47-0000)
- Installation, Maintenance, and Repair (49-0000)
- Production (51-0000)
- Transportation and Material Moving (53-0000)

The 23rd major group, Military Specific Occupations (55-0000), falls outside the survey's scope, so OEWS publishes estimates for 22 major groups.
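Analyses often need to roll a detailed code up to its major group. The first two digits of a SOC code identify the major group, so a lookup table is enough. The dictionary below is a hand-built sketch of the 22 groups OEWS publishes. Titles follow the 2018 SOC, with the trailing word "Occupations" dropped. `major_group` is a hypothetical helper.

```python
# 2018 SOC major groups published in OEWS (Military Specific, 55-0000, excluded)
SOC_MAJOR_GROUPS = {
    "11": "Management",
    "13": "Business and Financial Operations",
    "15": "Computer and Mathematical",
    "17": "Architecture and Engineering",
    "19": "Life, Physical, and Social Science",
    "21": "Community and Social Service",
    "23": "Legal",
    "25": "Educational Instruction and Library",
    "27": "Arts, Design, Entertainment, Sports, and Media",
    "29": "Healthcare Practitioners and Technical",
    "31": "Healthcare Support",
    "33": "Protective Service",
    "35": "Food Preparation and Serving Related",
    "37": "Building and Grounds Cleaning and Maintenance",
    "39": "Personal Care and Service",
    "41": "Sales and Related",
    "43": "Office and Administrative Support",
    "45": "Farming, Fishing, and Forestry",
    "47": "Construction and Extraction",
    "49": "Installation, Maintenance, and Repair",
    "51": "Production",
    "53": "Transportation and Material Moving",
}


def major_group(occ_code: str) -> tuple[str, str]:
    """Map a detailed SOC code such as '15-1252' to (major code, title)."""
    prefix = occ_code.strip()[:2]
    if prefix not in SOC_MAJOR_GROUPS:
        raise ValueError(f"Not an OEWS major group: {occ_code!r}")
    return f"{prefix}-0000", SOC_MAJOR_GROUPS[prefix]


print(major_group("29-1141"))
```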
The highest-paying detailed occupations in OEWS are dominated by medicine and dentistry. Anesthesiologists reported a mean annual wage of approximately $331,000 in recent releases, the highest of any occupation tracked. Oral and maxillofacial surgeons followed at roughly $317,000. Obstetricians and gynecologists came in near $296,000, general surgeons near $288,000, and orthodontists near $270,000. Psychiatrists averaged approximately $249,000.

Outside medicine and dentistry, chief executives were the highest-paid group, at a mean annual wage of around $246,000. Architectural and engineering managers averaged approximately $158,000, petroleum engineers approximately $157,000, and air traffic controllers approximately $132,000.
Within Computer and Mathematical (15-0000), wages at the detailed occupation level are widely dispersed. The spread appears to have widened over the past decade as artificial intelligence and cloud infrastructure roles have commanded large premiums. Computer user support specialists (15-1232) sit near the bottom of the range within the major group. Senior software architects and machine learning engineers sit near the top; they are typically captured under Software Developers (15-1252) or Computer and Information Research Scientists (15-1221).

Even six-digit detailed codes can mask this internal dispersion, since a junior developer and a senior architect may share one code. This is one reason researchers often combine OEWS data with household survey sources like the Current Population Survey when studying within-occupation wage inequality.
Geographic Coverage: 590+ Areas From National to Nonmetro
OEWS publishes wage estimates at six levels of geographic aggregation.

At the national level, data are published both across all industries combined and separately for each of the twenty NAICS sectors. This lets users observe, for example, what accountants earn in Finance and Insurance versus Manufacturing. At the state level, all fifty states, the District of Columbia, and Puerto Rico receive their own files, and Guam and the US Virgin Islands are covered as territories.

Below the state level, OEWS covers several hundred Metropolitan Statistical Areas and their metropolitan divisions. It also covers nonmetropolitan areas (the balance of each state outside MSA boundaries) and, in some supplemental publications, combined statistical areas.
Metropolitan divisions are a special case. The Office of Management and Budget subdivides the largest MSAs into metropolitan divisions; the New York-Newark-Jersey City area is the canonical example. OEWS reports estimates for these divisions because labor markets within a single large MSA can differ substantially. A nurse working in Nassau County faces a different local wage environment than one working in the Bronx, even though both fall within the broader New York metro area.
The geographic wage variation captured by OEWS is one of its most practically important features. Three examples:

- Software developers (SOC 15-1252): the San Jose-Sunnyvale-Santa Clara, CA MSA, the core of Silicon Valley, consistently reports mean annual wages near $176,000 against a national mean near $124,000. That is a gap of about $52,000, or roughly 42 percent.
- Registered nurses (29-1141): the San Francisco-Oakland-Hayward MSA reports mean annual wages near $138,000 against a national mean near $82,000, a 68 percent premium.
- Elementary school teachers (25-2021): Hawaii reports mean annual wages near $81,000, while Mississippi reports near $47,000. That gap likely owes more to state and local school funding than to labor market tightness.
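The dollar gaps and percent premiums quoted above come from a one-line calculation. This sketch uses the rounded figures from the preceding paragraph; the function name is hypothetical. It deliberately ignores cost-of-living differences, which can shrink or even reverse a nominal premium.

```python
def location_premium(local_mean: float, national_mean: float) -> tuple[float, float]:
    """Return the dollar gap and the percent premium of a local mean wage."""
    gap = local_mean - national_mean
    return gap, 100 * gap / national_mean


for label, local, national in [
    ("Software developers, San Jose", 176_000, 124_000),
    ("Registered nurses, San Francisco", 138_000, 82_000),
]:
    gap, pct = location_premium(local, national)
    print(f"{label}: ${gap:,.0f} gap, {pct:.0f}% premium")
```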
The rural-urban wage premium for professional and technical occupations typically runs 15–25 percent when comparing MSA workers to their nonmetropolitan counterparts. The premium is largest for occupations concentrated in industry clusters that are themselves geographically concentrated: legal services in New York, oil and gas extraction in the Permian Basin, biomedical research in the Boston-Cambridge area. Some service occupations have demand that is spread roughly in proportion to population, such as home health aides, food service workers, and janitors. For these, the urban premium is smaller and sometimes reversed once cost-of-living differences are accounted for.
Data suppression affects many geographic cells. When an occupation in a specific MSA employs too few workers, or when a single employer would be identifiable from the wage estimate, BLS suppresses the cell. Suppressed cells display “N/A” or a special character in the flat file. Researchers working with OEWS data must handle suppression carefully, because it is not random: suppressed cells are disproportionately small areas, rare occupations, or highly concentrated industries, all of which may be the precise observations of interest for a given analysis.
OEWS Data Structure and Flat File Fields
BLS distributes OEWS data as flat files — one row per occupation-area-industry combination — rather than a relational database. Understanding the field layout is essential for working with the data programmatically.
The area_type field identifies the geographic level: 1 for national, 2 for state, 3 for US territory, 4 for metropolitan statistical area, and 6 for nonmetropolitan area. The naics field holds the NAICS industry code, or "000000" for cross-industry totals, and the naics_title field provides a human-readable label. The own_code field identifies the ownership category covered by the row, such as private industry, government, or combinations of the two. The exact codes are listed in the field-description file included with each release.
The occupation fields are occ_code (six-digit SOC code), occ_title (occupational title), and o_group. The o_group field indicates whether the row represents a major group, minor group, broad occupation, or detailed occupation; the all-occupations row is labeled total. Filtering to o_group = 'detailed' gives the most granular occupation-level estimates, and filtering to o_group = 'major' gives the major-group summary.
The tot_emp field reports employment as a count of jobs (not thousands), rounded to the nearest 10. The emp_prse field gives the percent relative standard error (PRSE) of the employment estimate, a measure of sampling uncertainty. Employment estimates with PRSE above 50 percent should be treated with caution.

The wage fields come in hourly and annual variants:

- h_mean and a_mean for mean wages;
- h_median and a_median for median wages;
- h_pct10, h_pct25, h_pct75, and h_pct90 for hourly percentiles, with the annual equivalents prefixed a_.

The mean_prse field gives the percent relative standard error of the mean wage estimate.
Three special symbols appear throughout the data:

- A hash symbol (#) in a wage field means the wage is at or above the top-coding threshold, so the exact value is not published. The threshold was $100.00 per hour ($208,000 per year) in older releases and is $115.00 per hour ($239,200 per year) in recent ones.
- An asterisk (*) means a wage estimate is not available.
- A double asterisk (**) means an employment estimate is not available.

Any data pipeline working with OEWS must handle these non-numeric values before attempting arithmetic operations.
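One way to handle the symbols is to treat each of them as missing while keeping the symbol itself as a flag. That way suppressed and top-coded cells can be counted later instead of silently disappearing. The function name below is hypothetical. The symbol set covers only the three markers named above, so check the field-description file for a given release before relying on it.

```python
import math
from typing import Optional, Tuple

# Non-numeric markers that can appear in OEWS wage and employment cells.
# Their meanings are defined in each release's field-description file.
SPECIAL_SYMBOLS = {"*", "**", "#"}


def parse_oews_value(raw: object) -> Tuple[float, Optional[str]]:
    """Convert one OEWS cell to (value, flag).

    value is NaN when the cell holds a special symbol or cannot be parsed;
    flag records the symbol (or "unparseable") so suppression can be counted.
    """
    text = str(raw).strip()
    if text in SPECIAL_SYMBOLS:
        return math.nan, text
    try:
        # Thousands separators appear in some exported files
        return float(text.replace(",", "")), None
    except ValueError:
        return math.nan, "unparseable"


for cell in ["104,000", "#", "**", "57.25", "n/a"]:
    print(repr(cell), parse_oews_value(cell))
```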
Accessing OEWS: Bulk Downloads and the BLS API
The primary access path for OEWS is bulk file download. BLS publishes annual zip archives at bls.gov/oes/tables.htm for every release going back to 1999. Each annual zip contains separate flat files for national data (all industries combined and by industry sector), state-level data, MSA-level data, and nonmetropolitan area data. The national all-industry file is typically the starting point for cross-occupation analysis because it gives the cleanest comparison baseline. State and MSA files are structured identically and can be concatenated for multi-area analysis.
BLS also provides an Excel version of the national data (oes_research_estimates_allxls.xlsx) for users who prefer a spreadsheet interface. Individual state files follow the naming convention oesi_<state-fips>_M<year>_dl.xlsx. A browser-based query tool at bls.gov/oes/oes_emp.htm allows point-and-click access for users who need a small number of specific estimates rather than the full dataset.
The BLS public API (version 2) at api.bls.gov/publicAPI/v2/timeseries/data can in principle serve OEWS data. Each OEWS time series has a structured 25-character ID beginning with the prefix OEU. The remaining components encode the area type, area code, industry code, occupation code, and data type (employment, mean wage, or a specific percentile).

In practice, constructing valid series IDs requires careful cross-referencing of BLS area and occupation code tables. The API's rate limits and payload restrictions also make it less practical than bulk download for any analysis covering more than a few occupations or areas. The API remains useful for monitoring a small set of specific estimates over time without downloading the full dataset each year, for example the mean wage of software developers in three specific MSAs across annual releases.
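Building series IDs by hand is error-prone, so a small builder that checks each component's width helps. The sketch below assumes the following 25-character layout from BLS's series-ID format documentation:

- the prefix OE;
- seasonal code U;
- a one-letter area type;
- a 7-digit area code;
- a 6-digit industry code;
- a 6-digit SOC code without the hyphen;
- a 2-digit data-type code.

The data-type codes and the zero-padded CBSA area code are assumptions to verify against the BLS code tables. `oews_series_id` and `api_payload` are hypothetical helpers. The JSON field names follow the API's v2 request format.

```python
import json
from typing import List, Optional

# Assumed component widths, in order, after the "OEU" prefix
FIELD_WIDTHS = {"area_type": 1, "area_code": 7, "industry": 6, "occupation": 6, "datatype": 2}

# Assumed data-type codes; check the BLS OEWS data-type table for the full list
DATATYPES = {"employment": "01", "annual_mean": "04", "annual_median": "13"}


def oews_series_id(area_type: str, area_code: str, industry: str,
                   occ_code: str, datatype: str) -> str:
    """Assemble an OEWS series ID, checking each component's width."""
    parts = {
        "area_type": area_type,
        "area_code": area_code,
        "industry": industry,
        "occupation": occ_code.replace("-", ""),
        "datatype": datatype,
    }
    for name, width in FIELD_WIDTHS.items():
        if len(parts[name]) != width:
            raise ValueError(f"{name} must be {width} characters, got {parts[name]!r}")
    return "OEU" + "".join(parts[name] for name in FIELD_WIDTHS)


def api_payload(series_ids: List[str], start_year: int, end_year: int,
                key: Optional[str] = None) -> str:
    """Build the JSON body for a v2 timeseries request."""
    body = {"seriesid": series_ids, "startyear": str(start_year), "endyear": str(end_year)}
    if key:
        body["registrationkey"] = key
    return json.dumps(body)


# Mean annual wage of software developers in the San Jose MSA (CBSA 41940),
# assuming MSA area codes are the CBSA code left-padded with zeros to 7 digits
sid = oews_series_id("M", "0041940", "000000", "15-1252", DATATYPES["annual_mean"])
print(sid)
print(api_payload([sid], 2023, 2023, key="YOUR_API_KEY"))
```

To send the request, POST the payload to https://api.bls.gov/publicAPI/v2/timeseries/data/ with a JSON content type. With the requests library, that would be `requests.post(url, data=payload, headers={"Content-type": "application/json"})`.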
Full API access requires a free registration key from BLS. Requests without a key are limited to 25 series per query and 25 queries per day. Registered key holders can request 50 series per query and make 500 queries per day. Responses are JSON, with each series returning its data for the requested range of years.
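Because each query accepts a limited number of series, long lists of IDs have to be split into batches. Here is a minimal batching sketch, assuming the per-query limits stated above (25 series without a key, 50 with one). `batched_series` is a hypothetical helper.

```python
from typing import Iterable, Iterator, List


def batched_series(series_ids: Iterable[str], registered: bool) -> Iterator[List[str]]:
    """Yield batches of series IDs no larger than the per-query limit."""
    size = 50 if registered else 25
    batch: List[str] = []
    for sid in series_ids:
        batch.append(sid)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final partial batch
        yield batch


ids = [f"SERIES{i:03d}" for i in range(120)]
print([len(b) for b in batched_series(ids, registered=True)])
print([len(b) for b in batched_series(ids, registered=False)])
```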
Industry-Occupation Wage Matrix
One of the less frequently discussed features of OEWS is its industry-specific wage data. For each detailed occupation in the national file, BLS publishes wage estimates separately for each of the major NAICS industry sectors where the occupation appears in sufficient numbers. This creates an industry-occupation matrix that reveals wage premiums associated with specific industries for the same underlying occupation.
Software developers (15-1252) serve as a useful illustration. The mean annual wage for software developers in Finance and Insurance typically runs $15,000–$25,000 above the national cross-industry mean, reflecting both the density of high-frequency trading and fintech applications and the profit margins of financial firms. Software developers in Manufacturing typically earn somewhat less than the national mean, reflecting both the types of software work — often embedded systems and process control — and the lower wage norms of manufacturing relative to tech-sector firms.
Registered nurses show a different pattern. Nurses in hospitals command the highest wages within the profession, followed by nurses in outpatient care centers, then physician offices. This hierarchy reflects the acuity of care required in hospital settings and differences in collective bargaining coverage between hospital and ambulatory settings. Shift differentials and overtime are common in inpatient work, but they cannot account for the gap in OEWS figures, because OEWS wages exclude both.
Accountants and auditors (13-2011) show perhaps the most pronounced industry wage gap. Those employed in Securities, Commodity Contracts, and Other Financial Investment activities earn substantially above those in Manufacturing or Educational Services. The gap reflects both the complexity of financial reporting in investment-intensive industries and the rent-sharing that occurs when high-revenue firms share profits with professional staff.
Lawyers (23-1011) who work for legal services firms — the classic law firm partnership structure — earn above the national mean for the occupation, but lawyers who work in-house at large corporations or in government earn substantially less. OEWS captures this split clearly because industry classification is based on the employer establishment rather than the nature of the legal work performed.
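Each industry-occupation comparison above reduces to one calculation: an industry's mean wage relative to the occupation's cross-industry mean. The pandas sketch below makes two assumptions. The DataFrame has lowercase occ_code, naics, naics_title, and numeric a_mean columns, and the cross-industry row is coded "000000". The function name and the toy numbers are hypothetical, not BLS estimates.

```python
import pandas as pd


def industry_premiums(df: pd.DataFrame, occ_code: str) -> pd.Series:
    """Percent gap between each industry's mean wage and the cross-industry mean."""
    occ = df[df["occ_code"] == occ_code]
    # Cross-industry baseline row, assumed to carry naics == "000000"
    baseline = occ.loc[occ["naics"] == "000000", "a_mean"].iloc[0]
    by_industry = occ[occ["naics"] != "000000"].set_index("naics_title")["a_mean"]
    return (100 * (by_industry - baseline) / baseline).sort_values(ascending=False)


# Toy rows for illustration only; these are not BLS estimates
toy = pd.DataFrame({
    "occ_code": ["15-1252"] * 3,
    "naics": ["000000", "52", "31-33"],
    "naics_title": ["Cross-industry", "Finance and Insurance", "Manufacturing"],
    "a_mean": [100_000.0, 120_000.0, 95_000.0],
})
print(industry_premiums(toy, "15-1252"))
```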
Employment Projections Linkage
OEWS employment estimates are a primary input to the BLS Employment Projections (EP) program, which publishes 10-year occupational outlooks, now updated annually. The National Employment Matrix (NEM) links base-year employment by occupation to projected employment levels, growth rates, and job openings. In the 2022–2032 round, those projections run to 2032.
The 2022–2032 projections round, published in 2023, showed a labor market undergoing significant structural change:

- Wind turbine service technicians led all occupations in projected percent growth, at approximately 60 percent over the decade.
- Nurse practitioners were projected to grow roughly 46 percent, reflecting the continued expansion of primary care delivered by advanced practice nurses in response to physician shortages.
- Data scientists were projected to grow approximately 35 percent.
- Information security analysts were projected to grow approximately 32 percent.

Each rate is the projected change in employment divided by base-year employment. The same percentage therefore implies far more new jobs in a large occupation than in a small one.
On the declining side, travel agents faced a projected decline of roughly 26 percent as online booking platforms continue to replace agent-mediated transactions. Word processors and typists faced a projected 15 percent decline. Postal service workers faced a projected 16 percent decline, driven by continued erosion of first-class mail volume. Printing machine operators faced a projected decline of roughly 20 percent.
In absolute job numbers, the projections highlighted the scale of care economy demand. Home health and personal care aides were projected to add approximately 924,000 jobs — the single largest absolute increase of any occupation — driven by the aging of the baby boom cohort. Fast food and counter workers were projected to add approximately 795,000 jobs. Software developers and software quality assurance analysts, taken together, were projected to add approximately 370,000 jobs.
The NEM also distinguishes between two sources of job openings. Growth openings are net new positions created by expanding demand. Replacement openings are positions vacated by workers retiring, changing occupations, or leaving the labor force. For most occupations, replacement needs generate more annual openings than growth. That holds even for home health and personal care aides, whose large projected growth is outweighed by high turnover, and for electricians, where replacement needs likewise dominate. This distinction matters for workforce planning: an occupation with flat growth but high replacement needs still generates substantial hiring activity.
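The growth-versus-replacement split comes down to simple arithmetic. The class below assumes that annual average openings equal three components added together:

- the annual average change in employment;
- annual labor-force exits;
- annual occupational transfers.

The class name and the toy inputs are hypothetical, chosen only to show how replacement needs can outweigh even brisk growth.

```python
from dataclasses import dataclass


@dataclass
class Projection:
    base_employment: float       # employment in the base year
    projected_employment: float  # employment in the target year
    annual_exits: float          # workers leaving the labor force each year
    annual_transfers: float      # workers moving to other occupations each year
    years: int = 10

    @property
    def percent_change(self) -> float:
        return 100 * (self.projected_employment - self.base_employment) / self.base_employment

    @property
    def annual_growth_openings(self) -> float:
        return (self.projected_employment - self.base_employment) / self.years

    @property
    def annual_replacement_openings(self) -> float:
        return self.annual_exits + self.annual_transfers

    @property
    def annual_openings(self) -> float:
        return self.annual_growth_openings + self.annual_replacement_openings


# Toy occupation: 20 percent growth, yet replacement openings dominate
p = Projection(100_000, 120_000, annual_exits=6_000, annual_transfers=9_000)
print(p.percent_change, p.annual_growth_openings, p.annual_replacement_openings, p.annual_openings)
```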
Data Reliability and Key Limitations
The PRSE fields in OEWS data are there for a reason. Wage estimates for occupations with small employment in a given area can carry PRSE values above 50 percent, meaning the confidence interval around the estimate is wide. Cross-area comparisons of rare occupations in small MSAs should be treated skeptically unless PRSE is low. BLS recommends flagging and disclosing PRSE values above 50 percent in any published analysis.
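PRSE converts directly into an approximate confidence interval, because the standard error equals the estimate times PRSE divided by 100. The sketch below assumes a normal approximation with z = 1.96 for a 95 percent interval. The helper names and example figures are hypothetical.

```python
def prse_interval(estimate: float, prse: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval from an estimate and its PRSE (in percent)."""
    standard_error = estimate * prse / 100
    return estimate - z * standard_error, estimate + z * standard_error


def is_unreliable(prse: float, threshold: float = 50.0) -> bool:
    """Flag estimates whose PRSE exceeds the threshold discussed above."""
    return prse > threshold


low, high = prse_interval(80_000, 5.0)
print(f"${low:,.0f} to ${high:,.0f}")
print(is_unreliable(55.0), is_unreliable(12.0))
```

At a PRSE of 50, the same calculation gives an interval running from roughly 2 percent to 198 percent of the estimate. That width is why such estimates warrant caution.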
Several structural limitations of OEWS bear on what the data can and cannot support. First, wages are straight-time gross pay for wage-and-salary workers:

- Included: base rates, cost-of-living allowances, commissions, production bonuses, and tips.
- Excluded: overtime pay, shift differentials, nonproduction bonuses such as year-end awards, and employer-paid benefits.

For some occupations, excluded pay is a large share of total compensation. Examples include financial occupations with large discretionary bonus pools and jobs with heavy overtime. For these, OEWS understates total compensation substantially. For comparisons that need to capture total compensation costs to employers, the BLS National Compensation Survey (NCS) and Employment Cost Index (ECI) provide benefits data that OEWS does not.
The self-employed are entirely excluded. For occupations with high self-employment rates — physicians in private practice, lawyers running their own firms, construction contractors, photographers, consultants — OEWS represents only the employed segment of the occupation. Independent contractors classified as self-employed are similarly excluded, which creates growing undercoverage in occupations where gig and platform work has expanded.
Industry classification in OEWS is based on the establishment where a worker is employed, following NAICS. A software developer working in a hospital's IT department is classified in Health Care and Social Assistance, not in the Information sector. Cross-industry comparisons therefore show how employers in different industries pay the same occupation. They reveal little about differences in the work itself when the occupation is incidental to the employer's primary activity.
Because OEWS is an establishment survey, it counts jobs rather than people. A worker who holds two payroll jobs is reported by each employer and so appears twice, possibly in two different occupations. Side work done as self-employment is not captured at all. For occupations where such side work is common, such as teachers who tutor privately or musicians who also teach lessons, OEWS undercounts the full scope of occupational participation.
Python Example: Wage Comparison Analysis
The following script downloads the OEWS national cross-industry zip file from BLS, loads the flat file into a pandas DataFrame, and runs three analyses: mean annual wage by major occupation group, the top 20 Computer and Mathematical detailed occupations by mean wage, and wage percentile spreads (90th minus 10th) for healthcare versus technology occupations. The BLS zip structure changes slightly between annual releases; the script includes logic to identify the correct file within the zip.
```python
import io
import zipfile

import pandas as pd  # reading .xlsx with pd.read_excel also requires openpyxl
import requests

# ---------------------------------------------------------------
# BLS OEWS wage analysis: national cross-industry flat file
# Source: https://www.bls.gov/oes/tables.htm
# ---------------------------------------------------------------
ALL_NATIONAL_URL = (
    "https://www.bls.gov/oes/special.requests/"
    "oesm23nat.zip"
)
# bls.gov tends to reject scripted requests that lack a descriptive
# User-Agent; replace the contact string with your own.
HEADERS = {"User-Agent": "oews-wage-analysis (contact: you@example.com)"}

print("Downloading OEWS national data zip...")
resp = requests.get(ALL_NATIONAL_URL, headers=HEADERS, timeout=120)
resp.raise_for_status()

with zipfile.ZipFile(io.BytesIO(resp.content)) as zf:
    # The cross-industry national file is named like national_M2023_dl.xlsx;
    # match case-insensitively because names shift slightly between releases.
    target = [
        n for n in zf.namelist()
        if "national_m2023" in n.lower() and n.lower().endswith(".xlsx")
    ]
    if not target:
        # Fall back to listing available files
        print("Files in zip:", zf.namelist())
        raise SystemExit("Could not locate national flat file; check BLS zip structure.")
    fname = target[0]
    print(f"Reading: {fname}")
    # Buffer the member in memory so the Excel reader gets a seekable file
    df = pd.read_excel(io.BytesIO(zf.read(fname)), dtype=str)

# Normalize column names to lowercase. The flat files call the employment
# column TOT_EMP, so alias it to "emp" for the analyses below.
df.columns = [str(c).strip().lower() for c in df.columns]
df = df.rename(columns={"tot_emp": "emp"})

# Convert numeric columns; special symbols such as "*", "**", "#" become NaN
numeric_cols = [
    "emp", "h_mean", "a_mean", "h_median", "a_median",
    "h_pct10", "h_pct25", "h_pct75", "h_pct90",
    "a_pct10", "a_pct25", "a_pct75", "a_pct90",
]
for col in numeric_cols:
    if col in df.columns:
        df[col] = pd.to_numeric(df[col], errors="coerce")

print(f"\nTotal rows loaded: {len(df):,}")
print(f"Columns: {list(df.columns)}")

# ---------------------------------------------------------------
# 1. Mean annual wage by major occupation group
# ---------------------------------------------------------------
major = df[df["o_group"] == "major"].copy()
major_wages = (
    major[["occ_title", "emp", "a_mean"]]
    .dropna(subset=["a_mean"])
    .sort_values("a_mean", ascending=False)
)
print("\n=== Mean Annual Wage by Major Occupation Group ===")
print(f"{'Occupation Group':<45} {'Mean Annual Wage':>18} {'Employment':>14}")
print("-" * 80)
for _, row in major_wages.iterrows():
    emp_str = f"{int(row['emp']):,}" if pd.notna(row["emp"]) else "N/A"
    print(f"{row['occ_title']:<45} ${row['a_mean']:>17,.0f} {emp_str:>14}")

# ---------------------------------------------------------------
# 2. Top-20 Computer and Mathematical detailed occupations
# ---------------------------------------------------------------
comp_math = df[
    (df["o_group"] == "detailed") &
    (df["occ_code"].str.startswith("15-", na=False))
].copy()
top_comp = (
    comp_math[["occ_code", "occ_title", "emp", "a_mean", "a_median"]]
    .dropna(subset=["a_mean"])
    .sort_values("a_mean", ascending=False)
    .head(20)
)
print("\n=== Top 20: Computer & Mathematical Occupations by Mean Annual Wage ===")
print(f"{'SOC Code':<12} {'Occupation':<45} {'Mean Wage':>12} {'Median':>12} {'Emp':>10}")
print("-" * 95)
for _, row in top_comp.iterrows():
    emp_str = f"{int(row['emp']):,}" if pd.notna(row["emp"]) else "N/A"
    med_str = f"${row['a_median']:,.0f}" if pd.notna(row["a_median"]) else "N/A"
    print(
        f"{row['occ_code']:<12} {row['occ_title']:<45} "
        f"${row['a_mean']:>11,.0f} {med_str:>12} {emp_str:>10}"
    )

# ---------------------------------------------------------------
# 3. Wage percentile spread: healthcare vs. technology
# ---------------------------------------------------------------
def wage_spread(df_in, soc_prefix, group_label):
    """Print the five detailed occupations with the widest P90-P10 gap."""
    subset = df_in[
        (df_in["o_group"] == "detailed") &
        (df_in["occ_code"].str.startswith(soc_prefix, na=False))
    ].dropna(subset=["a_pct10", "a_pct90"])
    if subset.empty:
        print(f"  No data for {group_label}")
        return
    subset = subset.copy()
    subset["spread"] = subset["a_pct90"] - subset["a_pct10"]
    top_spread = subset.nlargest(5, "spread")[
        ["occ_title", "a_pct10", "a_pct90", "spread"]
    ]
    print(f"\n--- {group_label}: Widest Wage Spreads (90th minus 10th pct) ---")
    print(f"{'Occupation':<45} {'P10':>12} {'P90':>12} {'Spread':>12}")
    print("-" * 85)
    for _, row in top_spread.iterrows():
        print(
            f"{row['occ_title']:<45} "
            f"${row['a_pct10']:>11,.0f} "
            f"${row['a_pct90']:>11,.0f} "
            f"${row['spread']:>11,.0f}"
        )


print("\n=== Wage Percentile Spread by Sector ===")
wage_spread(df, "29-", "Healthcare Practitioners (29-xxxx)")
wage_spread(df, "15-", "Computer and Mathematical (15-xxxx)")
```
The PRSE columns should be inspected for any estimates that feed into downstream analysis. Rows where the wage fields contain BLS special characters (“*”, “#”, “**”) will coerce to NaN under pd.to_numeric(..., errors='coerce'), which is the correct behavior for analysis — but analysts should count and log how many rows were lost to suppression in each occupation or area subset, since suppression is not random.
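The suppression counts recommended above can be built from the raw string columns before numeric conversion. This sketch assumes the file was read with dtype=str and has lowercase column names. `suppression_report` is a hypothetical helper, and the toy rows are illustrative values only.

```python
import pandas as pd


def suppression_report(raw: pd.DataFrame, value_col: str = "a_mean") -> pd.DataFrame:
    """Count non-numeric cells in value_col by SOC major group.

    `raw` must be the flat file as read with dtype=str, before numeric
    conversion, so special symbols are still present as strings.
    """
    suppressed = pd.to_numeric(raw[value_col], errors="coerce").isna()
    report = (
        raw.assign(major=raw["occ_code"].str[:2] + "-0000", suppressed=suppressed)
        .groupby("major")["suppressed"]
        .agg(rows="count", suppressed="sum")
    )
    report["share"] = report["suppressed"] / report["rows"]
    return report.sort_values("share", ascending=False)


# Toy rows for illustration only
toy = pd.DataFrame({
    "occ_code": ["15-1252", "15-1253", "29-1141", "29-1229", "29-1171"],
    "a_mean": ["100000", "#", "90000", "*", "120000"],
})
print(suppression_report(toy))
```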
For geographic wage comparison, the same flat-file approach applies to the state-level and MSA-level zip archives. Concatenating national, state, and MSA files — after tagging each row with its area_type — creates a single DataFrame suitable for multilevel geographic analysis. Memory usage for the full combined dataset across all areas and occupations typically runs 2–4 GB, making chunked loading or DuckDB-backed queries preferable to naive pandas concatenation for large-scale studies.
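Here is a pandas-only sketch of the concatenation step, assuming each frame already has lowercase column names. Converting repeated text columns to the categorical dtype is one way to cut memory before reaching for DuckDB. The function name, the `source` column, and the toy frames are hypothetical.

```python
import pandas as pd

# Text columns repeated across many rows; categoricals store each value once
REPEATED_TEXT_COLS = ["area_title", "naics_title", "occ_code", "occ_title", "o_group"]


def combine_levels(frames: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Concatenate per-level OEWS frames, tagging rows with their source file."""
    tagged = [frame.assign(source=name) for name, frame in frames.items()]
    combined = pd.concat(tagged, ignore_index=True)
    for col in REPEATED_TEXT_COLS + ["source"]:
        if col in combined.columns:
            combined[col] = combined[col].astype("category")
    return combined


# Toy frames standing in for the national and state files
national = pd.DataFrame({"area_type": ["1"], "occ_code": ["15-1252"], "a_mean": [1.0]})
state = pd.DataFrame({
    "area_type": ["2", "2"],
    "occ_code": ["15-1252", "29-1141"],
    "a_mean": [2.0, 3.0],
})
combined = combine_levels({"national": national, "state": state})
print(combined.dtypes)
print(combined.memory_usage(deep=True).sum(), "bytes")
```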
For real wage analysis, OEWS wage data is typically deflated using BLS CPI series — both are core BLS data products and are designed to be used together: BLS CPI: The Federal Inflation Database Used to Deflate US Economic Indicators.
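Deflating a nominal wage means multiplying by a ratio of price indexes: real wage = nominal wage × CPI(base) / CPI(t). The sketch below assumes the analyst has already chosen matching CPI values, for example a May index to match the OEWS reference month, or an annual average. The CPI figures in the example are placeholders, not published index values.

```python
def to_real(nominal_wage: float, cpi_t: float, cpi_base: float) -> float:
    """Express a wage from period t in base-period dollars."""
    return nominal_wage * cpi_base / cpi_t


# Placeholder wages and index values, not published figures
wages = {2019: 100_000.0, 2023: 115_000.0}
cpi = {2019: 250.0, 2023: 300.0}
for year, wage in wages.items():
    print(year, round(to_real(wage, cpi[year], cpi[2019])))
```

In this toy case the nominal wage rises 15 percent while the real wage falls. That divergence is what deflation is meant to expose.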