
HMDA Mortgage Lending Data: The Federal Database Behind 15 Million Annual Mortgage Applications

January 17, 2027 · 22 min read · AI Analytics

HMDA · Mortgages · Fair Lending · Housing Finance · Federal Data

The Home Mortgage Disclosure Act requires most US mortgage lenders (those above the coverage thresholds described below) to report every covered loan application, including applicant race, income, property location, loan amount, interest rate, action taken, and denial reason. The result is the most comprehensive public dataset on mortgage lending disparities and fair lending compliance.

What HMDA is

The Home Mortgage Disclosure Act was enacted in 1975 and is codified at 12 U.S.C. §§ 2801–2810. Its original purpose was to give community groups, regulators, and researchers the data needed to identify whether financial institutions were serving the credit needs of the communities in which they operated — or were systematically avoiding minority and low-income neighborhoods in a practice Congress recognized as redlining.

Rulemaking authority was held by the Federal Reserve Board from 1975 until the Dodd-Frank Wall Street Reform and Consumer Protection Act of 2010 transferred it to the newly created Consumer Financial Protection Bureau. The CFPB's implementing regulation is at 12 CFR Part 1003. The FFIEC — the Federal Financial Institutions Examination Council, whose member agencies include the CFPB, OCC, FDIC, Federal Reserve, and NCUA — administers the annual data collection and publishes the Loan Application Register (LAR) data at ffiec.cfpb.gov.

The coverage thresholds as of the current rule: depository institutions (banks, savings associations, credit unions) with total assets exceeding $54 million (a figure adjusted annually for inflation) that originated at least one home purchase loan in the preceding calendar year; non-depository institutions (independent mortgage companies) that originated at least 25 closed-end mortgage loans or 200 open-end lines of credit in each of the two preceding calendar years (the open-end threshold was 500 through 2021). Approximately 6,700 financial institutions file HMDA data annually, reporting roughly 15 million loan application records per year. The LAR is filed by March 1 for the prior calendar year and made public by the FFIEC shortly thereafter.

The statute's three stated purposes are: to enable the public and regulators to detect discriminatory lending patterns including redlining; to assist government entities in directing home improvement and public investment funds to areas of unmet credit need; and to identify whether financial institutions are serving the housing needs of their local communities under the Community Reinvestment Act framework. HMDA itself does not prohibit any lending practice — it is a disclosure statute. But disclosure is the foundation on which fair lending enforcement is built.

What HMDA covers

A covered loan is any closed-end mortgage loan or open-end line of credit secured by a dwelling, whether it is used for home purchase, home improvement, refinancing, or another purpose. The main purpose and product categories are:

Application type	Description
Home purchase	Loan to finance the purchase of a dwelling; the most analytically significant category for fair lending analysis
Refinance — rate/term	Replaces an existing mortgage without extracting equity; borrower obtains a lower rate, shorter term, or both
Refinance — cash-out	Replaces an existing mortgage and extracts equity; loan amount exceeds the original balance plus closing costs
Home improvement	Loan to improve a dwelling; may be secured or unsecured
HELOC	Home equity line of credit; revolving credit secured by the borrower's equity; reported separately as open-end coverage under the post-2018 expanded rule

Four loan types are tracked, corresponding to the insuring or guaranteeing agency: conventional (no federal insurance), FHA (Federal Housing Administration), VA (Department of Veterans Affairs), and RHS/FSA (Rural Housing Service / Farm Service Agency, i.e. USDA rural loans). The loan type distribution is analytically significant because FHA and VA loans carry different risk profiles, down payment requirements, and fee structures — and because racial disparities in FHA vs. conventional loan mix, controlling for creditworthiness, are a marker of reverse redlining (steering minority borrowers into higher-cost loan products).

Property types covered include single-family (1-4 unit), multifamily (5+ units), manufactured homes (both titled as personal property and affixed as real property), and site-built. Each action taken on an application is reported using a standardized code: originated (1), approved but not accepted by the borrower (2), denied (3), withdrawn by the applicant (4), file closed for incompleteness (5), purchased loan (6 — secondary market acquisitions), preapproval request denied (7), or preapproval request approved but not accepted (8).

The geographic unit of analysis is the census tract — an 11-digit FIPS code combining the 2-digit state FIPS, 3-digit county FIPS, and 6-digit tract number. Census tracts average about 4,000 residents and are the standard unit for redlining analysis because they are stable enough to carry historical demographic data from the decennial census and the American Community Survey while being granular enough to capture neighborhood-level lending patterns.
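Because the tract code is a fixed-width concatenation, it can be split by position. The following is a minimal sketch; the helper name `split_census_tract` and the sample value are illustrative and not part of any HMDA tooling.

```python
from typing import NamedTuple, Optional


class TractParts(NamedTuple):
    state_fips: str   # first 2 digits
    county_fips: str  # next 3 digits
    tract: str        # last 6 digits


def split_census_tract(code: str) -> Optional[TractParts]:
    """Split an 11-digit tract code by position; return None if malformed."""
    code = code.strip()
    # Keep the value as a string: converting to int would drop leading zeros
    # (state FIPS codes below 10 start with "0").
    if len(code) != 11 or not (code.isascii() and code.isdigit()):
        return None
    return TractParts(code[:2], code[2:5], code[5:])


# Illustrative value only: state 48, county 201, tract 312100
print(split_census_tract("48201312100"))
print(split_census_tract("NA"))  # non-numeric placeholders yield None
```

Splitting on the state-plus-county prefix (the first five digits) is usually enough to roll tract-level counts up to counties before joining to other geographic data.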

Key data fields: the post-2018 expanded schema

The CFPB's 2015 HMDA rule, which took effect for data collected beginning January 1, 2018, substantially expanded the LAR data fields. Pre-2018 HMDA data contains demographic and geographic fields but lacks the underwriting and pricing variables needed for statistical disparate impact analysis. Post-2018 data is far richer. Key fields:

Field	Notes
applicant_race_1 – applicant_race_5	Up to five race codes per applicant, allowing multi-race reporting. Race codes: 1=American Indian/Alaska Native, 2=Asian, 3=Black or African American, 4=Native Hawaiian/Pacific Islander, 5=White, 6=information not provided, 7=not applicable. Co-applicant races in `co_applicant_race_1` through `co_applicant_race_5`.
applicant_ethnicity_1	Hispanic or Latino origin reported separately from race per OMB standards. Code 1=Hispanic/Latino, 2=Not Hispanic/Latino. Ethnicity and race are independent fields; an applicant can be reported as White and Hispanic simultaneously.
applicant_sex	1=Male, 2=Female, 3=Information not provided, 4=Not applicable, 6=Applicant selected both male and female.
applicant_age	Age bracket: <25, 25-34, 35-44, 45-54, 55-64, 65-74, >74. Used in age discrimination analysis under the ECOA.
income	Gross annual income in $1,000s, as reported by the applicant. 999 = not applicable. Used to segment denial rate analysis by income band.
loan_amount	Loan amount in dollars (post-2018 full dollar amount; pre-2018 data in $1,000s). For applications, the requested amount; for originations, the funded amount.
interest_rate	Note rate at closing, as a percentage. Post-2018 only. NA for adjustable-rate mortgages where the initial rate is not fixed for the full term, or where the lender is exempt from reporting.
loan_to_value_ratio	Combined LTV at origination, as a percentage. Post-2018 only. Key underwriting variable for controlling risk in disparity analysis.
debt_to_income_ratio	Ratio of total monthly debt payments to gross monthly income, expressed as a percentage or a range bucket (e.g., “20%-<30%”). Post-2018. Used extensively in regression-based fair lending analysis as a creditworthiness control.
denial_reason_1 – denial_reason_4	Up to four denial reasons. Codes: 1=Debt-to-income ratio, 2=Employment history, 3=Credit history, 4=Collateral, 5=Insufficient cash (down payment), 6=Unverifiable information, 7=Credit application incomplete, 8=Mortgage insurance denied, 9=Other, 10=Not applicable. Optional for most lenders before 2018; required under the expanded rule unless the institution is partially exempt.
census_tract	11-digit FIPS census tract code for the property address. The primary geographic unit for redlining analysis. Can be joined to ACS demographic data by tract to characterize neighborhood racial composition.
origination_charges	Total lender origination charges in dollars. Post-2018. Used with `lender_credits` and `discount_points` to compute net lender fee income and analyze pricing disparities by race.
automated_underwriting_system	AUS used: 1=Desktop Underwriter (Fannie Mae DU), 2=Loan Prospector/Loan Product Advisor (Freddie Mac LP), 3=Technology Open to Approved Lenders (TOTAL, FHA), 4=Guaranteed Underwriting System (USDA GUS), 5=Other, 6=Not applicable, 1111=Exempt. Post-2018.
manufactured_home_secured_property_type	1=Manufactured home and land, 2=Manufactured home and not land, 3=Not applicable. Distinguishes chattel loans (personal property title, higher rates, fewer consumer protections) from real property manufactured home loans.
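Because `debt_to_income_ratio` mixes exact percentages with range buckets, it usually needs normalizing before it can serve as a grouping variable. The sketch below is one way to do that. The band edges other than “20%-<30%” are assumptions chosen for illustration, as is the helper name `dti_band`; check them against the values that actually appear in the file being analyzed.

```python
from typing import Optional

# Values treated as "no usable DTI" (blank, not applicable, exempt filer)
NOT_REPORTED = {"", "NA", "Exempt"}


def dti_band(value: str) -> Optional[str]:
    """Map a raw debt_to_income_ratio string to a coarse band label."""
    value = value.strip()
    if value in NOT_REPORTED:
        return None
    try:
        pct = float(value)
    except ValueError:
        # Not a plain number, so assume it is already a range bucket
        # such as "20%-<30%" and pass it through unchanged.
        return value
    # Band edges below are illustrative assumptions, not an official scheme.
    if pct < 20:
        return "<20%"
    if pct < 30:
        return "20%-<30%"
    if pct < 36:
        return "30%-<36%"
    if pct < 50:
        return "36%-<50%"
    if pct <= 60:
        return "50%-60%"
    return ">60%"


for raw in ("43", "20%-<30%", "Exempt"):
    print(repr(raw), "->", dti_band(raw))
```

Returning `None` for exempt and not-applicable values keeps those records out of any band rather than silently placing them in the lowest one.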

Redlining and fair lending enforcement

Modern redlining analysis using HMDA data traces directly to the 1930s practice it is named for. The Home Owners' Loan Corporation, a New Deal agency that refinanced millions of defaulted mortgages, created neighborhood security maps for cities across the United States. Neighborhoods were graded A through D — with D-graded areas outlined in red on the maps. The criteria for a red (“hazardous”) designation included the racial composition of the neighborhood: Black neighborhoods were nearly universally graded D regardless of physical condition or income level. Private lenders and the FHA used these maps to decline mortgage applications in redlined areas, effectively preventing Black families from building home equity and intergenerational wealth in the decades when suburban homeownership was the primary mechanism of middle-class wealth formation.

Modern redlining — the version that DOJ and CFPB enforcement actions address — is defined not by explicit maps but by application and origination patterns. A lender engages in modern redlining when it generates substantially fewer applications and originations in majority-minority census tracts than peer lenders operating in the same metropolitan statistical area, after controlling for geographic market coverage, branch locations, and other operational factors. HMDA data is the evidentiary foundation for every modern redlining investigation because it provides application-level data by census tract for every covered lender in the market.

Major redlining settlements based on HMDA analysis include: Trustmark National Bank ($5 million, 2021, DOJ/CFPB, Memphis MSA), where the bank had virtually no applications in majority-Black census tracts despite those tracts being within its assessment area; Lakeland Bank ($13 million, 2022, DOJ, Newark MSA), with similarly suppressed application rates in majority-Black and Hispanic tracts; and City National Bank ($31 million, 2023, DOJ, Los Angeles MSA) — the largest redlining settlement in DOJ history — where the bank avoided communities of color across the country's second-largest mortgage market for years. In each case, the government's analysis compared the defendant lender's HMDA application-rate-per-minority-tract to the average of peer lenders operating in the same geography during the same period.
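The peer comparison described above can be approximated with public data. The sketch below is a simplified stand-in rather than the government's methodology: it pools all peer lenders instead of averaging them individually, it measures the share of applications falling in majority-minority tracts, and it assumes the set of such tracts has already been derived from ACS data. The function name, LEIs, and tract IDs are hypothetical.

```python
def minority_tract_shares(rows, minority_tracts, subject_lei):
    """Return (subject_share, peer_share) of applications in minority tracts."""
    # Each entry is [applications in minority tracts, total applications]
    counts = {"subject": [0, 0], "peers": [0, 0]}
    for r in rows:
        who = "subject" if r["lei"] == subject_lei else "peers"
        counts[who][1] += 1
        if r["census_tract"] in minority_tracts:
            counts[who][0] += 1

    def share(pair):
        hits, total = pair
        return hits / total if total else 0.0

    return share(counts["subject"]), share(counts["peers"])


# Toy data: placeholder LEIs and tract IDs, not real HMDA records
minority = {"T1", "T2"}
rows = [
    {"lei": "SUBJECT", "census_tract": "T1"},
    {"lei": "SUBJECT", "census_tract": "T3"},
    {"lei": "SUBJECT", "census_tract": "T3"},
    {"lei": "SUBJECT", "census_tract": "T4"},
    {"lei": "PEER_A", "census_tract": "T1"},
    {"lei": "PEER_A", "census_tract": "T2"},
    {"lei": "PEER_A", "census_tract": "T3"},
    {"lei": "PEER_B", "census_tract": "T1"},
    {"lei": "PEER_B", "census_tract": "T4"},
    {"lei": "PEER_B", "census_tract": "T4"},
]
subject_share, peer_share = minority_tract_shares(rows, minority, "SUBJECT")
print(f"subject: {subject_share:.0%}  peers: {peer_share:.0%}")
```

In practice the rows would first be restricted to one MSA and one year, and purchased loans would be dropped so that only applications the lender itself took are counted.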

Beyond geographic redlining, HMDA data supports applicant-level disparate treatment analysis. Nationally, Black applicants are denied conventional home purchase loans at approximately 2.5 times the rate of white applicants when the comparison is made across all income levels and loan amounts. After controlling for loan-to-value ratio, debt-to-income ratio, and income — the underwriting variables now available in post-2018 HMDA — the disparity narrows but does not disappear. The residual unexplained disparity, which cannot be attributed to the observable underwriting variables in the public data, is the analytical focus of CFPB and DOJ fair lending examinations. Those examinations add credit score data (not public in HMDA) and loan file reviews to determine whether the residual disparity reflects differential treatment.
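A simple way to see what "controlling for" an underwriting variable means is to compare denial rates within strata of that variable instead of across the whole population. The sketch below does this for toy data; it is a stratified comparison, not the regression analysis examiners use, and the group labels, band labels, and helper names are made up for illustration.

```python
from collections import defaultdict


def stratified_denial_rates(rows, group_key, stratum_key):
    """Denial rate per (stratum, group), using originated (1) and denied (3) rows."""
    apps = defaultdict(int)
    denials = defaultdict(int)
    for r in rows:
        if r["action_taken"] not in ("1", "3"):
            continue  # skip withdrawn, incomplete, purchased, etc.
        key = (r[stratum_key], r[group_key])
        apps[key] += 1
        if r["action_taken"] == "3":
            denials[key] += 1
    return {key: denials[key] / apps[key] for key in apps}


def make(group, band, originated, denied):
    """Build toy rows: `originated` approvals and `denied` denials."""
    base = {"group": group, "dti_band": band}
    return ([{**base, "action_taken": "1"}] * originated
            + [{**base, "action_taken": "3"}] * denied)


# Toy data only: two groups compared within two DTI bands
rows = (make("A", "<36%", 9, 1) + make("B", "<36%", 8, 2)
        + make("A", "36%+", 7, 3) + make("B", "36%+", 6, 4))
rates = stratified_denial_rates(rows, "group", "dti_band")
for band in ("<36%", "36%+"):
    a, b = rates[(band, "A")], rates[(band, "B")]
    print(f"{band}: A={a:.0%}  B={b:.0%}  ratio={b / a:.2f}")
```

If the ratio stays above 1 inside every stratum, the stratifying variable alone does not account for the gap, which is the sense in which a residual disparity remains.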

Data access

The primary public access point for HMDA data is the FFIEC's HMDA Platform at ffiec.cfpb.gov. The platform provides three access modes: an interactive Data Browser for exploring data by institution, geography, and year; a bulk CSV download interface that returns the full LAR for a selected year and state; and a REST API for programmatic access.

The bulk download API endpoint pattern is:

https://ffiec.cfpb.gov/api/public/lar/2023/?states=CA

No API key is required. The response is a CSV file containing all HMDA records for the specified year and state. State-level files range from a few hundred megabytes (small states) to several gigabytes (California, Texas, Florida) when uncompressed. The FFIEC also supports filtering by actions_taken to pull only specific action codes. For example:

https://ffiec.cfpb.gov/api/public/lar/2023/?states=CA&actions_taken=3

returns only denied applications in California for 2023, a much smaller download useful for denial rate analysis. The Data Browser interface at ffiec.cfpb.gov/data-browser/data/2023 allows slicing by institution LEI, state, county, or census tract and downloading filtered CSVs without writing any code.
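When several state and year combinations are needed, it helps to build these URLs in one place. The helper below is a small sketch that reproduces the two URL patterns shown above; the function name `lar_url` is hypothetical, and it handles only a single state and a single action code because the examples here show nothing beyond that.

```python
from typing import Optional
from urllib.parse import urlencode

LAR_BASE = "https://ffiec.cfpb.gov/api/public/lar"


def lar_url(year: int, state: str, action_taken: Optional[int] = None) -> str:
    """Build a LAR download URL for one state, optionally one action code."""
    params = {"states": state.upper()}
    if action_taken is not None:
        params["actions_taken"] = str(action_taken)
    return f"{LAR_BASE}/{year}/?{urlencode(params)}"


print(lar_url(2023, "CA"))
print(lar_url(2023, "CA", action_taken=3))  # denied applications only
```

Using `urlencode` rather than string concatenation keeps the query string valid if more parameters are added later.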

A legacy API at api.consumerfinance.gov/data/hmda provided earlier access to pre-2018 HMDA data via the CFPB's Socrata platform, but the authoritative source for post-2018 data is the FFIEC platform. Key CSV fields in the annual LAR export include: activity_year, lei, state_code, county_code, census_tract, loan_type, loan_purpose, action_taken, loan_amount, applicant_race_1, applicant_ethnicity_1, applicant_sex, income, interest_rate, loan_to_value_ratio, debt_to_income_ratio, and denial_reason_1 through denial_reason_4.

Python: analyzing 2023 HMDA data for a state

The script below downloads the 2023 HMDA LAR for a configurable state (defaulting to Texas) from the FFIEC bulk API and produces five analytical outputs: denial rates by applicant race for conventional home purchase loans; median loan amount by income decile and race for originated loans; the 20 census tracts with the highest denial rates (among tracts with at least 20 applications); the top 10 lenders by origination volume identified by LEI; and the FHA vs. conventional loan mix by applicant race. The script uses only the Python standard library, so no third-party packages are required, but it needs Python 3.10 or later for the `int | None` annotation syntax. It reads the whole file into memory, so a large state needs several gigabytes of RAM, and the download step may take several minutes depending on connection speed.

import csv
import io
import urllib.request
from collections import defaultdict

# ---------------------------------------------------------------------------
# HMDA 2023 State Analysis via FFIEC Data Browser API
#
# Bulk LAR CSV endpoint (no API key required):
#   https://ffiec.cfpb.gov/api/public/lar/2023/?states=TX
#
# Returns a CSV download of all HMDA records for the requested state/year.
# Key fields used below:
#   action_taken      - 1=originated, 2=approved not accepted, 3=denied,
#                       4=withdrawn, 5=incomplete, 6=purchased, 7=preapproval denied,
#                       8=preapproval approved not accepted
#   loan_type         - 1=conventional, 2=FHA, 3=VA, 4=RHS/FSA
#   loan_purpose      - 1=home purchase, 2=home improvement, 31=refi rate-term,
#                       32=refi cash-out, 4=other, 5=not applicable
#   applicant_race_1  - 1=American Indian/Alaska Native, 2=Asian, 3=Black/AfAm,
#                       4=Native Hawaiian/Pacific Islander, 5=White, 6=info not provided,
#                       7=not applicable
#   applicant_ethnicity_1 - 1=Hispanic/Latino, 2=not Hispanic/Latino, 3=info not provided,
#                           4=not applicable
#   income            - gross annual income in $1,000s (999=not applicable)
#   loan_amount       - in dollars
#   interest_rate     - note rate as of closing (post-2018 field)
#   debt_to_income_ratio - DTI or range (post-2018 field)
#   census_tract      - 11-digit FIPS census tract code
#   lei               - Legal Entity Identifier of the filing institution
# ---------------------------------------------------------------------------

STATE = "TX"   # change to any two-letter state abbreviation
YEAR  = "2023"

LAR_URL = f"https://ffiec.cfpb.gov/api/public/lar/{YEAR}/?states={STATE}"

print(f"Downloading HMDA {YEAR} LAR for {STATE} (may take several minutes) ...")
with urllib.request.urlopen(LAR_URL, timeout=300) as resp:
    raw_bytes = resp.read()

# The API returns UTF-8 CSV
lines = raw_bytes.decode("utf-8")
reader = csv.DictReader(io.StringIO(lines))
records = list(reader)
print(f"Loaded {len(records):,} HMDA records for {STATE} ({YEAR})\n")


# ---------------------------------------------------------------------------
# Helper: safe integer coercion (blanks and exempt codes map to None)
# ---------------------------------------------------------------------------
EXEMPT = {"Exempt", "NA", "", "1111", "7777", "8888", "9999"}

def safe_int(val: str) -> int | None:
    if val in EXEMPT:
        return None
    try:
        return int(float(val))
    except (ValueError, TypeError):
        return None

def safe_float(val: str) -> float | None:
    if val in EXEMPT:
        return None
    try:
        return float(val)
    except (ValueError, TypeError):
        return None


# ---------------------------------------------------------------------------
# Step 1: Denial rates by applicant race -- conventional home purchase loans
# Subset: loan_type=1, loan_purpose=1, action_taken in {1, 3}
# Race codes: 2=Asian, 3=Black, 5=White; ethnicity 1=Hispanic
# ---------------------------------------------------------------------------
RACE_LABELS = {
    "2": "Asian",
    "3": "Black / African American",
    "5": "White",
}

race_apps:    dict[str, int] = defaultdict(int)  # total applications
race_denials: dict[str, int] = defaultdict(int)  # denials

hisp_apps    = 0
hisp_denials = 0

for r in records:
    if r.get("loan_type") != "1":
        continue
    if r.get("loan_purpose") != "1":
        continue
    action = r.get("action_taken", "")
    if action not in ("1", "3"):
        continue

    race  = r.get("applicant_race_1", "")
    eth   = r.get("applicant_ethnicity_1", "")

    # Hispanic (any race reported as Hispanic counts here)
    if eth == "1":
        hisp_apps += 1
        if action == "3":
            hisp_denials += 1
        continue

    if race in RACE_LABELS:
        race_apps[race] += 1
        if action == "3":
            race_denials[race] += 1

print("=== Denial Rates by Race: Conventional Home Purchase Loans ===")
print(f"State: {STATE}  |  Year: {YEAR}")
print(f"{'Race':<32} {'Applications':>14} {'Denials':>9} {'Denial Rate':>12}")
print("-" * 71)

ordered = [("2", "Asian"), ("3", "Black / African American"), ("5", "White")]
for code, label in ordered:
    apps = race_apps[code]
    dens = race_denials[code]
    rate = (dens / apps * 100) if apps else 0.0
    print(f"{label:<32} {apps:>14,} {dens:>9,} {rate:>11.1f}%")

# Hispanic row
hisp_rate = (hisp_denials / hisp_apps * 100) if hisp_apps else 0.0
print(f"{'Hispanic / Latino':<32} {hisp_apps:>14,} {hisp_denials:>9,} {hisp_rate:>11.1f}%")
print()


# ---------------------------------------------------------------------------
# Step 2: Median loan amount by income decile and race (originated loans only)
# ---------------------------------------------------------------------------
# Collect (income, loan_amount, race_label) tuples for originated loans
originated: list[tuple[float, float, str]] = []

for r in records:
    if r.get("action_taken") != "1":
        continue
    inc  = safe_float(r.get("income", ""))
    amt  = safe_float(r.get("loan_amount", ""))
    race = r.get("applicant_race_1", "")
    if inc is None or amt is None or inc <= 0:
        continue
    label = RACE_LABELS.get(race, "Other")
    originated.append((inc, amt, label))

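# Build income decile breakpoints (left empty if no originated loans pass the
# filters above, which avoids an IndexError on an empty list)
all_incomes = sorted(t[0] for t in originated)
n = len(all_incomes)
breakpoints = [all_incomes[n * d // 10] for d in range(1, 10)] if n else []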

def income_decile(inc: float) -> int:
    for d, bp in enumerate(breakpoints, 1):
        if inc <= bp:
            return d
    return 10

# Accumulate loan amounts per (decile, race)
bucket: dict[tuple[int, str], list[float]] = defaultdict(list)
for inc, amt, race in originated:
    bucket[(income_decile(inc), race)].append(amt)

def median(vals: list[float]) -> float:
    if not vals:
        return 0.0
    s = sorted(vals)
    mid = len(s) // 2
    return (s[mid] + s[mid - 1]) / 2 if len(s) % 2 == 0 else s[mid]

races_display = ["Asian", "Black / African American", "White", "Other"]

print("=== Median Loan Amount by Income Decile and Race (Originated Loans) ===")
COL = 26  # column width: wide enough for the longest race label
header = f"{'Decile':<8}" + "".join(f"{r:>{COL}}" for r in races_display)
print(header)
print("-" * (8 + COL * len(races_display)))

for d in range(1, 11):
    row = f"{d:<8}"
    for race in races_display:
        vals = bucket[(d, race)]
        cell = f"${median(vals):,.0f}" if vals else "N/A"
        row += f"{cell:>{COL}}"
    print(row)
print()


# ---------------------------------------------------------------------------
# Step 3: Census tracts with highest denial rates (min 20 applications)
# ---------------------------------------------------------------------------
tract_apps:    dict[str, int] = defaultdict(int)
tract_denials: dict[str, int] = defaultdict(int)

for r in records:
    action = r.get("action_taken", "")
    if action not in ("1", "3"):
        continue
    ct = r.get("census_tract", "").strip()
    if not ct or ct in EXEMPT:
        continue
    tract_apps[ct] += 1
    if action == "3":
        tract_denials[ct] += 1

tract_rates = [
    (ct, tract_denials[ct] / tract_apps[ct] * 100, tract_apps[ct])
    for ct in tract_apps
    if tract_apps[ct] >= 20
]
tract_rates.sort(key=lambda x: x[1], reverse=True)

print("=== Top 20 Census Tracts by Denial Rate (min 20 applications) ===")
print(f"{'Census Tract':<16} {'Applications':>14} {'Denials':>9} {'Denial Rate':>12}")
print("-" * 55)
for ct, rate, apps in tract_rates[:20]:
    dens = tract_denials[ct]
    print(f"{ct:<16} {apps:>14,} {dens:>9,} {rate:>11.1f}%")
print()


# ---------------------------------------------------------------------------
# Step 4: Top 10 lenders by origination volume
# ---------------------------------------------------------------------------
lei_volume: dict[str, int] = defaultdict(int)

for r in records:
    if r.get("action_taken") != "1":
        continue
    lei = r.get("lei", "").strip()
    if lei:
        lei_volume[lei] += 1

top10_lenders = sorted(lei_volume.items(), key=lambda x: x[1], reverse=True)[:10]

print(f"=== Top 10 Lenders by {YEAR} Origination Volume ===")
print(f"{'Rank':<5} {'LEI':<24} {'Originations':>14}")
print("-" * 45)
for rank, (lei, vol) in enumerate(top10_lenders, 1):
    print(f"{rank:<5} {lei:<24} {vol:>14,}")
print()


# ---------------------------------------------------------------------------
# Step 5: FHA vs conventional loan mix by applicant race (originated loans)
# ---------------------------------------------------------------------------
LOAN_TYPE_LABELS = {"1": "Conventional", "2": "FHA", "3": "VA", "4": "RHS/FSA"}

race_loan_mix: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))

for r in records:
    if r.get("action_taken") != "1":
        continue
    race  = r.get("applicant_race_1", "")
    eth   = r.get("applicant_ethnicity_1", "")
    lt    = r.get("loan_type", "")
    label = "Hispanic / Latino" if eth == "1" else RACE_LABELS.get(race, "Other")
    lt_label = LOAN_TYPE_LABELS.get(lt, "Other")
    race_loan_mix[label][lt_label] += 1

types_display = ["Conventional", "FHA", "VA", "RHS/FSA"]
races_mix = ["Asian", "Black / African American", "White", "Hispanic / Latino", "Other"]

print("=== FHA vs Conventional Loan Mix by Race (Originated Loans) ===")
header2 = f"{'Race':<28}" + "".join(f"{t:>14}" for t in types_display) + f"{'Total':>10}"
print(header2)
print("-" * (28 + 14 * len(types_display) + 10))

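for race in races_mix:
    mix   = race_loan_mix[race]
    total = sum(mix.values())  # true count; stays 0 for an empty group
    row   = f"{race:<28}"
    for t in types_display:
        # Guard the division so an empty group prints 0.0% and a total of 0
        pct = mix.get(t, 0) / total * 100 if total else 0.0
        row += f"{pct:>13.1f}%"
    row += f"{total:>10,}"
    print(row)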

Several practical notes for working with HMDA data at this scale. The income field uses the sentinel value 999 for “not applicable” (business entities, purchased loans); filtering these out before income-based analysis avoids distorting decile breakpoints. The race and ethnicity fields are independent: Hispanic/Latino ethnicity must be extracted from applicant_ethnicity_1 before race-based segmentation, or Hispanic applicants who also report a race code will be counted inside that race group in a naive race-only grouping, and counted twice if a separate Hispanic row is also tallied. The lei field (Legal Entity Identifier) identifies lenders but requires a crosswalk to institution name; the FFIEC publishes an LEI-to-institution-name mapping file at ffiec.cfpb.gov/npw/FinancialInstitution/SearchForm.

Census tract-level denial rate analysis is the core of redlining investigation, but it requires a denominator correction before drawing conclusions. A tract may have a high denial rate because it has a high density of marginal applications — borrowers with low credit scores, high DTIs, or low income seeking purchase loans in high-cost areas. Redlining analysis compares a subject lender's tract-level application rates to peer lenders in the same MSA: if peers generate 40 applications per majority-minority tract while the subject generates 4, the disparity is in application generation (marketing suppression or steering away from certain communities) rather than in approval rates. Both patterns are legally significant, but they require different analytical approaches.

Limitations and analytical caveats

HMDA data is the richest public source on mortgage lending, but several limitations bound what conclusions it supports on its own.

Credit score is not in the public data. The 2015 expanded rule requires lenders to report the applicant's credit score and the scoring model used, but the CFPB withholds the score itself from the public release, citing applicant privacy. Credit score is the most powerful single predictor of mortgage approval and pricing, so public HMDA data cannot fully control for creditworthiness in disparity analysis. Examiners work with the nonpublic submissions and loan file-level data, which include credit scores, during supervisory examinations.

Race and ethnicity are self-reported. For applications taken in person, the lender is required to visually observe and record race and ethnicity if the applicant does not provide it. For applications taken by mail or internet (an increasing proportion of the total), race and ethnicity are recorded as “information not provided” when the applicant declines to answer. The fraction of records with missing race data varies substantially across lender types and application channels, and this missingness is not random: online applications from minority borrowers are more likely to have missing race data, which can bias aggregate denial rate statistics toward zero disparity.
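Before comparing lenders, it is worth measuring how much race data each one is missing. The sketch below computes the share of records per lender with applicant_race_1 equal to 6 (information not provided); the function name and the toy LEIs are hypothetical.

```python
from collections import defaultdict


def race_missing_share(rows, by="lei"):
    """Share of rows per group where applicant_race_1 is 6 (not provided)."""
    total = defaultdict(int)
    missing = defaultdict(int)
    for r in rows:
        key = r[by]
        total[key] += 1
        if r.get("applicant_race_1") == "6":
            missing[key] += 1
    return {key: missing[key] / total[key] for key in total}


# Toy data: placeholder LEIs, not real institutions
rows = [
    {"lei": "BRANCH_BANK", "applicant_race_1": "5"},
    {"lei": "BRANCH_BANK", "applicant_race_1": "3"},
    {"lei": "BRANCH_BANK", "applicant_race_1": "2"},
    {"lei": "BRANCH_BANK", "applicant_race_1": "6"},
    {"lei": "ONLINE_LENDER", "applicant_race_1": "6"},
    {"lei": "ONLINE_LENDER", "applicant_race_1": "6"},
    {"lei": "ONLINE_LENDER", "applicant_race_1": "5"},
    {"lei": "ONLINE_LENDER", "applicant_race_1": "3"},
]
for lei, share in race_missing_share(rows).items():
    print(f"{lei:<14} {share:.0%} of records missing race")
```

A lender with a much higher missing share than its peers deserves caution, since its race-specific denial rates rest on a smaller and possibly unrepresentative subset of its applications.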

The debt-to-income and LTV fields have substantial exempt and missing values. A subset of institutions below the partial exemption threshold (generally those with fewer than 500 closed-end originations in each of the two preceding years) are exempt from reporting the expanded fields added in 2018, including interest rate, LTV, DTI, and pricing data. Approximately 20 to 25 percent of HMDA records carry exempt codes for these fields. Analyses that require LTV or DTI controls must either exclude exempt-institution records or impute values — each approach introduces analytical tradeoffs.
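A quick coverage check shows how many records would survive a filter on one of the expanded fields. The sketch below classifies each value as reported, exempt, or missing. The placeholder strings it recognizes ("Exempt", "1111", "NA", blank) follow the ones used in this article's script and should be confirmed against the actual file; the function names are hypothetical.

```python
from collections import Counter


def field_status(value: str) -> str:
    """Classify one raw field value as 'reported', 'exempt', or 'missing'."""
    value = value.strip()
    if value in ("Exempt", "1111"):
        return "exempt"
    if value in ("", "NA"):
        return "missing"
    return "reported"


def field_coverage(rows, field: str) -> Counter:
    """Count rows by status for a single expanded-schema field."""
    return Counter(field_status(r.get(field, "")) for r in rows)


# Toy data only
rows = [
    {"loan_to_value_ratio": "80"},
    {"loan_to_value_ratio": "95.5"},
    {"loan_to_value_ratio": "Exempt"},
    {"loan_to_value_ratio": "NA"},
]
coverage = field_coverage(rows, "loan_to_value_ratio")
total = sum(coverage.values())
for status in ("reported", "exempt", "missing"):
    print(f"{status:<9} {coverage[status] / total:.0%}")
```

Running the same check separately by lender or by race group shows whether dropping exempt records would change the composition of the sample, which is the main tradeoff of the exclusion approach.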

Purchased loans inflate origination counts. Action code 6 (purchased loan) represents secondary market acquisitions, meaning loans originated by another institution and subsequently purchased. Banks that are active in the secondary market will report large numbers of purchased loans, which can distort comparisons of origination volume if purchased loans are not filtered out before computing lender-level market share.
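The effect is easy to demonstrate on toy data. The sketch below counts records per lender twice, once treating action codes 1 and 6 alike and once keeping only code 1; the function name and LEIs are hypothetical.

```python
from collections import Counter


def lender_counts(rows, actions):
    """Count rows per LEI whose action_taken is in the given set of codes."""
    return Counter(r["lei"] for r in rows if r["action_taken"] in actions)


# Toy data: LENDER_A originates; LENDER_B mostly buys loans from others
rows = (
    [{"lei": "LENDER_A", "action_taken": "1"}] * 3
    + [{"lei": "LENDER_B", "action_taken": "1"}] * 1
    + [{"lei": "LENDER_B", "action_taken": "6"}] * 4
)

naive = lender_counts(rows, {"1", "6"})   # originations plus purchases
clean = lender_counts(rows, {"1"})        # originations only

print("naive ranking:", naive.most_common())
print("clean ranking:", clean.most_common())
```

With purchases included, the secondary-market buyer ranks first; with only originations counted, the ranking reverses.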