SBA 7(a) and 504 Loan Data: The Federal Small Business Lending Database Behind $40 Billion in Annual Guarantees
The Small Business Administration does not lend money. It guarantees loans made by private banks, credit unions, and specialized lenders—absorbing most of the default risk so that lenders will extend credit to businesses that would otherwise fall short of conventional underwriting thresholds. Two programs account for nearly all of this activity: the 7(a) loan program, which approved roughly $30–40 billion in guarantees annually in the years leading up to 2026, and the 504 program, which provides long-term fixed-asset financing at $8–10 billion per year. The SBA publishes loan-level data for both programs going back to FY2010, and in some cases to 1991. That dataset—hundreds of thousands of rows covering borrower identity, lender, loan amount, guarantee size, NAICS code, and ultimate disposition—is one of the more underused windows into small business credit in the United States.
The 7(a) program: structure and mechanics
The 7(a) program is the SBA's flagship lending vehicle. Any for-profit business that meets SBA size standards and cannot obtain financing on reasonable terms elsewhere is potentially eligible. Size standards are defined by NAICS code: manufacturing ceilings are set by headcount, starting at 500 employees and running higher in some industries; revenue ceilings for service businesses range from $7.5 million to $38.5 million depending on the specific industry classification. The thresholds are higher than most people expect: a business grossing $12 million per year still counts as small under SBA definitions in any industry whose revenue ceiling exceeds that figure.
Eligible uses are broad. Working capital, equipment purchases, leasehold improvements, commercial real estate acquisition and renovation, business acquisitions, and refinancing of existing debt all qualify. Loan terms extend to 25 years for real estate and 10 years for working capital and equipment. Interest rates are variable: prime rate plus a spread of 1.5 to 2.75 percentage points for loans above $50,000, with the spread narrowing for larger loans.
The guarantee mechanics are the program's defining feature. On loans of $150,000 or less, the SBA guarantees 85% of the principal. On loans above $150,000, the guarantee drops to 75%, up to a maximum guarantee of $3.75 million (on a $5 million loan). If a borrower defaults, the lender files a claim with the SBA, which pays out the guaranteed portion. The SBA then pursues collection from the borrower directly. The lender retains the unguaranteed slice and absorbs loss on that portion—which is why lenders still underwrite rather than simply originating any loan that qualifies.
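The guarantee arithmetic above can be sketched in a few lines. This is a minimal illustration using only the thresholds stated in the text (85% at or below $150,000, 75% above, capped at $3.75 million); the function name `guarantee_split` is hypothetical, and sub-programs such as SBA Express, which carry different guarantee percentages, are deliberately ignored.

```python
def guarantee_split(loan_amount: float) -> dict:
    """Split a standard 7(a) loan into SBA-guaranteed and lender-retained portions.

    Assumes the two-tier schedule described in the text; not an official formula.
    """
    small_loan_limit = 150_000      # 85% guarantee at or below this amount
    max_guarantee = 3_750_000       # cap on the guaranteed dollar amount

    rate = 0.85 if loan_amount <= small_loan_limit else 0.75
    guaranteed = min(loan_amount * rate, max_guarantee)
    return {
        "rate": rate,
        "guaranteed": guaranteed,
        "lender_exposure": loan_amount - guaranteed,
    }


if __name__ == "__main__":
    for amount in (100_000, 1_000_000, 5_000_000):
        print(amount, guarantee_split(amount))
```

The `lender_exposure` figure is the unguaranteed slice the lender stands to lose in a default, which is the reason lenders keep underwriting carefully.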
Several sub-programs operate within the 7(a) umbrella. SBA Express allows lenders to use their own underwriting procedures for loans up to $500,000, with a reduced 50% guarantee in exchange for a 36-hour SBA review turnaround rather than the standard multi-week timeline. CAPLines addresses revolving credit needs—lines of credit for seasonal businesses, contract-based firms, and builders. Export Express and Export Working Capital extend the program to companies with international sales.
The 504 program: fixed assets and the three-party structure
The 504 program finances fixed assets: commercial real estate, heavy equipment, and major facility renovations. It is not available for working capital or inventory, which distinguishes it sharply from 7(a). The financing structure involves three parties rather than two.
The private bank provides a first mortgage covering 50% of the total project cost. A Certified Development Company—a nonprofit organization licensed and regulated by the SBA—provides a second mortgage covering 40% of project cost, funded by selling a debenture to investors with an SBA guarantee behind it. The borrower contributes the remaining 10% as equity, though the equity requirement rises to 15% for startups and for “special purpose” buildings (hotels, gas stations, car washes, facilities not easily repurposed for another use).
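The three-party split can be written out as simple arithmetic. The sketch below assumes that when the borrower's equity requirement rises to 15%, the CDC share shrinks to compensate while the bank stays at 50%; the text does not say which share gives way, so treat that as an assumption. The function name `split_504_project` is hypothetical.

```python
def split_504_project(total_cost: float,
                      startup: bool = False,
                      special_purpose: bool = False) -> dict:
    """Allocate a 504 project's cost across bank, CDC, and borrower.

    Assumption: the bank's first mortgage stays at 50% and the CDC debenture
    absorbs any increase in required borrower equity.
    """
    equity_pct = 15 if (startup or special_purpose) else 10
    bank_pct = 50
    cdc_pct = 100 - bank_pct - equity_pct
    return {
        "bank_first_mortgage": total_cost * bank_pct / 100,
        "cdc_debenture": total_cost * cdc_pct / 100,
        "borrower_equity": total_cost * equity_pct / 100,
    }


if __name__ == "__main__":
    print(split_504_project(2_000_000))
    print(split_504_project(2_000_000, startup=True))
```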
The maximum CDC debenture is $5 million for most projects, including those meeting SBA “public policy goals” such as operations in areas with high unemployment, and rises to $5.5 million for small manufacturers and for qualifying energy-efficiency or renewable-energy projects. Because the CDC portion carries an SBA guarantee and is funded through the bond market at long-term fixed rates, borrowers typically access lower interest rates through 504 than through a conventional commercial real estate loan.
There are approximately 210 active CDCs operating regionally across the country. Each CDC has a defined territory, though the SBA has allowed geographic expansion in some cases. Unlike 7(a) lenders, CDCs are mission-driven nonprofits; their underwriting culture skews toward economic development objectives rather than pure risk-adjusted return.
What the public loan-level data contains
The SBA publishes annual CSV files for both programs at sba.gov/about-sba/sba-performance/open-government/digital-sba/open-data. The same data is accessible via the Socrata API at data.sba.gov, and historical bulk data going back to 1991 is available via FTP. Coverage for the loan-level disclosure begins reliably at FY2010; pre-2010 records exist but with more gaps in demographic flags.
The 7(a) files are organized by fiscal year (October through September). Each row is a single approved loan. Key fields:
| Field | Notes |
|---|---|
| LoanNumber | Primary key. Unique per loan across all fiscal years. |
| ApprovalDate | SBA approval date. Determines which fiscal year cohort the loan belongs to. |
| GrossApproval | Total loan amount approved, in dollars. |
| SBAGuaranteedApproval | Dollar amount of the SBA guarantee on this loan. |
| TermInMonths | Loan maturity in months. Up to 300 (25 years) for real estate. |
| BusinessName | Legal name of borrowing business as submitted to lender. |
| BorrState | Two-letter state abbreviation. Enables geographic aggregation. |
| NAICSCode | Six-digit NAICS industry code. Primary basis for sector analysis. |
| LenderName, LenderID | Originating lender. Enables lender-level performance analysis. |
| LoanStatus | P I F (paid in full), CHGOFF (charged off), CANCLD, EXEMPT, DISBURSED CURRENT. |
| SBA_Guaranteed_Portion_Charged_Off | Dollar amount the SBA paid out on a charged-off guarantee. Zero for non-defaulted loans. |
| GrossChargeOffAmount | Total principal lost at charge-off, including both guaranteed and unguaranteed portions. |
| BusinessType | Corporation, LLC, Partnership, Sole Proprietorship, etc. |
| BusinessAge | Age category of business at origination (e.g., “Existing > 2 years”, “New Business”). |
| RuralUrbanIndicator | R (rural) or U (urban). Rural loans receive the full 85% guarantee on loans under $150K. |
| LowDoc | Flag for simplified documentation program (legacy designation). |
| MIS_Flag | Minority-owned business flag. Populated inconsistently in older records. |
| WomenOwned, VeteranStatus | Self-reported ownership flags. SBA collects but does not verify against authoritative registries. |
| FranchiseCode | SBA franchise registry code, when applicable. Enables franchise-level portfolio analysis. |
| InitialInterestRate | Interest rate at origination. Reveals spread above prime by lender and borrower type. |
The 504 files follow a similar structure but substitute CDC-specific fields: the lender is identified as the CDC rather than the bank, and the loan amount reflects the CDC debenture portion (40% of project cost) rather than the full project financing.
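Because the files are organized by an October-through-September fiscal year, assigning a loan to its cohort from ApprovalDate takes one small rule. The helper below is a sketch of that rule; the name `sba_fiscal_year` is hypothetical, and it assumes the usual federal convention that the fiscal year is labeled by the calendar year in which it ends.

```python
from datetime import date


def sba_fiscal_year(approval: date) -> int:
    """Return the federal fiscal year (Oct 1 through Sep 30) containing a date.

    Assumes the fiscal year is named for the calendar year in which it ends.
    """
    return approval.year + 1 if approval.month >= 10 else approval.year


if __name__ == "__main__":
    for d in (date(2023, 9, 30), date(2023, 10, 1), date(2024, 1, 15)):
        print(d.isoformat(), "-> FY", sba_fiscal_year(d))
```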
Lender concentration
The 7(a) program is heavily concentrated in a small number of lenders. Live Oak Bank, headquartered in Wilmington, North Carolina, has ranked as the top SBA 7(a) lender by loan count for multiple consecutive years. Live Oak's model is sector-specialized: it maintains dedicated underwriting teams for veterinary practices, dental practices, funeral homes, agribusiness operations, self-storage facilities, and similar niche markets. The bank built its SBA practice by becoming the dominant lender in segments where it possesses genuine collateral expertise.
Wells Fargo, JPMorgan Chase, and Huntington National Bank appear among the top lenders by dollar volume. Large banks tend to originate larger average loan sizes; Live Oak's top position by count reflects a higher volume of smaller loans in its specialty sectors. Newtek Business Services and Byline Bank are additional examples of lenders that built SBA volume as a core business line rather than a peripheral offering.
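The difference between ranking by loan count and ranking by dollar volume is easy to see on a toy table. The sketch below uses the LenderName and GrossApproval columns from the field list; the lender labels and amounts are invented for illustration and do not describe any real institution.

```python
import pandas as pd

# Invented data: lender "A" makes many small loans, "B" makes one large loan.
toy = pd.DataFrame({
    "LenderName": ["A", "A", "A", "B", "C"],
    "GrossApproval": [100_000, 150_000, 250_000, 3_000_000, 1_500_000],
})


def lender_rankings(df: pd.DataFrame) -> pd.DataFrame:
    """Per-lender loan count, dollar volume, average size, and program shares."""
    out = df.groupby("LenderName")["GrossApproval"].agg(
        loan_count="count", total_approved="sum"
    )
    out["avg_loan"] = out["total_approved"] / out["loan_count"]
    out["count_share"] = out["loan_count"] / out["loan_count"].sum()
    out["dollar_share"] = out["total_approved"] / out["total_approved"].sum()
    return out


if __name__ == "__main__":
    ranked = lender_rankings(toy)
    print(ranked.sort_values("loan_count", ascending=False))
    print("Top by count:", ranked["loan_count"].idxmax())
    print("Top by dollars:", ranked["total_approved"].idxmax())
```

On this toy input the leader by count and the leader by dollars are different lenders, which is the same pattern the paragraph describes for specialty lenders versus large banks.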
The Community Advantage program, which began as a pilot and has since been restructured around Community Advantage Small Business Lending Company (CA SBLC) licenses, expands 7(a) access to underserved markets through Community Development Financial Institutions and nonprofit lenders. Community Advantage participants typically originate smaller loans in rural areas, low-to-moderate income communities, and sectors underserved by conventional SBA lenders. Their charge-off rates tend to be higher than those of mainstream 7(a) lenders, which is an expected consequence of serving borrowers with thinner collateral and credit histories.
The 2014 SBA Inspector General report on high-risk lenders used publicly available charge-off data to identify lenders whose default rates substantially exceeded program averages. The OIG found that a small number of lenders accounted for a disproportionate share of SBA guarantee payments, and recommended enhanced oversight for lenders above specific charge-off rate thresholds. That analysis was performed entirely on data that remains publicly available today.
Industry concentration and geographic distribution
NAICS code analysis of 7(a) data reveals consistent patterns across years. Food service and restaurants (NAICS 722) dominate by loan count—restaurants are capital-hungry, have limited conventional collateral (kitchen equipment depreciates quickly and is illiquid), and turn over frequently, creating a recurring pipeline of acquisition and startup financing. Healthcare practices—dental (NAICS 621210), veterinary (NAICS 541940), optometry (NAICS 621320)—appear with high approval rates and relatively low default rates. These practices have predictable cash flows tied to insurance reimbursement, which makes them reliable SBA credits.
Construction and real estate development appear prominently by dollar volume, reflecting the 25-year maturity available for real estate loans and the larger average loan sizes in those sectors. Manufacturing loans are less common by count but typically large, often involving equipment financing at the upper range of the $5 million limit.
Geographic analysis shows that SBA loan penetration correlates closely with SBA district office footprint. States with active district offices and established lender networks (California, Texas, Florida, New York, Illinois) generate the highest absolute loan volumes. On a per-capita small business basis, some rural states with strong agricultural lending cultures show high penetration relative to their business population. The rural/urban flag in the data allows direct comparison: rural loans skew smaller in absolute dollar amount, so a larger share of them fall at or under the $150,000 threshold and carry the 85% guarantee, making them more attractive to lenders than their size alone would suggest.
Charged-off loans and default analysis
The LoanStatus field distinguishes loans that have been charged off—where the SBA paid out the guarantee—from those paid in full, cancelled, or still active. The SBA_Guaranteed_Portion_Charged_Off field gives the exact dollar amount the SBA disbursed on a defaulted guarantee, which is distinct from the borrower's total loss.
Historically, 7(a) cumulative default rates by count have run approximately 10–15% across the program's lifetime, with significant variation by vintage year, lender, and industry. Those figures count loans rather than dollars; dollar-weighted default rates tend to be lower because larger loans, which go to more established businesses, default at lower rates than smaller loans.
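A toy example makes the gap between count-based and dollar-weighted default rates concrete. The figures below are invented, and the dollar-weighted rate is computed here as charged-off loans' approved amount over total approved amount, which is one reasonable definition among several (GrossChargeOffAmount could be used in the numerator instead).

```python
import pandas as pd

# Invented data: two small loans default, the large loans do not.
toy = pd.DataFrame({
    "GrossApproval": [50_000, 75_000, 100_000, 1_000_000, 2_000_000],
    "LoanStatus": ["CHGOFF", "P I F", "CHGOFF", "P I F", "P I F"],
})


def default_rates(df: pd.DataFrame) -> tuple:
    """Return (count-based rate, dollar-weighted rate) for charged-off loans."""
    charged = df["LoanStatus"].eq("CHGOFF")
    count_rate = charged.mean()
    dollar_rate = df.loc[charged, "GrossApproval"].sum() / df["GrossApproval"].sum()
    return float(count_rate), float(dollar_rate)


if __name__ == "__main__":
    by_count, by_dollar = default_rates(toy)
    print(f"count-based: {by_count:.1%}, dollar-weighted: {by_dollar:.1%}")
```

With defaults concentrated in the smallest loans, the count-based rate is far above the dollar-weighted one, which is the direction of the effect described above.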
Cohort analysis by approval year reveals vintage performance: loans approved in years preceding an economic contraction (2006–2007, for example) show elevated charge-off rates that materialize three to five years after origination as collateral values fall and businesses that were marginally viable in good times fail in downturns. The data allows reconstruction of these vintage curves for every cohort back to FY2010 without any additional data sources beyond the SBA files themselves.
The pandemic-era lending environment produced an unusual pattern. The COVID-19 crisis triggered separate SBA programs rather than a surge in 7(a) volume: the Paycheck Protection Program, whose forgivable loans were originated by lenders under a full SBA guarantee, and the Economic Injury Disaster Loan program, which the SBA lent directly. The 7(a) program actually contracted in FY2020 as lenders tightened conventional underwriting. Post-2020 7(a) originations show modestly elevated charge-off rates, consistent with loans originated to businesses that were still recovering from pandemic disruption when they took on new debt. These patterns are visible in the vintage charge-off curves for FY2021 and FY2022 cohorts.
One methodological note: charge-offs in the public data reflect the date the SBA paid the lender, not the date the borrower first missed payments. The lag between default and SBA payout is typically six to eighteen months as lenders exhaust collection procedures before filing the guarantee claim. This means charge-offs for a given origination year cohort continue to appear in the data for several years after origination.
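A vintage curve of the kind described above is a cumulative charge-off share by loan age, computed per origination cohort. The sketch below shows one way to build it with pandas on invented data. The `ChargeOffDate` column is an assumption (it is not in the field table above), and cohorts are keyed on calendar year of approval purely to keep the example short.

```python
import pandas as pd

# Invented loans: NaT in ChargeOffDate means the loan has not been charged off.
toy = pd.DataFrame({
    "ApprovalDate": pd.to_datetime([
        "2015-01-10", "2015-03-05", "2015-06-20", "2015-08-01",
        "2016-02-15", "2016-05-30",
    ]),
    "ChargeOffDate": pd.to_datetime([
        "2017-02-01", None, "2019-07-15", None, "2018-03-01", None,
    ]),
})


def vintage_curve(df: pd.DataFrame) -> pd.DataFrame:
    """Cumulative share of each cohort charged off by whole years since approval."""
    out = df.copy()
    out["cohort"] = out["ApprovalDate"].dt.year
    cohort_size = out.groupby("cohort").size()

    charged = out.dropna(subset=["ChargeOffDate"]).copy()
    age_days = (charged["ChargeOffDate"] - charged["ApprovalDate"]).dt.days
    charged["age_years"] = (age_days // 365).astype(int)

    # Rows: cohort; columns: age in years; values: charge-offs at that age.
    counts = charged.groupby(["cohort", "age_years"]).size().unstack(fill_value=0)
    return counts.cumsum(axis=1).div(cohort_size, axis=0)


if __name__ == "__main__":
    print(vintage_curve(toy))
```

Because the recorded date reflects the SBA payout rather than the first missed payment, the curve is shifted right relative to true borrower distress, and recent cohorts will look healthier than they eventually turn out to be.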
The SBIC program
The Small Business Investment Company program operates separately from 7(a) and 504 but deserves mention as the third leg of SBA financing. SBICs are private investment funds (venture capital and private equity funds, in practice) that the SBA licenses to raise additional capital in the form of low-cost SBA debentures. A licensed SBIC might raise $75 million in private limited partner capital and then leverage it with $150 million in SBA-guaranteed debentures, deploying $225 million into equity or debt investments in small businesses; SBA leverage for a single fund is capped at $175 million. The leverage is the program's value: SBIC managers access government-backed funding at below-market rates to enhance returns.
Historically, the SBIC program financed some of the most consequential technology companies in American history during their early stages. Intel and Apple both received SBIC investment in the 1970s and early 1980s when they were still considered small businesses under program definitions. The program has since shifted toward later-stage private equity, and combined SBIC investment runs approximately $5–6 billion per year. SBIC data is published annually by the SBA but with less loan-level granularity than 7(a) and 504—aggregate statistics by fund rather than individual portfolio company investments.
Equity, access, and demographic data
The SBA collects self-reported demographic flags on every 7(a) and 504 loan: women-owned, minority-owned (disaggregated by race/ethnicity in some versions of the data), veteran status, and rural location. Academic research using this data has consistently found that minority-owned and women-owned businesses receive smaller average loan amounts and pay higher average interest rates than otherwise similar non-minority, male-owned businesses—even within the SBA program, which is explicitly designed to expand access to credit.
Some of this disparity reflects loan size differences driven by industry concentration: businesses owned by members of underrepresented groups are more heavily concentrated in lower-capitalization industries like food service and personal care, which carry lower average loan amounts and higher historic default rates. Controlling for industry and loan size reduces but does not eliminate the rate differentials visible in the data.
The SBA's 8(a) Business Development Program is separate from the lending programs: it channels federal contracting set-asides to socially and economically disadvantaged small businesses rather than providing guaranteed loans. Participants in the 8(a) program are tracked in SAM.gov. The intersection of 8(a) program participation with 7(a) loan history—joinable on business name and state—provides a view of how federal small business support programs interact in practice for individual firms.
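Joining on business name and state only works after the names are normalized, since the same firm is rarely spelled identically in two systems. The sketch below shows one simple approach on invented records. The 8(a) column names (`legal_name`, `state`) and the suffix list are assumptions, and a name-plus-state match will still produce both false positives and misses on real data.

```python
import re

import pandas as pd

# Common legal-form suffixes to drop before matching (an illustrative list).
_SUFFIXES = re.compile(r"\b(INC|LLC|L L C|CORP|CORPORATION|CO|COMPANY|LTD)\b")


def normalize_name(name) -> str:
    """Uppercase, strip punctuation and legal suffixes, collapse whitespace."""
    s = re.sub(r"[^A-Z0-9 ]", " ", str(name).upper())
    s = _SUFFIXES.sub(" ", s)
    return " ".join(s.split())


# Invented records for illustration only.
loans_7a = pd.DataFrame({
    "BusinessName": ["Acme Dental, LLC", "Blue Ridge Bakery Inc."],
    "BorrState": ["NC", "VA"],
    "GrossApproval": [350_000, 120_000],
})
firms_8a = pd.DataFrame({
    "legal_name": ["ACME DENTAL L.L.C.", "Other Firm Corp"],
    "state": ["NC", "TX"],
})

loans_7a["key"] = loans_7a["BusinessName"].map(normalize_name)
firms_8a["key"] = firms_8a["legal_name"].map(normalize_name)
firms_8a = firms_8a.rename(columns={"state": "BorrState"})

matched = loans_7a.merge(firms_8a, on=["key", "BorrState"], how="inner")

if __name__ == "__main__":
    print(matched[["BusinessName", "legal_name", "BorrState", "GrossApproval"]])
```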
Accessing the data
The primary access point is sba.gov/about-sba/sba-performance/open-government/digital-sba/open-data, which provides CSV downloads organized by program (7(a) or 504) and fiscal year. The Socrata API at data.sba.gov exposes the same data with SQL-style filtering via the $where query parameter, making it practical to pull subsets by state, lender, NAICS code, or loan status without downloading full annual files.
For historical research, FTP bulk download is available for records going back to 1991, though pre-2010 files have inconsistent field coverage and missing demographic flags. The most analytically complete period is FY2010–present.
Key Socrata fields to filter on: loan_status (for active versus charged-off subsets), naicscode (for sector analysis), borr_state (geographic filtering), initial_interest_rate (rate analysis), gross_approval (loan size thresholds), and sba_guaranteed_portion_charged_off (default dollar analysis). The Socrata dataset IDs change when the SBA republishes data; check data.sba.gov for the current identifiers before scripting a download.
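A filtered request built from those fields might look like the sketch below, which only constructs the URL and makes no network call. The dataset identifier `xxxx-xxxx` is a placeholder, the helper names are hypothetical, and the field names are the ones listed above; confirm both against the live catalog before use.

```python
from urllib.parse import urlencode


def soql_quote(value: str) -> str:
    """Quote a string literal for a $where clause, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_soql_url(domain: str, dataset_id: str, state: str, status: str,
                   min_approval: int, limit: int = 1000) -> str:
    """Build a Socrata-style query URL filtering by state, status, and loan size."""
    where = (
        f"borr_state = {soql_quote(state)} "
        f"AND loan_status = {soql_quote(status)} "
        f"AND gross_approval >= {int(min_approval)}"
    )
    params = {
        "$where": where,
        "$order": "gross_approval DESC",
        "$limit": limit,
    }
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params)}"


if __name__ == "__main__":
    # "xxxx-xxxx" is a placeholder, not a real dataset identifier.
    print(build_soql_url("data.sba.gov", "xxxx-xxxx",
                         state="NC", status="CHGOFF", min_approval=250_000))
```

Building the clause from quoted literals rather than raw string interpolation keeps names containing apostrophes from breaking the query.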
Python analysis: sector default rates and lender charge-off rates
The following script downloads 7(a) loan data for five fiscal years via the Socrata API, filters to active and charged-off loans, computes sector-level default metrics grouped by two-digit NAICS code, and separately analyzes the ten largest lenders by approved volume.
```python
import io

import pandas as pd
import requests

# ------------------------------------------------------------------
# Step 1: Download 7(a) loan data for the most recent 5 fiscal years
# SBA open data: https://data.sba.gov/dataset/7-a-504-foia
# Files are named by fiscal year, e.g. foia-7afy2024.csv
# ------------------------------------------------------------------
FISCAL_YEARS = [2020, 2021, 2022, 2023, 2024]
BASE_URL = "https://data.sba.gov/dataset/7-a-504-foia/resource/"  # landing page; not used below
# Resource IDs change annually; download from known stable URL pattern
# In practice, pull the Socrata dataset ID from the data catalog
SOCRATA_DOMAIN = "data.sba.gov"
DATASET_7A = "3jjh-ghku"  # 7(a) FOIA data - check current ID on data.sba.gov
PAGE_SIZE = 500_000


def download_7a_socrata(fiscal_year: int) -> pd.DataFrame:
    """Download 7(a) loans for a single fiscal year via Socrata SQL endpoint."""
    url = f"https://{SOCRATA_DOMAIN}/resource/{DATASET_7A}.csv"
    params = {
        "$where": f"approval_fiscal_year={fiscal_year}",
        "$order": ":id",  # stable ordering so $offset paging neither skips nor repeats rows
        "$limit": PAGE_SIZE,
        "$offset": 0,
    }
    frames = []
    while True:
        r = requests.get(url, params=params, timeout=60)
        r.raise_for_status()
        if not r.text.strip():
            break  # empty body: nothing left to read
        df = pd.read_csv(io.StringIO(r.text), dtype={
            "naicscode": str,
            "borr_state": str,
            "lender_id": str,
            "loan_status": str,
        }, low_memory=False)
        if df.empty:
            break
        frames.append(df)
        params["$offset"] += len(df)
        if len(df) < PAGE_SIZE:
            break  # short page means this was the last one
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


dfs = []
for fy in FISCAL_YEARS:
    print(f"Fetching FY{fy}...")
    df = download_7a_socrata(fy)
    df["fiscal_year"] = fy
    dfs.append(df)
loans = pd.concat(dfs, ignore_index=True)
print(f"Total rows: {len(loans)}")

# ------------------------------------------------------------------
# Step 2: Filter to active and charged-off loans only
# LoanStatus values: 'P I F' (Paid In Full), 'CHGOFF', 'CANCLD', 'EXEMPT', etc.
# ------------------------------------------------------------------
# Uppercase and strip all whitespace first, so that 'P I F', 'PIF', and
# 'Paid in Full' all collapse to forms the patterns below can match.
loans["status_norm"] = (
    loans["loan_status"].str.upper().str.replace(r"\s+", "", regex=True)
)
# Keep everything that is NOT paid in full or cancelled
closed = loans["status_norm"].str.contains("PIF|PAID|CANCLD|CANCEL", na=False)
loans = loans[~closed].copy()
loans["gross_approval"] = pd.to_numeric(loans["gross_approval"], errors="coerce").fillna(0)
loans["charged_off"] = pd.to_numeric(
    loans["sba_guaranteed_portion_charged_off"], errors="coerce"
).fillna(0)
loans["is_chargeoff"] = loans["status_norm"].str.contains("CHGOFF|CHARGED", na=False)
# Note: rates computed below are relative to the loans that survive this
# filter (active + charged-off), not to all originations.

# ------------------------------------------------------------------
# Step 3: 2-digit NAICS sector analysis
# Compute total approved, total charged-off, default rate, median loan size
# ------------------------------------------------------------------
loans["naics2"] = loans["naicscode"].str[:2].str.strip()
NAICS2_LABELS = {
    "11": "Agriculture",
    "21": "Mining/Oil & Gas",
    "22": "Utilities",
    "23": "Construction",
    "31": "Manufacturing",
    "32": "Manufacturing",
    "33": "Manufacturing",
    "42": "Wholesale Trade",
    "44": "Retail Trade",
    "45": "Retail Trade",
    "48": "Transportation",
    "49": "Warehousing",
    "51": "Information",
    "52": "Finance & Insurance",
    "53": "Real Estate",
    "54": "Professional Services",
    "55": "Management",
    "56": "Admin & Support",
    "61": "Education",
    "62": "Health Care",
    "71": "Arts & Entertainment",
    "72": "Food Service",
    "81": "Other Services",
    "92": "Public Admin",
}
loans["sector"] = loans["naics2"].map(NAICS2_LABELS).fillna("Unknown")
sector_stats = (
    loans.groupby("sector")
    .agg(
        total_loans=("gross_approval", "count"),
        total_approved_m=("gross_approval", lambda x: x.sum() / 1e6),
        total_chargeoff_m=("charged_off", lambda x: x.sum() / 1e6),
        median_loan=("gross_approval", "median"),
        chargeoff_count=("is_chargeoff", "sum"),
    )
    .reset_index()
)
sector_stats["default_rate_pct"] = (
    sector_stats["chargeoff_count"] / sector_stats["total_loans"] * 100
).round(2)
sector_stats["total_approved_m"] = sector_stats["total_approved_m"].round(1)
sector_stats["total_chargeoff_m"] = sector_stats["total_chargeoff_m"].round(1)
sector_stats["median_loan"] = sector_stats["median_loan"].round(0).astype(int)

print("\n=== By 2-digit NAICS sector: ranked by total dollar volume ===")
by_volume = sector_stats.sort_values("total_approved_m", ascending=False)
print(by_volume[["sector", "total_loans", "total_approved_m", "chargeoff_count",
                 "total_chargeoff_m", "default_rate_pct", "median_loan"]].to_string(index=False))

print("\n=== By 2-digit NAICS sector: ranked by default rate ===")
by_default = sector_stats.sort_values("default_rate_pct", ascending=False)
print(by_default[["sector", "total_loans", "default_rate_pct",
                  "total_chargeoff_m"]].to_string(index=False))

# ------------------------------------------------------------------
# Step 4: Top-10 lenders by volume; compute their charge-off rates
# ------------------------------------------------------------------
top_lenders = (
    loans.groupby("lender_name")["gross_approval"]
    .sum()
    .nlargest(10)
    .reset_index()
    .rename(columns={"gross_approval": "total_approved"})
)
lender_co = loans.groupby("lender_name").agg(
    loan_count=("gross_approval", "count"),
    total_approved=("gross_approval", "sum"),
    total_chargeoff=("charged_off", "sum"),
    chargeoff_count=("is_chargeoff", "sum"),
).reset_index()
lender_co["chargeoff_rate_pct"] = (
    lender_co["chargeoff_count"] / lender_co["loan_count"] * 100
).round(2)
lender_co["total_approved_m"] = (lender_co["total_approved"] / 1e6).round(1)
lender_co["total_chargeoff_m"] = (lender_co["total_chargeoff"] / 1e6).round(1)
top10_lenders = lender_co[lender_co["lender_name"].isin(top_lenders["lender_name"])].copy()
top10_lenders = top10_lenders.sort_values("total_approved_m", ascending=False)

print("\n=== Top 10 lenders by volume: loan counts, approval totals, charge-off rates ===")
print(top10_lenders[["lender_name", "loan_count", "total_approved_m",
                     "chargeoff_count", "total_chargeoff_m",
                     "chargeoff_rate_pct"]].to_string(index=False))
```

Running this against five years of 7(a) data typically returns 200,000–350,000 rows. The sector analysis reveals that food service and personal services carry default rates two to three times those of healthcare and professional services, reflecting both the higher failure rates in those industries and the thinner collateral available at origination. Lender charge-off rate variation is substantial: a difference of 10 or more percentage points between the highest- and lowest-performing top-10 lenders by charge-off rate is typical, reflecting genuine differences in underwriting culture and sector concentration rather than random variation.
One practical note: the LoanStatus field is not consistently formatted across fiscal years and files. Values like P I F, PIF, Paid in Full, and PAID IN FULL all appear for the same status depending on when the file was exported. Normalizing to uppercase and using substring matching rather than exact equality is necessary for reliable filtering across the full multi-year dataset.
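One way to handle those variants is to collapse every raw value to a small canonical set before filtering. The helper below is a sketch under that assumption: the function name and the canonical labels (`PAID_IN_FULL`, `CHARGED_OFF`, `CANCELLED`) are hypothetical, and any status it does not recognize is passed through in compacted uppercase form rather than guessed at.

```python
import re


def normalize_loan_status(raw) -> str:
    """Map inconsistent LoanStatus spellings to a canonical label."""
    # Treat None and NaN (the only value not equal to itself) as missing.
    if raw is None or raw != raw:
        return "UNKNOWN"
    # Uppercase and drop everything except letters: 'P I F' -> 'PIF'.
    compact = re.sub(r"[^A-Z]", "", str(raw).upper())
    if not compact:
        return "UNKNOWN"
    if compact in {"PIF", "PAIDINFULL"}:
        return "PAID_IN_FULL"
    if compact.startswith(("CHGOFF", "CHARGED")):
        return "CHARGED_OFF"
    if compact.startswith("CANC"):
        return "CANCELLED"
    return compact  # e.g. 'EXEMPT', 'DISBURSEDCURRENT'


if __name__ == "__main__":
    for value in ("P I F", "PIF", "Paid in Full", "PAID IN FULL", "CHGOFF", "CANCLD"):
        print(repr(value), "->", normalize_loan_status(value))
```

Applied to a pandas column with `.map(normalize_loan_status)`, this gives exact-equality filtering on the canonical labels, which is easier to audit than scattered substring matches.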
Related writing: USASpending Federal Contracts: Tracing $700 Billion in Annual Government Procurement—the federal contracting counterpart to SBA lending; many SBA borrowers also pursue federal contracts through set-aside programs, and both datasets are joinable on business name and SAM.gov registration.
Related writing: FDIC Call Report Data: The Quarterly Financial Filing Behind Every US Bank's Balance Sheet—SBA lenders are FDIC-regulated banks; Call Report Schedule RC-C disaggregates their loan portfolios by purpose and includes the SBA-guaranteed loan book, allowing cross-reference between SBA program data and bank-level financial reporting.
Related writing: Census County Business Patterns: Annual Establishment Counts, Employment, and Payroll for Every US County—County Business Patterns provides the denominator for SBA penetration analysis: total establishment counts and payroll by NAICS and county against which 7(a) loan volumes can be benchmarked.