The Public Company Accounting Oversight Board publishes inspection reports, enforcement orders, and a registration database covering every accounting firm that signs an audit opinion on a US-listed public company. Those reports document, in precise regulatory language, when an auditor signed off on financial statements without the evidence to support that opinion. Understanding what the database contains, how the inspection process works, and what two decades of deficiency data reveal about audit quality is essential for securities analysts, audit committee members, and anyone who studies the institutional infrastructure of public capital markets.
Origins: Sarbanes-Oxley and the failure of self-regulation
Before 2002, public accounting was largely self-regulated. The American Institute of Certified Public Accountants (AICPA) operated a peer-review program in which accounting firms reviewed one another's audit practices. The structural conflict was obvious: firms that competed for audit clients were also responsible for assessing each other's quality. Congress concluded that system was inadequate after the Enron, WorldCom, and Global Crossing accounting scandals exposed billions of dollars in fabricated earnings and concealed liabilities that had been certified as fairly presented by their auditors.
The collapse of Arthur Andersen, the fifth-largest accounting firm in the world, which was indicted for obstruction of justice in 2002 for shredding Enron audit documents and dissolved shortly after, destroyed a major audit supplier to the Fortune 500 and made the fragility of audit market concentration impossible to ignore. Congress passed the Sarbanes-Oxley Act in July 2002 with near-unanimous votes in both chambers. Section 101 of Sarbanes-Oxley created the Public Company Accounting Oversight Board as an independent nonprofit corporation overseen by the Securities and Exchange Commission, with statutory authority to register, inspect, investigate, and discipline accounting firms that audit SEC-registered public companies.
The PCAOB is structured as a nonprofit corporation rather than a federal agency, but it exercises quasi-governmental regulatory authority. Its five Board members (the Chair plus four additional members) are appointed by the SEC after consultation with the Treasury Secretary and the Federal Reserve Chair; the statute requires that exactly two of the five be or have been certified public accountants. Its annual budget of approximately $325 million is funded not by congressional appropriations but by “accounting support fees” levied on SEC-registered issuers and broker-dealers: issuers are assessed roughly in proportion to their market capitalization, and broker-dealers roughly in proportion to their net capital. The funding structure insulates PCAOB from annual appropriations cycles while maintaining SEC oversight of its budget.
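The proportional allocation of the issuer fee can be sketched in a few lines. This is a minimal illustration, assuming a single fee pool split purely by each issuer's share of total market capitalization; PCAOB's actual computation has further details (such as how market capitalization is averaged and which issuers are included) that are not modeled here, and the function name and issuer figures are hypothetical.

```python
def allocate_support_fee(total_fee: float, market_caps: dict[str, float]) -> dict[str, float]:
    """Split a fee pool across issuers in proportion to market capitalization.

    Simplified sketch: the real accounting support fee methodology has
    additional rules that are not modeled here.
    """
    total_cap = sum(market_caps.values())
    if total_cap <= 0:
        raise ValueError("total market capitalization must be positive")
    return {name: total_fee * cap / total_cap for name, cap in market_caps.items()}


# Hypothetical issuers and market capitalizations (USD), for illustration only.
caps = {"Issuer A": 600e9, "Issuer B": 300e9, "Issuer C": 100e9}
for issuer, fee in allocate_support_fee(1_000_000, caps).items():
    print(f"{issuer}: ${fee:,.0f}")
```

Under this rule an issuer holding 60% of the pool's market capitalization bears 60% of the fee, so the largest issuers fund most of the budget.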
Firm registration: the universe of PCAOB-registered auditors
Any accounting firm that prepares or issues an audit report for a company required to file with the SEC under the Exchange Act, or that plays a substantial role in preparing or furnishing one, must register with PCAOB. The registration obligation is broadly construed: a foreign firm that audits a foreign private issuer whose securities trade on a US exchange must register. A firm that performs part of a group audit must also register if its role is substantial, even if it does not sign the principal auditor's report; PCAOB's rules treat as substantial, for example, performing the majority of the audit procedures on a subsidiary or component whose assets or revenues make up 20% or more of the consolidated totals. As of 2024, approximately 1,700 firms are registered globally, operating across more than 90 countries.
The registered firm universe divides into several tiers with meaningfully different market positions. The Big Four — Deloitte & Touche, Ernst & Young (EY), KPMG, and PricewaterhouseCoopers (PwC) — collectively audit virtually every S&P 500 company and nearly all companies with market capitalizations above $1 billion. They operate as global networks of legally separate national partnerships, each of which registers with PCAOB independently: KPMG LLP (the US firm) and KPMG Huazhen LLP (the Chinese firm) are distinct registered entities. The second tier includes Grant Thornton, BDO USA, RSM US, Crowe LLP, and a handful of other firms that audit mid-market public companies and some smaller accelerated filers. Below that are hundreds of smaller regional and local firms, most of which audit fewer than ten public company clients each.
Registration requires an application that lists the firm's issuer audit clients and the fees received from them and includes a statement of the firm's quality control policies; registered firms must then file annual reports and pay annual fees that scale with the number of issuer audit clients, from a modest minimum for the smallest firms to considerably larger amounts for the largest. Failure to register or maintain registration bars a firm from the SEC audit market: any company whose auditor loses PCAOB registration must find a new auditor immediately or risk delisting. This creates powerful compliance incentives even for firms that view PCAOB inspection oversight as burdensome.
The inspection program: methodology and scope
PCAOB inspects annually every firm that regularly issues audit reports for more than 100 SEC-registered issuers. This threshold covers the Big Four, the major second-tier firms, and a handful of other firms. Firms auditing 100 or fewer issuers must be inspected at least once every three years, which means that, once inspection preparation and report publication lag are counted, a published report on a small firm can describe practice several years out of date. Lowering the 100-issuer threshold for annual inspection has periodically been debated, but moving every triennially inspected firm to an annual cycle would require a substantial increase in inspection staff and budget.
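The statutory inspection schedule reduces to a threshold rule on the number of issuers a firm audits. The sketch below encodes only that rule; `Firm` and the helper names are hypothetical, and treating registered firms with no issuer audit clients as outside the regular schedule is a simplification.

```python
from dataclasses import dataclass
from typing import Optional

ANNUAL_THRESHOLD = 100  # more than 100 issuer audit clients => annual inspection


@dataclass
class Firm:
    name: str
    issuer_clients: int  # issuers for which the firm regularly issues audit reports


def inspection_frequency(firm: Firm) -> str:
    """Classify a firm under the annual/triennial schedule."""
    if firm.issuer_clients > ANNUAL_THRESHOLD:
        return "annual"
    if firm.issuer_clients > 0:
        return "triennial"  # at least once every three years
    return "none"  # simplification: no regular cycle without issuer clients


def latest_next_inspection_year(firm: Firm, last_inspected: int) -> Optional[int]:
    """Latest year in which the next inspection could fall under the schedule."""
    frequency = inspection_frequency(firm)
    if frequency == "annual":
        return last_inspected + 1
    if frequency == "triennial":
        return last_inspected + 3
    return None


for f in [Firm("Large firm", 250), Firm("Regional firm", 100), Firm("Dormant firm", 0)]:
    print(f.name, inspection_frequency(f), latest_next_inspection_year(f, 2023))
```

The boundary matters: a firm with exactly 100 issuer clients stays on the triennial cycle.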
The inspection process is not primarily a review of the firm's written policies. PCAOB inspectors select specific audit engagements (actual client audits performed by the firm) and examine the workpapers, staff communications, review notes, management representation letters, and third-party confirmations that the auditor assembled before signing the opinion. Engagement selection combines risk-based targeting (industries with higher inherent risk, complex accounting estimates, recent restatements, firms with prior deficiency findings) with random sampling. The adversarial nature of the selection is intentional: PCAOB inspectors are not trying to assess average audit quality; they are trying to find the audits most likely to have problems. The inspected sample is therefore not representative of all audits, which matters for interpreting deficiency rates.
PCAOB has broad statutory authority. Under Sarbanes-Oxley Sections 104 (inspections) and 105 (investigations and disciplinary proceedings), PCAOB can compel production of workpapers, require testimony from firm personnel, and refer potential violations to the SEC or the Department of Justice. Firms cannot refuse inspection: refusal would constitute grounds for revocation of registration. In practice, the major firms cooperate fully with inspections, though the scope of that cooperation has been a significant source of geopolitical tension in the case of Chinese audit firms.
Inspection reports: Part I deficiencies and Part II quality control
Each inspection report has two principal parts. Part I contains deficiency findings related to the specific audit engagements inspected. Part II contains quality control criticisms — observations about the firm's overall system of quality control that are not engagement-specific. The publication timing differs deliberately: Part I is published immediately when the report is issued; Part II is withheld for 12 months to give the firm an opportunity to remediate the identified quality control weaknesses. If the firm demonstrates adequate remediation within that period, Part II remains nonpublic permanently. If it does not, Part II is released. For the Big Four, Part II findings are rarely released because they invest heavily in quality control system remediation. For smaller firms, Part II releases are more common.
Within Part I, PCAOB distinguishes between two subcategories of findings that the structured data format labels Part I.A and Part I.B:
Part I.A: insufficient audit evidence
A Part I.A deficiency is a finding that the auditor issued an opinion without obtaining sufficient appropriate audit evidence to support the conclusion in that opinion. This is the most severe finding PCAOB makes short of an enforcement action. The auditor signed a report — an unqualified opinion stating that the financial statements present fairly in all material respects the financial position of the company — without being able to demonstrate to PCAOB inspectors that it had done the work necessary to reach that conclusion. Part I.A findings do not necessarily mean the financial statements are wrong; they mean the auditor cannot show it knew the statements were right. The distinction is critical: PCAOB is not a financial statement reviewer. It is an audit process reviewer. A Part I.A deficiency is about the sufficiency of audit procedures, not about the accuracy of the financial statements.
Part I.B: standards noncompliance
A Part I.B deficiency is a finding of noncompliance with PCAOB standards or rules during the conduct of the audit, where the noncompliance does not rise to the level of insufficient evidence to support the opinion. Documentation failures, inadequate supervision, failure to obtain required client representations, and engagement partner review deficiencies typically generate Part I.B rather than Part I.A findings. In PCAOB's more recent report format, possible violations of SEC and PCAOB independence rules are reported separately, in Part I.C. The boundary between I.A and I.B is not always clear, and PCAOB's characterization of a finding has itself been a subject of comment in enforcement proceedings.
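Because a single engagement can carry several findings, deficiency rates are naturally expressed per engagement rather than per finding. The sketch below, using hypothetical finding-level records keyed by an engagement identifier and a part label, counts an engagement once if it has at least one finding in the requested part. The record layout and function name are assumptions for illustration, not PCAOB's data format.

```python
# Hypothetical finding-level records: (engagement_id, part). Illustrative only.
findings = [
    ("E1", "I.A"), ("E1", "I.A"), ("E1", "I.B"),
    ("E2", "I.B"),
    ("E3", "I.A"),
]
inspected = ["E1", "E2", "E3", "E4", "E5"]


def part_rate(findings: list[tuple[str, str]], inspected: list[str], part: str) -> float:
    """Share of inspected engagements with at least one finding in `part`."""
    engagements = set(inspected)
    flagged = {eng for eng, p in findings if p == part and eng in engagements}
    return len(flagged) / len(engagements)


print(f"Part I.A rate: {part_rate(findings, inspected, 'I.A'):.0%}")  # E1 and E3 of five
print(f"Part I.B rate: {part_rate(findings, inspected, 'I.B'):.0%}")  # E1 and E2 of five
```

Counting raw findings instead would give three Part I.A findings across five engagements and overstate the rate, since E1 is one deficient audit, not two.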
Two decades of deficiency rate data: Big Four trends 2004–2024
PCAOB has published inspection reports continuously since its first inspection cycle in 2003. The aggregate deficiency data spanning 20 years provides a unique longitudinal record of audit quality at the firms that serve the public capital markets. Several durable patterns emerge from that record.
The Big Four Part I.A deficiency rates were relatively low in the early years of PCAOB inspection (roughly 10–15% of inspected engagements in the 2004–2007 period) before rising significantly. The post-financial-crisis period brought increased scrutiny of bank fair value estimates, loan loss reserves, and goodwill from pre-crisis acquisitions, and deficiency rates at firms with large financial institution practices rose accordingly. A second deterioration occurred in the early 2020s: in 2022, Part I.A rates at the four firms ranged from 31% to 44%, results that PCAOB Chair Erica Williams publicly described as “unacceptable” and that became a primary driver of new enforcement initiatives. The 2023 inspection cycle showed improvement to a 26% aggregate Big Four rate, though rates remained elevated relative to historical norms.
The inter-firm variation within the Big Four is large and persistent. Deloitte has consistently posted the lowest Part I.A rates among the four firms across multiple inspection cycles; KPMG has consistently posted the highest, a pattern that has held through changes in firm leadership and inspection staff. The persistence of this ranking suggests genuine differences in quality control infrastructure, partner accountability systems, and firm culture rather than statistical noise. EY and PwC occupy the middle positions, with EY typically slightly higher than PwC.
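Comparisons such as these depend on how an aggregate rate is formed. A pooled rate divides total deficient engagements by total inspected engagements, while an unweighted mean averages the firm-level rates; the two diverge whenever firms have different numbers of inspected engagements. The counts below are hypothetical, not PCAOB figures, and which method underlies any particular published aggregate is not assumed here.

```python
# Hypothetical counts per firm: (engagements with a Part I.A finding, engagements inspected).
firm_counts = {
    "Firm W": (10, 50),
    "Firm X": (15, 60),
    "Firm Y": (24, 60),
    "Firm Z": (6, 30),
}


def pooled_rate(counts: dict[str, tuple[int, int]]) -> float:
    """Total deficient engagements divided by total inspected engagements."""
    deficient = sum(d for d, _ in counts.values())
    inspected = sum(n for _, n in counts.values())
    return deficient / inspected


def mean_of_rates(counts: dict[str, tuple[int, int]]) -> float:
    """Unweighted average of the firm-level rates."""
    return sum(d / n for d, n in counts.values()) / len(counts)


for firm, (d, n) in firm_counts.items():
    print(f"{firm}: {d / n:.1%}")
print(f"Pooled: {pooled_rate(firm_counts):.2%}  Mean of rates: {mean_of_rates(firm_counts):.2%}")
```

Here the pooled rate (27.5%) exceeds the mean of rates (26.25%) because the firm with the highest rate also contributes a large share of inspected engagements.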
Non-Big Four deficiency rates are systematically higher than Big Four rates, though the comparison requires care because PCAOB's inspection of smaller firms is less frequent and the inspected sample is more adversarially selected relative to each firm's audit portfolio. Triennial firms often show Part I deficiency rates of 50% or higher in the inspected sample, which for small firms may involve only three to ten reviewed engagements in a single inspection cycle. The statistical reliability of any individual small-firm deficiency rate is therefore low; the aggregate trend across the small-firm population is more meaningful.
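The small-sample point can be made concrete with a binomial confidence interval. The sketch below uses the Wilson score interval, one standard choice for proportions; it treats inspected engagements as independent draws, an assumption the risk-based selection described earlier does not satisfy, so the intervals capture sampling noise only and say nothing about how representative the sample is. The sample sizes are hypothetical.

```python
import math


def wilson_interval(deficient: int, inspected: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if inspected <= 0:
        raise ValueError("inspected must be positive")
    p = deficient / inspected
    denom = 1 + z**2 / inspected
    centre = (p + z**2 / (2 * inspected)) / denom
    half_width = z * math.sqrt(p * (1 - p) / inspected + z**2 / (4 * inspected**2)) / denom
    return centre - half_width, centre + half_width


# Same observed 50% rate, very different precision (hypothetical samples).
for deficient, inspected in [(2, 4), (30, 60)]:
    low, high = wilson_interval(deficient, inspected)
    print(f"{deficient}/{inspected}: 95% interval {low:.0%} to {high:.0%}")
```

With four reviewed engagements, an observed 50% rate is consistent with anything from roughly 15% to 85%; with sixty, the range narrows to roughly 38% to 62%.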
Most common deficiency areas
PCAOB inspection reports identify the specific audit area in which a deficiency was found, using language that allows categorization even without a standardized taxonomy. Across the public inspection report corpus from 2017 to 2024, several audit areas appear most frequently in Part I.A deficiency findings:
Revenue recognition is consistently the most common deficiency area following the adoption of ASC 606 (FASB's revenue from contracts with customers standard). ASC 606 introduced a principles-based five-step framework that requires significant management judgment about performance obligation identification, variable consideration estimates, and contract modifications. PCAOB inspectors have found that auditors frequently fail to adequately test management's application of the step-by-step model, particularly for contracts with multiple deliverables, contingent consideration, or significant financing components.
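The most directly arithmetic step of the five-step model, allocating the transaction price to performance obligations in proportion to their standalone selling prices, can be sketched as follows. This is a minimal illustration with hypothetical figures and a hypothetical function name; it ignores the standard's exceptions, such as allocating a discount entirely to specific obligations or estimating a standalone selling price with the residual approach, and the judgment-heavy steps that precede it.

```python
def allocate_transaction_price(price: float, standalone_prices: dict[str, float]) -> dict[str, float]:
    """Allocate a contract price by relative standalone selling price."""
    total_ssp = sum(standalone_prices.values())
    if total_ssp <= 0:
        raise ValueError("standalone selling prices must sum to a positive amount")
    return {ob: price * ssp / total_ssp for ob, ssp in standalone_prices.items()}


# Hypothetical bundle: $900 contract price; obligations sold separately total $1,000,
# so the $100 discount is spread proportionally across all three obligations.
allocation = allocate_transaction_price(
    900.0, {"software license": 600.0, "support": 300.0, "training": 100.0}
)
for obligation, amount in allocation.items():
    print(f"{obligation}: ${amount:,.2f}")
```

An auditor testing this step would examine how each standalone selling price was estimated, since the allocation is only as reliable as those inputs.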
Audits of internal control over financial reporting (ICFR) under AS 2201, the PCAOB standard implementing Sarbanes-Oxley Section 404(b), are the second most common deficiency area. The integrated audit requirement means that the external auditor must express its own opinion on the effectiveness of the company's internal control, not merely test controls as part of financial statement audit procedures. PCAOB inspectors have found systematic failures in the identification of significant accounts and relevant assertions, inadequate walkthrough procedures, insufficient testing of IT general controls, and overreliance on management's testing work without independent verification.
Goodwill and asset impairment testing generates a recurring category of Part I.A findings because it requires auditors to evaluate complex discounted cash flow (DCF) models built on highly sensitive assumptions: terminal growth rates, discount rates, and projected cash flows, where small changes in any input can move the estimated value dramatically. PCAOB inspectors focus on whether the auditor independently assessed the reasonableness of management's DCF model inputs rather than simply recalculating management's arithmetic. The difference between recalculation and independent assessment is the core of most goodwill impairment deficiency findings.
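The sensitivity that makes these models hard to audit is easy to demonstrate. The sketch below values five years of hypothetical projected cash flows plus a Gordon-growth terminal value, then varies the discount rate and terminal growth rate by small amounts. It is a simplified end-of-year model for illustration, not a description of any particular impairment methodology.

```python
def dcf_value(cash_flows: list[float], discount_rate: float, terminal_growth: float) -> float:
    """Present value of explicit cash flows plus a Gordon-growth terminal value.

    Assumes end-of-year cash flows and a terminal value based on the final year.
    """
    if discount_rate <= terminal_growth:
        raise ValueError("discount rate must exceed the terminal growth rate")
    explicit = sum(cf / (1 + discount_rate) ** t for t, cf in enumerate(cash_flows, start=1))
    terminal = cash_flows[-1] * (1 + terminal_growth) / (discount_rate - terminal_growth)
    return explicit + terminal / (1 + discount_rate) ** len(cash_flows)


flows = [100.0, 105.0, 110.0, 115.0, 120.0]  # hypothetical five-year projection
base = dcf_value(flows, discount_rate=0.09, terminal_growth=0.025)

for r, g in [(0.08, 0.025), (0.10, 0.025), (0.09, 0.020), (0.09, 0.030)]:
    value = dcf_value(flows, r, g)
    print(f"r={r:.1%}, g={g:.1%}: {value:,.0f} ({value / base - 1:+.1%} vs. base case)")
```

With these inputs, moving the discount rate by a single percentage point changes the value by well over 10% in either direction, mostly through the terminal value, which is why inspectors look for independent support for each input rather than a recomputation of management's spreadsheet.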
The current expected credit loss (CECL) standard (ASC 326), adopted by large calendar-year-end financial institutions beginning in 2020, created a new category of deficiency findings. CECL requires entities holding loans and similar financial assets, most consequentially banks and credit card issuers, to estimate lifetime expected credit losses using reasonable and supportable forecasts, typically implemented through statistical models. Auditing a statistical model (understanding the modeling assumptions, testing the data inputs, and evaluating whether the model output is reasonable for the current economic environment) requires skills that differ from traditional audit procedures. PCAOB has found deficiencies in auditors' assessments of model conceptual soundness, data completeness, and qualitative factor overlays.
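ASC 326 does not prescribe a single estimation method; discounted cash flow, loss-rate, roll-rate, probability-of-default, and aging-schedule approaches are all permitted. The sketch below shows one simplified probability-of-default approach with hypothetical inputs: expected loss in each remaining year is exposure × marginal default probability × loss given default, summed over the life of the pool, with forward-looking scenarios blended by assumed weights and an optional qualitative overlay added at the end. Discounting and prepayments are deliberately left out, and the function names are hypothetical.

```python
def lifetime_expected_loss(exposures: list[float], marginal_pd: list[float], lgd: float) -> float:
    """Sum over remaining years of exposure x marginal default probability x LGD.

    marginal_pd[t] is the unconditional probability of default during year t.
    Undiscounted and without prepayments: a deliberate simplification.
    """
    if len(exposures) != len(marginal_pd):
        raise ValueError("need one default probability per year of exposure")
    return sum(ead * pd * lgd for ead, pd in zip(exposures, marginal_pd))


def scenario_weighted_allowance(
    exposures: list[float],
    scenarios: dict[str, tuple[float, list[float]]],
    lgd: float,
    overlay: float = 0.0,
) -> float:
    """Probability-weighted lifetime loss across scenarios, plus a qualitative overlay."""
    total_weight = sum(weight for weight, _ in scenarios.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("scenario weights must sum to 1")
    weighted = sum(
        weight * lifetime_expected_loss(exposures, pds, lgd)
        for weight, pds in scenarios.values()
    )
    return weighted + overlay


exposures = [1000.0, 700.0, 400.0]  # hypothetical amortizing balances by remaining year
scenarios = {
    "baseline": (0.7, [0.020, 0.015, 0.010]),
    "adverse": (0.3, [0.030, 0.025, 0.015]),
}
print(f"Allowance: {scenario_weighted_allowance(exposures, scenarios, lgd=0.40):.2f}")
```

Much of the audit work described above sits outside this arithmetic: whether the default probabilities rest on complete data, whether the scenarios and weights are reasonable, and whether any overlay is justified.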
Critical Audit Matters: expanded disclosure since 2019
PCAOB Auditing Standard 3101, effective for large accelerated filers beginning with fiscal years ending on or after June 30, 2019, requires auditors to communicate Critical Audit Matters (CAMs) in the auditor's report. A CAM is a matter that: (1) was communicated or required to be communicated to the audit committee, (2) relates to accounts or disclosures that are material to the financial statements, and (3) involved especially challenging, subjective, or complex auditor judgment. The auditor must describe in the report the nature of the CAM, how it was addressed during the audit, and which financial statement accounts or disclosures relate to it.
The CAM requirement is what PCAOB described as the most significant change to the auditor's report in more than 70 years. Before AS 3101, the auditor's report was a standardized boilerplate document of approximately 300 words containing no entity-specific information. CAM disclosures now routinely run to multiple pages and identify the specific accounting estimates, assumptions, and controls that the audit team found most challenging. Revenue recognition, goodwill impairment, fair value measurement, and income tax accounting dominate the CAM landscape across the S&P 500, consistent with the audit areas that generate the most PCAOB deficiency findings.
Academic research has examined whether CAM disclosures contain information useful to investors and whether auditors strategically shade CAM language to avoid disclosing concerns that might affect client relationships. The evidence is mixed: CAM language is often similar across auditors and industries in ways that suggest boilerplate risk, but studies find that CAMs identifying unusual or entity-specific matters are associated with audit effort indicators and subsequent financial reporting outcomes.
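Studies use a range of text-similarity measures to detect boilerplate. One of the simplest is Jaccard similarity over the sets of words in two passages, sketched below on hypothetical CAM excerpts; it illustrates the idea and is not the measure used in any particular study.

```python
import re


def word_set(text: str) -> set[str]:
    """Lowercased alphabetic tokens in a passage."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared vocabulary divided by combined vocabulary."""
    words_a, words_b = word_set(a), word_set(b)
    union = words_a | words_b
    if not union:
        return 1.0
    return len(words_a & words_b) / len(union)


cam_a = ("We evaluated the reasonableness of management's discount rate and revenue "
         "growth assumptions used in the goodwill impairment test.")
cam_b = ("We evaluated the reasonableness of management's discount rate and long-term "
         "growth assumptions used in the goodwill impairment analysis.")
cam_c = ("We tested the completeness of loan data feeding the credit loss model and "
         "evaluated qualitative adjustments.")

print(f"A vs B: {jaccard(cam_a, cam_b):.2f}")  # near-identical wording
print(f"A vs C: {jaccard(cam_a, cam_c):.2f}")  # different subject matter
```

High similarity between CAMs issued for different companies is one signal of boilerplate; low similarity suggests entity-specific content.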
The HFCAA standoff and Chinese audit firms
From PCAOB's inception through 2022, China represented the single largest gap in its global inspection authority. Chinese securities regulators maintained that audit workpapers prepared in China for Chinese companies audited by PCAOB-registered firms contained state secrets under Chinese law and therefore could not be provided to foreign regulatory bodies, including PCAOB. The Big Four Chinese affiliates (KPMG Huazhen, PricewaterhouseCoopers Zhong Tian, Deloitte Touche Tohmatsu CPA, and Ernst & Young Hua Ming) were registered with PCAOB but effectively uninspectable: they could register, pay fees, and submit annual reports, but PCAOB inspectors could not examine their audit workpapers.
The standoff created a regulatory asymmetry that became acute as Chinese companies listed on US exchanges through reverse mergers and SPAC combinations in the 2010s. A Chinese company listed on the NYSE or NASDAQ was subject to SEC reporting requirements but audited by a Chinese firm that PCAOB could not inspect. A wave of accounting fraud at small Chinese issuers — Longtop Financial Technologies, Rino International, SinoCoking Coal and Coke — exposed the absence of functional audit oversight and triggered billions of dollars in investor losses.
Congress responded with the Holding Foreign Companies Accountable Act (HFCAA), signed into law in December 2020. HFCAA requires that if PCAOB has been unable to inspect or investigate a foreign company's audit firm for three consecutive years due to a foreign government prohibition, the SEC must prohibit trading in that company's securities on US exchanges. The three-year clock was later shortened to two years by the Accelerating Holding Foreign Companies Accountable Act (2022). The threat was credible and specific: PCAOB identified 273 Chinese issuers whose auditors were on its “HFCAA Determination” list as of 2022, representing companies with combined market capitalizations of hundreds of billions of dollars.
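The core of the HFCAA mechanism is a count of consecutive non-inspection years compared against a statutory threshold (three years originally, two after the 2022 amendment). The sketch below encodes only that counting logic over a hypothetical year-by-year history; who makes each determination and the procedural steps that follow are not modeled, and the function names are hypothetical.

```python
def trailing_non_inspection_run(history: list[bool]) -> int:
    """Count consecutive non-inspection years ending with the most recent year.

    history[i] is True if year i was a non-inspection year (oldest first).
    """
    run = 0
    for blocked in reversed(history):
        if not blocked:
            break
        run += 1
    return run


def trading_prohibition_triggered(history: list[bool], threshold: int = 3) -> bool:
    """True if the trailing run of non-inspection years reaches the threshold."""
    return trailing_non_inspection_run(history) >= threshold


history = [True, True]  # two consecutive non-inspection years (hypothetical)
print(trading_prohibition_triggered(history, threshold=3))  # False under the three-year clock
print(trading_prohibition_triggered(history, threshold=2))  # True under the two-year clock
```

In this model, a single year with inspection access resets the count to zero.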
The threat produced a negotiated resolution. In August 2022, PCAOB signed a Statement of Protocol with the China Securities Regulatory Commission (CSRC) and China's Ministry of Finance under which China agreed to allow PCAOB inspectors access to audit workpapers. PCAOB conducted its first-ever on-site inspections in the region in the fall of 2022, selecting KPMG Huazhen in mainland China and PricewaterhouseCoopers in Hong Kong. In December 2022, PCAOB's Board voted that it had secured “complete access” for the 2022 inspection, removing all 273 companies from the HFCAA determination list. PCAOB published inspection reports for both firms in 2023 containing Part I deficiency findings, confirming that the firms were not performing at the standard PCAOB expected. Inspections continued in 2023 and 2024; PCAOB has indicated that ongoing supervision remains a priority and that any restriction of access would reinstate the delisting process.
Enforcement authority and sanctions
Section 105 of Sarbanes-Oxley grants PCAOB authority to investigate potential violations of Sarbanes-Oxley, PCAOB rules, and SEC rules as they apply to audit practice, and to discipline the firms and individuals responsible. Enforcement proceedings can be initiated based on inspection findings, complaints, or referrals from the SEC. The statutory sanctions range from censure through monetary penalties and temporary practice restrictions to permanent revocation of firm registration or an individual bar from association with any registered firm.
The statutory maximum monetary penalties, which apply to intentional, knowing, or reckless conduct or to repeated negligence, are $15 million per violation for a firm and $750,000 per violation for an associated person (the individual auditor), before inflation adjustments. In practice, the largest penalties have been associated with enforcement actions that involve egregious violations rather than inspection deficiencies. The most notable case involving the integrity of the inspection process itself concerned KPMG: in 2019, KPMG agreed to pay $50 million to settle SEC charges arising from a scheme in which KPMG personnel, including former PCAOB staff who had joined the firm, obtained confidential information, some of it leaked by a PCAOB employee, about which KPMG audit engagements PCAOB planned to inspect, allowing KPMG to prepare those engagements before inspection. The conduct, which PCAOB Chair William Duhnke described as “a serious breach of the trust that undergirds the inspection process,” resulted in criminal charges against individuals.
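Section 105(c) of Sarbanes-Oxley sets per-violation caps in two tiers: $100,000 for a natural person and $2 million for any other person in general, rising to $750,000 and $15 million for intentional or knowing conduct (including reckless conduct) or repeated instances of negligent conduct. The sketch below encodes those original statutory figures as a lookup table; current inflation-adjusted maximums are higher, and the helper name is hypothetical.

```python
# Original statutory caps per violation (USD), before inflation adjustments.
# Key: (is_natural_person, heightened_tier), where the heightened tier covers
# intentional or knowing conduct (including reckless conduct) or repeated negligence.
STATUTORY_CAPS = {
    (True, False): 100_000,
    (False, False): 2_000_000,
    (True, True): 750_000,
    (False, True): 15_000_000,
}


def max_civil_penalty(is_natural_person: bool, heightened_tier: bool, violations: int = 1) -> int:
    """Upper bound on civil money penalties across a number of violations."""
    if violations < 0:
        raise ValueError("violations must be non-negative")
    return STATUTORY_CAPS[(is_natural_person, heightened_tier)] * violations


print(max_civil_penalty(is_natural_person=False, heightened_tier=True))  # firm, one violation
print(max_civil_penalty(is_natural_person=True, heightened_tier=False, violations=3))
```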
Other significant enforcement actions include an $8 million penalty that PCAOB imposed on Deloitte Brazil in 2016, at the time the largest in PCAOB history, for issuing an audit report on a Brazilian airline's financial statements despite knowing of a material error, and for obstructing the resulting PCAOB investigation. Numerous smaller firm actions have resulted in practice bars (individual auditors barred from association with any PCAOB-registered firm) for failing to perform required audit procedures, fabricating workpapers, or signing opinions on audits they did not perform. All disciplinary orders are published in full at pcaobus.org/enforcement.
PCAOB auditing standards
PCAOB promulgates its own auditing standards, separate from the AICPA's Generally Accepted Auditing Standards (GAAS) that apply to private company audits. The PCAOB standards are organized by topic into numbered series running from AS 1000 through AS 6000 (for example, AS 2201, AS 3101, and AS 6101) and apply to all audits of SEC-registered companies performed by PCAOB-registered firms. When PCAOB began operations in 2003, it adopted the AICPA's existing standards on an interim basis; it has progressively replaced those interim standards with its own substantively different standards.
The most consequential standard is AS 2201, which implements the integrated audit requirement of SOX Section 404(b). Section 404(b) requires the external auditor of an accelerated or large accelerated filer (generally a company with a public float of $75 million or more, subject to exclusions such as the SEC's 2020 carve-out for smaller reporting companies with annual revenues below $100 million) to attest to and report on management's assessment of internal control over financial reporting; under AS 2201, the auditor does this by expressing its own opinion on whether that control is effective. AS 2201 specifies a risk-based top-down approach to ICFR testing that begins with entity-level controls and works down to the significant accounts and relevant assertions. Walkthroughs, in which the auditor follows a transaction from origination through the company's processes until it is reflected in the financial records, using the same documents and technology that company personnel use, are described in AS 2201 as frequently the most effective way to understand how transactions flow and where misstatements could arise, and they are among the most resource-intensive components of a public company audit.
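The filer categories that determine whether Section 404(b) applies can be approximated by a threshold rule. This sketch assumes public float is measured as of the last business day of the second fiscal quarter and applies the SEC's 2020 exclusion for issuers eligible to be smaller reporting companies under the revenue test; it ignores the thresholds for exiting a category, the seasoning requirements, and the emerging growth company exemption from 404(b), so it is illustrative only. The function names are hypothetical.

```python
from typing import Optional

LARGE_ACCELERATED_FLOAT = 700_000_000
ACCELERATED_FLOAT = 75_000_000
SRC_REVENUE_LIMIT = 100_000_000


def filer_status(public_float: float, annual_revenue: Optional[float] = None) -> str:
    """Simplified SEC filer category from public float and annual revenue (USD)."""
    if public_float >= LARGE_ACCELERATED_FLOAT:
        return "large accelerated filer"
    if public_float >= ACCELERATED_FLOAT:
        # Float under $700M plus revenue under $100M makes the issuer eligible to be a
        # smaller reporting company under the revenue test; since the 2020 amendments
        # such issuers are excluded from accelerated-filer status.
        if annual_revenue is not None and annual_revenue < SRC_REVENUE_LIMIT:
            return "non-accelerated filer"
        return "accelerated filer"
    return "non-accelerated filer"


def requires_icfr_audit(public_float: float, annual_revenue: Optional[float] = None) -> bool:
    """Whether Section 404(b) auditor attestation applies under this simplification."""
    return filer_status(public_float, annual_revenue) in {"large accelerated filer", "accelerated filer"}


for float_usd, revenue in [(900e6, 2e9), (300e6, 250e6), (300e6, 40e6), (50e6, None)]:
    print(f"float ${float_usd / 1e6:,.0f}M: {filer_status(float_usd, revenue)}, "
          f"404(b) audit: {requires_icfr_audit(float_usd, revenue)}")
```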
Other significant PCAOB standards include AS 2110 (identifying and assessing risks of material misstatement), which requires a structured risk assessment process as the foundation of audit planning; AS 2605 (consideration of the internal audit function), which governs when and how external auditors can rely on work performed by the client's internal audit department; and AS 3101 (the auditor's report on an audit of financial statements when the auditor expresses an unqualified opinion), which contains the CAM requirements. AS 6101 (letters for underwriters) applies in securities registration contexts and governs the comfort letters accounting firms provide to underwriters in IPO and secondary offering transactions.
PCAOB standards are increasingly relevant to international convergence debates. The International Auditing and Assurance Standards Board (IAASB) develops the International Standards on Auditing (ISAs), which many jurisdictions outside the United States use directly or as the basis for national auditing standards, including for audits of listed companies. ISAs and PCAOB standards overlap substantially in concept but differ in specifics. IAASB has adopted its own key audit matters (KAM) standard (ISA 701) that parallels the CAM requirement. PCAOB and IAASB coordinate on standard-setting but maintain separate standard-setting processes, and PCAOB's independence from IAASB has been a deliberate policy position reflecting the SEC's concern about ceding US audit oversight to an international body with different governance.
Accessing the public data
PCAOB publishes three categories of structured data at pcaobus.org:
The firm registration database at pcaobus.org/Registration/Firms lists all registered firms with their registration status (Active, Inactive, Revoked), country of domicile, the number of annual issuer audit clients, and a firm-level profile page with registration history. The database is searchable and filterable by country and status. An underlying JSON API (not officially documented but publicly accessible) supports programmatic retrieval of the full registered firm list with pagination.
Inspection reports are published at pcaobus.org/Inspections as full-text HTML and PDF documents. PCAOB also publishes structured CSV, XML, and JSON data files containing aggregate deficiency counts at the firm-year level (the Firm Data and Audits Selected for Review file), deficiency records at the finding level for Part I.A findings, and Part I.B findings with the specific PCAOB standard or SEC rule cited. These structured files are available at pcaobus.org/oversight/inspections/firm-inspection-reports and constitute the most analytically useful format for quantitative research. PCAOB issues an annual report with industry-wide statistics summarizing inspection findings across all inspected firms for the most recent cycle.
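Once a structured file is downloaded, firm-year rates take only a few lines to compute. The column names below (`firm_name`, `inspection_year`, `audits_reviewed`, `audits_part_ia`) are hypothetical placeholders, not PCAOB's actual layout; check the headers of the real file and map them before use.

```python
import csv
from io import StringIO

# Hypothetical extract; these column names are placeholders, not PCAOB's layout.
SAMPLE_CSV = """firm_name,inspection_year,audits_reviewed,audits_part_ia
Firm A,2022,60,18
Firm A,2023,58,12
Firm B,2022,12,7
"""


def part_ia_rates(csv_text: str) -> dict[tuple[str, int], float]:
    """Map (firm, year) to the share of reviewed audits with a Part I.A finding."""
    rates: dict[tuple[str, int], float] = {}
    for row in csv.DictReader(StringIO(csv_text)):
        reviewed = int(row["audits_reviewed"])
        if reviewed == 0:
            continue  # skip firm-years with no reviewed audits to avoid dividing by zero
        key = (row["firm_name"], int(row["inspection_year"]))
        rates[key] = int(row["audits_part_ia"]) / reviewed
    return rates


for (firm, year), rate in sorted(part_ia_rates(SAMPLE_CSV).items()):
    print(f"{firm} {year}: {rate:.0%}")
```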
Enforcement actions are published at pcaobus.org/Enforcement as individual disciplinary orders in PDF format. PCAOB's public enforcement database is not structured as a downloadable dataset; extraction for systematic analysis requires parsing individual PDF orders. The SEC's website also reflects PCAOB enforcement actions that are coordinated with the SEC: joint orders appear in the SEC enforcement releases database and reference PCAOB proceeding numbers.
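A minimal sketch of the parsing step, assuming the order text has already been extracted from the PDF by a separate tool and that penalties are stated with the phrase "civil money penalty in the amount of $…". That wording is an assumption about typical orders; orders phrased differently will be missed, and the sample text and function name are hypothetical.

```python
import re

PENALTY_RE = re.compile(
    r"civil money penalty in the amount of \$([\d,]+(?:\.\d{2})?)",
    re.IGNORECASE,
)


def extract_penalties(order_text: str) -> list[float]:
    """Return penalty amounts (USD) stated in an order's extracted text."""
    # PDF text extraction often breaks lines mid-phrase, so collapse whitespace first.
    normalized = re.sub(r"\s+", " ", order_text)
    return [float(amount.replace(",", "")) for amount in PENALTY_RE.findall(normalized)]


sample = (
    "IT IS FURTHER ORDERED that a civil money penalty in the amount of\n"
    "$2,000,000 is imposed upon the Firm, and a civil money penalty in the\n"
    "amount of $50,000 is imposed upon Respondent Doe."
)
print(extract_penalties(sample))
```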
Academic and policy significance
PCAOB inspection findings have become one of the most widely used proxies for audit quality in empirical accounting research. Prior to PCAOB, audit quality research relied on indirect measures: restatements, abnormal accruals, going-concern opinion issuance rates, and analyst earnings surprise dispersion. PCAOB inspection data provides a direct regulatory assessment of whether auditors performed their work in accordance with professional standards, enabling research that directly tests the relationship between audit process quality and financial reporting outcomes.
DeFond and Lennox (2011) used early PCAOB inspection findings to show that inspected firms with Part II quality control findings subsequently lost audit clients at higher rates, consistent with the inspection revealing audit quality information not observable from outside the audit firm. Francis and Yu (2009) found that larger Big Four offices were associated with higher audit quality, although their evidence rested on indirect measures such as going-concern reporting and client earnings management rather than on PCAOB inspection findings. Subsequent research has examined whether PCAOB inspections themselves improve audit quality through a deterrence mechanism (firms improve to avoid deficiency findings) or whether inspection is a lagging indicator of quality problems that would eventually manifest as restatements or enforcement actions in the absence of inspection.
The Big Four vs. non-Big Four audit quality premium has been extensively studied using PCAOB data. The premium is real on average — Big Four firms have lower deficiency rates in the inspected sample than non-Big Four firms — but the within-Big-Four variation documented above suggests the premium is not uniform. The persistent KPMG-to-Deloitte differential in Part I.A rates is large enough that a sophisticated audit committee selecting between Big Four firms should treat firm-specific inspection history as a material input to the selection decision, alongside partner-level quality indicators and audit fee considerations.
Emerging policy debates center on two frontiers. First, ESG and sustainability assurance: the SEC's climate disclosure rules (adopted in March 2024, subsequently subject to legal challenge) would require large accelerated filers to have their Scope 1 and Scope 2 greenhouse gas emissions data assured by an independent body. PCAOB has signaled interest in establishing standards for sustainability assurance by registered public accounting firms, which would bring a new category of attestation within its inspection scope. Second, the audit of digital assets: PCAOB issued a staff paper in 2023 on the unique auditing challenges presented by cryptocurrency holdings, decentralized protocol revenues, and smart contract-based financial instruments, where traditional audit procedures for confirmations, completeness, and valuation do not map cleanly onto the underlying technology. Both frontiers will generate new PCAOB standard-setting activity and, eventually, new deficiency categories in inspection reports.
Python example: scraping the inspection report index
The following script demonstrates how to programmatically access the PCAOB public data: enumerating the registered firm list, fetching the inspection report index for each year, parsing Part I deficiency rates from individual report pages, computing Big Four vs. non-Big Four deficiency rate trends from 2004 to 2024, and identifying which audit areas generate the most deficiency findings based on public report language. PCAOB does not provide an official API, so the approach uses the undocumented JSON endpoint for firm registration data and HTML scraping with BeautifulSoup for inspection report pages; the endpoint paths, query parameters, and JSON field names in the script are assumptions that may change without notice. A delay of 0.5–0.75 seconds between requests keeps the crawl polite; aggressive crawling risks IP-level blocking.
import csv
import re
import statistics
import time
from collections import defaultdict
from io import StringIO

import requests
from bs4 import BeautifulSoup

# ---------------------------------------------------------------------------
# PCAOB Inspection Report Analysis
# Scrapes the PCAOB public inspection report index and parses Part I deficiency
# rates for Big Four vs. non-Big Four firms across inspection years 2004-2024.
# Identifies which audit areas (internal controls, revenue recognition,
# impairment testing) generate the most deficiencies based on public summaries.
# No API key required. PCAOB does not provide an official API.
# ---------------------------------------------------------------------------

BASE = "https://pcaobus.org"
HEADERS = {
    "User-Agent": "academic-research-bot/1.0 (contact: research@example.com)",
    "Accept": "text/html,application/xhtml+xml,application/json",
}

# Registered names of the US Big Four member firms.
BIG_FOUR = {
    "Deloitte & Touche LLP",
    "Ernst & Young LLP",
    "KPMG LLP",
    "PricewaterhouseCoopers LLP",
}

# Audit areas tracked in PCAOB inspection report summaries.
# Match these regexes case-insensitively (re.IGNORECASE) against
# plain-text descriptions in public reports.
AUDIT_AREA_PATTERNS = {
    "Revenue recognition": r"revenue recogni",
    "Internal controls": r"internal control",
    "Goodwill/impairment": r"goodwill|impairment",
    "Credit loss (CECL)": r"credit loss|cecl|allowance",
    "Lease accounting": r"\blease",  # \b keeps "release" and "please" from matching
    "Going concern": r"going concern",
    "Fair value measurement": r"fair value",
    "Business combinations": r"business combination|acquisition accounting",
    "Income taxes": r"income tax|deferred tax",
}
# ── 1. Fetch registered firm list ────────────────────────────────────────────
def fetch_registered_firms(status: str = "Active") -> list[dict]:
    """
    Pull the PCAOB registered firm list from the public registration endpoint.
    Returns a list of dicts with firm_id, firm_name, country, status,
    issuer_clients, and whether the firm is in the Big Four.
    The endpoint and its JSON field names are undocumented and may change.
    """
    url = BASE + "/api/FirmSearch/GetFirms"
    params = {
        "pageNumber": 1,
        "pageSize": 500,
        "sortBy": "FirmName",
        "sortOrder": "asc",
        "country": "",
        "status": status,
    }
    all_firms: list[dict] = []
    while True:
        resp = requests.get(url, params=params, headers=HEADERS, timeout=30)
        resp.raise_for_status()
        data = resp.json()
        firms = data.get("firms", [])
        if not firms:
            break
        for f in firms:
            # Guard against null fields in the JSON payload.
            name = (f.get("firmName") or "").strip()
            all_firms.append({
                "firm_id": f.get("firmId"),
                "firm_name": name,
                "country": f.get("country") or "",
                "status": f.get("registrationStatus") or "",
                "issuer_clients": f.get("numberOfIssuers") or 0,
                "is_big_four": any(b in name for b in BIG_FOUR),
            })
        # A short page is the last page; otherwise advance and pause politely.
        if len(firms) < params["pageSize"]:
            break
        params["pageNumber"] += 1
        time.sleep(0.5)
    return all_firms
# ── 2. Fetch inspection report index ─────────────────────────────────────────
def fetch_inspection_index(year: int) -> list[dict]:
    """
    Retrieve the list of published inspection reports for a given year
    from the PCAOB inspections index page.
    Returns a list of dicts with firm_name, report_url, year, and inspection_type.
    """
    url = BASE + f"/oversight/inspections/firm-inspection-reports?year={year}"
    resp = requests.get(url, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    reports: list[dict] = []
    seen: set[str] = set()
    # PCAOB inspection report links are anchored with firm name and report PDF/HTML
    for link in soup.find_all("a", href=True):
        href = link["href"]
        if "/Inspections/Reports/" in href or "/inspection-reports/" in href.lower():
            report_url = BASE + href if href.startswith("/") else href
            if report_url in seen:  # skip duplicate links to the same report
                continue
            seen.add(report_url)
            reports.append({
                "firm_name": link.get_text(strip=True),
                "report_url": report_url,
                "year": year,
                # Annual vs. triennial inspection depends on the firm's issuer
                # count (more than 100 issuers means annual inspection), not on
                # the year, so it cannot be inferred from the index page alone.
                "inspection_type": None,
            })
    time.sleep(0.75)
    return reports
# ── 3. Parse deficiency rate from a single inspection report ─────────────────
def parse_deficiency_rate(report_url: str) -> dict:
    """
    Fetch an individual PCAOB inspection report page and extract:
      - Number of audits selected for review
      - Number of audits with Part I (or Part I.A) deficiencies
      - Computed deficiency rate
      - Audit areas mentioned in the report text
    Returns a dict with keys: selected, deficient, rate, audit_areas.
    Falls back to None values if the report is a PDF or parsing fails.
    """
    empty = {"selected": None, "deficient": None, "rate": None, "audit_areas": []}
    try:
        resp = requests.get(report_url, headers=HEADERS, timeout=45)
        resp.raise_for_status()
        # Many reports are published as PDFs, which this HTML parser cannot read.
        if "pdf" in resp.headers.get("Content-Type", "").lower():
            return empty
        # Extract visible text so patterns match report language, not markup.
        text = BeautifulSoup(resp.text, "html.parser").get_text(" ").lower()
        # Unit noun used in both patterns: "audit engagements", "engagements", "audits"
        unit = r"(?:audit\s+engagements?|engagements?|audits?)"
        # Pattern 1: modern structured reports (2017+)
        # "X of the Y engagements ... deficiency"
        m = re.search(
            r"(\d+)\s+of\s+(?:the\s+)?(\d+)\s+" + unit + r"\s+"
            r"(?:reviewed\s+)?(?:inspected\s+)?(?:had\s+)?(?:included\s+)?deficien",
            text,
        )
        if m:
            deficient = int(m.group(1))
            selected = int(m.group(2))
        else:
            # Pattern 2: pre-2017 narrative style
            # "inspectors reviewed X audits" + "Y audits in which..."
            sel_m = re.search(r"reviewed\s+(\d+)\s+" + unit, text)
            def_m = re.search(
                r"(\d+)\s+(?:of\s+(?:the\s+)?\d+\s+)?" + unit + r".*?deficien",
                text,
            )
            selected = int(sel_m.group(1)) if sel_m else None
            deficient = int(def_m.group(1)) if def_m else None
        rate = (deficient / selected) if (selected and deficient is not None) else None
        # Identify audit areas referenced anywhere in the report text
        found_areas = [
            area for area, pattern in AUDIT_AREA_PATTERNS.items()
            if re.search(pattern, text, re.IGNORECASE)
        ]
        return {
            "selected": selected,
            "deficient": deficient,
            "rate": rate,
            "audit_areas": found_areas,
        }
    except Exception:
        # Network, HTTP, or parsing errors: record the report as unparsed.
        return empty
    finally:
        time.sleep(0.5)
# ── 4. Build multi-year deficiency rate time series ──────────────────────────
def build_deficiency_trend(
    start_year: int = 2004,
    end_year: int = 2024,
    big_four_only: bool = False,
    reports_cache: list[dict] | None = None,
) -> dict[str, dict[int, float | None]]:
    """
    For each year in [start_year, end_year], fetch the inspection index,
    parse deficiency rates for each firm, and return a nested dict:
        { firm_name: { year: deficiency_rate } }
    If big_four_only=True, restrict to the Big Four.
    If reports_cache is a list, each parsed report (including its
    'audit_areas' key) is appended to it for later frequency analysis.
    """
    trend: dict[str, dict[int, float | None]] = defaultdict(dict)
    for year in range(start_year, end_year + 1):
        print(f"  Fetching inspection index for {year}...")
        reports = fetch_inspection_index(year)
        for report in reports:
            if big_four_only and not any(b in report["firm_name"] for b in BIG_FOUR):
                continue
            result = parse_deficiency_rate(report["report_url"])
            if reports_cache is not None:
                reports_cache.append({**report, **result})
            trend[report["firm_name"]][year] = result["rate"]
            if result["rate"] is not None:
                print(
                    f"    {report['firm_name'][:40]:<40} {year} "
                    f"{result['deficient']}/{result['selected']} "
                    f"= {result['rate']:.1%}"
                )
    return dict(trend)
# ── 5. Audit area deficiency frequency analysis ───────────────────────────────
def audit_area_frequency(reports_cache: list[dict]) -> dict[str, int]:
    """
    Given a list of parsed report dicts (each with an 'audit_areas' key),
    count how many reports mention each audit area.
    Returns a dict of { audit_area: mention_count } sorted descending.
    """
    area_counts: dict[str, int] = defaultdict(int)
    for report in reports_cache:
        for area in report.get("audit_areas", []):
            area_counts[area] += 1
    return dict(sorted(area_counts.items(), key=lambda x: x[1], reverse=True))
# ── 6. Summary statistics ─────────────────────────────────────────────────────
def summarize_trend(
    trend: dict[str, dict[int, float | None]],
) -> None:
    """
    Print a year-by-year summary table comparing Big Four vs. non-Big Four
    average Part I deficiency rates, followed by a firm-level ranking by
    average rate across all years.
    """
    all_years = sorted(
        {yr for firm_data in trend.values() for yr in firm_data}
    )
    print()
    print("=== Big Four Part I Deficiency Rate Trend ===")
    print(f"{'Year':<6}", end="")
    for b in sorted(BIG_FOUR):
        # Short column labels: Deloitte, Ernst, KPMG LLP, Price
        short = b.split("&")[0].split("water")[0].strip()[:10]
        print(f" {short:>11}", end="")
    print(f" {'BF Avg':>8} {'Non-BF Avg':>10}")
    print("-" * 74)
    for yr in all_years:
        bf_rates: list[float] = []
        non_bf_rates: list[float] = []
        row = f"{yr:<6}"
        for b in sorted(BIG_FOUR):
            matched = [firm_data for fn, firm_data in trend.items() if b in fn]
            rate = matched[0].get(yr) if matched else None
            if rate is not None:
                bf_rates.append(rate)
                row += f" {rate:>11.1%}"
            else:
                row += f" {'N/A':>11}"
        for fn, firm_data in trend.items():
            if not any(b in fn for b in BIG_FOUR):
                r = firm_data.get(yr)
                if r is not None:
                    non_bf_rates.append(r)
        bf_avg = f"{statistics.mean(bf_rates):.1%}" if bf_rates else "N/A"
        non_bf_avg = f"{statistics.mean(non_bf_rates):.1%}" if non_bf_rates else "N/A"
        row += f" {bf_avg:>8} {non_bf_avg:>10}"
        print(row)
    print()
    print("=== Firms with Highest Average Deficiency Rate (all years) ===")
    avg_by_firm = []
    for fn, yr_data in trend.items():
        rates = [r for r in yr_data.values() if r is not None]
        if rates:
            avg_by_firm.append((fn, statistics.mean(rates), len(rates)))
    avg_by_firm.sort(key=lambda x: x[1], reverse=True)
    print(f"  {'Firm':<45} {'Avg Rate':>10} {'Years':>6}")
    print("  " + "-" * 63)
    for fn, avg, n in avg_by_firm[:15]:
        marker = " [Big 4]" if any(b in fn for b in BIG_FOUR) else ""
        print(f"  {fn[:45]:<45} {avg:>10.1%} {n:>6}{marker}")
# ── Main ──────────────────────────────────────────────────────────────────────
if __name__ == "__main__":
    # Step 1: enumerate registered firms
    print("Fetching PCAOB registered firm list...")
    firms = fetch_registered_firms()
    print(f"  Total active registered firms: {len(firms):,}")
    big_four_firms = [f for f in firms if f["is_big_four"]]
    print(f"  Big Four firms matched: {len(big_four_firms)}")
    international = [f for f in firms if f["country"] not in ("", "United States")]
    print(f"  International (non-US) firms: {len(international):,}")
    print()
    # Step 2: build the 2004-2024 deficiency rate trend for the Big Four
    # (Set big_four_only=False to include every firm with a published report;
    # at the polite crawl rate the full run takes several hours.)
    print("Building Big Four deficiency rate trend 2004-2024...")
    parsed_reports: list[dict] = []
    trend = build_deficiency_trend(
        start_year=2004,
        end_year=2024,
        big_four_only=True,
        reports_cache=parsed_reports,
    )
    # Step 3: print summary
    summarize_trend(trend)
    # Step 4: audit areas mentioned most often across the parsed reports
    print()
    print("=== Audit Areas Mentioned in Parsed Reports ===")
    for area, count in audit_area_frequency(parsed_reports).items():
        print(f"  {area:<25} {count:>5}")
    # Step 5: export to CSV for further analysis
    output_rows = []
    for firm_name, yr_data in trend.items():
        for year, rate in yr_data.items():
            output_rows.append({
                "firm": firm_name,
                "year": year,
                "deficiency_rate": f"{rate:.4f}" if rate is not None else "",
                "is_big_four": "1" if any(b in firm_name for b in BIG_FOUR) else "0",
            })
    fieldnames = ["firm", "year", "deficiency_rate", "is_big_four"]
    buf = StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(output_rows)
    with open("pcaob_deficiency_trend.csv", "w", newline="", encoding="utf-8") as fh:
        fh.write(buf.getvalue())
    print()
    print(f"Exported {len(output_rows)} rows to pcaob_deficiency_trend.csv")
The script produces a CSV of Big Four Part I deficiency rates for inspection years 2004–2024, which can be joined against EDGAR financial statement data, restatement databases, or audit committee disclosure proxies for empirical research. Firm-years whose reports could not be parsed are exported with a blank rate. Extending the scope to non-Big Four firms by dropping the big_four_only filter adds approximately 200–400 firms per inspection cycle; at a polite crawl rate the full run takes several hours. The audit area frequency counts derived from the AUDIT_AREA_PATTERNS dictionary are a text-mining approximation of the deficiency topic distribution. For more granular analysis, the PCAOB-published structured deficiency data files with coded area identifiers provide higher-quality categorization than regex matching on report narrative.
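One concrete reason regex matching is only an approximation is that short keywords match inside unrelated words. The minimal sketch below illustrates this with three invented sample sentences (hypothetical, not quoted from any PCAOB report). A bare substring pattern such as `lease` also fires on "release" and "please", while a word-bounded pattern matches only the whole word. The word-boundary pattern and the `tag` helper are illustrative choices, not a PCAOB convention.

```python
from __future__ import annotations

import re

# Hypothetical sentences invented for illustration; not quoted from PCAOB reports.
SAMPLES = [
    "The firm did not sufficiently test the lease liability recorded at adoption.",
    "See the press release announcing the inspection results.",
    "Please direct questions about this report to the Board.",
]

# Substring pattern: matches "lease" anywhere, including inside other words.
NAIVE = re.compile(r"lease", re.IGNORECASE)

# Word-bounded pattern: matches lease, leases, or leasing as whole words only.
BOUNDED = re.compile(r"\bleas(?:e|es|ing)\b", re.IGNORECASE)


def tag(pattern: re.Pattern[str], sentences: list[str]) -> list[bool]:
    """Return True for each sentence in which the pattern matches."""
    return [pattern.search(s) is not None for s in sentences]


naive_hits = tag(NAIVE, SAMPLES)
bounded_hits = tag(BOUNDED, SAMPLES)
print("substring:   ", naive_hits)    # [True, True, True]
print("word-bounded:", bounded_hits)  # [True, False, False]
```

The same substitution applies to any short keyword in a pattern dictionary. Multi-word phrases such as "going concern" are much less exposed to accidental matches inside other words.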
For SEC data on capital raised outside public company disclosure, specifically how Form D private placement filings in the EDGAR database document the $2–$3 trillion annual exempt offering market that operates outside the continuous disclosure requirements PCAOB-audited public companies face, see SEC Form D: The Private Placement Database Behind $2 Trillion in Annual Exempt Offerings.
For a parallel federal oversight database in banking, covering the FDIC institution profiles, charter classifications, and supervisory rating data that play the same role for depository institutions that PCAOB inspection reports play for public company auditors, see FDIC Institution Database: The Federal Profile of Every FDIC-Insured Bank and Thrift.