PCAOB: The Federal Audit Watchdog Created After Enron and the KPMG Inspection-Data Scandal
Every auditor that signs off on a US public company's financial statements is registered with a federal oversight body that publishes inspection findings for every firm it examines. That body—the Public Company Accounting Oversight Board—maintains one of the most granular public datasets on professional conduct in any regulated industry. In the best year for the Big Four, roughly one in five audits reviewed was flagged with a deficiency. In worse years the rate exceeded one in three.
Origins: Enron, Arthur Andersen, and Sarbanes-Oxley
Before 2002, public accounting was self-regulated. The American Institute of Certified Public Accountants set auditing standards, and peer-review programs run by the profession itself were supposed to catch quality failures. The collapse of Enron in late 2001—and the simultaneous implosion of its auditor, Arthur Andersen—demonstrated that self-regulation had catastrophic failure modes.
Arthur Andersen earned roughly $25 million annually in audit fees from Enron and another $27 million in consulting fees from the same client in the year before Enron's bankruptcy. That dual relationship—auditor and paid consultant to the same company—created conflicts that Andersen's own quality-control procedures failed to constrain. When federal investigators sought Andersen's Enron work papers, employees shredded documents. The firm was convicted of obstruction of justice in 2002 (a conviction later unanimously reversed by the Supreme Court in 2005, after Andersen had already ceased operations). WorldCom collapsed later in 2002, revealing an $11 billion accounting fraud that had also passed through an auditor's review without a qualified opinion.
Congress passed the Sarbanes-Oxley Act in July 2002. Section 101 created the PCAOB as an independent, nonprofit corporation overseen by the Securities and Exchange Commission. The SEC appoints its five-member board, approves its budget, and can review or modify its rules. By statute the PCAOB is a private nonprofit rather than a government agency, but in Free Enterprise Fund v. PCAOB (2010) the Supreme Court treated it as part of the government for constitutional purposes. The Court struck down the for-cause limits on the SEC's power to remove board members and otherwise left the Board and its authority intact. SOX also banned audit firms from providing certain non-audit services (bookkeeping, financial information system design, appraisal, and others) to companies whose audits they conduct.
Four core functions
The PCAOB exercises authority through four distinct programs, each of which generates public data.
Registration. Any public accounting firm that audits the financial statements of an SEC registrant or plays a substantial role in such an audit must register with the PCAOB. As of 2025, more than 10,000 firms are registered, of which roughly 1,800 are headquartered in more than 50 countries outside the United States. Registered-firm status is a prerequisite for signing an audit opinion on a public company's financials. The registration database—searchable at pcaobus.org—includes the firm name, country, registration date, and current status (active, withdrawn, or revoked). It is updated in near-real time when firms register, withdraw, or have their registration revoked through enforcement.
Standard-setting. SOX authorized the PCAOB to set auditing standards for public company audits, replacing the Generally Accepted Auditing Standards that the AICPA had previously controlled. PCAOB standards are numbered (AS 1101 through AS 6101 in the current codification) and cover everything from audit risk assessment and internal control evaluation to audit documentation and engagement quality review. A critical 2019 change was AS 3101, which requires auditors to identify and describe Critical Audit Matters—issues that were most difficult, subjective, or complex to audit—directly in the public audit report. CAMs are now a primary source of structured qualitative disclosure in the 10-K ecosystem.
Inspection. The PCAOB's inspection program is the mechanism that generates its most-used public data. Firms that audit more than 100 SEC-registered issuers per year are inspected annually; all other registered firms face triennial inspections. The inspectors select a sample of audit engagements, review the work papers, and evaluate whether the auditor gathered sufficient appropriate evidence before issuing its opinion. Findings are classified into Part I.A (deficiencies in auditing, meaning the opinion was not supported by adequate evidence) and Part I.B (less severe quality-related observations). Part I of each inspection report is public immediately. Part II, which covers deficiencies in a firm's quality control system, is initially withheld to allow remediation; if the firm does not remediate those deficiencies to the PCAOB's satisfaction within 12 months of the report date, Part II is made public.
Enforcement. When inspections or other sources reveal violations of PCAOB rules, auditing standards, or the securities laws, the PCAOB can bring disciplinary proceedings. Sanctions include monetary penalties, suspensions from auditing public companies, permanent bars from the profession, and revocation of firm registration. Enforcement orders are published on pcaobus.org and searchable by respondent name, sanction type, and date.
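The cadence rule in the Inspection program reduces to one threshold comparison. A minimal sketch, assuming a firm's yearly issuer count is already known; the function and constant names are illustrative and do not come from any PCAOB system.

```python
# Threshold taken from the rule described above: more than 100 issuers per year.
ANNUAL_INSPECTION_THRESHOLD = 100


def inspection_cadence(issuer_audit_count: int) -> str:
    """Return 'annual' for firms auditing more than 100 issuers, else 'triennial'."""
    if issuer_audit_count < 0:
        raise ValueError("issuer_audit_count must be non-negative")
    if issuer_audit_count > ANNUAL_INSPECTION_THRESHOLD:
        return "annual"
    return "triennial"


# Hypothetical issuer counts, for illustration only.
for count in (12, 100, 101):
    print(count, inspection_cadence(count))
```

Note that a firm auditing exactly 100 issuers falls on the triennial side, because the rule is stated as "more than 100".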
Big Four inspection deficiency rates in practice
Deloitte, PricewaterhouseCoopers, Ernst & Young, and KPMG are each inspected annually. The PCAOB typically samples 50 to 60 audit engagements from each firm per inspection cycle, and sometimes more. The percentage of those sampled engagements that have at least one Part I.A deficiency, known as the deficiency rate, is the headline metric researchers use to compare audit quality across firms and over time.
Deficiency rates for the Big Four have ranged from the low teens to more than 40 percent in individual inspection years. A deficiency does not mean the audited financial statements are wrong; it means the auditor failed to obtain sufficient evidence to support the opinion it issued. The distinction matters: the PCAOB is evaluating the process, not retroactively opining on the accuracy of the client's numbers. But process failures at high rates suggest that audit opinions are being issued on less evidence than standards require—and that material misstatements could be present without detection.
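As a worked example of the headline metric, the sketch below computes a deficiency rate from two counts. The counts are invented for illustration and are not taken from any inspection report; the function name is likewise hypothetical.

```python
def deficiency_rate(part_ia_deficient: int, engagements_reviewed: int) -> float:
    """Share of reviewed engagements with at least one Part I.A deficiency."""
    if engagements_reviewed <= 0:
        raise ValueError("engagements_reviewed must be positive")
    if not 0 <= part_ia_deficient <= engagements_reviewed:
        raise ValueError("part_ia_deficient must be between 0 and engagements_reviewed")
    return part_ia_deficient / engagements_reviewed


# Invented counts: 14 of 56 sampled engagements with a Part I.A deficiency.
rate = deficiency_rate(14, 56)
print(f"{rate:.1%}")  # 25.0%
```

The numerator counts engagements, not individual findings: an engagement with three Part I.A deficiencies still contributes one to the rate.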
Common deficiency categories across Big Four reports include insufficient testing of revenue recognition (especially complex variable consideration and multi-element arrangements), inadequate evaluation of management estimates (particularly goodwill impairment, credit loss reserves, and fair value measurements), and failure to sufficiently test the existence and valuation of inventory and receivables.
The distinction between “significant deficiency” and “material weakness” in internal control over financial reporting is a separate concept that appears in management's and the auditor's own SOX Section 404 assessment, not in PCAOB inspection terminology. PCAOB inspection reports use their own classification (Part I.A, Part I.B) rather than the SOX 404 vocabulary, though the underlying audit procedures being evaluated overlap substantially.
The KPMG inspection-data theft scandal
The most consequential scandal in PCAOB history did not begin with an audit client. It centered on KPMG itself, which used stolen PCAOB data to game its own inspections.
Beginning in 2015, a network of current and former PCAOB employees and KPMG partners shared confidential PCAOB inspection lists with KPMG leadership. The lists identified which audit engagements the PCAOB planned to inspect before inspectors arrived at the firm. KPMG partners used that advance warning to revisit the flagged files, remediate documentation deficiencies, and prepare audit teams for the specific questions inspectors were likely to raise—in effect, staging audits to pass inspections rather than conducting audits sufficiently in the first instance.
The Department of Justice unsealed criminal charges in January 2018. Multiple KPMG partners and former PCAOB employees were convicted of or pleaded guilty to charges including wire fraud and conspiracy. KPMG itself settled with the SEC in June 2019, agreeing to pay a $50 million penalty that also covered cheating by its personnel on internal training exams. The scandal prompted the PCAOB to overhaul its internal data security practices and accelerated discussion of how its inspection selection methodology could be randomized further to remove the informational advantage that advance knowledge of a list would provide.
The HFCAA crisis and Chinese auditor access
Chinese state secrecy law created a structural collision with PCAOB jurisdiction that took more than a decade to partially resolve.
More than 200 Chinese companies were listed on US exchanges—Alibaba, NIO, JD.com, Pinduoduo, and many smaller firms—audited by Chinese affiliates of the Big Four or by Chinese domestic firms. The PCAOB attempted for years to inspect the work papers underlying those audits and was consistently refused. Chinese regulators took the position that audit work papers relating to Chinese companies could contain state secrets and could not be transferred to a foreign oversight body.
The Holding Foreign Companies Accountable Act, enacted in December 2020, gave the standoff legislative teeth. HFCAA directed the SEC to identify companies whose auditors had not been subject to PCAOB inspection for three consecutive years and to prohibit trading in their securities on US exchanges if that condition was not remedied. The SEC began identifying non-inspection companies in 2022. For US-listed Chinese companies, the three-year clock would have expired in 2023, threatening mass delistings affecting hundreds of billions of dollars in market capitalization.
In August 2022, the PCAOB reached a Statement of Protocol with the China Securities Regulatory Commission and the Ministry of Finance establishing a framework for PCAOB inspectors to travel to Hong Kong and review audit work papers of Chinese firms serving US-listed issuers. In December 2022, the PCAOB announced that it had completed inspections, for the first time ever, of KPMG Huazhen LLP in mainland China and PricewaterhouseCoopers in Hong Kong, member firms of two Big Four networks. The PCAOB determined that it had been able to conduct complete inspections satisfying HFCAA requirements, averting the 2023 delistings. The 2022 inspection reports documented significant deficiencies, but the access question was resolved enough to remove the delisting threat for that cycle. Whether access will remain sufficient in future years is an ongoing policy issue.
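The HFCAA test described above is a streak count: a company is exposed once its auditor has gone uninspected for three years in a row. The sketch below is one way to express that logic. The function name, the input shape, and the example years are hypothetical, and the threshold is left as a parameter rather than hard-coded.

```python
from typing import Iterable, Optional


def year_threshold_reached(non_inspection_years: Iterable[int],
                           threshold: int = 3) -> Optional[int]:
    """Return the year that completes `threshold` consecutive
    non-inspection years, or None if no such streak exists."""
    streak = 0
    previous = None
    for year in sorted(set(non_inspection_years)):
        # Extend the streak only when this year directly follows the last one.
        if previous is not None and year == previous + 1:
            streak += 1
        else:
            streak = 1
        if streak >= threshold:
            return year
        previous = year
    return None


# Hypothetical identification years for two companies.
print(year_threshold_reached({2021, 2022, 2023}))  # 2023
print(year_threshold_reached({2021, 2023, 2024}))  # None
```

A single inspected year resets the count, which is why completing an inspection cycle removes the immediate delisting threat without settling the question for later years.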
Public data products
The PCAOB publishes four categories of data at pcaobus.org. None are available via a bulk API; all require either manual download or scraping.
Inspection reports are published as PDFs indexed by firm name. Each report has a Part I section that identifies the number of engagements reviewed and the number with at least one Part I.A or Part I.B finding. For large firms, Part I runs to dozens of pages describing each deficiency area in narrative form without identifying the specific audit client. Researchers must parse the PDFs or the landing-page HTML to extract deficiency counts. Part II is published separately when remediation fails, and contains the PCAOB's criticisms of firm-wide quality control systems.
Enforcement actions are published as order documents with a searchable index. Each order identifies the respondent (firm name or individual CPA), the PCAOB rule or auditing standard violated, and the sanction imposed. The Deloitte Brazil penalty ($8 million, 2016) arose from PCAOB inspection findings that Deloitte's Brazilian member firm had issued opinions on audits of US-listed Brazilian companies without adequate evidence. Individual CPAs have been barred permanently from auditing public companies for failures ranging from fabricated work papers to failures to disclose impairment of independence.
The registration database lists all registered firms with their country, status, and registration date. It is the canonical source for determining whether a given firm is currently authorized to sign opinions on public company financials. Researchers use it to cross-reference the auditor field in SEC filings—particularly the Form 8-K Item 4.01 disclosure that public companies must file within four business days of changing auditors—with current PCAOB registration status.
Staff guidance and standards are policy documents rather than data, but they determine what the inspection data means. The PCAOB's annual inspection briefs and staff guidance on specific topics explain what deficiency categories mean in practice and how inspection methodology has evolved. Reading the current year's inspection brief alongside the inspection reports is necessary context for interpreting deficiency rate changes from year to year.
Critical Audit Matters and audit report transparency
AS 3101, effective for large accelerated filers beginning with fiscal years ending on or after June 30, 2019, requires auditors to identify Critical Audit Matters in the audit report and explain why each matter was considered critical and how it was addressed. CAMs are not deficiency findings—they are disclosures of where audit judgment was most exercised.
Common CAMs in practice include goodwill impairment testing (for companies with significant acquisition history), revenue recognition under complex arrangements, income tax valuation allowances, and litigation contingencies. The structured text of CAM disclosures, available in 10-K filings on EDGAR, has become a useful NLP training source for models that need to identify company-specific audit risk areas. Because CAMs must be written in a consistent format that describes the risk, the auditing procedures applied, and key observations, they are among the more structured long-form disclosures in the 10-K corpus.
How researchers use PCAOB data
Academic accounting research has used PCAOB inspection deficiency rates as an auditor quality proxy since the first inspection reports were published in 2004. The primary research design is to merge the PCAOB inspection data—firm name, inspection year, deficiency rate—with the auditor field in Compustat or Audit Analytics, which links each public company's fiscal year to its auditor. This produces a firm-year panel where each observation has a measure of auditor quality. Studies have examined whether companies audited by firms with higher deficiency rates have lower earnings quality, higher restatement probability, and larger absolute discretionary accruals.
Practitioners use the same linkage differently. A due diligence analyst reviewing a company's audit history might check whether the auditor had elevated deficiency rates in the years those audits were conducted. Proxy advisory firms have cited PCAOB inspection findings when evaluating audit committee effectiveness. The SEC's own enforcement staff has used PCAOB inspection findings as a predicate for opening investigations into whether specific audit deficiencies corresponded to material misstatements in the underlying financial statements.
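The firm-year linkage described above is a left join of company-year records onto inspection results keyed by auditor and year. A minimal standard-library sketch follows. All names and values are invented, and it assumes the inspection year maps directly to the fiscal year, which a real study would need to justify or adjust with a lag.

```python
# Hypothetical inspection results keyed by (auditor, inspection_year).
inspections = {
    ("Auditor A", 2021): 0.25,
    ("Auditor B", 2021): 0.10,
}

# Hypothetical company-year records carrying an auditor field.
company_years = [
    {"company": "Co1", "fiscal_year": 2021, "auditor": "Auditor A"},
    {"company": "Co2", "fiscal_year": 2021, "auditor": "Auditor B"},
    {"company": "Co3", "fiscal_year": 2021, "auditor": "Auditor C"},  # no match
]


def build_panel(company_years: list[dict], inspections: dict) -> list[dict]:
    """Left-join each company-year to its auditor's deficiency rate.

    Unmatched rows are kept with a rate of None so that coverage gaps
    stay visible instead of being dropped silently.
    """
    panel = []
    for rec in company_years:
        key = (rec["auditor"], rec["fiscal_year"])
        panel.append({**rec, "auditor_deficiency_rate": inspections.get(key)})
    return panel


panel = build_panel(company_years, inspections)
for row in panel:
    print(row["company"], row["auditor_deficiency_rate"])
```

Keeping unmatched rows matters in practice: smaller auditors on a triennial cycle have no inspection result in most years, and dropping those rows would bias the panel toward large-firm clients.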
The HFCAA inspection data from 2022 onward adds a China-specific research question: do the deficiency rates for Chinese audit firms, now that inspection is possible, differ systematically from those of domestic US firms? Early data suggested they did—significantly—raising questions about whether the financial statements of US-listed Chinese companies had been adequately assured in prior years when inspection was unavailable.
Enforcement actions link naturally to DOJ False Claims Act settlements when audit failures involve government contractors. If a company overstated revenue on government contracts and its auditor failed to catch the fraud, both the company (under the False Claims Act) and the auditor (through PCAOB enforcement) may face separate proceedings. Connecting the two datasets requires entity resolution on firm names, since the PCAOB uses full legal firm names that may differ from the abbreviated names that appear in DOJ press releases.
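Entity resolution on firm names usually starts with a normalization pass before any fuzzy matching. The sketch below shows one simple approach; the suffix list and function name are assumptions, and it does not resolve brand abbreviations such as "PwC", which need an explicit alias table.

```python
import re

# Legal-form suffixes to strip from the end of a name (illustrative, not exhaustive).
_LEGAL_SUFFIXES = {"llp", "llc", "lp", "pllc", "inc", "ltd", "limited"}


def normalize_firm_name(name: str) -> str:
    """Lowercase, spell out '&', drop punctuation and trailing legal suffixes."""
    text = name.lower().replace("&", " and ")
    tokens = re.sub(r"[^a-z0-9\s]", " ", text).split()
    while tokens and tokens[-1] in _LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


print(normalize_firm_name("BDO USA, LLP"))       # bdo usa
print(normalize_firm_name("Ernst & Young LLP"))  # ernst and young
```

Two records can then be treated as candidate matches when their normalized names are equal, with anything left over going to manual review or a fuzzy matcher.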
Python: scraping five years of Big Four inspection deficiencies
The following script fetches the PCAOB registered-firm list, identifies inspection reports issued in the past five years for named large firms, and extracts Part I.A and Part I.B deficiency counts from each report landing page. The PCAOB does not document a public API for this data, so the script relies on the site's own search endpoint and HTML tables. Treat the endpoint path, JSON field names, and CSS selectors as assumptions to verify against the live site, since they can change without notice. Adapt the BIG_FIRM_NAMES set to include or exclude firms as needed, or remove the filter entirely to process all registered firms.
```python
import csv
import re
import time
from datetime import datetime, timedelta
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

BASE = "https://pcaobus.org"
HEADERS = {"User-Agent": "research-bot/1.0 (academic; contact: research@example.com)"}


# ── 1. Fetch the list of registered firms ─────────────────────────────────────

def fetch_registered_firms() -> list[dict]:
    """
    Pull the PCAOB registered-firm list from the public registration database.
    Returns a list of dicts with keys: firm_id, firm_name, country, status.
    """
    # Assumed endpoint and JSON field names; confirm against the live site.
    url = BASE + "/api/FirmSearch/GetFirms"
    params = {
        "pageNumber": 1,
        "pageSize": 500,
        "sortBy": "FirmName",
        "sortOrder": "asc",
        "country": "",
        "status": "Active",
    }
    all_firms = []
    while True:
        resp = requests.get(url, params=params, headers=HEADERS, timeout=30)
        resp.raise_for_status()
        data = resp.json()
        firms = data.get("firms", [])
        if not firms:
            break  # an empty page always ends pagination
        for f in firms:
            all_firms.append({
                "firm_id": f.get("firmId"),
                "firm_name": f.get("firmName"),
                "country": f.get("countryName"),
                "status": f.get("registrationStatus"),
            })
        # Stop early once the reported total is reached. If the total is
        # missing, keep paging until an empty page comes back.
        total = data.get("totalCount")
        if total is not None and len(all_firms) >= total:
            break
        params["pageNumber"] += 1
        time.sleep(0.5)
    return all_firms


# ── 2. Fetch inspection report index for a firm ───────────────────────────────

def fetch_firm_inspection_reports(firm_id: int) -> list[dict]:
    """
    Returns inspection report metadata for a single firm: report date,
    report URL, and inspection period, read from the firm's report table.
    """
    url = f"{BASE}/Inspections/Reports/Details/{firm_id}"
    resp = requests.get(url, headers=HEADERS, timeout=30)
    if resp.status_code == 404:
        return []  # firm has no published inspection reports
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    reports = []
    # Assumed table markup; adjust the selector to the page's actual structure.
    for row in soup.select("table.inspection-table tbody tr"):
        cells = row.find_all("td")
        if len(cells) < 3:
            continue
        report_link = cells[1].find("a", href=True)
        reports.append({
            "firm_id": firm_id,
            "report_date": cells[0].get_text(strip=True),
            # urljoin handles both relative and absolute hrefs.
            "report_url": urljoin(BASE, report_link["href"]) if report_link else "",
            "period": cells[2].get_text(strip=True),
        })
    return reports


# ── 3. Parse Part I deficiency counts from a report landing page ──────────────

# [^.]*? keeps each match inside one sentence, so a count from the Part I.A
# sentence cannot be paired with a later mention of Part I.B.
PART_IA_RE = re.compile(
    r"\b(\d+)\s+of\s+(\d+)\s+engagements?\s+reviewed[^.]*?Part\s+I\.A",
    re.IGNORECASE,
)
PART_IB_RE = re.compile(
    r"\b(\d+)\s+(?:of\s+\d+\s+)?engagements?[^.]*?Part\s+I\.B",
    re.IGNORECASE,
)


def parse_part_i_deficiencies(report_url: str) -> dict:
    """
    Fetch an inspection report landing page, extract the Part I.A and Part I.B
    deficiency totals from the summary paragraph. Returns a dict with:
    engagements_reviewed, part_ia_deficient, part_ib_deficient, deficiency_rate.
    On a request failure, returns {"_error": message} instead.
    """
    if not report_url:
        return {}
    try:
        resp = requests.get(report_url, headers=HEADERS, timeout=30)
        resp.raise_for_status()
    except requests.RequestException as exc:
        return {"_error": str(exc)}
    finally:
        time.sleep(0.75)  # be polite to PCAOB servers

    body = BeautifulSoup(resp.text, "html.parser").get_text(" ", strip=True)
    # Pattern: "X of Y engagements reviewed had at least one Part I.A finding"
    ia_match = PART_IA_RE.search(body)
    ib_match = PART_IB_RE.search(body)
    deficient_ia = int(ia_match.group(1)) if ia_match else None
    reviewed = int(ia_match.group(2)) if ia_match else None
    deficient_ib = int(ib_match.group(1)) if ib_match else None
    rate = round(deficient_ia / reviewed, 4) if (deficient_ia is not None and reviewed) else None
    return {
        "engagements_reviewed": reviewed,
        "part_ia_deficient": deficient_ia,
        "part_ib_deficient": deficient_ib,
        "deficiency_rate": rate,
    }


# ── 4. Build a 5-year CSV of inspection deficiency rates ──────────────────────

BIG_FIRM_NAMES = {
    "Deloitte & Touche LLP",
    "PricewaterhouseCoopers LLP",
    "Ernst & Young LLP",
    "KPMG LLP",
    "Grant Thornton LLP",
    "BDO USA, LLP",
}
CUTOFF = datetime.now() - timedelta(days=5 * 365)


def build_deficiency_dataset(out_path: str) -> None:
    firms = fetch_registered_firms()
    rows = []
    for firm in firms:
        if firm["firm_name"] not in BIG_FIRM_NAMES:
            continue  # limit to named firms for this example; remove to run all
        for rpt in fetch_firm_inspection_reports(firm["firm_id"]):
            # Parse the report date; skip unparseable dates and old reports.
            try:
                rpt_dt = datetime.strptime(rpt["report_date"], "%B %d, %Y")
            except ValueError:
                continue
            if rpt_dt < CUTOFF:
                continue
            deficiency_data = parse_part_i_deficiencies(rpt["report_url"])
            rows.append({**firm, **rpt, **deficiency_data})

    fieldnames = [
        "firm_id", "firm_name", "country", "status",
        "report_date", "period", "report_url",
        "engagements_reviewed", "part_ia_deficient", "part_ib_deficient",
        "deficiency_rate",
    ]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" drops the "_error" key from failed fetches.
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    print(f"Wrote {len(rows)} inspection report rows to {out_path}")


if __name__ == "__main__":
    build_deficiency_dataset("pcaob_inspection_deficiencies_5yr.csv")
```
The script's parse_part_i_deficiencies function uses regular expressions against the landing-page prose because the PCAOB has not published deficiency counts in a machine-readable structured format. The regex patterns will need adjustment if the PCAOB changes its report wording. A more robust production approach would download the full PDF and use a PDF parsing library to locate the summary table in Part I, which has been structurally consistent across inspection years.
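Sentence-scoped patterns are one way to make this kind of prose matching less brittle. The sketch below runs two such patterns against an invented summary sentence; the sample wording is an assumption, and actual PCAOB phrasing may differ, so the patterns should be checked against real pages before use.

```python
import re

# Invented sample text; real landing-page wording may differ.
SAMPLE = (
    "12 of 54 engagements reviewed had at least one Part I.A deficiency. "
    "In addition, 7 engagements included a Part I.B observation."
)

# [^.]*? cannot cross a period, so each match stays inside one sentence
# and the Part I.A counts cannot leak into the Part I.B match.
PART_IA_RE = re.compile(
    r"\b(\d+)\s+of\s+(\d+)\s+engagements?\s+reviewed[^.]*?Part\s+I\.A",
    re.IGNORECASE,
)
PART_IB_RE = re.compile(
    r"\b(\d+)\s+(?:of\s+\d+\s+)?engagements?[^.]*?Part\s+I\.B",
    re.IGNORECASE,
)


def extract_counts(text: str) -> dict:
    """Return reviewed, Part I.A, and Part I.B counts, with None where not found."""
    ia = PART_IA_RE.search(text)
    ib = PART_IB_RE.search(text)
    return {
        "engagements_reviewed": int(ia.group(2)) if ia else None,
        "part_ia_deficient": int(ia.group(1)) if ia else None,
        "part_ib_deficient": int(ib.group(1)) if ib else None,
    }


print(extract_counts(SAMPLE))
```

With an unbounded `.*?` and `re.DOTALL`, the Part I.B pattern would instead start at "54 engagements" and run forward to the first "Part I.B", returning 54 rather than 7. Bounding the match to a single sentence avoids that class of error.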
Access and limitations
All PCAOB data is available at pcaobus.org without authentication. There is no bulk download or API. Inspection reports are PDFs; enforcement orders are PDFs with HTML landing pages; the registration database has a web search interface. Rate limiting is not formally documented, but the PCAOB has historically been accessible to well-behaved scrapers that include polite delays and a descriptive User-Agent header.
The key limitation of the inspection data is that deficiency rates reflect a sample of each firm's audit engagements, not all of them. The PCAOB selects engagements using a risk-based methodology that intentionally overweights complex, higher-risk audits. Published deficiency rates are not representative of the firm's overall audit quality in a statistical sense; they are the rate for the riskiest slice that inspectors chose to examine. Year-over-year comparisons for a given firm are informative, but comparisons across firms require care because the selection methodology may target different risk profiles at different firms.
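A short calculation shows how much risk-based selection can move the headline number. Every figure below is invented for illustration: it assumes two risk tiers with fixed deficiency probabilities and compares a firm's full portfolio with a sample tilted toward high-risk audits.

```python
def expected_rate(share_high_risk: float, rate_high: float, rate_low: float) -> float:
    """Expected deficiency rate for a mix of high- and low-risk engagements."""
    if not 0.0 <= share_high_risk <= 1.0:
        raise ValueError("share_high_risk must be between 0 and 1")
    return share_high_risk * rate_high + (1.0 - share_high_risk) * rate_low


# Invented tier-level deficiency probabilities.
RATE_HIGH, RATE_LOW = 0.40, 0.10

# 20% of the firm's audits are high risk, but 70% of inspected audits are.
population = expected_rate(0.20, RATE_HIGH, RATE_LOW)
sample = expected_rate(0.70, RATE_HIGH, RATE_LOW)

print(f"whole portfolio: {population:.0%}, inspected sample: {sample:.0%}")
# whole portfolio: 16%, inspected sample: 31%
```

Under these assumptions the published rate would be nearly double the portfolio-wide rate even though audit quality within each tier is identical, which is why cross-firm comparisons need information about how the samples were composed.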
Related writing
SEC Enforcement Actions: The Public Record of Every Securities Law Violation — the SEC's Administrative Proceedings and Litigation Releases channels, including actions against auditors referred from PCAOB enforcement findings.
SEC Form 8-K: Material Events and the Real-Time Disclosure Feed — Item 4.01 auditor-change disclosures are filed on Form 8-K within four business days; cross-referencing them with PCAOB registration status reveals whether companies switched to or from firms with elevated deficiency histories.
DOJ False Claims Act Settlements: The Federal Fraud Recovery Database — when audit failures enable fraud against the government, False Claims Act settlements and PCAOB enforcement actions can target different actors in the same underlying scheme.