โ† Back to all posts
Research Notes

Why the 10%/20% TDR Thresholds Mislead, and How to Set Your Own


📊 Guide Summary

You've seen the rule of thumb: <10% Technical Debt Ratio (TDR = remediation cost ÷ development cost) is manageable, >20% signals trouble. But peer-reviewed research explicitly states there are no empirically supported universal thresholds by organization type.

This guide explains why generic thresholds mislead, what research actually shows about startups vs enterprises, and how to calibrate defensible targets for your context.

Research findings

  • Strong evidence: No universal TDR thresholds exist that are validated for startups vs enterprises [3, 4].
  • Moderate evidence: Technical debt (TD) correlates with size, speed, and complexity more than company stage alone [8].
  • Moderate evidence: Startups tolerate higher TD (lacking explicit baselines); enterprises define TD as divergence from quality models [1].

Key insight: TDR is valuable for internal benchmarking and trend analysis, not for applying universal cut-offs. Organizations must calibrate their own thresholds based on business context.

If you've searched for technical debt ratio guidance, you've encountered the conventional wisdom: keep TDR below 10% and you're healthy, let it climb above 20% and you're in trouble.

These thresholds appear in blog posts, conference talks, and vendor marketing materials. They feel authoritative. They give you a number to target.

There's just one problem: they're not validated by research.

Quick takeaway: The 10%/20% thresholds are practitioner heuristics, not empirically validated standards. Large benchmarking studies explicitly avoid publishing fixed thresholds. This guide shows you what research actually says about TDR by organization type, and how to set defensible targets for your context.

The uncomfortable truth about TDR thresholds

Let's start with what the research actually says. Curtis et al. (2012) analyzed 745 applications across 160 companies. This is one of the largest technical debt benchmarking studies ever conducted [3].

Did they publish differentiated thresholds by company type? No. The study explicitly avoided fixed thresholds because the data didn't support them.

Li et al. (2015) conducted a systematic mapping study of technical debt research across hundreds of papers [4]. Their finding: no universal thresholds exist that are validated across organization types.

Why this matters

  • Applying the wrong threshold wastes resources. A startup chasing 10% TDR may over-invest in quality when speed-to-market matters more.
  • Generic thresholds create false security. An enterprise at 15% TDR may feel safe while critical systems accumulate debt in high-risk areas.
  • Context determines what's acceptable. A 25% TDR in a pre-PMF startup is different from 25% in a healthcare SaaS with compliance requirements.

The 10%/20% numbers likely emerged from practitioner experience and tool vendor guidance. They've become conventional wisdom through repetition, not validation. That doesn't mean they're useless, but they're heuristics, not standards.

What research actually shows about startups vs enterprises

While universal thresholds don't exist, research does reveal systematic differences in how organizations at different stages relate to technical debt.

Startups

Besker et al. (2018), Klotins et al. (2017) [1, 2]

  • Intentionally accumulate TD as investment under high uncertainty. Speed to validate assumptions matters more than clean code.
  • Lack explicit architecture/quality thresholds that would bound TD. Without a baseline, there's nothing to diverge from.
  • TD peaks during growth/scale-up phase, not early stage. The transition from MVP to growth creates the most debt.
  • Testing debt is typically the most severe. Klotins et al. report a median severity of 3 out of 4 on a Likert scale for testing-related debt.

Mature organizations

Besker et al. (2018), Yli-Huumo et al. (2016) [1, 9]

  • Define TD as divergence from explicit quality models and architectural baselines.
  • Unlike startups, they have a baseline to measure against. This fundamentally changes what TDR means in each context.
  • Senior practitioners more likely to manage TD systematically. Organizational maturity correlates with debt management maturity.
  • Higher TD directly impacts financial performance. Banker et al. found 10% TD increase correlates with ~16% ROA reduction [5].

The key insight: startups and enterprises aren't just on different points of the same scale. They have fundamentally different relationships with technical debt. Startups often lack the baselines that make TDR meaningful. Enterprises have the baselines but face different trade-offs.

Stage-based heuristics (use with caution)

Given the research, here are patterns observed across qualitative studies. These are not validated thresholds. Treat them as starting points for your own calibration.

| Stage | Practitioner heuristic | Research context |
| --- | --- | --- |
| Early startup (pre-PMF) | Higher tolerance (no fixed %) | No explicit thresholds [2]. Speed-to-validate dominates. Focus on learning, not optimization. |
| Growth/scale-up | TD peaks here. Time to establish baselines. | Architecture TD peaks during scaling [2]. This is when you need to start defining quality models. |
| Enterprise | Lower tolerance. Explicit quality models. | Financial impact is measurable [5]. Governance and systematic management matter. |

Critical caveats

  • These are NOT research-validated thresholds. They're patterns observed in qualitative studies. Use them as discussion starters, not rules.
  • TD correlates more with size/speed/complexity than company stage alone [8]. A small team at a large company may behave more like a startup.
  • Architecture decisions matter. Lenarduzzi et al. found that a monolith → microservices migration causes a temporary TD spike, then better control [6].
  • Team context varies. A greenfield team within an enterprise may need startup-like tolerance.

How to calibrate your own thresholds

If universal thresholds don't work, how do you set defensible targets? Research suggests focusing on factors that actually correlate with TD outcomes.

Factors that matter more than stage labels

  • Project size and complexity. Jesus & Melo (2017) found these correlate with TD more than project age [8].
  • Development speed and task complexity. Rushing and complex tasks generate more debt [8].
  • Team size and number of collaborators. More people touching code increases coordination debt.
  • Architecture decisions. Microservices compartmentalize TD better than monoliths [6].
  • Explicit quality baselines. Mature orgs have them. Without a baseline, TDR measures divergence from... nothing.

A process for setting internal targets

1. Establish an explicit baseline

Define the "ideal state" you're measuring against. This might be architectural guidelines, coding standards, or quality criteria.

Without a baseline, TDR is meaningless. You're measuring divergence from an undefined target.
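
To make this step concrete, here is a minimal sketch of a baseline expressed as data, so it can be versioned alongside the code and checked automatically. Every metric name and limit below is an illustrative assumption, not a value taken from the research cited above.

```python
# A minimal sketch of an explicit quality baseline, expressed as data so it can be
# versioned alongside the code. All names and limits are illustrative assumptions.
QUALITY_BASELINE = {
    "max_cyclomatic_complexity": 10,   # per function
    "min_branch_coverage": 0.70,       # per module
    "max_duplicated_lines_pct": 3.0,   # per repository
}

def violations(measured: dict) -> list[str]:
    """Anything that diverges from the baseline is, by definition, technical debt."""
    problems = []
    if measured.get("cyclomatic_complexity", 0) > QUALITY_BASELINE["max_cyclomatic_complexity"]:
        problems.append("complexity above baseline")
    if measured.get("branch_coverage", 1.0) < QUALITY_BASELINE["min_branch_coverage"]:
        problems.append("coverage below baseline")
    if measured.get("duplicated_lines_pct", 0.0) > QUALITY_BASELINE["max_duplicated_lines_pct"]:
        problems.append("duplication above baseline")
    return problems

print(violations({"cyclomatic_complexity": 14, "branch_coverage": 0.55}))
# ['complexity above baseline', 'coverage below baseline']
```

The specific limits matter far less than the fact that the baseline is written down and versioned, so "divergence" has a concrete meaning.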

2. Measure current TDR consistently

Use a consistent methodology: static analysis tools, code quality platforms, or manual estimation. The method matters less than consistency.

Track trends. Is TDR increasing or decreasing quarter over quarter? Trend direction matters more than absolute numbers.
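
As a minimal sketch, assuming you record each quarterly reading together with the methodology that produced it (the sample values below are made up):

```python
from dataclasses import dataclass

@dataclass
class TdrSample:
    quarter: str   # e.g. "2024-Q3"
    method: str    # keep the methodology label attached to the number
    tdr: float     # remediation cost / development cost, 0.16 == 16%

def quarter_over_quarter(history: list[TdrSample]) -> list[tuple[str, float]]:
    """Change in TDR from one quarter to the next, comparing like-for-like methods only."""
    deltas = []
    for prev, curr in zip(history, history[1:]):
        if prev.method == curr.method:   # mixing methodologies makes the delta meaningless
            deltas.append((curr.quarter, round(curr.tdr - prev.tdr, 3)))
    return deltas

history = [
    TdrSample("2024-Q1", "static-analysis", 0.14),
    TdrSample("2024-Q2", "static-analysis", 0.16),
    TdrSample("2024-Q3", "static-analysis", 0.15),
]
print(quarter_over_quarter(history))   # [('2024-Q2', 0.02), ('2024-Q3', -0.01)]
```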

3. Define what "remediation cost" includes

TDR = Remediation Cost / Development Cost. What counts as remediation for your codebase?

Include code issues, architecture refactoring, infrastructure updates, test coverage gaps, and documentation debt. Be explicit.
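
For instance, a back-of-the-envelope calculation under assumed effort estimates could look like this (all hour figures are hypothetical):

```python
# Hypothetical remediation estimates in engineer-hours, broken down by the
# categories named above; development cost is the cumulative build effort.
remediation_hours = {
    "code_issues": 320,
    "architecture_refactoring": 480,
    "infrastructure_updates": 120,
    "test_coverage_gaps": 260,
    "documentation_debt": 60,
}
development_hours = 9_500   # total effort invested in the system so far (assumed)

tdr = sum(remediation_hours.values()) / development_hours
print(f"TDR = {tdr:.1%}")   # TDR = 13.1%
```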

4. Set threshold based on risk tolerance

Consider: How much velocity loss can we tolerate? What's the cost of a major incident? What are our competitive pressures?

Set your threshold based on your context, not industry averages that may not apply.

5. Review quarterly and document rationale

TD evolves non-linearly with releases and refactoring. What was acceptable last quarter may not be acceptable now.

Document the rationale in an ADR so future teams understand why you chose this threshold.

Checklist for setting internal TDR targets

Use this checklist when establishing or reviewing your TDR targets:

TDR target calibration checklist

  • โ˜Establish an explicit architectural/quality baseline (the "ideal state" to measure against)
  • โ˜Measure current TDR using consistent methodology (static analysis, code quality tools, or manual)
  • โ˜Define what "remediation cost" includes for your codebase
  • โ˜Set threshold based on your risk tolerance, not industry averages
  • โ˜Review quarterly. TD evolves non-linearly with releases and refactoring.
  • โ˜Document rationale in an ADR so future teams understand the choice
  • โ˜Consider contextual factors: size, complexity, architecture, team maturity
  • โ˜Track trend direction (improving/worsening) alongside absolute TDR

What to do instead of chasing generic thresholds

If the 10%/20% thresholds aren't reliable, what should you focus on?

1. Track trend direction, not absolute numbers

Is your TDR increasing or decreasing? That matters more than whether you're at 12% or 18%. A team at 25% TDR with a declining trend is healthier than a team at 15% with a rising trend.
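
A minimal sketch of that comparison, assuming a few quarters of TDR history per team (requires Python 3.10+ for statistics.linear_regression; the values are illustrative):

```python
from statistics import linear_regression  # Python 3.10+

def tdr_trend(quarterly_tdr: list[float]) -> float:
    """Least-squares slope per quarter; negative means the debt ratio is falling."""
    quarters = list(range(len(quarterly_tdr)))
    slope, _intercept = linear_regression(quarters, quarterly_tdr)
    return slope

team_a = [0.28, 0.27, 0.25, 0.23]   # high TDR, falling
team_b = [0.12, 0.13, 0.15, 0.17]   # low TDR, rising

print(f"Team A trend: {tdr_trend(team_a):+.3f}/quarter")   # about -0.017
print(f"Team B trend: {tdr_trend(team_b):+.3f}/quarter")   # about +0.017
```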

2. Focus on high-impact areas

Not all debt is equal. Debt in high-churn code (frequently modified files) has more impact than debt in stable, rarely-touched modules. Use code churn analysis to prioritize.
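
One way to operationalize this is to weight per-file remediation estimates by how often each file changes. The sketch below assumes a git repository and a hypothetical remediation_minutes map exported from your analysis tooling:

```python
import subprocess
from collections import Counter

def change_counts(since: str = "6 months ago") -> Counter:
    """Count how often each file appeared in a commit over the given window."""
    log = subprocess.run(
        ["git", "log", f"--since={since}", "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    return Counter(line for line in log.splitlines() if line.strip())

# Assumed per-file remediation estimates (minutes), e.g. exported from a static analysis tool.
remediation_minutes = {"billing/invoice.py": 420, "legacy/report.py": 960}

churn = change_counts()
priority = sorted(
    remediation_minutes,
    key=lambda path: churn.get(path, 0) * remediation_minutes[path],
    reverse=True,
)
print(priority)   # debt in frequently changed files floats to the top
```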

3. Connect debt to business outcomes

Banker et al. showed 10% TD increase correlates with ~16% ROA reduction in enterprise systems [5]. Find your own correlations. Track velocity, incident rate, and customer-impacting bugs alongside TDR.
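
A minimal sketch of looking for your own correlations, assuming you already track velocity and incidents per quarter (toy data, not the Banker et al. dataset; requires Python 3.10+ for statistics.correlation):

```python
from statistics import correlation  # Pearson correlation, Python 3.10+

# Toy quarterly data: your own TDR history alongside outcomes you already track.
quarterly_tdr   = [0.12, 0.15, 0.18, 0.21, 0.24]
velocity_points = [62, 58, 55, 49, 44]    # story points delivered per quarter
incident_count  = [3, 4, 4, 6, 8]         # customer-impacting incidents per quarter

print(f"TDR vs velocity:  {correlation(quarterly_tdr, velocity_points):+.2f}")   # strongly negative here
print(f"TDR vs incidents: {correlation(quarterly_tdr, incident_count):+.2f}")    # strongly positive here
```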

4. Invest in prevention

Digkas et al. (2020) found that "clean as you code" policies can reduce TD density over time [7]. Focus on preventing new debt in every commit, not just cleaning up old messes.
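
One possible shape for such a gate: fail the build only when a change adds new issues to the files it touches, leaving older debt to be paid down deliberately. The report format below (a dict of file → open issue count) is a hypothetical stand-in, not any specific tool's API.

```python
import sys

def new_debt(before: dict[str, int], after: dict[str, int], touched: set[str]) -> dict[str, int]:
    """Issues added by this change, counted only in the files the change touched."""
    return {
        path: after.get(path, 0) - before.get(path, 0)
        for path in touched
        if after.get(path, 0) > before.get(path, 0)
    }

# Hypothetical per-file issue counts exported before and after the change.
baseline_report = {"orders/service.py": 7}
current_report  = {"orders/service.py": 9, "orders/models.py": 1}
changed_files   = {"orders/service.py", "orders/models.py"}

introduced = new_debt(baseline_report, current_report, changed_files)
if introduced:
    print(f"New debt introduced: {introduced}")
    sys.exit(1)   # block the change; pre-existing debt elsewhere does not
```

Old debt stays visible in TDR and in the backlog, but it no longer blocks day-to-day delivery.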

Connect debt measurement to capacity planning

TDR becomes actionable when connected to capacity allocation. Once you understand your debt levels and trends, the next question is: how much capacity should we allocate to prevention and paydown?

A common practitioner heuristic is reserving 20–30% of sprint capacity for maintenance and quality work. Teams with high TDR or rising trends may need more. The key is making this allocation explicit and protected.
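
A minimal sketch of turning that heuristic into explicit sprint slots; the capacity figure and the extra reserve for a rising trend are assumptions to adapt to your own planning:

```python
sprint_capacity_points = 40       # assumed team capacity per sprint
base_reserve = 0.20               # lower end of the 20-30% practitioner heuristic
tdr_trend_per_quarter = +0.02     # from your own trend tracking (positive = worsening)

# Bump the reserve toward the upper end of the heuristic when the trend is worsening.
reserve = base_reserve + (0.10 if tdr_trend_per_quarter > 0 else 0.0)
maintenance_points = round(sprint_capacity_points * reserve)

print(f"Reserve {reserve:.0%} of capacity -> {maintenance_points} points for debt work per sprint")
```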

Conclusion

The 10%/20% TDR thresholds are convenient, but they're not validated. Research consistently shows that context matters more than generic numbers.

Key takeaways

  • Universal thresholds don't exist. Large benchmarking studies explicitly avoid publishing fixed thresholds. The 10%/20% numbers are practitioner heuristics.
  • Startups and enterprises have different relationships with debt. Startups lack baselines to measure against. Enterprises can quantify financial impact.
  • Context determines what's acceptable. Size, complexity, architecture, and team maturity matter more than company stage labels.
  • TDR is valuable for internal benchmarking. Track trends over time. Compare across your own projects. Don't compare to generic industry numbers.
  • Calibrate your own thresholds. Define your baseline, measure consistently, set targets based on your risk tolerance, and review quarterly.

The goal isn't to hit a magic number. It's to understand your debt, make informed trade-offs, and prevent debt from compounding faster than you can pay it down.

FAQ: TDR thresholds and benchmarks

Where do the 10%/20% TDR thresholds come from?
The 10%/20% thresholds are practitioner heuristics that emerged from industry experience, not peer-reviewed research. Large benchmarking studies (like Curtis et al.'s analysis of 745 applications) explicitly avoid publishing fixed thresholds. The numbers have become conventional wisdom through repetition, but no empirical study validates them as universal standards.

Should startups ignore TDR entirely?
No. TDR is valuable for internal benchmarking and trend analysis. The problem is applying universal thresholds without context. Startups should track TDR to understand their debt trajectory, but shouldn't panic at high numbers if they're intentionally moving fast to validate product-market fit. Establish baselines once you have an explicit architecture to measure against.

What TDR is acceptable for an enterprise?
Research shows higher TD directly impacts enterprise financial performance (10% TD increase correlates with ~16% ROA reduction) [5]. But "acceptable" depends on your risk tolerance, competitive context, and strategic priorities. Instead of adopting generic thresholds, define your own based on business impact analysis and track trends over time.

Does migrating to microservices reduce technical debt?
Research by Lenarduzzi et al. found that migrating from monolith to microservices causes a temporary TD spike, then provides better control [6]. Microservices compartmentalize debt, making it easier to isolate and address. But the migration itself creates debt that must be planned for.

How often should we review our TDR threshold?
Review quarterly at minimum. TD evolves non-linearly with releases and refactoring. As your organization matures, your tolerance for debt should decrease and your ability to prevent it should increase. Document threshold rationale in an Architecture Decision Record (ADR) so future teams understand the reasoning.

What metrics matter more than TDR?
TDR trend direction (improving or worsening) matters more than absolute numbers. Debt density in high-churn areas matters more than overall average. Business impact (velocity loss, incident rate, customer complaints) matters more than tool-reported remediation time. Focus on actionable metrics that drive decisions.

Sources and further reading

About the author

ScopeCone Author

Product & Engineering Leadership

An engineering leader with a background in software development and product collaboration, writing anonymously to share practical lessons from years of building and shipping software in multi-team product organizations.