Every significant codebase has a section you're afraid to touch. It works. Mostly. You're not sure why. The original author is gone. The tests are sparse. The business logic is embedded in 600-line functions. You need to change it.
This is the discipline of legacy system modernization — doing it without the 3am incident that proves why everyone was afraid.
Summary
The Strangler Fig pattern is the most reliable approach to legacy system modernization: build new behavior beside the old, redirect traffic incrementally, and retire the old code when the new is proven. Never do a big-bang rewrite. Use feature flags for cutover. Maintain dual-running capability during transition. Verify behavior equivalence before cutover, not after.
Why Rewrites Fail
Big-bang rewrites fail for predictable reasons:
- Unknown unknowns — The legacy code has implicit behavior you don't realize until production breaks
- Moving target — Business logic changes during the rewrite; now you're maintaining two codebases
- Integration complexity — Everything that calls the old system needs to switch simultaneously
- No rollback path — When the new system has issues, you have no safe state to return to
The Strangler Fig Pattern
The Strangler Fig tree grows around its host, eventually replacing it. Apply the same to software:
Phase 1: Build new behavior alongside old
Old System [running]
New Module [dormant] ──→ same interface
Phase 2: Route fraction of traffic to new
Old System [70% traffic]
New Module [30% traffic] ← feature flag
Phase 3: Validate equivalence
Old System [reads only, comparison mode]
New Module [writes + reads] ← primary
Phase 4: Retire old
New Module [100%]
Old System [deleted]
Implementation: The Proxy Layer
Start by introducing a routing layer that can switch between old and new implementations:
from django.conf import settings
class PaymentService:
"""
Routing proxy for payment processing.
Allows incremental cutover from legacy to new implementation.
"""
def __init__(self):
self._legacy = LegacyPaymentService()
self._new = NewPaymentService()
def process_payment(self, payment_data: dict) -> PaymentResult:
# Feature flag controls which path is used
use_new = self._should_use_new(payment_data)
if use_new:
try:
result = self._new.process_payment(payment_data)
# In shadow mode, also run old to compare
if settings.PAYMENT_SHADOW_MODE:
self._run_shadow_comparison(payment_data, result)
return result
except Exception as e:
# Fallback to legacy on error (during early cutover)
logger.error(f"New payment service error: {e}")
return self._legacy.process_payment(payment_data)
return self._legacy.process_payment(payment_data)
def _should_use_new(self, payment_data: dict) -> bool:
"""
Determine which implementation to use.
Can be percentage-based, user-based, or flag-based.
"""
rollout_percentage = settings.NEW_PAYMENT_ROLLOUT_PERCENT
user_hash = hash(payment_data.get("user_id", "")) % 100
return user_hash < rollout_percentage
Shadow Mode: The Safety Net
Before routing real traffic to the new system, run it in shadow mode — both systems process the request, but only the old system's response is returned. Compare outputs:
def _run_shadow_comparison(
self,
payment_data: dict,
new_result: PaymentResult,
) -> None:
"""
Run old system in parallel and compare results.
Log discrepancies but don't affect the user response.
"""
try:
old_result = self._legacy.process_payment(payment_data)
if not self._results_equivalent(old_result, new_result):
logger.warning(
"Shadow mode discrepancy detected",
extra={
"payment_data": sanitize(payment_data),
"old_result": old_result,
"new_result": new_result,
"diff": compute_diff(old_result, new_result),
}
)
except Exception as e:
logger.error(f"Shadow comparison failed: {e}")
# Never let shadow mode failures affect the primary path
Shadow mode finds the behavioral differences before they become production incidents.
Characterization Tests: Document Before Refactoring
Before touching the legacy code, write tests that document its current behavior — even the weird parts:
class TestLegacyInvoiceCalculation:
"""
Characterization tests for legacy invoice calculation.
These tests document what the system currently does,
not necessarily what it should do.
They must pass before and after refactoring.
"""
def test_calculates_tax_on_subtotal_before_discount(self):
# Legacy system calculates tax before applying discount.
# This is arguably wrong, but it's what the business expects.
invoice = legacy_calculate_invoice(
line_items=[LineItem(amount=1000)],
discount_percent=10,
tax_rate=18,
)
# Tax is on 1000 (before discount), not 900 (after discount)
assert invoice.tax_amount == 180 # 18% of 1000
assert invoice.total == 1080 - 100 # tax + subtotal - discount
def test_rounds_down_on_fractional_cents(self):
# Legacy behavior: always floor, not round
invoice = legacy_calculate_invoice(
line_items=[LineItem(amount=333)],
discount_percent=0,
tax_rate=18,
)
assert invoice.tax_amount == 59 # floor(333 * 0.18 = 59.94)
These tests will fail if your new implementation silently changes behavior. That's the point.
The Cutover Checklist
Before moving traffic from old to new:
□ Shadow mode ran for 72 hours with < 0.1% discrepancy rate
□ New system has been running in production (even 1% traffic) for 7 days
□ Performance metrics match or exceed old system (P50, P95, P99)
□ Error rate is lower than old system
□ All characterization tests pass against new implementation
□ Rollback plan documented and tested (not just written)
□ Feature flag set to enable instant rollback without deploy
□ On-call engineer available for 4 hours post-cutover
□ Monitoring dashboards updated to watch new system
Failure: The "Good Enough" Shadow Test
The most common mistake: running shadow mode for 2 hours, seeing "looks fine", and cutting over. Shadow mode works best for edge cases — which appear rarely. Run it for a full business cycle: at least one weekly billing run, one month-end reconciliation, one high-traffic event.
Failure: Incomplete Interface Coverage
The proxy layer only catches calls that go through it. Legacy code often has direct database queries, background jobs, and admin scripts that bypass the service layer entirely. Map these before starting:
# Find all direct database queries to legacy tables
# grep -r "LegacyPayment.objects" --include="*.py" .
# Check scheduled tasks, management commands, admin views
When to Accept Technical Debt
Not everything needs to be modernized. The calculus:
| Modernize | Leave Alone |
|---|---|
| Blocking new features | Working, stable, rarely changed |
| Performance bottleneck | No active development |
| Security risk | Isolated from new development |
| Impossible to test | Well-understood with good tests |
| Team can't reason about it | Team understands it |
Legacy code that works, doesn't block anything, and won't be touched again is not a problem. It's historical infrastructure. The energy of modernization is finite — spend it where it matters.
Key Takeaways
- Strangler Fig over big-bang rewrite, always — the rewrite that's too big to fail often does
- Characterization tests document what the system does, not what it should do — run them before and after to catch silent behavioral changes
- Shadow mode finds behavioral differences before they become production incidents — run it for a full business cycle
- Never cut over without a tested rollback path — a feature flag that instantly reverts to old behavior is the minimum
- Not all legacy code needs modernizing — only that which blocks new development, creates risk, or can't be reasoned about