Rebuilding Legacy Systems Safely

Every significant codebase has a section you're afraid to touch. It works. Mostly. You're not sure why. The original author is gone. The tests are sparse. The business logic is embedded in 600-line functions. You need to change it.

This is the discipline of legacy system modernization — doing it without the 3am incident that proves why everyone was afraid.

Summary

The Strangler Fig pattern is the most reliable approach to legacy system modernization: build new behavior beside the old, redirect traffic incrementally, and retire the old code when the new is proven. Never do a big-bang rewrite. Use feature flags for cutover. Maintain dual-running capability during transition. Verify behavior equivalence before cutover, not after.

Why Rewrites Fail

Big-bang rewrites fail for predictable reasons:

Unknown unknowns — The legacy code has implicit behavior you don't realize until production breaks
Moving target — Business logic changes during the rewrite; now you're maintaining two codebases
Integration complexity — Everything that calls the old system needs to switch simultaneously
No rollback path — When the new system has issues, you have no safe state to return to

The Strangler Fig Pattern

The Strangler Fig tree grows around its host, eventually replacing it. Apply the same to software:

Phase 1: Build new behavior alongside old
    Old System [running]
    New Module [dormant] ──→ same interface

Phase 2: Route fraction of traffic to new
    Old System [70% traffic]
    New Module [30% traffic] ← feature flag

Phase 3: Validate equivalence
    Old System [reads only, comparison mode]
    New Module [writes + reads] ← primary

Phase 4: Retire old
    New Module [100%]
    Old System [deleted]

Implementation: The Proxy Layer

Start by introducing a routing layer that can switch between old and new implementations:

from django.conf import settings

class PaymentService:
    """
    Routing proxy for payment processing.
    Allows incremental cutover from legacy to new implementation.
    """
    def __init__(self):
        self._legacy = LegacyPaymentService()
        self._new = NewPaymentService()

    def process_payment(self, payment_data: dict) -> PaymentResult:
        # Feature flag controls which path is used
        use_new = self._should_use_new(payment_data)

        if use_new:
            try:
                result = self._new.process_payment(payment_data)
                # In shadow mode, also run old to compare
                if settings.PAYMENT_SHADOW_MODE:
                    self._run_shadow_comparison(payment_data, result)
                return result
            except Exception as e:
                # Fallback to legacy on error (during early cutover)
                logger.error(f"New payment service error: {e}")
                return self._legacy.process_payment(payment_data)
        
        return self._legacy.process_payment(payment_data)

    def _should_use_new(self, payment_data: dict) -> bool:
        """
        Determine which implementation to use.
        Can be percentage-based, user-based, or flag-based.
        """
        rollout_percentage = settings.NEW_PAYMENT_ROLLOUT_PERCENT
        user_hash = hash(payment_data.get("user_id", "")) % 100
        return user_hash < rollout_percentage

Shadow Mode: The Safety Net

Before routing real traffic to the new system, run it in shadow mode — both systems process the request, but only the old system's response is returned. Compare outputs:

def _run_shadow_comparison(
    self,
    payment_data: dict,
    new_result: PaymentResult,
) -> None:
    """
    Run old system in parallel and compare results.
    Log discrepancies but don't affect the user response.
    """
    try:
        old_result = self._legacy.process_payment(payment_data)
        
        if not self._results_equivalent(old_result, new_result):
            logger.warning(
                "Shadow mode discrepancy detected",
                extra={
                    "payment_data": sanitize(payment_data),
                    "old_result": old_result,
                    "new_result": new_result,
                    "diff": compute_diff(old_result, new_result),
                }
            )
    except Exception as e:
        logger.error(f"Shadow comparison failed: {e}")
        # Never let shadow mode failures affect the primary path

Shadow mode finds the behavioral differences before they become production incidents.

Characterization Tests: Document Before Refactoring

Before touching the legacy code, write tests that document its current behavior — even the weird parts:

class TestLegacyInvoiceCalculation:
    """
    Characterization tests for legacy invoice calculation.
    These tests document what the system currently does,
    not necessarily what it should do.
    They must pass before and after refactoring.
    """

    def test_calculates_tax_on_subtotal_before_discount(self):
        # Legacy system calculates tax before applying discount.
        # This is arguably wrong, but it's what the business expects.
        invoice = legacy_calculate_invoice(
            line_items=[LineItem(amount=1000)],
            discount_percent=10,
            tax_rate=18,
        )
        # Tax is on 1000 (before discount), not 900 (after discount)
        assert invoice.tax_amount == 180  # 18% of 1000
        assert invoice.total == 1080 - 100  # tax + subtotal - discount

    def test_rounds_down_on_fractional_cents(self):
        # Legacy behavior: always floor, not round
        invoice = legacy_calculate_invoice(
            line_items=[LineItem(amount=333)],
            discount_percent=0,
            tax_rate=18,
        )
        assert invoice.tax_amount == 59  # floor(333 * 0.18 = 59.94)

These tests will fail if your new implementation silently changes behavior. That's the point.

The Cutover Checklist

Before moving traffic from old to new:

□ Shadow mode ran for 72 hours with < 0.1% discrepancy rate
□ New system has been running in production (even 1% traffic) for 7 days
□ Performance metrics match or exceed old system (P50, P95, P99)
□ Error rate is lower than old system
□ All characterization tests pass against new implementation
□ Rollback plan documented and tested (not just written)
□ Feature flag set to enable instant rollback without deploy
□ On-call engineer available for 4 hours post-cutover
□ Monitoring dashboards updated to watch new system

Failure: The "Good Enough" Shadow Test

The most common mistake: running shadow mode for 2 hours, seeing "looks fine", and cutting over. Shadow mode works best for edge cases — which appear rarely. Run it for a full business cycle: at least one weekly billing run, one month-end reconciliation, one high-traffic event.

Failure: Incomplete Interface Coverage

The proxy layer only catches calls that go through it. Legacy code often has direct database queries, background jobs, and admin scripts that bypass the service layer entirely. Map these before starting:

# Find all direct database queries to legacy tables
# grep -r "LegacyPayment.objects" --include="*.py" .
# Check scheduled tasks, management commands, admin views

When to Accept Technical Debt

Not everything needs to be modernized. The calculus:

Modernize	Leave Alone
Blocking new features	Working, stable, rarely changed
Performance bottleneck	No active development
Security risk	Isolated from new development
Impossible to test	Well-understood with good tests
Team can't reason about it	Team understands it

Legacy code that works, doesn't block anything, and won't be touched again is not a problem. It's historical infrastructure. The energy of modernization is finite — spend it where it matters.

Key Takeaways

Strangler Fig over big-bang rewrite, always — the rewrite that's too big to fail often does
Characterization tests document what the system does, not what it should do — run them before and after to catch silent behavioral changes
Shadow mode finds behavioral differences before they become production incidents — run it for a full business cycle
Never cut over without a tested rollback path — a feature flag that instantly reverts to old behavior is the minimum
Not all legacy code needs modernizing — only that which blocks new development, creates risk, or can't be reasoned about