Overview
A production financial infrastructure layer built for a marketplace platform handling real money movement. This covers the full lifecycle: ledger accounting, invoice generation, payment capture, refund processing, and audit reporting.
Financial systems are unforgiving. A bug that corrupts a user's account balance or creates a broken audit trail isn't a P2 ticket — it's a business-critical incident that can have regulatory implications.
Problem
The platform had grown organically, with payment logic scattered across multiple Django views, signals, and manual SQL updates. There was no centralized ledger — account balances were computed by summing transactions on-the-fly, with no immutable record of state changes. Refunds were handled by ad-hoc reversals that sometimes left inconsistent state.
The business needed:
- A complete audit trail of every financial event
- Accurate real-time balance computation without full table scans
- Reliable refund processing with multi-step approval workflows
- Monthly reconciliation reports for external auditors
Constraints
- Correctness over speed: Incorrect financial data has legal and regulatory consequences
- Immutability: Once recorded, a ledger entry must never be modified — only offset
- Double-entry: Every debit must have a corresponding credit
- Idempotency: Payment webhooks can be delivered multiple times; processing must be idempotent
- Isolation: Financial transactions must use database-level serializable isolation
Architecture
Double-Entry Ledger Model
The core insight: don't store balances — store entries. A balance is always computed from the ledger, never stored independently.
Account Table
│ - ID, account_type, owner, currency, created_at
│
▼
LedgerEntry Table (append-only)
│ - ID, account_id, debit_amount, credit_amount
│ - transaction_id (FK → Transaction)
│ - created_at, description, metadata (JSONB)
│
▼
Transaction Table
│ - ID, status, initiated_at, settled_at
│ - idempotency_key (UNIQUE — prevents double processing)
│ - entries: [LedgerEntry, LedgerEntry, ...]
│
▼
Balance View (materialized, refreshed on write)
SELECT account_id,
SUM(credit_amount) - SUM(debit_amount) as balance
FROM ledger_entries
GROUP BY account_id;
Transaction State Machine
Every transaction moves through explicit states with logged transitions:
INITIATED → PROCESSING → SETTLED
↘ FAILED
SETTLED → REFUND_INITIATED → REFUND_PROCESSING → REFUNDED
↘ REFUND_FAILED
Technology Decisions
| Decision | Choice | Why |
|---|---|---|
| Isolation level | SERIALIZABLE | Prevents phantom reads on balance queries |
| Balance storage | Computed from entries | Immutable source of truth; no balance drift |
| Idempotency | Unique idempotency_key + SELECT FOR UPDATE | Handles duplicate webhook delivery safely |
| Refund workflow | Celery chain with approval step | Multi-party sign-off; retryable on failure |
| Audit log | Separate append-only table | Never modified; separate backup schedule |
| Reconciliation | Materialized view + scheduled Celery job | Fast reporting without full scan |
Implementation
Atomic Ledger Entry Creation
Every financial operation creates matching ledger entries in a single database transaction:
from django.db import transaction
class LedgerService:
@staticmethod
@transaction.atomic
def record_payment(
payer_account: Account,
payee_account: Account,
amount: Decimal,
idempotency_key: str,
) -> Transaction:
"""
Record a payment as a balanced double-entry ledger transaction.
Idempotent: same key produces same result if called twice.
"""
# Check for existing transaction with this key
existing = Transaction.objects.filter(
idempotency_key=idempotency_key
).first()
if existing:
return existing # Idempotent return
txn = Transaction.objects.create(
idempotency_key=idempotency_key,
status=TransactionStatus.PROCESSING,
)
# Debit payer
LedgerEntry.objects.create(
account=payer_account,
transaction=txn,
debit_amount=amount,
credit_amount=Decimal("0"),
description=f"Payment {txn.id}",
)
# Credit payee (balancing entry)
LedgerEntry.objects.create(
account=payee_account,
transaction=txn,
debit_amount=Decimal("0"),
credit_amount=amount,
description=f"Payment received {txn.id}",
)
txn.status = TransactionStatus.SETTLED
txn.settled_at = timezone.now()
txn.save()
return txn
Refund State Machine
Refunds are the hardest part of financial engineering. They require approval steps, can partially fail, and must not double-refund:
class RefundService:
@staticmethod
@transaction.atomic
def initiate_refund(
original_transaction: Transaction,
amount: Decimal,
reason: str,
requested_by: User,
) -> Refund:
"""
Initiate a refund. Creates a Refund record in PENDING state.
Amount must not exceed original transaction amount.
"""
if amount > original_transaction.settled_amount:
raise ValidationError("Refund exceeds original transaction amount")
if Refund.objects.filter(
original_transaction=original_transaction,
status__in=[RefundStatus.PENDING, RefundStatus.APPROVED, RefundStatus.PROCESSING]
).exists():
raise ValidationError("Refund already in progress for this transaction")
return Refund.objects.create(
original_transaction=original_transaction,
amount=amount,
reason=reason,
requested_by=requested_by,
status=RefundStatus.PENDING,
)
Failures & Lessons
Failure 1: Race condition on balance check
Early versions checked balance with a simple SELECT, then processed. Under concurrent load, two requests could both read the same balance and both approve a payment that should have been declined. Fixed with SELECT FOR UPDATE and database-level serializable isolation.
Failure 2: Webhook double-processing
Payment gateway delivered the same webhook twice in quick succession during a network retry. Without idempotency keys, this created duplicate transactions. Fixed by adding a idempotency_key unique constraint that rejects duplicate processing.
Failure 3: Refund without matching ledger entries Initial refund implementation updated transaction status without creating corresponding ledger entries. The balance computed correctly at first, but audit reconciliation failed. Every financial state change must create ledger entries — no exceptions.
Failure 4: Lack of explicit state machine Transaction statuses were modified directly without going through a state machine. This allowed invalid transitions (e.g., FAILED → SETTLED). Adding an explicit state machine with enforced valid transitions caught and prevented several data integrity issues.
Metrics
| Metric | Value |
|---|---|
| Ledger accuracy | 100% (auditor verified) |
| Reconciliation time | ~5 minutes (materialized view) |
| Refund SLA compliance | 99.7% within 24h |
| Idempotency collision rate | 0.03% (successfully deduplicated) |
| Race condition incidents | 0 (post-fix) |
Future Improvements
- Event sourcing — Move to full event sourcing; current approach is close but not pure; enables richer audit queries
- Multi-currency — Current system is single-currency; adding FX handling requires a rates service and additional ledger entry complexity
- Real-time reconciliation — Current reconciliation is scheduled; move to continuous reconciliation with alerting on discrepancy
- Distributed transactions — If system splits into microservices, saga pattern needed to replace atomic DB transactions
Key Takeaways
- Never store computed balances separately from ledger entries — they will drift
- Double-entry bookkeeping is not overengineering for financial systems — it's the minimum viable correctness guarantee
- Idempotency keys on every payment operation are non-negotiable
- Database-level SERIALIZABLE isolation is required; application-level locking is insufficient under concurrent load