In this article, I am honored to have my ex-coworker & brilliant career payments engineer, Shardendu Gautam (currently an early engineer at Forage, previously AtoB, Brex, Uber Money), as my co-author.
I have seen a lot of companies build their payments platform without understanding the nuances of payments. That causes long term ramifications for your money movement functions & massive reconciliation issues. Shardendu did a Payments Engineering 101 session with the engineering team at my last startup & everyone there found it so valuable that we thought it was worth sharing with a wider audience.
So let’s talk about nuances of payments engineering & some best practices.
What is payments engineering?
It's the engineering architecture of a payments platform that facilitates easy and accurate money movement. FinTech is such a regulation & protocol heavy domain that payments platforms have to be architected with domain awareness, so they are built with long term scalability & the regulatory/compliance framework in mind. As I have discussed in “The match should always add up”, it’s critical for payment products to have higher scrutiny and standards over checks and balances.
TL;DR
Do not overwrite values or states in the database; log every event (request - response calls)
Add a state-machine on top of this database to access the latest state. Use this latest state for Data Engineering and Data Science purposes (reporting, accounting, finance recon etc.)
Build a double entry ledger (reconcile daily at EOD)
Always store money in integer format (in cents), not float ($ in decimal)
Idempotency - consistent response is critical for any money movement functions
Two must haves for audited financials are: Immutable and double entry ledger
It’s important for a Payments PM to map out all the events that trigger a payment capture() call. For example, for this sample Shopify merchant, it’s the click on the checkout page's submit button that calls the payment capture event (see the note on capture at the end of this article). For your business, map out all events & functions that call capture().
There might be more than just the checkout page (where you take credit/debit card information to charge, as seen with this sample Shopify merchant). A sound state machine level understanding of your system flow will certainly make it easier to find all the places in your codebase that invoke the payment capture call.
Once that has been established, look for second order effects of accepting & disbursing money, like the pricing module, promo/loyalty, affiliates, (partial) refunds, price adjustments, wallets, dispensing money etc.
Understanding your codebase & state machine of how your system works is the key for you to architect a good payments engineering platform.
When you are displaying pricing on the product SKU or in the cart, you want a consistent approach & not to rebuild the logic in each place. I have seen codebases that have pricing logic baked into different modules, so any change has to be made in all of these modules or the math doesn’t add up, causing inconsistent UX. This is typical of spaghetti code. It can certainly be avoided with a microservices style architecture. Also, since spaghetti codebases need a lot of tribal knowledge for upkeep over time, maintaining & scaling them becomes difficult.
A charge call can be one (micro)service in the payments module which gets called by other modules & so on. Some of this might seem very intuitive & common sense but trust me, I see the opposite at so many early stage companies. As the company matures, of course these things get streamlined. But for that migration to happen, someone has to put in a lot of thought and effort & build a case for it.
Now that we know what payments engineering is, why it's important & what not to do, let’s discuss some best practices.
Data architecture
Double entry ledger & State machine:
Always log every event in an event source log. Append only to make it immutable. This protects the integrity of transactions. It’s also helpful in troubleshooting.
Optimistic lock on (tx_id, event#).
Every request and every response should be logged as a separate log entry.
All transactions (auth, capture, refund, void) live in one log table.
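The logging rules above can be sketched with a single append-only table. This is a minimal illustration, not a production design: the table name, column names and the `ConcurrentWrite` exception are assumptions made for the example, and SQLite stands in for whatever datastore you actually use. The primary key on (tx_id, event_no) plays the role of the optimistic lock: two writers that both try to append the same event number cannot both succeed.

```python
import json
import sqlite3

# Hypothetical schema: one append-only table for every transaction event.
SCHEMA = """
CREATE TABLE payment_events (
    tx_id      TEXT    NOT NULL,
    event_no   INTEGER NOT NULL,   -- per-transaction sequence number
    kind       TEXT    NOT NULL,   -- 'auth', 'capture', 'refund' or 'void'
    direction  TEXT    NOT NULL,   -- 'request' or 'response'
    payload    TEXT    NOT NULL,   -- raw JSON body, stored as-is
    PRIMARY KEY (tx_id, event_no)  -- acts as the optimistic lock
)
"""


class ConcurrentWrite(Exception):
    """Another writer appended this event number first; re-read and retry."""


def next_event_no(db, tx_id):
    row = db.execute(
        "SELECT COALESCE(MAX(event_no), 0) FROM payment_events WHERE tx_id = ?",
        (tx_id,),
    ).fetchone()
    return row[0] + 1


def append_event(db, tx_id, expected_event_no, kind, direction, payload):
    """Insert only. There is deliberately no update or delete helper."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO payment_events VALUES (?, ?, ?, ?, ?)",
                (tx_id, expected_event_no, kind, direction, json.dumps(payload)),
            )
    except sqlite3.IntegrityError as exc:
        raise ConcurrentWrite(
            f"{tx_id} already has event #{expected_event_no}"
        ) from exc


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
n = next_event_no(db, "1234567892")
append_event(db, "1234567892", n, "auth", "request", {"amount_cents": 500})
append_event(db, "1234567892", n + 1, "auth", "response", {"status": "approved"})
```

A writer that loses the race gets `ConcurrentWrite`, re-reads the log, and decides whether its event still makes sense before retrying with the next number.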
Maintain a state machine on top of the event source logs to always surface the latest state of the transaction. For example: (partial) auth > (partial) auth release > charge > refund etc.
It is important to note that there is a difference between the state machine and the ledger. The ledger is usually stateless and doesn't include auths, since an auth doesn't involve money movement until it is captured.
The double entry ledger (DEL) is used for reconciling transactions and checking that money in = money out, so it only records actual money movements. The transaction state machine is used for keeping track of the state of a transaction so you can invoke the correct operations (e.g. you can't refund a txn that's only authed but not captured).
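To make the "can't refund a txn that's only authed" rule concrete, here is a small sketch of a transaction state machine. The state names and the transition table are assumptions for illustration; a real system would likely have more states (partial captures, partial refunds, voids and so on).

```python
# Hypothetical states and allowed transitions; adapt to your own flows.
ALLOWED = {
    None: {"authorized"},                        # a brand new transaction
    "authorized": {"auth_released", "captured"},
    "captured": {"refunded"},
    "auth_released": set(),                      # terminal
    "refunded": set(),                           # terminal
}


class InvalidTransition(Exception):
    pass


def transition(state, new_state):
    """Return new_state if the move is legal, otherwise raise."""
    if new_state not in ALLOWED.get(state, set()):
        raise InvalidTransition(f"{state!r} -> {new_state!r} is not allowed")
    return new_state


state = transition(None, "authorized")
try:
    transition(state, "refunded")   # refunding an auth-only txn is rejected
except InvalidTransition as err:
    print(err)
state = transition(state, "captured")
state = transition(state, "refunded")
```

Keeping the allowed transitions in one table means every caller gets the same answer about which operations are valid for a transaction.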
Always maintain this as a double entry ledger & reconcile it every night against the processor's (Stripe/Adyen) nightly file.
Async message from event source to create double entry ledger.
The source of truth is always the product logs (in-house), rather than an external data source (like Stripe, financial invoices etc).
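A rough sketch of the double entry ledger and the nightly reconciliation described above follows. Everything named here is hypothetical: the account names, the `Ledger` class, and the idea that the processor's nightly file has already been parsed into a `{tx_id: net_cents}` dictionary. Real processor reports have their own formats, so treat the parsing step as out of scope.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: an entry cannot be changed once written
class Entry:
    tx_id: str
    account: str
    debit_cents: int
    credit_cents: int


class Ledger:
    def __init__(self):
        self._entries = []  # append-only

    def post(self, tx_id, debit_account, credit_account, amount_cents):
        """Every money movement writes one debit and one equal credit."""
        if amount_cents <= 0:
            raise ValueError("amount_cents must be a positive integer")
        self._entries.append(Entry(tx_id, debit_account, amount_cents, 0))
        self._entries.append(Entry(tx_id, credit_account, 0, amount_cents))

    def is_balanced(self):
        total_debits = sum(e.debit_cents for e in self._entries)
        total_credits = sum(e.credit_cents for e in self._entries)
        return total_debits == total_credits

    def net_by_tx(self, account):
        totals = defaultdict(int)
        for e in self._entries:
            if e.account == account:
                totals[e.tx_id] += e.debit_cents - e.credit_cents
        return dict(totals)


def reconcile(ledger_by_tx, processor_by_tx):
    """Return {tx_id: (ours, theirs)} for every transaction that disagrees."""
    mismatches = {}
    for tx_id in ledger_by_tx.keys() | processor_by_tx.keys():
        ours = ledger_by_tx.get(tx_id, 0)
        theirs = processor_by_tx.get(tx_id, 0)
        if ours != theirs:
            mismatches[tx_id] = (ours, theirs)
    return mismatches


ledger = Ledger()
ledger.post("1234567892", "processor_receivable", "merchant_revenue", 500)  # capture
ledger.post("1234567892", "merchant_revenue", "processor_receivable", 200)  # partial refund
ours = ledger.net_by_tx("processor_receivable")
processor_file = {"1234567892": 300, "1234567899": 1000}  # made-up nightly totals
mismatches = reconcile(ours, processor_file)
```

In this made-up run the processor reports a transaction the ledger has never seen, which is exactly the kind of gap the nightly recon is meant to surface.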
Data engineering & Data Science: DE/DS will use the state machine’s latest state for analysis. Product logs should only be used for troubleshooting. For example:
A simplistic representation of these micrologs/product logs is illustrated in the above table. What DE/DS should store & use is the latest state i.e.
1234567890 | Capture Denied
1234567891 | Auth Denied
1234567892 | Transaction Capture Approved
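As a rough sketch of that projection, the snippet below folds a list of log rows into one latest state per transaction. The sample rows are invented for illustration (they are not the rows from the table above); they were only chosen so that the result matches the three latest states listed.

```python
# Hypothetical product log rows: (tx_id, event_no, status).
events = [
    ("1234567890", 1, "Auth Approved"),
    ("1234567890", 2, "Capture Denied"),
    ("1234567891", 1, "Auth Denied"),
    ("1234567892", 1, "Auth Approved"),
    ("1234567892", 2, "Transaction Capture Approved"),
]


def latest_states(rows):
    """Keep only the status with the highest event number per transaction."""
    latest = {}
    for tx_id, event_no, status in rows:
        if tx_id not in latest or event_no > latest[tx_id][0]:
            latest[tx_id] = (event_no, status)
    return {tx_id: status for tx_id, (_, status) in latest.items()}


for tx_id, status in sorted(latest_states(events).items()):
    print(f"{tx_id} | {status}")
```

DE/DS would read the output of a projection like this, while the underlying rows stay available for troubleshooting.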
Storing money & money transactions:
Stripe has set a good standard here. Always store money in cents, e.g. $5 = 500 cents.
Do all math operations on these stored values as integers (in cents), rounding results to a whole cent.
Always convert back when surfacing it to the end user/externally or while reporting internally, & vice-versa.
This is a very important point especially since I have seen several startups with spaghetti code store money in different formats: string, float, integer etc in various places. This causes inconsistencies, pricing/payment issues, customer grievances & outages.
Round up fractions of a cent in calculations, for consistency.
Having said that, storing money is hard; there's no ‘one size fits all’. This is a good guide. Decide what suits best based on your business’ specific use case. When in doubt, remember that storing money in cents remains the standard practice for most.
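Here is a minimal sketch of the cents-as-integers approach, assuming US dollars with two decimal places. The helper names are made up for this example. Conversions go through Python's `decimal` module so that no binary float is ever involved, and fractions of a cent are rounded up as suggested above.

```python
from decimal import Decimal, ROUND_CEILING


def to_cents(display_amount: str) -> int:
    """Parse a dollar string such as '19.99' into integer cents."""
    cents = Decimal(display_amount) * 100
    if cents != cents.to_integral_value():
        raise ValueError(f"{display_amount!r} has a fraction of a cent")
    return int(cents)


def to_display(cents: int) -> str:
    """Format integer cents for the end user; only done at the edges."""
    sign = "-" if cents < 0 else ""
    dollars, remainder = divmod(abs(cents), 100)
    return f"{sign}${dollars}.{remainder:02d}"


def apply_rate_round_up(cents: int, rate: str) -> int:
    """Multiply by a rate (tax, fee, discount) and round fractional cents up.

    ROUND_CEILING rounds toward positive infinity, which is 'up' for the
    positive amounts used here.
    """
    exact = Decimal(cents) * Decimal(rate)
    return int(exact.to_integral_value(rounding=ROUND_CEILING))


price = to_cents("19.99")                  # 1999
tax = apply_rate_round_up(price, "0.0825")  # 164.9175 cents, rounded up to 165
total = price + tax                         # plain integer addition
print(to_display(total))

# Why not floats: binary floats cannot represent most decimal fractions.
print(0.1 + 0.2 == 0.3)   # False
print(10 + 20 == 30)      # True, the same sum in integer cents
```

The rate is passed as a string so that it is also parsed as an exact decimal rather than as a float.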
Best practices from Uber Payments Engineering
Business logic steps at Uber: Auth → ship → capture
How did Uber handle split orders & split payments?
At Uber, we kept the complexity of group orders & split payments etc. at the application layer. Then, the payments backend team ‘Gulfstream’ (Uber’s fifth generation collection and disbursement payment platform) handled each & every transaction lifecycle.
For startups (being relatively small, with less complex business logic), you can decide how to decouple & handle this.
Reasons for Gulfstream project (large payments migration) at Uber
Easier scaling
Idempotency
Async capabilities
Use AWS instead of internal datastores for industry guaranteed reliability.
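Idempotency, listed above and in the TL;DR, can be illustrated with a small sketch: the caller sends an idempotency key with each charge, and a retry with the same key gets the stored response back instead of moving money a second time. This is not how Gulfstream is implemented; the `ChargeService` class, the in-memory dictionary and the fake processor are all assumptions for the example, and a real service would persist the keys in a durable store and handle concurrent requests.

```python
class ChargeService:
    def __init__(self, processor_charge):
        self._processor_charge = processor_charge
        # idempotency_key -> (request fingerprint, stored response)
        self._responses = {}

    def charge(self, idempotency_key, tx_id, amount_cents):
        fingerprint = (tx_id, amount_cents)
        if idempotency_key in self._responses:
            seen, response = self._responses[idempotency_key]
            if seen != fingerprint:
                raise ValueError("idempotency key reused with different parameters")
            return response  # replay: same answer, no second charge
        response = self._processor_charge(tx_id, amount_cents)
        self._responses[idempotency_key] = (fingerprint, response)
        return response


calls = []


def fake_processor_charge(tx_id, amount_cents):
    """Stand-in for a real processor call; records each invocation."""
    calls.append((tx_id, amount_cents))
    return {"tx_id": tx_id, "amount_cents": amount_cents, "status": "captured"}


service = ChargeService(fake_processor_charge)
first = service.charge("key-abc", "1234567892", 500)
retry = service.charge("key-abc", "1234567892", 500)  # e.g. a client timeout retry
print(first == retry, len(calls))
```

The retry returns the same response and the processor is only called once, which is the "consistent response" property that matters for money movement.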
Audited financials
If and when your company is graduating and thinking of getting its financials audited, here are some must haves to make the life of the payments team & the DE/DS supporting finance in this exercise smooth & streamlined:
An immutable ledger for auditors
A double entry ledger system from day 1
This is a high level intro to Payments Engineering 101 & some best practices from Shardendu and myself. If you want to continue exploring this topic, below are some useful resources.
Further Reading & Research Materials
Modern Treasury:
Our ex-coworker Gergely Orosz’s Pragmatic Engineer newsletter - Designing a payments system
Capture here means the payment capture call, which solidifies the money movement entry from the issuing bank to the merchant bank. Best practices recommend calling auth() before capture(); however, for simplicity, capture will be used in this article.
Also, networks and processors do not dictate an auth. You can always do a direct capture call. However, auth before capture is highly recommended.