
Implementation Notes

You are probably familiar with the concept of webhooks, wherein one party sends an HTTP request containing all the new data to another party, i.e. they push data. We do the inverse: you request the data from us, which is to say you pull data.

Synchronization is a hard problem. Because our system is so radically different from most others we have compiled a bunch of notes for developers to ease implementation. This section is not required reading but it will make your life considerably easier and bulletproof your integration.

Sequential Changes

To illustrate how it works at scale, let's imagine a sequence of 10 events that goes:

Events 1–3: First, the shop receives 3 new orders. We call them trn#1, trn#2, and trn#3.
Events 4–5: The shop owner ships the merchandise for trn#1 and trn#3 but doesn't have stock for trn#2, so he only captures trn#1 and trn#3, leaving trn#2 alone for now.
Events 6–7: The shop then receives a new subscriber, which we call sub#1, and automatically charges it, creating trn#4.
Events 8–9: The shop owner ships the merchandise for trn#4 and captures it; meanwhile, the customer who made trn#2 has decided to cancel the order, so the shop owner voids it.
Event 10: Finally, the shop receives a new order, trn#5.

You might expect that this would mean you need to synchronize 10 times to retrieve all 10 events, but that's not the case. Let's look at how this sequence of events might be synchronized visually. Note that each event is denoted only by the database identifier and not by what has actually changed.

As you can see, this example only synchronizes 3 times to retrieve all 10 events and then once more where it receives no new events. Each seq request retrieves the next part of the sequence and the new position after all the changes have been merged; that is to say, you are not guaranteed to receive the entire sequence, and in this example, you see it's been broken up into 3 pieces. To put it another way, you need to call seq in a loop until the "seq" value you receive back is the same as the one you called with.

You may have noticed that the 3rd synchronization only returned 3 changes despite advancing 4 events in the sequence. That happens because we're not actually sending deltas but rather the entire transaction or subscriber whenever a change occurs. This means that we can (and do) deduplicate the entries we send back. For that reason, you can't simply count the number of entries in the "changes" array to advance the counter; instead, you must use the "seq" value you receive from us.
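
The loop described above can be sketched as follows. This is a minimal sketch, not the official client: fetch stands in for the HTTP GET to the seq endpoint (assumed to return a decoded object with "seq" and "changes" fields), and handle is your own merge logic; both names are hypothetical.

```python
# Sketch of the seq loop. `fetch(seq)` stands in for the HTTP GET to
# the seq endpoint and is assumed to return {"seq": int, "changes": [...]}.

def sync(local_seq, fetch, handle):
    """Pull changes until the returned "seq" stops advancing."""
    while True:
        data = fetch(local_seq)
        for change in data["changes"]:
            handle(change)               # merge the whole object locally
        if data["seq"] == local_seq:     # caught up: no new events
            return local_seq
        # Advance using the server's "seq" value, never by counting
        # the entries in "changes" (they are deduplicated).
        local_seq = data["seq"]
```

Note that termination is detected by the returned "seq" matching the one we called with, mirroring the rule above.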

Recommendations and pitfalls

If you're working with a language like PHP, you probably read the introduction and thought to yourself that you're just going to run your synchronization code whenever you receive a ping request. Before you do, let's go over the one massive issue you're likely to smash head-first into:

Multiple synchronizations happening simultaneously.

Except for a bit of hysteresis, we send pings immediately when there is something new for you to fetch, and we do not wait for you to respond before sending another one. In practice, this means that if you get 2 transactions in quick succession, you will receive 2 pings, and the first synchronization will not have time to finish before the second ping arrives to start the second synchronization. This can be a big problem as you will be running into lots of potential TOCTOU races. Consider this timeline of events:

In this example, not only is trn#1 being processed twice simultaneously, possibly messing up its data in unpredictable ways, but because the left process was slower than usual and the right process was faster, the older "seq" value was saved over the newer one.

The first solution is simple. All you need to do is take a lock before beginning the synchronization and release it when you finish. Now, you will only ever have 1 synchronization process running at a time.
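
As a sketch of this pattern (using an in-process threading.Lock purely as a stand-in for whatever cross-process lock your platform actually provides; on_ping and run_sync are hypothetical names):

```python
# The lock-guarded pattern: at most one synchronization at a time.
# threading.Lock is only a stand-in here; a real PHP-style deployment
# needs a cross-process lock, with the caveats discussed below.
import threading

sync_lock = threading.Lock()

def on_ping(run_sync):
    """Run a synchronization unless one is already in progress."""
    if not sync_lock.acquire(blocking=False):
        return False        # another synchronization is already running
    try:
        run_sync()
        return True
    finally:
        sync_lock.release()  # always release, even if run_sync raises
```

The try/finally is the important part: the lock must be released on every exit path, including errors.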

This, too, can have issues. For example: if you're using a MariaDB lock in PHP and your script crashes for whatever reason, you'll find that the lock hasn't been released. This is because PHP multiplexes multiple scripts into a single process, and it shares the MariaDB connection between them for performance reasons. Similarly, you can't just use flock() because the lock it takes belongs to the process and will stick around after a crash. But we can do better.

The synchronous way

As was pointed out in the introduction, you fundamentally only ever need to run one single process that keeps calling seq in an infinite loop. This is a perfectly viable solution, but the problem then becomes determining when to wait, and for how long. A simple version of this might look something like this:

localSeq := 0
loop
    newSeq := localSeq
    while newSeq ≤ localSeq do
        newSeq := WaitForPing()
    end while
    data := JSONDecode(GET "https://api.scanpay.dk/v2/shopid:seq:"localSeq)
    for each change in data.changes do
        Update(change)
    end for
    localSeq := data.seq
end loop

However, there's a race condition with ping here. Suppose a new transaction is created after the request to seq has returned but before WaitForPing has been called. Now, the code is stuck waiting for a ping that has already been sent, and while it will eventually come unstuck when the next ping arrives, we can do better.

All that needs to change here is the WaitForPing loop. Suppose we replace it with a new function called WaitForSeq, and instead of simply waiting for the next ping to arrive, it looks at the last ping received and returns immediately if its "seq" value is greater than the "seq" value from the last successful call to seq. Using common thread primitives, it might look like this:

global LastSeqFromPing := 0

function WaitForSeq(localSeq)
    Lock()
    while LastSeqFromPing ≤ localSeq do
        ConditionWait()
    end while
    Unlock()
end function

function UpdatePing(ping)
    Lock()
    if LastSeqFromPing < ping.seq then
        LastSeqFromPing := ping.seq
        ConditionSignal()
    end if
    Unlock()
end function

This is the most efficient way to synchronize.

Atomic updates

So far, we've only discussed the need to update orders in your system with data from ours but not actually how to do it. The fastest way to do it is with atomic (transactional) database updates.

Let's imagine a simple scenario where you only want to send a confirmation email whenever you receive payment for a new order. It sounds simple, but there are a multitude of pitfalls to be aware of. Suppose you're looking at a transaction received via seq. The way to find the matching order in your database is with the "orderid" field. You might have noticed that there is also a field called "id". This is the immutable ID of the transaction or subscriber in our system, and it, alongside the "rev" field, which denotes the revision, or version, of the entry, will allow you to build a general and very efficient database transaction. In this simple case, it might look something like this:

trnid, email := MariaDB(
    BEGIN;
    SELECT trnid,email FROM orders
        WHERE trnid='trn.id' OR orderid='trn.orderid' FOR UPDATE;
    UPDATE orders SET trnid='trn.id',rev='trn.rev',…
        WHERE (trnid='trn.id' OR orderid='trn.orderid') AND rev<'trn.rev';
    COMMIT;
)
if trnid=0 then
    SendMail(email, "Order confirmation …")
end if

Here, we use a trnid field in the database both as an index for fast lookups when updating an order and to determine whether we've already matched this order with a transaction ID, using the knowledge that IDs are always non-zero. We also use a rev field to avoid writing to the database if we've already updated the data to this or a newer revision.
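
Here is a sketch of the same idea in Python, with sqlite3 standing in for MariaDB (sqlite has no SELECT … FOR UPDATE, so the with-block plays the role of BEGIN…COMMIT; the table layout and names are illustrative, not prescribed):

```python
# Rev-guarded atomic update: match by our immutable id or the shop's
# orderid, write only newer revisions, and send mail exactly once.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE orders (
    orderid TEXT PRIMARY KEY,
    trnid   INTEGER NOT NULL DEFAULT 0,  -- 0 means "not yet matched"
    rev     INTEGER NOT NULL DEFAULT 0,
    email   TEXT)""")

def apply_transaction(trn):
    """Apply one transaction atomically; return an email iff this is
    the first time the order has been matched to a transaction."""
    with db:  # one atomic transaction (BEGIN…COMMIT)
        row = db.execute(
            "SELECT trnid, email FROM orders WHERE trnid=? OR orderid=?",
            (trn["id"], trn["orderid"])).fetchone()
        db.execute(
            "UPDATE orders SET trnid=?, rev=? "
            "WHERE (trnid=? OR orderid=?) AND rev<?",
            (trn["id"], trn["rev"], trn["id"], trn["orderid"], trn["rev"]))
    if row and row[0] == 0:
        return row[1]  # first match: send the confirmation email
    return None
```

Running it twice for the same transaction returns the email only once, so the confirmation email cannot be sent twice.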

But wait, because we can go even further than that. If you have access to atomic updates, you can do fully asynchronous synchronization. A simple version of that might look like:

localSeq := MariaDB(SELECT seq FROM globals)
data := JSONDecode(GET "https://api.scanpay.dk/v2/shopid:seq:"localSeq)
for each change in data.changes do
    Update(change)
end for
MariaDB(UPDATE globals SET seq='data.seq' WHERE seq<'data.seq')

In this case, you would run this code whenever you receive a ping. We have come full circle and are now running multiple synchronization processes simultaneously. We can do this because we eliminated the data races using atomicity.

This last approach has the lowest latency, but the trade-off is potentially making many unnecessary seq requests.
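
A sketch of this fully asynchronous variant, again with sqlite3 standing in for MariaDB and hypothetical names throughout; the final statement is the monotonic compare-and-set on the stored seq:

```python
# Fully asynchronous synchronization: every step is an atomic
# statement, so concurrent runs cannot lose or regress the seq.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE globals (seq INTEGER NOT NULL)")
db.execute("INSERT INTO globals VALUES (0)")

def sync_once(fetch, update):
    """One synchronization pass; safe to run concurrently."""
    local_seq = db.execute("SELECT seq FROM globals").fetchone()[0]
    data = fetch(local_seq)            # GET to the seq endpoint
    for change in data["changes"]:
        update(change)                 # must itself be atomic
    # Monotonic compare-and-set: an older seq never overwrites a newer
    # one, even if a slow run finishes after a fast one.
    db.execute("UPDATE globals SET seq=? WHERE seq<?",
               (data["seq"], data["seq"]))
    db.commit()
```

The WHERE seq<? guard is what makes the timeline problem from earlier harmless: the slower process's stale write simply matches zero rows.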

Frequently Questioned Answers

What do I do if synchronization fails?

Simply don't save the new "seq" value you received and allow the synchronization to run again. No data has been lost, and synchronization can proceed from the same point next time. A nice side effect of this is that you can always synchronize the entire sequence from the beginning or any arbitrary point thereafter should you need to for any reason, such as when restoring from a backup.

How do I know what has changed in any given transaction or subscriber?

If you need that information, you must keep track of it yourself. Referring to the example figure, let's assume this synchronization occurred after all 10 events had already happened. You might expect that on the …:seq:0 request, you will get a trn#3 that looks like (a), and then on the …:seq:3 request, you'll get a trn#3 that looks like (b); however, you will, in fact, get (b) both times.

{
    "type": "transaction",
    "id": 3,
    "acts": []
}
(a) Without capture action

{
    "type": "transaction",
    "id": 3,
    "acts": [
        {
            "act": "capture",
            …
        }
    ]
}
(b) With capture action

So why is this? It happens because we don't track what has happened to any given transaction or subscriber, only that something has happened and in what order it happened relative to other transactions and subscribers. If you look at the example figure, I'll ask you again to note that the only things recorded, represented by the square boxes, are the IDs of the transactions and subscribers that have changed. In practice, the ping arrives so quickly that you're likely to only ever see a single event in the "changes" array when calling seq, but that's not a property you may depend on. This is especially important to get right as it will likely only ever be relevant under heavy load, such as on Black Friday.

Are changes ordered chronologically?

Yes and no. Because changes are whole objects and not deltas, and we deduplicate them before sending, perfect chronological ordering is lost; however, we do provide strong guarantees about ordering.

Internally, the sequence database is a digraph sorted in dependency order. It provides 2 strong guarantees:

  1. The first instance of DB#N will always appear later than the first instance of DB#N-1.
  2. The first instance of a charge on sub#N@M will always appear later than the first instance matching sub#N@≥M.

This, of course, only applies to the sequence as a whole, not to every subset of it.

Why would you foist this upon us?

Despite the obvious complexity on display here, do not mistake it for being the result of choosing a pull method. Synchronization is an unfortunately difficult problem regardless of methodology. The webhook (push) method suffers the exact same issues laid out here but also much, much worse ones, such as:

  • Any implementation of strong ordering guarantees would incur a potentially enormous latency penalty.
  • The pushed data by necessity becomes use-it-or-lose-it, because the ability to resend all data would, in effect, be a DoS attack button.
  • If the specified endpoint changes, then we would potentially be sending sensitive data to an unknown third party.
  • If there is a network or server outage, then we would have to resend the data. The question then becomes: how many times do we resend it, and when? This can (and frequently does) result in permanent data loss.
  • If a system isn't responding correctly to these requests, then we will effectively be conducting a DoS attack, potentially on both our customers and ourselves, by constantly resending.

The latter two issues can (and have, many times) manifested in the need to shut down these callback systems due to extreme server loads. This, again, results in permanent data loss. We consider this unacceptable.