The Hermit Shrimp

2020.12.16

Building a Better API

It's more than just shoving out data.

Everyone has them. You know who. That vendor that replies to your support tickets at four-week intervals, treats you like a bacterium, and generally seems to despise you as a customer even though you are giving them money for their service. You're probably already gritting your teeth just thinking about them.

Then your managers come to you and ask you to "integrate" with these vendors to get some data. These are managers who neither care nor need to know how atrocious this vendor is, and who have no idea what "data" or an "API" even is. They'll come in and say, "Alright, this vendor says they have an API for their customer database we have with them. They can't make the reports we need so you'll just need to figure it out." At this point they'll walk out of the room, feeling proud of their accomplishment of using acronyms, and go back to pretending that replying to an email is a difficult and technical job.

Now you're sitting here with a one-page printout of a marketing sheet that the vendor provided for their API, with no reference to any documentation. You head back to your desk and go to the vendor's "next-gen" site, filled with fancy buzzwords like "HTML 4.0" and "Java" and probably built when Netscape was still relevant, to search for the mythical API documentation. After much searching and many dead ends, you eventually find a PDF on Google that someone uploaded several years ago to their personal blog after battling through the vendor's first four levels of support technicians to get it.

You open up this closely guarded book of secrets expecting some spark of brilliance or order, but alas, you are immediately kicked in the shins by one of the foulest creatures you have ever laid eyes upon. You sigh, then immediately scroll down to the customer endpoints because that's what your boss needs to show the executives that "number go up", so that they can all nod and give themselves raises.

As a side note, let's say this vendor is holding a database of about five million customers.

You finally scroll past the endlessly detailed endpoints for things that it's doubtful anyone would ever use, and you lay eyes upon the customer endpoint.

/customers/

parameters:

  • deceased
    • true/false
{
    "customers": [
        1,
        2,
        3,
        ...
    ]
}

You dry-heave in your mouth, knowing the evil that has been given to you. A thoughtless design by a vendor attempting the absolute bare minimum. An interaction we all know too well. No pagination, no details, no filters of value, not even an ounce of operational value. A raw skeleton of apathy towards the consumer. You query it anyway, out of a desperate hope that the documentation is out of date. Your Postman spins and spins and spins. You spend a bit working on some tickets sent by an aggravated user who is mad because they pressed "delete" and the thing was deleted. Eventually you come back to this task.

"Status: 200 OK Time: 21 minutes, 32 seconds Size: 3.13 MB"

"Alright, this is doable," you try to reassure yourself. You then look for the individual customer endpoint.

/customers/{id}

{
    "id": 1,
    "links": {
        "demographics": "/customers/1/demographics",
        "billing_profile": "/customers/1/billing_profile",
        "orders": "/customers/1/orders"
    }
}

Here is where you know everything is going to go sideways. One "pull" of the customer database would be a minimum of 15,000,001 queries: one for the customer list, plus three sub-resource calls for each of the five million customers. Admittedly, these endpoints manage to crawl along at a blistering 2 seconds each. But that's okay, we live in a modern era with fancy technology such as multi-threading, don't we? But this is where the pain sets in. API request limits.

"Maximum of 1 request per 5 seconds."

The math immediately hits you like a brick. That's a maximum of 17,280 calls in a day. We're talking about roughly 868 days, a little under two and a half years, to pull all the customer data. About this time your boss comes by and asks, "What do you think? We'll have it by next week, riiigghhhtt? I already went ahead and told my boss that this will be an easy integration." You try to explain basic mathematical calculations to him, forgetting that numbers above one hundred are big and scary. Your manager mentally clocks out, smacks the back of your chair, chuckles, makes a bad joke about college football, then heads out on his four-hour "working" lunch.
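If you want to check the brick for yourself, here is a quick back-of-the-envelope sketch in Python. It only uses the figures already mentioned (five million customers, three sub-resources per customer, one request per five seconds). The variable names are mine, and it assumes the rate limit is the only bottleneck.

```python
# Back-of-the-envelope numbers for one full pull of the customer database.
CUSTOMERS = 5_000_000        # database size from the side note above
SUB_RESOURCES = 3            # demographics, billing_profile, orders
SECONDS_PER_REQUEST = 5      # the vendor's stated rate limit
SECONDS_PER_DAY = 24 * 60 * 60

# One call for the customer list, then three calls per customer.
total_calls = 1 + CUSTOMERS * SUB_RESOURCES
calls_per_day = SECONDS_PER_DAY // SECONDS_PER_REQUEST
days_for_full_sync = total_calls / calls_per_day

print(f"{total_calls:,} calls")
print(f"{calls_per_day:,} calls per day")
print(f"{days_for_full_sync:,.0f} days, or {days_for_full_sync / 365:.2f} years")
```

That works out to about 868 days, and that's before a single retry, timeout, or vendor outage.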

You know that there is no hope in contacting the vendor, as they still haven't replied to the last six tickets and twelve phone calls that you have made in the past three months for system-breaking issues. Your next hope is to contact the report writers and ask what data they actually need. Maybe we can shave down some API calls? Maybe we don't need to know whether they have a Visa or a Mastercard set up?

"Yes, we need absolutely all the data that you can get."

You're reminded that you're little more than a magical puppy that can puke out anything that anyone could ever wish for as long as you receive enough kicks.

But now the most painful dawning realization hits you. The one that hurts. Most of you have probably already seen it. Once you sync the entire customer database, you'll have to start all over again. You're unable to get a list of customers that have changed since the last sync. So every time you do a sync, you have to do a FULL sync. No partials. You can already hear the voice of the vendor right now:

"Doing a date modified filter would be incredibly difficult under our database architecture. I'm not even sure how high the bill would be for such an enhancement." - Every Vendor Ever

Now you're peeved. You're going to make this vendor pay for their sins. You create a second user account with a second API key. You run both at the same time. No conflict. Thank God that they're just as terrible at programming everything else as they are at their APIs. Here's the shining star in your dark night. You find the API endpoint for creating users, wire it up, and spit out 10,000 new users as fast as you can. At this point you have sheer horsepower on your side. Time to put those enterprise self-hosted virtual servers to work like never before. With sheer determination and a prayer that the vendor doesn't block this workaround, you're able to pull 172,800,000 calls a day.
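For the curious, this is roughly how that number falls out. It's a sketch with my own variable names, and it assumes the limit is enforced per API key and that the vendor's servers (and your network engineer) survive the experience.

```python
# Throughput once the rate limit is multiplied across many API keys.
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_REQUEST = 5        # per-key limit quoted by the vendor
API_KEYS = 10_000              # accounts created through the user endpoint
FULL_PULL_CALLS = 15_000_001   # minimum query count for one full pull

calls_per_key_per_day = SECONDS_PER_DAY // SECONDS_PER_REQUEST
calls_per_day = calls_per_key_per_day * API_KEYS
hours_for_full_pull = FULL_PULL_CALLS / calls_per_day * 24

print(f"{calls_per_day:,} calls per day")
print(f"{hours_for_full_pull:.1f} hours for a full pull")
```

Under those assumptions a full pull drops from years to a couple of hours, which says more about the rate limit than it does about your cleverness.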

You kick back, call yourself a magician, and pretend that anyone in your office knows your first name while your multi-threading monster is causing havoc in some network engineer's office.

But I digress. The problem identified here is something that really should never have existed in the first place. Just a few ounces of effort could have made this entire process a far better use of time for everyone involved. The biggest question to ask yourself when writing an API is always, "What are people going to do with this?"

Another fun one to ask is, "Will this make someone want to strangle me in a dark alley?"

With many vendors it often seems that this task is delegated to the greenest college graduate they can find, whose only familiarity with an API is a definition on Wikipedia.

For example, the original customer pull could be greatly improved with only a handful of incredibly simple changes. Just adding a date modified filter and pagination would dramatically increase the usefulness of this endpoint. And don't you dare set the max page size to 100 for an endpoint that hits millions. You know who you are. Then the customer-by-id endpoint could be simplified by moving the most common data from the subqueries up to the parent, such as the name, date of birth, and maybe the most recent order. Things that generally everyone would be looking for.
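To show how little effort this takes, here is a minimal sketch of a list handler with a date modified filter and cursor-style pagination. Everything in it is hypothetical: the Customer class, the modified_since, after_id, and page_size parameters, and the next_after_id field are names I made up for illustration, and an in-memory list stands in for whatever database the vendor actually runs.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Customer:
    id: int
    modified_at: datetime


def list_customers(customers, modified_since: Optional[datetime] = None,
                   after_id: int = 0, page_size: int = 10_000) -> dict:
    """Return one page of customer ids, optionally only those changed since a timestamp."""
    matches = sorted(
        (c for c in customers
         if c.id > after_id
         and (modified_since is None or c.modified_at >= modified_since)),
        key=lambda c: c.id,
    )
    page = matches[:page_size]
    has_more = len(matches) > page_size
    return {
        "customers": [c.id for c in page],
        # Cursor for the next request; None means the caller has everything.
        "next_after_id": page[-1].id if has_more else None,
    }


# Made-up sample data: only customers 2 and 3 changed in December.
db = [
    Customer(1, datetime(2020, 1, 1)),
    Customer(2, datetime(2020, 12, 1)),
    Customer(3, datetime(2020, 12, 10)),
]
print(list_customers(db, modified_since=datetime(2020, 12, 1), page_size=1))
# {'customers': [2], 'next_after_id': 2}
```

A real implementation would push the filter and the limit into the database query instead of sorting in memory, but the shape of the contract is the point: the caller asks for what changed, one sensible page at a time.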

And these are just changes that can be made from a very structured point of view, but we can get so much more creative and flexible than this. Imagine this endpoint:

/customers/{id}

parameters:

  • demographics: true/false
  • billing_profile: true/false
  • recent_orders: true/false
    • limit_recent_orders: 10
{
    "id": 1,
    "demographics": {
        "first_name": "",
        "last_name": ""
    },
    "billing_profile": {
        "type": ""
    },
    "recent_orders": [1,2,3,4,5]
}

This is an incredibly flexible API that can work well with an entire spectrum of use cases and needs. And odds are, from an infrastructure point of view, this will almost always place less stress on both systems involved than running four separate queries. Don't need the billing_profile section? Set it to false. Need only the most recent order? Awesome. limit_recent_orders=1.
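A handler for this doesn't need to be clever either. Here is a rough sketch, assuming every section defaults to on and using made-up in-memory dictionaries (DEMOGRAPHICS, BILLING, ORDERS) with made-up sample values in place of real tables. The function name get_customer is also my own invention.

```python
# Hypothetical in-memory stand-ins for the vendor's tables (sample data only).
DEMOGRAPHICS = {1: {"first_name": "Ada", "last_name": "Lovelace"}}
BILLING = {1: {"type": "visa"}}
ORDERS = {1: [9, 8, 7, 6, 5, 4, 3, 2, 1]}  # order ids, newest first


def get_customer(customer_id, demographics=True, billing_profile=True,
                 recent_orders=True, limit_recent_orders=10):
    """Build one response containing only the sections the caller asked for."""
    response = {"id": customer_id}
    if demographics:
        response["demographics"] = DEMOGRAPHICS.get(customer_id, {})
    if billing_profile:
        response["billing_profile"] = BILLING.get(customer_id, {})
    if recent_orders:
        # Slice so the caller never receives more orders than requested.
        response["recent_orders"] = ORDERS.get(customer_id, [])[:limit_recent_orders]
    return response


# Skip billing and ask for only the latest order, all in a single call.
print(get_customer(1, billing_profile=False, limit_recent_orders=1))
```

Sections that are switched off are never looked up at all, which is where the savings for both sides come from.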

Being an API developer doesn't have to be an abstract witchcraft of smoke and mirrors. Sometimes it's just as simple as asking yourself, "Would I want to query this?" If the answer is no, then you should probably take a good look at what you have and make a better version. And if you're able to ask this before the API hits production, you can go ahead and prevent the pain now.