When you add another API into your software, its impact is felt long after that initial integration. Developers also need to maintain the connections, which includes identifying issues as they arise. Too often, the problems don’t surface until a user reports a bug. For most companies, that’s way too late, but building tools to expose the problems is only an option for the most show-stopping features.
In this discussion, we’ll look at the user experience of API errors—and just how common they are. And we’ll see what’s needed to reduce these API headaches.
This Hangout features:
- Adam DuVander - Principal at EveryDeveloper.com
- Co-founders of Hoss.com, Matt Hawkins, Cameron Cooper, and Trung Vu
Hoss Hangouts #1 - Transcript
Matt Hawkins: Thanks, guys, for joining us today. I'm Matt Hawkins, CEO and co-founder of Hoss. I'm here with my two co-founders, Cameron and Trung, and we're excited to be joined by Adam DuVander for our discussion today. Adam is a developer and marketer who has worked with API companies like Zapier and SendGrid, and held writing and editorial positions at Programmable Web and Wired. He has a ton of experience with APIs. Would it be fair to call you Mr. API, Adam?
Adam DuVander: It wouldn’t be the first time, sure. It’s good to be here.
Matt: Anything about your background that I missed?
Adam: No, just working with dev tool companies at EveryDeveloper now to help them better communicate all the topics around APIs.
Matt: So let’s dive in. At Hoss, we've talked a lot about how the average team uses over 18 APIs to power their applications.
Adam, I'd love if you can walk us through an example that you've shared previously about the complex web of APIs that is required to build even a very simple bike sharing app.
Adam: Yeah. And, you know, even I hear that number, that 18 APIs number, and that just seems like it can't be, that has to be too big. But then you think through some of the different things. The bike sharing example, I mean, you have geolocation stuff happening there. You have to be able to communicate with the end user, probably in a mobile app. So I mean, you might get to 18 APIs just with the cloud infrastructure to connect all that. Look at your AWS dashboard and you see all the different tools that you need to use, but even around the mapping, you have the actual map that you're displaying, but there also has to be some way to be able to keep track of your fleet of bikes. And likely you're going to be talking to that through an API and then knowing where each one is - and does that look like a latitude and longitude point? And does that compute in your brain? Probably not. So you need some tool to be able to say, you know, here's the address or the intersection where that is.
It then goes into, like, how do you communicate with your customers? Do you send them SMSs and emails? Probably both. So you're going to need tools for that. I mean, you mentioned my background at SendGrid that, you know, that was one of the things that we often said. It's basically like every application needs email. It's got to be in there somewhere. And if someone says no, you say, okay, what happens if someone loses their password?
Also, if you're running a bike share app, you likely are charging people money to rent those bikes. So you have to at least have a Stripe to be able to call into. And maybe you are taking multiple forms of payment because someone says, “Oh, I don't want to give you my credit card. I want to be able to use Apple Pay.”
I mean, I wasn't counting there, but we might've gotten to 18.
Matt: So I mean, as this complexity builds, what are some of those things that can go wrong with this bike sharing app and who does it affect?
Adam: Yeah. So, each of those different services that I mentioned there could have issues. I mean, we mentioned some big names there that you probably can rely on, like Google for the maps, or Stripe, right, but even those have downtime. Shh, don't let Google know that we noticed, but it has downtime. It has degraded performance.
And you asked where it shows up. I mean, it eventually shows up, unless you've somehow coded around it completely, in the user experience. And in my experience, a lot of times, you don't even know. That's the problem. So many of these apps are relying on so many APIs, and the people who built them don't actually know if they had a problem.
Matt: Yeah, we hear that all the time from our customers, too. Prior to becoming one of our customers we ask them how they know when they’re having issues with these integrations and the number one answer is that their customers tell them. And they know that's not an acceptable answer, but that's the state of things today because of the complexity. It's because of those 18 APIs, and oftentimes many more than that. It's tough to manage.
So, taking a step back for a second - I’ll jump over to you, Trung - what are some common API issues, from your experience? Are there problems that are a lot more common than others that you generally see as a developer?
Trung Vu: Yeah, for sure. So, generally, I usually categorize API issues into two buckets: issues that are caused by the API producer and issues that are caused by the API consumer. With the first bucket, the common issues in that bucket would be server downtime, network issues, bugs caused by bad code or bad data. To rub salt into the wound, the way these errors or issues manifest themselves to the consumer or to the end user, are very different and vary by provider.
So if you think that you can have one way to handle errors, one standard way, you’re in for some bad times because the moment you integrate with another API, it’s going to break in a completely different way. And you’ll have no idea until your customers tell you about it.
Then there's the second bucket, which is most often overlooked by API consumers - the issues that are caused by them. So they have bad requests, they have requests being rate limited because they went over their usage, but then there can also be network issues. So what if my request leaves my server, but never makes it into the API provider’s? You just get an error, and the API producer doesn't have any issues because they actually never received the request, but your customers see an error. How do you know about that? Most of the time, you don’t.
So yeah, I would say those are the most common issues that API consumers face.
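Trung's two buckets can be approximated from what comes back on each call. The following is a minimal Python sketch, not anything Hoss or the speakers describe building: the `Bucket` enum and `classify` function are hypothetical names, and it assumes the provider uses conventional HTTP status codes. As Trung notes, providers report errors differently, so a real integration would likely need per-provider rules (some APIs return a 200 with an error in the body).

```python
from enum import Enum
from typing import Optional


class Bucket(Enum):
    OK = "ok"
    CONSUMER = "consumer"  # bad request, rate limit, bad credentials
    PRODUCER = "producer"  # provider downtime or provider-side bugs
    NETWORK = "network"    # no response at all; the request may never have arrived


def classify(status: Optional[int]) -> Bucket:
    """Map an HTTP status code (None if no response arrived) to a bucket.

    Assumption: conventional status codes, where 4xx points at the caller
    and 5xx points at the provider.
    """
    if status is None:
        return Bucket.NETWORK
    if status >= 500:
        return Bucket.PRODUCER
    if status >= 400:
        return Bucket.CONSUMER
    return Bucket.OK
```

Keeping the no-response case separate matters for the scenario Trung describes: the provider sees nothing wrong because it never received the request, so only the consumer can record it.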
Matt: Yeah, it's a great point about, you know, you're adding new APIs, that introduces new gaps in your visibility. Right. And, then if you don't have the foresight, that gap is going to exist until a customer runs into an issue. And you experience churn and your net promoter scores go down and word of mouth is impacted.
That's a great point. Well, continuing on problem areas, Cameron, I don't know if you have an example of an especially headache-inducing API issue that you've dealt with. I mean, you've built, by now, thousands of integrations.
Cameron Cooper: Yeah, actually it's interesting that you mentioned visibility, because I feel like a lot of the biggest headaches that I’ve dealt with have been rooted in visibility. And that could be while I'm in development or that could be after I've gone to production. So for example, when I'm in development, I'm writing an integration with a new platform that maybe I haven't integrated with before, shipping that to development environments that are making API calls. Not having visibility into those API calls, and the responses that you're getting when you're developing - it can be really frustrating because you don't know, did the call work? Did you get an error back? And so then you find yourself building sort of ad hoc visibility solutions, writing debug logs, reading through those logs to see the request, maybe the response that you got back, and did it work. A lot of times maybe you didn't have a success, so you need to copy and paste those into an email and send those off to a developer support advocate on the other side.
So visibility has been a really big problem for me while in development mode. But it also continues when you're in production, because when you've deployed that integration, and now it's in operations, you're a lot of times driving a user-facing feature with that integration. And by default, you get nothing in terms of reliability and visibility and whether or not that integration is continuing to work.
So, I've had integrations in the past where it stops working maybe with an intermittent failure. It works sometimes, but not all the time. That results in an error that surfaces all the way up to a user who might be using the app. And if you don't have visibility - just basic visibility - they're the ones that are telling you that something's broken. And that's been really frustrating for me because, you know, you make these integrations, you expect the API provider to be good, and that when it works in development, it will continue to work in production. That's just not the way that it works. So visibility has been kind of the biggest headache for me.
Matt: Yeah. That's interesting. And so, you know, when you add a new integration, I mean, how do you think about that ongoing maintenance and an ongoing cost? I mean, you're managing teams, you've been a CTO at a number of organizations. Like how do you think about it? And then I guess maybe if you have insight on your peers too, if you think about it differently than most.
Cameron: Yeah. I think that APIs have this really interesting hidden cost. A lot of times that comes in, you know, we kind of talked about user experience, but with engineering costs, there's a huge, hidden cost there. Because once you find out about something, whether it's from a user or you've seen an error in your logs, you start to realize that you need to build additional things, whether it's additional monitoring and reporting or additional reliability features. Things like automatic retries, or even failover from one API provider to another.
These are things that become real costs to an engineering team. They result in tickets that go on your backlog - things that you know that you need to do and eventually want to do - but you're always having to wrestle with that decision of, do we do this now? Or probably put it off for a little bit longer and always having to wonder, like, am I going to pay the price with another incident, another outage, because I decided to defer this? Because you have to decide do I want to build a new feature or do I want to build reliability. That's always a really tough decision for engineering teams to have to make.
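The reliability features Cameron mentions, automatic retries and failover from one provider to another, can be sketched in a few lines, which also hints at why the full version becomes backlog work. This is a minimal Python sketch under stated assumptions: `call_with_failover` is a hypothetical helper, each provider is represented by a zero-argument callable, and any exception counts as a failure. A production version would likely catch narrower errors and avoid retrying requests that are not safe to repeat.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(
    providers: Sequence[Callable[[], T]],
    retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each provider in order, retrying each with exponential backoff."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(retries):
            try:
                return provider()
            except Exception as exc:  # assumption: every exception is retryable
                last_error = exc
                # Back off before the next retry, but not before failing over.
                if attempt < retries - 1:
                    sleep(base_delay * 2 ** attempt)
    raise RuntimeError("all providers failed") from last_error
```

The `sleep` parameter exists only so the backoff can be replaced in tests; callers would normally leave it at the default.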
Matt: Yeah. Absolutely. Anybody else have any thoughts on that?
Adam: So it reminded me of the time at Zapier - Zapier connects to, I think I saw now, two thousand different apps. Right? And like the entire idea of the platform is to keep adding APIs.
So I think for a lot of people, Cameron, to your point, the question is, do you integrate another API? Like, does the benefit outweigh the ongoing cost you're talking about of being able to maintain that integration? At Zapier, we didn't have a choice whether to integrate more. It's a platform, and SaaS companies are bringing their APIs to it. So yeah, the engineering team there had to be able to tune things and had to be able to look in - and not just the engineering team, actually - it was also support. Because you get non-technical users, and this can happen with a normal app, where a non-technical user emails in and says it doesn't work.
And that's not a lot for a technical person to go on. And even less, if you don't have access to any sort of logging data, any kind of information about the calls that that user has. And so at Zapier, we had all of that, every call we could know how long it took, we could look in and see what the data - with some things removed, things like API keys, you couldn't get to - but being able to have that insight was really important because Zapier is integrated with 2,000 APIs. But that’s the kind of insight that I think any developer would want to have if they could. But I’ll tell you at least on the Zapier side, having talked to the engineers who built that over multiple years, it’s a lot of work to be able to create a system that gives you that kind of insight.
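The kind of per-call record Adam describes, with timing captured and credentials removed, might look something like the following. This is a minimal Python sketch, not Zapier's actual system: `CallRecord`, `redact`, `record_call`, and the header names in `SENSITIVE_HEADERS` are all assumptions for illustration, and `send` stands in for whatever function actually performs the request and returns a status code.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumed header names; real providers may carry credentials elsewhere.
SENSITIVE_HEADERS = {"authorization", "x-api-key"}


@dataclass
class CallRecord:
    api: str
    status: int
    duration_ms: float
    headers: Dict[str, str]


def redact(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the headers with credential values masked."""
    return {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


def record_call(
    api: str,
    headers: Dict[str, str],
    send: Callable[[], int],
    log: List[CallRecord],
) -> int:
    """Time one outbound call and append a redacted record to the log."""
    start = time.perf_counter()
    status = send()
    duration_ms = (time.perf_counter() - start) * 1000
    log.append(CallRecord(api, status, duration_ms, redact(headers)))
    return status
```

With records like these, a support person handling an "it doesn't work" email has the status and duration of that user's calls to go on, without ever seeing an API key.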
Matt: Absolutely. And you touched on an interesting point there, Adam, right? At the end of the day, the end user doesn't care. If they purchased a service that, you know, in part, because it integrates with QuickBooks, if there's an issue where that sync isn't working, in their mind, they're not thinking, “Oh, it's okay, I’m going to let so-and-so off the hook, there must have been an issue with the QuickBooks API.” They’re like, “No, that was part of your core value prop to me. You’re down, you’re having an issue, I want my money back.” It’s just so core to, you know, as these organizations rely on APIs, and that’s how they sell themselves, it becomes this compounding problem.
So I'm jumping around a little bit, you know, talking about monitoring. Zapier is tracking 2,000 APIs. We did a study at Hoss. We surveyed developers in a bunch of different communities. And one of the insights that we learned was that half - 50 percent - of teams do not monitor the third-party APIs that they consume. And I guess my question to the group is, why not? Why don't they?
Trung: Well as a developer, I can give my personal experience. I think developers, especially those that have less experience, they assume that once they write their code, everything will work. There will be no issues with the API they’re calling. And maybe there’s a mental gap between calling something that I write directly versus something someone else owns. Thinking I don’t have the same responsibility. I just don’t have that mindset - like, oh, this is something I didn’t write, but it can impact my customers. So I have to be prepared for that mindset. That mindset takes a while for developers to obtain after many battle scars dealing with headache-inducing APIs. So yeah, that’s my experience. Anything to add, Cameron?
Cameron: Yeah, no, I think you're totally right. It starts with not having enough experience to know that this is something that will happen, something you have to deal with. But then I think on the other end of the spectrum, when you do appreciate that this is a problem and that you actually need to do something about it - this becomes a very big and large problem that you have to solve. There’s a lot of tooling and infrastructure that you have to build and maintain. This becomes tickets on your backlog, things you need to start doing and cost for infrastructure to support it. I think that’s the other reason people don’t do it is because it’s a lot of work.
Adam: It’s hard, yeah.
Cameron: If they’re going to try to dip their toe in the water and just solve one piece at a time, it just takes a long time to get there.
Adam: And there are a lot of different meanings of what monitoring could be and how deep they choose to go in that. I mean, at the very simple level, you can say, “Oh, I call these 18 APIs. Let me ping each of these, you know, once a minute, and there we go, that's all I need to do.”
Well, you might get a response from each of those 18 pings every minute, but that doesn't mean it's the right response. So it's like, does the right data come through? Does it come fast enough? And, you know, am I able to make a follow-up call that needs to use something from the first one? And I mean, there are monitoring tools that allow you to do this, but that is a lot of effort to be able to put together these tests. And let's just say you go to the effort to do that. So you have these 18 APIs. You have like all of the calls that you make and everything strung together. Now you change your app. You change the way you call one of them. You add a new API.
Do you go back and recreate this special web of monitoring calls to match the reality of your app? Like, yeah, in a perfect world, you do. But the reality of developer work is that that is going to get left behind. If it ever looks like the reality, it only looks like it for that day. And then you’ve moved onto other things.
That’s why, if you look at those monitoring tools, a lot of them are focused on the provider side. Because that’s the more likely market. It’s more likely that someone wants to say, “I am providing an API. There’s maybe an SLA I’ve provided alongside this. So I want to know when my API goes down.” That’s great, except that they have one API and developers are out there using 18. So it doesn't really solve the problems of the consuming developer.
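Adam's distinction between a ping that answers and a response that is actually right can be made concrete. The following minimal Python sketch checks the questions he lists (does the right data come through, does it come fast enough) against a single observed call. The `Observation` type, the `check` function, and the field names in the usage note are hypothetical; it assumes a JSON-style response already parsed into a mapping.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping


@dataclass
class Observation:
    status: int
    body: Mapping[str, Any]
    elapsed_ms: float


def check(obs: Observation, required_fields: List[str], max_ms: float) -> List[str]:
    """Return a list of problems; an empty list means the call looks healthy."""
    problems = []
    if not 200 <= obs.status < 300:
        problems.append(f"unexpected status {obs.status}")
    missing = [name for name in required_fields if name not in obs.body]
    if missing:
        problems.append("missing fields: " + ", ".join(missing))
    if obs.elapsed_ms > max_ms:
        problems.append(f"slow response: {obs.elapsed_ms:.0f} ms > {max_ms:.0f} ms")
    return problems
```

For example, a geocoding response with status 200 that is missing a `lng` field and took 1,200 ms would pass a simple ping, yet `check` would flag it on both counts. The maintenance burden Adam raises applies here as well: the list of required fields has to change whenever the app changes how it calls the API.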
Cameron: Yeah. You said something really important there, which is a lot of the errors that you encounter are very subjective to the request you made, the credential you used, the network that you were on. Those monitoring tools can’t necessarily solve for that, but they can tell a provider if they’re up or down, because those are less context-sensitive.
Matt: Absolutely. So, the final question goes to you, Adam, because you’re the guest of honor. You’ve said that companies need to do more than just monitor their APIs, that they need, I think how you put it was, “extreme awareness” of their network. So what does extreme awareness look like to you?
Adam: Yeah, I think it goes back to the topic we were just discussing, which is to know all the aspects of the requests you’re making. And to be able to have that - I would say that the system Zapier has is “extreme awareness” - and for them, it’s out of necessity that they have that and to be able to look and see what is the status code of this particular error that came back for this particular user.
I don’t think it has to be like the security guard in the movie with all the screens, like there doesn’t need to be a developer who’s like, “Oh, User 12345 has an error, and I’m going to go investigate that error now.” Right? I think that level of awareness is beyond extreme. But to be able to see that in aggregate, and to be able to say that not just one particular user, but we’re seeing something at a level that we don’t expect to be able to see - that kind of awareness I think is not typical and expected now, but I think we’re moving into a time when that is going to be expected. And developers need to be able to have access to something that gives them that awareness. And when User 12345 emails in and says it doesn’t work, then they can go in and investigate that one thing. But it’s not like the alarms have to blare for one particular user. The alarms blare when you see things that are consistently happening, because really the issue is more likely to be with one API at a time than one user at a time.
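The aggregate view Adam describes, where alarms fire for an API that is consistently failing rather than for one user's error, could be sketched as an error-rate check per API. This is a minimal Python sketch under assumptions: `apis_to_alert` is a hypothetical function, each call is reduced to an `(api_name, succeeded)` pair, and the threshold and minimum call count are illustrative values, not recommendations from the speakers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def apis_to_alert(
    calls: Iterable[Tuple[str, bool]],
    threshold: float = 0.2,
    min_calls: int = 20,
) -> List[str]:
    """Return the APIs whose error rate exceeds the threshold.

    Each call is an (api_name, succeeded) pair. APIs with fewer than
    min_calls observations are skipped so one failure cannot trip an alarm.
    """
    totals: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for api, succeeded in calls:
        totals[api] += 1
        if not succeeded:
            failures[api] += 1
    return sorted(
        api
        for api, total in totals.items()
        if total >= min_calls and failures[api] / total > threshold
    )
```

Individual failures still need to be stored somewhere so that, when User 12345 emails in, someone can look up that one call. The alert itself only looks at the aggregate.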
Matt: Absolutely. That's a systemic problem. That's an issue that has to be addressed. That's affecting a group of customers, not just one. Yep, absolutely. Excellent point.
Well, Adam, thanks for joining us. Teams that will be watching this might want to learn more about content and especially, you know, talking to developer audiences. How can people find you?
Adam: Yeah, EveryDeveloper.com is the best place to look.
Matt: Fantastic. Love the URL. Well, Adam, thanks again, Trung, Cameron, this has been great. You can learn more about Hoss at Hoss.com.