REST API Best Practices: HTTP Status Codes and You, Part 1 - Introduction
How many HTTP Status Codes can you name? We all know a few of the major ones to be sure: 200 - OK, 301 - Moved Permanently, 404 - Not Found, 500 - Internal Server Error... Would you believe me if I told you there were more than 60? You don't have to; check it out for yourself: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
When building REST APIs, I often find people defaulting to the better-known HTTP Status Codes, to a fault. Sometimes this is because they are updating existing services where the misuse is already coded, and they duplicate that behavior as part of new REST API initiatives. Sometimes it is because they are using codes that work even though more specific, better-suited status codes exist of which they are unaware. An example is using a 403 Forbidden in place of a 429 Too Many Requests for rate limit exceptions. Some services have major aberrations, such as using HTTP Status 200 (OK) to relay everything, even failure messages. While REST APIs may be functional under these misuse conditions, a core concept of REST APIs is that they gravitate to orchestration and automation use cases, and by using HTTP Status Codes improperly, we do a disservice to consumers by limiting the logic they can execute around them.
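To make the 403 versus 429 point concrete, here is a minimal sketch of the kind of branching a consumer can only do when the status code is specific. The function name `client_action` and the returned strings are hypothetical, chosen purely for illustration; only the status codes themselves come from the HTTP standard.

```python
from http import HTTPStatus


def client_action(status: int) -> str:
    """Decide what a hypothetical client does next, based only on the status code."""
    if status == HTTPStatus.TOO_MANY_REQUESTS:  # 429: the caller is being rate limited
        return "slow down and retry later"
    if status == HTTPStatus.FORBIDDEN:  # 403: the caller is not allowed at all
        return "stop, retrying will not help"
    if 200 <= status < 300:  # any 2xx: the call succeeded
        return "continue"
    return "report unexpected status"
```

If the service answered a rate limit with a 403, or wrapped every failure in a 200, the first branch would never fire and the client would either give up or carry on when it should simply have slowed down.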
The proper use of HTTP Status Codes in REST API building is very powerful: they tell consumers/developers what is happening and why, so they can automate operations or provide programmatic orchestration. This helps not only the consumer, but can also have valuable effects on your service. One mindset I have come across, and which is worth addressing early on, is that status responses should be as generic as possible so that they cannot be interpreted for insight into attacks. The power and functionality provided by proper use of HTTP Status Codes should not be discarded to support security through obscurity. Not only is this not a security best practice, as we should not rely on generalizing errors to protect our services, but it also limits the SLA/metric-based insights into service consumption and performance that we use to analyze API/app/user behavior and identify threats, which actually does us a disservice from the security perspective.
To illustrate these points, let's look at an issue I see all too often: returning a 404 Not Found as a generic response when an application is down or non-responsive, perhaps for maintenance, instead of a 503 Service Unavailable. If a client receives a 404, its understanding is that the resource is not present or that it is calling the wrong location to access it. This tells the client that there is an issue on its end that it needs to fix. A developer may then spend cycles reviewing the error, attempting to call higher-level APIs to discover where the resource may be, opening support tickets, etc. Or perhaps a developer is aware that during an outage a 404 is returned when in fact the resource is only temporarily unavailable, so they simply have their client retry the call repeatedly in an attempt to get a response, adding unnecessary load to the system.
Now let's look at this from the perspective of returning an appropriate 503 to the client. It relays the insight that the service is down and that the issue is not on the client's end. The developer will not waste as many cycles investigating the issue. The client might have logic added to wait a while before attempting a new call, fail over to another endpoint to continue operations, or simply surface this information to show that the client is working as expected but there is an outage on the resource side that needs to be addressed.
For your services, your support team spends fewer cycles looking into why the developer cannot access the resource, you receive fewer resource calls while the service is down, which might mean fewer queued requests and less skewed metrics, and the 503 response can be monitored or reported, which may lead to a more rapid resolution of unexpected outages. This simple case of the difference between using a 404 and a 503 HTTP Status is just one of many ways you can better architect your REST APIs to allow for best use.
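The "wait a while before attempting a new call" logic could look something like the sketch below. It assumes the service sends the standard `Retry-After` header with its 503 and gives the delay in whole seconds; the function name `seconds_until_retry`, the `default_wait` value, and the plain dictionary of headers are all hypothetical. A real client would likely also handle the HTTP-date form of `Retry-After` and case-insensitive header names.

```python
from typing import Mapping, Optional


def seconds_until_retry(
    status: int,
    headers: Mapping[str, str],
    default_wait: float = 30.0,  # assumed fallback, not a standard value
) -> Optional[float]:
    """Return how long to wait before retrying, or None if retrying is pointless."""
    if status == 503:
        # The service is down; honor its hint if it gave one in seconds.
        retry_after = headers.get("Retry-After", "").strip()
        if retry_after.isdigit():
            return float(retry_after)
        return default_wait
    # A 404 (or anything else) says the problem is not a temporary outage,
    # so blindly retrying would only add load.
    return None
```

With a 404 standing in for an outage, this function has nothing to work with: the client cannot tell "gone" from "down", which is exactly the guessing game described above.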
In the following weeks, I hope to publish some insights on the proper usage of HTTP Status Codes when building REST APIs. To start, let's explore the five major categories of HTTP Status Codes as they pertain to Monopoly.
1xx: Flip a property card. (Informational)
2xx: Passed GO, Collect $200! (Success)
3xx: Doubles... reroll, or Advance to X. (Redirection)
4xx: Go directly to jail, do not pass GO, do not collect $200! (Client Error)
5xx: Bank error, collect $200! (Server Error)
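Because the category is simply the first digit of the code, a consumer can fall back on it even for a specific code it has never seen. A minimal sketch, where the function name `status_class` is hypothetical and the labels follow the list above:

```python
def status_class(code: int) -> str:
    """Map an HTTP status code to its category using its first digit."""
    classes = {
        1: "Informational",
        2: "Success",
        3: "Redirection",
        4: "Client Error",
        5: "Server Error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return classes[code // 100]
```

For example, `status_class(429)` returns "Client Error", so a client that does not know 429 specifically still knows the problem is on its side of the board.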
First up, we will explore the HTTP Status Code 2xx series. Success - Continue forth, pass GO! These are the ideal outcomes from clients calling our services. Learn more in the upcoming post.
Deep dive into HTTP Status Codes in the 2xx range here: https://danielwille.blogspot.com/2018/04/rest-api-best-practices-http-status_9.html