Are you ready to learn how to become a superhero in the world of GraphQL? By giving clients superpowers, GraphQL opens up a world of possibilities, but as Uncle Ben famously said, “with great power comes great responsibility.” In the next few sections, we’ll dive into some strategies that will help you secure your GraphQL server from any nefarious queries lurking in the shadows. It’s time to don your cape and protect your GraphQL endpoint like the true hero you are! 🦸♀️🦸♂️
What is GraphQL?
GraphQL is a query language for APIs (although often confused with being a database technology) that was developed by Facebook in 2012 and was open-sourced three years later. It is a powerful alternative to REST APIs and enables more efficient and flexible data fetching.
In essence, GraphQL is designed to let clients specify exactly what data they need and receive only that data in response. This is one of its key advantages over traditional REST APIs, where clients often receive more data than they need (overfetching) and have to make multiple requests to get all the necessary information (underfetching).
Suppose we have a car encyclopedia platform with the following REST API endpoints:
GET /cars
GET /cars/{id}
GET /cars/{id}/parts
GET /manufacturers/{id}
GET /manufacturers/{id}/cars
To retrieve a car and its parts, we need to make multiple requests to different endpoints. For example, to retrieve the car with ID#123 and its parts, we would need to make the following requests:
GET /cars/123
GET /cars/123/parts
With GraphQL, clients can specify the shape of the data they need using a hierarchical structure, which is called a query. Here’s an example of a GraphQL query to retrieve the car with ID 123 and its parts, along with the manufacturers who made the car:
query {
  cars(id: "123") {
    id
    model
    price
    manufacturers {
      id
      name
    }
    parts {
      id
      model
      manufacturers {
        id
        name
      }
    }
  }
}
This query specifies the exact data we want to retrieve and its structure. The response will only include the requested data, which can improve performance and reduce network usage. Additionally, we can retrieve data from multiple resources in a single request, which can simplify our code and reduce the number of API calls we need to make.
Seems Great! But why do we need extra security?
By now you may have got the idea that GraphQL gives enormous power to clients (It’s Over 9000!!!). Because clients can craft extremely complicated queries, our servers need to be ready to handle them. These requests may be malicious ones coming from bad clients, or simply very big ones coming from good clients, and either kind could take our GraphQL server down. So, it is our responsibility to put some strategies in place to mitigate these risks.
Timeout
Most of the time we just use the simplest protection, a timeout, and call it a day. But for GraphQL, setting the optimal timeout is tough: a legitimate complex query may need more than the time allowed, which can result in strange behavior. Also, damage may already be done before the timeout halts execution. So what we can do is use the timeout as a final line of defense and apply some more advanced strategies beforehand.
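As a minimal sketch of that final line of defense, here is what a server-side timeout wrapper could look like, using Python’s asyncio. `execute_query` is a hypothetical resolver function standing in for your actual GraphQL execution; the names and the 10-second default are assumptions for illustration. Note that the timeout only caps runtime, it cannot undo work already done before it fires.

```python
import asyncio

# Hypothetical last-resort timeout around query execution.
# execute_query is a stand-in for your real GraphQL executor.
async def execute_with_timeout(execute_query, query, timeout_s=10):
    try:
        return await asyncio.wait_for(execute_query(query), timeout=timeout_s)
    except asyncio.TimeoutError:
        # The query ran too long; abort it, but any side effects
        # performed before this point have already happened.
        raise RuntimeError("Query execution exceeded the server timeout")
```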
Maximum Query Depth
GraphQL provides a strongly typed schema that allows for validation and documentation of the API. As we covered earlier, clients using GraphQL may craft any complex query they want. Since GraphQL schemas are often cyclic graphs, this means a client could craft a query like this one:
query IhaveComeToBargain {        # depth: 0
  bargainer(id: "strange") {      # depth: 1
    dormammu {                    # depth: 2
      bargainer {                 # depth: 3
        dormammu {                # depth: 4
          bargainer {             # depth: 5
            dormammu {            # depth: 6
              bargainer {         # depth: 7
                # You cannot do this forever . . .
                # Actually I (client) can!!!
              }
            }
          }
        }
      }
    }
  }
}
In the above example, the query already reaches a depth of 7, and the client could keep nesting indefinitely. If we know our schema well, we can estimate how deep a legitimate query can go and set a Maximum Query Depth threshold. The GraphQL server can then accept or reject a request based on its depth by analyzing the query document’s abstract syntax tree (AST).
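To make the idea concrete, here is a deliberately simplistic sketch of a depth check. A real server would walk the parsed AST (libraries such as graphql-depth-limit for graphql-js do this); here we only count selection-set nesting in the raw query string, ignoring edge cases like braces inside string literals. The `MAX_DEPTH` value of 5 is an arbitrary example threshold.

```python
# Simplistic depth check: count brace nesting in the query text.
# A production implementation would analyze the parsed AST instead.
def query_depth(query: str) -> int:
    """Return the maximum nesting depth of selection sets in a query."""
    depth = max_depth = 0
    for ch in query:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    # The outermost braces belong to the operation itself, which the
    # article numbers as depth 0, so subtract one.
    return max_depth - 1

MAX_DEPTH = 5  # example threshold; tune it to your schema

def validate_depth(query: str) -> None:
    d = query_depth(query)
    if d > MAX_DEPTH:
        raise ValueError(f"Query depth {d} exceeds maximum of {MAX_DEPTH}")
```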
Query Complexity
Sometimes, depth alone cannot stop all abusive queries. For example, a query requesting an enormous number of nodes at the root will be very expensive but is unlikely to be blocked by a query depth analyzer. Also, some fields in our schema are more expensive to compute than others. To calculate how complex a query is, let us give each field a complexity of 1. Then take a look at this example:
query {
  car(id: "123") {    # complexity: 1
    manufacturer {    # complexity: 1
      name            # complexity: 1
    }
  }
}
But what if we wanted more cars produced by the manufacturer of car #123? Then we should also consider arguments when calculating complexity.
query {
  car(id: "123") {        # complexity: 1
    manufacturer {        # complexity: 1
      name                # complexity: 1
      cars(first: 100) {  # complexity: 100
        model             # complexity: 1
      }
    }
  }
}
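The calculation above can be sketched in a few lines. Here each field is modeled as a `(name, arguments, children)` tuple, an assumed toy representation rather than a real AST; a field costs 1 unless a `first`/`last` argument makes it cost that many nodes, matching the annotations in the example. Real analyzers (such as graphql-query-complexity for graphql-js) work on the parsed query and can also multiply child costs by list sizes.

```python
# Toy complexity scorer over (name, args, children) tuples.
# A field costs 1, unless first/last asks for that many nodes.
def complexity(field) -> int:
    name, args, children = field
    cost = args.get("first", args.get("last", 1))
    return cost + sum(complexity(child) for child in children)

# The query from the example above, in the toy representation:
query = ("car", {"id": "123"}, [
    ("manufacturer", {}, [
        ("name", {}, []),
        ("cars", {"first": 100}, [
            ("model", {}, []),
        ]),
    ]),
])

print(complexity(query))  # 1 + 1 + 1 + 100 + 1 = 104
```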
Throttling
The above solutions are great when we want to stop clients from making overly large queries. But what if a client tries to abuse our server with a lot of medium-sized queries?
In REST APIs, a simple throttle is enough to stop clients from requesting resources too often. In GraphQL, throttling on the number of requests alone won’t help us. Also, the server has no idea how expensive an acceptable request is, because that is effectively defined by the clients.
Throttling Based on Server Time
We can estimate how expensive a query is by the server time it needs to execute. Then we can set a maximum amount of server time a client can use over a certain time window. Once we decide how much server time should be given back to a client over time, we can implement the leaky bucket throttling algorithm.
Suppose we set the maximum server time (the bucket size) to 2000ms and give clients back 200ms of server time per second (the leak rate). If a single query takes 500ms to complete, the client can call this query 2000ms / 500ms = 4 times. After that, any further execution is blocked until more server time becomes available. After 3 seconds, the bucket will have refilled by 600ms and the query can be executed one more time.
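A minimal sketch of that leaky bucket, using the numbers from the example: a 2000ms bucket that refills at 200ms of server time per second. The class name and the injectable clock (which makes the behavior testable) are illustrative choices, not a prescribed API.

```python
import time

# Leaky bucket over server time: each client gets capacity_ms of
# execution budget, refilled at leak_rate_ms_per_s.
class ServerTimeBucket:
    def __init__(self, capacity_ms=2000, leak_rate_ms_per_s=200,
                 clock=time.monotonic):
        self.capacity = capacity_ms
        self.leak_rate = leak_rate_ms_per_s
        self.clock = clock
        self.available = capacity_ms
        self.last_refill = clock()

    def _refill(self):
        now = self.clock()
        elapsed_s = now - self.last_refill
        self.available = min(self.capacity,
                             self.available + elapsed_s * self.leak_rate)
        self.last_refill = now

    def try_consume(self, cost_ms: float) -> bool:
        """Charge cost_ms of server time; False means the client must wait."""
        self._refill()
        if self.available >= cost_ms:
            self.available -= cost_ms
            return True
        return False
```

With a 500ms query, the first four calls succeed, the fifth is rejected, and 3 seconds later one more call fits into the 600ms that has leaked back in.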
Throttling Based on Query Complexity
Throttling based on time is good because clients now have to call complex queries less often, since they take more time to compute, and can call simple queries more often, since they finish faster. The problem is that we need to inform clients of these constraints so they can configure themselves accordingly. But server time is hard to communicate: a client doesn’t know how much time a query will take without trying it first, which requires trial and error. The server might also have other work running that slows down query execution.
The GitHub public API uses some interesting strategies to throttle its clients. It imposes two limits (a node limit and a rate limit) to protect its APIs. For the node limit, node calculations are done exactly the way we saw earlier in the Query Complexity section. For the rate limit, the cost is calculated from the number of requests needed to fulfill a query. But we know a single GraphQL query makes a single request to the server. How does that even work!? 🤔 Applying their strategy to our cars example makes things a bit easier to understand.
query {
  manufacturer(first: 50) {       # complexity: 50
    name                          # complexity: 1
    cars(first: 100) {            # complexity: 100
      model                       # complexity: 1
      parts(last: 20) {           # complexity: 20
        model                     # complexity: 1
        manufacturer(first: 10) { # complexity: 10
          model                   # complexity: 1
        }
      }
    }
  }
}
Here we need to add up the number of requests needed to fulfill each unique connection in the call. We count a request each time we need to connect to a unique list to get its items. Each connection can also have its own point value, because multiple systems (databases, legacy or third-party systems) can sit behind a single, coherent GraphQL API, so each connection can put a different load on the server. For simplicity, we give each connection a fixed cost of 1 point here.
- First, we are getting the first 50 manufacturers from the list. Although the complexity is 50, the API has to connect to the manufacturers list once. So, total requests = 1.
- Then, we are getting 100 cars made by the manufacturers which we got from each of the 50 manufacturers. So, total requests = 50.
- Then, we are returning 20 parts from each of the probable 100 cars x 50 manufacturers = 5000 cars. So, total requests = 5000
- If we dive more, we are getting 10 manufacturers from each of the potential 5000 cars x 20 parts = 100,000 parts. So, total requests = 100,000
- Total = 100,000 + 5000 + 50 + 1 = 105,051
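The steps above follow a simple pattern: each connection costs one request per parent node it is fetched under, and each fetched node becomes a parent for the next level. A small sketch of that calculation (the function name and list-of-page-sizes representation are illustrative assumptions):

```python
# Request count for nested connections: one request per parent node
# at each level; fetched nodes become the parents of the next level.
def total_requests(page_sizes) -> int:
    """page_sizes: first/last values, outermost connection first."""
    total, parents = 0, 1
    for size in page_sizes:
        total += parents   # one request per parent to fetch this list
        parents *= size    # each fetched node is a parent below
    return total

# manufacturer(first: 50) -> cars(first: 100)
#   -> parts(last: 20) -> manufacturer(first: 10)
print(total_requests([50, 100, 20, 10]))  # 1 + 50 + 5000 + 100000 = 105051
```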
As the number can get even bigger in the real world, we can normalize the number by dividing it by 100 to get the final aggregate cost. In our example, we get a final score of around 1051. Then we can set the optimized bucket size and leak rate accordingly and express them to the clients.
That’s it!
We have discussed some of the approaches to secure our GraphQL server against unexpected queries. You can also use data loaders and implement caching to further minimize the number of requests to backing data sources. But keep in mind that none of these approaches can solve world hunger on its own. So, we should know their limits and combine them accordingly.