https://dev.to/brentmitchell/after-5-years-im-out-of-the-serverless-compute-cult-3f6d
I have been using serverless computing and storage for nearly five years and I'm finally tired of it. I do feel like it has become a cult. In a cult, brainwashing is done so gradually, people have no idea it is going on. I feel like this has happened across the board with so many developers; many don’t even realize they are clouded. In my case, I took the serverless marketing and hype hook, line, and sinker for the first half of my serverless journey. After working with several companies small and large, I have been continually disappointed as our projects grew. The fact is, serverless technology is amazingly simple to start, but becomes a bear as projects and teams accelerate. A serverless project typically includes a fully serverless stack which can include (using a non-exhaustive list of AWS services):
- API Gateway
- Cognito
- Lambda
- DynamoDB
- DAX
- SQS/SNS/EventBridge
Combining all of these into a serverless project becomes a huge nightmare for the following reasons.
Testing
All these solutions are proprietary to AWS. Sure, a Lambda function is a pretty simple idea; it is simply a function that executes your code. The other services listed above, however, have almost no easy, testable alternatives once they are integrated together. Serverless Application Model (SAM) and Localstack have done some amazing work attempting to emulate these services. However, they usually only cover basic use cases, and an engineer ends up spending a chunk of time trying to mock things or figure out a way to get their function to test locally. Or, they simply give up and deploy it. Also, since these functions typically depend on other developers' functions or API Gateway, there tend to be 10 different ways to authorize a function. For example, someone might have an API with no authorization at all, one may use AWS credentials, another might use Cognito, and yet another uses an API key. All of these factors lead to an engineer having little to no confidence in their ability to test anything locally.
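One common way to get some local confidence back is to keep the AWS client out of the handler and pass the dependency in. The following is only a minimal sketch of that idea, not a recommendation from the original post: `CustomerStore`, `InMemoryStore`, and `make_handler` are hypothetical names, and the event shape assumes an API Gateway style `pathParameters` field.

```python
import json
from typing import Any, Optional, Protocol


class CustomerStore(Protocol):
    """The only storage operation the handler needs (hypothetical interface)."""

    def get(self, account_id: str) -> Optional[dict]: ...


class InMemoryStore:
    """Test double standing in for DynamoDB; needs no AWS account or emulator."""

    def __init__(self, rows: dict) -> None:
        self._rows = rows

    def get(self, account_id: str) -> Optional[dict]:
        return self._rows.get(account_id)


def make_handler(store: CustomerStore):
    """Build a Lambda-style handler(event, context) bound to the given store."""

    def handler(event: dict, context: Any) -> dict:
        # Assumed event shape: API Gateway puts path variables in "pathParameters".
        account_id = (event.get("pathParameters") or {}).get("id")
        if not account_id:
            return {"statusCode": 400, "body": json.dumps({"reason": "missing id"})}
        customer = store.get(account_id)
        if customer is None:
            return {"statusCode": 404, "body": json.dumps({"reason": "not found"})}
        return {"statusCode": 200, "body": json.dumps(customer)}

    return handler


if __name__ == "__main__":
    handler = make_handler(InMemoryStore({"1234": {"accountId": "1234"}}))
    print(handler({"pathParameters": {"id": "1234"}}, None))
```

This only covers the business logic inside the function. It does nothing for the integration problems described above (authorizers, queues, gateway configuration), which still need a deployed environment or an emulator.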
Account Chaos
Since engineers typically don't have high confidence in their code locally, they depend on testing their functions by deploying. This means possibly breaking their own code. As you can imagine, this breaks everyone else deploying and testing any code which relies on the now-broken function. While there are a few solutions to this scenario, all are usually quite complex (e.g. using an AWS account per developer) and still cannot be tested locally with much confidence. Chaos engineering has a time and a place. This is not it.
Security
With all the possible permutations of deployments and account structures, security becomes a big problem. Good IAM practices are hard. Many engineers simply grant a Lambda function dynamodb:* on all resources in the account. (BTW this is not good.) It becomes hard to manage all of these because developers can usually quite easily deploy and manage their own IAM roles and policies. And since it is hard to test locally, trying to fix serverless IAM issues requires deploying to AWS and testing (or breaking) in the environment.
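To make the dynamodb:* problem concrete, here is a rough sketch of a check that could run in CI before a policy is deployed. It is an illustration under assumptions, not an official AWS tool: `find_wildcards` is a hypothetical helper, the table ARN and account ID are made up, and it only inspects the `Effect`, `Action`, and `Resource` fields of a standard IAM policy document.

```python
from typing import Any


def _as_list(value: Any) -> list:
    """IAM accepts either a single string or a list for Action and Resource."""
    return value if isinstance(value, list) else [value]


def find_wildcards(policy: dict) -> list:
    """Return a human-readable finding for every wildcard grant in Allow statements."""
    findings = []
    for i, stmt in enumerate(_as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action", [])):
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {i}: wildcard action {action!r}")
        for resource in _as_list(stmt.get("Resource", [])):
            if resource == "*":
                findings.append(f"statement {i}: wildcard resource")
    return findings


# The pattern described above: every DynamoDB action on every resource.
BROAD = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "dynamodb:*", "Resource": "*"}],
}

# A narrower alternative; the table ARN is a made-up example.
NARROW = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/Customers",
        }
    ],
}

if __name__ == "__main__":
    for finding in find_wildcards(BROAD):
        print(finding)
```

A check like this catches only the most obvious over-grants. It does not evaluate conditions, partial wildcards inside ARNs, or `NotAction` statements.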
Bad (Cult-like) Practices
No Fundamental Enforcement
Without help from frameworks, DRY (Don't Repeat Yourself), KISS (Keep It Simple Stupid) and other essential programming principles are simply ignored. In a perfect world, a team would reject PRs that do not abide by these basic principles. However, with the huge push for the cloud over the past several years, many junior developers have had the freedom to do what they want in the serverless space because of its ease of use. The result is developers en masse adopting something that doesn’t improve the health of the developer ecosystem as a whole. AWS gives you a knife by providing such an easy way to deploy code on the internet. Please don't hurt yourself with it.
Copy and Paste Culture
Most teams end up copying code into each new microservice and proliferating it across many services. I have seen teams with hundreds and even thousands of functions, with nearly every function being slightly different. This culture has gotten out of hand and now teams are stuck with these functions. Another symptom of this is not taking the time to set up a proper DNS name.
DNS Migration Failures
Developers take the generic API Gateway generated DNS name (abcd1234.amazonaws.com) and litter their code with it. There will come a time when the teams want to put a real DNS in front of it and now you're faced with locating the 200 different spots it was used to change it. And, it's not as easy as a Find/Replace. Searching like this can become a problem when you have a mix of hard-coded strings, parameterized/concatenated strings, and environment variables everywhere that DNS name lies. Oh and telemetry? Yeah that's nowhere to be found.
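A small habit that avoids the 200-spot hunt is to resolve the API's base URL in exactly one place and build every request URL from it. The sketch below assumes a hypothetical environment variable named `CUSTOMERS_API_BASE_URL` and hypothetical helper names; the hostnames are placeholders.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urljoin

# Hypothetical variable name; the point is that it is defined once.
BASE_URL_VAR = "CUSTOMERS_API_BASE_URL"


def api_base_url(environ: Optional[Mapping[str, str]] = None) -> str:
    """Read the base URL from the environment and normalize the trailing slash."""
    env = os.environ if environ is None else environ
    base = env.get(BASE_URL_VAR)
    if not base:
        # Fail loudly instead of silently falling back to a hard-coded hostname.
        raise RuntimeError(f"{BASE_URL_VAR} is not set")
    return base.rstrip("/") + "/"


def customers_url(path: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Build a full URL for a path relative to the configured base."""
    return urljoin(api_base_url(environ), path.lstrip("/"))


if __name__ == "__main__":
    env = {BASE_URL_VAR: "https://api.example.com/v1"}
    print(customers_url("/customers/1234", env))
```

With this in place, moving from the generated hostname to a real DNS name becomes a configuration change rather than a code search.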
Microservice Hell
This isn't a post about microservices. However, as teams and developers can decide and add whatever they want into their YAML files for deployment, you end up with hundreds of dependent services and hundreds of repositories. Many have different approaches and/or different CI/CD workflows. Also, I've found that repository structures begin diverging widely. Any perceived cost savings have now been moved to managing all of these deployments and repositories. Here are a few examples of how developers choose to break up their serverless functions by Git repositories:
- Use a monolith for all their APIs.
- Separate each API Gateway or queue processors
- Separate "domain" (i.e. /customers or /invoices)
- Separate by endpoint (I have seen developers break out a repository for POST:/customers while maintaining a separate one for GET:/customers/:id and so on…).
Many times, developers switch between different styles and structures daily. This becomes a nightmare not only for day-to-day development, but also for any developer trying to quickly understand how the code deploys and what dependencies it has or impacts.
API Responses
The serverless cult has been active long enough now that many newer engineers entering the field don't seem to even know the basics of HTTP responses. Now there are many veteran developers lacking this knowledge too. While this is not strictly a serverless problem, I have never experienced this much abuse outside of serverless. I've seen endpoints returning 200, 400, 500 like normal. Yet another set of endpoints returns all 2xx responses, with a payload like:
{
"status": "ERROR",
"reason": "malformed input"
}
Then, another set of endpoints implement inconsistent response patterns dependent on some permutation of query parameters. For example:
Query 1:
/customers?firstName=John
[{
"accountId": "1234",
"firstName": "John",
"lastName": "Newman"
}]
Query 2:
/customers?lastName=Newman
{
"accountId": "1234",
"firstName": "John",
"lastName": "Newman"
}
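For contrast, here is a minimal sketch of what a consistent version of these endpoints could look like: errors carry a real HTTP status code, and a search always returns a list regardless of which query parameter was used. The `respond` and `search_customers` helpers are hypothetical, and the in-memory `CUSTOMERS` list simply reuses the sample record from above.

```python
import json
from http import HTTPStatus

# Sample data taken from the example responses above.
CUSTOMERS = [{"accountId": "1234", "firstName": "John", "lastName": "Newman"}]

# Query parameters this hypothetical endpoint accepts.
ALLOWED_FILTERS = {"firstName", "lastName"}


def respond(status: HTTPStatus, payload) -> dict:
    """Build a Lambda proxy style response with the status in the status code."""
    return {
        "statusCode": int(status),
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def search_customers(query: dict) -> dict:
    """Filter customers by the given parameters; the body is always a JSON list."""
    unknown = set(query) - ALLOWED_FILTERS
    if unknown:
        # Malformed input is a 400, not a 200 with "status": "ERROR" in the body.
        return respond(
            HTTPStatus.BAD_REQUEST,
            {"reason": f"unknown parameters: {sorted(unknown)}"},
        )
    matches = [c for c in CUSTOMERS if all(c[k] == v for k, v in query.items())]
    return respond(HTTPStatus.OK, matches)


if __name__ == "__main__":
    print(search_customers({"firstName": "John"}))
    print(search_customers({"lastName": "Newman"}))
```

Both queries now return the same list shape, so a client never has to guess whether it is getting an object or an array.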
Inventing New Problems
As mentioned previously, initially deploying these types of services is easy. The reality is there are new problems with these kinds of serverless structures that don't typically occur in server-backed services:
- Cold starts - many engineers don't care too much about this. But they suddenly start caring when Function A calls Function B which calls Function C and so on. Without some voodoo warm-up scripting solution, paying for provisioned concurrency, or ignoring it, you may be out of luck.
- In the past five years, prior to our work on our flooring installation marketplace, the teams I have been a part of have always chased the latest features because we had been doing workarounds like FIFO queues, state machines, provisioned concurrency, etc. As teams chase the latest features released by AWS (or your cloud provider of choice), things then become even harder to test and maintain since SAM or Localstack don't match these features for some time.
- Some awful custom eventing solution because… serverless. Engineers think simply putting an API Gateway in front of EventBridge will solve all their eventing problems. What about retries? What about duplicate events? What about replaying events? Schema enforcement? Where does the data land? How do I get the data? These are all questions that have to be answered or documented in a custom fashion. Ok, EventBridge supports a few of these things in some form but it does leave engineers chasing the latest features, waiting for these to become available. However, outside of the serverless cult, these issues can be solved with Kafka, NATS, or other technologies. Use the right tool.
- When it’s not okay to talk about the advantages and disadvantages of serverless with other engineers without fear of reprisal, it might be a cult. Many of these engineers say Lambda is the only way to deploy anymore. There isn't much thought to offline solutions when things need to be run onsite or separated from the cloud. For some companies this can be a fine approach. However, many medium to large organizations have (potentially) offline computing needs outside of the cloud. Lambda cannot provide a sensitive, remote pressure device real-time updates in the event of an internet outage in the middle of Canada during winter.
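The duplicate-event question from the eventing item in the list above is one that any custom solution has to answer somewhere. One common answer is an idempotent consumer that remembers which event IDs it has already handled. The sketch below is illustrative only: `IdempotentConsumer` is a hypothetical class, it assumes every event carries a unique `id` field, and it keeps the seen set in memory, which a real system would replace with durable storage.

```python
from typing import Callable


class IdempotentConsumer:
    """Wrap an event handler so that redelivered events are processed only once."""

    def __init__(self, handle: Callable[[dict], None]) -> None:
        self._handle = handle
        # In-memory only, for illustration; this is lost on restart.
        self._seen = set()

    def consume(self, event: dict) -> bool:
        """Process the event; return False if it was a duplicate and was skipped."""
        event_id = event["id"]  # assumption: each event has a unique "id"
        if event_id in self._seen:
            return False
        self._handle(event)
        # Mark as seen only after the handler succeeds, so a failed
        # attempt can be retried by redelivering the same event.
        self._seen.add(event_id)
        return True


if __name__ == "__main__":
    processed = []
    consumer = IdempotentConsumer(processed.append)
    consumer.consume({"id": "evt-1", "detail": {"accountId": "1234"}})
    consumer.consume({"id": "evt-1", "detail": {"accountId": "1234"}})  # duplicate
    print(len(processed))
```

This addresses duplicates only. Retries, replay, and schema enforcement still need their own answers.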
So, how do I get out of the cult?
In this article, I didn’t plan to address the many options you have to extricate yourself from the grips of mindless serverless abuse. If you're interested, please leave a comment and I will write a follow-up on the different solutions and alternatives to serverless I’ve found as well as some tips to incrementally shift back to a normal life. What I did want to do here was to express the pains I have experienced with serverless technologies over the past couple years now that I have helped architect more traditional VM and container-based tech stacks. I felt compelled to ensure individuals, teams, and organizations know what serverless can really mean for long-term sustainability in an environment.
What does serverless do well?
Deployment and scaling. That's really it for most organizations. For a lot of these organizations, it's hard to find the time, people, and money to figure out how to automatically provision new VMs, get access to a K8S cluster, etc. My challenge to you is to first fix your deployment and scaling problems internally before thinking about serverless compute.
Conclusion
Serverless is one of the hottest new cloud trends. However, I have found it leads to more harm than good in the long run. While I understand some of the problems listed above are not unique to serverless, they are much more prevalent there, leading engineers to spend most of their time on YAML configuration or troubleshooting function execution rather than crafting business logic. What I find odd is the lack of complaints from the community. If I’m alone in my assessment, I’d love to hear from you in the comments below. I’ve spent a significant amount of time over the last few years working to undo my own serverless mistakes as well as those made by other developers. Maybe I’m the one who has been brainwashed? Time will tell.