Microservices Done All Wrong! Google Proposes New Method, Costs Reduced by 9x!

The year microservices went into "retrograde."

For a long time, microservices have been considered the de facto standard for cloud-native application architecture, at companies large and small. In 2023, however, not only did DHH of 37signals decide to move off the cloud and abandon microservices, but even cloud giants like Amazon and Google began leading the charge to rethink them.

Google Can't Sit Still: We Did Microservices All Wrong!

"When writing distributed applications, the conventional wisdom is to split applications into independent services that can be launched independently. The intention behind this approach is good, but microservice-based architectures like this often backfire, bringing challenges that negate the benefits the architecture tries to achieve."

In June this year, a group of Google employees (led by Google software engineer Michael Whittaker) published a paper titled "Towards Modern Development of Cloud Applications," which began by criticizing the current microservice architecture.


The article argues that, architecturally, microservices themselves have a problem: their boundaries are drawn in the wrong place. "Fundamentally, this is because microservices conflate logical boundaries (how code is written) with physical boundaries (how code is deployed)."

Therefore, Google's engineers proposed an approach that can be called "Microservices 2.0." It involves building applications as logical monoliths but entrusting them to an automated runtime, which can decide where to run workloads based on what the application needs and what is available.


Based on the newly proposed structure, they were able to reduce system latency by 15 times and costs by 9 times.

"Starting with organized, modular code, we can treat the deployment architecture as an implementation detail," Google developer advocate Kelsey Hightower stated in October regarding the next steps for this work.


These Google developers found the drawbacks of splitting applications into independently deployable services so pronounced that they proposed three guiding principles:

  1. Encourage developers to write monolithic applications divided into logical components.
  2. Postpone the challenges of physically distributing and executing the modular monolith to the runtime.
  3. Deploy applications atomically.

These three guiding principles bring many benefits and will open doors for future development innovations.
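The three principles can be sketched in a few lines. This is a minimal illustration in plain Python with hypothetical names, not the paper's actual API: one codebase split into logical components, with placement left entirely to a runtime object.

```python
class Catalog:
    """Logical component (principle 1): owns price lookups (prices in cents)."""
    def price(self, item: str) -> int:
        return {"widget": 999, "gadget": 2450}[item]

class Checkout:
    """Logical component: talks to Catalog only through its interface."""
    def __init__(self, catalog: Catalog):
        self.catalog = catalog

    def total(self, items: list[str]) -> int:
        return sum(self.catalog.price(i) for i in items)

class Runtime:
    """Stand-in for the automated runtime. It alone decides whether a
    component lives in-process or behind an RPC stub; here everything is
    co-located, so calls are plain method calls (principle 2)."""
    def __init__(self):
        self._instances = {}

    def get(self, cls, *deps):
        # All components ship in one binary and are wired up in one place,
        # so the whole application deploys atomically (principle 3).
        if cls not in self._instances:
            self._instances[cls] = cls(*deps)
        return self._instances[cls]

runtime = Runtime()
checkout = runtime.get(Checkout, runtime.get(Catalog))
print(checkout.total(["widget", "gadget"]))  # 3449 cents
```

Because `Checkout` only ever sees `Catalog`'s interface, the runtime could later swap the in-process instance for a remote stub without touching application code, which is exactly the "deployment as implementation detail" idea.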

Amazon Prime Video Team: Abandoning Microservices for Monolith

Coincidentally, also in June, a case study released by Amazon's streaming platform Prime Video seemed to turn the tide: "We abandoned our serverless, microservice architecture and replaced it with a monolith, which cut operating costs by 90% and reduced system complexity."

A "counter-attack" by monolithic applications against microservices, proposed by an Amazon team no less, quickly ignited the tech community. What went wrong?

The Prime Video team needed a tool to monitor video stream quality issues. Due to the massive number of videos, the tool needed strong scalability.

Initially, this work was done by a set of distributed components orchestrated by AWS Step Functions (a serverless orchestration service) together with AWS Lambda (serverless functions), which let the team quickly build a workable monitoring system. But who would have thought that scaling Step Functions would become the biggest stumbling block?

Specifically, first, the tool performed multiple state transitions for every second of video stream, and the resulting concurrency quickly hit account limits. Second, AWS Step Functions bills per state transition, which made it prohibitively expensive at this scale.
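To see why per-transition billing bites at this scale, here is a back-of-envelope calculation. Every figure in it is an illustrative assumption, not AWS's actual pricing or Prime Video's actual workload:

```python
# All figures are hypothetical, chosen only to show the shape of the problem.
price_per_transition = 0.000025        # assume $0.025 per 1,000 transitions
transitions_per_video_second = 5       # e.g. split -> convert -> analyze steps
video_seconds_per_day = 10_000 * 3600  # assume 10,000 hour-long streams a day

daily_transitions = transitions_per_video_second * video_seconds_per_day
daily_cost = daily_transitions * price_per_transition
print(f"{daily_transitions:,} transitions/day -> ${daily_cost:,.0f}/day")
# 180,000,000 transitions/day -> $4,500/day
```

The cost scales linearly with seconds of video analyzed, so a fixed-size monolith that does the same per-frame work in-process removes the per-transition term entirely.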

Under pressure, Prime Video began considering a monolithic design to reduce costs and improve scalability. After repeated trials, the team ultimately decided to rebuild the monitoring tool's architecture.

Amazon concluded in a blog post: "Microservices and serverless components are tools that can work at scale, but whether to use them holistically must be on a case-by-case basis... Migrating services to a monolith reduced our infrastructure costs by over 90% and improved our scalability."

This indicates that, at least in the video monitoring domain, a monolithic architecture produced higher performance and better cost efficiency than microservice and serverless-dominated approaches.

DHH (David Heinemeier Hansson), founder of Ruby on Rails, who consistently advocates moving off the cloud and opposing microservices, pointedly remarked: Even Amazon itself thinks microservices or serverless are "nonsense."

Not Just Google and Amazon Abandoning Microservices

In recent years, countless small and medium-sized teams have chosen to abandon microservices after weighing the pros and cons.

Uber is one such example. Previously, Uber built microservices for very small requirements or functions, to the point that many microservices were built and maintained by a single person. These microservices brought Uber new challenges around monitoring, testing, continuous integration/continuous delivery (CI/CD), and Service Level Agreements (SLAs).

After falling into the microservices "trap," the Uber team now plans new services more deliberately: each serves a whole business function rather than doing just one small thing, and each is maintained by five to ten engineers. They also drew a hard-won lesson: choose the right solution at the right time to build products.

Managed by Q, an office management software company, had its application deployed as a Django monolith on ECS. To catch up with modern development practices, they switched to a microservices architecture. But they quickly found that each new service added infrastructure, and developing a feature across multiple services required more work.

As a result, two years after switching to microservices, they began merging them. Some microservices were merged back into the monolith, while others were combined into larger services. They also learned from experience: microservices should not be taken for granted as the correct choice.

They had hoped microservices would be a silver bullet, but the engineering overhead was too great and returns kept diminishing. The common thread among the companies above was running dozens of microservices with only around 20 engineers, like using a sledgehammer to crack a nut.

The False Prosperity of Microservices: From Monolith to "Distributed Monolith"

As more and more cases of "escaping microservices" occur, people are re-examining, and even criticizing, the "microservices" concept first proposed in 2005.

For example, the Google engineers mentioned at the beginning of the article listed the shortcomings of the current microservice approach in their paper, including:

  • Performance: Serializing and sending data over the network to remote services can harm performance and even lead to bottlenecks if the application becomes complex enough.
  • Understanding and Tracing: It is notoriously difficult to trace bugs in distributed systems, given the many interactions between microservices.
  • Management Issues: Different parts of an application can be updated on their own schedule, which is considered an advantage. But this leads to developers having to manage a large number of binaries, each with its own release schedule. Good luck running end-to-end tests with locally running services.
  • Fragile APIs: A key constraint of microservice interoperability is that once a microservice's API is established, it cannot change without breaking the other microservices that depend on it. APIs can therefore only be extended, never changed, and grow ever more bloated.

These pain points echo the familiar notion of "over-engineering."
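The performance complaint in the list above is easy to demonstrate even without a network: merely crossing a serialization boundary costs far more than an in-process call. A small sketch, where the handler and payload are hypothetical and `pickle` stands in for any wire format:

```python
import pickle
import timeit

def handler(order: dict) -> dict:
    """Hypothetical service endpoint."""
    return {"total": sum(order["prices"])}

order = {"id": 42, "prices": list(range(100))}

def local_call():
    # Same process, plain function call.
    return handler(order)

def boundary_call():
    # Simulated RPC: the request and the response are each serialized and
    # deserialized, as they would be between two microservices. A real
    # deployment adds network latency on top of this.
    request = pickle.loads(pickle.dumps(order))
    response = handler(request)
    return pickle.loads(pickle.dumps(response))

n = 10_000
t_local = timeit.timeit(local_call, number=n)
t_boundary = timeit.timeit(boundary_call, number=n)
print(f"in-process: {t_local:.4f}s  serialized: {t_boundary:.4f}s "
      f"({t_boundary / t_local:.1f}x slower before any network hop)")
```

Both calls return the same result; only the boundary differs. Multiply that overhead by every hop in a deep call graph and the paper's bottleneck warning follows directly.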

In fact, when some teams split a centralized monolithic application into microservices, they often skip establishing a domain model and simply carve the original codebase into multiple "microservice" packages along business functions. The code inside these "microservices" remains highly coupled, and the logical boundaries are unclear; it is essentially still a monolithic architecture. Such teams achieve only "surface prosperity," not the results they hoped for.

As Sam Newman mentioned in his book "Building Microservices," architecture needs to meet certain prerequisites, otherwise it might be over-designed.

Google Proposes a New Kind of Microservice

There's a view in the industry that still supports microservice architecture: microservices require a matching scale. "If you know you'll eventually do this at a certain scale, you might build it differently at the start. But the question is, do you know how to do it? Do you know at what scale you'll operate it?"

In many applications, especially internal ones, development costs often outweigh runtime costs.

Google's paper precisely addresses this problem: separating programming patterns and deployment patterns makes it easier for developers, while allowing the runtime infrastructure to "bet" on the most cost-effective way to run these applications.

As Google researchers wrote: "By delegating all execution responsibilities to the runtime, our solution is able to provide the same benefits as microservices, but with higher performance and lower cost."

A Year of Infrastructure Rethinking

This year has seen a lot of infrastructure rethinking, and microservices are not the only bubble being questioned. For example, cloud computing has also come under scrutiny.

In June, 37signals, the company behind both Basecamp and the Hey email service, purchased a batch of Dell servers and moved off the cloud, bucking the industry's long-standing habit of discarding the old and embracing the new.

David Heinemeier Hansson explained in a blog post: "This is common cloud marketing rhetoric: it will get much easier, requiring almost no one to operate." But, he says, "I've never seen it. Neither has 37signals, nor have people from large internet companies. The cloud has some advantages, but it typically doesn't reduce operations staff."

Of course, DHH is a race car driver and might prefer bare metal. But many supporters are willing to back this bet. Later this year, Oxide Computers launched their new systems, hoping to provide similar services for others: running cloud workloads, but more cost-effectively in their own data centers.

Furthermore, this sentiment only grows stronger as cloud bills come due. In 2023, FinOps became a noticeable phenomenon as more organizations turned to companies like KubeCost to rein in their cloud spending. News that a DataDog customer had received a $65 million cloud monitoring bill startled countless people in the industry.

Perhaps for an organization generating billions in revenue, a $65 million observability bill might be worth it. But for architects, facing the technical debt from engineering decisions made over the past decade, perhaps it's time to make some adjustments.

Of course, microservices are no exception.


