27.8.23

Software Engineering at Google

I just finished Software Engineering at Google – Lessons Learned from Programming Over Time (Titus Winters, Tom Manshreck, Hyrum Wright, 2020, 599 pages). It is a long and surprisingly entertaining account of the main technical challenges behind Google’s rise to what it is today.

Just imagine a corporation growing from a few engineers in the late 1990s to over 30,000 engineers, maintaining and improving an impressive repository of over 2 billion lines of code!

In a sense, Google’s story is unique. The scientific background of the founders and the technical ingenuity of its incredible pool of talent, together with some distinctive business practices and value-based management (not covered in this book, but mentioned in some other books on this blog), made possible the colossus of a company that we see today.

This book is a reflection of the qualities of the company, which is open to talking publicly and in detail about its technical innovations, and even about the weaknesses and failures along the way.

Meanwhile, the findings and technical advances Google made in software engineering have to a great extent shaped the computing industry as a whole. And what are the main topics in the technical history of Google, you may ask? Well, there are plenty.

To start with, there is the simple yet powerful distinction the book makes between programming (creating a piece of code that works here and now) and software engineering (building code that can last and adapt over the long term; decades, in Google’s case). This distinction is spot on, because it runs through most of the main challenges that come up when scaling.

Unit testing, for example, is something Google learned and started adopting in 2005, basically to give engineers confidence when making further changes to the growing code base of the Google Web Server. Code review, and all the internal tooling created for it, is another great asset of the company; it has shaped the culture from the beginning and has been properly scaled up over time.

Building, and the continuous integration (CI) of changes, is yet another area where massive tooling was added over time, allowing for performant distributed builds of parts of the large code base.

By the way, the choice to keep the code in just one repository (Google’s famous Monorepo), and all the work around managing code dependencies, are also described in great detail in the book.

In addition, the very interesting discussions of the evolution of CaaS (Compute as a Service) bring great food for thought, with the many trade-offs between the multiple options for “sourcing hardware”: from running code on a local workstation, to managing (or not) virtual machines and containers, to serverless architecture.

Finally, the multiple effects of Hyrum’s law are another very interesting aspect of this book. Basically, the law states that, with enough users, every observable behavior of a system will come to be relied upon by somebody, no matter what the official contract promises. At the scale Google operates, this plays a significant role, not only technically, with the many challenges it brings, but also business-wise: Google’s systems need to evolve over time to keep up with the pace of technology, even though some clients may have come to depend on their idiosyncrasies.

I find, therefore, this book to be a fascinating insight into the successful technological evolution of a company that is a daily part of billions of people’s lives, including mine.
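
As a postscript, Hyrum’s law is easy to see in a few lines of Python. This is only a minimal sketch: the function names and the sample data are made up for illustration and do not come from the book. The first version of a function happens to return sorted results although it never promises an order, a client quietly starts depending on that, and a later, perfectly “contract-compatible” version breaks the client.

```python
def active_users_v1(users):
    """Return the names of active users. No particular order is promised."""
    # Implementation detail (hypothetical): v1 happens to sort the result.
    return sorted(name for name, active in users.items() if active)


def active_users_v2(users):
    """Same documented contract, but this version skips the sort."""
    # The order now follows the dict's insertion order instead.
    return [name for name, active in users.items() if active]


def first_alphabetically(active_users, users):
    # A client that relies on the observed, but never promised, sorted order.
    return active_users(users)[0]


users = {"zoe": True, "adam": True, "mia": False}

print(first_alphabetically(active_users_v1, users))  # adam
print(first_alphabetically(active_users_v2, users))  # zoe
```

Both versions return the same set of names, so nothing in the documented contract changed, and yet the client’s result did. Multiply this by thousands of callers and it becomes clear why evolving a widely used system is so hard.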