I just finished Software Engineering at Google: Lessons Learned from Programming
Over Time (Titus Winters, Tom Manshreck, Hyrum Wright, 2020, 599 pages), and it
was a long and surprisingly entertaining account of the main technical
challenges behind Google’s rise to what it is today.

Just imagine a company growing from a few engineers in the late 1990s to more
than 30,000 engineers, maintaining and improving an impressive repository of
over 2 billion lines of code!

In a sense, Google’s story is unique. The scientific background of its founders
and the technical ingenuity of its incredible pool of talent, together with some
quite unique business practices and value-based management (not covered in this
book, but mentioned in other books on this blog), made possible the colossus of
a company that we see today.

This book is a reflection of the company’s qualities: Google is open to
discussing its technical innovations publicly and in detail, including the
weaknesses and failures along the way.

Meanwhile, Google’s findings and technical advances in software engineering
have, to a great extent, shaped the computing industry as a whole. And what are
the main topics in Google’s technical history, you may ask? Well, there are
plenty.

To start with, there is the simple yet powerful distinction the book makes
between programming (creating a piece of code that works here and now) and
software engineering (building code that can last and adapt over the long term;
decades, in Google’s case). This distinction is spot on, because it underlies
most of the main challenges that come up when scaling.

Unit testing, for example, is something Google learned and began adopting in
2005, essentially to gain confidence in further changes to the growing code base
of the Google Web Server. Code review, along with all the internal tooling
created for it, is another great asset of the company: it has shaped its culture
from the beginning and has been properly scaled up over time.

Building, and the continuous integration (CI) of changes, is yet another area
where massive tooling was added over time, enabling performant distributed
builds of parts of their large code base.

By the way, the choice to keep the code in a single repository (Google’s famous
monorepo), and all the work around managing code dependencies, is also described
in great detail in the book.

In addition, the very interesting discussions of the evolution of CaaS (Compute
as a Service) provide great food for thought, with the many trade-offs between
the options for “sourcing hardware”: from running code on a local workstation,
to managing (or not) virtual machines and containers, to serverless
architecture.

Finally, the many effects of Hyrum’s law are another very interesting aspect of
this book. Basically, the law states that “any observable state of a system may
come to be relied upon”. At the scale Google operates, this plays a significant
role. The challenge is not only technical but also business-related: different
systems at Google, whose idiosyncrasies some clients may come to depend on,
still need to evolve over time to keep up with the pace of technology.

I therefore find this book a fascinating insight into the successful
technological evolution of a company that is a daily part of billions of
people’s lives, including mine.