Multivariate Data Analysis

I just finished reading the book Multivariate Data Analysis (Kim H. Esbensen & Brad Swarbrick, Frank Westad, 6th edition, 2018, 462 pages), and it was a very interesting reading journey! I have been working with MVA for the last 3+ years, on the software side of it, and along the way I have picked up much of the theory that the book discusses in great detail. Still, it is an especially good exercise to study these things in a book, instead of reading bits and pieces in a multitude of articles and reference materials.

Being an engineer myself, I have spent the last 15 years working with a wide range of scientific and technical problems, first as a Software Engineer and then as a Product Manager, moving closer and closer to client management, solution architecture and onboarding. But my day-to-day sources of information and knowledge are typically material found on the internet or produced internally, in articles and other technical documentation, some of which I also write myself. And my spare-time reading typically goes to business-related and other topics of personal interest. So getting back to reading a technical book, after several years, was very exciting for me.

The book starts out with a review of statistics, which suits me very well! Statistics is a topic I have studied many times throughout my education and career. It fascinates me, and I feel that I learn a bit more every time I get back to it, probably because I have never worked directly with statistical reasoning in my daily tasks, or at least not that often. So, needless to say, it was great to go through it once more, especially with the added element of multivariate statistics.

As the book itself says at the start, most of what happens around us is multivariate rather than univariate, given the complexity of real-life systems. The journey in this book is therefore a super interesting one: from the foundations of multivariate statistics, to the Theory of Sampling, to data preprocessing techniques, to Principal Component Analysis (PCA) and other multivariate methods, to multivariate calibration (meaning putting MVA models to work on real data collected for a process or product), to Design of Experiments and the thinking around creating a design space that is valid for a specific purpose, to advanced MVA methods and techniques.

To conclude, the last chapter presents a delightful overview of the PAT (Process Analytical Technology) and QbD (Quality by Design) initiatives, with their history over recent decades and different aspects of how to get them implemented in practice. So, in short, it was a great experience reading and studying this book, and I definitely recommend it to anyone interested in processes and in how Statistical Process Control can be taken into production using modern multivariate data modelling and calibration.