Due to the complexity and the massive scale of modern warehouse scale computers (WSCs), it is challenging to quantify the performance impact of individual microarchitectural properties and the potential optimization benefits in the production environment. As a result of these challenges, there is currently a lack of understanding of the microarchitecture-workload interaction, leaving potentially significant performance on the table. This paper argues for a two-phase performance analysis methodology for optimizing WSCs that combines both an in-production investigation and an experimental load-testing study. To demonstrate the effectiveness of this two-phase approach, and to illustrate the challenges, methodologies and opportunities in optimizing modern WSCs, this paper investigates the impact of non-uniform memory access (NUMA) for several Google’s key web-service workloads in large-scale production WSCs. Leveraging a newly-designed metric and continuous large-scale profiling in live datacenters, our production analysis demonstrates that NUMA has a significant impact (10-20%) on two important web-services: Gmail backend and web-search frontend. Our carefully designed load-test further reveals surprising tradeoffs between optimizing for NUMA performance and reducing cache contention.

By Lingjia Tang, Jason Mars, Xiao Zhang, Robert Hagmann, Robert Hundt and Eric Tune. In Proceedings of the 2015 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA), 2013.