Runtime Performance Management in the Data Center
Modern data centers must support sprawling, interconnected software systems subject to constant revision. The complexity of software interactions and the steady stream of new updates make the exact runtime conditions of software difficult to predict and, as a consequence, difficult to optimize. Cloud computing exacerbates this problem by expanding the diversity of code potentially involved in these interactions. In light of this growing software complexity, maximizing system performance requires runtime mechanisms that continuously measure workload behavior and adapt to changing workload conditions. In this tutorial, technical leaders from Alibaba, Oracle, and Twitter discuss how they combine hardware and runtime systems such as Java and PHP to meet performance needs today, while highlighting where today's systems fall short. In addition, technical leaders from leading processor vendors such as Intel and ARM will review both current and emerging silicon capabilities to support the growing needs of runtime systems.
Saturday, June 2nd, 2018
Hotel InterContinental Los Angeles Downtown
| Time | Session | Speaker | Affiliation |
|------|---------|---------|-------------|
| 1:30 - 1:33 | Introduction | Chris Wilkerson, Kingsum Chow, Karl Taht | Alibaba |
| 1:33 - 1:55 | Platform Auto Configuration and Tuning | Karl Taht | University of Utah |
| 1:55 - 2:20 | Accelerating Server-Side Java* Performance | Sandhya Viswanathan | Intel |
| 2:20 - 2:50 | Runtime Resource Monitoring and Management for Performant and Efficient Arm Systems | Andrea Pellegrini | ARM |
| 2:50 - 3:30 | Extreme Scaling with Alibaba JDK | Sanhong Li | Alibaba |
| 3:30 - 3:55 | Break / Discussion | -- | -- |
| 3:55 - 4:20 | Twitter's Quest for a Wholly Graal Runtime | Chris Thalinger | Twitter |
| 4:20 - 4:45 | AutoConf: Understanding and Automatically Adjusting Performance-related Software Configurations | Hank Hoffman | University of Chicago |
| 4:45 - 5:00 | Wrap-up, Drawing Prizes | -- | -- |
Karl Taht, University of Utah
1:33 - 1:55
While specialized tasks ranging from graphics to networking, graph processing, and machine learning have motivated the mainstream use of hardware accelerators and FPGAs, general-purpose units will always be required to run new, unoptimized software. Platform Auto Configuration and Tuning (PACT) is a feedback mechanism that dynamically tunes hardware features in real time. We focus not only on the algorithmic design of the optimizations, but also investigate the challenges of implementation on a real system.
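To make the idea concrete, here is a minimal sketch of the kind of measure-and-adjust feedback loop the abstract describes. This is not the actual PACT implementation; the knob values and the measurement function are hypothetical stand-ins for a tunable hardware feature (e.g. a prefetcher aggressiveness setting) and a runtime throughput probe.

```python
def tune(knob_settings, measure, epochs=10):
    """Greedy search over discrete knob settings, guided by measured
    throughput. Re-measures each epoch so it can track a changing workload."""
    best = knob_settings[0]
    best_score = measure(best)
    for _ in range(epochs):
        improved = False
        for candidate in knob_settings:
            score = measure(candidate)
            if score > best_score:
                best, best_score = candidate, score
                improved = True
        if not improved:
            break  # converged for the current workload phase
    return best, best_score

# Toy "measurement": a made-up throughput table where setting 2 is best.
throughput = {0: 1.0, 1: 1.4, 2: 1.9, 3: 1.6}
best, score = tune(list(throughput), throughput.get)
```

In a real system the interesting challenges are exactly the ones the talk highlights: the measurement is noisy, changing a hardware knob has a transition cost, and the search must run concurrently with the workload it is measuring.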
Karl Taht is a third-year PhD student at the University of Utah. As part of the Utah Architecture Lab, his expertise is primarily in memory-subsystem optimizations motivated by machine learning algorithms.
Sandhya Viswanathan, Intel
1:55 - 2:20
In this session we will discuss architectural advancements in Intel® Xeon® Scalable processor platforms that benefit Java* applications, and describe recent software optimizations across the Java* ecosystem: support for larger vectors with enhanced vectorization, optimized math libraries, cryptography and compression acceleration, compact strings, and new APIs with optimized implementations. These features help Big Data, cloud, microservices, HPC, and FSI applications. We will also discuss exploratory projects we participate in within the community, such as the Java Vector API, aimed at more performant, scalable server-side Java*.
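One of the optimizations mentioned above, compact strings (JEP 254), is easy to illustrate: a string containing only Latin-1 characters can be stored at one byte per character instead of the two bytes per character of UTF-16. The sketch below is a back-of-envelope model of that payload saving, not an actual JVM measurement.

```python
def string_bytes(s, compact=True):
    """Approximate char-storage bytes for a Java string under JEP 254.
    With compact strings, Latin-1-only content uses 1 byte per char;
    anything else (and all strings pre-JEP 254) uses UTF-16 at 2 bytes."""
    latin1 = all(ord(c) < 256 for c in s)
    if compact and latin1:
        return len(s)       # LATIN1 internal encoding
    return 2 * len(s)       # UTF-16 internal encoding

assert string_bytes("hello") == 5                  # compact: halved
assert string_bytes("hello", compact=False) == 10  # legacy UTF-16
assert string_bytes("héllo") == 5                  # é is still Latin-1
assert string_bytes("日本語") == 6                  # falls back to UTF-16
```

Since most server-side strings (URLs, JSON keys, log lines) are Latin-1, this roughly halves string heap footprint, which is why it matters at data-center scale.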
Sandhya Viswanathan is a Senior Staff Software Engineer at Intel with 25 years of industry experience in compilers and software development tools. She joined Intel in 2008 and has been working on Java platform optimizations for Intel server, micro-server, and IoT platforms. She is the technical lead for the Java OpenJDK engineering team, focusing on Java JIT compiler, runtime, and GC optimizations.
Andrea Pellegrini, ARM Architecture
2:20 - 2:50
We will present Arm’s work on enabling performance monitoring and resource control/allocation for high-performance, power-efficient Arm data centers. This will cover the unique challenges of supporting a wide range of Arm systems and microarchitectures, and focus on dynamic hardware tuning to achieve both high performance and best-in-class energy efficiency. Additionally, we will discuss how performance features that expose microarchitectural details to application developers can be deployed without opening security vulnerabilities. Finally, we will explore tradeoffs and solutions for maintaining high system throughput and avoiding noisy neighbors without compromising customer resource demands or fairness.
Andrea is a Principal Engineer at Arm, where he leads the Infrastructure Line of Business performance team. He received a PhD from the University of Michigan, Ann Arbor in 2013. Andrea has worked on hardware security, performance modelling, and virtualization architecture, and he is currently leading the effort to analyze and optimize performance for enterprise-class Arm systems.
Sanhong Li, Alibaba
2:50 - 3:30
On Nov 11, 2017, Alibaba smashed its own online transaction record once again. The peak throughput of 325,000 transactions per second is 85% higher than in 2016. Most of these business-critical transactions are handled by hundreds of thousands of Java applications, written in more than a billion lines of code. Alibaba JDK (AJDK) is the engine that runs these eCommerce applications at extreme scale. We have customized OpenJDK since 2011 to run our Java applications on more than 100,000 servers. In this talk, we will discuss how we tailor OpenJDK to our needs and describe how characterization of our Java workloads guided these improvements. While your workloads are different, the thought process we went through could be useful for you.
Sanhong Li is a JVM lead at Alibaba. He has been working on Java since 2004, when he began at the Intel Asia-Pacific R&D Lab implementing JSR 135. He joined IBM in 2008 to improve runtime security on the OSGi platform, and moved to the development of IBM's Java Virtual Machine in 2010, where he led a project to develop multi-tenancy technology for the JVM. In 2014, he joined Alibaba to lead the development of Alibaba JDK, a customized OpenJDK version. Sanhong has presented at local and international conferences such as the JVM Language Summit, JavaOne, and QCon. He co-leads the Shanghai Java User Group and co-chairs APMCon. He has authored over 10 technical papers and a number of technical patents.
Chris Thalinger, Twitter
3:55 - 4:20
Twitter is a massively distributed system with thousands of machines running thousands of JVMs. In any system of similar size, a small change in performance and CPU utilization is multiplied a thousandfold and results in big savings: electricity costs, cooling costs, and possibly a reduction in server-farm size. One way to improve Java performance and reduce CPU utilization is to simply generate better machine code. "Simply" is obviously not trivial, but it is doable. Twitter is going down that road, experimenting with Graal to generate better code and reduce cost.
Chris Thalinger is a software engineer who has been working on Java Virtual Machines for over 13 years. His main expertise is in compiler technology, with Just-In-Time compilation in particular. Initially involved with the CACAO and GNU Classpath projects, he shifted his focus to OpenJDK as soon as Sun made the JDK open source. Ever since, Chris has worked on the HotSpot JVM at Sun, Oracle, and now Twitter.
Hank Hoffman, University of Chicago
4:20 - 4:45
Modern software systems are often equipped with hundreds to thousands of configuration options, many of which greatly affect performance. Unfortunately, properly setting these configurations is challenging for developers due to the complex and dynamic nature of system workload and environment. In this work, we first conduct an empirical study to understand performance-related configurations and the challenges of setting them in the real world. Guided by our study, we design a systematic and general control-theoretic framework, AutoConf, to automatically set and dynamically adjust performance-related configurations to meet required operating constraints while optimizing other performance metrics. Our evaluation shows that AutoConf is effective in solving real-world configuration problems, often providing better performance than even the best static configuration developers can choose under existing configuration systems.
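The control-theoretic framing can be sketched with a simple feedback controller: repeatedly measure the constrained metric, compute the error against the target, and nudge a configuration value in proportion to that error. This is only an illustration of the idea, not AutoConf itself; the "worker count" knob and the latency model below are made up for demonstration.

```python
def control_loop(target_latency, measure, apply_config, init=4,
                 gain=0.5, steps=20):
    """Integral-style controller: adjust a configuration value until the
    measured latency settles at the target."""
    config = init
    for _ in range(steps):
        latency = measure(config)
        error = latency - target_latency
        # Positive error (too slow) -> raise the knob; never drop below 1.
        config = max(1, round(config + gain * error))
        apply_config(config)
    return config

# Toy plant: latency falls as workers grow; 10 ms target implies 8 workers.
measure = lambda workers: 80.0 / workers
final = control_loop(target_latency=10.0, measure=measure,
                     apply_config=lambda c: None)
```

A controller like this adapts when the workload shifts (the measured latency changes, so the knob moves again), which is exactly what a static, hand-picked configuration cannot do.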
Henry (Hank) Hoffmann has been an Assistant Professor in the Department of Computer Science at the University of Chicago since January 2013, where he leads the Self-aware computing group (or SEEC project) and conducts research on adaptive techniques for power, energy, accuracy, and performance management in computing systems. He received the DOE Early Career Award in 2015. He has spent the last 17 years working on multicore architectures and system software in both academia and industry. He completed a PhD in Electrical Engineering and Computer Science at MIT, where his research on self-aware computing was named one of the ten "World Changing Ideas" by Scientific American in December 2011. He received his SM degree in Electrical Engineering and Computer Science from MIT in 2003. As a Master's student he worked on MIT's Raw processor, one of the first multicores. Along with other members of the Raw team, he spent several years at Tilera Corporation, a startup which commercialized the Raw architecture and created one of the first manycores. His implementation of the BDTI Communications Benchmark (OFDM) on Tilera's 64-core TILE64 processor still has the highest certified performance of any programmable processor. In 1999, he received his BS in Mathematical Sciences with highest honors and highest distinction from UNC Chapel Hill.