In this report we examine the relevance of the benchmark study of the Splash-2 suite to building shared memory parallel machines. A benchmark suite, or a study of one, can claim importance on several different grounds. For example, the application benchmarks in the NAS suite represent a sample of the workload of the machines at NASA Ames Research Center. They are indicative of the types of actual data movement and computation required in state-of-the-art fluid dynamics application codes. The NAS suite therefore serves the purpose of benchmarking a new parallel machine against the needs of application programmers working mainly on aerospace problems. The purpose of the Splash-2 application study, in contrast, is not to propose a set of benchmarks that could be used as a litmus test for evaluating new machines. Instead, the Splash-2 study claims to quantitatively characterize the programs in the suite in terms of certain machine-independent fundamental properties, which are key to providing insight into the architectural requirements of the programs. The study also claims to propose a methodology for studying application codes with the sole purpose of designing new machines. In this report, we examine whether the Splash-2 study succeeds in these claims.
Our basic observation is that the Splash-2 study succeeds in its working set analysis and gives a very thorough analysis of the cache requirements of the Splash programs. However, its analysis of the load balance and communication properties of the programs has some basic flaws. More specifically, it errs in at least the following ways. Certain metrics used in the study, such as the number of bytes communicated per instruction, do not characterize the communication requirements of the programs. In certain ways such a metric is even misleading, since two programs with very different behavior and performance can share the same data point. Elsewhere, the study mixes a number of independent parameters, which prevents us from determining the actual reason for some observed program behavior. As a result, the study falls short of providing a fail-proof methodology for examining parallel applications.
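To make the point about bytes communicated per instruction concrete, the following is a minimal sketch of one way two programs can share the same data point while behaving very differently. It is an illustration under stated assumptions, not a reconstruction of the study's measurements or of the argument developed in later sections: the two phase traces are invented, and the "peak" function is a hypothetical stand-in for whatever finer-grained measure one might prefer.

```python
# Illustrative sketch only: the traces below are hypothetical, not Splash-2 data.
# Each trace is a list of (instructions_executed, bytes_communicated) per phase.

def bytes_per_instruction(trace):
    """Aggregate metric: total bytes communicated divided by total instructions."""
    total_instructions = sum(instr for instr, _ in trace)
    total_bytes = sum(nbytes for _, nbytes in trace)
    return total_bytes / total_instructions

def peak_bytes_per_instruction(trace):
    """Hypothetical finer-grained measure: the highest per-phase ratio."""
    return max(nbytes / instr for instr, nbytes in trace)

# Hypothetical program A: communication spread evenly over four phases.
uniform = [(1000, 100), (1000, 100), (1000, 100), (1000, 100)]

# Hypothetical program B: same totals, but all communication in the last phase.
bursty = [(1000, 0), (1000, 0), (1000, 0), (1000, 400)]

if __name__ == "__main__":
    for name, trace in (("uniform", uniform), ("bursty", bursty)):
        print(name,
              "aggregate:", bytes_per_instruction(trace),
              "peak:", peak_bytes_per_instruction(trace))
```

Both traces yield the same aggregate ratio, yet the second places four times the per-phase demand on the communication system in its final phase, which the aggregate figure alone cannot reveal.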
In sections 2 and 3, we discuss some of the shortcomings of the study. In section 4, we propose an alternative model and methodology for capturing the intrinsic properties of parallel programs.