The speedup curves for some of the applications deviate considerably from linear speedup. The reason given for this behavior is that the data sets used for the simulation runs were small, so the startup costs of these applications occupy a substantial portion of the running time. The Splash-2 researchers also claim that performance would scale better with larger data sets. An experiment that would support this claim, and make the methodology sounder, would be to separately measure the time spent in the sequential and parallel sections of each program, as sketched below.
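A minimal sketch of such instrumentation is shown here. It is not taken from the SPLASH-2 sources; the phase names (sequential_init, parallel_compute) and the two-phase structure are assumptions made purely for illustration.

    /* Hypothetical instrumentation: time the sequential and parallel phases
     * separately so their contributions can be reported alongside the
     * speedup curves. */
    #include <stdio.h>
    #include <sys/time.h>

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec * 1e-6;
    }

    /* Placeholders for the application's own phases (assumed structure). */
    static void sequential_init(void)  { /* read input, build data structures */ }
    static void parallel_compute(void) { /* fork PROCS workers and join them  */ }

    int main(void)
    {
        double t0 = now();
        sequential_init();
        double t1 = now();
        parallel_compute();
        double t2 = now();

        printf("sequential time: %.3f s\n", t1 - t0);
        printf("parallel time:   %.3f s\n", t2 - t1);
        return 0;
    }

Reporting these two times for each data-set size would directly show whether the sequential startup cost shrinks relative to the parallel work as the problem grows.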
The available concurrency can fall short of the ideal PROCS-fold
parallelism because of sequential work replicated across all
processors, time spent in locks and mutually exclusive sections of the
code, or a lack of sufficient parallelism throughout the computation.
Reporting the time spent in the sequential sections alongside the
speedup characteristics of an application gives a better understanding
of the parallel program. In particular, a program that has only
(0.8*PROCS) tasks in its parallel section could be distinguished from a
program that spends 20% of its time in sequential code fragments but
has PROCS-fold parallelism in its parallel phase; the comparison below
illustrates how differently the two scale.
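The following worked comparison makes the distinction concrete. The numbers are illustrative assumptions: the first case assumes perfectly balanced tasks, so its speedup is simply 0.8*PROCS and keeps growing at 80% efficiency, while the second case follows Amdahl's law and saturates at 1/0.2 = 5 regardless of PROCS.

    /* Illustrative comparison of the two programs described above. */
    #include <stdio.h>

    int main(void)
    {
        for (int procs = 2; procs <= 64; procs *= 2) {
            /* Case A: all work is parallel, but only 0.8*PROCS equal-sized
             * tasks exist, so speedup grows as 0.8*PROCS. */
            double speedup_a = 0.8 * procs;

            /* Case B: 20% sequential, 80% perfectly parallel (Amdahl's law). */
            double speedup_b = 1.0 / (0.2 + 0.8 / procs);

            printf("PROCS=%2d  limited parallelism: %5.1f   20%% sequential: %4.2f\n",
                   procs, speedup_a, speedup_b);
        }
        return 0;
    }

At 64 processors the first program would still reach a speedup of about 51, while the second is already pinned near 5; a single aggregate speedup curve cannot tell these two situations apart, but separating the sequential time can.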