Load balance is a machine-specific characteristic, not necessarily a fundamental property of a program. In particular, remote memory access costs cannot be ignored when examining load balance.
Suppose that processor 1's instruction stream contains 50 million (50M) arithmetic operations and 50M remote memory references, while processor 2's instruction stream contains 90M arithmetic operations and 10M remote memory references. Under the single-cycle memory access assumption, each processor executes 100M cycles of work, so the program appears load balanced. On any realistic parallel machine, however, a remote reference costs many cycles, so processor 1 takes considerably longer and the load is substantially imbalanced.
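The arithmetic can be made explicit with a deliberately simple cost model. As an assumption for illustration only, let each arithmetic operation take one cycle and each remote reference take $c$ cycles, ignoring local memory traffic and all other overheads. The estimated execution times of the two processors are then

$$
\begin{aligned}
T_1(c) &= 50\times10^6 + 50\times10^6\,c,\\
T_2(c) &= 90\times10^6 + 10\times10^6\,c.
\end{aligned}
$$

Setting $T_1(c) = T_2(c)$ gives $40\times10^6\,c = 40\times10^6$, i.e. $c = 1$: the two streams are balanced only under the single-cycle assumption. For an assumed remote cost of $c = 100$ cycles, $T_1 = 5050\times10^6$ and $T_2 = 1090\times10^6$, so processor 1 runs roughly 4.6 times longer than processor 2.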
In fact, the single-cycle memory access assumption can lead to other false characterizations of a program. The Splash-2 suite includes an application called Water, which simulates the motion of a system of water molecules. This application can be implemented in two different ways. The first assigns a region of space to each processor and, whenever a molecule moves from one region into another, migrates it to the processor that owns the new region. The second leaves the molecules where they are and uses remote memory references to access them. Under the single-cycle memory assumption both implementations would appear to perform identically, whereas on a real machine one of them could be considerably less efficient.
The bottom line is that load balance cannot be expressed as a single numerical value for a parallel program without reference to the properties of the hardware. A more meaningful approach is to provide a function (or formula) into which the remote access cost can be plugged to yield the load balance of the program on a particular machine.
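Such a function can be sketched in a few lines. This is a minimal illustration, not a definitive model: it assumes one cycle per arithmetic operation, a uniform `remote_cost` in cycles per remote reference, and no other overheads. It reports imbalance as the ratio of the slowest processor's time to the mean time, which is one common metric among several. The names `InstructionMix`, `processor_time`, and `load_imbalance` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionMix:
    """Per-processor instruction counts (hypothetical representation)."""
    arith_ops: int    # assumed to cost 1 cycle each
    remote_refs: int  # assumed to cost `remote_cost` cycles each


def processor_time(mix: InstructionMix, remote_cost: float) -> float:
    """Estimated cycles for one processor under the simple cost model."""
    return mix.arith_ops + mix.remote_refs * remote_cost


def load_imbalance(mixes: list[InstructionMix], remote_cost: float) -> float:
    """Return max/mean processor time; 1.0 means perfectly balanced."""
    times = [processor_time(m, remote_cost) for m in mixes]
    mean = sum(times) / len(times)
    return max(times) / mean


if __name__ == "__main__":
    M = 1_000_000
    # The two instruction streams from the example above.
    mixes = [InstructionMix(50 * M, 50 * M), InstructionMix(90 * M, 10 * M)]
    for cost in (1, 10, 100):
        print(f"remote cost {cost:>3} cycles -> imbalance {load_imbalance(mixes, cost):.2f}")
```

With a remote cost of 1 cycle the imbalance is exactly 1.0, and it grows as the remote cost increases, which is the point: the "load balance" of the program is a function of the machine's remote access cost rather than a single number.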