
Interconnect Simulation Effort

 

In the simulation studies reported earlier, we could model latency through the interconnect in one of two ways: at a very fine granularity using a flit-by-flit model, or with a much faster but less accurate estimate based on the mean and variance of latency measured with the accurate model. Unfortunately, the accurate network model, being so fine grained, slowed our simulations by an order of magnitude, so in practice we could use it only to calculate the mean and variance for the simpler network model. We found this unsatisfying, because the amount of contention, and thus the communication latency, in a real system depends heavily on the communication patterns and the dynamic load on the interconnect, which makes any random distribution prone to large errors.

Therefore, we have reimplemented our network simulation model to work at a coarser chunk-by-chunk granularity, which essentially aggregates flits (bytes) into chunks whenever possible. Doing so introduces some inaccuracy, because the chunk-level model can only approximate Myrinet's flow control mechanism. However, successive refinement of our chunk-level interconnect model has resulted in an implementation that is much faster than the original flit-by-flit simulation with little loss of accuracy.

Figure 2 illustrates the relative accuracy and performance of the flit-by-flit and chunk-by-chunk simulation models. In this analysis, we simulated an 8x8 Myrinet mesh connecting 32 processors. We modeled a propagation delay of 1 cycle and a fall-through latency of 38 cycles (1 cycle = 10 ns). We drove the two models with a synthetic message generator that produces uniformly distributed random traffic and arrival times, using two message sizes: 32 bytes, representing control messages, and 256 bytes, representing data messages. We compared the mean and standard deviation of network latency and of time spent blocked in the network due to congestion.
The measured difference in latency between the two models is less than 5%. In return for this slight inaccuracy, the chunk-level simulation is up to 30 times faster than the flit-level simulation. We are continuing to refine the network model to improve its accuracy, but the current model is fast enough to replace the simple mean/variance model.
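To make the source of the speedup concrete, the following is a minimal sketch of the idea of aggregating flits into chunks. It is not the Avalanche simulator. It assumes a single message on a contention-free path, one byte per flit, a link that carries one flit per cycle, and the 1-cycle propagation delay and 38-cycle fall-through latency quoted above. The function names, the hop count, and the chunk size are hypothetical. Because nothing blocks in this sketch, both models report the same latency; the inaccuracy discussed above comes from approximating flow control under contention, which is not modeled here.

```python
def flit_level(n_flits, hops, fall_through=38, prop=1):
    """Advance every flit through every hop: one event per flit per hop.

    Returns (cycle at which the last flit arrives, number of events).
    Assumes no contention and a link rate of one flit per cycle.
    """
    events = 0
    ready = list(range(n_flits))  # source injects one flit per cycle
    for _ in range(hops):
        link_free = 0
        for i in range(n_flits):
            start = max(ready[i], link_free)
            link_free = start + 1              # link busy for one cycle
            ready[i] = start + fall_through + prop
            events += 1
    return ready[-1], events


def chunk_level(n_flits, hops, chunk_size, fall_through=38, prop=1):
    """Advance whole chunks instead of flits: one event per chunk per hop.

    chunk_size is an assumed tuning parameter, not a value from the text.
    """
    events = 0
    sizes = [min(chunk_size, n_flits - s) for s in range(0, n_flits, chunk_size)]
    ready, t = [], 0
    for size in sizes:                         # injection time of each chunk's first flit
        ready.append(t)
        t += size
    for _ in range(hops):
        link_free = 0
        for k, size in enumerate(sizes):
            start = max(ready[k], link_free)
            link_free = start + size           # link busy for the whole chunk
            ready[k] = start + fall_through + prop
            events += 1
    # the last flit trails the first flit of the final chunk by size - 1 cycles
    return ready[-1] + sizes[-1] - 1, events


if __name__ == "__main__":
    print(flit_level(256, hops=4))
    print(chunk_level(256, hops=4, chunk_size=32))
```

For a 256-byte message over 4 hops, the flit-level loop processes 1024 events while the chunk-level loop processes 32, which is where the reduction in simulation time would come from in a discrete-event simulator.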

Figure 2 : Comparison of accuracy and efficiency between the flit-level and chunk-level models
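The accuracy comparison described above can be expressed as a relative difference in the mean and standard deviation of per-message latency. The sketch below shows one way to compute it from two lists of latency samples; the function names and the dictionary keys are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def relative_difference(reference, approx):
    """Return |approx - reference| / |reference|, treating 0 vs 0 as no difference."""
    if reference == 0:
        return 0.0 if approx == 0 else float("inf")
    return abs(approx - reference) / abs(reference)


def compare_models(flit_latencies, chunk_latencies):
    """Compare chunk-level latency samples against flit-level samples.

    Both arguments are non-empty sequences of per-message latencies in cycles.
    The flit-level samples are taken as the reference.
    """
    return {
        "mean_rel_diff": relative_difference(mean(flit_latencies),
                                             mean(chunk_latencies)),
        "std_rel_diff": relative_difference(pstdev(flit_latencies),
                                            pstdev(chunk_latencies)),
    }
```

Under these assumptions, a `mean_rel_diff` below 0.05 would correspond to the "less than 5%" latency difference reported in the text. The same function could be applied to samples of time spent blocked in the network.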
 


This work was sponsored by the Space and Naval Warfare Systems Command (SPAWAR) and the Advanced Research Projects Agency (ARPA), Communication and Memory Architectures for Scalable Parallel Computing, ARPA order #B990 under SPAWAR contract #N00039-95-C-0018.
Back to the Avalanche Project Home Page, or Computer Science Department Home Page.
Feedback to <avalanche@jensen.cs.utah.edu>.
