N. A. Nordbotten and T. Skeie (2007)
A Routing Methodology for Dynamic Fault Tolerance in Meshes and Tori
In: International Conference on High Performance Computing (HiPC), ed. by Srinivas Aluru, Manish Parashar, Ramamurthy Badrinath, Viktor K. Prasanna, pp. 514–527, Springer-Verlag. LNCS 4873
This paper proposes a fully distributed fault-tolerant routing methodology for tori and meshes. It supports a dynamic fault model, which allows the network to remain fully operational at all times. Unlike most previous proposals that support a dynamic fault model, the methodology tolerates concave fault regions and thereby avoids disabling healthy nodes in most practical scenarios. It achieves high network performance through adaptive routing and degrades gracefully in the presence of faults.
