Background Radiation and Sun's E-Cache Crisis of 1999
Abstract:
As the density of circuits increases, features get smaller; as frequencies increase, voltages get lower. These trends combine to reduce the amount of charge used to represent a bit, increasing the sensitivity of memory to background radiation. For example, the original UltraSPARC-I processor ran at 143MHz and had a 256KB e-cache (external cache). The cache design used simple byte parity to protect the data, which was sufficient as the amount of charge used to hold a bit was large enough that an ionizing particle would drain off only a small amount, not enough to flip a bit.
When this design was scaled up in the UltraSPARC-II processor to run at 400MHz with an 8MB e-cache, however, the amount of charge used to hold a bit was so small that background radiation would easily flip bits, producing on average one flipped bit per processor per year. While that might not seem like a high rate, a customer with 12 systems of 32 processors each would on average experience one failure a day. This is what led to Sun's infamous e-cache parity crisis of 1999...
For the story and Steve Chessin's contribution to the solution, see the ACM [HTML|PDF] publication on this. His Sun/Oracle blog retaining this article from 2010 August.
No comments:
Post a Comment