Thursday, April 11, 2013

Solaris: Massive Internet Scalability


[SPARC processor, courtesy Oracle SPARC T5/M5 Kick-Off]
Abstract:
Computing systems started with single processors. As computing requirements increased, multiple processors were lashed together using a technology called SMP (Symmetric Multi-Processing) to add more computing power to a single system, breaking up tasks into processes and threads, but the transition to multi-threaded computing was a long process. The lack of scalability for some problems produced MPP (Massively Parallel Processing) platforms, which lashed systems together using special software to load-balance the jobs to be processed. General purpose applications were very difficult to program for MPP platforms, so massively Multi-Core and Multi-Threaded processors started to appear. Oracle recently released the SPARC T5 processor and systems - producing an SMP platform that scales to many sockets, cores, and threads in a single chassis - leveraging existing multi-threaded computing software, reducing the need for MPP in real-world applications, while placing tremendous pressure upon the Operating System layer.

[SPARC logo, courtesy SPARC.org]
SPARC Growth Rate:
The SPARC processors grew steadily in cores, threads, and sockets, driving a movement to massively threaded software.

SPARC  Cores  GHz  Threads  Sockets  Total-Cores  Total-Threads
T1     8      1.4  32       1        8            32
T2     8      1.6  64       1        8            64
T2+    8      1.6  64       4        32           256
T3     16     1.6  128      4        64           512
T4     8      3    64       4        32           256
T5     16     3.6  128      8        128          1024
M5     6      3.6  48       32       192          1536
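The totals in the table follow from simple multiplication: per-socket cores and threads multiplied by the maximum socket count. A minimal sketch of that arithmetic, using only the figures from the table (the `PLATFORMS` dictionary and `totals` function are illustrative names, not part of any Oracle tool):

```python
# Per-socket figures from the table: (cores, threads, max sockets).
PLATFORMS = {
    "T1": (8, 32, 1),
    "T2": (8, 64, 1),
    "T2+": (8, 64, 4),
    "T3": (16, 128, 4),
    "T4": (8, 64, 4),
    "T5": (16, 128, 8),
    "M5": (6, 48, 32),
}


def totals(cores_per_socket, threads_per_socket, sockets):
    """Return (total cores, total threads) for a fully populated chassis."""
    return cores_per_socket * sockets, threads_per_socket * sockets


for name, (cores, threads, sockets) in PLATFORMS.items():
    total_cores, total_threads = totals(cores, threads, sockets)
    print(f"{name}: {total_cores} cores, {total_threads} threads")
```

Every generation exposes 8 hardware threads per core except the T1, which exposes 4.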

The movement to massively threaded processors meant that applications needed to be re-written to take advantage of the new higher throughput. Certain applications were already well suited for this workload (e.g. web servers) - but many were not.

[DTrace infrastructure and providers]
Application Challenges:
The movement to massively threaded software, to take advantage of the higher overall throughput offered by the new processor technology, was difficult for application programmers. Technologies such as DTrace were added to advanced operating systems such as Solaris to assist developers and systems administrators in pin-pointing their code hot-spots for later re-write.

When the SPARC T4 was released, there was a feature called "Critical Thread API" in the S3 core, to assist application programmers who could not resolve some single-thread bottlenecks. The S3 core could automatically switch into a single-threaded mode (with the sacrifice of throughput) to address hot-spots. The faster S3 core in the T4 (and T5) was also clocked at a higher rate, providing an overall boost to single-threaded workloads over previous processors - even at the same number of cores and threads. The ability to perform out-of-order instruction handling in the S3 also increased the execution speed of single-threaded applications.

The SPARC T4 and T5 processors finally offered application developers a no-compromise processor. For heavy single-threaded workloads, Oracle released the SPARC M5 processor, scaling single-threaded workloads to ever larger systems without having to rely upon systems produced by long-time SPARC partner & competitor - Fujitsu.


[Solaris logo, courtesy Sun Microsystems]
Operating System Challenges:

A single system scaling to 192 cores and 1536 threads poses incredible challenges to Operating System designers. Steve Sistare from Oracle discusses some of these challenges in a Part 1 article and solutions in a Part 2 article. Some of the challenges overcome by Solaris included:
CPU scaling issues include:
•increased lock contention at higher thread counts
•O(NCPU) and worse algorithms
Memory scaling issues include:
•working sets that exceed VA translation caches
•unmapping translations in all CPUs that access a memory page
•O(memory) algorithms
•memory hotspots

Device scaling issues include:
•O(Ndevice) and worse algorithms
•system bandwidth limitations
•lock contention in interrupt threads and service threads
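To make the "lock contention at higher thread counts" item concrete, here is a minimal sketch of one generic mitigation: splitting a single locked counter into independently locked shards. This is a textbook technique shown for illustration only; it is not claimed to be what Solaris does internally, and `ShardedCounter` and `run_workers` are hypothetical names. In CPython the global interpreter lock hides most of the performance effect, so the sketch shows the structure rather than a benchmark.

```python
import threading


class ShardedCounter:
    """Counter split into independently locked shards to spread contention."""

    def __init__(self, shards=16):
        self._locks = [threading.Lock() for _ in range(shards)]
        self._counts = [0] * shards

    def increment(self, amount=1):
        # Each thread maps to one shard, so threads mostly take different locks.
        i = threading.get_ident() % len(self._locks)
        with self._locks[i]:
            self._counts[i] += amount

    def value(self):
        # Reading walks every shard: cheap updates are paid for with an
        # O(number of shards) read.
        total = 0
        for i, lock in enumerate(self._locks):
            with lock:
                total += self._counts[i]
        return total


def run_workers(counter, workers, per_worker):
    """Start `workers` threads that each increment the counter `per_worker` times."""
    def work():
        for _ in range(per_worker):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    counter = ShardedCounter(shards=8)
    run_workers(counter, workers=32, per_worker=1000)
    print(counter.value())
```

The trade-off is visible in `value()`: if the shard count tracks the CPU count, the read becomes an O(NCPU) algorithm, which is the other CPU scaling item in the list above.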
Clearly, the Solaris engineering team at Oracle was up to the tasks created for it by the Oracle SPARC engineering team. Innovation from Sun Microsystems continues under Oracle. It will take years for other Operating System vendors to "catch up".
Network Management Applications:

In the realm of Network Management, many polling applications used threads to scale, since network communication to edge devices was bottlenecked by latency - making the SPARC "T" processors an excellent choice in carrier-based environments.
The data returned by the massively multi-threaded pollers needed to be placed in a database in a consistent fashion. This posed a problem during the device "discovery" process, which is normally single-threaded and experienced massive slow-downs under the "T" processors - until the T4 was released. With processors like the SPARC T4 and SPARC T5, Network Management applications gain the proverbial "best of both worlds": massive hardware thread scalability for pollers and excellent single-threaded throughput during discovery bottlenecks with the "Critical Thread API."
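The reason threads help latency-bound pollers can be shown with a minimal sketch. Everything here is hypothetical: `poll_device` simulates a network round trip with `time.sleep`, the addresses are made up, and no real management protocol is involved.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def poll_device(address, latency=0.05):
    """Hypothetical poll: the sleep stands in for network round-trip latency."""
    time.sleep(latency)
    return address, "up"


def poll_all(addresses, workers=64):
    """Poll every address concurrently and return {address: status}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(poll_device, addresses))


if __name__ == "__main__":
    devices = [f"10.0.0.{n}" for n in range(1, 101)]
    start = time.perf_counter()
    results = poll_all(devices)
    elapsed = time.perf_counter() - start
    print(f"polled {len(results)} devices in {elapsed:.2f}s")
```

Because each thread spends nearly all of its time waiting on the network, the total polling time is governed by the number of concurrent threads rather than by per-thread speed. A discovery step that must write results to a database in a consistent order does not parallelize this way, which is where single-threaded performance matters.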

The latest SPARC platforms are optimal platforms for massive Network Management applications. There is no other platform on the planet which compares to SPARC for managing "The Internet".

Monday, April 1, 2013

SunFire 280R: 3737 Days of Uptime

[SunFire 280R, courtesy codigounix.blogspot.com]

For anyone who has cared for & fed systems - 10 Years of Uptime is phenomenal.



Background:
This platform was located in Hungary. I say was, since it was relocated. This video was taken during the last hours before its relocation, which ended 10 years of uptime. This platform was involved in processing outbound internet-facing traffic. The last of its production traffic load was removed a number of months earlier. Mid-way through the video, a short interlude is shown with a Solaris 11 platform and a ZFS kernel dump - note, this was not the tribute platform in question, since ZFS was not around back when this 280R system was first powered up - this is a Solaris 9 platform. The music used in this tribute video is "Born to Die", performed by Lana Del Rey.
A short post and discussion on Slashdot covered the shutdown and relocation of this system.