Wednesday, April 25, 2018

State of The Art - SPARC M8 & Solaris

[SPARC M8 Socket, Courtesy Oracle Datasheet]

State of The Art - SPARC M8 & Solaris

Abstract:

The SPARC processor was developed by Sun Microsystems and had existed since the exit of major systems manufacturers from the Motorola 68K environment. Multiple manufacturers had always existed in this environment, to provide to consumers multiple supply chains in this commodity hardware market. The migration from 32 bit to 64 bit computing in SPARC occurred decades ago, as current computing systems still wrestle with the complexities. The next increment of hardware from Oracle was released September 17, 2017 - once again maintaining the fastest CPU sockets and SMP systems in the world. It is past 7 months since the release, it is long overdue for an article.

[SPARC M8 Block diagram, courtesy Oracle T8/M8 Architecture White Paper]

Block Diagram Layout

A logical block diagram can help the reader gain a better understanding of the heart of the system. That author of the Next Platform publication speculates, how four S7 processors may have been merged onto the same die or silicon, using a switched interconnect encapsulated in silicon.

Overall, it does appear to look like a fully switched quad-socket 32 core 256 thread SMP system, burned onto a single piece of silicon. This is a pretty astounding effort, any way one chooses to examine the diagram!

[SPARC M8 Processor with Oracle & Sun Microsystems SPARC family, Courtesy Oracle]

Architecture Changes

In general, processors have undergone many changes, mostly regarding consolidation of functions. As time progressed, the pattern of consolidating resulted in lower part counts, higher performance, higher reliability. Components were designed, sometimes combined, sometimes re-separated in various architectures:
  • Integer Processing Units
  • Floating Point Processing Units
  • Memory Management Units
  • Processor Socket Cross-Bar Switch
  • 1st level cache
  • 2nd level cache
  • 3rd level cache
  • Network Interfaces
  • Video Processing
  • Encryption Units
  • De-encryption Units
  • Decompression Units
  • [DAX] Database Analytics Acceleration Units
The M8 is no exception... being a server & cloud oriented, video has been de-emphasized for some time & no longer developed actively by Oracle. Previous processors had integrated 10 Gig Ethernet (i.e. Sun Microsystems SPARC T2+) or attempted to integrate Infiniband (i.e. Oracle SPARC "Sonoma" S7) - but Oracle had decided to de-emphasize these commodity features. Oracle, being primarily a software company, concentrated on the most "bang for the buck" with faster cores, encryption, decryption, decompression, and database acceleration units.


[SPARC Solaris Layered Availability Infrastructure, courtesy Oracle SPARC T8/M8 RAS datasheet]

Socket Availability Architecture

The SPARC M8 Processor continues to play a vital role in the Availability Architecture of Solaris systems. With the innermost 4 layers of the onion, features like: parity, error correction, reissue on error in hardware, internal power management, voltage scaling, frequency scaling. Decades of design continue to exhibit robust features unavailable in commodity CPU architectures.


[SPARC Solaris DIMM Sparing Infrastructure, courtesy Oracle SPARC T8/M8 RAS datasheet]

Memory Availability Architecture

Normally, error correction exists on DIMM's chosen for the SPARC Servers. If one of the DIMM's repeatedly experiences errors on a SPARC M8, Solaris will initiate the process of retiring the faulty DIMM. The hardware can interleave memory across all 16 DIMM's, for maximum throughput, and can also facilitate interleaving memory requests across 15 DIMM's, in order to facilitate a DIMM retirement. During retirement, there is a slight decrease in the amount of available memory.


Virtualization Architecture

The SPARC M8 processor works in combination with OpenFirmware in order to implement the lowest levels of virtualization.

Physical Domains, which allows for electrically disconnected domains, is facilitated by the LOM (Lights Out Management) card, allowing for an OpenBoot instance to operate on each physical domain (note: this feature is only available on higher end systems.) There are no particular features within Physical Domains, which specifically benefit from the M8 processor.

Within the Physical Domain, a Hypervisor program is run in the OpenFirmware, allowing for virtual instances of OpenBoot to run for different groups of devices (i.e. console, memory, pci-cards slots, network cards, disk devces.) Different OS's may run inside of these logically separated OpenBoot instances. A Logical Domain can be live-migrated to a different Physical domain or even a different SPARC chassis, taking advantage of M8 enbedded engines at wire-speed such as Encryption, Decryption, and Decompression.

Zone virtualization is also available under Solaris, regardless of underlying architecture. The SPARC M8 brings hardware accelerated encryption, decryption, and uncompression for activities like live migration, similar to the benefits to Logical Domains.



Recent Processor Comparisons

The SPARC M8 processor has certainly advanced in performance over the past few generations of silicon, both from Oracle and Fujitsu. The Register was kind enough to put together a simple chart, illustrating the differences between each of the recent processors, each processor being the fastest socket of it's kind during it's release.
 
[Multivendor SPARC Comparison Chart, courtesy The Register]

Performance Comparisons

As The Register had pointed out, the new core's "wider instruction pipeline... coupled with the increase in clock frequency and the larger L1 code cache" sees to wring out the most recent performance gains from this silicon. These features help to define the 5th generation S5 core, located in the former [1/4 sized] S7 and now M8 processors. Boosting the clock rate nearly 25% from slightly over 4GHz M7 to an industry "unheard of" 5 GHz clearly is beneficial.

[SPARC M8 Security Features, courtesy Oracle Datasheet]

Security Comparisons

The Register also reported important enhancements in security, "it can perform hardware-accelerated cryptographic functions – such as AES, RSA and SHA-512 – at twice the speed of the M7." In addition, the SPARC M8 is the only processor in the industry which supports SHA-3 in silicon. With functions like Random Number Generators and [En|De]Crypto operations in hardware, there are no excuses for lack of security in a data center since there is virtually no cost or processing overhead. Crypto functions operate fast enough  to occur at wire-speed for network with virtually unnoticable CPU overhead, reserving the rest of the system resources for processing business/military data.


[SPARC M8 RDBMS Features, courtesy Oracle Datasheet]

Database & Analytics Comparisons

The massive thread count with 32x cores, 8x threads per core, 32x floating point units are not unusual for the SPARC M8, although these are already unusually large in comparison to other slower commodity & proprietary processors on the market. These significantly boost application & database performance to non-competitive proprietary Intel and other processors.

The SPARC M8 advances on the former "State of the Art" SPARC M7 with larger caches & faster processing for all workloads, including Database & Analytics.

The real secret of this processor lies in the Next-Generation Data Analytics Accelerator (DAX) engines. In-line decompression enables 200%+ increase in data held in memory for processing without performance impact at 120+ GB/second. Performance improvements of up to 700% for in-memory query acceleration is demonstrated! Oracle number processing is now managed through accelerators, short-cutting software routines to directly pass processing of numbers larger than 16 bytes to the hardware for all database workloads, resulting in 1000% increases in performance! Java 8 Streams API can be used to work with data-sets directly via the DAX, eliminating iteration operations [since they are implied] providing 2000% increase in performance on some operation types!

Conclusion

There are no competing processors in the same Integer or Floating Point Processing Class as the Oracle SPARC M8 processor. Full security, with nearly any cryptography library of choice, is no longer a burden with the SPARC M8... there are no security compromises with this processor, outrunning any workload on any similarly configured socket-to-socket configuration, even with full encryption enabled under SPARC. Once one tries to compare database processing or java application  processing, it is quickly observed that many industry players may be a decade or more away in performance of similar workloads. All of this, with the ability to virtualize, with no virtualization performance or monetary tax, makes this an exceptional core to any system in any data center. It will be difficult for any hardware vendor to build any kind of superior next generation processor in the first quarter of the 21st century... the SPARC M8 is the apex of modern day computing.

Tuesday, April 24, 2018

State of The Art - SPARC S7 & Solaris

[SPARC S7 Processor, Courtesy Oracle Data Sheet]

State of The Art - SPARC S7 & Solaris

Abstract:

The SPARC processor was developed by Sun Microsystems and had existed since the exit of major systems manufacturers from the Motorola 68K environment. Multiple manufacturers had always existed in this environment, to provide to consumers multiple supply chains in this commodity hardware market. The migration from 32 bit to 64 bit computing in SPARC occurred decades ago, as current computing systems still wrestle with the complexities. Oracle ceased producing horizontal scaling CPU sockets once purchasing Sun Microsystems. In June 2016, Oracle decided to re-enter the commodity market with the SPARC S7. NetMgt had published an article, but it was lost, and it was decided it was time to re-publish it again.

[Labeled Die Photo, courtesy Oracle Hot Chips 27 Presentation]

S7 "Sonoma" Floor Plan:

It can be clearly seen that the new die photo shows the use of 2x 4 Core Clusters. The cores are nearly the same 4th generation S4 cores, bundled in it's 32 core sister M7 processor. Glueless Coherency links were bundled, to scale an S7 system from 8 to 16 cores, with little external circuitry. With DDR4 memory interfaces on-chip, latency is cut down by eliminating external chips. Database Analytics Accelerators have been included, although not as many per-core as with the larger M7.

Close to 20% of the space used by an Infiniband network interface. The last time integrating network on-board silicon occurred was with the UltraSPARC T2+ (which integrated 10 Gig Ethernet) - but it was not released to customer facing production system.This was a significant disappointment to NetMgt, since an new Blade system with an Infiniband backplane allowing scalability to thousands of sockets would have been a welcome addition for cloud computing.


[Courtesy, Oracle's SPARC S7 Servers Technical Overview]

Architectural Changes:

The S7 is a smaller processor, returns to 8 cores on a die, similar to the T4 from back in 2011, but even outperforms processors from 2013 with higher cache, clock speed, and 4th generation core. It was designed to be a competitive socket (in price/performance) to commodity proprietary CPU's (instead of being the fastest performing socket in the market.)


[Dual-Die Photo, courtesy Oracle Hot Chips 27 Presentation]

Dual Socket Configuration:

The glueless dual-socket configuration allows for outstanding communication speeds between sockets. It is apparent that more bandwidth may be available over PCIe links than over the un-exposed Infiniband. Without the Infiniband exposed, the S7 looks more like an UltraSPARC IIIi.


[Infiniband Performance, courtesy Oracle Hot Chips 27 Presentation]

Unexposed Infiniband Performance:

The unexposed infiniband offered the possibility of significant packet performance improvement, over a PCIe card, even under high load. Note, the red Sonoma IB line mostly maintaining between 20-60 Millions of Packets per second.

A blade chassis connecting all minimal Sonoma "S7" blades (holding memory & 2 sockets) connecting back to infiniband storage in an MPP environment scaling across thousands of nodes could have been an amazing entry into Cloud Computing or Super Computing environments.

Conclusions:

While  NetMgt welcomes the new low-cost "Sonoma" S7 options, we see the form-factor produced as a "game changer" unrealized, by not placing the chip in a socket, exposing Infiniband, and into a chassis form factor which could most leverage it's strength. Such a socket could have easily replaced & out-performed the Intel based Storage subsystem Oracle's Infiniband native Engineered Systems. A such a socket in a blade chassis would have filled-out a gaping hole in Oracle's systems portfolio. Furthermore, the cost of memory is so high in the chassis that the cost difference between SPARC "Sonoma" S7 and SPARC M7 are marginal if selecting a chassis to perform LDom virtualization. It is a beautiful chip, for what it is, but a great opportunity lost.

Thursday, April 19, 2018

HBA Firmware Update Required


HBA Firmware Update Required

Abstract:

When an OS update delivers firmware down to HBA cards, it is occasionally required to perform a manual link reset to enable the firmware This process describes it.

Symptoms:

When  rebooting Solaris 11, one may see the following WARNING messages to the console or /var/adm/messages log
Boot device: disk  File and args:
SunOS Release 5.11 Version 11.3 64-bit
Copyright (c) 1983, 2018, Oracle and/or its affiliates. All rights reserved.
WARNING: emlxs0: Firmware update required.
        (To trigger an update, a manual HBA or link reset using fcadm or emlxadm is required.)
WARNING: emlxs1: Firmware update required.
        (To trigger an update, a manual HBA or link reset using fcadm or emlxadm is required.)
WARNING: emlxs2: Firmware update required.
        (To trigger an update, a manual HBA or link reset using fcadm or emlxadm is required.)
WARNING: emlxs3: Firmware update required.
        (To trigger an update, a manual HBA or link reset using fcadm or emlxadm is required.)

Resolution:

Review the available devices to be reset
sun1801/root# fcinfo hba-port | grep "Device Name"
        OS Device Name: /dev/cfg/c5
        OS Device Name: /dev/cfg/c6
        OS Device Name: /dev/cfg/c10
        OS Device Name: /dev/cfg/c11


and perform the reset
sun1801/root# fcinfo hba-port | nawk '
 /Device Name/ { print "luxadm -e forcelip",$NF }' | ksh -x
+ luxadm -e forcelip /dev/cfg/c5
+ luxadm -e forcelip /dev/cfg/c6
+ luxadm -e forcelip /dev/cfg/c10
+ luxadm -e forcelip /dev/cfg/c11