Tuesday, December 21, 2010

Sun Founders Panel 2006

This video from the Computer History Museum contains an intriguing panel with Sun founders and pioneers Andy Bechtolsheim, Bill Joy, Vinod Khosla, Scott McNealy, and John Gage.

A memorable quote from the session: "Get them on tape before they die." Some of the details surrounding this session are located here. This video is a "must watch" for anyone involved in the technology business.

Technologist David Halko states: predicting the future today requires understanding the past.

Sunday, December 5, 2010

CoolThreads UltraSPARC and SPARC Processors

[UltraSPARC T3 Micrograph]

Processor development takes an immense amount of time to architect a high-performance solution, and an uncanny vision of the future to project market demand and acceptance. In 2005, Sun embarked on a bold path toward many cores and many threads per core. Since the purchase of Sun by Oracle, the internal SPARC road map from Sun has been clarified.

[UltraSPARC T1 Micrograph]
Generation 1: UltraSPARC T1
A new family of SPARC processors was announced by Sun on 2005 November 14.
  • Single die
  • Single socket
  • 64 bits
  • 4, 6, 8 integer cores
  • 4, 6, 8 crypto cores
  • 4 threads/core
  • 1 shared floating point core
  • 1.0 GHz - 1.4 GHz clock speed
  • 279 million transistors
  • 378 mm²
  • 90 nm CMOS (TI)
  • 1 JBUS port
  • 3 Megabyte Level 2 Cache
  • 1 Integer ALU per Core
  • ??? Memory Controllers
  • 6 Stage Integer Pipeline per Core
  • No embedded Ethernet into CPU
  • Crypto Algorithms: ???
The platform was designed as a front-end server for web server applications. With a large number of cores, it was designed to deliver, from a single socket, web-tier performance similar to existing quad-socket systems.

To appreciate how ground-breaking this technology was: at the time, most processors were single core, with an occasional dual-core processor (with cores glued together through a more expensive process referred to as a multi-chip module, driving higher software licensing costs for those platforms).

Generation 2: UltraSPARC T2
The next generation of the CoolThreads processor was announced by Sun in 2007 August.
  • Single die
  • Single Socket
  • 64 bits
  • 4, 6, 8 integer cores
  • 4, 6, 8 crypto cores
  • 4, 6, 8 floating point units
  • 8 threads/core
  • 1.2 GHz - 1.6 GHz clock speed
  • 503 million transistors
  • 342 mm²
  • 65 nm CMOS (TI)
  • 1 PCI Express port (1.0 x8)
  • 4 Megabyte Level 2 Cache
  • 2 Integer ALU per Core
  • 4x Dual Channel FBDIMM DDR2 Controllers
  • 8 Stage Integer Pipeline per Core
  • 2x 10 GigabitEthernet on-CPU ports
  • Crypto Algorithms: DES, Triple DES, AES, RC4, SHA1, SHA256, MD5, RSA-2048, ECC, CRC32
This processor was designed for more compute-intensive requirements and highly efficient network capacity. The platform made an excellent front-end server for applications as well as Middleware, with the ability to do 10 Gigabit wire-speed encryption with virtually no CPU overhead.

Competitors started to build single-die dual-core CPUs, and produced quad-core processors by gluing dual-core processors into a Multi-Chip Module.

[UltraSPARC T2 Micrograph]
Generation 3: UltraSPARC T2+
Sun quickly released the first CoolThreads SMP capable UltraSPARC T2+ in 2008 April.
  • Single die
  • 1-4 Sockets
  • 64 bits
  • 4, 6, 8 integer cores
  • 4, 6, 8 crypto cores
  • 4, 6, 8 floating point units
  • 8 threads/core
  • 1.2 GHz - 1.6 GHz clock speed
  • 503 million transistors
  • 342 mm²
  • 65 nm CMOS (TI)
  • 1 PCI Express port (1.0 x8)
  • 4 Megabyte Level 2 Cache
  • 2 Integer ALU per Core
  • 2x? Dual Channel FBDIMM DDR2 Controllers
  • 8? Stage Integer Pipeline per Core
  • No embedded Ethernet into CPU
  • Crypto Algorithms: DES, Triple DES, AES, RC4, SHA1, SHA256, MD5, RSA-2048, ECC, CRC32
This processor allowed the T processor series to move from the Tier 0 web engines and Middleware to the Application tier. Architects started to understand the benefits of this platform entering the Database tier. This was the first CoolThreads processor to scale past 1 and up to 4 sockets.

By this time, the competition really started to understand that Sun had properly predicted the future of computing. The drive toward single-die quad-core chips had started, with hex-core Multi-Chip Modules being predicted.

Generation 4: SPARC T3
The market became nervous when Oracle purchased Sun. The first Oracle-branded, SMP-capable CoolThreads processor, the SPARC T3, was launched in 2010 September.
  • Single die
  • 1-4 Sockets
  • 64 bits
  • 16 integer cores
  • 16 crypto cores
  • 16 floating point units
  • 8 threads/core
  • 1.67 GHz clock speed
  • ??? million transistors
  • 377 mm²
  • 40 nm
  • 2x PCI Express port (2.0 x8)
  • 6 Megabyte Level 2 Cache
  • 2 Integer ALU per Core
  • 4x DDR3 SDRAM Controllers
  • 8? Stage Integer Pipeline per Core
  • 2x 10 GigabitEthernet on-CPU ports
  • Crypto Algorithms: DES, 3DES, AES, RC4, SHA1, SHA256/384/512, Kasumi, Galois Field, MD5, RSA to 2048 key, ECC, CRC32
This processor was more than what the market was anticipating from Oracle. It took all the features of the T2 and T2+ and combined them into the new T3, with an increase in overall features. No longer did the market need to choose between multiple sockets or embedded 10 GigE interfaces - this chip has it all, plus double the cores.

Immediately before this release, the competition was releasing single-die hex-core CPUs, and octal-core CPUs built as multi-chip modules by gluing dies together. The T3 was a substantial upgrade over the competition by offering double the cores on a single die.
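The generational jump is easiest to see as hardware threads per socket. The following is a minimal sketch that multiplies the maximum core count by the threads per core from the spec lists above; the table layout and function names are illustrative only, and the per-system figure simply assumes the maximum socket count listed for each generation.

```python
# Maximum figures copied from the spec lists above:
# (name, max cores per socket, threads per core, max sockets)
GENERATIONS = [
    ("UltraSPARC T1", 8, 4, 1),
    ("UltraSPARC T2", 8, 8, 1),
    ("UltraSPARC T2+", 8, 8, 4),
    ("SPARC T3", 16, 8, 4),
]


def threads_per_socket(cores, threads_per_core):
    """Hardware threads exposed by one fully populated socket."""
    return cores * threads_per_core


def summarize(generations=GENERATIONS):
    """Return (name, threads per socket, threads per max-size system)."""
    rows = []
    for name, cores, tpc, sockets in generations:
        per_socket = threads_per_socket(cores, tpc)
        rows.append((name, per_socket, per_socket * sockets))
    return rows


if __name__ == "__main__":
    for name, per_socket, per_system in summarize():
        print(f"{name:15s} {per_socket:4d} threads/socket "
              f"{per_system:4d} threads/system (max)")
```

Under these figures a single T3 socket exposes four times the hardware threads of the original T1.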

Generation 5: SPARC T4
Oracle indicated in December 2010 that they had thousands of these processors in the lab and predicted this processor would be released at the end of 2011.

After the announcement, a separate press release indicated processors will have a renovated core, for higher single threaded performance, but the socket will offer half the cores.

Most vendors are projected to have 8-core processors available (through Multi-Chip Modules) by the time the T4 is released, but only the T4 should have its cores on a single piece of silicon during this period.

[2010-12 SPARC Solaris Roadmap]
Generation 6: SPARC T5

Some details on the T5 were announced with the T4. Processors will use the renovated T4 core, with a 28 nm process. This will return to 16 cores per socket. This processor may be the first CoolThreads T processor able to scale from 1-8 sockets. It is projected to appear in early 2013.

Some vendors are projecting to have 12-core processors on the market using Multi-Chip Module technology, but when the T5 is released, it should still be the market leader at 16 cores per socket.

Network Management Connection

Consolidating most network management stations in a globalized environment works very well with the CoolThreads T-Series processors. Consolidating multiple slower SPARC platforms onto single and double socket T-Series systems has worked well over the past half decade.

While most network management polling engines will scale linearly with these highly-threaded processors, there are some operations which are bound to single threads. These types of operations include event correlation, startup time, and synchronization after a discovery in a large managed topology.
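The effect of those single-threaded operations can be estimated with Amdahl's law. The following is a rough sketch, not a measurement: the 10% serial fraction is a purely hypothetical value chosen for illustration, and the thread counts are the per-socket figures implied by the T1, T2, and T3 specifications.

```python
def amdahl_speedup(serial_fraction, threads):
    """Ideal speedup when serial_fraction of the work runs on one thread.

    Amdahl's law: speedup = 1 / (s + (1 - s) / n)
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)


if __name__ == "__main__":
    # Hypothetical workload: 10% of the work (event correlation, startup,
    # post-discovery synchronization) is bound to a single thread.
    for threads in (32, 64, 128):
        print(threads, "threads:", round(amdahl_speedup(0.10, threads), 2))
```

With any fixed serial fraction, adding threads yields diminishing returns, which is why faster per-thread execution matters for these workloads.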

The market will welcome the enhanced T4 processor core and the T5 processor when they are released.

Friday, December 3, 2010

Scalable Highest Performing Clusters at Value Pricing

Oracle presented another milestone achievement in their 5 year SPARC/Solaris road map with Fujitsu. John Fowler stated: "Hardware without Software is a Door-Stop, Solaris is the gateway."

The following is a listing of my notes from the two sessions. The notes have been combined, with Larry Ellison outlining the high level and John Fowler presenting the lower-level details. Highlights: the SPARC T3 is setting world-record benchmarks, there are new T3-based integrated products, Oracle's Sun/Fujitsu M-Series gets a speed bump, and the SPARC T4 is on the way.

Presentation Notes:

New TpmC Database OLTP Performance
  • SPARC Top cluster performance
  • SPARC Top cluster price-performance
  • (turtle)
    HP Superdome Itanium 4 Million Transactions/Minute
  • (stallion)
    IBM POWER7 Power 780 10 Million Transactions/Minute
    (DB2 clustered through custom applications)
  • Uncomfortable 4 months for Oracle, when IBM broke the Oracle record
  • (cheetah)
    Sun SPARC 30 Million Transactions/Minute
    (standard off-the-shelf Oracle running RAC)
  • Oracle/Sun performance benchmark => ( IBM + HP ) x 2 !
  • Sun to IBM Comparison:
    3x OLTP Throughput, 27% better Price/Performance, 3.2x faster response time
  • Sun to HP Comparison:
    7.4x OLTP Throughput, 66% better Price/Performance, 24x compute density
  • Sun Supercluster:
    108 sockets, 13.5 TB Memory, Infiniband 40 Gigabit link, 246 Terabytes Flash, 1.7 Petabytes Storage, 1 Quadrillion rows, 43 Trillion transactions per day, 0.5 sec avg response
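The throughput comparisons above can be sanity-checked from the rounded figures in these notes. This is a small sketch using only the 4, 10, and 30 million transactions/minute values as written; the dictionary keys and function names are illustrative. The rounded figures give 7.5x against HP rather than the 7.4x quoted, presumably because the notes round the published results.

```python
# Rounded throughput figures from the notes, in millions of transactions/minute.
TPM_MILLIONS = {
    "HP Superdome Itanium": 4,
    "IBM Power 780": 10,
    "Sun SPARC cluster": 30,
}


def throughput_ratio(system, baseline):
    """How many times the throughput of `baseline` does `system` deliver?"""
    return TPM_MILLIONS[system] / TPM_MILLIONS[baseline]


def combined_doubled(*systems):
    """Twice the summed throughput of the named systems, in millions/minute."""
    return 2 * sum(TPM_MILLIONS[name] for name in systems)


if __name__ == "__main__":
    sun = "Sun SPARC cluster"
    print("Sun vs IBM:", throughput_ratio(sun, "IBM Power 780"))
    print("Sun vs HP: ", throughput_ratio(sun, "HP Superdome Itanium"))
    print("(IBM + HP) x 2 =",
          combined_doubled("IBM Power 780", "HP Superdome Itanium"),
          "million, against", TPM_MILLIONS[sun], "million for Sun")
```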

New Gold Release
  • Gold Standard Configurations are kept in the lab
  • What the customer has, the support organization will have assembled in the lab
  • Oracle, Sun, Cisco, IBM will all keep their releases and bug fixes in sync with releases

SPARC Exalogic Elastic Cloud
  • Designed to run Middleware
  • New T3 processor based
  • 100% Oracle Middleware is Pure Java
  • Tuned for Java and Oracle Fusion Middleware
  • Load-balances with elasticity
  • Ships Q1 2011
  • T3-1B SPARC Compute Blades based
    30 Compute Servers, 16 cores/server, 3.8 TB RAM, 960 GB mirrored flash disks, 40 TB SAS Storage, 4 TB Read Cache, 72 GB Write Cache, 40 Gb/sec Infiniband, 10 GigE to Datacenter

SPARC Supercluster
  • New T3 processor based and M processor based
  • T3-2 = 2 nodes, 4 CPU's, 64 cores/512 threads, 0.5 TB RAM, 96 TB HDD ZFS, 1.7TB Write Flash, 4TB Read Flash, 40 Gbit Infiniband
  • T3-4 = 3 nodes, 12 CPU's, 192 cores/1536 threads, 1.5 TB RAM, 144 TB HDD ZFS, 1.7TB Write Flash, 4TB Read Flash, 40 Gbit Infiniband
  • M5000 = 2 nodes, 16 CPU's, 64 core/128 threads, 1 TB RAM, 144 TB HDD ZFS, 1.7TB Write Flash, 4TB Read Flash, 40 Gbit Infiniband
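The CPU, core, and thread totals in the three configurations follow from simple multiplication. Below is a minimal sketch under stated assumptions: the sockets-per-node values (2, 4, and 8) are inferred by dividing the CPU counts above by the node counts, the per-socket figures come from the T3 and SPARC64 VII+ notes in this post, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    name: str
    nodes: int
    sockets_per_node: int   # inferred: CPUs divided by nodes
    cores_per_socket: int
    threads_per_core: int

    @property
    def cpus(self):
        return self.nodes * self.sockets_per_node

    @property
    def cores(self):
        return self.cpus * self.cores_per_socket

    @property
    def threads(self):
        return self.cores * self.threads_per_core


CONFIGS = [
    ClusterConfig("T3-2", nodes=2, sockets_per_node=2,
                  cores_per_socket=16, threads_per_core=8),
    ClusterConfig("T3-4", nodes=3, sockets_per_node=4,
                  cores_per_socket=16, threads_per_core=8),
    ClusterConfig("M5000", nodes=2, sockets_per_node=8,
                  cores_per_socket=4, threads_per_core=2),
]

if __name__ == "__main__":
    for cfg in CONFIGS:
        print(f"{cfg.name}: {cfg.cpus} CPUs, "
              f"{cfg.cores} cores / {cfg.threads} threads")
```

The computed totals match the figures listed for each configuration.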

T3 Processor in production
  • Released already, performing in these platforms
  • 1-4 processors in a platform
  • 16 cores/socket, 8 threads/core
  • 16 crypto-engines/socket
  • More cores, threads, 10 GigE on-chip, more crypto engines

T4 Processor in the lab!
  • Thousands under test in the lab, today
  • To be released next year
  • 1-4 processors
  • 8 cores/socket, 8 threads/core
  • faster per-thread execution

M3 Processor from Fujitsu
  • 1-64 SPARC64 VII+ Processors
  • 4 cores, 2 threads/core
  • Increased CPU frequency
  • Double cache memory
  • 2.4x performance of original SPARC64 VI processor
  • VII+ boards will slot into the VI and VII board chassis

Flash Optimization
  • Memory hierarchy with software awareness

  • Appropriate for High Performance Computing
  • Dramatically better performance than Ethernet for linking servers to servers & storage

New Solaris 11 Release

  • Next Generation Networking
    re-engineered network stack
    low latency high bandwidth protocols
  • Cores and Threads Scale
    Adaptive Thread and Memory Placement
    10,000s of cores & threads
    thread observability with DTrace

  • Memory Scale
    Dynamic optimization for large memory configs
    Advanced memory placement
    VM system for 1000s of TB memory configs

  • I/O Performance
    Enhanced NUMA I/O framework
    Auto-Discovery of NUMA architecture
    I/O resources co-located with CPU for scale/performance

  • Data Scale
    ZFS Massive storage for massive datasets

  • Availability
    Boot times in seconds
    Minimized OS Install
    Risk-Free Updates with lightweight boot and robust package dependency
    Extensive Fault Management with Offline failing components
    Application Service Management with Restart failed applications and associated services quickly

  • Security
    Secure by default
    Secure boot validated with onboard Trusted Platform Module
    Role Based Root Access
    Encrypted ZFS datasets
    Accelerated Encryption with hardware encryption support

  • Trusted Solaris Extensions
    Dataset labels for explicit access rules
    IP labels for secure communication

  • Virtualization
    Network Virtualization to add to Server and Storage Virtualization
    Network Virtualization includes Virtual NIC's and Virtual Switches

SPARC Supercluster Architecture
  • Infiniband is 5x-8x faster than most common Enterprise interconnects
    Infiniband has been leveraged with storage and clustering in software
  • Flash is faster than Rotating Media
    Integrated into the Memory AND Storage Hierarchy

SPARC 5 Year Roadmap
  • SPARC T3 delivered in 2010
  • SPARC64 VII+ delivered in 2010
  • Solaris 11 and SPARC T4 to be delivered in 2011

Next generation of mission critical enterprise computing
  • Engineer software with hardware products
  • Deliver clusters for general purpose computing
  • Enormous levels of scale
  • Built in virtualization
  • Built in Security
  • Built in management tools
  • Very Very high availability
  • Tested with Oracle software
  • Supported with Gold Level standard
  • Customers spend less time integrating and start delivering services on systems engineered with highest performance components