
Wednesday, August 29, 2012

ZFS: A Multi-Year Case Study in Moving From Desktop Mirroring (Part 4)

Abstract:
ZFS was created by Sun Microsystems to reinvent the storage subsystem of computing systems, greatly expanding capacity and data integrity while collapsing the formerly striated layers of storage (i.e. volume managers, file systems, RAID, etc.) into a single layer, in order to deliver capabilities that would normally be very complex to achieve. One such innovation introduced in ZFS was the ability to dynamically add disks to an existing storage pool, remove the old disks, and dynamically expand the pool for file system usage. This paper discusses the upgrade of high-capacity yet low-cost mirrored external media under ZFS.

Case Study:
A particular Media Design House had formerly used multiple mirrored external drives on desktops, as well as racks of archived optical media, to meet its storage requirements. A pair of (formerly high-end) 400 Gigabyte Firewire drives lost a member; an additional pair of (formerly high-end) 500 Gigabyte Firewire drives lost a member within a month. Meanwhile, a media wall of CD's and DVD's was becoming cumbersome to retain.

First Upgrade - Migration to Solaris:
A newer version of Solaris 10 was released, which included more recent features. The Media House was pleased to adopt Update 8, with the possibility of supporting a Level 2 ARC for increased read performance and an Intent Log for increased write performance. A 64-bit PCI card supporting gigabit ethernet was added to the desktop SPARC platform, serving mirrored 1.5 Terabyte "green" disks over "green" gigabit ethernet switches. The Media House determined this configuration performed adequately.
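Had the Level 2 ARC been adopted, attaching a read cache to the pool is a one-line operation. A minimal sketch, assuming the pool name zpool2 used later in this study and a hypothetical SSD at c6t0d0:

Ultra60-root$ zpool add zpool2 cache c6t0d0    # dedicate an SSD as Level 2 ARC
Ultra60-root$ zpool iostat -v zpool2 5         # watch per-device (and cache) utilization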


ZIL Performance Testing:
Testing was performed to determine the benefit of leveraging a new ZFS feature called the ZFS Intent Log, or ZIL. Testing was done across consumer-grade USB SSD's in different configurations. It was determined that any flash device could be utilized for the ZIL to gain a performance increase, but an enterprise-grade SSD provided the best improvement: about 20% under the commonly used throughput load of large file writes going to the mirror. It was decided at that point to hold off on the SSD's, since performance was already adequate.
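For reference, the ZIL testing above amounts to attaching and detaching a dedicated log device. A minimal sketch with a hypothetical SSD at c7t0d0 (log device removal additionally requires a sufficiently recent pool version):

Ultra60-root$ zpool add zpool2 log c7t0d0      # dedicate an SSD to the intent log
Ultra60-root$ zpool remove zpool2 c7t0d0       # later removal, where the pool version permits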

Second Upgrade - Drives Replaced:
One of the USB drives had exhibited odd behavior from the time it was purchased, but it was decided the drives behaved well enough under ZFS mirroring. Eventually, that drive started to perform poorly and logged occasional errors. When the drives were nearly out of capacity, the pool was upgraded from a 1.5 TB mirror to a 2 TB mirror, as sketched below.
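The upgrade follows the standard ZFS mirror-growth pattern: replace one member at a time and let each resilver complete; once both members are the larger size, the pool can present the added capacity. A hedged sketch, with hypothetical device names:

Ultra60-root$ zpool replace zpool2 c3t0d0 c5t0d0   # swap one 1.5 TB member for a 2 TB drive
Ultra60-root$ zpool status zpool2                  # wait for the resilver to complete
Ultra60-root$ zpool replace zpool2 c4t0d0 c6t0d0   # then swap the second member

On older pool versions the added capacity appears after an export/import; newer releases expose an "autoexpand" pool property.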

Third Upgrade - SPARC Upgraded:
The Ultra60 desktop was being moved to a new location in the media house. A PM (preventive maintenance) was conducted to remove dust, but the Ultra60 did not boot in the new location. It was time to move the storage to a newer server.

The old Ultra60 was a nice unit, with 2 Gigabytes of RAM and dual 450MHz UltraSPARC II CPU's, but it did not offer some of the features of modern servers. An updated V240 platform was chosen: dual 1.5GHz UltraSPARC IIIi CPU's, 4 Gigabytes of RAM, redundant power supplies, and an upgraded UPS.

Listing the Drives:

After booting the new system and attaching the USB drives, a general "disks" command was run to force a discovery of the drives. Whether this step is strictly required is debatable, but it is one that seasoned system administrators habitually perform.
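For what it is worth, on Solaris 10 the devfsadm utility performs the same discovery as the legacy "disks" script; either is harmless to run:

V240/root$ devfsadm        # build /dev links for newly attached storage
V240/root$ devfsadm -C     # also prune any stale device links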

Listing the drives is simple with an "ls" of the raw device links:
V240/root$ ls -la /dev/rdsk/c*0
lrwxrwxrwx 1 root root 46 Jan  2  2010 /dev/rdsk/c0t0d0s0 -> ../../devices/pci@1e,600000/ide@d/sd@0,0:a,raw
lrwxrwxrwx 1 root root 47 Jan  2  2010 /dev/rdsk/c1t0d0s0 -> ../../devices/pci@1c,600000/scsi@2/sd@0,0:a,raw
lrwxrwxrwx 1 root root 47 Jan  2  2010 /dev/rdsk/c1t1d0s0 -> ../../devices/pci@1c,600000/scsi@2/sd@1,0:a,raw
lrwxrwxrwx 1 root root 47 Mar 25  2010 /dev/rdsk/c1t2d0s0 -> ../../devices/pci@1c,600000/scsi@2/sd@2,0:a,raw
lrwxrwxrwx 1 root root 47 Sep  4  2010 /dev/rdsk/c1t3d0s0 -> ../../devices/pci@1c,600000/scsi@2/sd@3,0:a,raw
lrwxrwxrwx 1 root root 59 Aug 14 21:20 /dev/rdsk/c3t0d0 -> ../../devices/pci@1e,600000/usb@a/storage@2/disk@0,0:wd,raw
lrwxrwxrwx 1 root root 58 Aug 14 21:20 /dev/rdsk/c3t0d0s0 -> ../../devices/pci@1e,600000/usb@a/storage@2/disk@0,0:a,raw
lrwxrwxrwx 1 root root 59 Aug 14 21:20 /dev/rdsk/c4t0d0 -> ../../devices/pci@1e,600000/usb@a/storage@1/disk@0,0:wd,raw
lrwxrwxrwx 1 root root 58 Aug 14 21:20 /dev/rdsk/c4t0d0s0 -> ../../devices/pci@1e,600000/usb@a/storage@1/disk@0,0:a,raw

The USB storage was recognized. ZFS may not recognize the drives automatically when they are plugged into different USB ports on the new machine, but it will see them through the "zpool import" command.
V240/root$ zpool status
no pools available
V240/root$ zpool list
no pools available
V240/root$ zpool import
  pool: zpool2
    id: 10599167846544478303
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        zpool2      ONLINE
          mirror    ONLINE
            c3t0d0  ONLINE
            c4t0d0  ONLINE

Importing Drives on New Platform:
Since the drives were taken from another platform, ZFS tried to warn the administrator; but the admin is all too well aware that the old Ultra60 is dysfunctional, and importing the drive mirror is exactly what is desired.
V240/root$ time zpool import zpool2
cannot import 'zpool2': pool may be in use from other system, it was last accessed by Ultra60 (hostid: 0x80c6e89a) on Mon Aug 13 20:10:14 2012
use '-f' to import anyway

real    0m6.48s
user    0m0.01s
sys     0m0.05s

The drives are ready for import; use the force flag, and the storage is available.
V240/root$ time zpool import -f zpool2

real    0m23.64s
user    0m0.02s
sys     0m0.08s

The pool was imported quickly.
V240/root$ zpool status
  pool: zpool2
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zpool2      ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0

errors: No known data errors
V240/root$ zpool list
NAME     SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
zpool2  1.81T  1.34T   480G    74%  ONLINE  -
The storage moved to the replacement SPARC server very smoothly.
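Since the status output warns about an older on-disk format, the pool could also be upgraded once it is certain the mirror will never need to return to an older host. A minimal sketch (note the upgrade is one-way):

V240/root$ zpool upgrade           # report the on-disk format versions of all pools
V240/root$ zpool upgrade zpool2    # irreversibly upgrade the pool to the running version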

Conclusions:
ZFS has proved very reliable for this ongoing engagement. The ability to reduce rebuild times from days to seconds, upgrade the underlying OS releases, retain compatibility with older file system releases, increase write throughput by adding consumer- or commercial-grade flash storage, recover from drive failures, and recover from a chassis failure demonstrates the robustness of ZFS as the basis for a storage system.

Friday, July 20, 2012

Midrange CPU Board Basics (Part 1)

Abstract:

Every so often, when working on a Sun server, it is helpful to know the positioning and speed of the CPU boards, in order to plan for upgrades. This article takes a few common machines and provides basic, easy-to-follow instructions for determining CPU capabilities.



Sun V490 (SUNW,Sun-Fire-V490)

Introduction

The Sun Microsystems Sun Fire V490 server was a machine at the high end of the workgroup servers. This server has a capacity of 2 CPU boards; each CPU board holds 2 sockets, and each socket typically holds 2 cores.

These are not Intel cores: each core added to the socket increases performance close to linearly (instead of by roughly 50%, as in Intel or AMD sockets of this age.)

Determining Class

The "uname" provides for an easy way to know the class of machine.
sun1316$ uname -a
SunOS sun1316 5.9 Generic_122300-57 sun4u sparc SUNW,Sun-Fire-V490
The name of the platform is a "Sun-Fire-V490", hinting this chassis is capable of 4 sockets. The "90" indicates it was an UltraSPARC IV based machine, which was capable of dual-cores. (The chassis is also compatible with older single board UltraSPARC III processors.)

Determining Boards

The psrinfo command is available in the /usr/sbin directory.
sun1316$ /usr/sbin/psrinfo
0       on-line   since 07/15/2012 02:31:01
1       on-line   since 07/15/2012 02:31:01
2       on-line   since 07/15/2012 02:31:01
3       on-line   since 07/15/2012 02:31:00
16      on-line   since 07/15/2012 02:31:01
17      on-line   since 07/15/2012 02:31:01
18      on-line   since 07/15/2012 02:31:01
19      on-line   since 07/15/2012 02:31:01
Processor IDs 16 and over are an indication that each socket is a dual-core socket. On this chassis, sockets 0-1 are located on board 1, while sockets 2-3 are located on board 2. Both boards are populated.
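Where supported, psrinfo -p reports the physical socket count directly, and prtdiag enumerates the boards themselves; a hedged example (output formats vary by platform and Solaris release):

sun1316$ /usr/sbin/psrinfo -p                      # count physical sockets
sun1316$ /usr/platform/sun4u/sbin/prtdiag | more   # list boards and their CPUs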

Determining Performance

Running psrinfo with the "-v" option provides additional information, such as the speed of the individual cores.
sun1316$ psrinfo -v
Status of virtual processor 0 as of: 07/19/2012 22:43:35
  on-line since 07/15/2012 02:31:01.
  The sparcv9 processor operates at 1350 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 1 as of: 07/19/2012 22:43:35
  on-line since 07/15/2012 02:31:01.
  The sparcv9 processor operates at 1350 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 2 as of: 07/19/2012 22:43:35
  on-line since 07/15/2012 02:31:01.
  The sparcv9 processor operates at 1350 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 3 as of: 07/19/2012 22:43:35
  on-line since 07/15/2012 02:31:00.
  The sparcv9 processor operates at 1350 MHz,
        and has a sparcv9 floating point processor.
...
In the above example, I trimmed the output of the second core on each of the 4 sockets, since the information is identical to the first core on each socket.



Sun V890 (Sun-Fire-V890)

Introduction

The Sun V890 was a machine squarely in the high end of the workgroup servers. The server has a capacity of 4 CPU boards; each CPU board holds 2 sockets, and each socket typically holds 2 cores.

These are not Intel cores: each core added to the socket increases performance close to linearly (instead of by roughly 50%, as in Intel or AMD sockets of this age.)

Determining Class

sun1376$ uname -a
SunOS sun1376 5.10 Generic_144488-11 sun4u sparc SUNW,Sun-Fire-V890

The name of the platform is "V890" hinting at 8 socket capability. The "90" hints it was capable of using dual-core UltraSPARC processors. The chassis was capable of using UltraSPARC III boards.

Determining Boards

The psrinfo command is available in the /usr/sbin directory.
sun1376$ psrinfo 
0       on-line   since 05/16/2011 14:26:00
1       on-line   since 05/16/2011 14:26:00
2       on-line   since 05/16/2011 14:26:00
3       on-line   since 05/16/2011 14:26:00
4       on-line   since 05/16/2011 14:26:00
5       on-line   since 05/16/2011 14:26:00
6       on-line   since 05/16/2011 14:26:00
7       on-line   since 05/16/2011 14:25:55
16      on-line   since 05/16/2011 14:26:00
17      on-line   since 05/16/2011 14:26:00
18      on-line   since 05/16/2011 14:26:00
19      on-line   since 05/16/2011 14:26:00
20      on-line   since 05/16/2011 14:26:00
21      on-line   since 05/16/2011 14:26:00
22      on-line   since 05/16/2011 14:26:00
23      on-line   since 05/16/2011 14:26:00

Processor IDs 16 and over are an indication that each socket is a dual-core socket. On this chassis, sockets 0-1 are located on board 1, sockets 2-3 on board 2, sockets 4-5 on board 3, and sockets 6-7 on board 4. All boards are populated.

Determining Performance

Running psrinfo with the "-v" option provides additional information, such as the speed of the individual cores.
sun1376$ psrinfo -v
Status of virtual processor 0 as of: 07/19/2012 22:55:41
  on-line since 05/16/2011 14:26:00.
  The sparcv9 processor operates at 1500 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 1 as of: 07/19/2012 22:55:41
  on-line since 05/16/2011 14:26:00.
  The sparcv9 processor operates at 1500 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 2 as of: 07/19/2012 22:55:41
  on-line since 05/16/2011 14:26:00.
  The sparcv9 processor operates at 1500 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 3 as of: 07/19/2012 22:55:41
  on-line since 05/16/2011 14:26:00.
  The sparcv9 processor operates at 1500 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 4 as of: 07/19/2012 22:55:41
  on-line since 05/16/2011 14:26:00.
  The sparcv9 processor operates at 1500 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 5 as of: 07/19/2012 22:55:41
  on-line since 05/16/2011 14:26:00.
  The sparcv9 processor operates at 1500 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 6 as of: 07/19/2012 22:55:41
  on-line since 05/16/2011 14:26:00.
  The sparcv9 processor operates at 1500 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 7 as of: 07/19/2012 22:55:41
  on-line since 05/16/2011 14:25:55.
  The sparcv9 processor operates at 1500 MHz,
        and has a sparcv9 floating point processor.

...
In the above example, I trimmed the output of the second core on each of the 8 sockets, since the information is identical to the first core on each socket. Note that the cores run at 1.5GHz.



Sun E2900 (SUNW,Netra-T12)

Introduction

The Sun E2900 was a machine on the cusp between a workgroup and a midrange server. This server has a capacity of 3 CPU boards; each CPU board holds 4 sockets, and each socket typically holds 2 cores.

These are not Intel cores: each core added to the socket increases performance close to linearly (instead of by roughly 50%, as in Intel or AMD sockets of this age.)

Determining Class

The "uname" provides for an easy way to know the class of machine.
sun1142$ uname -a
SunOS sun1142 5.10 Generic_138888-07 sun4u sparc SUNW,Netra-T12
The name of the platform is a "SUNW-Netra-T12", hinting this chassis is capable of 12 sockets. The "900" in the 2900 model is a hint indicating this chassis is capable of using UltraSPARC IV dual-core CPU's. (The chassis is also compatible with older single board UltraSPARC III processors.)

Determining Boards

The psrinfo command is available in the /usr/sbin directory.
sun1142$ /usr/sbin/psrinfo
0       on-line   since 02/12/2012 02:27:26
1       on-line   since 02/12/2012 02:27:42
2       on-line   since 02/12/2012 02:27:42
3       on-line   since 02/12/2012 02:27:42
8       on-line   since 02/12/2012 02:27:42
9       on-line   since 02/12/2012 02:27:42
10      on-line   since 02/12/2012 02:27:42
11      on-line   since 02/12/2012 02:27:42
512     on-line   since 02/12/2012 02:27:42
513     on-line   since 02/12/2012 02:27:42
514     on-line   since 02/12/2012 02:27:42
515     on-line   since 02/12/2012 02:27:42
520     on-line   since 02/12/2012 02:27:42
521     on-line   since 02/12/2012 02:27:42
522     on-line   since 02/12/2012 02:27:42
523     on-line   since 02/12/2012 02:27:42
Processor IDs 512 and over are an indication that each socket is a dual-core socket. On this chassis, sockets 0-3 are located on board 1, while sockets 8-11 are located on board 3. Board 2 is missing.

Determining Performance

Running psrinfo with the "-v" option provides additional information, such as the speed of the individual cores.
sun1142$ /usr/sbin/psrinfo -v
Status of virtual processor 0 as of: 07/19/2012 22:20:39
  on-line since 02/12/2012 02:27:26.
  The sparcv9 processor operates at 1950 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 1 as of: 07/19/2012 22:20:39
  on-line since 02/12/2012 02:27:42.
  The sparcv9 processor operates at 1950 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 2 as of: 07/19/2012 22:20:39
  on-line since 02/12/2012 02:27:42.
  The sparcv9 processor operates at 1950 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 3 as of: 07/19/2012 22:20:39
  on-line since 02/12/2012 02:27:42.
  The sparcv9 processor operates at 1950 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 8 as of: 07/19/2012 22:20:39
  on-line since 02/12/2012 02:27:42.
  The sparcv9 processor operates at 1950 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 9 as of: 07/19/2012 22:20:39
  on-line since 02/12/2012 02:27:42.
  The sparcv9 processor operates at 1950 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 10 as of: 07/19/2012 22:20:39
  on-line since 02/12/2012 02:27:42.
  The sparcv9 processor operates at 1950 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 11 as of: 07/19/2012 22:20:39
  on-line since 02/12/2012 02:27:42.
  The sparcv9 processor operates at 1950 MHz,
        and has a sparcv9 floating point processor.
...

Note that on this chassis a board runs at a uniform clock rate across all of its sockets and cores, so checking one core per board would suffice; I also omitted the second core in each socket to shorten the example above. Boards 1 and 3 both run at a 1.95GHz clock rate. 2.1 GHz is the fastest board which can be purchased for this chassis.

An example of a completely filled chassis with differing speed CPU boards is as follows:
sun1143$ /usr/sbin/psrinfo -v
Status of virtual processor 0 as of: 07/19/2012 22:32:18
  on-line since 07/17/2012 11:27:08.
  The sparcv9 processor operates at 1200 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 1 as of: 07/19/2012 22:32:18
  on-line since 07/17/2012 11:27:26.
  The sparcv9 processor operates at 1200 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 2 as of: 07/19/2012 22:32:18
  on-line since 07/17/2012 11:27:26.
  The sparcv9 processor operates at 1200 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 3 as of: 07/19/2012 22:32:18
  on-line since 07/17/2012 11:27:26.
  The sparcv9 processor operates at 1200 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 8 as of: 07/19/2012 22:32:18
  on-line since 07/17/2012 11:27:26.
  The sparcv9 processor operates at 1350 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 9 as of: 07/19/2012 22:32:18
  on-line since 07/17/2012 11:27:26.
  The sparcv9 processor operates at 1350 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 10 as of: 07/19/2012 22:32:18
  on-line since 07/17/2012 11:27:26.
  The sparcv9 processor operates at 1350 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 11 as of: 07/19/2012 22:32:18
  on-line since 07/17/2012 11:27:26.
  The sparcv9 processor operates at 1350 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 16 as of: 07/19/2012 22:32:18
  on-line since 07/17/2012 11:27:26.
  The sparcv9 processor operates at 1500 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 17 as of: 07/19/2012 22:32:18
  on-line since 07/17/2012 11:27:26.
  The sparcv9 processor operates at 1500 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 18 as of: 07/19/2012 22:32:18
  on-line since 07/17/2012 11:27:26.
  The sparcv9 processor operates at 1500 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 19 as of: 07/19/2012 22:32:18
  on-line since 07/17/2012 11:27:26.
  The sparcv9 processor operates at 1500 MHz,
        and has a sparcv9 floating point processor.
...
Note, in the above example, I cut out the second cores to simplify the output. It can be seen that the 3 CPU boards are running at 1.5GHz, 1.35GHz, and 1.2GHz.

Network Management
Most network management platforms require exceptional uptime and outstanding expandability. One of the reasons for choosing platforms such as these is to have robust hardware which will survive for the duration of a contract with an end customer. This may be 3-5 years.

Many of these mid-range platforms are approaching the end of a managed-services contract, yet there is plenty of horsepower left in them to provide services for another 3-5 years. Adding a new CPU and memory board can extend the capital investment in the asset for years to come, if you know which platforms can be extended and which should be retired.

Often, older high-end platforms are rotated to become development platforms, while older low-end platforms are retired. One of the difficulties experienced by network management providers centers around the ISV's, whose legs were chopped out from under them by Oracle not releasing Solaris 11 for these UltraSPARC units. Some ISV's have simply chosen to stop developing for SPARC Solaris because the barrier to entry is now too high (they must buy new development and new production SPARC hardware... not to mention the lack of inexpensive Solaris SPARC desktops.)

Why bother developing network management applications for Solaris SPARC? The next generation of SPARC processors is terrific: crypto engines, 128 slow threads, 64 fast threads, binary compatibility stretching back nearly forever, future 128 fast threads, future 8-socket platforms, gigabit ethernet built into the CPU socket, etc.

It is good to know that SPARC also has a second supplier, which bailed out Sun Microsystems a number of years ago when Sun was not doing so well with its advanced processor lines. It is comforting to know that viruses and worms seldom target SPARC. It is also good to know that a platform which needs to be rebooted only yearly provides availability unlike most others - and Network Management is all about Availability.

Sunday, December 5, 2010

CoolThreads UltraSPARC and SPARC Processors


[UltraSPARC T3 Micrograph]

CoolThreads UltraSPARC and SPARC Processors

Abstract:

Processor development takes an immense quantity of time, to architect a high-performance solution, and an uncanny vision of the future, to project market demand and acceptance. In 2005, Sun embarked on a bold path toward many cores and many threads per core. Since the purchase of Sun by Oracle, the internal SPARC road map from Sun has become clearer.


[UltraSPARC T1 Micrograph]
Generation 1: UltraSPARC T1
A new family of SPARC processors was announced by Sun on 2005 November 14.
  • Single die
  • Single socket
  • 64 bits
  • 4, 6, 8 integer cores
  • 4, 6, 8 crypto cores
  • 4 threads/core
  • 1 shared floating point core
  • 1.0 GHz - 1.4 GHz clock speed
  • 279 million transistors
  • 378 mm2
  • 90 nm CMOS (TI)
  • 1 JBUS port
  • 3 Megabyte Level 2 Cache
  • 1 Integer ALU per Core
  • ??? Memory Controllers
  • 6 Stage Integer Pipeline per Core
  • No embedded Ethernet into CPU
  • Crypto Algorithms: ???
The platform was designed as a front-end server for web applications. With a massive number of cores, it was meant to deliver web-tier performance similar to existing quad-socket systems, from a single socket.

To understand the ground-breaking advancement in this technology: at the time, most processors were single-core, with the occasional dual-core processor (whose cores were glued together through a more expensive process referred to as a multi-chip module, driving higher software licensing costs for those platforms.)


Generation 2: UltraSPARC T2
The next generation of the CoolThreads processor was announced by Sun in 2007 August.
  • Single die
  • Single Socket
  • 64 bits
  • 4, 6, 8 integer cores
  • 4, 6, 8 crypto cores
  • 4, 6, 8 floating point units
  • 8 threads/core
  • 1.2 GHz - 1.6 GHz clock speed
  • 503 million transistors
  • 342 mm2
  • 65 nm CMOS (TI)
  • 1 PCI Express port (1.0 x8)
  • 4 Megabyte Level 2 Cache
  • 2 Integer ALU per Core
  • 4x Dual Channel FBDIMM DDR2 Controllers
  • 8 Stage Integer Pipeline per Core
  • 2x 10 GigabitEthernet on-CPU ports
  • Crypto Algorithms: DES, Triple DES, AES, RC4, SHA1, SHA256, MD5, RSA-2048, ECC, CRC32
This processor was designed for more compute-intensive requirements and incredibly efficient network capacity. The platform made an excellent front-end server for applications as well as Middleware, with the ability to do 10 Gigabit wire-speed encryption with virtually no CPU overhead.

Competitors had started to build single-die dual-core CPU's, producing quad-core processors by gluing two dual-core dies into a Multi-Chip Module.


[UltraSPARC T2 Micrograph]
Generation 3: UltraSPARC T2+
Sun quickly released the first CoolThreads SMP capable UltraSPARC T2+ in 2008 April.
  • Single die
  • 1-4 Sockets
  • 64 bits
  • 4, 6, 8 integer cores
  • 4, 6, 8 crypto cores
  • 4, 6, 8 floating point units
  • 8 threads/core
  • 1.2 GHz - 1.6 GHz clock speed
  • 503 million transistors
  • 342 mm2
  • 65 nm CMOS (TI)
  • 1 PCI Express port (1.0 x8)
  • 4 Megabyte Level 2 Cache
  • 2 Integer ALU per Core
  • 2x? Dual Channel FBDIMM DDR2 Controllers
  • 8? Stage Integer Pipeline per Core
  • No embedded Ethernet into CPU
  • Crypto Algorithms: DES, Triple DES, AES, RC4, SHA1, SHA256, MD5, RSA-2048, ECC, CRC32
This processor allowed the T processor series to move from the Tier 0 web engines and Middleware to the Application tier. Architects started to understand the benefits of this platform entering the Database tier. This was the first CoolThreads processor to scale past 1 and up to 4 sockets.

By this time, the competition really started to understand that Sun had correctly predicted the future of computing. The drive toward single-die quad-core chips had started, with hex-core Multi-Chip Modules being predicted.


Generation 4: SPARC T3
The market became nervous with Oracle purchasing Sun. The first Oracle branded CoolThreads SMP capable SPARC T3 was launched in 2010 September.
  • Single die
  • 1-4 Sockets
  • 64 bits
  • 16 integer cores
  • 16 crypto cores
  • 16 floating point units
  • 8 threads/core
  • 1.67 GHz clock speed
  • ??? million transistors
  • 377 mm2
  • 40 nm
  • 2x PCI Express port (2.0 x8)
  • 6 Megabyte Level 2 Cache
  • 2 Integer ALU per Core
  • 4x DDR3 SDRAM Controllers
  • 8? Stage Integer Pipeline per Core
  • 2x 10 GigabitEthernet on-CPU ports
  • Crypto Algorithms: DES, 3DES, AES, RC4, SHA1, SHA256/384/512, Kasumi, Galois Field, MD5, RSA to 2048 key, ECC, CRC32
This processor was more than what the market was anticipating from Oracle. It took all the features of the T2 and T2+ and combined them into the new T3, with an increase in overall features. No longer did the market need to choose between multiple sockets or embedded 10 GigE interfaces - this chip has it all, plus double the cores.

Immediately before this release, the competition was shipping hex-core and octal-core CPU's built by gluing dies together in multi-chip modules. The T3 was a substantial upgrade over the competition, offering double the cores on a single die.


Generation 5: SPARC T4
Oracle indicated in December 2010 that they had thousands of these processors in the lab and predicted the processor would be released at the end of 2011.

After the announcement, a separate press release indicated the processors would have a renovated core, for higher single-threaded performance, but the socket would offer half the cores.

Most vendors are projected to have 8-core processors available (through Multi-Chip Modules) by the time the T4 is released, but only the T4 should put them on a single piece of silicon during this period.


[2010-12 SPARC Solaris Roadmap]
Generation 6: SPARC T5

Some details on the T5 were announced with the T4. The processor will use the renovated T4 core, on a 28nm process, and will return to 16 cores per socket. This may be the first CoolThreads T processor able to scale from 1 to 8 sockets. It is projected to appear in early 2013.

Some vendors are projecting 12-core processors on the market using Multi-Chip Module technology, but when the T5 is released, it should still be the market leader at 16 cores per socket.

Network Management Connection

Consolidating most network management stations in a globalized environment works very well with the CoolThreads T-Series processors. Consolidating multiple slower SPARC platforms onto single- and dual-socket T-Series servers has worked well over the past half decade.

While most network management polling engines scale linearly with these highly-threaded processors, some operations are bound to single threads. These types of processes include event correlation, startup time, and synchronization after a discovery in a large managed topology.

The market will welcome the enhanced T4 processor core and the T5 processor, when it is released.

Tuesday, October 5, 2010

US Department of Energy: No POWER Upgrade From IBM


US Department of Energy: No POWER Upgrade From IBM

Abstract:

Some say no one was ever fired for buying IBM, but no government or business ever got trashed for buying SPARC, either. The United States Department of Energy bought an IBM POWER system with no upgrade path and no long-term spare parts.


[IBM Proprietary POWER Multi-Chip Module]

Background:

The U.S. Department of Energy purchased a petaflops-class hybrid blade supercomputer, the IBM "Roadrunner", that performed into the multi-petaflop range for nuclear simulations at the Los Alamos National Laboratory. It was based upon the IBM blade platform, with blades built around an AMD Opteron and hybrid IBM POWER / IBM Cell architecture. A short article was published in October 2009 in The Register.

Today's IBM:

A month later, the supercomputer was not mentioned at the SC09 Supercomputing trade show in Oregon, because IBM had killed it. Apparently, it was killed off 18 months earlier - what a waste of American tax payer funding!

Tomorrow's IBM:

In March 2010, it was published that IBM gave its customers (i.e. the U.S. Government) three months to buy spares, because future hybrid IBM POWER / Cell products were killed. Just a few months earlier, IBM demonstrated its untrustworthiness to its existing thin client customers and partners by abandoning their thin client partnership and using the existing partner to help fund IBM's movement to a different future thin client partner!



Obama Dollars:

It looks like some remaining stimulus dollars from Democratic President Obama will be used to buy a new supercomputer from Cray and a cluster from SGI. The mistake of buying IBM was so huge that it took a massive spending effort from the Federal Government to recover from losing money on proprietary POWER.

[Fujitsu SPARC64 VII Processor]

[Oracle SPARC T3 Processor]
Lessons Learned:
If only the U.S. Government had not invested in IBM proprietary POWER, but had instead chosen an open CPU architecture like SPARC, which offers two hardware vendors: Oracle/Sun and Fujitsu.

[SUN UltraSPARC T2; Used in Themis Blade for IBM Blade Chassis]

Long Term Investment:

IBM POWER is not an open processor advocated by other systems vendors. Motorola abandoned the systems market for POWER from a processor-production standpoint. Even Apple abandoned POWER in the desktop & server arena. One might suppose that when IBM kills a depended-upon product, one could always buy video game consoles and place them in one's lights-out data center, but that is not what the Department of Energy opted for.

Oracle/Sun has a reputation of providing support for decade-old systems, and if necessary, Open SPARC systems and even blades for other vendors' chassis can be (and are) built by other vendors (i.e. Themis built an OpenSPARC blade for an IBM blade chassis.) SPARC processors have been designed & produced by different processor and system vendors for over a decade and a half. SPARC is a well-proven long-term investment in the market.

Network Management Connection:

If you need to build a Network Operations Center, build it upon the infrastructure the global telecommunications providers have trusted for over a decade: SPARC & Solaris. One will not find serious network management applications on IBM POWER, so don't bother wasting time looking. There are reasons for it.

Saturday, May 8, 2010

Oracle's Intentions on Sun Hardware Portfolio


The Latest on Oracle’s Intentions for the Sun Hardware Portfolio

For the past two months Oracle has been conducting a series of Sun Welcome Events around the world, kicking off first in the US at the beginning of March. Last week was Sydney Australia’s turn and IDEAS analysts attended the event to get an update of the latest news.

Although the format had similar content to previous events...

(see rest of article from Ideas International)
Thanks Amarand Agasi from Flickr for the photo!

Friday, February 12, 2010

Two Billion Transistors: Niagara T3



Two Billion Transistors: Niagara T3

Abstract:

Sun Microsystems has been developing octal-core processors for almost half a decade. During the past few years, a new central processor unit called "Rainbow Falls" or "UltraSPARC KT" has been in development. With the release of the POWER7, IBM's first octal-core CPU, there has been a renewal of interest in the OpenSPARC processor line, in particular the T3.

Background:

OpenSPARC was an Open Source project started with Sun Microsystems as the initial contributor. It was based upon the open SPARC architecture, which has had many companies and manufacturers contributing to and leveraging the open specification over the years. Afara Websystems was one of those SPARC vendors, and it originated the idea of combining many SPARC cores onto a single piece of silicon. It was later purchased by Sun Microsystems, which had the deep pockets to invest the engineering required to bring the idea to fruition (as the OpenSPARC or UltraSPARC T1) and advance it (with the design of the T2, T2+, and now the T3.) Sun was later purchased by Oracle, which had even deeper pockets.

Features:

As is typical with the highly integrated OpenSPARC processors, PCIe is included on-chip, providing very fast access to I/O subsystems.

The T3 looks like a combined T2 and T2+ with enhancements. The T2 had embedded 10Gig Ethernet, while the T2+ had 4-chip cache-coherency glue. Well, the T3 has it all, in conjunction with an uplifted DDR3 DRAM interface with 4 memory channels, enhanced crypto co-processors, and a doubling of cores!

The benefits to Network Management:

Small and immature Network Management products are usually thread-bound, but those days of poorly programmed systems are long gone (except in the Microsoft Windows world.)

Network management workloads are typically highly threaded and UNIX-based. Platforms like the OpenSPARC were designed to meet these workloads from their very early days in the early 2000's, with other CPU vendors anxiously trying to catch up in the late 2000's.

When thousands of devices need to have information polled from numerous subsystems at various minute intervals, latency in receiving the information adds a level of complexity to the polling software, and highly threaded CPU's with a well-written OS reward the programmer for their work.

It was not that long ago that Solaris was updated to manage processes in the millions, where each of those processes could have dozens, hundreds, or thousands of threads.

In the Network Management arena, we welcome these high-throughput workhorses!

Wednesday, September 30, 2009

Sun / Oracle License Change - T2+ Discount!

Sun / Oracle License Change - T2+ Discount!

Abstract

Oracle licenses its database by several factors, typically the Standard License (by socket) and an Enterprise License (by a core scaling factor.) Occasionally, Oracle changes the core scaling factor, resulting in a discount or a liability for the consumer.

The Platform

The OpenSPARC platform is an open-sourced SPARC implementation whose specification is also open. There have been several series of chips based upon this implementation: T1, T2, and T2+. The T1 & T2 are both single-socket implementations, while the T2+ is a multi-socket implementation.

The Discount

While reviewing the Oracle licensing PDF, the following information came to light concerning the OpenSPARC processor line, in particular the Sun UltraSPARC T2+ processor.

Factor Vendor/Processor
0.25 SUN T1 1.0GHz and 1.2GHz (T1000, T2000)
0.50 SUN T1 1.4GHz (T2000)
0.50 Intel Xeon 74xx or 54xx multi-core series (or earlier); Intel Laptop
0.50 SUN UltraSPARC T2+ Multicore [new]
0.75 SUN UltraSPARC T2 Multicore [new]
0.75 SUN UltraSPARC IV, IV+, or earlier
0.75 SUN SPARC64 VI, VII
0.75 SUN UltraSPARC T2, T2+ Multicore [old]
0.75 IBM POWER5
1.00 IBM POWER6, SystemZ
1.00 All Single Core Chips


Note: the rows marked [old] and [new] were colored red and green in the original table. Oracle has broken the T2+ processor out to a core factor of 0.50 instead of 0.75.

To see a copy of some of the old license factors, please refer to my old blog on the Oracle IBM license change entry.

Impacts to Network Management infrastructure

To calculate your discount, see the table below. It is basically a 33% reduction for the Enterprise version of Oracle on the T2+ processor.

Chips Cores Old-Licenses New-Licenses
01 08 06 04
02 16 12 08
03 24 18 12
04 32 24 16
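As a worked example, consider a hypothetical dual-socket UltraSPARC T2+ server with 8 cores per socket (16 cores total):

old factor: 16 cores x 0.75 = 12 Enterprise licenses
new factor: 16 cores x 0.50 = 8 Enterprise licenses
savings: (12 - 8) / 12 = 33%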


If you have been waiting for a good platform to move your polling-intensive workloads to, this may be the right time, since the T2+ has had its licensing liability reduced.

Thursday, September 10, 2009

What's Better: USB or SCSI?

What's Better: USB or SCSI?

Abstract
Data usage and archiving is exploding everywhere. The bus options for attaching storage increase often, with new bus protocols being added regularly. With systems so prevalent throughout businesses and homes, when should one choose a different bus protocol for accessing the data? This set of tests pits some older mid-range internal SCSI drives against a brand-new high-capacity external USB drive.

Test: Baseline
The Ultra60 test system is a Sun UltraSPARC II server, running dual 450MHz CPU's and 2 Gigabytes of RAM. Internally, there are two 80-pin 180 Gigabyte SCSI drives. Externally, there is one 1.5 Terabyte Seagate Extreme USB drive. A straight "dd" will be done from a 36 Gig root slice to an internal drive and to the external disk.
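The idle and SYS percentages quoted in the tests below were presumably gathered with the standard Solaris observability tools; a hedged example of watching a copy in flight from a second terminal:

Ultra60-root$ vmstat 5        # us/sy/id columns: user, system, and idle CPU time
Ultra60-root$ iostat -xn 5    # per-device throughput and service times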


Test #1a: Write Internal SCSI with UFS
The first copy was to an internal disk running the UFS file system. The system hovered around 60% idle time, with about 35% of CPU time pegged in the SYS category, for the entire duration of the copy.

Ultra60-root$ time dd if=/dev/dsk/c0t0d0s0 of=/u001/root_slice_0
75504936+0 records in
75504936+0 records out

real 1h14m6.95s
user 12m46.79s
sys 58m54.07s


Test #1b: Read Internal SCSI with UFS
The read-back of this file was used to create a baseline for the other comparisons. The system hovered around 50% idle time, with about 34% of CPU time pegged in the SYS category, for the entire duration. The read spanned about 34 minutes.

Ultra60-root$ time dd if=/u001/root_slice_0 of=/dev/null
75504936+0 records in
75504936+0 records out

real 34m13.91s
user 10m37.39s
sys 21m54.72s


Test #2a: Write Internal SCSI with ZFS
The internal disk was tested again using the ZFS file system instead of UFS. The system hovered around 50% idle, with about 45% pegged in the SYS category. The write time lengthened by about 50% under ZFS.

Ultra60-root$ time dd if=/dev/dsk/c0t0d0s0 of=/u002/root_slice_0
75504936+0 records in
75504936+0 records out

real 1h49m32.79s
user 12m10.12s
sys 1h34m12.79s


Test #2b: Read Internal SCSI with ZFS
The 36 Gigabyte read under ZFS took about 50% longer than under UFS. The CPU, however, was not strained much more.

Ultra60-root$ time dd if=/u001/root_slice_0 of=/dev/null
75504936+0 records in
75504936+0 records out

real 51m15.39s
user 10m49.16s
sys 36m46.53s


Test #3a: Write External USB with ZFS
The third copy was to the external disk running the ZFS file system. The system hovered around 0% idle time, with about 95% of CPU time pegged in the SYS category, for the entire duration of the copy. The copy consumed about the same amount of time as the ZFS copy to the internal disk.

Ultra60-root$ time dd if=/dev/dsk/c0t0d0s0 of=/u003/root_slice_0
75504936+0 records in
75504936+0 records out

real 1h52m13.72s
user 12m49.68s
sys 1h36m13.82s


Test #3b: Read External USB with ZFS
Read performance is slower over USB than over SCSI with ZFS. The time is 82% slower than the UFS SCSI read and 21% slower than the ZFS SCSI read. CPU utilization also appears slightly higher with USB (about 10% less idle time with USB than with SCSI.)

Ultra60-root$ time dd if=/u003/root_slice_0 of=/dev/null
75504936+0 records in
75504936+0 records out

real 1h2m50.76s
user 12m6.22s
sys 42m34.05s


Untested Conditions

Firewire and eSATA were attempted, but these bus protocols would not work reliably on the Seagate Extreme 1.5TB drive under any platform tested (several Macintoshes and SUN workstations.) If you are interested in a real interface besides USB, this external drive is not the one you should be investigating - it is a serious mistake to purchase.

Conclusion

The benefits of ZFS do not come without a cost in time. Reads and writes are about 50% slower, but the cost may be worth it for the benefits: unlimited snapshots, unlimited file system expansion, error correction, compression, 1 or 2 disk failure tolerance, future 3 disk failure tolerance, future encryption, and future clustering.
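As one illustration of the snapshot benefit alone, here is a minimal sketch, assuming a hypothetical pool named u002:

Ultra60-root$ zfs snapshot u002@before_copy   # instantaneous, space-efficient snapshot
Ultra60-root$ zfs rollback u002@before_copy   # revert the dataset if a copy goes bad
Ultra60-root$ zfs list -t snapshot            # enumerate existing snapshots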

If you are serious about your system performance, SCSI is definitely the better choice over USB, providing throughput with minimal CPU utilization - regardless of file system. If you have invested in CPU capacity and have cycles to burn (i.e. a multi-core CPU), then buying external USB storage may be reasonable instead of purchasing SCSI.

Saturday, July 4, 2009

Rock Cancellation Rumor, The Bizarre, and The Benefits

Rock Cancellation Rumor, The Bizarre, and The Benefits

Rock Cancellation Rumor


There has been much speculation concerning the rumor, originating from the New York Times, suggesting Sun Microsystems canceled the UltraSPARC RK processor, called 'Rock'.

Understanding the history behind 'Rock' provides support for just about any rumor.
  • UltraSPARC III+ was canceled by Sun
  • UltraSPARC IV+ was released late by Sun, with excellent performance
  • UltraSPARC V was canceled by Sun
  • UltraSPARC T1 was deployed on-time by Sun, with excellent performance
  • UltraSPARC T2 was deployed on-time by Sun, with excellent performance
  • UltraSPARC T2+ was deployed on-time by Sun, with excellent scalability
  • Sun Microsystems partnered with Fujitsu to release a joint server product line based around the SPARC64 processor
  • UltraSPARC RK was delayed multiple times by Sun

Rock Uninteresting?

There was very good historical information in many trade journals. One of the most bizarre quotes regarding the Sun Microsystems UltraSPARC RK or 'Rock' processor came from Dean McCarron, principal analyst at Mercury Research. His quote was widely reported in various trade magazines:
"Even if Rock had made it to market, it would have been an uninteresting processor as companies like Intel and AMD are offering high-performance chips at more reasonable prices, said Dean McCarron, principal analyst at Mercury Research."

Suggesting 'Rock' is "uninteresting" demonstrates a level of ignorance beyond comprehension.

Rock Interested Audiences

'Rock' is an interesting processor to Computer Scientists, who have worked for decades trying to optimize single-threaded applications. Thread-bound applications suffer very painful waits when they encounter cache misses on traditional proprietary (Intel & AMD) processors. These waits are a thing of the past on Rock, with technologies such as:
  • thread level parallelism
  • thread level speculation
  • transactional memory
  • out-of-order retirement
  • deferred queue
'Rock' is a very interesting processor in the academic world, since companies like Intel and AMD have not recently pioneered computer-science technologies in silicon to optimize single-threaded applications. Implementations of theory in real silicon are very important for reviewing academic ideas and determining future implications. Implementations like 'Rock' are studied for decades.

'Rock' is a very interesting processor for business, military, and academic system performance on single-thread-bound applications - software threads on 'Rock' run with very few of the painfully long waits on slow memory due to cache misses commonly experienced with Intel and AMD processors. People who purchase systems expect their systems to be doing work, instead of sitting around idle.

'Rock' is a very interesting processor in the commercial world, since it accelerates legacy single-thread-bound software (which does not scale well with multiple threads) - something the major CPU developers (AMD and Intel) in the rest of the market have been ignoring for a couple of years. If a single thread is the bottleneck, newer CPU's from other vendors will not solve the performance problem, will not increase the thread-bottlenecked performance, and will not increase business profitability.

'Rock' is a very interesting processor in the investment community. Sun had pioneered the niche of multi-threaded hardware with the release of its 32-hardware-thread UltraSPARC T1 processor - driving other vendors (Intel, AMD) to change direction and start heavily threading their CPU's... discontinuing projects to speed up existing single-threaded applications. The release of Rock would enable Sun to reclaim this lost niche, abandoned by the other vendors. Filling niches is very profitable to investors in those technologies.

'Rock' is very interesting to the environmentally conscious consumer. Very little work has been done recently in the market to increase the performance of single-threaded software, with the exception of increasing clock rate, which drives up the costs to consumers in hardware, cooling, and power consumption. Rock has been the exception - targeting increased single-threaded performance without aggressively increasing clock rate and the negatives that go along with it.

'Rock' is a very interesting processor for enterprises struggling with consolidation efforts. Bundling 16 high-speed cores into a single chip which supports LDOMs at the firmware level and Solaris Containers at the OS level provides a consolidation platform for legacy applications which are not highly threaded and require high single-threaded throughput.

Rock & Role in Network Management

Network Management infrastructure benefits greatly from highly threaded underlying hardware. It is not unusual to run hundreds to thousands of polling threads on centralized network management platforms. Any highly threaded CPU platform (such as the Sun OpenSPARC UltraSPARC T series) helps provide hardware acceleration to the polling processes, reducing the time proprietary CPU's normally spend on context switches, constantly pulling/pushing registers from/to slow memory.

While Network Management sees great performance strides with highly threaded hardware during 24x7x365 operations, not all areas are optimized. Two areas where highly threaded hardware (with slower individual thread performance) needs improvement are startup/shutdown time, when the database needs to be loaded/dumped, and post-discovery time, when data needs to be consolidated and relationships built to all the other objects.

On some very large network topologies with extremely high (99.999%) availability requirements, a slow startup time or slow post-discovery time is considered unacceptable - every minute counts in the case of software or hardware failure when protected with a High Availability kit.

Rock provides a mid-range position between the ~US$100K super-threaded UltraSPARC T2+ SMP platforms with high throughput (256 hardware threads) leveraging slower threads, and the super-fast ~US$1M super-cored SPARC64 VII SMP (256 hardware threads) leveraging faster threads.

Rock's role in network management is clearly defined and beneficial.

Conclusion

Trade journalist Jon Stokes came to a diametrically opposite conclusion:
In the end, I can't say that I'm really sold on Sun's very aggressive use of speculative execution, but I will say that Rock is one of the most interesting and novel processors that I've seen in 10 years of covering this space. In its own way, it's every bit as exotic as IBM's Cell processor, but because all of that exoticism is hidden from the programmer it won't be nearly as difficult for developers to deal with.
I think this says it all.