Wednesday, June 27, 2012

The Processor Market: POWER #1 HPC


During June 2012, some very interesting developments occurred: open source pieces from Sun and Oracle were combined with POWER processors to build a new supercomputer. The odd result: IBM's POWER required far more sockets to outrun Fujitsu's SPARC64... but did so with better power efficiency, using arch-rival Sun Microsystems' (now Oracle's) open source technology.

[wiring 123% more sockets for 55% greater performance, courtesy The Register]
IBM Denies Fujitsu's SPARC64 Year Long #1 HPC Rank!
After a long list of losses, IBM's proprietary POWER architecture finally has a win: the #1 HPC performance spot, running Lustre on ZFS - denying Fujitsu, whose own fork of the Lustre clustered filesystem had helped hold the fastest-computer-in-the-world title for nearly a year!

[zfs write performance under linux with lustre, courtesy Lawrence Livermore Laboratory]
Whamcloud, Lustre, and Sequoia Supercomputer

The Lustre clustered/distributed filesystem was formerly owned by Sun Microsystems and is now owned by Oracle; it had long been promised to be merged with ZFS. Whamcloud is a commercial enterprise which develops a fork of the Lustre file system, and it announced the release of Chroma Enterprise, to bring enterprise management to Lustre.

Whamcloud is using a userland (non-kernel) ZFS port descended from OpenSolaris. Even emulated, the ZFS implementation still shows linear scalability as the load increases, in comparison to the native Linux filesystem.

The Sequoia supercomputer, run by the United States Department of Energy, has an interesting feature - the use of Sun's ZFS merged with Sun's Lustre filesystem. There is a short 30 minute video walking through the slides (PDF) from the Lustre User Group (LUG) 2012.

IBM's Tortoise vs Fujitsu's Hare
IBM needed 123% more proprietary POWER CPU sockets to outrun Fujitsu's open SPARCv9 SPARC64 architecture by a mere 55%. Even so, the IBM POWER solution proved to be about 23% more power efficient - truly an achievement, considering how many more sockets were required. The tortoise POWER processor takes less energy than the hare SPARC64 processor.

Fujitsu SPARC64 Loses The Battle of the Alamo
This is somewhat of a Pyrrhic victory, like winning the Battle of the Alamo. Could any one-year-old platform hold its performance position when the new opposition has a 123% numeric advantage?

This victory was a solid win for IBM, supercomputer to supercomputer, but there is an odd conclusion that some people may notice: dividing the 123% socket advantage by the 55% performance advantage, each old SPARC64 socket appears to be roughly 44% faster than each new POWER socket (2.23 / 1.55 ≈ 1.44).

Considering that each SPARC64 socket was an 8 core processor socket, in comparison to the 18 core POWER processor socket (of which 16 cores are usable) - each SPARC64 core is roughly 2.9 times as fast as each POWER core!
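The arithmetic above can be checked using only the article's own figures (123% more sockets, 55% more performance, 8 cores per SPARC64 socket versus 16 usable cores per POWER socket):

```python
# Figures from the article: Sequoia used 123% more sockets than the K computer
# and delivered 55% more performance.
socket_ratio = 2.23   # IBM sockets / Fujitsu sockets (123% more)
perf_ratio = 1.55     # IBM performance / Fujitsu performance (55% more)

# Per-socket: how much faster is one SPARC64 socket than one POWER socket?
per_socket = socket_ratio / perf_ratio   # ~1.44, i.e. ~44% faster per socket

# Per-core: SPARC64 has 8 cores per socket, POWER has 16 usable cores.
per_core = per_socket * (16 / 8)         # ~2.88, i.e. nearly 2.9x per core

print(f"SPARC64 per-socket advantage: {per_socket:.2f}x")
print(f"SPARC64 per-core advantage:   {per_core:.2f}x")
```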

Fujitsu's SPARC64: Other Battle Fronts
The battles have been continuous since 2011:
SPARC continues to be on the map, in new locations, as well as eating IBM POWER's lunch in smaller installations - for very good reason. The new 16 core SPARC64 chips offer double the performance in the same socket, making POWER look pale in comparison.

Better Options for Super Computers
IBM's main processor is POWER, and its main OS is AIX - which lacks a modern file system. IBM's second operating system option on POWER, Linux, also lacked a modern file system. IBM briefly toyed with the idea of purchasing Sun Microsystems, before Oracle made the final purchase, so both OS choices on POWER were left wanting.

Why was the choice made to emulate ZFS? The licensing of the Linux kernel is restrictive enough that ZFS could not be combined with it, so ZFS had to run in userland. Why did IBM use Lustre instead of IBM's own GPFS clustered file system? Cost may be a factor, and Lustre is basically the de facto standard in High Performance Computing.

Lustre was going to be merged with ZFS by Sun Microsystems, after Sun's acquisition of Lustre in 2007. Lustre support directly from Oracle, without Oracle hardware, came to an end shortly after Oracle's purchase of Sun Microsystems: Oracle limited Lustre support to Oracle hardware in 2010.

Code changes were delivered to OpenSolaris for Lustre friendliness. Completing Lustre with ZFS under Illumos, in kernel space, could have offered better performance than userspace ZFS, since fewer system calls would be required at the emulation layer. Illumos could have delivered native performance on the IBM POWER Sequoia or the Fujitsu SPARC64 K supercomputer.

Fujitsu, being the SPARC64 creator, was more than capable of delivering its drivers into the Illumos market, had Illumos been interested in SPARC. Pushing IBM to adopt forks of Oracle's Solaris ZFS and Oracle's Lustre was already aggressive; pushing them all the way to adopting Illumos, a fork of Solaris, was perhaps a bridge too far (especially after a failed Solaris acquisition.)
With some in the Illumos community seemingly less interested in POSIX subsystems, pulling out SVR4 features, and disinterested in non-Intel distributions - some are questioning the value of Illumos once its differentiators, ZFS and DTrace, appear in an OS like Linux.

With POWER sitting at #1, SPARC64 at #2, and ARM growing in market prevalence - the window for Illumos relevance may be closing if it does not start actively supporting some non-x64 architectures, as its differentiating features get ported to competing OSes.

IBM's POWER has long tried to demonstrate superiority in per-socket or per-core performance. Yet the POWER platform uses 18 cores per socket while Fujitsu uses 8 cores per socket - so each POWER core is vastly slower than a Fujitsu SPARC64 core.

IBM long tried to demonstrate the superiority of its technologies to companies like Sun and Oracle, yet at the core of its supercomputer sat ZFS and Lustre - former Sun Microsystems (now Oracle) technology was used to scale the solution in order to compete in this arena.

A non-IBM operating system, running a fork of Oracle Solaris ZFS and a fork of Oracle Lustre, is not the way some might want to advertise the IBM POWER architecture (which normally runs the IBM AIX operating system with the IBM GPFS file system.)

Sunday, June 24, 2012

Network Management Basics: SNMP


From the dawning days of The Internet, the network grew from hosts on a wire, to hosts on a wire joined by a bridge extending electrical signals, to logical groups of hosts on wires defined as networks and joined to other networks via routers. Throughout these periods, there was always a need for a way to manage the infrastructure, and SNMP is The Internet Standard for doing so. The SNMP Internet Standard is a critical piece of total management business requirements.

The Network:
Every device on The Internet has a physical Hardware Address, to facilitate communications on its own wire, and a logical Internet Protocol (IP) Address, to facilitate communications to other locations through Routers. Someone on that network has to provide the logical IP Addresses - normally some kind of network administrator, who carries some responsibility to manage the network.

[ARPANET diagram, courtesy wikipedia]

The Creation:
Networks were traditionally circuit switched, driven by a telephone company. In 1969, Steve Crocker developed a system to track agreed-upon standards, called RFC's (Request for Comments), to facilitate interconnection of networks. That same year, the world's first operational packet switching network came into existence: ARPANET (the Advanced Research Projects Agency Network).

As The Internet started to grow, basic diagnostic utilities were needed. Mike Muuss created a utility called Ping in December 1983. The most important function of this tool was sending an ICMP Echo Request (type 8) network packet to another IP Address and observing the returned value.

The Manager may send an Echo Request, or Ping, to a remote device's logical IP Address to see if there is connectivity. If there is no connectivity, no packet is returned, or sometimes a Router in the path may return a message such as "Host Unreachable" or "TTL Exceeded" (packet time-to-live.) The manager may receive additional information, such as the time it took for the packet to make the round trip.

As networks continued to get more complex, the management requirements grew. Traceroute was born, attributed to Van Jacobson in 1987. Now the manager could send packets to an agent and receive the path of each router the packets traversed, bundled with the round trip times.
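The trick behind traceroute is the IP time-to-live field: every router decrements TTL, and the router that decrements it to zero answers with an ICMP "Time Exceeded" message. A toy simulation (the hop names are invented for illustration) shows why probing with TTL 1, 2, 3... reveals the path:

```python
# Hypothetical path of routers between the manager and the agent.
path = ["gw.local", "core1.isp.net", "peer.ix.net", "agent.example.com"]

def probe(ttl):
    """Simulate one probe: each hop decrements TTL; the hop that drops
    it to zero answers with ICMP Time Exceeded, unless it is the final
    destination, which simply answers the probe."""
    for hop in path:
        ttl -= 1
        if ttl == 0:
            if hop == path[-1]:
                return hop, "destination reached"
            return hop, "ICMP Time Exceeded"
    return path[-1], "destination reached"

# traceroute sends probes with increasing TTL to map out the route:
route = [probe(ttl) for ttl in range(1, len(path) + 1)]
for hop, reply in route:
    print(hop, "-", reply)
```

Each probe's round trip is also timed, which is how traceroute attaches latency figures to every hop.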

The Problem:
Tools like "ping" and "traceroute" were critical for an individual manager to understand network connectivity - but neither provided in-depth information about the target agent device. A "ping" not being returned does not necessarily mean the agent or target device is "down". A "ping" returning does not necessarily mean the agent did not go down a few minutes earlier. A "traceroute" response to another location does not necessarily mean there is a problem with the agent or target device. These tools did little to help a manager understand the history of a device or the intermediate network devices.
In 1988, SNMP (now referred to as Version 1) was born, through a variety of published RFC's. SNMP retained many of the advantages of ICMP and Traceroute (light-weight, avoided use of heavy TCP protocol), but brought to the world:
  • programmable name for a device agent
  • programmable location field for a device agent
  • a description of the hardware and firmware on the device agent
  • last-reboot counter of the device agent
  • configuration, fault, and performance knowledge of interfaces (Interface Table)
  • other physical hardware devices connected on the network (ARP Table)
  • other neighboring logical devices connected on the network (Routing Table)
  • passwords (called community strings) for basic protection
  • framework for vendors to extend the management capabilities
This information is held in the MIB (Management Information Base) of the device - a database of information that each device holds regarding the health of its hardware, firmware, operating system, and applications.
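To see how light-weight SNMP really is, here is a sketch of a complete SNMPv1 GetRequest for sysDescr.0 (OID 1.3.6.1.2.1.1.1.0), built by hand with BER encoding; the entire poll fits in a single 40-byte UDP datagram. (The community string "public" and the request-id are illustrative values.)

```python
def tlv(tag, payload):
    """BER type-length-value (short-form length is enough here)."""
    return bytes([tag, len(payload)]) + payload

def encode_oid(arcs):
    """Encode an OID: the first two arcs combine into one byte (40*x + y);
    remaining arcs here are all < 128, so each fits in a single byte."""
    return tlv(0x06, bytes([40 * arcs[0] + arcs[1]]) + bytes(arcs[2:]))

def encode_int(value):
    return tlv(0x02, bytes([value]))  # 0 <= value < 128 is enough here

sys_descr = encode_oid([1, 3, 6, 1, 2, 1, 1, 1, 0])
varbind = tlv(0x30, sys_descr + tlv(0x05, b""))   # { OID, NULL value }
varbind_list = tlv(0x30, varbind)
pdu = tlv(0xA0,                                    # GetRequest-PDU
          encode_int(1) +                          # request-id
          encode_int(0) +                          # error-status
          encode_int(0) +                          # error-index
          varbind_list)
message = tlv(0x30,
              encode_int(0) +                      # version = SNMPv1
              tlv(0x04, b"public") +               # community string
              pdu)

print(len(message), "bytes:", message.hex())
# This one small datagram is all a poll takes - a single sendto() to UDP 161.
```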

[MIB2 tree illustration courtesy O'Reilly Essential SNMP]

SNMPv1 was made up of RFC's 1065, 1066, and 1067, later updated by RFC's 1155, 1156, and 1157. RFC 1156 (called MIB-1) was later updated by RFC 1213 (called MIB-2.)

In 1993, SNMP Version 2 was created through RFC's 1441-1452. Security was updated, but not widely adopted. Also introduced was a more efficient way to transfer information (GetBulkRequest) - which was readily adopted, to alleviate concerns of the protocol being "overly chatty".

In 1996, SNMPv2c (Community-Based Simple Network Management Protocol Version 2) was introduced in RFC's 1901-1908. It dropped the complex SNMPv2 security model in favor of the familiar community string (password) of SNMPv1 - which still travels in cleartext, so concerns about the protocol being "insecure" remained, waiting on SNMPv3.

[SNMPv3 message format, courtesy TCP/IP Guide]
In December 2002, SNMPv3 was released, comprised of RFC's 3411-3418. In 2004, the IETF (Internet Engineering Task Force) designated SNMPv3 as STD0062, a Full Internet Standard. Practically speaking, SNMPv3 adds authentication and encryption of the payload, finally securing the protocol.
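SNMPv3's security comes in three levels (noAuthNoPriv, authNoPriv, authPriv), selected by two bits of the msgFlags field in the message header shown above. A small sketch of the mapping:

```python
# SNMPv3 msgFlags bits: bit 0 = auth, bit 1 = priv, bit 2 = reportable
AUTH_FLAG, PRIV_FLAG, REPORTABLE_FLAG = 0x01, 0x02, 0x04

def security_level(msg_flags):
    auth = bool(msg_flags & AUTH_FLAG)
    priv = bool(msg_flags & PRIV_FLAG)
    if priv and not auth:
        raise ValueError("privacy without authentication is not allowed")
    if priv:
        return "authPriv"      # authenticated and encrypted payload
    if auth:
        return "authNoPriv"    # authenticated, cleartext payload
    return "noAuthNoPriv"      # no security, SNMPv1/v2c style

print(security_level(0x07))    # reportable + priv + auth
```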

Modern Computing:
Today, nearly every modern equipment vendor who instruments their internet equipment for management bundles SNMP in their standard packaging - since SNMPv3 is The Internet Standard. This means that most equipment that plugs into a network via ethernet or wireless can be managed in an "agentless" manner (i.e. without loading any special additional components.)

Most Internet Infrastructure (i.e. computers, servers, routers, switches, etc.) allow for the following basic capabilities (sometimes using an internet standard, sometimes using vendor extension):
  • Interface Configuration (administratively up, down; interface capacity) 
  • Interface Fault Status (Up, Down, Testing, Last-Change Time-stamp)
  • Interface Performance Statistics (packets, bytes, errors, etc.)
  • SNMP Agent Last-Reboot Timestamp
  • Memory and/or Buffer Usage; Buffer Allocation Errors
  • Flash and/or Disk Capacity and Usage
  • Running Processes
  • Installed Software
  • CPU Usage
  • Alert to a Manager when an Agent detects a problem
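The interface fault states in the list above come from the ifOperStatus column of the standard Interface Table (IF-MIB); decoding the raw integers an agent returns is a simple lookup:

```python
# ifOperStatus values from the standard Interface Table (IF-MIB); an agent
# returns one of these integers when the status column is polled.
IF_OPER_STATUS = {
    1: "up",
    2: "down",
    3: "testing",
    4: "unknown",
    5: "dormant",
    6: "notPresent",
    7: "lowerLayerDown",
}

def describe(if_index, status_code):
    status = IF_OPER_STATUS.get(status_code, "invalid")
    return f"ifIndex {if_index}: operationally {status}"

# e.g. a poll of three interfaces might decode as:
for if_index, code in [(1, 1), (2, 2), (3, 3)]:
    print(describe(if_index, code))
```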
Customer Benefit:
Since SNMPv3 is The IETF Internet Standard, most equipment on a network can be reasonably managed without ever adding software to an end device. This means a service provider can provide greater insight into the health and performance of a customer estate with proper management software, especially historical trends when data is captured and stored in a database.

SNMP is only a piece of the puzzle for managing a network.
  • Business Processes
    A customer must know what business services are traversing a device to understand the impact of an outage or what business processes are at risk when assets in the estate are performing poorly.
  • Security / End-of-Life Management
    A customer must know the version of the hardware and firmware is in the estate in order to understand when a security vulnerability or end-of-life equipment may place their business at risk.
  • Logistics / Asset Management
    A customer must know what assets make up their network estate and where the assets are located in order to understand where impacts originate during faults or where security risks exist.
  • Configuration Management
    A customer must know how to update the firmware on managed devices in the estate when defects in the software may be impacting business processes or creating security risks due to vulnerabilities.
  • Performance Management
    A customer must know what "normal" operation of their estate is, collecting this data over time, in order to predict when faults will arise, so impacts to business processes are minimized.
  • Fault Management
    A customer must know when faults occurred in the past, where they occurred, what the problem was, and what the solution was - in order to understand the business impacts and create a strategy to mitigate similar future business impacts.

SNMP is a single skill which can be leveraged to manage any number of device vendors, types, and model numbers. Network Management requires expertise in all of the above areas, in addition to understanding SNMP.

This opens up a prime opportunity for experienced service providers to assist customers, since customers may only have experience with a particular device vendor/model/type, or may have no experience with SNMP.

Wednesday, June 20, 2012

EMC: Building The Cloud, Kicking Cisco Out?


EMC used to be a Data Center partner with close relationships with vendors such as Sun Microsystems. When Sun moved to create ZFS and its own storage solution, the relationship was strained, with EMC responding by suggesting the discontinuance of software development on Solaris platforms. EMC purchased VMWare and entered into a partnership with Cisco: Cisco produced the server hardware in the Data Center while EMC provided VMWare software and EMC storage. The status quo is poised for change, again.

[EMC World 2012 Man - courtesy: computerworld]

EMC World:
Cisco, a first-tier network provider of choice, had started building its own blade platforms and entered into a relationship with EMC for storage and OS virtualization (VMWare) technology. EMC announced just days ago, during EMC World 2012, that it will start producing servers. EMC - already a cloud virtualization provider, cloud virtual switch provider, cloud software management provider, and cloud storage provider - has now moved into the cloud server market.

Cisco Response:
Apparently aware of the EMC development work before the announcement, Cisco released FlexPods with NetApp. The first release of FlexPods can be managed by EMC management software, because VMWare is still the hypervisor of choice. There is a move towards supporting Hyper-V in a future release of FlexPods, and a movement towards providing a complete management solution through Cisco Intelligent Automation for Cloud. Note that EMC's VMWare vCenter sits as a small brick in the solution Cisco assembled through acquisition, including NewScale and Tidal.

[Cisco-NetApp FlexPod courtesy The Register]

NetApp Position:
NetApp's Val Bercovici, CTO of Cloud, declares "the death of [EMC] VMAX." Cisco was rumored to be in a position to buy NetApp in 2009 and 2010; now, with EMC marginalizing Cisco in 2012, NetApp becomes more important - even as NetApp's stock drops like a stone.
[former Sun Microsystems logo]
Cisco's Mishap:
Cisco - missing server hardware, a server hypervisor, a server operating system, tape storage, disk storage, and management technologies - decided to enter into a partnership with EMC. Why partner with EMC, when system administrators in data centers used identical console cables for Cisco and Sun equipment? That alone should have been their first clue.

Had Cisco been more forward-looking, it could have purchased Sun and acquired all the missing pieces: Intel, AMD, and SPARC servers; Xen on x64 Solaris; LDom's on SPARC; Solaris on Intel and SPARC; StorageTek; ZFS Storage Appliances; and Ops Center for multi-platform systems management.

Cisco now has virtually nothing but blade hardware and has started acquiring management software [NewScale and Tidal]... will NetApp be next?

[illumos logo]

Recovery for Cisco:
An OpenSolaris base with hypervisor and ZFS is the core of what Cisco really needs to rise from the ashes of their missed purchase of Sun and unfortunate partnership with EMC.

From a storage perspective, ZFS is mature, providing a near superset of the features offered by competing storage subsystems (where is the embedded Lustre?) If someone could bring clustering to ZFS, there would be nothing missing - making ZFS a complete superset of everything on the market.

OpenSolaris carried first-class Xen support, so Xen could easily be resurrected with a little investment by Cisco. Cloud provider Joyent created KVM on top of OpenSolaris and donated the work back to Illumos, so Cisco could easily fill its hypervisor need, to compete with EMC's VMWare.

[SmartOS logo from Joyent]
SGI figured out it needed a first-class storage subsystem and placed Nexenta (based upon Illumos) in its server lineup. What Cisco really needs is a company like Joyent (based upon Illumos) - to provide storage and a KVM hypervisor. Joyent would also provide Cisco with a cloud solution - a completely integrated stack, from the ground on up... not as valuable as Sun, but probably a close second, at this point.

Sunday, June 17, 2012

Vendors, Systems, and Processors Update

[HotChips 2012 agenda exerpt, courtesy HotChips 24]

Normally, we don't release a consolidated update on the industry more than once a month, but there have been some significant updates.

Hot Chips 24: A Symposium on High Performance Chips is right around the corner, and the agenda looks pretty exciting!

By the time HotChips 24 arrives, POWER 7+ should be about 1 year late, as Fujitsu SPARC remains #1 for over a year in the HPC charts.

Is IBM really going to talk up POWER 7+, one year late, without releasing it? It is looking a lot like Sun Microsystems' ROCK processor, which was killed not long after multiple presentations on it, around the time of Oracle's acquisition of Sun Microsystems.

IBM will also talk about their zNext processor, whatever that might be. Will POWER 7+ ever see the light of day?

Oracle: The SPARC is Hot
Around the time the industry was expecting IBM POWER 7+, Oracle released the SPARC T4 processor.

About a year later, during the same window in which IBM will be talking about POWER 7+, Oracle is projected to release its SPARC T5 processor. The industry is hoping that Oracle will fulfill its projection to release SPARC T5 in 2012, about 6 months ahead of its roadmap.

The SPARC T5 is supposed to be a glue-less 8 socket processor, adhering to the SPARC V9 open standard, certified by SPARC International. Different extensions are projected to be included, such as Oracle RDBMS number calculations in hardware and compression engines... both of which will dramatically increase the performance of Oracle RDBMS's.

With the increase in Oracle RDBMS performance also comes a dramatic increase in the performance of software with embedded databases (which is basically everything enterprise grade.) Oracle is determined to sit on top of the Enterprise Software performance stack, and SPARC seems to be the delivery mechanism.

Why are Open Standards important in platforms? When a single vendor comes under pressure and can't deliver (i.e. IBM POWER 7+), other vendors are free to "pick up the slack", earn a little money, and produce something of additional value for the consumer.

Fujitsu: SPARC On Top Today, Intending to Stay On Top
Fujitsu has a long history of producing SPARC CPU's, both for Sun Microsystems as well as for itself. Fujitsu manufactured the first Sun SPARC processor, has manufactured high-end systems for Sun and Oracle for the past half-decade, and has been holding the #1 performance spot on the TOP500 list.

Fujitsu has released several iterations of its own SPARC CPU for massive supercomputer (SPARC64 VIIIfx, SPARC64 IXfx) Linux systems, as well as processors for high-end (SPARC64 V, VI, and VII) Solaris systems. During HotChips 24, they are projected to talk about their SPARC64 X processor!

The industry is hoping for a Solaris variant, based upon OpenSolaris fork like Illumos, to unify the Fujitsu and Oracle platforms, but there are no rumblings about that.

[ARM TrustZone technology, courtesy ARS Technica]

AMD: Embedding ARM in x64?
After reading about Dell's inclusion of ARM as an enterprise blade platform, the only thing more shocking would be the inclusion of ARM by a mainstream CPU vendor. Well, that day has come: ARM is coming to AMD Opteron.

The use of ARM in the AMD world seems to be targeting virtual computing. The TrustZone feature of ARM may prove interesting for booting hypervisors or providing DRM (digital rights management).

[Fujitsu PrimeHPC node, courtesy The Register]

HPC: Battle of the RISC's
Intel and AMD systems long ago took the top HPC spots. There was a general movement towards using graphics card co-processors to boost scores with specialized software, and some thought the inclusion of ARM would help future HPC systems. But with Fujitsu SPARC sitting on top for a year, without any special co-processors, one may wonder whether graphics card vendors and special co-processor vendors have decided to sit out the supercomputer market for a while, as Fujitsu keeps raising the performance of its long-lived open SPARC architecture.

Network Management Connection
With the rise of SPARC and ARM, one may wonder about the impact on Network Management. ARM sits in most mobile devices, which all need to be managed. SPARC sits in the fastest Enterprise and HPC systems. Network Management tool vendors will need to leverage these capabilities, or at least manage them. Proprietary Intel is the volume proposition; AMD is the second-sourcing proposition for the proprietary Intel platform.

No network management vendor ignoring Intel, AMD, ARM, or SPARC is worth its weight in printed code.

Thursday, June 14, 2012

Network Management at EMC World 2012

[EMC World 2012 Man - courtesy: computerworld]


EMC purchased network management vendor SMARTS, with its InCharge suite, a number of years ago, rebranding the suite as Ionix. EMC purchased Voyence, rebranding it as NCM (Network Configuration Manager). After EMC World 2012, EMC completed the acquisition of Watch4Net APG (Advanced Performance Grapher.) The suite of these platforms is now being rolled into a single new brand called EMC IT Operations Intelligence. EMC World 2012 was poised to advertise the new branding in a significant way.
EMC World 2012 in Las Vegas, Nevada was unfortunately pretty uneventful for service providers. Why was it uneventful?

The labs for EMC IT Operations Intelligence did not function. There were plenty of other labs which functioned, but not the Network Management labs. EMC World 2012 was a sure "shot-in-the-head" for demonstrating, to service providers, the benefits of running EMC Network Management tools in a VM.

After 7 days, EMC could not get their IT Operations Intelligence Network Management Suite running in a VMWare VM.

Small customers may host their network management tools in a VMWare VM. Enterprises will occasionally implement their network management systems on smaller systems, where they know they will get deterministic behavior from the underlying platform.

Service Providers traditionally run their mission critical network management systems on larger UNIX systems, so as to provide instant scalability (swap in CPU boards) and 99.999% availability (reboot once a year, whether they need to or not.)

The platform of choice in the Service Provider market for scalable Network Management platforms has been SPARC Solaris, for decades... clearly, for a reason. This was demonstrated well at EMC World 2012.

The Problem:
Why not host a network management platform in a VMWare infrastructure? Besides the fact that EMC could not make it happen, after a year of preparation and 7 days of struggling... there are basic logistics.

Network Management is dependent upon ICMP and SNMP.  Both of these protocols are "connectionless protocols" - sometimes referred to as "unreliable protocols". Why would a network management platform use "unreliable protocols"?

The IETF understands that network management should always be light: each poll is a single packet, while a TCP-based protocol requires a 3-way handshake to start the transaction, the poll itself, then a tear-down with another handshake. Imagine doing this for thousands of devices every x seconds - not very light-weight, not very smart. A "connection based protocol" will also hide the unreliable nature of the underlying network - which is exactly what a network management platform is supposed to expose, so it can be fixed.
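The contrast is easy to see with a pair of sockets: a connectionless poll is one datagram out and one datagram back, with a timeout instead of a connection. A self-contained sketch, using a loopback stand-in for a real SNMP agent (the payloads are placeholders, not real SNMP encoding):

```python
import socket

# A stand-in "agent": a UDP socket bound to loopback on an ephemeral port.
agent = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
agent.bind(("127.0.0.1", 0))
agent_addr = agent.getsockname()

# The "manager": one datagram out, one datagram back - no handshake,
# no connection state, just a timeout to expose an unreliable network.
manager = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
manager.settimeout(2.0)
manager.sendto(b"get sysUpTime.0", agent_addr)

request, mgr_addr = agent.recvfrom(512)            # agent sees the poll...
agent.sendto(b"sysUpTime.0 = 12345", mgr_addr)     # ...and answers in one packet

reply, _ = manager.recvfrom(512)
print("reply:", reply.decode())

manager.close()
agent.close()
```

Two packets total per poll; the TCP equivalent would spend more packets on connection setup and tear-down than on the poll itself.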

Now stick a network management platform in a VM: the network connection runs from the VM (holding an operating system, with a TCP/IP stack) down through the hypervisor (another operating system, with another TCP/IP stack, sharing that VM's resources with other VM's.) If there is the slightest glitch in the VM or the hypervisor causing packets to be queued or dropped, the VMWare infrastructure itself will signal to the Network Management Centers that there is a network problem - in their customer's network!

Clearly, someone at EMC does not understand Network Management, nor do they understand Managed Service Providers.

The Network Management Platform MUST BE ROCK SOLID, so the Network Operations Center personnel will NEVER mistake alerts in their console from a customer's managed device for a local performance issue in their VM.

EMC used Solaris to reach into the Telco Data Centers, then used Cisco to reach into the Telco Data Centers - now EMC is done using its partners. VMWare was the platform of choice to [not] demonstrate their Network Management tools on. Cisco was the [soon to be replaced] platform of choice, since EMC announced it will start building its own servers.

Either someone at EMC is asleep at the wheel, or they need to get a spine and support their customers. Either way, this does not bode well for EMC as a provider of software solutions for service providers.

Business Requirements:
In order for a real service provider to reliably run a real network management system in a virtualized environment:
  • The virtualized platform must not insert any overhead.
  • All resources provided must be deterministic.
  • Patches are installed while the system is live.
  • Engagement of patches must be deterministic.
  • Patch engagement must be fast.
  • Rollback of patches must be deterministic.
  • Patch rollback must be fast.
  • Availability must be 99.999%.

There are many platforms which fulfill these basic business requirements, but none of them are VMWare. Ironically, the SPARC Solaris platform is currently supported by EMC for IT Operations Intelligence, yet EMC does not support SPARC Solaris under VMWare - and EMC chose not to demonstrate its Network Management suite under a platform which meets service provider requirements.

Today, Zones is about the only virtualization technology which offers 0%-overhead virtualization. (Actually, on SMP systems, virtualizing via Zones can increase application throughput, if Zones are partitioned by CPU board.) Zones, to work in this environment, seem to work best with external storage providers, like EMC.

Any platform which offers a 0% virtualization penalty with ZFS support can easily meet a service provider's technical platform business requirements. Of these, the first three listed are probably the best supported by commercial interests:
  • Oracle SPARC Solaris
  • Oracle Intel Solaris
  • Joyent SMART OS
  • OpenIndiana
  • Illumian
  • BeleniX
  • SchilliX
  • StormOS
Today's market is becoming more proprietary with each passing day. The movement towards supporting applications only under proprietary solutions (such as VMWare) demonstrated its risk during EMC World 2012. A network management provider would be ill advised to use any network management tool which is bound to a single proprietary platform element and does not support POSIX platforms.

Monday, June 11, 2012

System Vendor - CISC, RISC, EPIC Update


Since the decline of the Motorola 68000 CISC processor, RISC processors had been on the rise, eventually re-challenged by Intel with the release of the 80386 (and later models) with a Motorola-like flat memory model. UNIX vendors had standardized on the 68000, migrated to RISC processors, and occasionally moved back to Intel. There have been predictions of the decline of RISC, the loss of major processor families like Alpha and MIPS, the decline of POWER, and rumors of the end of Intel's EPIC processor family, Itanium - but some level of diversity surprisingly continues.

[IBM CS-9000 - courtesy Columbia EDU computing history]
IBM Update: Power 7+
In 1982, IBM released a 68000 based workstation, built on a 32/16-bit processor. IBM then decided to move to x86 in a PC form factor, leveraging an existing relationship with Intel for the 8088, reducing cost by using a 16/8-bit processor, and gaining ready 8-bit part availability - this started the business PC market. IBM went on to design its own RISC chip, called POWER, for its own UNIX workstations. The POWER multichip CPU modules were physically huge and very costly to manufacture - gluing multiple chips onto a single carrier socket limited production quantities.

The Apple-IBM-Motorola consortium started manufacturing PowerPC processors, bringing the POWER RISC architecture onto Apple desktops through a simpler manufacturing process, but Apple discontinued its use not long after purchasing NeXT (this is the point where IBM POWER lost the desktop market.) In January 2008, IBM started using QuickTransit to provide x86 Linux software on its proprietary POWER processor, later ending in IBM purchasing Transitive. IBM almost purchased Sun, which would have let IBM acquire SPARC - the industry's volume-leading commodity [non-multichip-module] RISC - and Solaris, the industry-leading UNIX OS.

[POWER5 Multi-Chip Module]

It was noted in Network Management at the end of August 2011 that POWER 7+ was late. In March 2012, Sony appears to have abandoned IBM POWER - this is when IBM POWER lost the gaming market. By April 2012, IBM POWER 7+ was a half-year late; by May 2012, 7 months late; by June 2012, 8 months late. Multi-chip modules are much simpler to bring to market than chips designed into a single piece of silicon. For IBM to be so late, something bad must have happened. This does not bode well for AIX users.

HP Update: Itanium
In 2007, HP licensed Transitive's QuickTransit to provide Solaris software on HP's Intel-based Itanium servers. Transitive made HP a global distributor in 2008, right before IBM bought Transitive, killing HP's path to move SPARC software onto x86 Linux or Itanium HP-UX. Itanium was the first, and possibly last, nearly mainstream Explicitly Parallel Instruction Computing (EPIC) CPU architecture.

In February 2009, HP described Project Blackbird - HP acknowledged Solaris as the leading UNIX in the United States, conceded Itanium was on a "death march", and considered purchasing Sun/Solaris. In December 2009, Red Hat killed Linux on Itanium. In April 2010, Microsoft killed Windows on Itanium. By December 2010, HP-UX was booting under Intel x86 - Project "Redwood" suggested a "last" Itanium chip in 2014, while recommending funding to move HP-UX to Intel x86. In March 2011, Oracle stopped new software development on Itanium. In November 2011, The Register described HP's Project Odyssey - building high-end Intel x86 systems, mapping Itanium HP-UX features to Intel x86, giving away Itanium/HP-UX software technology to Linux (not available under Itanium), and enhancing Windows with Microsoft. On May 30, 2012, HP revived an old slide dating back to June 25, 2010 from Project Kinetic, where HP-UX and other HP operating systems [OpenVMS and NonStop] would remain under Itanium, but with a twist: socket-level compatibility between Itanium and x86; a new UNIX running under both Itanium and x86; and driving mid-range features into Intel, Linux, and Windows.

The HP-UX, OpenVMS, and NonStop operating systems look dead because of their dependency on the doomed Itanium, whose architecture seems on a trajectory to be moved to x86 while the OS's have their features given away to other operating systems. A movement to Solaris might be too late, unless HP decides to fix its technology gap by partnering with an OpenSolaris distribution, as SGI did (see next section.) HP really needs something like Solaris Branded Zones, to encapsulate all three OS's.

SGI Update: OpenSolaris???
This is a most unusual update. In 1982, SGI was founded, selling UNIX IRIS Workstations using Motorola 68000 processors. Their OS eventually became AT&T System V - branded as IRIX. In 1986, the MIPS R2000 processor was released and incorporated into SGI workstations. In 1991, SGI went 64-bit with the MIPS R4000 processor. SGI then abandoned MIPS and moved to Intel Itanium, shipping their first Itanium workstation in 2001. In 2006, SGI abandoned Itanium for Intel x86 and stopped developing IRIX. Rackable purchased SGI in 2009, renaming the entire company back to SGI. One version of the fall of SGI was recorded here.

Why go through all this effort to remember Super Computer and Graphics Workstation creator SGI? It seems SGI has started to investigate UNIX again. SGI is using Nexenta for their SAN solution. Nexenta is based upon Illumos, formerly based upon OpenSolaris, which is also the basis for Oracle's UNIX - Solaris 11. SGI embraces Solaris x86 for a portion of their solution, as HP considered in Project Redwood.
Dell Update: ARM???
The only thing stranger than fiction is reality. Dell would normally never appear in an article like this, but as other vendors exit their non-x86 marketplaces for Intel x86 CISC, Dell is about the only systems vendor who seems to be expanding out of the Intel x86 CISC market!

[Dell Quad ARM Server per Blade and Chassis]

Then, on May 29, 2012, Dell announced a RISC machine based upon the ARM processor! Project Copper was bundled under Dell's Enterprise web site tree, which is an indication of where they are interested in pushing this new product. Will Dell learn from the mistakes of IBM and HP, or the corrections by SGI - by bundling a market-leading UNIX... in the form of an Oracle Solaris variant based upon Illumos?

Does an enterprise or managed-service grade OS exist for ARM?

In June 2009, a release of OpenSolaris for ARM hit the wild. An example of OpenSolaris booting on ARM was blogged. In October 2009, a web page was created for the release of OpenSolaris for ARM - bringing the leading UNIX to the ARM processor family. Doug Scott mentioned in October 2011 that he was reviving a port of OpenSolaris to ARM, for ZFS on an ARM-based SheevaPlug. Also in October 2011, ARM announced the V8 architecture, migrating ARM from 32-bit to 64-bit - which is where the OpenSolaris variants have all moved. Dell has an excellent opportunity.

Apple Update: Intel and ARM
This is, perhaps, one of the most interesting computer companies in history. Starting with 8-bit 6502 processors, they moved to the Motorola 68000 CISC for their high-end publishing workstation, which they called the Macintosh. After founder Steve Jobs was kicked out, Jobs started NeXT Computer, based on Motorola 68000 processors and a UNIX core.

[Apple iPhone 4s based upon ARM processor and MacOSX UNIX derivative iOS]
NeXT migrated their UNIX OS to Intel and went from being a workstation vendor to an OS vendor. Apple, desperately needing a modern OS, almost went out of business and purchased NeXT (getting Steve Jobs back.) The combined company produced a UNIX based desktop OS called MacOS X (Macintosh Operating System 10 - based upon the NeXTSTEP UNIX OS core) placed on top of a PowerPC chip (designed by the Apple-IBM-Motorola consortium - the AIM alliance.) Apple almost merged with Sun several times during this history, collaborating on OpenSTEP (an open specification of the NeXT OS interfaces.) Soon, Apple created the iMac and the company started to turn around.

[Apple iPad2 based upon ARM processor and MacOSX UNIX derivative iOS]
Most recently, Apple went through another migration - moving MacOS X back to its NeXT Intel code base. Apple regained profitability and then invested in a new set of consumer products: first the iPod, then the iPhone, then the iPad. These new devices were based upon ARM RISC processors, running a MacOSX derivative branded iOS. At this point, Apple exploded, becoming the number one client vendor on the market, growing to such an extent that they could buy Intel with the spare cash they had on hand. Apple did the nearly impossible: created a new RISC based UNIX ecosystem from nearly nothing.

Oracle/Sun: SPARC & Solaris Update
Early on, Sun built their platforms on the Motorola 68000 family, as did most workstation vendors. They experimented with x86 for a short while, then discontinued it. Solaris 9 was released on Intel, where Intel-based UNIX vendors like NCR started migrating to Solaris from their SVR4 platforms like MP-RAS. Solaris 10 was released on SPARC and Intel, Solaris was open-sourced as OpenSolaris (for both Intel and SPARC), and Solaris 11 was released on Intel and SPARC after Oracle purchased Sun. Interestingly, Solaris was ported to PowerPC for a short period of time, with designers working on an OpenSTEP interface, during a time when Apple was not doing so well. Various Solaris variants based upon the OpenSolaris project have hit the marketplace, with more distributions being released regularly.

[SunRay 270 Ultra-Thin Client]
From its early history, Sun had traditionally been a 32-bit UNIX workstation vendor. It migrated to 64-bit UNIX workstations, moved from desktop UNIX workstations to UNIX servers, created the ultra-thin SunRay client (based upon the 32-bit MicroSPARC) to replace UNIX desktop workstations, and surprisingly migrated the SunRay platform from MicroSPARC to ARM. Various releases of OpenSolaris briefly touched ARM, but Solaris has primarily remained focused on SPARC and Intel, with the SunRays being firmware-based systems.

[SPARC T5 feature slide, courtesy Oracle on-line presentation] 

As variants of RISC and the one EPIC processor have been losing mind share, there have been two major exceptions: SPARC and ARM. Oracle continues to make thin clients based upon ARM, with no roadmap. Oracle committed to a 5 year plan on SPARC, which has been executed either on-time or early for multiple processors. The SPARC T4 brought a fast single-threaded platform with eight cores in 2011. A few months away, the SPARC T5 processor will bring 16 cores (again) to the SPARC family from Oracle, with features including compression and Oracle number processing in hardware.

Fujitsu: SPARC64 Update
Fujitsu is another interesting company in this article. They did not organically grow into the UNIX movement from Motorola 68000 processors, like most other industry players - Fujitsu entered the RISC UNIX market through co-development with Sun.
[Fujitsu SPARC64 VII, used in both Fujitsu and Sun branded mainframe class systems]
SPARC was developed by Sun Microsystems in 1986. Fujitsu fabricated the MB86900 designed by Sun Microsystems, the first implementation of the SPARC V7 architecture. SPARC International was founded in 1989, standardizing the 32-bit SPARC V8 multi-vendor architecture and creating the first non-proprietary mainstream RISC platforms. Andrew Heller, head of the RS/6000 POWER based UNIX workstation group, left IBM and in 1990 founded a new company, HAL Computer Systems, to develop a SPARC processor. In 1991, Fujitsu provided significant funding for a 44% stake, in return for the use of HAL's SPARC chips in their own systems. In 1992, the SPARClite was produced by Fujitsu. In 1993, Fujitsu purchased the rest of HAL, making Fujitsu the sole driver behind HAL's SPARC systems. The 64-bit SPARC V9 architecture was published in 1994 and Fujitsu shipped their first system in 1995 - Fujitsu actually beat Sun to market with the first 64-bit SPARC processor.

[Fujitsu SPARC64 IXfx 16 core CPU floor plan - heart of the fastest super computer cluster in the world in 2011-2012]
While other CPU architectures were proprietary, with various corporations suing one another (e.g. Intel suing AMD), SPARC brought a level of openness to the industry where vendors could cooperate (and occasionally bail each other out), spreading the risk while sharing the rewards of the UNIX market. During a time when Sun's SPARC development pipeline ran dry, Fujitsu provided SPARC64 CPU's for Sun & Fujitsu high-end platforms. Sun purchased third-party SPARC development house Afara Websystems, produced the T line of SPARC processors, and jointly sold the SPARC T line with Fujitsu. Solaris is standard on all of these platforms.

[Fujitsu SPARC64 IXfx, 16 core CPU, heart of Fujitsu's PRIMEHPC FX10 - the fastest supercomputer world-wide in 2011-2012]
Fujitsu continues to push ahead with SPARC on their own platforms, holding the fastest computer in the world for over a year. What makes this a special SPARC is that Solaris is not at its core - rather, Linux is. It seems rather amazing that Linux departed Intel Itanium, only to become the OS of choice for the fastest computer in the world on a Fujitsu SPARC platform.
[UNIX - courtesy The Open Group]
In Conclusion
IBM POWER is barely breathing, with their latest road-mapped CPU being so late that POWER is almost irrelevant, placing tremendous pressure on AIX. Intel Itanium vendors have been abandoning the EPIC family for a half-decade, with the final vendor closing up shop. HP-UX is bound to Intel's EPIC Itanium, which is basically dead, with HP announcing development of an unknown new UNIX OS (hopefully, a Solaris-forked Illumos-based distribution.) Dell is releasing their first RISC platform, without an enterprise UNIX OS - hopefully they will investigate a Solaris-forked Illumos distribution. SGI, who abandoned Intel's EPIC Itanium and their own UNIX, is partnering with a Solaris-forked Illumos-based distribution on Intel x86.

Oracle has been executing on SPARC, scoring the highest-performing industry benchmarks. Fujitsu continues to execute on SPARC, holding the highest-performing super-computer benchmarks. At this point, there is great opportunity for the Solaris-forked Illumos distributions - if they can get their act together to support SVR4 industry standards.

The UltraSPARC family of processors could be a bridge for Illumos developers to offer Fujitsu SPARC64 support on the fastest computer in the world. OpenIndiana may be closest to being able to offer such support - not to mention get paid for older system support via resellers and new system support from Fujitsu (where Oracle shows little interest in making Solaris run today.)
ARM offers great opportunities to extend the Solaris family of architectures on the server, especially for Dell, who needs an enterprise OS. Of course, HP also needs a new enterprise OS under the Intel platform.

If Illumos developers fail to understand how pivotal this point in time could be, this could be the end of an era - and they would have only themselves to blame for their short-sightedness in not executing on the OpenSolaris source code tree during the very short time period when they could shine the brightest.