Showing posts with label Red Hat. Show all posts

Monday, October 29, 2018

Oracle Linux on SPARC is dead? Oracle Linux at Risk?

Oracle made substantial changes in their strategy last year, perhaps on a "Wim"... and now they seem to have bet wrong.

Oracle has long pinned some of its engineered systems on a clone of Red Hat Linux. After purchasing Sun, they released storage servers based upon Solaris on Intel and left its other engineered systems on their knock-off Linux OS.

Oracle had to suffer through the successive Intel CPU fixes, each patch release making systems slower or less secure. Now, the dominant Linux vendor [Red Hat] that Oracle had been copying is being purchased by IBM.

[SPARC logo, courtesy SPARC International]

SPARC Life


It appears that there is still a SPARC of life in the world's highest performing CPU architecture... and that life is in Solaris. There has been a recent roadmap release [i.e. 2018-08] which is substantially the same as its previous release some 5 months earlier.

[Oracle logo, courtesy Oracle Corporation]

Oracle SPARC

Oracle released a new roadmap [i.e. 2018-03] with an M8+ chip coming, and continues to plan Oracle Solaris development into the distant future, for over a decade.

Oracle SPARC Solaris appears to be a steady ship, in turbulent seas.

[Fujitsu logo, courtesy Fujitsu Corporation]

Fujitsu SPARC

This seems to coincide with the Fujitsu roadmap [i.e. since last year!] It seems Fujitsu is designing the silicon for Oracle as they advance the Solaris software layer. Fujitsu, a hardware provider supplying SPARC chips for Sun when SPARC was first created, continues to talk about new products coming [on 2018-02-03 and 2018-03-15] - which is good news!

Fujitsu leaked [on 2018-07-06] that it is getting closer to releasing its new supercomputer architecture, not based upon SPARC, which probably means Fujitsu's Linux for SPARC will soon have no future.

Oracle Linux

There has been some speculation about Linux on SPARC from NetMgt. The last update of Oracle Linux on SPARC looks to have been in Summer 2017. It appears to have stalled, possibly killed when Wim returned to Oracle in November 2017.

Oracle's knock-off Linux is based upon Red Hat Linux... which is now being purchased [on 2018-10-28] by arch-enemy competitor IBM... who competes in all of Oracle's major spaces (i.e. Cloud, RISC servers, Intel servers, Database, etc.)

Conclusions

NetMgt has been tracking Oracle Linux for some time, but it appears Oracle Linux on SPARC stalled last year. Oracle Linux on SPARC now appears dead on arrival. Fujitsu SPARC no longer has a need for Linux. Oracle's Linux is now in an oddly risky place, running on Intel, whose CPUs get slower with every defect fix. Oracle SPARC Solaris continues to be the highest performing vendor architecture and OS combination - the "sun" continues to shine in the darkness of declining performance of competitors.

Sunday, May 25, 2014

Solaris: Loopback Optimization and TCP_FUSION

Abstract:
Since the early days of computing, the slowest interconnects have always been between platforms, through input and output channels. With the movement from serial ports to higher speed communications channels, TCP/IP became the standard mechanism for applications to communicate not only between physical systems, but also on the same system! During Solaris 10 development, a capability called TCP_FUSION was introduced to increase the performance of the TCP/IP stack for applications communicating on the same server. Some application vendors may be unaware of the safeguards built into Solaris 10 to prevent denial of service attacks or starvation of applications due to the high performance of TCP writers on the loopback interface.
Functionality:
Authors Brendan Gregg and Jim Mauro describe the functionality of TCP_FUSION in their book: DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X, and FreeBSD.
Loopback TCP packets on Solaris may be processed by tcp fusion, a performance feature that bypasses the ip layer. These are packets over a fused connection, which will not be visible using the ip:::send and ip:::receive probes (but they can be seen using the tcp:::send and tcp:::receive probes). When TCP fusion is enabled (which it is by default), loopback connections become fused after a TCP handshake, and then all data packets take a shorter code path that bypasses the IP layer.
The modern application hosted under Solaris will demonstrate a significant benefit over being hosted under alternative operating systems.

Demonstrated Benefits:
TCP socket performance, under languages such as Java, may demonstrate a significant performance improvement, often shocking software developers!
While comparing java TCP socket performance between RH Linux and Solaris, one of my tests is done by using a java client sending strings and reading the replies from a java echo server. I measure the time spent to send and receive the data (i.e. the loop back round trip).
The test is run 100,000 times (more iterations give similar results). From my tests Solaris is 25/30% faster on average than RH Linux, on the same computer with default system and network settings, same JVM arguments (if any) etc.
The answer seems clear: TCP_FUSION is the primary reason.
In Solaris that's called "TCP Fusion" which means two local TCP endpoints will be "fused". Thus they will bypass the TCP data path entirely. 
Testing will confirm this odd performance benefit of stock Solaris over Linux.
Nice! I've used the command
echo 'do_tcp_fusion/W 0' | mdb -kw

and manage to reproduce times close to what I've experienced on RH Linux. I switched back to re-enable it using
echo 'do_tcp_fusion/W 1' | mdb -kw

Thanks both for your help.
Once people understand the benefits of TCP_FUSION, they will seldom go back.
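
The round-trip measurement described in the quoted test can be approximated with a short script. The original test used a Java client and a Java echo server; the sketch below is a Python stand-in, and the function names, payload, and iteration count are illustrative assumptions rather than the original test code. Running it on the same host with TCP_FUSION enabled and then disabled should show whether fusion accounts for the difference.

```python
import socket
import threading
import time


def echo_server(listener: socket.socket) -> None:
    """Accept one connection and echo every byte back until the peer closes."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)


def recv_exact(sock: socket.socket, nbytes: int) -> bytes:
    """Read exactly nbytes, since TCP may deliver a reply in several pieces."""
    buf = bytearray()
    while len(buf) < nbytes:
        chunk = sock.recv(nbytes - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before the full reply arrived")
        buf.extend(chunk)
    return bytes(buf)


def measure_round_trips(iterations: int = 1000,
                        payload: bytes = b"hello loopback\n") -> float:
    """Return the mean loopback round-trip time in seconds."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=echo_server, args=(listener,), daemon=True)
    server.start()

    with socket.create_connection(("127.0.0.1", port)) as client:
        # Disable Nagle so small writes are sent immediately.
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(iterations):
            client.sendall(payload)
            if recv_exact(client, len(payload)) != payload:
                raise RuntimeError("echo reply did not match the request")
        elapsed = time.perf_counter() - start

    server.join(timeout=5)
    listener.close()
    return elapsed / iterations
```

Calling `measure_round_trips(100000)` mirrors the iteration count in the quoted test; the absolute numbers will differ from the Java results, so only the ratio between runs on the same machine is meaningful.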

Old Issues:
The default nature of TCP_FUSION means any application hosted under Solaris 10 or above will, by default, receive the benefit of this huge performance boost. Some early releases of Solaris 10 without patches may experience a condition where a crash can occur, because of kernel memory usage. The situation, workaround, and resolution are described:

Solaris 10 systems may panic in the tcp_fuse_rcv_drain() TCP/IP function when using TCP loopback connections, where both ends of the connection are on the same system. This may allow a local unprivileged user to cause a Denial of Service (DoS) condition on the affected host.
To work around the described issue until patches can be installed, disable TCP Fusion by adding the following line to the "/etc/system" file and rebooting the system: set ip:do_tcp_fusion = 0x0.
This issue is addressed in the following releases: SPARC Platform Solaris 10 with patch 118833-23 or later and x86 Platform Solaris 10 with patch 118855-19 or later.
Disabling TCP_FUSION feature is no longer needed for DoS protections.

Odd Application Behavior:
If an application running under Solaris does not experience a performance boost, but rather a performance degradation, it is possible your ISV does not completely understand TCP_FUSION or the symptoms of an odd code implementation. When developers expect the receiving application on a socket to respond slowly, this can result in bad behavior with TCP sockets accelerated by Solaris.

Instead of application developers optimizing the behavior of their receiving application to take advantage of the potential 25%-30% performance benefit, some application vendors chose to suggest disabling TCP_FUSION with their applications: Riverbed's Stingray Traffic Manager and Veritas NetBackup (4x slowdown.) Those unoptimized TCP reading applications, which perform reads 8x slower than their TCP writing application counterparts, perform extremely poorly in the TCP_FUSION environment.

Possible bad TCP_FUSION interaction?
There is a better way to debug this issue rather than shutting off the beneficial behavior. Blogger Steffen Weiberle at Oracle wrote pretty extensively on this.

First, one may want to understand if it is being used. TCP_FUSION is often used, but not always:
There are some exceptions to this, including when using IPsec, IPQoS, raw-socket, kernel SSL, non-simple TCP/IP conditions, or when the two end points are on different squeues. A fused connection will revert to unfused if an IP Filter rule will drop a packet. However TCP fusion is done in the general case.
When TCP_FUSION is enabled for an application, there is a risk that the TCP data provider can provide data so fast over TCP that it can cause starvation of the receiving application! Solaris OS developers anticipated this in their acceleration design.
With TCP fusion enabled (which it is by default in Solaris 10 6/06 and later, and in OpenSolaris), when a TCP connection is created between processes on a system, the necessary things are set up to transfer data from the sender to the receiver without sending it down and back up the stack. The typical flow control of filling a send buffer (defaults to 48K or the value of tcp_xmit_hiwat, unless changed via a socket operation) still applies. With TCP Fusion on, there is a second check, which is the number of writes to the socket without a read. The reason for the counter is to allow the receiver to get CPU cycles, since the sender and receiver are on the same system and may be sharing one or more CPUs. The default value of this counter is eight (8), as determined by tcp_fusion_rcv_unread_min.
Some ISV developers may have coded their applications in such a way to anticipate that TCP is slow and coded their receiving application to be less efficient than the sending application. If the receiving application is 8x slower in servicing the reading from the TCP socket, the OS will slow down the provider. Some vendors call this a "bug" in the OS.

When doing large writes, or when the receiver is actively reading, the buffer flow control dominates. However, when doing smaller writes, it is easy for the sender to end up with a condition where the number of consecutive writes without a read is exceeded, and the writer blocks, or if using non-blocking I/O, will get an EAGAIN error.
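
The interaction between the two checks can be illustrated with a toy model. This is a sketch of the behavior as described in the quoted text, not the Solaris kernel implementation: the class and function names are hypothetical, the 48K buffer and the default of 8 come from the quote, and the exact boundary at which the writer blocks (on the write after the eighth unread one) is an assumption.

```python
class FusedLoopbackModel:
    """Toy model of the two sender-side checks (illustrative, not kernel code)."""

    def __init__(self, send_buffer: int = 48 * 1024, unread_min: int = 8):
        self.send_buffer = send_buffer    # stands in for tcp_xmit_hiwat
        self.unread_min = unread_min      # stands in for tcp_fusion_rcv_unread_min
        self.queued_bytes = 0
        self.writes_since_read = 0

    def write(self, nbytes: int) -> str:
        """Return 'ok', or the reason the writer would block or see EAGAIN."""
        # Second check: consecutive writes without a read (0 disables it).
        if self.unread_min and self.writes_since_read >= self.unread_min:
            return "blocked: unread write count"
        # Usual flow control: the send buffer is full.
        if self.queued_bytes + nbytes > self.send_buffer:
            return "blocked: send buffer full"
        self.queued_bytes += nbytes
        self.writes_since_read += 1
        return "ok"

    def read(self) -> None:
        """The receiver drains everything and resets the write counter."""
        self.queued_bytes = 0
        self.writes_since_read = 0


def writes_until_blocked(model: FusedLoopbackModel, nbytes: int,
                         limit: int = 100000):
    """Issue writes of nbytes with no reads; return (successful writes, reason)."""
    count = 0
    while count < limit:
        result = model.write(nbytes)
        if result != "ok":
            return count, result
        count += 1
    return count, "never blocked"
```

With the defaults, 100-byte writes stop on the write counter after 8 writes, while 16K writes stop on the buffer after 3, which matches the quote's distinction between small and large writes. Setting `unread_min=0` leaves only the buffer check.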
So now, one may see the symptoms: errors with TCP applications where connections on the same system are experiencing slowdowns and may even provide EAGAIN errors.
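
On the application side, a writer using non-blocking I/O can treat EAGAIN as "wait and retry" rather than as a failure. The sketch below shows one way to do that in Python, where EAGAIN surfaces as `BlockingIOError`. The function names are hypothetical, and the demo uses `socket.socketpair()` as a convenient stand-in for a loopback TCP connection, so it does not exercise TCP_FUSION itself.

```python
import select
import socket
import threading


def send_all_nonblocking(sock: socket.socket, data: bytes,
                         timeout: float = 5.0) -> int:
    """Send all of data on a non-blocking socket.

    Returns how many times the writer had to wait for the receiver.
    """
    view = memoryview(data)
    waits = 0
    while len(view) > 0:
        try:
            sent = sock.send(view)
            view = view[sent:]
        except BlockingIOError:
            # EAGAIN/EWOULDBLOCK: wait until the socket is writable again.
            waits += 1
            _, writable, _ = select.select([], [sock], [], timeout)
            if not writable:
                raise TimeoutError("receiver did not drain the socket in time")
    return waits


def demo(total_bytes: int = 1 << 20):
    """Push total_bytes through a socket pair; return (data intact, wait count)."""
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    received = bytearray()

    def drain() -> None:
        while len(received) < total_bytes:
            chunk = reader.recv(65536)
            if not chunk:
                break
            received.extend(chunk)

    thread = threading.Thread(target=drain)
    thread.start()
    payload = b"x" * total_bytes
    try:
        waits = send_all_nonblocking(writer, payload)
    finally:
        writer.close()
        thread.join(timeout=10)
        reader.close()
    return bytes(received) == payload, waits
```

The wait count returned by `demo()` depends on scheduling and buffer sizes, so it varies from run to run; the point is only that the writer completes without surfacing the error to the application.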

Tuning Option: Increase Slow Reader Tolerance
If the TCP reading application is known to be 8x slower than the TCP writing application, one option is to increase the threshold at which the TCP writer becomes blocked, so that as many as 32 writes can be issued [per single read] before the OS blocks the writer, as a safety measure. Steffen Weiberle also suggested:
To test this I suggested the customer change the tcp_fusion_rcv_unread_min on their running system using mdb(1). I suggested they increase the counter by a factor of four (4), just to be safe.
# echo "tcp_fusion_rcv_unread_min/W 32" | mdb -kw
tcp_fusion_rcv_unread_min:      0x8            =       0x20

Here is how you check what the current value is.
# echo "tcp_fusion_rcv_unread_min/D" | mdb -k
tcp_fusion_rcv_unread_min:
tcp_fusion_rcv_unread_min:      32

After running several hours of tests, the EAGAIN error did not return.

Tuning Option: Removing Slow Reader Protections
If the reading application is just poorly written and will never keep up with the writing application, another option is to remove the write-to-read protection entirely. Steffen Weiberle wrote:
Since then I have suggested they set tcp_fusion_rcv_unread_min to 0, to turn the check off completely. This will allow the buffer size and total outstanding write data volume to determine whether the sender is blocked, as it is for remote connections. Since the mdb is only good until the next reboot, I suggested the customer change the setting in /etc/system.
* Set TCP fusion to allow unlimited outstanding writes up to the TCP send buffer set by default or the application.
* The default value is 8.
set ip:tcp_fusion_rcv_unread_min=0
There is a buffer safety tunable, where the writing application will block if the kernel buffer fills, so you will not crash Solaris if you turn this write-to-read ratio safety switch off.

Tuning Option: Disabling TCP_FUSION
This is the proverbial hammer for pushing a tack into a cork board. Steffen Weiberle wrote:
To turn TCP Fusion off all together, something I have not tested with, the variable do_tcp_fusion can be set from its default 1 to 0.
...
And I would like to note that in OpenSolaris only the do_tcp_fusion setting is available. With the delivery of CR 6826274, the consecutive write counting has been removed.
Network Management has not investigated what the changes were in the final releases of OpenSolaris or more recent Solaris 11 releases from Oracle with regard to TCP_FUSION tuning.
Tuning Guidelines:
The assumption of Network Management is that the common systems administrator is working with well-designed applications, where the application reader is keeping up with the application writer, under Solaris 10. If there are ill-behaved applications under Solaris 10, but one is interested in maintaining the 25%-30% performance improvement, some of the earlier tuning suggestions below will provide much better help than the typical ISV suggested final step.

Check for TCP_FUSION - 0=off, 1=on (default)
SUN9999/root#   echo "do_tcp_fusion/D" | mdb -k
do_tcp_fusion:
do_tcp_fusion: 1

Check for TCP_FUSION unread to written ratio - 0=off, 8=default
SUN9999/root# echo "tcp_fusion_rcv_unread_min/D" | mdb -k
tcp_fusion_rcv_unread_min:
tcp_fusion_rcv_unread_min:      8   
Quadruple the TCP_FUSION unread to write ratio and check the results:
SUN9999/root# echo "tcp_fusion_rcv_unread_min/W 32" | mdb -kw
tcp_fusion_rcv_unread_min:      0x8            =       0x20
SUN9999/root# echo "tcp_fusion_rcv_unread_min/D" | mdb -k
tcp_fusion_rcv_unread_min:
tcp_fusion_rcv_unread_min:      32
Disable the unread to write ratio and check the results:
SUN9999/root# echo "tcp_fusion_rcv_unread_min/W 0" | mdb -kw
SUN9999/root# echo "tcp_fusion_rcv_unread_min/D" | mdb -k
tcp_fusion_rcv_unread_min:
tcp_fusion_rcv_unread_min:      0
Finally, disable TCP_FUSION to lose all performance benefits of Solaris, but keep your ISV happy.
SUN9999/root# echo "do_tcp_fusion/W 0" | mdb -kw
May this be helpful for Solaris 10 platform administrators, especially with Network Management platforms!
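
For administrators who want to script these checks, the mdb output shown above can be parsed mechanically. The following is a minimal sketch, assuming the output format matches the transcripts in this post; the function names are hypothetical, and `read_tunable` only works on a Solaris host where `mdb -k` is available to a privileged user.

```python
import re
import subprocess


def parse_mdb_decimal(output: str, symbol: str) -> int:
    """Extract the value from `symbol/D` output, e.g. 'do_tcp_fusion: 1'."""
    pattern = re.compile(
        r"^" + re.escape(symbol) + r":[ \t]+(-?\d+)[ \t]*$", re.MULTILINE)
    match = pattern.search(output)
    if match is None:
        raise ValueError(f"no decimal value for {symbol!r} in mdb output")
    return int(match.group(1))


def parse_mdb_write(output: str, symbol: str):
    """Extract (old, new) from `symbol/W value` output, e.g. '0x8 = 0x20'."""
    pattern = re.compile(
        r"^" + re.escape(symbol)
        + r":[ \t]+(0x[0-9a-fA-F]+)[ \t]+=[ \t]+(0x[0-9a-fA-F]+)[ \t]*$",
        re.MULTILINE)
    match = pattern.search(output)
    if match is None:
        raise ValueError(f"no write result for {symbol!r} in mdb output")
    return int(match.group(1), 16), int(match.group(2), 16)


def read_tunable(symbol: str) -> int:
    """Run `echo "symbol/D" | mdb -k` and return the current value."""
    result = subprocess.run(
        ["mdb", "-k"], input=f"{symbol}/D\n",
        capture_output=True, text=True, check=True)
    return parse_mdb_decimal(result.stdout, symbol)
```

On a Solaris 10 host, `read_tunable("tcp_fusion_rcv_unread_min")` would be expected to return 8 by default, per the transcript above; the parsing functions can be exercised anywhere against saved output.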

Thursday, December 27, 2012

Storage News: December 2012 Update


Oracle: Discusses Tape Storage...

Nanotube Non-Volatile Storage: Nantero NRAM ...

Dell Storage...

Emulex: Storage Network Vendor Buys Network Management Endace...

Red Hat Unites OS to Gluster Clustered Storage...

Advice from The Register: Cisco, Don't Buy NetApp...

Thursday, March 24, 2011

2011 March 20-36: Articles of Interest

Security, Networking, and Industry Articles of Interest


2011-03-16 - Microsoft malware removal tool takes out Public Enemy No. 4
Microsoft finally used its Malicious Software Removal Tool to remove the fourth-biggest threat in the automated program's history dating back to at least 2005.


2011-03-18 - RSA breach leaks data for hacking SecurID tokens
'Extremely sophisticated' attack targets 2-factor auth


2011-03-20 - AT&T acquires T-Mobile USA from Deutsche Telekom for $39bn
There was one GSM network, to rule them all...


2011-03-23 - Mac OS X daddy quits Apple
Bertrand Serlet, Apple’s senior vice president of Mac software engineering and the man who played a lead role in the development of Mac OS X, is leaving the company.


2011-03-23 - 'Iranian' attackers forge Google's Gmail credentials
Skype, Microsoft, Yahoo, Mozilla also targeted.

Extremely sophisticated hackers, possibly from the Iranian government or another state-sponsored actor, broke into the servers of a web authentication authority and counterfeited certificates for Google mail and six other sensitive addresses, the CEO of Comodo said.


2011-03-23 - Oracle announced all software development stopped on Intel's Itanium CPU.
Red Hat was the first to pull the plug on Itanium, saying back in December 2009 that its Enterprise Linux 6 operating system, which was released last summer, would not be supported on Itanium processors.

Microsoft followed suit in April 2010, saying that Windows Server 2008 R2 and SQL Server 2008 R2 would be the final releases supported on Itanium.


2011-03-24 - Apple Mac OS X: ten years old today
OS X was the product of Apple's 1996 purchase of NeXT, a move that not only saw the acquisition of a modern operating system, but also the return of its co-founder, Steve Jobs, to the company.

Monday, April 5, 2010

Itanium: The Death of Microsoft Windows Support



Announcement:


History:

See former blog entry when Red Hat Linux discontinued their Itanium support.

Network Management Implications:

None. There were no serious Network Management products using Microsoft Windows on Itanium. There are really only HP operating systems left on this CPU platform, a single isolated software vendor on a single isolated chip supplier.

Why Few Implications:

Single vendor processors (IBM POWER and Intel Itanium) are somewhat more risky, when there is a gap in the development cycle due to human error. Specialized software vendors looking for longevity often look for multiple suppliers when producing a product, to ensure that a single vendor glitch does not damage their product marketing.

In the area of server processors, there really only seem to be two multi-vendor CPU architectures left: SPARC (Oracle/Sun and Fujitsu) and x64 (Intel and AMD.)

Who Will Be Affected?

Probably, the people who will be most affected by this move will be businesses who depended on Microsoft SQL Server on Itanium.

Had those businesses chosen another database vendor, one who supports multiple architectures (i.e. Oracle RDBMS) - a migration to another Operating System (i.e. an HP Operating System) on the same hardware could have been done, to extend the life of the asset, and any desired hardware architecture could have been chosen to migrate to later (i.e. SPARC, POWER, Intel x64, AMD x64.)

Friday, December 18, 2009

Itanium: The Death of Red Hat Linux Support

Announcement

As reported on The Register, Red Hat quietly announced RHEL 5 as the "end of the line" for Intel Itanium.

The History
The processor market was basically split between two commodity CISC (Complex Instruction Set Computing) chip makers, Intel (x86) and Motorola (68K), where high-end workstation & server vendors consolidated on Motorola (68K) with PC makers leveraging Intel (x86).


Motorola indicated an end to their 68K line was coming, and x86 appeared to be running out of steam. A new concept called RISC (Reduced Instruction Set Computing) was appearing on the scene. Wholesale migration from Motorola was on, with many vendors creating their own very high performance chips based upon this architecture. Various RISC chips were born, created by vendors, adopted by manufacturers, each with their own operating system based upon various open standards.
  • SUN/Fujitsu/Ross/(various others) SPARC
  • IBM POWER
  • HP PA-RISC
  • DEC Alpha
  • MIPS MIPS (adopted by SGI, Tandem, and various others)
  • Motorola 88K (adopted by Data General, Northern Telecom, and various others)
  • Motorola/IBM PowerPC (adopted by Apple, IBM, Motorola, and various others)
There were relatively small volume shipments of full-fledged processors for most vendors, although computing prices allowed for continued investment in ever smaller chip geometries to enhance performance. Many of these architectures were cooperative efforts, with cross licensing, to increase volume, and create a viable vendor base. The move to 64 bit occurred among most of these high-end vendors. As the costs for investment continued to rise, in order to shrink the silicon chip dies, a massive consolidation started to occur, in order to save costs and continue to be profitable.

The desktop market continued to tick away with 32 bit computing at a lower cost, with 2 primary vendors: Intel and AMD.


A massive move began to consolidate the 64 bit RISC processors of the minority market shareholders, with their smaller shares, onto a common, larger, Intel based 64 bit Itanium VLIW (Very Long Instruction Word) processor. This was a very risky move, since VLIW was a new architecture, and performance was unproven. The consideration by the vendors was that Intel had deep enough pockets to fund a new processor. Some of the vendors who consolidated their architectures into Itanium included:

  • HP - PA-RISC
  • DEC, purchased by Compaq, purchased by HP - Alpha
  • DEC, purchased by Compaq, purchased by HP - VAX
  • Tandem, purchased by Compaq, purchased by HP - MIPS
  • SGI - MIPS
Many of the RISC processors did not go away, they just moved to embedded environments, where many of the more complex features of the chips could continue to be dropped, so development would be less costly.
 
[Sun Microsystems UltraSPARC 2]

[Fujitsu SPARC64 VII]
[IBM Power]
The majority of RISC architecture market share in the desktop & server arena seemed to consolidate during the first decade of the 2000s around two RISC architectures: SPARC, driven by specifications from an open consortium (predominantly SUN and Fujitsu), and POWER, a proprietary architecture driven by a single vendor (predominantly IBM).
 

[AMD Athlon FX 64 Bit]
AMD later released 64 bit extensions to the aging Intel x86 instructions (which all vendors, including Intel, had basically written off as a dead-end architecture) - creating what the market referred to as "x64". Intel was later forced into releasing a similar processor, competing internally with their Itanium. Much market focus started consolidating servers onto these proprietary x64 based systems, sapping vitality and market share from RISC and VLIW vendors.

Network Management Implications

HP really drove the market to Itanium, after acquiring many companies. There was a large number of operating systems, which needed to be supported internally, so the move to consolidate those operating systems and reduce costs became important.

HP OpenView is one of those key suites of Network Management tools, which people don't get fired for purchasing. HP made announcements of support for their proprietary operating system HP-UX, Microsoft's proprietary Windows, and open source Linux on Intel Itanium. HP was never able to get traction for OpenView under Linux on Itanium or Windows on Itanium, although they were able to provide support for their own proprietary HP-UX platform, as well as Linux on x86 architecture.

With Open Source Red Hat Linux going away on Itanium, Itanium as a 64 bit architecture is clearly taking a severe downturn among viable 3rd party architectures, and Network Management from OpenView will obviously never become a player in a market that will no longer exist.
The IBM POWER architecture, even though it is one of the last two substantial RISC vendors left, has never really been a substantial vendor in the Network Management arena, even with IBM selling the Tivoli Network Management suite. Network Management will most likely never be a substantial power under POWER.

"Mom & Pop" shops run various Network Management systems under Windows, but the number of managed nodes is typically vastly inferior to the larger Enterprise and Managed Services markets. The software just does not scale as well.

Sun SPARC Solaris (with massive vertical and horizontal scalability) and Red Hat Linux x86 (typically limited to horizontal scalability) are really the only two substantial multi-vendor Network Management platform players left for large Managed Services installations. Red Hat abandoning HP's Itanium Linux only continues to solidify this position.