Friday, July 13, 2018

Solaris 11.4: KSplice Arriving

[Solaris Logo, Courtesy Sun Microsystems]


Abstract:

The concept of patching an operating system has existed since the beginning of computers. At one time, source code needed to be compiled and the system rebooted with the new OS. Pre-compiled patches were later shipped by OS vendors. Patches were often applied while the OS was in Single User Mode. Sun created the concept of Live Upgrade, where patching could be done to an alternate boot environment and booted at a later, convenient time, reducing downtime to a simple reboot. In 2007, another feature, referred to as Deferred Activation Patching, allowed the currently running Sun Solaris 10 OS to be patched while deferring some of the patches until the next boot, leveraging loopback file systems. In 2008, the KSplice concept was born at MIT for Linux kernels, to perform some degree of patching while greatly reducing the need for reboots. Oracle announced its purchase of Sun, and with it Solaris, in 2009 (the deal closed in early 2010). The MIT students won a $100K award in 2009 to help start up a company. KSplice was purchased by Oracle in 2011. The promise of KSplice and rebootless patching for Solaris was understood in concept, but not executed, until recently.

[Oracle Logo, courtesy, Oracle Corporation]

The Road to Solaris:

The KSplice team had talked about Solaris support before the purchase by Oracle, but the technology was first integrated into Linux Platinum Support. NetMgt discussed the benefits & differences in KSplice for Linux vs Solaris in 2012. NetMgt also reported that Solaris and KSplice engineers were meeting in 2012. Goals included:
  1. Solaris team bringing KSplice technology into the OS
  2. Reboot-less small fixes via KSplice in Solaris
  3. Allowing customers to keep patches "up to date" with year-long uptime
  4. Leveraging synergies with existing philosophies:
    (DTrace allows data path switching without latency or interruption)
The engineering task of merging DTrace & KSplice technologies between Linux & Solaris was seemingly under way. In 2015, NetMgt reported that KSplice for Platinum Support Linux, together with DTrace, had become a major differentiator for Oracle, but discussion of KSplice for Solaris remained dark... until 2018.

[Solaris 11.4 Beta image, courtesy Oracle Corporation]

Solaris 11.4 (aka Solaris 12)

Since KSplice was a significant feature, those of us in the industry expected it to arrive in Solaris 12. Even to casual observers, it became clear that the "big bang" approach to OS releases was becoming obsolete. Apple started delivering Mac OS X continually, with no major interruptions. Microsoft started delivering Windows 10 continually, with no major future interruptions planned. In 2017, Oracle announced that Solaris would be entering a similar Continuous Delivery model. Some reporters spread #FakeNews about Solaris being dead, but those who understood industry trends were watching for features from Solaris 12 to be integrated. The first & second Open Beta releases of Solaris 11.4 arrived... and it was clear that Solaris 12 features were being bundled into the Solaris 11 stream, but KSplice was conspicuously missing.

KSplice Hints

We started seeing hints of KSplice in 2017. The test cases for spliceadm were now being bundled. Many thanks to Tim Foster for providing a hint into the future!

KSplice Infrastructure

It seems the infrastructure for KSplice was bundled in Solaris 11.4, but live splices have not yet been made publicly available. The following is a sample of what we can observe from the manual.

SpliceAdm

The use of the "adm" suffix in commands for administration has become quite well known in Solaris. A new such command magically appeared in 11.4 - "spliceadm". The manual page says "Splices are fixes for specific bugs that can be applied on a live system" - KSplice arrived! The Interface Stability is declared as "Committed" - it is here to stay!

Freeze and Unfreeze

A new concept was introduced to KSplice in Solaris 11.4 - freezing. This seems to be a way to allow automatic updates, limit automatic updates to a particular splice version, or even facilitate rollbacks to a numerically lower splice. This allows splices to be downloaded to a system while restricting splice activation/rollback until a time of low utilization.
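
To make the freeze idea concrete, here is a minimal toy model in Python. It is only a sketch of the policy as described above, not the spliceadm implementation or its command syntax: the class name `SpliceState`, its fields, and its methods are all hypothetical, and splice versions are assumed to be simple increasing integers.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SpliceState:
    """Hypothetical model of a host's splice level (not the real spliceadm data model)."""
    available: List[int] = field(default_factory=list)  # splice levels downloaded to the host
    active: int = 0                                      # level currently applied (0 = none)
    frozen_at: Optional[int] = None                      # ceiling set by freeze, None = unfrozen

    def download(self, level: int) -> None:
        # Downloading only stages a splice; it never changes the active level.
        if level not in self.available:
            self.available.append(level)
            self.available.sort()

    def freeze(self, level: int) -> None:
        # Cap automatic updates at the given level.
        self.frozen_at = level

    def unfreeze(self) -> None:
        # Remove the cap so the newest staged splice becomes eligible again.
        self.frozen_at = None

    def target_level(self) -> int:
        # Highest staged level that the freeze ceiling permits (0 if none qualifies).
        allowed = [lvl for lvl in self.available
                   if self.frozen_at is None or lvl <= self.frozen_at]
        return max(allowed, default=0)

    def auto_update(self) -> int:
        # Move to the target level; this is a rollback when the target is lower.
        self.active = self.target_level()
        return self.active


if __name__ == "__main__":
    host = SpliceState()
    for level in (1, 2, 3):
        host.download(level)
    host.freeze(2)
    print("frozen at 2, active:", host.auto_update())
    host.freeze(1)
    print("frozen at 1, active:", host.auto_update())
    host.unfreeze()
    print("unfrozen, active:", host.auto_update())
```

In this model, staging a splice and activating it are separate steps, which is the property that lets an administrator hold activation until a quiet period.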

[Former KSplice Team, courtesy MIT News]

The Back Story

A former Sun/Oracle engineer, Enrico, published the back-story on KSplice for Oracle Solaris. It appears he had worked with Tim [Foster, perhaps?] on the project. He spoke of resolving scaling problems not experienced under Linux, due to the huge stack sizes supported by SPARC Solaris systems while running latency-sensitive Oracle Cluster. He also spoke of KSplice & DTrace compatibility - which we discussed earlier in this article with regard to Linux.

Special thanks to those Enrico identified: Jan Setje-Eilers, Scott Michael, Kuriakose Kuruvilla, Pete Dennis, Rod Evans, Ali Bahrami, Mark J Nelson, Xinliang Li, Raja Tummalapalli, Albert White, Adam Paul, and Gabriel Carrillo.

Conclusions:

As an Application Architect in Remote Network & Systems Management, Configuration Management, and Incident Management - I had personally developed & used techniques to deploy code live in arenas where continual uptime was required. It is good to see similar techniques reach into the OS arena.

Sun had previously set a very high bar with Solaris, offering the ability to patch a live system, with patch application either completed on a reboot or applied in large part immediately (with the remainder deferred until an eventual reboot). Some of this flexibility was rolled back with Solaris 11, so it is good to see KSplice finally start arriving on the scene with Oracle Solaris 11.4.

While Oracle Solaris is not rolling out splices yet, we look forward to 100% uptime with KSplice, in conjunction with using LDoms on SPARC hardware.