Multi-Node Cluster Shared Nothing Storage
Abstract
Several months ago, a new release of Sun Cluster appeared in conjunction with OpenSolaris 2009.06, offering a new architecture for a lower-cost fail-over cluster capability using Shared-Nothing Storage. This paper discusses the benefits of a broader implementation plan to further reduce costs and increase scalability.
Shared Nothing Storage
With the advent of ZFS under Solaris and COMSTAR under OpenSolaris, there is a new no-cost architecture in the world of high availability under Sun: Shared Nothing Storage. Each node's data lives on a ZFS mirror that spans a local disk and a disk exported over iSCSI by its peer, so no shared array sits between the nodes (a minimal sketch follows the list below).
The benefits are clear in this environment:
- External storage is not required (with its complexity and costs)
- Additional storage area network infrastructure is not required (with its complexity and costs)
- The OS of the active node continually keeps all the local disks in sync (with virtually no complexity)
- Full CPU capacity must, however, be provisioned on both platforms to cover the peak load of the active applications, while the fail-over node sits idle
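As a minimal sketch of how the pieces fit together (the device names, GUID, address, and pool name below are illustrative, not from the original article): the peer node exports a raw local disk over iSCSI with COMSTAR, and the active node builds a ZFS mirror spanning its own local disk and that iSCSI LUN, so every write lands on both machines:

    # On the peer node: enable COMSTAR and export a raw local disk as an iSCSI LUN
    svcadm enable stmf
    svcadm enable -r svc:/network/iscsi/target:default
    sbdadm create-lu /dev/rdsk/c1t1d0s2     # back a logical unit with the raw disk
    stmfadm add-view 600144f0...            # expose the LU, using the GUID printed by sbdadm
    itadm create-target                     # create the iSCSI target

    # On the active node: discover the LUN and mirror it with a local disk
    iscsiadm add discovery-address 192.168.10.2:3260
    iscsiadm modify discovery --sendtargets enable
    devfsadm -i iscsi                       # create device nodes for the new LUN
    zpool create datapool mirror c1t1d0 c2t600144F0...d0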
Dual-Node Shared Nothing Storage
Some people may not be too impressed: there is still a node which is completely unused. This additional node may be considered a pure cost in an H-A or D-R environment. This is not necessarily true if other strategies are taken into consideration.
For example, in a dual-active configuration, the internal storage of each node can be exported to the other through dual iSCSI initiators, so that the CPU capacity of both nodes is fully leveraged during peak times (a sketch of the symmetric layout follows the list below).
The benefits are clear in this environment:
- External storage is not required (with its complexity and costs)
- Additional storage area network infrastructure is not required (with its complexity and costs)
- The OS of the active node continually keeps all the local disks in sync (with virtually no complexity)
- 200% CPU capacity on two platforms can be leveraged during peak usage times
- Fail-over of a single node results in reduction to 100% of CPU capacity
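A rough sketch of the symmetric layout (pool, host, and device names are again illustrative): each node exports a spare local disk to its peer as above and hosts its own mirrored pool, so both sets of CPUs do useful work until a failure collapses the load onto the surviving box:

    # node-a hosts poolA, mirrored across its local disk and a LUN from node-b
    node-a# zpool create poolA mirror c1t1d0 c2t<guid-from-node-b>d0

    # node-b hosts poolB, mirrored across its local disk and a LUN from node-a
    node-b# zpool create poolB mirror c1t1d0 c2t<guid-from-node-a>d0

On fail-over, the survivor simply imports the other pool alongside its own and runs both workloads with reduced headroom.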
Multi-Node Shared Nothing Storage
The dual-active shared-nothing architecture seems very beneficial, but what can be done in the very typical three-tier environments?
Considering how simple it is to move ZFS pools as well as zones between nodes, multi-node clustering can be done with a couple of simple scripts, such as the sketch below.
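Here is a minimal fail-over sketch, assuming a zone named appzone whose zonepath lives on a pool named datapool, and assuming the zone is already configured with zonecfg on every node that may adopt it (both names are illustrative):

    #!/bin/sh
    # failover.sh - adopt a pool and its zone on a surviving node
    POOL=datapool
    ZONE=appzone

    zpool import -f $POOL        # force-import; the previous owner is down
    zoneadm -z $ZONE attach -F   # force-attach the zone from the imported pool
    zoneadm -z $ZONE boot        # bring the application back online

Graceful migration off a healthy node is simply the reverse: halt and detach the zone, then zpool export the pool before importing it on the new owner.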
For example, a triple-active configuration, where the internal storage of each node is exported to the other two nodes through triple initiators, completely leverages the CPU capacity of all three nodes during peak times.
The benefits are clear in this environment:
- External storage is not required (with its complexity and costs)
- Additional storage area network infrastructure is not required (with its complexity and costs)
- The OS of the active node continually keeps all the local disks in sync (with virtually no complexity)
- 300% CPU capacity across all platforms can be leveraged during peak processing times
- Failover of a single node means only a decrease to 200% CPU processing capacity
Application in Network Management
What does this have to do with Network Management?
Very often, multiple platforms are used as polling platforms, with a high-availability requirement on an embedded database. There is usually a separate cost for H-A kits for the applications as well as the databases.
Placing each of the tiers within a Solaris Container is the first step to business optimization, higher availability, and cost reduction.
As a reminder, the Oracle RDBMS can legally be run within a CPU-capped Solaris 10 Container in order to reduce CPU licensing costs, leaving plenty of CPU available for failing over applications from other tiers. As additional capacity is needed by the business, an additional license can be purchased and the cap extended to other cores on the existing platform.
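For instance (the zone name, zonepath, and core count here are illustrative), the capped container for the database tier can be defined in a few lines of zonecfg, and raising the cap later is a one-line change to ncpus once the additional license is in place:

    # Define a database container capped at 2 CPUs
    zonecfg -z dbzone
    zonecfg:dbzone> create
    zonecfg:dbzone> set zonepath=/datapool/zones/dbzone
    zonecfg:dbzone> add capped-cpu
    zonecfg:dbzone:capped-cpu> set ncpus=2
    zonecfg:dbzone:capped-cpu> end
    zonecfg:dbzone> commit
    zonecfg:dbzone> exit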
Pushing the H-A requirements down to the OS level eliminates application and license complexities and enables drag-and-drop load balancing or disaster recovery under Solaris 10 or OpenSolaris using Solaris Containers. Running an RDBMS within a capped Solaris 10 Container gives the business the flexibility to buy and stage hardware without having to pay for unused CPU cycles until they are actually needed.
- - - - - - - - - - - - - - - - - - -
Update - 2009-01-07: Another blog posting about this feature:
Solaris tip of the week: iscsi failover with COMSTAR
Update - 2019-10-21: Previous "Solaris tip of the week" no longer exists, transferred post:
https://jaydanielsen.wordpress.com/2009/12/10/solaris-tip-of-the-week-iscsi-failover-with-comstar/
I've been researching HA iscsi configurations recently, and I'd like to capture and share what I've learned about the COMSTAR stack. I have a simple demo that you can use for your own experiments...