Monday, May 10, 2010

Oracle VM Server for SPARC (LDoms) Dynamic Resource Management



Orgad Kimchi at Sun, now Oracle, posted an overview of Oracle VM Server for SPARC, previously called Sun Logical Domains or LDoms, on his VReality blog. In particular, he discussed version 1.3 and its Dynamic Resource Management (DRM) feature, which allocates CPU threads to domains according to predefined policies.

Orgad posted a PDF that was formatted reasonably well, but the fonts made certain sections difficult to read. I copied the PDF contents into this blog, reformatted them (while trying to stay as close to the original style as possible), and corrected some typographical errors. A blog is not the optimal format for this content; I also left feedback on his original post suggesting some reformatting.

Oracle VM Server for SPARC (LDoms) Dynamic Resource Management

ABSTRACT:

In this entry, I will demonstrate how to use Dynamic Resource Management (DRM), a new feature of Oracle VM Server for SPARC (previously called Sun Logical Domains or LDoms) version 1.3, to allocate CPU resources based on workload and predefined policies.

Introduction to Oracle VM Server for SPARC:

Oracle VM Server for SPARC is a virtualization and partitioning solution supported on Oracle Solaris CoolThreads technology-based servers powered by UltraSPARC T1, T2, and T2 Plus processors with Chip Multi-threading Technology (CMT).

This technology allows the creation of multiple virtual systems on a single physical system. Each virtual system is called a logical domain (LDom) and runs a unique and distinct copy of the Solaris operating system.

Introduction to Dynamic Resource Management:

With this feature, we can define policies that set upper and lower thresholds for virtual CPU utilization on an LDom. If an LDom needs more capacity and other LDoms on the same physical server have spare capacity, the system can automatically add CPUs to or remove CPUs from domains, as per the defined policies.

The main goal of dynamic resource management (DRM) is to make LDom resource allocation flexible, so that resources can be allocated to an LDom during peak time without human intervention.

Architecture layout:


Prerequisites:

We need to define the control domain and three logical domains. Refer to the Logical Domains 1.3 Administration Guide (http://docs.sun.com/app/docs/doc/821-0406) for a complete procedure on how to install Oracle VM Server for SPARC.

Dynamic Resource Management configuration:

We will define a total of three policies (policy1, policy2, policy3), one for each domain (ldg1, ldg2, ldg3). Each policy defines under what conditions virtual CPUs can be automatically added to and removed from a logical domain.

A policy is managed by using the ldm add-policy, ldm set-policy, and ldm remove-policy commands.

The following ldm add-policy command creates the policy to be used on the ldg1 logical domain.
# ldm add-policy util-lower=25 util-upper=75 vcpu-min=4 vcpu-max=8 attack=1 decay=1 priority=1 name=policy1 ldg1 
This command does the following:

■ Specifies that the lower and upper limits at which to perform policy analysis are 25 percent
and 75 percent by setting the util-lower and util-upper properties, respectively.

■ Specifies that the minimum and maximum number of virtual CPUs is 4 and 8 by setting
the vcpu-min and vcpu-max properties, respectively.

■ Specifies that the maximum number of virtual CPUs to be added during any one resource
control cycle is 1 by setting the attack property.

■ Specifies that the maximum number of virtual CPUs to be removed during any one resource
control cycle is 1 by setting the decay property.

■ Specifies that the priority of this policy is 1 by setting the priority property. A priority of 1
means that this policy will be enforced even if another policy can take effect.

■ Specifies that the name of the policy file is policy1 by setting the name property.

■ Uses the default values for those properties that are not specified, such as enable (off) and
sample-rate (10 sec).
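
To make the interplay of these properties more concrete, here is a minimal Python sketch of one resource control cycle. It is a deliberately simplified model and not the actual algorithm used by the Logical Domains Manager: the names Policy, next_vcpus, and simulate are hypothetical, and the model ignores priorities, time windows, and whether the physical server actually has free CPUs to hand out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Policy:
    """Subset of the ldm add-policy properties used for policy1 (hypothetical class)."""
    util_lower: int = 25   # util-lower, percent
    util_upper: int = 75   # util-upper, percent
    vcpu_min: int = 4      # vcpu-min
    vcpu_max: int = 8      # vcpu-max
    attack: int = 1        # most virtual CPUs added in one cycle
    decay: int = 1         # most virtual CPUs removed in one cycle


def next_vcpus(current: int, utilization: float, policy: Policy) -> int:
    """Return the virtual CPU count after one simplified control cycle."""
    if utilization > policy.util_upper:
        # Above the upper threshold: grow by at most `attack`, capped at vcpu-max.
        return min(current + policy.attack, policy.vcpu_max)
    if utilization < policy.util_lower:
        # Below the lower threshold: shrink by at most `decay`, floored at vcpu-min.
        return max(current - policy.decay, policy.vcpu_min)
    # Between the thresholds: leave the domain unchanged.
    return current


def simulate(start: int, utilizations: List[float], policy: Policy) -> List[int]:
    """Apply next_vcpus once per utilization sample and record each CPU count."""
    counts = []
    current = start
    for utilization in utilizations:
        current = next_vcpus(current, utilization, policy)
        counts.append(current)
    return counts


if __name__ == "__main__":
    policy1 = Policy()
    print(simulate(4, [100.0] * 5, policy1))  # sustained load on a 4-CPU domain
    print(simulate(8, [1.0] * 5, policy1))    # idle 8-CPU domain
```

With attack=1 and decay=1 the model changes the domain by one virtual CPU per cycle at most, which is why a domain takes several cycles to move between vcpu-min and vcpu-max.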

This is the second policy for the second LDom (ldg2):
# ldm add-policy util-lower=25 util-upper=75 vcpu-min=8 vcpu-max=16 attack=1 decay=1 priority=2 name=policy2 ldg2
This is the third policy for the third LDom (ldg3):
# ldm add-policy util-lower=25 util-upper=75 vcpu-min=8 vcpu-max=16 attack=1 decay=1 priority=3 name=policy3 ldg3
Now we need to enable the policies:
# ldm set-policy enable=yes name=policy1 ldg1
# ldm set-policy enable=yes name=policy2 ldg2
# ldm set-policy enable=yes name=policy3 ldg3
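
Since the three policies differ only in priority and CPU limits, the command lines can also be generated from a small table. The following Python sketch only builds and prints the command strings shown above; it does not run ldm. The helper names add_policy_command and enable_policy_command and the DOMAINS table are hypothetical and exist only for this illustration.

```python
def add_policy_command(domain, name, priority, vcpu_min, vcpu_max,
                       util_lower=25, util_upper=75, attack=1, decay=1):
    """Build the ldm add-policy command line for one domain (string only)."""
    props = [
        ("util-lower", util_lower),
        ("util-upper", util_upper),
        ("vcpu-min", vcpu_min),
        ("vcpu-max", vcpu_max),
        ("attack", attack),
        ("decay", decay),
        ("priority", priority),
        ("name", name),
    ]
    args = " ".join(f"{key}={value}" for key, value in props)
    return f"ldm add-policy {args} {domain}"


def enable_policy_command(domain, name):
    """Build the ldm set-policy command line that enables a policy."""
    return f"ldm set-policy enable=yes name={name} {domain}"


# (domain, policy name, priority, vcpu-min, vcpu-max) as used in this entry.
DOMAINS = [
    ("ldg1", "policy1", 1, 4, 8),
    ("ldg2", "policy2", 2, 8, 16),
    ("ldg3", "policy3", 3, 8, 16),
]

if __name__ == "__main__":
    for domain, name, priority, vcpu_min, vcpu_max in DOMAINS:
        print(add_policy_command(domain, name, priority, vcpu_min, vcpu_max))
    for domain, name, *_ in DOMAINS:
        print(enable_policy_command(domain, name))
```

Printing the commands first makes it easy to review them before pasting them into a root shell on the control domain.
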
The following example shows how the configuration looks on the control domain. You can verify
the policies have been created by using the "ldm ls -o res" subcommand.
# ldm ls -o res
NAME
primary
------------------------------------------------------------------------------
NAME
ldg1

POLICY
STATUS PRI MIN MAX LO UP BEGIN END RATE EM ATK DK NAME
on 1 4 8 25 75 00:00:00 23:59:59 10 5 1 1 policy1
WEIGHTED MEAN UTILIZATION
4.2%
------------------------------------------------------------------------------
NAME
ldg2

POLICY
STATUS PRI MIN MAX LO UP BEGIN END RATE EM ATK DK NAME
on 2 8 16 25 75 00:00:00 23:59:59 10 5 1 1 policy2
WEIGHTED MEAN UTILIZATION
0.1%
------------------------------------------------------------------------------
NAME
ldg3

POLICY
STATUS PRI MIN MAX LO UP BEGIN END RATE EM ATK DK NAME
on 3 8 16 25 75 00:00:00 23:59:59 10 5 1 1 policy3
WEIGHTED MEAN UTILIZATION
0.0%
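
If you want to check this listing from a script, the policy rows can be turned into dictionaries keyed by the column headings. The Python sketch below is based only on the listing as reproduced in this post (a NAME line followed by the domain name, and a STATUS header line followed by one whitespace-separated policy row). The function name parse_policies is hypothetical, and real ldm output may be laid out differently, so treat this as an assumption rather than a robust parser.

```python
SAMPLE = """\
NAME
primary
------------------------------------------------------------------------------
NAME
ldg1

POLICY
STATUS PRI MIN MAX LO UP BEGIN END RATE EM ATK DK NAME
on 1 4 8 25 75 00:00:00 23:59:59 10 5 1 1 policy1
WEIGHTED MEAN UTILIZATION
4.2%
"""


def parse_policies(text):
    """Map each domain name to a dict of its policy columns (values kept as strings)."""
    lines = [line.strip() for line in text.splitlines()]
    policies = {}
    domain = None
    for i, line in enumerate(lines):
        if i + 1 >= len(lines):
            break
        if line == "NAME":
            # A bare NAME line is followed by the domain name.
            domain = lines[i + 1]
        elif line.startswith("STATUS"):
            # The STATUS line holds the column headings; the next line is the row.
            header = line.split()
            values = lines[i + 1].split()
            if domain is not None and len(values) == len(header):
                policies[domain] = dict(zip(header, values))
    return policies


if __name__ == "__main__":
    for domain, policy in parse_policies(SAMPLE).items():
        print(domain, policy["NAME"], policy["STATUS"], policy["MIN"], policy["MAX"])
```

Domains without a policy row, such as primary in the sample, simply do not appear in the result.
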
The following example shows how a policy, called policy1, can be changed to raise the maximum number of CPUs available to the domain ldg1:
# ldm set-policy name=policy1 vcpu-max=16 ldg1
The following example shows how we can remove a policy, called policy1:
# ldm remove-policy name=policy1 ldg1
Now, let's check how dynamic resource management works:
In order to stress the CPU of your system, you can get the spinners loading tool from BigAdmin (see http://www.sun.com/bigadmin/software/nspin/nspin.tar.gz).

We will monitor the system before and during the workload.

Connect to the console of the first guest domain (ldg1):
# telnet localhost 5000
Check the number of CPUs and their load using the mpstat command:
# mpstat

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 4 215 7 20 0 0 0 0 11 1 0 0 99
1 0 0 3 21 6 19 0 0 0 0 9 1 0 0 99
2 0 0 3 21 6 19 0 0 0 0 11 1 0 0 99
3 0 0 3 21 6 19 0 0 0 0 9 1 0 0 99

We can see that the LDom is underutilized (idl=99) and that we have 4 CPUs (0-3).
Let's start the workload using the nspins command and monitor the effect on the system utilization and the total number of CPUs:
# nspins -n 8 &
# mpstat 10
Now give it about 40 seconds to run.

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 52 201 0 2 8 0 0 0 1 100 0 0 0
1 0 0 4 20 4 12 13 0 0 0 6 100 0 0 0
2 0 0 2 31 11 23 18 0 0 0 13 100 0 0 0
3 0 0 3 21 5 11 12 0 1 0 38 100 0 0 0
4 0 0 2 16 1 6 10 0 0 0 1 100 0 0 0
5 0 0 2 23 2 13 13 0 0 0 2 100 0 0 0
6 0 0 1 17 2 8 10 0 1 0 2 100 0 0 0
7 0 0 0 12 1 4 9 0 0 0 1 100 0 0 0

We can see that all of the machine's CPUs are fully utilized (idl=0) and that the total number of CPUs has increased to 8 (0-7). To see CPUs being removed again, we can stop the workload and monitor the LDom again.
# pkill nspins
# mpstat 10
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 4 215 7 20 0 0 0 0 11 1 0 0 99
1 0 0 3 21 6 19 0 0 0 0 9 1 0 0 99
2 0 0 3 21 6 19 0 0 0 0 11 1 0 0 99
3 0 0 3 21 6 19 0 0 0 0 9 1 0 0 99
4 1 0 3 21 4 12 10 0 0 0 4 91 0 0 9
5 1 0 3 15 2 7 9 0 0 0 7 91 0 0 9
6 0 0 2 15 2 7 9 0 0 0 2 91 0 0 9

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 4 215 7 20 0 0 0 0 11 1 0 0 99
1 0 0 3 21 6 19 0 0 0 0 9 1 0 0 99
2 0 0 3 21 6 19 0 0 0 0 11 1 0 0 99
3 0 0 3 21 6 19 0 0 0 0 9 1 0 0 99
4 1 0 3 20 4 12 10 0 0 0 4 89 0 0 10
5 1 0 5 15 2 7 9 0 0 0 7 89 0 0 11

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 4 215 7 20 0 0 0 0 11 1 0 0 99
1 0 0 3 21 6 19 0 0 0 0 9 1 0 0 99
2 0 0 3 21 6 19 0 0 0 0 11 1 0 0 99
3 0 0 3 21 6 19 0 0 0 0 9 1 0 0 99
4 1 0 3 20 4 12 10 0 0 0 5 88 0 0 12

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 4 215 7 20 0 0 0 0 11 1 0 0 99
1 0 0 3 21 6 19 0 0 0 0 9 1 0 0 99
2 0 0 3 21 6 19 0 0 0 0 11 1 0 0 99
3 0 0 3 21 6 19 0 0 0 0 9 1 0 0 99

We see from the mpstat output that the number of CPUs decreases by 1 per cycle, from 8 back down to 4.
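
Counting rows by eye gets tedious with longer runs, so here is a small Python sketch that summarizes mpstat interval output as one (CPU count, mean idle) pair per report. It assumes the layout shown above, where each report starts with a header line beginning with CPU and idl is the last column. The function name summarize_mpstat is hypothetical, and the sample is the last two reports from the run above.

```python
SAMPLE = """\
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 4 215 7 20 0 0 0 0 11 1 0 0 99
1 0 0 3 21 6 19 0 0 0 0 9 1 0 0 99
2 0 0 3 21 6 19 0 0 0 0 11 1 0 0 99
3 0 0 3 21 6 19 0 0 0 0 9 1 0 0 99
4 1 0 3 20 4 12 10 0 0 0 5 88 0 0 12

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 4 215 7 20 0 0 0 0 11 1 0 0 99
1 0 0 3 21 6 19 0 0 0 0 9 1 0 0 99
2 0 0 3 21 6 19 0 0 0 0 11 1 0 0 99
3 0 0 3 21 6 19 0 0 0 0 9 1 0 0 99
"""


def summarize_mpstat(text):
    """Return a list of (cpu_count, mean_idle_percent), one entry per mpstat report."""
    reports = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CPU":
            # A header line starts a new report.
            reports.append([])
        elif reports and fields[0].isdigit():
            # Data row: the last column is the idle percentage (idl).
            reports[-1].append(int(fields[-1]))
    return [(len(idle), sum(idle) / len(idle)) for idle in reports if idle]


if __name__ == "__main__":
    for cpu_count, mean_idle in summarize_mpstat(SAMPLE):
        print(f"{cpu_count} CPUs, mean idle {mean_idle:.1f}%")
```

Feeding the captured output of mpstat 10 through a function like this gives a quick view of how the CPU count follows the load from one report to the next.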

Conclusion:

Oracle VM Server for SPARC Dynamic Resource Management provides the system administrator the flexibility to have better dynamic resource allocation based on system utilization. In this blog entry, I demonstrated how to set up Dynamic Resource Management and how to monitor this feature during CPU utilization peak time.

About the Author:

Orgad Kimchi joined Sun in September 2007. He is currently working in the Independent Software Vendors (ISV) Engineering organization helping software vendors adopt Sun technology and improve performance on Sun hardware and software. Orgad’s blog can be found at http://blogs.sun.com/vreality.

4 comments:

  1. Do you have the new location of "Spinners Loading tool" ?

  2. I am uncertain what you are referring to.

    Are you talking about a javascript or css tool?
    http://www.greepit.com/2011/09/generate-css-spinners-and-loading-bars-for-ajax-jquery-cssload/

  3. I am not able to find nspin tool. Please provide the link.

  4. Since I do not work for Sun/Oracle, I do not have the link.

    I did some searches on the internet, could not find it, either.

    I would suggest you go to the author and inquire.
