Monday, February 22, 2010

Network Management is About Uptime

Thanks to Rob for sending me toward xkcd for this one!

Saturday, February 13, 2010

Solaris 10: Using CGI on Apache2 for beginners

Abstract: There are many helpful books and web pages to assist beginning CGI programmers. Unfortunately, most assume that you are uploading your files to a professionally administered server. This post is aimed at those who have full access to a stock Solaris 10/Apache2 web server.

Step 1: Enable Apache2 according to this post.

2: Log in as superuser.

3: Change the cgi-bin folder permissions.

# chmod 775 /var/

4: Save a valid CGI file to the ../cgi-bin directory. My test file is "first.cgi", which is borrowed from the excellent tutorial "CGI Programming 101" by Jacqueline Hamilton.

#!/usr/bin/perl -wT
print "Content-type: text/html\n\n";

print "Hello, world!\n";

5: Set the file permissions.

# chmod 755 ../cgi-bin/first.cgi
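Steps 4 and 5 can also be done in one go from a script. This is a minimal sketch assuming a POSIX shell; CGI_DIR is a hypothetical variable that you should point at your Apache2 cgi-bin directory, and its ./cgi-bin default exists only so the script can be tried harmlessly away from the web server.

```shell
#!/bin/sh
# Sketch of steps 4 and 5: write first.cgi and set its mode to 755.
# ASSUMPTION: CGI_DIR is a hypothetical variable; set it to your Apache2
# cgi-bin directory. The ./cgi-bin default only allows a harmless dry run.
CGI_DIR=${CGI_DIR:-./cgi-bin}

mkdir -p "$CGI_DIR" || exit 1

# Quoting 'EOF' keeps the Perl source verbatim (the shell does not expand \n).
cat > "$CGI_DIR/first.cgi" <<'EOF'
#!/usr/bin/perl -wT
print "Content-type: text/html\n\n";

print "Hello, world!\n";
EOF

# Owner: read/write/execute; group and others: read/execute.
chmod 755 "$CGI_DIR/first.cgi" || exit 1

# The mode column should read -rwxr-xr-x.
ls -l "$CGI_DIR/first.cgi"
```

Run it as superuser with CGI_DIR set to the real cgi-bin path; the final ls -l line lets you confirm the permissions from step 5 at a glance.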

6: Access your website to test the file.

You should see "Hello, world!" on a plain background (or whatever output your test file specifies).
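If the browser shows an error instead, it can help to run the script from the command line first: Apache answers with an Internal Server Error unless the script's output begins with a header such as Content-type followed by a blank line. A minimal sketch, assuming a POSIX shell; check_cgi is a hypothetical helper, not part of Apache or Solaris. Because first.cgi has -T on its #! line, the helper executes the file directly; running it as plain "perl first.cgi" makes Perl complain that -T must also be given on the command line.

```shell
#!/bin/sh
# check_cgi (hypothetical helper): run a CGI script outside Apache and
# confirm it prints a Content-type header followed by a blank line.
# Returns 0 on success, 1 otherwise.
check_cgi() {
  script=$1
  if [ ! -x "$script" ]; then
    echo "not executable: $script" >&2
    return 1
  fi
  # Apache passes request details in environment variables; set a minimal one.
  if ! out=$(REQUEST_METHOD=GET "$script"); then
    echo "script exited with an error: $script" >&2
    return 1
  fi
  # CGI output may use CRLF or LF line ends; strip CR before comparing.
  first=$(printf '%s\n' "$out" | tr -d '\r' | sed -n 1p)
  second=$(printf '%s\n' "$out" | tr -d '\r' | sed -n 2p)
  case $first in
    [Cc]ontent-[Tt]ype:*) ;;
    *) echo "missing Content-type header: $script" >&2; return 1 ;;
  esac
  if [ -n "$second" ]; then
    echo "header not followed by a blank line: $script" >&2
    return 1
  fi
  echo "OK: $script"
}
```

For example, check_cgi ../cgi-bin/first.cgi (using your actual cgi-bin path) should print OK followed by the path before you retry the browser.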


Solaris 10 comes with both Apache and Apache2 installed (but inactive). Be careful not to confuse them (e.g., by saving files in the wrong directories).

Apache2 does not require the .cgi extension on CGI files, but it does require the full file name in the URL when you run the script from a browser.

Friday, February 12, 2010

Two Billion Transistors: Niagara T3

Sun Microsystems has been developing octal-core processors for almost half a decade. During the past few years, a new central processing unit called "Rainbow Falls" or "UltraSPARC KT" has been in development. With the release of the Power7, IBM's first octal-core CPU, there has been renewed interest in the OpenSPARC processor line, in particular the T3.


OpenSPARC was an open source project whose initial contributor was Sun Microsystems. It was based on the open SPARC architecture, to which many companies and manufacturers contributed, leveraging the open specification over the years. Afara Websystems was one of those SPARC vendors, and it originated the idea of combining many SPARC cores on a single piece of silicon. Afara was later purchased by Sun Microsystems, which had the deep pockets to invest the engineering required to bring the design to fruition (as the OpenSPARC or UltraSPARC T1) and advance it (with the design of the T2, T2+, and now the T3). Sun was later purchased by Oracle, which has even deeper pockets.


As is typical with the highly integrated OpenSPARC processors, PCIe controllers are included on-chip, providing very fast access to I/O subsystems.

The T3 looks like a combined T2 and T2+ with enhancements. The T2 had embedded 10 Gigabit Ethernet, while the T2+ had the glue logic for 4-chip cache coherency. The T3 has it all, along with an upgraded DDR3 DRAM interface with 4 memory channels, enhanced crypto co-processors, and a doubling of cores!

The benefits to Network Management:

Small and immature Network Management products are usually thread-bound, but those days of poorly programmed systems are long gone (except in the Microsoft Windows world).

Network management workloads are typically highly threaded and UNIX-based. Platforms like OpenSPARC have targeted these workloads since their early design days in the early 2000s, with other CPU vendors anxiously trying to catch up in the late 2000s.

When information must be polled from numerous subsystems on thousands of devices at intervals of a few minutes, the latency of each response adds complexity to the polling software, and highly threaded CPUs with a well-written OS reward the programmer for that work.

It was not long ago that Solaris was updated to manage processes in the millions, where each process could have dozens, hundreds, or thousands of threads.

In the Network Management arena, we welcome these high-throughput workhorses!

Monday, February 8, 2010

IBM Power 7 and eDRAM Cache

Welcome IBM to the world of 64 Bit Octal-Core Computing!

On February 8th, 2010, Timothy Prickett Morgan wrote about the IBM Power 7 chip launch in The Register: "Sparc T 64-threaded T2 and T2+... quad-core, eight-threaded Tukwilas... the Power7 chip has 32 threads".

It is nice to see that the trail the first-generation OpenSPARC T1 blazed with 32 threads is being followed by IBM Power and Intel Itanium, both applying different technology to compete with Sun's second- and third-generation 64-thread OpenSPARC processors.

Possible Architecture Trade-offs to eDRAM in Cache

Timothy Prickett Morgan also wrote, "The effect of this eDRAM on the Power7 design, and its performance, is two-fold. First, by adding the L3 cache onto the chip..."

The use of embedded DRAM to reduce transistor count, squeeze in more cores, and reduce latency was a great idea, even with the refresh logic added to the chip!

Every benefit comes with drawbacks. Yet the media has been silent on the possible trade-offs, which puzzles me.

Static RAM has traditionally benefited chip manufacturers because it gives fast, regular access to the memory cells without waiting for a slow refresh signal to propagate across the RAM. It is interesting that no one (and I mean NO ONE) is talking about the performance impact of CPU cores having to wait for refresh on the eDRAM.

I wonder what the ratio of performance hit to reduction in latency was in moving to eDRAM?

Multi-ported static RAM allows fast, simultaneous access from multiple cores into cache. With multi-process heavy workloads, where data in the cache is unlikely to be accessed simultaneously from different cores or hardware strands, eDRAM may be a good fit. With heavily multi-threaded software workloads, where the data in the cache will be accessed simultaneously by multiple cores and hardware strands, eDRAM may suffer in comparison to multi-ported SRAM due to excessive reloads from main memory and inefficient sharing.

I wonder how the throughput benefit of moving to eDRAM compares with its performance cost under real-world workloads where multi-threaded applications need to share instructions and data in the cache.

I wonder whether eDRAM performance will scale as linearly as SRAM as the processors become heavily loaded. (This reminds me of the Intel 50 MHz 80486 vs. Intel 66 MHz (33 MHz bus) 80486 trade-off from years past...)

Connection to Network Management

Network Management traditionally deals with extremely highly threaded workloads. Managing tens of thousands of devices with hundreds of thousands of managed resources often requires thousands of threads in a single process, and the very regular (1-5 minute) polling intervals demand tremendous throughput.
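A back-of-the-envelope way to see why so many threads are needed is Little's Law, which relates the polling rate to the number of polls in flight at once. The figures below are illustrative assumptions, not measurements: R managed resources, a polling interval T, and an average time W that each poll spends waiting for its answer.

$$\lambda = \frac{R}{T}, \qquad N = \lambda W = \frac{R\,W}{T}$$

$$R = 200{,}000,\; T = 300\ \text{s},\; W = 0.5\ \text{s} \;\Rightarrow\; \lambda \approx 667\ \text{polls/s},\; N \approx 333$$

$$R = 200{,}000,\; T = 60\ \text{s},\; W = 0.5\ \text{s} \;\Rightarrow\; \lambda \approx 3{,}333\ \text{polls/s},\; N \approx 1{,}667$$

If slow or unreachable devices push the average wait toward a multi-second timeout, N grows in proportion, which is how a single poller ends up needing thousands of concurrent threads.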

The performance of Power 7 in these highly threaded, device-facing workloads is yet to be measured: for the network management space, it may be one of the most fabulous chips on the market, or it may be mediocre. Power is not a substantial player in the Network Management world, so I would not really expect engineers to have tuned the CPU for this type of workload.

I would expect that engineers tuned Power for the database market. Network Management does require long-term data storage, so Power may be a very good back-end platform.


IBM's move to eDRAM is very interesting, almost as interesting as OpenSPARC's move to highly threaded octal cores many years ago.

Will other vendors follow IBM in the move to eDRAM cache, the same way IBM, Intel, and AMD are moving to 64-bit octal-core designs as OpenSPARC did years ago?

U P D A T E ! ! !

Another article has come out to discuss the use of eDRAM by IBM.

First in the chain is the 32KB L1 data cache, which has seen its latency cut in half, from four cycles in the POWER6 to two cycles in POWER7. Then there's the 256KB L2, the latency of which has dropped from 26 cycles in POWER6 to eight cycles in POWER7—that's quite a reduction, and will help greatly to mitigate the impact of the shared L3's increased latency.

The POWER7's L3 is its most unique feature, and, at 32MB, it's positively gigantic. IBM was able to cram such a large L3 onto the chip by making it out of embedded DRAM (eDRAM) instead of the usual SRAM. This decision cost the cache a few cycles of latency

Thursday, February 4, 2010

Learn firmware programming now!

ok There's an operating system on your SUN and/or Macintosh systems that you've probably never heard of, even though it runs before the regular operating system is loaded. That mystery operating system is FORTH.

FORTH is the programming language used to write the FORTH operating system, and it compiles itself. FORTH is an odd beast in the programming world: its characteristics, which include compactness and simplicity, make it great for firmware programming and embedded systems.

If you're tired of not quite understanding OpenBoot or the firmware process, there are many great, free resources to help you (thanks to the friendly FORTH community):

Starting FORTH
A great introduction to FORTH, stack programming concepts, and post-fix notation.

Writing FCode Programs
SUN's guide for firmware programming.

Thinking FORTH
Discusses broader topics such as the software development cycle and experienced programmers/developers' views of FORTH.
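To get a feel for stack programming and post-fix notation before opening the books: at the ok prompt, typing 2 3 + 4 * . pushes 2 and 3, adds them, pushes 4, multiplies, and prints 20. The sketch below mimics that evaluation order with an explicit stack. It is not FORTH; rpn is a hypothetical shell/awk helper that only handles integers with + - * /, and its truncating division is an assumption rather than FORTH's exact semantics.

```shell
#!/bin/sh
# rpn (hypothetical helper): evaluate a post-fix expression with an explicit
# stack, in the same order FORTH's interpreter handles "2 3 + 4 *".
# Numbers are pushed; + - * / pop two values and push the result.
# Integers only; / truncates toward zero (an assumption, not FORTH's rule).
rpn() {
  printf '%s\n' "$*" | awk '{
    sp = 0
    for (i = 1; i <= NF; i++) {
      t = $i
      if (t ~ /^-?[0-9]+$/) { st[++sp] = t + 0; continue }
      if (t != "+" && t != "-" && t != "*" && t != "/") {
        print "unknown word: " t > "/dev/stderr"; exit 1
      }
      if (sp < 2) { print "stack underflow at " t > "/dev/stderr"; exit 1 }
      b = st[sp--]; a = st[sp--]      # pop top (b), then second (a)
      if (t == "+") r = a + b
      else if (t == "-") r = a - b
      else if (t == "*") r = a * b
      else {
        if (b == 0) { print "division by zero" > "/dev/stderr"; exit 1 }
        r = int(a / b)
      }
      st[++sp] = r                    # push the result
    }
    if (sp < 1) exit 1
    print st[sp]                      # like FORTH "." : show top of stack
  }'
}
```

Quote the expression so the shell does not expand the asterisk: rpn '2 3 + 4 *' prints 20, just as the FORTH line above does.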


More information and resources:

Jonathan Schwartz announces resignation as CEO of SUN with a tweeted haiku!

This message was posted to the Twitter account of Jonathan Schwartz, the soon-to-be former CEO of SUN, early Thursday morning:

"Today's my last day at Sun. I'll miss it. Seems only fitting to end on a #haiku. Financial crisis/Stalled too many customers/CEO no more"