Supercomputers: Who’s the boss?
IBM Research middleware for managing extreme-scale computers recognized by US Department of Energy
Why aren’t more supercomputers being used in industry? How can we harness the tremendous power of extreme-scale computers to provide smarter solutions in areas like financial transactions, the Internet, or manufacturing? According to Eliezer Dekel, manager of distributed middleware at IBM Research – Haifa, supercomputers still need management programs, like the ones being developed as part of the award-winning Colony II project. Colony was recently awarded supercomputer time by the US Department of Energy to help meet this challenge head-on.
Today, supercomputers are still managed in a relatively unsophisticated way. You can turn them off, turn them on, and load a machine, but each action must be performed separately, either manually or with very basic scripts.
“Our team is developing a coordinated framework that enables the management and monitoring of hundreds of thousands of machines running in parallel,” explains Dekel. “In this way, we can also deploy one program for a cluster of machines, deploy another program for another subset, and then convey the status of these machines to the user, even reporting what stage of the processes they’re running.” In short, the new management middleware will provide high performance computers with improved scalability, optimization, fault tolerance, and support for various management policies.
Super management for supercomputers
“Because the price per processor of BlueGene is much lower than any other large computing system, it makes sense to harness that power for business purposes like financial systems, the Internet, and more,” continues Dekel.
Management technology for extreme-scale computers could bring supercomputing capabilities to the business market, opening up new opportunities both for business environments and for the computer systems themselves.
Colony II is one of 69 projects selected by the DOE’s INCITE program for their potential to advance scientific discovery and awarded time at the DOE’s Leadership Computing Facilities at Argonne National Lab in Illinois and Oak Ridge National Lab in Tennessee. The 30-month project began a few months ago and has just delivered its initial design. The DOE award gives the researchers supercomputing time at the National Labs so they can test their design.
Projects receiving INCITE awards utilize complex simulations to accelerate discoveries in ground-breaking technologies such as lithium air batteries and nano solar cells. The awards also include projects designed to close the nuclear fuel cycle, develop advanced propulsion systems, improve DNA sequencing and explore phenomena on the tiny scale of nanostructured superconductors.
What lies ahead
The Colony II Haifa team, led by Yoav Tock, is now weighing the pros and cons of keeping the technology they are developing proprietary versus making it open source. The team currently uses a special version of Linux developed in Hursley and is exploring whether to add its new algorithms to Linux or keep them inside the IBM middleware.
“By developing new management technologies for supercomputers, we’re looking forward to bringing the future of supercomputers a little bit closer,” noted Dekel. “As they become smarter and more self-sufficient, they also become more accessible and practical to many markets and industries.”
Where do you see the future of supercomputing?