  Research Hypervisor
Research Hypervisor Principles
Draft Version (0.20)
IBM Research Hypervisor Group
2005 International Business Machines Corporation

Hypervisors allow different operating systems, or different instances of a single operating system, to run on the same hardware at the same time. In this sense, Hypervisors resemble virtual machine systems such as VMware. However, recent developments in Hypervisors have shown that performance need not be compromised as it is with VM systems [XEN]. An OS and its applications, running on top of a Hypervisor, can run at the native speed of the machine; in this steady state, the OS need never use Hypervisor services.
Introduction

The Research Hypervisor project creates a virtual machine environment that is a Logical representation of a machine that has been Partitioned from the original base machine. We call this environment the Logical Partition (LPAR). In Xen parlance an LPAR is called a Domain. The LPAR differs from a pure virtual machine in that it uses para-virtualization techniques first described by the Denali Project [DEN]. [Note: introduce the "types" of hypervisors somewhere, and don't forget the ref.]

Para-virtualization requires the software that runs in the LPAR to be aware that it is not running on bare metal. This LPAR awareness comes from the use of software abstractions to deal with the loss of access to certain machine resources that are normally the domain of that software. These resources are abstracted by a series of interfaces to the Hypervisor (HV), known as Hypervisor Calls (or hcalls).

The Hypervisor is primarily concerned with the management of memory, processors, interrupts and some simple transports. The heart of the Research Hypervisor design is to keep the core HV restricted to these items and to keep the code as small and simple as possible. All other services are the domain of surrounding cooperative LPARs. Although these services can be used to create an LPARed environment, they are beyond the scope of the HV itself.

In the following sections we discuss our history, specific goals, design decisions, and how logical abstractions were created in the HV to facilitate an LPAR environment. Where necessary, we also discuss where the two currently supported ISAs differ.

This document is a companion to the following documents:

POWER Architecture Platform Reference (PAPR)
This document is available from IBM from: [URL]. It describes the environment that a general purpose operating system will run in, including bootstrap, runtime and shutdown. Of particular interest should be the description of the LPAR option.

The Research Hypervisor LPAR Extension
This document describes the additional HV calls supported by the Research Hypervisor in order to manage and create LPARs. It is available with this document.

The Research Hypervisor Cell Processor Extension
This document describes the additional HV interfaces necessary to access the features of the Cell Processor Architecture. It is available from the Cell Architecture Library. If you have to ask, you'll never know [cite].
History

Machine abstractions have been around since the early years of data processing. Modern operating systems introduced a simpler machine abstraction for user applications, centralizing complexity. Micro- and exo-kernels attempt to simplify resource management by distributing the complexity while still maintaining a simple application environment. New projects continue to emerge with slightly differing approaches; this project is another.

IBM System 390 introduced an architecture that enabled machine virtualization through a new processor mode that allowed access to privileged resources to be efficiently programmed using a micro- and milli-code instruction set. In the last few years IBM has extended the PowerPC architecture with a new processor mode that has exclusive access to specific processor resources yet retains the same programming environment and instruction set. Although not as efficient as executing processor milli-code instructions, it is uniquely tailored for creating a para-virtualized environment. The IBM PHYP Hypervisor product, which was released with the recent Squadron pSeries machines [Ref], is the first software HV to take advantage of this new processor mode. It is capable of running several heterogeneous operating systems at once, and provides each LPAR with an amazing array of RAS, high end IO, and server consolidation benefits [Ref].

Other processor manufacturers, in particular Intel and AMD, are also designing (or have designed) architectural enhancements that introduce a similar processor mode to their processors. Regardless of whether or not this new processor mode exists, this document shall refer to the processor mode that the HV executes in as Hypervisor Mode.

This project began as an open source reference implementation of the PHYP para-virtualization interfaces, as defined by the Power Architecture Platform Requirements (PAPR), in order to explore new processor and architecture innovations. It is evolving into an HV core that is capable of creating a common LPAR environment on a multitude of different architectures, with and without explicit HV support, the ia32 architecture being the most robust example of this ability.
Goals

The creation of an LPAR environment should go beyond the simple virtualization of the underlying hardware: it should present a simpler machine that software (including OSes) can take, customize, and use to easily take advantage of architectural enhancements at the processor level. With this in mind, and the PAPR as a starting point, we set out to explore:

- Support Open Source OSes like Linux, the BSDs, Darwin, etc., and run on all machines they run on.
- Security, by providing complete isolation, attestation, as well as TPM [ref] services completely in software.
- Small, auditable, and configurable source space.
- Explore architectural and processor enhancements.
- Small one-off LPARs that perform specific isolated tasks.
- Library OS creating an even simpler C Library/POSIX-like LPAR.
- Create Real Time LPAR environment.
- Full virtualization from within an LPAR.
- LPAR management.
- New logical transports and inter-LPAR services.

Some of these goals have been explored extensively and others are just beginning, but all are still being researched for improvement.
Hypervisor Principles

The Logical Model

It has been said that any problem becomes easier if you add a layer of abstraction [cite]. The creation of a set of logical resources for an LPAR represents that abstraction. A quick overview of the benefits follows.
The hcall

The mechanism by which an LPAR submits a request to the HV is generally ISA specific, but the calling semantics are similar to those used by Unix System Calls. They normally require the processor to suffer a trap and an address space change while the processor enters HV mode. System calls used by most operating systems are generally targeted to C library and/or POSIX semantics, where arbitrary buffer pointers are passed into the kernel and some cross address space copying is required to support these semantics. HVs take a more shared memory centric view and reserve the hcall for use as a control channel for small messages that can usually fit in the parameter registers reserved by the ABI or, in the pathological case of IA32, on the caller's stack.
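The control-channel idea can be illustrated with a small sketch. It is only a model written in Python, not the Research Hypervisor's dispatch code: the hcall numbers, the handler names (h_ping, h_sum), the register count, and the status values are all assumptions chosen for illustration. The point it shows is that a request consists of a call number plus a handful of integer "register" arguments, with no buffer pointers crossing the boundary.

```python
# All names, numbers, and status values here are illustrative assumptions.
MAX_ARG_REGS = 8      # assumed count of ABI parameter registers

H_SUCCESS = 0         # assumed status codes
H_FUNCTION = -2       # unknown hcall number
H_PARAMETER = -4      # wrong number or type of arguments


def h_ping(lpar_id):
    """Hypothetical hcall with no arguments and no return values."""
    return H_SUCCESS, ()


def h_sum(lpar_id, a, b):
    """Hypothetical hcall: two register arguments in, one register out."""
    return H_SUCCESS, (a + b,)


# hcall number -> (handler, number of register arguments it expects)
HCALL_TABLE = {
    0x04: (h_ping, 0),
    0x08: (h_sum, 2),
}


def hcall(lpar_id, number, *regs):
    """Dispatch one request; stands in for the trap into HV mode."""
    entry = HCALL_TABLE.get(number)
    if entry is None:
        return H_FUNCTION, ()
    handler, nargs = entry
    if len(regs) != nargs or nargs > MAX_ARG_REGS:
        return H_PARAMETER, ()
    if not all(isinstance(r, int) for r in regs):
        return H_PARAMETER, ()
    return handler(lpar_id, *regs)
```

Larger payloads would travel through shared memory set up in advance, with the hcall carrying only small values such as page identifiers or lengths.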
Memory

The software running on an LPAR is assigned chunks of contiguous memory that become accessible to the LPAR when it requests that the HV map a virtual page to a logical page. A collection of these chunks creates a logical address space, which the software can use to describe a virtual to logical mapping. The HV then takes this tuple of LPAR identifier and logical page identifier, resolves the actual physical page identifier, and finally inserts the translation into the page table that the HV controls. Partitioning physical memory into these large chunks simplifies and reduces the meta-data the HV must maintain for the access, translation, and (in particular cases) reverse translation information that is required to manage several logical address spaces. The logical address space also has a RAS benefit: faulty memory banks can be identified and easily vacated so that the memory can be removed and/or replaced without the software running on the LPAR having any knowledge of it.
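The translation path described above can be sketched as follows. This is a simplified model under stated assumptions: the 16 MiB chunk size, the 4 KiB page size, and the class and method names are hypothetical and are not taken from the Research Hypervisor source. It shows that the HV needs only one physical base address per chunk to resolve an (LPAR, logical page) tuple to a physical page.

```python
PAGE_SIZE = 4096                      # assumed page size
CHUNK_SIZE = 16 * 1024 * 1024         # assumed chunk size
PAGES_PER_CHUNK = CHUNK_SIZE // PAGE_SIZE


class Hypervisor:
    """Toy model of chunk-based logical address spaces."""

    def __init__(self):
        self.chunks = {}      # lpar_id -> list of physical chunk bases
        self.page_table = {}  # (lpar_id, virtual_page) -> physical_page

    def assign_chunk(self, lpar_id, phys_base):
        """Append one contiguous physical chunk to an LPAR's logical space."""
        if phys_base % CHUNK_SIZE:
            raise ValueError("chunk base must be chunk aligned")
        self.chunks.setdefault(lpar_id, []).append(phys_base)

    def logical_to_physical_page(self, lpar_id, logical_page):
        """Resolve (LPAR, logical page) to a physical page number."""
        index, offset = divmod(logical_page, PAGES_PER_CHUNK)
        owned = self.chunks.get(lpar_id, [])
        if logical_page < 0 or index >= len(owned):
            raise ValueError("logical page outside the LPAR's address space")
        return owned[index] // PAGE_SIZE + offset

    def map_page(self, lpar_id, virtual_page, logical_page):
        """Model of the LPAR's request: insert virtual -> physical."""
        phys = self.logical_to_physical_page(lpar_id, logical_page)
        self.page_table[(lpar_id, virtual_page)] = phys
        return phys
```

Because the per-LPAR state is just a short list of chunk bases, vacating a faulty bank in this model amounts to copying the chunk elsewhere and replacing one list entry; the logical page numbers seen by the LPAR do not change.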
PowerPC Memory

The PowerPC memory model is ... ... The translation from effective to virtual remains the domain of the LPAR; however, the translation from virtual to physical is controlled by the HV, and therefore the hashed page table (HTAB) must be abstracted from the LPAR. In order for the LPAR to track the HTAB entries, the HV and the LPAR must both agree on the geometry. The base-2 logarithm of the number of bytes in the HTAB is the second word of the ibm,pft_size property of each cpu node.
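As a worked example of agreeing on the geometry, the sketch below derives the table dimensions from the second word of the property. It assumes the 64-bit PowerPC layout of 16-byte page table entries grouped eight to a PTE group (PTEG); the function name and the returned field names are hypothetical.

```python
PTE_BYTES = 16       # assumed size of one page table entry
PTES_PER_PTEG = 8    # assumed entries per PTE group


def htab_geometry(pft_size_prop):
    """Derive HTAB dimensions from the two-word property value.

    pft_size_prop is the property as a pair of integers; only the
    second word (log2 of the HTAB size in bytes) is used here.
    """
    log_bytes = pft_size_prop[1]
    htab_bytes = 1 << log_bytes
    num_ptegs = htab_bytes // (PTE_BYTES * PTES_PER_PTEG)
    return {
        "bytes": htab_bytes,
        "ptegs": num_ptegs,
        "ptes": num_ptegs * PTES_PER_PTEG,
        "hash_mask": num_ptegs - 1,   # selects a PTEG from a hash value
    }
```

For a second word of 24, this gives a 16 MiB table with 131072 PTEGs and 1048576 entries.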
Processors

Processors are usually time shared by the HV. Each instance of a processor in an LPAR is a logical processor, so there can be more logical processors than there are physical processors. There is also the ability to take a processor thread, which may not have full processor semantics, and represent it as a full logical processor. Synergistic cores (SPEs), like those found in the Cell architecture, can be arbitrarily assigned and isolated to different LPARs. This allows the workload assigned to an SPE to continue independent of which LPAR is currently executing on the host processor.
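To make the time sharing concrete, here is a minimal sketch in which more logical processors than physical processors are multiplexed over fixed time slices. The round-robin policy is an assumption made for illustration; the text above does not state which scheduling policy the HV uses, and the function name is hypothetical.

```python
from collections import deque


def schedule(logical_cpus, num_physical, num_slices):
    """Return, per time slice, the logical CPUs placed on physical CPUs.

    Simple round-robin: the CPUs that just ran go to the back of the queue.
    """
    run_queue = deque(logical_cpus)
    timeline = []
    for _ in range(num_slices):
        count = min(num_physical, len(run_queue))
        running = [run_queue.popleft() for _ in range(count)]
        timeline.append(tuple(running))
        run_queue.extend(running)
    return timeline
```

With three logical processors ("A0", "A1", "B0") and two physical processors, the first three slices run ("A0", "A1"), then ("B0", "A0"), then ("A1", "B0").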
Interrupt Controller

As different interrupt sources (such as processors and devices) are assigned to different LPARs, it is important to be able to reflect the instantiated interrupt to the appropriate LPAR. This is done by virtualizing the external interrupt controllers (XIRR).
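The routing step can be pictured with a small model. It deliberately ignores priorities and the actual XIRR register format; the class and method names are hypothetical. What it shows is only the ownership lookup: each source belongs to one LPAR, and a raised interrupt is queued for that LPAR alone.

```python
from collections import defaultdict, deque


class InterruptRouter:
    """Toy model: reflect each interrupt to the LPAR that owns its source."""

    def __init__(self):
        self.owner = {}                    # source id -> lpar id
        self.pending = defaultdict(deque)  # lpar id -> queued source ids

    def assign(self, source, lpar_id):
        """Give an interrupt source to an LPAR."""
        self.owner[source] = lpar_id

    def raise_interrupt(self, source):
        """Queue the interrupt for its owner; returns the owner or None."""
        lpar_id = self.owner.get(source)
        if lpar_id is None:
            return None                    # unassigned source: dropped here
        self.pending[lpar_id].append(source)
        return lpar_id

    def accept(self, lpar_id):
        """What the LPAR sees when it reads its virtual controller."""
        queue = self.pending[lpar_id]
        return queue.popleft() if queue else None
```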
Physical Devices

Direct access to a physical device from an LPAR is simply a matter of arranging for the device to be mappable by that LPAR. Once that mapping and the necessary interrupt routing are established, the native driver is capable of controlling the device naturally. This access is usually done from a trusted LPAR that is known not to program the device in a malicious manner. One could imagine that such an LPAR could program a device to DMA to arbitrary parts of physical memory, thereby effectively destroying other LPARs or the HV itself. However, some machines are equipped with an I/O translation mechanism (IOMMU) that is capable of instantiating a unique I/O address space that can be partitioned and/or tagged in such a way that each device can access only a specific set of physical address ranges.
IOMMUs

IOMMUs come in several forms: some work at the slot level, some work at the bus level, and some self-virtualizing devices have I/O MMUs on the adapter. Each presents isolation at a different granularity. For every IOMMU on the system the HV instantiates a unique I/O address space. For convenience this address space starts at address 0 and can be of arbitrary size (usually selected for the performance characteristics of the machine). The I/O address space is then partitioned such that a contiguous range of the I/O address space is assigned to every uniquely identifiable device; this is usually some slot identifier on that bus. For example, there is an I/O bus available for the Cell processor architecture that contains several adapters directly attached to the bus. Each adapter has its own unique identifier, and thus can be partitioned and assigned to individual LPARs. Some buses (usually called a south bridge or host bridge) are proprietary buses that contain adapters to industry standard buses like PCI or PCI-X. It may be the case that the standard bus does not uniquely identify each slot on that bus, and therefore that standard bus is not partitionable.

Each I/O address space partition is described by the ibm,my-dma-window property in the node of the device. The LPAR is able to use addresses in the window to program DMA transactions with the device, and uses the H_TCE_* HV calls to create translations from the window address to the LPAR logical address. Programming a device to DMA beyond its DMA window will usually result in the device being taken off line and reused by the system.
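The window mechanism can be sketched as a small model. It is not the H_TCE_* interface itself: the class, the method names, the 4 KiB I/O page size, and the use of Python exceptions to stand for a DMA fault are all assumptions. It shows the two checks described above: a device may only use addresses inside its own contiguous window, and each window page must have a translation entry before DMA through it succeeds.

```python
IO_PAGE = 4096  # assumed I/O page size


class IoAddressSpace:
    """Toy model of one IOMMU's I/O address space, starting at address 0."""

    def __init__(self, size):
        self.size = size
        self.next_free = 0
        self.windows = {}  # device id -> (base, length)
        self.tce = {}      # I/O page number -> LPAR logical page number

    def assign_window(self, device_id, length):
        """Carve the next contiguous range out for one device."""
        if length <= 0 or length % IO_PAGE:
            raise ValueError("window must be a positive multiple of a page")
        if self.next_free + length > self.size:
            raise ValueError("I/O address space exhausted")
        self.windows[device_id] = (self.next_free, length)
        self.next_free += length
        return self.windows[device_id]

    def _check_window(self, device_id, io_addr):
        base, length = self.windows[device_id]
        if not base <= io_addr < base + length:
            raise PermissionError("address outside the device's DMA window")

    def put_tce(self, device_id, io_addr, logical_addr):
        """Model of creating one window-to-logical translation."""
        self._check_window(device_id, io_addr)
        self.tce[io_addr // IO_PAGE] = logical_addr // IO_PAGE

    def dma_translate(self, device_id, io_addr):
        """Translate a device DMA address to an LPAR logical address."""
        self._check_window(device_id, io_addr)
        page = self.tce.get(io_addr // IO_PAGE)
        if page is None:
            raise LookupError("no translation for this I/O page")
        return page * IO_PAGE + io_addr % IO_PAGE
```

In this model a device that reaches outside its window gets a PermissionError; on real hardware the corresponding outcome is the device being taken off line, as noted above.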
Logical Devices

Logical devices are introduced to allow single devices to be shared by multiple LPARs, but an additional benefit is the drastic reduction in the number of devices each LPAR must support. The virtualization of devices is not specifically a function of the HV but rather an inter-partition service that the HV enables.
Console

By virtualizing the console (VTTYs), there is no need for low level drivers to directly access a Graphic Console or a Serial UART on the machine. This enables early boot and low level debugging/panic services to be drastically simplified.
Network

By virtualizing the network (VETH/ILLAN), LPARs can benefit from a shared memory transport, effectively presenting the NIC with an arbitrarily sized MTU. More than one size can be selected so that the most efficient size can be chosen per payload. For network communication that is destined out of the box, the LPAR actually operating the physical NIC can use a packet bridge, IP forwarding, or Network Address Translation technology to forward the communication to the outside world.
Block Devices

Block devices, such as disks, CD-ROMs, DVDs, or even disk images, can be hosted and virtualized by the LPARs that have access to them. Then, by using the VSCSI protocol, other LPARs can have direct and shared access to them.
Performance Characteristics

The Punt

Unfortunately, detailed performance results were beyond the scope of this project (due to funding restrictions). However, early performance characteristics using unoptimized code paths have shown less than 4% overhead in CPU bound performance in order to accommodate a Hypervisor and a single controlling/IO hosting Linux. On I/O bound workloads, preliminary results show that virtualizing I/O produces a 20% loss from the adapter's theoretical bandwidth, with respect to disk and/or networking. In a heterogeneous environment (such as the Cell Processors) where the workload is self contained in the SPE, the performance impact was negligible and in some cases immeasurable.

  
