IBM Research

Parallel Machine Learning Toolbox

Machine Learning


Most data mining and machine learning toolkits are currently restricted to a single processor on a single machine. This is rapidly becoming a limitation as data grows more abundant: it is unreasonable to expect learning algorithms to complete their tasks on a single processor. In addition, multi-core PCs are becoming the norm and already dominate the server market.

The Parallel Machine Learning (PML) Toolbox, a joint effort of the Machine Learning group at the IBM Haifa Lab and the Data Analytics department at the IBM Watson Lab, provides tools for running data mining and machine learning algorithms in multi-processor environments or on multi-threaded machines.

The toolbox comprises two main components: an API for running the users' own machine learning algorithms, and several pre-programmed algorithms which serve both as examples and for comparison. The pre-programmed algorithms include a parallel version of the Support Vector Machine (SVM) classifier, linear regression, transform regression, nearest neighbors, k-means, fuzzy k-means, kernel k-means, PCA, and kernel PCA.

One of the main advantages of the PML toolbox is the ability to run it on a variety of operating systems and platforms, from multi-core laptops to supercomputers such as BlueGene. This is because the toolbox incorporates a parallelization infrastructure that completely separates parallel communications, control, and data access from learning algorithm implementation. This approach enables learning algorithm designers to focus on algorithmic issues without having to concern themselves with low-level parallelization issues. It also enables learning algorithms to be deployed on multiple hardware architectures, running either serially or in parallel, without having to change any algorithmic code. The toolbox uses the popular MPI library as the basis for its operation, and is written in C++.
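The separation described above can be illustrated with a minimal C++ sketch. It is not the PML API: every name here (`PartialSum`, `MeanAlgorithm`, `run_on_workers`) is hypothetical, and standard threads stand in for MPI to keep the example self-contained. The algorithm only says how to process a slice of the data and how to merge two partial results; the driver alone decides how the data is split and executed.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical names throughout; this is an illustration, not the PML API.

// Partial result produced by one worker: enough to recover a mean.
struct PartialSum {
    double sum;
    std::size_t count;
};

// Algorithm side: contains no threading or communication code.
struct MeanAlgorithm {
    // Process one contiguous slice of the data.
    PartialSum process(const double* begin, const double* end) const {
        PartialSum p;
        p.sum = 0.0;
        p.count = 0;
        for (const double* it = begin; it != end; ++it) {
            p.sum += *it;
            ++p.count;
        }
        return p;
    }

    // Combine two partial results into one.
    PartialSum merge(const PartialSum& a, const PartialSum& b) const {
        PartialSum p;
        p.sum = a.sum + b.sum;
        p.count = a.count + b.count;
        return p;
    }
};

// Infrastructure side: splits the data, runs the slices, merges the results.
// With workers == 1 it runs serially; otherwise one thread per slice.
template <typename Algorithm>
PartialSum run_on_workers(const Algorithm& algo,
                          const std::vector<double>& data,
                          std::size_t workers) {
    assert(workers >= 1);
    std::vector<PartialSum> partials(workers);
    std::vector<std::thread> threads;
    const std::size_t n = data.size();

    for (std::size_t w = 0; w < workers; ++w) {
        const double* begin = data.data() + n * w / workers;
        const double* end = data.data() + n * (w + 1) / workers;
        if (workers == 1) {
            partials[w] = algo.process(begin, end);
        } else {
            // Each thread writes only to its own slot, so no locking is needed.
            threads.emplace_back([&algo, &partials, begin, end, w] {
                partials[w] = algo.process(begin, end);
            });
        }
    }
    for (std::thread& t : threads) {
        t.join();
    }

    PartialSum total = partials[0];
    for (std::size_t w = 1; w < workers; ++w) {
        total = algo.merge(total, partials[w]);
    }
    return total;
}
```

Calling `run_on_workers(MeanAlgorithm(), data, 1)` and `run_on_workers(MeanAlgorithm(), data, 4)` uses the same algorithm code in both cases; only the worker count passed to the driver changes. An MPI-based driver could, under the same assumption, replace the thread loop with a reduction across processes without touching `MeanAlgorithm`.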

Timing diagram of algorithms in the PML


Downloadable file(s) available for IBM Parallel Machine Learning Toolbox.


