- pOSKI-v1.0.0 is available for download.
What is pOSKI?
-
The parallel Optimized Sparse Kernel Interface (pOSKI) library
provides automatically tuned (autotuned), high-performance
computational kernels for sparse matrices, such as
sparse matrix-vector multiplication (SpMV).
pOSKI targets both uniprocessor and multicore machines.
pOSKI builds on prior work on the
OSKI
library, which provided autotuned kernels for SpMV and other kernels on
cache-based superscalar uniprocessors.
The purpose of both pOSKI and OSKI is to help developers of
solver libraries, and of scientific and engineering applications,
attain high performance in commonly used sparse matrix
operations via autotuning.
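For readers unfamiliar with the SpMV kernel mentioned above, the following is a minimal, untuned reference sketch in C. It assumes the matrix is stored in compressed sparse row (CSR) format; the function name `spmv_csr` and its parameters are illustrative and are not part of the pOSKI or OSKI interface.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: computes y = y + A*x for an m-row matrix A in CSR form.
 * row_ptr has m+1 entries; col_idx and val list the nonzeros row by row. */
void spmv_csr(size_t m, const size_t *row_ptr, const size_t *col_idx,
              const double *val, const double *x, double *y)
{
    assert(row_ptr && col_idx && val && x && y);
    for (size_t i = 0; i < m; ++i) {
        double sum = y[i];
        /* Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1. */
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}
```

The indirect access `x[col_idx[k]]` is what makes this kernel sensitive to the matrix structure and the memory hierarchy, which is why tuning the data structure and the loop together can pay off.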
Autotuning is done both at installation time and at run time:
- installation-time tuning: off-line autotuning that allows extensive benchmarking of different kernel implementations to identify the fastest ones.
- run-time tuning: on-line autotuning performed once more is known about the user's matrix, when the best data structure and kernel implementation must be selected quickly.
pOSKI also lets the user cheaply reuse a previously tuned data structure and implementation, exploiting the fact that the same matrix structure can often be reused.
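The idea of empirical tuning described above can be sketched as: run each candidate kernel on the user's matrix, time it, and keep the fastest. The C code below is a simplified illustration under that assumption only. The names `spmv_fn`, `spmv_plain`, `spmv_unroll2`, and `pick_fastest` are hypothetical, and this is not how pOSKI itself selects kernels; a real tuner would also weigh the cost of converting data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef void (*spmv_fn)(size_t m, const size_t *row_ptr, const size_t *col_idx,
                        const double *val, const double *x, double *y);

/* Candidate 1: straightforward CSR loop, y = y + A*x. */
void spmv_plain(size_t m, const size_t *row_ptr, const size_t *col_idx,
                const double *val, const double *x, double *y)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            y[i] += val[k] * x[col_idx[k]];
}

/* Candidate 2: same computation with the inner loop unrolled by two. */
void spmv_unroll2(size_t m, const size_t *row_ptr, const size_t *col_idx,
                  const double *val, const double *x, double *y)
{
    for (size_t i = 0; i < m; ++i) {
        double s0 = 0.0, s1 = 0.0;
        size_t k = row_ptr[i], end = row_ptr[i + 1];
        for (; k + 1 < end; k += 2) {
            s0 += val[k] * x[col_idx[k]];
            s1 += val[k + 1] * x[col_idx[k + 1]];
        }
        if (k < end)                      /* leftover nonzero in odd-length rows */
            s0 += val[k] * x[col_idx[k]];
        y[i] += s0 + s1;
    }
}

/* Time every candidate for `reps` repetitions on the given matrix and
 * return the index of the one with the smallest measured CPU time. */
size_t pick_fastest(const spmv_fn *cand, size_t ncand, int reps,
                    size_t m, const size_t *row_ptr, const size_t *col_idx,
                    const double *val, const double *x, double *y_scratch)
{
    assert(cand && ncand > 0 && reps > 0);
    size_t best = 0;
    clock_t best_time = 0;
    for (size_t c = 0; c < ncand; ++c) {
        clock_t t0 = clock();
        for (int r = 0; r < reps; ++r)
            cand[c](m, row_ptr, col_idx, val, x, y_scratch);
        clock_t dt = clock() - t0;
        if (c == 0 || dt < best_time) {
            best = c;
            best_time = dt;
        }
    }
    return best;
}
```

Because `clock()` has coarse resolution, timings on small matrices are noisy; the repetition count would need to be large enough for the measurement to be meaningful. Storing the returned index alongside the matrix is one simple way to reuse a tuning decision for later matrices with the same structure.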
pOSKI is part of on-going work by the Berkeley Benchmarking and Optimization (BeBOP) group, a research program on automatic performance tuning and analysis at the University of California, Berkeley.
What optimizations does pOSKI perform?
-
The primary aim of pOSKI is to provide parallel functionality.
It also includes optimizations for sparse matrix computations
presented in previous work on the
OSKI library.
The optimizations include register blocking, thread blocking,
software prefetching, software pipelining, SIMD, and loop unrolling.
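To illustrate register blocking, the sketch below stores the matrix as dense 2x2 blocks (block CSR, or BCSR) so that two entries of x and two partial sums of y stay in registers while each block is processed. The fixed 2x2 block size, the function name `spmv_bcsr_2x2`, and the array layout are assumptions made for this example; a tuned library would choose the block size per matrix and machine.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: y = y + A*x for A stored in 2x2 block CSR.
 * mb        number of block rows (A has 2*mb rows)
 * brow_ptr  mb+1 entries; blocks of block row I are brow_ptr[I]..brow_ptr[I+1]-1
 * bcol_idx  block column index of each block
 * bval      4 values per block, row-major (explicit zeros fill partial blocks) */
void spmv_bcsr_2x2(size_t mb, const size_t *brow_ptr, const size_t *bcol_idx,
                   const double *bval, const double *x, double *y)
{
    assert(brow_ptr && bcol_idx && bval && x && y);
    for (size_t I = 0; I < mb; ++I) {
        /* Keep the two destination values in scalars for the whole block row. */
        double y0 = y[2 * I], y1 = y[2 * I + 1];
        for (size_t k = brow_ptr[I]; k < brow_ptr[I + 1]; ++k) {
            const double *b = bval + 4 * k;
            double x0 = x[2 * bcol_idx[k]];
            double x1 = x[2 * bcol_idx[k] + 1];
            y0 += b[0] * x0 + b[1] * x1;
            y1 += b[2] * x0 + b[3] * x1;
        }
        y[2 * I] = y0;
        y[2 * I + 1] = y1;
    }
}
```

Compared with plain CSR, this layout needs one column index per block instead of one per nonzero, at the cost of storing explicit zeros when a block is only partly filled. Whether that trade is worthwhile depends on the matrix, which is the kind of decision autotuning is meant to make.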
For parallel functionality, pOSKI supports several parallel programming models (see threading models in the documentation) to create multiple threads on multicore architectures, and it also supports several partitioning schemes (see partitioning models in the documentation) to split a matrix into submatrices.
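As one example of what a partitioning scheme might look like, the sketch below splits the rows of a CSR matrix into contiguous ranges holding roughly equal numbers of nonzeros, so that each thread would receive a similar amount of work. This is a generic illustration, not necessarily one of the schemes pOSKI implements; the name `partition_rows_by_nnz` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: split m rows into nparts contiguous row ranges with
 * roughly equal nonzero counts. part must have nparts+1 entries; on return,
 * part p owns the half-open row range [part[p], part[p+1]). */
void partition_rows_by_nnz(size_t m, const size_t *row_ptr,
                           size_t nparts, size_t *part)
{
    assert(row_ptr && part && nparts > 0);
    size_t nnz = row_ptr[m];
    size_t row = 0;
    part[0] = 0;
    for (size_t p = 1; p < nparts; ++p) {
        /* Number of nonzeros that should precede part p. */
        size_t target = (nnz * p) / nparts;
        /* Advance to the first row whose starting offset reaches the target. */
        while (row < m && row_ptr[row] < target)
            ++row;
        part[p] = row;
    }
    part[nparts] = m;
}
```

Each thread could then run an SpMV kernel over its own row range and write to a disjoint slice of y, so no synchronization is needed on the output vector. Balancing by nonzeros rather than by row count matters when row lengths vary widely.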