%
% file : abstract.tex
% desc : Abstract of A'A paper for ICCS'03
%
% $Header: /home/cs/richie/tuning/papers/cvsrep/iccs2003-workshop/preprint/abstract.tex,v 1.2 2003/02/21 02:21:32 richie Exp $
%
\begin{abstract}
\noindent
This paper presents uniprocessor performance optimizations, automatic
tuning techniques, and an experimental analysis of the sparse matrix
operation $y = \ATAx$, where $A$ is a sparse matrix and $x, y$ are
dense vectors. We describe an implementation of this computational
kernel that brings $A$ through the memory hierarchy only once, and
that can be combined naturally with the register blocking optimization
previously proposed in the \Sparsity\ tuning system for sparse
matrix-vector multiply. We evaluate these optimizations on a benchmark
set of 44 matrices and 4 platforms, showing speedups of up to
\BYFACTOR{4.2}. We also develop platform-specific upper bounds on the
performance of these implementations. We analyze how closely we can
approach these bounds, and show when low-level tuning techniques
(\eg, better instruction scheduling) are likely to yield a significant
payoff. Finally, we propose a hybrid off-line/run-time heuristic that,
in practice, automatically selects near-optimal values of the key
tuning parameters, the register block sizes.
\end{abstract}
%
% $Log: abstract.tex,v $
% Revision 1.2  2003/02/21 02:21:32  richie
% Fixed a number of typos.
%
% Revision 1.1.1.1  2003/02/21 00:33:01  richie
% Final 10-page submission (draft)
%
% eof
%