%
% file : abstract.tex
% desc : Summary of contributions
%
% $Header: /home/cs/richie/tuning/papers/cvsrep/sc2002/paper/abstract.tex,v 1.6 2002/07/30 02:26:35 richie Exp $
%
\begin{abstract}
\noindent
We consider performance tuning of sparse matrix-vector multiply
(\SMVM), one of the most important computational kernels in scientific
applications, by code and data structure re\-org\-ani\-za\-tion. This
paper addresses the fundamental questions of what limits exist on such
performance tuning, and how closely tuned code approaches these limits.
Specifically, we develop upper and lower bounds on the performance
(Mflop/s) of {\SMVM} when tuned using our previously proposed register
blocking optimization. These bounds are based on the non-zero pattern
in the matrix and the cost of basic memory operations, such as cache
hits and misses. We evaluate our tuned implementations with respect to
these bounds using hardware counter data on four different platforms and
on a test set of 44 sparse matrices. We find that our tuned code often
performs within 20\% of the upper bound, particularly on a class of matrices
from finite element modeling (FEM) problems; on non-FEM matrices,
performance improvements of \BYFACTOR{2} are still possible. Lastly,
we present a new heuristic that selects optimal or near-optimal
register block sizes (the key tuning parameters) more accurately than
our previous heuristic. Using the new heuristic, we show improvements
in {\SMVM} performance (Mflop/s) by as much as \BYFACTOR{2.5} over an
untuned implementation.
Collectively, our results suggest that future performance
improvements, beyond those that we have already demonstrated for
{\SMVM}, will come
from two sources: (1) consideration of higher-level matrix structures
(\EG, exploiting symmetry, matrix reordering, multiple register block
sizes), and (2) optimizing kernels with more opportunity for data
reuse (\EG, sparse matrix-multiple vector multiply, multiplication of
$A^TA$ by a vector).
\end{abstract}
%
% $Log: abstract.tex,v $
% Revision 1.6 2002/07/30 02:26:35 richie
% Final draft; changed en-dashes to commas.
%
% Revision 1.5 2002/05/10 18:07:40 richie
% Updated legend on sum_perf-XXX figures.
%
% Revision 1.4 2002/05/10 17:43:10 richie
% Removed comment about the Power3.
%
% Revision 1.3 2002/05/09 18:51:24 richie
% More grammatical fixes.
%
% Revision 1.2 2002/05/09 16:34:40 richie
% Shortened to approximately 300 words.
%
% Revision 1.1.1.1 2002/05/08 18:27:19 richie
% SC 2002 paper on SMVM performance bounds
%
%
% eof
%