The BeBOP group is broadly interested in understanding software performance tuning issues, and the interaction or implications for hardware design. Among our general interests are

  • the interaction between application software, compilers, and hardware
  • managing trade-offs among the various measures of performance, such as speed, accuracy, power, storage, ...
  • automating the performance tuning process, starting with the computational kernels which dominate application performance in scientific computing and information retrieval
  • performance modeling and evaluation of future computer architectures

See the links below for detailed project information and status.

People

Principal Investigators

Affiliated Researchers

Graduate Students

Previous Participants

Software Distributions

Awards

Publications

Download BibTeX entries for these papers and reports.

Year: 2014 | 2013 | 2012 | 2011 | 2010 | 2009 | 2008 | 2007 | 2006 | 2005 | 2004 | 2003 | 2002 | 2001

    2014

  • Tech report: Accuracy of the s-step Lanczos method for the symmetric eigenproble
    (UCB/EECS-2014-165, September 2014)
    Erin Carson and James Demmel
    PDF

  • Ph.D. Thesis: Provably Efficient Algorithms for Numerical Tensor Algebra
    (Computer Science Division, U.C. Berkeley, August 2014)
    Edgar Solomonik
    PDF

  • Tech report: Contention Bounds for Combinations of Computation Graphs and Network Topologies
    (UCB/EECS-2014-147, August 2014)
    Grey Ballard, James Demmel, Andrew Gearhart, Benjamin Lipshitz, Oded Schwartz and Sivan Toledo
    PDF

  • Paper: A massively parallel tensor contraction framework for coupled-cluster computations
    Journal of Parallel and Distributed Computing, June 2014.
    Edgar Solomonik, Devin Matthews, Jeff R. Hammond, John F. Stanton, and James Demmel.
    Journal, PDF

  • Paper: Communication lower bounds and optimal algorithms for numerical linear algebra
    Invited to Acta Numerica. Cambridge University Press, 23, 1-155, 2014.
    Grey Ballard, Erin Carson, James Demmel, Mark Hoemmen, Nick Knight, and Oded Schwartz.
    PDF

  • Tech report: Error analysis of the s-step Lanczos method in finite precision
    (UCB/EECS-2014-55, May 2014)
    Erin Carson and James Demmel
    PDF

  • Tech report: Analysis of the finite precision s-step biconjugate gradient method
    (UCB/EECS-2014-18, March 2014)
    Erin Carson and James Demmel
    PDF

  • Paper: A Residual Replacement Strategy for Improving the Maximum Attainable Accuracy of s-step Krylov Subspace Methods
    SIAM J. Matrix Anal. Appl., 35(1), pp. 22-43.
    Erin Carson and James Demmel
    PDF

  • Tech report: Tradeoffs between synchronization, communication, and work in parallel linear algebra computations
    (UCB/EECS-2014-8, January 2014)
    Edgar Solomonik, Erin Carson, Nicholas Knight, and James Demmel
    PDF

  • 2013

  • Paper: Avoiding Communication in Nonsymmetric Lanczos-Based Krylov Subspace Methods
    SIAM J. Sci. Comput., 35(5), pp. S42-S61.
    Erin Carson, Nicholas Knight, and James Demmel
    PDF

  • Tech report: Reconstructing Householder Vectors from Tall-Skinny QR
    (UCB/EECS-2013-175, October 2013)
    Grey Ballard, James Demmel, Laura Grigori, Mathias Jacquelin, Hong Diep Nguyen, and Edgar Solomonik
    PDF

  • Paper: Communication Costs of Strassen's Matrix Multiplication
    Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz
    Invited to Research Highlights section of the Communications of the ACM, (to appear) 2013

  • Paper: Communication Efficient Gaussian Elimination with Partial Pivoting using a Shape Morphing Data Layout
    Grey Ballard, James Demmel, Ben Lipshitz, Oded Schwartz, and Sivan Toledo
    In SPAA'13: Proceedings of the 25th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 232-241, 2013
    PDF

  • Paper: Communication Optimal Parallel Multiplication of Sparse Random Matrices
    Grey Ballard, Aydin Buluç, James Demmel, Laura Grigori, Ben Lipshitz Oded Schwartz, and Sivan Toledo
    In SPAA'13: Proceedings of the 25th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 222-231, 2013
    PDF

  • Paper: Implementing a Blocked Aasen's Algorithm with a Dynamic Scheduler on Multicore Architectures
    Grey Ballard, Dulceneia Becker, James Demmel, Jack Dongarra, Alex Druinsky, Inon Peled, Oded Schwartz, Sivan Toledo, and Ichitaro Yamazaki
    In IPDPS'13: Proceedings of the IEEE International Parallel & Distributed Processing Symposium, 2013 (IPDPS'13 Best Paper Award)
    PDF

  • Paper: Perfect Strong Scaling Using No Additional Energy
    James Demmel, Andrew Gerhart, Ben Lipshitz, and Oded Schwartz
    In IPDPS'13: Proceedings of the IEEE International Parallel & Distributed Processing Symposium, 2013
    PDF

  • Paper: Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication
    James Demmel, David Eliahu, Armando Fox, Shoaib Kamil, Ben Lipshitz, Oded Schwartz, and Omer Spillinger
    In IPDPS'13: Proceedings of the IEEE International Parallel & Distributed Processing Symposium, 2013
    PDF

  • Ph.D. Thesis: Avoiding communication in dense linear algebra
    (Computer Science Division, U.C. Berkeley, August 2013)
    Grey Ballard
    PDF

  • Tech Report: An arithmetic complexity lower bound for computing rational functions, with applications to linear algebra
    EECS Technical Report No. UCB/EECS-2013-126, July 2013
    James Demmel
    PDF

  • Tech Report: Communication Lower Bounds and Optimal Algorithms for Programs That Reference Arrays — Part 1
    EECS Technical Report No. UCB/EECS-2013-61, May 2013
    Michael Christ, James Demmel, Nicholas Knight, Thomas Scanlon, and Katherine Yelick
    PDF (revised version), PDF (original)

  • Paper: A Communication-Optimal N-Body Algorithm for Direct Interactions
    In Proceedings of 27th IEEE International Parallel & Distributed Processing Symposium (IPDPS) 2013
    Michael Driscoll, Evangelos Georganas, Penporn Koanantakool, Edgar Solomonik and Katherine Yelick
    PDF

  • Tech Report: Exploiting data sparsity in parallel matrix powers computations
    EECS Technical Report No. UCB/EECS-2013-47, May 3, 2013
    Erin Carson, James Demmel, Nicholas Knight
    PDF
    (To appear in Proceedings of PPAM 2013, Springer LNCS)

  • Tech Report: Communication Avoiding Rank Revealing QR Factorization with Column Pivoting
    LAPACK Working Note #276, May 2013
    James Demmel, Laura Grigori, Ming Gu, and Hua Xiang
    PDF

  • Paper: Fast Reproducible Floating-Point Summation
    21st IEEE Symposium on Computer Arithmetic, April 2013
    James Demmel and Hong Diep Nguyen
    PDF

  • Paper: Numerical Accuracy and Reproducibility at ExaScale
    21st IEEE Symposium on Computer Arithmetic, April 2013
    James Demmel and Hong Diep Nguyen
    PDF

  • Paper: Cyclops Tensor Framework: reducing communication and eliminating load imbalance in massively parallel contractions
    (UCB/EECS-2013-11, February 2013)
    In Proceedings of 27th IEEE International Parallel & Distributed Processing Symposium (IPDPS) 2013
    Edgar Solomonik, Devin Matthews, Jeff Hammond, and James Demmel
    PDF (601k)

  • Paper: Minimizing communication in all-pairs shortest-paths
    (UCB/EECS-2013-10, February 2013)
    In Proceedings of 27th IEEE International Parallel & Distributed Processing Symposium (IPDPS) 2013
    Edgar Solomonik, Aydin Buluc, and James Demmel
    PDF (334k)

  • 2012

  • Tech Report: Autotuning Sparse Matrix-Vector Multiplication for Multicore
    (UCB/EECS-2012-215, November 2012)
    Jong-Ho Byun, Richard Lin, Katherine A. Yelick and James Demmel
    PDF (2223k)

  • Paper: Graph Expansion and Communication Costs of Fast Matrix Multiplication
    In Journal of the ACM, 2012
    Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz
    PDF (228k)

  • Paper: Graph Expansion and Communication Costs of Fast Rectangular Matrix Multiplication
    December 2012
    In Proceedings of the 1st Mediterranean Conference on Algorithms (MedAlg)
    Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz, Oded Schwartz
    PDF (497k)

  • Paper: Communication Avoiding and Overlapping for Numerical Linear Algebra
    November 2012
    In Proceedings of 2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC '12)
    Evangelos Georganas, Jorge González-Domínguez, Edgar Solomonik, Yili Zheng, Juan Touriño and Katherine Yelick
    PDF (520k)

  • Paper: Communication-Avoiding Parallel Strassen: Implementation and Performance
    November 2012
    In Proceedings of 2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC '12)
    Grey Ballard, James Demmel, Benjamin Lipshitz, and Oded Schwartz
    PDF (228k)

  • Tech Report: Implementing a Blocked Aasen's Algorithm with a Dynamic Scheduler on Multicore Architectures
    (October 2012)
    Grey Ballard, Dulceneia Becker, James Demmel, Jack Dongarra, Alex Druinsky, Inon Peled, Oded Schwartz, Sivan Toledo, Ichitaro Yamazaki

  • Tech Report: Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication
    (October 2012)
    James Demmel, David Eliahu, Armando Fox, Shoaib Kamil, Benjamin Lipshitz, Oded Schwartz, Omer Spillinger
    PDF (769k)

  • Tech Report: A Residual Replacement Strategy for Improving the Maximum Attainable Accuracy of s-step Krylov Subspace Methods
    (UCB/EECS-2012-197, September 2012)
    Erin Carson and James Demmel
    PDF (1457k)

  • Paper: Communication-Optimal Parallel Algorithm for Strassen's Matrix Multiplication
    Proceedings of the 24th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)
    June 2012
    Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz, Oded Schwartz
    PDF (393k)

  • Paper: Strong Scaling of Matrix Multiplication Algorithms and Memory-Independent Communication Lower Bounds (Brief Announcement)
    ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)
    (June 2012)
    Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz, and Oded Schwartz
    PDF (248k)

  • Tech Report: Communication bounds for heterogeneous architectures
    (June 2012)
    Grey Ballard, James Demmel, and Andrew Gearhart
    PDF (248k)

  • Tech Report: Instrumenting Linear Algebra Energy Consumption via On-chip Energy Counters
    (UCB/EECS-2012-168, June 2012)
    James Demmel and Andrew Gearhart
    PDF (1331k)

  • Tech Report: Perfect strong scaling using no additional energy
    (UCB/EECS-2012-126, May 2012)
    James Demmel, Andrew Gearhart, Oded Schwartz and Benjamin Lipshitz
    PDF (765k)

  • Tech Report: Communication Avoiding and Overlapping for Numerical Linear Algebra
    (UCB/EECS-2012-65, May 2012)
    Evangelos Georganas, Jorge González-Domínguez, Edgar Solomonik, Yili Zheng, Juan Touriño and Katherine Yelick
    PDF (573k)

  • Tech Report: Sequential Communication Bounds for Fast Linear Algebra
    (March 2012)
    Grey Ballard, James Demmel, Olga Holtz, Oded Schwartz
    PDF (288k)

  • Tech Report: A preliminary analysis of Cyclops Tensor Framework
    (UCB/EECS-2012-29, March 2012)
    Edgar Solomonik, Jeff Hammond, and James Demmel
    PDF (801k)

  • Paper: Communication Avoiding Symmetric Band Reduction
    ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), February 2012.
    Grey Ballard, Nicholas Knight, and James Demmel
    PDF (1357k)

  • Tech Report: Matrix multiplication on multidimensional torus networks
    (UCB/EECS-2012-28, February 2012)
    Edgar Solomonik and James Demmel
    PDF (315k)

  • Tech Report: LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version
    (UCB/EECS-2012-15, January 2012)
    Amal Khabou, James Demmel, Laura Grigori, and Ming Gu
    PDF (689k)

  • 2011

  • Tech Report: Avoiding Communication in Two-Sided Krylov Subspace Methods
    (UCB/EECS-2011-93, August 2011)
    Erin Carson, Nicholas Knight, and James Demmel
    PDF (793k)

  • Tech Report: Improving communication performance in dense linear algebra via topology aware collectives
    (UCB/EECS-2011-92, August 2011)
    Edgar Solomonik, Abhinav Bhatele, and James Demmel
    PDF (538k)

  • Paper: Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication
    International Parallel & Distributed Processing Symposium (IPDPS), May 2011
    Aydin Buluc, Samuel Williams, Lenny Oliker, James Demmel
    PDF (791k)

  • Paper: Graph Expansion and Communication Costs of Fast Matrix Multiplication
    ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), June 2011
    Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz
    [SPAA Best Paper Award]
    PDF (228k)

  • Tech Report: Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms
    (UCB/EECS-2011-10, February 2011)
    Edgar Solomonik and James Demmel
    [Euro-Par Distinguished Paper Award]
    PDF (700k)

  • Tech report: Minimizing Communication in Numerical Linear Algebra
    (UCB/EECS-2011-15, February 2011)
    Grey Ballard, James Demmel, Olga Holtz, Oded Schwartz
    [SIAM SIAG/LA Prize]
    PDF (560k) (2011 updated version)

  • 2010

  • Paper: Communication-Optimal Parallel and Sequential Cholesky Decomposition
    SIAM Journal on Scientific Computing (SISC), December 2010
    Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz
    PDF (463k)
    *Conference version published in SPAA 2009

  • Tech Report: Minimizing Communication for Eigenproblems and the Singular Value Decomposition
    (UCB/EECS-2011-14, February 2011)
    Grey Ballard, James Demmel, and Ioana Dumitriu
    PDF (852k)

  • Tech Report: Communication-Avoiding QR Decomposition for GPUs
    (UCB/EECS-2010-131, October 2010)
    Michael Anderson, Grey Ballard, James Demmel, and Kurt Keutzer
    PDF (2.3M)

  • Paper: Brief Announcement: Lower Bounds on Communication for Sparse Cholesky Factorization of a Model Problem
    Symposium on Parallelism in Algorithms and Architectures (SPAA 2010), June 2010
    Laura Grigori, Pierre-Yves David, James Demmel, Sylvain Peyronnet
    PDF (99.5k)

  • Tech Report: CALU: A Communication Optimal LU Factorization Algorithm
    (UCB/EECS-2010-29,March 2010)
    James Demmel, Laura Grigori, Hua Xiang
    PDF (428k)

  • Ph.D. Thesis: Communication-avoiding Krylov subspace methods
    (Computer Science Division, U.C. Berkeley, May 2010)
    Mark Hoemmen
    PDF (4.2M)

  • 2009

  • Ph.D. Thesis: Auto-tuning Stencil Codes for Cache-Based Multicore Platforms
    (Computer Science Division, U.C. Berkeley, December 2009)
    Kaushik Datta
    PDF (3.1M)
    Talk Slides: PPTX (4.1M) -or- PDF (3.3M)

  • Ph.D. Thesis: Automatically Tuning Collective Communication for One-Sided Programming Models
    (Computer Science Division, U.C. Berkeley, December 2009)
    Rajesh Nishtala
    PDF (4.7M)
    Talk Slides: PPTX (5.9M) -or- PDF (8.2M)

  • Paper: A Methodology for Domain-Optimized Power-Efficient Supercomputing
    (In Proc. of SC09, November, 2009)
    Marghoob Mohiyuddin, Mark Murphy, Leonid Oliker, John Shalf, John Wawrzynek, and Sam Williams
    PDF (950k)
    Talk slides:PDF (3.3M)

  • Paper: Minimizing Communication in Sparse Matrix Solvers
    (In Proc. of SC09, November, 2009)
    Marghoob Mohiyuddin, Mark Hoemmen, James Demmel, and Kathy Yelick
    PDF (1.2M)
    Talk slides:PDF (3.9M)

  • Paper: Auto-tuning the 27-point stencil for multicore
    (iWAPT2009: The Fourth International Workshop on Automatic Performance Tuning, October 2009)
    Kaushik Datta, Samuel Williams, Vasily Volkov, Jonathan Carter, Leonid Oliker, John Shalf, and Katherine Yelick
    PDF (464k)
    Talk slides: PDF (5.7 MB)

  • Paper: Communication-Optimal Parallel and Sequential Cholesky Decomposition
    (Symposium on Parallelism in Algorithms and Architectures (SPAA 2009), August 2009)
    Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz
    PDF (152k)
    *Journal version published in SISC 2010

  • Paper: A Generalized Framework for Auto-tuning Stencil Computations
    (Cray User Group Conference, Atlanta, GA, May 2009)
    [Winner, Best Paper]
    Shoaib Kamil, Cy Chan, Sam Williams, Leonid Oliker, John Shalf, Mark Howison, E. Wes Bethel, Prabhat
    PDF (354k)

  • Journal Paper: Communication Requirements and Interconnect Optimization for High-End Scientific Applications
    (IEEE Transactions on Parallel and Distributed Systems (TPDS), March 2009)
    Shoaib Kamil, Leonid Oliker, Ali Pinar, John Shalf
    Preprint PDF (8467k)

  • Paper: Optimizing Collective Communication on Multicores
    (HotPar 2009 , Berkeley, CA, USA, March 2009)
    Rajesh Nishtala and Katherine Yelick
    Paper: PDF (400kB)

  • 2008

  • Journal Paper: Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors
    (SIAM Review, December 2008)
    Kaushik Datta, Shoaib Kamil, Samuel Williams, Leonid Oliker, John Shalf, and Katherine Yelick
    PDF (2.8 MB)

  • Paper: Benchmarking GPUs to Tune Dense Linear Algebra
    (Supercomputing 2008, November 2008)
    [Winner, Best Student Paper]
    Vasily Volkov and James Demmel
    PDF (648k)

  • Paper: Communication Avoiding Gaussian Elimination
    (Supercomputing 2008, November 2008)
    Laura Grigori, James Demmel, and Hua Xiang
    PDF (584k)

  • Paper: Stencil Computation Optimization and Auto-Tuning on State-of-the-Art Multicore Architectures
    (Supercomputing 2008, November 2008)
    Kaushik Datta, Mark Murphy, Vasily Volkov, Samuel Williams, Jonathan Carter, Leonid Oliker, David Patterson, John Shalf, and Katherine Yelick
    PDF (612k)
    Talk slides: PPT (12.9 MB)

  • Tech report: Communication-optimal parallel and sequential QR and LU factorizations
    (UCB/EECS-2008-89, August 2008)
    James Demmel, Laura Grigori, Mark Frederick Hoemmen, and Julien Langou
    PDF (1.6 MB)

  • Paper: Hybrid Electric/Photonic Networks for Scientific Applications on Tiled CMPs
    (Workshop on High Performance Embedding Computing, August 2008)
    Ankit Jain, Shoaib Kamil, Marghoob Mohiyuddin, John Shalf, and John D. Kubiatowicz
    Abstract: PDF (282k)
    Talk slides: PPT (5.1 MB)

  • Masters Report: pOSKI: An Extensible Autotuning Framework to Perform Optimized SpMVs on Multicore Architectures
    Ankit Jain
    PDF (5.3 MB)

  • Tech report: LU, QR and Cholesky Factorizations using Vector Capabilities of GPUs
    (UCB/EECS-2008-49, May 2008)
    Vasily Volkov and James Demmel
    PDF (1.0 MB)

  • Paper: Avoiding Communication in Sparse Matrix Computations
    (IEEE International Parallel and Distributed Processing Symposium, April 2008)
    James Demmel, Mark Hoemmen, Marghoob Mohiyuddin, and Katherine Yelick
    PDF (1 MB)
    Talk slides: PDF (6.2 MB)

  • Paper: Lattice Boltzmann Simulation Optimization on Leading Multicore Platforms
    (IEEE International Parallel and Distributed Processing Symposium, April 2008)
    [Winner, Best Paper for Applications track]
    Samuel Williams, Jonathan Carter, Leonid Oliker, John Shalf, and Katherine Yelick
    PDF (560k)
    Talk slides: PDF (10.4 MB) | PPT (2.6 MB)

  • 2007

  • Tech report: Using GPUs to Accelerate the Bisection Algorithm for Finding Eigenvalues of Symmetric Tridiagonal Matrices
    (UCB/EECS-2007-179, December 2007)
    Vasily Volkov and James Demmel
    PDF (1.8 MB)

  • Paper: Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms
    (Supercomputing 2007, November 2007)
    Samuel Williams, Leonid Oliker, Richard Vuduc, John Shalf, Katherine Yelick, and James Demmel
    PDF (438k)
    Talk slides: PDF (6.4 MB) | PPT (2.5 MB)

  • Tech report: Avoiding Communication in Computing Krylov Subspaces
    (UCB/EECS-2007-123, October 2007)
    James Demmel, Mark Hoemmen, Marghoob Mohiyuddin, and Katherine Yelick
    PDF (34.9 MB)

  • Journal Paper: Scientific Computing Kernels on the Cell Processor
    (International Journal of Parallel Programming, April 2007)
    Samuel Williams, John Shalf, Leonid Oliker, Shoaib Kamil, Parry Husbands, and Katherine Yelick
    PDF (376k)

  • Journal Paper: When Cache Blocking Sparse Matrix Vector Multiply Works and Why
    (Applicable Algebra in Engineering, Communication and Computing, March 2007)
    Rajesh Nishtala, Richard W. Vuduc, James W. Demmel, and Katherine Yelick
    PDF (390k)

  • Paper: Benchmarking Sparse Matrix-Vector Multiply in Five Minutes
    (SPEC Benchmark Workshop 2007, Austin, TX, January 2007)
    Hormozd Gahvari, Mark Hoemmen, James Demmel, Katherine Yelick
    PDF (1 MB)
    Talk slides: PPT (6.4 MB)

  • 2006

  • Master's Thesis: Benchmarking Sparse Matrix-Vector Multiply
    (Computer Science Division, U.C. Berkeley, December 2006)
    Hormozd Gahvari
    PDF (11.5 MB)

  • Paper: Implicit and Explict Optimizations for Stencil Computations
    (Memory Systems Performance and Correctness, San Jose, California, USA, October 2006)
    Shoaib Kamil, Kaushik Datta, Samuel Williams, Leonid Oliker, John Shalf, and Katherine Yelick
    PDF (604k)
    Talk slides: PDF (3.2 MB)

  • Paper: The Potential of the Cell Processor for Scientific Computing
    (Computing Frontiers 2006, Ischia, Italy, May 2006)
    Samuel Williams, John Shalf, Leonid Oliker, Shoaib Kamil, Parry Husbands, and Katherine Yelick
    PDF (216k)

  • 2005

  • Paper: Fast sparse matrix-vector multiplication by exploiting variable blocks
    (Proceedings of the International Conference on High-Performance Computing and Communications, Sorrento, Italy, September 2005)
    Richard Vuduc and Hyun-Jin Moon.
    PDF (322k)

  • Paper: OSKI: A library of automatically tuned sparse matrix kernels
    (Proceedings of SciDAC 2005, Journal of Physics: Conference Series, June 2005)
    Richard Vuduc, James Demmel, and Katherine Yelick.
    PDF (190k)

  • Paper: Self-Adapting Linear Algebra Algorithms and Software
    (Proceedings of the IEEE, Special Issue on Program Generation, Optimization, and Adaptation, 93(2), February 2005)
    James Demmel, Jack Dongarra, Victor Eijkhout, Erika Fuentes, Antoine Petitet, Richard Vuduc, R. Clint Whaley, and Katherine Yelick.
    PDF (600k)

  • 2004

  • Paper: Performance Models for Evaluation and Automatic Tuning of Symmetric Sparse Matrix-Vector Multiply
    (International Conference on Parallel Processing, Montreal, Quebec, Canada, August 2004)
    [Winner, Best Paper Award]
    Benjamin C. Lee, Richard Vuduc, James Demmel, and Katherine Yelick.
    PDF (178k) | Gzip'd PostScript (204k)
    Talk slides: PDF (540k)

  • Paper: Toward automatic performance tuning of matrix triple products based on matrix structure
    (PARA'04 Workshop on State-of-the-art in Scientific Computing, Copenhagen, Denmark, June 2004.)
    Eun-Jin Im, Ismail Bustany, Cleve Ashcraft, James Demmel, and Katherine Yelick.

  • Tech Report: Performance Modeling and Analysis of Cache Blocking in Sparse Matrix Vector Multiply
    (UCB/CSD-04-1335, June 2004)
    Rajesh Nishtala, Richard W. Vuduc, James W. Demmel, and Katherine A. Yelick
    PDF (~8MB)

  • Paper: SPARSITY: An Optimization Framework for Sparse Matrix Kernels
    (International Journal of High Performance Computing Applications, 18 (1), pp. 135-158, February 2004)
    Eun-Jin Im, Katherine A. Yelick, and Richard Vuduc.
    PDF (1.1M) | Gzip'd PostScript (1.2M)

  • Paper: Statistical Models for Empirical Search-Based Performance Tuning
    (International Journal of High Performance Computing Applications, 18 (1), pp. 65-94, February 2004)
    Richard Vuduc, James W. Demmel, and Jeff A. Bilmes.
    PDF (950k) | Gzip'd PostScript (983k)

  • 2003

  • Ph.D. Thesis: Automatic Performance Tuning of Sparse Matrix Kernels
    (Computer Science Division, U.C. Berkeley, December 2003)
    Richard Vuduc.
    PDF (7.6M)

  • Tech report: Performance Optimizations and Bounds for Sparse Symmetric Matrix-Multiple Vector Multiply
    (UCB/CSD-03-1297, November 2003)
    Benjamin C. Lee, Richard W. Vuduc, James W. Demmel, Katherine A. Yelick, Michael de Lorimier, and Lijue Zhong.
    PDF (867k) | Gzip'd PostScript (1.3M)

  • Senior Thesis: Effects of Block Size on the Block Lanczos Algorithm
    (Dept. of Mathematics, U.C. Berkeley, June 2003)
    Christopher Hsu
    MS Word (2.2 MB) | PDF (273k)

  • Paper: Memory Hierarchy Optimizations and Performance Bounds for Sparse ATA*x
    (ICCS 2003: Workshop on Parallel Linear Algebra, Melbourne, Australia, June 2003)
    Richard Vuduc, Attila Gyulassy, James W. Demmel, and Katherine A. Yelick.
    Abstract | PDF (328k) | Gzip'd PostScript (91k)
    Talk slides, PDF (735k) | Talk slides, gzip'd PostScript, 4-up (138k)
    Extended version: U.C. Berkeley Technical Report UCB/CS-03-1232

  • 2002

  • Paper: Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply
    (Proceedings of the IEEE/ACM Conference on Supercomputing, 2002, Baltimore, MD, USA, November 2002)
    Richard Vuduc, James W. Demmel, Katherine A. Yelick, Shoaib Kamil, Rajesh Nishtala, and Benjamin Lee.
    Abstract | PDF (834k) | Gzip'd PostScript (2.7M)
    Talk slides, PDF (1.0M) | Talk slides, gzip'd PostScript, 4-up (159k)

  • Paper: Automatic Performance Tuning and Analysis of Sparse Triangular Solve
    (ICS 2002: Workshop on Performance Optimization via High-Level Languages and Libraries, New York, NY, USA, June 2002)
    Richard Vuduc, Shoaib Kamil, Jen Hsu, Rajesh Nishtala, James W. Demmel, and Katherine A. Yelick.
    Abstract | PDF (548k) | PostScript (1.2M)
    Talk Slides, PDF (681k)

  • 2001

  • Paper: Optimizing Sparse Matrix-Vector Multiplication for Register Reuse
    Eun-jin Im and Katherine A. Yelick.
    International Conference on Computational Science, San Francisco, California, May 2001.
    PDF (164k) Gzip'd PostScript (132k)

  • Paper: Statistical Models for Automatic Performance Tuning
    Richard Vuduc, James Demmel, Jeff Bilmes.
    International Conference on Computational Science, San Francisco, CA, USA, May 2001.
    PDF (410k) | Gzip'd PostScript (163k) | Talk Slides, PDF (685k)


 
Writing in preparation:
  • Tech report: Matrix Splitting and Reordering for Sparse Matrix-Vector Multiply
    Hyun Jin Moon, Richard Vuduc, James W. Demmel, Katherine A. Yelick.
    UCB Technical Report, 2003. (in preparation)
    Abstract (DRAFT)

Talks and Posters

(Talks delivered in conjunction with conference papers appear under Publications.)

Year: 2008 | 2005 | 2004 | 2003 | 2002

    2008

  • Talks: Microsoft/Intel Visit to ParLab
    Multiple Authors.
    (Berkeley, California, USA, April 28-29, 2008)
    Link to Talks

  • Talk: Avoiding Communication in Linear Algebra
    James Demmel.
    (SIAM PP 2008, Atlanta, Georgia, USA, March 12-14, 2008)
    Powerpoint (940k) | PDF (1.7MB)

  • Talk: Bandwidth Avoiding Stencil Computations
    Kaushik Datta, Sam Williams, James Demmel, Katherine Yelick.
    (SIAM PP 2008, Atlanta, Georgia, USA, March 12-14, 2008)
    Powerpoint (2.45MB) | PDF (3.05MB)

  • Talk: Fast Implementations of the Akx Kernel
    Marghoob Mohiyuddin, Mark Hoemmen, James Demmel, Katherine Yelick.
    (SIAM PP 2008, Atlanta, Georgia, USA, March 12-14, 2008)
    PDF (9.3MB)

  • Talk: Communication-avoiding Krylov subspace methods
    Mark Hoemmen.
    (SIAM PP 2008, Atlanta, Georgia, USA, March 12-14, 2008)
    PDF (952k)

  • 2005

  • Talk: The Future of Numerical Linear Algebra Libraries: Automatic Tuning of Sparse Matrix Codes and the Next LAPACK and ScaLAPACK
    (SciDAC 2005 Meeting, San Francisco, California, USA, June 2005)
    PowerPoint (305k)

  • Talk: OSKI: An automatically tuned library of sparse matrix kernels
    (SIAM CSE 2005, Orlando, Florida, USA, February 2005)
    Richard Vuduc, James Demmel, Katherine Yelick.
    PowerPoint | PDF

  • 2004

  • Talk: When Cache Blocking Sparse Matrix Multiply Works and Why
    (PARA'04 Workshop on State-of-the-art in Scientific Computing, Copenhagen, Denmark, June 2004)
    Rajesh Nishtala, Richard Vuduc, James Demmel, Katherine Yelick.
    PDF (147k) | Gzip'd PostScript (138k) | Talk Slides: MS PPT (3MB)

  • Talk: Adaptable benchmarks for register blocked sparse matrix-vector multiplication, Mark Hoemmen, Matrix Computations Seminar at U.C. Berkeley, May 5, 2004.
    PowerPoint (420k) | PDF (420k) | Link to Software

  • Talk: Performance Optimizations and Bounds for Symmetric Sparse Matrix-Multiple Vector Multiply
    SIAM Parallel Processing Meeting, San Francisco, California, USA, March 2004.
    PowerPoint (319k) | PDF (331k)

  • Poster: A Computationally Efficient Triple Matrix Product for a Class of Sparse Schur-Complement Matrices
    SIAM Parallel Processing Meeting, San Francisco, California, USA, March 2004.
    PowerPoint (678k) | PDF (177k) | PNG (504k) | GIF (436k) | JPEG (985k)

  • Poster: Automatic Tuning of Collective Communications in MPI
    An overview of the Probabilistic Algorithm Selection System (PASS) for finding and choosing the best implementation of a given MPI collective operation on a given hardware platform.
    PowerPoint (5.3M) | PDF (118k)
    Full report: PDF (1.9 MB)

  • 2003

  • Poster: Automatic Performance Tuning of Sparse Matrix Kernels
    This poster, shown at the Berkeley-Stanford CS Day (2002, 2003), Bay Area Scientific Computing Day (2002), and the SIAM CSE 2003 Meeting, summarizes our overall approach to tuning a number of sparse matrix kernels, including matrix-vector multiply (non-symmetric and symmetric), triangular solve, and multiplication by ATA.
    PowerPoint (1.7 MB) | PDF (617k) | PNG (520 kB) | GIF (715 kB) | JPEG (1.5 MB)

  • Talk: Automatic Performance Tuning of Sparse Matrix Kernels
    Presentation of recent work to Intel, with a discussion of ideas related to acceleration of the Google PageRank algorithm. (24 Jan 2003)
    PowerPoint (806 kB) | PDF (830k)
    Google matrix pics, PowerPoint (142 kB) | PDF (139k)

  • 2002

  • Talk: Automatic Performance Tuning of Linear Algebra Kernels
    This presentation, given at the TOPS-SciDAC kick-off meeting (25 Jan 2002), describes recent results performance tuning of sparse linear algebra kernels.
    PowerPoint (1.5 MB) | PDF (2.7 MB)

Related Projects

Current Local Activities

Prior Projects

  • XBLAS: Extended and Mixed-Precision BLAS
  • Sparsity: A Toolkit for Optimizing Sparse Matrix-Vector Multiply
  • IRAM: Intelligent RAM Project
  • Titanium: High-Performance Java Compiler
  • PHiPAC: An Automatic Tuning System for Matrix Multiply

External Collaborations


This research was supported in part by the National Science Foundation under NSF Cooperative Agreement No. ACI-9813362, NSF Cooperative Agreement No. ACI-9619020, the Department of Energy under DOE Grant No. DE-FC02-01ER25478, and a gift from Intel. The information presented here does not necessarily reflect the position or the policy of the Government and no official endorsement should be inferred.
 
Standard disclaimer: Publications are presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.