format.h File Reference


Detailed Description

Modified block compressed sparse row data structure.

For an overview, see Modified Block Compressed Sparse Row (MBCSR) Format.

#include <oski/common.h>
#include <oski/matrix.h>
#include <oski/BCSR/format.h>

Go to the source code of this file.

Data Structures

struct  tagOski_submatMBCSR_t
 Stores an MBCSR "submatrix". More...
struct  tagBebop_matMBCSR_t
 Modified block compressed sparse row (MBCSR) format. More...
#define INC_OSKI_MBCSR_FORMAT_H
 oski/MBCSR/format.h included.

Name mangling.

#define oski_submatMBCSR_t   MANGLE_(oski_submatMBCSR_t)
 MBCSR matrix representation.
#define oski_matMBCSR_t   MANGLE_(oski_matMBCSR_t)
 MBCSR matrix representation.
#define oski_MBCSR_MatMult_funcpt   MANGLE_(oski_MBCSR_MatMult_funcpt)
 MBCSR matrix representation.

Typedefs

typedef tagOski_submatMBCSR_t oski_submatMBCSR_t
 Stores an MBCSR "submatrix".
typedef tagBebop_matMBCSR_t oski_matMBCSR_t
 Modified block compressed sparse row (MBCSR) format.


Typedef Documentation

typedef struct tagBebop_matMBCSR_t oski_matMBCSR_t
 

Modified block compressed sparse row (MBCSR) format.

An $m\times n$ matrix $A$ stored in $r\times c$ MBCSR format is logically partitioned row-wise into 3 submatrices, $A = \left(\begin{array} {}A_1 \\ A_2 \\ A_3\end{array}\right)$, where $A$ is stored in $r\times c$ MBCSR format. This partitioning is a `canonical' format in which $A_1$, $A_2$, and $A_3$ have the following properties:

  • $A_1$ consists of all block rows and diagonal blocks of size $r\times c$ and $r\times r$, respectively.
  • $A_2$ contains the degenerate diagonal block of size $r'\times r'$ where $r' < r$, and the corresponding block row of size $r'$ that contains it.
  • $A_3$ contains no diagonal blocks (i.e., is simply stored in BCSR format).

The purpose of this format is to enable fast implementations of the triangular solve and $A^TA\cdot x$ kernels in which the `special case' code to handle degenerate diagonal block, if any, is stored separately in $A_2$, and multiplication by $A_1, A_2,$ and $A_3$ can execute at `full' speed. The canonical form provides a uniform way to treat both square and rectangular matrices stored in MBCSR format.

typedef struct tagOski_submatMBCSR_t oski_submatMBCSR_t
 

Stores an MBCSR "submatrix".

An $m\times n$ matrix $B$ stored in $r\times c$ MBCSR submatrix format must satisfy the following conditions:

  • $r$ divides $m$
  • $m \leq n$

Let $M = \frac{m}{r}$ be the number of block rows. Then $B$ is stored in 4 arrays, $(P, J, V, D)$, as follows:

  • $D$ is an array of length $Mr^2$ containing all the elements of $B$ that lie within the $M$ diagonal blocks beginning at element $B(1, \delta)$ where each block is a square $r\times r$ block. The blocks are stored consecutively, and each block is stored in row-major order. More precisely, for all $0 \leq I_0 < M$, $0 \leq i < r$, and $0 \leq j < r$, the array element $D[I_0\cdot r^2 + i\cdot r + j]$ stores the matrix element $B(1+I_0\cdot r + i, \delta+I_0\cdot r + j)$.
  • $P$ is an integer array of block-row pointers, of length at least $M+1$.
  • $J$ is an integer array of block column indices, of length at least $P[M]$.
  • $V$ is an array of non-zero block values, of length at least $P[M]\cdot r\cdot c$.
  • Define block-row $I$, where $0 \leq I < M$, as rows $1+(I-1)\cdot r$ through $I\cdot r$ of $B$. Then For each $k$ such that $P[I-1] \leq k \leq P[I]$, $j_0=J[k]+1$ is the column index of the $(1,1)$ entry of an explicitly stored non-zero block whose values are laid out in row-major order in the subarray $V[krc : (k+1)rc-1]$.
  • If any block in $V$ overlaps with a diagonal block in $D$, then the corresponding entries in $V$ are set to 0.
  • No block in $V$ is defined so that it extends beyond column $n$ of $B$. That is, if $c$ does not divide $n$, then each block row may contain one block with $j_0 = n - c + 1$. If such a block overlaps with another block starting at column index $j_0'$, then the initial columns of the block at $j_0$ are set to zero.


Generated on Wed Sep 19 16:41:22 2007 for BeBOP Optimized Sparse Kernel Interface Library by  doxygen 1.4.6