Future work

This page is for discussion of projects or ideas that might be incorporated into the Sparse Matrix Converter at some time in the future. Of course, they might not be, so don't get too hopeful :) We encourage submissions of ideas -- please use the e-mail list for that.

Design philosophy
Shorter-term goals
Medium-term ideas
Longer-term ideas

Design philosophy

Dynamic environment

One of my goals was to keep things as dynamic as possible, because I foresee linking the SMC to an interactive environment like Matlab or some scripting / rapid prototyping language (hence the Common Lisp interface, as Lisp is excellent for rapid prototyping). Scientific computing is moving towards a more interactive user environment, even in the parallel realm, so I think it's helpful to provide an infrastructure that enables dynamic use.

One example of the SMC's dynamic design is that multiple matrices with different element data types (real, complex or pattern) can coexist simultaneously, without recompiling the library. Incorporation into the extensive OSKI library will allow new matrix formats to be added flexibly, again without recompiling, and will offer the option to tune the matrix datatype for maximum performance. Rich's OSKI library is an amazing piece of work, and I'm glad to have his support on this project.

Shorter-term goals

Putting header files in their own directory namespace, both to avoid collisions with other projects' header files and to allow simpler header file names. Thus, we can write
```
#include <bebop/util/malloc.h>
```
rather than
```
#include <smvm_malloc.h>
```
I finished implementing this change on 12 Nov 2007.
Separating low-level conversion / parsing code from high-level representation (i.e., anything that requires a header file, or struct or enum definitions). This would allow other library authors to use our converters, without requiring them to commit to our structs, enums, or other particulars of our higher-level representations. It would also ease the transition to automatically generated converters and parsers, in the style of the Bernoulli Sparse Compiler Toolkit.

Medium-term ideas

Rewriting the Harwell-Boeing sparse matrix file parser. I discuss the relevant issues on this page. The parsers for the other supported file formats should also be made more fault-tolerant. I have a prototype in Lisp, but getting it to work in C and be robust and fault-tolerant is nontrivial.
Integration with OSKI. There are some design differences to work out. We'll start by extracting the low-level kernels (e.g. conversion routines) and work from there to a common higher-level interface.
Round out the type converters: eventually it should be possible to convert from any type to any other type (allowing for the possibility of going through an intermediate type like CSR).
Automatic file format detection: It should be pretty easy to guess whether a file is in Matrix Market or Harwell-Boeing format, just from looking at the first line.
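As a rough illustration of the automatic file format detection idea, here is a minimal sketch in C. The enum and function names are hypothetical and not part of the SMC. It relies on the fact that Matrix Market files begin with the banner `%%MatrixMarket`, while a Harwell-Boeing header line is a fixed-width record (a 72-character title followed by an 8-character key). The Harwell-Boeing test is only a weak heuristic, so a real implementation would want to confirm the guess by parsing further header lines.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names for illustration; not part of the SMC API. */
enum file_format_guess {
  GUESS_UNKNOWN = 0,
  GUESS_MATRIX_MARKET,
  GUESS_HARWELL_BOEING
};

/* Guess the file format from the first line of the file.
 * The line may or may not end with a newline. */
enum file_format_guess
guess_file_format (const char* first_line)
{
  static const char banner[] = "%%MatrixMarket";
  size_t len;

  assert (first_line != NULL);

  /* Matrix Market files start with the "%%MatrixMarket" banner. */
  if (strncmp (first_line, banner, sizeof (banner) - 1) == 0)
    return GUESS_MATRIX_MARKET;

  /* Length of the line, not counting any trailing newline. */
  len = strcspn (first_line, "\r\n");

  /* Assumption (weak heuristic): a Harwell-Boeing first line is a
   * nonempty title + key record of at most 80 characters, and it
   * does not look like a '%'-style comment line. */
  if (len > 0 && len <= 80 && first_line[0] != '%')
    return GUESS_HARWELL_BOEING;

  return GUESS_UNKNOWN;
}
```

A caller would read the first line with `fgets`, pass it to `guess_file_format`, and fall back to asking the user when the result is `GUESS_UNKNOWN`.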

Longer-term ideas

Format conversion reasoning: Construct a graph of the available type conversions, so that if a user specifies any two formats, the library can convert from one to the other by following the appropriate available paths (e.g. BCSR to CSR to JAD). This would be part of a matrix format management system, part of which is already in place in OSKI.
Out-of-core conversion routines: In most cases, format conversion requires making a copy of the matrix. For large matrices, the resulting memory requirements may be prohibitive. It would be helpful if, in those cases, the library were smart enough to use an out-of-core conversion algorithm. I imagine the user would have to specify a place for temp files, as heavy usage of /tmp can be harmful on some systems, and using NFS-mounted home directories could also be a bad idea. I like this idea better for an interactive system, where the library could notify users that the matrix is too large and ask whether to try an out-of-core algorithm or give up. For big batch jobs, you might want static control of that decision, because I think the out-of-core algorithms will be slow (running at roughly disk bandwidth rather than memory bandwidth). But it's still helpful to have a failsafe so that the code won't crash and waste the job.
Parallel matrix data structures: The BeBOP group wants to move towards applying tuning techniques to distributed matrices as a whole (rather than to the local components only). In that case, we would need robust format conversion routines that could interact with parallel file formats such as HDF. However, I'm not sure what the user demand is for this sort of thing. If users usually generate matrices on-the-fly and don't save them to disk, then it's probably not worthwhile to support complicated parallel formats. Benchmarkers can afford to do simple conversions offline.
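The format conversion reasoning idea above can be sketched as a breadth-first search over a graph whose nodes are formats and whose edges are the available direct converters. The following is a minimal sketch, not the SMC's or OSKI's actual design: the `enum fmt` identifiers, the `direct` table, and `find_conversion_path` are all hypothetical, and the table of available conversions is made up for illustration.

```c
#include <assert.h>

/* Hypothetical format identifiers; not the SMC's actual enum. */
enum fmt { FMT_COO = 0, FMT_CSR, FMT_CSC, FMT_BCSR, FMT_JAD, NUM_FMTS };

/* direct[a][b] != 0 means we assume a direct converter from a to b.
 * This table is illustrative only. */
static const int direct[NUM_FMTS][NUM_FMTS] = {
  /* to:       COO CSR CSC BCSR JAD */
  /* COO  */ {  0,  1,  1,  0,   0 },
  /* CSR  */ {  1,  0,  1,  1,   1 },
  /* CSC  */ {  1,  1,  0,  0,   0 },
  /* BCSR */ {  0,  1,  0,  0,   0 },
  /* JAD  */ {  0,  1,  0,  0,   0 }
};

/* Breadth-first search for a shortest chain of conversions.
 * Writes the formats from src to dst (inclusive) into path[] and
 * returns how many there are, or 0 if dst is unreachable. */
int
find_conversion_path (enum fmt src, enum fmt dst, enum fmt path[NUM_FMTS])
{
  int prev[NUM_FMTS];        /* predecessor of each visited format */
  enum fmt queue[NUM_FMTS];  /* each format is enqueued at most once */
  int head = 0, tail = 0, i, len;
  enum fmt cur;

  assert ((int) src >= 0 && (int) src < NUM_FMTS);
  assert ((int) dst >= 0 && (int) dst < NUM_FMTS);

  for (i = 0; i < NUM_FMTS; i++)
    prev[i] = -1;            /* -1 means "not visited yet" */

  prev[src] = (int) src;     /* the source is its own predecessor */
  queue[tail++] = src;
  while (head < tail && prev[dst] < 0)
    {
      cur = queue[head++];
      for (i = 0; i < NUM_FMTS; i++)
        if (direct[cur][i] && prev[i] < 0)
          {
            prev[i] = (int) cur;
            queue[tail++] = (enum fmt) i;
          }
    }
  if (prev[dst] < 0)
    return 0;                /* no chain of converters reaches dst */

  /* Count the formats on the path, then fill path[] back to front. */
  len = 1;
  for (cur = dst; cur != src; cur = (enum fmt) prev[cur])
    len++;
  cur = dst;
  for (i = len - 1; i >= 0; i--)
    {
      path[i] = cur;
      cur = (enum fmt) prev[cur];
    }
  return len;
}
```

With this illustrative table, asking for a path from `FMT_BCSR` to `FMT_JAD` yields the three-format chain BCSR, CSR, JAD, matching the example in the text. A real system might weight edges by conversion cost and use a shortest-path search instead of plain BFS.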