LFMat

About

Benchmarks

News

Tutorial

Download

Links

Hosted on

Designed with

Best viewed with

Contact:


	Goals

The first goal of LFMat is to furnish convenient matrix tools for the finite element methods.

Actually, there's a lot of libraries for linear algebra on the net, but it seems that it's still hard to find flexible and high performance free software for the required procedures (genericity, speed, adaptated storage, ...).

LFMat is a generic purpose, fully templated open source C++ matrix library. Particular attention has been furnished to get convenient storage for SIMD instructions like 3Dnow! and SSE2 on x86 processors and Altivec on PowerPC ones. It means that there's specializations for severals important types like float or double in order to get the deserving performances. Furthermore, important routines make careful use of cache, leading -- as example -- to solvers up to 8 times faster than standard lapack ones in the same situation (see benchmarks).

Matrices can contain any kind of data (double, float, symbolic expressions, ...) and user can choose orientation, storage style and structure (see tutorial). Furthermore, matrices can be of fixed size (known at compilation time), allowing compilers to make additional optimizations.


	SIMD instructions

These instructions work far better with data aligned in memory. If it is not the case, the CPU wastes time to retrieve data.

Table 1.1. Indexes in a symmetric float row oriented dense matrix, stored using the lower part with no alignment and with an alignment of 4 values.

Alignment value = 1

Alignment value = 4

    0  .  .  .  .  .  ...
    1  2  .  .  .  .  ...
    3  4  5  .  .  .  ...
    6  7  8  9  .  .  ...
    10 11 12 13 14 .  ...
    15 16 17 18 19 20 ...
    .. .. .. .. .. .. ...

    0  .  .  .  .  .  ...
    4  5  .  .  .  .  ...
    8  9  10 .  .  .  ...
    12 13 14 15 .  .  ...
    16 17 18 19 20 .  ...
    24 25 26 27 28 29 ...
    .. .. .. .. .. .. ...

Thus, for a classical symmetric matrix, lines should be aligned with multiples of the size of vectors used in SIMD instruction. Table 1.1 shows the indexes in a symmetric dense matrix, with no alignment and with an alignment of 4 values.

Thus, scalar products between lines and other lines or vector become truly faster. This increases performance of a lot of procedures because this kind of scalar product is in the center of a lot of ones.


	Features

For now, storage styles can be:

dense (n*m elements for a rectangular matrix, n*(n+1)/2 for a square symmetric matrix),
dense uncompressed (n*n for a symmetric matrix),
sky line (user gives the beginning and/or the end of each lines),
sparse, row or column compressed,
band.

Structures can be:

generic (no particular properties),
diagonal.
symmetric,
antisymmetric,
hermitian,
triangular, upper or lower,

The number of reserved elements depends on both storage and structure.

Furthermore, matrices can be:

row oriented,
column oriented,
diagonal oriented (still in progress).

Some useful procedures have been coded for different kind of matrices:

solvers (cholesky, ... see Table 1.2 ),
operators (*, ... see Table 1.3),
eigen values finders.
converter between different kind of matrices

All these procedures have been designed to be fast, using cache and SIMD instruction where possible.

Table 1.2. Supported solvers. "R" and "C" means that ro and column oriented are supported. "O" means that the procedure is optimized for good use of the cache memory.

Storage style	Generic square	Diagonal	Symmetric	Symmetric positive	Triangular lower	Triangular upper
Dense	RO	RC	R	RO	R	RC
Skyline	RO	RC	R	RO	R	RC
Sparse

Table 1.3. Supported operators. * means matrix(row oriented)*vector.

Storage style	Generic square	Diagonal	Symmetric	Triangular lower	Triangular upper
Dense	*	*	*	*	*
Skyline	*		*	*	*
Sparse	*		*	*	*


	Todo

Prefetching.
Better test for altivec in configure.in.
Direct solve of sparse matrices (without conversion to skyline ones) -- algorithm already done but not tested.
User should be able to choose how to find pivots in LU factorization.
add_vec_mul subroutine with SSE, SSE2 and Altivec (only 3Dnow! for now).
Better algorithms to find eigen values.
Better use of fixed size matrices.
Benchmarks with SSE2, SSE, Altivec.