Authors: Niv Drory, Claus A. Gössl, Jan Snigula

Copying

The Little Template Library is available under the GNU General Public License. See the file `COPYING' for details.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

Introduction

LTL provides dynamic arrays of up to 7 dimensions, subarrays and slicing, support for fixed size vectors and matrices including basic linear algebra operations, expression templates based evaluation, and I/O facilities for columnar ASCII and FITS format files.

Although the library is developed with application to astronomical image and data processing in mind (therefore FITS format I/O), it is by no means restricted to these fields of application. In fact, it qualifies as a fully general array processing package.

The library emphasizes a high level of abstraction for expressions involving arrays or parts of arrays and for linear algebra operations, without the performance penalty that such abstraction usually carries. To this end, expression templates are used throughout the library. The price to pay is dependence on a compiler that implements enough of the current ANSI C++ specification, as well as significantly higher resource demands at compile time.
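To illustrate the general idea behind expression templates, here is a minimal self-contained sketch. It is not LTL's implementation: the names Expr, Sum, Vec and demo_sum_at are hypothetical and exist only for this example. The sketch shows how operator+ can return a lightweight node instead of a computed array, so that the whole expression is evaluated in a single loop at assignment time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: every operand (array or expression node) derives from Expr<E>.
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Node for the element-wise sum of two operands. It stores references only;
// nothing is computed until an element is requested.
template <class L, class R>
struct Sum : Expr<Sum<L, R> > {
    const L& l;
    const R& r;
    Sum(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

// Hypothetical one-dimensional array type.
struct Vec : Expr<Vec> {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assignment from any expression: one loop, no temporary arrays.
    template <class E>
    Vec& operator=(const Expr<E>& e) {
        const E& x = e.self();
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = x[i];
        return *this;
    }
};

// operator+ builds a Sum node; it does not add anything yet.
template <class L, class R>
Sum<L, R> operator+(const Expr<L>& a, const Expr<R>& b) {
    return Sum<L, R>(a.self(), b.self());
}

// Usage: a + b + c has type Sum<Sum<Vec, Vec>, Vec> and is evaluated
// element by element inside Vec::operator=.
double demo_sum_at(std::size_t i) {
    Vec a(4, 1.0), b(4, 2.0), c(4, 3.0), out(4);
    out = a + b + c;
    assert(out.size() == 4);
    return out[i];
}
```

The nested node type is what makes compile times and compiler requirements grow: every distinct expression produces its own template instantiation.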

Version 2.0 adds support for convolutions in expression templates (including finite-differencing kernels), an interface to the FFTW library, and a simple method to plot directly into gnuplot. The types complex<T> are now fully supported, including all functions defined in the C++ standard header <complex>. Vector and tensor fields (implemented as MArray<FVector<>> and MArray<FMatrix<>>) are now also supported in expression templates. Some convolutions, e.g. the gradient, return such fields. Internally, the expression template engine has been simplified: all expression operands are now derived from a single base type, which simplifies the code significantly. The FITS interface has been extended to support FITS extensions and tables.

Version 1.9 adds an interface to call BLAS and LAPACK routines on MArrays of dimension 2 and 1, as well as LU and Singular Value Decompositions on fixed-size FMatrix and FVector classes.

Version 1.8 adds support for multithreaded environments, for example use of OpenMP, and a much more user-friendly interface to non-linear least-squares fitting of functions.

A new feature as of LTL 1.7 is the automatic use of the vector execution unit present in some modern processors in the evaluation of array expressions. This is basically a compiler independent auto-vectorizer. Currently PowerPC processors (Altivec) and x86 processors (MMX/SSE/SSE2/SSE3) are supported.

LTL is known to compile on GCC versions 2.95.2 and above (Linux/x86/x86_64, Mac OS X/PPC/x86/x86_64, Solaris/SPARC), Intel ICC version 7.1 and above (Linux/x86/x86_64), Sun C++ version 5.5 and above (Solaris/SPARC), IBM xlC version 6 and above (AIX/PPC, Mac OS X/PPC), and LLVM/Clang. Other sufficiently ANSI-compliant C++ compilers should also work, but might require small modifications. Patches are welcome. Old compiler versions do not receive the same amount of testing as more recent versions, so support for some old compiler/architecture combinations may have been lost. Most development and testing is done on Linux and Mac OS X using g++ 4.x, LLVM, and Intel ICC.

The multidimensional array class ltl::MArray has the following features:

creating and referencing subarrays (rank preserving)
slicing (rank reducing), e.g. a column of an image
mixing subarrays and slices in the same indexing expression e.g. a submatrix of a slice of a cube
referencing the data of other arrays ('views')
reference counting for the memory chunks holding the actual data
STL-compatible iterators
arbitrarily complex arithmetic expressions without creation of temporary objects: 'expression templates'
support for all standard library math functions
support for complex<float> and complex<double>
support for FVector and FMatrix element types: tensor fields
user supplied functions can be added easily
auto-vectorization of expressions (Altivec, SSE2/3/4)
indexing arbitrary sets of elements: where()
conditional expression evaluation: merge( A!=0, 1/A, 0 )
applying arbitrary functors to expressions: apply(f, Expr)
statistical and logical reductions: sum, average, variance, kappa-sigma-clipping, ..., allof, anyof, noneof
full reductions (scalar valued) and partial reductions (reducing rank by one)
stream I/O
ASCII-file I/O for columnar data
FITS file I/O including extensions and tables
transparent memory-mapped arrays
threadsafe reference-counting memory management
BLAS and LAPACK interface for 1D and 2D arrays
FFTW interface
support for convolution operators in expression templates, e.g. gradient, Laplacian, ...
gnuplot interface
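As an illustration of the conditional evaluation item in the list above, the following sketch spells out what an expression such as merge( A!=0, 1/A, 0 ) computes element by element. It assumes that merge selects its second argument where the condition holds and its third argument elsewhere. It uses a plain std::vector rather than ltl::MArray, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise select: out[i] = 1/a[i] where a[i] != 0, and 0 elsewhere.
// Plain-loop stand-in for the semantics only; not LTL's implementation.
std::vector<double> reciprocal_or_zero(const std::vector<double>& a) {
    std::vector<double> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = (a[i] != 0.0) ? 1.0 / a[i] : 0.0;
    return out;
}

// Usage example with one zero element.
void example_reciprocal_or_zero() {
    std::vector<double> a;
    a.push_back(2.0);
    a.push_back(0.0);
    a.push_back(4.0);
    std::vector<double> r = reciprocal_or_zero(a);
    assert(r[0] == 0.5);
    assert(r[1] == 0.0);   // the zero element is replaced, not divided
    assert(r[2] == 0.25);
}
```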

The fixed vector and matrix classes ltl::FVector and ltl::FMatrix provide:

compile time fixed size (allows strong optimization)
expression template based evaluation of arithmetic and linear algebra expressions
referencing column and row vectors of a matrix
vector dot product
matrix-vector and matrix-matrix dot-product
all operations on small enough objects are automatically unrolled by template metaprograms
STL-compatible iterators
Gauss-Jordan elimination
LU decomposition
Singular value decomposition
Linear least-squares fitting method(s)
Nonlinear least-squares fitting method(s)
gnuplot interface
Full support for complex<T>
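The benefit of a compile-time fixed size can be seen in a small sketch of loop unrolling by template metaprogramming. This is a generic illustration of the technique, not LTL's code; FixedVec, DotImpl and dot are hypothetical names. Because N is a template parameter, the recursion below is resolved entirely by the compiler and expands into a straight-line sum of products.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size vector; N is known at compile time.
template <class T, std::size_t N>
struct FixedVec {
    T v[N];
    T operator[](std::size_t i) const { return v[i]; }
};

// Recursive template: step I contributes a[I]*b[I] and delegates to step I+1.
template <class T, std::size_t N, std::size_t I>
struct DotImpl {
    static T eval(const FixedVec<T, N>& a, const FixedVec<T, N>& b) {
        return a[I] * b[I] + DotImpl<T, N, I + 1>::eval(a, b);
    }
};

// Termination at I == N: contributes the additive identity.
template <class T, std::size_t N>
struct DotImpl<T, N, N> {
    static T eval(const FixedVec<T, N>&, const FixedVec<T, N>&) { return T(); }
};

// Dot product without a run-time loop.
template <class T, std::size_t N>
T dot(const FixedVec<T, N>& a, const FixedVec<T, N>& b) {
    return DotImpl<T, N, 0>::eval(a, b);
}

// Usage example: 1*4 + 2*5 + 3*6.
void example_dot() {
    FixedVec<double, 3> a = {{1.0, 2.0, 3.0}};
    FixedVec<double, 3> b = {{4.0, 5.0, 6.0}};
    assert(dot(a, b) == 32.0);
}
```

A limit such as LTL_TEMPLATE_LOOP_LIMIT exists because this kind of expansion grows the generated code with N.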

Additionally, there are some utility classes providing:

command line and config file parsing
output formatting utilities
date format handling

The official homepage is at http://www.mpe.mpg.de/~drory/ltl/index.html .

The authors' email addresses are:

Niv Drory < drory at mpe dot mpg dot de >
Claus A. Goessl < cag at usm dot uni-muenchen dot de >
Jan Snigula < snigula at usm dot uni-muenchen dot de >

Comments, suggestions, and code contributions are welcome anytime.

Documentation

The documentation is divided into two parts. The first part gives an overview of the library and its usage and contains many examples. It is a good starting point for getting to know LTL.

The second part, the Doxygen-generated documentation for the classes and functions, serves as a reference, for example for ltl::MArray, ltl::FVector, and ltl::FMatrix. You can access all of this through the modules section of the documentation.

Compiling and Installing LTL

Compiling LTL

Make sure your compiler is supported. Currently GCC versions 2.95.2 and newer are known to work. GCC version 3.1 or later is recommended, since it is much faster, uses less memory and produces more optimized code when compiling heavily templated source. See http://gcc.gnu.org .

LTL's build system is based on GNU autoconf. After unpacking the source archive, go to the directory containing the source and type ./configure, then make, optionally make check to run the test suite, and finally make install.

The configure script will check for the presence of some standard headers and POSIX functions and verify that the compiler used has all necessary features needed to compile LTL.

The configure script accepts a number of standardized options; run ./configure --help for a complete list. The most important options and environment variables are discussed below.

The configure system honors the setting of the CXX, CXXFLAGS, and CPPFLAGS environment variables. The directory-prefix for installing header files and the library defaults to /usr/local. Headers are installed in /usr/local/include, the library, libltl.a, in /usr/local/lib. Info documentation goes to /usr/local/info.

Note that the LTL header files will actually be installed in a subdirectory ltl/ within the include directory to prevent conflicts with existing header files.

The prefix can be changed by the --prefix= option to configure. --libdir=, --includedir=, and --infodir= can be used to change the target directories individually.

If you are planning on using LTL in a multithreaded environment (e.g. using OpenMP), supply the flag --enable-multithread to configure. This will ensure that reference counting memory allocations/deallocations are guarded by locks.

The configure script will try to auto-detect BLAS and LAPACK libraries on your system. If this fails or you wish to force the use of a specific implementation, use the options --with-blas[=LIB] and --with-lapack[=LIB].

A typical session might look as follows:

tar -zxf ltl-2.0.tar.gz
cd ltl-2.0/
./configure --prefix=$HOME/ 
...
make
make check
make install

The configure script will try to find sensible defaults for the system's C++ compiler. You can override the defaults by setting the CXX and CXXFLAGS environment variables. To compile with a non-default compiler or different compiler flags, replace the invocation of the configure script above by

./configure --prefix=$HOME/ CXXFLAGS='-g -O3 -Wall -march=... -fstrict-aliasing'

or

./configure --prefix=$HOME/ CXX=/opt/clang+llvm-2.8/bin/clang++ CXXFLAGS='-g -O3 -Wall'

Using LTL

Using LTL involves including one or more header files, depending on the features of LTL that are needed in your source file and linking against libltl.

To keep compile times as low as possible, the LTL headers are split into parts providing distinct functionality.

Using Multidimensional Arrays

To use the multidimensional array class, its expression template facilities and standard math operations, use

#include <ltl/marray.h> // class MArray

To use the I/O facilities for the MArray class, include one or more of the following after including ltl/marray.h:

#include <ltl/marray_io.h>   // stream IO, operator<< and operator>>
#include <ltl/ascio.h>       // columnar ASCII file IO
#include <ltl/fitsio.h>      // FITS format IO

To use statistical functions on MArrays and their expressions, include

#include <ltl/statistics.h> // sum, average, median, ...

For support for convolution operations with small kernels in MArray expressions, include

#include <ltl/convolve.h>

To call BLAS and LAPACK routines on MArrays include

#include <ltl/marray/blas.h> // BLAS Level 1, 2, and 3 routines

and/or

#include <ltl/marray/lapack.h> // Selected LAPACK routines (easily extensible).

For interfacing the FFTW3 library, include

#include <ltl/fftw.h> // Simple interface to FFTW3 routines

For a simple way of plotting data from your program, either for debugging purposes or for more serious visualization, there is a stream-based interface to executing and communicating with gnuplot:

#include <ltl/util/gnuplot.h> // Stream-based interface to gnuplot

Using Fixed Vectors and Matrices

To use the fixed vector class, its expression template facilities and standard math operations, use

#include <ltl/fvector.h> // class FVector

To use the fixed matrix class, its expression template facilities and standard math operations, use

#include <ltl/fmatrix.h> // class FMatrix

For some linear algebra with FMatrix and FVector use

#include <ltl/fmatrix/gaussj.h>    // Gauss-Jordan
#include <ltl/fmatrix/lusolve.h>   // LU decomposition
#include <ltl/fmatrix/svdsolve.h>  // SVD
#include <ltl/fmatrix/lmfit.h>     // non-linear least-squares fitting

Note that including <ltl/fmatrix.h> automatically includes <ltl/fvector.h>, since these represent column and row vectors of matrices.

Namespaces

All LTL objects reside in the namespace ltl; objects in the util library reside in the namespace util.

Global Configuration

Some global options can be set in the file <ltl/config.h> by defining or undefining preprocessor symbols. These options can also be controlled on a case by case basis by defining or undefining the appropriate symbol before including the first LTL header.

Also, include the LTL headers before any standard library headers so that the configuration options are honored by the standard library as well.

For example, if you want range checking (for debugging) turned on in a particular source file of your project, write

// MUST occur BEFORE including any ltl stuff!
#define LTL_RANGE_CHECKING
#include <ltl/marray.h>
...
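Why the symbol has to be defined before the first include can be seen from a small stand-alone sketch. It does not use LTL: DEMO_RANGE_CHECKING and CheckedArray are hypothetical stand-ins for a configuration symbol and a header-defined class. The check is compiled into the inline accessor only if the symbol is already defined when the class definition is read by the preprocessor.

```cpp
// Hypothetical configuration symbol; it must precede the class definition below,
// just as a real option must precede the header that tests it.
#define DEMO_RANGE_CHECKING

#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// --- the part that would normally live in a header ---
class CheckedArray {
public:
    explicit CheckedArray(std::size_t n) : data_(n) {}

    double& operator()(std::size_t i) {
#ifdef DEMO_RANGE_CHECKING
        // Compiled in only when the symbol was defined before this point.
        if (i >= data_.size()) throw std::out_of_range("index out of range");
#endif
        return data_[i];
    }

private:
    std::vector<double> data_;
};

// --- user code ---
bool throws_on_bad_index() {
    CheckedArray a(3);
    a(1) = 2.5;                      // valid index, no exception
    assert(a(1) == 2.5);
    try {
        a(5) = 1.0;                  // out of range: caught by the check
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

Defining the symbol after the class definition would leave the check out silently, which is why the order of the lines matters.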

The available options are:

LTL_RANGE_CHECKING: perform range checking when indexing. Off by default.
LTL_ABORT_ON_RANGE_ERROR: abort and coredump if a range error occurs. This is the default if range checking is on.
LTL_THROW_ON_RANGE_ERROR: throw an exception if a range error occurs.
LTL_THREADSAFE_MEMBLOCK: Define this if you are using multiple threads or if you are compiling using OpenMP. This will guard memory block reference counting by locks. This option can also be set at configure time using the option --enable-openmp. Locks are implemented as machine intrinsics if available (GCC) or as POSIX mutexes if not (much slower). You have to make sure that you supply the necessary -march flag to GCC to ensure that the machine intrinsics are defined (and exist for the CPU you are compiling for).
LTL_TEMPLATE_LOOP_LIMIT: maximum length of loops which are fully unrolled by expression template engine for ltl::FVector and ltl::FMatrix expressions. Defaults to 25.
LTL_USE_SIMD: use vector execution unit when evaluating array expressions (auto-vectorize loops on supported hardware and if all operands in the expression have vectorized implementations). Currently, many array operations and reductions have vectorized implementations for double (on x86), float, and int element data. Shorter int types may or may not be supported.
LTL_UNROLL_EXPRESSIONS (default) or LTL_DONT_UNROLL_EXPRESSIONS: Controls whether loops are unrolled 4 times or not. Sometimes unrolling can yield worse performance on register-poor architectures such as x86/ia32.
LTL_UNROLL_EXPRESSIONS_SIMD (default) or LTL_DONT_UNROLL_EXPRESSIONS_SIMD: Controls whether vectorized loops are unrolled 4 times. Sometimes unrolling can yield worse performance on register-poor architectures such as x86/ia32.
LTL_DEBUG_EXPRESSIONS: Print information about the evaluation method chosen for expressions. Among other things, this indicates whether the multidimensional loops were collapsed and whether the expression was vectorized.
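The idea behind guarded reference counting, as described for LTL_THREADSAFE_MEMBLOCK, can be sketched as follows. This is an assumption-laden illustration, not LTL's memory block class: RefCountedBlock is a hypothetical name, and the sketch uses std::atomic from C++11 in place of compiler intrinsics or POSIX mutexes. The point is that incrementing and decrementing the counter must be indivisible operations when several threads hold views of the same data.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical reference-counted memory block shared by several array views.
class RefCountedBlock {
public:
    explicit RefCountedBlock(std::size_t n)
        : data_(new double[n]()), refs_(1) {}
    ~RefCountedBlock() { delete[] data_; }

    RefCountedBlock(const RefCountedBlock&) = delete;
    RefCountedBlock& operator=(const RefCountedBlock&) = delete;

    // A new view starts sharing the block.
    void add_ref() { refs_.fetch_add(1, std::memory_order_relaxed); }

    // A view goes away. Returns true if it was the last one, in which case
    // the caller is responsible for destroying the block.
    bool release() {
        return refs_.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }

    int ref_count() const { return refs_.load(std::memory_order_acquire); }
    double* data() { return data_; }

private:
    double* data_;
    std::atomic<int> refs_;
};

// Usage example (single-threaded, to show the counting logic only).
void example_ref_count() {
    RefCountedBlock block(8);      // one owner
    block.add_ref();               // a second view
    assert(block.ref_count() == 2);
    assert(!block.release());      // one view left
    assert(block.release());       // last view gone
}
```

With a plain int counter, two threads releasing at the same time could both read the same value, and the block would be freed twice or never.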