#define TENSOR_MAXDIM 6
#define IND1 _i
    Macros IND1, ..., IND6, and IND are a convenience for indexing in macro iterators.
#define IND2 _i,_j
#define IND3 _i,_j,_k
#define IND4 _i,_j,_k,_l
#define IND5 _i,_j,_k,_l,_m
#define IND6 _i,_j,_k,_l,_m,_n
#define IND IND6
#define ITERATOR1(t, exp)
#define ITERATOR2(t, exp)
#define ITERATOR3(t, exp)
#define ITERATOR4(t, exp)
#define ITERATOR5(t, exp)
#define ITERATOR6(t, exp)
#define ITERATOR(t, exp)
#define UNARYITERATOR1(X, x, exp)
#define UNARYITERATOR2(X, x, exp)
#define UNARYITERATOR3(X, x, exp)
#define UNARYITERATOR4(X, x, exp)
#define UNARYITERATOR5(X, x, exp)
#define UNARYITERATOR6(X, x, exp)
#define UNARYITERATOR(X, x, exp)
#define BINARYITERATOR1(X, x, Y, y, exp)
#define BINARYITERATOR2(X, x, Y, y, exp)
#define BINARYITERATOR3(X, x, Y, y, exp)
#define BINARYITERATOR4(X, x, Y, y, exp)
#define BINARYITERATOR5(X, x, Y, y, exp)
#define BINARYITERATOR6(X, x, Y, y, exp)
#define BINARYITERATOR(X, x, Y, y, exp)
#define TERNARYITERATOR1(X, x, Y, y, Z, z, exp)
#define TERNARYITERATOR2(X, x, Y, y, Z, z, exp)
#define TERNARYITERATOR3(X, x, Y, y, Z, z, exp)
#define TERNARYITERATOR4(X, x, Y, y, Z, z, exp)
#define TERNARYITERATOR5(X, x, Y, y, Z, z, exp)
#define TERNARYITERATOR6(X, x, Y, y, Z, z, exp)
#define TERNARYITERATOR(X, x, Y, y, Z, z, exp)
#define UNARY_OPTIMIZED_ITERATOR(X, x, exp)
#define UNARY_UNOPTIMIZED_ITERATOR(X, x, exp)
#define UNARY_UNOPTIMIZED_ITERATOR_NESTED(X, x, exp)
#define BINARY_OPTIMIZED_ITERATOR(X, x, Y, y, exp)
#define TERNARY_OPTIMIZED_ITERATOR(X, x, Y, y, Z, z, exp)

Macros for easy and efficient iteration over tensors.
Several different macros have been defined to make it easy to iterate over expressions involving tensors. They vary in their generality, ease of use, and efficiency.
The most general and easiest to use, but also the least efficient and least safe, is
ITERATOR(t, expression)
where t is a Tensor of any type, size or dimension that is used to define the range of the loop indices, and expression can be nearly anything, including multi-line expressions performing arbitrary operations on multiple tensors. The loop indices, going from left to right in the dimensions, are
_i, _j, _k, _l, _m, _n
E.g., to add two matrices together (there are more efficient ways to do this, such as a+=b)
Tensor<long> a(4,2), b(4,2);
ITERATOR(a, a(_i,_j) += b(_i,_j));
E.g., to print out the indices of all elements of a matrix m that are greater than 0.5:
ITERATOR(m, if (m(_i,_j) > 0.5) {
    cout << _i << " " << _j << endl;
});
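How a macro of this kind can paste an expression into a nest of loops is sketched below. The class Mat and the macro SKETCH_ITERATOR2 are hypothetical stand-ins introduced only for illustration; they are not the library's Tensor or ITERATOR2, whose actual definitions may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 2-d tensor of long; not the library's Tensor.
struct Mat {
    long d0, d1;
    std::vector<long> data;
    Mat(long n0, long n1, long fill)
        : d0(n0), d1(n1), data(static_cast<std::size_t>(n0 * n1), fill) {}
    long dim(int i) const { return i == 0 ? d0 : d1; }
    long& operator()(long i, long j) {
        assert(i >= 0 && i < d0 && j >= 0 && j < d1);  // check in the stand-in only
        return data[static_cast<std::size_t>(i * d1 + j)];
    }
};

// Simplified ITERATOR2-style macro (assumed shape): the first argument only
// supplies the loop ranges; the expression is pasted into the loop body,
// where the indices _i and _j are visible.
#define SKETCH_ITERATOR2(t, exp)                         \
    do {                                                 \
        for (long _i = 0; _i < (t).dim(0); ++_i) {       \
            for (long _j = 0; _j < (t).dim(1); ++_j) {   \
                exp;                                     \
            }                                            \
        }                                                \
    } while (0)

// Adds b into a element by element, then sums the result with a second loop.
long add_and_sum() {
    Mat a(4, 2, 1), b(4, 2, 2);
    SKETCH_ITERATOR2(a, a(_i, _j) += b(_i, _j));
    long total = 0;
    SKETCH_ITERATOR2(a, total += a(_i, _j));
    return total;
}
```

Nothing in the macro ties b to a: if b were smaller than a, only the stand-in's own assert would notice, which is the lack of safety the text refers to.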
To make it possible to index tensors of arbitrary dimension, the macro IND has been defined as the indices for the highest supported dimension. E.g., to elementwise divide the contents of two tensors x and y of unknown dimension, index them inside ITERATOR as x(IND) and y(IND).
Note that using IND employs bounds checking, whereas direct indexing with _i, etc., does not.
The generality of these macros is offset by their inefficiency and lack of safety. The inefficiency is twofold. First, the ITERATOR macro generates a separate block of code for each possible dimension. This could cause code bloat and increased compilation time. To solve this problem, the macros ITERATOR1, ITERATOR2, etc., have been defined, with the corresponding IND1, IND2, etc. These macros may be applied to tensor expressions of the appropriate dimension.
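The source of the code bloat can be seen in a reduced sketch limited to three dimensions. Tiny, SK_IND, and the SK_ITERATOR macros are hypothetical names invented for this illustration, and zero-initializing the unused indices is an assumption about how a dimension-independent macro could make an IND-style index list compile in every branch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in tensor with at most 3 dimensions (row-major).
struct Tiny {
    int nd;
    long d[3];
    std::vector<double> data;
    Tiny(int ndim, long n0, long n1 = 1, long n2 = 1)
        : nd(ndim), d{n0, n1, n2},
          data(static_cast<std::size_t>(n0 * n1 * n2), 0.0) {}
    int ndim() const { return nd; }
    long dim(int i) const { return d[i]; }
    double& operator()(long i, long j = 0, long k = 0) {
        assert(i < d[0] && j < d[1] && k < d[2]);  // bounds check in the stand-in
        return data[static_cast<std::size_t>((i * d[1] + j) * d[2] + k)];
    }
};

#define SK_IND _i, _j, _k  // indices for the highest dimension of this sketch

// Fixed-dimension forms: one loop nest each.
#define SK_ITERATOR1(t, exp) \
    for (long _i = 0; _i < (t).dim(0); ++_i) { exp; }
#define SK_ITERATOR2(t, exp)                          \
    for (long _i = 0; _i < (t).dim(0); ++_i)          \
        for (long _j = 0; _j < (t).dim(1); ++_j) { exp; }
#define SK_ITERATOR3(t, exp)                          \
    for (long _i = 0; _i < (t).dim(0); ++_i)          \
        for (long _j = 0; _j < (t).dim(1); ++_j)      \
            for (long _k = 0; _k < (t).dim(2); ++_k) { exp; }

// Dimension-independent form: exp is pasted once per supported dimension,
// so the compiler sees three copies of it.
#define SK_ITERATOR(t, exp)                                  \
    do {                                                     \
        long _j = 0, _k = 0; /* defaults for unused indices */ \
        (void)_j; (void)_k;                                  \
        if ((t).ndim() == 1) { SK_ITERATOR1(t, exp) }        \
        else if ((t).ndim() == 2) { SK_ITERATOR2(t, exp) }   \
        else { SK_ITERATOR3(t, exp) }                        \
    } while (0)

// Sets every element of a tensor of any supported dimension to v and
// returns the number of elements visited.
long fill_any(Tiny& x, double v) {
    long visited = 0;
    SK_ITERATOR(x, x(SK_IND) = v; ++visited);
    return visited;
}
```

When the dimension is known in advance, calling the fixed-dimension form directly expands the expression once instead of once per branch.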
The second inefficiency is at runtime, due to the explicit indexing of all the tensor expressions and the inability to optimize the order in which memory is traversed. The lack of safety is the inability to check that the tensors in the expression conform and that the indices are not out of bounds.
The safety and cost of explicit indexing are addressed by the macros UNARYITERATOR, BINARYITERATOR, and TERNARYITERATOR, along with their specializations to specific numbers of dimensions (again by appending the dimension number to the name of the macro). These macros are safer since you have to explicitly name the tensors you are iterating over, so that the macro can now check that the input tensors conform. The cost of looping is reduced by replacing explicit indexing with pointer arithmetic. These macros still define the loop indices _i, _j, etc., but also define _p0, _p1, etc., as pointers to the current elements of tensor argument 0, tensor argument 1, etc.
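E.g., to set the elements of a 3-d tensor, t, of type double to a function of the indices, use UNARYITERATOR3 with an expression that assigns to *_p0 a value computed from _i, _j, and _k.
E.g., to merge two double tensors as the real and imaginary parts of a complex tensor of any dimension, use TERNARYITERATOR with the complex tensor as argument 0 and an expression that assigns to *_p0 a complex value built from *_p1 and *_p2.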
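The combination of a conformance check and pointer stepping can be sketched for the 2-d ternary case. View2 and SK_TERNARYITERATOR2 are hypothetical stand-ins, not the library's types or macros, and the layout of the real macros may differ.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical strided 2-d view; not the library's Tensor.
template <typename T>
struct View2 {
    T* ptr;
    long d0, d1;  // dimensions
    long s0, s1;  // strides, in elements
};

// Simplified ternary iterator for 2-d operands (assumed shape): checks that
// the operands conform, then steps one pointer per operand instead of
// re-indexing. _i and _j remain visible to the expression.
#define SK_TERNARYITERATOR2(X, x, Y, y, Z, z, exp)                   \
    do {                                                             \
        assert((x).d0 == (y).d0 && (x).d1 == (y).d1);                \
        assert((x).d0 == (z).d0 && (x).d1 == (z).d1);                \
        for (long _i = 0; _i < (x).d0; ++_i) {                       \
            X* _p0 = (x).ptr + _i * (x).s0;                          \
            Y* _p1 = (y).ptr + _i * (y).s0;                          \
            Z* _p2 = (z).ptr + _i * (z).s0;                          \
            for (long _j = 0; _j < (x).d1; ++_j,                     \
                 _p0 += (x).s1, _p1 += (y).s1, _p2 += (z).s1) {      \
                exp;                                                 \
            }                                                        \
        }                                                            \
    } while (0)

// Merges re and im (both n0 x n1, row-major) into a complex result.
std::vector<std::complex<double>> merge_parts(std::vector<double>& re,
                                              std::vector<double>& im,
                                              long n0, long n1) {
    std::vector<std::complex<double>> out(static_cast<std::size_t>(n0 * n1));
    View2<std::complex<double>> c{out.data(), n0, n1, n1, 1};
    View2<double> r{re.data(), n0, n1, n1, 1};
    View2<double> i{im.data(), n0, n1, n1, 1};
    SK_TERNARYITERATOR2(std::complex<double>, c, double, r, double, i,
                        *_p0 = std::complex<double>(*_p1, *_p2));
    return out;
}
```

Because each operand is named in the macro call, a shape mismatch is caught by the asserts before any element is touched.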
However, we still have the problems that if the dimensions of a tensor have been reordered, the loops will go through memory inefficiently, and the dimension-independent macros still generate redundant code blocks. Also, the innermost loop might not be very long and will then be inefficient.
The most general, efficient and code-compact macros internally use the TensorIterator, which you could also use directly. Since there is no nest of explicit loops, the tensor indices are no longer available as _i, _j, etc. Furthermore, the TensorIterator can reorder the loops to optimize the memory traversal, and fuse dimensions to make the innermost loop longer for better vectorization and reduced loop overhead.
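Dimension fusion can be illustrated independently of the library. The sketch below assumes a shape is described by per-dimension sizes and element strides, outermost dimension first; Shape and fuse_dims are hypothetical names, loop reordering is not shown, and this is not how TensorIterator is necessarily implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of a tensor layout: sizes and element strides,
// listed from the outermost to the innermost dimension.
struct Shape {
    std::vector<long> dim;
    std::vector<long> stride;
};

// Fuses a dimension into its outer neighbour whenever one step of the outer
// dimension covers exactly the whole inner dimension, i.e. when
// stride[outer] == dim[inner] * stride[inner]. Each fusion lengthens a loop
// and removes one level of nesting.
Shape fuse_dims(const Shape& in) {
    assert(in.dim.size() == in.stride.size());
    Shape out;
    for (std::size_t i = 0; i < in.dim.size(); ++i) {
        if (!out.dim.empty() &&
            out.stride.back() == in.dim[i] * in.stride[i]) {
            out.dim.back() *= in.dim[i];       // one longer loop...
            out.stride.back() = in.stride[i];  // ...stepping with the inner stride
        } else {
            out.dim.push_back(in.dim[i]);
            out.stride.push_back(in.stride[i]);
        }
    }
    return out;
}
```

A contiguous 2x3x4 layout with strides 12, 4, 1 fuses to a single loop of length 24, whereas a 2x3x4 slice with strides 24, 4, 1 fuses only its two inner dimensions, giving sizes 2 and 12.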
The most efficient macros for iteration are UNARY_OPTIMIZED_ITERATOR, BINARY_OPTIMIZED_ITERATOR, and TERNARY_OPTIMIZED_ITERATOR. As before, these define the pointers _p0, _p1, _p2, which point to the current (and corresponding) element of each argument tensor. However, unlike the previous macros, there is no guarantee that the elements are looped through in the order expected by a simple nest of loops. Furthermore, the indices are completely unavailable. In addition to using the iterators for optimal traversal, these macros attempt to use a single loop for optimal vector performance.
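E.g., the most efficient and safe way to perform the previous example of merging two double tensors as the real and imaginary parts of a complex tensor of any dimension is to use TERNARY_OPTIMIZED_ITERATOR in place of TERNARYITERATOR, with the same arguments and expression.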
This is precisely how most internal operations are implemented.
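What a single fused loop looks like for the merge can be sketched as a plain function. The name merge_fused and its stride parameters are hypothetical, and the sketch assumes every operand has already been reduced to one dimension of length n; the library's macros handle the general case.

```cpp
#include <cassert>
#include <complex>
#include <vector>

// One flat loop over n elements: each operand has a single stride, no
// indices exist, and the body only dereferences the three pointers.
void merge_fused(std::complex<double>* c, long sc,
                 const double* r, long sr,
                 const double* im, long si, long n) {
    assert(n >= 0);
    std::complex<double>* _p0 = c;
    const double* _p1 = r;
    const double* _p2 = im;
    for (long count = 0; count < n;
         ++count, _p0 += sc, _p1 += sr, _p2 += si) {
        *_p0 = std::complex<double>(*_p1, *_p2);
    }
}
```

With unit strides the loop body is a straight pass over three arrays, which is the form most amenable to vectorization.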
In some situations it is necessary to preserve the expected order of loops and to not fuse dimensions. The macros UNARY_UNOPTIMIZED_ITERATOR, BINARY_UNOPTIMIZED_ITERATOR, and TERNARY_UNOPTIMIZED_ITERATOR use the TensorIterator but disable loop reordering and fusing. Once these optimizations have been turned off, the loop indices are available, if needed, from the ind[] member of the iterator (which is named _iter).
E.g., the fillindex() method is implemented this way.
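Why loop order matters for a fill-by-index operation can be shown with a small sketch. The function fill_index below is a hypothetical stand-in working on a raw strided 2-d view; it is not the library's fillindex() code.

```cpp
#include <cassert>
#include <vector>

// Writes 0, 1, 2, ... into a d0 x d1 strided view in logical index order
// (last index fastest), whatever the memory layout. Reordering or fusing
// the loops would change which element receives which value.
void fill_index(double* base, long d0, long d1, long s0, long s1) {
    assert(d0 >= 0 && d1 >= 0);
    long count = 0;
    for (long i = 0; i < d0; ++i) {
        double* p0 = base + i * s0;
        for (long j = 0; j < d1; ++j, p0 += s1) {
            *p0 = static_cast<double>(count++);
        }
    }
}
```

For a 2x3 view with strides 1 and 2 (a transposed view of a 3x2 row-major buffer), the buffer ends up holding 0, 3, 1, 4, 2, 5: the values follow the view's index order, not the order of memory.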
NB: None of the above iterator macros can be nested; use the actual TensorIterator to do this.
Recommendation: for both efficiency and safety, use the optimized macros (UNARY_OPTIMIZED_ITERATOR, etc.), unless it is necessary to preserve loop order, in which case use the unoptimized versions. If you need the loop indices, use the macros UNARYITERATOR, etc., unless you have a very general expression that they cannot handle. In this last instance, or for ease of rapid implementation, use the general ITERATOR macro first described.