NUMERIC MULTIDIMENSIONAL ARRAYS

NUMERIC MULTIDIMENSIONAL ARRAYS

API Reference

This reference covers C, ALisp and AmosQL functionality regarding the use of NMA objects, both memory-resident and proxy. The terms NMA and array are used interchangeably, and refer to all kinds of NMA objects, unless stated otherwise.

The SciSPARQL functionality is described in SciSPARQL User Manual, and it makes no distinction between resident and proxy NMAs.

All dimension numbers, logical subscripts and storage indexes are 0-based in all the following interfaces. (In SciSPARQL they are 1-based by default, which can be changed by using _sq_python_ranges_ switch)

1. NMA interface in C

SSDM implements, and ssdm.h header makes available to C programs the complete set of functionality to create and access NMAs.

Warning: all functions listed here are unsafe: calling them might corrupt the memory or crash the process if the incorrect arguments are supplied!

1.1 Creating arrays

The process should follow one of the paths shown in the following graph

oidtype make_nma0(int ndims)

Allocate and array (or array proxy) descriptor for the given number of dimensions

void (oidtype x, int i, LONGINT size)

Set the size of array x in dimension i to size.

void nma_init(oidtype x, int kind)

Allocate the storage (for a resident array x with initialized dimensions) to accommodate elements of given kind. The default nesting of elements puts the first dimension to be the outmost one. The following kinds are used:

0 - integer

1 - double

2 - complex

void nma_initProxy(oidtype x, int kind, int proxyTag, oidtype s)

Initialize the array proxy object, providing s ALisp value as an array identifier, and registered proxyTag value to identify the array storage subsystem. (The proxy tags are registered from ALisp). The default nesting order of dimensions puts the first dimension to be the outmost - this should be changed if the actual storage order is different.

void nma_reverseStorageOrder(oidtype x)

Reverse the storage order of resident array or array proxy. Reversing the default storage order puts the first dimension to be the inmost one.

oidtype nma_allocate(oidtype x, int newKind)

Allocate a new resident array to store the elements of described by a given derived array or array proxy x. If newKind is negative, the same element type will be used, otherwise it can be widened by supplying a value as to nma_init().

1.2 Reading array properties

int nma_ndims(oidtype x)

Return the number of dimensions of an array x (resident or proxy).

LONGINT nma_dim(oidtype x, int i)

Return the size of array x in dimension i.

int nma_kind(oidtype x)

Return the element type of array, as listed for the kind argument to nma_init() function.

int nma_kind2elemsize(int kind)

Return the size (in bytes) of the element type identified by integer kind value.

int nma_proxytag(oidtype x)

Return the proxy tag of an array proxy, 0 for resident arrays (can be used to distinguish the former from the latter)

int nma_isOriginal(oidtype x)

Return 1 if the array (or array proxy) includes all elements of its respective storage object (or stored array). Return 0 for derived resident arrays and array proxies. There is typically one original array or array proxy, and array projection/subset operations result in (non-original) smaller arrays (or proxies).

oidtype nma_s(oidtype x)

Return array identifier in the storage system as an ALisp value. For resident arrays, the storage object is returned as a 1-dimensional NUMARRAY vector. Direct use of NUMARRAY is not advised, since the same object is returned for original array and its derived arrays. However, oidtype comparison can be used identify if two arrays are related. For proxies, appropriate deep comparison (available in ALisp) should be used to establish equivalence.

1.3 Array elements and the iterator interface

The elements (as integer, double or complex values) of resident arrays (both original and derived) are accessible for reading and writing by pointers returned by the iterator interface. Every array has a built-in iterator to store the logical subscript of element being currently accessed. Multiple iterators per array will be supported in a future version.

void nma_iter_reset(oidtype x)

Reset the iterator to point to the first elements (with every subscript set to 0) of a resident array x.

int nma_iter_next(oidtype x)

Advance the iterator to the next element. The rightmost logical subscript is incremented when possible. Return 1 on success and 0 if there are no more elements to iterate across.

void nma_iter_setidx(oidtype x, int i, LONGINT idx)

Set the logical subscript idx in dimension i of an array element to access.

LONGINT nma_iter_getidx(oidtype x, int i)

Get the current iterator index in dimension i.

void* nma_iter2pointer(oidtype x)

Get the pointer to the array element pointed to by iterator. This pointer should be cast to the respective C type.

Warining: Even though certain pointer arithmetics is possible on original arrays, with the nesting order of dimensions in mind, it is not recommended, as less straightforward storage approaches might be implemented in the future versions. For derived arrays, the access logic is already quite complex, and it is fully implemented in the iterator interface.

In order to apply vector operations to multiple array elements at once, the use of Fragment Mapper, as described in the next section, is encouraged.

1.4 Fragment Mapper

An original resident array is currently always stored in a single contiguous memory region. Array projection and subset operations result in smaller derived arrays, containing the same physical elements, however, contiguity is no longer guaranteed. It is sometimes desirable (e.g. for memory-to-file dumping, or SIMD vector arithmetics) to process the array elements in contiguously stored sequences. To identify the biggest contiguous sequences, the Fragment Mapper interface should be used.

The user mapper function should have the following signature:

int (*NMAFragmentMapper)(oidtype x, LONGINT si, LONGINT fsize, void *xa);

with arguments including the array (or array proxy) x, 1-dimensional storage index si, fragment size fsize, and a pointer to arbitrary data (typically used to store the context of the performed operations) xa. It returns 0 normally, or non-zero as a stop signal (so that no further calls to it will occur).

void nma_mapfragments(oidtype x, NMAFragmentMapper mapperFn, void *xa)

Call the mapperFn function for every (largest possible) contiguous fragment of array (or array proxy) x. Pass xa pointer on every call.

1.5 Debug functions

void nma_inspect(oidtype x)

Print (to the standard output) all the internal properties of the array (or array proxy) x for each dimension. This includes offset, original array size, storage order, logical origin, stride, and access multiplier, and derived array size, as described in [#2012].

void nma_sinspect(oidtype x, char* buf)

Similar to nma_inspect(), but prints the output to the character buffer buf, which should be allocated upfront.

2. NMA Interface in ALisp

All functions are type-safe, unless stated otherwise. Appropriate ALisp errors are raised if wrong arguments are provided.

Vector is used as a synonymous term for a standard lisp array, one created with make-array, accessed with aref and serialized as #(...) in Lisp.

2.1 Creating arrays and accessing their elements

make-nma (kind dims)

Create a new resident array of given kind and with dimensions provided as dims list or vector. The following kinds are used:

0 - integer

1 - double

2 - complex

make-nmaproxy (kind dims proxy-tag s)

Create a new array proxy object of given kind and with dimensions provided as dims list. proxy-tag is the result of nma-register-proxytag described below. s is any Lisp value serving to identify the array in the external system. Derived proxies will share the same s value.

nma-allocate (x new-kind)

Allocate a new resident array to store the elements of described by a given derived array or array proxy x. If new-kind is negative, the same element type will be used, otherwise it can be widened by supplying a value as make-nma.

nma-copy (x)

Create a new resident array that is a deep copy of resident array x. Fails on array proxies.

nma-elt (x idxs)

Return the numeric element of resident array x identified by idxs list or vector of logical subscripts.

nma-set (x idxs value)

Set the element of resident array x identified by idxs to the given value.

are-valid-subscripts (x idxs)

Return T if idx contains the valid subscripts denoting an element in array x. Return NIL otherwise.

nma-fill (x list)

Fill the array x with values from the hierarchical list list or verctor. The latter should contain as many nesting levels as the dimensions of x, and a list or vector at each level should hold exactly as many elements as the array size in the respective dimension. Numeric types are widened when necessary. Errors are raised on incorrect element types or list shape.

array-to-inma (vector)

Transform a Lisp vector of integers into a 1-dimensional integer NMA.

nma2array (x &optional projection)

Convert an NMA x to a nested Lisp vector. If projection list is specified, project NMA dimensions first according to indexes given in projection. If projection length is the same as number of dimensions in x, return single element as a number.

2.2 Accessing array properties

nma-ndims (x)

Return the number of dimensions of an array x (resident or proxy).

nma-dim (x i)

Return the size of array x in dimension i.

nma-dims (x)

Return a Lisp vector containing all dimensions sizes of x.

nma-kind (x)

Return the element type of array, as listed for the kind argument to make-nma function.

nma-kind2elemsize (kind)

Return the size (in bytes) of the element type identified by integer kind value.

nma-s (x)

Return array identifier in the storage system as an ALisp value. For resident arrays, the storage object is returned as a 1-dimensional NUMARRAY vector. Direct use of NUMARRAY is not advised, since the same object is returned for original array and its derived arrays. However, EQ comparison can be used identify if two arrays are related. For proxies, appropriate deep comparison should be used to establish equivalence.

nma-set-s (x s)

Set the array identifier for an array proxy, or a new NUMARRAY object for a resident array. Use with caution.

nma-proxytag (x)

Return 0 for resident arrays, or integer proxy tag, identifying the storage subsystem for array proxies. Can be used to distinguish resident arrays from array proxies)

nma-is-original (x)

Return T if the array (or array proxy) includes all elements of its respective storage object (or stored array). Return NIL for derived resident arrays and array proxies. There is typically one original array or array proxy, and array projection/subset operations result in (non-original) smaller arrays (or proxies)

nma-elemcnt (x)

Return the number of elements in array x. Equals to the product of dimension sizes, or 1 for single-element (dimensionless) proxies.

2.3 Iterator interface

void nma-iter-reset (x)

Reset the iterator to point to the first elements (with every subscript set to 0) of a resident array x.

nma-iter-next (x)

Advance the iterator to the next element. The rightmost logical subscript is incremented when possible. Return T on success and NIL if there are no more elements to iterate across.

nma-iter-cur (x)

Return the current element of x pointed to by the iterator.

nma-iter-setcur (x value)

Set to value the element of x pointed to by the iterator.

2.4. Array transformations

nma-permute (x positions)

Change the logical order of dimensions of x, given the positions list or vector containing N distinct 0 to N-1 integers, where N is the dimensionality of x. For example, transposition of 2-dimensional matrix x can be achieved by calling (nma-permute x '(1 0)).

is-permutation-idxs (positions)

Return T if positions is a list of distinct 0 to N-1 integers where N is list length. Return NIL otherwise

nma-subset (x k lo stride hi)

Return a derived array of x (with the same number of dimensions), containing slices in dimension k starting from and including lo and up to and including hi. The array shape in other dimensions remains unaffected. If stride is not equal to 1 then lo, lo+stride, lo+2*stride, ... subscripts, limited by hi are selected in the given dimension.

nma-project (x k idx)

Return a derived array if x (with decreased number of dimensions), or its element, if x is 1-dimensional. A single index idx is selected in dimension k, resulting in a single slice.

The above operations are implemented so that no actual array data is touched, hence they are cheap to perform, are independent of array size, and work equally well on resident arrays and array proxies.

They can also be composed freely, resulting in a new derived array at each step. Distributivity is guaranteed, as long as the dimension index changes are accounted for. For example, for a 2-dimensional array x and valid parameters a...f the following equivalences hold:

(nma-subset (nma-subset x 0 a b c) 1 d e f) <=> (nma-subset (nma-subset x 1 d e f) 0 a b c)

(nma-project (nma-project x 1 d) 0 a) <=> (nma-project (nma-project x 0 a) 0 d)

(the dimension 1 of x becomes the dimension 0 after the inner projection)

(nma-project (nma-permute x '(1 0)) 0 d) <=> (nma-project x 1 d)

(nma-subset (nma-permute x '(1 0)) 1 a b c) <=> (nma-permute (nma-subset x 0 a b c) '(1 0))

2.5 Array computations

nma-vsum (x y)

Element-wise sum of arrays of the same shapes. A new array with the element type widest among x and y is allocated and returned.

nmau-visum (x y)

Same as above, but array x is updated and returned. Fails if element type of y is wider.

nma-scale (x s)

Multiplication of each element of x by scalar value s. A new array of the shape of x and element type widest among x and s is allocated and returned.

nmau-scale (x s)

Same as above, but array x is updated and returned. Fails if numeric type of s is wider.

nma-round (x)

Round each element of array x, provided its element type is double. A new array of the shape of x and element type integer is allocated and returned.

nma-roundto (x digits)

Round each element of array x to the specified precision 10^(-digits). A new array of the shape and element type of x is allocated and returned.

nmau-roundto (x digits)

Same as above, but array x is updated and returned.

2.6 Boolean matrix computations

Boolean arrays are currently indistinguishable from integer arrays, storing 4 bytes per element. More compact representations will be introduced in a future version. The operations below treat zero values as false, and non-zero values as true, and are only defined on 2-dimensional matrices.

nmau-reflexive-closure (x)

Builds a reflexive closure of a binary predicate defined by x (by setting diagonal elements to 1). Matrix x is updated and returned.

nmau-transitive-closure (x)

Build a transitive closure of a binary predicated defined by x, using Warschall algorithm (O(N³) complexity, where N is the diagonal length of possibly non-square Boolean matrix) Matrix x is updated and returned.

nmau-cr-transitive-closure (x column-map)

A version of the above algorithm, suited for Boolean matrices containing considerable amount of columns entirely filled with zeroes, and thus omitted from x. An 1-dimensional integer NMA column-map maps presented columns of x to the respective row indexes (none of the rows are omitted). Matrix x is updated and returned.

In order to handle the Boolean matrices containing lots of zero-filled rows instead, matrix x should be transposed before and after the operation.

nma-boolean-product (x y padding)

Computes the join of two binary predicates defined by x and y Boolean matrices. A new Boolean matrix is allocated and returned. If padding is 0, x and y are required to be of compatible sizes (i.e. number of columns in x should be the same as number of rows in y), otherwise, x and y can be of different sizes, and will be padded. The following padding modes are supported:

0 - strict (no padding)

1 - diagonal

2 - all zeroes

2.7 Intra-array aggregation

(Unsafe) nma-sum (x)

Computes the sum of all elements of x. returns the number of same type as element type of x.

(Unsafe) nma-avg (x)

Computes the average of all elements of x. returns the floating-point number. (Complex element type is not currently supported)

(Unsafe) nma-min (x)

Computes the minimum of all elements of x. returns the number of same type as element type of x. Fails on complex data type.

(Unsafe) nma-max (x)

Computes the maximum of all elements of x. returns the number of same type as element type of x. Fails on complex data type.

2.8 Debug functions

nma-inspect (x)

2.9. Array bulk output

nma-dump (x stream)

Print the contents of the resident array x as a nested Lisp list to the Lisp stream.

(Unsafe) nma-filedump (x arrayid chunksize is-hex filesize-max get-filename-fn)

Break the array into chunks of given chunksize and write to the sequence of files (of size up to filesize-max) with names returned by calls to get-filename-fn Lisp function or closure, with no parameters. Sizes are provided in bytes.

Each chunk is written as a record with provided integer arrayid, sequentially generated chunk id, and the binary data representing the chunk itself. If is-hex parameter is 1, text (CSV) files are written, with chunk data in hexadecimal format. Otherwise, MS SQL Server native binary format is used, so that the resulting files can be loaded into

BULK INSERT ArrayChunks FROM <filename> WITH (DATATYPE = 'native')

into a table with fields (INTEGER, INTEGER, VARBINARY).

The last file remains open, so the next call to nma-filedump will write another array to the same file.

nma-filedump-close ()

Closes the last file open by a call nma-filedump. A subsequent call to nma-filedump will have to open a new file.

2.10. Handling array proxies

apr (x)

If x is not an array proxy, returne it unchanged. Otherwise, resolve it, by creating a resident array and filling it with data retrieved from the corresponding external storage system. If a proxy size limit was set using nma-proxy-setlimit, raise NMA_BIGPROXY_ERROR if amount of array elements would exceed that limit.

nma-register-proxytag (resolve-fn)

Register an array storage subsystem, and return the integer proxy tag to be used when creating proxies referring to arrays stored in that system. If resolve-fn is not NIL, this lisp function or closure will be called with a single argument, whenever apr Lisp function, apr() and aapr() Amos functions encounter an array proxy object with the associated proxy tag. In case of chunk-based proxy resolving, resolve-fn here should be NIL, and nma-register-aapr should be subsequently called, providing the additional parameters.

nma-register-aapr (proxytag buffer2query-fn query-fn-str buffer-limit fp-limit)

Lisp function or closure buffer2query-fn will be called with two arguments

buffer - a list of Lisp vectors, where the first element is chunk id, sorted on chunk id but possibly containing repeated chunk id values

arrayid - an array identifier, as stored in the nma-s property of array proxies

This buffer2query-fn function or closure should return a vector of arguments to call an Amos function identified by query-fn-str name (as a string), which would return a bag vectors {chunkid, chunk} vectors, sorted by chunkid, with chunk values of Binary type. The existence of function identified by query-fn-str is required at the time nma-register-aapr is called.

The amount of distinct chunk ids in the buffer is limited by buffer-limit argument (the actual buffer size might be bigger due to repeated chunk id values). The amount tolerated irrelevant chunks returned from Amos (i.e. "false positives", in case a pattern-based query is induced from the buffer) is limited by fp-limit parameter.

nma-proxy-set-default-chunksize (proxytag chunksize)

Set the default chunk size (in bytes) for proxies with given proxytag to chunksize.

nma-proxy-setlimit (limit)

Set the maximum number of elements in a proxy that can be resolved by apr Lisp function, and also apr() and aapr() Amos functions.

nma-proxy-enabled ()

Return T if at least one proxy tag is registered, NIL otherwise.

nma-cache-setlimit (limit)

Set the total maximum size of chunk cache (in bytes). Only the sizes of cached chunks are counted to this limit.

nma-proxy-startcache (x)

Start the chunk cache on array proxy x. All derived proxies will share reference to the same cache, in the same way the share the array id.

nma-proxy-has-cache (x)

Check if the proxy is connected to a chunk cache, return T if true, NIL otherwise.

nma-cache-reset ()

Empty all chunk caches.

nma-cache-put (x chunkid chunk)

Put the Binary object chunk with associated integer chunkid into the chunk cache associated with array proxy x.

3. NMA Interface in AmosQL