Matthew Brett edited this page Nov 22, 2012 · 3 revisions

Nibabel looks ahead

Introduction

A nibabel image is the association of 4.5 things:

  • 'data' - something that can make an ndarray with the data in it
  • affine - a 4x4 affine giving the relationship between voxel coordinates and coordinates in RAS+ space
  • header - the metadata associated with this image, in the image's own format. For example, a nifti image has a header of type Nifti1Header.
  • filenames / fileobjects to read or write the file from / to
  • extra - a possibly unused dictionary for other stuff you might want to carry with the image
>>> import numpy as np
>>> import nibabel as nib
>>> arr = np.arange(24).reshape((2,3,4))
>>> aff = np.diag([2,3,4,1])
>>> img = nib.Nifti1Pair(arr, aff)
>>> type(img._affine)
<type 'numpy.ndarray'>
>>> type(img._header)
<class 'nibabel.nifti1.Nifti1PairHeader'>
>>> img.extra
{}

In this case, the data for the image is the original array

>>> type(img._data)
<type 'numpy.ndarray'>
>>> img._data is arr
True

The filenames (all null in this case) are in the file_map attribute, which is a dict, one entry per file needed to load / save the image.

>>> sorted(img.file_map.keys())
['header', 'image']

Proxy images

The image above looks at an array in memory. Let's call that an array image. There can also be proxy images.

We can save the image like this:

>>> nib.save(img, 'example.img')

and load it again:

>>> loaded_img = nib.load('example.img')

Now note that the data is not an array but an ArrayProxy instance:

>>> type(loaded_img._data)
<class 'nibabel.analyze.ImageArrayProxy'>

A proxy is an object that has a shape, and can be made into an array:

>>> loaded_img._data.shape
(2, 3, 4)
>>> type(np.array(loaded_img._data))
<type 'numpy.ndarray'>

Attribute access

Notice that the attributes are all private - img._data, img._affine, img._header. Access is via:

>>> data = img.get_data()
>>> data is img._data
True
>>> hdr = img.get_header()
>>> hdr is img._header
True
>>> aff = img.get_affine()
>>> aff is img._affine
True

Why this java-like interface?

For the data - because _data could be a proxy, and I wanted to keep that to myself for now.

For the header and the affine - I thought both needed to be copied when instantiating the image. Imagine:

hdr = loaded_img.get_header()
new_img = nib.Nifti1Image(arr, aff, hdr)
hdr['descrip'] = 'a string'

We probably don't want setting hdr['descrip'] in this case to affect both loaded_img and new_img. If header or affine is a simple attribute, then one header can easily be attached to multiple images, and this could be confusing.
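The aliasing problem can be reproduced without nibabel. The following is a minimal sketch, assuming a hypothetical `CopyingImage` class whose header is a plain dict; it is not nibabel's implementation, only an illustration of why copying the header on instantiation matters.

```python
from copy import deepcopy


class CopyingImage:
    """Hypothetical stand-in for an image; the header is a plain dict."""

    def __init__(self, header, copy_header=True):
        # Copying on instantiation gives each image its own header object.
        self._header = deepcopy(header) if copy_header else header

    def get_header(self):
        return self._header


# Without the copy, one header object ends up attached to two images.
shared = {'descrip': ''}
aliased_a = CopyingImage(shared, copy_header=False)
aliased_b = CopyingImage(shared, copy_header=False)
shared['descrip'] = 'a string'
aliased_descrips = (aliased_a.get_header()['descrip'],
                    aliased_b.get_header()['descrip'])

# With the copy, a later change to the original header does not leak in.
original = {'descrip': ''}
safe_img = CopyingImage(original)
original['descrip'] = 'a string'
safe_descrip = safe_img.get_header()['descrip']
```

In the aliased case both images report the new description; in the copying case the image keeps the header it was created with.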

For the header - because I wanted to forbid setting of the header attribute, on the basis that the header type must be correct for it to be possible to save the image, or predict what fields etc would be available in a given image.

For the affine - because it should always be None or a 4x4 affine. Also, setting affine could conflict with affine information in the header.

Last, I wanted a path towards images that, even if they are not immutable, at least let us know whether the image has been modified since load. Protecting access with these accessor methods makes this easier. See below.

Filenames, filemaps

Filenames are stored in the file_map attribute - a simple dictionary, with one key, value pair for each file needed to save the image. For example, for Nifti pair images, as we've seen:

>>> sorted(img.file_map.keys())
['header', 'image']

The values are FileHolder instances:

>>> type(img.file_map['header'])
<class 'nibabel.fileholders.FileHolder'>

A FileHolder has these attributes:

>>> sorted(img.file_map['header'].__dict__.keys())
['filename', 'fileobj', 'pos']

so it can store an image filename or a fileobj. In either case it can also store pos, the position in the file at which the data starts. This is for the case where we pass the image a fileobj that is already at some position, given by orig_pos = fileobj.tell(), and we then read the header from it. Now the fileobj is at a different position. We may need to pass this file_map to a new image. For that to make sense, we need to do a fileobj.seek(orig_pos) before reading the information from the file.

Why can't we do this seek straight after reading the header, so that we leave the fileobj at the correct position for a subsequent read? We could if the fileobj is something simple like a python file instance or a StringIO object, because seeks on these are very quick. For other file-like objects, such as gzip.GzipFile or bz2.BZ2File instances, a seek can be very slow. Hence we want to avoid doing this seek until we have to.
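The idea of recording the start position and seeking only when needed can be sketched with the standard library alone. `ToyFileHolder` and its `rewound` method are hypothetical names for illustration, not nibabel's FileHolder API.

```python
import io


class ToyFileHolder:
    """Hypothetical holder: remembers a file object and a start position."""

    def __init__(self, fileobj):
        self.fileobj = fileobj
        # Record where the image starts before anything is read.
        self.pos = fileobj.tell()

    def rewound(self):
        # Seek back to the recorded start only when a read is requested.
        self.fileobj.seek(self.pos)
        return self.fileobj


# A stream with 4 bytes of unrelated content before a 6-byte "header".
stream = io.BytesIO(b'junkHEADERdata')
stream.read(4)                    # the caller positions the stream
holder = ToyFileHolder(stream)    # records the current position as pos
header = stream.read(6)           # reading the header moves the position
moved_pos = stream.tell()

# A second reader asks the holder for the stream and gets it rewound.
reread = holder.rewound().read(6)
```

The seek happens inside `rewound`, so a slow-seeking file object such as a gzip stream pays that cost only if something actually reads from it again.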

Questions

General discussion of design

Say 10 minutes or so?

Proxy or not?

At the moment, I've hidden _data. So, there's no public way of telling whether the image is a proxy image or an array image. Options are:

  1. No change

  2. Expose the _data attribute as img.dataobj directly. This makes the machinery more explicit but perhaps confusing. In this case, your proxy test would be:

    loaded_img.dataobj.is_proxy
    

    or something like that.

  3. Put is_proxy onto the image as a property or method:

    img.is_proxy
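
Options 2 and 3 can be compared in a small sketch. The `LazyData` and `ProxyAwareImage` classes below are hypothetical and use plain lists instead of arrays; they only show where the proxy test would live in each option.

```python
class LazyData:
    """Hypothetical proxy: knows its shape, loads values only on request."""

    def __init__(self, shape, loader):
        self.shape = shape
        self._loader = loader

    def load(self):
        return self._loader()


class ProxyAwareImage:
    """Hypothetical image wrapping either real data or a LazyData proxy."""

    def __init__(self, dataobj):
        self._dataobj = dataobj

    @property
    def dataobj(self):
        # Option 2: expose the underlying data object directly.
        return self._dataobj

    @property
    def is_proxy(self):
        # Option 3: answer the question on the image itself.
        return isinstance(self._dataobj, LazyData)


array_img = ProxyAwareImage([[1, 2], [3, 4]])
proxy_img = ProxyAwareImage(LazyData((2, 2), lambda: [[1, 2], [3, 4]]))
```

With option 3 the caller writes `proxy_img.is_proxy` and never touches the data object; with option 2 the caller reaches through `proxy_img.dataobj` and has to know what kind of object to expect there.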
    

Immutable images

Immutable images would be nice. One place we often need something like that is when we are passing images to external programs as files. For example, we might do this:

loaded_img = nib.load('example.img')
data = loaded_img.get_data()
if np.any(data < 0):
    run_something_on(loaded_img)
filename = loaded_img.get_filename()
new_img = some_spm_processing(filename)

The problem is that, at the moment, when we get to the new_img = line, we can't be sure that loaded_img still corresponds to the data in the original file example.img, without going through all the code in run_something_on. In practice that means that, each time we load an image, and we then need to pass it on as a filename, we'll have to save it as a new file.

There's no way with the current design to make the images immutable in general, because an array image contains a reference to an array:

>>> arr = np.arange(24).reshape((2,3,4))
>>> an_img = nib.Nifti1Image(arr, None)
>>> an_img.get_data()[0,0,0]
0
>>> arr[0,0,0] = 99
>>> an_img.get_data()[0,0,0]
99

But we might be able to get somewhere by setting a maybe_modified flag when we know there's a risk that the image has been modified since load. For example, an array image would always have img.maybe_modified == True because of the issue above.

Options:

  1. Stay the same. Force save when passing out filenames. At least it's safe.
  2. Conservative. Set maybe_modified for any image that has had a call to any of img.get_data(), img.get_header(), img.get_affine() - and for any array image. Disadvantage - you can't do anything much to an image, even look at it, without setting the maybe_modified flag.
  3. Ugly. Make default calls to img.get_header(), img.get_affine(), img.get_data() return copies, with an argument like img.get_affine(copy=False) as an alternative. For copy=False, set the modified flag; otherwise, unless this is an array image, we can be sure that the image object has not been modified relative to its loaded state.
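
Option 3 might look something like the sketch below. `TrackedImage` is a hypothetical class with a dict header, and the `copy` argument and `maybe_modified` attribute are the proposals from the list above rather than existing nibabel features.

```python
from copy import deepcopy


class TrackedImage:
    """Hypothetical image that tracks whether callers could have changed it."""

    def __init__(self, header, is_array_image=False):
        self._header = header
        # An array image shares its array with the caller, so it always
        # counts as possibly modified.
        self.maybe_modified = is_array_image

    def get_header(self, copy=True):
        if copy:
            # The caller gets a private copy; the image cannot change.
            return deepcopy(self._header)
        # The caller gets the live object, so assume the worst.
        self.maybe_modified = True
        return self._header


tracked = TrackedImage({'descrip': 'as loaded'})
hdr_copy = tracked.get_header()
hdr_copy['descrip'] = 'edited copy'
flag_after_copy = tracked.maybe_modified

live_hdr = tracked.get_header(copy=False)
flag_after_live = tracked.maybe_modified
```

Editing the copy leaves both the image and its flag untouched; asking for the live header sets the flag even if the caller never changes anything, which is the price of not being able to watch what happens to the returned object.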