API

DeepFinder

Each step of the DeepFinder workflow is coded as a class. The parameters of each method are stored as class attributes and are given default values in the constructor. These parameters can easily be given custom values as follows:

from deepfinder.training import Train
trainer = Train(Ncl=5, dim_in=56) # initialize training task, where default batch_size=25
trainer.batch_size = 16 # customize batch_size value

Each class has a main method called ‘launch’ that executes the procedure. These classes all inherit from a parent class ‘DeepFinder’, which provides features for communicating with the GUI.
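The pattern described above (defaults set in the constructor, values customized through attributes, work done by ‘launch’) can be sketched without DeepFinder installed. The class `ToyTask` and its batching behaviour are hypothetical and exist only for illustration; they are not part of the DeepFinder API.

```python
class ToyTask:
    """Hypothetical stand-in for a workflow step; not part of DeepFinder."""

    def __init__(self, n_classes, batch_size=25):
        # Parameters are stored as attributes, with defaults set in the constructor.
        self.n_classes = n_classes
        self.batch_size = batch_size

    def launch(self, samples):
        # The main method reads the attributes at call time, so values
        # customized after construction take effect here.
        return [samples[i:i + self.batch_size]
                for i in range(0, len(samples), self.batch_size)]


task = ToyTask(n_classes=5)   # default batch_size=25
task.batch_size = 16          # customize before launching
batches = task.launch(list(range(40)))
print([len(b) for b in batches])  # [16, 16, 8]
```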

Training

class deepfinder.training.TargetBuilder

generate_with_shapes(objl, target_array, ref_list)

Generates segmentation targets from object list. Here macromolecules are annotated with their shape.

Parameters

objl (list of dictionaries) – Needs to contain [phi,psi,the] Euler angles for orienting the shapes.
target_array (3D numpy array) – array that initializes the training target. This makes it possible to pass an array that already contains annotated structures such as membranes. The index order of the array should be [z,y,x].
ref_list (list of 3D numpy arrays) – These reference arrays are expected to be cubic and to contain the shape of macromolecules (‘1’ for ‘is object’ and ‘0’ for ‘is not object’). The order of the references in the list should correspond to the class labels. For example: 1st element of list -> reference of class 1; 2nd element of list -> reference of class 2, etc.

Returns

Target array, where ‘0’ denotes the background class and {‘1’,‘2’,…} denote the object classes.

Return type

3D numpy array

generate_with_spheres(objl, target_array, radius_list)

Generates segmentation targets from object list. Here macromolecules are annotated with spheres. This method requires neither knowledge of the macromolecule shape nor Euler angles in the objl. However, a network trained with ‘sphere targets’ may be less accurate than one trained with ‘shape targets’.

Parameters

objl (list of dictionaries) –
target_array (3D numpy array) – array that initializes the training target. This makes it possible to pass an array that already contains annotated structures such as membranes. The index order of the array should be [z,y,x].
radius_list (list of int) – contains sphere radii per class (in voxels). The order of the radii in the list should correspond to the class labels. For example: 1st element of list -> sphere radius for class 1; 2nd element of list -> sphere radius for class 2, etc.

Returns

Target array, where ‘0’ denotes the background class and {‘1’,‘2’,…} denote the object classes.

Return type

3D numpy array
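To make the sphere annotation concrete, here is a minimal NumPy sketch of the idea. It is not the DeepFinder implementation: the function name `sphere_targets` is hypothetical, and the dictionary keys ‘label’, ‘x’, ‘y’, ‘z’ are assumed for the object list entries.

```python
import numpy as np

def sphere_targets(objl, target_array, radius_list):
    """Paint one sphere per object into a copy of target_array (index order [z,y,x])."""
    target = target_array.copy()
    zz, yy, xx = np.indices(target.shape)
    for obj in objl:
        label = int(obj["label"])        # assumed key name
        radius = radius_list[label - 1]  # 1st element -> class 1, etc.
        dist2 = (zz - obj["z"]) ** 2 + (yy - obj["y"]) ** 2 + (xx - obj["x"]) ** 2
        target[dist2 <= radius ** 2] = label
    return target

objl = [
    {"label": 1, "x": 4, "y": 4, "z": 4},
    {"label": 2, "x": 11, "y": 10, "z": 9},
]
empty = np.zeros((16, 16, 16), dtype=np.int8)  # could already contain membranes
target = sphere_targets(objl, empty, radius_list=[2, 3])
print(np.unique(target))  # [0 1 2]
```

Passing a non-empty initial array would leave its existing annotations in place wherever no sphere is painted.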

class deepfinder.training.Train(Ncl, dim_in)

launch(path_data, path_target, objlist_train, objlist_valid)

This function launches the training procedure. For each epoch, an image is plotted, displaying the progression with different metrics: loss, accuracy, f1-score, recall, precision. Every 10 epochs, the current network weights are saved.

Parameters

path_data (list of string) – contains paths to data files (i.e. tomograms)
path_target (list of string) – contains paths to target files (i.e. annotated volumes)
objlist_train (list of dictionaries) – contains information about annotated objects (e.g. class, position). In particular, the tomo_idx of each object should correspond to the index of its tomogram in ‘path_data’ and ‘path_target’. See utils/objl.py for more info about object lists. During training, these coordinates are used for guiding the patch sampling procedure.
objlist_valid (list of dictionaries) – same as ‘objlist_train’, but the objects contained in this list are used for validation rather than training. This makes it possible to monitor the training and check for over- or under-fitting. Ideally, the validation objects should originate from different tomograms than the training objects.

Note

The function saves the following files at regular intervals:

net_weights_epoch*.h5: contains current network weights

net_train_history.h5: contains arrays with all metrics per training iteration

net_train_history_plot.png: plotted metric curves
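The parameter description above says that object coordinates guide the patch sampling, without giving details. The following sketch shows one plausible scheme and should not be read as DeepFinder's actual sampler: the function `sample_patch`, the random shift `max_shift` and the key names ‘x’, ‘y’, ‘z’ are all assumptions.

```python
import numpy as np

def sample_patch(volume, obj, dim_in, max_shift, rng):
    """Crop a cubic patch of side dim_in near an annotated object (index order [z,y,x])."""
    half = dim_in // 2
    center = []
    for axis, key in enumerate(("z", "y", "x")):
        # Random shift so the object is not always centered in the patch.
        c = obj[key] + rng.integers(-max_shift, max_shift + 1)
        # Clamp so the patch stays inside the volume.
        c = int(np.clip(c, half, volume.shape[axis] - (dim_in - half)))
        center.append(c)
    z, y, x = center
    return volume[z - half:z - half + dim_in,
                  y - half:y - half + dim_in,
                  x - half:x - half + dim_in]

rng = np.random.default_rng(0)
tomo = rng.normal(size=(64, 64, 64)).astype(np.float32)
obj = {"label": 1, "x": 60, "y": 5, "z": 30}   # close to two borders
patch = sample_patch(tomo, obj, dim_in=56, max_shift=4, rng=rng)
print(patch.shape)  # (56, 56, 56)
```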

Inference

class deepfinder.inference.Segment(Ncl, path_weights, patch_size=192)

launch(dataArray)

This function segments a tomogram. Because tomograms are too large to be processed in one pass, the tomogram is decomposed into smaller overlapping 3D patches.

Parameters

dataArray (3D numpy array) – the volume to be segmented
path_weights (str) – path to the .h5 file containing the network weights obtained by the training procedure. This is passed to the class constructor, not to ‘launch’.

Returns

contains predicted score maps. Array with index order [class,z,y,x]

Return type

numpy array
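The decomposition into overlapping patches can be illustrated along a single axis. This is a sketch of the general tiling idea, not of DeepFinder's internals: the helper `patch_starts` and the overlap of 32 voxels are hypothetical, and only the default patch size of 192 comes from the class signature.

```python
def patch_starts(length, patch_size, overlap):
    """Start indices of overlapping windows that together cover [0, length)."""
    if patch_size >= length:
        return [0]
    step = patch_size - overlap  # requires overlap < patch_size
    starts = list(range(0, length - patch_size, step))
    starts.append(length - patch_size)  # last window is flush with the border
    return starts

starts = patch_starts(length=500, patch_size=192, overlap=32)
print(starts)  # [0, 160, 308]
```

Applying the same helper to the z, y and x axes gives the grid of 3D patch origins; predictions in the overlapping regions then have to be combined, for example by averaging.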

class deepfinder.inference.Cluster(clustRadius)

launch(labelmap)

This function analyzes the segmented tomograms (i.e. labelmap), identifies individual macromolecules and outputs their coordinates. This is achieved with a clustering algorithm (meanshift).

Parameters

labelmap (3D numpy array) – segmented tomogram
clustRadius (int) – parameter for clustering algorithm. Corresponds to average object radius (in voxels)

Returns

the object list with coordinates and class labels of identified macromolecules

Return type

list of dict
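To show what mean shift clustering does on a labelmap, here is a small flat-kernel mean shift written in plain NumPy. It is a toy sketch for tiny volumes and an assumption about the general technique, not the code DeepFinder runs; the function `mean_shift_flat` and the output keys are hypothetical.

```python
import numpy as np

def mean_shift_flat(points, radius, n_iter=50, tol=1e-3):
    """Flat-kernel mean shift. Returns (centers, labels)."""
    points = np.asarray(points, dtype=float)
    modes = points.copy()
    for _ in range(n_iter):
        # Distance from every current mode to every original point.
        d = np.linalg.norm(modes[:, None, :] - points[None, :, :], axis=2)
        inside = (d <= radius).astype(float)
        # Each mode moves to the mean of the points inside its window.
        new_modes = (inside @ points) / inside.sum(axis=1, keepdims=True)
        shift = np.linalg.norm(new_modes - modes, axis=1).max()
        modes = new_modes
        if shift < tol:
            break
    # Merge modes that converged to (almost) the same place.
    centers, labels = [], np.empty(len(points), dtype=int)
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < radius:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return np.array(centers), labels

# Toy labelmap with two separate blobs of class 1 (index order [z,y,x]).
labelmap = np.zeros((18, 18, 18), dtype=np.int8)
labelmap[2:5, 2:5, 2:5] = 1
labelmap[12:15, 12:15, 12:15] = 1

coords = np.argwhere(labelmap == 1)  # voxel coordinates as (z, y, x)
centers, labels = mean_shift_flat(coords, radius=3)
found = [{"label": 1, "z": c[0], "y": c[1], "x": c[2],
          "cluster_size": int((labels == k).sum())}
         for k, c in enumerate(centers)]
print(len(found))                         # 2
print([o["cluster_size"] for o in found]) # [27, 27]
```

The number of voxels per cluster is kept here as ‘cluster_size’, which is the kind of quantity a size threshold can later be applied to.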

Utilities

Common utils

deepfinder.utils.common.bin_array(array)

Subsamples a 3D array by a factor of 2. Subsampling is performed by averaging voxel values in 2x2x2 tiles.

Parameters: array (numpy array) –
Returns: binned array
Return type: numpy array
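Averaging over 2x2x2 tiles can be written compactly with a reshape. The sketch below illustrates the operation and is not the library's code; the name `bin_by_two` is hypothetical, and cropping odd-sized dimensions is an assumption of this sketch.

```python
import numpy as np

def bin_by_two(array):
    """Average non-overlapping 2x2x2 tiles of a 3D array."""
    # Crop odd-sized dimensions so that every tile is complete (assumption).
    z, y, x = (s - s % 2 for s in array.shape)
    a = array[:z, :y, :x]
    # Split each axis into (tiles, 2) and average over the three size-2 axes.
    return a.reshape(z // 2, 2, y // 2, 2, x // 2, 2).mean(axis=(1, 3, 5))

vol = np.arange(4 * 4 * 4, dtype=np.float32).reshape(4, 4, 4)
binned = bin_by_two(vol)
print(binned.shape)     # (2, 2, 2)
print(binned[0, 0, 0])  # 10.5
```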

deepfinder.utils.common.plot_volume_orthoslices(vol, filename)

Writes an image file containing ortho-slices of the input volume. Generates the same visualization as the MATLAB function ‘tom_volxyz’ from the TOM toolbox. If the volume type is int8, the function assumes that the volume is a labelmap and plots it in color scale. Otherwise, it assumes that the volume is tomographic data and plots it in gray scale.

Parameters

vol (3D numpy array) –
filename (str) – ‘/path/to/file.png’

deepfinder.utils.common.read_array(filename, dset_name='dataset')

Reads an array from a file. Handles .h5 and .mrc files, depending on the file extension.

Parameters

filename (str) – ‘/path/to/file.ext’ with ‘.ext’ either ‘.h5’ or ‘.mrc’
dset_name (str, optional) – h5 dataset name. Not necessary to specify when reading .mrc

Returns

numpy array

deepfinder.utils.common.write_array(array, filename, dset_name='dataset')

Writes array. Can write .h5 and .mrc files, according to the extension specified in filename.

Parameters

array (numpy array) –
filename (str) – ‘/path/to/file.ext’ with ‘.ext’ either ‘.h5’ or ‘.mrc’
dset_name (str, optional) – h5 dataset name. Not necessary to specify when writing .mrc

Object list utils

deepfinder.utils.objl.above_thr(objlIN, thr)

Parameters

objlIN (list of dict) –
thr (float) – threshold

Returns

contains only objects with cluster size >= thr

Return type

list of dict
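Since an object list is a plain list of dictionaries, the threshold filter amounts to a list comprehension. The sketch below assumes that the cluster size is stored under the key ‘cluster_size’ and uses a hypothetical helper `keep_above`; the sample values are invented for illustration.

```python
# Invented sample data; key names are assumed.
objl = [
    {"obj_id": 1, "label": 1, "x": 10, "y": 20, "z": 30, "cluster_size": 120},
    {"obj_id": 2, "label": 2, "x": 40, "y": 50, "z": 60, "cluster_size": 8},
    {"obj_id": 3, "label": 1, "x": 70, "y": 80, "z": 90, "cluster_size": 45},
]

def keep_above(objl, thr):
    """Keep only objects whose cluster size is at least thr."""
    return [obj for obj in objl if obj["cluster_size"] >= thr]

kept = keep_above(objl, thr=45)
print([obj["obj_id"] for obj in kept])  # [1, 3]
```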

deepfinder.utils.objl.disp(objlIN): Prints objl in terminal

deepfinder.utils.objl.get_class(objlIN, label)

Get all objects of specified class.

Parameters

objlIN (list of dict) –
label (int) –

Returns

contains only objects from class ‘label’

Return type

list of dict

deepfinder.utils.objl.get_labels(objlIN): Returns a list of the unique labels contained in the input objl

deepfinder.utils.objl.get_obj(objl, obj_id)

Get objects with specified object ID.

Parameters

objl (list of dict) – input object list
obj_id (list of int) – object ID of wanted object(s)

Returns

contains object(s) with obj ID ‘obj_id’

Return type

list of dict

deepfinder.utils.objl.get_tomo(objlIN, tomo_idx)

Get all objects originating from tomo ‘tomo_idx’.

Parameters

objlIN (list of dict) – contains objects from various tomograms
tomo_idx (int) – tomogram index

Returns

contains objects from tomogram ‘tomo_idx’

Return type

list of dict

deepfinder.utils.objl.read(filename)

Reads object list. Handles .xml and .xlsx files, according to what extension the file has.

Parameters: filename (str) – ‘/path/to/file.ext’ with ‘.ext’ either ‘.xml’ or ‘.xlsx’
Returns: list of dict

deepfinder.utils.objl.remove_class(objl, label_list)

Removes all objects from specified classes.

Parameters

objl (list of dict) – input object list
label_list (list of int) – label of objects to remove

Returns

same as input object list but with objects from classes ‘label_list’ removed

Return type

list of dict

deepfinder.utils.objl.remove_obj(objl, obj_id)

Removes objects by object ID.

Parameters

objl (list of dict) – input object list
obj_id (list of int) – object ID of the object(s) to remove

Returns

same as input object list but with object(s) ‘obj_id’ removed

Return type

list of dict

deepfinder.utils.objl.scale_coord(objlIN, scale)

Scales coordinates by specified factor. Useful when using binned (sub-sampled) volumes, where coordinates need to be multiplied or divided by 2.

Parameters

objlIN (list of dict) –
scale (float, int or tuple) – if float or int, the same scale is applied to all dimensions

Returns

object list with scaled coordinates

Return type

list of dict
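As an illustration of coordinate scaling, the sketch below maps coordinates picked in a volume binned by 2 back to the full-resolution volume. The helper `scale_xyz` is hypothetical, the key names ‘x’, ‘y’, ‘z’ are assumed, and the (x, y, z) order of the tuple form is an assumption, since the order is not specified above.

```python
def scale_xyz(objl, scale):
    """Return a new object list with x, y, z multiplied by scale."""
    if isinstance(scale, (int, float)):
        scale = (scale, scale, scale)  # same factor for all dimensions
    sx, sy, sz = scale                 # assumed order: (x, y, z)
    return [{**obj, "x": obj["x"] * sx, "y": obj["y"] * sy, "z": obj["z"] * sz}
            for obj in objl]

objl_binned = [{"label": 1, "x": 5, "y": 12, "z": 7}]
objl_full = scale_xyz(objl_binned, 2)
print(objl_full[0])  # {'label': 1, 'x': 10, 'y': 24, 'z': 14}
```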

deepfinder.utils.objl.write(objl, filename)

Writes object list. Can write .xml and .xlsx files, according to the extension specified in filename.

Parameters

objl (list of dict) –
filename (str) – ‘/path/to/file.ext’ with ‘.ext’ either ‘.xml’ or ‘.xlsx’

Scoremap utils

deepfinder.utils.smap.bin(scoremaps)

Subsamples the scoremaps by a factor of 2. Subsampling is performed by averaging voxel values in 2x2x2 tiles.

Parameters: scoremaps (4D numpy array) – array with index order [class,z,y,x]
Returns: 4D numpy array

deepfinder.utils.smap.read_h5(filename)

Reads scoremaps stored in an .h5 file.

Parameters: filename (str) – path to file. This .h5 file has one dataset per class (dataset ‘/class*’ contains scoremap of class *)
Returns: scoremaps array with index order [class,z,y,x]
Return type: 4D numpy array

deepfinder.utils.smap.to_labelmap(scoremaps)

Converts scoremaps into a labelmap.

Parameters: scoremaps (4D numpy array) – array with index order [class,z,y,x]
Returns: array with index order [z,y,x]
Return type: 3D numpy array
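A common way to turn per-class score maps into a labelmap is to take, for each voxel, the class with the highest score. The sketch below assumes that this argmax rule is what the conversion does, which is not stated above; the score values are invented for illustration.

```python
import numpy as np

# Three classes (0 = background) on a tiny 2x2x2 volume, index order [class,z,y,x].
scoremaps = np.zeros((3, 2, 2, 2), dtype=np.float32)
scoremaps[0] = 0.2            # background score everywhere
scoremaps[1] = 0.1
scoremaps[2] = 0.1
scoremaps[1, 0, 0, 0] = 0.7   # class 1 wins at voxel (0, 0, 0)
scoremaps[2, 1, 1, 1] = 0.9   # class 2 wins at voxel (1, 1, 1)

# Per-voxel argmax over the class axis gives an array with index order [z,y,x].
labelmap = np.argmax(scoremaps, axis=0).astype(np.int8)
print(labelmap.shape)                                        # (2, 2, 2)
print(labelmap[0, 0, 0], labelmap[1, 1, 1], labelmap[0, 1, 0])  # 1 2 0
```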

deepfinder.utils.smap.write_h5(scoremaps, filename)

Writes scoremaps to an .h5 file.

Parameters

scoremaps (4D numpy array) – array with index order [class,z,y,x]
filename (str) – path to file. This .h5 file has one dataset per class (dataset ‘/class*’ contains scoremap of class *)