API

DeepFinder

Each step of the DeepFinder workflow is coded as a class. The parameters of each method are stored as class attributes and are given default values in the constructor. These parameters can easily be given custom values as follows:

from deepfinder.training import Train
trainer = Train(Ncl=5, dim_in=56) # initialize training task, where default batch_size=25
trainer.batch_size = 16 # customize batch_size value

Each class has a main method called ‘launch’ that executes the procedure. These classes all inherit from a parent class, ‘DeepFinder’, which provides features useful for communicating with the GUI.

Training

class deepfinder.training.TargetBuilder
generate_with_shapes(objl, target_array, ref_list)

Generates segmentation targets from object list. Here macromolecules are annotated with their shape.

Parameters
  • objl (list of dictionaries) – Needs to contain [phi,psi,the] Euler angles for orienting the shapes.

  • target_array (3D numpy array) – array that initializes the training target. This makes it possible to pass an array that already contains annotated structures such as membranes. The index order of the array should be [z,y,x].

  • ref_list (list of 3D numpy arrays) – These reference arrays are expected to be cubic and to contain the shape of macromolecules (‘1’ for ‘is object’ and ‘0’ for ‘is not object’). The order of the references in the list should correspond to the class label. For example: 1st element of list -> reference of class 1; 2nd element of list -> reference of class 2, etc.

Returns

Target array, where ‘0’ denotes the background class and {‘1’,‘2’,…} denote the object classes.

Return type

3D numpy array

generate_with_spheres(objl, target_array, radius_list)

Generates segmentation targets from object list. Here macromolecules are annotated with spheres. This method requires neither knowledge of the macromolecule shape nor Euler angles in the objl. On the other hand, a network trained with ‘sphere targets’ may be less accurate than one trained with ‘shape targets’.

Parameters
  • objl (list of dictionaries) –

  • target_array (3D numpy array) – array that initializes the training target. This makes it possible to pass an array that already contains annotated structures such as membranes. The index order of the array should be [z,y,x].

  • radius_list (list of int) – contains sphere radii per class (in voxels). The order of the radii in the list should correspond to the class label. For example: 1st element of list -> sphere radius for class 1; 2nd element of list -> sphere radius for class 2, etc.

Returns

Target array, where ‘0’ denotes the background class and {‘1’,‘2’,…} denote the object classes.

Return type

3D numpy array
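The idea behind sphere targets can be sketched with plain numpy. This is not DeepFinder's implementation: the helper name paint_spheres and the object keys ‘label’, ‘x’, ‘y’ and ‘z’ are assumptions made for illustration.

```python
import numpy as np


def paint_spheres(objl, target_array, radius_list):
    """Annotate each object with a sphere of its class radius (illustrative only)."""
    target = target_array.copy()            # leave the caller's array untouched
    zz, yy, xx = np.indices(target.shape)   # voxel coordinates, index order [z,y,x]
    for obj in objl:
        label = obj["label"]                # assumed key: class label, starting at 1
        radius = radius_list[label - 1]     # 1st element -> class 1, 2nd -> class 2, ...
        dist2 = (zz - obj["z"]) ** 2 + (yy - obj["y"]) ** 2 + (xx - obj["x"]) ** 2
        target[dist2 <= radius ** 2] = label
    return target


# Hypothetical object list with two objects of different classes.
objl = [
    {"label": 1, "x": 5, "y": 5, "z": 5},
    {"label": 2, "x": 14, "y": 14, "z": 14},
]
target = paint_spheres(objl, np.zeros((20, 20, 20), dtype=np.int8), radius_list=[2, 3])
```

Starting from a non-zero target_array (for example one that already holds a membrane annotation) would leave those voxels in place wherever no sphere is painted.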

class deepfinder.training.Train(Ncl, dim_in)
launch(path_data, path_target, objlist_train, objlist_valid)

This function launches the training procedure. For each epoch, an image is plotted, displaying the progression with different metrics: loss, accuracy, f1-score, recall, precision. Every 10 epochs, the current network weights are saved.

Parameters
  • path_data (list of string) – contains paths to data files (i.e. tomograms)

  • path_target (list of string) – contains paths to target files (i.e. annotated volumes)

  • objlist_train (list of dictionaries) – contains information about annotated objects (e.g. class, position). In particular, tomo_idx should correspond to the index of the tomogram in ‘path_data’ and ‘path_target’. See utils/objl.py for more information about object lists. During training, these coordinates guide the patch sampling procedure.

  • objlist_valid (list of dictionaries) – same as ‘objlist_train’, except that the objects in this list are used for validation rather than training. This makes it possible to monitor the training and check for over- or under-fitting. Ideally, the validation objects should originate from different tomograms than the training objects.

Note

The function saves the following files at regular intervals:

net_weights_epoch*.h5: contains current network weights

net_train_history.h5: contains arrays with all metrics per training iteration

net_train_history_plot.png: plotted metric curves
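Because tomo_idx must index into both ‘path_data’ and ‘path_target’, it can help to check the inputs before launching a long training run. The sketch below is a hypothetical helper, not part of DeepFinder; the file names and the object keys other than tomo_idx are made up for illustration.

```python
# Hypothetical file names and object lists, for illustration only.
path_data = ["tomo0_data.mrc", "tomo1_data.mrc", "tomo2_data.mrc"]
path_target = ["tomo0_target.mrc", "tomo1_target.mrc", "tomo2_target.mrc"]
objlist_train = [
    {"tomo_idx": 0, "label": 1, "x": 40, "y": 52, "z": 17},
    {"tomo_idx": 1, "label": 2, "x": 88, "y": 21, "z": 63},
]
# Validation objects come from a tomogram that is not used for training.
objlist_valid = [{"tomo_idx": 2, "label": 1, "x": 12, "y": 70, "z": 35}]


def check_tomo_idx(objl, path_data, path_target):
    """Return the objects whose tomo_idx has no matching data/target path."""
    if len(path_data) != len(path_target):
        raise ValueError("path_data and path_target must have the same length")
    return [obj for obj in objl if not 0 <= obj["tomo_idx"] < len(path_data)]


bad_train = check_tomo_idx(objlist_train, path_data, path_target)
bad_valid = check_tomo_idx(objlist_valid, path_data, path_target)
```

Both lists come back empty here, meaning every object points at an existing tomogram.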

Inference

class deepfinder.inference.Segment(Ncl, path_weights, patch_size=192)
launch(dataArray)

This function segments a tomogram. As tomograms are too large to be processed in one pass, the tomogram is decomposed into smaller overlapping 3D patches.

Parameters
  • dataArray (3D numpy array) – the volume to be segmented

  • path_weights (str) – path to the .h5 file containing the network weights obtained by the training procedure (passed to the constructor)

Returns

Predicted score maps, as an array with index order [class,z,y,x]

Return type

numpy array
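How a volume can be decomposed into overlapping patches is sketched below for a single axis. This only illustrates the general idea; DeepFinder's actual tiling scheme may differ. The function name patch_starts, the overlap of 32 voxels and the axis length of 500 are assumptions; only the default patch_size=192 comes from the signature above.

```python
def patch_starts(dim, patch_size, overlap):
    """Start indices of overlapping patches that together cover [0, dim)."""
    if patch_size >= dim:
        return [0]                        # a single patch covers the whole axis
    stride = patch_size - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than patch_size")
    starts = list(range(0, dim - patch_size, stride))
    starts.append(dim - patch_size)       # last patch is aligned with the volume border
    return starts


# Hypothetical axis length and overlap.
starts = patch_starts(dim=500, patch_size=192, overlap=32)
```

Applying the same function to the z, y and x axes and combining the start indices gives the corner of every 3D patch.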

class deepfinder.inference.Cluster(clustRadius)
launch(labelmap)

This function analyzes the segmented tomogram (i.e. labelmap), identifies individual macromolecules and outputs their coordinates. This is achieved with a clustering algorithm (mean shift).

Parameters
  • labelmap (3D numpy array) – segmented tomogram

  • clustRadius (int) – parameter of the clustering algorithm, passed to the constructor. Corresponds to the average object radius (in voxels).

Returns

the object list with coordinates and class labels of identified macromolecules

Return type

list of dict
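To give an intuition for the clustering step, here is a minimal flat-kernel mean shift written with numpy. It is a toy version under stated assumptions, not the implementation used by DeepFinder, and the keys of the resulting dictionaries (‘label’, ‘x’, ‘y’, ‘z’, ‘cluster_size’) are assumed for illustration.

```python
import numpy as np


def mean_shift(points, radius, n_iter=50, tol=1e-3):
    """Flat-kernel mean shift: move each point to the mean of its neighbours until stable."""
    points = np.asarray(points, dtype=float)
    modes = points.copy()
    for _ in range(n_iter):
        # distances between the current modes and the original points
        dist = np.linalg.norm(modes[:, None, :] - points[None, :, :], axis=2)
        inside = dist <= radius
        shifted = (inside[:, :, None] * points[None, :, :]).sum(axis=1) / inside.sum(axis=1)[:, None]
        done = np.abs(shifted - modes).max() < tol
        modes = shifted
        if done:
            break
    # merge modes that ended up closer than `radius` into one cluster centre
    centres, sizes = [], []
    for m in modes:
        for i, c in enumerate(centres):
            if np.linalg.norm(m - c) < radius:
                sizes[i] += 1
                break
        else:
            centres.append(m)
            sizes.append(1)
    return np.array(centres), sizes


# Toy labelmap with two separate blobs of class 1.
labelmap = np.zeros((20, 20, 20), dtype=np.int8)
labelmap[2:5, 2:5, 2:5] = 1
labelmap[12:15, 12:15, 12:15] = 1

coords = np.argwhere(labelmap == 1)             # voxel coordinates, order [z,y,x]
centres, sizes = mean_shift(coords, radius=3)
objl = [
    {"label": 1, "z": float(c[0]), "y": float(c[1]), "x": float(c[2]), "cluster_size": n}
    for c, n in zip(centres, sizes)
]
```

Each blob collapses to one centre, and the number of voxels that converged to it is kept as the cluster size.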

Utilities

Common utils

deepfinder.utils.common.bin_array(array)

Subsamples a 3D array by a factor 2. Subsampling is performed by averaging voxel values in 2x2x2 tiles.

Parameters

array (numpy array) –

Returns

binned array

Return type

numpy array
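Averaging over 2x2x2 tiles can be written compactly with a reshape. The following is a sketch of the technique rather than the library's code; the name bin_by_two is hypothetical, and cropping odd dimensions is a choice made here.

```python
import numpy as np


def bin_by_two(array):
    """Average non-overlapping 2x2x2 tiles of a 3D array."""
    # crop odd dimensions so that every axis splits evenly into pairs
    z, y, x = (s - s % 2 for s in array.shape)
    cropped = array[:z, :y, :x]
    tiles = cropped.reshape(z // 2, 2, y // 2, 2, x // 2, 2)
    return tiles.mean(axis=(1, 3, 5))


vol = np.arange(64, dtype=float).reshape(4, 4, 4)
binned = bin_by_two(vol)   # shape (2, 2, 2)
```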

deepfinder.utils.common.plot_volume_orthoslices(vol, filename)

Writes an image file containing ortho-slices of the input volume. Generates the same visualization as the MATLAB function ‘tom_volxyz’ from the TOM toolbox. If the volume type is int8, the function assumes that the volume is a labelmap and plots it in color scale. Otherwise, it assumes that the volume is tomographic data and plots it in gray scale.

Parameters
  • vol (3D numpy array) –

  • filename (str) – ‘/path/to/file.png’

deepfinder.utils.common.read_array(filename, dset_name='dataset')

Reads arrays. Handles .h5 and .mrc files, according to what extension the file has.

Parameters
  • filename (str) – ‘/path/to/file.ext’ with ‘.ext’ either ‘.h5’ or ‘.mrc’

  • dset_name (str, optional) – h5 dataset name. Not necessary to specify when reading .mrc

Returns

numpy array

deepfinder.utils.common.write_array(array, filename, dset_name='dataset')

Writes array. Can write .h5 and .mrc files, according to the extension specified in filename.

Parameters
  • array (numpy array) –

  • filename (str) – ‘/path/to/file.ext’ with ‘.ext’ either ‘.h5’ or ‘.mrc’

  • dset_name (str, optional) – h5 dataset name. Not necessary to specify when writing .mrc

Object list utils

deepfinder.utils.objl.above_thr(objlIN, thr)
Parameters
  • objlIN (list of dict) –

  • thr (float) – threshold

Returns

contains only objects with cluster size >= thr

Return type

list of dict
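Since an object list is a plain list of dictionaries, this kind of filter amounts to a list comprehension. The sketch below assumes that the cluster size is stored under a ‘cluster_size’ key; the key name and the helper name are assumptions, not confirmed by this page.

```python
def keep_above_thr(objl, thr):
    """Keep only objects whose cluster size is at least thr."""
    return [obj for obj in objl if obj["cluster_size"] >= thr]


# Hypothetical object list.
objl = [
    {"obj_id": 0, "label": 1, "cluster_size": 12},
    {"obj_id": 1, "label": 1, "cluster_size": 250},
    {"obj_id": 2, "label": 2, "cluster_size": 100},
]
kept = keep_above_thr(objl, thr=100)
```

Small clusters are often false positives, which is why thresholding on cluster size is a common post-processing step after clustering.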

deepfinder.utils.objl.disp(objlIN)

Prints objl in terminal

deepfinder.utils.objl.get_class(objlIN, label)

Get all objects of specified class.

Parameters
  • objlIN (list of dict) –

  • label (int) –

Returns

contains only objects from class ‘label’

Return type

list of dict

deepfinder.utils.objl.get_labels(objlIN)

Returns a list of the unique labels contained in the input objl.
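Selecting by class and listing the labels present are both simple operations on a list of dictionaries, as this plain-Python sketch shows. The ‘label’ and ‘obj_id’ keys are assumptions, and sorting the labels is a choice made here; the order returned by get_labels is not stated on this page.

```python
# Hypothetical object list.
objl = [
    {"obj_id": 0, "label": 1},
    {"obj_id": 1, "label": 3},
    {"obj_id": 2, "label": 1},
]

labels = sorted({obj["label"] for obj in objl})        # unique labels in the list
class_1 = [obj for obj in objl if obj["label"] == 1]   # all objects of class 1
```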

deepfinder.utils.objl.get_obj(objl, obj_id)

Get objects with specified object ID.

Parameters
  • objl (list of dict) – input object list

  • obj_id (list of int) – object ID of wanted object(s)

Returns

contains object(s) with obj ID ‘obj_id’

Return type

list of dict

deepfinder.utils.objl.get_tomo(objlIN, tomo_idx)

Get all objects originating from tomo ‘tomo_idx’.

Parameters
  • objlIN (list of dict) – contains objects from various tomograms

  • tomo_idx (int) – tomogram index

Returns

contains objects from tomogram ‘tomo_idx’

Return type

list of dict

deepfinder.utils.objl.read(filename)

Reads object list. Handles .xml and .xlsx files, according to what extension the file has.

Parameters

filename (str) – ‘/path/to/file.ext’ with ‘.ext’ either ‘.xml’ or ‘.xlsx’

Returns

list of dict

deepfinder.utils.objl.remove_class(objl, label_list)

Removes all objects from specified classes.

Parameters
  • objl (list of dict) – input object list

  • label_list (list of int) – label of objects to remove

Returns

same as input object list but with objects from classes ‘label_list’ removed

Return type

list of dict

deepfinder.utils.objl.remove_obj(objl, obj_id)

Removes objects by object ID.

Parameters
  • objl (list of dict) – input object list

  • obj_id (list of int) – object ID of the object(s) to remove

Returns

same as input object list but with object(s) ‘obj_id’ removed

Return type

list of dict

deepfinder.utils.objl.scale_coord(objlIN, scale)

Scales coordinates by specified factor. Useful when using binned (sub-sampled) volumes, where coordinates need to be multiplied or divided by 2.

Parameters
  • objlIN (list of dict) –

  • scale (float, int or tuple) – if float or int, same scale is applied to all dim

Returns

object list with scaled coordinates

Return type

list of dict
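The typical use case is converting coordinates picked in a binned volume back to the full-resolution volume. The sketch below shows the arithmetic only: the function name, the ‘x’, ‘y’, ‘z’ keys and the (x, y, z) order of a tuple scale are assumptions, and whether the library rounds the result is not stated here.

```python
def scale_coordinates(objl, scale):
    """Return a copy of objl with coordinates multiplied by scale."""
    # a single number applies to all dimensions; a tuple gives one factor per axis
    if isinstance(scale, (int, float)):
        scale = (scale, scale, scale)
    sx, sy, sz = scale                     # assumed order: (x, y, z)
    return [
        {**obj, "x": obj["x"] * sx, "y": obj["y"] * sy, "z": obj["z"] * sz}
        for obj in objl
    ]


# Object picked in a volume binned by 2, mapped back to the unbinned volume.
objl_binned = [{"label": 1, "x": 10, "y": 20, "z": 30}]
objl_full = scale_coordinates(objl_binned, 2)
```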

deepfinder.utils.objl.write(objl, filename)

Writes object list. Can write .xml and .xlsx files, according to the extension specified in filename.

Parameters
  • objl (list of dict) –

  • filename (str) – ‘/path/to/file.ext’ with ‘.ext’ either ‘.xml’ or ‘.xlsx’

Scoremap utils

deepfinder.utils.smap.bin(scoremaps)

Subsamples the scoremaps by a factor 2. Subsampling is performed by averaging voxel values in 2x2x2 tiles.

Parameters

scoremaps (4D numpy array) – array with index order [class,z,y,x]

Returns

4D numpy array

deepfinder.utils.smap.read_h5(filename)

Reads scoremaps stored in an .h5 file.

Parameters

filename (str) – path to file. This .h5 file has one dataset per class (dataset ‘/class*’ contains scoremap of class *)

Returns

scoremaps array with index order [class,z,y,x]

Return type

4D numpy array

deepfinder.utils.smap.to_labelmap(scoremaps)

Converts scoremaps into a labelmap.

Parameters

scoremaps (4D numpy array) – array with index order [class,z,y,x]

Returns

array with index order [z,y,x]

Return type

3D numpy array
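A common way to turn per-class score maps into a labelmap is to assign each voxel the class with the highest score. The sketch below assumes this argmax rule, which this page does not spell out, and the toy score values are made up.

```python
import numpy as np

# Toy scoremaps with 3 classes, index order [class,z,y,x].
scoremaps = np.zeros((3, 2, 2, 2))
scoremaps[0] = 0.2                 # background score everywhere
scoremaps[1, 0, 0, 0] = 0.7        # class 1 wins at voxel (0,0,0)
scoremaps[2, 1, 1, 1] = 0.9        # class 2 wins at voxel (1,1,1)

# Pick the highest-scoring class per voxel; the result has index order [z,y,x].
labelmap = np.argmax(scoremaps, axis=0).astype(np.int8)
```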

deepfinder.utils.smap.write_h5(scoremaps, filename)

Writes scoremaps to an .h5 file.

Parameters
  • scoremaps (4D numpy array) – array with index order [class,z,y,x]

  • filename (str) – path to file. This .h5 file has one dataset per class (dataset ‘/class*’ contains scoremap of class *)