API
DeepFinder
Each step of the DeepFinder workflow is coded as a class. The parameters of each method are stored as class attributes and are given default values in the constructor. These parameters can easily be given custom values as follows:
from deepfinder.training import Train
trainer = Train(Ncl=5, dim_in=56) # initialize training task, where default batch_size=25
trainer.batch_size = 16 # customize batch_size value
Each class has a main method called ‘launch’ that executes the procedure. All of these classes inherit from a parent class ‘DeepFinder’, which provides features for communicating with the GUI.
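The pattern described above can be pictured with a minimal stand-in class. `ExampleTask` and its attributes are hypothetical and not part of DeepFinder; the sketch only illustrates defaults set in the constructor, an override on the instance, and a `launch` method that reads the current attribute values.

```python
class ExampleTask:
    """Hypothetical stand-in for a workflow step (not DeepFinder code)."""

    def __init__(self, n_classes):
        self.n_classes = n_classes
        self.batch_size = 25   # default value, set in the constructor
        self.epochs = 100      # default value, set in the constructor

    def launch(self):
        # The main method uses whatever the attributes hold at call time.
        return {"n_classes": self.n_classes,
                "batch_size": self.batch_size,
                "epochs": self.epochs}


task = ExampleTask(n_classes=5)
task.batch_size = 16           # override a default before launching
settings = task.launch()
```

Because the parameters live on the instance, any of them can be changed between construction and the call to `launch` without touching the constructor signature.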
Training
- class deepfinder.training.TargetBuilder
- generate_with_shapes(objl, target_array, ref_list)
Generates segmentation targets from object list. Here macromolecules are annotated with their shape.
- Parameters
objl (list of dictionaries) – Needs to contain [phi,psi,the] Euler angles for orienting the shapes.
target_array (3D numpy array) – array that initializes the training target. Allows passing an array that already contains annotated structures such as membranes. The index order of the array should be [z,y,x].
ref_list (list of 3D numpy arrays) – These reference arrays are expected to be cubic and to contain the shape of macromolecules (‘1’ for ‘is object’ and ‘0’ for ‘is not object’). The order of references in the list should correspond to the class label. For example: 1st element of list -> reference of class 1; 2nd element of list -> reference of class 2, etc.
- Returns
Target array, where ‘0’ denotes the background class and {‘1’,‘2’,…} denote the object classes.
- Return type
3D numpy array
- generate_with_spheres(objl, target_array, radius_list)
Generates segmentation targets from object list. Here macromolecules are annotated with spheres. This method requires neither knowledge of the macromolecule shape nor Euler angles in the objl. On the other hand, a network trained with ‘sphere targets’ may be less accurate than one trained with ‘shape targets’.
- Parameters
objl (list of dictionaries) –
target_array (3D numpy array) – array that initializes the training target. Allows passing an array that already contains annotated structures such as membranes. The index order of the array should be [z,y,x].
radius_list (list of int) – contains sphere radii per class (in voxels). The order of radii in the list should correspond to the class label. For example: 1st element of list -> sphere radius for class 1; 2nd element of list -> sphere radius for class 2, etc.
- Returns
Target array, where ‘0’ denotes the background class and {‘1’,‘2’,…} denote the object classes.
- Return type
3D numpy array
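To make the sphere annotation concrete, here is a minimal NumPy sketch of the idea. It is an illustrative re-implementation, not DeepFinder's own code: the function name `stamp_spheres` is hypothetical, and the object-list keys `'label'`, `'x'`, `'y'`, `'z'` are assumed.

```python
import numpy as np


def stamp_spheres(objl, target_array, radius_list):
    """Illustrative sketch: write one labelled sphere per object."""
    target = target_array.copy()           # keep the caller's array intact
    dim_z, dim_y, dim_x = target.shape     # index order [z, y, x]
    zz, yy, xx = np.ogrid[:dim_z, :dim_y, :dim_x]
    for obj in objl:
        label = int(obj["label"])          # assumed key name
        radius = radius_list[label - 1]    # 1st list element -> class 1
        dist2 = ((zz - obj["z"]) ** 2
                 + (yy - obj["y"]) ** 2
                 + (xx - obj["x"]) ** 2)
        target[dist2 <= radius ** 2] = label
    return target


objl = [{"label": 1, "x": 5, "y": 5, "z": 5},
        {"label": 2, "x": 14, "y": 10, "z": 8}]
initial = np.zeros((16, 20, 24), dtype=np.int8)
target = stamp_spheres(objl, initial, radius_list=[2, 3])
```

Starting from a non-zero `target_array` (for example one that already holds a membrane annotation) leaves those voxels in place wherever no sphere is written.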
- class deepfinder.training.Train(Ncl, dim_in)
- launch(path_data, path_target, objlist_train, objlist_valid)
This function launches the training procedure. For each epoch, an image is plotted, displaying the progression with different metrics: loss, accuracy, f1-score, recall, precision. Every 10 epochs, the current network weights are saved.
- Parameters
path_data (list of string) – contains paths to data files (i.e. tomograms)
path_target (list of string) – contains paths to target files (i.e. annotated volumes)
objlist_train (list of dictionaries) – contains information about annotated objects (e.g. class, position). In particular, the tomo_idx should correspond to the index of ‘path_data’ and ‘path_target’. See utils/objl.py for more info about object lists. During training, these coordinates are used for guiding the patch sampling procedure.
objlist_valid (list of dictionaries) – same as ‘objlist_train’, but the objects contained in this list are used for validation rather than training. This makes it possible to monitor the training and check for over/under-fitting. Ideally, the validation objects should originate from different tomograms than the training objects.
Note
- The function saves the following files at regular intervals:
net_weights_epoch*.h5: contains current network weights
net_train_history.h5: contains arrays with all metrics per training iteration
net_train_history_plot.png: plotted metric curves
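Since every object's tomo_idx must point into ‘path_data’ and ‘path_target’, a quick consistency check before launching a training run can catch mismatches early. The helper below is a hypothetical sketch, not part of DeepFinder; it assumes 0-based indices and the key name `'tomo_idx'`, and the file names are placeholders.

```python
def check_tomo_indices(objl, path_data, path_target):
    """Return the objects whose 'tomo_idx' does not index both path lists."""
    if len(path_data) != len(path_target):
        raise ValueError("path_data and path_target must have the same length")
    return [obj for obj in objl
            if not 0 <= obj["tomo_idx"] < len(path_data)]


path_data = ["tomo0.mrc", "tomo1.mrc"]          # placeholder file names
path_target = ["target0.mrc", "target1.mrc"]    # placeholder file names
objlist_train = [{"tomo_idx": 0, "label": 1},
                 {"tomo_idx": 1, "label": 2},
                 {"tomo_idx": 2, "label": 1}]   # index 2 has no matching file

bad = check_tomo_indices(objlist_train, path_data, path_target)
```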
Inference
- class deepfinder.inference.Segment(Ncl, path_weights, patch_size=192)
- launch(dataArray)
This function segments a tomogram. Because tomograms are too large to be processed in one pass, the tomogram is decomposed into smaller overlapping 3D patches.
- Parameters
dataArray (3D numpy array) – the volume to be segmented
weights_path (str) – path to the .h5 file containing the network weights obtained by the training procedure
- Returns
contains predicted score maps. Array with index order [class,z,y,x]
- Return type
numpy array
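The decomposition into overlapping patches can be illustrated along a single axis. The sketch below is not DeepFinder's tiling scheme; the function name `patch_starts` and the overlap value are assumptions chosen only to show how overlapping patches can cover a volume up to its border.

```python
def patch_starts(dim, patch_size, overlap):
    """Start indices of overlapping patches covering range(dim) on one axis."""
    if patch_size >= dim:
        return [0]                       # a single patch covers the axis
    stride = patch_size - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than patch_size")
    starts = list(range(0, dim - patch_size + 1, stride))
    if starts[-1] != dim - patch_size:
        starts.append(dim - patch_size)  # last patch sits flush with the border
    return starts


starts = patch_starts(dim=500, patch_size=192, overlap=32)
```

Applying the same computation to the z, y and x axes gives the corner of every 3D patch; the predictions in the overlapping regions then have to be combined, for instance by averaging.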
- class deepfinder.inference.Cluster(clustRadius)
- launch(labelmap)
This function analyzes the segmented tomograms (i.e. labelmap), identifies individual macromolecules and outputs their coordinates. This is achieved with a clustering algorithm (meanshift).
- Parameters
labelmap (3D numpy array) – segmented tomogram
clustRadius (int) – parameter for clustering algorithm. Corresponds to average object radius (in voxels)
- Returns
the object list with coordinates and class labels of identified macromolecules
- Return type
list of dict
Utilities
Common utils
- deepfinder.utils.common.bin_array(array)
Subsamples a 3D array by a factor of 2. Subsampling is performed by averaging voxel values in 2x2x2 tiles.
- Parameters
array (numpy array) –
- Returns
binned array
- Return type
numpy array
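Averaging over 2x2x2 tiles can be written compactly with a reshape. This is a sketch of the technique rather than the library's implementation; `bin_by_two` is a hypothetical name, and dropping the trailing voxel of odd-sized dimensions is an assumption.

```python
import numpy as np


def bin_by_two(array):
    """Average 2x2x2 tiles of a 3D array (odd trailing voxels are dropped)."""
    z, y, x = (s // 2 for s in array.shape)
    cropped = array[:2 * z, :2 * y, :2 * x]
    # Split each axis into (tile index, position inside tile), then average
    # over the three "inside tile" axes.
    return cropped.reshape(z, 2, y, 2, x, 2).mean(axis=(1, 3, 5))


vol = np.arange(4 * 4 * 4, dtype=np.float32).reshape(4, 4, 4)
binned = bin_by_two(vol)
```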
- deepfinder.utils.common.plot_volume_orthoslices(vol, filename)
Writes an image file containing ortho-slices of the input volume. Generates same visualization as matlab function ‘tom_volxyz’ from TOM toolbox. If volume type is int8, the function assumes that the volume is a labelmap, and hence plots in color scale. Else, it assumes that the volume is tomographic data, and plots in gray scale.
- Parameters
vol (3D numpy array) –
filename (str) – ‘/path/to/file.png’
- deepfinder.utils.common.read_array(filename, dset_name='dataset')
Reads arrays. Handles .h5 and .mrc files, according to what extension the file has.
- Parameters
filename (str) – ‘/path/to/file.ext’ with ‘.ext’ either ‘.h5’ or ‘.mrc’
dset_name (str, optional) – h5 dataset name. Not necessary to specify when reading .mrc
- Returns
numpy array
- deepfinder.utils.common.write_array(array, filename, dset_name='dataset')
Writes array. Can write .h5 and .mrc files, according to the extension specified in filename.
- Parameters
array (numpy array) –
filename (str) – ‘/path/to/file.ext’ with ‘.ext’ either ‘.h5’ or ‘.mrc’
dset_name (str, optional) – h5 dataset name. Not necessary to specify when writing .mrc
Object list utils
- deepfinder.utils.objl.above_thr(objlIN, thr)
- Parameters
objlIN (list of dict) –
thr (float) – threshold
- Returns
contains only objects with cluster size >= thr
- Return type
list of dict
- deepfinder.utils.objl.disp(objlIN)
Prints objl in terminal
- deepfinder.utils.objl.get_class(objlIN, label)
Get all objects of specified class.
- Parameters
objlIN (list of dict) –
label (int) –
- Returns
contains only objects from class ‘label’
- Return type
list of dict
- deepfinder.utils.objl.get_labels(objlIN)
Returns a list with different (unique) labels contained in input objl
- deepfinder.utils.objl.get_obj(objl, obj_id)
Get objects with specified object ID.
- Parameters
objl (list of dict) – input object list
obj_id (list of int) – object ID of wanted object(s)
- Returns
contains object(s) with obj ID ‘obj_id’
- Return type
list of dict
- deepfinder.utils.objl.get_tomo(objlIN, tomo_idx)
Get all objects originating from tomo ‘tomo_idx’.
- Parameters
objlIN (list of dict) – contains objects from various tomograms
tomo_idx (int) – tomogram index
- Returns
contains objects from tomogram ‘tomo_idx’
- Return type
list of dict
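Because an object list is a plain list of dictionaries, the filtering helpers documented so far (above_thr, get_class, get_tomo) amount to list comprehensions. The sketch below mimics their documented behaviour with hypothetical function names; the keys `'label'`, `'cluster_size'` and `'tomo_idx'` are assumed.

```python
def filter_by_class(objl, label):
    """Keep only objects of class 'label'."""
    return [obj for obj in objl if obj["label"] == label]


def filter_by_cluster_size(objl, thr):
    """Keep only objects with cluster size >= thr."""
    return [obj for obj in objl if obj["cluster_size"] >= thr]


def filter_by_tomo(objl, tomo_idx):
    """Keep only objects originating from tomogram 'tomo_idx'."""
    return [obj for obj in objl if obj["tomo_idx"] == tomo_idx]


objl = [{"tomo_idx": 0, "label": 1, "cluster_size": 120},
        {"tomo_idx": 0, "label": 2, "cluster_size": 15},
        {"tomo_idx": 1, "label": 1, "cluster_size": 40}]

# Filters return new lists, so they can be chained.
confident_class_1 = filter_by_cluster_size(filter_by_class(objl, 1), 100)
```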
- deepfinder.utils.objl.read(filename)
Reads object list. Handles .xml and .xlsx files, according to what extension the file has.
- Parameters
filename (str) – ‘/path/to/file.ext’ with ‘.ext’ either ‘.xml’ or ‘.xlsx’
- Returns
list of dict
- deepfinder.utils.objl.remove_class(objl, label_list)
Removes all objects from specified classes.
- Parameters
objl (list of dict) – input object list
label_list (list of int) – label of objects to remove
- Returns
same as input object list but with objects from classes ‘label_list’ removed
- Return type
list of dict
- deepfinder.utils.objl.remove_obj(objl, obj_id)
Removes objects by object ID.
- Parameters
objl (list of dict) – input object list
obj_id (list of int) – object ID of object(s) to remove
- Returns
same as input object list but with object(s) ‘obj_id’ removed
- Return type
list of dict
- deepfinder.utils.objl.scale_coord(objlIN, scale)
Scales coordinates by specified factor. Useful when using binned (sub-sampled) volumes, where coordinates need to be multiplied or divided by 2.
- Parameters
objlIN (list of dict) –
scale (float, int or tuple) – if float or int, same scale is applied to all dim
- Returns
object list with scaled coordinates
- Return type
list of dict
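As an illustration of the scaling described above, the following sketch multiplies the coordinates of each object and leaves the other fields untouched. It is not the library's code: `scale_coordinates` is a hypothetical name, the keys `'x'`, `'y'`, `'z'` are assumed, and the (x, y, z) order of a tuple scale is an assumption.

```python
def scale_coordinates(objl, scale):
    """Return a new object list with x, y, z multiplied by 'scale'."""
    if isinstance(scale, (int, float)):
        scale = (scale, scale, scale)    # same factor for all dimensions
    sx, sy, sz = scale                   # assumed order: (x, y, z)
    scaled = []
    for obj in objl:
        new_obj = dict(obj)              # copy, so the input is not modified
        new_obj["x"] = obj["x"] * sx
        new_obj["y"] = obj["y"] * sy
        new_obj["z"] = obj["z"] * sz
        scaled.append(new_obj)
    return scaled


# Coordinates found in a volume binned by 2, mapped back to full resolution.
objl_binned = [{"label": 1, "x": 10, "y": 20, "z": 30}]
objl_full = scale_coordinates(objl_binned, 2)
```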
- deepfinder.utils.objl.write(objl, filename)
Writes object list. Can write .xml and .xlsx files, according to the extension specified in filename.
- Parameters
objl (list of dict) –
filename (str) – ‘/path/to/file.ext’ with ‘.ext’ either ‘.xml’ or ‘.xlsx’
Scoremap utils
- deepfinder.utils.smap.bin(scoremaps)
Subsamples the scoremaps by a factor of 2. Subsampling is performed by averaging voxel values in 2x2x2 tiles.
- Parameters
scoremaps (4D numpy array) – array with index order [class,z,y,x]
- Returns
4D numpy array
- deepfinder.utils.smap.read_h5(filename)
Reads scoremaps stored in .h5 file.
- Parameters
filename (str) – path to file. This .h5 file has one dataset per class (dataset ‘/class*’ contains scoremap of class *)
- Returns
scoremaps array with index order [class,z,y,x]
- Return type
4D numpy array
- deepfinder.utils.smap.to_labelmap(scoremaps)
Converts scoremaps into a labelmap.
- Parameters
scoremaps (4D numpy array) – array with index order [class,z,y,x]
- Returns
array with index order [z,y,x]
- Return type
3D numpy array
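One plausible way to turn scoremaps into a labelmap is to pick, for each voxel, the class with the highest score. The sketch below assumes this argmax rule, which the text above does not confirm as the library's method; the cast to int8 is chosen only because plot_volume_orthoslices treats int8 volumes as labelmaps.

```python
import numpy as np

# Toy scoremaps with index order [class, z, y, x]: 3 classes, 2x2x2 volume.
scoremaps = np.zeros((3, 2, 2, 2), dtype=np.float32)
scoremaps[0] = 0.2                 # background score everywhere
scoremaps[1, 0, 0, 0] = 0.9        # class 1 wins at voxel (0, 0, 0)
scoremaps[2, 1, 1, 1] = 0.7        # class 2 wins at voxel (1, 1, 1)

# Argmax over the class axis gives one label per voxel, index order [z, y, x].
labelmap = np.argmax(scoremaps, axis=0).astype(np.int8)
```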