Motl#

class cryocat.core.cryomotl.Motl(motl_df=None)#

Bases: object

adapt_to_trimming(trim_coord_start, trim_coord_end)#

Adapts particle coordinates to a trimmed tomogram. The trim_coord_start and trim_coord_end values are the coordinates used for trimming the tomogram; particle coordinates are changed to correspond to the trimmed tomogram, and particles from the trimmed-away area are removed.

Parameters:
trim_coord_start: numpy.ndarray

Starting coordinates of the trimming used on the tomogram. The order of coordinates is x, y, z.

trim_coord_end: numpy.ndarray

Ending coordinates of the trimming used on the tomogram. The order of coordinates is x, y, z.

Returns:
None

Notes

Currently only one set of trimming coordinates is supported, and it is applied to all particles. Multiple trimming values (specified e.g. by tomo_id) do not work.

This method modifies the df attribute of the object.
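The coordinate adjustment described above can be sketched with plain pandas and NumPy. This is a minimal illustration, not cryocat's implementation: the helper name adapt_to_trimming_sketch is hypothetical, the trim start is assumed to become the new origin, both bounds are treated as inclusive, and shifts are ignored.

```python
import numpy as np
import pandas as pd


def adapt_to_trimming_sketch(df, trim_coord_start, trim_coord_end):
    """Shift x, y, z into the trimmed frame and drop particles outside it."""
    start = np.asarray(trim_coord_start, dtype=float)
    end = np.asarray(trim_coord_end, dtype=float)
    coords = df[["x", "y", "z"]].to_numpy(dtype=float)
    # Keep only particles inside the trimmed region (inclusive bounds, assumed).
    inside = np.all((coords >= start) & (coords <= end), axis=1)
    out = df.loc[inside].copy()
    # Assumed convention: the trim start becomes the new origin.
    out[["x", "y", "z"]] = coords[inside] - start
    return out.reset_index(drop=True)


particles = pd.DataFrame(
    {"x": [5.0, 50.0, 120.0], "y": [5.0, 60.0, 70.0], "z": [5.0, 20.0, 30.0]}
)
trimmed = adapt_to_trimming_sketch(particles, [10, 10, 10], [100, 100, 40])
```

Only the second particle lies inside the trimmed region, so it is the only one kept, and its coordinates are expressed relative to the trim start.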

apply_rotation(rotation)#

The apply_rotation function applies a rotation to the angles in the DataFrame.

Parameters:
rotation: scipy.spatial.transform._rotation.Rotation

A scipy rotation object representing the rotation.

Returns:
None

Notes

This method modifies the df attribute of the object.

Examples

>>> # If you have a reference and you want the average coming from your motl to be aligned with that reference
>>> # you can use ChimeraX fit function to fit the average to the reference. In the log file you will get output
>>> # that looks like this:
>>> #
>>> # Matrix rotation and translation
>>> # 0.50840235 0.86106603 -0.00960989 -17.20630613
>>> # -0.86111334 0.50832424 -0.00950164 65.37276832
>>> # -0.00329660 0.01310586 0.99990868 -0.66904104
>>> # Axis 0.01312604 -0.00366553 -0.99990713
>>> # Axis point 48.64783020 47.75956206 0.00000000
>>> # Rotation angle (degrees) 59.44816674
>>> # Shift along axis 0.20350209
>>> #
>>> # To apply this rotation on the motl that corresponds to the fitted average you run:
>>> import numpy as np
>>> from scipy.spatial.transform import Rotation as R
>>> motl = Motl.load("/path/to/the/motl.em")
>>> # Create the matrix from the ChimeraX output by leaving out the last column
>>> rot_matrix=np.array([ [0.50840235, 0.86106603, -0.00960989 ],
... [ -0.86111334, 0.50832424, -0.00950164 ],
... [ -0.00329660, 0.01310586, 0.99990868] ])
>>> rot=R.from_matrix(rot_matrix.T) # note that the matrix is transposed here
>>> motl.apply_rotation(rot)
>>> motl.write_out("rotated_motl.em")

apply_tomo_rotation(rotation_angles, tomo_id, tomo_dim)#

Apply tomogram rotation to the corresponding particles in the motl. The rotation angles can come e.g. from trimvol command or from slicer in etomo. Currently works only for one tomogram.

Parameters:
rotation_angles: array-like

Rotation angles in degrees corresponding to rotation around x, y, and z axis.

tomo_id: int

Tomo ID of the particles that should be rotated and shifted.

tomo_dim: array-like

Dimensions of the tomogram in x, y, z.

Returns:
feature_motl: Motl

A new motl with rotated and shifted particles.

assign_column(input_df, column_pairs)#

Assigns columns from input_df to self.df according to column_pairs. For each pair, the function checks whether the input_df column name (the dictionary value) exists in input_df. If it does, that column is assigned to the corresponding column (the dictionary key) in self.df.

Parameters:
input_df: pandas.DataFrame

The input DataFrame containing the columns to be assigned.

column_pairs: dict

A dictionary mapping the column names in self.df to the corresponding column names in input_df.

Returns:
None

Examples

>>> motl.assign_column(input_df, {'tomo_id': 'input_df_key1', 'subtomo_id': 'input_df_key2'})

assign_random_classes(number_of_classes)#

Assign a random class to each point in the motl.

Parameters:
number_of_classes: int

The total number of classes to choose from.

Returns:
None

Notes

This method modifies the df attribute of the object.

static check_df_correct_format(input_df)#

Check if the input DataFrame has the correct format.

Parameters:
input_df: pandas.DataFrame

The DataFrame to be checked.

Returns:
bool

True if the input DataFrame has the correct format, False otherwise.

check_df_type(input_motl)#

Checks the type of the input dataframe and assigns it to the class attribute ‘df’ if it is in the correct format. If it is not in the correct format it tries to convert it.

Parameters:
input_motl: pandas.DataFrame

The input dataframe to be checked.

Returns:
None

Notes

This function is meant to be called by child classes as the possible conversion is not implemented within this class.

clean_by_distance(distance_in_voxels, feature_id, metric_id='score', keep_greater=True, dist_mask=None)#

Cleans df by removing particles closer than a given distance threshold (in voxels).

Parameters:
distance_in_voxels: float

The distance cutoff in voxels.

feature_id: str

The ID of the feature by which the particles are grouped before cleaning.

metric_id: str, default=’score’

The ID of the metric used to decide which particles to keep. Defaults to “score”.

keep_greater: bool, default=True

Whether to keep the particles with the greater (True) or lower (False) metric value. Defaults to True.

dist_mask: str or ndarray

Binary mask/map (or path to it) for directional cleaning. If provided the distance_in_voxels is used to find all points within this radius and then those points in the region where the mask is 1 will be cleaned. Defaults to None.

Returns:
None

Notes

This method modifies the df attribute of the object.
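One common way to implement this kind of cleaning is a greedy pass over the particles sorted by the metric. The sketch below illustrates that idea for a single group of particles. It is an assumption about the general technique, not cryocat's code: the function name is hypothetical, grouping by feature_id and the dist_mask option are left out, and a brute-force distance computation is used.

```python
import numpy as np


def clean_by_distance_sketch(coords, scores, distance, keep_greater=True):
    """Return sorted indices of particles kept after distance-based cleaning."""
    coords = np.asarray(coords, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Visit particles from best to worst according to the metric.
    order = np.argsort(-scores if keep_greater else scores, kind="stable")
    removed = np.zeros(len(coords), dtype=bool)
    kept = []
    for i in order:
        if removed[i]:
            continue
        kept.append(int(i))
        # Everything closer than the cutoff to a kept particle is discarded.
        dist = np.linalg.norm(coords - coords[i], axis=1)
        removed |= dist < distance
    return sorted(kept)


coords = [[0, 0, 0], [1, 0, 0], [10, 0, 0], [10.5, 0, 0]]
scores = [0.2, 0.9, 0.5, 0.4]
kept_high = clean_by_distance_sketch(coords, scores, distance=2.0)
kept_low = clean_by_distance_sketch(coords, scores, distance=2.0, keep_greater=False)
```

The four points form two close pairs. With keep_greater=True the higher-scoring member of each pair survives, and with keep_greater=False the lower-scoring one does.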

clean_by_distance_to_points(points, radius_in_voxels, feature_id='tomo_id', inplace=True, output_file=None)#

Cleans the motl by removing points that are within a specified radius of any point in the provided dataframe with points.

Parameters:
points: DataFrame

DataFrame containing the coordinates of points to check against. It has to contain column specified by the param feature_id and also columns x,y,z.

radius_in_voxels: float

The radius within which points will be considered for removal, in voxel units.

feature_id: str, default=”tomo_id”

The column name that identifies how the particles in motl should be grouped. Defaults to ‘tomo_id’.

inplace: bool, default=True

If True, modifies the motl in place. If False, returns a new Motl object. Defaults to True.

output_file: str or None, optional

If specified, the cleaned Motl object will be written to this file. Defaults to None.

Returns:
Motl or None

If inplace is False, returns a new Motl object containing the cleaned DataFrame. Otherwise, returns None.

Notes

The function uses a KDTree for efficient spatial queries, which can significantly speed up the process of finding nearby points. The function assumes that the input motl and the points DataFrame have columns ‘x’, ‘y’, and ‘z’ that represent coordinates and also columns specified by feature_id.
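The KDTree query mentioned in the notes can be sketched as follows for a single group of particles. This is an illustration under stated assumptions rather than cryocat's implementation: the helper name is hypothetical, scipy.spatial.cKDTree stands in for whichever tree class the library uses, and particles lying exactly at the radius are treated as within it.

```python
import numpy as np
from scipy.spatial import cKDTree


def keep_mask_far_from_points(particle_xyz, points_xyz, radius_in_voxels):
    """Return a boolean mask that is True for particles to keep."""
    tree = cKDTree(np.asarray(points_xyz, dtype=float))
    # Distance from every particle to its nearest reference point.
    nearest_distance, _ = tree.query(np.asarray(particle_xyz, dtype=float), k=1)
    return nearest_distance > radius_in_voxels


particle_xyz = [[0, 0, 0], [5, 0, 0], [20, 0, 0]]
points_xyz = [[1, 0, 0], [4, 0, 0]]
keep = keep_mask_far_from_points(particle_xyz, points_xyz, radius_in_voxels=2.0)
```

In the full method this query would be repeated for each value of feature_id, so that particles are only compared with points from the same group.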

clean_by_otsu(feature_id, histogram_bin=None, global_level=False)#

Clean the DataFrame by applying Otsu’s thresholding algorithm on the scores.

Parameters:
feature_id: str

The feature ID to be used to group particles for cleaning.

histogram_bin: int, optional

The number of bins for the histogram. If not provided, a default value will be used based on the feature ID. Defaults to None.

global_level: bool, default=False

Whether to compute the Otsu threshold for particles grouped by feature_id at the dataset level instead of per tomogram.

Returns:
None
Raises:
UserInputError

If the selected feature ID does not correspond to either “tomo_id” or “object_id” and histogram_bin is not specified.

Notes

This method modifies the df attribute of the object.
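For readers unfamiliar with Otsu's method, the following sketch computes the threshold from a histogram of scores by maximizing the between-class variance. It is a generic NumPy version of the algorithm, not necessarily the routine cryocat calls, and the function name otsu_threshold is hypothetical. Keeping particles with scores above the threshold is also an assumption.

```python
import numpy as np


def otsu_threshold(scores, bins):
    """Return the bin center that maximizes the between-class variance."""
    counts, edges = np.histogram(scores, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    # Number of particles in the lower / upper class for every possible split.
    weight_low = np.cumsum(counts)
    weight_high = np.cumsum(counts[::-1])[::-1]
    # Mean score of each class (guarding against empty classes).
    mean_low = np.cumsum(counts * centers) / np.maximum(weight_low, 1)
    mean_high = (
        np.cumsum((counts * centers)[::-1]) / np.maximum(weight_high[::-1], 1)
    )[::-1]
    between_class_variance = (
        weight_low[:-1] * weight_high[1:] * (mean_low[:-1] - mean_high[1:]) ** 2
    )
    return centers[np.argmax(between_class_variance)]


scores = np.array([0.1] * 50 + [0.9] * 50)
threshold = otsu_threshold(scores, bins=10)
kept_scores = scores[scores > threshold]  # assumed: higher scores are kept
```

The number of bins plays the role of histogram_bin: a coarser histogram gives a coarser threshold.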

clean_by_tomo_mask(tomo_list, tomo_masks, inplace=True, boundary_type='center', box_size=None, output_file=None)#

Removes particles from the motive list based on provided tomogram masks.

Parameters:
tomo_list: str, array-like, or int

Tomogram indices specifying the masks provided. See cryocat.ioutils.tlt_load() for more information on formatting.

tomo_masks: list, array-like or str

List of paths to tomogram masks or list of np.ndarrays with the masks loaded. If a single path/np.ndarray is specified (instead of the list), the same mask will be used for all tomograms specified in the tomo_list.

inplace: bool, default=True

If True, the original instance of the motl is changed. If False, a new instance of Motl is created and returned, and the original motive list remains unchanged. Defaults to True.

boundary_type: {“center”, “whole”}

Specify whether only the center should be part of the mask (“center”) or the whole box (“whole”). In the latter case, the box_size has to be specified as well. Defaults to “center”.

box_size: int, optional

Size of the subtomogram box. Required if boundary_type is set to “whole”. Defaults to None.

output_file: str, optional

Path to save the cleaned motive list. If not provided, the motive list is not saved. Defaults to None.

Returns:
cleaned_motl: Motl

The motive list after removing particles based on the mask. Only if inplace is set to False.

Raises:
ValueError

If the number of tomograms does not match the number of provided masks when tomo_masks is a list.

Notes

The function loads tomograms and their corresponding masks, binarizes the masks, and then filters out particles in the motive list that fall into masked-out (zero-valued) areas of the tomograms. If output_file is provided, the cleaned motive list is saved to this file.

When inplace is False, the function creates a new instance of a Motl and does not alter the original one.

compute_otsu_threshold(feature_id, hbin)#

Compute Otsu threshold on the Motl ‘score’ value after grouping the particles by a desired feature. This function generates a plot of the particle distribution for each value of the feature and overlays the threshold value.

Parameters:
feature_id: str

The feature ID to be used to group particles.

hbin: int

The number of bins for the histogram.

Returns:
pandas.DataFrame

Dataframe in Motl format containing the particles filtered according to the Otsu threshold.

Examples

>>> original_motl = cryomotl.Motl.load("my_motl.em")
>>> filtered_motl_df = original_motl.compute_otsu_threshold(feature_id='class', hbin=40)

convert_to_motl(input_df)#

Abstract method implemented only within child classes.

Parameters:
input_df: pandas.DataFrame
Returns:
None
Raises:
UserInputError

In case this function is called from Motl. The provided input_df should be in the correct format; no conversion is possible.

static create_empty_motl_df()#

Creates an empty DataFrame with the columns defined in cryocat.cryomotl.Motl.motl_columns.

Parameters:
None
Returns:
pandas.DataFrame

An empty DataFrame with the columns defined in cryocat.cryomotl.Motl.motl_columns.

drop_duplicates(duplicates_column='subtomo_id', decision_column='score', decision_sort_ascending=False)#

Drop duplicates based on a specified column and keep the first occurrence with the highest/lowest score.

Parameters:
duplicates_column: str, default=”subtomo_id”

The column based on which duplicates will be dropped. Defaults to subtomo_id.

decision_column: str, default=”score”

The column used to decide which duplicate to keep. Defaults to score.

decision_sort_ascending: bool, default=False

Whether to sort the decision column in ascending order. Defaults to False (i.e., it will sort in descending order.)

Returns:
None

Notes

This method modifies the df attribute of the object.
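The behaviour described above (sort by the decision column, then keep the first row of each duplicate group) maps directly onto two pandas calls. The sketch below is a standalone illustration with a hypothetical function name. Restoring the original row order afterwards is a choice made here, not something the documentation specifies.

```python
import pandas as pd


def drop_duplicates_sketch(
    df,
    duplicates_column="subtomo_id",
    decision_column="score",
    decision_sort_ascending=False,
):
    """Keep one row per duplicates_column value, chosen by decision_column."""
    ordered = df.sort_values(decision_column, ascending=decision_sort_ascending)
    # After sorting, the first occurrence is the highest (or lowest) value.
    unique = ordered.drop_duplicates(subset=duplicates_column, keep="first")
    return unique.sort_index().reset_index(drop=True)


table = pd.DataFrame({"subtomo_id": [1, 1, 2], "score": [0.3, 0.8, 0.5]})
keep_highest = drop_duplicates_sketch(table)
keep_lowest = drop_duplicates_sketch(table, decision_sort_ascending=True)
```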

Examples

>>> m=cryomotl.Motl.load("example_motl.em")
>>> # remove entries with duplicated subtomo_id values, keeping the ones with larger score
>>> m.drop_duplicates()
>>> # remove entries with duplicated subtomo_id values, keeping the ones with lower score
>>> m.drop_duplicates(decision_sort_ascending=True)
>>> # remove entries with duplicated subtomo_id values, keeping the ones with lower geom1 value
>>> m.drop_duplicates(decision_column="geom1", decision_sort_ascending=True)
>>> # remove entries with duplicated object_id values, keeping the ones with lower geom1 value
>>> m.drop_duplicates(duplicates_column="object_id", decision_column="geom1", decision_sort_ascending=True)

fill(input_dict)#

The fill function is used to fill in the values of one or more columns in the motl. The input_dict argument should be a dictionary whose keys are either column names or one of the special keys coord, angles, and shifts.

Parameters:
input_dict: dict

Dictionary with keys from cryocat.cryomotl.Motl.motl_columns and new values to be assigned. Three special keys are allowed: coord (which will assign values to x, y, z columns), angles (which will assign values to phi, theta, psi), and shifts (which will assign values to shift_x, shift_y, shift_z).

Returns:
None

Notes

This method modifies the df attribute of the object.
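The expansion of the three special keys can be pictured with a small pandas sketch. It is only an illustration of the mapping described above: the names SPECIAL_KEYS and fill_sketch are hypothetical, and the values for special keys are assumed to be array-like of shape (N, 3).

```python
import numpy as np
import pandas as pd

# Special keys and the three columns each one expands to (from the text above).
SPECIAL_KEYS = {
    "coord": ["x", "y", "z"],
    "angles": ["phi", "theta", "psi"],
    "shifts": ["shift_x", "shift_y", "shift_z"],
}


def fill_sketch(df, input_dict):
    """Assign values to single columns or to the column triplets above."""
    for key, value in input_dict.items():
        if key in SPECIAL_KEYS:
            # One (N, 3) block is spread over the three target columns.
            df[SPECIAL_KEYS[key]] = np.asarray(value, dtype=float)
        else:
            df[key] = value
    return df


table = pd.DataFrame(0.0, index=range(2), columns=["x", "y", "z", "score"])
fill_sketch(table, {"coord": [[1, 2, 3], [4, 5, 6]], "score": 0.5})
```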

flip_handedness(tomo_dimensions=None)#

Flip the handedness of the particles in the motl.

Parameters:
tomo_dimensions: str or pandas.DataFrame or array-like, optional

Dimensions of tomograms in the motl. If not provided, only the orientation is changed. For specification on tomo_dimensions format see cryocat.ioutils.dimensions_load(). Defaults to None.

Returns:
None

Notes

Orientation is flipped by changing the sign of the theta angle (following ZXZ convention of Euler angles).

The position flip is performed by subtracting the z-coordinate from the maximum z-dimension and adding 1.

This method modifies the df attribute of the object.
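The two operations from the notes can be written out for a single tomogram as follows. This is a sketch with a hypothetical helper name. It assumes one z-dimension for all rows, whereas the real method accepts per-tomogram dimensions.

```python
import pandas as pd


def flip_handedness_sketch(df, z_dimension=None):
    """Negate theta and, if a z-dimension is given, mirror z positions."""
    out = df.copy()
    out["theta"] = -out["theta"]
    if z_dimension is not None:
        # z -> (max z-dimension) - z + 1, as stated in the notes.
        out["z"] = z_dimension - out["z"] + 1
    return out


table = pd.DataFrame({"theta": [30.0, -45.0], "z": [1, 200]})
flipped = flip_handedness_sketch(table, z_dimension=200)
```

With a z-dimension of 200, the first and last slices swap places: z = 1 maps to 200 and z = 200 maps to 1.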

Examples

>>> motl.flip_handedness(tomo_dimensions="dimensions.txt")

get_angles(tomo_number=None)#

This function takes in a tomo_number and returns the angles of all particles in that tomogram. If no tomo_number is given, it will return the angles of all particles.

Parameters:
tomo_number: int, optional

The tomogram number. If not provided, all angles will be returned. Defaults to None.

Returns:
numpy.ndarray

An array of angles in the format [phi, theta, psi] (corresponds to the zxz Euler convention).

get_barycentric_motl(idx, nn_idx)#

Returns a new Motl object with coordinates corresponding to the barycentric coordinates of the particles (specified by their indices within idx) and their nearest neighbors (specified by their indices within nn_idx).

Parameters:
idx: array-like

Array (type int) with indices of particles to be used for the analysis.

nn_idx: array-like

Array (type int) with indices of nearest neighbors to be used to compute the barycentric coordinates. The dimensions are (N, x) where N corresponds to the number of particles (same as in idx) and x is the number of nearest neighbors. If x equals 1, the barycentric coordinates are computed between two points (specified by idx and nn_idx). If x equals 2, the barycentric coordinates correspond to the barycentric coordinates of a triangle specified by the 3 points (one from idx, two from nn_idx). In theory, x can be arbitrarily large.

Returns:
Motl

A new Motl object with coordinates corresponding to the barycentric centers.

Notes

TODO: Move the geometry specific computation to the cryocat.geom.
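The (N, x) layout of nn_idx is easiest to see in a small NumPy sketch. Here the barycentric center is taken to be the arithmetic mean of each particle and its neighbors, which is an assumption about the computation, and the function name barycenters is hypothetical.

```python
import numpy as np


def barycenters(coords, idx, nn_idx):
    """Mean position of each particle in idx and its neighbors in nn_idx."""
    coords = np.asarray(coords, dtype=float)
    idx = np.asarray(idx, dtype=int)
    nn_idx = np.asarray(nn_idx, dtype=int).reshape(len(idx), -1)
    # Stack each particle with its neighbors: shape (N, 1 + x, 3).
    groups = np.concatenate([coords[idx][:, None, :], coords[nn_idx]], axis=1)
    return groups.mean(axis=1)


coords = [[0, 0, 0], [3, 0, 0], [0, 3, 0]]
triangle_center = barycenters(coords, idx=[0], nn_idx=[[1, 2]])  # x = 2
pair_centers = barycenters(coords, idx=[0, 1], nn_idx=[1, 2])    # x = 1
```

With x = 1 the result is the midpoint between each particle and its single neighbor. With x = 2 it is the centroid of the triangle.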

get_coordinates(tomo_number=None)#

This function takes in a tomo_number and returns the coordinates of all particles in that tomogram. If no tomo_number is given, it will return the coordinates of all particles. The coordinates are computed as x + shift_x, y + shift_y, z + shift_z.

Parameters:
tomo_number: int, optional

The tomogram number. If not provided, all coordinates will be returned. Defaults to None.

Returns:
numpy.ndarray

3D array of coordinates in the format [x + shift_x, y + shift_y, z + shift_z].

get_feature(feature_id)#

Returns the values from the column in self.df specified by feature_id.

Parameters:
feature_id: str

The column name to get the values for.

Returns:
numpy.ndarray

Values corresponding to the feature_id.

Raises:
UserInputError

In case the feature_id is not an existing column in the self.df dataframe.

get_max_number_digits(feature_id='tomo_id')#

This function returns the maximum number of digits in the column specified by feature_id.

Parameters:
feature_id: str, default=”tomo_id”

Specify the column name of the feature to get the max digits. Defaults to “tomo_id”.

Returns:
int

The maximum number of digits in the column specified by feature_id.

classmethod get_motl_intersection(motl1, motl2, feature_id='subtomo_id')#

Creates motl intersection of two motls based on feature_id.

Parameters:
motl1: Motl

First motl.

motl2: Motl

Second motl.

feature_id: str, default=”subtomo_id”

Feature ID to use for intersection. Defaults to “subtomo_id”.

Returns:
Motl

The intersection (based on feature_id) of two motls.

get_motl_subset(feature_values, feature_id='tomo_id', return_df=False, reset_index=True)#

Get a subset of the Motl object based on specified feature values.

Parameters:
feature_values: array-like or int

The feature values to filter the Motl object by.

feature_id: str, default=”tomo_id”

The name of the feature column to filter by. Defaults to “tomo_id”.

return_df: bool, default=False

Whether to return the filtered subset as a DataFrame. Defaults to False.

reset_index: bool, default=True

Whether to reset the index of the filtered subset. Defaults to True.

Returns:
pandas.DataFrame or Motl

If return_df is True, returns the filtered subset as a DataFrame. Otherwise, returns a new Motl object containing the filtered subset.

Warning

The default value of reset_index was changed from False to True.
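The practical effect of reset_index is on the row labels of the returned subset. The following pandas sketch shows the difference. It uses a hypothetical helper and assumes a plain isin-style filter.

```python
import numpy as np
import pandas as pd


def subset_sketch(df, feature_values, feature_id="tomo_id", reset_index=True):
    """Filter rows whose feature_id value is among feature_values."""
    values = np.atleast_1d(feature_values)  # accept a single int or array-like
    subset = df[df[feature_id].isin(values)]
    return subset.reset_index(drop=True) if reset_index else subset


table = pd.DataFrame({"tomo_id": [1, 2, 2, 3], "score": [0.1, 0.2, 0.3, 0.4]})
renumbered = subset_sketch(table, 2)
original_labels = subset_sketch(table, 2, reset_index=False)
```

Code that relies on the original row labels, for example to index back into the full table, needs reset_index=False.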

get_random_subset(number_of_particles)#

Generate a random subset of particles from the motl.

Parameters:
number_of_particles: int

Number of particles to select randomly.

Returns:
Motl object

A new motl containing the randomly selected subset of particles.

get_relative_position(idx, nn_idx)#

Returns a new Motl object with coordinates corresponding to the center between the particles (specified by their indices within idx) and their nearest neighbors (specified by their indices within nn_idx).

Parameters:
idx: array-like

Indices of particles to be used for the analysis.

nn_idx: array-like

Indices of nearest neighbors of particles specified in idx.

Returns:
Motl

A new Motl object with coordinates corresponding to the center between the particles and their nearest neighbors.

Notes

TODO: Move the geometry specific computation to the cryocat.geom.

get_rotations(tomo_number=None)#

The get_rotations function returns rotations for all particles.

Parameters:
tomo_number: int, optional

The tomogram number. If not provided, all rotations will be returned. Defaults to None.

Returns:
list

List of rotations (with type scipy.spatial.transform._rotation.Rotation) for all particles.

get_unique_values(feature_id)#

Get unique values from a specific feature.

Parameters:
feature_id: str

The ID of the feature.

Returns:
numpy.ndarray

A numpy.ndarray containing the unique values stored in the column feature_id.

classmethod load(input_motl, motl_type='emmotl', **kwargs)#

This function is a factory function that returns an instance of the appropriate Motl class.

Parameters:
input_motl: Motl or str or pandas.DataFrame

Either path to the motl or pandas.DataFrame in the format corresponding to general motl_df or in the format specific to the motl_type.

motl_type: str, {‘emmotl’, ‘dynamo’, ‘relion’, ‘stopgap’, ‘modmotl’}

A string indicating what type of Motl input should be loaded (emmotl, relion, stopgap, dynamo, modmotl). Defaults to emmotl.

**kwargs: dict

Additional keyword arguments to pass to the constructor of the specified Motl subclass. Implemented for RelionMotl; additional arguments: version, pixel_size, binning, optics_data.

Returns:
child of Motl

Subclass of the abstract class Motl specified by motl_type.

Raises:
UserInputError

In case the motl_type is not supported.

make_angles_canonical()#

classmethod merge_and_drop_duplicates(motl_list)#

Merge a list of Motl instances or paths to motl files to a single motl. Does not renumber particles - uniqueness has to be inherent to the instances!

Parameters:
motl_list: list

A list of Motl instances or paths.

Returns:
Motl instance

The merged Motl instance (or instance of the input class) with duplicates dropped; particles are not renumbered.

Raises:
UserInputError

If motl_list is not a list or is empty.

classmethod merge_and_renumber(motl_list)#

Merge a list of Motl instances or paths to motl files to a single motl. It renumbers its particles and objects to ensure uniqueness.

Parameters:
motl_list: list

A list of Motl instances or paths.

Returns:
Motl instance

The merged Motl instance (or instance of the input class) with renumbered objects and particles.

Raises:
UserInputError

If motl_list is not a list or is empty.

motl_columns = ['score', 'geom1', 'geom2', 'subtomo_id', 'tomo_id', 'object_id', 'subtomo_mean', 'x', 'y', 'z', 'shift_x', 'shift_y', 'shift_z', 'geom3', 'geom4', 'geom5', 'phi', 'psi', 'theta', 'class']#

static recenter_to_subparticle(input_motl, input_mask, rotation=None, motl_type='emmotl', **kwargs)#

Computes the center of mass of the provided binary mask and computes the necessary shift between the mask box center and the center of mass. This shift is applied to the motl positions. If rotation is specified it applies it to the shifted particles as well.

Parameters:
input_motl: Motl or str or pandas.DataFrame

Input motl to apply the recentering to (see cryocat.cryomotl.Motl.load() for more details on format).

input_mask: str or ndarray

Binary mask specified either as a file path or ndarray. The box size of the mask should correspond to the box size of the reference on which the mask was placed.

rotation: scipy.spatial.transform._rotation.Rotation, optional

Rotation to apply to the new positions. Defaults to None.

motl_type: str, optional

Type of motl to load if not standard Motl.

kwargs: dict

Additional keyword arguments passed to Motl.load.

Returns:
Motl

Motl with shifted coordinates.

Notes

This method modifies the df attribute of the object.

remove_feature(feature_id, feature_values)#

The function removes particles based on their feature (e.g. tomo number).

Parameters:
feature_id: str

Specify the feature based on which the particles will be removed.

feature_values: array-like

Specify which particles should be removed.

Returns:
None

Notes

This method modifies the df attribute of the object.

remove_out_of_bounds_particles(dimensions, boundary_type='center', box_size=None)#

Removes particles that are out of tomogram bounds.

Parameters:
dimensions: str or ndarray

Filepath or ndarray specifying tomograms’ dimensions.

boundary_type: str, {“center”, “whole”}

Specify whether only the center should be part of the tomogram (“center”) or the whole box (“whole”). In the latter case, the box_size has to be specified as well. Defaults to “center”.

box_size: int, optional

Size of the box/subtomogram. It has to be specified if boundary_type is “whole”. Defaults to None.

Returns:
None
Raises:
UserInputError

In case the boundary_type is “whole” and the box_size is not specified.

UserInputError

In case boundary_type is neither “whole” nor “center”.

Notes

This method modifies the df attribute of the object.
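The difference between the two boundary types can be sketched as a margin around each particle center. The snippet below is illustrative only: the function name is hypothetical, a single tomogram with zero-based bounds is assumed, the margin for “whole” is taken as half the box size, and ValueError stands in for cryocat's UserInputError.

```python
import numpy as np


def in_bounds_mask(coords, dimensions, boundary_type="center", box_size=None):
    """Return True for particles that lie inside the tomogram."""
    coords = np.asarray(coords, dtype=float)
    dimensions = np.asarray(dimensions, dtype=float)
    if boundary_type == "center":
        margin = 0.0
    elif boundary_type == "whole":
        if box_size is None:
            raise ValueError("box_size is required when boundary_type is 'whole'")
        margin = box_size / 2.0  # assumed: half a box on every side
    else:
        raise ValueError("boundary_type must be 'center' or 'whole'")
    # Assumed bounds: margin <= coordinate <= dimension - margin on each axis.
    return np.all((coords >= margin) & (coords <= dimensions - margin), axis=1)


coords = [[10, 10, 10], [2, 50, 50], [-3, 5, 5]]
center_mask = in_bounds_mask(coords, [100, 100, 100])
whole_mask = in_bounds_mask(coords, [100, 100, 100], "whole", box_size=8)
```

The second particle has its center inside the tomogram, but its box would extend past the edge, so it is kept only with boundary_type “center”.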

renumber_objects_sequentially(starting_number=1)#

Renumber objects sequentially, starting with 1 or provided number.

Parameters:
starting_number: int, default=1

The starting number for renumbering objects. The default is 1.

Returns:
None

Notes

This method modifies the df attribute of the object.

renumber_particles()#

This function renumbers the particles in a motl. This is useful when you want to reorder the particles in a motl, or if you have deleted some of them and need to renumber the remaining ones.

Parameters:
None
Returns:
None

Notes

Numbering starts with 1.

This method modifies the df attribute of the object.

scale_coordinates(scaling_factor)#

Scales coordinates (including shifts) by the scaling factor.

Parameters:
scaling_factor: float

Factor to scale the coordinates.

Returns:
None

Notes

This method modifies the df attribute of the object.

shift_positions(shift, inplace=True)#

Shifts the coordinates by the provided shift.

Parameters:
shift: numpy.ndarray

3D shift to be applied to the coordinates.

inplace: bool, default=True

Whether to return a new instance of the motl with shifted coordinates (False) or perform the shift on df directly (True). Defaults to True.

Returns:
new_motl: Motl

A new instance of motl with shifted coordinates (only if inplace is set to False).

Notes

This method modifies the df attribute of the object if inplace is True.

split_by_feature(feature_id, write_out=False, output_prefix='')#

Splits motl by the feature_id and optionally writes the resulting motls out.

Parameters:
feature_id: str

Specify the feature based on which the motl will be split.

write_out: bool, default=False

Whether to write out the motls. Defaults to False.

output_prefix: str, default=””

Prefix (including the path) of the motls to be written out. Used only if write_out is True. No separation character will be added - it has to be specified as the last character. Defaults to empty str.

Returns:
list

List of Motl split by given feature_id.

Warning

This method does not preserve a child class - it always returns Motl. Correspondingly, the individual motls - if written out - will be in “emmotl” format.

Notes

TODO: Make this function to be class specific.

split_in_asymmetric_subunits(symmetry, xyz_shift)#

Split the motive list into asymmetric subunits.

Parameters:
symmetry: str or number

Symmetry to be used. Currently cyclic and dihedral symmetry are supported. Cx or cx specify the cyclic symmetry of order x, Dx or dx dihedral symmetry of order x. If symmetry is specified as int/float, cyclic symmetry is assumed.

xyz_shift: numpy.ndarray

Shift by which the center of current particles should be shifted to be centered at first subunit.

Returns:
Motl

Split particle list.

Warning

This method does not preserve a child class - it always returns Motl.

update_coordinates()#

Applies the existing shifts to x, y, z positions, rounds the new coordinates and stores them as integer positions in x, y, z, and stores the remainders in the shifts. After the positions are updated, a new extraction of subtomograms is necessary.

Parameters:
None
Returns:
None

Notes

The rounding follows round-half-up convention, not the banker’s rounding which is default in Python.

This method modifies the df attribute of the object.
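The rounding convention matters because NumPy's and Python's built-in rounding send exact halves to the nearest even integer. A minimal sketch of round-half-up and of the split into integer positions and residual shifts, with hypothetical function names, could look like this:

```python
import numpy as np


def round_half_up(values):
    """Round to the nearest integer, with exact halves going up."""
    return np.floor(np.asarray(values, dtype=float) + 0.5)


def update_coordinates_sketch(coords, shifts):
    """Fold shifts into integer positions and return the leftover shifts."""
    shifted = np.asarray(coords, dtype=float) + np.asarray(shifts, dtype=float)
    new_coords = round_half_up(shifted)
    new_shifts = shifted - new_coords
    return new_coords, new_shifts


coords = np.array([[10.0, 20.0, 30.0]])
shifts = np.array([[0.5, -0.5, 1.2]])
new_coords, new_shifts = update_coordinates_sketch(coords, shifts)
bankers = np.round(coords + shifts)  # round-half-to-even, for comparison
```

For the shifted x value of 10.5, round-half-up gives 11, while np.round gives 10.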

write_out(output_path, motl_type='emmotl')#

Writes out a motl file to the specified output path.

Parameters:
output_path: str

The path to write the motl file to.

motl_type: str, {‘emmotl’, ‘dynamo’, ‘relion’, ‘stopgap’}

The type of motl file to write. Defaults to “emmotl”.

Returns:
None
Raises:
UserInputError

If the provided motl format is not supported.

write_to_model_file(feature_id, output_base, point_size, binning=1.0, zero_padding=None)#

Splits the dataframe based on feature_id and writes the parts out as mod files (from IMOD). The values in the “class” column are used to create different objects; the contour is always the same. This function requires IMOD’s point2model command to be available in PATH.

Parameters:
feature_id: str

Name of the feature (column) to split by.

output_base: str

The base for the output files. The final name of each mod file will have the form {output_base}_{feature_id}_{feature_id_value}, where the feature_id_value will be padded with zeros.

point_size: int

Size of the point that should be used.

binning: float, default=1.0

Scaling factor to apply to coordinates. Defaults to 1.0 which corresponds to no binning.

zero_padding: int, optional

Defines the zero padding for the feature_id_value for the output names (see above). If None, the length of the maximum value in feature_id is used. Defaults to None.

Returns:
None
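The naming pattern and the zero_padding default can be illustrated with a few lines of plain Python. The helper name model_file_names is hypothetical, integer feature values are assumed, and no file extension is added because the text above does not state one.

```python
def model_file_names(output_base, feature_id, feature_values, zero_padding=None):
    """Build {output_base}_{feature_id}_{feature_id_value} with zero padding."""
    if zero_padding is None:
        # Default: as many digits as the largest feature value has.
        zero_padding = len(str(max(feature_values)))
    return [
        f"{output_base}_{feature_id}_{str(value).zfill(zero_padding)}"
        for value in feature_values
    ]


default_names = model_file_names("run1", "tomo_id", [3, 17, 120])
padded_names = model_file_names("run1", "tomo_id", [3, 17], zero_padding=4)
```

Fixing zero_padding explicitly keeps file names aligned across motls whose largest feature values have different numbers of digits.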