Motl basics#
Motl stands for “motive list” and contains a list of particles and their properties. It can be used in subtomogram averaging or for contextual analysis.
Internally, the particle list is stored within a Motl class as a pandas DataFrame with 20 columns (see below) and N rows, where N is the number of particles.
Externally, the particle list can be loaded from and written to a file in any of the following formats: binary EM format (novaSTA, TOM/AV3, ArtiaX compatible), a RELION starfile (currently up to version 4.x), a STOPGAP starfile, an IMOD .mod file, and a simple CSV file.
The module cryomotl contains stand-alone functions for hassle-free conversions between formats, as well as the classes described below.
Motl class#
The Motl class is a parent class containing functions that are general to all formats. It is the class used for all manipulations applied to the particle list. The particle list itself is stored in the member variable df as a pandas DataFrame with the following columns:
“score” - a quality metric (typically cross-correlation value between the particle and the reference)
“geom1” - a free geometric property
“geom2” - a free geometric property
“subtomo_id” - a subtomogram id; IMPORTANT: many functions rely on this value being unique
“tomo_id” - the id of the tomogram to which the particle belongs
“object_id” - the id of the object to which the particle belongs
“subtomo_mean” - a mean value of the subtomogram
“x” - a position in the tomogram (an integer value), typically used for subtomogram extraction
“y” - a position in the tomogram (an integer value), typically used for subtomogram extraction
“z” - a position in the tomogram (an integer value), typically used for subtomogram extraction
“shift_x” - shift of the particle in X direction (a float value); the complete position of a particle is given by x + shift_x
“shift_y” - shift of the particle in Y direction (a float value); the complete position of a particle is given by y + shift_y
“shift_z” - shift of the particle in Z direction (a float value); the complete position of a particle is given by z + shift_z
“geom3” - a free geometric property
“geom4” - a free geometric property
“geom5” - a free geometric property
“phi” - a phi angle describing rotation around the first Z axis (following Euler zxz convention)
“psi” - a psi angle describing rotation around the second Z axis (following Euler zxz convention)
“theta” - a theta angle describing rotation around the X axis (following Euler zxz convention)
“class” - a class of the particle
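As a minimal sketch of this layout, the following builds a two-particle table with the 20 columns listed above and computes the complete position of each particle. It uses plain pandas and NumPy, independent of cryocat; the particle values and the name MOTL_COLUMNS are made up for illustration.

```python
import numpy as np
import pandas as pd

# Column names as listed above (20 in total); the list name is hypothetical.
MOTL_COLUMNS = [
    "score", "geom1", "geom2", "subtomo_id", "tomo_id", "object_id",
    "subtomo_mean", "x", "y", "z", "shift_x", "shift_y", "shift_z",
    "geom3", "geom4", "geom5", "phi", "psi", "theta", "class",
]

# Two hypothetical particles; every column starts at zero.
df = pd.DataFrame(0.0, index=range(2), columns=MOTL_COLUMNS)
df["subtomo_id"] = [1, 2]  # must be unique
df[["x", "y", "z"]] = np.array([[10, 20, 30], [40, 50, 60]])
df[["shift_x", "shift_y", "shift_z"]] = np.array(
    [[0.25, -0.5, 0.0], [1.5, 0.0, -0.75]]
)

# Complete position = integer coordinate + shift, per axis.
full_pos = (
    df[["x", "y", "z"]].to_numpy()
    + df[["shift_x", "shift_y", "shift_z"]].to_numpy()
)
print(full_pos)
```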
Child classes#
The subclasses include EmMotl, RelionMotl, StopgapMotl, DynamoMotl, and ModMotl; they provide functionality for smooth transitions between the conventions used by the respective software. They mostly contain functions for reading and writing particle lists and are used under the hood to ensure compatibility with the parent Motl class, whose functions we use to modify and inspect the particle lists.
Working with particle lists: Basic examples#
The following section provides some examples of how to use Motl objects in your analysis pipelines. For a complete list of functions, please refer to the cryomotl module in the API guide. NOTE: All the examples assume that the cryomotl module has been imported:
[ ]:
from cryocat import cryomotl
Loading a particle list as a Motl object#
The first step in working with a particle list is to load it and store it as a Motl object. This can be accomplished with the load() function, which is used to initialize the Motl class. Example:
[ ]:
my_motl = cryomotl.Motl.load('path/to/motl_file')
Then, the properties of the particles can be displayed by inspecting the df attribute (pandas DataFrame).
[ ]:
display(my_motl.df)
This will print out the content of the pandas DataFrame with all the columns described in the previous section.
Extract a subset of particles#
To work on a subset of particles based on the value of a particular feature, you can extract them with the get_motl_subset() function. By default, the feature taken into account is the tomogram ID (tomo_id) and the function returns a new Motl object; however, you can ask for a pandas DataFrame instead. Here are some examples:
[ ]:
subset_motl_tomo = my_motl.get_motl_subset(2) #select particles belonging to tomogram 2 and return a new Motl object
subset_motl_class = my_motl.get_motl_subset(1, feature_id="class") #select particles belonging to class 1 and return a new Motl object
subset_df_tomo = my_motl.get_motl_subset(2, return_df=True) #select particles belonging to tomogram 2 and return a pandas dataframe
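Conceptually, selecting a subset amounts to a boolean filter on one column of the DataFrame. The sketch below illustrates this in plain pandas on a made-up table; it is not cryocat's implementation of get_motl_subset().

```python
import pandas as pd

# Hypothetical mini particle table with only the columns needed here.
df = pd.DataFrame({
    "subtomo_id": [1, 2, 3, 4],
    "tomo_id":    [1, 2, 2, 3],
    "class":      [1, 1, 2, 1],
})

# Rows whose tomo_id equals 2.
subset_tomo = df[df["tomo_id"] == 2].reset_index(drop=True)

# Rows whose class equals 1.
subset_class = df[df["class"] == 1].reset_index(drop=True)

print(subset_tomo)
print(subset_class)
```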
Clean a particle list#
Different functions are available to clean a particle list, depending on the analysis pipeline.
Clean particles by distance: Remove duplicate particles, which is commonly required when working with oversampled particles. The function that accomplishes this is clean_by_distance(). This function changes the df attribute of the Motl. To save the particle list after cleaning, it is necessary to write the Motl to file with the write_out() function (see below).
[ ]:
my_motl.clean_by_distance(10) #remove particles that are closer than 10 voxels to each other, keeping the one with the highest score value
my_motl.clean_by_distance(10, feature_id="class") #remove particles that are closer than 10 voxels to each other, grouping the particles by class and keeping the one with the highest score value
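The idea behind distance-based cleaning can be sketched as a greedy pass over the particles, from the highest score to the lowest. This is a simplified illustration in NumPy and pandas on made-up data: the function name is hypothetical, grouping by tomogram or class is omitted, and cryocat's actual implementation may differ.

```python
import numpy as np
import pandas as pd

def clean_by_distance_sketch(df, min_dist):
    """Greedy duplicate removal: visit particles from highest to lowest score
    and drop any particle closer than min_dist to one already kept."""
    coords = df[["x", "y", "z"]].to_numpy(dtype=float)
    order = np.argsort(-df["score"].to_numpy(), kind="stable")
    kept = []  # positional indices of the particles kept so far
    for i in order:
        if all(np.linalg.norm(coords[i] - coords[j]) >= min_dist for j in kept):
            kept.append(i)
    return df.iloc[sorted(kept)].reset_index(drop=True)

# Particles 1 and 2 are 5 voxels apart; particle 3 is far away.
df = pd.DataFrame({
    "subtomo_id": [1, 2, 3],
    "score": [0.3, 0.8, 0.5],
    "x": [0, 3, 50], "y": [0, 4, 50], "z": [0, 0, 50],
})
cleaned = clean_by_distance_sketch(df, 10)
print(cleaned["subtomo_id"].tolist())  # -> [2, 3]
```

Of the two close particles, only the one with the higher score (subtomo_id 2) survives.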
Clean particles by CC value: This is accomplished with the clean_by_otsu() function.
[ ]:
my_motl.clean_by_otsu(feature_id="tomo_id") #group the particles by tomogram and compute Otsu's threshold on the scores of each group; particles are cleaned according to this threshold
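For intuition, Otsu's method picks the threshold that maximises the between-class variance of a histogram. Below is a minimal NumPy sketch applied to made-up scores; the function name, the bin count, and the choice to keep scores at or above the threshold are assumptions, and cryocat may compute the threshold differently.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Return the histogram bin edge that maximises the between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                    # counts up to and including each bin
    w1 = w0[-1] - w0                        # counts above each bin
    s0 = np.cumsum(hist * centers)          # cumulative weighted sum of centers
    m0 = s0 / np.maximum(w0, 1)             # mean of the lower class
    m1 = (s0[-1] - s0) / np.maximum(w1, 1)  # mean of the upper class
    between = w0 * w1 * (m0 - m1) ** 2
    # Use the upper edge of the best bin so the lower class lies strictly below it.
    return edges[1:][np.argmax(between)]

# Hypothetical bimodal scores: four low-quality and four high-quality particles.
scores = np.array([0.10, 0.12, 0.11, 0.09, 0.80, 0.82, 0.79, 0.81])
thr = otsu_threshold(scores)
kept = scores[scores >= thr]
print(thr, kept)
```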
Clean particles by radius: For particles whose coordinates were sampled on a spherical surface, the clean_by_radius() function removes those particles that do not fit the surface but lie outside the sphere radius. This function is part of the structure module. If an output path is passed, the cleaned particle list will be written to the specified file.
[ ]:
from cryocat import structure
my_motl_cleaned = structure.PleomorphicSurface.clean_by_radius(my_motl, feature_id='object_id', threshold=None, output_file='./my_motl_cleaned.em') #for each object, remove the particles that are outside its radius ± the standard deviation of the distance between the particles and the object center
my_motl_cleaned = structure.PleomorphicSurface.clean_by_radius(my_motl, feature_id='object_id', threshold=5, output_file='./my_motl_cleaned.em') #for each object, remove the particles that are outside its radius ± the threshold value
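The radius criterion described in the comments above can be sketched for a single object as follows. This is only an illustration on made-up points: the function name is hypothetical, and estimating the centre as the centroid and the radius as the median distance are assumptions, not necessarily what cryocat does.

```python
import numpy as np

def clean_by_radius_sketch(coords, threshold=None):
    """Keep points whose distance to the centre lies within radius ± tolerance."""
    coords = np.asarray(coords, dtype=float)
    center = coords.mean(axis=0)                  # assumed centre estimate
    dist = np.linalg.norm(coords - center, axis=1)
    radius = np.median(dist)                      # assumed radius estimate
    # Tolerance: standard deviation of the distances unless a threshold is given.
    tol = dist.std() if threshold is None else threshold
    return coords[np.abs(dist - radius) <= tol]

# Six points on a sphere of radius 10 plus two outliers at distance 30.
pts = [[10, 0, 0], [-10, 0, 0], [0, 10, 0], [0, -10, 0],
       [0, 0, 10], [0, 0, -10], [0, 0, 30], [0, 0, -30]]
cleaned_std = clean_by_radius_sketch(pts)               # tolerance = std of distances
cleaned_thr = clean_by_radius_sketch(pts, threshold=5)  # tolerance = 5 voxels
print(len(cleaned_std), len(cleaned_thr))
```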
Write a Motl to file#
If you have edited the df attribute of your Motl object or you generated a new Motl object and you wish to save them to a particle list file, you need to use the write_out() function:
[ ]:
my_motl.write_out("path/to/desired_output_file.em") #this will save my_motl to desired_output_file.em
Example of different classes in use#
Let’s assume we are working with a STOPGAP motl file and want to scale the coordinates by a factor of 2, i.e. to halve the binning factor we are working at. To do that, we need to load the file following the conventions of the appropriate class by specifying the type of the motl we are dealing with.
[ ]:
my_sg_motl = cryomotl.Motl.load('input_motl_file.star', motl_type='stopgap')
Still using the Motl class, we modify the object according to our needs.
[ ]:
my_sg_motl.scale_coordinates(2)
my_sg_motl.update_coordinates()
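To illustrate what these two steps are expected to do to the table, here is a pandas sketch on made-up values. It assumes that scaling multiplies both the coordinates and the shifts by the factor, and that updating folds the shifts into the rounded coordinates while keeping the residual as the new shift; the exact behaviour of scale_coordinates() and update_coordinates() (e.g. the rounding rule) may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical positions and shifts of two particles.
df = pd.DataFrame({
    "x": [10, 21], "y": [5, 8], "z": [3, 4],
    "shift_x": [0.375, -0.125],
    "shift_y": [0.0, 0.625],
    "shift_z": [-0.375, 0.125],
})
coords = ["x", "y", "z"]
shifts = ["shift_x", "shift_y", "shift_z"]
factor = 2

# Step 1 (scaling, assumed): multiply coordinates and shifts by the factor.
df[coords + shifts] = df[coords + shifts] * factor

# Step 2 (update, assumed): round the complete position to get the new
# coordinates and keep the remainder as the new shift.
full = df[coords].to_numpy() + df[shifts].to_numpy()
rounded = np.round(full)
df[coords] = rounded
df[shifts] = full - rounded

print(df)
```

The complete position (coordinate plus shift) is the same before and after the update step; only its split into an integer part and a residual changes.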
If one displays the df of the object at this stage, one will notice that the column labels are as described above, i.e. characteristic of a Motl class object. To write out the file in the correct format for further use in STOPGAP, we call:
[ ]:
cryomotl.Motl.write_out(my_sg_motl, 'output_motl_file.star', 'stopgap') #specify motl type to ensure correct convention
We can inspect the written-out data using either of the classes, Motl or StopgapMotl.
[ ]:
display(cryomotl.StopgapMotl.read_in('output_motl_file.star')) # keeps the specifiers (labels) and the column order of the file as is; object stored as DataFrame
display(cryomotl.Motl.load('output_motl_file.star', motl_type='stopgap').df) # the labels correspond to the Motl object; to display the values we need to inspect the df attribute
Reading model files from IMOD#
Among the child classes of Motl, ModMotl allows reading binary .mod files from IMOD and working with them as ModMotl objects, which inherit the same attributes as Motl objects. Examples:
Load one or multiple .mod files from IMOD and store them as ModMotl:
[ ]:
my_modmotl = cryomotl.ModMotl('path/to/model_folder/', mod_prefix='tomo_') #load all mod files in model_folder that start with tomo_
my_modmotl = cryomotl.ModMotl('path/to/model_file.mod') #load model_file.mod
display(my_modmotl.df) # inspect the content
Get the content of a model file as a pandas DataFrame object:
[ ]:
mod_df = cryomotl.ModMotl.read_in('path/to/model_file.mod') # read in a mod file and return a pandas dataframe
Conversely, to convert a Motl and write it out to a binary .mod file, the write_to_model_file() function is available:
[ ]:
my_motl.write_to_model_file("tomo_id", "tomo", point_size=1) #split my_motl based on the tomo_id and write the resulting motl files to individual model files with the prefix "tomo_"