sta

cryocat.analysis.sta.compute_alignment_statistics(motl_base_name, start_it, end_it, motl_type='stopgap', filter_rows=None, filter_column='subtomo_id', output_file=None, load_kwargs=None)

Compute alignment statistics for specified motls and iterations. Pairs of (current motl, subsequent motl) are evaluated for differences in cone angles, in-plane angles, change in positions of particles and root mean square errors (RMSE) in x, y, and z directions. The output contains mean, median, std, and variance for cone and in-plane angles, the mean distance between the particles and the RMSE of movement in x, y, and z directions.

Parameters:
motl_base_name : str

Base name for a motl to perform the evaluation on. Base name means without the iteration number and extension. For example for name motl_shift_3.em the base name is motl_shift_.

start_it : int

Starting iteration number.

end_it : int

Ending iteration number.

motl_type : str (stopgap|emmotl|relion|relion5|relion5_1), default="stopgap"

Type of the input motl. Defaults to “stopgap”.

filter_rows : array-like, optional

Rows to filter. Only rows that are within the filter_rows will be kept. Defaults to None which means no filtering.

filter_column : str, default="subtomo_id"

Name of the column on which the filtering is performed. If filter_rows is None, no filtering is done and this parameter is not used. Defaults to “subtomo_id”.

output_file : str, optional

Output file for the statistics. If None no file will be written out. Defaults to None.

load_kwargs : dict, optional

Dictionary of keyword arguments passed to the Motl.load method (and subsequently to the underlying Motl class constructors such as RelionMotl and RelionMotlv5). This is useful for providing necessary metadata like pixel_size, binning, optics_data, or custom formats (tomo_format, subtomo_format). Defaults to None.

Returns:
pandas DataFrame

Computed statistics of the alignment for the specified iterations.

Examples

>>> # No filtering, motls motl_1.star to motl_17.star will be loaded for evaluation. Statistics
>>> # will be written into /path/to/the/motl_alignment_stats.csv file.
>>> stats_df = compute_alignment_statistics(
...     "/path/to/the/motl_", 1, 17,
...     motl_type="stopgap", output_file="/path/to/the/motl_alignment_stats.csv"
... )
>>> # Motls motl_1.star to motl_17.star will be loaded for evaluation, no file will be written out.
>>> # Filtering will be done based on column geom3 and only particles with values in filter_rows will be evaluated.
>>> stats_df = compute_alignment_statistics(
...     "/path/to/the/motl_", 1, 17,
...     filter_rows=values_to_keep_for_motl, filter_column="geom3",
...     motl_type="stopgap"
... )
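
The quantities listed in the description can be illustrated with a minimal, standard-library sketch. It is not the cryoCAT implementation: the helper names `movement_stats` and `angle_stats` are hypothetical, positions are assumed to be already matched particle by particle, and population (not sample) standard deviation and variance are assumed.

```python
import math
import statistics


def movement_stats(pos_prev, pos_next):
    """Summarize particle movement between two consecutive iterations.

    pos_prev, pos_next: sequences of (x, y, z) tuples for the same
    particles in the same order (hypothetical inputs).
    """
    diffs = [
        (xn - xp, yn - yp, zn - zp)
        for (xp, yp, zp), (xn, yn, zn) in zip(pos_prev, pos_next)
    ]
    n = len(diffs)
    # RMSE per axis: square root of the mean squared displacement.
    rmse = [math.sqrt(sum(d[axis] ** 2 for d in diffs) / n) for axis in range(3)]
    # Mean Euclidean distance travelled by the particles.
    mean_distance = sum(math.sqrt(dx * dx + dy * dy + dz * dz) for dx, dy, dz in diffs) / n
    return {
        "rmse_x": rmse[0],
        "rmse_y": rmse[1],
        "rmse_z": rmse[2],
        "mean_distance": mean_distance,
    }


def angle_stats(angle_diffs):
    """Mean, median, std and variance of angular differences (degrees)."""
    return {
        "mean": statistics.fmean(angle_diffs),
        "median": statistics.median(angle_diffs),
        "std": statistics.pstdev(angle_diffs),      # population std (assumption)
        "var": statistics.pvariance(angle_diffs),   # population variance (assumption)
    }
```

One such summary would be computed for every (current motl, subsequent motl) pair between start_it and end_it.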
cryocat.analysis.sta.create_denovo_multiref_run(input_motl, number_of_classes, output_motl_base, input_motl_type='emmotl', class_occupancy=None, iteration_number=1, number_of_runs=1, output_motl_type='stopgap')

Creates number_of_runs motls for reference averaging and one motl for alignment. The motls for reference averaging are created by randomly selecting N particles for each class from the input_motl, where N equals class_occupancy. The classes within each motl can overlap: each class contains a unique set of particles, but a particle can be assigned to multiple classes. The alignment motl is the input motl with the class of each particle randomly assigned from 1 to number_of_classes. The idea is to run multi-reference alignment in which different runs have different starting references while, thanks to simulated annealing, only one alignment motl is needed afterwards.

Parameters:
input_motl : str, pandas dataframe or Motl

Input motl (specified either as a path, dataframe or Motl object).

number_of_classes : int

Number of classes to create references for and to assign randomly to the alignment motl.

output_motl_base : str

Base path for the output motl files. The final name will be created as output_motl_base_ref_mr#runID_iterationNumber where runID is from 1 to number_of_runs and iterationNumber is iteration_number. The alignment motl will be named output_motl_base_iterationNumber. In both cases, the extension will be determined based on the output_motl_type.

input_motl_type : str (emmotl|stopgap|relion|relion5|relion5_1), default="emmotl"

Type of the input motl file. Defaults to “emmotl”.

class_occupancy : int, optional

Number of particles per class for the reference averaging motls. If None, the number is determined as 1/10 of total number of particles in the input motl. Defaults to None.

iteration_number : int, default=1

Iteration number to be used in the output name creation. Defaults to 1.

number_of_runs : int, default=1

Number of motls to create. Defaults to 1.

output_motl_type : str (stopgap|emmotl|relion), default="stopgap"

Type of the output motl file. Defaults to “stopgap”.

Returns:
None

Examples

>>> # Will create two motls in stopgap format with names stopgap_dn_ref_mr1_4.star and stopgap_dn_ref_mr2_4.star for
>>> # reference averaging and one alignment motl stopgap_dn_4.star. In each motl, the particles will have 8 classes.
>>> # The alignment motl will have same number of particles as the input_motl, the reference motls will have
>>> # number_of_classes * class_occupancy (16 000) particles each.
>>> create_denovo_multiref_run(
... "/path/to/relion_1.star", number_of_classes=8, output_motl_base="stopgap_dn",
... input_motl_type="relion", class_occupancy = 2000, iteration_number=4, number_of_runs=2,
... output_motl_type="stopgap"
... )
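
The selection scheme described above can be sketched with the standard library alone. This is an illustration under assumptions, not the cryoCAT code: `denovo_assignments` is a hypothetical helper, particles are represented only by their IDs, and the default occupancy uses integer division for the "1/10 of the particles" rule.

```python
import random


def denovo_assignments(subtomo_ids, number_of_classes, class_occupancy=None, seed=None):
    """Return (reference, alignment) class assignments.

    reference: dict class -> list of particle IDs used to average that class.
    alignment: dict particle ID -> randomly assigned class (1..number_of_classes).
    """
    rng = random.Random(seed)
    ids = list(subtomo_ids)
    if class_occupancy is None:
        # 1/10 of all particles; the rounding rule is an assumption.
        class_occupancy = len(ids) // 10

    # Sampling without replacement keeps particles unique within a class,
    # while independent draws allow overlap between classes.
    reference = {
        cls: rng.sample(ids, class_occupancy)
        for cls in range(1, number_of_classes + 1)
    }
    # The alignment motl keeps every particle and gets a random class.
    alignment = {sid: rng.randint(1, number_of_classes) for sid in ids}
    return reference, alignment
```

For number_of_runs greater than 1, only the reference selection would be repeated with fresh random draws; the single alignment assignment is shared by all runs.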
cryocat.analysis.sta.create_multiref_run(input_motl, number_of_classes, output_motl_base, input_motl_type='emmotl', iteration_number=1, number_of_runs=1, output_motl_type='stopgap')

Creates motls for multiple runs of a multi-reference alignment. In essence, it randomly assigns the specified number of classes to each motl that is created. The new motls are written to files named output_motl_base_mr#runID_iterationNumber in stopgap, emmotl or relion format.

Parameters:
input_motl : str, pandas dataframe or Motl

Input motl (specified either as a path, dataframe or Motl object).

number_of_classes : int

Number of classes to assign randomly.

output_motl_base : str

Base path for the output motl files. The final name will be created as output_motl_base_mr#runID_iterationNumber where runID is from 1 to number_of_runs and iterationNumber is iteration_number. The extension will be determined based on the output_motl_type.

input_motl_type : str (emmotl|stopgap|relion|relion5|relion5_1), default="emmotl"

Type of the input motl file. Defaults to “emmotl”.

iteration_number : int, default=1

Iteration number to be used in the output name creation. Defaults to 1.

number_of_runs : int, default=1

Number of motls to create. Defaults to 1.

output_motl_type : str (stopgap|emmotl|relion), default="stopgap"

Type of the output motl file. Defaults to “stopgap”.

Returns:
None

Examples

>>> # Will create two motls in stopgap format with names stopgap_classes_mr1_4.star and stopgap_classes_mr2_4.star
>>> create_multiref_run(
... "/path/to/relion_1.star", number_of_classes=8, output_motl_base="stopgap_classes",
... input_motl_type="relion", iteration_number=4, number_of_runs=2,
... output_motl_type="stopgap"
... )
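
As a rough illustration of the naming scheme and the random class assignment, the sketch below builds one assignment per run. `multiref_runs` is a hypothetical helper, not part of cryoCAT; the ".star" extension is taken from the example above and the particles are represented only by their IDs.

```python
import random


def multiref_runs(subtomo_ids, number_of_classes, output_motl_base,
                  iteration_number=1, number_of_runs=1, extension=".star", seed=None):
    """Return dict output file name -> {particle ID: random class}."""
    rng = random.Random(seed)
    ids = list(subtomo_ids)
    runs = {}
    for run_id in range(1, number_of_runs + 1):
        # Name pattern from the docs: output_motl_base_mr#runID_iterationNumber
        name = f"{output_motl_base}_mr{run_id}_{iteration_number}{extension}"
        # Every run gets its own independent random class assignment.
        runs[name] = {sid: rng.randint(1, number_of_classes) for sid in ids}
    return runs
```

Calling `multiref_runs(range(50), 8, "stopgap_classes", iteration_number=4, number_of_runs=2)` yields the two file names listed in the example above.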
cryocat.analysis.sta.evaluate_alignment(motl_base_names, start_it, end_it, motl_type='stopgap', write_out_stats=False, plot_values=True, filter_rows=None, filter_columns='subtomo_id', labels=None, graph_title='Alignment stability', graph_output_file=None, load_kwargs=None)

Evaluate alignment stability for specified motls and iterations.

Parameters:
motl_base_names : str or list

List of MOTL base names or a single motl base name to perform the evaluation on. Base name means without the iteration number and extension. For example for name motl_shift_3.em the base name is motl_shift_.

start_it : int

Starting iteration number.

end_it : int

Ending iteration number.

motl_type : str (stopgap|emmotl|relion), default="stopgap"

Type of the input motl. Defaults to “stopgap”.

write_out_stats : bool, default=False

Whether to write out stats. If True, the stats will be written to motl_base_name + _as_motlID.csv, where motlID is given by the position of the motl in the motl_base_names list. For example, for motl_shift_3.em the stats file will be motl_shift_as_1.csv if motl_shift_ is the first entry in motl_base_names. Defaults to False.

plot_values : bool, default=True

Whether to plot values. Defaults to True.

filter_rows : array-like or list of array-like, optional

Rows to filter. Only rows that are within the filter_rows will be kept. Defaults to None which means no filtering.

filter_columns : str or list, default="subtomo_id"

Column names on which the filtering is performed. If filter_rows is None, no filtering is done and this parameter is not used. Defaults to “subtomo_id”.

labels : str or list, optional

Labels for the plot. Should have the same length as the motl_base_names. In case of None, the labels will be automatically set as motl_base_names (in case those names contain paths, the paths will be removed). Used only if plot_values is True. Defaults to None.

graph_title : str, default="Alignment stability"

Title of the graph. Used only if plot_values is True. Defaults to “Alignment stability”.

graph_output_file : str, optional

Output file for the graph. Used only if plot_values is True. If None no file will be written out. Defaults to None.

load_kwargs : dict, optional

Dictionary of keyword arguments passed to the Motl.load method (and subsequently to the underlying Motl class constructors such as RelionMotl and RelionMotlv5). This is useful for providing necessary metadata like pixel_size, binning, optics_data, or custom formats (tomo_format, subtomo_format). Defaults to None.

Returns:
list of pandas DataFrames

List of computed alignment stability statistics dataframes.

Examples

>>> # Single motl, no filtering, motls motl_1.star to motl_17.star will be loaded for evaluation. Statistics
>>> # will be written into /path/to/the/motl_as_1.csv file.
>>> motl_base_name = "/path/to/the/motl_"
>>> stats_df = evaluate_alignment(motl_base_name, 1, 17, motl_type="stopgap", plot_values=True, write_out_stats=True)
>>> # Multiple motls, no filtering, motls motl1_1.star to motl1_17.star and motl3_1.star to motl3_17.star
>>> # will be loaded for evaluation. Statistics will be written into /path/to/the/motl1_as_1.csv and
>>> # /path/to/the/motl3_as_2.csv files.
>>> motl_base_names = ["/path/to/the/motl1_", "/path/to/the/motl3_"]
>>> stats_df = evaluate_alignment(motl_base_names, 1, 17, motl_type="stopgap", plot_values=True, write_out_stats=True)
>>> # Multiple motls, motls motl1_1.star to motl1_17.star and motl3_1.star to motl3_17.star will be loaded for
>>> # evaluation. Statistics will be written into /path/to/the/motl1_as_1.csv and /path/to/the/motl3_as_2.csv files.
>>> # Filtering will be done based on column geom3 and only particles with values in filter_rows will be evaluated.
>>> motl_base_names = ["/path/to/the/motl1_", "/path/to/the/motl3_"]
>>> filter_rows = [values_to_keep_for_motl1, values_to_keep_for_motl3]
>>> stats_df = evaluate_alignment(
...     motl_base_names, 1, 17,
...     filter_rows=filter_rows, filter_columns="geom3",
...     motl_type="stopgap", plot_values=True, write_out_stats=True
... )
>>> # Multiple motls, motls motl1_1.star to motl1_17.star and motl3_1.star to motl3_17.star will be loaded for
>>> # evaluation. Statistics will be written into /path/to/the/motl1_as_1.csv and /path/to/the/motl3_as_2.csv files.
>>> # Filtering will be done based on column geom3 for motl1 and based on subtomo_id for motl3.
>>> # Only particles with values in filter_rows will be evaluated.
>>> motl_base_names = ["/path/to/the/motl1_", "/path/to/the/motl3_"]
>>> filter_rows = [values_to_keep_for_motl1, values_to_keep_for_motl3]
>>> filter_column = ["geom3", "subtomo_id"]
>>> stats_df = evaluate_alignment(
...     motl_base_names, 1, 17,
...     filter_rows=filter_rows, filter_columns=filter_column,
...     motl_type="stopgap", plot_values=True, write_out_stats=True
... )
>>> # Multiple motls, motls motl1_1.star to motl1_17.star and motl3_1.star to motl3_17.star will be loaded for
>>> # evaluation. Statistics will be written into /path/to/the/motl1_as_1.csv and /path/to/the/motl3_as_2.csv files.
>>> # Filtering will be done based on column geom3 for motl1 and no filtering will be done for motl3.
>>> # Only particles with values in filter_rows will be evaluated.
>>> motl_base_names = ["/path/to/the/motl1_", "/path/to/the/motl3_"]
>>> filter_rows = [values_to_keep_for_motl1, None]
>>> filter_column = ["geom3", None]
>>> stats_df = evaluate_alignment(
...     motl_base_names, 1, 17,
...     filter_rows=filter_rows, filter_columns=filter_column,
...     motl_type="stopgap", plot_values=True, write_out_stats=True
... )
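
The examples above pass either single values or one value per motl. The sketch below shows one way such arguments could be normalized, together with the statistics file names and default labels implied by the examples. `per_motl_settings` is a hypothetical helper and the normalization rules are assumptions, not the cryoCAT implementation.

```python
import os


def per_motl_settings(motl_base_names, filter_rows=None, filter_columns="subtomo_id"):
    """Expand the arguments to one settings dict per motl base name."""
    if isinstance(motl_base_names, str):
        # A single base name: wrap the per-motl arguments in lists.
        motl_base_names = [motl_base_names]
        filter_rows = [filter_rows]
    count = len(motl_base_names)
    if filter_rows is None:
        filter_rows = [None] * count
    if filter_columns is None or isinstance(filter_columns, str):
        # One column name is shared by all motls.
        filter_columns = [filter_columns] * count

    settings = []
    for motl_id, (base, rows, column) in enumerate(
        zip(motl_base_names, filter_rows, filter_columns), start=1
    ):
        settings.append({
            "label": os.path.basename(base),          # path removed, as for default labels
            "stats_file": f"{base}as_{motl_id}.csv",  # e.g. motl1_ -> motl1_as_1.csv
            "filter_rows": rows,                      # None means no filtering
            "filter_column": column,
        })
    return settings
```

With the base names from the last example, the helper returns the stats files /path/to/the/motl1_as_1.csv and /path/to/the/motl3_as_2.csv, with filtering enabled only for the first motl.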
cryocat.analysis.sta.evaluate_classification(motl_base_name, start_it, end_it, motl_type='stopgap', output_file_stats=None, plot_results=False, output_file_graphs=None, load_kwargs=None)

Get the occupancy of each class over the iterations and the class stability of subtomograms over iterations.

Parameters:
motl_base_name : str

Base name for a motl to perform the evaluation on. Base name means without the iteration number and extension. For example for name motl_shift_3.em the base name is motl_shift_.

start_it : int

Starting iteration number.

end_it : int

Ending iteration number.

motl_type : str (stopgap|emmotl|relion|relion5|relion5_1), default="stopgap"

Type of the input motl. Defaults to “stopgap”.

output_file_stats : str, optional

Name of the file into which the results will be written out. If None, no results will be written out. Defaults to None.

plot_results : bool, default=False

Whether to plot the results. Defaults to False.

output_file_graphs : str, optional

Name of the file into which the plotted graphs will be written out. If None, the graphs will not be written out. If plot_results is False, this parameter is unused. Defaults to None.

load_kwargs : dict, optional

Dictionary of keyword arguments passed to the Motl.load method (and subsequently to the underlying Motl class constructors such as RelionMotl and RelionMotlv5). This is useful for providing necessary metadata like pixel_size, binning, optics_data, or custom formats (tomo_format, subtomo_format). Defaults to None.

Returns:
occupancy : dict

A dictionary containing the occupancy of each class over the iterations.

changing_subtomos : dict

A dictionary containing the number of different subtomogram IDs for each class over iterations.

cryocat.analysis.sta.evaluate_multirun_stability(input_motls, input_motl_type='stopgap')

Evaluate how many particles ended up in the same class across all classification runs. It is meant for multiruns with existing references (i.e. not de novo ones) where all runs use the same references in the same order.

Parameters:
input_motls : list

List of input motl files. At least two are required.

input_motl_type : str (stopgap|emmotl|relion|relion5|relion5_1), default="stopgap"

Type of the input motl. Defaults to “stopgap”.

Returns:
common_occupancies : dict

A dictionary containing common subtomo_ids for each class of particles.
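
Conceptually, the result is a per-class intersection of particle IDs across runs. The sketch below illustrates that idea on plain dictionaries; `common_class_members` is a hypothetical helper, each run is assumed to be a mapping from subtomo_id to class, and the class list is taken from the first run.

```python
def common_class_members(runs):
    """Return dict class -> sorted IDs assigned to that class in every run.

    runs: list of dicts mapping subtomo_id -> class (hypothetical input).
    """
    if len(runs) < 2:
        raise ValueError("At least two runs are required.")

    common = {}
    for cls in sorted(set(runs[0].values())):
        # Start with the members of this class in the first run ...
        ids = {sid for sid, c in runs[0].items() if c == cls}
        # ... and keep only those that are in the same class in all other runs.
        for run in runs[1:]:
            ids &= {sid for sid, c in run.items() if c == cls}
        common[cls] = sorted(ids)
    return common
```

The comparison is only meaningful when class numbers refer to the same reference in every run, which is why the function is intended for runs that share references in the same order.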

cryocat.analysis.sta.get_class_occupancy(motl_base_name, start_it, end_it, motl_type='stopgap', load_kwargs=None)

Get the occupancy of each class over the iterations.

Parameters:
motl_base_name : str

Base name for a motl to perform the evaluation on. Base name means without the iteration number and extension. For example for name motl_shift_3.em the base name is motl_shift_.

start_it : int

Starting iteration number.

end_it : int

Ending iteration number.

motl_type : str (stopgap|emmotl|relion|relion5|relion5_1), default="stopgap"

Type of the input motl. Defaults to “stopgap”.

load_kwargs : dict, optional

Dictionary of keyword arguments passed to the Motl.load method (and subsequently to the underlying Motl class constructors such as RelionMotl and RelionMotlv5). This is useful for providing necessary metadata like pixel_size, binning, optics_data, or custom formats (tomo_format, subtomo_format). Defaults to None.

Returns:
occupancy : dict

A dictionary containing the occupancy of each class over the iterations.

Notes

Loading many motls can take some time. If you also want to compute the stability of classes, it is recommended to use cryocat.analysis.sta.evaluate_classification(), which gives both occupancy and stability and reads all the motls only once.
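
What "occupancy over the iterations" means can be shown on plain dictionaries. The helper `class_occupancy` and the layout of its result (class -> list of counts, one per iteration) are assumptions for illustration; the dictionary returned by cryoCAT may be organized differently.

```python
from collections import Counter


def class_occupancy(iterations):
    """Count particles per class for each iteration.

    iterations: dict iteration number -> {subtomo_id: class} (hypothetical input).
    Returns dict class -> list of counts ordered by iteration number.
    """
    ordered = sorted(iterations)
    counts = {it: Counter(iterations[it].values()) for it in ordered}
    classes = sorted({cls for counter in counts.values() for cls in counter})
    # Classes that are empty in an iteration get an explicit zero.
    return {cls: [counts[it].get(cls, 0) for it in ordered] for cls in classes}
```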

cryocat.analysis.sta.get_motl_extension(motl_type)

Return the file extension for a given motl type.

Parameters:
motl_type : str (emmotl|relion|relion5|relion5_1|stopgap)

The type of motl file.

Returns:
str

The file extension corresponding to the motl type.

Raises:
ValueError

If the motl type is not supported.
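
The base-name convention used throughout this page (base name + iteration number + extension) can be sketched as follows. The mapping below is an assumption inferred only from file names that appear in the examples on this page (motl_shift_3.em, motl_17.star, relion_1.star); the helpers `motl_extension` and `motl_filename` are hypothetical and are not the cryoCAT functions.

```python
# Hypothetical mapping, inferred from file names used in the examples.
_EXTENSIONS = {
    "emmotl": ".em",
    "stopgap": ".star",
    "relion": ".star",
}


def motl_extension(motl_type):
    """Return the file extension for a motl type or raise ValueError."""
    try:
        return _EXTENSIONS[motl_type]
    except KeyError:
        raise ValueError(f"Unsupported motl type: {motl_type!r}") from None


def motl_filename(motl_base_name, iteration, motl_type):
    """Compose base name, iteration number and extension into a file name."""
    return f"{motl_base_name}{iteration}{motl_extension(motl_type)}"
```

For instance, the base name motl_shift_ with iteration 3 and type "emmotl" gives motl_shift_3.em.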

cryocat.analysis.sta.get_motl_filename(motl_base_name, iteration, motl_type)

cryocat.analysis.sta.get_stable_particles(motl_base_name, start_it, end_it, motl_type='emmotl', load_kwargs=None)

Load and analyze particle data across multiple iterations to identify stable particles, i.e. particles that do not change their class.

Parameters:
motl_base_name : str

Base name for a motl to perform the evaluation on. Base name means without the iteration number and extension. For example for name motl_shift_3.em the base name is motl_shift_.

start_it : int

Starting iteration number.

end_it : int

Ending iteration number.

motl_type : str (stopgap|emmotl|relion|relion5|relion5_1), default="emmotl"

Type of the input motl. Defaults to “emmotl”.

load_kwargs : dict, optional

Dictionary of keyword arguments passed to the Motl.load method (and subsequently to the underlying Motl class constructors such as RelionMotl and RelionMotlv5). This is useful for providing necessary metadata like pixel_size, binning, optics_data, or custom formats (tomo_format, subtomo_format). Defaults to None.

Returns:
list

List of subtomo_ids that have the same class across the specified iterations.

Notes

This function loads motive list files from specified iterations, merges them, and identifies subtomo_ids (subtomogram identifiers) that have a consistent class across all iterations. The percentage of stable particles relative to the total number of particles in the first iteration is printed.
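
The logic described in the notes can be illustrated on plain dictionaries. `stable_particles` is a hypothetical helper and the input layout (iteration number -> {subtomo_id: class}) is an assumption; particles missing from any later iteration are treated as unstable here.

```python
def stable_particles(iterations):
    """Return sorted subtomo_ids whose class is identical in all iterations.

    iterations: dict iteration number -> {subtomo_id: class} (hypothetical input).
    """
    ordered = sorted(iterations)
    first = iterations[ordered[0]]
    stable = sorted(
        sid
        for sid, cls in first.items()
        # A particle is stable if every later iteration reports the same class.
        if all(iterations[it].get(sid) == cls for it in ordered[1:])
    )
    # Percentage relative to the number of particles in the first iteration.
    percentage = 100.0 * len(stable) / len(first) if first else 0.0
    print(f"{percentage:.1f}% of particles kept their class")
    return stable
```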

cryocat.analysis.sta.get_subtomos_class_stability(motl_base_name, start_it, end_it, motl_type='stopgap', load_kwargs=None)

Calculate the class stability of subtomograms over iterations.

Parameters:
motl_base_name : str

Base name for a motl to perform the evaluation on. Base name means without the iteration number and extension. For example for name motl_shift_3.em the base name is motl_shift_.

start_it : int

Starting iteration number.

end_it : int

Ending iteration number.

motl_type : str (stopgap|emmotl|relion|relion5|relion5_1), default="stopgap"

Type of the input motl. Defaults to “stopgap”.

load_kwargs : dict, optional

Dictionary of keyword arguments passed to the Motl.load method (and subsequently to the underlying Motl class constructors such as RelionMotl and RelionMotlv5). This is useful for providing necessary metadata like pixel_size, binning, optics_data, or custom formats (tomo_format, subtomo_format). Defaults to None.

Returns:
different_sids : dict

A dictionary containing the number of different subtomogram IDs for each class over iterations.

Notes

Loading many motls can take some time. If you also want to compute the occupancy of classes, it is recommended to use cryocat.analysis.sta.evaluate_classification(), which gives both occupancy and stability and reads all the motls only once.
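
One possible reading of "number of different subtomogram IDs for each class over iterations" is the number of particles that entered or left a class between consecutive iterations. The sketch below implements that interpretation on plain dictionaries; both the interpretation and the helper `class_stability` are assumptions, not the cryoCAT implementation.

```python
def class_stability(iterations):
    """Count membership changes per class between consecutive iterations.

    iterations: dict iteration number -> {subtomo_id: class} (hypothetical input).
    Returns dict class -> list with one count per consecutive iteration pair.
    """
    ordered = sorted(iterations)
    classes = sorted({cls for it in ordered for cls in iterations[it].values()})

    def members(it, cls):
        # IDs assigned to class `cls` in iteration `it`.
        return {sid for sid, c in iterations[it].items() if c == cls}

    return {
        cls: [
            # Symmetric difference: IDs that joined or left the class.
            len(members(prev, cls) ^ members(curr, cls))
            for prev, curr in zip(ordered, ordered[1:])
        ]
        for cls in classes
    }
```

A class whose counts drop to zero in later iterations has stopped exchanging particles with the other classes.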

cryocat.analysis.sta.write_out_motl(input_motl, output_file_base, output_motl_type)

Writes out a given motl file to a specified output format.

Parameters:
input_motl : motl

Input motl file to be written out.

output_file_base : str

Base name for the output file.

output_motl_type : str (emfile|relion|relion5|relion5_1|stopgap)

Type of the output motl file.

Returns:
None
Raises:
ValueError

If the output_motl_type is not one of the supported types.