sta
- cryocat.analysis.sta.compute_alignment_statistics(motl_base_name, start_it, end_it, motl_type='stopgap', filter_rows=None, filter_column='subtomo_id', output_file=None, load_kwargs=None)
Compute alignment statistics for specified motls and iterations. Pairs of (current motl, subsequent motl) are evaluated for differences in cone angles, in-plane angles, change in positions of particles and root mean square errors (RMSE) in x, y, and z directions. The output contains mean, median, std, and variance for cone and in-plane angles, the mean distance between the particles and the RMSE of movement in x, y, and z directions.
- Parameters:
- motl_base_name : str
Base name for a motl to perform the evaluation on. Base name means without the iteration number and extension. For example, for the name motl_shift_3.em the base name is motl_shift_.
- start_it : int
Starting iteration number.
- end_it : int
Ending iteration number.
- motl_type : str (stopgap|emmotl|relion|relion5|relion5_1), default="stopgap"
Type of the input motl. Defaults to “stopgap”.
- filter_rows : array-like, optional
Rows to filter. Only rows that are within the filter_rows will be kept. Defaults to None, which means no filtering.
- filter_column : str, default="subtomo_id"
Name of the column on which the filtering is performed. If filter_rows is None, no filtering will be done and this parameter will not be used. Defaults to “subtomo_id”.
- output_file : str, optional
Output file for the statistics. If None, no file will be written out. Defaults to None.
- load_kwargs : dict, optional
Dictionary of keyword arguments passed to the Motl.load method (and subsequently to the underlying Motl class constructors such as RelionMotl and RelionMotlv5). This is useful for providing necessary metadata like pixel_size, binning, optics_data, or custom formats (tomo_format, subtomo_format). Defaults to None.
- Returns:
- pandas DataFrame
Computed statistics of the alignment for the specified iterations.
Examples
>>> # No filtering, motls motl_1.star to motl_17.star will be loaded for evaluation. Statistics
>>> # will be written into /path/to/the/motl_alignment_stats.csv file.
>>> stats_df = compute_alignment_statistics(
...     "/path/to/the/motl_", 1, 17,
...     motl_type="stopgap", output_file="/path/to/the/motl_alignment_stats.csv"
... )

>>> # Motls motl_1.star to motl_17.star will be loaded for evaluation, no file will be written out.
>>> # Filtering will be done based on column geom3 and only particles with values in filter_rows will be evaluated.
>>> stats_df = compute_alignment_statistics(
...     "/path/to/the/motl_", 1, 17,
...     filter_rows=values_to_keep_for_motl, filter_column="geom3",
...     motl_type="stopgap"
... )
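The position-related statistics described above can be illustrated with a small standard-library sketch. It is not the cryoCAT implementation: `pairwise` and `position_change_stats` are hypothetical helpers, and positions are assumed to be plain (x, y, z) tuples listed in the same particle order for both iterations.

```python
import math
import statistics


def pairwise(iterations):
    """Return (current, subsequent) iteration pairs, e.g. (1, 2), (2, 3)."""
    return list(zip(iterations[:-1], iterations[1:]))


def position_change_stats(current, subsequent):
    """Compare particle positions from two consecutive iterations.

    Both inputs are lists of (x, y, z) tuples in the same particle order.
    Returns the mean Euclidean distance and the per-axis RMSE (x, y, z).
    """
    # Per-particle displacement vectors between the two iterations.
    diffs = [tuple(b - a for a, b in zip(p, q)) for p, q in zip(current, subsequent)]
    mean_distance = statistics.fmean(math.hypot(*d) for d in diffs)
    rmse = tuple(
        math.sqrt(statistics.fmean(d[axis] ** 2 for d in diffs)) for axis in range(3)
    )
    return mean_distance, rmse


current = [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0)]
subsequent = [(3.0, 4.0, 0.0), (10.0, 10.0, 12.0)]
mean_distance, rmse = position_change_stats(current, subsequent)
print(pairwise([1, 2, 3, 4]))  # [(1, 2), (2, 3), (3, 4)]
print(mean_distance)           # 3.5
```

For iterations 1 to 17 this pairing yields 16 comparisons, one per consecutive pair of motls.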
- cryocat.analysis.sta.create_denovo_multiref_run(input_motl, number_of_classes, output_motl_base, input_motl_type='emmotl', class_occupancy=None, iteration_number=1, number_of_runs=1, output_motl_type='stopgap')
Creates number_of_runs motls for reference averaging and one motl for alignment. The motls for reference averaging are created by randomly selecting N particles for each class from the input_motl, where N equals class_occupancy. The particles within the classes of each motl can overlap, i.e. each class will have a unique set of particles, but some particles can be assigned to multiple classes. The alignment motl is the input motl with the class of each particle randomly assigned a value from 1 to number_of_classes. The idea behind this is to run multi-reference alignment where different runs have different starting references, while due to simulated annealing only one motl for alignment is needed afterwards.
- Parameters:
- input_motl : str, pandas dataframe or Motl
Input motl (specified either as a path, dataframe or Motl object).
- number_of_classes : int
Number of classes to create references for and to assign randomly to the alignment motl.
- output_motl_base : str
Base path for the output motl files. The final name will be created as output_motl_base_ref_mr#runID_iterationNumber where runID is from 1 to number_of_runs and iterationNumber is iteration_number. The alignment motl will be named output_motl_base_iterationNumber. In both cases, the extension will be determined based on the output_motl_type.
- input_motl_type : str (emmotl|stopgap|relion|relion5|relion5_1), default="emmotl"
Type of the input motl file. Defaults to “emmotl”.
- class_occupancy : int, optional
Number of particles per class for the reference averaging motls. If None, the number is determined as 1/10 of the total number of particles in the input motl. Defaults to None.
- iteration_number : int, default=1
Iteration number to be used in the output name creation. Defaults to 1.
- number_of_runs : int, default=1
Number of motls to create. Defaults to 1.
- output_motl_type : str (stopgap|emmotl|relion), default="stopgap"
Type of the output motl file. Defaults to “stopgap”.
- Returns:
- None
Examples
>>> # Will create two motls in stopgap format with names stopgap_dn_ref_mr1_4.star and stopgap_dn_ref_mr2_4.star for
>>> # reference averaging and one alignment motl stopgap_dn_4.star. In each motl, the particles will have 8 classes.
>>> # The alignment motl will have same number of particles as the input_motl, the reference motls will have
>>> # number_of_classes * class_occupancy (16 000) particles each.
>>> create_denovo_multiref_run(
...     "/path/to/relion_1.star", number_of_classes=8, output_motl_base="stopgap_dn",
...     input_motl_type="relion", class_occupancy=2000, iteration_number=4, number_of_runs=2,
...     output_motl_type="stopgap"
... )
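The sampling scheme described for this function can be sketched with the standard library alone. This is an illustration under stated assumptions, not the cryoCAT code: `sketch_denovo_assignment` is a hypothetical helper, particles are represented by bare IDs, and the "1/10 of the particles" default is assumed to use integer division.

```python
import random


def sketch_denovo_assignment(particle_ids, number_of_classes, class_occupancy=None, seed=None):
    """Illustrate one reference-averaging selection and one alignment assignment."""
    rng = random.Random(seed)
    if class_occupancy is None:
        # Assumption: "1/10 of total number of particles" rounds down.
        class_occupancy = len(particle_ids) // 10

    # Each class draws its own unique set of particles; a particle may still
    # be drawn by several classes because the draws are independent.
    reference = {
        class_id: rng.sample(particle_ids, class_occupancy)
        for class_id in range(1, number_of_classes + 1)
    }

    # The alignment motl keeps every particle and gets a random class in 1..number_of_classes.
    alignment = {pid: rng.randint(1, number_of_classes) for pid in particle_ids}
    return reference, alignment


particle_ids = list(range(1, 101))
reference, alignment = sketch_denovo_assignment(particle_ids, number_of_classes=4, seed=0)
print(len(reference), len(reference[1]), len(alignment))  # 4 10 100
```

Repeating the reference selection once per run gives number_of_runs different starting references, while the single alignment assignment is shared by all runs.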
- cryocat.analysis.sta.create_multiref_run(input_motl, number_of_classes, output_motl_base, input_motl_type='emmotl', iteration_number=1, number_of_runs=1, output_motl_type='stopgap')
Creates motls for multiple runs of a multi-reference alignment. In essence, it will randomly assign specified number of classes to each motl that will be created. New motls will be written out into files output_motl_base_mr#runID_iterationNumber either in stopgap, emmotl or relion format.
- Parameters:
- input_motl : str, pandas dataframe or Motl
Input motl (specified either as a path, dataframe or Motl object).
- number_of_classes : int
Number of classes to assign randomly.
- output_motl_base : str
Base path for the output motl files. The final name will be created as output_motl_base_mr#runID_iterationNumber where runID is from 1 to number_of_runs and iterationNumber is iteration_number. The extension will be determined based on the output_motl_type.
- input_motl_type : str (emmotl|stopgap|relion|relion5|relion5_1), default="emmotl"
Type of the input motl file. Defaults to “emmotl”.
- iteration_number : int, default=1
Iteration number to be used in the output name creation. Defaults to 1.
- number_of_runs : int, default=1
Number of motls to create. Defaults to 1.
- output_motl_type : str (stopgap|emmotl|relion), default="stopgap"
Type of the output motl file. Defaults to “stopgap”.
- Returns:
- None
Examples
>>> # Will create two motls in stopgap format with names stopgap_classes_mr1_4.star and stopgap_classes_mr2_4.star
>>> create_multiref_run(
...     "/path/to/relion_1.star", number_of_classes=8, output_motl_base="stopgap_classes",
...     input_motl_type="relion", iteration_number=4, number_of_runs=2,
...     output_motl_type="stopgap"
... )
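The output naming pattern output_motl_base_mr#runID_iterationNumber can be reproduced with a short sketch. `sketch_multiref_names` is a hypothetical helper, and the `.star` extension is an assumption taken from the stopgap example above.

```python
def sketch_multiref_names(output_motl_base, iteration_number, number_of_runs, extension=".star"):
    """Build the file names expected for each multi-reference run."""
    return [
        f"{output_motl_base}_mr{run_id}_{iteration_number}{extension}"
        for run_id in range(1, number_of_runs + 1)
    ]


names = sketch_multiref_names("stopgap_classes", iteration_number=4, number_of_runs=2)
print(names)  # ['stopgap_classes_mr1_4.star', 'stopgap_classes_mr2_4.star']
```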
- cryocat.analysis.sta.evaluate_alignment(motl_base_names, start_it, end_it, motl_type='stopgap', write_out_stats=False, plot_values=True, filter_rows=None, filter_columns='subtomo_id', labels=None, graph_title='Alignment stability', graph_output_file=None, load_kwargs=None)
Evaluate alignment stability for specified motls and iterations.
- Parameters:
- motl_base_names : str or list
List of motl base names or a single motl base name to perform the evaluation on. Base name means without the iteration number and extension. For example, for the name motl_shift_3.em the base name is motl_shift_.
- start_it : int
Starting iteration number.
- end_it : int
Ending iteration number.
- motl_type : str (stopgap|emmotl|relion), default="stopgap"
Type of the input motl. Defaults to “stopgap”.
- write_out_stats : bool, default=False
Whether to write out stats. If True, the stats will be written to motl_base_name + as_motlID.csv, where motlID is given by the position of the base name in the motl_base_names list. For example, for motl_shift_3.em the stats file will be motl_shift_as_1.csv if motl_shift_ is the first entry in motl_base_names. Defaults to False.
- plot_values : bool, default=True
Whether to plot values. Defaults to True.
- filter_rows : array-like or list of array-like, optional
Rows to filter. Only rows that are within the filter_rows will be kept. Defaults to None, which means no filtering.
- filter_columns : str or list, default="subtomo_id"
Column names based on which the filtering is performed. If filter_rows is None, no filtering will be done and this parameter will not be used. Defaults to “subtomo_id”.
- labels : str or list, optional
Labels for the plot. Should have the same length as motl_base_names. If None, the labels will be automatically set to motl_base_names (if those names contain paths, the paths will be removed). Used only if plot_values is True. Defaults to None.
- graph_title : str, default="Alignment stability"
Title of the graph. Used only if plot_values is True. Defaults to “Alignment stability”.
- graph_output_file : str, optional
Output file for the graph. Used only if plot_values is True. If None, no file will be written out. Defaults to None.
- load_kwargs : dict, optional
Dictionary of keyword arguments passed to the Motl.load method (and subsequently to the underlying Motl class constructors such as RelionMotl and RelionMotlv5). This is useful for providing necessary metadata like pixel_size, binning, optics_data, or custom formats (tomo_format, subtomo_format). Defaults to None.
- Returns:
- list of pandas DataFrames
List of computed alignment stability statistics dataframes.
Examples
>>> # Single motl, no filtering, motls motl_1.star to motl_17.star will be loaded for evaluation. Statistics
>>> # will be written into /path/to/the/motl_as_1.csv file.
>>> motl_base_name = "/path/to/the/motl_"
>>> stats_df = evaluate_alignment(motl_base_name, 1, 17, motl_type="stopgap", plot_values=True, write_out_stats=True)

>>> # Multiple motls, no filtering, motls motl1_1.star to motl1_17.star and motl3_1.star to motl3_17.star
>>> # will be loaded for evaluation. Statistics will be written into /path/to/the/motl1_as_1.csv and
>>> # /path/to/the/motl3_as_2.csv files.
>>> motl_base_names = ["/path/to/the/motl1_", "/path/to/the/motl3_"]
>>> stats_df = evaluate_alignment(motl_base_names, 1, 17, motl_type="stopgap", plot_values=True, write_out_stats=True)

>>> # Multiple motls, motls motl1_1.star to motl1_17.star and motl3_1.star to motl3_17.star will be loaded for
>>> # evaluation. Statistics will be written into /path/to/the/motl1_as_1.csv and /path/to/the/motl3_as_2.csv files.
>>> # Filtering will be done based on column geom3 and only particles with values in filter_rows will be evaluated.
>>> motl_base_names = ["/path/to/the/motl1_", "/path/to/the/motl3_"]
>>> filter_rows = [values_to_keep_for_motl1, values_to_keep_for_motl3]
>>> stats_df = evaluate_alignment(
...     motl_base_names, 1, 17,
...     filter_rows=filter_rows, filter_columns="geom3",
...     motl_type="stopgap", plot_values=True, write_out_stats=True
... )

>>> # Multiple motls, motls motl1_1.star to motl1_17.star and motl3_1.star to motl3_17.star will be loaded for
>>> # evaluation. Statistics will be written into /path/to/the/motl1_as_1.csv and /path/to/the/motl3_as_2.csv files.
>>> # Filtering will be done based on column geom3 for motl1 and based on subtomo_id for motl3.
>>> # Only particles with values in filter_rows will be evaluated.
>>> motl_base_names = ["/path/to/the/motl1_", "/path/to/the/motl3_"]
>>> filter_rows = [values_to_keep_for_motl1, values_to_keep_for_motl3]
>>> filter_columns = ["geom3", "subtomo_id"]
>>> stats_df = evaluate_alignment(
...     motl_base_names, 1, 17,
...     filter_rows=filter_rows, filter_columns=filter_columns,
...     motl_type="stopgap", plot_values=True, write_out_stats=True
... )

>>> # Multiple motls, motls motl1_1.star to motl1_17.star and motl3_1.star to motl3_17.star will be loaded for
>>> # evaluation. Statistics will be written into /path/to/the/motl1_as_1.csv and /path/to/the/motl3_as_2.csv files.
>>> # Filtering will be done based on column geom3 for motl1 and no filtering will be done for motl3.
>>> # Only particles of motl1 with values in filter_rows will be evaluated.
>>> motl_base_names = ["/path/to/the/motl1_", "/path/to/the/motl3_"]
>>> filter_rows = [values_to_keep_for_motl1, None]
>>> filter_columns = ["geom3", None]
>>> stats_df = evaluate_alignment(
...     motl_base_names, 1, 17,
...     filter_rows=filter_rows, filter_columns=filter_columns,
...     motl_type="stopgap", plot_values=True, write_out_stats=True
... )
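The statistics file names and default plot labels used in the examples above follow a simple pattern, sketched here with hypothetical helpers (`sketch_stats_filenames`, `sketch_default_labels`). The exact label clean-up performed by the library is an assumption; only the removal of the path is documented.

```python
import os


def _as_list(motl_base_names):
    """Accept a single base name or a list of base names."""
    return [motl_base_names] if isinstance(motl_base_names, str) else list(motl_base_names)


def sketch_stats_filenames(motl_base_names):
    """Name each stats file after the base name and its 1-based position in the list."""
    return [
        f"{base}as_{motl_id}.csv"
        for motl_id, base in enumerate(_as_list(motl_base_names), start=1)
    ]


def sketch_default_labels(motl_base_names):
    """Default labels: base names with any leading path removed (assumed behaviour)."""
    return [os.path.basename(base) for base in _as_list(motl_base_names)]


bases = ["/path/to/the/motl1_", "/path/to/the/motl3_"]
print(sketch_stats_filenames(bases))
# ['/path/to/the/motl1_as_1.csv', '/path/to/the/motl3_as_2.csv']
print(sketch_default_labels(bases))  # ['motl1_', 'motl3_']
```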
- cryocat.analysis.sta.evaluate_classification(motl_base_name, start_it, end_it, motl_type='stopgap', output_file_stats=None, plot_results=False, output_file_graphs=None, load_kwargs=None)
Get the occupancy of each class over the iterations and the class stability of subtomograms over iterations.
- Parameters:
- motl_base_name : str
Base name for a motl to perform the evaluation on. Base name means without the iteration number and extension. For example, for the name motl_shift_3.em the base name is motl_shift_.
- start_it : int
Starting iteration number.
- end_it : int
Ending iteration number.
- motl_type : str (stopgap|emmotl|relion|relion5|relion5_1), default="stopgap"
Type of the input motl. Defaults to “stopgap”.
- output_file_stats : str, optional
Name of the file into which the results will be written out. If None, no results will be written out. Defaults to None.
- plot_results : bool, default=False
Whether to plot the results. Defaults to False.
- output_file_graphs : str, optional
Name of the file into which the plotted graphs will be written out. If None, the graphs will not be written out. If plot_results is False, this parameter is unused. Defaults to None.
- load_kwargs : dict, optional
Dictionary of keyword arguments passed to the Motl.load method (and subsequently to the underlying Motl class constructors such as RelionMotl and RelionMotlv5). This is useful for providing necessary metadata like pixel_size, binning, optics_data, or custom formats (tomo_format, subtomo_format). Defaults to None.
- Returns:
- occupancy : dict
A dictionary containing the occupancy of each class over the iterations.
- changing_subtomos : dict
A dictionary containing the number of different subtomogram IDs for each class over iterations.
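To make the two returned quantities concrete, the sketch below computes them from plain dictionaries. It is one possible reading of the description, not the library's implementation: `sketch_classification_summary` is hypothetical, each iteration is assumed to be a mapping from subtomo_id to class, and "different subtomogram IDs" is interpreted as the IDs present in a class that were not in that class in the previous iteration.

```python
def sketch_classification_summary(classes_per_iteration):
    """Return (occupancy, changing_subtomos) for {iteration: {subtomo_id: class}}."""
    iterations = sorted(classes_per_iteration)
    all_classes = sorted(
        {cls for assignment in classes_per_iteration.values() for cls in assignment.values()}
    )

    # Set of subtomo_ids belonging to each class in each iteration.
    members = {
        it: {
            cls: {sid for sid, c in classes_per_iteration[it].items() if c == cls}
            for cls in all_classes
        }
        for it in iterations
    }

    # Occupancy: number of particles per class, one value per iteration.
    occupancy = {cls: [len(members[it][cls]) for it in iterations] for cls in all_classes}

    # Changes: IDs that entered a class compared with the previous iteration.
    changing_subtomos = {
        cls: [
            len(members[cur][cls] - members[prev][cls])
            for prev, cur in zip(iterations[:-1], iterations[1:])
        ]
        for cls in all_classes
    }
    return occupancy, changing_subtomos


runs = {
    1: {101: 1, 102: 1, 103: 2},
    2: {101: 1, 102: 2, 103: 2},
    3: {101: 1, 102: 2, 103: 2},
}
occupancy, changing_subtomos = sketch_classification_summary(runs)
print(occupancy)          # {1: [2, 1, 1], 2: [1, 2, 2]}
print(changing_subtomos)  # {1: [0, 0], 2: [1, 0]}
```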
- cryocat.analysis.sta.evaluate_multirun_stability(input_motls, input_motl_type='stopgap')
Evaluate how many particles ended up within the same class among all the classification runs. It is meant to be used for multiruns with existing references (i.e. not de novo ones) where all runs use the same references in the same order.
- Parameters:
- input_motls : list
List of input motl files. At least two are required.
- input_motl_type : str (stopgap|emmotl|relion|relion5|relion5_1), default="stopgap"
Type of the input motls. Defaults to “stopgap”.
- Returns:
- common_occupancies : dict
A dictionary containing common subtomo_ids for each class of particles.
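The idea of common occupancies can be sketched as a per-class set intersection across runs. `sketch_common_occupancies` is a hypothetical helper that works on plain {subtomo_id: class} mappings rather than motl files, so it only illustrates the concept.

```python
def sketch_common_occupancies(runs):
    """Return {class: sorted subtomo_ids assigned to that class in every run}."""
    if len(runs) < 2:
        raise ValueError("At least two runs are required.")

    classes = set().union(*(set(run.values()) for run in runs))
    common = {}
    for cls in sorted(classes):
        # IDs assigned to this class in each run, intersected over all runs.
        per_run = [{sid for sid, c in run.items() if c == cls} for run in runs]
        common[cls] = sorted(set.intersection(*per_run))
    return common


run_a = {1: 1, 2: 1, 3: 2, 4: 2}
run_b = {1: 1, 2: 2, 3: 2, 4: 2}
common = sketch_common_occupancies([run_a, run_b])
print(common)  # {1: [1], 2: [3, 4]}
```

The comparison is only meaningful when class numbers refer to the same reference in every run, which is why the function targets runs that share references in the same order.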
- cryocat.analysis.sta.get_class_occupancy(motl_base_name, start_it, end_it, motl_type='stopgap', load_kwargs=None)
Get the occupancy of each class over the iterations.
- Parameters:
- motl_base_name : str
Base name for a motl to perform the evaluation on. Base name means without the iteration number and extension. For example, for the name motl_shift_3.em the base name is motl_shift_.
- start_it : int
Starting iteration number.
- end_it : int
Ending iteration number.
- motl_type : str (stopgap|emmotl|relion|relion5|relion5_1), default="stopgap"
Type of the input motl. Defaults to “stopgap”.
- load_kwargs : dict, optional
Dictionary of keyword arguments passed to the Motl.load method (and subsequently to the underlying Motl class constructors such as RelionMotl and RelionMotlv5). This is useful for providing necessary metadata like pixel_size, binning, optics_data, or custom formats (tomo_format, subtomo_format). Defaults to None.
- Returns:
- occupancy : dict
A dictionary containing the occupancy of each class over the iterations.
Notes
Loading many motls can take some time. If you also want to compute the stability of classes, it is recommended to use cryocat.analysis.sta.evaluate_classification(), which gives both occupancy and stability and reads in all the motls only once.
- cryocat.analysis.sta.get_motl_extension(motl_type)
Return the file extension for a given motl type.
- Parameters:
- motl_type : str (emmotl|relion|relion5|relion5_1|stopgap)
The type of motl file.
- Returns:
- str
The file extension corresponding to the motl type.
- Raises:
- ValueError
If the motl type is not supported.
- cryocat.analysis.sta.get_motl_filename(motl_base_name, iteration, motl_type)
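Judging from the examples on this page, a motl file name is composed of the base name, the iteration number and the type-specific extension (motl_shift_ + 3 + .em). The sketch below illustrates that composition with hypothetical helpers. The extension table is an assumption inferred from the examples (`.em` for emmotl, `.star` for the STOPGAP and RELION variants) and may differ from what get_motl_extension actually returns.

```python
# Assumed mapping, inferred from the file names used in the examples on this page.
ASSUMED_EXTENSIONS = {
    "emmotl": ".em",
    "stopgap": ".star",
    "relion": ".star",
    "relion5": ".star",
    "relion5_1": ".star",
}


def sketch_motl_extension(motl_type):
    """Return the assumed extension or raise ValueError for unsupported types."""
    try:
        return ASSUMED_EXTENSIONS[motl_type]
    except KeyError:
        raise ValueError(f"Unsupported motl type: {motl_type}") from None


def sketch_motl_filename(motl_base_name, iteration, motl_type):
    """Compose base name + iteration number + extension."""
    return f"{motl_base_name}{iteration}{sketch_motl_extension(motl_type)}"


print(sketch_motl_filename("motl_shift_", 3, "emmotl"))  # motl_shift_3.em
```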
- cryocat.analysis.sta.get_stable_particles(motl_base_name, start_it, end_it, motl_type='emmotl', load_kwargs=None)
Load and analyze particle data across multiple iterations to identify stable particles, i.e. particles that do not change their class.
- Parameters:
- motl_base_name : str
Base name for a motl to perform the evaluation on. Base name means without the iteration number and extension. For example, for the name motl_shift_3.em the base name is motl_shift_.
- start_it : int
Starting iteration number.
- end_it : int
Ending iteration number.
- motl_type : str (stopgap|emmotl|relion|relion5|relion5_1), default="emmotl"
Type of the input motl. Defaults to “emmotl”.
- load_kwargs : dict, optional
Dictionary of keyword arguments passed to the Motl.load method (and subsequently to the underlying Motl class constructors such as RelionMotl and RelionMotlv5). This is useful for providing necessary metadata like pixel_size, binning, optics_data, or custom formats (tomo_format, subtomo_format). Defaults to None.
- Returns:
- list
List of subtomo_ids that have the same class across the specified iterations.
Notes
This function loads motive list files from specified iterations, merges them, and identifies subtomo_ids (subtomogram identifiers) that have a consistent class across all iterations. The percentage of stable particles relative to the total number of particles in the first iteration is printed.
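The notion of a stable particle can be sketched without any motl files. `sketch_stable_particles` is a hypothetical helper operating on {iteration: {subtomo_id: class}} mappings. It mirrors the description above, including the percentage relative to the first iteration, but it is not the library code.

```python
def sketch_stable_particles(classes_per_iteration):
    """Return (stable subtomo_ids, percentage relative to the first iteration)."""
    iterations = sorted(classes_per_iteration)
    first = classes_per_iteration[iterations[0]]

    # A particle is stable if every later iteration assigns it the same class
    # as the first one; particles missing from an iteration count as unstable.
    stable = [
        sid
        for sid, cls in first.items()
        if all(classes_per_iteration[it].get(sid) == cls for it in iterations[1:])
    ]
    percentage = 100.0 * len(stable) / len(first)
    return stable, percentage


runs = {
    1: {101: 1, 102: 1, 103: 2, 104: 2},
    2: {101: 1, 102: 2, 103: 2, 104: 2},
    3: {101: 1, 102: 2, 103: 2, 104: 1},
}
stable, percentage = sketch_stable_particles(runs)
print(stable, percentage)  # [101, 103] 50.0
```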
- cryocat.analysis.sta.get_subtomos_class_stability(motl_base_name, start_it, end_it, motl_type='stopgap', load_kwargs=None)
Calculate the class stability of subtomograms over iterations.
- Parameters:
- motl_base_name : str
Base name for a motl to perform the evaluation on. Base name means without the iteration number and extension. For example, for the name motl_shift_3.em the base name is motl_shift_.
- start_it : int
Starting iteration number.
- end_it : int
Ending iteration number.
- motl_type : str (stopgap|emmotl|relion|relion5|relion5_1), default="stopgap"
Type of the input motl. Defaults to “stopgap”.
- load_kwargs : dict, optional
Dictionary of keyword arguments passed to the Motl.load method (and subsequently to the underlying Motl class constructors such as RelionMotl and RelionMotlv5). This is useful for providing necessary metadata like pixel_size, binning, optics_data, or custom formats (tomo_format, subtomo_format). Defaults to None.
- Returns:
- different_sids : dict
A dictionary containing the number of different subtomogram IDs for each class over iterations.
Notes
Loading many motls can take some time. If you also want to compute the occupancy of classes, it is recommended to use cryocat.analysis.sta.evaluate_classification(), which gives both occupancy and stability and reads in all the motls only once.
- cryocat.analysis.sta.write_out_motl(input_motl, output_file_base, output_motl_type)
Writes out a given motl file to a specified output format.
- Parameters:
- input_motl : Motl
Input motl to be written out.
- output_file_base : str
Base name for the output file.
- output_motl_type : str (emfile|relion|relion5|relion5_1|stopgap)
Type of the output motl file.
- Returns:
- None
- Raises:
- ValueError
If the output_motl_type is not one of the supported types.