Peak Analysis

  • The peak analysis module provides tools to analyze template candidates for template matching in STOPGAP or GAPStopTM. The results can be used to set up the template matching parameters.

  • The example files for this tutorial can be found here. The expected output is here.

Template file

  • The template file specifies the inputs and also stores some of the results of the peak analysis.

  • It is a *.csv file containing a table with columns that must be set by the user as well as columns that are filled in during the analysis.

  • Internally, the table is loaded as a pandas DataFrame.

  • IMPORTANT: the file must be closed during the peak analysis; otherwise it cannot be written to, which results in a “permission denied” error.
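A minimal sketch of how such a template list can be read and written with pandas, assuming the header-less first column holds the experiment ID (so `index_col=0` turns it into the index). The helpers `load_template_list` and `save_template_list` are hypothetical and not part of cryocat; the second one only illustrates that the “permission denied” case surfaces in Python as a `PermissionError` (typically on Windows, when the file is open in a spreadsheet editor).

```python
import io

import pandas as pd


def load_template_list(csv_source):
    """Read the template list; the header-less first column becomes the index (experiment ID)."""
    return pd.read_csv(csv_source, index_col=0)


def save_template_list(df, path):
    """Write the table back, re-raising a locked-file error with a clearer message."""
    try:
        df.to_csv(path)
    except PermissionError as err:
        raise PermissionError(
            f"Cannot write {path}: close it in any other program and rerun the analysis."
        ) from err


# Hypothetical excerpt containing only a few of the real columns
example_csv = io.StringIO(
    ",Done,Structure,Template\n"
    "0,True,ribosome,tmpl_ribosome\n"
    "1,False,npc,tmpl_npc\n"
)
templates = load_template_list(example_csv)
print(templates.index.tolist())    # → [0, 1]
print(templates["Done"].tolist())  # → [True, False]
```

pandas parses the literal strings True/False into a boolean column, so the “Done” flag can be used directly for filtering.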

Structure

  • The first column (without a header) holds the unique ID of each experiment.

  • Done (True/False): whether the main run_analysis has been completed; run_analysis sets it to True when it finishes. All other functions skip rows where it is False.

  • The following columns must be set by the user:

    • Structure (string): name of the structure (e.g. ribosome, npc whole, npc, …). The inputs related to this structure should be in the folder parent_folder_path/structure.

    • Map type (string): the type of the template, e.g. sta (map from subtomograms), sta_sg (from STOPGAP subtomograms), model, etc. It is not really used in the analysis, but can be useful for filtering.

    • Template (string): name of the template without the .em extension (the file has to be in .em format).

    • Mask (string): mask to use for the analysis (the same soft mask as used in TM).

    • Mask is tight (True/False): whether the Mask is tight or not.

    • Tight mask (string): name of the sharp, very tight mask used for voxel counting and bounding box measurements. It has to be in .em format and is given without the extension.

    • Compare (string): type of structure to compare the template to; ‘tmpl’ compares the template to itself, ‘subtomo’ to the subtomogram from ‘Tomo map’, and a structure name to a different structure (e.g. setting it to ‘ribosome’ compares the template to the ribosome specified in ‘Tomo map’).

    • Tomo map (string): name of the “tomogram” map (can be a subtomogram or a different STA map) in .em format (without the .em extension). This map is fixed, while the “Template” map is rotated. It has to be located in parent_folder_path/structure/. For subtomo comparisons this field is filled in automatically upon calling create_subtomograms_for_tm.

    • Symmetry (int): order of the C (cyclic) symmetry (1 for no symmetry).

    • Apply wedge (True/False): whether to apply wedge compensation or not. Relevant only for subtomo comparison. Should be set to True for normal cases.

    • Angles (string): name (including the .csv extension) of the angle list file to be used for the analysis. It should be located in angle_list_path.

    • Degrees (int): angular step in degrees.

    • Apply angular offset (True/False): whether to apply an additional offset with respect to the starting angle (e.g. to check how sensitive the peak value is to it). If True, half of “Degrees” is used as the maximal offset for the given angular step.

    • Phi, Theta, Psi (floats): starting angles for the subtomo analysis (so that the starting position is at zero angular difference). They are filled in automatically by create_subtomograms_for_tm and are irrelevant for tmpl and other-structure comparisons.

    • Binning (int): the binning of the template.

    • Pixelsize (float): the voxel size of the template in Angstroms.

    • Boxsize (int): box size of the template .em file in voxels (size along one dimension).

    • Motl (string): only for subtomo comparison. Motl file (in .em format, given without the extension) used to locate the best subtomogram position and orientation in the tomogram. It should be in parent_folder_path/structure/. Used in create_subtomograms_for_tm.

    • Tomo created (True/False): only for subtomo comparison. Used in create_subtomograms_for_tm: if False, the subtomogram is created and the value is set to True; if True, the subtomogram is not created again.

    • Tomogram (string): name of the tomogram located in the parent_folder_path folder. It has to be in .mrc format with the file extension .mrc (not .rec); the extension is not included in the name.

  • The remaining columns of the *.csv file are filled in during the analysis:

    • Output folder (string): name of the output folder for all results. To ensure uniqueness, it is named id_#id_results. Filled in automatically by run_analysis.

    • Voxels (int): number of voxels in the soft mask (“Mask”). Filled in by get_mask_stats.

    • Voxels TM (int): number of voxels in the sharp mask (“Tight mask”). Filled in by get_mask_stats.

    • Dim x, y, z (ints): dimensions of the tight bounding box of the structure, computed from the “Tight mask”. Filled in by get_mask_stats.

    • Solidity (float): Solidity of the “Tight mask”, computed as number of filled voxels divided by volume of the convex hull. Filled in by get_mask_stats.

    • Peak value (float): value of the peak in _scores.em. Filled in by compute_center_peak_stats_and_profiles.

    • Peak x, y, z (ints): position of the peak in the _scores.em map. Filled in by compute_center_peak_stats_and_profiles.

    • VC dist_all (int): voxel count of the label corresponding to the peak position from the _dist_all.em distance map (the label is written out as _dist_all_label.em). Computed by compute_dist_maps_voxels.

    • VC dist_normals (int): voxel count of the label corresponding to the peak position from the _dist_normals.em distance map (the label is written out as _dist_normals_label.em). Computed by compute_dist_maps_voxels.

    • VC dist_inplane (int): voxel count of the label corresponding to the peak position from the _dist_inplane.em distance map (the label is written out as _dist_inplane_label.em). Computed by compute_dist_maps_voxels.

    • Solidity dist_all (float): Solidity of the label of _dist_all_label.em. Computed by compute_dist_maps_voxels.

    • Solidity dist_normals (float): Solidity of the label of _dist_normals_label.em. Computed by compute_dist_maps_voxels.

    • Solidity dist_inplane (float): Solidity of the label of _dist_inplane_label.em. Computed by compute_dist_maps_voxels.

    • VCO dist_all (int): same as VC dist_all but morphological opening was performed on the label (_dist_all_label_open.em). Computed by compute_dist_maps_voxels.

    • VCO dist_normals (int): same as VC dist_normals but morphological opening was performed on the label (_dist_normals_label_open.em). Computed by compute_dist_maps_voxels.

    • VCO dist_inplane (int): same as VC dist_inplane but morphological opening was performed on the label (_dist_inplane_label_open.em). Computed by compute_dist_maps_voxels.

    • O dist_all x, y, z (ints): size of the bounding box of _dist_all_label_open.em. Computed by compute_dist_maps_voxels.

    • O dist_normals x, y, z (ints): size of the bounding box of _dist_normals_label_open.em. Computed by compute_dist_maps_voxels.

    • O dist_inplane x, y, z (ints): size of the bounding box of _dist_inplane_label_open.em. Computed by compute_dist_maps_voxels.

    • Drop x, y, z (floats): value drop at the voxels neighbouring the peak (connectivity 1), computed as (v[px-1] + v[px+1])/2, where px is the peak center in x (analogously for y and z). Computed by compute_center_peak_stats_and_profiles.

    • Mean 1-5 (floats): mean values of the peak surroundings (1 for a sphere of radius 1, 5 for radius 5). Computed by compute_center_peak_stats_and_profiles.

    • Median 1-5 (floats): median values of the peak surroundings (1 for a sphere of radius 1, 5 for radius 5). Computed by compute_center_peak_stats_and_profiles.

    • Var 1-5 (floats): variance of the peak surroundings (1 for a sphere of radius 1, 5 for radius 5). Computed by compute_center_peak_stats_and_profiles.

Set up the notebook

[1]:
%load_ext autoreload
%autoreload 2
[2]:
from cryocat import pana
[3]:
import warnings
warnings.filterwarnings('ignore')

Set up paths

[ ]:
parent_folder_path = './inputs/'
angle_list_path = './inputs/'
template_list = './inputs/template_list.csv'
wedge_path = './inputs/'
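Before running the analysis it can help to check that the files named in the template list exist. The sketch below builds the expected paths for one row, following the locations described for the template list: structure-specific .em files in parent_folder_path/structure/ and the .mrc tomogram directly in parent_folder_path. The helper names are hypothetical, and some entries (e.g. “Tomo map” before create_subtomograms_for_tm has run) may legitimately still be empty.

```python
from pathlib import Path


def expected_input_files(row, parent_folder_path):
    """Map inputs of one template-list row to the paths the analysis expects (hypothetical helper)."""
    parent = Path(parent_folder_path)
    structure_dir = parent / row["Structure"]
    return {
        "template": structure_dir / f"{row['Template']}.em",
        "tight_mask": structure_dir / f"{row['Tight mask']}.em",
        "tomo_map": structure_dir / f"{row['Tomo map']}.em",
        "tomogram": parent / f"{row['Tomogram']}.mrc",
    }


def missing_inputs(row, parent_folder_path):
    """Return the names of the inputs whose files do not exist."""
    paths = expected_input_files(row, parent_folder_path)
    return [name for name, path in paths.items() if not path.exists()]


# Hypothetical row; a row of the DataFrame read from template_list works the same way
row = {"Structure": "ribosome", "Template": "tmpl_ribosome", "Tight mask": "mask_tight",
       "Tomo map": "ribosome_sta", "Tomogram": "tomo_001"}
print(expected_input_files(row, "./inputs/")["template"].as_posix())  # → inputs/ribosome/tmpl_ribosome.em
print(missing_inputs(row, "./inputs/"))
```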

Full analysis

Set indices

  • Set the indices (row IDs from the first column of the template list) to run the analysis on; they have to be given as a list.

[ ]:
indices = [0, 1, 2, 3]
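Instead of typing the indices by hand, they can be selected from the template list with pandas. A minimal sketch, using a small hypothetical table in place of `pd.read_csv(template_list, index_col=0)`: it picks all rows of one structure, or only the rows whose “Done” flag is still False.

```python
import pandas as pd

# Hypothetical excerpt; in the notebook: templates = pd.read_csv(template_list, index_col=0)
templates = pd.DataFrame(
    {"Structure": ["ribosome", "ribosome", "npc", "npc"],
     "Done": [True, False, False, True]},
    index=[0, 1, 2, 3],
)

# All experiments on the ribosome
ribosome_indices = templates.index[templates["Structure"] == "ribosome"].tolist()
# Experiments that run_analysis has not completed yet
pending_indices = templates.index[~templates["Done"]].tolist()

print(ribosome_indices)  # → [0, 1]
print(pending_indices)   # → [1, 2]
```

Both results are plain Python lists, which matches the requirement that the indices are passed as a list.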

Extract subtomograms

  • Finds the best subtomogram (based on CC) and cuts it out to prepare it for the peak analysis.

  • It does not take the indices; instead, it checks whether “Tomo created” is True or False and extracts subtomograms for all rows where it is False.

  • The subtomogram is stored in parent_folder_path/structure_name/subtomo_name.em, where subtomo_name is created from the motl name and the tomogram name (both taken from the template csv).

[ ]:
pana.create_subtomograms_for_tm(template_list, parent_folder_path)
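Geometrically, the extraction step amounts to picking the particle with the highest CC and cutting a box around its position. The sketch below shows only that part with NumPy, under stated assumptions: CC values and positions are given as plain arrays (reading the .em motl is not shown), positions are in array index order, the box is cubic, and voxels outside the tomogram are zero-padded. This sketch applies no rotation. The function names are hypothetical.

```python
import numpy as np


def best_particle(cc_values, positions):
    """Return the index and position of the particle with the highest CC (hypothetical helper)."""
    best = int(np.argmax(cc_values))
    return best, np.asarray(positions[best])


def extract_subvolume(volume, center, box_size):
    """Cut a cubic box of edge box_size centred on center; voxels outside the volume stay zero."""
    center = np.round(np.asarray(center)).astype(int)
    start = center - box_size // 2
    out = np.zeros((box_size,) * volume.ndim, dtype=volume.dtype)
    # Clip the requested box to the volume (never letting hi drop below lo)
    src_lo = np.maximum(start, 0)
    src_hi = np.maximum(np.minimum(start + box_size, volume.shape), src_lo)
    # Place the clipped region at the matching offset inside the output box
    dst_lo = src_lo - start
    dst_hi = dst_lo + (src_hi - src_lo)
    src = tuple(slice(lo, hi) for lo, hi in zip(src_lo, src_hi))
    dst = tuple(slice(lo, hi) for lo, hi in zip(dst_lo, dst_hi))
    out[dst] = volume[src]
    return out


tomogram = np.random.default_rng(0).random((64, 64, 64)).astype(np.float32)
cc = np.array([0.21, 0.35, 0.28])
positions = np.array([[10, 12, 30], [32, 32, 32], [50, 5, 40]])
idx, pos = best_particle(cc, positions)
subtomo = extract_subvolume(tomogram, pos, box_size=24)
print(idx, subtomo.shape)  # → 1 (24, 24, 24)
```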

Run peak analysis

  • Runs the basic peak analysis; the inputs are specified in the template csv file.

  • Creates _scores.em, _angles.em, _dist_all.em, _dist_normals.em, _dist_inplane.em and a .csv file with basic statistics; the mask overlap is computed for the soft mask used in TM.

[ ]:
pana.run_analysis(template_list, indices, angle_list_path, wedge_path, parent_folder_path)
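The “Peak value” and “Peak x, y, z” columns describe the maximum of the _scores.em map. A minimal NumPy sketch of locating such a maximum, assuming the map is already loaded as a 3D array (file reading is not shown); the reported coordinates follow the array axis order, which may differ from the x, y, z convention of the .em file. The function name is hypothetical.

```python
import numpy as np


def find_peak(scores):
    """Return the maximum of a scores map and its voxel index (in array axis order)."""
    position = np.unravel_index(np.argmax(scores), scores.shape)
    return float(scores[position]), tuple(int(p) for p in position)


scores = np.zeros((5, 5, 5))
scores[2, 3, 1] = 0.8
print(find_peak(scores))  # → (0.8, (2, 3, 1))
```

If several voxels share the maximum, `np.argmax` returns the first one in C (row-major) order.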

Analysis of distance maps

  • Takes the area around the scores peak in each of the three distance maps, labels it, and computes voxel counts, solidity and bounding boxes.

  • Creates labeled distance maps: _dist_all_label.em, _dist_normals_label.em, _dist_inplane_label.em, _dist_all_label_open.em, _dist_normals_label_open.em, _dist_inplane_label_open.em

[ ]:
pana.compute_dist_maps_voxels(template_list, indices, parent_folder_path)
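The labelling step can be pictured as connected-component analysis around the peak. The sketch below uses scipy.ndimage under stated assumptions: the distance map has already been turned into a binary map (how the region around the peak is thresholded is not described here, so `binary_map` is simply an input), connectivity is the SciPy default (face neighbours), and the opening uses the default structuring element. It reports the voxel count of the component containing the peak, the count after morphological opening, and the bounding-box size of the opened label, mirroring the VC, VCO and O columns. The function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def bounding_box_size(mask):
    """Edge lengths (in voxels) of the axis-aligned bounding box of a 3D binary mask."""
    coords = np.argwhere(mask)
    if coords.size == 0:
        return (0, 0, 0)
    return tuple(int(s) for s in coords.max(axis=0) - coords.min(axis=0) + 1)


def peak_label_stats(binary_map, peak):
    """Measure the connected component of binary_map that contains the peak voxel."""
    labels, _ = ndimage.label(binary_map)
    peak_label = labels[tuple(peak)]
    if peak_label == 0:
        raise ValueError("the peak voxel is not part of the binary map")
    label = labels == peak_label
    label_open = ndimage.binary_opening(label)  # erosion followed by dilation
    return {
        "voxels": int(label.sum()),
        "voxels_open": int(label_open.sum()),
        "bbox_open": bounding_box_size(label_open),
    }


# Toy example: a 5x5x5 block around the peak plus an unrelated single voxel
binary_map = np.zeros((12, 12, 12), dtype=bool)
binary_map[2:7, 2:7, 2:7] = True
binary_map[10, 10, 10] = True
stats = peak_label_stats(binary_map, (4, 4, 4))
print(stats["voxels"], stats["bbox_open"])  # → 125 (5, 5, 5)
```

The isolated voxel belongs to a different label and is therefore ignored; the opening trims the block's edges and corners but keeps its extent.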

Mask statistics

  • Computes basic statistics of the masks: voxel counts of the soft and tight masks, and the bounding box and solidity of the tight mask.

[ ]:
pana.get_mask_stats(template_list, indices, parent_folder_path)
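A sketch of the quantities behind the “Voxels”, “Dim x, y, z” and “Solidity” columns, using NumPy and scipy.spatial.ConvexHull. Assumptions: a voxel counts as inside the mask when its value exceeds 0.5 (the actual threshold for soft masks is not stated), and the convex hull is built from the corners of the filled voxels so that a fully convex box has solidity 1. The function name is hypothetical.

```python
import numpy as np
from scipy.spatial import ConvexHull


def mask_stats(mask, threshold=0.5):
    """Voxel count, tight bounding-box size and solidity of a binarised 3D mask."""
    coords = np.argwhere(np.asarray(mask) > threshold)  # assumption: threshold 0.5
    if len(coords) == 0:
        raise ValueError("mask is empty at this threshold")
    dims = tuple(int(d) for d in coords.max(axis=0) - coords.min(axis=0) + 1)
    # Hull over the 8 corners of every filled voxel, so whole voxels are enclosed
    offsets = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)])
    corners = np.unique((coords[:, None, :] + offsets[None, :, :]).reshape(-1, 3), axis=0)
    hull_volume = ConvexHull(corners).volume
    # Solidity = filled voxels / convex hull volume
    return {"voxels": len(coords), "dims": dims, "solidity": len(coords) / hull_volume}


cube = np.zeros((8, 8, 8))
cube[2:5, 2:5, 2:5] = 1.0
cube_stats = mask_stats(cube)
print(cube_stats["voxels"], cube_stats["dims"])  # → 27 (3, 3, 3)
```

A non-convex mask such as an L shape yields a solidity below 1, since its hull covers empty voxels.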

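Peak line profiles and statistics

  • Computes the peak statistics (peak value and position, drops, and the mean, median and variance of the peak surroundings).

  • Creates a *_peak_line_profiles.csv file (e.g. id_5_peak_line_profiles.csv) with peak profiles in x, y and z.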

[ ]:
pana.compute_center_peak_stats_and_profiles(template_list, indices, parent_folder_path)
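The “Drop” and “Mean/Median/Var 1-5” columns and the line profiles can be sketched with NumPy as below. Assumptions, also noted in the code: array axes 0, 1, 2 are taken as x, y, z, the spherical neighbourhoods use Euclidean distance and include the peak voxel itself, and the variance is the population variance (NumPy default). The original implementation may differ in these details; the function name is hypothetical.

```python
import numpy as np


def peak_stats(scores, peak, max_radius=5):
    """Neighbour drops, line profiles and spherical-neighbourhood statistics around a peak."""
    peak = tuple(int(p) for p in peak)
    if any(p < 1 or p >= s - 1 for p, s in zip(peak, scores.shape)):
        raise ValueError("peak must not lie on the border of the map")
    stats = {}
    for axis, name in enumerate("xyz"):  # assumption: array axes 0, 1, 2 are x, y, z
        before, after = list(peak), list(peak)
        before[axis] -= 1
        after[axis] += 1
        # Drop = (v[p-1] + v[p+1]) / 2 along this axis (connectivity-1 neighbours)
        stats[f"Drop {name}"] = float((scores[tuple(before)] + scores[tuple(after)]) / 2)
        # Line profile through the peak along this axis
        line = list(peak)
        line[axis] = slice(None)
        stats[f"Profile {name}"] = scores[tuple(line)].copy()
    # Euclidean distance of every voxel from the peak
    grid = np.indices(scores.shape)
    dist = np.sqrt(sum((g - p) ** 2 for g, p in zip(grid, peak)))
    for r in range(1, max_radius + 1):
        values = scores[dist <= r]  # sphere of radius r, peak voxel included (assumption)
        stats[f"Mean {r}"] = float(values.mean())
        stats[f"Median {r}"] = float(np.median(values))
        stats[f"Var {r}"] = float(values.var())
    return stats


scores = np.zeros((11, 11, 11))
scores[5, 5, 5] = 1.0
scores[4, 5, 5] = scores[6, 5, 5] = 0.5
stats = peak_stats(scores, (5, 5, 5))
print(stats["Drop x"], stats["Drop y"])  # → 0.5 0.0
```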

Tight mask overlap

  • The sharp mask overlap is not computed during the main analysis (only the overlap of the mask used for TM is).

  • To compute the overlap of the sharp, very tight mask, one can run pana.compute_sharp_mask_overlap(template_list, indices, angle_list_path, parent_folder_path).

  • Since this can be time-consuming for large boxes, the following function first checks whether the same analysis was already done with the same tight mask and angles; if so, it just copies the results, otherwise it computes them from scratch.

[ ]:
pana.check_existing_tight_mask_values(template_list, indices, parent_folder_path, angle_list_path)
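The reuse logic described above amounts to memoising an expensive computation on the pair (tight mask, angle list). A minimal in-memory sketch, assuming each row is identified by its “Tight mask” and “Angles” entries; how the real function stores and finds earlier results is not shown, and `reuse_or_compute` and `fake_overlap` are hypothetical names.

```python
def reuse_or_compute(rows, compute):
    """Call compute(tight_mask, angles) once per distinct pair and reuse the result for repeats."""
    cache = {}
    results = []
    for row in rows:
        key = (row["Tight mask"], row["Angles"])
        if key not in cache:
            cache[key] = compute(*key)  # expensive path, executed once per key
        results.append(cache[key])
    return results


# Stand-in for the expensive overlap computation, recording how often it runs
calls = []


def fake_overlap(tight_mask, angles):
    calls.append((tight_mask, angles))
    return f"overlap({tight_mask}, {angles})"


rows = [
    {"Tight mask": "ribosome_tight", "Angles": "angles_10deg.csv"},
    {"Tight mask": "ribosome_tight", "Angles": "angles_10deg.csv"},
    {"Tight mask": "npc_tight", "Angles": "angles_10deg.csv"},
]
results = reuse_or_compute(rows, fake_overlap)
print(len(calls))  # → 2
```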

Angular histograms

  • Additional analysis of the angles.

  • Creates a histogram of the scores values and the dependency of the peak value on different angles.

  • Writes the outputs to _gradual_angles_analysis.csv and _gradual_angles_histograms.csv files in the output folders.

[ ]:
pana.run_angle_analysis(template_list, indices, wedge_path, parent_folder_path, write_output=True)
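A sketch of the histogram part of the angular analysis: binning all values of a scores map with NumPy and storing the result as a table that can be written to CSV. The column names (`bin_start`, `bin_end`, `count`) and the number of bins are assumptions; the layout of the real _gradual_angles_histograms.csv is not documented here, and the function name is hypothetical.

```python
import numpy as np
import pandas as pd


def scores_histogram_table(scores, bins=50):
    """Histogram of all voxel values in a scores map as a DataFrame (hypothetical column names)."""
    counts, edges = np.histogram(np.ravel(scores), bins=bins)
    return pd.DataFrame({"bin_start": edges[:-1], "bin_end": edges[1:], "count": counts})


table = scores_histogram_table(np.array([0.0, 0.1, 0.2, 0.9]), bins=2)
print(table["count"].tolist())  # → [3, 1]
# table.to_csv("example_histogram.csv", index=False) would write it to disk
```

`np.histogram` includes the upper edge in the last bin, which is why the maximum value 0.9 is counted.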

Summary PDF

  • Creates a PDF summary based entirely on the csv file.

  • Creates _summary.pdf

[ ]:
pana.create_summary_pdf(template_list, indices, parent_folder_path)