RelionMotl#
- class cryocat.core.cryomotl.RelionMotl(input_motl=None, version=None, pixel_size=None, binning=None, optics_data=None, tomo_format='', subtomo_format='')#
Bases:
Motl
- adapt_original_entries()#
The function updates the DataFrame stored in
self.relion_df based on the values in self.df. If the number of particles changed (i.e., self.df has fewer particles than self.relion_df), the new relion_df is shortened based on ccSubtomoID from self.relion_df and subtomo_id from self.df. The shifts are set to zeros and ccSubtomoID is removed.
- Parameters:
- None
- Returns:
- pandas.DataFrame
The updated version of
self.relion_df.
Notes
The size and values of
self.relion_df are not changed.
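The shortening step described above can be sketched in plain Python. This is only an illustration, not the library's implementation: rows are modelled as dictionaries instead of a pandas.DataFrame, the helper name `shorten_original_entries` is hypothetical, and the update of coordinates, rotations and classes from self.df is left out.

```python
def shorten_original_entries(relion_rows, kept_subtomo_ids):
    """Keep only the original rows whose ccSubtomoID is still in the motl.

    relion_rows: list of dicts standing in for the rows of self.relion_df.
    kept_subtomo_ids: iterable standing in for self.df["subtomo_id"].
    Returns a new list; the input rows are left untouched.
    """
    kept = set(kept_subtomo_ids)
    shift_columns = ("rlnOriginXAngst", "rlnOriginYAngst", "rlnOriginZAngst")
    shortened = []
    for row in relion_rows:
        if row["ccSubtomoID"] not in kept:
            continue  # particle was removed from the motl
        # Copy the row without the helper column ccSubtomoID.
        new_row = {k: v for k, v in row.items() if k != "ccSubtomoID"}
        # Shifts are reset to zero, as stated in the description above.
        for column in shift_columns:
            if column in new_row:
                new_row[column] = 0.0
        shortened.append(new_row)
    return shortened


original = [
    {"ccSubtomoID": 1, "rlnOriginXAngst": 1.5, "rlnCtfImage": "ctf_1.mrc"},
    {"ccSubtomoID": 2, "rlnOriginXAngst": 0.7, "rlnCtfImage": "ctf_2.mrc"},
    {"ccSubtomoID": 3, "rlnOriginXAngst": -2.1, "rlnCtfImage": "ctf_3.mrc"},
]
result = shorten_original_entries(original, kept_subtomo_ids=[1, 3])
print(result)
```

Entries that the motl does not track (here rlnCtfImage) are carried over unchanged, which is the reason for keeping the original DataFrame around.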
- columns_v3_0 = ['rlnMicrographName', 'rlnCoordinateX', 'rlnCoordinateY', 'rlnCoordinateZ', 'rlnAngleRot', 'rlnAngleTilt', 'rlnAnglePsi', 'rlnImageName', 'rlnPixelSize', 'rlnRandomSubset', 'rlnOriginX', 'rlnOriginY', 'rlnOriginZ', 'rlnClassNumber']#
- columns_v3_1 = ['rlnMicrographName', 'rlnCoordinateX', 'rlnCoordinateY', 'rlnCoordinateZ', 'rlnAngleRot', 'rlnAngleTilt', 'rlnAnglePsi', 'rlnImageName', 'rlnPixelSize', 'rlnOpticsGroup', 'rlnGroupNumber', 'rlnOriginXAngst', 'rlnOriginYAngst', 'rlnOriginZAngst', 'rlnClassNumber', 'rlnRandomSubset']#
- columns_v4 = ['rlnCoordinateX', 'rlnCoordinateY', 'rlnCoordinateZ', 'rlnAngleRot', 'rlnAngleTilt', 'rlnAnglePsi', 'rlnTomoName', 'rlnTomoParticleName', 'rlnRandomSubset', 'rlnOpticsGroup', 'rlnOriginXAngst', 'rlnOriginYAngst', 'rlnOriginZAngst', 'rlnGroupNumber', 'rlnClassNumber']#
- columns_v5_1 = ['rlnCoordinateX', 'rlnCoordinateY', 'rlnCoordinateZ', 'rlnAngleRot', 'rlnAngleTilt', 'rlnAnglePsi', 'rlnMicrographName', 'rlnImageName', 'rlnPixelSize', 'rlnOpticsGroup', 'rlnGroupNumber', 'rlnOriginXAngst', 'rlnOriginYAngst', 'rlnOriginZAngst', 'rlnClassNumber', 'rlnRandomSubset']#
- convert_angles_from_relion(relion_df)#
The function converts angles from the Relion format, which corresponds to ZYZ Euler convention, to the zxz Euler convention which is used within cryoCAT.
- Parameters:
- relion_df: pandas.DataFrame
The input DataFrame in Relion format containing the angles in ZYZ convention.
- Returns:
- None
- Raises:
- Warning
If no rotations are specified in the relion starfile.
- ValueError
If only some rotations are specified in the relion starfile.
Notes
The function modifies the “phi”, “psi”, and “theta” columns of “self.df” to store the converted angles.
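The relation between the two Euler conventions can be checked with a small, dependency-free sketch. It relies on the identity Ry(b) = Rz(90°) Rx(b) Rz(-90°), so a ZYZ triple (a, b, c) describes the same rotation as the ZXZ triple (a + 90°, b, c - 90°) when both are composed in the same order. The helper names are hypothetical. How cryoCAT maps the result onto "phi", "psi" and "theta", and whether it applies an additional inversion, is not stated here and is not modelled.

```python
import math


def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_x(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def zyz_matrix(first, second, third):
    # Rotation matrix composed as Rz(first) * Ry(second) * Rz(third).
    return matmul(matmul(rot_z(first), rot_y(second)), rot_z(third))


def zxz_matrix(first, second, third):
    # Rotation matrix composed as Rz(first) * Rx(second) * Rz(third).
    return matmul(matmul(rot_z(first), rot_x(second)), rot_z(third))


def zyz_to_zxz(first, second, third):
    # Same rotation expressed with an x rotation in the middle.
    return first + 90.0, second, third - 90.0


relion_angles = (30.0, 40.0, 50.0)
zxz_angles = zyz_to_zxz(*relion_angles)
m_zyz = zyz_matrix(*relion_angles)
m_zxz = zxz_matrix(*zxz_angles)
same_rotation = all(abs(m_zyz[i][j] - m_zxz[i][j]) < 1e-9
                    for i in range(3) for j in range(3))
print(zxz_angles, same_rotation)
```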
- convert_angles_to_relion(relion_df)#
Converts angles from cryoCAT convention (zxz) to the convention used in Relion (ZYZ).
- Parameters:
- relion_df: pandas.DataFrame
The DataFrame containing the angles.
- Returns:
- pandas.DataFrame
DataFrame in Relion format with converted angles.
Notes
TODO: Check why ZXZ is used instead of zxz.
- convert_shifts(relion_df)#
Converts shifts from Relion format to emmotl format and stores them in self.df.
- Parameters:
- relion_df: pandas.DataFrame
DataFrame containing shifts in Relion format.
- Returns:
- None
Warning
Shifts in Relion 3.1 and higher are stored in Angstroms, not pixels/voxels. Correct pixel size is thus necessary for correct conversion. The pixel size should be set as the class attribute before calling this function.
Notes
Relion stores the shifts of the particle while in cryoCAT the shifts represent shifts of a reference.
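The unit handling from the warning above can be sketched as follows. The function name `relion_shift_to_motl` is hypothetical. The sign flip is an assumption derived from the note that Relion stores particle shifts while cryoCAT stores reference shifts; the actual implementation may differ.

```python
import math


def relion_shift_to_motl(origin, version, pixel_size=1.0):
    """Convert one Relion origin value to a motl shift in pixels/voxels."""
    # Relion 3.1 and higher stores shifts in Angstroms, older versions in pixels.
    shift_px = origin / pixel_size if version >= 3.1 else origin
    # Assumed: particle shift and reference shift have opposite signs.
    return -shift_px


# 5.2 Angstroms at 2.6 Angstroms per pixel corresponds to 2 pixels.
shift_v31 = relion_shift_to_motl(5.2, version=3.1, pixel_size=2.6)
# Relion 3.0 values are already in pixels, so the pixel size is not used.
shift_v30 = relion_shift_to_motl(3.0, version=3.0, pixel_size=2.6)
print(shift_v31, shift_v30)
```

With a wrong pixel size the shifts from Relion 3.1 and higher are scaled incorrectly, which is why the pixel size has to be set before the conversion.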
- convert_to_motl(relion_df, version=None, optics_df=None, tomo_format='', subtomo_format='')#
The function converts a DataFrame in relion format into a motl DataFrame.
- Parameters:
- relion_df: pandas.DataFrame
DataFrame in Relion format.
- version: float, optional
Version of the Relion DataFrame. Defaults to None.
- optics_df: pandas.DataFrame, optional
DataFrame with optics data. Defaults to None.
- subtomo_format: str
- tomo_format: str
- Returns:
- None
Notes
This method modifies the
df attribute of the object.
- create_final_output(relion_df, optics_df=None)#
Creates the final output frames and specifiers based on the given input dataframes.
- Parameters:
- relion_df: pandas.DataFrame
The dataframe containing the relion data.
- optics_df: pandas.DataFrame, optional
The dataframe containing the optics data. Defaults to None.
- Returns:
- frames: list
List of pandas.DataFrame containing all data (e.g. particle list, optics group).
- specifiers: list
List of
str containing the specifiers, i.e., the descriptions for the frames.
Notes
If optics_df is None, the frames and specifiers will be based on relion_df and self.data_spec.
If optics_df is not None, the frames and specifiers will be based on optics_df, relion_df, “data_optics”, and self.data_spec.
If self.version is less than 3.1, the frames and specifiers will be based on the concatenated dataframe of optics_df and relion_df (with duplicates removed) and self.data_spec.
- create_optics_group_v3_1(pixel_size=None, subtomo_size=None)#
Creates an optics group with default parameters corresponding to Relion v. 3.1.
- Parameters:
- pixel_size: float, optional
The pixel size of the data. If not provided, the pixel size of the object instance (self.pixel_size) will be used. Defaults to None.
- subtomo_size: int, optional
The size of the subtomograms. If not provided, it will be set to “NaN”. Defaults to None.
- Returns:
- pandas.DataFrame
A DataFrame containing the default optics parameters.
- create_optics_group_v4(pixel_size=None, subtomo_size=None, binning=None)#
Creates an optics group with default parameters corresponding to Relion v. 4.x.
- Parameters:
- pixel_size: float, optional
The pixel size of the data. If not provided, the pixel size of the object instance (self.pixel_size) will be used. Defaults to None.
- subtomo_size: int, optional
The size of the subtomograms. If not provided, it will be set to “NaN”. Defaults to None.
- binning: int, optional
The binning of the subtomograms. If not provided, the binning of the object instance (self.binning) will be used. Defaults to None.
- Returns:
- pandas.DataFrame
A DataFrame containing the default optics parameters.
- create_particles_data(version)#
Creates an empty DataFrame in Relion version-specific format with the size corresponding to
self.df.
- Parameters:
- version: float
The Relion version to be used. Valid values are 3.0, 3.1, and any other value for version 4 or higher.
- Returns:
- pandas.DataFrame
The empty DataFrame with columns corresponding to the specified Relion version.
- create_relion_df(tomo_format='', subtomo_format='', use_original_entries=False, keep_all_entries=False, version=None, add_object_id=False, add_subunit_id=False, binning=None, pixel_size=None, adapt_object_attr=False)#
This function takes the
self.df attribute and creates a DataFrame in Relion format.
- Parameters:
- tomo_format: str, default=””
Format of the tomogram name output. See
cryocat.cryomotl.RelionMotl.prepare_particles_data() for more information. Defaults to empty string.
- subtomo_format: str, default=””
Format of the subtomogram name output. See
cryocat.cryomotl.RelionMotl.prepare_particles_data() for more information. Defaults to empty string.
- use_original_entries: bool, default=False
Determines whether to use (True) the original entries stored in
self.relion_df or not (False). If True, all Relion entries that are not used in the motl (e.g., rlnCtfImage, rlnHelicalTubeID) are fetched from the original Relion dataframe. Coordinates, rotations, classes etc. will be updated. Defaults to False.
- keep_all_entries: bool, default=False
Used only if use_original_entries is True. If True, it will keep all the entries as they were loaded including coordinates, rotations and classes. Essentially, it should be set to True only if some selection on particles was done and nothing else changed. Defaults to False.
- version: float, optional
Specifies the version and thereby the format of the DataFrame. If not provided, the value from
self.version will be used. Defaults to None.
- add_object_id: bool, default=False
Whether to add “object_id” from
self.df to the DataFrame. If True, the column will be named “ccObjectName”. This is particularly useful for exporting fields mapped during loading, such as “rlnHelicalTubeID”. Defaults to False.
- add_subunit_id: bool, default=False
Whether to add “subunit_id” from
self.df to the DataFrame. If True, the column will be named “ccSubunitName”. Defaults to False.
- binning: int, optional
Binning that should be used for conversion in case of Relion v. 4.x. If not provided, the value from
self.binning will be used. Defaults to None.
- pixel_size: float, optional
The pixel size of the data. If not provided, the pixel size of the object instance (self.pixel_size) will be used. Defaults to None.
- adapt_object_attr: bool, default=False
Whether to store the created DataFrame in the
self.relion_df attribute of the object. Defaults to False.
- Returns:
- pandas.DataFrame
A dataframe in Relion format.
See also
cryocat.cryomotl.RelionMotl.prepare_particles_data(): Provides more information on tomo_format and subtomo_format.
- default_version = 3.1#
- static get_version_from_file(frames, specifiers)#
Determines the version of Relion that was used to generate a starfile.
- Parameters:
- frames: list
List of DataFrames loaded from a starfile. The length corresponds to the length of the specifiers list.
- specifiers: list
List of data specifiers (str type) loaded from a starfile. The length corresponds to the length of the frames list.
- Returns:
- float
A version number.
- static get_version_specific_names(version)#
The function returns the version-specific names of columns in Relion DataFrame.
- Parameters:
- version: float
The version number.
- Returns:
- tomo_id_name: str
The name for the tomogram ID (“rlnMicrographName” for Relion 3.1 and lower, “rlnTomoName” for Relion 4.0 and higher).
- subtomo_id_name: str
The name for the subtomogram ID (“rlnImageName” for Relion 3.1 and lower, “rlnTomoParticleName” for Relion 4.0 and higher).
- shifts_id_names: list
A list of names (type str) for the shift coordinates (“rlnOriginX” for Relion 3.0 and lower, “rlnOriginXAngst” for Relion 3.1 and higher).
- data_spec: str
The name for the particle list specification (“data” for Relion 3.0 and lower, “data_particles” for Relion 3.1 and higher).
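The version rules listed above can be written down as a small lookup. This is a sketch of the documented thresholds only, and the function name `version_specific_names` is hypothetical. The Y and Z shift names are taken from the column lists of this class. Note that columns_v5_1 contains “rlnMicrographName” and “rlnImageName”, so version 5.1 may be handled differently from what the description states; the sketch does not cover that case.

```python
def version_specific_names(version):
    """Return (tomo_id_name, subtomo_id_name, shifts_id_names, data_spec)."""
    if version >= 4.0:
        tomo_id_name = "rlnTomoName"
        subtomo_id_name = "rlnTomoParticleName"
    else:
        tomo_id_name = "rlnMicrographName"
        subtomo_id_name = "rlnImageName"

    if version >= 3.1:
        # Shifts in Angstroms, particle block named data_particles.
        shifts_id_names = ["rlnOriginXAngst", "rlnOriginYAngst", "rlnOriginZAngst"]
        data_spec = "data_particles"
    else:
        # Shifts in pixels, single unnamed data block.
        shifts_id_names = ["rlnOriginX", "rlnOriginY", "rlnOriginZ"]
        data_spec = "data"

    return tomo_id_name, subtomo_id_name, shifts_id_names, data_spec


for v in (3.0, 3.1, 4.0):
    print(v, version_specific_names(v))
```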
- parse_subtomo_id(relion_df, subtomo_format='')#
The function parses the subtomogram id from a Relion starfile. The function takes in a pandas.DataFrame in Relion format and looks for the
rlnImageName (for Relion 3.1 and lower) column or for the rlnTomoParticleName (for Relion 4.0 and higher) column and tries to parse the subtomogram id for each particle. It checks whether the subtomogram indices are unique and, if not, it renumbers subtomo_id to a sequence from 1 to the length of the particle list and stores the original value in geom3.
- Parameters:
- relion_df: pandas.DataFrame
The DataFrame in Relion format containing the subtomogram numbers.
- subtomo_format: str, default=””
Custom pattern to extract the subtomogram ID (e.g., “$yyyy”). If empty, robust automated parsing is used. See
cryocat.cryomotl.RelionMotl.prepare_particles_data() for detailed syntax and examples.
- Returns:
- None
Warning
Because Relion starfiles do not enforce a naming format, this function may fail. Currently, the following formats are expected:
Relion 3.1 and lower for “rlnImageName”: second number in the last entry (/path/tomoID_subtomoID_pixelSize.mrc)
Relion 4.0 and higher for “rlnTomoParticleName”: the only number in the last entry (TS_tomoID/subtomoID)
Relion 5.0 and higher for “rlnTomoParticleName”: the only number in the last entry (TS_tomoID/subtomoID)
See also
cryocat.cryomotl.RelionMotl.prepare_particles_data(): Provides comprehensive examples and explains the syntax for
tomo_format and subtomo_format.
Notes
The function modifies the
subtomo_id column of self.df to store the subtomogram indices. In case they are not unique, it also modifies the geom3 column of self.df.
TODO: Add custom format specifier.
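The expected name layouts and the renumbering rule can be illustrated with a short sketch. It covers only the default layouts listed in the warning above, not the custom subtomo_format pattern, and it is not the library's parser. Both helper names are hypothetical.

```python
import re


def parse_subtomo_number(name, version):
    """Extract the subtomogram number from one starfile name entry."""
    # Only the last path entry is inspected.
    last_entry = name.rsplit("/", 1)[-1]
    numbers = re.findall(r"\d+", last_entry)
    # Relion 4.0 and higher: TS_tomoID/subtomoID, the only number in the entry.
    # Relion 3.1 and lower: tomoID_subtomoID_pixelSize.mrc, the second number.
    index = 0 if version >= 4.0 else 1
    return int(numbers[index])


def renumber_if_not_unique(subtomo_ids):
    """Return (ids, originals); originals is None when nothing was renumbered."""
    if len(set(subtomo_ids)) == len(subtomo_ids):
        return list(subtomo_ids), None
    # Duplicates found: number the particles 1..N and keep the parsed values,
    # which the method described above stores in geom3.
    return list(range(1, len(subtomo_ids) + 1)), list(subtomo_ids)


id_v31 = parse_subtomo_number("/path/12_345_2.6A.mrc", version=3.1)
id_v40 = parse_subtomo_number("TS_012/77", version=4.0)
new_ids, originals = renumber_if_not_unique([5, 5, 9])
print(id_v31, id_v40, new_ids, originals)
```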
- parse_tomo_id(relion_df, tomo_format='')#
The function parses the tomogram id from a Relion starfile. The function takes in a pandas.DataFrame in Relion format and looks for the
rlnMicrographName (for Relion 3.1 and lower) column or for the rlnTomoName (for Relion 4.0 and higher) column and tries to parse the tomogram id for each particle. If the column is not present, it tries to parse the tomo id from the subtomogram path (rlnImageName for Relion 3.1 and lower, rlnTomoParticleName for Relion 4.0 and higher).
- Parameters:
- relion_df: pandas.DataFrame
The DataFrame in Relion format containing the tilt-series or micrographs.
- tomo_format: str, default=””
Custom pattern to extract the tomogram ID (e.g., “$xxxx”). If empty, robust automated parsing is used. See
cryocat.cryomotl.RelionMotl.prepare_particles_data() for detailed syntax and examples.
Warning
Because Relion starfiles do not enforce a naming format, this function may fail. Currently, the following formats are expected:
Relion 5.1 and Relion 3.1 and lower for “rlnMicrographName”: first number in the last entry (/path/tomoID_pixelSize.mrc)
Relion 5.1 and Relion 3.1 and lower for “rlnImageName”: first number in the last entry (/path/tomoID_subtomoID_pixelSize.mrc)
Relion 4.0 for “rlnTomoName”: first number in the last entry (TS_tomoID)
Relion 4.0 for “rlnTomoParticleName”: first number in the first entry (TS_tomoID/subtomoID)
See also
cryocat.cryomotl.RelionMotl.prepare_particles_data(): Provides comprehensive examples and explains the syntax for
tomo_format and subtomo_format.
Notes
TODO: Add custom format specifier.
- prepare_optics_data(use_original_entries=True, optics_data=None, version=None, subtomo_size=None, pixel_size=None, binning=None)#
The function prepares the optics data for relion DataFrame. It takes in a dictionary or starfile path as an argument, and returns a pandas DataFrame containing the optics information in version specific format.
- Parameters:
- use_original_entries: bool, default=True
Whether to use
self.optics_df as the source (True) or not (False). If set to True, optics_data as well as version will be ignored. Defaults to True.
- optics_data: str, optional
The optics data specified either as a path to the starfile (it can also contain the particle list), as a dictionary, or as a DataFrame. It is used only if “use_original_entries” is set to False. Defaults to None.
- version: float, optional
Relion version to be used for the DataFrame. It is used only if use_original_entries is set to False and “optics_data” is a path to a starfile. If not set,
self.version will be used instead. Defaults to None.
- pixel_size: float, optional
The pixel size of the data. If not provided, the pixel size of the object instance (self.pixel_size) will be used. Defaults to None.
- subtomo_size: int, optional
The size of the subtomograms. If not provided, it will be set to “NaN”. Defaults to None.
- Returns:
- pandas.DataFrame
DataFrame with the optics data.
- Raises:
- UserInputError
If
optics_data is neither a str nor a pandas.DataFrame.
- Warning
If
optics_data is not specified and self.optics_df is empty.
- prepare_particles_data(tomo_format='', subtomo_format='', version=None, pixel_size=None)#
The function creates a DataFrame that contains the information on particles in Relion format. The function takes in the version of Relion to be used and formats describing how the tomogram/tilt-series and subtomogram names should be assembled.
- Parameters:
- tomo_format: str, default=””
Format specifying the tomogram/tilt-series name. It contains a sequence of “x” introduced by the “$” character. The longest sequence is evaluated as the position of the tomo_id and replaced with the corresponding tomo_id. The number of “x” letters in the longest sequence determines the number of digits to pad with zeros. For example, for tomo_id 5 the format “/path/to/tomo/$xxxx.rec” results in “/path/to/tomo/0005.rec”. The sequence can be present multiple times; sequences of “x” shorter than the longest one are kept intact: for tomo_id 5, “/path/to/tomo/$xxxx/$xxxx_$xx.mrc” results in “/path/to/tomo/0005/0005_$xx.mrc”. Defaults to empty string, in which case the tomo_id will be used without any zero padding.
- subtomo_format: str, default=””
Format specifying the subtomogram name. It contains a sequence of “y” introduced by the “$” character. The longest sequence is evaluated as the position of the subtomo_id and replaced with the corresponding subtomo_id. The number of “y” letters in the longest sequence determines the number of digits to pad with zeros. For example, for subtomo_id 65 the format “/path/to/subtomograms/$yyy.mrc” results in “/path/to/subtomograms/065.mrc”. The sequence can be present multiple times; sequences of “y” shorter than the longest one are kept intact: for subtomo_id 65, “/path/to/subtomograms/$yy_$yyy.mrc” results in “/path/to/subtomograms/$yy_065.mrc”. The subtomo_format can also contain a sequence of “x” letters introduced by “$”, in which case these are replaced by the tomo_id in the same way as for tomo_format. For example, for tomo_id 5 and subtomo_id 65, “/path/to/subtomograms/$xxxx/$xxxx_$yyy.mrc” results in “/path/to/subtomograms/0005/0005_065.mrc”. Defaults to empty string, in which case the subtomo_id will be used without any zero padding.
- version: float, optional
Relion version to be used for the DataFrame. Defaults to None, in which case
self.version is used.
- pixel_size: float, optional
The pixel size of the data. If not provided, the pixel size of the object instance (self.pixel_size) will be used. Defaults to None.
- Returns:
- pandas.DataFrame
A DataFrame with particle list in Relion format.
- Raises:
- UserInputError
If the format does not contain a valid sequence.
Examples
>>> rln_motl = cryomotl.RelionMotl()
>>> rln_motl.fill({"tomo_id": [2], "subtomo_id":[65]})

>>> rln_df = rln_motl.prepare_particles_data(tomo_format="/path/to/$xxxx.rec",
... subtomo_format="/path/to/$xxxx/$xxxx_$yy_2.6A.mrc", version=3.1)
>>> print(rln_df["rlnMicrographName"].values[0])
/path/to/0002.rec
>>> print(rln_df["rlnImageName"].values[0])
/path/to/0002/0002_65_2.6A.mrc

>>> rln_df = rln_motl.prepare_particles_data(tomo_format="/path/to/$xxxx",
... subtomo_format="/path/to/$xxxx/$xxxx_$yy_2.6A", version=4.0)
>>> print(rln_df["rlnTomoName"].values[0])
/path/to/0002
>>> print(rln_df["rlnTomoParticleName"].values[0])
/path/to/0002/0002_65_2.6A

>>> rln_df = rln_motl.prepare_particles_data(tomo_format="/path/to/$xx.rec",
... subtomo_format="/path/to/xxxx/xxxx_$yy_2.6A.mrc", version=3.1)
>>> print(rln_df["rlnMicrographName"].values[0])
/path/to/02.rec
>>> print(rln_df["rlnImageName"].values[0])
/path/to/xxxx/xxxx_65_2.6A.mrc

>>> rln_df = rln_motl.prepare_particles_data(tomo_format="",
... subtomo_format="/path/to/$xxx/$yy_2.6A.mrc", version=3.1)
>>> print(rln_df["rlnMicrographName"].values[0])
2
>>> print(rln_df["rlnImageName"].values[0])
/path/to/002/65_2.6A.mrc

>>> rln_df = rln_motl.prepare_particles_data(tomo_format="",
... subtomo_format="/path/to/$xxx/yy_2.6A.mrc", version=3.1)
ValueError: The format /path/to/$xxx/yy_2.6A.mrc does not contain any sequence of \$ followed by y.
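The substitution rule behind tomo_format and subtomo_format (the longest “$”-introduced run is replaced and zero-padded, shorter runs stay intact) can be reproduced with a few lines of standalone Python. This is a sketch of the rule as described above, not the code used by cryoCAT. The helper names `fill_format` and `subtomo_name` are hypothetical, and the error type is chosen to match the example output rather than the Raises section.

```python
import re


def fill_format(fmt, value, letter):
    """Replace the longest '$' + letter run in fmt with the zero-padded value."""
    if fmt == "":
        return str(value)  # empty format: plain ID without padding
    runs = re.findall(r"\$(" + re.escape(letter) + r"+)", fmt)
    if not runs:
        raise ValueError(
            f"The format {fmt} does not contain any sequence of $ followed by {letter}."
        )
    longest = max(runs, key=len)
    # The negative lookahead ensures that only runs of exactly this length match.
    pattern = r"\$" + longest + r"(?!" + re.escape(letter) + r")"
    return re.sub(pattern, str(value).zfill(len(longest)), fmt)


def subtomo_name(fmt, tomo_id, subtomo_id):
    """Fill the 'y' run with subtomo_id and, if present, the 'x' run with tomo_id."""
    name = fill_format(fmt, subtomo_id, "y")
    if re.search(r"\$x", name):
        name = fill_format(name, tomo_id, "x")
    return name


tomo = fill_format("/path/to/tomo/$xxxx/$xxxx_$xx.mrc", 5, "x")
sub = subtomo_name("/path/to/subtomograms/$xxxx/$xxxx_$yyy.mrc", 5, 65)
print(tomo)
print(sub)
```

The shorter “$xx” run in the first format is left untouched, which matches the behaviour described for tomo_format.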
- read_in(input_path)#
Reads in a starfile and returns the particle list, version of the starfile and optics data if present.
- Parameters:
- input_path: str
The path to the starfile.
- Returns:
- frames: pandas.DataFrame
pandas.DataFrame containing the particle list in Relion format.
- version: float
The version extracted from the starfile. See
cryocat.cryomotl.RelionMotl.get_version_from_file() for more info.
- optics_df: pandas.DataFrame or None
pandas.DataFrame containing optics if available, otherwise None.
- set_pixel_size()#
Sets the pixel size of the object (self.pixel_size). The function first checks if the pixel size has already been set, and if it has not, it tries to get the pixel size from either the self.relion_df or self.optics_data dataframes. If neither of these is available, the pixel size is set to 1.0.
- Parameters:
- None
- Returns:
- None
Notes
Pixel size is important to correctly compute shifts for Relion version 3.1 and higher and also for correctly rescaling coordinates for Relion version 4.0 and higher.
- set_version(input_df, version=None)#
Sets the class attribute version, in case it has not been set already. The function takes in a pandas.DataFrame and an optional version number as arguments. If no version number is provided, the function will attempt to determine which Relion version was used by checking for specific column names in the DataFrame. If it cannot find any of these columns, it will default to version 3.1.
- Parameters:
- input_df: pandas.DataFrame
DataFrame in Relion format that is used for version determination, unless the class attribute was already set or the version argument is not None.
- version: float, optional
Set the version of the data unless it was already set. Defaults to None.
- Returns:
- None
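The column-based detection can be pictured with the sketch below. Which columns the method actually inspects is not stated here, so the checks are assumptions: they use the names that differ between columns_v3_0, columns_v3_1 and columns_v4. The function name `guess_relion_version` is hypothetical, and the sketch ignores the case where the version attribute is already set on the object. Since columns_v5_1 contains the same names as columns_v3_1, version 5.1 cannot be recognised this way.

```python
def guess_relion_version(columns, version=None, default=3.1):
    """Guess the Relion version from the column names of a particle table."""
    if version is not None:
        return version  # an explicit version always wins
    columns = set(columns)
    # Assumed markers, taken from the column lists of this class.
    if "rlnTomoName" in columns or "rlnTomoParticleName" in columns:
        return 4.0
    if "rlnOriginXAngst" in columns:
        return 3.1
    if "rlnOriginX" in columns:
        return 3.0
    return default  # no marker column found


guess_v4 = guess_relion_version(["rlnTomoName", "rlnCoordinateX"])
guess_v30 = guess_relion_version(["rlnMicrographName", "rlnOriginX"])
guess_default = guess_relion_version(["rlnCoordinateX"])
print(guess_v4, guess_v30, guess_default)
```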
- set_version_specific_names()#
Sets version specific names for the current object.
- Parameters:
- None
- Returns:
- None
Notes
This function sets the following attributes of the current object:
- “tomo_id_name”: The name of the tomogram ID (“rlnMicrographName” for Relion 3.1 and lower, “rlnTomoName” for Relion 4.0 and higher).
- “subtomo_id_name”: The name of the subtomogram ID (“rlnImageName” for Relion 3.1 and lower, “rlnTomoParticleName” for Relion 4.0 and higher).
- “shifts_id_names”: The names of the shift IDs (“rlnOriginX” for Relion 3.0 and lower, “rlnOriginXAngst” for Relion 3.1 and higher).
- “data_spec”: The particle list specification (“data” for Relion 3.0 and lower, “data_particles” for Relion 3.1 and higher).
- write_out(output_path, write_optics=True, tomo_format='', subtomo_format='', use_original_entries=False, keep_all_entries=False, version=None, add_object_id=False, add_subunit_id=False, binning=None, pixel_size=None, optics_data=None, subtomo_size=None)#
This function converts the
self.df DataFrame to a DataFrame in Relion format and writes it out as a starfile.
- Parameters:
- output_path: str
The output path to the starfile to be written out.
- write_optics: bool, default=True
Whether to include optics data in the starfile or not. Defaults to True.
- tomo_format: str, default=””
Format of the tomogram name output. See
cryocat.cryomotl.RelionMotl.prepare_particles_data() for more information. Defaults to empty string.
- subtomo_format: str, default=””
Format of the subtomogram name output. See
cryocat.cryomotl.RelionMotl.prepare_particles_data() for more information. Defaults to empty string.
- use_original_entries: bool, default=False
Determines whether to use (True) the original entries stored in
self.relion_df or not (False). If True, all Relion entries that are not used in the motl (e.g., rlnCtfImage, rlnHelicalTubeID) are fetched from the original Relion dataframe. Coordinates, rotations, classes etc. will be updated. Defaults to False.
- keep_all_entries: bool, default=False
Used only if use_original_entries is True. If True, it will keep all the entries as they were loaded including coordinates, rotations and classes. Essentially, it should be set to True only if some selection on particles was done and nothing else changed. Defaults to False.
- version: float, optional
Specifies the version and thereby the format of the DataFrame. If not provided, the value from
self.version will be used. Defaults to None.
- add_object_id: bool, default=False
Whether to add “object_id” from
self.df to the DataFrame. If True, the column will be named “ccObjectName”. This is particularly useful for exporting fields mapped during loading, such as “rlnHelicalTubeID”. Defaults to False.
- add_subunit_id: bool, default=False
Whether to add “subunit_id” from
self.df to the DataFrame. If True, the column will be named “ccSubunitName”. Defaults to False.
- binning: int, optional
Binning that should be used for conversion in case of Relion v. 4.x. If not provided, the value from
self.binning will be used. Defaults to None.
- pixel_size: float, optional
The pixel size of the data. If not provided, the pixel size of the object instance (self.pixel_size) will be used. Defaults to None.
- optics_data: str, optional
A DataFrame or a dictionary containing optics data, or a path to the starfile that the optics should be fetched from. See
cryocat.cryomotl.RelionMotl.prepare_optics_data() for more details. Used only if write_optics is True. If it is None and write_optics is True, then the attribute self.optics_df will be used. Defaults to None.
- subtomo_size: int, optional
The size of the subtomograms. If not provided, it will be set to “NaN”. Defaults to None.
- Returns:
- None
See also
cryocat.cryomotl.RelionMotl.prepare_particles_data(): Provides more information on tomo_format and subtomo_format.
cryocat.cryomotl.RelionMotl.prepare_optics_data(): Provides more information on optics_data inputs.