RelionMotl#

class cryocat.core.cryomotl.RelionMotl(input_motl=None, version=None, pixel_size=None, binning=None, optics_data=None, tomo_format='', subtomo_format='')#

Bases: Motl

adapt_original_entries()#

The function creates an updated version of the DataFrame stored in self.relion_df based on the values in self.df. If the number of particles changed (i.e., self.df has fewer particles than self.relion_df), the new relion_df is shortened by matching ccSubtomoID from self.relion_df against subtomo_id from self.df. The shifts are set to zero and the ccSubtomoID column is removed.

Parameters:
None
Returns:
pandas.DataFrame

The updated version of self.relion_df.

Notes

The size and values of self.relion_df are not changed.
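
The matching step described above can be pictured with plain Python records. This is only a sketch of the idea, not cryoCAT's implementation: rows are dictionaries instead of DataFrame rows, the helper name adapt_rows is hypothetical, and the shift column names are borrowed from columns_v3_1.

```python
def adapt_rows(relion_rows, motl_subtomo_ids,
               shift_names=("rlnOriginXAngst", "rlnOriginYAngst", "rlnOriginZAngst")):
    """Return new rows restricted to the particles still present in the motl."""
    kept_ids = set(motl_subtomo_ids)
    adapted = []
    for row in relion_rows:
        if row["ccSubtomoID"] not in kept_ids:
            continue  # particle is no longer part of the motl
        new_row = dict(row)  # copy, so the input rows stay untouched
        for name in shift_names:
            if name in new_row:
                new_row[name] = 0.0  # shifts are reset to zero
        del new_row["ccSubtomoID"]  # helper column is dropped
        adapted.append(new_row)
    return adapted


original = [
    {"ccSubtomoID": 1, "rlnOriginXAngst": 2.5, "rlnCtfImage": "ctf_1.mrc"},
    {"ccSubtomoID": 2, "rlnOriginXAngst": -1.0, "rlnCtfImage": "ctf_2.mrc"},
    {"ccSubtomoID": 3, "rlnOriginXAngst": 0.7, "rlnCtfImage": "ctf_3.mrc"},
]
updated = adapt_rows(original, motl_subtomo_ids=[1, 3])
print(updated)
```

Entries that the motl does not track (here rlnCtfImage) are carried over unchanged, and the input records are not modified.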

columns_v3_0 = ['rlnMicrographName', 'rlnCoordinateX', 'rlnCoordinateY', 'rlnCoordinateZ', 'rlnAngleRot', 'rlnAngleTilt', 'rlnAnglePsi', 'rlnImageName', 'rlnPixelSize', 'rlnRandomSubset', 'rlnOriginX', 'rlnOriginY', 'rlnOriginZ', 'rlnClassNumber']#
columns_v3_1 = ['rlnMicrographName', 'rlnCoordinateX', 'rlnCoordinateY', 'rlnCoordinateZ', 'rlnAngleRot', 'rlnAngleTilt', 'rlnAnglePsi', 'rlnImageName', 'rlnPixelSize', 'rlnOpticsGroup', 'rlnGroupNumber', 'rlnOriginXAngst', 'rlnOriginYAngst', 'rlnOriginZAngst', 'rlnClassNumber', 'rlnRandomSubset']#
columns_v4 = ['rlnCoordinateX', 'rlnCoordinateY', 'rlnCoordinateZ', 'rlnAngleRot', 'rlnAngleTilt', 'rlnAnglePsi', 'rlnTomoName', 'rlnTomoParticleName', 'rlnRandomSubset', 'rlnOpticsGroup', 'rlnOriginXAngst', 'rlnOriginYAngst', 'rlnOriginZAngst', 'rlnGroupNumber', 'rlnClassNumber']#
columns_v5_1 = ['rlnCoordinateX', 'rlnCoordinateY', 'rlnCoordinateZ', 'rlnAngleRot', 'rlnAngleTilt', 'rlnAnglePsi', 'rlnMicrographName', 'rlnImageName', 'rlnPixelSize', 'rlnOpticsGroup', 'rlnGroupNumber', 'rlnOriginXAngst', 'rlnOriginYAngst', 'rlnOriginZAngst', 'rlnClassNumber', 'rlnRandomSubset']#
convert_angles_from_relion(relion_df)#

The function converts angles from the Relion format, which corresponds to ZYZ Euler convention, to the zxz Euler convention which is used within cryoCAT.

Parameters:
relion_dfpandas.DataFrame

The input DataFrame in Relion format containing the angles in ZYZ convention.

Returns:
None
Raises:
Warning

If no rotations are specified in the relion starfile.

ValueError

If only some rotations are specified in the relion starfile.

Notes

The function modifies the “phi”, “psi”, and “theta” columns of “self.df” to store the converted angles.
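
The relation between a Z-Y-Z and a Z-X-Z angle triplet can be checked numerically. The sketch below relies only on the identity Ry(b) = Rz(90) Rx(b) Rz(-90), so the two outer angles shift by +90 and -90 degrees. It is not cryoCAT's implementation: all function names are hypothetical, and the order in which the angles are composed, their sign, and any inversion between particle and reference rotations are assumptions that are not specified here.

```python
import math


def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_x(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def zyz_matrix(first, second, third):
    # Assumed composition: Rz(first) * Ry(second) * Rz(third)
    return matmul(matmul(rot_z(first), rot_y(second)), rot_z(third))


def zxz_matrix(first, second, third):
    # Same composition order with X as the middle axis
    return matmul(matmul(rot_z(first), rot_x(second)), rot_z(third))


def zyz_to_zxz(first, second, third):
    # Ry(b) = Rz(90) * Rx(b) * Rz(-90), so only the outer angles change
    return first + 90.0, second, third - 90.0


zyz = (30.0, 45.0, -60.0)
zxz = zyz_to_zxz(*zyz)
# Largest element-wise difference between the two rotation matrices
difference = max(
    abs(p - q)
    for row_p, row_q in zip(zyz_matrix(*zyz), zxz_matrix(*zxz))
    for p, q in zip(row_p, row_q)
)
print(zxz, difference)
```

Both triplets describe the same rotation matrix up to floating-point error, which is why a conversion between the two conventions does not lose information.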

convert_angles_to_relion(relion_df)#

Converts angles from cryoCAT convention (zxz) to the convention used in Relion (ZYZ).

Parameters:
relion_dfpandas.DataFrame

The DataFrame containing the angles.

Returns:
pandas.DataFrame

DataFrame in Relion format with converted angles.

Notes

TODO: Check why ZXZ is used instead of zxz.

convert_shifts(relion_df)#

Converts shifts from Relion format to emmotl format and stores them in self.df.

Parameters:
relion_dfpandas.DataFrame

DataFrame containing shifts in Relion format.

Returns:
None

Warning

Shifts in Relion 3.1 and higher are stored in Angstroms, not pixels/voxels. Correct pixel size is thus necessary for correct conversion. The pixel size should be set as the class attribute before calling this function.

Notes

Relion stores the shifts of the particle while in cryoCAT the shifts represent shifts of a reference.
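
A minimal sketch of the unit conversion, assuming that a particle shift and a reference shift differ only in sign (this sign flip is an assumption drawn from the note above, not a confirmed detail). The function name relion_shift_to_motl is hypothetical.

```python
def relion_shift_to_motl(shift, version, pixel_size):
    """Convert one Relion shift value to a motl-style shift in pixels/voxels."""
    if version >= 3.1:
        # Relion 3.1 and higher store shifts in Angstroms
        shift = shift / pixel_size
    # Assumption: shift of the particle -> shift of the reference is a sign flip
    return -shift


print(relion_shift_to_motl(5.2, version=3.1, pixel_size=2.6))
print(relion_shift_to_motl(3.0, version=3.0, pixel_size=2.6))
```

With a wrong pixel size the first call would return a wrong number of voxels, which is why the pixel size has to be set before the conversion.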

convert_to_motl(relion_df, version=None, optics_df=None, tomo_format='', subtomo_format='')#

The function converts a DataFrame in relion format into a motl DataFrame.

Parameters:
relion_dfpandas.DataFrame

DataFrame in relion format.

versionfloat, optional

Version of Relion DataFrame. Defaults to None.

optics_dfpandas.DataFrame, optional

DataFrame with optics data. Defaults to None.

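tomo_formatstr, default=””

Format of the tomogram name. See cryocat.cryomotl.RelionMotl.prepare_particles_data() for more information.

subtomo_formatstr, default=””

Format of the subtomogram name. See cryocat.cryomotl.RelionMotl.prepare_particles_data() for more information.
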
Returns:
None

Notes

This method modifies the df attribute of the object.

create_final_output(relion_df, optics_df=None)#

Creates the final output frames and specifiers based on the given input dataframes.

Parameters:
relion_dfpandas.DataFrame

The dataframe containing the relion data.

optics_dfpandas.DataFrame, optional

The dataframe containing the optics data. Defaults to None.

Returns:
frameslist

List of pandas.DataFrame containing all data (e.g. particle list, optics group)

specifierslist

List of str containing the specifiers, i.e., the descriptions for the frames.

Notes

If optics_df is None, the frames and specifiers will be based on relion_df and self.data_spec.

If optics_df is not None, the frames and specifiers will be based on optics_df, relion_df, “data_optics”, and self.data_spec.

If self.version is less than 3.1, the frames and specifiers will be based on the concatenated dataframe of optics_df and relion_df (with duplicates removed) and self.data_spec.

create_optics_group_v3_1(pixel_size=None, subtomo_size=None)#

Creates an optics group with default parameters corresponding to Relion v. 3.1.

Parameters:
pixel_sizefloat, optional

The pixel size of the data. If not provided, the pixel size of the object instance (self.pixel_size) will be used. Defaults to None.

subtomo_sizeint, optional

The size of the subtomograms. If not provided, it will be set to “NaN”. Defaults to None.

Returns:
pandas.DataFrame

A DataFrame containing the default optics parameters.

create_optics_group_v4(pixel_size=None, subtomo_size=None, binning=None)#

Creates an optics group with default parameters corresponding to Relion v. 4.x.

Parameters:
pixel_sizefloat, optional

The pixel size of the data. If not provided, the pixel size of the object instance (self.pixel_size) will be used. Defaults to None.

subtomo_sizeint, optional

The size of the subtomograms. If not provided, it will be set to “NaN”. Defaults to None.

binningint, optional

The binning of the subtomograms. If not provided, the binning of the object instance (self.binning) will be used. Defaults to None.

Returns:
pandas.DataFrame

A DataFrame containing the default optics parameters.

create_particles_data(version)#

Creates an empty DataFrame in Relion version-specific format with the size corresponding to self.df.

Parameters:
versionfloat

The Relion version to be used. Valid values are 3.0, 3.1, and any other value for version 4 or higher.

Returns:
pandas.DataFrame

The empty DataFrame with columns corresponding to the specified Relion version.

create_relion_df(tomo_format='', subtomo_format='', use_original_entries=False, keep_all_entries=False, version=None, add_object_id=False, add_subunit_id=False, binning=None, pixel_size=None, adapt_object_attr=False)#

This function takes the self.df attribute and creates a DataFrame in Relion format.

Parameters:
tomo_formatstr, default=””

Output format of the tomogram name. See cryocat.cryomotl.RelionMotl.prepare_particles_data() for more information. Defaults to empty string.

subtomo_formatstr, default=””

Output format of the subtomogram name. See cryocat.cryomotl.RelionMotl.prepare_particles_data() for more information. Defaults to empty string.

use_original_entriesbool, default=False

Determine whether to use (True) the original entries stored in self.relion_df or not (False). If True, all relion entries that are not used in motl (e.g., rlnCtfImage, rlnHelicalTubeID) are fetched from the original relion dataframe. Coordinates, rotations, classes etc. will be updated. Defaults to False.

keep_all_entries: bool, default=False

Used only if use_original_entries is True. If True, it will keep all the entries as they were loaded including coordinates, rotations and classes. Essentially, it should be set to True only if some selection on particles was done and nothing changed. Defaults to False.

versionfloat, optional

Specify the version and thereby the format of the DataFrame. If not provided the value from self.version will be used. Defaults to None.

add_object_idbool, default=False

Whether to add “object_id” from self.df to the DataFrame. If True, the column will be named “ccObjectName”. This is particularly useful for exporting fields mapped during loading, such as “rlnHelicalTubeID”. Defaults to False.

add_subunit_idbool, default=False

Whether to add “subunit_id” from self.df to the DataFrame. If True, the column will be named “ccSubunitName”. Defaults to False.

binningint, optional

Binning that should be used for conversion in case of Relion v. 4.x. If not provided the value from self.binning will be used. Defaults to None.

pixel_sizefloat, optional

The pixel size of the data. If not provided, the pixel size of the object instance (self.pixel_size) will be used. Defaults to None.

adapt_object_attrbool, default=False

Store the created DataFrame to self.relion_df attribute of the object. Defaults to False.

Returns:
pandas.DataFrame

A dataframe in Relion format.

See also

cryocat.cryomotl.RelionMotl.prepare_particles_data()

Provides more information on tomo_format and subtomo_format.

default_version = 3.1#
static get_version_from_file(frames, specifiers)#

Determines the version of Relion that was used to generate a starfile.

Parameters:
frameslist

List of DataFrames loaded from a starfile. The length corresponds to the length of the specifiers list.

specifierslist

List of data specifiers (str type) loaded from a starfile. The length corresponds to the length of the frames list.

Returns:
float

A version number.

static get_version_specific_names(version)#

The function returns the version-specific names of columns in Relion DataFrame.

Parameters:
versionfloat

The version number.

Returns:
tomo_id_namestr

The name for the tomogram ID (“rlnMicrographName” for Relion 3.1 and lower, “rlnTomoName” for Relion 4.0 and higher).

subtomo_id_namestr

The name for the subtomogram ID (“rlnImageName” for Relion 3.1 and lower, “rlnTomoParticleName” for Relion 4.0 and higher).

shifts_id_nameslist

A list of names (type str) for the shift coordinates (“rlnOriginX” for Relion 3.0 and lower, “rlnOriginXAngst” for Relion 3.1 and higher).

data_specstr

The name for the particle list specification (“data” for Relion 3.0 and lower, “data_particles” for Relion 3.1 and higher).
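
The version rules listed above can be written down as a small lookup. This is a sketch that follows the wording of the Returns section only; the function name is hypothetical, the Y and Z shift names are taken from the column lists of the class, and the actual method may treat some versions differently (columns_v5_1, for example, lists rlnMicrographName and rlnImageName).

```python
def version_specific_names(version):
    """Return (tomo_id_name, subtomo_id_name, shifts_id_names, data_spec)."""
    if version >= 4.0:
        tomo_id_name, subtomo_id_name = "rlnTomoName", "rlnTomoParticleName"
    else:
        tomo_id_name, subtomo_id_name = "rlnMicrographName", "rlnImageName"

    if version >= 3.1:
        shifts_id_names = ["rlnOriginXAngst", "rlnOriginYAngst", "rlnOriginZAngst"]
        data_spec = "data_particles"
    else:
        shifts_id_names = ["rlnOriginX", "rlnOriginY", "rlnOriginZ"]
        data_spec = "data"

    return tomo_id_name, subtomo_id_name, shifts_id_names, data_spec


for version in (3.0, 3.1, 4.0):
    print(version, version_specific_names(version))
```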

parse_subtomo_id(relion_df, subtomo_format='')#

The function parses the subtomogram id from a Relion starfile. The function takes in a pandas.DataFrame in relion format and looks for the rlnImageName (for Relion 3.1 and lower) column or for the rlnTomoParticleName (for Relion 4.0 and higher) column and tries to parse the subtomogram id for each particle. It checks whether the subtomogram indices are unique and if not, it renumbers the subtomo_id to a sequence from 1 to length of the particle list and stores the original value in geom3.

Parameters:
relion_dfpandas.DataFrame

The DataFrame in Relion format containing the subtomogram numbers.

subtomo_formatstr, default=””

Custom pattern to extract the subtomogram ID (e.g., “$yyyy”). If empty, robust automated parsing is used. See cryocat.cryomotl.RelionMotl.prepare_particles_data() for detailed syntax and examples.

Returns:
None

Warning

Because Relion starfiles do not enforce a fixed naming format, this function may fail. Currently, the following formats are expected:

  • Relion 3.1 and lower for “rlnImageName”: second number in the last entry (/path/tomoID_subtomoID_pixelSize.mrc)

  • Relion 4.0 and higher for “rlnTomoParticleName”: the only number in the last entry (TS_tomoID/subtomoID)

  • Relion 5.0 and higher for “rlnTomoParticleName”: the only number in the last entry (TS_tomoID/subtomoID)

See also

cryocat.cryomotl.RelionMotl.prepare_particles_data()

Provides comprehensive examples and explains the syntax for tomo_format and subtomo_format.

Notes

The function modifies the subtomo_id column of self.df to store the subtomogram indices. If they are not unique, it also modifies the geom3 column of self.df.

TODO: Add custom format specifier.
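
The expected name formats and the renumbering rule can be sketched with regular expressions. This is an illustration under the formats listed in the warning above, not the parser used by cryoCAT; both helper names are hypothetical and names are passed as plain strings instead of DataFrame columns.

```python
import re


def parse_subtomo_ids(names, version):
    """Extract the subtomogram number from each name."""
    ids = []
    for name in names:
        last_entry = name.split("/")[-1]
        numbers = re.findall(r"\d+", last_entry)
        # Relion 3.1 and lower: tomoID_subtomoID_pixelSize.mrc -> second number
        # Relion 4.0 and higher: TS_tomoID/subtomoID -> the only number
        index = 1 if version <= 3.1 else 0
        ids.append(int(numbers[index]))
    return ids


def renumber_if_not_unique(ids):
    """Return (subtomo_ids, original_ids); original_ids is None if ids were unique."""
    if len(set(ids)) == len(ids):
        return ids, None
    # Duplicates found: use 1..N and hand back the parsed values for geom3
    return list(range(1, len(ids) + 1)), ids


v31_ids = parse_subtomo_ids(["/path/0002_65_2.6A.mrc", "/path/0003_65_2.6A.mrc"], 3.1)
v40_ids = parse_subtomo_ids(["TS_02/65", "TS_02/66"], 4.0)
print(renumber_if_not_unique(v31_ids))
print(renumber_if_not_unique(v40_ids))
```

In the first case the same subtomogram number occurs in two tomograms, so the indices are replaced by a running sequence and the parsed values are kept separately.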

parse_tomo_id(relion_df, tomo_format='')#

The function parses the tomogram id from a Relion starfile. The function takes in a pandas.DataFrame in Relion format and looks for the rlnMicrographName (for Relion 3.1 and lower) column or for the rlnTomoName (for Relion 4.0 and higher) column and tries to parse the tomogram id for each particle. If the column is not present, it tries to parse the tomo id from the subtomogram path (rlnImageName for Relion 3.1 and lower, rlnTomoParticleName for Relion 4.0 and higher).

Parameters:
relion_dfpandas.DataFrame

The DataFrame in Relion format containing the tilt-series or micrographs.

tomo_formatstr, default=””

Custom pattern to extract the tomogram ID (e.g., “$xxxx”). If empty, robust automated parsing is used. See cryocat.cryomotl.RelionMotl.prepare_particles_data() for detailed syntax and examples.

Warning

Because Relion starfiles do not enforce a fixed naming format, this function may fail. Currently, the following formats are expected:

  • Relion 5.1, 3.1 and lower for “rlnMicrographName”: first number in the last entry (/path/tomoID_pixelSize.mrc)

  • Relion 5.1, 3.1 and lower for “rlnImageName”: first number in the last entry (/path/tomoID_subtomoID_pixelSize.mrc)

  • Relion 4.0 for “rlnTomoName”: first number in the last entry (TS_tomoID)

  • Relion 4.0 for “rlnTomoParticleName”: first number in the first entry (TS_tomoID/subtomoID)

See also

cryocat.cryomotl.RelionMotl.prepare_particles_data()

Provides comprehensive examples and explains the syntax for tomo_format and subtomo_format.

Notes

TODO: Add custom format specifier.
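
The fallback from the tomogram column to the subtomogram path can be sketched as follows. This is an illustration of the formats listed in the warning above for Relion 3.1 and lower and for Relion 4.0, not cryoCAT's parser; the helper names are hypothetical and a particle is represented by a plain dictionary.

```python
import re


def first_number(text):
    """Return the first run of digits in text as an integer."""
    match = re.search(r"\d+", text)
    if match is None:
        raise ValueError(f"No number found in {text!r}")
    return int(match.group())


def parse_tomo_id(row, version):
    """Parse the tomogram id of one particle record (dict of column -> value)."""
    if version >= 4.0:
        if "rlnTomoName" in row:
            return first_number(row["rlnTomoName"].split("/")[-1])
        # Fallback: TS_tomoID/subtomoID -> first number in the first entry
        return first_number(row["rlnTomoParticleName"].split("/")[0])
    if "rlnMicrographName" in row:
        return first_number(row["rlnMicrographName"].split("/")[-1])
    # Fallback: /path/tomoID_subtomoID_pixelSize.mrc -> first number in the last entry
    return first_number(row["rlnImageName"].split("/")[-1])


print(parse_tomo_id({"rlnMicrographName": "/path/0002_2.6A.mrc"}, 3.1))
print(parse_tomo_id({"rlnImageName": "/path/0007_65_2.6A.mrc"}, 3.1))
print(parse_tomo_id({"rlnTomoParticleName": "TS_012/65"}, 4.0))
```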

prepare_optics_data(use_original_entries=True, optics_data=None, version=None, subtomo_size=None, pixel_size=None, binning=None)#

The function prepares the optics data for relion DataFrame. It takes in a dictionary or starfile path as an argument, and returns a pandas DataFrame containing the optics information in version specific format.

Parameters:
use_original_entriesbool, default=True

Whether to use the self.optics_df (True) as source or not. If set to True, the optics_data as well as version will be ignored. Defaults to True.

optics_datastr, optional

The optics data specified either as a path to the starfile (it can also contain the particle list), dictionary or as DataFrame. It is used only if “use_original_entries” is set to False. Defaults to None.

versionfloat, optional

Relion version to be used for the DataFrame. It is used only if use_original_entries is set to False and the “optics_data” is a path to starfile. If not set, self.version will be used instead. Defaults to None.

pixel_sizefloat, optional

The pixel size of the data. If not provided, the pixel size of the object instance (self.pixel_size) will be used. Defaults to None.

subtomo_sizeint, optional

The size of the subtomograms. If not provided, it will be set to “NaN”. Defaults to None.

Returns:
pandas.DataFrame

DataFrame with the optics data.

Raises:
UserInputError

If optics_data is neither a str nor a pandas.DataFrame.

Warning

If optics_data is not specified and self.optics_df is empty.

prepare_particles_data(tomo_format='', subtomo_format='', version=None, pixel_size=None)#

The function creates a DataFrame that contains the information on particles in Relion format. The function takes in the version of Relion to be used and formats describing how the tomogram/tilt-series and subtomogram names should be assembled.

Parameters:
tomo_formatstr, default=””

Format specifying the tomogram/tilt-series name. It contains one or more sequences of “x” introduced by the “$” character. The longest sequence marks the position of the tomo_id and is replaced with the corresponding tomo_id. The number of “x” letters in the longest sequence determines the number of digits used for zero padding. For example, for tomo_id 5 the format “/path/to/tomo/$xxxx.rec” results in “/path/to/tomo/0005.rec”. The longest sequence can be present multiple times; sequences of “x” shorter than the longest one are kept intact: for tomo_id 5, “/path/to/tomo/$xxxx/$xxxx_$xx.mrc” results in “/path/to/tomo/0005/0005_$xx.mrc”. Defaults to empty string, in which case the tomo_id is used without any zero padding.

subtomo_formatstr, default=””

Format specifying the subtomogram name. It contains one or more sequences of “y” introduced by the “$” character. The longest sequence marks the position of the subtomo_id and is replaced with the corresponding subtomo_id. The number of “y” letters in the longest sequence determines the number of digits used for zero padding. For example, for subtomo_id 65 the format “/path/to/subtomograms/$yyy.mrc” results in “/path/to/subtomograms/065.mrc”. The longest sequence can be present multiple times; sequences of “y” shorter than the longest one are kept intact: for subtomo_id 65, “/path/to/subtomograms/$yy_$yyy.mrc” results in “/path/to/subtomograms/$yy_065.mrc”. The subtomo_format can also contain sequences of “x” letters introduced by “$”, in which case these are replaced by the tomo_id in the same way as for tomo_format. For example, for tomo_id 5 and subtomo_id 65 the format “/path/to/subtomograms/$xxxx/$xxxx_$yyy.mrc” results in “/path/to/subtomograms/0005/0005_065.mrc”. Defaults to empty string, in which case the subtomo_id is used without any zero padding.

versionfloat, optional

Relion version to be used for the DataFrame. Defaults to None, in which case self.version is used.

pixel_sizefloat, optional

The pixel size of the data. If not provided, the pixel size of the object instance (self.pixel_size) will be used. Defaults to None.

Returns:
pandas.DataFrame

A DataFrame with particle list in Relion format.

Raises:
UserInputError

If the format does not contain a valid sequence.
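
The placeholder rules described for tomo_format and subtomo_format can be reproduced with a short regular-expression sketch. It is meant to make the "longest sequence" rule concrete and is not the code used by cryoCAT; fill_format and subtomo_name are hypothetical helpers, and a plain ValueError stands in for the library's own exception.

```python
import re


def fill_format(fmt, value, letter):
    """Replace the longest "$" + letter sequence(s) in fmt with zero-padded value."""
    runs = re.findall(r"\$(" + letter + r"+)", fmt)
    if not runs:
        raise ValueError(
            f"The format {fmt} does not contain any sequence of $ followed by {letter}."
        )
    longest = max(len(run) for run in runs)
    # Match "$" followed by exactly `longest` letters; shorter runs stay intact
    pattern = r"\$" + letter + "{" + str(longest) + "}(?!" + letter + ")"
    return re.sub(pattern, str(value).zfill(longest), fmt)


def subtomo_name(fmt, tomo_id, subtomo_id):
    """Assemble a subtomogram name from a format with $y (and optional $x) runs."""
    if fmt == "":
        return str(subtomo_id)  # empty format: id without zero padding
    name = fill_format(fmt, subtomo_id, "y")
    if "$x" in name:
        name = fill_format(name, tomo_id, "x")  # optional tomo_id placeholder
    return name


print(fill_format("/path/to/tomo/$xxxx/$xxxx_$xx.mrc", 5, "x"))
print(subtomo_name("/path/to/subtomograms/$xxxx/$xxxx_$yyy.mrc", 5, 65))
```

The negative lookahead keeps a longer run from being partially consumed, and zero padding never truncates an id that has more digits than the placeholder.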

Examples

>>> rln_motl = cryomotl.RelionMotl()
>>> rln_motl.fill({"tomo_id": [2], "subtomo_id": [65]})
>>> rln_df = rln_motl.prepare_particles_data(tomo_format="/path/to/$xxxx.rec",
... subtomo_format="/path/to/$xxxx/$xxxx_$yy_2.6A.mrc", version=3.1)
>>> print(rln_df["rlnMicrographName"].values[0])
/path/to/0002.rec
>>> print(rln_df["rlnImageName"].values[0])
/path/to/0002/0002_65_2.6A.mrc
>>> rln_df = rln_motl.prepare_particles_data(tomo_format="/path/to/$xxxx",
... subtomo_format="/path/to/$xxxx/$xxxx_$yy_2.6A", version=4.0)
>>> print(rln_df["rlnTomoName"].values[0])
/path/to/0002
>>> print(rln_df["rlnTomoParticleName"].values[0])
/path/to/0002/0002_65_2.6A
>>> rln_df = rln_motl.prepare_particles_data(tomo_format="/path/to/$xx.rec",
... subtomo_format="/path/to/xxxx/xxxx_$yy_2.6A.mrc", version=3.1)
>>> print(rln_df["rlnMicrographName"].values[0])
/path/to/02.rec
>>> print(rln_df["rlnImageName"].values[0])
/path/to/xxxx/xxxx_65_2.6A.mrc
>>> rln_df = rln_motl.prepare_particles_data(tomo_format="",
... subtomo_format="/path/to/$xxx/$yy_2.6A.mrc", version=3.1)
>>> print(rln_df["rlnMicrographName"].values[0])
2
>>> print(rln_df["rlnImageName"].values[0])
/path/to/002/65_2.6A.mrc
>>> rln_df = rln_motl.prepare_particles_data(tomo_format="",
... subtomo_format="/path/to/$xxx/yy_2.6A.mrc", version=3.1)
ValueError: The format /path/to/$xxx/yy_2.6A.mrc does not contain any sequence of \$ followed by y.
read_in(input_path)#

Reads in a starfile and returns the particle list, version of the starfile and optics data if present.

Parameters:
input_pathstr

The path to the starfile.

Returns:
framespandas.DataFrame

pandas.DataFrame containing the particle list in Relion format.

versionfloat

The version extracted from the starfile. See cryocat.cryomotl.RelionMotl.get_version_from_file() for more info.

optics_dfpandas.DataFrame or None

pandas.DataFrame containing optics if available, otherwise None.

set_pixel_size()#

Sets the pixel size of the object (self.pixel_size). The function first checks whether the pixel size has already been set. If it has not, it tries to get the pixel size from either the self.relion_df or the self.optics_data DataFrame. If neither of these is available, the pixel size is set to 1.0.

Parameters:
None
Returns:
None

Notes

Pixel size is important to correctly compute shifts for Relion version 3.1 and higher and also to correctly rescale coordinates for Relion version 4.0 and higher.
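
The fallback chain can be sketched as follows. This is not cryoCAT's implementation: the function name is hypothetical, tables are represented as dictionaries mapping a column label to a list of values, the order in which the particle and optics tables are consulted is an assumption, and so is the optics label rlnImagePixelSize (rlnPixelSize is taken from the column lists of the class).

```python
def resolve_pixel_size(current, particle_columns=None, optics_columns=None):
    """Pick a pixel size: keep the current one, else look it up, else use 1.0."""
    if current is not None:
        return current  # already set, nothing to do
    # Assumed lookup order: particle table first, then optics table
    sources = (
        (particle_columns, "rlnPixelSize"),
        (optics_columns, "rlnImagePixelSize"),
    )
    for columns, label in sources:
        if columns and columns.get(label):
            return float(columns[label][0])
    return 1.0  # last resort described above


print(resolve_pixel_size(2.6, {"rlnPixelSize": [1.3]}))
print(resolve_pixel_size(None, {"rlnPixelSize": [1.3]}))
print(resolve_pixel_size(None))
```

Because the last fallback silently yields 1.0, shifts stored in Angstroms would be read as if they were voxels, so setting the pixel size explicitly is the safer choice.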

set_version(input_df, version=None)#

Sets the class attribute version, in case it has not been set already. The function takes in a pandas.DataFrame and an optional version number as arguments. If no version number is provided, the function will attempt to determine which Relion version was used by checking for specific column names in the DataFrame. If it cannot find any of these columns, it will default to version 3.1.

Parameters:
input_dfpandas.DataFrame

DataFrame in Relion format that is used for version determination, unless the class attribute was already set or the version argument is not None.

versionfloat, optional

Set the version of the data unless it was already set. Defaults to None.

Returns:
None
set_version_specific_names()#

Sets version specific names for the current object.

Parameters:
None
Returns:
None

Notes

This function sets the following attributes of the current object:

  • “tomo_id_name”: The name of the tomogram ID (“rlnMicrographName” for Relion 3.1 and lower, “rlnTomoName” for Relion 4.0 and higher).

  • “subtomo_id_name”: The name of the subtomogram ID (“rlnImageName” for Relion 3.1 and lower, “rlnTomoParticleName” for Relion 4.0 and higher).

  • “shifts_id_names”: The names of the shift IDs (“rlnOriginX” for Relion 3.0 and lower, “rlnOriginXAngst” for Relion 3.1 and higher).

  • “data_spec”: The particle list specification (“data” for Relion 3.0 and lower, “data_particles” for Relion 3.1 and higher).

write_out(output_path, write_optics=True, tomo_format='', subtomo_format='', use_original_entries=False, keep_all_entries=False, version=None, add_object_id=False, add_subunit_id=False, binning=None, pixel_size=None, optics_data=None, subtomo_size=None)#

This function converts self.df DataFrame to a DataFrame in Relion format and writes it out as a starfile.

Parameters:
output_pathstr

The output path to the starfile to be written out.

write_opticsbool, default=True

Whether to include optics data in the starfile or not. Defaults to True.

tomo_formatstr, default=””

Output format of the tomogram name. See cryocat.cryomotl.RelionMotl.prepare_particles_data() for more information. Defaults to empty string.

subtomo_formatstr, default=””

Output format of the subtomogram name. See cryocat.cryomotl.RelionMotl.prepare_particles_data() for more information. Defaults to empty string.

use_original_entriesbool, default=False

Determine whether to use (True) the original entries stored in self.relion_df or not (False). If True, all relion entries that are not used in motl (e.g., rlnCtfImage, rlnHelicalTubeID) are fetched from the original relion dataframe. Coordinates, rotations, classes etc. will be updated. Defaults to False.

keep_all_entries: bool, default=False

Used only if use_original_entries is True. If True, it will keep all the entries as they were loaded including coordinates, rotations and classes. Essentially, it should be set to True only if some selection on particles was done and nothing changed. Defaults to False.

versionfloat, optional

Specify the version and thereby the format of the DataFrame. If not provided the value from self.version will be used. Defaults to None.

add_object_idbool, default=False

Whether to add “object_id” from self.df to the DataFrame. If True, the column will be named “ccObjectName”. This is particularly useful for exporting fields mapped during loading, such as “rlnHelicalTubeID”. Defaults to False.

add_subunit_idbool, default=False

Whether to add “subunit_id” from self.df to the DataFrame. If True, the column will be named “ccSubunitName”. Defaults to False.

binningint, optional

Binning that should be used for conversion in case of Relion v. 4.x. If not provided the value from self.binning will be used. Defaults to None.

pixel_sizefloat, optional

The pixel size of the data. If not provided, the pixel size of the object instance (self.pixel_size) will be used. Defaults to None.

optics_datastr, optional

A DataFrame or a dictionary containing optics data or a path to the starfile that should be used to fetch the optics from. See cryocat.cryomotl.RelionMotl.prepare_optics_data() for more details. Used only if write_optics is True. If it is None and write_optics is True, then the attribute self.optics_df will be used. Defaults to None.

subtomo_sizeint, optional

The size of the subtomograms. If not provided, it will be set to “NaN”. Defaults to None.

Returns:
None

See also

cryocat.cryomotl.RelionMotl.prepare_particles_data()

Provides more information on tomo_format and subtomo_format.

cryocat.cryomotl.RelionMotl.prepare_optics_data()

Provides more information on optics_data inputs.