starfileio

class cryocat.utils.starfileio.Starfile(file_path=None, frames=None, specifiers=None, comments=None)

Bases: object

Read, write, and manipulate RELION/STOPGAP STAR files.

A STAR file contains one or more data blocks, each identified by a specifier (e.g. data_particles). Each block is parsed into a pandas.DataFrame.

Attributes:

frames (list of pandas.DataFrame): One DataFrame per data block.
specifiers (list of str): Data-block specifier strings (e.g. "data_particles").
comments (list of list of str): Comment lines associated with each data block.

static fix_relion5_star(input_path)

Convert a RELION 5 STAR file with key-value data_general blocks to loop format.

RELION 5 sometimes writes data_general sections as bare _key value pairs rather than as a loop_ table. This function rewrites such sections into the loop-based format expected by the rest of the STAR parser so that the file can be read by read().

If the file already contains a loop_ inside data_general (i.e. it is already in the correct format), the original path is returned unchanged and no temporary file is created.

Parameters:

input_path (str): Path to the input RELION 5 STAR file.

Returns:

str: Path to the (possibly temporary) fixed file, or input_path if no changes were necessary.
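The key-value to loop conversion described above can be sketched in plain Python. This is only an illustration of the idea, not cryocat's implementation: the helper name key_value_block_to_loop, the example keys, and the choice to emit "#<index>" after each header are assumptions, and the sketch works on a list of lines rather than on a file path.

```python
def key_value_block_to_loop(lines):
    """Rewrite bare `_key value` lines as a one-row loop_ table.

    Hypothetical helper for illustration only; not part of cryocat.
    """
    # A block that already contains loop_ is left untouched.
    if any(line.strip() == "loop_" for line in lines):
        return list(lines)

    keys, values, passthrough = [], [], []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("_"):
            # Split "_someKey   value" into the key and the rest of the line.
            parts = stripped.split(None, 1)
            keys.append(parts[0])
            values.append(parts[1] if len(parts) > 1 else "")
        else:
            # Specifier line, blank lines, comments: keep as they are.
            passthrough.append(line)

    if not keys:
        return list(lines)

    out = passthrough + ["loop_"]
    # Assumption: headers are numbered from 1, as in typical RELION output.
    out += [f"{key} #{index}" for index, key in enumerate(keys, start=1)]
    out.append(" ".join(values))
    return out


# Made-up keys, used only to show the shape of the transformation.
example_block = ["data_general", "", "_exampleKeyA 1", "_exampleKeyB 2.5"]
converted_block = key_value_block_to_loop(example_block)
```

All key-value pairs end up as a single data row, so the converted block parses like any other one-row table.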

static get_frame_and_comments(file_path, specifier)

Read a single data block and its comments from a STAR file.

Parameters:

file_path (str): Path to the .star file.
specifier (str): Specifier of the data block to retrieve (e.g. "data_particles").

Returns:

frame (pandas.DataFrame): The data block.
comments (list of str): Comments associated with the data block.

Raises:

ValueError: If specifier is not found in the file.

static get_specifier_id(specifiers, specifier_id)

Return the list index of a specifier string, or None if not found.

Parameters:

specifiers (list of str): List of specifier strings from a parsed STAR file.
specifier_id (str): Specifier to search for.

Returns:

int or None: Index of specifier_id in specifiers, or None if absent.
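The documented contract (an index on success, None when the specifier is absent) can be reproduced with a few lines of standard Python. The function name find_specifier_index is hypothetical and the body is a sketch of equivalent behavior, not the library's code.

```python
def find_specifier_index(specifiers, specifier_id):
    """Return the position of specifier_id in specifiers, or None if absent.

    Hypothetical stand-in that mirrors the documented return values.
    """
    try:
        return specifiers.index(specifier_id)
    except ValueError:
        # list.index raises ValueError for a missing item; map that to None.
        return None


block_names = ["data_optics", "data_particles"]
particles_index = find_specifier_index(block_names, "data_particles")
missing_index = find_specifier_index(block_names, "data_missing")
```

Because 0 is a valid index, callers should compare the result with `is None` rather than relying on truthiness.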

static read(file_path, data_id=None)

Parse a STAR file into DataFrames, specifiers, and comments.

Columns whose values are all numeric are automatically cast to numeric types.

Parameters:

file_path (str): Path to the .star file to read.
data_id (int, optional): If given, return only the data block at this index rather than all blocks. Defaults to None.

Returns:

frames (list of pandas.DataFrame, or pandas.DataFrame): All data blocks (or a single block when data_id is given).
specifiers (list of str, or str): Specifier strings (or a single specifier when data_id is given).
comments (list of list of str, or list of str): Comments (or a single comment list when data_id is given).

static remove_lines(file_path, lines_to_remove, output_path=None, data_specifier=None, number_columns=True)

Remove rows from a data block in a STAR file.

Parameters:

file_path (str): Path to the input STAR file.
lines_to_remove (array-like of int): 0-based row indices to remove from the target data block.
output_path (str, optional): If given, the modified STAR file is written to this path. If None, the modified data structures are returned instead. Defaults to None.
data_specifier (str, optional): Specifier of the data block to modify. If None, the first block is used. Defaults to None.
number_columns (bool, default=True): Whether to write column indices in the output file.

Returns:

tuple or None: (frames, specifiers, comments) when output_path is None, otherwise None.

static write(frames, output_path, specifiers=None, comments=None, number_columns=True, float_precision=6)

Write data blocks to a STAR file.

Parameters:

frames (list of pandas.DataFrame): Data blocks to write.
output_path (str): Path of the output .star file.
specifiers (list of str, optional): Specifier string for each data block. Defaults to ["data"] * len(frames).
comments (list of list of str or None, optional): Comments to write before each data block. Defaults to no comments.
number_columns (bool, default=True): If True, append #<index> after each column header line. Forced off for blocks whose specifier contains "stopgap".
float_precision (int, default=6): Number of decimal places used when rounding float values.

Raises:

ValueError: If the lengths of frames, specifiers, and comments do not match.
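The effect of number_columns and float_precision on the written text can be illustrated with a small formatter. This is a sketch under assumptions, not the layout cryocat is guaranteed to produce: the helper format_block is hypothetical, column indices are assumed to start at 1, the leading underscore is assumed to be re-added on output, and rows are plain tuples instead of a DataFrame.

```python
def format_block(specifier, columns, rows, number_columns=True, float_precision=6):
    """Render one data block as STAR-style text (illustrative sketch only)."""
    # Documented rule: numbering is forced off for STOPGAP blocks.
    if "stopgap" in specifier:
        number_columns = False

    lines = [specifier, "", "loop_"]
    for index, name in enumerate(columns, start=1):
        suffix = f" #{index}" if number_columns else ""
        lines.append(f"_{name}{suffix}")

    for row in rows:
        # Round floats to the requested precision; leave other values as-is.
        cells = [
            str(round(value, float_precision)) if isinstance(value, float) else str(value)
            for value in row
        ]
        lines.append(" ".join(cells))
    return "\n".join(lines)


relion_text = format_block(
    "data_particles",
    ["rlnCoordinateX", "rlnClassNumber"],
    [(1.23456789, 2)],
    float_precision=3,
)
# The specifier below is made up; it only needs to contain "stopgap".
stopgap_text = format_block("data_stopgap_example", ["subtomo_num"], [(7,)])
```

In the first call each header carries its index; in the second the headers are written bare because the specifier contains "stopgap".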

class cryocat.utils.starfileio.Token(token_type, value, location)

Bases: object

Lexical token used during STAR file parsing.

Each token carries a type (from TokenType), its string value, and the (1-based) line/column location in the original file.

static check(tokens, token_type)

Return True if the next token in the queue matches token_type.

Parameters:

tokens (list of Token): Token queue (front at index -1).
token_type (TokenType): Expected token type.

Returns:

bool

static check_then_consume(tokens, token_type)

Consume the next token if it matches token_type, otherwise return None.

Parameters:

tokens (list of Token): Token queue (front at index -1).
token_type (TokenType): Token type to match.

Returns:

Token or None: The consumed token, or None if the type did not match.

static consume(tokens, token_type)

Remove and return the next token, raising IOError if the type does not match.

Parameters:

tokens (list of Token): Token queue (front at index -1).
token_type (TokenType): Expected token type.

Returns:

Token: The consumed token.

Raises:

IOError: If the queue is empty or the next token has a different type.
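The relationship between check, check_then_consume, and consume is easier to see on a concrete queue. The following is a minimal re-implementation for illustration, assuming only what is documented here (the front of the queue is at index -1, and consume raises IOError on a mismatch or an empty queue). The namedtuple Token and the helper demo_queue are stand-ins, not the library's classes.

```python
from collections import namedtuple
from enum import Enum


class TokenType(Enum):
    LITERAL = 0
    NEWLINE = 1
    COMMENT = 2
    LOOP = 3
    PROPERTY = 4


# Stand-in for the library's Token class.
Token = namedtuple("Token", ["token_type", "value", "location"])


def check(tokens, token_type):
    """True if the queue is non-empty and its front token has token_type."""
    return bool(tokens) and tokens[-1].token_type == token_type


def check_then_consume(tokens, token_type):
    """Pop and return the front token if it matches, else return None."""
    return tokens.pop() if check(tokens, token_type) else None


def consume(tokens, token_type):
    """Pop and return the front token, raising IOError on a mismatch."""
    if not check(tokens, token_type):
        raise IOError(f"expected {token_type.name}")
    return tokens.pop()


def demo_queue():
    """Build a queue for 'loop_', newline, '_rlnCoordinateX' in reading order."""
    reading_order = [
        Token(TokenType.LOOP, "loop_", (1, 1)),
        Token(TokenType.NEWLINE, "\n", (1, 6)),
        Token(TokenType.PROPERTY, "_rlnCoordinateX", (2, 1)),
    ]
    # Reverse so that list.pop() returns tokens in reading order.
    return reading_order[::-1]
```

check never modifies the queue, check_then_consume modifies it only on a match, and consume treats a mismatch as a parse error.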

static lookahead(tokens, token_type_target, ignores)

Scan ahead through the queue for token_type_target, skipping ignores.

Parameters:

tokens (list of Token): Token queue (front at index -1).
token_type_target (TokenType): Token type to search for.
ignores (list of TokenType): Token types that should be skipped during the scan.

Returns:

bool: True if token_type_target is found before any non-ignored token.
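The scan rule (succeed on the target, skip ignored types, fail on anything else) can be written as a short loop. This sketch simplifies the queue to a list of token types instead of Token objects, and the name lookahead_types is hypothetical. It illustrates the documented behavior rather than reproducing the library's code.

```python
from enum import Enum


class TokenType(Enum):
    LITERAL = 0
    NEWLINE = 1
    COMMENT = 2
    LOOP = 3
    PROPERTY = 4


def lookahead_types(token_types, target, ignores):
    """Scan from the front of the queue (index -1) without consuming anything."""
    for token_type in reversed(token_types):
        if token_type == target:
            return True
        if token_type not in ignores:
            # A token that is neither the target nor ignorable stops the scan.
            return False
    # Queue exhausted without finding the target.
    return False


# Reading order: NEWLINE, COMMENT, LOOP, PROPERTY (stored reversed).
type_queue = [TokenType.PROPERTY, TokenType.LOOP, TokenType.COMMENT, TokenType.NEWLINE]
loop_ahead = lookahead_types(type_queue, TokenType.LOOP, [TokenType.NEWLINE, TokenType.COMMENT])
property_ahead = lookahead_types(type_queue, TokenType.PROPERTY, [TokenType.NEWLINE])
```

The second call returns False because the COMMENT token is not in ignores and is reached before any PROPERTY token.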

static parse_column(tokens)

Consume a single PROPERTY token (optionally followed by a COMMENT) as a column name.

The leading _ of the PROPERTY value is stripped before returning.

Parameters:

tokens (list of Token): Token queue (front at index -1).

Returns:

str: Column name without the leading _.

static parse_columns(tokens)

Consume the loop_ keyword and subsequent PROPERTY tokens as column names.

Parameters:

tokens (list of Token): Token queue (front at index -1).

Returns:

comments (list of str): Any comments found before loop_.
columns (list of str): Column names (PROPERTY values with the leading _ stripped).

static parse_newline_or_comments(tokens)

Consume NEWLINE and COMMENT tokens from the front of the queue.

Parameters:

tokens (list of Token): Token queue (front at index -1).

Returns:

list of str: Comment strings extracted from consumed COMMENT tokens.

static parse_rows(tokens, columns)

Consume LITERAL tokens as row data and build a DataFrame.

Parameters:

tokens (list of Token): Token queue (front at index -1).
columns (list of str): Column names for the resulting DataFrame.

Returns:

comments (list of str): Any comments found before the row data.
data (pandas.DataFrame): Parsed rows as a DataFrame.

static parse_specifier(tokens)

Consume leading whitespace/comments and then a LITERAL token as a data specifier.

Parameters:

tokens (list of Token): Token queue (front at index -1).

Returns:

comments (list of str): Any comments found before the specifier.
specifier (str): The specifier value.

static tokenize(text)

Tokenize raw STAR file text into a reversed list of Token objects.

The returned list is ordered so that list.pop() yields tokens in reading order (i.e. the first token in the file is at index -1).

Parameters:

text (str): Raw text content of a STAR file.

Returns:

list of Token: Tokens in reverse reading order.
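The reversed ordering can be demonstrated with a toy tokenizer. The classification rules below are assumptions made for the sketch and may differ from the library's lexer: "#" starts a comment that runs to the end of the line, the word loop_ is a LOOP token, a word starting with "_" is a PROPERTY, and every other word is a LITERAL. Quoted values are not handled, and toy_tokenize and the namedtuple Token are hypothetical stand-ins.

```python
from collections import namedtuple
from enum import Enum


class TokenType(Enum):
    LITERAL = 0
    NEWLINE = 1
    COMMENT = 2
    LOOP = 3
    PROPERTY = 4


# Stand-in for the library's Token class; location is (line, column), 1-based.
Token = namedtuple("Token", ["token_type", "value", "location"])


def toy_tokenize(text):
    """Split STAR-like text into tokens and return them in reverse reading order."""
    tokens = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        # Assumption: everything after the first '#' is a comment.
        body, hash_mark, comment = line.partition("#")
        column = 0
        for word in body.split():
            column = body.index(word, column)
            if word == "loop_":
                kind = TokenType.LOOP
            elif word.startswith("_"):
                kind = TokenType.PROPERTY
            else:
                kind = TokenType.LITERAL
            tokens.append(Token(kind, word, (line_no, column + 1)))
            column += len(word)
        if hash_mark:
            tokens.append(Token(TokenType.COMMENT, comment.strip(), (line_no, len(body) + 1)))
        tokens.append(Token(TokenType.NEWLINE, "\n", (line_no, len(line) + 1)))
    # Reverse once so that list.pop() yields tokens in reading order.
    tokens.reverse()
    return tokens


sample_text = "data_particles\nloop_\n_rlnCoordinateX #1\n12.5\n"
sample_tokens = toy_tokenize(sample_text)
```

Reversing the list once makes each pop() from the end an O(1) operation, whereas removing items from the start of a Python list costs O(n) per removal.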

class cryocat.utils.starfileio.TokenType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Token types assigned to Token objects during STAR file parsing.

COMMENT = 2

LITERAL = 0

LOOP = 3

NEWLINE = 1

PROPERTY = 4