starfileio

class cryocat.utils.starfileio.Starfile(file_path=None, frames=None, specifiers=None, comments=None)

Bases: object

Read, write, and manipulate RELION/STOPGAP STAR files.

A STAR file contains one or more data blocks, each identified by a specifier (e.g. data_particles). Each block is parsed into a pandas.DataFrame.

Attributes:
frames : list of pandas.DataFrame

One DataFrame per data block.

specifiers : list of str

Data-block specifier strings (e.g. "data_particles").

comments : list of list of str

Comment lines associated with each data block.

static fix_relion5_star(input_path)

Convert a RELION 5 STAR file with key-value data_general blocks to loop format.

RELION 5 sometimes writes data_general sections as bare _key value pairs rather than as a loop_ table. This function rewrites such sections into the loop-based format expected by the rest of the STAR parser so that the file can be read by read().

If the file already contains a loop_ inside data_general (i.e. it is already in the correct format), the original path is returned unchanged and no temporary file is created.

Parameters:
input_path : str

Path to the input RELION 5 STAR file.

Returns:
str

Path to the (possibly temporary) fixed file, or input_path if no changes were necessary.
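
The rewrite described above can be pictured with a small standalone sketch. It is not cryocat's implementation: key_value_to_loop is a hypothetical helper that works on the lines of a single block, the _exampleKey labels are placeholders, and it assumes every non-empty, non-comment line after the specifier has the form _key value.

```python
def key_value_to_loop(block_lines):
    """Rewrite a block of bare `_key value` pairs as a one-row loop_ table.

    Hypothetical helper for illustration only; not part of cryocat.
    """
    specifier = block_lines[0].strip()
    keys, values = [], []
    for line in block_lines[1:]:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if stripped == "loop_":
            return list(block_lines)  # already loop-based: leave untouched
        key, value = stripped.split(None, 1)
        keys.append(key)
        values.append(value.strip())
    # One header line per key, followed by a single data row.
    return [specifier, "", "loop_", *keys, " ".join(values)]


general = ["data_general", "", "_exampleKeyA 1", "_exampleKeyB 2"]
converted = key_value_to_loop(general)
```

Running the helper a second time on its own output returns the lines unchanged, which mirrors the documented behaviour of leaving already loop-based files alone.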

static get_frame_and_comments(file_path, specifier)

Read a single data block and its comments from a STAR file.

Parameters:
file_path : str

Path to the .star file.

specifier : str

Specifier of the data block to retrieve (e.g. "data_particles").

Returns:
frame : pandas.DataFrame

The data block.

comments : list of str

Comments associated with the data block.

Raises:
ValueError

If specifier is not found in the file.

static get_specifier_id(specifiers, specifier_id)

Return the list index of a specifier string, or None if not found.

Parameters:
specifiers : list of str

List of specifier strings from a parsed STAR file.

specifier_id : str

Specifier to search for.

Returns:
int or None

Index of specifier_id in specifiers, or None if absent.
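
Because a valid result can be index 0, callers should compare against None rather than test truthiness. The sketch below reproduces the documented index-or-None contract with a hypothetical stand-in, find_specifier, so that it runs without cryocat installed.

```python
def find_specifier(specifiers, specifier_id):
    """Return the index of specifier_id in specifiers, or None if absent.

    Hypothetical stand-in that mirrors the documented return contract.
    """
    try:
        return specifiers.index(specifier_id)
    except ValueError:
        return None


specifiers = ["data_optics", "data_particles"]
first = find_specifier(specifiers, "data_optics")      # index 0, which is falsy
missing = find_specifier(specifiers, "data_general")   # not present

# Compare with None explicitly; `if first:` would wrongly treat index 0 as "not found".
first_found = first is not None
missing_found = missing is not None
```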

static read(file_path, data_id=None)

Parse a STAR file into DataFrames, specifiers, and comments.

Columns that contain entirely numeric data are automatically cast to numeric types.

Parameters:
file_path : str

Path to the .star file to read.

data_id : int, optional

If given, return only the data block at this index rather than all blocks. Defaults to None.

Returns:
frames : list of pandas.DataFrame, or pandas.DataFrame

All data blocks (or a single block when data_id is given).

specifiers : list of str, or str

Specifier strings (or a single specifier when data_id is given).

comments : list of list of str, or list of str

Comments (or a single comment list when data_id is given).
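
The "entirely numeric" rule for column casting can be illustrated without pandas. This is a minimal sketch under that reading of the rule, not the code read() actually runs; cast_if_numeric is a hypothetical helper, and the column contents are made up for the example.

```python
def cast_if_numeric(values):
    """Cast a column of strings to int or float only if every entry parses.

    Hypothetical helper; a single non-numeric entry keeps the whole column as text.
    """
    for caster in (int, float):
        try:
            return [caster(v) for v in values]
        except ValueError:
            continue  # try the next, more permissive type
    return list(values)


columns = {
    "rlnCoordinateX": ["10", "20"],        # all integers
    "rlnAngleRot": ["1.5", "-30"],         # mixed int/float text -> float
    "rlnMicrographName": ["tomo_1", "2"],  # one non-numeric entry -> stays str
}
typed = {name: cast_if_numeric(vals) for name, vals in columns.items()}
```

The practical consequence is that a column holding both numbers and text remains a string column as a whole, so numeric operations on it need an explicit conversion.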

static remove_lines(file_path, lines_to_remove, output_path=None, data_specifier=None, number_columns=True)

Remove rows from a data block in a STAR file.

Parameters:
file_path : str

Path to the input STAR file.

lines_to_remove : array-like of int

Integer row indices (0-based) to remove from the target data block.

output_path : str, optional

If given, the modified STAR file is written to this path. If None, the modified data structures are returned instead. Defaults to None.

data_specifier : str, optional

Specifier of the data block to modify. If None, the first block is used. Defaults to None.

number_columns : bool, default=True

Whether to write column indices in the output file.

Returns:
tuple or None

(frames, specifiers, comments) when output_path is None, otherwise None.
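
The 0-based indexing of lines_to_remove refers to row positions within the data block, not to line numbers in the file. A minimal sketch of that selection, using plain lists and a hypothetical drop_rows helper rather than cryocat itself:

```python
def drop_rows(rows, lines_to_remove):
    """Return the rows whose 0-based position is not listed in lines_to_remove.

    Hypothetical helper for illustration; the input list is left unmodified.
    """
    unwanted = set(lines_to_remove)
    return [row for position, row in enumerate(rows) if position not in unwanted]


rows = [["10", "20"], ["30", "40"], ["50", "60"], ["70", "80"]]
kept = drop_rows(rows, [0, 2])  # removes the first and third data rows
```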

static write(frames, output_path, specifiers=None, comments=None, number_columns=True, float_precision=6)

Write data blocks to a STAR file.

Parameters:
frames : list of pandas.DataFrame

Data blocks to write.

output_path : str

Path of the output .star file.

specifiers : list of str, optional

Specifier string for each data block. Defaults to ["data"] * len(frames).

comments : list of list of str or None, optional

Comments to write before each data block. Defaults to no comments.

number_columns : bool, default=True

If True, append #<index> after each column header line. Forced off for blocks whose specifier contains "stopgap".

float_precision : int, default=6

Number of decimal places to which float values are rounded.

Raises:
ValueError

If the lengths of frames, specifiers, and comments do not match.
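
To make the effect of number_columns and float_precision concrete, the sketch below renders one block as text. It is an illustration, not cryocat's writer: format_block is hypothetical, the exact whitespace of real output may differ, the 1-based column index follows the usual RELION convention and is an assumption here, and the case-sensitive "stopgap" check is likewise assumed.

```python
def format_block(specifier, columns, rows, number_columns=True, float_precision=6):
    """Render one data block as STAR text (illustrative; not cryocat's writer)."""
    # Mirrors the documented rule: numbering is forced off for STOPGAP blocks.
    if "stopgap" in specifier:
        number_columns = False
    lines = [specifier, "", "loop_"]
    for index, name in enumerate(columns, start=1):  # assumption: 1-based indices
        header = f"_{name}"
        if number_columns:
            header += f" #{index}"
        lines.append(header)
    for row in rows:
        # Only floats are rounded; other values are written as-is.
        cells = [
            str(round(v, float_precision)) if isinstance(v, float) else str(v)
            for v in row
        ]
        lines.append(" ".join(cells))
    return "\n".join(lines) + "\n"


relion_text = format_block(
    "data_particles", ["rlnCoordinateX", "rlnAngleRot"], [[10, 1.23456789]], float_precision=3
)
# Placeholder STOPGAP-style specifier and column name, used only for the example.
stopgap_text = format_block("data_stopgap_motivelist", ["subtomo_num"], [[1]])
```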

class cryocat.utils.starfileio.Token(token_type, value, location)

Bases: object

Lexical token used during STAR file parsing.

Each token carries a type (from TokenType), its string value, and the (1-based) line/column location in the original file.

static check(tokens, token_type)

Return True if the next token in the queue matches token_type.

Parameters:
tokens : list of Token

Token queue (front at index -1).

token_type : TokenType

Expected token type.

Returns:
bool

static check_then_consume(tokens, token_type)

Consume the next token if it matches token_type, otherwise return None.

Parameters:
tokens : list of Token

Token queue (front at index -1).

token_type : TokenType

Token type to match.

Returns:
Token or None

The consumed token, or None if the type did not match.

static consume(tokens, token_type)

Remove and return the next token, raising IOError if the type does not match.

Parameters:
tokens : list of Token

Token queue (front at index -1).

token_type : TokenType

Expected token type.

Returns:
Token

The consumed token.

Raises:
IOError

If the queue is empty or the next token has a different type.
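
The three queue operations documented so far (check, check_then_consume, consume) share one convention: the queue is a reversed list, so the next token is always the last element and list.pop() removes it. The sketch below models that convention with (type, value) tuples instead of Token objects. The function bodies are assumptions written to match the documented behaviour, including the choice that check returns False on an empty queue.

```python
from enum import Enum


class TokenType(Enum):
    LITERAL = 0
    NEWLINE = 1
    COMMENT = 2
    LOOP = 3
    PROPERTY = 4


def check(tokens, token_type):
    """True if the next token (last list element) has the given type."""
    return bool(tokens) and tokens[-1][0] is token_type


def check_then_consume(tokens, token_type):
    """Pop and return the next token if it matches, otherwise return None."""
    return tokens.pop() if check(tokens, token_type) else None


def consume(tokens, token_type):
    """Pop and return the next token, raising IOError on a mismatch or empty queue."""
    if not check(tokens, token_type):
        raise IOError(f"expected {token_type.name}")
    return tokens.pop()


# Reading order is: loop_, newline, _rlnAngleRot. The list is stored reversed.
tokens = [
    (TokenType.PROPERTY, "_rlnAngleRot"),
    (TokenType.NEWLINE, "\n"),
    (TokenType.LOOP, "loop_"),
]
first = consume(tokens, TokenType.LOOP)                  # matches, token is removed
skipped = check_then_consume(tokens, TokenType.COMMENT)  # no match, queue untouched
```

Storing the queue reversed makes each removal an O(1) pop from the end of the list, whereas removing from the front of a Python list is O(n).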

static lookahead(tokens, token_type_target, ignores)

Scan ahead through the queue for token_type_target, skipping ignores.

Parameters:
tokens : list of Token

Token queue (front at index -1).

token_type_target : TokenType

Token type to search for.

ignores : list of TokenType

Token types that should be skipped during the scan.

Returns:
bool

True if token_type_target is found before any non-ignored token.
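
The scan can be sketched as a non-destructive walk from the front of the queue that stops at the first token that is neither the target nor ignored. This is one reading of the description above, not the library's code. Token types are plain strings here to keep the example short, and the helper is a hypothetical stand-in.

```python
def lookahead(tokens, target, ignores):
    """Scan from the front of a reversed queue without consuming anything.

    Hypothetical stand-in: tokens are (type_name, value) tuples.
    """
    for token_type, _value in reversed(tokens):  # front of the queue is index -1
        if token_type == target:
            return True
        if token_type not in ignores:
            return False  # a non-ignored token blocks the search
    return False  # queue exhausted without finding the target


# Reading order: newline, comment, loop_, property (stored reversed).
tokens = [
    ("PROPERTY", "_rlnAngleRot"),
    ("LOOP", "loop_"),
    ("COMMENT", "# note"),
    ("NEWLINE", "\n"),
]
loop_ahead = lookahead(tokens, "LOOP", ["NEWLINE", "COMMENT"])
property_ahead = lookahead(tokens, "PROPERTY", ["NEWLINE", "COMMENT"])
```

In the second call the LOOP token is not in the ignore list, so the scan stops there and reports False even though a PROPERTY token follows later.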

static parse_column(tokens)

Consume a single PROPERTY token (optionally followed by a COMMENT) as a column name.

The leading _ of the PROPERTY value is stripped before returning.

Parameters:
tokens : list of Token

Token queue (front at index -1).

Returns:
str

Column name without the leading _.

static parse_columns(tokens)

Consume the loop_ keyword and subsequent PROPERTY tokens as column names.

Parameters:
tokens : list of Token

Token queue (front at index -1).

Returns:
comments : list of str

Any comments found before loop_.

columns : list of str

Column names (PROPERTY values with the leading _ stripped).

static parse_newline_or_comments(tokens)

Consume NEWLINE and COMMENT tokens from the front of the queue.

Parameters:
tokens : list of Token

Token queue (front at index -1).

Returns:
list of str

Comment strings extracted from consumed COMMENT tokens.

static parse_rows(tokens, columns)

Consume LITERAL tokens as row data and build a DataFrame.

Parameters:
tokens : list of Token

Token queue (front at index -1).

columns : list of str

Column names for the resulting DataFrame.

Returns:
comments : list of str

Any comments found before the row data.

data : pandas.DataFrame

Parsed rows as a DataFrame.

static parse_specifier(tokens)

Consume leading whitespace/comments and then a LITERAL token as a data specifier.

Parameters:
tokens : list of Token

Token queue (front at index -1).

Returns:
comments : list of str

Any comments found before the specifier.

specifier : str

The specifier value.

static tokenize(text)

Tokenize raw STAR file text into a reversed list of Token objects.

The returned list is ordered so that list.pop() yields tokens in reading order (i.e. the first token in the file is at index -1).

Parameters:
text : str

Raw text content of a STAR file.

Returns:
list of Token

Tokens in reverse reading order.
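
A rough, regex-based sketch of such a tokenizer is shown below to illustrate the reversed ordering. It is an assumption-laden simplification, not the library's lexer: the pattern is invented for this example, quoted strings are not handled, and line/column locations are omitted, so each token is just a (type, value) tuple.

```python
import re
from enum import Enum


class TokenType(Enum):
    LITERAL = 0
    NEWLINE = 1
    COMMENT = 2
    LOOP = 3
    PROPERTY = 4


# Alternatives are tried left to right; group names match TokenType members.
TOKEN_PATTERN = re.compile(
    r"(?P<NEWLINE>\n)"
    r"|(?P<COMMENT>#[^\n]*)"
    r"|(?P<LOOP>loop_(?=\s|$))"
    r"|(?P<PROPERTY>_\S+)"
    r"|(?P<LITERAL>\S+)"
)


def tokenize(text):
    """Return (TokenType, value) tuples in reverse reading order."""
    tokens = [
        (TokenType[match.lastgroup], match.group())
        for match in TOKEN_PATTERN.finditer(text)
    ]
    tokens.reverse()  # so that list.pop() yields the first token of the file
    return tokens


sample = "data_particles\n\nloop_\n_rlnAngleRot #1\n1.5\n"
tokens = tokenize(sample)
reading_order = [token_type.name for token_type, _ in reversed(tokens)]
```

In this sketch the "#1" after the column header is tokenized as a COMMENT, which is consistent with parse_column accepting an optional COMMENT after a PROPERTY token.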

class cryocat.utils.starfileio.TokenType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

COMMENT = 2
LITERAL = 0
LOOP = 3
NEWLINE = 1
PROPERTY = 4