starfileio
- class cryocat.utils.starfileio.Starfile(file_path=None, frames=None, specifiers=None, comments=None)
Bases: object
Read, write, and manipulate RELION/STOPGAP STAR files.
A STAR file contains one or more data blocks, each identified by a specifier (e.g. data_particles). Each block is parsed into a pandas.DataFrame.
- Attributes:
- frames : list of pandas.DataFrame
One DataFrame per data block.
- specifiers : list of str
Data-block specifier strings (e.g. "data_particles").
- comments : list of list of str
Comment lines associated with each data block.
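To make the block/specifier layout concrete, here is a small illustrative sketch. The STAR text and the helper `list_specifiers` are hypothetical and are not part of cryocat; the helper only shows how specifier lines delimit data blocks.

```python
# Illustrative STAR text with two data blocks (contents made up for this sketch).
STAR_TEXT = """\
data_optics

loop_
_rlnOpticsGroup #1
_rlnVoltage #2
1 300.0

data_particles

loop_
_rlnCoordinateX #1
_rlnCoordinateY #2
10.5 20.0
30.0 40.5
"""


def list_specifiers(text):
    """Return the data-block specifier lines in reading order (hypothetical helper)."""
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip().startswith("data_")
    ]


print(list_specifiers(STAR_TEXT))
```

Per the attribute descriptions above, a parsed file of this shape would hold one DataFrame, one specifier string, and one comment list per block, all in the same order.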
- static fix_relion5_star(input_path)
Convert a RELION 5 STAR file with key-value data_general blocks to loop format.
RELION 5 sometimes writes data_general sections as bare _key value pairs rather than as a loop_ table. This function rewrites such sections into the loop-based format expected by the rest of the STAR parser so that the file can be read by read().
If the file already contains a loop_ inside data_general (i.e. it is already in the correct format), the original path is returned unchanged and no temporary file is created.
- Parameters:
- input_path : str
Path to the input RELION 5 STAR file.
- Returns:
- str
Path to the (possibly temporary) fixed file, or input_path if no changes were necessary.
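The key-value to loop conversion described above can be sketched on a list of lines. This is a minimal illustration under assumptions, not cryocat's implementation: the function name `key_values_to_loop` and the keys `_exampleFlag` and `_exampleCount` are hypothetical, and no temporary file handling is shown.

```python
def key_values_to_loop(block_lines):
    """Rewrite bare `_key value` lines of one block as a one-row loop_ table."""
    stripped = [line.strip() for line in block_lines]
    if "loop_" in stripped:
        return list(block_lines)  # already in loop format: nothing to do
    header = [s for s in stripped if s.startswith("data_")]
    keys, values = [], []
    for s in stripped:
        parts = s.split(None, 1)  # split into key and the rest of the line
        if len(parts) == 2 and parts[0].startswith("_"):
            keys.append(parts[0])
            values.append(parts[1].strip())
    # Specifier, blank line, loop_ keyword, one header line per key, one data row.
    return header + ["", "loop_"] + keys + [" ".join(values)]


block = ["data_general", "", "_exampleFlag      1", "_exampleCount 2"]
for line in key_values_to_loop(block):
    print(line)
```

A block that already contains `loop_` is returned unchanged, which mirrors the documented behavior of returning the original path when no rewrite is needed.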
- static get_frame_and_comments(file_path, specifier)
Read a single data block and its comments from a STAR file.
- Parameters:
- file_path : str
Path to the .star file.
- specifier : str
Specifier of the data block to retrieve (e.g. "data_particles").
- Returns:
- frame : pandas.DataFrame
The data block.
- comments : list of str
Comments associated with the data block.
- Raises:
- ValueError
If specifier is not found in the file.
- static get_specifier_id(specifiers, specifier_id)
Return the list index of a specifier string, or None if not found.
- Parameters:
- specifiers : list of str
List of specifier strings from a parsed STAR file.
- specifier_id : str
Specifier to search for.
- Returns:
- int or None
Index of specifier_id in specifiers, or None if absent.
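Because a valid result can be index 0, callers should test the return value with `is None` rather than by truthiness. The sketch below illustrates that contract with a hypothetical stand-in, `find_specifier_index`; it is not cryocat's code.

```python
def find_specifier_index(specifiers, specifier_id):
    """Return the position of specifier_id in specifiers, or None if absent."""
    try:
        return specifiers.index(specifier_id)
    except ValueError:
        return None


specifiers = ["data_optics", "data_particles"]
index = find_specifier_index(specifiers, "data_optics")
if index is None:  # `if not index` would wrongly treat index 0 as "not found"
    print("block missing")
else:
    print("block found at index", index)
```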
- static read(file_path, data_id=None)
Parse a STAR file into DataFrames, specifiers, and comments.
Columns that contain entirely numeric data are automatically cast to numeric types.
- Parameters:
- file_path : str
Path to the .star file to read.
- data_id : int, optional
If given, return only the data block at this index rather than all blocks. Defaults to None.
- Returns:
- frames : list of pandas.DataFrame, or pandas.DataFrame
All data blocks (or a single block when data_id is given).
- specifiers : list of str, or str
Specifier strings (or a single specifier when data_id is given).
- comments : list of list of str, or list of str
Comments (or a single comment list when data_id is given).
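The "entirely numeric" rule can be illustrated without pandas. The sketch below is a standard-library stand-in under assumptions: `cast_if_numeric` is a hypothetical name, and the real reader operates on DataFrame columns and may choose types differently.

```python
def cast_if_numeric(values):
    """Cast a column of strings to int or float only if every entry parses."""
    for caster in (int, float):
        try:
            return [caster(v) for v in values]
        except ValueError:
            continue  # at least one entry failed: try the next, wider type
    return list(values)  # mixed or textual column: leave the strings as they are


print(cast_if_numeric(["1", "2", "3"]))
print(cast_if_numeric(["1.5", "2"]))
print(cast_if_numeric(["mic_01.mrc", "7"]))
```

A single non-numeric entry keeps the whole column as strings, which is the behavior the word "entirely" implies.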
- static remove_lines(file_path, lines_to_remove, output_path=None, data_specifier=None, number_columns=True)
Remove rows from a data block in a STAR file.
- Parameters:
- file_path : str
Path to the input STAR file.
- lines_to_remove : array-like of int
Integer row indices (0-based) to remove from the target data block.
- output_path : str, optional
If given, the modified STAR file is written to this path. If None, the modified data structures are returned instead. Defaults to None.
- data_specifier : str, optional
Specifier of the data block to modify. If None, the first block is used. Defaults to None.
- number_columns : bool, default=True
Whether to write column indices in the output file. Defaults to True.
- Returns:
- tuple or None
(frames, specifiers, comments) when output_path is None, otherwise None.
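Note that lines_to_remove refers to row positions inside the data block, not to line numbers in the file. A minimal sketch of that indexing, using plain lists and a hypothetical helper `drop_rows` rather than cryocat itself:

```python
def drop_rows(rows, lines_to_remove):
    """Return rows without the entries at the given 0-based positions."""
    to_drop = set(lines_to_remove)
    return [row for position, row in enumerate(rows) if position not in to_drop]


rows = [
    [10.5, 20.0],  # row 0
    [30.0, 40.5],  # row 1
    [55.0, 60.0],  # row 2
]
print(drop_rows(rows, [0, 2]))
```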
- static write(frames, output_path, specifiers=None, comments=None, number_columns=True, float_precision=6)
Write data blocks to a STAR file.
- Parameters:
- frames : list of pandas.DataFrame
Data blocks to write.
- output_path : str
Path of the output .star file.
- specifiers : list of str, optional
Specifier string for each data block. Defaults to ["data"] * len(frames).
- comments : list of list of str or None, optional
Comments to write before each data block. Defaults to no comments.
- number_columns : bool, default=True
If True, append #<index> after each column header line. Forced off for blocks whose specifier contains "stopgap". Defaults to True.
- float_precision : int, default=6
Number of decimal places used when rounding float values. Defaults to 6.
- Raises:
- ValueError
If the lengths of frames, specifiers, and comments do not match.
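The effect of number_columns and float_precision on the output text can be sketched as follows. This is an illustration under assumptions, not the library's writer: `format_block` is a hypothetical helper, column indices are assumed to start at 1, and exact spacing and blank lines in real output may differ.

```python
def format_block(specifier, columns, rows, number_columns=True, float_precision=6):
    """Render one data block as STAR text (simplified sketch)."""
    if "stopgap" in specifier:
        number_columns = False  # documented rule: no column indices for STOPGAP blocks
    lines = [specifier, "", "loop_"]
    for index, name in enumerate(columns, start=1):
        suffix = f" #{index}" if number_columns else ""
        lines.append(f"_{name}{suffix}")
    for row in rows:
        # Round floats to the requested precision; other values are written as-is.
        cells = [
            str(round(v, float_precision)) if isinstance(v, float) else str(v)
            for v in row
        ]
        lines.append(" ".join(cells))
    return "\n".join(lines) + "\n"


print(format_block("data_particles", ["rlnCoordinateX", "rlnClassNumber"],
                   [[1.23456789, 2]], float_precision=3))
```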
- class cryocat.utils.starfileio.Token(token_type, value, location)
Bases: object
Lexical token used during STAR file parsing.
Each token carries a type (from TokenType), its string value, and the (1-based) line/column location in the original file.
- static check(tokens, token_type)
Return True if the next token in the queue matches token_type.
- Parameters:
- tokens : list of Token
Token queue (front at index -1).
- token_type : TokenType
Expected token type.
- Returns:
- bool
- static check_then_consume(tokens, token_type)
Consume the next token if it matches token_type, otherwise return None.
- Parameters:
- tokens : list of Token
Token queue (front at index -1).
- token_type : TokenType
Token type to match.
- Returns:
- Token or None
The consumed token, or None if the type did not match.
- static consume(tokens, token_type)
Remove and return the next token, raising IOError if the type does not match.
- Parameters:
- tokens : list of Token
Token queue (front at index -1).
- token_type : TokenType
Expected token type.
- Returns:
- Token
The consumed token.
- Raises:
- IOError
If the queue is empty or the next token has a different type.
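The three queue primitives (check, check_then_consume, consume) share one convention: the queue is a Python list whose front is the last element, so list.pop() removes the next token. The sketch below mirrors the documented contracts with hypothetical stand-ins: `Tok` replaces Token, and plain strings replace TokenType members.

```python
from collections import namedtuple

# Hypothetical stand-in for Token; the real class also carries a location.
Tok = namedtuple("Tok", "token_type value")


def check(tokens, token_type):
    """True if the queue is non-empty and its front token has the given type."""
    return bool(tokens) and tokens[-1].token_type == token_type


def check_then_consume(tokens, token_type):
    """Pop and return the front token if it matches, otherwise return None."""
    return tokens.pop() if check(tokens, token_type) else None


def consume(tokens, token_type):
    """Pop and return the front token, raising IOError on a mismatch or empty queue."""
    if not check(tokens, token_type):
        raise IOError(f"expected token of type {token_type}")
    return tokens.pop()


# Reading order is PROPERTY then LITERAL, so the list is stored reversed.
queue = [Tok("LITERAL", "10.5"), Tok("PROPERTY", "_rlnCoordinateX")]
print(consume(queue, "PROPERTY").value)
print(check_then_consume(queue, "NEWLINE"))
```

Storing the queue reversed makes each removal an O(1) pop from the end of the list instead of an O(n) removal from the start.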
- static lookahead(tokens, token_type_target, ignores)
Scan ahead through the queue for token_type_target, skipping ignores.
- Parameters:
- tokens : list of Token
Token queue (front at index -1).
- token_type_target : TokenType
Token type to search for.
- ignores : list of TokenType
Token types that should be skipped during the scan.
- Returns:
- bool
True if token_type_target is found before any non-ignored token.
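A minimal sketch of this scan, assuming it leaves the queue unmodified (the description does not say so explicitly). `Tok` and the string token types are hypothetical stand-ins for Token and TokenType.

```python
from collections import namedtuple

Tok = namedtuple("Tok", "token_type value")  # hypothetical stand-in for Token


def lookahead(tokens, token_type_target, ignores):
    """True if the target type appears before any token that is not ignored."""
    for tok in reversed(tokens):  # front of the queue is at index -1
        if tok.token_type == token_type_target:
            return True
        if tok.token_type not in ignores:
            return False  # blocked by a token that may not be skipped
    return False  # queue exhausted without finding the target


# Reading order: NEWLINE, COMMENT, LOOP (stored reversed).
queue = [Tok("LOOP", "loop_"), Tok("COMMENT", "note"), Tok("NEWLINE", "\n")]
print(lookahead(queue, "LOOP", ["NEWLINE", "COMMENT"]))
print(lookahead(queue, "LOOP", ["NEWLINE"]))
```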
- static parse_column(tokens)
Consume a single PROPERTY token (optionally followed by a COMMENT) as a column name.
The leading _ of the PROPERTY value is stripped before returning.
- Parameters:
- tokens : list of Token
Token queue (front at index -1).
- Returns:
- str
Column name without the leading _.
- static parse_columns(tokens)
Consume the loop_ keyword and subsequent PROPERTY tokens as column names.
- Parameters:
- tokens : list of Token
Token queue (front at index -1).
- Returns:
- comments : list of str
Any comments found before loop_.
- columns : list of str
Column names (PROPERTY values with the leading _ stripped).
- static parse_newline_or_comments(tokens)
Consume NEWLINE and COMMENT tokens from the front of the queue.
- Parameters:
- tokens : list of Token
Token queue (front at index -1).
- Returns:
- list of str
Comment strings extracted from consumed COMMENT tokens.
- static parse_rows(tokens, columns)
Consume LITERAL tokens as row data and build a DataFrame.
- Parameters:
- tokens : list of Token
Token queue (front at index -1).
- columns : list of str
Column names for the resulting DataFrame.
- Returns:
- comments : list of str
Any comments found before the row data.
- data : pandas.DataFrame
Parsed rows as a DataFrame.
- static parse_specifier(tokens)
Consume leading whitespace/comments and then a LITERAL token as a data specifier.
- Parameters:
- tokens : list of Token
Token queue (front at index -1).
- Returns:
- comments : list of str
Any comments found before the specifier.
- specifier : str
The specifier value.
- static tokenize(text)
Tokenize raw STAR file text into a reversed list of Token objects.
The returned list is ordered so that list.pop() yields tokens in reading order (i.e. the first token in the file is at index -1).
- Parameters:
- text : str
Raw text content of a STAR file.
- Returns:
- list of Token
Tokens in reverse reading order.
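The reversed ordering can be illustrated with a deliberately simplified tokenizer. This sketch is not cryocat's lexer: tokens are plain (type, value) tuples without locations, the type name "LOOP" is an assumption, and quoted strings containing # are not handled.

```python
def tokenize(text):
    """Split STAR text into (type, value) tuples, returned in reverse reading order."""
    tokens = []
    for line in text.splitlines():
        # Everything after the first '#' on a line is treated as a comment.
        body, hash_mark, comment = line.partition("#")
        for word in body.split():
            if word == "loop_":
                kind = "LOOP"
            elif word.startswith("_"):
                kind = "PROPERTY"
            else:
                kind = "LITERAL"
            tokens.append((kind, word))
        if hash_mark:
            tokens.append(("COMMENT", comment.strip()))
        tokens.append(("NEWLINE", "\n"))
    tokens.reverse()  # first token of the file ends up at index -1
    return tokens


queue = tokenize("data_particles\nloop_\n_rlnCoordinateX #1\n10.5\n")
print(queue.pop())  # the first token in reading order
```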