Crystallographic Indexing

CrystFEL processing pipeline tools.

class TRSFX.indexer.AlignDetectorConfig(geometry_in, geometry_out, mille_dir, level=2, camera_length=True, out_of_plane=False, out_of_plane_tilts=False, panel_totals=False)

Configuration for align_detector.

class TRSFX.indexer.Ambigator(directory, input_stream, symmetry, apparent_symmetry, params=None, modules=None, slurm=None, verbose=False)

Resolves indexing ambiguities when point group symmetry is lower than lattice symmetry.

Parameters

symmetry : str

The true point group symmetry of the structure (-y)

apparent_symmetry : str

The apparent lattice symmetry (-w), used to determine the ambiguity operator

class TRSFX.indexer.DetectorRefinement(directory, list_file, geometry, cell_file, params, modules=None, n_jobs=10, mille_level=2, slurm=None, align_flags=None, verbose=False)

Two-stage detector geometry refinement using Millepede.

Generates calibration data then aligns detector panels.

Example

>>> ref = DetectorRefinement(
...     directory="refinement_output",
...     list_file="events.lst",
...     geometry="detector.geom",
...     cell_file="cell.pdb",
...     params={"indexing": "xgandalf"},
... )
>>> ref.submit()  # Submit mille generation jobs
>>> ref.align()   # Wait for mille, then submit alignment
>>> print(ref.refined_geometry)  # Path to refined geometry

align(slurm=None)

Wait for mille jobs and submit the alignment job.

Return type:

Job

property refined_geometry: Path

Wait for alignment and return path to refined geometry.

submit()

Submit mille data generation jobs.

Return type:

List[Job]

class TRSFX.indexer.GridSearch(directory, list_file, geometry, config, cell_file=None, modules=None, slurm=None, verbose=False)

Grid search over indexamajig parameters.

Subsamples the input list and runs indexamajig with every combination of parameters specified in the grid configuration.

Special handling for ‘clen’ parameter: instead of passing as a CLI flag, generates modified geometry files for each clen value.
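
The clen handling can be pictured as a text rewrite of the geometry file, one copy per scanned value. The following is a minimal sketch, not the TRSFX implementation: the helper name with_clen is hypothetical, and it assumes the geometry sets the camera length with CrystFEL-style lines such as "clen = 0.125" or "panel0/clen = 0.125".

```python
import re

# Assumption: camera length appears as "clen = <value>" or "<panel>/clen = <value>".
# Comment lines (starting with ";") are not matched because of the line anchor.
_CLEN_LINE = re.compile(r"^([ \t]*(?:[\w-]+/)?clen[ \t]*=[ \t]*)([^;\n]*)", re.MULTILINE)


def with_clen(geometry_text: str, clen: float) -> str:
    """Return geometry text with every clen assignment set to `clen` (hypothetical helper)."""
    new_text, n_replaced = _CLEN_LINE.subn(
        lambda m: f"{m.group(1)}{clen}", geometry_text
    )
    if n_replaced == 0:
        raise ValueError("no clen assignment found in geometry text")
    return new_text


# One modified geometry per scanned clen value.
base = "photon_energy = 9000\nclen = 0.120\nres = 13333.3\n"
variants = {clen: with_clen(base, clen) for clen in (0.120, 0.125, 0.130)}
```

Each entry of variants would then be written to its own geometry file and passed to the corresponding run, so clen never appears as a command-line flag.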

Example

>>> config = GridSearchConfig(
...     base_params={"indexing": "xgandalf", "peaks": "peakfinder8"},
...     grid_params={
...         "threshold": [6, 8, 10],
...         "clen": [0.120, 0.125, 0.130],  # Special: modifies geometry
...     },
...     n_subsample=1000,
... )
>>> gs = GridSearch(
...     directory="grid_output",
...     list_file="events.lst",
...     geometry="detector.geom",
...     config=config,
... )
>>> gs.submit()

classmethod analyze(directory)

Analyze completed grid search results.

Returns a dict with the keys ‘results’, ‘best_params’, and ‘summary’.

Return type:

Dict[str, Any]

property best_params: Dict[str, Any]

Parameters from the best-performing run (grid params only, including clen if scanned).

property best_run: Dict[str, Any] | None

Full result dict from the best-performing run.

classmethod from_manifest(directory)

Reconstruct a GridSearch from a previously saved manifest.

Useful for analyzing results after jobs have completed.

Return type:

GridSearch

property n_runs: int

Number of parameter combinations to test.

property n_total_jobs: int

Total number of SLURM jobs that will be submitted.

property results: List[Dict[str, Any]]

Wait for all jobs and return ranked results.

submit()

Submit all grid search jobs to the cluster.

Return type:

List[Job]

summary()

Return a formatted summary of results.

Return type:

str

class TRSFX.indexer.GridSearchConfig(base_params, grid_params, n_subsample=1000, n_jobs_per_run=4)

Configuration for indexamajig parameter grid search.

Validates that grid parameters are properly structured and don’t conflict with base parameters.
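
The kind of validation described above might look like the following sketch. The function name validate_grid and the exact error messages are assumptions made for illustration; the real checks may differ.

```python
from typing import Any, Dict, List


def validate_grid(base_params: Dict[str, Any], grid_params: Dict[str, List[Any]]) -> None:
    """Raise ValueError if the grid is malformed or overlaps the base parameters.

    Hypothetical helper: illustrates the two documented checks only.
    """
    # Every grid value must be a list with at least one candidate.
    for name, values in grid_params.items():
        if not isinstance(values, list) or not values:
            raise ValueError(f"grid parameter {name!r} must be a non-empty list")
    # A parameter cannot be both fixed (base) and scanned (grid).
    overlap = sorted(set(base_params) & set(grid_params))
    if overlap:
        raise ValueError(f"parameters set in both base and grid: {overlap}")


# Passes silently: no overlap, all grid values are lists.
validate_grid({"indexing": "xgandalf"}, {"threshold": [6, 8, 10]})
```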

classmethod from_dict(d)

Create from a dictionary (e.g., loaded from JSON config).

Return type:

GridSearchConfig

iter_combinations()

Iterate over all parameter combinations.
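
Iterating over every combination amounts to a Cartesian product of the grid value lists. A minimal standalone sketch using itertools.product is shown here; it is an illustration of the idea, not the library's own code.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def iter_combinations(grid_params: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per point of the Cartesian product of the grid values."""
    names = list(grid_params)
    for values in product(*(grid_params[name] for name in names)):
        yield dict(zip(names, values))


grid = {"threshold": [6, 8, 10], "clen": [0.120, 0.125, 0.130]}
combos = list(iter_combinations(grid))
# 3 thresholds x 3 clen values give 9 combinations.
```

Under this reading, the number of combinations is the product of the lengths of the value lists.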

property n_combinations: int

Total number of parameter combinations.

property parameter_names: List[str]

Names of parameters being searched.

class TRSFX.indexer.Indexamajig(directory, list_file, geometry, params, cell_file=None, modules=None, n_jobs=100, slurm=None, mille=False, mille_level=2, verbose=False)

Manages parallel indexamajig execution over chunked input lists.

Splits the input list into chunks and prepares one job per chunk. Call submit() to launch them. Use the GridSearch class for parameter optimization.

Example

>>> idx = Indexamajig(
...     directory="indexing_output",
...     list_file="events.lst",
...     geometry="detector.geom",
...     params={"indexing": "xgandalf", "peaks": "peakfinder8"},
...     cell_file="cell.pdb",
...     n_jobs=100,
... )
>>> idx.submit()

Create a grid search over indexamajig parameters.

This is a convenience method. For more control, use GridSearch directly.

Return type:

GridSearch

Parameters

directory : path

Output directory for grid search results

list_file : path

Input event list file

geometry : path

Detector geometry file

base_params : dict

Base indexamajig parameters (applied to all runs)

grid_params : dict

Parameters to search over. Values must be lists.

cell_file : path, optional

Unit cell file

modules : list, optional

Environment modules to load

n_subsample : int

Number of events to subsample for testing

n_jobs_per_run : int

Number of parallel jobs per parameter combination

slurm : SlurmConfig, optional

SLURM configuration

verbose : bool

Print progress information

Returns

GridSearch

Configured grid search ready for submission

classmethod refine_detector(directory, list_file, geometry, cell_file, params, modules=None, n_jobs=10, mille_level=2, slurm=None, align_flags=None, verbose=False)

Create a detector refinement workflow using Millepede.

Return type:

DetectorRefinement

Parameters

directory : path

Output directory

list_file : path

Input event list

geometry : path

Initial detector geometry

cell_file : path

Unit cell file (required for refinement)

params : dict

Indexamajig parameters

modules : list, optional

Environment modules to load

n_jobs : int

Number of parallel mille jobs

mille_level : int

Millepede hierarchy level (1-3)

slurm : SlurmConfig, optional

SLURM configuration

align_flags : dict, optional

Flags for align_detector (camera_length, out_of_plane, etc.)

verbose : bool

Print progress information

Returns

DetectorRefinement

Configured refinement ready for submission

property results: Dict[str, Any]

Wait for jobs and return aggregated statistics.

submit()

Submit all prepared jobs to the cluster.

Return type:

List[Job]

class TRSFX.indexer.IndexamajigConfig(geometry, input_list, output_stream, cell_file=None, params=<factory>)

Configuration for a single indexamajig invocation.

to_cli(modules=None)

Build the CLI command string, optionally with module loads.

Return type:

str
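
To make the shape of such a command concrete, here is a hedged sketch of a command builder. The function build_command is hypothetical. It uses indexamajig's -g, -i, -o, and -p options, and it assumes that each params entry becomes a --key=value long option, that underscores in keys map to hyphens, and that modules are loaded with "module load".

```python
import shlex
from typing import Any, Dict, List, Optional


def build_command(
    geometry: str,
    input_list: str,
    output_stream: str,
    cell_file: Optional[str] = None,
    params: Optional[Dict[str, Any]] = None,
    modules: Optional[List[str]] = None,
) -> str:
    """Assemble an indexamajig command line (illustrative, not the TRSFX code)."""
    args = ["indexamajig", "-g", geometry, "-i", input_list, "-o", output_stream]
    if cell_file:
        args += ["-p", cell_file]
    for key, value in (params or {}).items():
        flag = f"--{key.replace('_', '-')}"  # assumption: underscores map to hyphens
        if value is True:
            args.append(flag)  # boolean switch without a value
        elif value is not False and value is not None:
            args.append(f"{flag}={value}")
    command = shlex.join(args)
    loads = [f"module load {shlex.quote(m)}" for m in (modules or [])]
    return " && ".join(loads + [command])


cmd = build_command(
    "detector.geom",
    "events.lst",
    "out.stream",
    cell_file="cell.pdb",
    params={"indexing": "xgandalf", "min_snr": 5, "multi": True},
    modules=["crystfel"],
)
```

Quoting every argument with shlex keeps paths containing spaces intact when the string is later run through a shell.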

class TRSFX.indexer.Partialator(directory, input_stream, symmetry, output_name='merged', params=None, modules=None, slurm=None, verbose=False)

Merges and scales crystallographic reflections with partiality correction.

class TRSFX.indexer.SlurmConfig(time=60, mem_gb=8, cores=1, partition=None, account=None, job_name=None, extra=<factory>)

Configuration for SLURM job submission.

classmethod from_dict(d)

Create from a dictionary, extracting known keys.

Return type:

SlurmConfig

to_dict()

Convert to submitit-compatible directives.

Return type:

Dict[str, Any]
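
As an illustration of what "submitit-compatible directives" could mean, the sketch below maps the documented fields onto keyword names accepted by submitit's AutoExecutor.update_parameters (timeout_min, mem_gb, cpus_per_task, name, slurm_partition, slurm_account). The class SlurmSettings is a hypothetical stand-in, and both the mapping and the assumption that time is given in minutes are guesses rather than confirmed behavior.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class SlurmSettings:
    """Hypothetical stand-in mirroring the documented SlurmConfig fields."""

    time: int = 60  # assumption: minutes
    mem_gb: int = 8
    cores: int = 1
    partition: Optional[str] = None
    account: Optional[str] = None
    job_name: Optional[str] = None
    extra: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        """Map fields to submitit keyword names, skipping unset options."""
        directives: Dict[str, Any] = {
            "timeout_min": self.time,
            "mem_gb": self.mem_gb,
            "cpus_per_task": self.cores,
        }
        if self.partition:
            directives["slurm_partition"] = self.partition
        if self.account:
            directives["slurm_account"] = self.account
        if self.job_name:
            directives["name"] = self.job_name
        directives.update(self.extra)  # extra entries are passed through unchanged
        return directives


directives = SlurmSettings(time=120, partition="short", job_name="index").to_dict()
```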

TRSFX.indexer.concat_streams(source_dir, output_file)

Concatenate all stream files in a directory.

Return type:

Path
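
Concatenation of this kind can be done with a streamed byte copy so that large stream files are never loaded fully into memory. The sketch below is an assumption-laden illustration: the name concat_files, the "*.stream" glob pattern, and the sorted ordering are choices made here, not documented behavior.

```python
import shutil
from pathlib import Path
from typing import Union


def concat_files(
    source_dir: Union[str, Path],
    output_file: Union[str, Path],
    pattern: str = "*.stream",  # assumption: stream files share this suffix
) -> Path:
    """Append every matching file in source_dir to output_file, in sorted order."""
    source_dir = Path(source_dir)
    output_file = Path(output_file)
    # Collect inputs before opening the output, and never read the output itself.
    parts = sorted(
        p for p in source_dir.glob(pattern) if p.resolve() != output_file.resolve()
    )
    with output_file.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)
    return output_file
```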

TRSFX.indexer.expand_event_list(source_list, output_list, n_frames=None, entry_prefix='//', start_index=0)

Expand a file list into an event list for CrystFEL.

Output format: file entry frame_number

Example: /path/file.h5 //1 1

If n_frames is None, detects automatically from the first HDF5 file.

Return type:

Path
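
The expansion can be sketched as a nested loop over files and frame indices. This is a guess at the layout based on the example line above: it assumes the entry is entry_prefix followed by the frame index, and that frames run from start_index to start_index + n_frames - 1. The function expand_entries is hypothetical and takes n_frames explicitly instead of reading it from an HDF5 file.

```python
from typing import Iterable, List


def expand_entries(
    files: Iterable[str],
    n_frames: int,
    entry_prefix: str = "//",
    start_index: int = 0,
) -> List[str]:
    """Return one 'file entry frame_number' line per frame of each file."""
    lines = []
    for path in files:
        for frame in range(start_index, start_index + n_frames):
            # Assumption: the entry is the prefix plus the frame index.
            lines.append(f"{path} {entry_prefix}{frame} {frame}")
    return lines


event_lines = expand_entries(["/path/file.h5"], n_frames=2)
```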

TRSFX.indexer.parse_stream_stats(stream_path)

Count indexed crystals and chunks in a stream file.

Return type:

Dict[str, int]
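
Counting works because CrystFEL streams delimit each processed frame with a "----- Begin chunk -----" line and each indexed lattice with a "--- Begin crystal" line. The sketch below counts those markers in an iterable of lines; the function name count_stream and the dictionary keys "chunks" and "crystals" are assumptions, since the actual keys are not documented here.

```python
from typing import Dict, Iterable

CHUNK_MARKER = "----- Begin chunk -----"
CRYSTAL_MARKER = "--- Begin crystal"


def count_stream(lines: Iterable[str]) -> Dict[str, int]:
    """Count chunk and crystal start markers (key names are illustrative)."""
    stats = {"chunks": 0, "crystals": 0}
    for line in lines:
        if line.startswith(CHUNK_MARKER):
            stats["chunks"] += 1
        elif line.startswith(CRYSTAL_MARKER):
            stats["crystals"] += 1
    return stats


sample = [
    "----- Begin chunk -----",
    "--- Begin crystal",
    "--- End crystal",
    "----- End chunk -----",
    "----- Begin chunk -----",
    "----- End chunk -----",
]
stats = count_stream(sample)
```

With a real file, the same function can be fed an open file handle, which iterates line by line.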

TRSFX.indexer.split_list(source_list, output_dir, n_chunks)

Split an event list into n_chunks roughly equal parts.

Returns list of paths to chunk files.

Return type:

List[Path]

Raises

FileNotFoundError

If source_list doesn’t exist

ValueError

If source_list is empty or n_chunks < 1
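
One way to obtain "roughly equal" parts is to give the first few chunks one extra line each. The following sketch works on a list of lines in memory rather than on files, and the function name split_lines is hypothetical. Capping the number of chunks at the number of lines is an assumption made here to avoid empty chunks.

```python
from typing import List


def split_lines(lines: List[str], n_chunks: int) -> List[List[str]]:
    """Split lines into contiguous chunks whose sizes differ by at most one."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    if not lines:
        raise ValueError("input list is empty")
    n_chunks = min(n_chunks, len(lines))  # assumption: never produce empty chunks
    base, extra = divmod(len(lines), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more line
        chunks.append(lines[start:start + size])
        start += size
    return chunks


chunks = split_lines([f"event_{i}" for i in range(10)], 3)
```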

TRSFX.indexer.subsample_list(source_list, output_list, n_samples, seed=None)

Randomly subsample events from a list file.

If n_samples exceeds the number of events, returns all events.

Return type:

Path

Raises

FileNotFoundError

If source_list doesn’t exist

ValueError

If source_list is empty or n_samples < 1
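
Seeded subsampling can be sketched with a private random.Random instance, which keeps the result reproducible without touching the global random state. The function subsample_lines below is hypothetical and operates on an in-memory list; whether the real function preserves the original event order is not stated, and this sketch does not.

```python
import random
from typing import List, Optional


def subsample_lines(lines: List[str], n_samples: int, seed: Optional[int] = None) -> List[str]:
    """Return n_samples distinct lines, or all lines if fewer are available."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    if not lines:
        raise ValueError("input list is empty")
    if n_samples >= len(lines):
        return list(lines)
    rng = random.Random(seed)  # local generator: same seed, same subsample
    return rng.sample(lines, n_samples)


events = [f"event_{i}" for i in range(20)]
subset = subsample_lines(events, 5, seed=1)
```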