copick config export-croissant

core

Export a copick project to an mlcroissant manifest.

Usage

copick config export-croissant [OPTIONS]

Description

Writes a Croissant metadata.json plus CSV sidecars under <project-root>/Croissant/. The source project is loaded either from a copick config (--config) or directly from CryoET Data Portal (CDP) dataset IDs (--source-dataset-ids); the two options are mutually exclusive. For filesystem sources, --base-url is the absolute URL that resolves to --project-root at consumer read time. For CDP sources it is ignored, and the canonical portal s3:// prefix is used instead.
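
To make the --base-url contract concrete, here is a minimal Python sketch of how a consumer could map a file under --project-root to the URL it is read from. It illustrates the relationship described above and is not copick's own code; the file path is a hypothetical layout chosen for the example.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin

# Values mirror the first example on this page.
project_root = PurePosixPath("my_project")
# Keep the trailing slash: urljoin() drops the last path segment of a base without one.
base_url = "https://data.example.org/my_project/"

# Hypothetical file inside the project (illustrative path, not a documented layout).
local_file = PurePosixPath("my_project/ExperimentRuns/TS_001/Picks/ribosome.json")

# Everything below --project-root is addressed relative to --base-url.
relative = local_file.relative_to(project_root)
consumer_url = urljoin(base_url, relative.as_posix())
print(consumer_url)
```

If a consumer joins URLs this way, a --base-url without the trailing slash would resolve one directory too high, so it is worth checking before publishing.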

With --emit-config PATH, the command also writes a ready-to-use mlcroissant copick configuration JSON at PATH. Pair it with --config-overlay URL to embed a writable overlay (Mode B) so that visualization tools can write annotations without touching the source data. Without an overlay, the emitted config is Mode A (self-contained).
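
When the export is scripted, building the argument list in Python and producing the --config-overlay-fs-args value with json.dumps avoids hand-quoting JSON in the shell. This is a minimal sketch: the flags are the ones documented on this page, the emitted config path is hypothetical, and the SSH overlay values are taken from the option table's examples.

```python
import json
import shlex

# fsspec kwargs for the SSH overlay (values from the option table's example).
overlay_fs_args = {"host": "localhost", "port": 2222}

argv = [
    "copick", "config", "export-croissant",
    "--config", "my_project/filesystem.json",
    "--project-root", "my_project",
    "--base-url", "https://data.example.org/my_project/",
    # Hypothetical output path for the emitted mlcroissant copick config.
    "--emit-config", "my_project/copick_croissant.json",
    # Writable overlay (Mode B); drop the two overlay options to emit a Mode A config.
    "--config-overlay", "ssh:///remote/overlay",
    "--config-overlay-fs-args", json.dumps(overlay_fs_args),
]

# Print a shell-safe rendering of the command for review or logging.
print(shlex.join(argv))
```

To actually run the export, pass argv to subprocess.run(argv, check=True); that step requires copick to be installed and my_project to exist.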

Subset selection: any of --runs, --tomograms, --features, --picks, --meshes, --segmentations or --objects may be provided to restrict the export. The URI-based flags (--tomograms, --features, --picks, --meshes, --segmentations) follow copick's standard URI grammar and can be repeated to union multiple selectors; --runs and --objects take comma-separated names. An omitted flag applies no filter, so everything of that type is included. CDP-only reshape flags (--tomo-type-map, --object-name-map, --session-id-template, and the --*-portal-meta / --*-author filters) let you rename and filter portal-derived rows on the way out.
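
The reshape values are plain strings with simple formats. The sketch below shows what they express, under stated assumptions: --session-id-template is applied with Python's str.format to an annotation's metadata, and a rename map such as --object-name-map is a comma-separated list of old:new pairs. The metadata dictionary, the second object name (ferritin-complex) and the helper parse_rename_map are hypothetical illustrations, not copick's internal API.

```python
import shlex

def parse_rename_map(spec):
    """Hypothetical helper: 'old:new,old2:new2' -> {'old': 'new', 'old2': 'new2'}."""
    pairs = (item.split(":", 1) for item in spec.split(",") if item)
    return {old: new for old, new in pairs}

# Hypothetical portal annotation metadata for a single picks row.
annotation = {"method_type": "manual", "author": "Alice", "annotation_file_id": 123}

template = "{method_type}-{author}"
session_id = template.format(**annotation)  # str.format fills the named placeholders
print(session_id)

rename = parse_rename_map("cytosolic-ribosome:ribosome,ferritin-complex:ferritin")
print(rename)

# Repeated URI flags are unioned, so two --picks selectors export both objects.
argv = [
    "copick", "config", "export-croissant",
    "--source-dataset-ids", "10000",
    "--project-root", "/tmp/curated",
    "--picks", "cytosolic-ribosome:*/*",
    "--picks", "ferritin-complex:*/*",  # hypothetical second object name
    "--object-name-map", "cytosolic-ribosome:ribosome,ferritin-complex:ferritin",
    "--session-id-template", template,
]
print(shlex.join(argv))
```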

Options

Option Type Default Description
--config file Path to the input copick configuration file. Mutually exclusive with --source-dataset-ids.
--source-dataset-ids text Comma-separated CryoET Data Portal dataset IDs (e.g. '10000,10001'). Creates a temporary CDP config; mutually exclusive with --config.
--project-root directory required Copick project root directory; Croissant/ is written under it.
--force boolean flag False Overwrite an existing Croissant/metadata.json under --project-root.
--base-url text Absolute URL that resolves to --project-root at consumer read time. Required for filesystem sources; ignored for CDP sources (the common portal URL prefix is used).
--dataset-name text Dataset title for the Croissant.
--description text Dataset description.
--license text CC-BY-4.0 Dataset license.
--cite-as text "" Citation string.
--date-published text ISO date string (defaults to today).
--no-file-sha256 boolean flag False Skip computing sha256 checksums for picks/meshes (faster, but marks the output as non-strict).
--emit-config file Also write an mlcroissant copick config JSON at this path, pointing at the exported Croissant. Off by default.
--config-overlay text Overlay URL to embed in the emitted copick config (Mode B). Accepts any fsspec URL (e.g. 'ssh:///remote/overlay', 's3://bucket/overlay') or a bare local path. Only used when --emit-config is set. If omitted, the emitted config is Mode A (self-contained).
--config-overlay-fs-args text JSON object of fsspec kwargs for --config-overlay (e.g. '{"host":"localhost","port":2222}'). Local overlays add 'auto_mkdir=true' automatically unless overridden.
--config-static-fs-args text JSON object of fsspec kwargs for reaching the Croissant's base URL (data location) from the emitted copick config. Defaults to the source config's overlay_fs_args. Never written to the Croissant manifest itself, which stays credential-free for sharing.
--config-croissant-fs-args text JSON object of fsspec kwargs for reading the Croissant manifest itself from the emitted copick config. Defaults to empty (typical when --project-root is local).
--runs text Comma-separated run names to include. Omit to include all runs.
--tomograms text · multiple Copick URI to filter tomograms (e.g. 'wbp@10.0'). Repeatable. Omit to include all tomograms.
--features text · multiple Copick URI to filter features (e.g. 'wbp@10.0:sobel'). Repeatable. Omit to include all features.
--picks text · multiple Copick URI to filter picks (e.g. 'ribosome:*/*'). Repeatable. Omit to include all picks.
--meshes text · multiple Copick URI to filter meshes (e.g. 'ribosome:*/*'). Repeatable. Omit to include all meshes.
--segmentations text · multiple Copick URI to filter segmentations (e.g. 'membrane:*/*@10.0'). Repeatable. Omit to include all segmentations.
--objects text Comma-separated pickable object names to emit density maps for. Omit to include all objects.
--tomo-type-map text Rename tomo_type values at CSV emission time, e.g. 'wbp-raw:wbp,denoised-cryocare:denoised'.
--object-name-map text Rename object names at CSV emission time (applies to picks/meshes/segmentations/objects and copick:config.pickable_objects), e.g. 'cytosolic-ribosome:ribosome'.
--session-id-template text Python str.format template for synthesizing picks/segmentations session_id values from CDP annotation metadata (CDP-only). Placeholders: any scalar _PortalAnnotation field, plus {author}, {authors}, {annotation_file_id}.
--picks-portal-meta text Comma-separated k=v pairs filtering CDP picks by portal annotation metadata (e.g. 'ground_truth_status=true,method_type=manual'). CDP-only.
--picks-author text Comma-separated author names filtering CDP picks (e.g. 'Alice,Bob'). CDP-only.
--segmentations-portal-meta text Comma-separated k=v pairs filtering CDP segmentations by portal annotation metadata. CDP-only.
--segmentations-author text Comma-separated author names filtering CDP segmentations. CDP-only.
--tomograms-portal-meta text Comma-separated k=v pairs filtering CDP tomograms by portal tomogram metadata (e.g. 'reconstruction_method=wbp,ctf_corrected=true'). CDP-only.
--tomograms-author text Comma-separated author names filtering CDP tomograms. CDP-only.
--split text · multiple Assign runs to an ML split, e.g. 'train=TS_001,TS_002'. Repeatable. Standard names (train/val/validation/test/eval) map to the canonical cr:*Split URIs; custom names emit without a URI.
--splits-file file CSV with columns 'split' and 'run' providing split assignments. Combined with any --split flags; if a split name appears in both, the --split flag takes precedence.
--debug / --no-debug boolean flag False Enable debug logging.
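
A --splits-file is an ordinary CSV with a split column and a run column. The sketch below writes one with Python's csv module and then models the documented precedence rule, reading "CLI flags override duplicate split names" as: a --split value replaces the file's entry for the same split name. That reading, the run names TS_003 to TS_005, and the helpers load_splits_file, parse_split_flag and merge_splits are assumptions for illustration, not copick's implementation.

```python
import csv

# Write a splits file with the two documented columns, 'split' and 'run'.
rows = [("train", "TS_001"), ("train", "TS_002"), ("val", "TS_003"), ("test", "TS_004")]
with open("splits.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["split", "run"])
    writer.writerows(rows)

def load_splits_file(path):
    """Hypothetical helper: group run names by split name."""
    splits = {}
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            splits.setdefault(row["split"], []).append(row["run"])
    return splits

def parse_split_flag(value):
    """Hypothetical helper: 'train=TS_001,TS_002' -> ('train', ['TS_001', 'TS_002'])."""
    name, runs = value.split("=", 1)
    return name, [run for run in runs.split(",") if run]

def merge_splits(file_splits, split_flags):
    """Assumed precedence: a --split flag replaces the same-named split from the file."""
    merged = dict(file_splits)
    for value in split_flags:
        name, runs = parse_split_flag(value)
        merged[name] = runs
    return merged

merged = merge_splits(load_splits_file("splits.csv"), ["train=TS_001,TS_002,TS_005"])
print(merged)
```

On the command line, the same inputs would be passed as --splits-file splits.csv together with --split train=TS_001,TS_002,TS_005.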

Examples

# Export a filesystem project with an explicit consumer base URL
copick config export-croissant \
    --config my_project/filesystem.json \
    --project-root my_project \
    --base-url https://data.example.org/my_project/ \
    --dataset-name "My cryoET project" \
    --license CC-BY-4.0

# Export a subset: two runs, ribosome picks, 10 Å WBP tomograms
copick config export-croissant \
    --config my_project/filesystem.json \
    --project-root my_project \
    --base-url https://data.example.org/my_project/ \
    --runs TS_001,TS_002 \
    --tomograms "wbp@10.0" \
    --picks "ribosome:*/*"

# Export straight from portal datasets with CDP reshape transforms
copick config export-croissant \
    --source-dataset-ids 10000 \
    --project-root /tmp/curated \
    --picks "cytosolic-ribosome:*/*" \
    --object-name-map "cytosolic-ribosome:ribosome" \
    --session-id-template "{method_type}" \
    --picks-author "Alice"

See also