copick config export-croissant
core
Export a copick project to an mlcroissant manifest.
Usage
Description
Writes a Croissant metadata.json plus CSV sidecars under
<project-root>/Croissant/. The source project is loaded either from a
copick config (--config) or directly from CryoET Data Portal (CDP) dataset
IDs (--source-dataset-ids); the two are mutually exclusive. For filesystem
sources, --base-url is the absolute URL that resolves to --project-root
at consumer read time; for CDP sources it is ignored (the canonical portal
s3:// prefix is used).
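
As a rough illustration of what "resolves to --project-root at consumer read time" implies, the sketch below resolves a project-relative path against a base URL using Python's standard library. How copick itself joins paths is not stated here, so treat this as generic URL resolution rather than the tool's implementation. The base URL is the one used in the Examples section.

```python
from urllib.parse import urljoin

# Value passed to --base-url (same as in the Examples section).
base_url = "https://data.example.org/my_project/"

# The manifest location relative to --project-root, as described above.
relative_path = "Croissant/metadata.json"

# With a trailing slash, the relative path is appended to the base URL.
resolved = urljoin(base_url, relative_path)

# Without a trailing slash, standard URL resolution replaces the last
# path segment ("my_project"), which is usually not what is intended.
no_slash = urljoin(base_url.rstrip("/"), relative_path)

print(resolved)
print(no_slash)
```

Under standard URL resolution rules, a base URL ending in `/` is the safer choice when consumers resolve relative paths this way.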
With --emit-config PATH, the command also writes a ready-to-use mlcroissant
copick configuration JSON at PATH. Pair it with --config-overlay DIR to embed
a writable overlay (Mode B) so visualization tools can annotate without
touching the source data; without an overlay, the emitted config is
self-contained (Mode A).
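
The fsspec keyword arguments for the overlay are passed as a JSON object on the command line, which is easy to get wrong with shell quoting. A minimal sketch, assuming you assemble the command from Python: it builds the argument list from flags documented on this page and prints a safely quoted command line without running it. The --emit-config output path is hypothetical.

```python
import json
import shlex

# fsspec kwargs for the overlay; the keys mirror the example given in the
# --config-overlay-fs-args option description.
overlay_fs_args = {"host": "localhost", "port": 2222}

cmd = [
    "copick", "config", "export-croissant",
    "--config", "my_project/filesystem.json",
    "--project-root", "my_project",
    "--base-url", "https://data.example.org/my_project/",
    # Hypothetical output path for the emitted copick config.
    "--emit-config", "my_project/copick_croissant.json",
    "--config-overlay", "ssh:///remote/overlay",
    "--config-overlay-fs-args", json.dumps(overlay_fs_args),
]

# shlex.join quotes each argument, so the JSON reaches the CLI as one token.
print(shlex.join(cmd))
```

The same list can be handed to `subprocess.run(cmd, check=True)`, which bypasses the shell entirely.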
Subset selection: any of --runs / --tomograms / --features / --picks
/ --meshes / --segmentations / --objects may be provided to restrict
the export. URI-based flags follow copick's standard URI grammar and can be
repeated to union multiple selectors. Any omitted flag means "no filter,
include everything of that type". CDP-only reshape flags
(--tomo-type-map, --object-name-map, --session-id-template, the
--*-portal-meta / --*-author filters) let you rename and filter
portal-derived rows on the way out.
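
Because --session-id-template is described as a Python str.format template, its behavior can be previewed with plain Python. The sketch below is illustrative only: the metadata dictionary is hypothetical, and how copick handles an unknown placeholder is not stated on this page.

```python
# Hypothetical annotation metadata; the real placeholder names come from
# the portal annotation fields listed in the option description.
annotation = {
    "method_type": "manual",
    "author": "Alice",
    "annotation_file_id": 42,
}

# A template combining two of the documented placeholders.
template = "{method_type}-{author}"
session_id = template.format(**annotation)
print(session_id)

# In plain str.format, a placeholder with no matching field raises KeyError.
try:
    "{unknown_field}".format(**annotation)
except KeyError as exc:
    missing = exc.args[0]
    print(f"unknown placeholder: {missing}")
```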
Options
| Option | Type | Default | Description |
|---|---|---|---|
| `--config` | file | — | Path to the input copick configuration file. Mutually exclusive with --source-dataset-ids. |
| `--source-dataset-ids` | text | — | Comma-separated CryoET Data Portal dataset IDs (e.g. '10000,10001'). Creates a temporary CDP config; mutually exclusive with --config. |
| `--project-root` | directory | required | Copick project root directory; Croissant/ is written under this. |
| `--force` | boolean flag | False | Overwrite an existing Croissant/metadata.json under --project-root. |
| `--base-url` | text | — | Absolute URL that resolves to --project-root at consumer read time. Required for filesystem sources; ignored for CDP (common portal-URL prefix is used). |
| `--dataset-name` | text | — | Dataset title for the Croissant. |
| `--description` | text | — | Dataset description. |
| `--license` | text | CC-BY-4.0 | Dataset license. |
| `--cite-as` | text | "" | Citation string. |
| `--date-published` | text | — | ISO date string (defaults to today). |
| `--no-file-sha256` | boolean flag | False | Skip computing sha256 for picks/meshes (faster but marks output non-strict). |
| `--emit-config` | file | — | Also write an mlcroissant copick config JSON at this path, pointing at the exported Croissant. Off by default. |
| `--config-overlay` | text | — | Overlay URL to embed in the emitted copick config (Mode B). Accepts any fsspec URL (e.g. 'ssh:///remote/overlay', 's3://bucket/overlay') or a bare local path. Only used when --emit-config is set. If omitted, the emitted config is Mode A (self-contained). |
| `--config-overlay-fs-args` | text | — | JSON object of fsspec kwargs for --config-overlay (e.g. '{"host":"localhost","port":2222}'). Local overlays add 'auto_mkdir=true' automatically unless overridden. |
| `--config-static-fs-args` | text | — | JSON object of fsspec kwargs for reaching the Croissant's base URL (data location) from the emitted copick config. Defaults to the source config's overlay_fs_args. Never written to the Croissant manifest itself (kept credential-free for sharing). |
| `--config-croissant-fs-args` | text | — | JSON object of fsspec kwargs for reading the Croissant manifest itself from the emitted copick config. Defaults to empty (typical when --project-root is local). |
| `--runs` | text | — | Comma-separated run names to include. Omit to include all runs. |
| `--tomograms` | text · multiple | — | Copick URI to filter tomograms (e.g. 'wbp@10.0'). Repeatable. Omit to include all tomograms. |
| `--features` | text · multiple | — | Copick URI to filter features (e.g. 'wbp@10.0:sobel'). Repeatable. Omit to include all features. |
| `--picks` | text · multiple | — | Copick URI to filter picks (e.g. `ribosome:*/*`). Repeatable. Omit to include all picks. |
| `--meshes` | text · multiple | — | Copick URI to filter meshes (e.g. `ribosome:*/*`). Repeatable. Omit to include all meshes. |
| `--segmentations` | text · multiple | — | Copick URI to filter segmentations (e.g. `membrane:*/*@10.0`). Repeatable. Omit to include all segmentations. |
| `--objects` | text | — | Comma-separated pickable object names to emit density maps for. Omit to include all objects. |
| `--tomo-type-map` | text | — | Rename tomo_type values at CSV emission time, e.g. 'wbp-raw:wbp,denoised-cryocare:denoised'. |
| `--object-name-map` | text | — | Rename object names at CSV emission time (applies to picks/meshes/segmentations/objects and copick:config.pickable_objects), e.g. 'cytosolic-ribosome:ribosome'. |
| `--session-id-template` | text | — | Python str.format template for synthesizing picks/segmentations session_id values from CDP annotation metadata (CDP-only). Placeholders: any scalar _PortalAnnotation field, plus {author}, {authors}, {annotation_file_id}. |
| `--picks-portal-meta` | text | — | Comma-separated k=v pairs filtering CDP picks by portal annotation metadata (e.g. 'ground_truth_status=true,method_type=manual'). CDP-only. |
| `--picks-author` | text | — | Comma-separated author names filtering CDP picks (e.g. 'Alice,Bob'). CDP-only. |
| `--segmentations-portal-meta` | text | — | Comma-separated k=v pairs filtering CDP segmentations by portal annotation metadata. CDP-only. |
| `--segmentations-author` | text | — | Comma-separated author names filtering CDP segmentations. CDP-only. |
| `--tomograms-portal-meta` | text | — | Comma-separated k=v pairs filtering CDP tomograms by portal tomogram metadata (e.g. 'reconstruction_method=wbp,ctf_corrected=true'). CDP-only. |
| `--tomograms-author` | text | — | Comma-separated author names filtering CDP tomograms. CDP-only. |
| `--split` | text · multiple | — | Assign runs to an ML split, e.g. 'train=TS_001,TS_002'. Repeatable. Standard names (train/val/validation/test/eval) map to the canonical cr:*Split URIs; custom names emit without a URI. |
| `--splits-file` | file | — | CSV with columns 'split' and 'run' providing split assignments. Combined with any --split flags (the CLI flags override duplicate split names). |
| `--debug / --no-debug` | boolean flag | False | Enable debug logging. |
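
The --splits-file option expects a CSV with 'split' and 'run' columns. A minimal sketch of producing such a file from Python, assuming one row per run (the page names only the column headers, so the row layout is an assumption). The run name TS_003 is hypothetical.

```python
import csv
import io

# Split assignments, one row per run (assumed layout).
rows = [
    {"split": "train", "run": "TS_001"},
    {"split": "train", "run": "TS_002"},
    {"split": "val", "run": "TS_003"},  # hypothetical run name
]

# Write to an in-memory buffer; swap in open("splits.csv", "w", newline="")
# to write a real file.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["split", "run"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
print(csv_text)

# Read the CSV back and group runs by split to check the content.
by_split = {}
for row in csv.DictReader(io.StringIO(csv_text)):
    by_split.setdefault(row["split"], []).append(row["run"])
print(by_split)
```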
Examples
```bash
# Export a filesystem project with an explicit consumer base URL
copick config export-croissant \
    --config my_project/filesystem.json \
    --project-root my_project \
    --base-url https://data.example.org/my_project/ \
    --dataset-name "My cryoET project" \
    --license CC-BY-4.0

# Export a subset: two runs, ribosome picks, 10 A WBP tomograms
copick config export-croissant \
    --config my_project/filesystem.json \
    --project-root my_project \
    --base-url https://data.example.org/my_project/ \
    --runs TS_001,TS_002 \
    --tomograms "wbp@10.0" \
    --picks "ribosome:*/*"

# Export straight from portal datasets with CDP reshape transforms
copick config export-croissant \
    --source-dataset-ids 10000 \
    --project-root /tmp/curated \
    --picks "cytosolic-ribosome:*/*" \
    --object-name-map "cytosolic-ribosome:ribosome" \
    --session-id-template "{method_type}" \
    --picks-author "Alice"
```
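
After an export, a quick sanity check is to open the manifest and list what it declares. The sketch below uses only the standard library and assumes the manifest follows the usual Croissant JSON-LD layout, with a top-level "name" and a "recordSet" list. Which record sets copick actually emits is not specified on this page, and `summarize_manifest` is a hypothetical helper, not part of copick.

```python
import json
from pathlib import Path


def summarize_manifest(project_root):
    """Return (dataset name, record-set names) from Croissant/metadata.json.

    Assumes a Croissant-style JSON-LD document with a top-level "name"
    and a "recordSet" list whose entries carry a "name" key.
    """
    manifest_path = Path(project_root) / "Croissant" / "metadata.json"
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    record_sets = [rs.get("name") for rs in manifest.get("recordSet", [])]
    return manifest.get("name"), record_sets
```

For the first example above, `summarize_manifest("my_project")` would read `my_project/Croissant/metadata.json` and return the dataset name together with the names of the declared record sets.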
See also
- copick config append-croissant: union more filtered rows into an existing Croissant
- copick config set-splits: edit train/val/test split assignments after export
- copick config mlcroissant: build a copick config that reads the exported Croissant