Croissant
The mlcroissant backend is a concrete implementation of the abstract copick API. It reads the project structure from an mlcroissant JSON-LD manifest plus CSV sidecars stored under the project's Croissant/ subdirectory, and is defined in the copick.impl.mlcroissant module.
See the mlcroissant setup page for a user-facing walkthrough and the mlcroissant tutorial for a hands-on, end-to-end example.
Metadata Models
copick.impl.mlcroissant.CopickConfigMLCroissant
Bases: CopickConfig
Copick configuration for mlcroissant-backed storage.
Attributes:
- croissant_url (str) – URL or path to the Croissant metadata.json.
- croissant_base_url (Optional[str]) – Optional override for copick:baseUrl, used when the dataset has been moved from its published location.
- overlay_root (Optional[str]) – Optional writable overlay. When provided, the Croissant is treated as read-only and writes land in the overlay (Mode B). When omitted, the Croissant's base URL is used as the write target (Mode A).
- overlay_fs_args (Optional[Dict[str, Any]]) – Extra fsspec kwargs for the overlay filesystem.
- croissant_fs_args (Optional[Dict[str, Any]]) – Extra fsspec kwargs for fetching metadata.json and the CSVs.
- static_fs_args (Optional[Dict[str, Any]]) – Extra fsspec kwargs for resolving data URLs against the Croissant's base_url. Mirrors CopickConfigFSSpec.static_fs_args: the Croissant's base_url plays the same role as static_root in the filesystem backend (a read-only shared data location). These kwargs are kept out of the Croissant manifest itself so that shared artifacts stay credential-free; consumers supply them in their local copick config.
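The two write modes can be illustrated with plain dicts. The sketch below uses only the attribute names listed above; the remaining CopickConfig fields (project name, pickable objects, config type) are deliberately left out, and write_target is a hypothetical helper, not part of copick, that spells out which location would receive writes in each mode.

```python
from typing import Any, Dict, Optional, Tuple

# Only the mlcroissant-specific attributes documented above; a real copick
# config also needs the base CopickConfig fields, which are omitted here.
mode_a_config: Dict[str, Any] = {
    "croissant_url": "/data/project/Croissant/metadata.json",
}
mode_b_config: Dict[str, Any] = {
    "croissant_url": "https://example.org/project/Croissant/metadata.json",
    "overlay_root": "/scratch/me/overlay",
    "overlay_fs_args": {"auto_mkdir": True},
}


def write_target(config: Dict[str, Any], manifest_base_url: str) -> Tuple[str, str]:
    """Hypothetical helper: return (mode, location that receives writes).

    manifest_base_url stands for the copick:baseUrl read from metadata.json.
    """
    overlay_root: Optional[str] = config.get("overlay_root")
    if overlay_root is not None:
        # Mode B: the Croissant is read-only; writes land in the overlay.
        return "B", overlay_root
    # Mode A: the Croissant's base URL (or its override) is the write target.
    return "A", config.get("croissant_base_url") or manifest_base_url
```

For example, write_target(mode_a_config, "/data/project") returns ("A", "/data/project"), while the Mode B config routes writes to "/scratch/me/overlay".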
copick.impl.mlcroissant.CroissantIndex
In-memory index of a Croissant manifest + its CSV sidecars.
Holds both the read side (materialised rows per artifact type) and the write-tracking state (dirty CSVs, auto-commit toggle).
from_url
from_url(croissant_url: str, *, base_url_override: Optional[str] = None, fs_args: Optional[Dict[str, Any]] = None, static_fs_args: Optional[Dict[str, Any]] = None) -> CroissantIndex
Fetch and parse the Croissant manifest at croissant_url.
fs_args are applied when reading metadata.json and the CSV sidecars.
static_fs_args are stored on the index and applied when resolving
data URLs against base_url via resolve_url().
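The read path of from_url can be approximated with the standard library for a local project. This is a sketch under two assumptions: the CSV sidecars are listed as Croissant FileObject entries in the manifest's distribution array, and each contentUrl is relative to the Croissant/ directory. load_croissant_sketch is hypothetical; the real method goes through fsspec (with fs_args) and may organise the manifest differently.

```python
import csv
import json
import os
from typing import Any, Dict, List, Tuple


def load_croissant_sketch(
    manifest_path: str,
) -> Tuple[Dict[str, Any], Dict[str, List[Dict[str, str]]]]:
    """Parse metadata.json and every CSV FileObject it lists (local files only)."""
    with open(manifest_path, encoding="utf-8") as f:
        manifest = json.load(f)

    croissant_dir = os.path.dirname(os.path.abspath(manifest_path))
    rows: Dict[str, List[Dict[str, str]]] = {}
    for entry in manifest.get("distribution", []):
        if entry.get("encodingFormat") != "text/csv":
            continue  # skip non-CSV distribution entries
        # Assumption: contentUrl is relative to the Croissant/ directory.
        csv_path = os.path.join(croissant_dir, entry["contentUrl"])
        with open(csv_path, newline="", encoding="utf-8") as f:
            rows[entry.get("@id", entry["contentUrl"])] = list(csv.DictReader(f))
    return manifest, rows
```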
resolve_url
Resolve a CSV url column value into (fsspec_fs, absolute_path).
self.static_fs_args is merged beneath the caller-supplied fs_args,
so a key passed in fs_args overrides the stored value for that call only.
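The merge order described above amounts to a dict merge in which per-call arguments are applied last. resolve_url_sketch below is hypothetical: it returns a protocol string and the merged kwargs instead of an fsspec filesystem, and it assumes relative url values are joined onto base_url with a slash.

```python
import posixpath
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlsplit


def resolve_url_sketch(
    base_url: str,
    url: str,
    static_fs_args: Dict[str, Any],
    fs_args: Optional[Dict[str, Any]] = None,
) -> Tuple[str, str, Dict[str, Any]]:
    """Return (protocol, absolute path, merged fsspec kwargs) for a CSV url value."""
    if "://" in url or url.startswith("/"):
        absolute = url  # already absolute: use as-is
    else:
        absolute = posixpath.join(base_url.rstrip("/"), url)
    protocol = urlsplit(absolute).scheme or "file"
    # Stored static_fs_args go underneath; per-call fs_args win on conflicts.
    merged = {**static_fs_args, **(fs_args or {})}
    return protocol, absolute, merged
```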
commit
Write dirty CSVs and update metadata.json.
On the local filesystem, each file is written to a temporary file and
atomically renamed into place. On remote filesystems, commit relies on
whatever atomicity fsspec's pipe / open("wb") provides (best effort only).
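The local-filesystem half of commit relies on the classic write-to-temp-then-rename pattern. A minimal, hypothetical helper is shown below; in this sketch each file is replaced individually, so atomicity holds per file rather than across a whole commit.

```python
import os
import tempfile


def atomic_write_text(path: str, text: str) -> None:
    """Replace path with text so readers never observe a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the final rename stays on
    # one filesystem, which is what makes os.replace atomic on POSIX.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes hit disk before the rename
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # do not leave stray temp files behind
        raise
```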
reload
Re-read metadata.json and CSVs from disk.
Use this to pick up changes made by another process or another root instance.
Any unflushed dirty state is discarded, so callers relying on
batch()-deferred commits should call commit() before reload().
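The interaction between auto-commit, batch() and reload() is easiest to see in a toy model. DirtyTrackingSketch is hypothetical and only borrows the method names used in this section; a dict stands in for the CSV files on disk.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, Set


class DirtyTrackingSketch:
    """Toy model (not the copick API) of dirty-table tracking with batch()."""

    def __init__(self, on_disk: Dict[str, str]) -> None:
        self._disk = on_disk          # stands in for the CSV files on disk
        self.tables = dict(on_disk)   # in-memory copy that writes modify
        self.dirty: Set[str] = set()
        self._auto_commit = True

    def write(self, table: str, value: str) -> None:
        self.tables[table] = value
        self.dirty.add(table)
        if self._auto_commit:
            self.commit()

    def commit(self) -> None:
        for table in self.dirty:
            self._disk[table] = self.tables[table]
        self.dirty.clear()

    def reload(self) -> None:
        # Re-read from "disk"; anything not yet committed is thrown away.
        self.tables = dict(self._disk)
        self.dirty.clear()

    @contextmanager
    def batch(self) -> Iterator[None]:
        # Suspend auto-commit; on error, pending writes simply stay dirty.
        previous, self._auto_commit = self._auto_commit, False
        try:
            yield
        finally:
            self._auto_commit = previous
        if previous:
            self.commit()  # only the outermost batch flushes, once
```

Calling reload() inside a batch discards the pending write, which is why deferred work should be committed first.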
get_split
Return the split for run_name, or None if unassigned.
set_split
Assign split to run_name (empty string / None clears).
Raises KeyError if run_name is not in the index. Marks
copick/runs dirty and honours _auto_commit.
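A hypothetical in-memory model of the get_split / set_split contract: unassigned runs report None, an empty string or None clears a split, and unknown runs raise KeyError. How the real index treats get_split for an unknown run name is not stated above; this sketch simply returns None.

```python
from typing import Dict, Iterable, Optional


class SplitTableSketch:
    """Toy model (not the copick API) of per-run split assignment."""

    def __init__(self, run_names: Iterable[str]) -> None:
        # run name -> split name; "" means "unassigned"
        self.splits: Dict[str, str] = {name: "" for name in run_names}

    def get_split(self, run_name: str) -> Optional[str]:
        # Unassigned runs report None rather than "".
        return self.splits.get(run_name) or None

    def set_split(self, run_name: str, split: Optional[str]) -> None:
        if run_name not in self.splits:
            raise KeyError(run_name)
        self.splits[run_name] = split or ""  # "" and None both clear
```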
Data Entities
copick.impl.mlcroissant.CopickRootMLC
Bases: CopickRoot
Copick root backed by an mlcroissant manifest.
Mode A (self-contained): overlay_root is None and copick:baseUrl is
writable. Writes go to the project tree and auto-sync to the CSVs.
Mode B (remote + overlay): overlay_root is set and the Croissant is
read-only. Writes go to the overlay only; the Croissant is not updated.
static_is_overlay
Whether the Croissant base URL and the overlay point to the same location.
True in Mode A by construction (overlay_base_url is None and
fs_overlay is built from base_url) and in Mode B when the
configured overlay_root resolves to the same path as the Croissant's
base_url. Query methods consult this to avoid returning each
artifact twice (once from the CSV index, once from the overlay glob).
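The deduplication rule behind static_is_overlay can be sketched for local paths as follows. Both functions are hypothetical; the real property may resolve locations differently (for example through fsspec), and this sketch arbitrarily keeps the CSV-indexed copy when the two locations coincide.

```python
import posixpath
from typing import Iterable, List, Optional


def _normalize(location: str) -> str:
    # Collapse "." / ".." segments and trailing slashes so equal paths compare equal.
    return posixpath.normpath(location)


def static_is_overlay_sketch(base_url: str, overlay_root: Optional[str]) -> bool:
    if overlay_root is None:
        return True  # Mode A: the overlay filesystem is built from base_url
    return _normalize(base_url) == _normalize(overlay_root)


def query_sketch(csv_hits: Iterable[str], overlay_hits: Iterable[str], same_location: bool) -> List[str]:
    results = list(csv_hits)
    if not same_location:
        # Distinct locations: overlay artifacts are genuinely additional.
        results.extend(overlay_hits)
    return results
```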
from_file
Initialise from a copick config JSON (not the Croissant itself).
refresh
Reload the Croissant index from disk and reset child caches.
Each CopickRootMLC maintains its own in-memory Croissant
index for performance. When another process (or another root
instance in the same process) has modified the project, e.g. after
an in-process copick sync CLI invocation, call refresh()
on the original root to pick up those changes.
get_runs_in_split
Return all runs currently assigned to split_name.
set_splits
Bulk-assign splits. mapping is {split: iterable_of_run_names}.
When clear_existing is True, every run's split is cleared first
so the final state matches mapping exactly. Otherwise the
existing splits are preserved for runs not mentioned in mapping.
All writes coalesce into a single commit via batch().
clear_splits
Clear split assignment for runs (iterable) or for every run
if runs is None. All writes coalesce into a single commit.
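The final state produced by set_splits and clear_splits can be modelled as pure functions over a {run_name: split} table, with "" meaning unassigned. These functions are hypothetical and compute the whole result in memory, which loosely mirrors the single coalesced commit; raising KeyError for unknown runs in set_splits_sketch is an assumption borrowed from set_split.

```python
from typing import Dict, Iterable, Optional


def set_splits_sketch(
    splits: Dict[str, str],
    mapping: Dict[str, Iterable[str]],
    clear_existing: bool = False,
) -> Dict[str, str]:
    """Return the run -> split table after a bulk assignment."""
    # With clear_existing, every run starts unassigned so the result matches mapping exactly.
    result = {run: ("" if clear_existing else split) for run, split in splits.items()}
    for split_name, run_names in mapping.items():
        for run in run_names:
            if run not in result:
                raise KeyError(run)  # assumption: same behaviour as set_split
            result[run] = split_name
    return result


def clear_splits_sketch(splits: Dict[str, str], runs: Optional[Iterable[str]] = None) -> Dict[str, str]:
    """Clear the given runs, or every run when runs is None."""
    targets = set(splits) if runs is None else set(runs)
    return {run: ("" if run in targets else split) for run, split in splits.items()}
```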
copick.impl.mlcroissant.CopickObjectMLC
Bases: CopickObjectOverlay
copick.impl.mlcroissant.CopickRunMLC
Bases: CopickRunOverlay
copick.impl.mlcroissant.CopickPicksMLC
copick.impl.mlcroissant.CopickMeshMLC
Bases: CopickMeshOverlay
copick.impl.mlcroissant.CopickSegmentationMLC
Bases: CopickSegmentationOverlay
copick.impl.mlcroissant.CopickVoxelSpacingMLC
Bases: CopickVoxelSpacingOverlay
copick.impl.mlcroissant.CopickTomogramMLC
Bases: CopickTomogramOverlay
copick.impl.mlcroissant.CopickFeaturesMLC
Bases: CopickFeaturesOverlay
Exporter
copick.ops.croissant.export_croissant
export_croissant(root: CopickRoot, project_root: str, *, base_url: Optional[str] = None, dataset_name: Optional[str] = None, description: Optional[str] = None, license: Optional[str] = None, cite_as: Optional[str] = None, date_published: Optional[str] = None, validate: bool = True, compute_file_sha256: bool = True, force: bool = False, runs: Optional[Iterable[str]] = None, tomograms: Optional[Iterable[str]] = None, features: Optional[Iterable[str]] = None, picks: Optional[Iterable[str]] = None, meshes: Optional[Iterable[str]] = None, segmentations: Optional[Iterable[str]] = None, objects: Optional[Iterable[str]] = None, tomo_type_map: Optional[Dict[str, str]] = None, object_name_map: Optional[Dict[str, str]] = None, session_id_template: Optional[str] = None, picks_portal_meta: Optional[Dict[str, Any]] = None, picks_author: Optional[Iterable[str]] = None, segmentations_portal_meta: Optional[Dict[str, Any]] = None, segmentations_author: Optional[Iterable[str]] = None, tomograms_portal_meta: Optional[Dict[str, Any]] = None, tomograms_author: Optional[Iterable[str]] = None, splits: Optional[Dict[str, Any]] = None) -> str
Export root to <project_root>/Croissant/.
Parameters:
- root (CopickRoot) – A loaded copick project (filesystem, CDP, or mlcroissant source).
- project_root (str) – Absolute path or URL of the copick project root. The exporter writes <project_root>/Croissant/metadata.json plus the CSV sidecars.
- base_url (Optional[str], default: None) – Required for filesystem sources: an absolute URL that resolves to project_root at consumer-read time. Ignored for CDP sources, where the common portal-URL prefix is used instead.
- dataset_name (Optional[str], default: None) – Dataset title (defaults to root.config.name).
- description (Optional[str], default: None) – Dataset description.
- license (Optional[str], default: None) – Dataset license string.
- cite_as (Optional[str], default: None) – Citation.
- date_published (Optional[str], default: None) – ISO date string. Defaults to today.
- validate (bool, default: True) – Run the Croissant validator after assembly. Raises on errors.
- compute_file_sha256 (bool, default: True) – Compute a sha256 checksum for each picks JSON and mesh GLB file (O(N) reads).
- runs (Optional[Iterable[str]], default: None) – Optional iterable of run names to include. If None (default), every run is exported. Names that don't exist in root are silently skipped.
- tomograms (Optional[Iterable[str]], default: None) – Optional iterable of copick URIs (e.g. "wbp@10.0") used to filter tomograms. Each URI is resolved via copick.util.uri.resolve_copick_objects and the results are unioned. None means no filter.
- features (Optional[Iterable[str]], default: None) – Optional iterable of copick URIs (e.g. "wbp@10.0:sobel").
- picks (Optional[Iterable[str]], default: None) – Optional iterable of copick URIs (e.g. "ribosome:*/*").
- meshes (Optional[Iterable[str]], default: None) – Optional iterable of copick URIs (e.g. "ribosome:*/*").
- segmentations (Optional[Iterable[str]], default: None) – Optional iterable of copick URIs (e.g. "membrane:*/*@10.0").
- objects (Optional[Iterable[str]], default: None) – Optional iterable of pickable-object names to include in the object density map CSV. copick:config.pickable_objects is unaffected.
- force (bool, default: False) – When True, overwrite an existing Croissant/metadata.json under project_root. When False (default) and a manifest already exists, raise FileExistsError instead of clobbering it.
- tomo_type_map (Optional[Dict[str, str]], default: None) – Optional {src_tomo_type: dst_tomo_type} remap applied to the tomo_type columns of tomograms.csv and features.csv at emission time. Applies to every source type.
- object_name_map (Optional[Dict[str, str]], default: None) – Optional {src: dst} remap applied to object_name in picks, meshes and segmentations, to the name column of objects.csv, and to copick:config.pickable_objects[].name. Renamed pickable objects carry the original portal name in metadata["portal_original_name"]. Applies to every source type.
- session_id_template (Optional[str], default: None) – Python str.format template for synthesizing picks / segmentations session_id values from CDP annotation metadata. Placeholders are any scalar field of _PortalAnnotation plus {author}, {authors} and {annotation_file_id}. CDP-only; raises on non-CDP sources.
- picks_portal_meta (Optional[Dict[str, Any]], default: None) – Dict passed to run.get_picks(portal_meta_query=...). CDP-only.
- picks_author (Optional[Iterable[str]], default: None) – List passed to run.get_picks(portal_author_query=...). CDP-only.
- segmentations_portal_meta (Optional[Dict[str, Any]], default: None) – Same as picks_portal_meta, for segmentations. CDP-only.
- segmentations_author (Optional[Iterable[str]], default: None) – Same as picks_author, for segmentations. CDP-only.
- tomograms_portal_meta (Optional[Dict[str, Any]], default: None) – Same as picks_portal_meta, for tomograms. CDP-only.
- tomograms_author (Optional[Iterable[str]], default: None) – Same as picks_author, for tomograms. CDP-only.
Returns:
- str – The path to the written metadata.json.
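The emission-time remaps (tomo_type_map, object_name_map) leave the source project untouched and only rewrite what is written out. The helpers below are a hypothetical sketch of that idea on plain dicts; in this sketch the pre-rename name is recorded in metadata["portal_original_name"] for every renamed object, whereas the text above describes it as the original portal name.

```python
from typing import Any, Dict, List, Optional


def remap_pickable_objects(
    objects: List[Dict[str, Any]], object_name_map: Optional[Dict[str, str]]
) -> List[Dict[str, Any]]:
    """Apply an object_name_map to pickable-object dicts at emission time."""
    mapping = object_name_map or {}
    out = []
    for obj in objects:
        obj = dict(obj)  # shallow copy: never mutate the caller's config
        src = obj["name"]
        if src in mapping:
            obj["name"] = mapping[src]
            metadata = dict(obj.get("metadata") or {})
            metadata["portal_original_name"] = src  # remember the pre-rename name
            obj["metadata"] = metadata
        out.append(obj)
    return out


def remap_column(
    rows: List[Dict[str, str]], column: str, mapping: Optional[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Rewrite one CSV column (e.g. tomo_type); unmapped values pass through."""
    mapping = mapping or {}
    return [{**row, column: mapping.get(row[column], row[column])} for row in rows]
```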