
Croissant

The mlcroissant backend is a concrete implementation of the abstract copick API. It reads project structure from an mlcroissant JSON-LD manifest plus CSV sidecars stored under the Croissant/ subdirectory of the project, and is defined in the copick.impl.mlcroissant module.

See mlcroissant setup for a user-facing walkthrough and mlcroissant tutorial for a hands-on end-to-end example.

Metadata Models

copick.impl.mlcroissant.CopickConfigMLCroissant

Bases: CopickConfig

Copick configuration for mlcroissant-backed storage.

Attributes:

  • croissant_url (str) –

    URL / path to the Croissant metadata.json.

  • croissant_base_url (Optional[str]) –

    Optional override for copick:baseUrl. Used when the dataset has been moved from its published location.

  • overlay_root (Optional[str]) –

    Optional writable overlay. When provided, the Croissant is treated as read-only and writes land in the overlay (Mode B). When omitted, the Croissant's base URL is used as the write target (Mode A).

  • overlay_fs_args (Optional[Dict[str, Any]]) –

    Extra fsspec kwargs for the overlay filesystem.

  • croissant_fs_args (Optional[Dict[str, Any]]) –

    Extra fsspec kwargs for fetching metadata.json and CSVs.

  • static_fs_args (Optional[Dict[str, Any]]) –

    Extra fsspec kwargs for resolving data URLs against the Croissant's base_url. Mirrors CopickConfigFSSpec.static_fs_args: the Croissant's base_url plays the same role as static_root in the filesystem backend (a read-only shared data location). These kwargs are kept out of the Croissant manifest itself so that shared artifacts stay credential-free; consumers supply them in their local copick config.
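As a rough illustration of how these attributes combine, the sketch below builds one configuration dictionary per write mode. Only the attribute names come from the reference above. The paths, the bucket name, the `anon` option, and the helper `write_mode` are hypothetical, and a real copick config file carries further fields that are omitted here.

```python
from typing import Any, Dict

# Mode A: no overlay, so the Croissant's base URL is the write target.
mode_a: Dict[str, Any] = {
    "croissant_url": "/data/project/Croissant/metadata.json",  # hypothetical path
}

# Mode B: read-only published Croissant plus a writable local overlay.
mode_b: Dict[str, Any] = {
    "croissant_url": "s3://example-bucket/project/Croissant/metadata.json",  # hypothetical
    "croissant_fs_args": {"anon": True},   # used when fetching metadata.json and CSVs
    "static_fs_args": {"anon": True},      # used when resolving data URLs
    "overlay_root": "/home/user/overlay",  # hypothetical writable location
}


def write_mode(config: Dict[str, Any]) -> str:
    """Return "A" when overlay_root is omitted, "B" when it is provided."""
    return "B" if config.get("overlay_root") else "A"
```

Keeping credentials in `croissant_fs_args` and `static_fs_args` rather than in the manifest means the same published Croissant can be shared while each consumer supplies their own access options.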


copick.impl.mlcroissant.CroissantIndex

In-memory index of a Croissant manifest + its CSV sidecars.

Holds both the read side (materialised rows per artifact type) and the write-tracking state (dirty CSVs, auto-commit toggle).

from_url

from_url(croissant_url: str, *, base_url_override: Optional[str] = None, fs_args: Optional[Dict[str, Any]] = None, static_fs_args: Optional[Dict[str, Any]] = None) -> CroissantIndex

Fetch and parse the Croissant manifest at croissant_url.

fs_args are applied when reading metadata.json + CSV sidecars. static_fs_args are stored on the index and applied when resolving data URLs against base_url via resolve_url().

resolve_url

resolve_url(url_value: str, **fs_args) -> Tuple[AbstractFileSystem, str]

Resolve a CSV url column value into (fsspec_fs, absolute_path).

self.static_fs_args are merged beneath caller-supplied fs_args so callers can still override on a per-call basis.
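The "merged beneath" behaviour can be pictured as a plain dictionary update in which per-call values win. This is a minimal sketch of that precedence rule only, not the library's code; the function name `merge_fs_args` and the example keys are assumptions.

```python
from typing import Any, Dict


def merge_fs_args(static_fs_args: Dict[str, Any], **fs_args: Any) -> Dict[str, Any]:
    """Layer per-call fs_args on top of the stored static_fs_args."""
    merged = dict(static_fs_args)  # copy so the stored defaults are not mutated
    merged.update(fs_args)         # caller-supplied values win on key conflicts
    return merged


stored = {"anon": True, "default_block_size": 1024}  # hypothetical stored defaults
merged = merge_fs_args(stored, anon=False)           # per-call override of one key
```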

commit

commit() -> None

Write dirty CSVs and update metadata.json.

Uses atomic temp-file-and-rename on the local filesystem. For remote filesystems, relies on fsspec's pipe/open("wb") atomicity (best effort).
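For readers unfamiliar with the temp-file-and-rename pattern, the following is a minimal local-filesystem sketch using only the standard library. It illustrates the general technique, not the actual body of commit(); the helper name `atomic_write_text` is hypothetical.

```python
import os
import tempfile


def atomic_write_text(path: str, text: str) -> None:
    """Write text to path so readers never observe a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)     # swap the finished file into place
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A reader that opens the target while a write is in progress sees either the old contents or the new contents, never a partial file.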

reload

reload() -> None

Re-read metadata.json and CSVs from disk.

Use this to pick up changes made by another process / root instance. Any unflushed dirty state is discarded — callers relying on batch()-deferred commits should commit() before reload().

get_split

get_split(run_name: str) -> Optional[str]

Return the split for run_name, or None if unassigned.

set_split

set_split(run_name: str, split: Optional[str]) -> None

Assign split to run_name (empty string / None clears).

Raises KeyError if run_name is not in the index. Marks copick/runs dirty; honours _auto_commit.

clear_split

clear_split(run_name: str) -> None

Clear the split assignment for run_name.

get_all_splits

get_all_splits() -> Dict[str, List[str]]

Return {split_name: [sorted run names]} from the current index.
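The shape of the returned mapping can be sketched as an inversion of a per-run split table. The sketch assumes that unassigned runs (None or empty string) are left out of the result, which the reference above does not state explicitly; `invert_splits` and the run names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def invert_splits(run_to_split: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group run names by split, skipping runs with no split assigned."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for run_name, split in run_to_split.items():
        if split:  # None and "" are both treated as unassigned (assumption)
            grouped[split].append(run_name)
    return {split: sorted(names) for split, names in grouped.items()}


assignments = {"TS_003": "val", "TS_002": "train", "TS_001": "train", "TS_004": None}
splits = invert_splits(assignments)
```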

Data Entities

copick.impl.mlcroissant.CopickRootMLC

Bases: CopickRoot

Copick root backed by an mlcroissant manifest.

Mode A (self-contained): overlay_root is None and copick:baseUrl is writable. Writes go to the project tree and auto-sync to the CSVs.

Mode B (remote + overlay): overlay_root is set and the Croissant is read-only. Writes go to the overlay only; the Croissant is not updated.

static_is_overlay

static_is_overlay: bool

Whether the Croissant base URL and the overlay point to the same location.

True in Mode A by construction (overlay_base_url is None and fs_overlay is built from base_url) and in Mode B when the configured overlay_root resolves to the same path as the Croissant's base_url. Query methods consult this to avoid returning each artifact twice (once from the CSV index, once from the overlay glob).

splits

splits: Dict[str, List[str]]

Return {split_name: [run names]} from the Croissant index.

from_file

from_file(path: str) -> CopickRootMLC

Initialise from a copick config JSON (not the Croissant itself).

sync

sync() -> None

Flush any dirty CSVs and rewrite metadata.json.

refresh

refresh() -> None

Reload the Croissant index from disk and reset child caches.

Each CopickRootMLC maintains its own in-memory Croissant index for performance. When another process (or another root instance in the same process) has modified the project, for example after an in-process copick sync CLI invocation, call refresh() on the original root to pick up those changes.

get_runs_in_split

get_runs_in_split(split_name: str) -> List[CopickRunMLC]

Return all runs currently assigned to split_name.

set_splits

set_splits(mapping: Dict[str, Any], *, clear_existing: bool = False) -> None

Bulk-assign splits. mapping is {split: iterable_of_run_names}.

When clear_existing is True, every run's split is cleared first so the final state matches mapping exactly. Otherwise the existing splits are preserved for runs not mentioned in mapping. All writes coalesce into a single commit via batch().
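The difference between the two clear_existing settings can be shown on a plain run-to-split table. This is a sketch of the described semantics only; `apply_splits` is a hypothetical helper, and raising KeyError for unknown runs is an assumption carried over from set_split.

```python
from typing import Dict, Iterable, Optional


def apply_splits(
    current: Dict[str, Optional[str]],
    mapping: Dict[str, Iterable[str]],
    clear_existing: bool = False,
) -> Dict[str, Optional[str]]:
    """Return the run -> split table after a bulk assignment."""
    # Either start from a blank slate or keep whatever is already assigned.
    result = {run: None for run in current} if clear_existing else dict(current)
    for split, run_names in mapping.items():
        for run_name in run_names:
            if run_name not in result:
                raise KeyError(run_name)  # assumption: unknown runs are rejected
            result[run_name] = split
    return result


before = {"TS_001": "train", "TS_002": "train", "TS_003": None}
kept = apply_splits(before, {"val": ["TS_002"]})
exact = apply_splits(before, {"val": ["TS_002"]}, clear_existing=True)
```

With the default, TS_001 keeps its "train" assignment; with clear_existing=True it ends up unassigned because it is not mentioned in the mapping.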

clear_splits

clear_splits(runs: Optional[Any] = None) -> None

Clear split assignment for runs (iterable) or for every run if runs is None. All writes coalesce into a single commit.

batch

batch()

Context manager that defers commits until exit.

Usage

    with root.batch():
        for ...:
            run.new_picks(...).store()
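The deferral idea behind batch() can be sketched with contextlib on a toy index that tracks dirty tables and an auto-commit flag. `TinyIndex` is a hypothetical stand-in, not the CroissantIndex class, and committing on exit even when the block raises is an assumption of this sketch.

```python
from contextlib import contextmanager
from typing import Iterator, Set


class TinyIndex:
    """Hypothetical stand-in for an index with dirty tracking and auto-commit."""

    def __init__(self) -> None:
        self.dirty: Set[str] = set()
        self.auto_commit = True
        self.commit_count = 0

    def mark_dirty(self, name: str) -> None:
        self.dirty.add(name)
        if self.auto_commit:
            self.commit()

    def commit(self) -> None:
        if self.dirty:
            self.commit_count += 1  # a real index would write the CSVs here
            self.dirty.clear()

    @contextmanager
    def batch(self) -> Iterator[None]:
        previous = self.auto_commit
        self.auto_commit = False  # suppress per-write commits inside the block
        try:
            yield
        finally:
            self.auto_commit = previous
            if previous:  # only the outermost batch flushes
                self.commit()


index = TinyIndex()
with index.batch():
    for _ in range(3):
        index.mark_dirty("copick/runs")
```

Three writes inside the block produce one commit on exit, which is the coalescing effect described for set_splits and clear_splits.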


copick.impl.mlcroissant.CopickObjectMLC


copick.impl.mlcroissant.CopickRunMLC


copick.impl.mlcroissant.CopickPicksMLC

Bases: CopickPicksOverlay

Picks file backed by an mlcroissant manifest (static) + optional overlay.


copick.impl.mlcroissant.CopickMeshMLC


copick.impl.mlcroissant.CopickSegmentationMLC


copick.impl.mlcroissant.CopickVoxelSpacingMLC


copick.impl.mlcroissant.CopickTomogramMLC


copick.impl.mlcroissant.CopickFeaturesMLC

Exporter

copick.ops.croissant.export_croissant

export_croissant(root: CopickRoot, project_root: str, *, base_url: Optional[str] = None, dataset_name: Optional[str] = None, description: Optional[str] = None, license: Optional[str] = None, cite_as: Optional[str] = None, date_published: Optional[str] = None, validate: bool = True, compute_file_sha256: bool = True, force: bool = False, runs: Optional[Iterable[str]] = None, tomograms: Optional[Iterable[str]] = None, features: Optional[Iterable[str]] = None, picks: Optional[Iterable[str]] = None, meshes: Optional[Iterable[str]] = None, segmentations: Optional[Iterable[str]] = None, objects: Optional[Iterable[str]] = None, tomo_type_map: Optional[Dict[str, str]] = None, object_name_map: Optional[Dict[str, str]] = None, session_id_template: Optional[str] = None, picks_portal_meta: Optional[Dict[str, Any]] = None, picks_author: Optional[Iterable[str]] = None, segmentations_portal_meta: Optional[Dict[str, Any]] = None, segmentations_author: Optional[Iterable[str]] = None, tomograms_portal_meta: Optional[Dict[str, Any]] = None, tomograms_author: Optional[Iterable[str]] = None, splits: Optional[Dict[str, Any]] = None) -> str

Export root to <project_root>/Croissant/.

Parameters:

  • root (CopickRoot) –

    A loaded copick project (filesystem, CDP, or mlcroissant source).

  • project_root (str) –

    Absolute path / URL of the copick project root. The exporter writes <project_root>/Croissant/metadata.json + CSVs.

  • base_url (Optional[str], default: None ) –

    Required for filesystem sources; absolute URL that resolves to project_root at consumer-read time. Ignored for CDP sources (common portal-URL prefix is used instead).

  • dataset_name (Optional[str], default: None ) –

    Dataset title (defaults to root.config.name).

  • description (Optional[str], default: None ) –

    Dataset description.

  • license (Optional[str], default: None ) –

    Dataset license string.

  • cite_as (Optional[str], default: None ) –

    Citation.

  • date_published (Optional[str], default: None ) –

    ISO date string. Defaults to today.

  • validate (bool, default: True ) –

    Run the Croissant validator after assembly. Raises on errors.

  • compute_file_sha256 (bool, default: True ) –

    Compute sha256 per picks JSON / mesh GLB (O(N) reads).

  • runs (Optional[Iterable[str]], default: None ) –

    Optional iterable of run names to include. If None (default), every run is exported. Names that don't exist in root are silently skipped.

  • tomograms (Optional[Iterable[str]], default: None ) –

    Optional iterable of copick URIs (e.g. "wbp@10.0") to filter tomograms. Each URI is resolved via copick.util.uri.resolve_copick_objects and the results are unioned. None means no filter.

  • features (Optional[Iterable[str]], default: None ) –

    Optional iterable of copick URIs (e.g. "wbp@10.0:sobel").

  • picks (Optional[Iterable[str]], default: None ) –

    Optional iterable of copick URIs (e.g. "ribosome:*/*").

  • meshes (Optional[Iterable[str]], default: None ) –

    Optional iterable of copick URIs (e.g. "ribosome:*/*").

  • segmentations (Optional[Iterable[str]], default: None ) –

    Optional iterable of copick URIs (e.g. "membrane:*/*@10.0").

  • objects (Optional[Iterable[str]], default: None ) –

    Optional iterable of pickable-object names to include in the object density map CSV. copick:config.pickable_objects is unaffected.

  • force (bool, default: False ) –

    When True, overwrite an existing Croissant/metadata.json under project_root. When False (default) and a manifest already exists, raise FileExistsError instead of clobbering it.

  • tomo_type_map (Optional[Dict[str, str]], default: None ) –

    Optional {src_tomo_type: dst_tomo_type} remap applied to tomograms.csv / features.csv tomo_type columns at emission time. Universally applicable.

  • object_name_map (Optional[Dict[str, str]], default: None ) –

    Optional {src: dst} remap applied to object_name in picks / meshes / segmentations, the objects.csv name column, and copick:config.pickable_objects[].name. Renamed pickable objects carry the original portal name in metadata["portal_original_name"]. Universally applicable.

  • session_id_template (Optional[str], default: None ) –

    Python str.format template for synthesizing picks / segmentations session_id values from CDP annotation metadata. Placeholders are any scalar field of _PortalAnnotation plus {author}, {authors}, {annotation_file_id}. CDP-only; raises on non-CDP sources.

  • picks_portal_meta (Optional[Dict[str, Any]], default: None ) –

    Dict passed to run.get_picks(portal_meta_query=...); CDP-only.

  • picks_author (Optional[Iterable[str]], default: None ) –

    List passed to run.get_picks(portal_author_query=...); CDP-only.

  • segmentations_portal_meta (Optional[Dict[str, Any]], default: None ) –

    Same as picks_portal_meta, applied to segmentations; CDP-only.

  • segmentations_author (Optional[Iterable[str]], default: None ) –

    Same as picks_author, applied to segmentations; CDP-only.

  • tomograms_portal_meta (Optional[Dict[str, Any]], default: None ) –

    Same as picks_portal_meta, applied to tomograms; CDP-only.

  • tomograms_author (Optional[Iterable[str]], default: None ) –

    Same as picks_author, applied to tomograms; CDP-only.
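Because session_id_template is a Python str.format template, its expansion can be previewed with plain string formatting. The placeholder names below are the ones listed for session_id_template; the field values, and the way multiple authors are joined into one string, are made up for illustration.

```python
# Hypothetical annotation fields; real values come from CDP annotation metadata.
fields = {
    "author": "jane.doe",
    "authors": "jane.doe,john.roe",  # joining format is an assumption
    "annotation_file_id": 1234,
}

template = "{author}-{annotation_file_id}"
session_id = template.format(**fields)  # unused keys in fields are ignored
```

A template that names a placeholder missing from the available fields raises KeyError under str.format, so it is worth checking a template against a sample record before a long export.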

Returns:

  • str

    The path to the written metadata.json.
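The force behaviour described above amounts to an existence check before writing. The sketch below shows that guard for a local project_root only; `manifest_target` is a hypothetical helper and does not reflect how the exporter handles remote URLs.

```python
import os
import tempfile


def manifest_target(project_root: str, force: bool = False) -> str:
    """Return the metadata.json path, refusing to clobber it unless force is set."""
    target = os.path.join(project_root, "Croissant", "metadata.json")
    if os.path.exists(target) and not force:
        raise FileExistsError(target)
    return target


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as project_root:
    first = manifest_target(project_root)           # nothing there yet, so allowed
    os.makedirs(os.path.dirname(first))
    with open(first, "w", encoding="utf-8") as handle:
        handle.write("{}")
    try:
        manifest_target(project_root)               # manifest now exists
        blocked = False
    except FileExistsError:
        blocked = True
    overwritten = manifest_target(project_root, force=True)
```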