mlcroissant tutorial

This tutorial walks you through exporting an existing copick project to an mlcroissant manifest, reading it back through the mlcroissant backend, adding new annotations with live auto-sync, and republishing.

Prerequisites

Install copick with its mlcroissant dependencies:

pip install "copick"

(mlcroissant and aiohttp are declared as core dependencies since v1.24.)

1. Start from a filesystem project

We'll use the test sample_project that copick ships as a pooch-cached test fixture. Download and extract it:

python -c "
import copick.tests.conftest as c
c.TOTO.fetch('sample_project.zip', processor=__import__('pooch').Unzip())
print(c.OZ / 'sample_project')
"

Alternatively, create a minimal project by hand — any copick filesystem project with runs / picks / tomograms will work.

2. Export the Croissant

From Python:

import copick
from copick.ops.croissant import export_croissant

root = copick.from_file("path/to/sample_project/filesystem.json")

export_croissant(
    root,
    project_root="path/to/sample_project",
    base_url="file://path/to/sample_project",  # local use
    dataset_name="Sample copick project",
    description="Cryo-ET data with picks and segmentations.",
    license="CC-BY-4.0",
)

Or from the CLI:

copick config export-croissant \
    --config path/to/sample_project/filesystem.json \
    --project-root path/to/sample_project \
    --base-url file://path/to/sample_project

The command writes path/to/sample_project/Croissant/:

Croissant/
    metadata.json
    runs.csv
    voxel_spacings.csv
    tomograms.csv
    features.csv
    picks.csv
    meshes.csv
    segmentations.csv
    objects.csv

Every CSV gets a real sha256 recorded in metadata.json, and per-file sha256 hashes are embedded as columns in picks.csv and meshes.csv.

3. Inspect the output

Open Croissant/metadata.json in your editor. Key fields:

@context: the canonical Croissant 1.1 vocabulary plus our copick: namespace additions.
copick:baseUrl: the absolute URL that CSV url columns resolve relative to.
copick:config: the CopickConfig as an embedded JSON blob — pickable objects, user id, etc.
distribution: 8 cr:FileObject entries (one per CSV) with sha256.
recordSet: 8 cr:RecordSet entries, one per artifact type, each sourcing its fields from a CSV column.

Open Croissant/picks.csv — you should see one row per existing pick:

run,user_id,session_id,object_name,url,sha256
run_001,alice,42,ribosome,ExperimentRuns/run_001/Picks/alice_42_ribosome.json,e14c...

4. Open the project through the mlcroissant backend

import copick

root = copick.from_croissant(
    "path/to/sample_project/Croissant/metadata.json",
)

print(f"Project: {root.config.name} (mode {root.mode})")
for run in root.runs:
    print(f"  {run.name}: "
          f"{len(run.voxel_spacings)} VS, "
          f"{len(run.picks)} picks, "
          f"{len(run.meshes)} meshes, "
          f"{len(run.segmentations)} segs")

Because copick:baseUrl is a writable file:// URL, the backend enters Mode A (self-contained). Writes will auto-sync to the Croissant CSVs.

5. Add a new pick with live auto-sync

from copick.models import CopickLocation, CopickPoint

run = root.get_run("run_001")
picks = run.new_picks(object_name="ribosome", user_id="bob", session_id="99")
picks.points = [
    CopickPoint(location=CopickLocation(x=100.0, y=200.0, z=300.0)),
    CopickPoint(location=CopickLocation(x=110.0, y=210.0, z=310.0)),
]
picks.store()

After store():

A new JSON file is written at path/to/sample_project/ExperimentRuns/run_001/Picks/bob_99_ribosome.json.
A new row is appended to Croissant/picks.csv (with a freshly computed sha256).
The picks-csv entry in Croissant/metadata.json gets its sha256 refreshed.

Re-opening the project from a fresh Python process confirms the update:

root2 = copick.from_croissant("path/to/sample_project/Croissant/metadata.json")
print([p.user_id for p in root2.get_run("run_001").picks])
# ['alice', 'bob']

6. Bulk imports with `batch()`

If you're about to write many artifacts in a tight loop, rewriting metadata.json after every call is wasteful. Wrap the loop in root.batch() to defer the flush:

with root.batch():
    for i, coords in enumerate(my_predicted_points):
        picks = run.new_picks(object_name="ribosome", user_id="model", session_id=str(i))
        picks.points = [CopickPoint(location=CopickLocation(*coords))]
        picks.store()
# metadata.json is rewritten exactly once here, on context exit.

7. Mode B — remote Croissant, local overlay

If someone published the Croissant to an HTTPS or public S3 location and you want to annotate locally without mutating the published data, provide a separate overlay_root:

root = copick.from_croissant(
    "https://data.example.org/sample_project/Croissant/metadata.json",
    overlay_root="/tmp/my_local_overlay",
)
# root.mode == "B"

Writes now land under /tmp/my_local_overlay/ExperimentRuns/...; the remote Croissant is not modified. To publish your overlay as a new Croissant:

copick config export-croissant \
    --config path/to/my/croissant-mode-b.json \
    --project-root /tmp/my_merged_project \
    --base-url https://mymirror.example.org/merged

This rebuilds a fresh Croissant from the merged filesystem + overlay view.

8. Republishing the project

The exporter never touches data files under ExperimentRuns/ or Objects/ — only Croissant/ is (re)written. To publish:

tar czf sample_project.tar.gz sample_project/
# or
aws s3 sync sample_project/ s3://my-bucket/sample_project/

Consumers fetch the archive, point copick.from_croissant at <project>/Croissant/metadata.json, and get an immediately usable copick project.