Data Model

Data Entities

Root

The project's root. This is the entry point for the copick API. It allows access to information about the pickable objects and runs contained in the project.

Example Code - Print available objects and runs
"""Print all objects and runs in a copick project."""

import copick

# Initialize the root object from a configuration file
root = copick.from_file("path/to/config.json")

# List all available objects
obj_info = [(o.name, o.label) for o in root.objects.values()]

print("Pickable objects in this project:")
for name, label in obj_info:
    print(f"  {name}: {label}")

# Iterate over all runs in the project
runs = root.runs

print("Runs in this project:")
for run in runs:
    print(f"Run: {run.name}")
    # Do something with the run

Refer to the API Reference for more information on the CopickRoot API.

Pickable Object

An object is any entity that can be labeled inside a 3D image using points, meshes, or dense segmentation masks. In most cases, these will be macromolecular complexes or other cellular structures, like membranes. They can also be more abstract entities like "contamination particles", "carbon edges", or "sample boundaries".

In the configuration file, each object is defined by a JSON object that specifies the object's name, label, color, radius, and other properties.

Naming Conventions

Object names should never contain underscores!

Example Code - Read an object's density map.
"""Read a density map from an object's zarr-store into a numpy array."""

import copick
import numpy as np
import zarr

# Initialize the root object from a configuration file
root = copick.from_file("path/to/config.json")

# Get the object named 'proteasome'
proteasome = root.get_object("proteasome")

# Read the density map for the object from its zarr-store
zarr_array = zarr.open(proteasome.zarr())["0"]
density_map = np.array(zarr_array)

Refer to the API Reference for more information on the CopickObject API.

Example Object Definition

The following is an example of a pickable object definition in the configuration file:

{
    "name": "proteasome",
    "is_particle": true,
    "pdb_id": "3J9I",
    "emdb_id": "1234",
    "identifier": "GO:0001234",
    "label": 1,
    "color": [255, 0, 0, 255],
    "radius": 60,
    "map_threshold": 0.0418
}

  • name: The name of the object, which should be unique within one project.
  • is_particle: A boolean indicating whether the object can be represented by point annotations. By default, all objects can be represented by mesh annotations or dense segmentations.
  • pdb_id: The PDB ID of the object, if available.
  • emdb_id: The EMDB ID of the object, if available.
  • identifier: The GO ID of the object or a UniProtKB accession, if available. When using the data portal, this field is used to find matching annotations in the data portal.
  • label: An integer that indicates which numeric label should be used in segmentations to represent this object.
  • color: An array of four integers that represent the RGBA color of the object when rendered in a 3D viewer.
  • radius: An integer that represents the radius of the object in angstroms. This is used to determine the size of the object when rendering it as a sphere in a 3D viewer.
  • map_threshold: A float that represents the threshold value to use when a density map is used to represent the object. This is used to determine the isosurface level to use when rendering the object as a mesh. Density maps are discovered by the copick API by looking for files with the same name as the object in the Objects directory of the project's root.
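The field descriptions above can be turned into a small sanity check. The following is a minimal sketch, not part of the copick API: `check_object_definition` is a hypothetical helper, and the 0-255 range for color components is an assumption based on the example values.

```python
import json

# Example definition, copied from the configuration snippet above.
OBJECT_JSON = """
{
    "name": "proteasome",
    "is_particle": true,
    "pdb_id": "3J9I",
    "emdb_id": "1234",
    "identifier": "GO:0001234",
    "label": 1,
    "color": [255, 0, 0, 255],
    "radius": 60,
    "map_threshold": 0.0418
}
"""


def check_object_definition(obj: dict) -> list:
    """Return a list of problems found in one object definition (hypothetical helper)."""
    problems = []

    name = obj.get("name", "")
    if not name:
        problems.append("name is missing or empty")
    if "_" in name:
        problems.append("name must not contain underscores")

    label = obj.get("label")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(label, int) or isinstance(label, bool):
        problems.append("label must be an integer")

    color = obj.get("color")
    # Assumption: each RGBA component is an integer between 0 and 255.
    valid_color = (
        isinstance(color, list)
        and len(color) == 4
        and all(isinstance(c, int) and 0 <= c <= 255 for c in color)
    )
    if not valid_color:
        problems.append("color must be four integers (RGBA) in the range 0-255")

    return problems


obj = json.loads(OBJECT_JSON)
print(check_object_definition(obj))  # an empty list means no problems were found
```

Running a check like this before loading a project can catch naming mistakes, such as underscores in object names, early.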

Run

A run is a collection of data associated with a particular location on the sample. Run objects provide access to the 3D image data, segmentations, and annotations for that location. Images are stored in groups based on their voxel spacing, while point annotations, mesh annotations, and dense segmentations are related to the run as a whole.

Example Code - List available segmentations for a run
"""Print all segmentations for a run in a copick project."""

import copick

# Initialize the root object from a configuration file
root = copick.from_file("path/to/config.json")

# Get the run named 'TS_001'
run = root.get_run("TS_001")

# List all available segmentations for the run
segmentations = run.segmentations
for segmentation in segmentations:
    print(f"Segmentation: {segmentation.name}")

Refer to the API Reference for more information on the CopickRun API.

Voxel Spacing

A voxel spacing groups together all tomograms of a particular resolution. Voxel spacings are rounded to the third decimal place.

Example Code - List available tomograms for a voxel spacing
"""Print the list of tomograms for a given voxel spacing."""

import copick

# Initialize the root object from a configuration file
root = copick.from_file("path/to/config.json")

# Get the run named 'TS_001'
run = root.get_run("TS_001")

# Get the voxel spacing with a resolution of 10 angstroms
voxel_spacing = run.get_voxel_spacing(10.000)

# List all available tomograms for the voxel spacing
tomograms = voxel_spacing.tomograms
for tomogram in tomograms:
    print(f"Tomogram: {tomogram.name}")

Refer to the API Reference for more information on the CopickVoxelSpacing API.
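To illustrate the rounding rule, the sketch below formats a voxel spacing with three decimal places and checks whether two values would fall into the same group. Both helpers are hypothetical and not part of copick; the `VoxelSpacing` prefix follows the directory pattern shown in the on-disk data model.

```python
def voxel_spacing_dirname(spacing: float) -> str:
    """Format a voxel spacing as 'VoxelSpacing[xx.yyy]' (hypothetical helper)."""
    return f"VoxelSpacing{spacing:.3f}"


def same_voxel_spacing(a: float, b: float) -> bool:
    """Two spacings belong to the same group if they agree after rounding to 3 decimals."""
    return round(a, 3) == round(b, 3)


print(voxel_spacing_dirname(10))           # VoxelSpacing10.000
print(voxel_spacing_dirname(7.8412))       # VoxelSpacing7.841
print(same_voxel_spacing(10.0, 10.0004))   # True
```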

Image data

Tomogram

Multiple tomograms can be stored at each resolution. Tomograms are stored as OME-NGFF files, a zarr-based format that allows efficient access to multiscale 3D image data. The filename of the zarr file relates the image to its reconstruction method or processing steps. Typical tomogram types are wbp, sirt, denoised, etc.

Example Code - Read a tomogram into a numpy array
"""Read a tomogram from a zarr-store into a numpy array."""

import copick
import numpy as np
import zarr

# Initialize the root object from a configuration file
root = copick.from_file("path/to/config.json")

# Get the run named 'TS_001'
run = root.get_run("TS_001")

# Get the voxel spacing with a resolution of 10 angstroms
voxel_spacing = run.get_voxel_spacing(10.000)

# Get the tomogram named 'wbp'
tomogram = voxel_spacing.get_tomogram("wbp")

# Read the tomogram from its zarr-store
# Scale "0" is the unbinned tomogram
zarr_array = zarr.open(tomogram.zarr())["0"]
tomogram_data = np.array(zarr_array)

# Scale "1" is the tomogram binned by 2
zarr_array_bin2 = zarr.open(tomogram.zarr())["1"]
tomogram_data_bin2 = np.array(zarr_array_bin2)

Refer to the API Reference for more information on the CopickTomogram API.
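Since scale "0" is unbinned and scale "1" is binned by 2, the effective voxel size at each scale level can be estimated from the base voxel spacing. The sketch below assumes that every further level doubles the binning (consistent with the 100%, 50%, and 25% scales in the on-disk data model); `voxel_size_at_scale` is a hypothetical helper, not a copick function.

```python
def voxel_size_at_scale(base_spacing: float, scale: int) -> float:
    """Effective voxel size in angstroms at a given scale level.

    Assumption: each scale level bins the previous one by a factor of 2.
    """
    if scale < 0:
        raise ValueError("scale level must be non-negative")
    return base_spacing * 2**scale


# Scale levels are addressed by string keys ("0", "1", "2") in the zarr group.
for key in ("0", "1", "2"):
    size = voxel_size_at_scale(10.0, int(key))
    print(f"scale {key}: {size} angstroms per voxel")
```

This is useful when annotations in angstrom coordinates need to be related to a binned array.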

Example tomogram file name

Tomograms are named according to the following pattern:

wbp.zarr

The wbp part of the filename is the type of tomogram. This could be wbp, sirt, denoised, etc.

Feature Map

Feature maps are stored as OME-NGFF (zarr) files alongside the tomogram they were computed from. They can hold any type of data derived from a tomogram and may be useful for interactive segmentation tasks.

Example Code - Read a feature map into a numpy array
"""Read a feature map from a zarr-store into a numpy array."""

import copick
import numpy as np
import zarr

# Initialize the root object from a configuration file
root = copick.from_file("path/to/config.json")

# Get the run named 'TS_001'
run = root.get_run("TS_001")

# Get the voxel spacing with a resolution of 10 angstroms
voxel_spacing = run.get_voxel_spacing(10.000)

# Get the tomogram named 'wbp'
tomogram = voxel_spacing.get_tomogram("wbp")

# Get the feature map named 'sobel'
feature_map = tomogram.get_features("sobel")

# Read the feature map from its zarr-store
zarr_array = zarr.open(feature_map.zarr())["0"]
feature_map_data = np.array(zarr_array)

Refer to the API Reference for more information on the CopickFeatures API.

Example feature map file name

Feature maps are named according to the following pattern:

wbp_density_features.zarr

The wbp part of the filename is the type of tomogram that the feature map was computed from. The density part of the filename is the type of feature that the feature map represents. Other feature types could be sobel, gradient, curvature, etc.
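The naming pattern can be taken apart with plain string handling. This is a minimal sketch with a hypothetical helper, `parse_feature_filename`; it assumes the tomogram type itself contains no underscore.

```python
def parse_feature_filename(filename: str) -> tuple:
    """Split '[tomotype]_[feature_type]_features.zarr' into (tomotype, feature_type)."""
    suffix = "_features.zarr"
    if not filename.endswith(suffix):
        raise ValueError(f"not a feature map name: {filename!r}")

    stem = filename[: -len(suffix)]
    # Assumption: the tomogram type has no underscore, so the first one is the separator.
    tomo_type, sep, feature_type = stem.partition("_")
    if not sep or not tomo_type or not feature_type:
        raise ValueError(f"cannot split tomogram and feature type: {filename!r}")
    return tomo_type, feature_type


print(parse_feature_filename("wbp_density_features.zarr"))  # ('wbp', 'density')
```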

Annotation data

Point Annotations

Point annotations are stored as JSON files in the Picks directory of the run. Each file contains a list of points in angstrom coordinates that represent the location of a particular object in the tomogram. The filename of the JSON file allows relating the points to the user or tool that created them, as well as the object that they represent.

Naming Conventions

user_ids, session_ids, and object names should never contain underscores!

Example Code - Read point annotations from copick
"""Read points from a CopickPicks object."""

import copick
import numpy as np

# Initialize the root object from a configuration file
root = copick.from_file("path/to/config.json")

# Get the first run in the project
run = root.runs[0]

# Get 'proteasome' picks of user 'alice'
picks = run.get_picks(object_name="proteasome", user_id="alice")[0]

# Copy the point locations into an (N, 3) array of x, y, z coordinates
point_arr = np.zeros((len(picks.points), 3))
for idx, pt in enumerate(picks.points):
    point_arr[idx, :] = [pt.location.x, pt.location.y, pt.location.z]

Refer to the API Reference for more information on the CopickPicks API.
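Because points are stored in angstrom coordinates, relating them to tomogram voxels requires dividing by the voxel spacing. The following is a minimal sketch under stated assumptions: the coordinate origin coincides with the first voxel, positions are rounded to the nearest voxel, and `angstrom_to_voxel` is a hypothetical helper rather than a copick function.

```python
def angstrom_to_voxel(point_xyz, voxel_spacing: float) -> tuple:
    """Convert an (x, y, z) position in angstroms to integer voxel coordinates.

    Assumptions: the origin is at voxel (0, 0, 0) and rounding to the
    nearest voxel is acceptable for the use case.
    """
    if voxel_spacing <= 0:
        raise ValueError("voxel spacing must be positive")
    return tuple(int(round(coord / voxel_spacing)) for coord in point_xyz)


# A point at x=1234 Å, y=560 Å, z=87 Å in a tomogram with 10 Å voxels
voxel = angstrom_to_voxel((1234.0, 560.0, 87.0), 10.0)
print(voxel)  # (123, 56, 9)
```

If the image array is indexed as [z, y, x], which is a common convention but an assumption here, the tuple needs to be reversed before it is used as an array index.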

Example point file name

Point files are named according to the following pattern:

good.picker_0_proteasome.json

The good.picker part of the filename is the user or tool that created the points. The 0 part of the filename is the session id of the user or tool that created the points. The proteasome part of the filename is the name of the object that the points represent.
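The pattern above explains why underscores are reserved: they are the only separator between the three parts. A minimal parsing sketch, with `AnnotationName` and `parse_annotation_filename` as hypothetical names that are not part of copick:

```python
from pathlib import Path
from typing import NamedTuple


class AnnotationName(NamedTuple):
    user_id: str
    session_id: str
    object_name: str


def parse_annotation_filename(filename: str) -> AnnotationName:
    """Split '[user_id]_[session_id]_[object_name].<ext>' into its parts."""
    stem = Path(filename).stem  # drops only the final extension, e.g. '.json'
    parts = stem.split("_")
    if len(parts) != 3:
        # An underscore inside any part makes the name ambiguous.
        raise ValueError(f"expected exactly three underscore-separated parts: {filename!r}")
    return AnnotationName(*parts)


parsed = parse_annotation_filename("good.picker_0_proteasome.json")
print(parsed.user_id, parsed.session_id, parsed.object_name)  # good.picker 0 proteasome
```

A name such as `good_picker_0_proteasome.json` would be rejected here, because it splits into four parts instead of three.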

Mesh Annotations

Mesh annotations are stored as glb files in the Meshes directory of the run. Each file contains a 3D mesh, with vertices in angstrom coordinates, that represents the shape of a particular object in the tomogram. The filename of the glb file allows relating the mesh to the user or tool that created it, as well as the object that it represents.

Naming Conventions

user_ids, session_ids, and object names should never contain underscores!

Example Code - Read mesh annotations and visualize them in 3D
"""Read a mesh from a CopickMesh object and display it."""

import copick

# Initialize the root object from a configuration file
root = copick.from_file("path/to/config.json")

# Get the first run in the project
run = root.runs[0]

# Get a membrane mesh from user 'bob'
mesh = run.get_meshes(object_name="membrane", user_id="bob")[0]

# Show the mesh
mesh.mesh.show()

Refer to the API Reference for more information on the CopickMesh API.

Example mesh file name

Mesh files are named according to the following pattern:

good.picker_0_proteasome.glb

The good.picker part of the filename is the user or tool that created the mesh. The 0 part of the filename is the session id of the user or tool that created the mesh. The proteasome part of the filename is the name of the object that the mesh represents.

Dense Segmentations

Dense segmentations are stored as OME-NGFF files in the Segmentations directory of the run. Each file contains either a binary segmentation (values of 0 or 1) or a multilabel segmentation (where permissible labels are defined by the labels among the pickable objects). The filename of the zarr file allows relating the segmentation to the user or tool that created it, as well as the object that it represents.

Naming Conventions

user_ids, session_ids, and object names should never contain underscores!

Example Code - Read a segmentation into a numpy array
"""Read a segmentation from a CopickSegmentation object."""

import copick
import numpy as np
import zarr

# Initialize the root object from a configuration file
root = copick.from_file("path/to/config.json")

# Get the first run in the project
run = root.runs[0]

# Get 'proteasome' segmentation of user 'alice'
segmentation = run.get_segmentations(object_name="proteasome", user_id="alice")[0]

# Get the segmentation array from the segmentation
seg_zarr = zarr.open(segmentation.zarr())["0"]
seg = np.array(seg_zarr)

Refer to the API Reference for more information on the CopickSegmentation API.
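The relationship between multilabel values and object labels can be sketched without any image library. In the example below, the label table and voxel values are made up for illustration (only `proteasome: 1` matches the example object definition), and treating 0 as background is an assumption.

```python
# Hypothetical label table: object name -> numeric label from the configuration.
pickable_labels = {"proteasome": 1, "ribosome": 2, "membrane": 3}


def unknown_labels(values, labels: dict) -> set:
    """Return voxel values that are not a known object label (0 is assumed to be background)."""
    allowed = set(labels.values()) | {0}
    return set(values) - allowed


def binary_mask(values, label: int) -> list:
    """Extract a binary (0/1) mask for one label from flattened multilabel values."""
    return [1 if v == label else 0 for v in values]


# Flattened voxel values of a tiny, made-up multilabel segmentation
voxels = [0, 1, 1, 2, 0, 3, 7]

print(unknown_labels(voxels, pickable_labels))              # {7}
print(binary_mask(voxels, pickable_labels["proteasome"]))   # [0, 1, 1, 0, 0, 0, 0]
```

With numpy arrays, the same mask can be computed as `(seg == label)`.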

Example segmentation file names

Segmentation files are named according to the following pattern:

10.000_good.picker_0_proteasome.zarr

The 10.000 part of the filename is the voxel spacing of the tomogram that the segmentation was created from. The good.picker part of the filename is the user or tool that created the segmentation. The 0 part of the filename is the session id of the user or tool that created the segmentation. The proteasome part of the filename is the name of the object that the segmentation represents. This is a binary segmentation.

10.000_good.picker_0_segmentation-multilabel.zarr

The 10.000 part of the filename is the voxel spacing of the tomogram that the segmentation was created from. The good.picker part of the filename is the user or tool that created the segmentation. The 0 part of the filename is the session id of the user or tool that created the segmentation. The segmentation part of the filename is an arbitrary name that describes the segmentation. This is a multilabel segmentation, thus all objects in the project could be represented in this segmentation.
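Both naming patterns can be handled by one parser that also detects the `-multilabel` marker. This is a sketch only; `SegmentationName` and `parse_segmentation_filename` are hypothetical names, not part of copick.

```python
from dataclasses import dataclass


@dataclass
class SegmentationName:
    voxel_spacing: float
    user_id: str
    session_id: str
    name: str
    is_multilabel: bool


def parse_segmentation_filename(filename: str) -> SegmentationName:
    """Split '[xx.yyy]_[user_id]_[session_id]_[name](-multilabel).zarr' into its parts."""
    suffix = ".zarr"
    if not filename.endswith(suffix):
        raise ValueError(f"expected a name ending in {suffix!r}: {filename!r}")

    parts = filename[: -len(suffix)].split("_")
    if len(parts) != 4:
        raise ValueError(f"expected four underscore-separated parts: {filename!r}")
    spacing, user_id, session_id, name = parts

    # Multilabel segmentations carry a '-multilabel' marker after the name.
    marker = "-multilabel"
    is_multilabel = name.endswith(marker)
    if is_multilabel:
        name = name[: -len(marker)]

    return SegmentationName(float(spacing), user_id, session_id, name, is_multilabel)


binary = parse_segmentation_filename("10.000_good.picker_0_proteasome.zarr")
multi = parse_segmentation_filename("10.000_good.picker_0_segmentation-multilabel.zarr")
print(binary)
print(multi)
```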

On-disk Data Model

The on-disk data model of copick is as follows:

📁 copick_root
├─ 📄 copick_config.json
├─ 📁 Objects
│  └─ 📄 [pickable_object_name].zarr
└─ 📁 ExperimentRuns
   └─ 📁 [run_name] (index: src/io/copick_models.py:CopickPicks.runs)
      ├─ 📁 VoxelSpacing[xx.yyy]/
      │  ├─ 📁 [tomotype].zarr/
      │  │  └─ [OME-NGFF spec at 100%, 50% and 25% scale]
      │  └─ 📁 [tomotype]_[feature_type]_features.zarr/
      │     └─ [OME-NGFF spec at 100% scale]
      ├─ 📁 VoxelSpacing[x2.yy2]/
      │  ├─ 📁 [tomotype].zarr/
      │  │  └─ [OME-NGFF spec at 100%, 50% and 25% scale]
      │  └─ 📁 [tomotype]_[feature_type]_features.zarr/
      │     └─ [OME-NGFF spec at 100% scale]
      ├─ 📁 Picks/
      │  └─ 📄 [user_id | tool_name]_[session_id | 0]_[object_name].json
      ├─ 📁 Meshes/
      │  └─ 📄 [user_id | tool_name]_[session_id | 0]_[object_name].glb
      └─ 📁 Segmentations/
         ├─ 📁 [xx.yyy]_[user_id | tool_name]_[session_id | 0]_[object_name].zarr
         │  └─ [OME-NGFF spec at 100%, 50% and 25% scale]
         └─ 📁 [xx.yyy]_[user_id | tool_name]_[session_id | 0]_[name]-multilabel.zarr
            └─ [OME-NGFF spec at 100%, 50% and 25% scale]
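To make the layout concrete, the sketch below builds a tiny mock project in a temporary directory and lists each run with its voxel spacings. It assumes the project lives on a local filesystem and follows the directory names shown above; `list_runs` is a hypothetical helper, and in practice the copick API should be used to access a project.

```python
import tempfile
from pathlib import Path

PREFIX = "VoxelSpacing"


def list_runs(root: Path) -> dict:
    """Map each run name to the voxel spacings found in its directory."""
    runs = {}
    for run_dir in sorted((root / "ExperimentRuns").iterdir()):
        if not run_dir.is_dir():
            continue
        spacings = sorted(
            p.name[len(PREFIX):] for p in run_dir.glob(f"{PREFIX}*") if p.is_dir()
        )
        runs[run_dir.name] = spacings
    return runs


with tempfile.TemporaryDirectory() as tmp:
    # Build a minimal mock project following the tree above.
    root = Path(tmp) / "copick_root"
    run = root / "ExperimentRuns" / "TS_001"
    (run / "VoxelSpacing10.000" / "wbp.zarr").mkdir(parents=True)
    (run / "Picks").mkdir()
    (run / "Picks" / "good.picker_0_proteasome.json").write_text("{}")

    layout = list_runs(root)

print(layout)  # {'TS_001': ['10.000']}
```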