Stats Operations

The copick.ops.stats module provides statistical analysis functions for Copick projects. Each function returns a dictionary of totals, per-category distributions (for example by user, session, or object), and, for meshes and segmentations, the frequency of attribute combinations.

Functions

copick.ops.stats.picks_stats

picks_stats(root: Union[str, CopickRoot], runs: Union[str, CopickRun, Iterable[str], Iterable[CopickRun], None] = None, user_id: Union[str, Iterable[str], None] = None, session_id: Union[str, Iterable[str], None] = None, object_name: Union[str, Iterable[str], None] = None, parallel: bool = False, workers: Optional[int] = 8, show_progress: bool = True) -> Dict[str, Union[int, Dict[str, int]]]

Generate statistics for picks in a Copick project.

Parameters:

  • root (Union[str, CopickRoot]) –

    The Copick project root.

  • runs (Union[str, CopickRun, Iterable[str], Iterable[CopickRun], None], default: None ) –

    The runs to query picks from. If None, query all runs.

  • user_id (Union[str, Iterable[str], None], default: None ) –

    The user ID of the picks. If None, query all users.

  • session_id (Union[str, Iterable[str], None], default: None ) –

    The session ID of the picks. If None, query all sessions.

  • object_name (Union[str, Iterable[str], None], default: None ) –

    The name of the pickable object to filter picks by. If None, query picks for all objects.

  • parallel (bool, default: False ) –

    Whether to query picks in parallel. Default is False.

  • workers (Optional[int], default: 8 ) –

    The number of workers to use. Default is 8.

  • show_progress (bool, default: True ) –

    Whether to show progress. Default is True.

Returns:

  • Dict[str, Union[int, Dict[str, int]]]

    A dictionary containing total pick count and distribution statistics.

copick.ops.stats.meshes_stats

meshes_stats(root: Union[str, CopickRoot], runs: Union[str, CopickRun, Iterable[str], Iterable[CopickRun], None] = None, user_id: Union[str, Iterable[str], None] = None, session_id: Union[str, Iterable[str], None] = None, object_name: Union[str, Iterable[str], None] = None, parallel: bool = False, workers: Optional[int] = 8, show_progress: bool = True) -> Dict[str, Union[int, Dict[str, int]]]

Generate statistics for meshes in a Copick project.

Parameters:

  • root (Union[str, CopickRoot]) –

    The Copick project root.

  • runs (Union[str, CopickRun, Iterable[str], Iterable[CopickRun], None], default: None ) –

    The runs to query meshes from. If None, query all runs.

  • user_id (Union[str, Iterable[str], None], default: None ) –

    The user ID of the meshes. If None, query all users.

  • session_id (Union[str, Iterable[str], None], default: None ) –

    The session ID of the meshes. If None, query all sessions.

  • object_name (Union[str, Iterable[str], None], default: None ) –

    The name of the pickable object to filter meshes by. If None, query meshes for all objects.

  • parallel (bool, default: False ) –

    Whether to query meshes in parallel. Default is False.

  • workers (Optional[int], default: 8 ) –

    The number of workers to use. Default is 8.

  • show_progress (bool, default: True ) –

    Whether to show progress. Default is True.

Returns:

  • Dict[str, Union[int, Dict[str, int]]]

    A dictionary containing mesh count and frequency statistics.

copick.ops.stats.segmentations_stats

segmentations_stats(root: Union[str, CopickRoot], runs: Union[str, CopickRun, Iterable[str], Iterable[CopickRun], None] = None, user_id: Union[str, Iterable[str], None] = None, session_id: Union[str, Iterable[str], None] = None, is_multilabel: bool = None, name: Union[str, Iterable[str], None] = None, voxel_size: Union[float, Iterable[float], None] = None, parallel: bool = False, workers: Optional[int] = 8, show_progress: bool = True) -> Dict[str, Union[int, Dict[str, int]]]

Generate statistics for segmentations in a Copick project.

Parameters:

  • root (Union[str, CopickRoot]) –

    The Copick project root.

  • runs (Union[str, CopickRun, Iterable[str], Iterable[CopickRun], None], default: None ) –

    The runs to query segmentations from. If None, query all runs.

  • user_id (Union[str, Iterable[str], None], default: None ) –

    The user ID of the segmentations. If None, query all users.

  • session_id (Union[str, Iterable[str], None], default: None ) –

    The session ID of the segmentations. If None, query all sessions.

  • is_multilabel (bool, default: None ) –

    Whether to restrict the query to multilabel (True) or single-label (False) segmentations. If None, query all segmentations.

  • name (Union[str, Iterable[str], None], default: None ) –

    The name of the segmentation. If None, query all segmentations.

  • voxel_size (Union[float, Iterable[float], None], default: None ) –

    The voxel size of the segmentation. If None, query all segmentations.

  • parallel (bool, default: False ) –

    Whether to query segmentations in parallel. Default is False.

  • workers (Optional[int], default: 8 ) –

    The number of workers to use. Default is 8.

  • show_progress (bool, default: True ) –

    Whether to show progress. Default is True.

Returns:

  • Dict[str, Union[int, Dict[str, int]]]

    A dictionary containing segmentation count and frequency statistics.
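
All three functions share the return annotation Dict[str, Union[int, Dict[str, int]]], so a result mixes scalar totals with nested count mappings. The following is a minimal sketch of separating the two kinds of entries before reporting; the helper name split_stats and the sample values are hypothetical and not part of copick.

```python
from typing import Dict, Tuple, Union

Stats = Dict[str, Union[int, Dict[str, int]]]


def split_stats(stats: Stats) -> Tuple[Dict[str, int], Dict[str, Dict[str, int]]]:
    """Separate scalar totals from nested name -> count mappings."""
    totals: Dict[str, int] = {}
    distributions: Dict[str, Dict[str, int]] = {}
    for key, value in stats.items():
        if isinstance(value, dict):
            distributions[key] = value
        else:
            totals[key] = value
    return totals, distributions


# Hypothetical values for illustration only; not output from a real project.
example = {
    "total_meshes": 3,
    "distribution_by_user": {"modeler_1": 2, "modeler_2": 1},
}

totals, distributions = split_stats(example)
print(totals)
print(distributions)
```

Checking with isinstance keeps the helper independent of the specific keys, so the same code should work on the result of any of the three functions.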

Usage Examples

Picks Statistics

from copick.ops.stats import picks_stats
from copick.impl.filesystem import CopickRootFSSpec

# Open a project
root = CopickRootFSSpec.from_file("config.json")

# Get comprehensive picks statistics
all_picks_stats = picks_stats(root)
print(f"Total picks: {all_picks_stats['total_picks']}")
print(f"Distribution by run: {all_picks_stats['distribution_by_run']}")

# Get statistics for specific runs and objects
filtered_stats = picks_stats(
    root=root,
    runs=["experiment_001", "experiment_002"],
    object_name=["ribosome", "proteasome"],
    user_id=["annotator_1"]
)

# Enable parallel processing for large projects
parallel_stats = picks_stats(
    root=root,
    parallel=True,
    workers=12,
    show_progress=True
)

Meshes Statistics

from copick.ops.stats import meshes_stats

# Get mesh statistics with filtering
# (`root` is the project opened in the picks example above)
mesh_stats = meshes_stats(
    root=root,
    runs=["experiment_001"],
    user_id=["modeler_1", "modeler_2"],
    object_name=["ribosome"]
)

print(f"Total meshes: {mesh_stats['total_meshes']}")
print(f"Distribution by user: {mesh_stats['distribution_by_user']}")
print(f"Frequent combinations: {mesh_stats['session_user_object_combinations']}")

# Get statistics for all meshes with parallel processing
all_mesh_stats = meshes_stats(root, parallel=True)

Segmentations Statistics

from copick.ops.stats import segmentations_stats

# Get segmentation statistics with comprehensive filtering
# (`root` is the project opened in the picks example above)
seg_stats = segmentations_stats(
    root=root,
    runs=["experiment_001"],
    user_id=["segmentation_model"],
    name=["membrane", "organelle"],
    voxel_size=[10.0, 20.0],
    is_multilabel=True
)

print(f"Total segmentations: {seg_stats['total_segmentations']}")
print(f"Distribution by name: {seg_stats['distribution_by_name']}")
print(f"Distribution by voxel size: {seg_stats['distribution_by_voxel_size']}")
print(f"Distribution by multilabel: {seg_stats['distribution_by_multilabel']}")

# Analyze combination frequencies
combinations = seg_stats['session_user_voxelspacing_multilabel_combinations']
most_frequent = max(combinations.items(), key=lambda x: x[1])
print(f"Most frequent combination: {most_frequent[0]} ({most_frequent[1]} occurrences)")

Return Value Structure

Picks Statistics

{
    "total_picks": int,                    # Total number of individual pick points
    "total_pick_files": int,               # Total number of pick files
    "distribution_by_run": {               # Pick count per run
        "run_name": pick_count
    },
    "distribution_by_user": {              # Pick count per user
        "user_id": pick_count
    },
    "distribution_by_session": {           # Pick count per session
        "session_id": pick_count
    },
    "distribution_by_object": {            # Pick count per object
        "object_name": pick_count
    }
}
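
Because each distribution is a plain name-to-count mapping, it can be converted into relative shares with a few lines of code. The following is a minimal sketch that assumes a dictionary shaped like the structure above; the helper name distribution_shares and the sample counts are hypothetical.

```python
from typing import Dict


def distribution_shares(distribution: Dict[str, int]) -> Dict[str, float]:
    """Convert a name -> count mapping into name -> fraction of the total."""
    total = sum(distribution.values())
    if total == 0:
        # Avoid division by zero for empty or all-zero distributions.
        return {name: 0.0 for name in distribution}
    return {name: count / total for name, count in distribution.items()}


# Hypothetical stats dictionary, shaped like the picks structure above.
example_stats = {
    "total_picks": 200,
    "distribution_by_object": {"ribosome": 150, "proteasome": 50},
}

shares = distribution_shares(example_stats["distribution_by_object"])
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The shares are computed against the sum of the distribution itself rather than against total_picks, so the result does not depend on how the two totals relate.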

Meshes Statistics

{
    "total_meshes": int,                   # Total number of mesh files
    "distribution_by_user": {              # Mesh count per user
        "user_id": mesh_count
    },
    "distribution_by_session": {           # Mesh count per session
        "session_id": mesh_count
    },
    "distribution_by_object": {            # Mesh count per object
        "object_name": mesh_count
    },
    "session_user_object_combinations": {  # Frequency of specific combinations
        "session_user_object": frequency
    }
}

Segmentations Statistics

{
    "total_segmentations": int,            # Total number of segmentation files
    "distribution_by_user": {              # Segmentation count per user
        "user_id": segmentation_count
    },
    "distribution_by_session": {           # Segmentation count per session
        "session_id": segmentation_count
    },
    "distribution_by_name": {              # Segmentation count per name
        "name": segmentation_count
    },
    "distribution_by_voxel_size": {        # Segmentation count per voxel size
        "voxel_size": segmentation_count
    },
    "distribution_by_multilabel": {        # Segmentation count by multilabel status
        "True/False": segmentation_count
    },
    "session_user_voxelspacing_multilabel_combinations": {  # Frequency of specific combinations
        "session_user_name_voxelsize_multilabel": frequency
    }
}
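
The combination mappings can be ranked in the same way as any other key-to-frequency dictionary. The sketch below uses collections.Counter from the standard library to list the most frequent entries; the helper name top_combinations and the placeholder keys are hypothetical, since the exact key format depends on the project.

```python
from collections import Counter
from typing import Dict, List, Tuple


def top_combinations(combinations: Dict[str, int], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most frequent combinations, highest count first."""
    return Counter(combinations).most_common(n)


# Hypothetical combination counts; real keys depend on the project contents.
example_combinations = {
    "combo_a": 4,
    "combo_b": 9,
    "combo_c": 1,
}

for key, frequency in top_combinations(example_combinations, n=2):
    print(f"{key}: {frequency}")
```

Unlike a single max() call, this approach returns several entries at once and handles an empty mapping without raising an error.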

Performance Considerations

Parallel Processing

All three stats functions accept the parallel and workers arguments, which can shorten queries on large projects:

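from copick.ops.stats import picks_stats

# Enable parallel processing with a custom worker count
# (`root` is the project opened in the usage examples above)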
stats = picks_stats(
    root=root,
    parallel=True,
    workers=16,  # Adjust based on your system
    show_progress=True
)
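
One way to "adjust based on your system" is to cap the requested worker count at the number of CPUs the operating system reports. The sketch below is only a heuristic under that assumption: the helper name choose_workers is hypothetical, and this page does not state how copick schedules its workers, so I/O-bound projects (for example on remote storage) may benefit from a different value.

```python
import os
from typing import Optional


def choose_workers(requested: Optional[int] = None, default: int = 8) -> int:
    """Cap a requested worker count at the CPU count reported by the OS."""
    cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    wanted = requested if requested is not None else default
    # Always return at least one worker.
    return max(1, min(wanted, cpu_count))


workers = choose_workers(16)
print(f"Using {workers} workers")
```

The resulting value could then be passed as the workers argument together with parallel=True.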