CZ cryoET Data Portal

[Figure: copick project setup with static data from the CZ cryoET Data Portal]

copick integrates directly with the CZ cryoET Data Portal Python API. This lets users access data from the portal and create new annotations for data portal tomograms. The datasets to be curated are selected by dataset ID.

The data portal project is a special project type. You create it by setting the config_type field to cryoet_data_portal. This project type can be used with any overlay backend. Choose one below for more information on how to set up your project.

Choose your overlay backend:

Local filesystem backend

Configuration Template
{
    "name": "Example Project",
    "description": "This is an example project, demonstrating overlaying a local backend on a CZ cryoET Data Portal dataset.",
    "version": "0.5.0",
    "pickable_objects": [
        {
            "name": "ribosome",
            "is_particle": true,
            "label": 1,
            "color": [
                0,
                117,
                220,
                255
            ],
            "go_id": "GO:0022626",
            "radius": 150.0
        },
        {
            "name": "atpase",
            "is_particle": true,
            "label": 2,
            "color": [
                251,
                192,
                147,
                255
            ],
            "go_id": "GO:0045259",
            "radius": 150.0
        },
        {
            "name": "membrane",
            "is_particle": false,
            "label": 3,
            "color": [
                200,
                200,
                200,
                255
            ],
            "go_id": "GO:0016020",
            "radius": 10.0
        }
    ],
    "user_id": "example.user",
    "config_type": "cryoet_data_portal",
    "overlay_root": "local:///path/to/copick_project/",
    "dataset_ids": [
        10301,
        10302
    ],
    "overlay_fs_args": {
        "auto_mkdir": true
    }
}
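You may prefer to generate the configuration from a script. The following minimal Python sketch uses only the standard library to write the same fields as the template above to a JSON file. The helper write_cdp_config is hypothetical and not part of copick, and the file name config.json is an assumption. Loading the result with copick is shown only as a comment, since it assumes your copick version provides copick.from_file.

```python
import json
from pathlib import Path


def write_cdp_config(path, name, overlay_root, dataset_ids, pickable_objects,
                     overlay_fs_args=None, user_id=None, description="",
                     version="0.5.0"):
    """Hypothetical helper: write a copick config for a CZ cryoET Data Portal project."""
    config = {
        "name": name,
        "description": description,
        "version": version,
        "pickable_objects": pickable_objects,
        "user_id": user_id,
        # Selects the data portal project type (static data comes from the portal).
        "config_type": "cryoet_data_portal",
        # All newly created data for the project is written below this URI.
        "overlay_root": overlay_root,
        "dataset_ids": list(dataset_ids),
        # Forwarded to the filesystem implementation of the overlay backend.
        "overlay_fs_args": overlay_fs_args or {},
    }
    Path(path).write_text(json.dumps(config, indent=4))
    return config


ribosome = {
    "name": "ribosome",
    "is_particle": True,
    "label": 1,
    "color": [0, 117, 220, 255],
    "go_id": "GO:0022626",
    "radius": 150.0,
}

config = write_cdp_config(
    "config.json",
    name="Example Project",
    overlay_root="local:///path/to/copick_project/",
    dataset_ids=[10301, 10302],
    pickable_objects=[ribosome],
    overlay_fs_args={"auto_mkdir": True},
    user_id="example.user",
)

# Assumed loader, not executed here:
# import copick
# root = copick.from_file("config.json")
```

Swapping overlay_root and overlay_fs_args produces the S3, SMB or SSH variants of the same template.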

Set up your overlay project

This directory will contain all newly created data for your project.

Make sure it exists on the filesystem and is writable:

test -d /path/to/copick_project && test -w /path/to/copick_project && echo "writable"
# Replace /path/to/copick_project with the path to your project overlay

If it does not yet exist, create it with the following command:

mkdir -p /path/to/copick_project
# Replace /path/to/copick_project with the path to your project overlay
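The same check can be done from Python. This minimal sketch uses only the standard library. ensure_overlay_dir is a hypothetical helper, not part of copick. Like mkdir -p, it creates missing parent directories and does nothing if the directory already exists. It then raises if the directory is not writable.

```python
import os
from pathlib import Path


def ensure_overlay_dir(path):
    """Create the overlay directory if needed and confirm it is writable.

    Returns the resolved path of the directory.
    """
    overlay = Path(path).expanduser()
    # Equivalent of `mkdir -p`: create parents, no error if it already exists.
    overlay.mkdir(parents=True, exist_ok=True)
    # Equivalent of `test -w`: check write permission for the current user.
    if not os.access(overlay, os.W_OK):
        raise PermissionError(f"Overlay directory is not writable: {overlay}")
    return overlay.resolve()
```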

In the config file, pass this location in the overlay_root field. Any arguments specified in the overlay_fs_args field are passed to fsspec's LocalFileSystem.

{
  "overlay_root": "local:///path/to/copick_project",
  "overlay_fs_args": {
    "auto_mkdir": true
  }
}
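The overlay_root value is a URI. Its scheme (local, s3, smb, ssh) selects the filesystem, and the remainder gives the location on that filesystem. The following sketch uses a hypothetical helper, split_overlay_root. It only approximates how fsspec-style URIs are split and is not copick's own parsing code. It also explains the three slashes in local:///, smb:/// and ssh:/// URIs: the host part is empty. For SMB and SSH, the host therefore has to be supplied in overlay_fs_args instead.

```python
from urllib.parse import urlsplit


def split_overlay_root(overlay_root):
    """Split an overlay_root URI into (protocol, path on that filesystem)."""
    parts = urlsplit(overlay_root)
    # For "local:///..." or "ssh:///..." the host (netloc) is empty and the path
    # is absolute; for "s3://bucket/..." the bucket name sits in the netloc.
    return parts.scheme, parts.netloc + parts.path


print(split_overlay_root("local:///path/to/copick_project/"))
```

For an S3 URI such as s3://bucket/copick_project/, the bucket name ends up at the start of the path (bucket/copick_project/).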
More about overlay_fs_args

The auto_mkdir flag is required so that copick can create its directories when they do not yet exist.

Set up your static project

For CZ cryoET Data Portal datasets, setting up the static project only requires specifying one or more dataset IDs. The example below selects runs from datasets 10301 and 10302.

{
  "dataset_ids": [
    10301,
    10302
  ]
}
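You can sanity-check the data-portal-specific fields before loading a project. The helper check_cdp_fields is hypothetical and not part of copick. It assumes two things: dataset IDs are positive integers, as in the examples, and config_type must be spelled exactly as in the templates, cryoet_data_portal. It returns a list of problems instead of raising, so all issues are reported at once.

```python
def check_cdp_fields(config):
    """Hypothetical pre-flight check for data-portal-specific config fields.

    Returns a list of problem descriptions; an empty list means no issues found.
    """
    problems = []
    config_type = config.get("config_type")
    if config_type != "cryoet_data_portal":
        problems.append(f"config_type is {config_type!r}, expected 'cryoet_data_portal'")

    ids = config.get("dataset_ids")
    if not isinstance(ids, list) or not ids:
        problems.append("dataset_ids must be a non-empty list")
        return problems

    seen = set()
    for ds_id in ids:
        # bool is a subclass of int in Python, so rule it out explicitly.
        if isinstance(ds_id, bool) or not isinstance(ds_id, int) or ds_id <= 0:
            problems.append(f"invalid dataset ID: {ds_id!r}")
        elif ds_id in seen:
            problems.append(f"duplicate dataset ID: {ds_id}")
        else:
            seen.add(ds_id)
    return problems
```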

Configuration Type

When using the CZ cryoET Data Portal, set the config_type field to cryoet_data_portal.

S3 backend

Configuration Template
{
    "name": "Example Project",
    "description": "This is an example project, demonstrating overlaying an S3 backend on a CZ cryoET Data Portal dataset.",
    "version": "0.5.0",
    "pickable_objects": [
        {
            "name": "ribosome",
            "is_particle": true,
            "label": 1,
            "color": [
                0,
                117,
                220,
                255
            ],
            "go_id": "GO:0022626",
            "radius": 150.0
        },
        {
            "name": "atpase",
            "is_particle": true,
            "label": 2,
            "color": [
                251,
                192,
                147,
                255
            ],
            "go_id": "GO:0045259",
            "radius": 150.0
        },
        {
            "name": "membrane",
            "is_particle": false,
            "label": 3,
            "color": [
                200,
                200,
                200,
                255
            ],
            "go_id": "GO:0016020",
            "radius": 10.0
        }
    ],
    "user_id": "example.user",
    "config_type": "cryoet_data_portal",
    "overlay_root": "s3://bucket/copick_project/",
    "dataset_ids": [
        10301,
        10302
    ],
    "overlay_fs_args": {
        "profile": "your_profile"
    }
}

Set up your overlay project

This S3 URI will contain all newly created data for your project.

Make sure the intended S3 bucket is writable:

echo "Hello, World!" > test.txt
aws s3 cp test.txt s3://your-bucket-name/copick_project/test.txt
aws s3 ls s3://your-bucket-name/copick_project/
aws s3 rm s3://your-bucket-name/copick_project/test.txt
# Replace s3://your-bucket-name/copick_project/ with your S3 URI

AWS authentication

Make sure you have the necessary AWS credentials set up and available in the shell you're running the above commands in. Refer to the AWS CLI documentation for more information.

In the config file, pass this location in the overlay_root field. Any arguments specified in the overlay_fs_args field are passed to S3FileSystem from the s3fs package. profile should be one of the profiles set up in your ~/.aws/credentials file.

{
  "overlay_root": "s3://bucket-name/copick_project/",
  "overlay_fs_args": {
    "profile": "example_profile"
  }
}
More about overlay_fs_args

Specifying profile is one possible way of setting up AWS credentials. Refer to the S3FS documentation for detailed information.
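If you rely on profile, you can catch typos in the profile name by listing the sections of the shared credentials file. This sketch uses only Python's configparser, and both helpers are hypothetical. It assumes the default location ~/.aws/credentials. It does not read ~/.aws/config, where profiles are written as [profile name] sections, so a profile defined only there will not be listed.

```python
import configparser
from pathlib import Path


def available_aws_profiles(credentials_path="~/.aws/credentials"):
    """List profile names defined in an AWS shared credentials file."""
    # Interpolation is disabled so '%' in secrets cannot cause parse errors.
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips a missing file, which then yields no profiles.
    parser.read(Path(credentials_path).expanduser())
    # In the credentials file, each profile is a bare [name] section.
    return parser.sections()


def check_profile(profile, credentials_path="~/.aws/credentials"):
    """Raise ValueError if the profile named in overlay_fs_args is not defined."""
    profiles = available_aws_profiles(credentials_path)
    if profile not in profiles:
        raise ValueError(f"Profile {profile!r} not found; available: {profiles}")
```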

For local MinIO buckets, the following config may be appropriate:

{
    "overlay_fs_args": {
        "key": "bucketkey",
        "secret": "bucketsecret",
        "endpoint_url": "http://10.30.121.49:7070",
        "client_kwargs": {
            "region_name": "us-east-1"
        }
    }
}

Set up your static project

For CZ cryoET Data Portal datasets, setting up the static project only requires specifying one or more dataset IDs. The example below selects runs from datasets 10301 and 10302.

{
  "dataset_ids": [
    10301,
    10302
  ]
}

Configuration Type

When using the CZ cryoET Data Portal, set the config_type field to cryoet_data_portal.

SMB backend

Configuration Template
{
    "name": "Example Project",
    "description": "This is an example project, demonstrating overlaying an SMB backend on a CZ cryoET Data Portal dataset.",
    "version": "0.5.0",
    "pickable_objects": [
        {
            "name": "ribosome",
            "is_particle": true,
            "label": 1,
            "color": [
                0,
                117,
                220,
                255
            ],
            "go_id": "GO:0022626",
            "radius": 150.0
        },
        {
            "name": "atpase",
            "is_particle": true,
            "label": 2,
            "color": [
                251,
                192,
                147,
                255
            ],
            "go_id": "GO:0045259",
            "radius": 150.0
        },
        {
            "name": "membrane",
            "is_particle": false,
            "label": 3,
            "color": [
                200,
                200,
                200,
                255
            ],
            "go_id": "GO:0016020",
            "radius": 10.0
        }
    ],
    "user_id": "example.user",
    "config_type": "cryoet_data_portal",
    "overlay_root": "smb:///shared_drive/copick_project/",
    "dataset_ids": [
        10301,
        10302
    ],
    "overlay_fs_args": {
        "host": "192.158.1.38",
        "username": "user.name",
        "password": "1234",
        "temppath": "/shared_drive",
        "auto_mkdir": true
    }
}

Set up your overlay project

This SMB share will contain all newly created data for your project.

In the config file, pass this location in the overlay_root field. Any arguments specified in the overlay_fs_args field are passed to fsspec's SMBFileSystem.

{
    "overlay_root": "smb:///shared_drive/copick_project/",
    "overlay_fs_args": {
        "host": "192.158.1.38",
        "username": "user.name",
        "password": "1234",
        "temppath": "/shared_drive",
        "auto_mkdir": true
    }
}
More about overlay_fs_args

The auto_mkdir flag is required so that copick can create its directories when they do not yet exist. The temppath argument is not strictly necessary. Whether you need it depends on your SMB setup, e.g. whether only a specific directory is shared. Without it, the configuration reads:

{
    "overlay_root": "smb:///shared_drive/copick_project/",
    "overlay_fs_args": {
        "host": "192.158.1.38",
        "username": "user.name",
        "password": "1234",
        "auto_mkdir": true
    }
}

Set up your static project

For CZ cryoET Data Portal datasets, setting up the static project only requires specifying one or more dataset IDs. The example below selects runs from datasets 10301 and 10302.

{
  "dataset_ids": [
    10301,
    10302
  ]
}

Configuration Type

When using the CZ cryoET Data Portal, set the config_type field to cryoet_data_portal.

SSH backend

Configuration Template
{
    "name": "Example Project",
    "description": "This is an example project, demonstrating overlaying an SSH backend on a CZ cryoET Data Portal dataset.",
    "version": "0.5.0",
    "pickable_objects": [
        {
            "name": "ribosome",
            "is_particle": true,
            "label": 1,
            "color": [
                0,
                117,
                220,
                255
            ],
            "go_id": "GO:0022626",
            "radius": 150.0
        },
        {
            "name": "atpase",
            "is_particle": true,
            "label": 2,
            "color": [
                251,
                192,
                147,
                255
            ],
            "go_id": "GO:0045259",
            "radius": 150.0
        },
        {
            "name": "membrane",
            "is_particle": false,
            "label": 3,
            "color": [
                200,
                200,
                200,
                255
            ],
            "go_id": "GO:0016020",
            "radius": 10.0
        }
    ],
    "user_id": "example.user",
    "config_type": "cryoet_data_portal",
    "overlay_root": "ssh:///hpc/storage/copick_project/",
    "dataset_ids": [
        10301,
        10302
    ],
    "overlay_fs_args": {
        "username": "user.name",
        "host": "hpc.example.com",
        "port": 22
    }
}

Set up your overlay project

This directory will contain all newly created data for your project.

SSH authentication

Copick will work best via SSH if you have set up passwordless SSH authentication. Refer to the SSH documentation for more information. In general, adding plain text passwords into copick configuration files is strongly discouraged.

If two-factor authentication (2FA) is mandatory, you may need to set up an SSH tunnel to the remote filesystem, e.g.

ssh -L 2222:localhost:22 user.name@hpc.example.com

Then use localhost as the host and 2222 as the port in the config and in the commands below.

Make sure it exists on the remote filesystem and is writable:

ssh -p 22 user.name@hpc.example.com "test -d /path/to/copick_project && test -w /path/to/copick_project && echo writable"
# Replace port, user name and path to the project overlay with the correct values

If it does not yet exist, create it with the following command:

ssh -p 22 user.name@hpc.example.com "mkdir -p /path/to/copick_project"
# Replace port, user name and path to the project overlay with the correct values
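The username, host, port and path used in these commands should match the values you later put into overlay_fs_args and overlay_root. As a sketch, a hypothetical helper ssh_check_command can build a combined create-and-check command from the same dictionary. It assumes port 22 when none is given. The returned argument list could then be passed to subprocess.run.

```python
import shlex


def ssh_check_command(overlay_fs_args, remote_path):
    """Build an ssh argv list that creates remote_path if needed and checks it is writable.

    Reuses the username/host/port keys of an SSH overlay_fs_args dictionary.
    """
    target = f"{overlay_fs_args['username']}@{overlay_fs_args['host']}"
    # Quote the path so spaces or shell metacharacters survive the remote shell.
    quoted = shlex.quote(remote_path)
    remote = f"mkdir -p {quoted} && test -w {quoted}"
    return ["ssh", "-p", str(overlay_fs_args.get("port", 22)), target, remote]


cmd = ssh_check_command(
    {"username": "user.name", "host": "hpc.example.com", "port": 22},
    "/path/to/copick_project",
)
# import subprocess; subprocess.run(cmd, check=True)  # not executed here
```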

In the config file, pass this location in the overlay_root field. Any arguments specified in the overlay_fs_args field are passed to sshfs.SSHFileSystem.

{
  "overlay_root": "ssh:///path/to/copick_project/",
  "overlay_fs_args": {
    "username": "user.name",
    "host": "hpc.example.com",
    "port": 22
  }
}
More about overlay_fs_args

The username, host and port fields are necessary to set up the SSH connection. Refer to the SSHFS documentation for detailed information.

An easy way to use the SSH filesystem is to tunnel to the remote filesystem via SSH, e.g.

ssh -L 2222:localhost:22 user.name@hpc.example.com

Then use localhost as the host and 2222 as the port in the config and in the commands above:

{
  "overlay_fs_args": {
      "username": "user.name",
      "host": "localhost",
      "port": 2222
  }
}
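When you use a tunnel, copick can only connect while the tunnel is open. This minimal standard-library sketch uses a hypothetical helper, tunnel_is_open. It checks that something accepts TCP connections on the forwarded port. It does not verify that the SSH login itself will succeed.

```python
import socket


def tunnel_is_open(host="localhost", port=2222, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises an OSError subclass on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running tunnel_is_open() before opening the project gives a clearer error than a failed filesystem call when the ssh -L session has been closed.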

Set up your static project

For CZ cryoET Data Portal datasets, setting up the static project only requires specifying one or more dataset IDs. The example below selects runs from datasets 10301 and 10302.

{
  "dataset_ids": [
    10301,
    10302
  ]
}

Configuration Type

When using the CZ cryoET Data Portal, set the config_type field to cryoet_data_portal.