Exporting and Importing Data

CATMAID currently offers two different tool sets for importing. A front-end in Django’s admin interface is available only for importing project and stack information. If you want to import tracing data, you have to use the command line.

Exporting and importing neuron tracing data

Two management commands for Django’s manage.py tool are available in CATMAID that allow exporting and importing neuron tracing data. They are called catmaid_export_data and catmaid_import_data. To use them, you have to be in the virtualenv and it is probably easiest to work from the django/projects/ directory.

Exporting data

At the moment, the export command is able to create a JSON representation of neurons, connectors, tags and annotations. To constrain the exported neurons, annotations can be used. To export data, you have to use the catmaid_export_data command:

manage.py catmaid_export_data

Adding the --help option will show an overview of all available options. When called without any option, the command will ask the user for the project to export from and will start exporting the whole project right away. Use the additional options to be more precise about what should be exported.

Without any parameter, everything is exported. The type of data to be exported can be adjusted by the --notreenodes, --noconnectors, --noannotations and --notags parameters. To constrain the exported neurons, the --required-annotation option can be used. For instance, to export all neurons from the project with ID 1 that are annotated with “Kenyon cells”, one would have to call:

manage.py catmaid_export_data --source 1 --required-annotation "Kenyon cells"

This will create a file called export_pid_1.json.
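When exports are scripted, the command line can be assembled programmatically. The following is a minimal sketch and not part of CATMAID: the helper name build_export_command is hypothetical, and it only uses the options mentioned above.

```python
def build_export_command(project_id, required_annotation=None, skip=()):
    """Assemble an argument list for manage.py catmaid_export_data.

    `skip` may contain "treenodes", "connectors", "annotations" or "tags";
    each entry becomes the matching --no<type> option described above.
    """
    allowed = {"treenodes", "connectors", "annotations", "tags"}
    unknown = set(skip) - allowed
    if unknown:
        raise ValueError(f"unknown data types: {sorted(unknown)}")

    cmd = ["python", "manage.py", "catmaid_export_data",
           "--source", str(project_id)]
    if required_annotation is not None:
        cmd += ["--required-annotation", required_annotation]
    for data_type in skip:
        cmd.append(f"--no{data_type}")
    return cmd
```

The resulting list could be passed to subprocess.run(cmd, check=True), run inside the virtualenv from the directory that contains manage.py.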

Importing data

The JSON file generated in the previous section can be used to import data into a CATMAID instance. Currently, the importer won’t change the primary key IDs in the input data, so be aware of potential data loss if you import into an instance with existing data. For now, it is therefore only practical to import data into a new CATMAID instance. To do this, the catmaid_import_data management command has to be used:

manage.py catmaid_import_data

You can use the --help switch to get an overview of the available options. Like the exporter, the importer will ask the user if it needs more information. Currently, the source file and a user are required. The importer does not currently preserve ownership of the imported models. The user (and editor and reviewer) is needed to override the information in the source data set.

Assuming a file called export_pid_1.json is available and a new CATMAID project with ID 1 has been created, the following command will start the import:

manage.py catmaid_import_data --source export_pid_1.json --target 1

The tool will ask for a user to use for all data before it actually starts the import.
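Because primary keys are kept as they are, it can help to look at what an export file contains before importing it. The sketch below counts records per model. It assumes that the export is a JSON list of objects that each carry a "model" key, as produced by Django’s serialization framework; this layout is an assumption and is not confirmed by the text above. The function name count_exported_models is hypothetical.

```python
import json
from collections import Counter


def count_exported_models(path):
    """Count the records per model in an export file.

    Assumption: the file holds a JSON list of objects with a "model" key
    (the layout used by Django's serializers).
    """
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    return Counter(record["model"] for record in records)
```

Calling count_exported_models("export_pid_1.json") would then return a mapping from model name to record count.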

Importing project and stack information

Image data in CATMAID is referenced by stack mirrors, which belong to a particular stack. Stacks in turn are organized in projects. The data used by a stack mirror can have one of various types of data sources. A simple and often used source is a simple folder structure of tiled image data for each stack. To be accessible, a stack mirror’s image base has to give access to such a folder from the web. Of course, stacks, stack mirrors and projects can be created by hand, but there is also an importing tool available in Django’s admin interface. It can be found under Custom Views and is named Image data importer. For now, the importing tool only supports this standard data source.

Therefore, the importing tool expects a certain data folder layout to work on and also relies on so-called project files (which are very simple) to identify potential projects. The next section introduces the project file format; after that, the data layout is explained.

How to use the importing tool will be shown in the last section.

Project Files

If the importing tool encounters a folder containing a file called project.yaml, it treats that folder as a potential project. If this file is not available, the folder is ignored. If the file is there, it gets parsed, and if it contains all the information the tool is looking for, the project can be imported. So let’s assume we have a project with two stacks, each having one image data copy, in a folder with the following layout:

project1/
  project.yaml
  stack1/
  stack2/

A project file contains the basic properties of a project and its associated stacks. It is a simple YAML file and could look like this for the example above:

project:
    title: "Wing Disc 1"
    stacks:
      - title: "Channel 1"
        description: "PMT Offset: 10, Laser Power: 0.5, PMT Voltage: 550"
        dimension: "(3886,3893,55)"
        resolution: "(138.0,138.0,1.0)"
        zoomlevels: 2
        mirrors:
          - title: "Channel 1 overlay"
            folder: "stack1"
            fileextension: "jpg"
      - title: "Channel 2"
        description: "PMT Offset: 10, Laser Power: 0.7, PMT Voltage: 500"
        dimension: "(3886,3893,55)"
        resolution: "(138.0,138.0,1.0)"
        zoomlevels: 2
        mirrors:
          - title: Channel 2 image data
            folder: "stack2"
            fileextension: "jpg"
        stackgroups:
          - title: "Example group"
            relation: "has_channel"
      - title: "Remote stack"
        dimension: "(3886,3893,55)"
        resolution: "(138.0,138.0,1.0)"
        zoomlevels: 3
        translation: "(10.0, 20.0, 30.0)"
        mirrors:
          - tile_width: 512
            tile_height: 512
            tile_source_type: 2
            fileextension: "png"
            url: "http://my.other.server.net/examplestack/"
        stackgroups:
          - title: "Example group"
            relation: "has_channel"

As can be seen, a project has only two properties: a title and a set of stacks. A stack, however, needs more information. In general, there are two ways to specify the data source of a stack mirror: 1. an optional path and a folder, which together are expected to be relative to the IMPORTER_DEFAULT_IMAGE_BASE setting, or 2. a url, which is used as the stack mirror’s image base.

The first stack in the example above is based on a folder in the same directory as the project file. The folder property names the image data folder for this stack, relative to the project file. The name of the stack is stored in the title field, and metadata (which is shown when a stack is displayed) can be added with the metadata property. A stack also needs dimension and resolution information. The dimension is the stack’s X, Y and Z extent in pixels. The resolution should be given in nanometers per pixel, in X, Y and Z.
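Dimension and resolution are written as parenthesized triples in the project file. As a minimal sketch of how the two relate, the following parses such strings and multiplies them to get the physical extent of a stack in nanometers. The helper names parse_triple and physical_extent are hypothetical and not part of CATMAID.

```python
def parse_triple(text):
    """Parse a string like "(3886,3893,55)" into a tuple of three floats."""
    parts = text.strip().strip("()").split(",")
    if len(parts) != 3:
        raise ValueError(f"expected three components, got {text!r}")
    return tuple(float(part) for part in parts)


def physical_extent(dimension, resolution):
    """Multiply pixel extent by nanometers per pixel, per axis."""
    dims = parse_triple(dimension)
    res = parse_triple(resolution)
    return tuple(d * r for d, r in zip(dims, res))


# Values taken from the first stack of the example project file.
print(physical_extent("(3886,3893,55)", "(138.0,138.0,1.0)"))
# (536268.0, 537234.0, 55.0)
```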

In addition to the folder information, the second stack above uses the zoomlevels field to declare the number of available zoom levels. It also specifies the file extension of the image files with the fileextension key. Both fields are required.

The last stack in the example above doesn’t use a local stack folder, but declares the stack mirror’s image base explicitly by using the url setting. Besides the fileextension field that the folder-based stack mirrors also use, the url-based stack mirror in the example sets the tile_width, tile_height and tile_source_type fields. The corresponding stack defines the resolution and dimension fields.
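The field requirements described so far can be checked before an import is attempted. The following is a rough sketch that works on an already parsed stack entry (a plain dictionary). Which fields the importer actually enforces is an assumption based on the description above, and check_stack is a hypothetical helper.

```python
def check_stack(stack):
    """Return a list of problems found in one stack entry of a project file."""
    problems = []
    # Stack-level fields described in the text above.
    for key in ("title", "dimension", "resolution", "zoomlevels"):
        if key not in stack:
            problems.append(f"stack is missing '{key}'")
    for index, mirror in enumerate(stack.get("mirrors", [])):
        if "fileextension" not in mirror:
            problems.append(f"mirror {index} is missing 'fileextension'")
        # A mirror gets its data either from a local folder or from a URL.
        if "folder" not in mirror and "url" not in mirror:
            problems.append(f"mirror {index} needs a 'folder' or a 'url'")
    return problems
```

An empty result means that none of these checks failed; it does not guarantee that the import itself succeeds.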

CATMAID can link stacks to so-called stack groups. These are general data structures that relate stacks to each other, for instance to denote that they represent channels of the same data, orthogonal views or simple overlays. There is no limit on how many stack groups a stack can be part of. Each stack in a project file can reference stack groups by title and by the type of relation the stack has to the stack group. At the moment, valid relations are has_channel and has_view. All stacks referencing a stack group with the same name will be linked to the same new stack group in the new project. In the example above, a single stack group named “Example group” will be created, having stacks 2 and 3 as members, each representing a layer/channel. Stack groups are used by the front-end to open multiple stacks at once in a more intelligent fashion (e.g. open multi-channel stack groups as layers in the same viewer).
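To illustrate how stacks that reference the same title end up in one group, the sketch below collects the stack group links from an already parsed project entry. It is only an illustration of the grouping rule described above, not the importer’s code; collect_stack_groups is a hypothetical name.

```python
from collections import defaultdict


def collect_stack_groups(project):
    """Map each stack group title to its (stack title, relation) members."""
    groups = defaultdict(list)
    for stack in project.get("stacks", []):
        for link in stack.get("stackgroups", []):
            groups[link["title"]].append((stack["title"], link["relation"]))
    return dict(groups)


# Reduced version of the example project file, as a parsed dictionary.
project = {
    "title": "Wing Disc 1",
    "stacks": [
        {"title": "Channel 1"},
        {"title": "Channel 2",
         "stackgroups": [{"title": "Example group", "relation": "has_channel"}]},
        {"title": "Remote stack",
         "stackgroups": [{"title": "Example group", "relation": "has_channel"}]},
    ],
}
print(collect_stack_groups(project))
```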

All specified stacks within a project are linked into a single space. By default each stack origin is mapped to the project space origin (0,0,0). An optional translation can be applied to this mapping: If a stack has a translation field, the stack is mapped with this offset into project space. Note that this translation is in project space coordinates (physical space, nanometers). The example above will link the last stack (“Remote stack”) to the project “Wing Disc 1” with an offset of (10.0, 20.0, 30.0) nanometers. Both other stacks will be mapped to the project space origin.
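A minimal sketch of this mapping, assuming that a stack coordinate is first scaled by the stack resolution and then shifted by the translation (an assumption that follows from the translation being given in nanometers, and that ignores any stack orientation). The function name stack_to_project is hypothetical.

```python
def stack_to_project(voxel, resolution, translation=(0.0, 0.0, 0.0)):
    """Map a stack voxel coordinate to project space (nanometers)."""
    return tuple(v * r + t for v, r, t in zip(voxel, resolution, translation))


# Resolution and translation are taken from the "Remote stack" example.
print(stack_to_project((100, 200, 5), (138.0, 138.0, 1.0), (10.0, 20.0, 30.0)))
# (13810.0, 27620.0, 35.0)
```

Without a translation, the stack origin (0, 0, 0) maps to the project space origin.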

The tool is not confused by additional YAML data in the project file; it only uses what is depicted in the sample above. Please keep in mind not to use the tab character for indentation (use simple spaces instead), as tabs aren’t allowed there in YAML.

Ontology and classification import

The project files explained in the last section can also be used to import ontologies and classifications. While CATMAID supports arbitrary graphs to represent ontologies and classifications, only tree structures can be imported at the moment.

The project object supports an optional ontology field, which defines an ontology hierarchy with lists of lists. An optional classification field can be used to define a list of ontology paths that get instantiated based on the provided ontology. Classification fields require that an ontology is defined and can be used on project level, stack level and the stackgroup level. Consider this example:

project:
   title: "test"
   ontology:
     - class: 'Metazoa'
       children:
         - relation: 'has_a'
           class: 'Deuterostomia'
         - relation: 'has_a'
           class: 'Protostomia'
           children:
             - relation: 'has_a'
               class: 'Lophotrochozoa'
               children:
                 - relation: 'has_a'
                   class: 'Nematostella'
                   children:
                     - relation: 'has_a'
                       class: 'Lineus longissimus'
   stackgroups:
     - title: 'Test group'
       classification:
          - ['Metazoa', 'Protostomia', 'Lophotrochozoa', 'Nematostella', 'Lineus longissimus']
   stacks:
     - title: "Channel 1"
       description: "PMT Offset: 10, Laser Power: 0.5, PMT Voltage: 550"
       dimension: "(1024,1024,800)"
       resolution: "(2.0,2.0,1.0)"
       zoomlevels: 1
       translation: "(10.0, 20.0, 30.0)"
       classification:
          - ['Metazoa', 'Deuterostomia']
       mirrors:
          - title:  Channel 1
            url: "https://example.org/data/imagestack/"
            fileextension: "jpg"
     - title: "Channel 1"
       description: "PMT Offset: 10, Laser Power: 0.5, PMT Voltage: 550"
       dimension: "(1024,1024,800)"
       resolution: "(2.0,2.0,1.0)"
       zoomlevels: 1
       translation: "(10.0, 20.0, 30.0)"
       mirrors:
         - title: Channel 1
           url: "https://example.org/data/imagestack-sample-108/"
           fileextension: "jpg"
       stackgroups:
        - title: "Test group"
          relation: "has_channel"
     - title: "Channel 2"
       description: "PMT Offset: 10, Laser Power: 0.5, PMT Voltage: 550"
       dimension: "(1024,1024,800)"
       resolution: "(2.0,2.0,1.0)"
       zoomlevels: 1
       mirrors:
        - title: Channel 2
          folder: "Sample108_FIB_catmaid copy"
          fileextension: "jpg"
       stackgroups:
        - title: "Test group"
          relation: "has_channel"

The project-level ontology definition represents an ontology with the root node “Metazoa”, which has two children: “Deuterostomia” and “Protostomia”, connected through a “has_a” relation. While the first child is a leaf node and has no children, the second child has a child node as well (and so on). It is possible to have multiple roots (i.e. separate ontology graphs) and multiple children, because both are lists.

Individual stacks and stackgroups are then allowed to instantiate a certain path of the ontology and be linked to the leaf node of the path. They do this by supporting a classification field. The example creates two classification paths and links one leaf node to the stack group and one to an individual stack.

Currently, the importer expects that any two classes are related only a single time on the ontology level. This allows for an easier file syntax with a simple list. An import will fail if the ontology defined in the project doesn’t contain a class used in a classification.
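Since an import fails when a classification uses a class that the ontology lacks, it can be useful to check the paths beforehand. The following sketch walks a parsed ontology (nested lists of dictionaries with class and children keys, as in the example) along a classification path. It is an illustration only, and path_in_ontology is a hypothetical helper.

```python
def path_in_ontology(ontology, path):
    """Return True if `path` (a list of class names) exists in the ontology tree."""
    nodes = ontology
    for class_name in path:
        match = next((node for node in nodes if node["class"] == class_name), None)
        if match is None:
            return False
        # Descend into the children of the matched class, if any.
        nodes = match.get("children", [])
    return True


# Upper part of the example ontology, as a parsed structure.
ontology = [
    {"class": "Metazoa", "children": [
        {"relation": "has_a", "class": "Deuterostomia"},
        {"relation": "has_a", "class": "Protostomia", "children": [
            {"relation": "has_a", "class": "Lophotrochozoa"},
        ]},
    ]},
]
print(path_in_ontology(ontology, ["Metazoa", "Deuterostomia"]))
```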

File and Folder Layout

The importing tool expects a certain file and folder layout to work with. It assumes that there is one data folder per CATMAID instance that is accessible from the outside world and is referred to within a stack mirror’s image base (if the project file refers to folders). As an example, let’s say a link named data has been placed in CATMAID’s httpdocs directory. This link points to your actual data storage, which has a layout like the following:

data/
  project1/
  project2/
  project3/
  tests/
    project4/

Each project folder has contents similar to the example in the previous section. Because the link is placed in the httpdocs directory, the data is already accessible under the following URL, provided your webserver user has read permission on it:

http://<CATMAID-URL>/data

A typical URL to a tile of a stack could then look like this (if you use jpeg as the file extension):

http://<CATMAID-URL>/data/project1/stack1/0/0_0_0.jpeg
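The pattern of such a tile URL can be sketched as a small function. The order of the components in the file name (row, column, zoom level) is an assumption here; the example URL contains only zeros, so it does not confirm the order. The name tile_url is hypothetical.

```python
def tile_url(image_base, z, row, col, zoom_level, file_extension):
    """Build a tile URL of the form <image base>/<z>/<row>_<col>_<zoom>.<ext>.

    Assumption: the file name components are row, column and zoom level,
    in that order.
    """
    base = image_base.rstrip("/")
    return f"{base}/{z}/{row}_{col}_{zoom_level}.{file_extension}"


# The host name is a placeholder for the CATMAID URL.
print(tile_url("http://catmaid.example.org/data/project1/stack1/", 0, 0, 0, 0, "jpeg"))
```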

The importer uses this data directory or a folder below it as its working directory. In this folder it treats every sub-directory as a potential project directory and tests whether it contains a project file named project.yaml. If this file is found, the folder remains a potential project. If the project file is not available, the folder is ignored.
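The folder scan described above can be sketched in a few lines. This is an illustration of the rule, not the importer’s implementation, and find_project_folders is a hypothetical name.

```python
from pathlib import Path


def find_project_folders(working_dir):
    """Return the sub-directories of `working_dir` that contain a project.yaml."""
    return sorted(
        entry for entry in Path(working_dir).iterdir()
        if entry.is_dir() and (entry / "project.yaml").is_file()
    )
```

Only direct sub-directories are considered in this sketch, so with the layout above, tests/project4 would be found only if tests is used as the working directory.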

Importing skeletons through the API

The CATMAID API supports raw skeleton data import using SWC files. As can be seen under /apis, the {project_id}/skeletons/import URL can be used to import skeletons that are represented as SWC. The script scripts/remote/upload_swc.py can be of help here. It is also possible to just use cURL for this:

curl --basic -u fly -X POST --form file=@<file-name> \
    <catmaid_url>/<project_id>/skeletons/import \
    --header "X-Authorization: Token <api-token>"
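The same request can be prepared from Python. The sketch below uses only the standard library and builds a multipart form upload with a single file field and the X-Authorization header, mirroring the cURL call. The function name build_swc_import_request is hypothetical, and the HTTP basic authentication part of the cURL call is left out.

```python
import uuid
import urllib.request
from pathlib import Path


def build_swc_import_request(catmaid_url, project_id, api_token, swc_path):
    """Prepare a POST request that uploads an SWC file as form field "file"."""
    swc_path = Path(swc_path)
    boundary = uuid.uuid4().hex
    # Multipart body with one part: the SWC file content.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        (f'Content-Disposition: form-data; name="file"; '
         f'filename="{swc_path.name}"\r\n').encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        swc_path.read_bytes(),
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    url = f"{catmaid_url.rstrip('/')}/{project_id}/skeletons/import"
    return urllib.request.Request(url, data=body, method="POST", headers={
        "X-Authorization": f"Token {api_token}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    })
```

The prepared request could then be sent with urllib.request.urlopen(request).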

Using the importer admin tool

The importer offers to import from local project files, remote CATMAID instances or remote project files/exports.

To use the importer with project files, you have to adjust your CATMAID settings file to make your data path known to CATMAID. This can be done with the CATMAID_IMPORT_PATH setting. Sticking to the examples from before, this setting might be:

CATMAID_IMPORT_PATH = <CATMAID-PATH>/httpdocs/data

For imported stack mirrors that don’t provide an image URL by themselves, CATMAID can construct an image base from the IMPORTER_DEFAULT_IMAGE_BASE setting plus the imported project and stack names. For the example above, this variable could be set to:

IMPORTER_DEFAULT_IMAGE_BASE = http://<CATMAID-URL>/data
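How such an image base could be put together is sketched below. The exact way CATMAID joins the parts is an assumption here (base URL, optional relative path, project folder, stack folder, in that order), and default_image_base is a hypothetical helper.

```python
def default_image_base(base_url, project_folder, stack_folder, relative_path=""):
    """Join the default image base with project and stack folder names.

    Assumption: the parts are joined in the order base URL, relative path,
    project folder, stack folder.
    """
    parts = [base_url.rstrip("/")]
    parts += [part.strip("/")
              for part in (relative_path, project_folder, stack_folder) if part]
    return "/".join(parts) + "/"


# The host name is a placeholder for the CATMAID URL.
print(default_image_base("http://catmaid.example.org/data", "project1", "stack1"))
```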

With this in place, the importer can be used through Django’s admin interface. It is listed as Image data importer under Custom Views. The first step is to give the importer more detail about which folders to look in for potential projects:

[Figure: _images/path_setup.png]

With these settings, you can narrow down the set of folders looked at. The relative path setting can be used to specify a sub-directory below the import path. When doing so, the working directory will be changed to CATMAID_IMPORT_PATH plus the relative path. If left empty, just the CATMAID_IMPORT_PATH setting will be used. Additionally, you can filter folders in the working directory by specifying a filter term, which supports Unix shell-style wildcards.

The next setting lets you decide how to deal with already existing (known) projects and what is considered known in the first place. A project can be declared known if the name of an imported project matches the name of an already existing one. Alternatively, it can be considered known if there is a project that is linked to the very same stacks as the project to be imported. A stack in turn is known if there is already a stack with the same mirror image base.

The last setting on this dialog is the Base URL. By default it is set to the value of IMPORTER_DEFAULT_IMAGE_BASE (if available). This setting plus the relative path stay the same for every project to be imported in this run. It is used if imported stacks don’t provide a URL explicitly. To continue, click on the next step button.
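Unix shell-style wildcards behave like the patterns of Python’s fnmatch module. As a sketch of how a filter term narrows down folder names (the importer’s own implementation is not shown here, and filter_folders is a hypothetical name):

```python
from fnmatch import fnmatchcase


def filter_folders(folder_names, filter_term=""):
    """Keep the folder names that match a shell-style wildcard pattern."""
    if not filter_term:
        # An empty filter term keeps every folder.
        return list(folder_names)
    return [name for name in folder_names if fnmatchcase(name, filter_term)]


print(filter_folders(["project1", "project2", "project3", "tests"], "project[12]"))
# ['project1', 'project2']
```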

The importer will tell you if it doesn’t find any projects based on the settings of the first step. However, if it does find potential projects, it allows you to unselect projects that shouldn’t get imported and to add more details:

[Figure: _images/project_setup.png]

Besides deciding which projects to actually import, you can also add tags which will be attached to the new projects. If the tile size differs from the standard, it can be adjusted here. If you want your projects to be accessible publicly, you can mark the corresponding check-box.

When the Check classification links option is selected, the importer tries to suggest existing classification graphs to be linked to the new project(s). These suggestions are optional and based on the tags you entered before. If existing projects have the same tags or a superset of them, their linked classification graphs will be suggested.

The last adjustment to make concerns permissions. With the help of a list box you can select one or more group/permission combinations that will be assigned to the new projects. If everything is how you want it, you can proceed to the next dialog.

The third and last step is a confirmation that shows all the information the importer found about the projects and stacks to be imported. To change something in this import, simply go back to an earlier step using the buttons at the bottom of the page. If all the project and stack properties as well as the tags and permissions are correct, the actual import can start.

In the end, the importer will tell you which projects have been imported and, if there were problems, which ones were not.