Datasets
Overview
The Datasets feature in Nextflow Tower allows users to store CSV and TSV dataset files in a workspace for use as input to one or more pipelines.
For your pipeline to use a dataset as input at runtime, the relevant parameters of your pipeline-schema must include information about the dataset and its file format. We recommend using the nf-core tools schema build feature to simplify schema creation. The nf-core tools commands can also validate and lint your schema file according to best practice guidelines from the nf-core community.
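As a rough illustration of what "information about the dataset and file format" can look like, the sketch below builds a schema fragment for an `input` parameter and prints it as JSON. The attribute names (`format`, `mimetype`, `pattern`) follow common nf-core schema conventions, and every value here is an assumption for illustration, not something this page specifies. Check the schema generated for your own pipeline.

```python
import json

# Hypothetical "input" parameter for a pipeline schema file.
# Attribute names follow common nf-core conventions; the values are
# illustrative assumptions and should be adapted to your pipeline.
input_param = {
    "type": "string",
    "format": "file-path",
    "mimetype": "text/csv",       # assumed way of declaring a CSV dataset
    "pattern": r"^\S+\.csv$",     # only accept paths ending in .csv
    "description": "Path to a CSV sample sheet.",
}

# Parameters normally sit under a "properties" key in a JSON Schema object.
schema_fragment = {"properties": {"input": input_param}}

print(json.dumps(schema_fragment, indent=2))
```

Building the fragment as a dictionary first makes it easy to inspect or adjust before writing it to a file. In practice the nf-core schema build tooling would generate the surrounding schema for you.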
Note
This feature is only available in organization workspaces.
Creating a new Dataset
To create a new dataset, follow these steps:
- Open the Datasets tab in your organization workspace.
- Select New dataset to open the dataset creation dialog.
- Complete the Name and Description fields using information relevant to your dataset.
- Add the dataset file to your workspace using drag-and-drop or the system file explorer dialog.
- Optionally, enable First row as header to customize the dataset view for files that use the first row for column names.
Warning
The dataset file cannot exceed 10 MB in size.
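Before uploading, it can help to check a file locally against the constraints described above. The following is a minimal sketch, not part of Tower: `check_dataset` is a hypothetical helper, the limit is assumed to mean 10 × 1024 × 1024 bytes, and the file type is inferred from the extension only.

```python
import csv
from pathlib import Path

# Assumption: the 10 MB limit is interpreted as 10 * 1024 * 1024 bytes.
MAX_BYTES = 10 * 1024 * 1024

# Supported dataset formats and the delimiter each one uses.
DELIMITERS = {".csv": ",", ".tsv": "\t"}


def check_dataset(path):
    """Validate a dataset file and return its first row as column names.

    Raises ValueError if the extension is not .csv/.tsv or the file is
    larger than MAX_BYTES.
    """
    path = Path(path)
    delimiter = DELIMITERS.get(path.suffix.lower())
    if delimiter is None:
        raise ValueError(f"{path.name}: expected a .csv or .tsv file")

    size = path.stat().st_size
    if size > MAX_BYTES:
        raise ValueError(f"{path.name}: {size} bytes exceeds the size limit")

    # Read only the first row, which is what "First row as header" would use.
    with path.open(newline="") as handle:
        header = next(csv.reader(handle, delimiter=delimiter), [])
    return header
```

Calling `check_dataset("samples.csv")` on a file whose first line is `sample,fastq_1` would return `["sample", "fastq_1"]`, which is a quick way to confirm that the first row really contains column names before enabling the header option.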
Dataset versions
The Datasets feature can accommodate multiple versions of a dataset. To add a new version for a dataset, follow these steps:
- Select Edit next to the dataset you wish to update.
- In the Edit dialog, select Add a new version.
- Upload the newer version of the dataset and select Update.
Warning
All subsequent versions of a dataset must be in the same data format as the initial version.
Using a Dataset
To use a dataset with the saved pipelines in your workspace, follow these steps:
- From the Launchpad, open any pipeline that contains a pipeline-schema.
- Select the input field for the pipeline, removing any default value.
- Pick the desired dataset for your pipeline.
Warning
The datasets shown in the dropdown menu depend on the validation rules in your pipeline-schema. If the schema specifies only the CSV format, no TSV datasets appear in the dropdown.
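The filtering behaviour described in this warning can be pictured with a small sketch. This is not Tower's implementation: the dataset records, the `mimetype` values, and the `selectable_datasets` helper are all hypothetical and only illustrate how a schema that accepts a single format narrows the list.

```python
# Hypothetical dataset records; Tower's real data model may differ.
datasets = [
    {"name": "samples_run1", "mimetype": "text/csv"},
    {"name": "samples_run2", "mimetype": "text/tsv"},
]


def selectable_datasets(datasets, schema_mimetype):
    """Return the names of datasets whose format matches the schema."""
    return [d["name"] for d in datasets if d["mimetype"] == schema_mimetype]


# A schema that accepts only CSV hides the TSV dataset.
print(selectable_datasets(datasets, "text/csv"))  # ['samples_run1']
```

If an expected dataset is missing from the dropdown, comparing its file format with the format declared in the pipeline-schema is a sensible first check.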