Phenotypic and assessment data

Template:

phenotype/
    <measurement_tool_name>.tsv
    <measurement_tool_name>.json

Optional: Yes

If the dataset includes multiple sets of participant-level measurements (for example, responses from multiple questionnaires), they can be split into individual files separate from participants.tsv.

Each of the measurement tool files MUST be kept in a /phenotype directory placed at the root of the BIDS dataset and MUST end with the .tsv extension. Filenames SHOULD be chosen to reflect the contents of the file. For example, the "Adult ADHD Clinical Diagnostic Scale" could be saved in a file called /phenotype/acds_adult.tsv.

The files can include an arbitrary set of columns, but one of them MUST be participant_id and the entries of that column MUST correspond to the subjects in the BIDS dataset and participants.tsv file.

| Column name | Requirement Level | Data type | Description |
|---|---|---|---|
| participant_id | REQUIRED | string | A participant identifier of the form sub-<label>, matching a participant entity found in the dataset. Note that data for one participant MAY be represented across multiple rows in case of multiple sessions or runs, and therefore the entry in the participant_id column will be repeated. The combination of participant_id, session_id and run_id MUST be unique. This column must appear first in the file. |
| session_id | OPTIONAL, but REQUIRED if sessions are defined in the dataset | string | A session identifier of the form ses-<label>, matching a session found in the dataset. A session_id column MUST be added to all tabular files in the phenotype directory as soon as multiple sessions are present in the data set regardless of whether those sessions are in the phenotype/ data, sub-<label>/ data, or a combination of the two. The combination of participant_id, session_id and run_id MUST be unique. This column must appear second in the file. |
| run_id | OPTIONAL, but REQUIRED if there are multiple runs within any session | string | A run identifier that corresponds to an existing run-<index> entity used in a filename(s). A chronological run number is used when a measurement tool or assessment described by a tabular file was repeated within a session. The combination of participant_id, session_id and run_id MUST be unique. This column must appear third in the file. |
| HED | OPTIONAL | string | Hierarchical Event Descriptor (HED) tags. See the HED Appendix for details. This column may appear anywhere in the file. |
| Additional Columns | OPTIONAL | n/a | Additional columns are allowed. |
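
The column rules above can be checked mechanically. The following is a minimal sketch, not part of any official validator: the function name check_phenotype_table is hypothetical, labels are assumed to be alphanumeric, and the ordering rule is interpreted as "whichever of participant_id, session_id and run_id are present must lead the file, in that order".

```python
import csv
import io
import re

# Identifier columns, in the order they are expected to appear.
LEADING_COLUMNS = ["participant_id", "session_id", "run_id"]


def check_phenotype_table(tsv_text):
    """Return a list of problems found in one phenotype TSV (empty if none)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    problems = []

    if "participant_id" not in header:
        return ["missing required participant_id column"]

    # Assumption: identifier columns that are present must lead the file, in order.
    present = [name for name in LEADING_COLUMNS if name in header]
    if header[: len(present)] != present:
        problems.append(f"identifier columns must come first, in order: {present}")

    seen = set()
    for line_number, row in enumerate(reader, start=2):
        # Assumption: a label consists of letters and digits only.
        if not re.fullmatch(r"sub-[0-9a-zA-Z]+", row["participant_id"] or ""):
            problems.append(f"line {line_number}: participant_id is not sub-<label>")
        # The combination of the identifier columns must be unique per row.
        key = tuple(row[name] for name in present)
        if key in seen:
            problems.append(f"line {line_number}: duplicate identifier combination {key}")
        seen.add(key)
    return problems
```

Calling check_phenotype_table on the text of a file such as phenotype/acds_adult.tsv returns an empty list when none of these checks fail. Cross-checking the identifiers against participants.tsv and the sub-<label>/ directories is left out of this sketch.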

As with all other tabular data, the additional tabular phenotypic data MAY be accompanied by a JSON data dictionary file describing the columns in detail (see Tabular files). When the AdditionalValidation key contains "Phenotype" in the dataset_description.json, then the additional tabular phenotypic data MUST be accompanied by a JSON data dictionary file.
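
As an illustration of the pairing rule, the sketch below lists phenotype TSV files that lack a same-named JSON data dictionary. It is a rough check under stated assumptions: missing_data_dictionaries is a hypothetical helper, and AdditionalValidation is assumed to hold a list of strings.

```python
import json
from pathlib import Path


def missing_data_dictionaries(dataset_root):
    """List phenotype TSV files lacking a same-named JSON data dictionary.

    Returns an empty list when "Phenotype" is not requested in
    AdditionalValidation, because the dictionary is then only optional.
    """
    root = Path(dataset_root)
    description = json.loads((root / "dataset_description.json").read_text())
    # Assumption: AdditionalValidation is a list of strings.
    requested = description.get("AdditionalValidation", [])
    if "Phenotype" not in requested:
        return []
    return sorted(
        tsv.name
        for tsv in (root / "phenotype").glob("*.tsv")
        if not tsv.with_suffix(".json").exists()
    )
```

An empty result means every measurement tool file in phenotype/ has its data dictionary, or that the additional validation was not requested.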

In addition to the column descriptions, the JSON file MAY contain the following fields:

| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| MeasurementToolMetadata | OPTIONAL | object | A description of the measurement tool as a whole. Contains two fields: "Description" and "TermURL". "Description" is a free text description of the measurement tool. "TermURL" is a URL to an entity in an ontology corresponding to this tool. RECOMMENDED when AdditionalValidation contains "Phenotype" in dataset_description.json. |
| Derivative | OPTIONAL | boolean | Indicates that values in the corresponding column are transformations of values from other columns (for example a summary score based on a subset of items in a questionnaire). Must be one of: true, false. |

As an example, consider the contents of a file called phenotype/acds_adult.json:

{
  "MeasurementToolMetadata": {
    "Description": "Adult ADHD Clinical Diagnostic Scale V1.2",
    "TermURL": "https://www.cognitiveatlas.org/task/id/trm_5586ff878155d"
  },
  "adhd_b": {
    "Description": "B. CHILDHOOD ONSET OF ADHD (PRIOR TO AGE 7)",
    "Levels": {
      "1": "YES",
      "2": "NO"
    }
  },
  "adhd_c_dx": {
    "Description": "As child met A, B, C, D, E and F diagnostic criteria",
    "Levels": {
      "1": "YES",
      "2": "NO"
    }
  }
}

Please note that in this example MeasurementToolMetadata includes information about the questionnaire, while adhd_b and adhd_c_dx correspond to individual columns.
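
To show how a reader of such a file might use that distinction, here is a small sketch that separates the tool-level entry from the per-column entries and decodes a raw value through Levels. The helper split_data_dictionary is hypothetical, and the JSON is a shortened copy of the example above.

```python
import json


def split_data_dictionary(sidecar):
    """Separate tool-level metadata from per-column descriptions."""
    tool = sidecar.get("MeasurementToolMetadata", {})
    columns = {
        name: entry
        for name, entry in sidecar.items()
        if name != "MeasurementToolMetadata"
    }
    return tool, columns


# Shortened copy of phenotype/acds_adult.json from the example above.
sidecar = json.loads("""
{
  "MeasurementToolMetadata": {
    "Description": "Adult ADHD Clinical Diagnostic Scale V1.2",
    "TermURL": "https://www.cognitiveatlas.org/task/id/trm_5586ff878155d"
  },
  "adhd_b": {
    "Description": "B. CHILDHOOD ONSET OF ADHD (PRIOR TO AGE 7)",
    "Levels": {"1": "YES", "2": "NO"}
  }
}
""")

tool, columns = split_data_dictionary(sidecar)
# Translate the raw value "2" from the adhd_b column into its label.
decoded = columns["adhd_b"]["Levels"]["2"]
```

Note that the keys under Levels are strings, so values read from the TSV file should be compared as text rather than converted to numbers first.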

In addition to the keys available to describe columns in all tabular files (LongName, Description, Levels, Units, and TermURL), the participants.json file as well as phenotypic files can also include column descriptions with a Derivative field that, when set to true, indicates that values in the corresponding column are transformations of values from other columns (for example a summary score based on a subset of items in a questionnaire).
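
A column description using Derivative might look like the sketch below. The column acds_total and its description are invented for illustration and do not come from the actual questionnaire; derived_columns is a hypothetical helper.

```python
import json


def derived_columns(sidecar):
    """Return the names of columns whose description sets Derivative to true."""
    return sorted(
        name
        for name, entry in sidecar.items()
        if isinstance(entry, dict) and entry.get("Derivative") is True
    )


sidecar = {
    "adhd_b": {
        "Description": "B. CHILDHOOD ONSET OF ADHD (PRIOR TO AGE 7)",
        "Levels": {"1": "YES", "2": "NO"},
    },
    # Hypothetical summary column, shown only to illustrate the field.
    "acds_total": {
        "Description": "Summary score computed from individual items",
        "Derivative": True,
    },
}

# Python's True is serialized as the JSON boolean true, not the string "true".
serialized = json.dumps(sidecar["acds_total"])
summary_columns = derived_columns(sidecar)
```

Writing the value as a JSON boolean rather than a quoted string keeps it consistent with the boolean data type given for Derivative.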

Additional validation

When the AdditionalValidation key contains "Phenotype" in the dataset_description.json, the following tabular phenotypic data guidelines apply to phenotypic and assessment data.

  1. Aggregate data across sessions
  2. Always pair tabular data with data dictionaries
  3. Add MeasurementToolMetadata to each tabular phenotypic measurement tool
  4. Ensure minimal annotation for phenotypic and assessment data
  5. Store demographic data in the participants file and instrument data in the phenotype directory

To read more about the guidelines for tabular phenotypic data and examples, see the tabular phenotypic data guidelines appendix.