Phenotypic and assessment data

Template:

phenotype/
    <measurement_tool_name>.tsv
    <measurement_tool_name>.json

Optional: Yes

If the dataset includes multiple sets of participant-level measurements (for example, responses from multiple questionnaires), they can be split into individual files separate from participants.tsv.

Each of the measurement tool files MUST be kept in a /phenotype directory placed at the root of the BIDS dataset and MUST end with the .tsv extension. Filenames SHOULD be chosen to reflect the contents of the file. For example, the "Adult ADHD Clinical Diagnostic Scale" could be saved in a file called /phenotype/acds_adult.tsv.

The files can include an arbitrary set of columns, but one of them MUST be participant_id and the entries of that column MUST correspond to the subjects in the BIDS dataset and participants.tsv file.

Column name	Requirement Level	Data type	Description
participant_id	REQUIRED	string	A participant identifier of the form `sub-<label>`, matching a participant entity found in the dataset. Note that data for one participant MAY be represented across multiple rows in case of multiple sessions or runs, in which case the entry in the `participant_id` column is repeated. The combination of `participant_id`, `session_id` and `run_id` MUST be unique. This column MUST appear first in the file.
session_id	OPTIONAL	string	A session identifier of the form `ses-<label>`, matching a session found in the dataset. A `session_id` column MUST be added to all tabular files in the phenotype directory as soon as multiple sessions are present in the dataset. The combination of `participant_id`, `session_id` and `run_id` MUST be unique. This column MUST appear second in the file.
run_id	OPTIONAL, but REQUIRED if there are multiple runs within any session	string	A run identifier that corresponds to an existing `run-<index>` entity used in one or more filenames. A chronological `run` number is used when a measurement tool or assessment described by a tabular file was repeated within a session. The combination of `participant_id`, `session_id` and `run_id` MUST be unique. This column MUST appear third in the file.
HED	OPTIONAL	string	Hierarchical Event Descriptor (HED) tags. See the HED Appendix for details. This column MAY appear anywhere in the file.
Additional Columns	OPTIONAL	`n/a`	Additional columns are allowed.
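
The column rules in the table above can be checked mechanically. The following is a minimal sketch, not the BIDS validator: `check_phenotype_rows` is a hypothetical helper, the position check reads "first", "second" and "third" literally, and the `sub-<label>` pattern assumes that labels are alphanumeric.

```python
import csv
import io
import re

# Identifier columns in the order the table prescribes.
ID_COLUMNS = ["participant_id", "session_id", "run_id"]


def check_phenotype_rows(tsv_text):
    """Return a list of problems found in a phenotype TSV given as text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    problems = []

    # Each identifier column that is present must sit at its prescribed position.
    for position, column in enumerate(ID_COLUMNS):
        if column in header and header.index(column) != position:
            problems.append(f"{column} must be column {position + 1}")

    if "participant_id" not in header:
        problems.append("participant_id column is missing")
        return problems

    # The combination of the identifier columns that are present must be unique.
    present = [column for column in ID_COLUMNS if column in header]
    seen = set()
    for line_number, row in enumerate(reader, start=2):
        participant = row["participant_id"] or ""
        # Assumption: labels consist of letters and digits only.
        if not re.fullmatch(r"sub-[0-9a-zA-Z]+", participant):
            problems.append(f"line {line_number}: bad participant_id {participant!r}")
        key = tuple(row[column] for column in present)
        if key in seen:
            problems.append(f"line {line_number}: duplicate identifier combination {key}")
        seen.add(key)
    return problems
```

An empty list means that none of the checks in this sketch failed. It does not verify that the identifiers match subjects or sessions that actually exist in the dataset.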

As with all other tabular data, the additional tabular phenotypic data MAY be accompanied by a JSON data dictionary file describing the columns in detail (see Tabular files). When the AdditionalValidation key in dataset_description.json contains "Phenotype", the additional tabular phenotypic data MUST be accompanied by a JSON data dictionary file.
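
The pairing requirement can be illustrated with a short sketch. Both function names are hypothetical, and the code assumes that `AdditionalValidation` holds a list of strings and that a data dictionary shares its filename stem with the TSV file it describes, as in the template at the top of this section.

```python
import json
from pathlib import Path


def phenotype_validation_enabled(dataset_root):
    """True if dataset_description.json lists "Phenotype" under AdditionalValidation."""
    description_file = Path(dataset_root) / "dataset_description.json"
    if not description_file.is_file():
        return False
    description = json.loads(description_file.read_text(encoding="utf-8"))
    # Assumption: AdditionalValidation holds a list of strings.
    return "Phenotype" in description.get("AdditionalValidation", [])


def missing_data_dictionaries(dataset_root):
    """Names of phenotype TSV files that have no same-named JSON file beside them."""
    phenotype_dir = Path(dataset_root) / "phenotype"
    return sorted(
        tsv.name
        for tsv in phenotype_dir.glob("*.tsv")
        if not tsv.with_suffix(".json").is_file()
    )
```

A caller might treat a non-empty result from `missing_data_dictionaries` as an error only when `phenotype_validation_enabled` returns true, and as a note otherwise.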

In addition to the column descriptions, the JSON file MAY contain the following fields:

Key name	Requirement Level	Data type	Description
MeasurementToolMetadata	OPTIONAL	object	A description of the measurement tool as a whole. Contains two fields: `"Description"` and `"TermURL"`. `"Description"` is a free text description of the measurement tool. `"TermURL"` is a URL to an entity in an ontology corresponding to this tool. RECOMMENDED when `AdditionalValidation` in `dataset_description.json` contains `"Phenotype"`.
Derivative	OPTIONAL	boolean	Indicates that values in the corresponding column are transformations of values from other columns (for example a summary score based on a subset of items in a questionnaire). Must be one of: `true`, `false`.

As an example, consider the contents of a file called phenotype/acds_adult.json:

{
  "MeasurementToolMetadata": {
    "Description": "Adult ADHD Clinical Diagnostic Scale V1.2",
    "TermURL": "https://www.cognitiveatlas.org/task/id/trm_5586ff878155d"
  },
  "adhd_b": {
    "Description": "B. CHILDHOOD ONSET OF ADHD (PRIOR TO AGE 7)",
    "Levels": {
      "1": "YES",
      "2": "NO"
    }
  },
  "adhd_c_dx": {
    "Description": "As child met A, B, C, D, E and F diagnostic criteria",
    "Levels": {
      "1": "YES",
      "2": "NO"
    }
  }
}

Note that in this example, MeasurementToolMetadata describes the questionnaire as a whole, while adhd_b and adhd_c_dx correspond to individual columns.
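
To show how such a data dictionary might be consumed, the sketch below separates the tool-level metadata from the column entries and uses `Levels` to decode coded values. The helper `decode_row` and the sample row are hypothetical and only illustrate the structure of the example file.

```python
import json

# Copy of the example data dictionary shown above.
SIDECAR_TEXT = """
{
  "MeasurementToolMetadata": {
    "Description": "Adult ADHD Clinical Diagnostic Scale V1.2",
    "TermURL": "https://www.cognitiveatlas.org/task/id/trm_5586ff878155d"
  },
  "adhd_b": {
    "Description": "B. CHILDHOOD ONSET OF ADHD (PRIOR TO AGE 7)",
    "Levels": {"1": "YES", "2": "NO"}
  },
  "adhd_c_dx": {
    "Description": "As child met A, B, C, D, E and F diagnostic criteria",
    "Levels": {"1": "YES", "2": "NO"}
  }
}
"""

sidecar = json.loads(SIDECAR_TEXT)

# MeasurementToolMetadata describes the tool; every other key describes a column.
tool_metadata = sidecar.get("MeasurementToolMetadata", {})
column_entries = {
    name: entry
    for name, entry in sidecar.items()
    if name != "MeasurementToolMetadata"
}


def decode_row(row, column_entries):
    """Replace coded values with their Levels text where a mapping exists."""
    decoded = {}
    for name, value in row.items():
        levels = column_entries.get(name, {}).get("Levels", {})
        decoded[name] = levels.get(value, value)
    return decoded


# Hypothetical row, as it might be read from phenotype/acds_adult.tsv.
row = {"participant_id": "sub-01", "adhd_b": "1", "adhd_c_dx": "2"}
print(decode_row(row, column_entries))
# prints {'participant_id': 'sub-01', 'adhd_b': 'YES', 'adhd_c_dx': 'NO'}
```

Columns without a `Levels` mapping, such as `participant_id` here, pass through unchanged.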

In addition to the keys available to describe columns in all tabular files (LongName, Description, Levels, Units, and TermURL), the participants.json file as well as phenotypic files can also include column descriptions with a Derivative field that, when set to true, indicates that values in the corresponding column are transformations of values from other columns (for example a summary score based on a subset of items in a questionnaire).
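
As an illustration of the Derivative field, the sketch below computes a summary score from two items and builds the matching data dictionary entry. The column names `item_1`, `item_2` and `total_score`, and the helper `with_summary_score`, are hypothetical and do not come from any real instrument.

```python
import json


def with_summary_score(row, item_columns, score_column):
    """Return a copy of row with a summed score over item_columns added."""
    scored = dict(row)
    scored[score_column] = sum(int(row[name]) for name in item_columns)
    return scored


# Hypothetical items and score column, for illustration only.
row = {"participant_id": "sub-01", "item_1": "2", "item_2": "3"}
scored = with_summary_score(row, ["item_1", "item_2"], "total_score")

# Data dictionary entry flagging the new column as derived from other columns.
entry = {
    "total_score": {
        "Description": "Sum of item_1 and item_2",
        "Derivative": True,
    }
}

print(scored["total_score"])  # prints 5
print(json.dumps(entry, indent=2))
```

Python's `True` is serialized by `json.dumps` as the JSON boolean `true`, which is the form the Derivative field takes in the file.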

Additional validation

When the AdditionalValidation key in dataset_description.json contains "Phenotype", the following tabular phenotypic data guidelines apply to phenotypic and assessment data.

1. Aggregate data across sessions
2. Always pair tabular data with data dictionaries
3. Add MeasurementToolMetadata to each tabular phenotypic measurement tool
4. Ensure minimal annotation for phenotypic and assessment data
5. Store demographic data in the participants file and instrument data in the phenotype directory

For more detail on these guidelines, including examples, see the tabular phenotypic data guidelines appendix.