Phenotypic and assessment data
Template:
phenotype/
    <measurement_tool_name>.tsv
    <measurement_tool_name>.json
Optional: Yes
If the dataset includes multiple sets of participant-level measurements (for
example, responses from multiple questionnaires), they can be split into
individual files separate from participants.tsv.
Each of the measurement tool files MUST be kept in a /phenotype directory placed
at the root of the BIDS dataset and MUST end with the .tsv extension.
Filenames SHOULD be chosen to reflect the contents of the file.
For example, the "Adult ADHD Clinical Diagnostic Scale" could be saved in a file
called /phenotype/acds_adult.tsv.
The files can include an arbitrary set of columns, but one of them MUST be
participant_id and the entries of that column MUST correspond to the subjects
in the BIDS dataset and participants.tsv file.
| Column name | Requirement Level | Data type | Description |
|---|---|---|---|
| participant_id | REQUIRED | string | A participant identifier of the form sub-<label>, matching a participant entity found in the dataset. Note that data for one participant MAY be represented across multiple rows in case of multiple sessions or runs, and therefore the entry in the participant_id column will be repeated. The combination of participant_id, session_id and run_id MUST be unique. This column must appear first in the file. |
| session_id | OPTIONAL, but REQUIRED if sessions are defined in the dataset | string | A session identifier of the form ses-<label>, matching a session found in the dataset. A session_id column MUST be added to all tabular files in the phenotype directory as soon as multiple sessions are present in the dataset, regardless of whether those sessions are in the phenotype/ data, sub-<label>/ data, or a combination of the two. The combination of participant_id, session_id and run_id MUST be unique. This column must appear second in the file. |
| run_id | OPTIONAL, but REQUIRED if there are multiple runs within any session | string | A run identifier that corresponds to an existing run-<index> entity used in one or more filenames. A chronological run number is used when a measurement tool or assessment described by a tabular file was repeated within a session. The combination of participant_id, session_id and run_id MUST be unique. This column must appear third in the file. |
| HED | OPTIONAL | string | Hierarchical Event Descriptor (HED) tags. See the HED Appendix for details. This column may appear anywhere in the file. |
| Additional Columns | OPTIONAL | n/a | Additional columns are allowed. |
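The column rules in the table above can be checked mechanically. The following is a minimal sketch using only the Python standard library; the helper name `check_phenotype_rows` is hypothetical and not part of any BIDS tool. It assumes that whichever of the identifier columns are present must lead the file in the order participant_id, session_id, run_id, and it does not check the entries against participants.tsv.

```python
import csv
import io

# Identifier columns in the order the table requires them to appear.
ORDERED_ID_COLUMNS = ["participant_id", "session_id", "run_id"]


def check_phenotype_rows(tsv_text):
    """Return a list of problems found in the text of one phenotype TSV file.

    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []

    if "participant_id" not in header:
        return ["missing REQUIRED column participant_id"]

    problems = []

    # Assumption: the identifier columns that are present lead the file, in order.
    present = [name for name in ORDERED_ID_COLUMNS if name in header]
    if header[: len(present)] != present:
        problems.append(f"identifier columns must come first as {present}")

    seen = set()
    # The header is line 1, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        if not (row["participant_id"] or "").startswith("sub-"):
            problems.append(f"line {line_number}: participant_id must look like sub-<label>")
        # The combination of the identifier columns MUST be unique.
        key = tuple(row[name] for name in present)
        if key in seen:
            problems.append(f"line {line_number}: duplicate identifier combination {key}")
        seen.add(key)
    return problems


example = (
    "participant_id\tsession_id\tadhd_b\n"
    "sub-01\tses-01\t1\n"
    "sub-01\tses-02\t2\n"
    "sub-01\tses-02\t2\n"
)
print(check_phenotype_rows(example))
```

In this example the last row repeats the sub-01/ses-02 combination, so the sketch reports a duplicate on line 4.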
As with all other tabular data, the additional tabular phenotypic data
MAY be accompanied by a JSON data dictionary file describing the columns in detail
(see Tabular files).
When the AdditionalValidation key
contains "Phenotype" in the dataset_description.json,
then the additional tabular phenotypic data
MUST be accompanied by a JSON data dictionary file.
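As a rough illustration of this rule, the sketch below lists phenotype TSV files that lack a sibling JSON data dictionary when "Phenotype" is requested. The function name `missing_phenotype_dictionaries` is hypothetical, and the code assumes that AdditionalValidation holds either a list of strings or a single string; check the dataset_description.json definition before relying on that.

```python
import json
from pathlib import Path


def missing_phenotype_dictionaries(dataset_root):
    """List phenotype TSV files that have no sibling JSON data dictionary.

    Returns an empty list when "Phenotype" is not requested in
    AdditionalValidation, since the dictionary is then only optional.
    Hypothetical helper for illustration only.
    """
    root = Path(dataset_root)
    description = json.loads(
        (root / "dataset_description.json").read_text(encoding="utf-8")
    )

    # Assumption: the value is a list of strings, or a single string.
    requested = description.get("AdditionalValidation", [])
    if isinstance(requested, str):
        requested = [requested]
    if "Phenotype" not in requested:
        return []

    return sorted(
        tsv.name
        for tsv in (root / "phenotype").glob("*.tsv")
        if not tsv.with_suffix(".json").exists()
    )
```

Calling `missing_phenotype_dictionaries("/path/to/dataset")` on a dataset that requests the validation would return the names of the TSV files that still need a data dictionary.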
In addition to the column descriptions, the JSON file MAY contain the following fields:
| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| MeasurementToolMetadata | OPTIONAL | object | A description of the measurement tool as a whole. Contains two fields: "Description" and "TermURL". "Description" is a free text description of the measurement tool. "TermURL" is a URL to an entity in an ontology corresponding to this tool. RECOMMENDED when AdditionalValidation contains "Phenotype" in dataset_description.json. |
| Derivative | OPTIONAL | boolean | Indicates that values in the corresponding column are transformations of values from other columns (for example a summary score based on a subset of items in a questionnaire). Must be one of: "true", "false". |
As an example, consider the contents of a file called
phenotype/acds_adult.json:
{
    "MeasurementToolMetadata": {
        "Description": "Adult ADHD Clinical Diagnostic Scale V1.2",
        "TermURL": "https://www.cognitiveatlas.org/task/id/trm_5586ff878155d"
    },
    "adhd_b": {
        "Description": "B. CHILDHOOD ONSET OF ADHD (PRIOR TO AGE 7)",
        "Levels": {
            "1": "YES",
            "2": "NO"
        }
    },
    "adhd_c_dx": {
        "Description": "As child met A, B, C, D, E and F diagnostic criteria",
        "Levels": {
            "1": "YES",
            "2": "NO"
        }
    }
}
Please note that in this example MeasurementToolMetadata includes information
about the questionnaire and adhd_b and adhd_c_dx correspond to individual
columns.
In addition to the keys available to describe columns in all tabular files
(LongName, Description, Levels, Units, and TermURL), the
participants.json file as well as phenotypic files can also include column
descriptions with a Derivative field that, when set to true, indicates that
values in the corresponding column are transformations of values from other
columns (for example, a summary score based on a subset of items in a
questionnaire).
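To show how a reader of the data might use such a dictionary, here is a small sketch that separates the tool-level metadata from the column entries, finds derived columns, and decodes raw values through Levels. The dictionary is trimmed from the acds_adult.json example; the column `adhd_total` and the helper names are made up for illustration and do not appear in the specification.

```python
import json

# Trimmed from the acds_adult.json example. "adhd_total" is a hypothetical
# derived column, added only to demonstrate the Derivative field.
SIDECAR = json.loads("""
{
    "MeasurementToolMetadata": {
        "Description": "Adult ADHD Clinical Diagnostic Scale V1.2",
        "TermURL": "https://www.cognitiveatlas.org/task/id/trm_5586ff878155d"
    },
    "adhd_b": {
        "Description": "B. CHILDHOOD ONSET OF ADHD (PRIOR TO AGE 7)",
        "Levels": {"1": "YES", "2": "NO"}
    },
    "adhd_total": {
        "Description": "Hypothetical summary score",
        "Derivative": true
    }
}
""")


def column_entries(sidecar):
    """Return the per-column entries, skipping the tool-level metadata."""
    return {
        name: entry
        for name, entry in sidecar.items()
        if name != "MeasurementToolMetadata"
    }


def derivative_columns(sidecar):
    """Names of columns whose Derivative field is set to true."""
    return sorted(
        name
        for name, entry in column_entries(sidecar).items()
        if entry.get("Derivative") is True
    )


def decode(sidecar, column, raw_value):
    """Map a raw cell value to its Levels label, or return it unchanged."""
    levels = column_entries(sidecar).get(column, {}).get("Levels", {})
    return levels.get(raw_value, raw_value)


print(derivative_columns(SIDECAR))
print(decode(SIDECAR, "adhd_b", "1"))
```

Values are looked up as strings because TSV cells are read as text and the Levels keys in the JSON file are strings.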
Additional validation
When the AdditionalValidation key
contains "Phenotype" in the dataset_description.json,
the following tabular phenotypic data guidelines
apply to phenotypic and assessment data.
1. Aggregate data across sessions
2. Always pair tabular data with data dictionaries
3. Add MeasurementToolMetadata to each tabular phenotypic measurement tool
4. Ensure minimal annotation for phenotypic and assessment data
5. Store demographic data in the participants file and instrument data in the phenotype directory
To read more about the guidelines for tabular phenotypic data and examples, see the tabular phenotypic data guidelines appendix.