About Metadata

When data is uploaded into the PsychENCODE Knowledge Portal, it must be done so in a way that makes it easily accessible and interpreted by the users who download it. This is where metadata comes in.

Metadata is standardized information provided along with data that helps with data organization and description. It is, essentially, data about the data.

Metadata is useful for you and everyone else that has a need for discovering, accessing, and/or using the corresponding data or repository the data is stored in. Therefore, effort is placed on the importance of metadata being understandable to a diverse set of users. This gives you the ability to interpret the data in a well-rounded, comprehensive way that tells the whole picture of the data.

What is the purpose of metadata?

Metadata serves many purposes:

Data understandability

Metadata gives context to the data, allowing it to be understood by others, including those who were not involved in the data generation process. For example, descriptive metadata such as the study name, the assay performed, tissue type, species, etc., provides general information about the data that allows a user, such as a bioinformatician, to decide if they can reuse the data for analysis or other purposes.

Data discovery and accessibility

Because metadata is provided in a standardized format, it allows the data to be searchable by various metadata elements, as well as accessible, all in an organized way. This standardization can greatly improve the accuracy of a search and reduce ambiguity.

Data interoperability

The standardized nature of metadata allows the data to be efficiently integrated with other applications.

Data reuse

All of the aforementioned purposes of metadata contribute to a data’s ability to be re-used. If data is not understandable, discoverable, accessible, or interoperable, it will be difficult or even impossible to reuse.

Metadata requirements

All data and metadata that gets uploaded into Synapse is curated by the PsychENCODE Consortium to ensure the metadata properly allows for data usability. In most cases, we require three things for every set of data:

The data itself, of course
A description of the study and methods used to generate the data
The metadata

A subset of the metadata is also used to annotate the files for easier discovery. The data uploader for a study ensures that the metadata provided will be both useful and play well with the metadata from other datasets. To do so, metadata must be submitted in a structured way, which is described below.

We can think of metadata as being four tables, which get submitted to the portal as four files:

1. Individual metadata

This is information about the participants in the study, such as their sex and diagnosis.

2. Biospecimen metadata

This is information about a specimen from a participant in the study, such as what organ and tissue the specimen was taken from.

3. Assay(s) metadata

This is information about what was done to the specimen in order to generate the data. For example, for RNA sequencing the metadata would specify what platform was used and how the specimen was prepared.

These first three metadata files can be joined by the individualID and specimenID columns to create a full metadata set.

4. Manifest

Finally, the manifest file includes metadata about the files themselves. The data uploader uses this file to provide a set of metadata variables for the purpose of file annotations. These annotations are what allow you to perform queries while exploring data, in order to find and download the specific data you need.

Click the link below (ctrl+click on Windows or command+click on Mac) for a downloadable PDF version of the information presented above.

Documentation.Data.pdf

Metadata dictionary

The metadata dictionary provides the proper terms that you need to know in order to successfully query for data as a user, or properly contribute data as a contributor. As a user, knowing this information will allow you to search for and find the data you need.

In general, the metadata dictionary is useful if you are:

A user who needs to know what terms in the annotations and metadata files mean so you can properly search for and find the data you need.
A contributor who needs to know what the accepted values are for a given key. For example, if you used a HiSeq 2000 machine to generate your data, then you would need to know that the accepted value for that machine is HiSeq2000 (no space). You also need to know that it goes in the column with the key platform.
A curator who needs to know what keys/values are currently available in order to direct contributors to use them, or gauge a need for any additional ones.

How to access the metadata dictionary

In order to access the metadata dictionary, you must be logged into Synapse. If you are not already logged into Synapse when you land on this page, you will be prompted to do so. Click the Synapse link provided in the pop-up window to log in to Synapse, and then revisit or refresh the metadata dictionary—you should now be able to access it.

Additionally, if you have an ad blocker installed on your browser, you must disable it for this website. To do so, click the ad blocker icon in the top right corner of your browser, and clock the toggle option to disable ads on this page.

How to use the metadata dictionary

To make use of the metadata dictionary, the various search fields are very helpful. First, notice that there’s a global search bar at the top of the page. You can use this to search for any field. Then, notice that each column has its own dedicated search bar. You can search for specific terms within each column to narrow down the list. You can find a breakdown of each column below. A final note is that what you see on your screen is just a subset of metadata—just a few rows out of hundreds. Scroll down to the bottom to see a page navigator, where you can jump to a different page/row in the table.

Here’s a breakdown of the various columns in the dictionary:

Key: Name of metadata term.

Key description: Definition of the metadata term.

Type: Expected data type, such as string or integer, for values associated with the metadata term.

Value: For keys that have the type string, this will either be empty if any text is accepted or will be a specific text value if there are a limited set of values to choose from.

Value description: Description of the value, if any.

Source: Link or name of resource used for the value description, if any.

Module: The group of terms the key is associated with.