Glossary

Annotations

Annotations are extra pieces of information associated with data in a project, file, folder, table, or view to help users find data. This additional information, in the form of controlled vocabulary, helps to surface the data in a structured way.

Annotations are what allow you, as a user, to systematically search for and find specific data of interest. If you haven’t already, learn how to filter and find data on the PsychENCODE Knowledge Portal here.

Looking for descriptions of annotations? Visit the metadata dictionary. To learn more about metadata and the metadata dictionary, see our dedicated page here.

Controlled Access Data

While all data uploaded to Synapse falls under the principles of Open Data, individual-level human data is Controlled Access Data, as governed by the NIMH Repository and Genomics Resource (NRGR), and access to this data must be requested. Access requests are reviewed by the NRGR Data Access Committee (DAC), and controlled access data is available for download only with approval from the NRGR DAC.

You can find more detailed information on this and instructions on how to submit a request here.

Controlled Value

A pre-formatted value that must be used exactly as defined. For example: True instead of yes; female instead of woman.
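As a rough illustration of how a controlled value differs from free text, the sketch below checks an annotation against a small lookup of allowed values. The keys `sex` and `isPostMortem` and their value sets are hypothetical stand-ins, not the portal's actual vocabulary; the real definitions live in the metadata dictionary.

```python
# Hypothetical controlled vocabulary, for illustration only.
# The real keys and allowed values are defined in the metadata dictionary.
CONTROLLED_VALUES = {
    "sex": {"female", "male"},
    "isPostMortem": {"True", "False"},
}


def check_value(key, value):
    """Return True if value is one of the controlled values defined for key."""
    allowed = CONTROLLED_VALUES.get(key)
    if allowed is None:
        raise KeyError(f"no controlled vocabulary defined for {key!r}")
    return value in allowed
```

Under these assumptions, `check_value("sex", "female")` passes, while `check_value("sex", "woman")` and `check_value("isPostMortem", "yes")` do not, because only the exact defined strings are accepted.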

Data

In our context, this refers to the data generated across studies. Explore all data on the portal here.

Data Subtype

Data Subtype (also referred to as dataSubtype) is a file annotation that indicates if data in the file are raw, processed, or normalized, or if the file contains metadata.

Individual ID

An individual ID is the identifier for a specific individual (human subject or single animal).

File schema

The JSON schema associated with the Synapse File Entity.

Governance

Due to the open-access nature of the platform, Synapse operates under comprehensive governance policies that define the rights and responsibilities of Synapse users. This includes our standard operating procedures (SOPs), privacy policy, code of conduct, community standards, and more.

Grant

A grant is represented by a contract number assigned to an NIMH-funded project.

Explore all grants on the portal here.

Manifest

A list of files and their metadata. There are several different types of manifests used throughout Synapse:

Upload manifest: This is a .tsv file used to upload metadata—more details, along with a template, are provided here.
Download manifest: This is used when downloading data programmatically; the template is provided by the Synapse Python Client.
File Schema Driven Manifest: This is based on the new File Schema.
Portals Manifest: This is currently provided when exporting data.
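To make the idea of "a list of files and their metadata" concrete, here is a minimal sketch that parses tab-separated manifest text using only the Python standard library. The column names and rows are invented for illustration; the actual columns are defined by the upload template, and this is not how Synapse itself reads manifests.

```python
import csv
import io

# Hypothetical manifest content; real column names come from the upload template.
MANIFEST_TSV = (
    "path\tindividualID\tspecimenID\tdataSubtype\n"
    "counts.tsv\tIND001\tSPEC001\tprocessed\n"
    "reads.fastq.gz\tIND001\tSPEC002\traw\n"
)


def read_manifest(text):
    """Parse tab-separated manifest text into one dict per file row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


rows = read_manifest(MANIFEST_TSV)
```

Each entry in `rows` is a dictionary keyed by column name, so one file's metadata can be read as, for example, `rows[0]["dataSubtype"]`.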

Metadata

Metadata is additional, standardized information included alongside the data to give it context—data about the data, if you will. Metadata is what allows data in the portal to be searchable, discoverable, accessible, re-usable, and understandable to others, including those who were not involved in the data generation process.

Metadata can be descriptive (e.g., the name of the file), administrative (e.g., provenance information), or research-based (e.g., information about the sampling and handling of data).

There is a lot to understand about using metadata in the PsychENCODE Knowledge Portal, including how to use our metadata dictionary—find everything you need to know about using metadata here.

NIMH (National Institute of Mental Health)

NIMH is the lead federal agency in the United States for research on mental disorders.

The NIMH Repository and Genomics Resource (NRGR) is a collaborative venture between the NIMH and several academic institutions, and access to PEC data from human studies is managed by the NRGR Access Request system.

Open Data/Open Science

Open data represents transparent and accessible knowledge that is shared and developed through collaborative networks, based on the principles of open science. The goal of open science is to make scientific research—including publications, data, physical samples, and software—and its dissemination accessible to all levels of an inquiring society, whether amateur or professional.

The general driving idea behind open science and data is that scientific research can and should be accessible to anyone—because, well, why not? This system benefits all parties involved—the researchers gain wider-reaching recognition and appreciation for their work, the study subjects get to witness the palpable value of providing their personal data, scientists and other professionals are able to use properly funded research to aid in their own research/work, and the general public gains helpful information and knowledge from trusted sources. This is truly a win-win—collective consciousness is a global good!

People

These are the researchers and supporting stakeholders who contribute to the portal and make up the consortium.

Explore all people on the portal here.

You can filter for people based on the grants they are associated with. Clicking on a name will take you to their Synapse profile and provide basic information about the person, including their Synapse email.

Publications

Publications are a core output of research studies—many of them are Open Access and can be directly accessed by anyone.

Explore all publications on the portal here.

Results

Results are data analyses that surface on the portal through biological and computational tools and are supported by information available on the portal, such as metadata and provenance.

From a reusability perspective, data is the most useful to future users. Both results and data can be shared, but “data” is more important for reproducibility and reuse.

We consider data to be raw or partially processed information, depending on the type of experiment. Results are generally post-analysis information or manuscript figures. For example, if you are sharing gene expression information, raw data would be the raw, zipped, fastq.gz files, while differential expression analysis and volcano plots would be considered results. This distinction is well defined for many types of data, but for assays that we encounter less often this may be less clear. Results might also be acceptable for assays that do not lend themselves to re-analysis, such as western blotting. We can work with you to help figure this out.

Schema

A metadata schema overlaps with the concept of a data model and adds further rules and standardization to it. It governs the management of metadata through constraints such as whether an attribute is optional and which values are valid for it.
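The two kinds of constraint mentioned above, optionality and valid values, can be sketched with a small hand-written check. The schema fragment below is hypothetical: it borrows the JSON Schema keywords `required` and `enum`, takes the `dataSubtype` values from the Data Subtype entry, and assumes `individualID` as an attribute name. It is not the portal's actual schema or validator.

```python
# Hypothetical schema fragment; attribute names and rules are illustrative only.
SCHEMA = {
    "required": ["individualID", "dataSubtype"],
    "properties": {
        "dataSubtype": {"enum": ["raw", "processed", "normalized", "metadata"]},
    },
}


def validate(record, schema):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    # Optionality: every attribute listed under "required" must be present.
    for name in schema.get("required", []):
        if name not in record:
            problems.append(f"missing required attribute: {name}")
    # Valid values: attributes with an "enum" rule must use a listed value.
    for name, rules in schema.get("properties", {}).items():
        allowed = rules.get("enum")
        if name in record and allowed is not None and record[name] not in allowed:
            problems.append(f"invalid value for {name}: {record[name]!r}")
    return problems
```

A record such as `{"individualID": "IND001", "dataSubtype": "raw"}` yields no problems, while one that omits `individualID` or uses an unlisted `dataSubtype` is reported.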

Specimen ID

A specimen ID is the identifier for a sample from a specific individual – for example, a brain sample from a specific region or a blood sample.

Study

A study is the primary unit of data organization in the portal. Essentially, each study represents an individual research project with specific objectives and focus (one project can operate multiple studies). A study can represent data generated from a specific human cohort, data from experiments on a model system, cross-consortium data processing and analysis efforts, or data associated with a specific publication.

Explore all studies on the portal here.