Schema Reference

The mzTab-M JSON schema provides a machine-readable definition of the format structure. It is used for domain model generation, parser testing, and automated validation.

Schema Files

The schemas are located in the schema/ directory of the repository:

schema/mzTab_2_1-M.json

The primary JSON schema for mzTab-M 2.1, using JSON Schema draft-07 / OpenAPI 3.1 conventions.

schema/mzTab_2_1-M_mapping.xml

Mapping file that associates mzTab-M fields with their controlled vocabulary term roots.

schema/mzTab_2_1-M_metabolights_mapping.xml

MetaboLights-specific mapping for submission validation.

Field Reference

The field reference below is generated from schema/mzTab_2_1-M.json via schema/generate_schema_adoc.py.

This document provides a reference for all fields defined in the mzTab-M format, organised by section and ordered by mzTab-M section hierarchy. Each field entry includes a description, type, mandatory status, and example usage.

Sections

The mzTab-M format consists of four cross-referenced data tables: metadata (MTD), Small Molecule (SML), Small Molecule Feature (SMF), and Small Molecule Evidence (SME). The MTD and SML tables are mandatory. SMF and SME sections SHOULD also be included to capture full identification evidence.

Metadata Section

The metadata section provides additional information about the dataset(s) reported in the mzTab file. All fields in the metadata section are optional apart from those noted as mandatory. The fields in the metadata section MUST be reported in the order listed below. The field name and value MUST be separated by a tab character.

mzTab-version

Description

Version number of the mzTab format used.

Format: major.minor.patch-variant. The value MUST end with the "-M" suffix for the metabolomics variant.

Used to ensure compatibility and processing correctness.

Type

Regex

^\d{1}\.\d{1}\.\d{1}-[A-Z]{1}$


Mandatory

True

Example

MTD	mzTab-version	2.0.0-M
MTD	mzTab-version	2.1.0-M
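
As a quick illustration, the pattern above can be checked with Python's standard re module. This is a minimal sketch; the helper name is_valid_mztab_version is hypothetical and not part of any mzTab-M tooling. Note that the pattern alone accepts any single uppercase variant letter, so the "-M" requirement from the description is checked separately.

```python
import re

# Pattern copied from the Type entry of mzTab-version.
MZTAB_VERSION_PATTERN = re.compile(r"^\d{1}\.\d{1}\.\d{1}-[A-Z]{1}$")


def is_valid_mztab_version(value: str) -> bool:
    """Hypothetical helper: check the pattern and the "-M" variant suffix."""
    # fullmatch() rejects a trailing newline, which "$" alone would tolerate.
    if MZTAB_VERSION_PATTERN.fullmatch(value) is None:
        return False
    return value.endswith("-M")


print(is_valid_mztab_version("2.1.0-M"))  # well-formed metabolomics version
print(is_valid_mztab_version("2.1-M"))    # patch component is missing
```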

mzTab-ID

Description

Unique identifier for the mzTab-M document. REQUIRED. Can be:

  • Repository accession number (e.g., MTBLS214)

  • Laboratory internal identifier

  • Study-specific identifier

NOT intended as a globally unique identifier, but SHOULD have local meaning within its context.

Type

String

Mandatory

True

Example

MTD	mzTab-ID	MTBLS214
MTD	mzTab-ID	LAB001_2023

title

Description

Human-readable title of the experiment or study. OPTIONAL. The title SHOULD:

  • Be concise but informative

  • Reflect the main focus of the study

  • Be unique within a collection of related studies

Type

String

Mandatory

False

Example

MTD	title	Metabolomic Analysis of Human Plasma in Diabetes Type 2
MTD	title	Lipidomics Study of Brain Tissue in Alzheimer's Disease

description

Description

Detailed description of the experiment or study. OPTIONAL. SHOULD include:

  • Study objectives

  • Experimental design overview

  • Key methodological approaches

  • Any unique aspects of the study

Provides context for understanding the data and its significance.

Type

String

Mandatory

False

Example

MTD	description	Investigation of metabolic changes in human plasma samples from type 2 diabetes patients compared to healthy controls. Study includes both fasting and post-prandial measurements.
MTD	description	Analysis of lipid profiles in brain tissue samples examining the relationship between specific lipid species and Alzheimer's disease progression.

sample_processing[1-n]

Description

Parameters specifying sample processing that was applied within one step.

Type

Parameter List

Mandatory

False

Example

MTD	sample_processing[1]	[MSIO, MSIO:0000107, metabolism quenching using precooled 60 percent methanol ammonium bicarbonate buffer,]
MTD	sample_processing[2]	[MSIO, MSIO:0000146, centrifugation,]
MTD	sample_processing[3]	[MSIO, MSIO:0000141, metabolite extraction,]
MTD	sample_processing[4]	[MSIO, MSIO:0000141, silylation,]
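
Parameter values such as the ones in this example use the bracketed form [CV, accession, name, value]. The following is a minimal sketch of how such a value could be split into its four parts with Python's standard csv module. The Parameter tuple and the parse_parameter function are hypothetical names chosen for illustration, and nested parameters (a Parameter inside the value position) are deliberately not handled.

```python
import csv
from typing import NamedTuple, Optional


class Parameter(NamedTuple):
    """Hypothetical container for the four Parameter fields."""
    cv_label: Optional[str]
    accession: Optional[str]
    name: str
    value: Optional[str]


def parse_parameter(text: str) -> Parameter:
    """Split "[CV, accession, name, value]" into its four fields."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not a bracketed Parameter: {text!r}")
    inner = text[1:-1]
    # csv handles double-quoted names that may themselves contain commas.
    row = next(csv.reader([inner], skipinitialspace=True), [])
    fields = [field.strip() for field in row]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, found {len(fields)}: {text!r}")
    cv_label, accession, name, value = fields
    # Empty positions are mapped to None.
    return Parameter(cv_label or None, accession or None, name, value or None)


print(parse_parameter("[MSIO, MSIO:0000146, centrifugation,]"))
```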

The name, source, analyzer and detector of the instruments used in the experiment. Multiple instruments are numbered [1-n].

instrument[1-n]-name

Description

The instrument’s name.

Type

Parameter

Mandatory

False

instrument[1-n]-source

Description

The instrument’s ion source.

Type

Parameter

Mandatory

False

instrument[1-n]-analyzer[1-n]

Description

The instrument’s mass analyzer, as defined by the parameter.

Type

Parameter List

Mandatory

False

instrument[1-n]-detector

Description

The instrument’s detector, as defined by the parameter.

Type

Parameter

Mandatory

False

software[1-n]

Description

The software utilized.

Type

Parameter

Mandatory

False

Example

MTD	software[1]	[MS, MS:1002879, Progenesis QI, 3.0]
MTD	software[1]-setting[1]	Fragment tolerance = 0.1 Da
…
MTD	software[2]-setting[1]	Parent tolerance = 0.5 Da

software[1-n]-setting[1-n]

Description

A software setting used. This field MAY occur multiple times for a single software. The value of this field is deliberately set as a String, since there currently do not exist cvParams for every possible setting.

Type

String List

Mandatory

False

publication[1-n]

Description

The publication item ids referenced by this publication.

Type

String List

Mandatory

True

Example

MTD	publication[1]	pubmed:21063943|doi:10.1007/978-1-60761-987-1_6
MTD	publication[2]	pubmed:20615486|doi:10.1016/j.jprot.2010.06.008

The contact’s name, affiliation and e-mail. Several contacts can be given by indicating the number in the square brackets after "contact". A contact has to be supplied in the format [first name] [initials] [last name].

contact[1-n]-name

Description

The contact’s name.

Type

String

Mandatory

False

contact[1-n]-affiliation

Description

The contact’s affiliation.

Type

String

Mandatory

False

contact[1-n]-email

Description

The contact’s e-mail address.

Type

String

Mandatory

False

contact[1-n]-orcid

Description

The contact’s ORCID iD, without the https:// URL prefix.

Type

Regex

^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]{1}$


Mandatory

False
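
The pattern above can be applied as in the following minimal sketch, which also strips a leading orcid.org URL so that only the bare identifier is written to the file. The function name normalize_orcid is hypothetical, and the sketch checks the pattern only, not the ORCID checksum digit.

```python
import re

# Pattern copied from the Type entry of contact[1-n]-orcid.
ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]{1}$")


def normalize_orcid(value: str) -> str:
    """Hypothetical helper: drop a URL prefix and validate the bare iD."""
    value = value.strip()
    for prefix in ("https://orcid.org/", "http://orcid.org/"):
        if value.startswith(prefix):
            value = value[len(prefix):]
            break
    if ORCID_PATTERN.fullmatch(value) is None:
        raise ValueError(f"not a valid ORCID iD pattern: {value!r}")
    return value


print(normalize_orcid("https://orcid.org/0000-0002-1825-0097"))
```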

uri[1-n]

Description

The URI pointing to the external resource.

Type

URI

Mandatory

False

Example

MTD	uri[1]	https://www.ebi.ac.uk/metabolights/MTBLS517
…
MTD	external_study_uri[1]	https://www.ebi.ac.uk/metabolights/MTBLS517/files/i_Investigation.txt

external_study_uri[1-n]

Description

The URI pointing to the external resource.

Type

URI

Mandatory

False

Example

MTD	uri[1]	https://www.ebi.ac.uk/metabolights/MTBLS517
…
MTD	external_study_uri[1]	https://www.ebi.ac.uk/metabolights/MTBLS517/files/i_Investigation.txt

quantification_method

Description

The quantification method used in the experiment reported in the file.

Type

Parameter

Mandatory

True

sample[1-n]

Description

The sample’s name.

Type

String

Mandatory

False

Example

COM	Experiment where all samples consisted of the same two species
MTD	sample[1]	individual number 1
MTD	sample[1]-species[1]	[NCBITaxon, NCBITaxon:9606, Homo sapiens, ]
MTD	sample[1]-tissue[1]	[BTO, BTO:0000759, liver, ]
MTD	sample[1]-cell_type[1]	[CL, CL:0000182, hepatocyte, ]
MTD	sample[1]-disease[1]	[DOID, DOID:684, hepatocellular carcinoma, ]
MTD	sample[1]-disease[2]	[DOID, DOID:9451, alcoholic fatty liver, ]
MTD	sample[1]-description	Hepatocellular carcinoma samples.
MTD	sample[1]-custom[1]	[,,Extraction date, 2011-12-21]
MTD	sample[1]-custom[2]	[,,Extraction reason, liver biopsy]
MTD	sample[2]	individual number 2
MTD	sample[2]-species[1]	[NCBITaxon, NCBITaxon:9606, Homo sapiens, ]
MTD	sample[2]-tissue[1]	[BTO, BTO:0000759, liver, ]
MTD	sample[2]-cell_type[1]	[CL, CL:0000182, hepatocyte, ]
MTD	sample[2]-description	Healthy control samples.

sample[1-n]-species[1-n]

Description

Biological species information on the sample.

Type

Parameter List

Mandatory

False

sample[1-n]-tissue[1-n]

Description

Biological tissue information on the sample.

Type

Parameter List

Mandatory

False

sample[1-n]-cell_type[1-n]

Description

Biological cell type information on the sample.

Type

Parameter List

Mandatory

False

sample[1-n]-disease[1-n]

Description

Disease information on the sample.

Type

Parameter List

Mandatory

False

sample[1-n]-description

Description

A free form description of the sample.

Type

String

Mandatory

False

sample[1-n]-custom[1-n]

Description

Additional user or cv parameters.

Type

Parameter List

Mandatory

False

Specification of ms_run.

  • location: Location of the external data file, e.g. raw files on which analysis has been performed. If the actual location of the MS run is unknown, a “null” MUST be used as a placeholder value, since the [1-n] cardinality is referenced elsewhere. If pre-fractionation has been performed, then [1-n] ms_runs SHOULD be created per assay.

  • instrument_ref: If different instruments are used in different runs, instrument_ref can be used to link a specific instrument to a specific run.

  • format: Parameter specifying the data format of the external MS data file. If ms_run[1-n]-format is present, ms_run[1-n]-id_format SHOULD also be present, following the parameters specified in Table 1.

  • id_format: Parameter specifying the id format used in the external data file. If ms_run[1-n]-id_format is present, ms_run[1-n]-format SHOULD also be present.

  • fragmentation_method: The type(s) of fragmentation used in a given ms run.

  • scan_polarity: The polarity mode of a given run. Usually only one value SHOULD be given here except for the case of mixed polarity runs.

  • hash: Hash value of the corresponding external MS data file defined in ms_run[1-n]-location. If ms_run[1-n]-hash is present, ms_run[1-n]-hash_method SHOULD also be present.

  • hash_method: A parameter specifying the hash methods used to generate the String in ms_run[1-n]-hash. Specifics of the hash method used MAY follow the definitions of the mzML format. If ms_run[1-n]-hash is present, ms_run[1-n]-hash_method SHOULD also be present.
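
The pairing rules for ms_run fields (format with id_format, hash with hash_method) and the required location can be checked mechanically. The sketch below assumes the metadata lines have already been read into a dictionary keyed by field name; that representation, the function name check_ms_run_fields, and the placeholder values are assumptions made for illustration only.

```python
import re
from typing import Dict, List

# Fields that SHOULD appear together for the same ms_run index.
PAIRED_FIELDS = (("format", "id_format"), ("hash", "hash_method"))


def check_ms_run_fields(metadata: Dict[str, str]) -> List[str]:
    """Return human-readable findings for incomplete ms_run entries."""
    run_ids = set()
    for key in metadata:
        match = re.match(r"ms_run\[(\d+)\]-", key)
        if match:
            run_ids.add(int(match.group(1)))

    findings = []
    for n in sorted(run_ids):
        # location is mandatory; "null" is the placeholder for unknown locations.
        if f"ms_run[{n}]-location" not in metadata:
            findings.append(f"ms_run[{n}]-location is missing (use null if unknown)")
        for first, second in PAIRED_FIELDS:
            has_first = f"ms_run[{n}]-{first}" in metadata
            has_second = f"ms_run[{n}]-{second}" in metadata
            if has_first != has_second:
                missing, present = (second, first) if has_first else (first, second)
                findings.append(
                    f"ms_run[{n}]-{missing} SHOULD be present "
                    f"because ms_run[{n}]-{present} is present"
                )
    return findings


# Placeholder values, for illustration only.
example = {
    "ms_run[1]-location": "null",
    "ms_run[1]-format": "[,, example format, ]",
}
for finding in check_ms_run_fields(example):
    print(finding)
```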

ms_run[1-n]-location

Description

The msRun’s location URI.

Type

URI

Mandatory

True

ms_run[1-n]-instrument_ref

Description

Instrument reference.

Type

Integer

Mandatory

False

ms_run[1-n]-format

Description

The format of the MS run file.

Type

Parameter

Mandatory

False

ms_run[1-n]-id_format

Description

The format of the IDs in the MS run file.

Type

Parameter

Mandatory

False

ms_run[1-n]-fragmentation_method[1-n]

Description

The fragmentation methods applied during this msRun.

Type

Parameter List

Mandatory

False

ms_run[1-n]-scan_polarity[1-n]

Description

The scan polarity/polarities used during this msRun.

Type

Parameter List

Mandatory

False

ms_run[1-n]-hash

Description

The file hash value of this msRun’s data file.

Type

String

Mandatory

False

ms_run[1-n]-hash_method

Description

The method used to calculate the hash.

Type

Parameter

Mandatory

False

ms_run[1-n]-parameters[1-n]

Description

Additional parameters of the ms run, separated by bars.

Type

Parameter List

Mandatory

False

Example

MTD	ms_run[1]-parameter[1]	[MS, MS:1000031, instrument model, [MS, MS:1000449, LTQ Orbitrap,]]

assay[1-n]

Description

The assay name.

Type

String

Mandatory

True

Example

MTD	assay[1]	first assay
MTD	assay[1]-custom[1]	[MS, , Assay operator, Fred Blogs]
MTD	assay[1]-external_uri	https://www.ebi.ac.uk/metabolights/MTBLS517/files/i_Investigation.txt?STUDYASSAY=a_e04_c18pos.txt
MTD	assay[1]-sample_ref	sample[1]
MTD	assay[1]-ms_run_ref	ms_run[1]

assay[1-n]-custom[1-n]

Description

Additional user or cv parameters.

Type

Parameter List

Mandatory

False

assay[1-n]-external_uri

Description

An external URI to further information about this assay.

Type

URI

Mandatory

False

assay[1-n]-sample_ref

Description

Sample reference.

Type

Integer

Mandatory

False

assay[1-n]-ms_run_ref[1-n]

Description

The ms run(s) referenced by this assay.

Type

Integer List

Mandatory

True

assay[1-n]-protocol_refs[1-n]

Description

The protocol(s) referenced by this assay.

Type

Integer List

Mandatory

False

Example

MTD	assay[1]-protocol_ref	protocol[1]| protocol[2]

assay[1-n]-parameters[1-n]

Description

Additional parameters of the assay, separated by bars.

Type

Parameter List

Mandatory

False

Example

MTD	assay[1]-parameter[1]	[MS, MS:1000031, instrument model, [MS, MS:1000449, LTQ Orbitrap,]]

study_variable[1-n]

Description

The study variable value. Encoded according to the datatype declared on the referenced study_variable_group: either a literal value (for xsd:* datatypes) or a Parameter (for the Parameter datatype, e.g. [NO, NO:12345, Male,] or [,,Male,]).

Type

Study Variable List

Mandatory

True

Example

MTD	study_variable[1]	control
MTD	study_variable[1]-assay_refs	assay[1]| assay[2]| assay[3]
MTD	study_variable[1]-average_function	[MS, MS:1002883, median, ]
MTD	study_variable[1]-variation_function	[MS, MS:1002885, standard error, ]
MTD	study_variable[1]-description	Group B (spike-in 0.74 fmol/uL)
MTD	study_variable[2]	1 minute 0.5mg rapamycin

study_variable[1-n]-assay_refs[1-n]

Description

The assays referenced by this study variable.

Type

Integer List

Mandatory

False

study_variable[1-n]-ms_run_refs[1-n]

Description

The ms run(s) referenced by this study variable.

Type

Integer List

Mandatory

False

Example

MTD	study_variable[1]-ms_run_ref	ms_run[1]| ms_run[2]

study_variable[1-n]-description

Description

A free-form description of this study variable.

Type

String

Mandatory

False

study_variable[1-n]-group_refs[1-n]

Description

The study variable group this study variable belongs to.

Type

Integer List

Mandatory

False

Example

MTD	study_variable[1]-group_ref	study_variable_group[1]| study_variable_group[2]

study_variable[1-n]-average_function

Description

The function used to calculate the study variable quantification value, reported if the operation used is not the arithmetic mean (default), e.g. geometric mean or median.

Type

Parameter

Mandatory

False

study_variable[1-n]-variation_function

Description

The function used to calculate the study variable quantification variation value, if it is reported and the operation used is not the coefficient of variation (default), e.g. standard error.

Type

Parameter

Mandatory

False

study_variable_group[1-n]

Description

The study variable group name.

Type

Parameter

Mandatory

True

Example

MTD	study_variable_group[1]	[PATO, PATO:0000383, sex, ]
MTD	study_variable_group[1]-description	Sex of the individual
MTD	study_variable_group[1]-type	[STATO, STATO:0000252, categorical variable, ]
MTD	study_variable_group[1]-datatype	xsd:string
MTD	study_variable_group[2]	[PATO, PATO:0000384, timepoint, ]
MTD	study_variable_group[2]-description	Time after treatment
MTD	study_variable_group[2]-type	[STATO, STATO:0000228, ordinal variable, ]
MTD	study_variable_group[2]-datatype	xsd:integer
MTD	study_variable_group[2]-unit	[UO, UO:0000033, day, ]
MTD	study_variable[1]	Female_0day
MTD	study_variable[1]-group_ref	study_variable_group[1]
MTD	study_variable[1]-assay_refs	assay[1]|assay[2]|assay[3]
MTD	study_variable[2]	Female_1day
MTD	study_variable[2]-group_ref	study_variable_group[2]
MTD	study_variable[2]-assay_refs	assay[4]|assay[5]|assay[6]

study_variable_group[1-n]-description

Description

Description of the study variable group.

Type

String

Mandatory

False

study_variable_group[1-n]-type

Description

The study variable group type, as defined by the parameter.

Type

Parameter

Mandatory

False

study_variable_group[1-n]-datatype

Description

The datatype of the group variable, which determines how its values can be encoded and parsed in mzTab-M files, and how they could be handled in programming languages.

Producers of mzTab-M 2.1.0 SHOULD provide a datatype for each study_variable_group to simplify interpretation by downstream consumers of the format. The field is not mandatory, but its presence removes ambiguity about how the associated values are encoded and should be parsed.

The following datatypes are supported:

  • xsd:string – Character string.

  • xsd:integer – Arbitrary‑size integer.

  • xsd:decimal – Arbitrary‑precision decimal number.

  • xsd:boolean – Boolean value (true / false).

  • xsd:date – Calendar date, encoded in ISO 8601 format (YYYY-MM-DD).

  • xsd:time – Time of day, encoded in ISO 8601 format (hh:mm:ss, optional fractional seconds and timezone).

  • xsd:dateTime – Combined date and time, encoded in ISO 8601 format (YYYY-MM-DDThh:mm:ss, with optional fractional seconds and timezone, e.g. YYYY-MM-DDThh:mm:ssZ).

  • xsd:anyURI – A Uniform Resource Identifier reference.

  • Parameter – The values of the linked study_variables are reported as a Parameter (user-defined or CV Parameter), using the standard mzTab-M Parameter syntax [CV, accession, name, value].

Writers MUST ensure that the values of the study_variable entries belonging to the same study_variable_group all have the same type (e.g. string, number, …​) and use the same convention of reporting the value directly or as a Parameter, consistent with the datatype declared on the group. If the study_variable_group defines the datatype as Parameter, the CVParam qualifying the study_variable_group itself can be of a different CV origin than the CVParams used in the linked study_variable values.

Tools and parsers implementing mzTab-M SHOULD apply the appropriate data type interpretation when reading study_variable_group data and constructing analysis data structures (e.g., data frames, matrices, or tables).

Type

Parameter

Mandatory

False

Example

MTD	study_variable_group[1]-datatype	xsd:string
MTD	study_variable_group[2]-datatype	xsd:decimal
MTD	study_variable_group[3]-datatype	xsd:date
MTD	study_variable_group[4]-datatype	Parameter

COM	plain string value:
MTD	study_variable[1]	Male
MTD	study_variable[1]-group_ref	study_variable_group[1]

COM	user-defined Parameter value:
MTD	study_variable[2]	[,,Male,]
MTD	study_variable[2]-group_ref	study_variable_group[4]

COM	CV Parameter value:
MTD	study_variable[3]	[NCIT, NCIT:C20197, Male, ]
MTD	study_variable[3]-group_ref	study_variable_group[4]
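
A reader could map the declared datatype of a study_variable_group to a native type roughly as follows. This is a minimal sketch using only the Python standard library; the names CONVERTERS and coerce_study_variable are hypothetical, Parameter values are returned unparsed, and the Python converters are somewhat more lenient than the XSD lexical rules (for example, int() tolerates surrounding whitespace).

```python
from datetime import date, datetime, time
from decimal import Decimal


def _utc_suffix(text: str) -> str:
    # Older Python versions do not accept a trailing "Z" in fromisoformat().
    return text[:-1] + "+00:00" if text.endswith("Z") else text


def _parse_boolean(text: str) -> bool:
    # xsd:boolean allows the lexical forms true/false and 1/0.
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"not an xsd:boolean: {text!r}")


CONVERTERS = {
    "xsd:string": str,
    "xsd:integer": int,
    "xsd:decimal": Decimal,
    "xsd:boolean": _parse_boolean,
    "xsd:date": date.fromisoformat,
    "xsd:time": lambda text: time.fromisoformat(_utc_suffix(text)),
    "xsd:dateTime": lambda text: datetime.fromisoformat(_utc_suffix(text)),
    "xsd:anyURI": str,
}


def coerce_study_variable(value: str, datatype: str):
    """Convert a raw study_variable value according to its group's datatype."""
    if datatype == "Parameter":
        # Left as the raw "[CV, accession, name, value]" string in this sketch.
        return value
    try:
        converter = CONVERTERS[datatype]
    except KeyError:
        raise ValueError(f"unsupported datatype: {datatype!r}") from None
    return converter(value)


print(coerce_study_variable("2011-12-21", "xsd:date"))
```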

study_variable_group[1-n]-unit

Description

The study variable group unit, as defined by the parameter.

Type

Parameter

Mandatory

False

A protocol describing one or more steps of an experimental procedure, such as sample preparation, data acquisition or data processing. Protocols are referenced from Assay objects. Added in mzTab-M 2.1.

protocol[1-n]-name

Description

The protocol name.

Type

String

Mandatory

True

protocol[1-n]-type

Description

The protocol type, as defined by the parameter.

Type

Parameter

Mandatory

True

protocol[1-n]-description

Description

Description of the protocol.

Type

String

Mandatory

False

protocol[1-n]-parameters[1-n]

Description

The protocol parameters.

Type

Parameter List

Mandatory

False

custom[1-n]

Description

Any additional parameters describing the analysis reported.

Type

Parameter List

Mandatory

False

Example

MTD	custom[1]	[MS, MS:1000001, custom param, value]

Specification of controlled vocabularies.

  • label: A string describing the labels of the controlled vocabularies/ontologies used in the mzTab file as a short-hand, e.g. "MS" for PSI-MS.

  • full_name: A string describing the full names of the controlled vocabularies/ontologies used in the mzTab file.

  • version: A string describing the version of the controlled vocabularies/ontologies used in the mzTab file.

  • uri: A string containing the URIs of the controlled vocabularies/ontologies used in the mzTab file.

cv[1-n]-label

Description

The abbreviated CV label.

Type

String

Mandatory

True

cv[1-n]-full_name

Description

The full name of this CV, for humans.

Type

String

Mandatory

True

cv[1-n]-version

Description

The CV version used when the file was generated.

Type

String

Mandatory

True

cv[1-n]-uri

Description

A URI to the CV definition.

Type

URI

Mandatory

True

database[1-n]

Description

The database name.

Type

Database List

Mandatory

True

Example

MTD	database[1]	[MIRIAM, MIR:00100079, HMDB, ]
MTD	database[1]-prefix	hmdb
MTD	database[1]-version	3.6
MTD	database[1]-uri	https://www.hmdb.ca
MTD	database[2]	[,, "de novo", ]
MTD	database[2]-prefix	dn
MTD	database[2]-version	Unknown
MTD	database[2]-uri	null
MTD	database[3]	[,, "no database", null ]
MTD	database[3]-prefix	null
MTD	database[3]-version	Unknown
MTD	database[3]-uri	null

database[1-n]-prefix

Description

The prefix used in the “identifier” column of data tables. For the 'no database' case 'null' must be used.

Type

String

Mandatory

True

database[1-n]-version

Description

The database version is mandatory where identification has been performed. This may be a formal version number e.g. “1.4.1”, a date of access “2016-10-27” (ISO-8601 format) or “Unknown” if there is no suitable version that can be annotated.

Type

String

Mandatory

True

database[1-n]-uri

Description

The URI to the database. For the “no database” case, 'null' must be reported.

Type

String

Mandatory

True

derivatization_agent[1-n]

Description

A description of derivatization agents applied to small molecules, using userParams or CV terms where possible.

Type

Parameter List

Mandatory

False

Example

MTD	derivatization_agent[1]	[XLMOD, XLMOD:07014, N-methyl-N-t-butyldimethylsilyltrifluoroacetamide, ]

small_molecule-quantification_unit

Description

Defines what type of units are reported in the small molecule summary quantification / abundance fields.

Type

Parameter

Mandatory

True

Example

MTD	small_molecule-quantification_unit	[MS, MS:1001113, peak area, ]

small_molecule_feature-quantification_unit

Description

Defines what type of units are reported in the small molecule feature quantification / abundance fields.

Type

Parameter

Mandatory

False

Example

MTD	small_molecule_feature-quantification_unit	[MS, MS:1001113, peak area, ]

small_molecule-identification_reliability

Description

The system used for giving reliability / confidence codes to small molecule identifications MUST be specified if not using the default codes.

Type

Parameter

Mandatory

False

Example

MTD	small_molecule-identification_reliability	[MS, MS:1000932, identification reliability, ]

id_confidence_measure[1-n]

Description

Small molecule identification confidence metrics.

Scoring System

  • Use CV parameters numbered [1-n]

  • Define score direction (high-to-low or low-to-high)

  • Order by importance for identification ranking

Scores determine confidence in molecular identifications.

Type

Parameter List

Mandatory

True

Example

MTD	id_confidence_measure[1]	[MS,MS:1002890,fragmentation score,]
MTD	id_confidence_measure[2]	[MS,MS:1002891,retention time score,]

colunit-small_molecule

Description

Unit definitions for small molecule data columns.

Format

  • Pattern: {column_name}={unit_parameter}

  • Use CV parameters for units when possible

Important Notes

  • Not for quantification columns

  • Use small_molecule-quantification_unit for quantification values

Type

Column Parameter Mapping List

Mandatory

False

Example

MTD	colunit-small_molecule	retention_time=[UO,UO:0000031,minute,]
MTD	colunit-small_molecule	mass=[UO,UO:0000221,dalton,]

colunit-small_molecule_feature

Description

Defines the unit used for a column in the small molecule feature section. The format of the value has to be {column name}={Parameter defining the unit}. This field MUST NOT be used to define a unit for quantification columns. The unit used for small molecule feature quantification values MUST be set in small_molecule_feature-quantification_unit.

Type

Column Parameter Mapping List

Mandatory

False

Example

MTD	colunit-small_molecule_feature	retention_time=[UO, UO:0000031, minute, ]

colunit-small_molecule_evidence

Description

Defines the unit used for a column in the small molecule evidence section. The format of the value has to be {column name}={Parameter defining the unit}.

Type

Column Parameter Mapping List

Mandatory

False

Example

MTD	colunit-small_molecule_evidence	retention_time=[UO, UO:0000031, minute, ]

Small Molecule (SML) Section

The small molecule section is table-based. It MUST always come after the metadata section. Each row reports one final quantified molecule result. All columns are MANDATORY except for "opt_" columns.

The order of columns MUST follow the order specified below. All table columns MUST be Tab separated. There MUST NOT be any empty cells. Missing values MUST be reported using "null".

SML_ID

Description

An identifier for the small molecule summary that is unique within the file.

Type

Integer

Mandatory

True

Is Nullable:

FALSE

Example

SMH	...	SML_ID	...
SML	...	1	...

SMF_ID_REFS

Description

References to the small molecule features (SMF elements) via referencing SMF_ID values. Multiple values MAY be provided as a | separated list to indicate which features were used to aggregate the SML row.

Type

Integer List

Mandatory

False

Is Nullable:

TRUE

Example

SMH	...	SMF_ID_REFS	...
SML	...	2|3|11	...

database_identifier

Description

A list of | separated possible identifiers for the small molecule; multiple values MUST only be provided to indicate ambiguity in the identification of the molecule and not to demonstrate different identifier types for the same molecule. Alternative identifiers for the same molecule MAY be provided as optional columns. The database identifier must be preceded by the resource description (prefix) followed by a colon, as specified in the metadata section. A null value MAY be provided if the identification is sufficiently ambiguous as to be meaningless for reporting or the small molecule has not been identified.

Type

String List

Mandatory

False

Is Nullable:

TRUE

Example

SMH	...	database_identifier	...
SML	...	CID:00027395|HMDB:HMDB0001847	...
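
Because each identifier carries a prefix that must be declared in the metadata section, a cell can be split and cross-checked as in the sketch below. The function names are hypothetical, and the comparison against declared database[1-n]-prefix values is done as an exact, case-sensitive match, which is an assumption of this sketch rather than a rule stated above.

```python
from typing import Iterable, List, Tuple


def split_identifiers(cell: str) -> List[Tuple[str, str]]:
    """Split a database_identifier cell into (prefix, local identifier) pairs."""
    if cell == "null":
        return []
    pairs = []
    for item in cell.split("|"):
        # Only the first colon separates the prefix from the identifier.
        prefix, separator, local_id = item.partition(":")
        if not separator or not prefix or not local_id:
            raise ValueError(f"identifier without prefix: {item!r}")
        pairs.append((prefix, local_id))
    return pairs


def undeclared_prefixes(cell: str, declared: Iterable[str]) -> List[str]:
    """List prefixes used in the cell that are not among the declared ones."""
    used = {prefix for prefix, _ in split_identifiers(cell)}
    return sorted(used - set(declared))


print(split_identifiers("CID:00027395|HMDB:HMDB0001847"))
print(undeclared_prefixes("CID:00027395|HMDB:HMDB0001847", {"HMDB"}))
```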

chemical_formula

Description

The chemical formula of the identified compound e.g. in a database, assumed to match the theoretical mass to charge (in some cases this will be the derivatized form, including adducts and protons). This should be specified in Hill notation (EA Hill 1900), i.e. elements in the order C, H and then alphabetically all other elements. Counts of one may be omitted. Elements should be capitalized properly to avoid confusion (e.g., “CO” vs. “Co”). The chemical formula reported should refer to the neutral form. Charge state is reported by the charge field in the SME and SMF sections. For example, N-acetylglucosamine would be encoded by the string “C8H15NO6”.

Type

String List

Mandatory

False

Is Nullable:

TRUE

Example

SMH	...	chemical_formula	...
SML	...	C17H20N4O2	...
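
The Hill ordering described for this column can be sketched as a small function that turns element counts into a formula string. The function name hill_formula and the dictionary input are hypothetical. For formulas without carbon the sketch orders all elements alphabetically, which follows the general Hill convention and is not stated in the description above.

```python
from typing import Dict


def hill_formula(counts: Dict[str, int]) -> str:
    """Build a formula string in Hill order from element counts."""
    elements = [element for element, count in counts.items() if count > 0]
    if "C" in elements:
        # Carbon first, then hydrogen, then everything else alphabetically.
        rest = sorted(e for e in elements if e not in ("C", "H"))
        ordered = ["C"] + (["H"] if "H" in elements else []) + rest
    else:
        # Assumption: carbon-free formulas are fully alphabetical.
        ordered = sorted(elements)
    # Counts of one are omitted.
    return "".join(e if counts[e] == 1 else f"{e}{counts[e]}" for e in ordered)


print(hill_formula({"O": 6, "N": 1, "H": 15, "C": 8}))
```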

smiles

Description

The potential structure of the small molecule, in the simplified molecular-input line-entry system (SMILES).

Type

String List

Mandatory

False

Is Nullable:

TRUE

Example

SMH	...	smiles	...
SML	...	C1=CC=C(C=C1)CCNC(=O)CCNNC(=O)C2=CC=NC=C2	...

inchi

Description

A standard IUPAC International Chemical Identifier (InChI) for the given substance.

Type

String List

Mandatory

False

Is Nullable:

TRUE

Example

SMH	...	inchi	...
SML	...	InChI=1S/C17H20N4O2/c22-16(19-12-6-14-4-2-1-3-5-14)9-13-20-21-17(23)15-7-10-18-11-8-15/h1-5,7-8,10-11,20H,6,9,12-13H2,(H,19,22)(H,21,23)	...

chemical_name

Description

The small molecule’s chemical/common name, or general description if a chemical name is unavailable.

Type

String List

Mandatory

False

Is Nullable:

TRUE

Example

SMH	...	chemical_name	...
SML	...	N-(2-phenylethyl)-3-[2-(pyridine-4-carbonyl)hydrazinyl]propanamide	...

uri

Description

A URI pointing to the small molecule’s entry in a database (e.g., the small molecule’s HMDB, ChEBI or KEGG entry).

Type

String List

Mandatory

False

Is Nullable:

TRUE

Example

SMH	...	uri	...
SML	...	http://www.genome.jp/dbget-bin/www_bget?cpd:C00031	...
SML	...	http://www.hmdb.ca/metabolites/HMDB0001847	...

theoretical_neutral_mass

Description

The theoretical neutral mass of the small molecule. This should be calculated from the chemical formula.

Type

Double List

Mandatory

False

Is Nullable:

TRUE

Example

SMH	...	theoretical_neutral_mass	...
SML	...	1234.5	...

adduct_ions

Description

A | separated list of the detected adduct ion forms for this small molecule. The terms should follow the general style in the 2013 IUPAC recommendations on terms relating to MS e.g. [M+H]1+, [M+Na]1+, [M+NH4]1+, [M-H]1-, [M+Cl]1-.

Type

Regex List

^\[\d*M([+-][\w\d]+)*\]\d*[+-]$

Mandatory

False

Is Nullable:

TRUE

Example

SMH	...	adduct_ions	...
SML	...	[M+H]1+|[M+Na]1+	...
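
Each entry of the | separated list can be checked against the pattern given under Type. The sketch below is one possible way to do this with Python's re module; the function name invalid_adducts is hypothetical. Note that \w in Python also matches the underscore and non-ASCII letters, so the check is slightly more permissive than the notation suggests.

```python
import re
from typing import List

# Pattern copied from the Type entry of adduct_ions.
ADDUCT_PATTERN = re.compile(r"^\[\d*M([+-][\w\d]+)*\]\d*[+-]$")


def invalid_adducts(cell: str) -> List[str]:
    """Return the entries of an adduct_ions cell that do not match the pattern."""
    if cell == "null":
        return []
    return [
        adduct
        for adduct in cell.split("|")
        if ADDUCT_PATTERN.fullmatch(adduct) is None
    ]


print(invalid_adducts("[M+H]1+|[M+Na]1+"))
print(invalid_adducts("[M+H]1+|M+Na"))
```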

reliability

Description

The reliability of the given small molecule identification. This must be supplied by the resource and should be reported as an integer between 1 and 4:

1: identified, rigorous. …​ 2: identified. …​ 3: putatively characterized class. …​ 4: unknown. …​

Type

String

Mandatory

False

Is Nullable:

TRUE

Example

SMH	...	reliability	...
SML	...	3	...
SML	...	0	...

best_id_confidence_measure

Description

The small molecule confidence measure/score of the best identification for this small molecule summary. The type of the value is defined by the best_id_confidence_measure CV parameter. The value is reported in the best_id_confidence_value column.

Type

Parameter

Mandatory

False

Is Nullable:

TRUE

Example

SMH	...	best_id_confidence_measure	...
SML	...	[MS, MS:1001477, SpectraST,]	...

best_id_confidence_value

Description

The small molecule confidence measure/score value of the best identification for this small molecule summary.

Type

Double

Mandatory

True

Is Nullable:

FALSE

Example

SMH	...	best_id_confidence_value	...
SML	...	0.85	...

abundance_assay

Description

The small molecule’s abundance in every assay described in the metadata section MUST be reported. Null or zero values may be reported as appropriate.

Type

Double List

Mandatory

False

Is Nullable:

TRUE

Example

SMH	...	abundance_assay	...
SML	...	12340	...

abundance_study_variable

Description

The small molecule’s abundance in every study variable described in the metadata section. Null or zero values may be reported as appropriate.

Type

Double List

Mandatory

False

Is Nullable:

TRUE

Example

SMH	...	abundance_study_variable	...
SML	...	1230	...

abundance_variation_study_variable

Description

The small molecule’s abundance variation in every study variable described in the metadata section. Null or zero values may be reported as appropriate.

Type

Double List

Mandatory

False

Is Nullable:

TRUE

Example

SMH	...	abundance_variation_study_variable	...
SML	...	0.2	...

opt_{identifier}_*

Description

Additional columns can be added to the end of the small molecule table. These column headers MUST start with the prefix “opt_” followed by the {identifier} of the object they reference: assay, study variable, MS run or “global” (if the value relates to all replicates). Column names MUST only contain the following characters: 'A'-'Z', 'a'-'z', '0'-'9', '_', '-', '[', ']', and ':'. CV parameter accessions MAY be used for optional columns following the format: opt_{identifier}_cv_{accession}_{parameter name}. Spaces within the parameter’s name MUST be replaced by '_'.

Type

Optional Column

Mandatory

False

Is Nullable:

TRUE

Example

SMH	...	opt_global_cv_value	...
SML	...	opt_global_cv_MS:1002217_decoy_peptide=null	...
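
The naming rules for optional columns can be sketched as two small helpers: one that builds a CV-based column name and one that checks the "opt_" prefix and the permitted characters. Both function names are hypothetical, and the sketch assumes that the underscore is among the permitted characters, since the "opt_" prefix itself requires it.

```python
import re

# Permitted characters: A-Z, a-z, 0-9, '_', '-', '[', ']' and ':'.
ALLOWED_CHARACTERS = re.compile(r"[A-Za-z0-9_\-\[\]:]+")


def is_valid_opt_column(name: str) -> bool:
    """Check the opt_ prefix and the permitted character set."""
    return name.startswith("opt_") and ALLOWED_CHARACTERS.fullmatch(name) is not None


def cv_opt_column(identifier: str, accession: str, parameter_name: str) -> str:
    """Build opt_{identifier}_cv_{accession}_{parameter name}."""
    # Spaces within the parameter name are replaced by underscores.
    return f"opt_{identifier}_cv_{accession}_{parameter_name.replace(' ', '_')}"


column = cv_opt_column("global", "MS:1002217", "decoy peptide")
print(column, is_valid_opt_column(column))
```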

Small Molecule Feature (SMF) Section

The small molecule feature section is table-based, representing individual MS regions (generally the elution profile for all isotopomers from a single charge state). It MUST always come after the Small Molecule Section. All columns are MANDATORY except for "opt_" columns.

The order of columns MUST follow the order specified below. All table columns MUST be Tab separated. There MUST NOT be any empty cells. Missing values MUST be reported using "null".

SMF_ID

Description

An identifier for the small molecule feature that is unique within the file.

Type

Integer

Mandatory

True

Is Nullable:

FALSE

Example

SFH	...	SMF_ID	...
SMF	...	1	...

SME_ID_REFS

Description

References to the identification evidence (SME elements) via referencing SME_ID values. Multiple values MAY be provided as a | separated list to indicate ambiguity in the identification or to indicate that different types of data supported the identification (see SME_ID_REF_ambiguity_code). For the case of a consensus approach where multiple adduct forms are used to infer the SML ID, different features should just reference the same SME_ID value(s).

Type

Integer List

Mandatory

False

Is Nullable:

TRUE

Example

SFH	...	SME_ID_REFS	...
SMF	...	5|6|12	...

SME_ID_REF_ambiguity_code

Description

If multiple values are given under SME_ID_REFS, one of the following codes MUST be provided. 1=Ambiguous identification; 2=Only different evidence streams for the same molecule with no ambiguity; 3=Both ambiguous identification and multiple evidence streams. If there is no value or only one value under SME_ID_REFS, this MUST be reported as null.

Type

Integer

Mandatory

False

Is Nullable:

TRUE

Example

SFH	...	SME_ID_REF_ambiguity_code	...
SMF	...	1	...
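
The dependency between SME_ID_REFS and SME_ID_REF_ambiguity_code can be expressed as a small consistency check. The sketch below assumes the two cells have already been read as strings or integers; the function names are hypothetical.

```python
from typing import List, Optional


def parse_id_refs(cell: str) -> List[int]:
    """Parse a '|'-separated SME_ID_REFS cell; 'null' yields an empty list."""
    if cell == "null":
        return []
    return [int(part) for part in cell.split("|")]


def check_ambiguity_code(refs: List[int], code: Optional[int]) -> bool:
    """A code of 1, 2 or 3 is required when there are multiple references;
    with zero or one reference the code must be null (None)."""
    if len(refs) > 1:
        return code in (1, 2, 3)
    return code is None
```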

adduct_ion

Description

The assumed classification of this molecule’s adduct ion after detection, following the general style in the 2013 IUPAC recommendations on terms relating to MS e.g. [M+H]1+, [M+Na]1+, [M+NH4]1+, [M-H]1-, [M+Cl]1-.

Type

String

Mandatory

False

Is Nullable:

TRUE

Example

SFH	...	adduct_ion	...
SMF	...	[M+H]1+	...
SMF	...	[M+2Na]2+	...

isotopomer

Description

If de-isotoping has not been performed, the isotopomer quantified MUST be reported here using CV terms, e.g. “+1”, “+2”, “13C peak”. Otherwise (i.e. for approaches where SMF rows are de-isotoped features) this MUST be null.

Type

Parameter

Mandatory

False

Is Nullable:

TRUE

Example

SFH	...	isotopomer	...
SMF	...	[MS,MS:1002957,"isotopomer MS peak","13C peak"]	...

exp_mass_to_charge

Description

The experimental mass/charge value for the feature, by default assumed to be the mean across assays or a representative value. For approaches that report isotopomers as SMF rows, the m/z of the isotopomer MUST be reported here.

Type

Double

Mandatory

True

Is Nullable:

FALSE

Example

SFH	...	exp_mass_to_charge	...
SMF	...	1234.5	...

charge

Description

The feature’s charge value using positive integers both for positive and negative polarity modes.

Type

Integer

Mandatory

False

Is Nullable:

FALSE

Example

SFH	...	charge	...
SMF	...	1	...

retention_time_in_seconds

Description

The apex of the feature on the retention time axis, in a Master or aggregate MS run. Retention time MUST be reported in seconds. Retention time values for individual MS runs (i.e. before alignment) MAY be reported as optional columns. Retention time SHOULD only be null in the case of direct infusion MS or other techniques where a retention time value is absent or unknown. Relative retention time or retention time index values MAY be reported as optional columns, and could be considered for inclusion in future versions of mzTab as appropriate.

Type

Double

Mandatory

False

Is Nullable:

TRUE

Example

SFH	...	retention_time_in_seconds	...
SMF	...	1345.7	...

retention_time_in_seconds_start

Description

The start time of the feature on the retention time axis, in a Master or aggregate MS run. Retention time MUST be reported in seconds. Retention time start and end SHOULD only be null in the case of direct infusion MS or other techniques where a retention time value is absent or unknown and MAY be reported in optional columns.

Type

Double

Mandatory

False

Is Nullable:

TRUE

Example

SFH	...	retention_time_in_seconds_start	...
SMF	...	1327	...

retention_time_in_seconds_end

Description

The end time of the feature on the retention time axis, in a Master or aggregate MS run. Retention time MUST be reported in seconds. Retention time start and end SHOULD only be null in the case of direct infusion MS or other techniques where a retention time value is absent or unknown and MAY be reported in optional columns.

Type

Double

Mandatory

False

Is Nullable:

TRUE

Example

SFH	...	retention_time_in_seconds_end	...
SMF	...	1327.8	...

abundance_assay

Description

The feature’s abundance in every assay described in the metadata section MUST be reported. Null or zero values may be reported as appropriate.

Type

Double List

Mandatory

False

Is Nullable:

TRUE

Example

SFH	...	abundance_assay	...
SMF	...	38648	...

opt_{identifier}_*

Description

Additional columns can be added to the end of the small molecule feature table. These column headers MUST start with the prefix “opt_” followed by the {identifier} of the object they reference: assay, study variable, MS run or “global” (if the value relates to all replicates). Column names MUST only contain the following characters: 'A'-'Z', 'a'-'z', '0'-'9', '_', '-', '[', ']', and ':'. CV parameter accessions MAY be used for optional columns following the format: opt_{identifier}_cv_{accession}_{parameter name}. Spaces within the parameter’s name MUST be replaced by '_'.

Type

Optional Column

Mandatory

False

Is Nullable:

TRUE

Example

SFH	...	opt_global_cv_value	...
SMF	...	opt_assay[1]_my_value=My value	...
SMF	...	opt_global_another_value=some other value	...

Small Molecule Evidence (SME) Section

The small molecule evidence section is table-based, representing identification evidence for small molecules or features (e.g., database search results). It MUST always come after the Small Molecule Feature Section. All columns are MANDATORY except for "opt_" columns.

The order of columns MUST follow the order specified below. All table columns MUST be Tab separated. There MUST NOT be any empty cells. Missing values MUST be reported using "null".
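
The ordering constraint between sections can be verified from the line prefixes alone. The following sketch assumes the prefixes used in the examples of this reference (MTD, SMH/SML, SFH/SMF, SEH/SME); `sections_in_order` is a hypothetical helper.

```python
from typing import Iterable

# Assumed rank of each line prefix, following the section order described here.
_SECTION_RANK = {"MTD": 0, "SMH": 1, "SML": 1, "SFH": 2, "SMF": 2, "SEH": 3, "SME": 3}


def sections_in_order(lines: Iterable[str]) -> bool:
    """Return True if no line belongs to an earlier section than a line before it.
    Lines with other prefixes (e.g. comments or blank lines) are ignored."""
    highest = 0
    for line in lines:
        prefix = line.split("\t", 1)[0]
        rank = _SECTION_RANK.get(prefix)
        if rank is None:
            continue
        if rank < highest:
            return False
        highest = rank
    return True
```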

SME_ID

Description

A within file unique identifier for the small molecule evidence result.

Type

Integer

Mandatory

True

Is Nullable:

FALSE

Example

SEH	...	SME_ID	...
SME	...	1	...

evidence_input_id

Description

A within-file unique identifier for the input data used to support this identification (e.g. a fragment spectrum, an RT and m/z pair, or an isotope profile that was used for the identification process). It serves as a grouping mechanism: multiple rows of results from the same input data share the same ID. The identifiers may be human readable but should not be assumed to be interpretable. For example, if fragmentation spectra have been searched then the ID may be the spectrum reference, or for an accurate mass search, ms_run[2]:458.75.

Type

String

Mandatory

True

Is Nullable:

FALSE

Example

SEH	...	evidence_input_id	...
SME	...	ms_run[1]:mass=278.65;rt=376.5	...
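
Because rows derived from the same input data share an evidence_input_id, candidate identifications can be collected per input with a simple grouping step. This is an illustrative sketch; `group_by_input` and the `(SME_ID, evidence_input_id)` tuple layout are assumptions, not part of the format.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_input(rows: List[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Group SME_ID values by their evidence_input_id, preserving row order."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for sme_id, input_id in rows:
        groups[input_id].append(sme_id)
    return dict(groups)
```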

database_identifier

Description

The putative identification for the small molecule sourced from an external database, using the same prefix specified in database[1-n]-prefix. This could include additionally a chemical class or an identifier to a spectral library entity, even if its actual identity is unknown. For the “no database” case, 'null' must be used. The unprefixed use of 'null' is prohibited for any other case. If no putative identification can be reported for a particular database, it MUST be reported as the database prefix followed by null.

Type

String

Mandatory

True

Is Nullable:

TRUE

Example

SEH	...	database_identifier	...
SME	...	CID:00027395	...
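
The prefix rules for database_identifier can be sketched as follows. The code assumes that prefix and identifier are joined by the first ':' (as in the example above) and that the set of declared database prefixes is supplied by the caller; it does not check whether the "no database" case actually applies to the file. `split_database_identifier` is a hypothetical name.

```python
from typing import Optional, Set, Tuple


def split_database_identifier(
    value: str, prefixes: Set[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Split a database_identifier into (prefix, identifier).

    A bare 'null' (the "no database" case) yields (None, None);
    '<prefix>:null' yields (prefix, None).
    """
    if value == "null":
        return None, None
    prefix, sep, identifier = value.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing database prefix in {value!r}")
    return prefix, (None if identifier == "null" else identifier)
```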

chemical_formula

Description

The chemical formula of the identified compound e.g. in a database, assumed to match the theoretical mass to charge (in some cases this will be the derivatized form, including adducts and protons). This should be specified in Hill notation (EA Hill 1900), i.e. elements in the order C, H and then alphabetically all other elements. Counts of one may be omitted. Elements should be capitalized properly to avoid confusion (e.g., “CO” vs. “Co”). The chemical formula reported should refer to the neutral form. Charge state is reported by the charge field. For example, N-acetylglucosamine would be encoded by the string “C8H15NO6”.

Type

String

Mandatory

False

Is Nullable:

TRUE

Example

SEH	...	chemical_formula	...
SME	...	C17H20N4O2	...
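
The Hill ordering described above can be produced from a mapping of element symbols to counts. This is a minimal sketch; `hill_formula` is a hypothetical helper. It also applies the standard Hill convention for carbon-free formulas (all elements alphabetical, including H), which the description above does not spell out.

```python
from typing import Dict


def hill_formula(counts: Dict[str, int]) -> str:
    """Format element counts in Hill notation: C first, then H, then all
    other elements alphabetically; counts of one are omitted."""
    if "C" in counts:
        ordered = ["C"] + (["H"] if "H" in counts else [])
        ordered += sorted(e for e in counts if e not in ("C", "H"))
    else:
        # Carbon-free formulas are ordered purely alphabetically.
        ordered = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] != 1 else "") for e in ordered)
```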

smiles

Description

The potential structure of the small molecule in the simplified molecular-input line-entry system (SMILES).

Type

String

Mandatory

False

Is Nullable:

TRUE

Example

SEH	...	smiles	...
SME	...	C1=CC=C(C=C1)CCNC(=O)CCNNC(=O)C2=CC=NC=C2	...

inchi

Description

A standard IUPAC International Chemical Identifier (InChI) for the given substance.

Type

String

Mandatory

False

Is Nullable:

TRUE

Example

SEH	...	inchi	...
SME	...	InChI=1S/C17H20N4O2/c22-16(19-12-6-14-4-2-1-3-5-14)9-13-20-21-17(23)15-7-10-18-11-8-15/h1-5,7-8,10-11,20H,6,9,12-13H2,(H,19,22)(H,21,23)	...

chemical_name

Description

The small molecule’s chemical/common name, or general description if a chemical name is unavailable.

Type

String

Mandatory

False

Is Nullable:

TRUE

Example

SEH	...	chemical_name	...
SME	...	N-(2-phenylethyl)-3-[2-(pyridine-4-carbonyl)hydrazinyl]propanamide	...

uri

Description

A URI pointing to the small molecule’s entry in a database (e.g., the small molecule’s HMDB, ChEBI or KEGG entry).

Type

URI

Mandatory

False

Is Nullable:

TRUE

Example

SEH	...	uri	...
SME	...	http://www.hmdb.ca/metabolites/HMDB00054	...

derivatized_form

Description

The derivatized form of the small molecule, if the identification was based on a specific derivative (e.g. 2 TMS). This MUST be specified using CV terms (where possible) otherwise “null”.

Type

Parameter

Mandatory

False

Is Nullable:

TRUE

Example

SEH	...	derivatized_form	...
SME	...	[CHEBI, CHEBI:51088, trimethylsilyl group, 3]	...

adduct_ion

Description

The assumed classification of this molecule’s adduct ion after detection, following the general style in the 2013 IUPAC recommendations on terms relating to MS e.g. [M+H]1+, [M+Na]1+, [M+NH4]1+, [M-H]1-, [M+Cl]1-. If the adduct classification is ambiguous with regards to identification evidence it MAY be null.

Type

Regex

^\[\d*M([+-][\w\d]+)*\]\d*[+-]$


Mandatory

False

Is Nullable:

TRUE

Example

SEH	...	adduct_ion	...
SME	...	[M+H]+	...
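
Adduct strings can be screened with a regular expression before further processing. The pattern in the sketch below is an assumption chosen to accept the adduct examples given in this reference (optional multiplier, "M", signed groups, optional charge count and a charge sign); it is not guaranteed to match the schema's pattern character for character, and `is_valid_adduct` is a hypothetical name.

```python
import re

# Assumed pattern: optional multiplier, 'M', zero or more signed groups,
# then an optional charge count followed by the charge sign.
ADDUCT_PATTERN = re.compile(r"\[\d*M([+-][\w\d]+)*\]\d*[+-]")


def is_valid_adduct(value: str) -> bool:
    """Return True if the whole string matches the assumed adduct pattern."""
    return ADDUCT_PATTERN.fullmatch(value) is not None
```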

exp_mass_to_charge

Description

The experimental mass/charge value for the precursor ion. If multiple adduct forms have been combined into a single identification event/search, then a single value e.g. for the protonated form SHOULD be reported here.

Type

Double

Mandatory

True

Is Nullable:

FALSE

Example

SEH	...	exp_mass_to_charge	...
SME	...	1234.5	...

charge

Description

The small molecule evidence’s charge value using positive integers both for positive and negative polarity modes.

Type

Integer

Mandatory

True

Is Nullable:

FALSE

Example

SEH	...	charge	...
SME	...	1	...

theoretical_mass_to_charge

Description

The theoretical mass/charge value for the small molecule or the database mass/charge value (for a spectral library match).

Type

Double

Mandatory

True

Is Nullable:

FALSE

Example

SEH	...	theoretical_mass_to_charge	...
SME	...	1234.71	...

spectra_ref

Description

Reference to a spectrum in a spectrum file, for example where a fragmentation spectrum has been used to support the identification. If a separate spectrum file has been used for the fragmentation spectrum, this MUST be reported in the metadata section as additional ms_runs. The reference must be in the format ms_run[1-n]:{SPECTRA_REF} where SPECTRA_REF MUST follow the format defined in 5.2 (including references to chromatograms where these are used to inform identification). Multiple spectra MUST be referenced using a | delimited list for the (rare) cases in which search engines have combined or aggregated multiple spectra in advance of the search to make identifications. If a fragmentation spectrum has not been used, the value should indicate the ms_run to which the identification is mapped e.g. “ms_run[1]”.

Type

String List

Mandatory

True

Is Nullable:

FALSE

Example

SEH	...	spectra_ref	...
SME	...	ms_run[1]:index=5|ms_run[2]:index=3	...
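
A spectra_ref cell can be split into its ms_run index and spectrum reference parts. The sketch below assumes that individual references do not themselves contain '|' and treats the part after the first ':' as an opaque string; `parse_spectra_ref` is a hypothetical helper.

```python
import re
from typing import List, Optional, Tuple

_SPECTRA_REF = re.compile(r"ms_run\[(\d+)\](?::(.+))?")


def parse_spectra_ref(cell: str) -> List[Tuple[int, Optional[str]]]:
    """Split a spectra_ref cell on '|' and return (ms_run index, reference) pairs.
    A bare 'ms_run[n]' (no fragmentation spectrum used) yields (n, None)."""
    refs = []
    for part in cell.split("|"):
        match = _SPECTRA_REF.fullmatch(part)
        if match is None:
            raise ValueError(f"malformed spectra_ref entry: {part!r}")
        refs.append((int(match.group(1)), match.group(2)))
    return refs
```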

identification_method

Description

The search engine or algorithm used for the identification. This SHOULD be specified using CV terms.

Type

Parameter

Mandatory

True

Is Nullable:

FALSE

Example

SEH	...	identification_method	...
SME	...	[MS, MS:1001477, SpectraST,]	...

ms_level

Description

The MS level of the spectrum used for the identification. This SHOULD be specified using CV terms.

Type

Parameter

Mandatory

True

Is Nullable:

FALSE

Example

SEH	...	ms_level	...
SME	...	[MS, MS:1000511, ms level, 2]	...
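
Values of type Parameter, such as the one above, follow a bracketed four-field layout. The following sketch parses that layout with Python's `csv` module so that double-quoted fields may contain commas; the `Param` tuple and `parse_param` function are hypothetical names, and the field names (CV label, accession, name, value) are an assumed reading of the examples in this reference.

```python
import csv
from typing import NamedTuple, Optional


class Param(NamedTuple):
    cv_label: Optional[str]
    accession: Optional[str]
    name: str
    value: Optional[str]


def parse_param(cell: str) -> Param:
    """Parse '[label, accession, name, value]' into a Param.
    Double-quoted fields may contain commas; empty fields become None."""
    if not (cell.startswith("[") and cell.endswith("]")):
        raise ValueError(f"not a parameter: {cell!r}")
    fields = next(csv.reader([cell[1:-1]], skipinitialspace=True))
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}")
    label, accession, name, value = (f.strip() for f in fields)
    return Param(label or None, accession or None, name, value or None)
```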

id_confidence_measure

Description

Any statistical value or score for the identification. The metadata section reports the type of score used, as id_confidence_measure[1-n] of type Param.

Type

Double List

Mandatory

False

Is Nullable:

TRUE

Example

SEH	...	id_confidence_measure	...
SME	...	0.7	...

rank

Description

The rank of this identification from this approach as increasing integers from 1 (best ranked identification). Ties (equal scores) are represented by using the same rank. The rank defaults to 1 if no ranking system is used.

Type

Integer

Mandatory

True

Is Nullable:

FALSE

Example

SEH	...	rank	...
SME	...	1	...
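
Ranks with ties can be derived from a list of scores. The description above does not say whether the rank following a tie is skipped, so this sketch assumes dense ranking (1, 1, 2, ...); `assign_ranks` and its `higher_is_better` flag are hypothetical.

```python
from typing import List


def assign_ranks(scores: List[float], higher_is_better: bool = True) -> List[int]:
    """Return a rank for each score, 1 being best; equal scores share a rank.
    Dense ranking is an assumption of this sketch."""
    distinct = sorted(set(scores), reverse=higher_is_better)
    rank_of = {score: position for position, score in enumerate(distinct, start=1)}
    return [rank_of[score] for score in scores]
```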

opt_{identifier}_*

Description

Additional columns can be added to the end of the small molecule evidence table. These column headers MUST start with the prefix “opt_” followed by the {identifier} of the object they reference: assay, study variable, MS run or “global” (if the value relates to all replicates). Column names MUST only contain the following characters: 'A'-'Z', 'a'-'z', '0'-'9', '_', '-', '[', ']', and ':'. CV parameter accessions MAY be used for optional columns following the format: opt_{identifier}_cv_{accession}_{parameter name}. Spaces within the parameter’s name MUST be replaced by '_'.

Type

Optional Column

Mandatory

False

Is Nullable:

TRUE

Example

SEH	...	opt_global_cv_value	...
SME	...	opt_assay[1]_my_value=My value	...
SME	...	opt_global_another_value=some other value	...

Refer to mzTab_m_schema.adoc in the repository root for the current generated field reference, or run the generator:

python3 schema/generate_schema_adoc.py schema/mzTab_2_1-M.json > mzTab_m_schema.adoc

Using the Schema

Validator Generation

The JSON schema can be used directly with json-schema-validator (Java) or jsonschema (Python) to validate mzTab-M metadata objects.

Domain Model Generation

OpenAPI-compatible tooling (e.g., openapi-generator) can produce client models from schema/mzTab_2_1-M_openapi.json. The jmzTab-m Java library is the reference implementation generated this way.