Schema Reference
The mzTab-M JSON schema provides a machine-readable definition of the format structure. It is used for domain model generation, parser testing, and automated validation.
Schema Files
The schemas are located in the schema/ directory of the repository:
schema/mzTab_2_1-M.json
The primary JSON schema for mzTab-M 2.1, using JSON Schema draft-07 / OpenAPI 3.1 conventions.
schema/mzTab_2_1-M_mapping.xml
Mapping file that associates mzTab-M fields with their controlled vocabulary term roots.
schema/mzTab_2_1-M_metabolights_mapping.xml
MetaboLights-specific mapping for submission validation.
Field Reference
The field reference below is generated from schema/mzTab_2_1-M.json via schema/generate_schema_adoc.py.
This document provides a reference for all fields defined in the mzTab-M format, organised by section and ordered by mzTab-M section hierarchy. Each field entry includes a description, type, mandatory status, and example usage.
Sections
The mzTab-M format consists of four cross-referenced data tables: metadata (MTD), Small Molecule (SML), Small Molecule Feature (SMF), and Small Molecule Evidence (SME). The MTD and SML tables are mandatory. SMF and SME sections SHOULD also be included to capture full identification evidence.
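As a rough illustration of the section layout, the sketch below groups the lines of a file by their leading prefix and reports which mandatory sections are absent. The function names are hypothetical, and the code assumes every line starts with a section prefix (such as MTD, SMH, SML, SFH, SMF or COM) followed by a tab.

```python
from collections import defaultdict

# Row prefixes of the sections this reference calls mandatory.
MANDATORY_PREFIXES = {"MTD", "SML"}


def group_lines_by_prefix(text):
    """Group tab-separated lines by their leading section prefix."""
    groups = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between sections
        prefix, _, rest = line.partition("\t")
        groups[prefix].append(rest)
    return dict(groups)


def missing_mandatory_sections(groups):
    """Return the mandatory prefixes that have no lines at all."""
    return sorted(MANDATORY_PREFIXES - set(groups))
```

A file that contains only metadata lines would be reported as missing "SML".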
Metadata Section
The metadata section provides additional information about the dataset(s) reported in the mzTab file. All fields in the metadata section are optional apart from those noted as mandatory. The fields in the metadata section MUST be reported in the order listed below. The field name and value MUST be separated by a tab character.
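The tab-separated layout described above can be read with a few lines of Python. This is a minimal sketch, not a validating parser: `parse_metadata` is a hypothetical helper, and it stores a list of values per field name because some fields (for example the colunit fields) may appear on several lines.

```python
def parse_metadata(lines):
    """Collect MTD lines into a dict mapping field name to a list of values.

    Lines with other prefixes (for example COM comments) are ignored.
    Insertion order is preserved, so the reported field order can be checked later.
    """
    metadata = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if parts[0] != "MTD" or len(parts) < 3:
            continue
        field_name, value = parts[1], parts[2]
        metadata.setdefault(field_name, []).append(value)
    return metadata
```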
mzTab-version
Description |
Version number of the mzTab format used. Format: Used to ensure compatibility and processing correctness. |
Type |
|
Mandatory |
True |
Example |
MTD mzTab-version 2.0.0-M MTD mzTab-version 2.1.0-M |
mzTab-ID
Description |
Unique identifier for the mzTab-M document. REQUIRED. Can be:
- Repository accession number (e.g., MTBLS214)
- Laboratory internal identifier
- Study-specific identifier
NOT intended as a globally unique identifier, but SHOULD have local meaning within its context. |
Type |
String |
Mandatory |
True |
Example |
MTD mzTab-ID MTBLS214 MTD mzTab-ID LAB001_2023 |
title
Description |
Human-readable title of the experiment or study. OPTIONAL. SHOULD be:
- Concise but informative
- Reflect the main focus of the study
- Unique within a collection of related studies |
Type |
String |
Mandatory |
False |
Example |
MTD title Metabolomic Analysis of Human Plasma in Diabetes Type 2 MTD title Lipidomics Study of Brain Tissue in Alzheimer's Disease |
description
Description |
Detailed description of the experiment or study. OPTIONAL. SHOULD include:
- Study objectives
- Experimental design overview
- Key methodological approaches
- Any unique aspects of the study
Provides context for understanding the data and its significance. |
Type |
String |
Mandatory |
False |
Example |
MTD description Investigation of metabolic changes in human plasma samples from type 2 diabetes patients compared to healthy controls. Study includes both fasting and post-prandial measurements. MTD description Analysis of lipid profiles in brain tissue samples examining the relationship between specific lipid species and Alzheimer's disease progression. |
sample_processing[1-n]
Description |
Parameters specifying sample processing that was applied within one step. |
Type |
Parameter List |
Mandatory |
False |
Example |
MTD sample_processing[1] [MSIO, MSIO:0000107, metabolism quenching using precooled 60 percent methanol ammonium bicarbonate buffer,] MTD sample_processing[2] [MSIO, MSIO:0000146, centrifugation,] MTD sample_processing[3] [MSIO, MSIO:0000141, metabolite extraction,] MTD sample_processing[4] [MSIO, MSIO:0000141, silylation,] |
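Parameter values such as the ones in the example above follow a four-part bracketed pattern (CV label, accession, name, value). The sketch below splits one such parameter into its parts. `Parameter` and `parse_parameter` are hypothetical names, and the code assumes that names do not contain commas; quoted names with embedded commas would need a more careful tokenizer.

```python
from typing import NamedTuple, Optional


class Parameter(NamedTuple):
    cv_label: Optional[str]
    accession: Optional[str]
    name: Optional[str]
    value: Optional[str]


def parse_parameter(text):
    """Split '[label, accession, name, value]' into a Parameter tuple.

    Empty parts become None. Only the first three commas are used as
    separators, so the value part may itself contain commas.
    """
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not a bracketed parameter: {text!r}")
    fields = [part.strip() or None for part in text[1:-1].split(",", 3)]
    if len(fields) != 4:
        raise ValueError(f"expected four parts, got {len(fields)}: {text!r}")
    return Parameter(*fields)
```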
The name, source, analyzer and detector of the instruments used in the experiment. Multiple instruments are numbered [1-n].
instrument[1-n]-analyzer[1-n]
Description |
The instrument’s mass analyzer, as defined by the parameter. |
Type |
Parameter List |
Mandatory |
False |
instrument[1-n]-detector
Description |
The instrument’s detector, as defined by the parameter. |
Type |
Parameter |
Mandatory |
False |
software[1-n]
Description |
The software utilized. |
Type |
Parameter |
Mandatory |
False |
Example |
MTD software[1] [MS, MS:1002879, Progenesis QI, 3.0] MTD software[1]-setting Fragment tolerance = 0.1 Da … MTD software[2]-setting Parent tolerance = 0.5 Da |
software[1-n]-setting[1-n]
Description |
A software setting used. This field MAY occur multiple times for a single software. The value of this field is deliberately set as a String, since there currently do not exist cvParams for every possible setting. |
Type |
String List |
Mandatory |
False |
publication[1-n]
Description |
The publication item ids referenced by this publication. |
Type |
String List |
Mandatory |
True |
Example |
MTD publication[1] pubmed:21063943|doi:10.1007/978-1-60761-987-1_6 MTD publication[2] pubmed:20615486|doi:10.1016/j.jprot.2010.06.008 |
The contact’s name, affiliation and e-mail. Several contacts can be given by indicating the number in the square brackets after "contact". A contact has to be supplied in the format [first name] [initials] [last name].
uri[1-n]
Description |
The URI pointing to the external resource. |
Type |
URI |
Mandatory |
False |
Example |
MTD uri[1] https://www.ebi.ac.uk/metabolights/MTBLS517 … MTD external_study_uri[1] https://www.ebi.ac.uk/metabolights/MTBLS517/files/i_Investigation.txt |
external_study_uri[1-n]
Description |
The URI pointing to the external resource. |
Type |
URI |
Mandatory |
False |
Example |
MTD uri[1] https://www.ebi.ac.uk/metabolights/MTBLS517 … MTD external_study_uri[1] https://www.ebi.ac.uk/metabolights/MTBLS517/files/i_Investigation.txt |
quantification_method
Description |
The quantification method used in the experiment reported in the file. |
Type |
Parameter |
Mandatory |
True |
sample[1-n]
Description |
The sample’s name. |
Type |
String |
Mandatory |
False |
Example |
COM Experiment where all samples consisted of the same two species
MTD sample[1] individual number 1
MTD sample[1]-species[1] [NCBITaxon, NCBITaxon:9606, Homo sapiens, ]
MTD sample[1]-tissue[1] [BTO, BTO:0000759, liver, ]
MTD sample[1]-cell_type[1] [CL, CL:0000182, hepatocyte, ]
MTD sample[1]-disease[1] [DOID, DOID:684, hepatocellular carcinoma, ]
MTD sample[1]-disease[2] [DOID, DOID:9451, alcoholic fatty liver, ]
MTD sample[1]-description Hepatocellular carcinoma samples.
MTD sample[1]-custom[1] [,,Extraction date, 2011-12-21]
MTD sample[1]-custom[2] [,,Extraction reason, liver biopsy]
MTD sample[2] individual number 2
MTD sample[2]-species[1] [NCBITaxon, NCBITaxon:9606, Homo sapiens, ]
MTD sample[2]-tissue[1] [BTO, BTO:0000759, liver, ]
MTD sample[2]-cell_type[1] [CL, CL:0000182, hepatocyte, ]
MTD sample[2]-description Healthy control samples. |
sample[1-n]-species[1-n]
Description |
Biological species information on the sample. |
Type |
Parameter List |
Mandatory |
False |
sample[1-n]-tissue[1-n]
Description |
Biological tissue information on the sample. |
Type |
Parameter List |
Mandatory |
False |
sample[1-n]-cell_type[1-n]
Description |
Biological cell type information on the sample. |
Type |
Parameter List |
Mandatory |
False |
sample[1-n]-disease[1-n]
Description |
Disease information on the sample. |
Type |
Parameter List |
Mandatory |
False |
sample[1-n]-description
Description |
A free form description of the sample. |
Type |
String |
Mandatory |
False |
sample[1-n]-custom[1-n]
Description |
Additional user or cv parameters. |
Type |
Parameter List |
Mandatory |
False |
Specification of ms_run.
- location: Location of the external data file e.g. raw files on which analysis has been performed. If the actual location of the MS run is unknown, a “null” MUST be used as a place holder value, since the [1-n] cardinality is referenced elsewhere. If pre-fractionation has been performed, then [1-n] ms_runs SHOULD be created per assay.
- instrument_ref: If different instruments are used in different runs, instrument_ref can be used to link a specific instrument to a specific run.
- format: Parameter specifying the data format of the external MS data file. If ms_run[1-n]-format is present, ms_run[1-n]-id_format SHOULD also be present, following the parameters specified in Table 1.
- id_format: Parameter specifying the id format used in the external data file. If ms_run[1-n]-id_format is present, ms_run[1-n]-format SHOULD also be present.
- fragmentation_method: The type(s) of fragmentation used in a given ms run.
- scan_polarity: The polarity mode of a given run. Usually only one value SHOULD be given here except for the case of mixed polarity runs.
- hash: Hash value of the corresponding external MS data file defined in ms_run[1-n]-location. If ms_run[1-n]-hash is present, ms_run[1-n]-hash_method SHOULD also be present.
- hash_method: A parameter specifying the hash methods used to generate the String in ms_run[1-n]-hash. Specifics of the hash method used MAY follow the definitions of the mzML format. If ms_run[1-n]-hash is present, ms_run[1-n]-hash_method SHOULD also be present.
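A value for ms_run[1-n]-hash could be produced along the following lines. This is only a sketch: `file_hash` is a hypothetical helper, SHA-1 is chosen here as an assumed default, and the algorithm actually used has to be the one declared in ms_run[1-n]-hash_method.

```python
import hashlib


def file_hash(path, algorithm="sha1", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks.

    `algorithm` is any name accepted by hashlib.new(); it should correspond
    to the method reported in ms_run[1-n]-hash_method.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use flat even for large raw data files.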
ms_run[1-n]-id_format
Description |
The format of the IDs in the MS run file. |
Type |
Parameter |
Mandatory |
False |
ms_run[1-n]-fragmentation_method[1-n]
Description |
The fragmentation methods applied during this msRun. |
Type |
Parameter List |
Mandatory |
False |
ms_run[1-n]-scan_polarity[1-n]
Description |
The scan polarity/polarities used during this msRun. |
Type |
Parameter List |
Mandatory |
False |
ms_run[1-n]-hash
Description |
The file hash value of this msRun’s data file. |
Type |
String |
Mandatory |
False |
ms_run[1-n]-hash_method
Description |
The method used to calculate the hash. |
Type |
Parameter |
Mandatory |
False |
ms_run[1-n]-parameters[1-n]
Description |
Additional parameters of the MS run, separated by bars. |
Type |
Parameter List |
Mandatory |
False |
Example |
MTD ms_run[1]-parameter[1] [MS, MS:1000031, instrument model, [MS, MS:1000449, LTQ Orbitrap,]] |
assay[1-n]
Description |
The assay name. |
Type |
String |
Mandatory |
True |
Example |
MTD assay[1] first assay MTD assay[1]-custom[1] [MS, , Assay operator, Fred Blogs] MTD assay[1]-external_uri https://www.ebi.ac.uk/metabolights/MTBLS517/files/i_Investigation.txt?STUDYASSAY=a_e04_c18pos.txt MTD assay[1]-sample_ref sample[1] MTD assay[1]-ms_run_ref ms_run[1] |
assay[1-n]-custom[1-n]
Description |
Additional user or cv parameters. |
Type |
Parameter List |
Mandatory |
False |
assay[1-n]-external_uri
Description |
An external URI to further information about this assay. |
Type |
URI |
Mandatory |
False |
assay[1-n]-ms_run_ref[1-n]
Description |
The ms run(s) referenced by this assay. |
Type |
Integer List |
Mandatory |
True |
assay[1-n]-protocol_refs[1-n]
Description |
The protocol(s) referenced by this assay. |
Type |
Integer List |
Mandatory |
False |
Example |
MTD assay[1]-protocol_ref protocol[1]| protocol[2] |
assay[1-n]-parameters[1-n]
Description |
Additional parameters of the assay, separated by bars. |
Type |
Parameter List |
Mandatory |
False |
Example |
MTD assay[1]-parameter[1] [MS, MS:1000031, instrument model, [MS, MS:1000449, LTQ Orbitrap,]] |
study_variable[1-n]
Description |
The study variable value. Encoded according to the datatype declared on the referenced study_variable_group: either a literal value (for xsd:* datatypes) or a Parameter (for the Parameter datatype, e.g. |
Type |
Study Variable List |
Mandatory |
True |
Example |
MTD study_variable[1] control MTD study_variable[1]-assay_refs assay[1]| assay[2]| assay[3] MTD study_variable[1]-average_function [MS, MS:1002883, median, ] MTD study_variable[1]-variation_function [MS, MS:1002885, standard error, ] MTD study_variable[1]-description Group B (spike-in 0.74 fmol/uL) MTD study_variable[2] 1 minute 0.5mg rapamycin |
study_variable[1-n]-assay_refs[1-n]
Description |
The assays referenced by this study variable. |
Type |
Integer List |
Mandatory |
False |
study_variable[1-n]-ms_run_refs[1-n]
Description |
The ms run(s) referenced by this study variable. |
Type |
Integer List |
Mandatory |
False |
Example |
MTD study_variable[1]-ms_run_ref ms_run[1]| ms_run[2] |
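Reference fields such as assay_refs and ms_run_refs list indexed object names separated by bars, while their type is given as an Integer List. One way to bridge the two is sketched below. `parse_refs` is a hypothetical helper, and tolerating whitespace around the bar is an assumption based on the examples in this reference.

```python
import re

# Matches one reference such as "ms_run[2]" or "study_variable_group[1]".
_REF = re.compile(r"^([a-z_]+)\[(\d+)\]$")


def parse_refs(value, expected=None):
    """Turn 'assay[1]| assay[2]' into [1, 2].

    If `expected` is given (for example "assay"), every reference must
    name that object type.
    """
    indices = []
    for item in value.split("|"):
        match = _REF.match(item.strip())
        if match is None:
            raise ValueError(f"malformed reference: {item!r}")
        name, index = match.group(1), int(match.group(2))
        if expected is not None and name != expected:
            raise ValueError(f"expected {expected}[n], got {item.strip()!r}")
        indices.append(index)
    return indices
```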
study_variable[1-n]-description
Description |
A free-form description of this study variable. |
Type |
String |
Mandatory |
False |
study_variable[1-n]-group_refs[1-n]
Description |
The study variable group this study variable belongs to. |
Type |
Integer List |
Mandatory |
False |
Example |
MTD study_variable[1]-group_ref study_variable_group[1]| study_variable_group[2] |
study_variable[1-n]-average_function
Description |
The function used to calculate the study variable quantification value, to be specified if the operation used is not the arithmetic mean (default), e.g. geometric mean or median. |
Type |
Parameter |
Mandatory |
False |
study_variable[1-n]-variation_function
Description |
The function used to calculate the study variable quantification variation value, to be specified if that value is reported and the operation used is not the coefficient of variation (default), e.g. standard error. |
Type |
Parameter |
Mandatory |
False |
study_variable_group[1-n]
Description |
The study variable group name. |
Type |
Parameter |
Mandatory |
True |
Example |
MTD study_variable_group[1] [PATO, PATO:0000383, sex, ]
MTD study_variable_group[1]-description Sex of the individual
MTD study_variable_group[1]-type [STATO, STATO:0000252, categorical variable, ]
MTD study_variable_group[1]-datatype xsd:string
MTD study_variable_group[2] [PATO, PATO:0000384, timepoint, ]
MTD study_variable_group[2]-description Time after treatment
MTD study_variable_group[2]-type [STATO, STATO:0000228, ordinal variable, ]
MTD study_variable_group[2]-datatype xsd:integer
MTD study_variable_group[2]-unit [UO, UO:0000033, day, ]
MTD study_variable[1] Female_0day
MTD study_variable[1]-group_ref study_variable_group[1]
MTD study_variable[1]-assay_refs assay[1]|assay[2]|assay[3]
MTD study_variable[2] Female_1day
MTD study_variable[2]-group_ref study_variable_group[2]
MTD study_variable[2]-assay_refs assay[4]|assay[5]|assay[6] |
study_variable_group[1-n]-description
Description |
Description of the study variable group. |
Type |
String |
Mandatory |
False |
study_variable_group[1-n]-type
Description |
The study variable group type, as defined by the parameter. |
Type |
Parameter |
Mandatory |
False |
study_variable_group[1-n]-datatype
Description |
The datatype of the group variable, which determines how they can be encoded and parsed in mzTab-M files, and how the values could be handled in programming languages. Producers of mzTab-M 2.1.0 SHOULD provide a The following datatypes are supported:
Writers MUST ensure that the values of the Tools and parsers implementing mzTab-M SHOULD apply the appropriate data type interpretation when reading study_variable_group data and constructing analysis data structures (e.g., data frames, matrices, or tables). |
Type |
Parameter |
Mandatory |
False |
Example |
MTD study_variable_group[1]-datatype xsd:string
MTD study_variable_group[2]-datatype xsd:decimal
MTD study_variable_group[3]-datatype xsd:date
MTD study_variable_group[4]-datatype Parameter
COM plain string value:
MTD study_variable[1] Male
MTD study_variable[1]-group_ref study_variable_group[1]
COM user-defined Parameter value:
MTD study_variable[2] [,,Male,]
MTD study_variable[2]-group_ref study_variable_group[4]
COM CV Parameter value:
MTD study_variable[3] [NCIT, NCIT:C20197, Male, ]
MTD study_variable[3]-group_ref study_variable_group[4] |
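A reader might apply the declared datatype when loading study variable values roughly as follows. The sketch covers only the datatypes that appear in the examples of this reference (xsd:string, xsd:integer, xsd:decimal, xsd:date); the mapping to Python types and the name `convert_value` are assumptions, and any other datatype, including Parameter, is returned as unparsed text.

```python
from datetime import date
from decimal import Decimal

# Assumed mapping from declared datatype to a Python constructor.
CONVERTERS = {
    "xsd:string": str,
    "xsd:integer": int,
    "xsd:decimal": Decimal,
    "xsd:date": date.fromisoformat,  # expects YYYY-MM-DD
}


def convert_value(raw, datatype):
    """Interpret a study variable value according to its group's datatype."""
    converter = CONVERTERS.get(datatype)
    if converter is None:
        return raw  # e.g. "Parameter": left for a parameter parser
    return converter(raw)
```

Using Decimal rather than float for xsd:decimal avoids silently changing the reported digits.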
study_variable_group[1-n]-unit
Description |
The study variable group unit, as defined by the parameter. |
Type |
Parameter |
Mandatory |
False |
A protocol describing one or more steps of an experimental procedure, such as sample preparation, data acquisition or data processing. Protocols are referenced from Assay objects. Added in mzTab-M 2.1.
protocol[1-n]-type
Description |
The protocol type, as defined by the parameter. |
Type |
Parameter |
Mandatory |
True |
protocol[1-n]-parameters[1-n]
Description |
The protocol parameters. |
Type |
Parameter List |
Mandatory |
False |
custom[1-n]
Description |
Any additional parameters describing the analysis reported. |
Type |
Parameter List |
Mandatory |
False |
Example |
MTD custom [MS, MS:1000001, custom param, value] |
Specification of controlled vocabularies.
- label: A string describing the labels of the controlled vocabularies/ontologies used in the mzTab file as a short-hand e.g. "MS" for PSI-MS.
- full_name: A string describing the full names of the controlled vocabularies/ontologies used in the mzTab file.
- version: A string describing the version of the controlled vocabularies/ontologies used in the mzTab file.
- uri: A string containing the URIs of the controlled vocabularies/ontologies used in the mzTab file.
cv[1-n]-version
Description |
The CV version used when the file was generated. |
Type |
String |
Mandatory |
True |
database[1-n]
Description |
The database name. |
Type |
Database List |
Mandatory |
True |
Example |
MTD database[1] [MIRIAM, MIR:00100079, HMDB, ]
MTD database[1]-prefix hmdb
MTD database[1]-version 3.6
MTD database[1]-uri https://www.hmdb.ca
MTD database[2] [,, "de novo", ]
MTD database[2]-prefix dn
MTD database[2]-version Unknown
MTD database[2]-uri null
MTD database[3] [,, "no database", null ]
MTD database[3]-prefix null
MTD database[3]-version Unknown
MTD database[3]-uri null |
database[1-n]-prefix
Description |
The prefix used in the “identifier” column of data tables. For the 'no database' case 'null' must be used. |
Type |
String |
Mandatory |
True |
database[1-n]-version
Description |
The database version is mandatory where identification has been performed. This may be a formal version number e.g. “1.4.1”, a date of access “2016-10-27” (ISO-8601 format) or “Unknown” if there is no suitable version that can be annotated. |
Type |
String |
Mandatory |
True |
database[1-n]-uri
Description |
The URI to the database. For the “no database” case, 'null' must be reported. |
Type |
String |
Mandatory |
True |
derivatization_agent[1-n]
Description |
A description of derivatization agents applied to small molecules, using userParams or CV terms where possible. |
Type |
Parameter List |
Mandatory |
False |
Example |
MTD derivatization_agent[1] [XLMOD, XLMOD:07014, N-methyl-N-t-butyldimethylsilyltrifluoroacetamide, ] |
small_molecule-quantification_unit
Description |
Defines what type of units are reported in the small molecule summary quantification / abundance fields |
Type |
Parameter |
Mandatory |
True |
Example |
MTD small_molecule-quantification_unit [MS, MS:1001113, peak area, ] |
small_molecule_feature-quantification_unit
Description |
Defines what type of units are reported in the small molecule feature quantification / abundance fields. |
Type |
Parameter |
Mandatory |
False |
Example |
MTD small_molecule_feature-quantification_unit [MS, MS:1001113, peak area, ] |
small_molecule-identification_reliability
Description |
The system used for giving reliability / confidence codes to small molecule identifications MUST be specified if not using the default codes. |
Type |
Parameter |
Mandatory |
False |
Example |
MTD small_molecule-identification_reliability [MS, MS:1000932, identification reliability, ] |
id_confidence_measure[1-n]
Description |
Small molecule identification confidence metrics.
Scoring System
- Use CV parameters numbered Scores determine confidence in molecular identifications |
Type |
Parameter List |
Mandatory |
True |
Example |
MTD id_confidence_measure[1] [MS,MS:1002890,fragmentation score,] MTD id_confidence_measure[2] [MS,MS:1002891,retention time score,] |
colunit-small_molecule
Description |
Unit definitions for small molecule data columns. Format
- Pattern: Important Notes
- Not for quantification columns
- Use |
Type |
Column Parameter Mapping List |
Mandatory |
False |
Example |
MTD colunit-small_molecule retention_time=[UO,UO:0000031,minute,] MTD colunit-small_molecule mass=[UO,UO:0000221,dalton,] |
colunit-small_molecule_feature
Description |
Defines the used unit for a column in the small molecule feature section. The format of the value has to be {column name}={Parameter defining the unit}. This field MUST NOT be used to define a unit for quantification columns. The unit used for small molecule quantification values MUST be set in small_molecule_feature-quantification_unit. |
Type |
Column Parameter Mapping List |
Mandatory |
False |
Example |
MTD colunit-small_molecule_feature retention_time=[UO, UO:0000031, minute, ] |
colunit-small_molecule_evidence
Description |
Defines the used unit for a column in the small molecule evidence section. The format of the value has to be {column name}={Parameter defining the unit}. |
Type |
Column Parameter Mapping List |
Mandatory |
False |
Example |
MTD colunit-small_molecule_evidence retention_time=[UO, UO:0000031, minute, ] |
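The colunit values pair a column name with a unit parameter in the form {column name}={Parameter defining the unit}. A small sketch of how such a value might be split is shown below; `parse_colunit` is a hypothetical helper, and it returns the four parameter parts as plain strings without checking them against a controlled vocabulary.

```python
def parse_colunit(value):
    """Split 'column=[label, accession, name, value]' into its two halves.

    Returns (column_name, (label, accession, name, value)).
    """
    column, separator, parameter = value.partition("=")
    parameter = parameter.strip()
    if not separator or not column.strip():
        raise ValueError(f"missing column name or '=': {value!r}")
    if not (parameter.startswith("[") and parameter.endswith("]")):
        raise ValueError(f"unit is not a bracketed parameter: {value!r}")
    fields = [part.strip() for part in parameter[1:-1].split(",", 3)]
    if len(fields) != 4:
        raise ValueError(f"unit parameter needs four parts: {value!r}")
    return column.strip(), tuple(fields)
```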
Small Molecule (SML) Section
The small molecule section is table-based. It MUST always come after the metadata section. Each row reports one final quantified molecule result. All columns are MANDATORY except for "opt_" columns.
The order of columns MUST follow the order specified below. All table columns MUST be Tab separated. There MUST NOT be any empty cells. Missing values MUST be reported using "null".
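The table rules above (tab-separated cells, no empty cells, "null" for missing values) can be illustrated with a small reader. This is a sketch under the assumption that the header line starts with SMH and data rows start with SML, as in the examples of this section; `parse_table` is a hypothetical name.

```python
def parse_table(lines, header_prefix="SMH", row_prefix="SML"):
    """Read one table section into a list of dicts keyed by column name.

    Cells equal to "null" are returned as None; empty cells are rejected.
    """
    header = None
    rows = []
    for line in lines:
        cells = line.rstrip("\n").split("\t")
        if cells[0] == header_prefix:
            header = cells[1:]
        elif cells[0] == row_prefix:
            if header is None:
                raise ValueError("data row found before the header line")
            values = cells[1:]
            if len(values) != len(header):
                raise ValueError("row width does not match the header")
            if any(value == "" for value in values):
                raise ValueError("empty cell; missing values must be 'null'")
            rows.append({column: (None if value == "null" else value)
                         for column, value in zip(header, values)})
    return rows
```

The same function can read the feature table by passing header_prefix="SFH" and row_prefix="SMF".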
SML_ID
Description |
A within file unique identifier for the small molecule summary. |
Type |
Integer |
Mandatory |
True |
Is Nullable: |
FALSE |
Example |
SMH ... SML_ID ... SML ... 1 ... |
SMF_ID_REFS
Description |
References to the small molecule features (SMF elements) via referencing SMF_ID values. Multiple values MAY be provided as a | separated list to indicate which features were used to aggregate the SML row. |
Type |
Integer List |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SMH ... SMF_ID_REFS ... SML ... 2|3|11 ... |
database_identifier
Description |
A list of | separated possible identifiers for the small molecule; multiple values MUST only be provided to indicate ambiguity in the identification of the molecule and not to demonstrate different identifier types for the same molecule. Alternative identifiers for the same molecule MAY be provided as optional columns. The database identifier must be preceded by the resource description (prefix) followed by a colon, as specified in the metadata section. A null value MAY be provided if the identification is sufficiently ambiguous as to be meaningless for reporting or the small molecule has not been identified. |
Type |
String List |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SMH ... database_identifier ... SML ... CID:00027395|HMDB:HMDB0001847 ... |
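The prefix and identifier can be separated at the first colon of each entry. The following sketch shows one way to do this; `parse_database_identifiers` is a hypothetical helper, and it does not check the prefix against the database[1-n]-prefix values declared in the metadata section.

```python
def parse_database_identifiers(cell):
    """Split 'CID:00027395|HMDB:HMDB0001847' into (prefix, identifier) pairs.

    A 'null' cell yields an empty list.
    """
    if cell == "null":
        return []
    pairs = []
    for item in cell.split("|"):
        prefix, separator, identifier = item.strip().partition(":")
        if not separator or not prefix or not identifier:
            raise ValueError(f"expected 'prefix:identifier', got {item!r}")
        pairs.append((prefix, identifier))
    return pairs
```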
chemical_formula
Description |
The chemical formula of the identified compound e.g. in a database, assumed to match the theoretical mass to charge (in some cases this will be the derivatized form, including adducts and protons). This should be specified in Hill notation (EA Hill 1900), i.e. elements in the order C, H and then alphabetically all other elements. Counts of one may be omitted. Elements should be capitalized properly to avoid confusion (e.g., “CO” vs. “Co”). The chemical formula reported should refer to the neutral form. Charge state is reported by the charge field in the SME and SMF section. For example, N-acetylglucosamine would be encoded by the string “C8H15NO6”. |
Type |
String List |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SMH ... chemical_formula ... SML ... C17H20N4O2 ... |
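The Hill ordering described above can be sketched as a small formatting function. `hill_formula` is a hypothetical helper that takes element counts as a dict. For formulas without carbon it sorts all elements alphabetically, which is the usual Hill convention but is an assumption here, since the description only covers the carbon-containing case.

```python
def hill_formula(counts):
    """Format element counts as a formula string in Hill order.

    With carbon present: C first, then H, then the rest alphabetically.
    Without carbon: all elements alphabetically (assumed convention).
    Counts of one are omitted.
    """
    present = {element: n for element, n in counts.items() if n > 0}
    if "C" in present:
        others = sorted(e for e in present if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in present else []) + others
    else:
        order = sorted(present)
    return "".join(
        element + (str(present[element]) if present[element] != 1 else "")
        for element in order
    )
```

For the N-acetylglucosamine case mentioned in the description, `hill_formula({"O": 6, "N": 1, "H": 15, "C": 8})` gives "C8H15NO6".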
smiles
Description |
The potential molecule’s structure in the simplified molecular-input line-entry system (SMILES) for the small molecule. |
Type |
String List |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SMH ... smiles ... SML ... C1=CC=C(C=C1)CCNC(=O)CCNNC(=O)C2=CC=NC=C2 ... |
inchi
Description |
A standard IUPAC International Chemical Identifier (InChI) for the given substance. |
Type |
String List |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SMH ... inchi ... SML ... InChI=1S/C17H20N4O2/c22-16(19-12-6-14-4-2-1-3-5-14)9-13-20-21-17(23)15-7-10-18-11-8-15/h1-5,7-8,10-11,20H,6,9,12-13H2,(H,19,22)(H,21,23) ... |
chemical_name
Description |
The small molecule’s chemical/common name, or general description if a chemical name is unavailable. |
Type |
String List |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SMH ... chemical_name ... SML ... N-(2-phenylethyl)-3-[2-(pyridine-4-carbonyl)hydrazinyl]propanamide ... |
uri
Description |
A URI pointing to the small molecule’s entry in a database (e.g., the small molecule’s HMDB, Chebi or KEGG entry). |
Type |
String List |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SMH ... uri ... SML ... http://www.genome.jp/dbget-bin/www_bget?cpd:C00031 ... SML ... http://www.hmdb.ca/metabolites/HMDB0001847 ... |
theoretical_neutral_mass
Description |
The theoretical neutral mass of the small molecule. This should be calculated from the chemical formula. |
Type |
Double List |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SMH ... theoretical_neutral_mass ... SML ... 1234.5 ... |
adduct_ions
Description |
A | separated list of the detected adduct ion forms for this small molecule. The terms should follow the general style in the 2013 IUPAC recommendations on terms relating to MS e.g. [M+H]1+, [M+Na]1+, [M+NH4]1+, [M-H]1-, [M+Cl]1-. |
Type |
Regex List ^\[\d*M([+-][\w\d]+)*\]\d*[+-]$ |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SMH ... adduct_ions ... SML ... [M+H]1+|[M+Na]1+ ... |
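The regular expression given in the Type row can be applied directly to each entry of the bar-separated list. The sketch below uses that pattern unchanged; `invalid_adducts` is a hypothetical helper that returns the entries failing the pattern.

```python
import re

# Pattern copied from the Type row of the adduct_ions field.
ADDUCT_PATTERN = re.compile(r"^\[\d*M([+-][\w\d]+)*\]\d*[+-]$")


def invalid_adducts(cell):
    """Return the entries of an adduct_ions cell that do not match the pattern."""
    if cell == "null":
        return []
    return [adduct for adduct in cell.split("|")
            if not ADDUCT_PATTERN.match(adduct)]
```

An empty result means every listed adduct is syntactically acceptable; it says nothing about whether the adduct is chemically plausible.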
reliability
Description |
The reliability of the given small molecule identification. This must be supplied by the resource and should be reported as an integer between 1-4: 1: identified, rigorous. … 2: identified. … 3: putatively characterized class. … 4: unknown. … |
Type |
String |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SMH ... reliability ... SML ... 3 ... SML ... 0 ... |
best_id_confidence_measure
Description |
The small molecule confidence measure/score of the best identification for this small molecule summary. The type of the value is defined by the best_id_confidence_measure CV parameter. The value is reported in the best_id_confidence_value column. |
Type |
Parameter |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SMH ... best_id_confidence_measure ... SML ... [MS, MS:1001477, SpectraST,,] ... |
best_id_confidence_value
Description |
The small molecule confidence measure/score value of the best identification for this small molecule summary. |
Type |
Double |
Mandatory |
True |
Is Nullable: |
FALSE |
Example |
SMH ... best_id_confidence_value ... SML ... 0.85 ... |
abundance_assay
Description |
The small molecule’s abundance in every assay described in the metadata section MUST be reported. Null or zero values may be reported as appropriate. |
Type |
Double List |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SMH ... abundance_assay ... SML ... 12340 ... |
abundance_study_variable
Description |
The small molecule’s abundance in every study variable described in the metadata section. Null or zero values may be reported as appropriate. |
Type |
Double List |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SMH ... abundance_study_variable ... SML ... 1230 ... |
abundance_variation_study_variable
Description |
The small molecule’s abundance variation in every study variable described in the metadata section. Null or zero values may be reported as appropriate. |
Type |
Double List |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SMH ... abundance_variation_study_variable ... SML ... 0.2 ... |
opt_{identifier}_*
Description |
Additional columns can be added to the end of the small molecule table. These column headers MUST start with the prefix “opt_” followed by the {identifier} of the object they reference: assay, study variable, MS run or “global” (if the value relates to all replicates). Column names MUST only contain the following characters: 'A'-'Z', 'a'-'z', '0'-'9', '_', '-', '[', ']', and ':'. CV parameter accessions MAY be used for optional columns following the format: opt_{identifier}_cv_{accession}_{parameter name}. Spaces within the parameter’s name MUST be replaced by '_'. |
Type |
Optional Column |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SMH ... opt_global_cv_value ... SML ... opt_global_cv_MS:1002217_decoy_peptide=null ... |
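The naming rule for CV-based optional columns can be sketched as follows. `cv_optional_column` is a hypothetical helper; it treats the underscore as a permitted character, since both the “opt_” prefix and the space replacement rule require it.

```python
import re

# Allowed characters: letters, digits, '_', '-', '[', ']' and ':'.
_ALLOWED_NAME = re.compile(r"^opt_[A-Za-z0-9_\-\[\]:]+$")


def cv_optional_column(identifier, accession, parameter_name):
    """Build 'opt_{identifier}_cv_{accession}_{parameter name}'.

    Spaces in the parameter name are replaced by underscores, and the
    result is checked against the allowed character set.
    """
    name = "opt_{}_cv_{}_{}".format(
        identifier, accession, parameter_name.replace(" ", "_"))
    if not _ALLOWED_NAME.match(name):
        raise ValueError(f"column name contains forbidden characters: {name!r}")
    return name
```

For instance, `cv_optional_column("global", "MS:1002217", "decoy peptide")` yields "opt_global_cv_MS:1002217_decoy_peptide", the name used in the example above.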
Small Molecule Feature (SMF) Section
The small molecule feature section is table-based, representing individual MS regions (generally the elution profile for all isotopomers from a single charge state). It MUST always come after the Small Molecule Section. All columns are MANDATORY except for "opt_" columns.
The order of columns MUST follow the order specified below. All table columns MUST be Tab separated. There MUST NOT be any empty cells. Missing values MUST be reported using "null".
SMF_ID
Description |
A within file unique identifier for the small molecule feature. |
Type |
Integer |
Mandatory |
True |
Is Nullable: |
FALSE |
Example |
SFH ... SMF_ID ... SMF ... 1 ... |
SME_ID_REFS
Description |
References to the identification evidence (SME elements) via referencing SME_ID values. Multiple values MAY be provided as a | separated list to indicate ambiguity in the identification or to indicate that different types of data supported the identification (see SME_ID_REF_ambiguity_code). For the case of a consensus approach where multiple adduct forms are used to infer the SML ID, different features should just reference the same SME_ID value(s). |
Type |
Integer List |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SFH ... SME_ID_REFS ... SMF ... 5|6|12 ... |
SME_ID_REF_ambiguity_code
Description |
If multiple values are given under SME_ID_REFS, one of the following codes MUST be provided. 1=Ambiguous identification; 2=Only different evidence streams for the same molecule with no ambiguity; 3=Both ambiguous identification and multiple evidence streams. If there is no value or only one value under SME_ID_REFS, this MUST be reported as null. |
Type |
Integer |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SFH ... SME_ID_REF_ambiguity_code ... SMF ... 1 ... |
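The dependency between SME_ID_REFS and SME_ID_REF_ambiguity_code lends itself to a simple consistency check. The sketch below works on the raw cell text of one SMF row; `ambiguity_code_is_consistent` is a hypothetical name.

```python
def ambiguity_code_is_consistent(sme_id_refs, ambiguity_code):
    """Check the ambiguity code against the number of SME_ID_REFS values.

    Both arguments are raw cell strings, with "null" for missing values.
    More than one reference requires a code of 1, 2 or 3; zero or one
    reference requires the code to be null.
    """
    refs = [] if sme_id_refs == "null" else sme_id_refs.split("|")
    if len(refs) > 1:
        return ambiguity_code in {"1", "2", "3"}
    return ambiguity_code == "null"
```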
adduct_ion
Description |
The assumed classification of this molecule’s adduct ion after detection, following the general style in the 2013 IUPAC recommendations on terms relating to MS e.g. [M+H]1+, [M+Na]1+, [M+NH4]1+, [M-H]1-, [M+Cl]1-. |
Type |
String |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SFH ... adduct_ion ... SMF ... [M+H]1+ ... SMF ... [M+2Na]2+ ... |
isotopomer
Description |
If de-isotoping has not been performed, then the isotopomer quantified MUST be reported here e.g. “+1”, “+2”, “13C peak” using CV terms, otherwise (i.e. for approaches where SMF rows are de-isotoped features) this MUST be null. |
Type |
Parameter |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SFH ... isotopomer ... SMF ... [MS,MS:1002957,"isotopomer MS peak","13C peak"] ... |
exp_mass_to_charge
Description |
The experimental mass/charge value for the feature, by default assumed to be the mean across assays or a representative value. For approaches that report isotopomers as SMF rows, then the m/z of the isotopomer MUST be reported here. |
Type |
Double |
Mandatory |
True |
Is Nullable: |
FALSE |
Example |
SFH ... exp_mass_to_charge ... SMF ... 1234.5 ... |
charge
Description |
The feature’s charge value using positive integers both for positive and negative polarity modes. |
Type |
Integer |
Mandatory |
False |
Is Nullable: |
FALSE |
Example |
SFH ... charge ... SMF ... 1 ... |
retention_time_in_seconds
Description |
The apex of the feature on the retention time axis, in a Master or aggregate MS run. Retention time MUST be reported in seconds. Retention time values for individual MS runs (i.e. before alignment) MAY be reported as optional columns. Retention time SHOULD only be null in the case of direct infusion MS or other techniques where a retention time value is absent or unknown. Relative retention time or retention time index values MAY be reported as optional columns, and could be considered for inclusion in future versions of mzTab as appropriate. |
Type |
Double |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SFH ... retention_time_in_seconds ... SMF ... 1345.7 ... |
retention_time_in_seconds_start
Description |
The start time of the feature on the retention time axis, in a Master or aggregate MS run. Retention time MUST be reported in seconds. Retention time start and end SHOULD only be null in the case of direct infusion MS or other techniques where a retention time value is absent or unknown and MAY be reported in optional columns. |
Type |
Double |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SFH ... retention_time_in_seconds_start ... SMF ... 1327 ... |
retention_time_in_seconds_end
Description |
The end time of the feature on the retention time axis, in a Master or aggregate MS run. Retention time MUST be reported in seconds. Retention time start and end SHOULD only be null in the case of direct infusion MS or other techniques where a retention time value is absent or unknown and MAY be reported in optional columns. |
Type |
Double |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SFH ... retention_time_in_seconds_end ... SMF ... 1327.8 ... |
abundance_assay
Description |
The feature’s abundance in every assay described in the metadata section MUST be reported. Null or zero values may be reported as appropriate. |
Type |
Double List |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SFH ... abundance_assay ... SMF ... 38648 ... |
opt_{identifier}_*
Description |
Additional columns can be added to the end of the small molecule feature table. These column headers MUST start with the prefix “opt_” followed by the {identifier} of the object they reference: assay, study variable, MS run or “global” (if the value relates to all replicates). Column names MUST only contain the following characters: 'A'-'Z', 'a'-'z', '0'-'9', '_', '-', '[', ']', and ':'. CV parameter accessions MAY be used for optional columns following the format: opt_{identifier}_cv_{accession}_{parameter name}. Spaces within the parameter’s name MUST be replaced by '_'. |
Type |
Optional Column |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SFH ... opt_global_cv_value ... SMF ... opt_assay[1]_my_value=My value ... SMF ... opt_global_another_value=some other value ... |
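The naming rules for optional columns can be sketched as a small checker and a name builder. The helper names, and the exact identifier spellings assay[n], study_variable[n] and ms_run[n], are assumptions made for this illustration; the underscore is treated as an allowed character because the “opt_” prefix itself requires it. The accession and parameter name in the usage are placeholders.

```python
import re

# Allowed characters: letters, digits, underscore, hyphen, square brackets, colon.
ALLOWED_RE = re.compile(r"^[A-Za-z0-9_\-\[\]:]+$")

# Assumed identifier spellings for the referenced object (hypothetical).
PREFIX_RE = re.compile(
    r"^opt_(global|assay\[\d+\]|study_variable\[\d+\]|ms_run\[\d+\])_.+$"
)


def is_valid_opt_column(name: str) -> bool:
    """Check the character set and the opt_{identifier}_ prefix."""
    return bool(ALLOWED_RE.match(name)) and bool(PREFIX_RE.match(name))


def cv_column_name(identifier: str, accession: str, parameter_name: str) -> str:
    """Build opt_{identifier}_cv_{accession}_{parameter name}, replacing spaces by '_'."""
    return f"opt_{identifier}_cv_{accession}_{parameter_name.replace(' ', '_')}"


if __name__ == "__main__":
    name = cv_column_name("global", "MS:1002217", "decoy peptide")
    print(name, is_valid_opt_column(name))
    print(is_valid_opt_column("opt_global_my value"))  # contains a space
```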
Small Molecule Evidence (SME) Section
The small molecule evidence section is table-based, representing identification evidence for small molecules or features (e.g., database search results). It MUST always come after the Small Molecule Feature Section. All columns are MANDATORY except for "opt_" columns.
The order of columns MUST follow the order specified below. All table columns MUST be Tab separated. There MUST NOT be any empty cells. Missing values MUST be reported using "null".
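The row-level rules above (tab separation, no empty cells, "null" for missing values) can be sketched as follows. The function name parse_sme_row is hypothetical, and the sketch assumes that the header line starts with SEH and each data line with SME, as in the examples of this section.

```python
from typing import Dict, Optional


def parse_sme_row(header_line: str, row_line: str) -> Dict[str, Optional[str]]:
    """Map column names to cell values, turning the literal "null" into None."""
    header = header_line.rstrip("\r\n").split("\t")
    cells = row_line.rstrip("\r\n").split("\t")
    if header[0] != "SEH" or cells[0] != "SME":
        raise ValueError("expected an SEH header line and an SME data line")
    if len(header) != len(cells):
        raise ValueError("row has a different number of columns than the header")
    if any(cell == "" for cell in cells):
        raise ValueError("empty cells are not allowed; use 'null' instead")
    # Drop the line-prefix column and convert "null" to None.
    return {
        name: (None if cell == "null" else cell)
        for name, cell in zip(header[1:], cells[1:])
    }


if __name__ == "__main__":
    print(parse_sme_row("SEH\tSME_ID\tsmiles\trank", "SME\t1\tnull\t1"))
```

Values are kept as strings here; converting them to the types listed for each field is left out of the sketch.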
SME_ID
Description |
A within file unique identifier for the small molecule evidence result. |
Type |
Integer |
Mandatory |
True |
Is Nullable: |
FALSE |
Example |
SEH ... SME_ID ... SME ... 1 ... |
evidence_input_id
Description |
A within file unique identifier for the input data used to support this identification, e.g. a fragment spectrum, an RT and m/z pair, or an isotope profile that was used for the identification process. It serves as a grouping mechanism, whereby multiple rows of results from the same input data share the same ID. The identifiers may be human readable but should not be assumed to be interpretable. For example, if fragmentation spectra have been searched then the ID may be the spectrum reference, or for an accurate mass search, ms_run[2]:458.75. |
Type |
String |
Mandatory |
True |
Is Nullable: |
FALSE |
Example |
SEH ... evidence_input_id ... SME ... ms_run[1]:mass=278.65;rt=376.5 ... |
database_identifier
Description |
The putative identification for the small molecule sourced from an external database, using the same prefix specified in database[1-n]-prefix. This could include additionally a chemical class or an identifier to a spectral library entity, even if its actual identity is unknown. For the “no database” case, 'null' must be used. The unprefixed use of 'null' is prohibited for any other case. If no putative identification can be reported for a particular database, it MUST be reported as the database prefix followed by null. |
Type |
String |
Mandatory |
True |
Is Nullable: |
TRUE |
Example |
SEH ... database_identifier ... SME ... CID:00027395 ... |
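The three cases described above (a prefixed identifier, a prefix followed by null, and the unprefixed 'null' for the “no database” case) can be told apart with a short helper. This is only a sketch: the function name is hypothetical, and it assumes that prefix and identifier are separated by a colon, as in the example CID:00027395.

```python
from typing import List, Optional, Tuple


def split_database_identifier(
    value: str, prefixes: List[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (prefix, identifier); identifier is None for 'PREFIX:null'."""
    if value == "null":
        # Unprefixed null: the "no database" case.
        return (None, None)
    prefix, separator, identifier = value.partition(":")
    if not separator or prefix not in prefixes:
        raise ValueError(f"identifier does not use a declared prefix: {value!r}")
    return (prefix, None if identifier == "null" else identifier)


if __name__ == "__main__":
    declared = ["CID", "HMDB"]  # stand-ins for the database[1-n]-prefix values
    print(split_database_identifier("CID:00027395", declared))
    print(split_database_identifier("HMDB:null", declared))
    print(split_database_identifier("null", declared))
```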
chemical_formula
Description |
The chemical formula of the identified compound e.g. in a database, assumed to match the theoretical mass to charge (in some cases this will be the derivatized form, including adducts and protons). This should be specified in Hill notation (EA Hill 1900), i.e. elements in the order C, H and then alphabetically all other elements. Counts of one may be omitted. Elements should be capitalized properly to avoid confusion (e.g., “CO” vs. “Co”). The chemical formula reported should refer to the neutral form. Charge state is reported by the charge field. For example, N-acetylglucosamine would be encoded by the string “C8H15NO6”. |
Type |
String |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SEH ... chemical_formula ... SME ... C17H20N4O2 ... |
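The Hill ordering described above can be sketched as a function that turns element counts into a formula string. The function name hill_formula is hypothetical. The sketch also applies the usual Hill convention that, when no carbon is present, all elements including hydrogen are sorted alphabetically; that detail goes beyond the field description.

```python
from typing import Dict


def hill_formula(counts: Dict[str, int]) -> str:
    """Format element counts in Hill order, omitting counts of one."""
    if "C" in counts:
        # Carbon first, then hydrogen, then the remaining elements alphabetically.
        rest = sorted(el for el in counts if el not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        # No carbon: purely alphabetical order, hydrogen included.
        order = sorted(counts)
    return "".join(
        el + (str(counts[el]) if counts[el] != 1 else "") for el in order
    )


if __name__ == "__main__":
    # N-acetylglucosamine, as in the description above.
    print(hill_formula({"O": 6, "N": 1, "H": 15, "C": 8}))
```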
smiles
Description |
The potential molecule’s structure in the simplified molecular-input line-entry system (SMILES). |
Type |
String |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SEH ... smiles ... SME ... C1=CC=C(C=C1)CCNC(=O)CCNNC(=O)C2=CC=NC=C2 ... |
inchi
Description |
A standard IUPAC International Chemical Identifier (InChI) for the given substance. |
Type |
String |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SEH ... inchi ... SME ... InChI=1S/C17H20N4O2/c22-16(19-12-6-14-4-2-1-3-5-14)9-13-20-21-17(23)15-7-10-18-11-8-15/h1-5,7-8,10-11,20H,6,9,12-13H2,(H,19,22)(H,21,23) ... |
chemical_name
Description |
The small molecule’s chemical/common name, or general description if a chemical name is unavailable. |
Type |
String |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SEH ... chemical_name ... SME ... N-(2-phenylethyl)-3-[2-(pyridine-4-carbonyl)hydrazinyl]propanamide ... |
uri
Description |
A URI pointing to the small molecule’s entry in a database (e.g., the small molecule’s HMDB, ChEBI or KEGG entry). |
Type |
URI |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SEH ... uri ... SME ... http://www.hmdb.ca/metabolites/HMDB00054 ... |
derivatized_form
Description |
The derivatized form of the small molecule, if the identification was based on a specific derivative (e.g. 2 TMS). This MUST be specified using CV terms (where possible) otherwise “null”. |
Type |
Parameter |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SEH ... derivatized_form ... SME ... [CHEBI, CHEBI:51088, trimethylsilyl group, 3] ... |
adduct_ion
Description |
The assumed classification of this molecule’s adduct ion after detection, following the general style in the 2013 IUPAC recommendations on terms relating to MS e.g. [M+H]1+, [M+Na]1+, [M+NH4]1+, [M-H]1-, [M+Cl]1-. If the adduct classification is ambiguous with regards to identification evidence it MAY be null. |
Type |
String |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SEH ... adduct_ion ... SME ... [M+H]1+ ... |
exp_mass_to_charge
Description |
The experimental mass/charge value for the precursor ion. If multiple adduct forms have been combined into a single identification event/search, then a single value e.g. for the protonated form SHOULD be reported here. |
Type |
Double |
Mandatory |
True |
Is Nullable: |
FALSE |
Example |
SEH ... exp_mass_to_charge ... SME ... 1234.5 ... |
charge
Description |
The small molecule evidence’s charge value using positive integers both for positive and negative polarity modes. |
Type |
Integer |
Mandatory |
True |
Is Nullable: |
FALSE |
Example |
SEH ... charge ... SME ... 1 ... |
theoretical_mass_to_charge
Description |
The theoretical mass/charge value for the small molecule or the database mass/charge value (for a spectral library match). |
Type |
Double |
Mandatory |
True |
Is Nullable: |
FALSE |
Example |
SEH ... theoretical_mass_to_charge ... SME ... 1234.71 ... |
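A common use of the experimental and theoretical m/z pair is the relative mass error in parts per million. The calculation below is a generic sketch and not part of the format; the function name ppm_error is hypothetical.

```python
def ppm_error(exp_mass_to_charge: float, theoretical_mass_to_charge: float) -> float:
    """Relative mass error in ppm: (experimental - theoretical) / theoretical * 1e6."""
    return (
        (exp_mass_to_charge - theoretical_mass_to_charge)
        / theoretical_mass_to_charge
        * 1e6
    )


if __name__ == "__main__":
    # Example values from the exp_mass_to_charge and theoretical_mass_to_charge fields.
    print(round(ppm_error(1234.5, 1234.71), 2))
```

With the example values 1234.5 and 1234.71 the error is roughly -170 ppm, which shows that the two examples are independent illustrations rather than a matched pair.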
spectra_ref
Description |
Reference to a spectrum in a spectrum file, for example when a fragmentation spectrum has been used to support the identification. If a separate spectrum file has been used for the fragmentation spectrum, this MUST be reported in the metadata section as additional ms_runs. The reference must be in the format ms_run[1-n]:{SPECTRA_REF} where SPECTRA_REF MUST follow the format defined in 5.2 (including references to chromatograms where these are used to inform identification). Multiple spectra MUST be referenced using a | delimited list for the (rare) cases in which search engines have combined or aggregated multiple spectra in advance of the search to make identifications. If a fragmentation spectrum has not been used, the value should indicate the ms_run to which the identification is mapped e.g. “ms_run[1]”. |
Type |
String List |
Mandatory |
True |
Is Nullable: |
FALSE |
Example |
SEH ... spectra_ref ... SME ... ms_run[1]:index=5|ms_run[2]:index=3 ... |
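A spectra_ref value can be split into its individual references as sketched below. The function name is hypothetical, and the sketch assumes that the | character does not occur inside a single spectrum reference.

```python
import re
from typing import List, Optional, Tuple

# ms_run[n] optionally followed by ":" and the spectrum reference.
REF_RE = re.compile(r"^ms_run\[(\d+)\](?::(.+))?$")


def parse_spectra_ref(value: str) -> List[Tuple[int, Optional[str]]]:
    """Return (ms_run index, spectrum reference or None) for each listed entry."""
    refs = []
    for part in value.split("|"):
        match = REF_RE.match(part.strip())
        if match is None:
            raise ValueError(f"not a valid spectra_ref entry: {part!r}")
        refs.append((int(match.group(1)), match.group(2)))
    return refs


if __name__ == "__main__":
    print(parse_spectra_ref("ms_run[1]:index=5|ms_run[2]:index=3"))
    print(parse_spectra_ref("ms_run[1]"))  # no fragmentation spectrum used
```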
identification_method
Description |
The search engine or algorithm used for the identification. This SHOULD be specified using CV terms. |
Type |
Parameter |
Mandatory |
True |
Is Nullable: |
FALSE |
Example |
SEH ... identification_method ... SME ... [MS, MS:1001477, SpectraST,] ... |
ms_level
Description |
The MS level of the spectrum used for the identification. This SHOULD be specified using CV terms. |
Type |
Parameter |
Mandatory |
True |
Is Nullable: |
FALSE |
Example |
SEH ... ms_level ... SME ... [MS, MS:1000511, ms level, 2] ... |
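Fields of type Parameter, such as identification_method and ms_level, use the bracketed four-part form shown in the examples. A minimal parsing sketch follows; the function name and the returned dictionary keys are assumptions made for illustration. The standard csv module is used so that a quoted name containing a comma stays in one piece.

```python
import csv
from typing import Dict, Optional


def parse_param(text: str) -> Dict[str, Optional[str]]:
    """Parse "[CV label, accession, name, value]" into a dictionary."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not a bracketed parameter: {text!r}")
    # skipinitialspace lets a quote follow the ", " separator.
    fields = next(csv.reader([text[1:-1]], skipinitialspace=True))
    if len(fields) != 4:
        raise ValueError(f"expected 4 parts, found {len(fields)}: {text!r}")
    cv_label, accession, name, value = (field.strip() for field in fields)
    return {
        "cv_label": cv_label or None,
        "accession": accession or None,
        "name": name or None,
        "value": value or None,  # an empty value becomes None
    }


if __name__ == "__main__":
    print(parse_param("[MS, MS:1000511, ms level, 2]"))
    print(parse_param("[MS, MS:1001477, SpectraST,]"))
```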
id_confidence_measure
Description |
Any statistical value or score for the identification. The metadata section reports the type of score used, as id_confidence_measure[1-n] of type Param. |
Type |
Double List |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SEH ... id_confidence_measure ... SME ... 0.7 ... |
rank
Description |
The rank of this identification from this approach as increasing integers from 1 (best ranked identification). Ties (equal score) are represented by using the same rank. The rank defaults to 1 if no ranking system is used. |
Type |
Integer |
Mandatory |
True |
Is Nullable: |
FALSE |
Example |
SEH ... rank ... SME ... 1 ... |
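One way to derive rank values from scores is sketched below. It assumes that a higher score is better and that tied scores do not leave gaps in the numbering (dense ranking); neither point is fixed by the description above, and the function name assign_ranks is hypothetical.

```python
from typing import List, Sequence


def assign_ranks(scores: Sequence[float]) -> List[int]:
    """Rank scores from 1 (best); equal scores share the same rank."""
    # Assumption: higher score is better, dense ranking for ties.
    distinct = sorted(set(scores), reverse=True)
    rank_of = {score: position + 1 for position, score in enumerate(distinct)}
    return [rank_of[score] for score in scores]


if __name__ == "__main__":
    print(assign_ranks([0.7, 0.9, 0.7, 0.2]))
```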
opt_{identifier}_*
Description |
Additional columns can be added to the end of the small molecule evidence table. These column headers MUST start with the prefix “opt_” followed by the {identifier} of the object they reference: assay, study variable, MS run or “global” (if the value relates to all replicates). Column names MUST only contain the following characters: 'A'-'Z', 'a'-'z', '0'-'9', '_', '-', '[', ']', and ':'. CV parameter accessions MAY be used for optional columns following the format: opt_{identifier}_cv_{accession}_{parameter name}. Spaces within the parameter’s name MUST be replaced by '_'. |
Type |
Optional Column |
Mandatory |
False |
Is Nullable: |
TRUE |
Example |
SEH ... opt_global_cv_value ... SME ... opt_assay[1]_my_value=My value ... SME ... opt_global_another_value=some other value ... |
Refer to mzTab_m_schema.adoc in the repository root for the current generated field reference, or run the generator:
python3 schema/generate_schema_adoc.py schema/mzTab_2_1-M.json > mzTab_m_schema.adoc
Using the Schema
Validator Generation
The JSON schema can be used directly with json-schema-validator (Java) or jsonschema (Python) to validate mzTab-M metadata objects.
Domain Model Generation
OpenAPI-compatible tooling (e.g., openapi-generator) can produce client models from schema/mzTab_2_1-M_openapi.json.
The jmzTab-m Java library is the reference implementation generated this way.