Note: This document is for public review. If you have any comments about these principles, please email CoastalCarbon@si.edu by Friday April 13th at 12 pm EST

Contents

Overview

Individual studies and contributors will each have their own way of naming variables, representing units, naming categorical variables, and representing notes or observations. We propose the following standardized vocabulary for data and metadata in order to make datasets machine readable and interoperable within our soil carbon synthesis products. Each sub-heading lists a level of data and metadata hierarchy from depth series to core to site to study level information. We also include accompanying sets of recommended controlled vocabulary for key categorical variables. Some variables have controlled units that we wish to keep uniform across datasets. Data that we curate will follow naming conventions outlined herein. Data that we ingest from outside sources will be converted to these conventions when being ingested into the central GitHub databased using custom-built R scripts.

Soil Depth Series Variables

Variable Name Description Data Type Controlled Units
sample_id Sample identification unique to the core character user-defined
depth_min Minimum depth of a sampling increment numeric centimeters below surface
depth_max Maximum depth of a sampling increment numeric centimeters below surface
loss_on_ignition Fraction mass lost following ignition in a muffle furnace, a common proxy for organic matter numeric fraction
dry_bulk_density Dry mass per unit volume of a soil sample numeric g cm-3
organic_carbon Mass of organic carbon relative to mass of organic matter numeric fraction
organic_carbon_measured_or_modeled Specification indicating whether organic carbon is measured from a machine such as elemental analysis, or is modeled as a function of loss_on_ignition character measured or modeled
total_carbon Mass of total carbon relative to mass of organic matter numeric fraction
total_carbon_measured_or_modeled Specification indicating whether organic carbon is measured from a machine such as elemental analysis, or is modeled as a function of loss_on_ignition character measured or modeled
loss_on_ignition_notes Any user-generated notes about the loss on ignition and bulk density processes character user-defined
compaction_fraction Fraction of the sample depth interval reduced due to compaction numeric fraction
compaction_notes Any user-generated notes on compaction character user-defined
cs137_activity Radioactivity counts per unit dry weight for radio-cesium (137Cs) numeric disintegrations per minute per gram dry weight
cs137_activity_sd 1 standard deviation of uncertainty associated with cs137_activity numeric disintegrations per minute per gram dry weight
cs137_notes Any user-generated notes on 137Cs dating process character user-defined
pb210_activity Radioactivity counts per unit dry weight for lead-210 (210Pb) numeric disintegrations per minute per gram dry weight
pb210_activity_sd 1 standard deviation of uncertainty associated with pb210_activity numeric disintegrations per minute per gram dry weight
pb210_notes Any user-generated notes on 210Pb dating process character user-defined
c14_material Description of the material selected for radiocarbon (14C) dating character user-defined
c14_age Radiocarbon age as estimated from AMS measurements numeric radiocarbon years
c14_age_sd Estimated uncertainty in c14_age numeric standard deviation of radiocarbon years
c14_notes Any relevant user-generated notes on 14C dating process character user-defined
(other) Other variables could be included, but variable name should follow good data management practices, the variable should be described, and the data type defined, in the accompanying metadata user-defined user-defined

Core Level Variables

Note positional data can be assigned at the core level, or at the site level, however, it is important that this is specified, that site coordinates are not attributed as core coordinates, and that the method of measurement and precision is noted.

Variable Name Description Data Type Controlled Units
core_id Core identification code unique to each site character user-defined
year Year of collection numeric four digit year (common era)
month Numerbed month of year of collection numeric 1-12
day Day of month of collection numeric 1-31
core_latitude Positional latitude of the core numeric decimal degrees (world geological survey 1988 [WGS1988])
core_longitude Positional longitude of the core numeric decimal degrees (WGS1988)
core_position_accuracy Accuracy of latitude and longitude measurement, if determined and recorded numeric meters
core_position_method Code indicating how latitude and longitude were determined. character rtk_gps, handheld_gps, other_high_resolution, other_moderate_resolution, other_low_resolution, or other user-defined code
core_position_notes Any relevant user-generated notes on how latitude and longitude were determined character user-defined
core_elevation Surface elevation of the core numeric meters relative to a datum
core_elevation_datum The datum relative to which the core was measured character Example: NAVD88, various tidal datums, other user-defined code
core_elevation_accuracy Accuracy of elevation measurement, if determined and recorded numeric meters
core_elevation_method Code indicating how elevation was determined character rtk_gps, other_high_resolution, lidar_based, dem_based, or other user-defined code
core_elevation_notes Any relevant user-generated notes on how elevation was determined character user-defined
core_notes Any other relevant user-generated notes on how cores were collected character user-defined
(other) Other variables could be included, but variable name should follow good data management practices, the variable should be described, and the data type defined, in the accompanying metadata user-defined user-defined

Core Level Controlled Vocabulary

Variable Controlled Vocabulary Description
position_method rtk_gps Real-time kinematic global position system (GPS)
handheld_gps Conventional Commercially available hand-held GPS
other_high_resolution Any other technique resulting in positional error < 1 meter
other_moderate_resolution Any other technique resulting in positional error < 30 meters
other_low_resolution Any other technique resulting in positional error > 30 meters
core_elevation_datum NAVD88 A gravity-based geodetic datum, North American Vertical Datum from 1988
MSL A tidal datum, Mean Sea Level as measured against a local tide gauge
MTL A tidal datum, Mean Tidal Level as measured against a local tide gauge
MHW A tidal datum, Mean High Water as measured against a local tide gauge
MHHW A tidal datum, Mean Higher High Water as measured against a local tide gauge
MHHWS A tidal datum, Mean Higher High Water for Spring Tides as measured against a local tide gauge
MLW A tidal datum, Mean Low Water as measured against a local tide gauge
MLLW A tidal datum, Mean Lower Low Water as measured against a local tide gauge
core_elevation_method rtk_gps Real-time kinematic GPS
other_high_resolution Any other technique resulting in positional error < 5 cm of random error
lidar_based Handheld GPS matched to lidar-based digital elevation model
dem_based Handheld GPS matched to another digital elevation model
other_low_resolution Any other technique resulting in positional error > 5 cm of random error

Site Level Variables

Variable Name Description Data Type Controlled Units
site_id Site identification code unique to each study character user-defined
site_latitude_max Maximum latitude defining a bounding box for the site. numeric decimal degrees WGS88
site_latitude_min Minimum latitude defining a bounding box for the site. numeric decimal degrees WGS88
site_longitude_max Maximum longitude defining a bounding box for the site. numeric decimal degrees WGS88
site_longitude_min Minimum longitude defining a bounding box for the site. numeric decimal degrees WGS88
site_latitude_centroid Latitude defining the center of the site. numeric decimal degrees WGS88
site_longitude_centroid Longitude defining the center of the site. numeric decimal degrees WGS88
site_extent Alternatively, data submitters could submit a shapefile (.shp) of keyhole markup language (.kml) file defining the extent of a site Separate .shp or .kml file

Variables that can be assigned to either cores or sites

For the following variable, salinity, vegetation, impacts, and management status could either be listed for each core separately or entered for the whole site. If entered at the site level, cores would inherit the variables associated with the site.

Variable Name Description Data Type Controlled Units
inundation_class Code based on user’s field observation or measurement indicating how often the coring location is inundated character high_marsh, mid_marsh, low_marsh, levee, back_marsh, or other user-defined code
inundation_method Indicate whether inundation_class was determined using a field observation or a measurement character field observation, measurement, or other user-defined code
inundation_notes Any relevant user-generated notes on how elevation was determined character user-defined
salinity_class Code based on user’s field observation or measurement indicating average annual salinity character estuarine, palustrine, saline, brackish, fresh, polyhaline, etc.
salinity_method Indicate whether salinity_class was determined using a field observation or a measurement character field observation, measurement, or other user-defined code
salinity_notes Any relevant user-generated notes on how salinity_class was determined character user-defined
vegetation_class Code based on user’s field observations or measurement indicating dominant wetland vegetation type character emergent, scrub_shrub, forest, seagrass, or other user-defined code
dominant_species Single string of “;” separated species codes indicating the dominant species. character Species codes are user-defined, but should be summarized in a separate table listing species_code, genus, species, sub_species, hybrid, and species_notes for the entire study
vegetation_method Indicate whether salinity_class was determined using a field observation or a measurement character field observation, measurement, or other user-defined code
vegetation_notes Any relevant user-generated notes on how vegetation_class and dominant_species were determined character user-defined
impact_class Code indicating any major anthropogenic impacts historically and currently affecting the coring location. List these especially if the study was specifically designed to address these. If more than one are applicable, include them all as a single character string separated by “;”. character impounded, , managed_impounded, ditched, drained, farmed, other user-defined variable
impact_notes Any relevant additional user-generated notes detailing impact_class character user-defined
management_class Code indicating any active management or restoration efforts are directly affecting the coring location. List these especially if the study was specifically designed to address these. If more than one is applicable, include them all as a single character string separated by “;”. character impoundment_removed, revegetated, invasive_plants_removed, invasive_herbivores_removed, sediment_added, wetlands_built
management_notes Any relevant additional user-generated notes detailing management_class character user-defined

Core- or Site-level Controlled Vocabulary

Variable Controlled Vocabulary Description
inundation_class high_marsh Study-specific definition of an elevation relatively high in the tidal frame, typically defined by vegetation type
mid_marsh Study-specific definition of an elevation in the relative middle of the tidal frame, typically defined by vegetation type
low_marsh Study-specific definition of an elevation in relatively low in the tidal frame, typically defined by vegetation type
levee Study-specific definition of a relatively high elevation zone built up on the edge of a river, creek, or channel
back_marsh Study-specific definition of a relatively low elevation zone behind a levee
salinity_class estuarine 5-35 parts per thousand salinity (ppt)
palustrine < 5 ppt
brine >50 ppt
saline 30-50 ppt
brackish 0.5-30 ppt
fresh <0.5 ppt
mixoeuhaline 30-40 ppt
polyhaline 18-30 ppt
mesohaline 5-18 ppt
oligohaline 0.5-5 ppt
vegetation_class emergent Describes wetlands dominated by persistent emergent vascular plants
scrub_shrub Describes wetlands dominated by woody vegetation < 5 meters in height
forested Describes wetlands dominated by woody vegetation > 5 meters in height
seagrass Describes tidal or subtidal communities dominated by rooted vascular plants
impact_class impounded Tidal flow is muted or blocked by built structures
managed_impounded Tidal flow is muted or blocked seasonally, and other times follow natural or semi-natural flooding patterns
ditched Tidal hydrology is altered because artificial ditches have been cut to improve drainage
drained Tidal hydrology is altered because artificial the wetland has been drained by any combination of impoundments, ditching, pumping, etc.
farmed Special case of impoundment, managed impoundment, or drainage in which wetland have been converted to agricultural land
management_class impoundment_removed Tidal flow has been restored by removing an artificial impoundment
revegetated Tidal vegetation has been reintroduced by replanting on unvegetated surfaces
invasive_plants_removed Natural plant communities have been restored by the active removal of invasive plant species
invasive_herbivores_removed Tidal wetland vegetation has been managed by the removal of invasive herbivores
sediment_added Elevation has been managed by artificially adding sediment to the site using techniques such as ‘thin layering’ or sediment diversion
wetlands_built Wetlands previously eroded but have been rebuilt using sediments such as dredge spoils or other reclamation techniques

Study Level Species Table

If species codes or common names are used anywhere in the study, there should be a separate table included defining all names using scientific names.

Variable Name Description Data Type Controlled Units
species_code user-defined common name, shortened name, or code used to refer to a species throughout the study integer or character user-defined
genus Genus according to the most up to date taxonomic binomial nomenclature character user-defined
species Species according to the most up to date taxonomic binomial nomenclature character user-defined
sub_species Any additional naming or description appropriate at the subspecies level character user-defined
hybrid Any additional naming or description appropriate to describe hybridization character user-defined

Materials and Methods Variables

Materials and methods can be assigned to depth series, core level data, or can exist in a separate table if materials and methods are the same for every core in the study.

Variable Name Description Data Type Controlled Units
coring_method Code indicating what type of device was used to collect soil depth profiles. character russian_auger, gouge_auger, hargas, other_shallow_corer, other_deep_corer, or other user-defined
coring_method_notes Other user-generated notes on coring character user-defined
complete_profile Indicated whether or not the coring team believes they recovered a full sediment profile, down to bedrock, or other non-marsh interface boolean True or False
complete_profile_notes Any other user-generated notes about the completeness of the profile, such as whether the core met refusal, bedrock, non-marsh interface, etc. character user-defined
bulk_density_temperature Temperature at which samples were dried at for bulk density numeric degrees celsius
bulk_density_duration Length of time for which samples were dried at for bulk density numeric number of hours
loss_on_ignition_temperature Temperature at which samples were ignited for loss-on-ignition to as a measure for organics numeric degrees celsius
loss_on_ignition_duration Length of ignition time for loss-on-ignition to as a measure for organics numeric number of hours
loi_bd_sample_size Sample size for loss on ignition an bulk density numeric cm3
loi_bd_notes Any other user-generated notes on loss-on-ignition and bulk density methodology character user-defined
carbonates_removal_method Code indicating how carbonates were removed for measuring fraction organic carbon by elemental analysis. character acid_fumigation, direct_acid_treatment, or other user-defined method
carbonates_removal_notes Any other user-generated notes on how carbonates were removed for measuring fraction organic carbon by elemental analysis. character user-defined
roots_removed A note describing whether or not boolean True or False
(other) Any other user-defined variables user-defined user-defined

Materials and Methods Vocabulary

Variable Controlled Vocabulary Description
coring_method russian_auger A half cylinder coring device with the coring section sealed off by a fin attached to a rotating pivot point
gouge_auger A half cylinder coring device in which the coring section is open, not sealed off by a fin
hargas A large diameter (>10 cm) coring device consisting of a tube, piston, and a cutting head
vibracore A technique involving collecting a core by sinking a continuous pipe into sediment attaching a source of vibration, then recovering using a winch and pulley
cryo_core A technique involving collecting a core by freezing soil using liquid nitrogen to a copper tube
push_corer Any number of coring types involving driving a pipe or PVC into the sediment to recover a core
other_shallow_corer Any other method of collecting cores shallower than 1 meter
other_deep_corer Any other method that collects cores 1 meter or deeper
carbonates_removal_method acid_fumigation Carbonates were removed by fumigation with strong acid
direct_acid_treatment Carbonates were removed by direct treatment with dilute acid

Funding Sources Table

We recognize that funding situations are often complex, so for each project, we include a table to ensure that all funding sources are credited appropriately

Variable Name Description Data Type Controlled Units
funding_agency Agency name funding the research character user-defined
funding_id Internally used number referring to the project funding character user-defined
funding_start_year Year of the start of the funding period numeric four-digit year (common era)
funding_start_month Numbered month of the start of the funding period numeric 1-12
funding_start_day Day of the month of the start of the funding period numeric 1-31
funding_end_year Year of the end of the funding period numeric four-digit year (common era)
funding_end_month Numbered month of the end of the funding period numeric 1-12
funding_end_day Day of the month of the end of the funding period numeric 1-31
funding_notes Any other user-generated notes about the project funding character user-defined

Study Authors Table

Variable Name Description Data Type Controlled Units
last_name The family name of the contributor or author character user-defined
first_name The given name of the author or contributor character user-defined
middle_name The middle name or initial of the author or contributor character user-defined
institution This category can either refer to institutional affiliation of the author or the institution or team itself in the case that a group of authors is credited together under one identification character user-defined
email The email address of the author or contributor. This is a required field for corresponding authors and optional for others character user-defined
address The mailing address of the author or contributor. This is a required field for corresponding authors and optional for others character user-defined
corresponding_author Please whether the author should be contacted for further information or requests. The field requires at least one marked as True Boolean True or False