Note: This document is for public review. If you have any comments about these principles, please email CoastalCarbon@si.edu by Friday April 13th at 12 pm EST
Overview
Individual studies and contributors will each have their own way of naming variables, representing units, naming categorical variables, and representing notes or observations. We propose the following standardized vocabulary for data and metadata in order to make datasets machine readable and interoperable within our soil carbon synthesis products. Each sub-heading lists a level of data and metadata hierarchy from depth series to core to site to study level information. We also include accompanying sets of recommended controlled vocabulary for key categorical variables. Some variables have controlled units that we wish to keep uniform across datasets. Data that we curate will follow naming conventions outlined herein. Data that we ingest from outside sources will be converted to these conventions when being ingested into the central GitHub databased using custom-built R scripts.
Soil Depth Series Variables
sample_id |
Sample identification unique to the core |
character |
user-defined |
depth_min |
Minimum depth of a sampling increment |
numeric |
centimeters below surface |
depth_max |
Maximum depth of a sampling increment |
numeric |
centimeters below surface |
loss_on_ignition |
Fraction mass lost following ignition in a muffle furnace, a common proxy for organic matter |
numeric |
fraction |
dry_bulk_density |
Dry mass per unit volume of a soil sample |
numeric |
g cm-3 |
organic_carbon |
Mass of organic carbon relative to mass of organic matter |
numeric |
fraction |
organic_carbon_measured_or_modeled |
Specification indicating whether organic carbon is measured from a machine such as elemental analysis, or is modeled as a function of loss_on_ignition |
character |
measured or modeled |
total_carbon |
Mass of total carbon relative to mass of organic matter |
numeric |
fraction |
total_carbon_measured_or_modeled |
Specification indicating whether organic carbon is measured from a machine such as elemental analysis, or is modeled as a function of loss_on_ignition |
character |
measured or modeled |
loss_on_ignition_notes |
Any user-generated notes about the loss on ignition and bulk density processes |
character |
user-defined |
compaction_fraction |
Fraction of the sample depth interval reduced due to compaction |
numeric |
fraction |
compaction_notes |
Any user-generated notes on compaction |
character |
user-defined |
cs137_activity |
Radioactivity counts per unit dry weight for radio-cesium (137Cs) |
numeric |
disintegrations per minute per gram dry weight |
cs137_activity_sd |
1 standard deviation of uncertainty associated with cs137_activity |
numeric |
disintegrations per minute per gram dry weight |
cs137_notes |
Any user-generated notes on 137Cs dating process |
character |
user-defined |
pb210_activity |
Radioactivity counts per unit dry weight for lead-210 (210Pb) |
numeric |
disintegrations per minute per gram dry weight |
pb210_activity_sd |
1 standard deviation of uncertainty associated with pb210_activity |
numeric |
disintegrations per minute per gram dry weight |
pb210_notes |
Any user-generated notes on 210Pb dating process |
character |
user-defined |
c14_material |
Description of the material selected for radiocarbon (14C) dating |
character |
user-defined |
c14_age |
Radiocarbon age as estimated from AMS measurements |
numeric |
radiocarbon years |
c14_age_sd |
Estimated uncertainty in c14_age |
numeric |
standard deviation of radiocarbon years |
c14_notes |
Any relevant user-generated notes on 14C dating process |
character |
user-defined |
(other) |
Other variables could be included, but variable name should follow good data management practices, the variable should be described, and the data type defined, in the accompanying metadata |
user-defined |
user-defined |
Core Level Variables
Note positional data can be assigned at the core level, or at the site level, however, it is important that this is specified, that site coordinates are not attributed as core coordinates, and that the method of measurement and precision is noted.
core_id |
Core identification code unique to each site |
character |
user-defined |
year |
Year of collection |
numeric |
four digit year (common era) |
month |
Numerbed month of year of collection |
numeric |
1-12 |
day |
Day of month of collection |
numeric |
1-31 |
core_latitude |
Positional latitude of the core |
numeric |
decimal degrees (world geological survey 1988 [WGS1988]) |
core_longitude |
Positional longitude of the core |
numeric |
decimal degrees (WGS1988) |
core_position_accuracy |
Accuracy of latitude and longitude measurement, if determined and recorded |
numeric |
meters |
core_position_method |
Code indicating how latitude and longitude were determined. |
character |
rtk_gps, handheld_gps, other_high_resolution, other_moderate_resolution, other_low_resolution, or other user-defined code |
core_position_notes |
Any relevant user-generated notes on how latitude and longitude were determined |
character |
user-defined |
core_elevation |
Surface elevation of the core |
numeric |
meters relative to a datum |
core_elevation_datum |
The datum relative to which the core was measured |
character |
Example: NAVD88, various tidal datums, other user-defined code |
core_elevation_accuracy |
Accuracy of elevation measurement, if determined and recorded |
numeric |
meters |
core_elevation_method |
Code indicating how elevation was determined |
character |
rtk_gps, other_high_resolution, lidar_based, dem_based, or other user-defined code |
core_elevation_notes |
Any relevant user-generated notes on how elevation was determined |
character |
user-defined |
core_notes |
Any other relevant user-generated notes on how cores were collected |
character |
user-defined |
(other) |
Other variables could be included, but variable name should follow good data management practices, the variable should be described, and the data type defined, in the accompanying metadata |
user-defined |
user-defined |
Core Level Controlled Vocabulary
position_method |
rtk_gps |
Real-time kinematic global position system (GPS) |
|
handheld_gps |
Conventional Commercially available hand-held GPS |
|
other_high_resolution |
Any other technique resulting in positional error < 1 meter |
|
other_moderate_resolution |
Any other technique resulting in positional error < 30 meters |
|
other_low_resolution |
Any other technique resulting in positional error > 30 meters |
core_elevation_datum |
NAVD88 |
A gravity-based geodetic datum, North American Vertical Datum from 1988 |
|
MSL |
A tidal datum, Mean Sea Level as measured against a local tide gauge |
|
MTL |
A tidal datum, Mean Tidal Level as measured against a local tide gauge |
|
MHW |
A tidal datum, Mean High Water as measured against a local tide gauge |
|
MHHW |
A tidal datum, Mean Higher High Water as measured against a local tide gauge |
|
MHHWS |
A tidal datum, Mean Higher High Water for Spring Tides as measured against a local tide gauge |
|
MLW |
A tidal datum, Mean Low Water as measured against a local tide gauge |
|
MLLW |
A tidal datum, Mean Lower Low Water as measured against a local tide gauge |
core_elevation_method |
rtk_gps |
Real-time kinematic GPS |
|
other_high_resolution |
Any other technique resulting in positional error < 5 cm of random error |
|
lidar_based |
Handheld GPS matched to lidar-based digital elevation model |
|
dem_based |
Handheld GPS matched to another digital elevation model |
|
other_low_resolution |
Any other technique resulting in positional error > 5 cm of random error |
Site Level Variables
site_id |
Site identification code unique to each study |
character |
user-defined |
site_latitude_max |
Maximum latitude defining a bounding box for the site. |
numeric |
decimal degrees WGS88 |
site_latitude_min |
Minimum latitude defining a bounding box for the site. |
numeric |
decimal degrees WGS88 |
site_longitude_max |
Maximum longitude defining a bounding box for the site. |
numeric |
decimal degrees WGS88 |
site_longitude_min |
Minimum longitude defining a bounding box for the site. |
numeric |
decimal degrees WGS88 |
site_latitude_centroid |
Latitude defining the center of the site. |
numeric |
decimal degrees WGS88 |
site_longitude_centroid |
Longitude defining the center of the site. |
numeric |
decimal degrees WGS88 |
site_extent |
Alternatively, data submitters could submit a shapefile (.shp) of keyhole markup language (.kml) file defining the extent of a site |
Separate .shp or .kml file |
|
Variables that can be assigned to either cores or sites
For the following variable, salinity, vegetation, impacts, and management status could either be listed for each core separately or entered for the whole site. If entered at the site level, cores would inherit the variables associated with the site.
inundation_class |
Code based on user’s field observation or measurement indicating how often the coring location is inundated |
character |
high_marsh, mid_marsh, low_marsh, levee, back_marsh, or other user-defined code |
inundation_method |
Indicate whether inundation_class was determined using a field observation or a measurement |
character |
field observation, measurement, or other user-defined code |
inundation_notes |
Any relevant user-generated notes on how elevation was determined |
character |
user-defined |
salinity_class |
Code based on user’s field observation or measurement indicating average annual salinity |
character |
estuarine, palustrine, saline, brackish, fresh, polyhaline, etc. |
salinity_method |
Indicate whether salinity_class was determined using a field observation or a measurement |
character |
field observation, measurement, or other user-defined code |
salinity_notes |
Any relevant user-generated notes on how salinity_class was determined |
character |
user-defined |
vegetation_class |
Code based on user’s field observations or measurement indicating dominant wetland vegetation type |
character |
emergent, scrub_shrub, forest, seagrass, or other user-defined code |
dominant_species |
Single string of “;” separated species codes indicating the dominant species. |
character |
Species codes are user-defined, but should be summarized in a separate table listing species_code, genus, species, sub_species, hybrid, and species_notes for the entire study |
vegetation_method |
Indicate whether salinity_class was determined using a field observation or a measurement |
character |
field observation, measurement, or other user-defined code |
vegetation_notes |
Any relevant user-generated notes on how vegetation_class and dominant_species were determined |
character |
user-defined |
impact_class |
Code indicating any major anthropogenic impacts historically and currently affecting the coring location. List these especially if the study was specifically designed to address these. If more than one are applicable, include them all as a single character string separated by “;”. |
character |
impounded, , managed_impounded, ditched, drained, farmed, other user-defined variable |
impact_notes |
Any relevant additional user-generated notes detailing impact_class |
character |
user-defined |
management_class |
Code indicating any active management or restoration efforts are directly affecting the coring location. List these especially if the study was specifically designed to address these. If more than one is applicable, include them all as a single character string separated by “;”. |
character |
impoundment_removed, revegetated, invasive_plants_removed, invasive_herbivores_removed, sediment_added, wetlands_built |
management_notes |
Any relevant additional user-generated notes detailing management_class |
character |
user-defined |
Core- or Site-level Controlled Vocabulary
inundation_class |
high_marsh |
Study-specific definition of an elevation relatively high in the tidal frame, typically defined by vegetation type |
|
mid_marsh |
Study-specific definition of an elevation in the relative middle of the tidal frame, typically defined by vegetation type |
|
low_marsh |
Study-specific definition of an elevation in relatively low in the tidal frame, typically defined by vegetation type |
|
levee |
Study-specific definition of a relatively high elevation zone built up on the edge of a river, creek, or channel |
|
back_marsh |
Study-specific definition of a relatively low elevation zone behind a levee |
salinity_class |
estuarine |
5-35 parts per thousand salinity (ppt) |
|
palustrine |
< 5 ppt |
|
brine |
>50 ppt |
|
saline |
30-50 ppt |
|
brackish |
0.5-30 ppt |
|
fresh |
<0.5 ppt |
|
mixoeuhaline |
30-40 ppt |
|
polyhaline |
18-30 ppt |
|
mesohaline |
5-18 ppt |
|
oligohaline |
0.5-5 ppt |
vegetation_class |
emergent |
Describes wetlands dominated by persistent emergent vascular plants |
|
scrub_shrub |
Describes wetlands dominated by woody vegetation < 5 meters in height |
|
forested |
Describes wetlands dominated by woody vegetation > 5 meters in height |
|
seagrass |
Describes tidal or subtidal communities dominated by rooted vascular plants |
impact_class |
impounded |
Tidal flow is muted or blocked by built structures |
|
managed_impounded |
Tidal flow is muted or blocked seasonally, and other times follow natural or semi-natural flooding patterns |
|
ditched |
Tidal hydrology is altered because artificial ditches have been cut to improve drainage |
|
drained |
Tidal hydrology is altered because artificial the wetland has been drained by any combination of impoundments, ditching, pumping, etc. |
|
farmed |
Special case of impoundment, managed impoundment, or drainage in which wetland have been converted to agricultural land |
management_class |
impoundment_removed |
Tidal flow has been restored by removing an artificial impoundment |
|
revegetated |
Tidal vegetation has been reintroduced by replanting on unvegetated surfaces |
|
invasive_plants_removed |
Natural plant communities have been restored by the active removal of invasive plant species |
|
invasive_herbivores_removed |
Tidal wetland vegetation has been managed by the removal of invasive herbivores |
|
sediment_added |
Elevation has been managed by artificially adding sediment to the site using techniques such as ‘thin layering’ or sediment diversion |
|
wetlands_built |
Wetlands previously eroded but have been rebuilt using sediments such as dredge spoils or other reclamation techniques |
Study Level Species Table
If species codes or common names are used anywhere in the study, there should be a separate table included defining all names using scientific names.
species_code |
user-defined common name, shortened name, or code used to refer to a species throughout the study |
integer or character |
user-defined |
genus |
Genus according to the most up to date taxonomic binomial nomenclature |
character |
user-defined |
species |
Species according to the most up to date taxonomic binomial nomenclature |
character |
user-defined |
sub_species |
Any additional naming or description appropriate at the subspecies level |
character |
user-defined |
hybrid |
Any additional naming or description appropriate to describe hybridization |
character |
user-defined |
Materials and Methods Variables
Materials and methods can be assigned to depth series, core level data, or can exist in a separate table if materials and methods are the same for every core in the study.
coring_method |
Code indicating what type of device was used to collect soil depth profiles. |
character |
russian_auger, gouge_auger, hargas, other_shallow_corer, other_deep_corer, or other user-defined |
coring_method_notes |
Other user-generated notes on coring |
character |
user-defined |
complete_profile |
Indicated whether or not the coring team believes they recovered a full sediment profile, down to bedrock, or other non-marsh interface |
boolean |
True or False |
complete_profile_notes |
Any other user-generated notes about the completeness of the profile, such as whether the core met refusal, bedrock, non-marsh interface, etc. |
character |
user-defined |
bulk_density_temperature |
Temperature at which samples were dried at for bulk density |
numeric |
degrees celsius |
bulk_density_duration |
Length of time for which samples were dried at for bulk density |
numeric |
number of hours |
loss_on_ignition_temperature |
Temperature at which samples were ignited for loss-on-ignition to as a measure for organics |
numeric |
degrees celsius |
loss_on_ignition_duration |
Length of ignition time for loss-on-ignition to as a measure for organics |
numeric |
number of hours |
loi_bd_sample_size |
Sample size for loss on ignition an bulk density |
numeric |
cm3 |
loi_bd_notes |
Any other user-generated notes on loss-on-ignition and bulk density methodology |
character |
user-defined |
carbonates_removal_method |
Code indicating how carbonates were removed for measuring fraction organic carbon by elemental analysis. |
character |
acid_fumigation, direct_acid_treatment, or other user-defined method |
carbonates_removal_notes |
Any other user-generated notes on how carbonates were removed for measuring fraction organic carbon by elemental analysis. |
character |
user-defined |
roots_removed |
A note describing whether or not |
boolean |
True or False |
(other) |
Any other user-defined variables |
user-defined |
user-defined |
Materials and Methods Vocabulary
coring_method |
russian_auger |
A half cylinder coring device with the coring section sealed off by a fin attached to a rotating pivot point |
|
gouge_auger |
A half cylinder coring device in which the coring section is open, not sealed off by a fin |
|
hargas |
A large diameter (>10 cm) coring device consisting of a tube, piston, and a cutting head |
|
vibracore |
A technique involving collecting a core by sinking a continuous pipe into sediment attaching a source of vibration, then recovering using a winch and pulley |
|
cryo_core |
A technique involving collecting a core by freezing soil using liquid nitrogen to a copper tube |
|
push_corer |
Any number of coring types involving driving a pipe or PVC into the sediment to recover a core |
|
other_shallow_corer |
Any other method of collecting cores shallower than 1 meter |
|
other_deep_corer |
Any other method that collects cores 1 meter or deeper |
carbonates_removal_method |
acid_fumigation |
Carbonates were removed by fumigation with strong acid |
|
direct_acid_treatment |
Carbonates were removed by direct treatment with dilute acid |
Funding Sources Table
We recognize that funding situations are often complex, so for each project, we include a table to ensure that all funding sources are credited appropriately
funding_agency |
Agency name funding the research |
character |
user-defined |
funding_id |
Internally used number referring to the project funding |
character |
user-defined |
funding_start_year |
Year of the start of the funding period |
numeric |
four-digit year (common era) |
funding_start_month |
Numbered month of the start of the funding period |
numeric |
1-12 |
funding_start_day |
Day of the month of the start of the funding period |
numeric |
1-31 |
funding_end_year |
Year of the end of the funding period |
numeric |
four-digit year (common era) |
funding_end_month |
Numbered month of the end of the funding period |
numeric |
1-12 |
funding_end_day |
Day of the month of the end of the funding period |
numeric |
1-31 |
funding_notes |
Any other user-generated notes about the project funding |
character |
user-defined |
Study Authors Table
last_name |
The family name of the contributor or author |
character |
user-defined |
first_name |
The given name of the author or contributor |
character |
user-defined |
middle_name |
The middle name or initial of the author or contributor |
character |
user-defined |
institution |
This category can either refer to institutional affiliation of the author or the institution or team itself in the case that a group of authors is credited together under one identification |
character |
user-defined |
email |
The email address of the author or contributor. This is a required field for corresponding authors and optional for others |
character |
user-defined |
address |
The mailing address of the author or contributor. This is a required field for corresponding authors and optional for others |
character |
user-defined |
corresponding_author |
Please whether the author should be contacted for further information or requests. The field requires at least one marked as True |
Boolean |
True or False |