About

This example will walk through processing a reef life survey datasheet using the rls2dwc package. The example dataset is survey data from Hawaii in 2017.

Setup

Install the rls2dwc package from GitHub using devtools.

#install.packages("devtools") # uncomment if you have not already installed the devtools package
#devtools::install_github("Smithsonian/rls2dwc")
library(rls2dwc)
library(tidyverse) # provides %>% and select(), which the code below relies on

Import the reef life survey datasheet into R

The function readRLS cleans up the header, combines the date and time columns, and adds a source column containing the filename of the Excel spreadsheet. It takes the path to the Excel file and, optionally, the name of the tab (sheet) that holds the data. The tab name only needs to be provided if it differs from the default value of ‘DATA’.

rlsdata <- readRLS(xlsx = "../example_data.xlsx", sheetname = "DATA") # loads the excel spreadsheet by providing path & sheetname with data
head(rlsdata) 
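
To make the three clean-up steps concrete, here is a minimal base R sketch on a toy data frame. It is not the readRLS implementation; the column names, values, and timestamp format are assumptions invented for illustration.

```r
# Toy stand-in for a sheet read from Excel; column names and values are invented.
raw <- data.frame(
  `Site No.` = c("HI01", "HI02"),
  Date = c("2017-06-01", "2017-06-02"),
  Time = c("09:30", "14:05"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# 1. Clean the header: drop punctuation and whitespace from the column names.
names(raw) <- gsub("[^A-Za-z0-9]", "", names(raw))

# 2. Combine the date and time columns into a single timestamp.
raw$DateTime <- as.POSIXct(paste(raw$Date, raw$Time),
                           format = "%Y-%m-%d %H:%M", tz = "UTC")

# 3. Record where the rows came from, using the file name without its directory.
path <- "../example_data.xlsx"
raw$source <- basename(path)

raw
```

The real function may clean headers and parse timestamps differently; the point is only that every row ends up with one DateTime value and a note of which file it came from.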

Verifying species names

The raw data loaded from the reef life survey datasheet needs to have all of its scientific names validated. We will use the World Register of Marine Species (WoRMS) to do this. Validation is an iterative process: the first pass tries to match every scientific name and returns the ones that did not match. It is then up to the user to work out what is wrong with each unmatched name and correct it. Once the names have been corrected, the data must be validated against WoRMS again. Note: verifying against the WoRMS web service requires an internet connection.

# initial attempt to validate species names against WoRMS
verify_sciName(rlsdata, dryrun=TRUE)
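
The match-then-report idea can be illustrated offline with a small sketch. It does not call WoRMS and is not how verify_sciName works internally; the reference vector, the observed names, and the helper unmatched_names are all hypothetical.

```r
# Hypothetical local reference list standing in for WoRMS.
reference <- c("Acanthurus", "Abudefduf", "Chaetodon auriga")

# Names as they might appear in a datasheet (invented for this sketch).
observed <- c("Chaetodon auriga", "Acanthurus sp", "Abudefduf sp.", "Chaetodon auriga")

# Hypothetical helper: return the distinct names that have no exact match.
unmatched_names <- function(names, reference) {
  distinct <- unique(names)
  distinct[!distinct %in% reference]
}

unmatched_names(observed, reference)
```

Each round of corrections should shrink this vector; the names still in it are the ones that need attention.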

The initial attempt to validate scientific names against WoRMS was largely successful: only 10 of the 60 scientific names did not get a match. Some of these look easy to fix, since several have the format ‘genus sp.’. Let’s remove the ‘sp.’ from these names and try again.

rlsdata2 <- rlsdata %>% 
  replace_sciName("Acanthurus sp", "Acanthurus") %>% 
  replace_sciName("Scaridae sp.", "Scaridae") %>% 
  replace_sciName("Abudefduf sp.", "Abudefduf")
verify_sciName(rlsdata2, dryrun=TRUE)
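
When many names carry a trailing ‘sp’ or ‘sp.’, replacing them one at a time gets tedious. As an alternative, a regular expression can strip the suffix in one pass. This is plain base R rather than part of rls2dwc, and the vector of names is invented for the example.

```r
# Invented example names, with and without the trailing "sp" / "sp." marker.
sci_names <- c("Acanthurus sp", "Scaridae sp.", "Abudefduf sp.", "Chaetodon auriga")

# Remove whitespace + "sp" + optional full stop, but only at the end of the name.
cleaned <- sub("\\s+sp\\.?$", "", sci_names)

cleaned
```

The pattern is anchored to the end of the string, so full binomials are left untouched. It does not handle other open-nomenclature markers such as ‘spp.’ or ‘cf.’; those would need their own patterns.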

Great! That fixed those three scientific names. It’s OK to have records identified only to the genus level (or, as with Scaridae, the family level); the best practice is to put the lowest taxonomic name that can be fully identified in the scientific name field. The remaining names will have to be checked manually by looking them up in the World Register of Marine Species and finding the correct name. Several records have had genus name changes and/or spelling corrections.

rlsdata3 <- rlsdata2 %>%  
  replace_sciName("Ophiodesoma spectabilis", "Opheodesoma spectabilis") %>% 
  replace_sciName("Osterhinchus maculiferus", "Ostorhinchus maculiferus") %>% 
  replace_sciName("Asteropteryx semipunctatus", "Asterropteryx semipunctata")
verify_sciName(rlsdata3, dryrun=TRUE)
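
Manual corrections like these are easier to review and reuse when they are kept together in a lookup table. The sketch below applies such a table with base R only. It is an alternative illustration, not the replace_sciName implementation, and the sci_names vector is invented.

```r
# Lookup table: names as recorded -> corrected names (the pairs used above).
corrections <- c(
  "Ophiodesoma spectabilis"    = "Opheodesoma spectabilis",
  "Osterhinchus maculiferus"   = "Ostorhinchus maculiferus",
  "Asteropteryx semipunctatus" = "Asterropteryx semipunctata"
)

# Invented column of recorded names.
sci_names <- c("Osterhinchus maculiferus", "Chaetodon auriga", "Ophiodesoma spectabilis")

# Replace only the entries that appear in the lookup table.
hit <- sci_names %in% names(corrections)
sci_names[hit] <- unname(corrections[sci_names[hit]])

sci_names
```

Names that are not in the table pass through unchanged, so the same table can be applied to every new datasheet.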

At this point, a few names still need more effort to correct. Let’s leave them in the dataset for now as examples of unknown or unvalidated scientific names.

# verify the scientific names once again but this time let's save the result to an object
rlsdata_validated <- verify_sciName(rlsdata3, dryrun=FALSE) 
rlsdata_validated # the WoRMS data is joined to our RLS dataframe

Gather to long format

Right now the reef life survey data is in a wide, untidy format: each row contains counts for multiple size classes. To make the data tidy, we need to gather these observations into a long format.

rlsdata_long <- gather_measures(rlsdata_validated)
head(rlsdata_long %>% select(ID, IndividualCount, measurementValue, measurementType, measurementUnit), 100)
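
For readers unfamiliar with the wide-to-long reshape, here is a small base R sketch on invented data. It is not the gather_measures implementation: the size-class columns, the measurementType label, and the choice to drop zero counts are all assumptions made for illustration.

```r
# Invented wide table: one row per record, one column per size class (cm).
wide <- data.frame(
  ID = c(1, 2),
  Species = c("Chaetodon auriga", "Acanthurus"),
  `2.5` = c(0, 3),
  `5` = c(4, 0),
  `7.5` = c(1, 2),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

size_cols <- setdiff(names(wide), c("ID", "Species"))

# Stack one block of rows per size-class column.
long <- do.call(rbind, lapply(size_cols, function(col) {
  data.frame(
    ID = wide$ID,
    Species = wide$Species,
    IndividualCount = wide[[col]],
    measurementValue = as.numeric(col),
    measurementType = "size class",  # assumed label
    measurementUnit = "cm",          # assumed unit
    stringsAsFactors = FALSE
  )
}))

# Assumption: keep only the size classes in which something was counted.
long <- long[long$IndividualCount > 0, ]
long <- long[order(long$ID, long$measurementValue), ]
rownames(long) <- NULL

long
```

After the reshape, each row holds exactly one count for one size class, which is the shape the Darwin Core occurrence and measurement tables need.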

Unique IDs for events and occurrences

Darwin Core requires each event and occurrence to have a unique identifier. For reef life survey data, there are two types of events: the parent event is each dive, and the subevent is the block and method within that dive.

The parentEventID is built by combining the site ID and the timestamp. The eventID is built by combining the parentEventID, the method and the block.

The occurrence identifier (occurrenceID) is simply a randomly generated universally unique identifier (UUID).

rlsdata_long_id <- makeIDs(rlsdata_long)
head(rlsdata_long_id %>% select(parentEventID, eventID, SiteID, DateTime, Method, Block, occurrenceID))
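
The ID scheme described above can be sketched in base R as follows. This is not the makeIDs implementation: the underscore separator, the timestamp format, the toy data, and the random_uuid helper are assumptions for illustration.

```r
# Invented observations: two methods on one dive, plus a second dive.
obs <- data.frame(
  SiteID = c("HI01", "HI01", "HI02"),
  DateTime = as.POSIXct(c("2017-06-01 09:30", "2017-06-01 09:30", "2017-06-02 14:05"),
                        format = "%Y-%m-%d %H:%M", tz = "UTC"),
  Method = c(1, 2, 1),
  Block = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# parentEventID: site ID + timestamp (separator and format are assumed).
obs$parentEventID <- paste(obs$SiteID,
                           format(obs$DateTime, "%Y%m%dT%H%M%S", tz = "UTC"),
                           sep = "_")

# eventID: parentEventID + method + block.
obs$eventID <- paste(obs$parentEventID, obs$Method, obs$Block, sep = "_")

# Hypothetical helper: a version-4-style UUID drawn from R's random number generator.
random_uuid <- function() {
  hex <- sample(c(0:9, letters[1:6]), 32, replace = TRUE)
  hex[13] <- "4"                               # version nibble
  hex[17] <- sample(c("8", "9", "a", "b"), 1)  # variant nibble
  paste(
    paste(hex[1:8], collapse = ""),
    paste(hex[9:12], collapse = ""),
    paste(hex[13:16], collapse = ""),
    paste(hex[17:20], collapse = ""),
    paste(hex[21:32], collapse = ""),
    sep = "-"
  )
}

# One random occurrenceID per row.
obs$occurrenceID <- vapply(seq_len(nrow(obs)), function(i) random_uuid(), character(1))

obs[c("parentEventID", "eventID", "occurrenceID")]
```

Event IDs built this way are deterministic, so re-running the script on the same data gives the same IDs. The UUIDs are regenerated on every run, so they should be generated once and stored. For production use, a dedicated generator such as UUIDgenerate() from the uuid package is a safer choice than this hand-rolled helper.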

Darwin Core Tables

Last step: make the Darwin Core Event, Occurrence, and MeasurementOrFact tables!

Event

events <- event(rlsdata_long_id)
head(events)

Occurrence

occur <- occurence(rlsdata_long_id)
head(occur)

MeasurementOrFact

measOrFact <- emof(rlsdata_long_id)
head(measOrFact)
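
To show how the three tables relate to each other, here is a base R sketch that splits one long table into Event, Occurrence, and MeasurementOrFact pieces. It is not the package's implementation. The data and ID values are invented, and the column selection is an assumption; only the column names are standard Darwin Core terms.

```r
# Invented long table: one row per occurrence, each with one measurement.
long_tbl <- data.frame(
  parentEventID = c("HI01_A", "HI01_A", "HI01_A"),
  eventID = c("HI01_A_1_1", "HI01_A_1_1", "HI01_A_2_1"),
  occurrenceID = c("occ-1", "occ-2", "occ-3"),
  scientificName = c("Chaetodon auriga", "Acanthurus", "Abudefduf"),
  individualCount = c(4, 3, 2),
  measurementType = "size class",
  measurementValue = c(5, 2.5, 7.5),
  measurementUnit = "cm",
  stringsAsFactors = FALSE
)

# Event table: one row per distinct event, linked to its parent event.
event_tbl <- unique(long_tbl[c("eventID", "parentEventID")])

# Occurrence table: one row per occurrence, linked to its event by eventID.
occurrence_tbl <- long_tbl[c("occurrenceID", "eventID", "scientificName", "individualCount")]

# MeasurementOrFact table: measurements linked to occurrences by occurrenceID.
emof_tbl <- long_tbl[c("occurrenceID", "measurementType", "measurementValue", "measurementUnit")]

# Every measurement should point at a known occurrence,
# and every occurrence at a known event.
stopifnot(all(emof_tbl$occurrenceID %in% occurrence_tbl$occurrenceID))
stopifnot(all(occurrence_tbl$eventID %in% event_tbl$eventID))
```

The two checks at the end are a cheap way to confirm that the identifiers link the tables together before the data is published.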