1. Smithsonian National Zoo and Conservation Biology Institute, Conservation Ecology Center, 1500 Remount Rd, Front Royal, VA 22630, USA.

2. Working Land and Seascapes, Conservation Commons, Smithsonian Institution, Washington, DC 20013, USA.


This guide describes how to model species distributions and habitat suitability in Google Earth Engine (GEE) and explains the details of the Earth Engine code developed for this manuscript.

We first cover the basics of importing data and setting the main arguments used in different functions, such as grid size and area of interest. We then expand on different modeling workflows, using three case studies to demonstrate how to adapt the code for different goals.

For information on how to set up a Google Earth Engine account as well as user guidelines and tutorials visit: https://developers.google.com/earth-engine/

The code found below can also be accessed through the GEE repository for this study: https://code.earthengine.google.com/?accept_repo=users/ramirocrego84/SDM_Manuscript

1 General settings for running SDMs in Google Earth Engine

1.1 Importing species location data as an asset

Datasets need to be uploaded as assets in Google Earth Engine. The easiest way to do this is by creating a csv file with spatial coordinates and any other desired attribute information. Note that you can also upload an ESRI Shapefile with the species location data.

Below is an example for uploading the Bradypus variegatus data set from a csv file. Prepare a csv file with coordinates in latitude and longitude (EPSG:4326). To include a date column, use the format Year-Month-Day (e.g., 2000-01-30).
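Before uploading, it can help to check that the file follows this layout. The plain JavaScript sketch below (run outside Earth Engine, e.g., with Node.js) is one possible check, not part of the manuscript code. The function name checkPresenceCsv, the column names longitude, latitude and date, and the sample rows are all assumptions made for illustration.

```javascript
// Hypothetical pre-upload check for a presence csv file.
// Column names (longitude, latitude, date) and sample values are illustrative assumptions.
function checkPresenceCsv(text) {
  var lines = text.trim().split('\n');
  var header = lines[0].trim().split(',');
  var lonIdx = header.indexOf('longitude');
  var latIdx = header.indexOf('latitude');
  var dateIdx = header.indexOf('date');
  var problems = [];
  lines.slice(1).forEach(function (line, i) {
    var cells = line.trim().split(',');
    var lon = parseFloat(cells[lonIdx]);
    var lat = parseFloat(cells[latIdx]);
    // EPSG:4326 coordinates are decimal degrees within these ranges.
    if (!(lon >= -180 && lon <= 180) || !(lat >= -90 && lat <= 90)) {
      problems.push('row ' + (i + 1) + ': coordinates out of range');
    }
    // If a date column exists, it should follow Year-Month-Day.
    if (dateIdx !== -1 && !/^\d{4}-\d{2}-\d{2}$/.test(cells[dateIdx] || '')) {
      problems.push('row ' + (i + 1) + ': date not in Year-Month-Day format');
    }
  });
  return problems;
}

// Made-up rows: the second has a wrongly formatted date, the third an invalid longitude.
var sampleCsv = [
  'longitude,latitude,date',
  '-65.40,-10.38,2000-01-30',
  '-65.38,-10.38,30/01/2000',
  '-265.00,-10.40,2000-02-15'
].join('\n');
var problems = checkPresenceCsv(sampleCsv);
console.log(problems);
```

The check only looks at value ranges and the date pattern; it does not confirm that the coordinates fall within the expected study region.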

Figure S1. Steps for uploading assets to Google Earth Engine. 1) Click ‘New’ under the Assets tab and then select ‘CSV file (.csv)’. 2) Click ‘SELECT’. 3) Browse and select the file from your computer. 4) Provide a name for the asset and the names of the columns containing coordinates in degrees.

1.2 Loading and cleaning your species data

To import the asset into your active script you can click on the forward arrow icon on your asset manager or you can use code to programmatically load the data as a new object. We recommend using code to import data. To import the asset with your species presence data, use the ee.FeatureCollection() function and provide the asset ID. For example:

var Data = ee.FeatureCollection('users/yourfolder/yourdata');

One important step in modeling species distributions is to limit the potential effect of geographic sampling bias on the model output due to data aggregation resulting from multiple nearby observations.

We thin the location data to one randomly selected occurrence record per pixel at the chosen spatial resolution (the raster pixel or grain size of the analysis).

Here, we will apply a function to remove all points that lie within the same raster cell at a given grain size. For this, we first need to define the spatial resolution of our study.

// Define spatial resolution to work with (m)
var GrainSize = 10000; // e.g. 10 km

Then, we can define a function to remove duplicates and apply it to the species data set.

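function RemoveDuplicates(data){
  // Raster of random values at the chosen grain size; each pixel holds its own random value
  var randomraster = ee.Image.random().reproject('EPSG:4326', null, GrainSize);
  // Attach the random value of the underlying pixel to each presence point
  var randpointvals = randomraster.sampleRegions({collection: ee.FeatureCollection(data), scale: 10, geometries: true});
  // Keep one point per distinct pixel value, i.e., one point per pixel
  return randpointvals.distinct('random');
}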

var Data = RemoveDuplicates(Data);

The following figure exemplifies how points are rarefied at a 1 km grain size.

Figure S2. Example of presence point filtering. A) Original dataset; B) Final dataset with only one presence point retained per pixel.

You can evaluate the number of points before and after removing duplicates.

print('Original data size:', ee.FeatureCollection('users/yourfolder/yourdata').size());
print('Final data size:', Data.size());

1.3 Define your area of interest for modeling

The extent of the analysis should be carefully selected and constrained to an area that the study species could realistically occupy, avoiding unrealistic extents that can hamper model accuracy and predictions (Guisan et al., 2017; Leroy et al., 2018; Sillero et al., 2021).

There are different ways you can define your area of interest. You can directly draw a polygon using the drawing tools in GEE or manually set the polygon (e.g., Case Study 2 in this tutorial). Here, we present two methods for automating this process.

If you are interested in working with a specific country or continent, you can use the Large Scale International Boundary Polygons data set available in the GEE data catalog.

Here is an example that selects Kenya:

// Load country boundary from data catalog if working at a country scale
var AOI = ee.FeatureCollection('USDOS/LSIB_SIMPLE/2017').filter(ee.Filter.eq('country_co', 'KE'));

You can see the list of country codes at: https://en.wikipedia.org/wiki/List_of_FIPS_country_codes

If you are interested in working within the entire African continent, you can use:

// Load region boundary from data catalog if working at a larger scale
var AOI = ee.FeatureCollection('USDOS/LSIB_SIMPLE/2017').filter(ee.Filter.eq('wld_rgn', 'Africa'));

Another option is to select a bounding box around your species location data. For example, we can define a bounding box using the function bounds() and add a buffer of 50 km.

// Define the area of interest
var AOI = Data.geometry().bounds().buffer(50000);
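As a rough illustration of what this produces, the plain JavaScript sketch below computes a rectangular bounding box around a set of points and expands it by a buffer distance. It is a simplification under stated assumptions: the helper name bufferedBounds is hypothetical, coordinates are assumed to be projected and in meters, and the result keeps square corners, whereas buffer() in GEE also rounds the corners of the buffered geometry.

```javascript
// Hypothetical helper: bounding box around points, expanded by a buffer distance.
// Assumes x and y are projected coordinates in meters.
function bufferedBounds(points, buffer) {
  var xs = points.map(function (p) { return p.x; });
  var ys = points.map(function (p) { return p.y; });
  return {
    xmin: Math.min.apply(null, xs) - buffer,
    ymin: Math.min.apply(null, ys) - buffer,
    xmax: Math.max.apply(null, xs) + buffer,
    ymax: Math.max.apply(null, ys) + buffer
  };
}

// Two made-up points and a 50 km buffer.
var box = bufferedBounds([{x: 100000, y: 200000}, {x: 180000, y: 260000}], 50000);
console.log(box);
```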

To display the study area on the map use the following code and assign the map layer the name ‘AOI’:

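// Add AOI to the map
Map.addLayer(AOI, {}, 'AOI', true); // The fourth argument sets whether the layer is shown by default.
Map.centerObject(AOI); // Center the map on the AOI. An optional second argument sets the zoom level (higher numbers zoom in).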

1.4 Selecting predictor variables

One of the main advantages of implementing SDMs in Google Earth Engine is access to the large number of datasets that can serve as predictor variables. These include not only the bioclimatic variables from Hijmans et al. (2005), but also elevation data and derivatives (slope, aspect, hillshade, etc.), diverse vegetation indices, human modification indices, nighttime light images, water bodies, hourly climatic data, land cover classifications, roads or other infrastructure, and even the raw pixel values of satellite data. Data availability varies by region, so some areas of interest have more datasets to draw on than others. GEE also offers the opportunity to directly include user-derived datasets in your analysis, such as processed satellite imagery (e.g., a land cover classification that you previously developed for your area of interest).

Selecting predictor variables requires the researcher to draw on existing knowledge of the study species, in particular which variables are likely to affect its distribution.

To find spatial data sets, you can use the search bar. All information related to each spatial dataset is available by clicking on the name of the product. The code necessary to import the dataset is available as shown in the following figure.