1 CEFE, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France. E-mail:

2 AMAP, Univ Montpellier, CIRAD, CNRS, INRAE, IRD, Montpellier, France E-mail:

RATIONALE

The formal description of species new to science is a painstaking process that can be particularly time-consuming when dealing with highly diverse and taxonomically unresolved groups of organisms. The compilation of sample lists (including types and other material) and the writing of morpho-anatomical descriptions must then be repeated many times, considerably increasing the time needed to prepare manuscripts and publish the new taxa. Tools such as R Markdown (Allaire et al. 2021) offer a way to circumvent these difficulties and speed up manuscript writing through automation. R Markdown provides a simplified syntax for formatting documents that contain text, R code and the output that R returns when evaluating this code. Specimen lists and morpho-anatomical descriptions can then be generated by functions that draw on the information contained in previously established specimen lists and character tables.

In this paper we present the script lines that were used to speed up the description of one genus and 18 species new to science in Decaëns et al. (2024). The proposed example corresponds to Martiodrilus flavus Decaëns & Bartz, 2022 as published in the original article, and has been formatted to comply with the style of the journal Zoosystema. This template was then repeated and adjusted to write, in a standardized way, all the original descriptions included in that publication.

INPUT FILES

The scripts use the tables available in Supplementary Information:

The first step consists in loading these two files to create two dataframes. To do so, you can use the following code lines embedded in an R Markdown chunk (Allaire et al. 2021). To prevent the code blocks from appearing in the knitted document, specify {r, include=FALSE} in the chunk header.

### create a dataframe with the characters for each new species:
c<-read.csv2("1_Datasets/Characters.csv", header = TRUE, row.names = 1)

### create a dataframe with the list of specimens:
m<-read.csv2("1_Datasets/Specimens.csv")
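
Because every later step refers to columns by name, it can be worth checking that the loaded tables contain the expected columns before going further. Note also that read.csv2 assumes semicolon-separated fields and a decimal comma. The following is a minimal sketch on a toy table rather than the real Specimens.csv; the helper name missing_columns and the list of required columns are illustrative assumptions, not part of the original scripts.

```r
# Toy stand-in for Specimens.csv (the real table is assumed to hold more columns)
m_demo <- data.frame(
  Species = c("Martiodrilus flavus", "Martiodrilus flavus"),
  Types = c("Holotype", "Paratype"),
  Life.Stage = c("Adult", "Adult"),
  Exact.Site.2 = c("site A", "site B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the names of required columns that are absent
missing_columns <- function(df, required) {
  setdiff(required, names(df))
}

# Illustrative subset of the columns used later in the scripts
required <- c("Species", "Types", "Life.Stage", "Exact.Site.2", "Sample.ID")
absent <- missing_columns(m_demo, required)
print(absent)
```

An empty result means that all the listed columns are present; here the toy table deliberately lacks Sample.ID.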

The following sections successively present the scripts that were used to assemble the original description of M. flavus.

LOADING THE NECESSARY INFORMATION

We used the plyr library in order to manipulate the data tables and extract all the necessary information. You first need to load the library using the function:

library(plyr)

The plyr library (Anonymous 2011) is then used to manipulate the tables and extract the information relevant to the species under consideration (here Martiodrilus_TD064 in the different input tables). An example of what can be done, once embedded in an R Markdown chunk, is given below (remember to specify {r, include=TRUE} in the chunk header):

c2<-c["Martiodrilus flavus",] # load the morpho-anatomical characters for the target species, here Martiodrilus flavus
m2<-subset(m, Species == "Martiodrilus flavus") # select the specimens of the target species in the list of specimens
ht<-subset(m2, Types=="Holotype") # select information relative to the Holotype
pt<-subset(m2, Types=="Paratype") # select information relative to the Paratypes
om<-subset(m2, Types=="Other") # select information relative to other studied specimens (i.e. other material)
ls<-count(om, "Life.Stage") # count specimens for all life stages in other material
pt_r<-pt[,-c(1:7, 18:22)] # delete columns in pt in order to keep only collecting data
pt_d<-pt_r[!duplicated(pt_r), ] # delete duplicated rows in pt_r, so that only one row per sampling locality is kept
pt_d<-pt_d[order(pt_d$Exact.Site.2,decreasing=FALSE), ] # sort pt_d in alphabetical order of Exact.Site.2
pt_d["Nb.Specimens"]<-by(pt_r, pt_r$Exact.Site.2, nrow) # insert a column in pt_d with the number of specimens per Exact.Site.2
om_r<-om[,-c(1:7, 18:22)] # delete columns in om in order to keep only collecting data
om_d<-om_r[!duplicated(om_r), ] # delete duplicated rows in om_r, so that only one row per sampling locality is kept
om_d<-om_d[order(om_d$Exact.Site.2,decreasing=FALSE), ] # sort om_d in alphabetical order of Exact.Site.2
om_d["Nb.Specimens"]<-by(om_r, om_r$Exact.Site.2, nrow) # insert a column in om_d with the number of specimens per Exact.Site.2
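
The assignment of Nb.Specimens above relies on the locality table being sorted in the same order as the groups returned by by(), and on each Exact.Site.2 value appearing in a single row. As a hedged alternative, the counts can be looked up by locality name instead of by position. The sketch below uses toy data; the object names ending in _demo are illustrative and the table is reduced to two columns.

```r
# Toy collecting data: one row per specimen (column names follow the script above)
pt_r_demo <- data.frame(
  Exact.Site.2 = c("slope forest", "plateau forest", "plateau forest",
                   "slope forest", "plateau forest"),
  Elev = c(352, 381, 381, 352, 381),
  stringsAsFactors = FALSE
)

# Keep one row per locality and sort alphabetically by Exact.Site.2
pt_d_demo <- pt_r_demo[!duplicated(pt_r_demo), ]
pt_d_demo <- pt_d_demo[order(pt_d_demo$Exact.Site.2), ]

# Count specimens per locality, then look the counts up by name, not by position
n_per_site <- table(pt_r_demo$Exact.Site.2)
pt_d_demo$Nb.Specimens <- as.integer(n_per_site[pt_d_demo$Exact.Site.2])

print(pt_d_demo)
```

With this variant the counts stay attached to the right locality even if the sorting step is changed or removed.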

##### Assembling the list of Paratype localities #####
# Create a template vector with the first locality appearing in pt_d; here you can adapt the script to the format of the journal where the description will be published
x1 = paste(sep = "", # use the function 'paste' to concatenate a vector of characters where terms are separated by ""
           pt_d[1,"Sector"], # concatenate the 'sector' from the first line of pt_d
           ", ", # concatenate a comma followed by a space
           pt_d[1,"Exact.Site"], # concatenate the 'exact site' from the first line of pt_d
           ", ", # concatenate a comma followed by a space
           pt_d[1,"Exact.Site.2"], # concatenate the 'exact.site.2' from the first line of pt_d
           "; latitude/longitude: ", # concatenate the text '; latitude/longitude: '
           pt_d[1,"Lat"], # concatenate the latitude from the first line of pt_d
           "/", # concatenate a slash
           pt_d[1,"Lon"], # concatenate the longitude from the first line of pt_d
           "; elevation: ", # concatenate the text '; elevation: '
           pt_d[1,"Elev"], # concatenate the elevation from the first line of pt_d
           " m asl; ", # concatenate the text ' m asl; ' after the elevation
           pt_d[1,"Collection.Date"], # concatenate the collection date from the first line of pt_d
           "; ", # concatenate a semicolon followed by a space
           pt_d[1,"Collectors"], # concatenate the collector names from the first line of pt_d
           " leg. (", # concatenate the text ' leg. ' after the collector names, and open a bracket
           pt_d[1,"Nb.Specimens"], # concatenate the number of specimens from the first line of pt_d
           " specimens); ") # concatenate the text ' specimens); ' after the number of specimens

print(x1) # see the result
## [1] "Tumuc-Humac, Mitaraka Massif, plateau forest on DIADEMA project D trail; latitude/longitude: 2.216/-54.457; elevation: 381 m asl; March 2015; T. Decaëns, E. Lapied leg. (3 specimens); "
# create a loop to add in x1 the collecting data for all the localities (i.e. the lines) in pt_d 
if (nrow(pt_d)>1){ # if the number of lines in pt_d is > 1
  for(i in 2:nrow(pt_d)) { # then for all lines from the second one
  y1 = paste(sep = "", pt_d[i,"Sector"], ", ", pt_d[i,"Exact.Site"], ", ", pt_d[i,"Exact.Site.2"],"; latitude/longitude: " ,pt_d[i,"Lat"],"/",pt_d[i,"Lon"],"; elevation: ", pt_d[i,"Elev"], " m asl; ", pt_d[i,"Collection.Date"], "; ", pt_d[i,"Collectors"], " leg. (", pt_d[i,"Nb.Specimens"], " specimens); ") # create a vector structured as x1 that contains the collecting data for each of these lines
  x1 = paste (x1, y1) # concatenate x1 and y1
}
}

print(x1) # see the result, in the example pt_d contains two lines, with three specimens for the first locality and two specimens for the second locality 
## [1] "Tumuc-Humac, Mitaraka Massif, plateau forest on DIADEMA project D trail; latitude/longitude: 2.216/-54.457; elevation: 381 m asl; March 2015; T. Decaëns, E. Lapied leg. (3 specimens);  Tumuc-Humac, Mitaraka Massif, slope forest on trail to Sommet en Cloche Inselberg; latitude/longitude: 2.23509/-54.4532; elevation: 352 m asl; March 2015; T. Decaëns, E. Lapied leg. (2 specimens); "
##### Assembling the list of localities for "other material" #####
# Repeat the same procedure using om_d instead of pt_d
x = paste(sep = "", om_d[1,"Sector"], ", ", om_d[1,"Exact.Site"], ", ", om_d[1,"Exact.Site.2"],"; latitude/longitude: ", om_d[1,"Lat"],"/",om_d[1,"Lon"],"; elevation: ", om_d[1,"Elev"], " m asl; ", om_d[1,"Collection.Date"], "; ", om_d[1,"Collectors"], " leg. (", om_d[1,"Nb.Specimens"], " specimens); ")

if (nrow(om_d)>1){
for(i in 2:nrow(om_d)) {
  y = paste(sep = "", om_d[i,"Sector"], ", ", om_d[i,"Exact.Site"], ", ", om_d[i,"Exact.Site.2"], "; latitude/longitude: ", om_d[i,"Lat"],"/",om_d[i,"Lon"],"; elevation: ", om_d[i,"Elev"], " m asl; ", om_d[i,"Collection.Date"], "; ", om_d[i,"Collectors"], " leg. (", om_d[i,"Nb.Specimens"], " specimens); ")
  x = paste (x, y)
}
}
print(x)
## [1] "Tumuc-Humac, Mitaraka Massif, plateau forest at base camp; latitude/longitude: 2.23398/-54.4503; elevation: 331 m asl; March 2015; T. Decaëns, E. Lapied leg. (1 specimens);  Tumuc-Humac, Mitaraka Massif, plateau forest on DIADEMA project D trail; latitude/longitude: 2.216/-54.457; elevation: 381 m asl; March 2015; T. Decaëns, E. Lapied leg. (9 specimens); "
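
Since the paratype and other-material lists are assembled with the same pattern, the two loops could be wrapped in a single vectorised function. The sketch below is one possible way to do this, not the code used for the published descriptions: the helper name format_localities is hypothetical, and the singular/plural handling (which avoids strings such as "1 specimens") is an addition. Column names are those used in the script above.

```r
# Hypothetical helper: build one locality string per row, then join them.
# The singular/plural switch is an addition to the original script.
format_localities <- function(d) {
  noun <- ifelse(d$Nb.Specimens == 1, "specimen", "specimens")
  parts <- paste0(
    d$Sector, ", ", d$Exact.Site, ", ", d$Exact.Site.2,
    "; latitude/longitude: ", d$Lat, "/", d$Lon,
    "; elevation: ", d$Elev, " m asl; ",
    d$Collection.Date, "; ", d$Collectors,
    " leg. (", d$Nb.Specimens, " ", noun, ");"
  )
  paste(parts, collapse = " ")
}

# Toy data mimicking two rows of om_d
om_demo <- data.frame(
  Sector = "Tumuc-Humac",
  Exact.Site = "Mitaraka Massif",
  Exact.Site.2 = c("plateau forest at base camp",
                   "plateau forest on DIADEMA project D trail"),
  Lat = c(2.23398, 2.216),
  Lon = c(-54.4503, -54.457),
  Elev = c(331, 381),
  Collection.Date = "March 2015",
  Collectors = "T. Decaëns, E. Lapied",
  Nb.Specimens = c(1, 9),
  stringsAsFactors = FALSE
)

x_demo <- format_localities(om_demo)
print(x_demo)
```

The same function can then be called on pt_d and om_d, so that a change in journal format only needs to be made in one place.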
##### Calculate the % per habitats and microhabitats #####
mic<-c(by(m2, m2$Microhabitat, nrow)) # count the number of specimens (i.e. lines) in m2 for each microhabitat
mic
## in decaying trunk       in the soil 
##                13                 3
hab<-c(by(m2, m2$Habitat, nrow)) # count the number of specimens (i.e. lines) in m2 for each habitat
hab
##    plateau forest      slope forest transition forest 
##                13                 2                 1
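
The counts above are later converted to percentages in the inline code. As a small sketch of that arithmetic, the percentages can also be computed once, rounded consistently, and retrieved by habitat name rather than by position. The toy vector below reproduces the habitat counts shown above (13, 2 and 1); the object names are illustrative.

```r
# Toy habitat column with the same counts as in the output above (13, 2, 1)
habitat_demo <- rep(c("plateau forest", "slope forest", "transition forest"),
                    times = c(13, 2, 1))

# Percentage of specimens per habitat, rounded to two decimals,
# stored as a named numeric vector
tab <- table(habitat_demo)
hab_pct <- setNames(round(100 * as.vector(tab) / sum(tab), 2), names(tab))

# Look values up by name rather than by position
print(hab_pct[["plateau forest"]])
print(hab_pct)
```

Looking values up by name keeps the text correct if a habitat category is added or renamed in the specimen table.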

You can then use the following text blocks with inline code to successively format the Holotype, Paratype and other material information in the knitted document. These templates are designed to fulfil the formatting requirements of the journal Zoosystema, but they can easily be adjusted for any other journal.

<span style="font-variant:small-caps;">Type material.</span> -- **Holotype. `r ht[,"Country.Ocean"]`** • `r ht[,"Life.Stage"]` (with posterior regeneration); `r ht[1,"Sector"]`, `r ht[,"Exact.Site"]`, `r ht[,"Exact.Site.2"]`, `r ht[,"Microhabitat"]`; latitude/longitude: `r ht[,"Lat"]`/`r ht[,"Lon"]`; elevation: `r ht[,"Elev"]` m asl; `r ht[,"Collection.Date"]`; `r ht[,"Collectors"]` leg.; BOLD Sample ID: `r ht[,"Sample.ID"]`; deposited at MNHN.

This gives the following in the knitted document:

Type material. – Holotype. French Guiana • Adult (with posterior regeneration); Tumuc-Humac, Mitaraka Massif, transition forest on DIADEMA project C trail, in decaying trunk; latitude/longitude: 2.23865/-54.4352; elevation: 389 m asl; March 2015; T. Decaëns, E. Lapied leg.; BOLD Sample ID: EW-MI15-0001; deposited at MNHN.

**Paratypes. `r pt[1,"Country.Ocean"]`** • `r nrow(pt)` `r pt[1,"Life.Stage"]` specimens; `r x1` BOLD Sample ID: `r pt[,"Sample.ID"]`; deposited as follows: 3 specimens at CEFE, 2 at MNHN.

This gives the following in the knitted document:

Paratypes. French Guiana • 5 adult specimens; Tumuc-Humac, Mitaraka Massif, plateau forest on DIADEMA project D trail; latitude/longitude: 2.216/-54.457; elevation: 381 m asl; March 2015; T. Decaëns, E. Lapied leg. (3 specimens); Tumuc-Humac, Mitaraka Massif, slope forest on trail to Sommet en Cloche Inselberg; latitude/longitude: 2.23509/-54.4532; elevation: 352 m asl; March 2015; T. Decaëns, E. Lapied leg. (2 specimens); BOLD Sample ID: EW-MI15-0358, EW-MI15-0359, EW-MI15-0008, EW-MI15-0009, EW-MI15-0010; deposited as follows: 3 specimens at CEFE, 2 at MNHN.

**Other material. `r om[1,"Country.Ocean"]`** • `r by(om,om$Life.Stage,nrow)[1]` juvenile specimens, `r by(om,om$Life.Stage,nrow)[2]` cocoons; `r x` BOLD Sample ID: `r om[,"Sample.ID"]`; deposited at MNHN.

This gives the following in the knitted document:

Other material. French Guiana • 2 juvenile specimens, 8 cocoons; Tumuc-Humac, Mitaraka Massif, plateau forest at base camp; latitude/longitude: 2.23398/-54.4503; elevation: 331 m asl; March 2015; T. Decaëns, E. Lapied leg. (1 specimens); Tumuc-Humac, Mitaraka Massif, plateau forest on DIADEMA project D trail; latitude/longitude: 2.216/-54.457; elevation: 381 m asl; March 2015; T. Decaëns, E. Lapied leg. (9 specimens); BOLD Sample ID: EW-MI15-0024, EW-MI15-0028, EW-MI15-0012, EW-MI15-0013, EW-MI15-0014, EW-MI15-0015, EW-MI15-0016, EW-MI15-0017, EW-MI15-0018, EW-MI15-0175; deposited at MNHN.
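
In the "Other material" template above, the life-stage counts are taken by position ([1] and [2]) from the output of by(), which depends on how the Life.Stage labels sort. A possibly more robust variant is to count by label. The sketch below is only illustrative: the helper name n_stage is hypothetical, and the labels "Juvenile" and "Cocoon" are assumptions about how life stages are coded in the specimen table.

```r
# Toy "other material" table; the labels "Juvenile" and "Cocoon" are assumptions
om_demo <- data.frame(
  Life.Stage = c(rep("Juvenile", 2), rep("Cocoon", 8)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of rows with a given life stage (0 if absent)
n_stage <- function(d, stage) {
  sum(d$Life.Stage == stage)
}

n_juv <- n_stage(om_demo, "Juvenile")
n_coc <- n_stage(om_demo, "Cocoon")
print(c(juvenile = n_juv, cocoon = n_coc))
```

In the template, the positional calls could then be replaced by inline code such as `r n_stage(om, "Juvenile")`, which also returns 0 rather than NA when a life stage is absent from the material.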

Similarly, you can mix text and inline code to format the morpho-anatomical description of the new species:

<span style="font-variant:small-caps;">Description</span>

*External morphology.*

Body shape `r c2[,"Shape"]`. Body pigmentation `r c2[,"Pigmentation"]`. Body length: `r c2[,"Size"]` after ethanol fixation. Body mass: `r c2[,"Weight"]` after ethanol fixation. Diameter: `r c2[,"Diam_pre"]` in the preclitellar region, `r c2[,"Diam_clit"]` in the clitellum, `r c2[,"Diam_post"]` in the postclitellar region. Number of segments: `r c2[,"Segments"]`. Prostomium `r c2[,"Prostomium"]`. Setae `r c2[,"Setae_relation"]`. Postclitellar setal arrangement aa:ab:bc:cd:dd = `r c2[,"aa"]`:`r c2[,"ab"]`:`r c2[,"bc"]`:`r c2[,"cd"]`:`r c2[,"dd"]`. Clitellum in `r c2[,"Clitellum"]`. Genital markings in `r c2[,"Marks"]`. Tubercula pubertatis `r c2[,"Puberculum"]`. Male pores `r c2[,"Male_pore"]`, and ovipores `r c2[,"Female_pore"]`. Spermathecal pores `r c2[,"Spermatecal_pore"]`. Nephridial pores `r c2[,"Nephridial_pore"]`.

*Internal anatomy.*

Septa: `r c2[,"Septa"]`. Gizzard: `r c2[,"Gizzard"]`, with an average size (width x length) of `r c2[,"Gizzard_size"]`. Calciferous glands: `r c2[,"Calcíferous_glands"]`. Esophagus-intestine transition `r c2[,"Intestine"]`. Typhlosole `r c2[,"Typhlosole"]`. Hearts: `r c2[,"Hearts"]`. Excretory system: `r c2[,"Nephridia"]`. Testes sacs: `r c2[,"Testae"]`. Seminal vesicles: `r c2[,"Seminal_vesicles"]`. Spermathecae: `r c2[,"Spermatheca"]`.

This gives the following in the knitted document:

Description

External morphology.

Body shape cylindrical. Body pigmentation yellow when alive, sometimes with a transition to grey toward the tail and in regerating parts of the body, clitellum pink when alive, beige after ethanol fixation. Body length: 265 to 300 mm after ethanol fixation. Body mass: 15.45 to 23.70 g after ethanol fixation. Diameter: 9.3 to 12.5 mm in the preclitellar region, 12.2 to 13 mm in the clitellum, 10.5 to 13 mm in the postclitellar region. Number of segments: 234 to 264. Prostomium proepilobic. Setae closely paired, ab and cd beginning in III or IV. Postclitellar setal arrangement aa:ab:bc:cd:dd = 10:1:12:1:34. Clitellum in (1/2 XIII) XIV–1/2 XXVI, saddle–shaped. Genital markings in in V–XIII and intraclitellar in XIV–XXIV (ab position). Tubercula pubertatis linear in XIX–XXV. Male pores not observed, and ovipores in XIV, anterior and slightly dorsal to b. Spermathecal pores in 5/6, 6/7 and 7/8, in line of ab. Nephridial pores begin in III–IV, in D line.

Internal anatomy.

Septa: membranous, slightly thickened in 9/10 to 13/14. Gizzard: muscular and well developed in VI, but displaced to X, XI, with an average size (width x length) of 6.15 x 8.05 mm. Calciferous glands: eight pairs in VII–XIV, yellow bean shaped with a brown round distal appendix in VII–XII, deprived of appendix and kidney shaped in XIII–XIV; all with composite tubular structure. Esophagus-intestine transition in XIX; intestine without caeca. Typhlosole abruptly begins in XXV, structured as a large folded tissue. Hearts: six pairs, the two intestinal pairs well developed in X–XI, enclosed in the testes sacs. Excretory apparatus holoic, nephridia with simple nephrostome. Testes sacs: midventral or hypoesophagic in X and XI. Seminal vesicles: two pairs in XI–XII, short and flat lobulated, and respectively inserted ventrally in X and XI by a tube passing throught the septa. Spermathecae: three pairs, VI, VII and VIII, without diverticula.

The same applies for the habitat and microhabitat preferences:

<span style="font-variant:small-caps;">Ecology.</span> -- *M. (B.) flavus* Decaëns & Bartz n. sp. was predominantly found in plateau forests (`r round(hab[1]*100/sum(hab), digits=2)` % of specimens) and other well drained habitats such as slope forests (`r round(hab[2]*100/sum(hab), digits=2)` %) and transition forest at the edge of rocky savannah (`r round(hab[3]*100/sum(hab), digits=2)` %). It preferentially inhabits under large decaying trunks fallen at the soil surface (`r round(mic[1]*100/sum(mic), digits=2)` %), but was also occasionally found within the soil (`r round(mic[2]*100/sum(mic), digits=2)` %).

This gives the following in the knitted document:

Ecology. – M. (B.) flavus Decaëns & Bartz n. sp. was predominantly found in plateau forests (81.25 % of specimens) and other well drained habitats such as slope forests (12.5 %) and transition forest at the edge of rocky savannah (6.25 %). It preferentially inhabits under large decaying trunks fallen at the soil surface (81.25 %), but was also occasionally found within the soil (18.75 %).

DISCUSSION

Using the scripts described in this paper, we have been able to significantly speed up the process of writing the original descriptions presented in Decaëns et al. (2024). Our approach paves the way for an integrative turbo-taxonomy (sensu Butcher et al. 2012) based on the use of molecular and morpho-anatomical information and a formatting pipeline for alpha-taxonomy manuscripts. In the near future, it would be interesting to assess the possibility of implementing this type of tool directly on bioinformatics platforms such as the BOLD database (Ratnasingham & Hebert 2007), which would give the taxonomist community access to it without having to master the R Markdown language. This would undoubtedly provide an opportunity to speed up the description of the unknown fraction of biodiversity while maintaining a high standard of description quality.

REFERENCES

Allaire J.J., Xie Y., McPherson J., Luraschi J., Ushey K., Atkins A., Wickham H., Cheng J., Chang W. & Iannone R. 2021. — Rmarkdown: Dynamic documents for R. R package version 2.10. URL: https://rmarkdown.rstudio.com
Butcher B.A., Smith M.A., Sharkey M.J. & Quicke D.L. 2012. — A turbo-taxonomic study of Thai Aleiodes (Aleiodes) and Aleiodes (Arcaleiodes) (Hymenoptera: Braconidae: Rogadinae) based largely on COI barcoded specimens, with rapid descriptions of 179 new species. Zootaxa 3457 (1): 1–232. https://doi.org/10.11646/zootaxa.3457.1.1
Decaëns T., Bartz M.L.C., Goulpeau A., Marchan D.F., Maggia M.-E., Lapied E., Feijoo A.M. & James S.W. 2024. — Earthworms (Oligochaeta: Clitellata) of the Mitaraka range (French Guiana): Commented checklist with description of one genus and eighteen species new to science. Zoosystema 46 (9): 195–244
Ratnasingham S. & Hebert P.D.N. 2007. — BOLD: The Barcode of Life Data System (www.barcodinglife.org). Molecular Ecology Notes 7 (3): 355–364. https://doi.org/10.1111/j.1471-8286.2007.01678.x
Anonymous 2011. — The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software 40 (1): 1–29. https://doi.org/10.18637/jss.v040.i01