Simtho: The Syriac Thesaurus is a medium-size database of Syriac literary texts. The Beta version was launched at AAR/SBL in San Diego in 2019 and consists of 7.3 million tokens (ca. 6.5 million words). Users can search the corpus in several ways: simple word and phrase search, regular expressions, and a Corpus Query Language. Searches can be filtered by a rich set of metadata fields, including author, period of composition, genre, and poetic meter (when applicable). In addition to concordance results, users can retrieve collocations and frequency counts. Search results can be saved or exported in text and XML formats. Simtho is freely available online. Those interested in helping can read the Call for Submissions and Call for Participation sections below.

The Team

George A. Kiraz (Beth Mardutho and Institute for Advanced Study, Princeton), Simtho Editor-in-Chief

Sebastian P. Brock (University of Oxford, Emeritus), Senior Advisor

Slavomír Čéplö (Austrian Academy of Sciences / Slovak Academy of Sciences), Corpus Building and Management Specialist

Jack Tannous (Princeton University), Area Editor (Early Syriac to Renaissance)

Ephrem Ishac (Karl-Franzens-Universität Graz), Area Editor (Post-Renaissance to Kthobonoy Literature; Liturgical Texts)

Shelby Loster (Beth Mardutho: The Syriac Institute), Simtho Projects Coordinator, Seibel Digital Humanities Fellow, 2019–2020

Joshua Hood (Catholic University of America/Catholic Distance University), Seibel Digital Humanities Fellow, 2021

Jacob Margason (Vanderbilt University), Data Engineer

Daniel Stoekl (École Pratique des Hautes Études – Université PSL), eScriptorium Model Training

Benjamin Kiessling (Université PSL), Kraken Advisor

Fadi Homsi (Hama University), Dr. Khalid and Mrs. Amira Dinno Fellow, Summer 2021

Anton Fleissner, Edward Y. Hannoush Memorial Fellow (sponsored by Dr. Peter and Dr. Gretchen Hannoush), Summer 2021

Maroun El Houkayem (Duke University), Dr. Suhail and Mrs. Luna Zavaro Fellow, Summer 2021

Yanir Marmor (Tel Aviv University), Dr. Nebil and Mrs. Jennifer Aydin Fellow, Summer 2021

Justinian Mândrilă (University of Göttingen), Dr. Jack Jallo and Mrs. Gage Johnston Fellow, Summer 2021

Ian Ollila (Princeton Theological Seminary), PTS Field Ed Intern, Summer 2021

 

Former Contributors

William Clocksin (University of Hertfordshire), OCR Specialist

Johan M. V. Lundberg (University of Cambridge), Seibel Digital Humanities Fellow, 2019

Brandon Allen (Princeton Theological Seminary/University of Oxford), Seibel Digital Humanities Fellow, 2020

William Bunce (University of Oxford), Dr. Khalid and Mrs. Amira Dinno Digital Humanities Fellow, Summer 2019

Patrick Conlin (Marquette University), Beth Mardutho Work-Study Fellow, Summer 2019

Kyle Brunner (New York University), Dr. Talal and Mrs. Wesal Findakly Fellow, 2019

Abigail Pearson (University of Exeter), Beth Mardutho Work-Study Fellow, 2018–2019

Emily Chesley (Princeton Theological Seminary), Dr. Jack Jallo and Mrs. Gage Johnston Fellow, 2018

Jillian Marcantonio (Duke University), Beth Mardutho Digital Humanities Fellow, Summer 2018

Muhannad Maher (St. Ephrem Seminary), Mr. Malak Yunan and Dr. Evelyne Yunan Fellow, Summer 2019

Tony Wardeh, Beth Mardutho Digital Humanities Corresponding Fellow, 2020

Jonathan Warner (Cornell University), Mr. Malak Yunan and Dr. Evelyne Yunan Fellow, Summer 2020

Joss Childs (University of Chicago), Dr. Nebil and Mrs. Jennifer Aydin Fellow, Summer 2020

Omri Matarasso (Princeton University), Dr. Khalid and Mrs. Amira Dinno Fellow, Summer 2020

Srecko Koralija (University of Cambridge), Beth Mardutho Digital Humanities Corresponding Fellow, 2020

Yevgeniy Safronov (Princeton Theological Seminary), PTS Field Ed Intern, Summer 2020

David Michael Felsch (Princeton Theological Seminary), PTS Field Ed Intern, Summer 2020

Briana Grenert (Princeton Theological Seminary), PTS Field Ed Intern, Summer 2020

Elias Jallo, Beth Mardutho Digital Humanities Intern, Summer 2020

Ethan Laster (Saint Louis University), Beth Mardutho Digital Humanities Volunteer, 2020

Fr. Mathew Jacob, Beth Mardutho Volunteer, 2020

Tetiana Shyshkina (National University of Kyiv-Mohyla Academy), Beth Mardutho Digital Humanities Volunteer, 2020

Jacob Mathew, Beth Mardutho Digital Humanities Volunteer, Summer 2020

Patrick Robert Kiernan (Princeton Theological Seminary), PTS Non-Field Ed Intern, Summer 2021

Call for Submissions

We welcome contributions of typed texts. Scholars who have published critical editions of texts (in book or article format) are encouraged to send us their texts for inclusion. Because Simtho is concordance software, it does not violate the copyright of published material. Please send submissions to simtho@bethmardutho.org.

 

Call for Participation

We welcome volunteers who know Syriac at any level. There are tasks for those who can only recognize Syriac letters and tasks for experts on Syriac literature—and everything in between. Please contact us at simtho@bethmardutho.org.

 

Technical Notes

Users are encouraged to read the SketchEngine user guide. At present, the software cannot ignore diacritics during search. The Simtho project therefore adopted the following compromises:

  1. All vowel marks, most diacritical dots, and non-Latin punctuation marks were removed. If future releases of the software support search with diacritical marks, these will be reinstated.
  2. The plural syome double-dot was moved by convention to the end of the string (otherwise, the user has to know where it is to perform a search); e.g., singular ܟܬܒܐ, plural ܟܬܒܐ̈.
  3. The feminine dot on ܗ̇ was retained. In plurals, the syome follows it, as in ܗ̇̈. This order matters when performing searches.
  4. The single disambiguation dot, when present, was retained; e.g., ܡ̇ܢ vs. ܡ̣ܢ. We judged that retaining it helps disambiguate the text.
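
As a rough illustration of items 1 to 3, the sketch below strips vowel points and moves the syome to the end of a word. It is not the project's actual pipeline. It assumes that the syome is encoded as U+0308 (combining diaeresis) and that vowel marks fall in the Unicode range U+0730 to U+073F; the function name `normalize_word` is hypothetical, and the removal of "most diacritical dots" and punctuation is not attempted because the source does not list them.

```python
import re

# Assumption: the plural syome is encoded as U+0308 (combining diaeresis).
SYOME = "\u0308"
# Assumption: vowel marks are the Syriac vowel points U+0730..U+073F.
VOWEL_POINTS = re.compile("[\u0730-\u073F]")


def normalize_word(word: str) -> str:
    """Strip vowel points and move the syome, if present, to the end of the word."""
    word = VOWEL_POINTS.sub("", word)
    if SYOME in word:
        # Remove the syome from wherever it sits and append it once at the end.
        word = word.replace(SYOME, "") + SYOME
    return word


if __name__ == "__main__":
    # kaph, taw + syome, beth, alaph: the syome ends up after the alaph.
    print(normalize_word("\u071F\u072C\u0308\u0712\u0710"))
```

Because the syome is always appended last, a feminine dot (assumed here to be U+0307) that sits on the final letter automatically precedes it, which matches the order described in item 3.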

Users can use regular expressions to perform searches that ignore these marks.
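
One way to build such an expression is to allow an optional run of the retained marks after every letter of an unpointed search string. The sketch below does this with Python's `re` module purely for illustration; it assumes the retained marks are encoded as U+0307 (dot above), U+0308 (syome), and U+0323 (dot below), and the helper name `mark_insensitive` is hypothetical. SketchEngine's own regular expression dialect may differ in details.

```python
import re

# Assumption: the marks retained in the corpus are these three combining characters.
MARKS = "\u0307\u0308\u0323"  # dot above, syome (diaeresis), dot below


def mark_insensitive(bare: str) -> str:
    """Return a regex that matches `bare` with or without marks after each letter."""
    optional_marks = f"[{MARKS}]*"
    return "".join(re.escape(letter) + optional_marks for letter in bare)


if __name__ == "__main__":
    pattern = mark_insensitive("\u0721\u0722")  # mim, nun (unpointed)
    for form in ("\u0721\u0722", "\u0721\u0307\u0722", "\u0721\u0323\u0722"):
        # Each form (bare, dot above, dot below) should match the same pattern.
        print(form, bool(re.fullmatch(pattern, form)))
```

The resulting pattern string can be pasted into a regular expression search field; whether the character class is accepted unchanged there is an assumption worth checking against the SketchEngine user guide.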

 

SketchEngine implements a Corpus Query Language (CQL) and regular expression searching. Users are encouraged to learn this query language to make the most of Simtho. This will permit one, for example, to search a word form with or without ܒܕܘܠ prefixes or suffixes.
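
As a minimal sketch of the prefix case, the helper below builds a CQL query string in which any sequence of the four prefix letters may precede a word form. It assumes that the corpus exposes the standard `word` attribute and that attribute values are matched as regular expressions against the whole token, as in standard CQL; the function name is hypothetical, and suffixes are not handled.

```python
# The four prefix letters named above: beth, dalath, waw, lamadh.
PREFIX_LETTERS = "\u0712\u0715\u0718\u0720"


def cql_with_optional_prefixes(word_form: str) -> str:
    """Build a CQL query matching `word_form` with zero or more prefix letters."""
    # Assumption: the corpus uses the standard CQL attribute name `word`.
    return f'[word="[{PREFIX_LETTERS}]*{word_form}"]'


if __name__ == "__main__":
    # Query for kaph-taw-beth-alaph with or without prefixes.
    print(cql_with_optional_prefixes("\u071F\u072C\u0712\u0710"))
```

The printed string can be pasted into the CQL query box. Using `*` rather than `?` allows stacked prefixes, at the cost of occasionally matching unrelated words that happen to begin with these letters.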

 

The metadata is self-explanatory. The tag .CompositionYear is a zero-padded estimate of the date of the text. Early nth century is encoded as 0m25, where m=n-1; e.g., early 6th century is encoded as 0525. Similarly, mid 6th century (or simply 6th century) is 0550, and late 6th century is 0575. When the date ranges over more than one century (e.g., 6th or 7th century), the later date is encoded. The zero-padding lets users sort concordance results chronologically, because the sorting algorithms compare values lexically. In the case of Greek texts translated into Syriac, when the Syriac translator is known, the year given is that of the translator, because it reflects the Syriac composition. (The translator abbreviation is given next to the author; e.g., “SevAnt(JacEdes)” for Severus of Antioch, tr. Jacob of Edessa.) Otherwise, the date of the Greek author is given.
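
The encoding can be sketched as follows. This is an illustration rather than the project's own code: the function names are hypothetical, the four-digit width is inferred from the example 0525, and a multi-century range is assumed to take the mid-century value of the latest century.

```python
# Offsets within a century for the three qualifiers described above.
OFFSETS = {"early": 25, "mid": 50, "late": 75}


def composition_year(century: int, part: str = "mid") -> str:
    """Encode e.g. (6, "early") as "0525"; an unqualified century counts as "mid"."""
    year = (century - 1) * 100 + OFFSETS[part]
    return f"{year:04d}"  # zero-pad to four digits (assumed width)


def composition_year_range(*centuries: int) -> str:
    """Encode a range such as "6th or 7th century" using the later century."""
    # Assumption: the later date is the unqualified (mid) value of the last century.
    return composition_year(max(centuries))


if __name__ == "__main__":
    codes = [composition_year(11), composition_year(6, "late"), composition_year(6, "early")]
    # Plain string sorting gives chronological order because of the zero-padding.
    print(sorted(codes))
```

Without the padding, the string "1050" would sort before "525", which is why lexical sorting needs a fixed width.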

 

The tag .DocumentType records the method by which the electronic version of the text was created. The value OCR stands for optical character recognition, an automatic way to convert images of texts into texts. While we used highly reliable software, OCR is never 100% accurate. Before citing texts in a research paper, make sure you check the text against the print publication or the manuscript from which it is taken. This source is given in the .Reference tag. One should also double-check the page (page.nr) and line number (line.nr) references.

Please send feedback to simtho@bethmardutho.org. We welcome reports of any errors we have made!