Simtho: The Syriac Thesaurus is a medium-size database of Syriac literary texts. The Beta version was launched at AAR/SBL in San Diego in 2019 and consists of 7.3 million tokens (ca. 6 million words). Users can search the corpus using different methods: simple word and phrase search, regular expressions, and a Corpus Query Language. Search operations can be filtered by a rich set of metadata fields such as author, composition date periods, genre, poetic meter (when applicable), and much more. In addition to concordance results, users can find collocations and frequencies of occurrence. Search results can be saved or exported in text and XML formats. Simtho is freely available online. Volunteers who are interested in helping can read the Call for Volunteers and Call for Texts sections below.
The Thesaurus by the Numbers…
- George A. Kiraz (Beth Mardutho and Institute for Advanced Study, Princeton), Simtho Editor-in-Chief
- Johan M. V. Lundberg (University of Cambridge), Seibel Digital Humanities Fellow, 2019
- Shelby Loster (Fuller Theological Seminary), Seibel Digital Humanities Fellow, 2019–2020
- Sebastian P. Brock (University of Oxford, Emeritus), Senior Advisor
- William Clocksin (University of Hertfordshire), OCR Specialist
- Slavomír Čéplö (Austrian Academy of Sciences / Slovak Academy of Sciences), Corpus Building and Management Specialist
- William Bunce (University of Oxford), Dr. Khalid and Mrs. Amira Dinno Digital Humanities Fellow, Summer 2019
- Patrick Conlin, Beth Mardutho Work-Study Fellow, Summer 2019
Call for Submissions
We welcome contributions of typed texts. Scholars who have published critical editions of texts (in book or article format) are encouraged to send us their texts for inclusion. Because Simtho is concordance software, it does not violate the copyright of published material. Please send submissions to firstname.lastname@example.org.
Call for Participation
We welcome volunteers who know Syriac at any level. There are tasks for those who can only recognize Syriac letters and tasks for experts on Syriac literature—and everything in between. Please contact us at email@example.com.
Users are encouraged to read the SketchEngine user guide. The software currently cannot ignore diacritics during search, so the Simtho project had to compromise as follows:
- All vowel marks, most diacritical dots, and non-Latin punctuation marks were removed. If future releases of the software support searching with diacritical marks, they will be reinstated.
- The plural syome double-dot was moved by convention to the end of the string (otherwise, the user would have to know where it is to perform a search); e.g., singular ܟܬܒܐ, plural ܟܬܒܐ̈.
- The feminine dot on ܗ̇ was retained. In the case of plurals, the syome comes after it, like this: ܗ̇̈. This order matters when performing search operations.
- The single disambiguation dot, when present, was retained; e.g. ܡ̇ܢ vs. ܡ̣ܢ. We expect this to help disambiguate the text.
Users can use regular expressions to perform searches that ignore these marks.
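As a rough illustration of the conventions above, the following Python sketch moves the syome to the end of a word and builds a regular expression that matches a word with or without the retained marks. It is not part of Simtho: the function names are hypothetical, and the code points used for the marks (U+0307 dot above, U+0323 dot below, U+0308 syome) are an assumption about how the corpus encodes them. Whether the search engine accepts exactly the same character-class syntax as Python's `re` module is also an assumption.

```python
import re

# Assumed code points for the retained marks (not confirmed by the project).
DOT_ABOVE = "\u0307"   # feminine dot / disambiguation dot above
DOT_BELOW = "\u0323"   # disambiguation dot below
SYOME = "\u0308"       # plural double-dot
RETAINED_MARKS = DOT_ABOVE + DOT_BELOW + SYOME


def move_syome_to_end(word: str) -> str:
    """Remove the syome from wherever it sits and append it once at the end."""
    if SYOME not in word:
        return word
    return word.replace(SYOME, "") + SYOME


def mark_insensitive_pattern(word: str) -> str:
    """Build a regex that matches `word` with or without the retained marks."""
    optional_marks = "[" + RETAINED_MARKS + "]*"
    # Drop any marks already typed, then allow optional marks after each letter.
    base_letters = [ch for ch in word if ch not in RETAINED_MARKS]
    return "".join(re.escape(ch) + optional_marks for ch in base_letters)


# Example: kaph-taw-beth-alaph, the word used in the list above.
ktb = "\u071F\u072C\u0712\u0710"
pattern = re.compile(mark_insensitive_pattern(ktb))
print(bool(pattern.fullmatch(ktb)))          # singular form
print(bool(pattern.fullmatch(ktb + SYOME)))  # plural form with final syome
```

The generated pattern can be pasted into a regular-expression search; it finds both the singular and the plural form because the syome is treated as optional.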
SketchEngine implements a Corpus Query Language (CQL) and regular expression searching. Users are encouraged to learn this query language to make the most of Simtho. It makes it possible, for example, to search for a word form with or without ܒܕܘܠ prefixes or suffixes.
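For instance, a CQL token query has the general shape `[attribute="regex"]`. The Python sketch below assembles such a query for a word form preceded by any number of ܒܕܘܠ prefix letters. The helper name is hypothetical, the attribute name `word` is an assumption about how the Simtho corpus is indexed, and the sketch covers prefixes only, not suffixes.

```python
# The four prefix letters named in the text: beth, dalath, waw, lamadh.
BDUL_PREFIXES = "\u0712\u0715\u0718\u0720"


def cql_with_optional_prefixes(word: str, attribute: str = "word") -> str:
    """Return a CQL token query matching `word` with zero or more BDUL prefixes."""
    # Keep the sketch simple: refuse input that would break the quoted regex.
    if '"' in word or "\\" in word:
        raise ValueError("word must not contain quotes or backslashes")
    return f'[{attribute}="[{BDUL_PREFIXES}]*{word}"]'


# Example: kaph-taw-beth-alaph with optional prefixes.
query = cql_with_optional_prefixes("\u071F\u072C\u0712\u0710")
print(query)
```

The printed string can be pasted into the CQL search box. Because the prefix class is followed by `*`, the bare form matches as well as the prefixed ones.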
The metadata is self-explanatory. The tag .CompositionYear is a zero-padded estimate of the date of the text. Early nth century has been encoded as 0m25 where m=n-1; e.g., early 6th century is encoded 0525. Similarly, mid or simply 6th century is 0550, and late 6th century is 0575. When the date is a range over more than one century (e.g. 6th or 7th century), the later date is encoded. The zero-padding permits users to sort concordance results chronologically, as the algorithms use lexical sorting. In the case of Greek texts translated into Syriac, when the Syriac translator is known, the year is given for the translator because that reflects the Syriac composition. (The translator abbreviation is given next to the author; e.g. “SevAnt(JacEdes)” for Severus of Antioch (tr. Jacob of Edessa).) Otherwise, the date of the Greek author is given.
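The encoding rule can be sketched in a few lines of Python. The function name and the `part` labels are hypothetical and used only for illustration; the sketch covers single-century dates and leaves out the multi-century range rule.

```python
# Offsets within the century, as described in the text.
OFFSETS = {"early": 25, "mid": 50, "late": 75}


def composition_year(century: int, part: str = "mid") -> str:
    """Encode a century and part, e.g. (6, "early") -> "0525"."""
    if century < 1:
        raise ValueError("century must be 1 or greater")
    year = (century - 1) * 100 + OFFSETS[part]
    return f"{year:04d}"  # zero-pad to four digits


print(composition_year(6, "early"))  # 0525
print(composition_year(6))           # 0550
print(composition_year(6, "late"))   # 0575

# Zero-padding makes plain string sorting agree with chronological order.
codes = [composition_year(13), composition_year(7), composition_year(6, "late")]
print(sorted(codes) == sorted(codes, key=int))  # True
```

Without the padding, a string sort would place "1250" before "650", which is why the leading zero matters for lexical sorting.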
The tag .DocumentType explains the method by which the electronic version of the text was created. The value OCR stands for optical character recognition, an automatic way to convert images of texts into texts. While we used highly reliable software, OCR is never 100% accurate. Before citing texts in a research paper, make sure you check the text against the print publication or the manuscript from which it is taken. This information is given in the .Reference tag. One should also double-check the page (page.nr) and line number (line.nr) references.
Please send feedback to firstname.lastname@example.org. We welcome reports of any errors we have made!