For Immediate Release: November 23, 2019
Launching Simtho: The Syriac Thesaurus
[Piscataway, NJ, November 23, 2019] Beth Mardutho: The Syriac Institute today released a beta version of Simtho: The Syriac Thesaurus. Simtho is a textual database of Syriac literary works of all periods. The initial beta release contains more than 7 million tokens (words, punctuation marks, and other symbols), of which over 6 million are words of text. Simtho is freely available online.
Users can perform simple and complex searches on the texts. A simple search looks for a word or a phrase. A complex search can contain regular expressions and Corpus Query Language (CQL) expressions. Users can also filter their searches by date, genre, and a dozen other metadata attributes.
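For readers unfamiliar with these search types, the following is a minimal sketch in Python of the general idea: matching tokens against a word or a regular expression, then narrowing the results by metadata. It does not represent Simtho's interface or implementation. The corpus, the transliterated tokens, the metadata fields, and the `search` function are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical mini-corpus: transliterated tokens plus made-up metadata.
# None of these titles, fields, or values come from Simtho itself.
DOCS = [
    {"title": "Text A", "genre": "poetry", "century": 4,
     "tokens": ["malka", "w", "malkuta", ".", "ktaba"]},
    {"title": "Text B", "genre": "history", "century": 6,
     "tokens": ["ktaba", "d", "malke", "."]},
]

def search(docs, pattern, genre=None, max_century=None):
    """Return (title, token) pairs whose token fully matches the regex."""
    rx = re.compile(pattern)
    hits = []
    for doc in docs:
        # Metadata filters: skip documents that do not qualify.
        if genre is not None and doc["genre"] != genre:
            continue
        if max_century is not None and doc["century"] > max_century:
            continue
        hits.extend((doc["title"], t) for t in doc["tokens"] if rx.fullmatch(t))
    return hits

# Simple search: one exact word, across all documents.
simple_hits = search(DOCS, "ktaba")

# Complex search: a regular expression, restricted to one genre.
regex_hits = search(DOCS, r"malk.*", genre="poetry")
```

In corpus query languages of the CQL family, a comparable token-level query is typically written as `[word="malk.*"]`; the attribute names that Simtho exposes are not stated here, so `word` is an assumption.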
The current beta release is intended to gather feedback from the user community ahead of a larger release scheduled for 2020, which, it is hoped, will include lemmatization and part-of-speech tagging.
The corpus was produced by a team of graduate students who held fellowships in the digital humanities at Beth Mardutho. Work on the current project began in earnest in March 2019 and reached the present beta release within nine months.
The Institute wishes to thank the Herman and Mary K. Seibel Foundation for providing full-year fellowships and the following donors for providing summer fellowships for the digital humanities program:
- The Dr. Khalid and Mrs. Amira Dinno Fellowship
- The Mr. Malak Yunan and Dr. Evelyne Yunan Fellowship
- The Dr. Suhail and Mrs. Luna Zavaro Fellowship
- The Dr. Talal and Mrs. Wesal Findakly Fellowship
- The Edward Y. Hannoush Memorial Fellowship (sponsored by Dr. Peter and Dr. Gretchen Hannoush)
The Institute welcomes feedback on the beta release. Please send comments and feedback to firstname.lastname@example.org.
The Institute acknowledges the following for their work on the project: Johan M. V. Lundberg (University of Cambridge), Shelby Loster (Fuller Theological Seminary), Sebastian P. Brock (University of Oxford, Emeritus), William Clocksin (University of Hertfordshire), Slavomír Čéplö (Austrian Academy of Sciences / Slovak Academy of Sciences), William Bunce (University of Oxford), and Patrick Conlin (Marquette University). The project is directed by George A. Kiraz.
The project’s home page is bethmardutho.org/simtho.
Release of Beth Mardutho’s Qoruyo Project for Syriac OCR & HTR
FOR IMMEDIATE RELEASE: September 18, 2019
Beth Mardutho: The Syriac Institute, Piscataway, NJ
Beth Mardutho: The Syriac Institute (www.bethmardutho.org) is pleased to release Qoruyo, its handwritten-text recognition (HTR) models that permit scholars to convert images of ancient, medieval, and modern manuscripts into searchable texts.
The work was carried out by two of Beth Mardutho’s summer Fellows in the Digital Humanities who, in less than three months, created recognition models for the three Syriac scripts using the Transkribus software. The Estrangela, Serto (West Syriac), and East Syriac models regularly obtained accuracies between 96% and 98%.
The project was conceived by Abigail Pearson (University of Exeter), who held a Work-Study Fellowship at Beth Mardutho for two consecutive years. Last summer Abigail worked with Digital Humanities Fellows Emily Chesley (Princeton University) and Jillian Marcantonio (Duke University) to evaluate Tesseract 4.0, Google’s OCR engine, for Syriac. She remained interested in the subject and returned to Beth Mardutho this summer with the idea of creating models for handwritten manuscripts. Abigail concentrated on building a model for the East Syriac hand.
Kyle Brunner (New York University, Institute for the Study of the Ancient World) was the recipient of the Dr. Talal and Mrs. Wesal Findakly Fellowship in the Digital Humanities for 2019. He joined the project and built two models: one for Estrangela and the other for Serto. Kyle gathered manuscript images of various hands and enhanced the models so that they can recognize texts written in various periods, beginning as early as the sixth century. The models were tested on both printed texts and handwritten manuscripts.
Beth Mardutho is now able to share these models with scholars who wish to apply OCR to printed texts or HTR to handwritten manuscripts. The models are available on our website: http://bethmardutho.org/qoruyo/.
Beth Mardutho: The Syriac Institute (www.bethmardutho.org) is a non-profit educational institution dedicated to the promotion of the Syriac language and its heritage, especially through the digital humanities. The Institute holds annual intensive courses in the Digital Humanities (not Syriac-specific) in January and Syriac language courses in July-August.
Beth Mardutho is supported primarily by annual membership; joining at different levels is available at http://bethmardutho.org/membership/. Those interested in supporting a named fellowship may inquire at http://bethmardutho.org/fellowships/.