TRANSKRIBUS SEMİNER RAPORU

digitalottomanstudies
Nov 3, 2022

TRANSKRIBUS SEMINAR REPORT

*Please scroll down for English

Digital Ottoman Studies Platformu (DOS) olarak 20 Ekim 2022 tarihinde organize ettiğimiz "Transkribus: Osmanlı Türkçesi Yapay Zeka Modeli" başlıklı seminerimizi ilgili bir katılımcı kitlesiyle gerçekleştirdik. Süphan Kırmızıaltın (NYU Abu Dhabi), Elif Derin Can (Marmara Üniversitesi) ve Fatma Aladağ (Universität Leipzig) farklı temalardaki sunumlarıyla birlikte Osmanlı Türkçesi için bir El Yazısı Metin Tanıma (Handwritten Text Recognition-HTR) modelinin oluşturulması üzerine potansiyeller ve sınırları tartıştı. Bu kapsamda çalışmalar yapmak isteyen araştırmacılar için bir yol haritası çizildi ve Osmanlı çalışmaları alanındaki bilim insanlarının bir araya gelerek tartışması gereken hususlar detaylandırıldı.

Seminerin ilk bölümü Fatma Aladağ'ın "neden Osmanlı Türkçesi için bir yapay zeka modeline ihtiyacımız var?" sorusuna odaklandı. Aladağ, Osmanlı Türkçesi'nin büyük verisi olarak adlandırdığı milyonlarca farklı format ve çeşitlilikteki belgeden oluşan arşivler, el yazmaları, soyut somut kültürel miras varlıklarına dikkat çekerek bu mirasın henüz bir bilgi üretiminin parçası olamadığının altını çizdi. Bugün farklı coğrafyalardaki neredeyse kırk ülkenin tarihine ışık tutacak böyle devasa bir malzemenin veriye dönüştürülmesi, geçmiş ve geleceğe dair söyleyeceklerimizi derinden etkileyebilecek güce sahip diyen Aladağ, bilgisayarların desteği olmadan bunun mümkün olamayacağını ifade etti. Aladağ, bu noktada bu büyük mirası bilgisayarlar ile konuşturmak ve onların anlayabileceği bir formata dönüştürmek için bir yapay zeka modelinin oluşturulması konusunun ne derece önemli olduğunu vurguladı. Sunumda ayrıca bu alanda çalışmalar yapan ticari ve akademik projeler de tanıtılarak matbu kaynaklar için mevcut HTR modelleri tartışıldı. DOS Platformunun bir parçası olarak Elif Derin Can ve Fatma Aladağ tarafından yürütülen HTR-Sicil Projesi, el yazması kadı sicilleri üzerine odaklanan ilk HTR modeli oluşturma girişimi olarak sunumda yer aldı.

Seminerin ikinci kısmında Süphan Kırmızıaltın Transkribus'un tanıtımını yaparak, altyapının detaylarından bahsetti. Kırmızıaltın, özellikle yapay zeka tabanlı diğer platformlar açısından Osmanlı Türkçesi HTR modeli oluşturmak için neden Transkribus'un tercih edilebileceği ve avantajlarını aktardı. Oldukça pahalı altyapılara ihtiyaç duyan yapay zeka sistemleri için araştırmacılara hazır bir çalışma ortamı sunan Transkribus'un projelerin sürdürülebilirliği için önemli bir potansiyele sahip olduğu vurgulandı. Yapay zeka destekli metin tanıma, transkripsiyon ve tarihi belgelerin kodlanması için kapsamlı bir platform olan Transkribus'un bu işlemler için talep ettiği düşük miktardaki ücretin ise altyapının bakımı ve güncellenmesi için kullanıldığı ifade edildi. Platformun bir başka önemli avantajı olarak ise oluşturulan HTR modellerin paylaşılması için açık erişim politikasının desteklenmesi tartışıldı. Dijital Beşeri Bilimler'in önemli bir misyonu olarak bilginin demokratikleştirilmesi kapsamında değerlendirilen açık erişim altyapısı, Osmanlı Türkçesi belgelerin herkes tarafından kullanılabilir ve okunabilir hale gelmesi için de bir fırsat olarak değerlendirildi.

Sunumun üçüncü ve son kısmında ise Elif Derin Can, Osmanlı Türkçesi için HTR modeli oluşturmanın sınırlarından ve yapılması gereken süreçlerden bahsetti. Mevcut teknolojik altyapılara göre bir arşiv belgesini yapay zeka ile okunabilir hale getirmek için hangi aşamaların gerçekleşmesi gerektiği detaylandırıldı. Can'a göre bu süreçte karşılaşılan en büyük zorluklardan biri Osmanlı Türkçesi metinlerin transkripsiyon modeli için Osmanlıca çalışanlar arasında bir konsensüsün olmamasıdır. Bu nedenle hangi transkripsiyon modeline göre HTR modelinin oluşturulacağı hususu gündeme alınmalıdır. Can mevcut altyapılarda karşılaşılan bir diğer zorluk olarak ise arşivlerin üstveri ve indeks sistemlerinin standart olmamasının altını çizdi. Can ayrıca arşiv belgelerinin çözünürlük kalitesinin yapay zeka modeli başarı oranını etkileyen bir diğer önemli faktör olarak üzerinde durulması ve iyileştirilmesi için adım atılması gereken bir husus olduğunu vurguladı.

Digital Ottoman Studies Platformu'nun bir sonraki etkinliği olarak katılımcıların isteği üzerine uygulamalı Transkribus eğitiminin organize edilmesi gündeme geldi.

As the Digital Ottoman Studies Platform (DOS), we held our seminar "Transkribus: The Artificial Intelligence Model of Ottoman Turkish" on October 20, 2022, with an engaged audience. Süphan Kırmızıaltın (NYU Abu Dhabi), Elif Derin Can (Marmara University), and Fatma Aladağ (Universität Leipzig) each presented on a different theme and discussed the potential and limits of creating a Handwritten Text Recognition (HTR) model for Ottoman Turkish. The seminar laid out a road map for researchers who want to work in this field and detailed the issues that scholars of Ottoman studies need to come together to discuss.

The first part of the seminar focused on Fatma Aladağ's question, "Why do we need an artificial intelligence-based HTR model for Ottoman Turkish?" Aladağ drew attention to the archives, manuscripts, and tangible and intangible cultural heritage assets, comprising millions of documents in different formats and varieties, which she calls the big data of Ottoman Turkish, and underlined that this heritage has not yet become part of knowledge production. She said that turning such a vast body of material, which can shed light on the history of almost forty countries across different geographies, into data has the power to deeply affect what we say about the past and the future, and that this is not possible without the support of computers. At this point, she emphasized how important it is to create an artificial intelligence model so that computers can engage with this heritage and it can be transformed into a machine-readable format. The presentation also introduced commercial and academic projects working in this area and discussed existing HTR models for printed sources. The HTR-Sicil Project, conducted by Elif Derin Can and Fatma Aladağ as part of the DOS Platform, was presented as an attempt to create the first HTR model focusing on handwritten kadı registers.

In the second part of the seminar, Süphan Kırmızıaltın introduced Transkribus and described the details of its infrastructure. Kırmızıaltın explained why Transkribus may be preferred for creating an Ottoman Turkish HTR model and what advantages it offers, especially in comparison with other artificial intelligence-based platforms. Because artificial intelligence systems require very expensive infrastructure, it was emphasized that Transkribus, which offers researchers a ready-made working environment, has significant potential for the sustainability of projects. Transkribus is a comprehensive platform for artificial intelligence-supported text recognition, transcription, and coding of historical documents, and it was stated that the low fee it charges for these services is used to maintain and update the infrastructure. Another important advantage discussed was the platform's support for an open access policy for sharing the HTR models that users create. Considered part of the democratization of knowledge, an important mission of the Digital Humanities, the open access infrastructure was also seen as an opportunity to make Ottoman Turkish documents accessible and readable by everyone.

In the third and last part of the presentation, Elif Derin Can talked about the limits of creating an HTR model for Ottoman Turkish and the processes that institutions and scholars need to carry out. She detailed the steps required, given the existing technological infrastructure, to make an archival document readable by artificial intelligence. According to Can, one of the biggest difficulties in this process is the lack of consensus among those working on Ottoman Turkish about which transcription model to use for texts. For this reason, the question of which transcription model the HTR model should be built on needs to be put on the agenda. Can underlined that another difficulty is that the metadata and index systems of archives are not standardized. She also emphasized that the resolution quality of archival documents is another important factor affecting the success rate of the HTR model, and that it deserves attention and concrete steps toward improvement.

At the participants' request, a hands-on Transkribus training was proposed as the next event of the Digital Ottoman Studies Platform.
