Digitalisation of Ottoman Turkish with Transkribus

ederincan
Oct 23, 2023

Updated: Jan 28, 2025

Author: Elif Derin, PhD Student, FSMV University History Department

Transkribus is a comprehensive artificial intelligence (AI)-assisted platform for handwritten text recognition (HTR), automatic transcription, and thematic tagging of historical documents. Started in 2013 with European Union funding under the name tranScriptorium, the project turned into a cooperative community in 2019. The European cooperative READ-COOP SCE is responsible for sustaining and updating the Transkribus platform. Today, the community has 135 members, both individuals and institutions, from 35 different countries, and the platform has more than one hundred thousand users. As a result, an AI infrastructure that would be too costly for individual academics to build on their own is developed in a sustainable environment with the support of an international community and its institutions.

Besides automatic transcription, the platform also offers tools for digitizing documents, training AI models, collecting and processing data, and publishing studies. Within Transkribus you can automatically transcribe manuscript and printed texts, train transcription models for different kinds of documents, scan documents, tag texts by structure and content, and export the resulting data in different formats (such as TEI, TXT, PDF, and Word). Notably, the platform is web-based, so it is accessible online and well suited to collaborative work.

In June 2023, Süphan Kırmızıaltın (NYU Abu Dhabi), Fatma Aladağ (Universität Leipzig), and Elif Derin (FSMV University) made the first automatic transcription model for printed Ottoman Turkish available as open access on the Transkribus platform. Detailed information about this HTR model and other Ottoman Turkish digitalization efforts can be found on the Digital Ottoman Corpora website. In this article, we show how to use this HTR model for printed texts at a basic level.

A Practice for Ottoman Turkish in Transkribus:

Start by registering on the platform. Membership is free of charge, and you are given 500 credits to begin with. Credit usage differs for printed and manuscript documents: 1 credit transcribes 1 page with a manuscript model, while 1 credit transcribes 6 pages with a printed model. Page analysis and other work you do on the platform before automatic transcription, such as preparing your own model, does not require any credits.
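To plan how far the starting balance goes, the credit arithmetic above can be sketched in a few lines of Python. This is only an illustration: the rates are the ones quoted in this article and may change, the function names are hypothetical, and rounding partial credits up to a whole credit is an assumption rather than documented Transkribus billing behaviour.

```python
import math

# Rates as quoted in this article (assumption: they may change over time).
PAGES_PER_CREDIT = {"manuscript": 1, "printed": 6}
STARTING_CREDITS = 500


def credits_needed(pages: int, kind: str) -> int:
    """Estimate the credits needed to transcribe `pages` pages.

    Assumption: a partially used credit is rounded up to a whole credit.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return math.ceil(pages / PAGES_PER_CREDIT[kind])


def pages_covered(credits: int, kind: str) -> int:
    """Estimate how many pages a credit balance can cover."""
    return credits * PAGES_PER_CREDIT[kind]


if __name__ == "__main__":
    print(credits_needed(120, "printed"))            # 20
    print(pages_covered(STARTING_CREDITS, "printed"))     # 3000
    print(pages_covered(STARTING_CREDITS, "manuscript"))  # 500
```

Under these assumptions, the 500 starting credits would cover roughly 3,000 printed pages or 500 manuscript pages.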

Once you have registered by entering your information in the Register section, you can create your own collection by clicking Collections at the top right.

Transkribus automatically assigns an ID number to each collection you create. Even so, your collections will multiply as your work progresses, so giving each one a distinctive name when you create it will make your work more manageable.

Once you've named the collection, create it by clicking the Create button. You can then upload files of any size, in image or document format. Inside the collection, add your documents with the Upload Document or Upload Files buttons. When uploading, choose between Image (image files with a .jpg extension) and PDF, depending on your file type. Select the document or file on your computer and add it to the collection by clicking the Submit button.

After uploading your documents to the collection, you have the option to perform automatic transcription in a single step. However, for documents with complex page layouts, such as multi-column newspapers, it is more efficient to begin with a Layout Analysis so that reading zones and lines are accurately determined before automatic transcription. To run either analysis, click the "T" icon in the lower-left corner of the collection you want to work on.

The "T" icon takes you to the page where you can select the type of analysis. If you want to transcribe directly, click on Text Recognition. If you want it to first determine the page and line layout, click on Layout and select the appropriate model. Several different models are available in this section, depending on your document's text layout. For straightforward page layouts, choose the Universal Lines model. For more complex, multi-column, or mixed layout documents, opt for the Mixed Line Orientation model. After selecting the model, click the Start Recognition button in the upper-right corner of the page.

If you've performed a Layout Analysis first, verify the order of rows and columns, the integrity of the lines, and whether each detected line covers the full line of text. You can then proceed to transcription by clicking the "T" icon and selecting Text Recognition once more. When you click the Text Recognition button, the Transkribus platform provides a list of publicly available HTR models that you can use. Select the OttomanTurkish_Print_1 model from this list and complete the process by clicking Start Recognition again.

You can monitor the progress of all your work on the platform from the Jobs tab in the upper-right corner of the homepage. Layout Analysis is completed quickly, while Text Recognition may take a bit longer depending on the host computer's performance and your internet connection. When the job status reads Finished, you can return to your document and review your automatic transcription.

After the automatic transcription process is finished, you can review the transcription and make necessary corrections, bearing in mind that the model has an accuracy of 92.8%. As shown in the image below, you can manually correct or add parts that may be faded in the original document.
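To check how well the model performs on your own pages, you can compare a corrected page against the raw automatic output. The sketch below assumes that the accuracy figure is a character-level accuracy, that is, 1 minus the character error rate (CER); the article does not state how the 92.8% was measured, and the function names here are hypothetical. It uses a plain Levenshtein edit distance and no external libraries.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the insertions, deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - CER, where CER = edit distance / length of the reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 1.0 - levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    corrected = "kitab"   # your manually corrected text (the reference)
    automatic = "kitap"   # the raw model output (the hypothesis)
    print(f"{character_accuracy(corrected, automatic):.1%}")  # 80.0%
```

Because the comparison works on Unicode code points, the same functions can be applied to text in Arabic script. Measured on a handful of corrected pages, a figure like this gives a rough indication of whether the public model is adequate or whether retraining is worthwhile.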

As mentioned in the introduction, there is currently only one HTR model for printed Ottoman Turkish available for general use. If your documents differ from the material the model was trained on, the accuracy rate may be lower. In such cases, you can use the model as a starting point for your own documents: after making the necessary corrections to the transcription, you can retrain the model on your documents and transcription conventions. For documents that differ significantly from printed texts, such as manuscripts, you can create your own model from scratch. This is a somewhat more time-consuming process that involves manually entering the transcription after running the Layout Analysis on your documents, but it will streamline and speed up your work in the long run.

After completing your transcription, you can tag thematic data such as persons, events, places, and dates in the text for content analysis. You can export your transcriptions in various formats and analyze the data according to your research topic and methodology. You can also use these outputs as a dataset for NLP and other text-mining methods. Another notable feature of the platform is the Read & Search section, where you can digitally publish your documents. You can explore other digital edition projects prepared with Transkribus and publish your own digital editions here.
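As a first text-mining step, a TXT export can be turned into a word-frequency list. The following is a minimal sketch, not a full NLP pipeline: the file name in the usage note is hypothetical, tokens are split on whitespace only, and punctuation is removed by Unicode category so that Arabic-script marks such as "،" are handled as well.

```python
import unicodedata
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split on whitespace and drop every punctuation character from each token."""
    tokens = []
    for raw in text.split():
        token = "".join(
            ch for ch in raw
            if not unicodedata.category(ch).startswith("P")
        )
        if token:
            tokens.append(token)
    return tokens


def word_frequencies(text: str) -> Counter:
    """Count how often each token occurs in the text."""
    return Counter(tokenize(text))


if __name__ == "__main__":
    sample = "كتاب، قلم كتاب"
    for word, count in word_frequencies(sample).most_common():
        print(word, count)
```

To run this on your own export, read the file first, for example with `open("my_export.txt", encoding="utf-8").read()` (a hypothetical file name), and pass the resulting string to `word_frequencies`. Note that this simple approach also strips hyphens inside tokens, so transliterated forms containing hyphens would need different handling.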
