The Use of Generative Artificial Intelligence in the Digitisation of Printed and Manuscript Documents and Its Contribution to Historical and Archival Education

Authors

  • Klára Rybenská, Filozofická fakulta, Univerzita Hradec Králové
  • Sylva Sklenářová, Univerzita Hradec Králové

DOI:

https://doi.org/10.14712/23362189.2025.4918

Keywords:

generative artificial intelligence, large language models (LLMs), handwritten texts, HTR (handwritten text recognition), archival science, historical research, manuscript transcription, AI ethics, digital humanities, AI in education

Abstract

Objectives: This study examines how contemporary generative language models can support archival and historical work with Czech handwritten texts, focusing on transcription and basic preliminary analysis, and it outlines key limitations and ethical implications for educational use in archival science and digital humanities.

Methods: A qualitative case study was conducted using seven modern personal handwritten Czech texts from the 1980s and 1990s (a poem written by a child, love poems, a school test, and study notes). Three widely available tools in their free versions (ChatGPT, Claude, and Copilot) were tested using identical task instructions. Outputs were comparatively evaluated with regard to transcription accuracy, content and stylistic interpretation, and recognition of selected formal features of the texts. The empirical comparison was complemented by a critical review of relevant scholarly literature and reflection on authenticity, data integrity, epistemic security, and personal data protection.

Results: Claude achieved the best overall performance, followed by ChatGPT, while Copilot produced substantially weaker results in the tested tasks. Across tools, interpretation and analysis proved more challenging than transcription, and outputs included errors and over-interpretations that require expert verification.

Conclusions: Generative language models can function as supportive tools for transcription, preliminary analysis, and didactic work, but they cannot replace professional archival or historical expertise. Responsible use requires critical human supervision and explicit attention to ethical and data-protection considerations.

Published

2026-02-17