The Use of Generative Artificial Intelligence in the Digitisation of Printed and Manuscript Documents and Its Contribution to Historical and Archival Education
DOI:
https://doi.org/10.14712/23362189.2025.4918
Keywords:
generative artificial intelligence, large language models (LLMs), handwritten texts, HTR (handwritten text recognition), archival science, historical research, manuscript transcription, AI ethics, digital humanities, AI in education
Abstract
Objectives: This study examines how contemporary generative language models can support archival and historical work with Czech handwritten texts, focusing on transcription and basic preliminary analysis, and it outlines key limitations and ethical implications for educational use in archival science and digital humanities.
Methods: A qualitative case study was conducted using seven modern personal handwritten Czech texts from the 1980s and 1990s (a poem written by a child, love poems, a school test, and study notes). Three widely available tools in their free versions (ChatGPT, Claude, and Copilot) were tested using identical task instructions. Outputs were comparatively evaluated with regard to transcription accuracy, content and stylistic interpretation, and recognition of selected formal features of the texts. The empirical comparison was complemented by a critical review of relevant scholarly literature and reflection on authenticity, data integrity, epistemic security, and personal data protection.
Results: Claude achieved the best overall performance, followed by ChatGPT, while Copilot produced substantially weaker results in the tested tasks. Across tools, interpretation and analysis proved more challenging than transcription, and outputs included errors and over-interpretations that require expert verification.
Conclusions: Generative language models can function as supportive tools for transcription, preliminary analysis, and didactic work, but they cannot replace professional archival or historical expertise. Responsible use requires critical human supervision and explicit attention to ethical and data-protection considerations.
