CoToHiLi: Computational Tools for Historical Linguistics

Project PN-III-P4-ID-PCE-2020-1544, funded by the Romanian National Authority for Scientific Research and Innovation, UEFISCDI: “Dezvoltarea de sisteme automate suport pentru lingvistica istorică”.

Abstract

This project proposes a computational framework for historical linguistics (“Computational Tools for Historical Linguistics” – CoToHiLi). The general purpose of the CoToHiLi project is to integrate expert knowledge and computational power to address the following topics: cognate identification, cognate-borrowing discrimination, Latin protoword reconstruction, and semantic divergence. The goal of the project is twofold: 1) to automate certain parts of the traditional workflow of the comparative method (such as the collection and selection of valid data, the initial pre-processing, or the automatic alignment based on predefined or inferred rules), and 2) to bring new insights or avenues of investigation that might not be easily accessible otherwise (for example, the automatic identification of patterns and regularities in large amounts of data). The project focuses on the Romance languages and will provide tools for the core Romance group: Romanian, Italian, French, Spanish, and Portuguese, together with their parent language, Latin. Nonetheless, we envision that the methodologies and computational tools proposed by the CoToHiLi project will also serve as a basis for further development for other comparable language families, including less studied languages with scarce resources available.

Principal investigator

Members

Project objective for 2021: Related word analysis

To achieve this goal, the following activities were planned and executed:

Activity 1.1: Analysis and inspection of existing cognate resources in Romance languages (Ro, It, Es, Fr, Pt)

Activity 1.2: Design and construction of the database of cognate pairs for Romance languages

Activity 1.3: Analysis, design and development of computer-assisted tools for detecting cognate pairs

Activity 1.4: Analysis and inspection of borrowed word resources and their harmonization

Articles

  1. Alina Maria Cristea, Anca Dinu, Liviu P. Dinu, Simona Georgescu, Ana Sabina Uban, Laurențiu Zoicaș, 2022. CoToHiLi at LSCDiscovery: the Role of Linguistic Features in Predicting Semantic Change. In Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change (LChange @ ACL 2022), pages 187-192, May 26-27, 2022, Dublin, Ireland. [PDF]
  2. Alina Maria Cristea, Anca Dinu, Liviu P. Dinu, Simona Georgescu, Ana Uban, Laurențiu Zoicaș, 2022. CoToHiLi: Computational Tools for Historical Linguistics. In Proceedings of the 38th Annual Conference of the Spanish Association for Natural Language Processing: Projects and Demonstrations (SEPLN-PD 2022 @ SEPLN 2022), pages 31-34, September 21-23, 2022, A Coruña, Spain. [PDF]
  3. Alina Maria Cristea, Anca Dinu, Liviu P. Dinu, Simona Georgescu, Ana Sabina Uban, Laurențiu Zoicaș, 2022. A semantic change time-lapse for Romance languages and English. In Proceedings of the 25th International Conference on Historical Linguistics (ICHL25), August 1-5, 2022, Oxford, UK.
  4. Alina Maria Cristea, Anca Dinu, Liviu P. Dinu, Simona Georgescu, Ana Sabina Uban, Laurențiu Zoicaș, 2022. Computational approaches for protoword reconstruction. In Proceedings of the 25th International Conference on Historical Linguistics (ICHL25), August 1-5, 2022, Oxford, UK.
  5. Sergiu Nisioi, Ana Sabina Uban and Liviu P. Dinu, 2022. Identifying Source-language Dialects in Translation. Mathematics 2022, Special Issue on Natural Language Processing (NLP) and Machine Learning (ML) - Theory and Applications.
  6. Alina Maria Cristea, Liviu P. Dinu, Simona Georgescu, Mihnea-Lucian Mihai, Ana Sabina Uban, 2021. Automatic Discrimination between Inherited and Borrowed Latin Words in Romance Languages. In Proceedings of 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings 2021), pages 2845–2855, Dominican Republic. [PDF]
  7. Simona Georgescu, Alina Maria Cristea, Anca Dinu, Liviu P. Dinu, Ana Sabina Uban, Laurențiu Zoicaș, 2021. Herramientas computacionales para el análisis del léxico de origen latino en inglés y en las lenguas románicas. In Proceedings of the Congreso Internacional “Ciencia, Tecnología y Lenguajes”, Universidad Complutense de Madrid, July 1-2, 2021.
  8. Liviu P. Dinu, Ioan-Bogdan Iordache, Ana Sabina Uban, Marcos Zampieri, 2021. A Computational Exploration of Pejorative Language in Social Media. In Proceedings of 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings 2021), Dominican Republic. [PDF]
  9. Alina Maria Cristea, Anca Dinu, Liviu P. Dinu, Simona Georgescu, Ana Sabina Uban, Laurențiu Zoicaș, 2021. Towards an Etymological Map of Romanian. In Proceedings of Recent Advances in Natural Language Processing (RANLP 2021), pages 315-324, September 1–3, 2021. [PDF]
  10. Anca Dinu, Andreea-Codrina Moldovan, 2021. Automatic Detection and Classification of Mental Illnesses from General Social Media Texts. In Proceedings of Recent Advances in Natural Language Processing (RANLP 2021), pages 358–366, September 1–3, 2021.
  11. Ana Sabina Uban, Alina Maria Cristea, Anca Dinu, Liviu P. Dinu, Simona Georgescu, Laurențiu Zoicaș, 2021. Tracking Semantic Change in Cognate Sets for English and Romance Languages. In Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change (LChange @ ACL-IJCNLP 2021), pages 64–74, Bangkok, Thailand (online). [PDF]
  12. Ana Sabina Uban, Cornelia Caragea, Liviu Dinu, 2021. Studying the Evolution of Scientific Topics and their Relationships. In Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Findings (ACL-IJCNLP Findings 2021), Bangkok, Thailand (online). [PDF]
  13. Ana Uban, Liviu P. Dinu, 2020. Automatically Building a Multilingual Lexicon of False Friends With No Supervision.* In Proceedings of LREC 2020. [PDF]
  14. Alina Maria Ciobanu, Liviu P. Dinu, Laurențiu Zoicaș, 2020. Automatic Reconstruction of Missing Romanian Cognates and Unattested Latin Words.* In Proceedings of LREC 2020. [PDF]
  15. Alina Maria Ciobanu, Liviu P. Dinu, 2019. Automatic Identification and Production of Related Words for Historical Linguistics.* In Computational Linguistics, 45(4), 667–704.
  16. Ana Uban, Alina Maria Ciobanu, Liviu P. Dinu, 2019. Studying Laws of Semantic Divergence across Languages using Cognate Sets.* In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change (LChange @ ACL 2019). [PDF]
  17. Ana Uban, Alina Maria Ciobanu, Liviu P. Dinu, 2019. A Computational Approach to Measuring the Semantic Divergence of Cognates.* In Proceedings of CICLING 2019.
  18. Alina Maria Ciobanu, Liviu P. Dinu, 2018. Ab Initio: Automatic Latin Proto-word Reconstruction.* In Proceedings of COLING 2018, 1604-1614. [PDF]
  19. Alina Maria Ciobanu, Liviu P. Dinu, 2015. Automatic Discrimination between Cognates and Borrowings.* In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL (2) 2015), pages 431-437, July 26-31, 2015, Beijing, China. [PDF]
  20. Alina Maria Ciobanu, Liviu P. Dinu, 2014. An Etymological Approach to Cross-Language Orthographic Similarity. Application on Romanian.* In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), pages 1047-1058, October 25–29, 2014, Doha, Qatar. [PDF]
  21. Alina Maria Ciobanu, Liviu P. Dinu, 2014. Automatic Detection of Cognates Using Orthographic Alignment.* In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL (2) 2014), pages 99-105, June 22-27, 2014, Baltimore, MD, USA. [PDF]
  22. Alina Maria Ciobanu, Liviu P. Dinu, 2014. Building a Dataset of Multilingual Cognates for the Romanian Lexicon.* In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), pages 1038-1043, May 26-31 2014, Reykjavik, Iceland. [PDF]
  23. Alina Maria Ciobanu, Liviu P. Dinu, 2013. A Dictionary-Based Approach for Evaluating Orthographic Methods in Cognates Identification.* In Proceedings of Recent Advances in Natural Language Processing (RANLP 2013), pages 141–147, September 7-13, 2013, Hissar, Bulgaria. [PDF]

*Published before the beginning of the project

Chapters in Books

  1. Ana Uban, Alina Maria Ciobanu, Liviu P. Dinu, 2021. Cross-lingual laws of semantic change. In Nina Tahmasebi, Lars Borin, Adam Jatowt, Yang Xu, Simon Hengchen, editors, Computational Approaches to Semantic Change. Berlin: Language Science Press, pages 219-260.
  2. Alina Maria Cristea, Anca Dinu, Liviu P. Dinu, Simona Georgescu, Ana Uban, 2021. Computer-assisted methods in historical linguistics. In Patrimoniul în era digitală.

Books

  1. Simona Georgescu, 2021. La regularidad en el cambio semántico. Las onomatopeyas en cuanto centros de expansión en las lenguas románicas. Editions de linguistique et de philologie, Strasbourg.

Talks

  1. Computational Tools in Historical Linguistics for cognate detection, borrowing discrimination and protoword reconstruction. Cardamom Seminar, National University of Ireland Galway, October 31, 2022.
  2. On the Romance languages similarity: a syllabic-based approach. Programa de Doctorado de Sistemas Inteligentes, UNED, Madrid, Spain, June 13, 2022.
  3. An old-fashion investigator. Interdisciplinary School of Doctoral Studies, University of Bucharest, March 17, 2022.
  4. Marcus și schimbarea stilistică. Universitatea Apolonia, Iași, March 1, 2022.
  5. Liviu P. Dinu, 2021. Are computational approaches viable solutions for borrowing and semantic change problems? Invited talk, Working group Language variation, interaction, pragmatics, Language In The Human-Machine Era, Online, October 20, 2021.
  6. Simona Georgescu, 2021. Ce pot învăța lingviștii de la computere și computerele de la lingviști? Colocviul Internațional Discurs critic și variație lingvistică, “Abordări inter- și transdisciplinare ale trecutului și prezentului”, Universitatea din Suceava, July 8-9, 2021.
  7. Simona Georgescu, 2021. Herramientas computacionales para la lingüística histórica. Congreso Internacional “Ciencia, Tecnología y Lenguajes”, Universidad Complutense de Madrid, July 1-2, 2021.
  8. Liviu P. Dinu, 2021. Marcus și timpurile sale. Facultatea de sociologie, Universitatea din Bucuresti, seria “Conceptualizări ale timpului în practica cercetării științifice. Dialoguri interdisciplinare”, June 10, 2021.
  9. Liviu P. Dinu, 2021. EthicAI, Goethe-Institut Bulgaria, EthicAI Linguistics workshop, June 8, 2021.
  10. Liviu P. Dinu, 2021. Etica si lingvistica computationala. Comisia Naționala a României pentru UNESCO, June 3, 2021.
  11. Liviu P. Dinu, 2021. Timpul și cuvintele. University of Bucharest, seria “Conceptualizări ale timpului în practica cercetării științifice. Dialoguri interdisciplinare”, May 13, 2021.
  12. Liviu P. Dinu, 2021. Cu un kil de carne de vacă nu mori de foame, cu un litru de vin nu mori de sete. Interdisciplinary School of Doctoral Studies, University of Bucharest, March 4, 2021.
  13. Liviu P. Dinu, 2021. From Classical to Computational Approaches in Historical Linguistics. Universitatea Apolonia, Iași, March 1, 2021.
  14. Liviu P. Dinu, 2021. O analiză computațională a discursului politic în Parlamentul European. Universitatea Apolonia, Iași, March 1, 2021.