Daniil Alexeevsky, a doctoral student in Philology, presented the final part of his thesis on the development of a large electronic lexical database of the Russian language, similar to Princeton’s WordNet.
Resources similar to Princeton’s WordNet are widely used in automatic text processing, both for tasks that involve determining the semantic similarity of words and for automatic translation. Although these resources are in clear demand, there is currently no open-access Russian-language thesaurus that meets Princeton WordNet’s standards.
Daniil Alexeevsky has developed a chain of programmes that process dictionaries in order to encode ‘super-subordinate’ relations between words (also called hyperonymy, hyponymy or the ISA relation), which is also how WordNet is organised. The programmes correctly identify the general term in a word’s dictionary interpretation to an accuracy of 85%, significantly higher than in similar published works, although disambiguation (determining which sense of the general term is meant) requires further improvement.
However, for some noun classes disambiguation already works successfully. For example, the terms for musical instruments and for technical tools and devices are correctly extracted and separated in the dictionary.
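To illustrate the idea of reading a general term out of a dictionary interpretation, here is a minimal Python sketch. It is not the thesis pipeline: the tiny English dictionary, the `SKIP` word list and the function names `extract_hypernym` and `build_isa_pairs` are all hypothetical, and the "first word that is not skipped" heuristic merely stands in for real linguistic analysis such as part-of-speech tagging.

```python
import re

# Hypothetical toy dictionary: headword -> interpretation (definition) text.
DEFINITIONS = {
    "violin": "a stringed instrument played with a bow",
    "hammer": "a tool with a heavy head used for driving nails",
    "oak": "a large tree that produces acorns",
}

# Assumed stand-in for part-of-speech tagging: articles and a few
# adjectives that may precede the general term in a definition.
SKIP = {"a", "an", "the", "large", "small", "stringed", "heavy"}


def extract_hypernym(definition):
    """Return the first word of the definition that is not in SKIP."""
    for token in re.findall(r"[a-z]+", definition.lower()):
        if token not in SKIP:
            return token
    return None  # no candidate found


def build_isa_pairs(definitions):
    """Map each headword to the general term found in its definition."""
    return {head: extract_hypernym(text) for head, text in definitions.items()}


if __name__ == "__main__":
    for word, general_term in build_isa_pairs(DEFINITIONS).items():
        print(f"{word} ISA {general_term}")
```

The sketch stops where the hard part begins: it yields the word "instrument" for "violin", but says nothing about which sense of "instrument" is meant.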
Daniil plans to improve disambiguation using Word2Vec, and then analyze and compare the results of processing several dictionaries.
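One way word vectors could help with disambiguation is to compare the words in a headword's interpretation with the words describing each candidate sense of the general term, and pick the closest sense. The sketch below shows that comparison under strong assumptions: the three-dimensional vectors are hand-made stand-ins for trained Word2Vec embeddings, and the sense labels, glosses and the function `pick_sense` are hypothetical. It is not a description of the planned method.

```python
import math

# Hand-made toy vectors standing in for trained Word2Vec embeddings (assumption).
VECTORS = {
    "music": (1.0, 0.0, 0.0),
    "bow": (0.9, 0.1, 0.0),
    "sound": (0.8, 0.2, 0.0),
    "work": (0.0, 1.0, 0.0),
    "device": (0.1, 0.9, 0.0),
    "measure": (0.0, 0.8, 0.2),
}

# Hypothetical senses of the general term "instrument", each with gloss words.
SENSES = {
    "instrument/music": ["device", "music", "sound"],
    "instrument/tool": ["device", "work", "measure"],
}


def mean_vector(words):
    """Average the vectors of the known words; None if no word is known."""
    known = [VECTORS[w] for w in words if w in VECTORS]
    if not known:
        return None
    return tuple(sum(column) / len(known) for column in zip(*known))


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pick_sense(context_words, sense_glosses):
    """Return the sense whose gloss is most similar to the context words."""
    context = mean_vector(context_words)
    if context is None:
        return None
    best_sense, best_score = None, -1.0
    for sense_id, gloss_words in sense_glosses.items():
        gloss = mean_vector(gloss_words)
        if gloss is None:
            continue
        score = cosine(context, gloss)
        if score > best_score:
            best_sense, best_score = sense_id, score
    return best_sense


if __name__ == "__main__":
    # Context words taken from a hypothetical interpretation of "violin".
    print(pick_sense(["bow", "music"], SENSES))
```

With real embeddings the vectors would come from a model trained on a large corpus, and the quality of the result would depend on that corpus and on how the senses are described.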