Abstract
Big data technology has proven effective in fields such as management, medicine, and logistics, but its application in linguistics has been fragmented. Several articles have been published on the non-automated use of this technology in etymology and comparative linguistics. Based on their results, a constructive approach to word etymologization has been developed that differs from the traditional one. Its premise is that most languages have a body of core-vocabulary words that contain, in addition to traditional elements such as roots, affixes, and inflections, other structural components: constructs, determinatives, and negation formants. The combination of these components can be referred to as pramorphology. The goal of this article is to justify and demonstrate the possibilities of using big data technology in etymology and comparative linguistics. It also aims to develop the theoretical prerequisites for a new type of etymological dictionary that, based on the constructive approach, determines the etymological meaning of a specific word across multiple languages. The article illustrates how techniques and methods of big data analysis that are established in other scientific fields, such as data mining, modeling, and statistics, can make etymological research more efficient. In doing so, the study demonstrates that different methods of big data analysis can be applied in linguistics.