ISSN 0236-235X (P)
ISSN 2311-2735 (E)


Next issue: №4, publication date: 16 December 2020

Journal articles №3 2020

21. Text analysis method for the tariff classification of goods in customs [№3 2020]
Authors: E.V. Zhiryaeva (Zhiryaeva-ev@ranepa.ru) - North-West Institute of Management of the Russian Presidential Academy of National Economy and Public Administration (Associate Professor), Ph.D; V.N. Naumov (naumov122@list.ru) - North-West Institute of Management of the Russian Presidential Academy of National Economy and Public Administration (Professor), Ph.D
Abstract: The use of artificial intelligence in customs administration is the most important practical sphere of the digital transformation of socio-economic systems. The paper considers one particular problem in this area: choosing a product code under the tariff classification based on declarations written in Russian. The object of study was a voltage stabilizer. Manual analysis of customs declarations using keywords in the product description showed the need for machine learning methods. For this purpose, 1005 customs declarations filed under three commodity items were analyzed, and the three items were treated as three classes in a classification problem. The Orange platform, launched from Anaconda Navigator, made it possible to apply visual design methods to build a workflow diagram for solving the problem. The diagram includes a pre-processing stage, at which word clouds and a bag of words were built and a data set was formed whose columns are lemmas and whose rows are individual declarations. To reduce the dimension of the problem, filtering and the removal of n-grams and stop words were applied. The resulting data set makes it possible to choose the best classifier in terms of accuracy, specificity, and sensitivity, as well as by the confusion matrix and the area under the ROC curve (AUC). Training and test samples, as well as cross-validation, were used. The best classifier across the indicators analyzed was the one based on logistic regression, whose equation made it possible to determine the most important lemmas for solving the classification problem. Since the complexity of the problem depends on the number of classes to be identified, it is advisable to use specific classifiers for a small number of classes, including them in information-analytical systems along with accounting systems, databases of customs declarations, request-response systems, and others.
Keywords: customs administration, digital technologies, AI methods, classification errors, classification methods, text mining, harmonized system (HS), commodity nomenclature
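The workflow described in the abstract (stop-word removal, a bag of words, a logistic regression classifier, and reading the most important terms from its coefficients) can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' Orange workflow: the six English descriptions, the class labels `class_a`, `class_b`, `class_c`, the stop-word list, and all function names are hypothetical, lemmatization is skipped, and cross-validation is omitted for brevity.

```python
import math
from collections import Counter

# Hypothetical toy data; the study itself used 1005 Russian-language declarations.
DOCS = [
    ("voltage stabilizer for household network", "class_a"),
    ("automatic voltage stabilizer relay type", "class_a"),
    ("uninterruptible power supply with battery", "class_b"),
    ("power supply unit with battery backup", "class_b"),
    ("transformer for low power lighting", "class_c"),
    ("step down transformer low power", "class_c"),
]
STOP_WORDS = {"for", "with", "the", "of"}  # assumed stop-word list
CLASSES = sorted({label for _, label in DOCS})


def tokenize(text):
    """Lowercase, split on whitespace, drop stop words (no lemmatization)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def vectorize(text, vocab):
    """Bag of words: one count per vocabulary term."""
    counts = Counter(tokenize(text))
    return [float(counts[w]) for w in vocab]


def scores(x, W, b):
    """Linear score of each class for feature vector x."""
    return [sum(w * v for w, v in zip(W[k], x)) + b[k] for k in range(len(W))]


def softmax(s):
    m = max(s)
    exps = [math.exp(v - m) for v in s]
    total = sum(exps)
    return [e / total for e in exps]


def train(X, y, n_classes, epochs=200, lr=0.1):
    """Multinomial logistic regression fitted by stochastic gradient descent."""
    W = [[0.0] * len(X[0]) for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, target in zip(X, y):
            probs = softmax(scores(x, W, b))
            for k in range(n_classes):
                grad = probs[k] - (1.0 if k == target else 0.0)
                b[k] -= lr * grad
                for j, v in enumerate(x):
                    W[k][j] -= lr * grad * v
    return W, b


def predict(text, vocab, W, b):
    s = scores(vectorize(text, vocab), W, b)
    return CLASSES[max(range(len(s)), key=s.__getitem__)]


def top_terms(W, vocab, class_index, n=2):
    """Terms with the largest coefficients for one class."""
    ranked = sorted(zip(W[class_index], vocab), reverse=True)
    return [term for _, term in ranked[:n]]


vocab = sorted({w for text, _ in DOCS for w in tokenize(text)})
X = [vectorize(text, vocab) for text, _ in DOCS]
y = [CLASSES.index(label) for _, label in DOCS]
W, b = train(X, y, len(CLASSES))

for name in CLASSES:
    print(name, top_terms(W, vocab, CLASSES.index(name)))
```

Each row of `X` corresponds to one declaration and each column to one term, mirroring the data set layout described in the abstract. In a real setting the coefficients would be inspected only after evaluating the model on held-out data.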
