World Trade Organization, "Beyond Six Digits: Using Natural Language Processing for Automated Tariff Code (HS Code) Transposition"
Published: 2025-04-21
Author: World Trade Organization
This paper explores the application of Natural Language Processing (NLP) techniques to automate Harmonized System (HS) tariff line transposition. It employs a three-stage process: unique 1:1 tariff code matching (Round 1), exact description matching (Round 2), and “smart” description matching (Round 3), which uses Artificial Intelligence (AI) and lexical similarity methods paired with the harmonized 6-digit concordance and cosine similarity. Similarity is calculated using either Term Frequency-Inverse Document Frequency (TF-IDF) vectors or Sentence-BERT (SBERT) embeddings. Two scenarios are compared: a straightforward case (Economy A) with standardized descriptions, and a complex case (Economy B) with more detailed technical descriptions. Results indicate that automated HS transposition can significantly improve on the efficiency of traditionally manual methods, reducing processing time from two to three weeks to approximately half a day (up to 30 times faster). For a standard set of approximately 10,000 HS codes, the overall accuracy rate is 99.6% in the simpler scenario and 98.8% in the complex one. While non-AI techniques account for most of the accurate matches, the AI-based Round 3 techniques address the cases that require the most manual effort. SBERT generally outperforms TF-IDF; however, including subheadings tends to reduce its accuracy. In certain cases, particularly for highly technical tariffs, TF-IDF's straightforward approach provides an advantage over SBERT. Overall, NLP techniques hold significant potential for improving HS transposition methods and for facilitating the development of richer tariff and trade datasets that enable more in-depth analyses. Future research should focus on refining these techniques across diverse datasets to optimize their broader application in tariff and trade data analysis.
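The three rounds described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `transpose`), the 8-digit tariff lines, their descriptions, and the concordance entry are all hypothetical. The reading of Round 1 as "exactly one candidate line under the concorded 6-digit subheadings" is an assumption. TF-IDF is written out with the standard library only, using a smoothed IDF. SBERT embeddings could in principle take the place of the TF-IDF vectors in Round 3, but that variant is not shown here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case a description and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per description."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption): terms found in every description keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: n * idf[t] for t, n in Counter(tokens).items()} for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def transpose(old_lines, new_lines, concordance):
    """Map each old tariff line to (new_code, round_number)."""
    result = {}
    for old_code, old_desc in old_lines.items():
        # 6-digit concordance: unchanged subheadings map to themselves.
        targets = concordance.get(old_code[:6], {old_code[:6]})
        candidates = {c: d for c, d in new_lines.items() if c[:6] in targets}
        if not candidates:
            result[old_code] = (None, None)  # left for manual review
            continue
        # Round 1 (assumed reading): a single candidate line gives a unique 1:1 match.
        if len(candidates) == 1:
            result[old_code] = (next(iter(candidates)), 1)
            continue
        # Round 2: exact description match after light normalization.
        normalized = " ".join(tokenize(old_desc))
        exact = [c for c, d in candidates.items() if " ".join(tokenize(d)) == normalized]
        if len(exact) == 1:
            result[old_code] = (exact[0], 2)
            continue
        # Round 3: pick the candidate whose description is most similar by TF-IDF cosine.
        codes = list(candidates)
        vectors = tfidf_vectors([old_desc] + [candidates[c] for c in codes])
        scores = [cosine(vectors[0], vec) for vec in vectors[1:]]
        best = max(range(len(codes)), key=scores.__getitem__)
        result[old_code] = (codes[best], 3)
    return result


# Hypothetical tariff lines, for illustration only.
old_lines = {
    "01012100": "Pure-bred breeding horses",
    "08051000": "Oranges, fresh",
    "85171210": "Mobile telephones with touch screen",
}
new_lines = {
    "01012100": "Pure-bred breeding horses",
    "08051010": "ORANGES, FRESH.",
    "08051090": "Oranges, dried",
    "85171300": "Smartphones with touch screen",
    "85171400": "Other telephones for cellular networks",
}
concordance = {"851712": {"851713", "851714"}}  # one old subheading split into two

result = transpose(old_lines, new_lines, concordance)
for old_code, (new_code, round_no) in result.items():
    print(old_code, "->", new_code, "(round", str(round_no) + ")")
```

In this toy data each line is resolved by a different round, which mirrors the abstract's point that the non-AI rounds handle the easy matches and similarity scoring is reserved for the remainder. A real workflow would presumably also apply a similarity threshold and route low-scoring Round 3 matches to manual review.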
https://www.wto.org/english/res_e/reser_e/ersd202504_e.htm