
World Trade Organization: "Beyond Six Digits: Using Natural Language Processing for Automated Tariff Code (HS Code) Transposition"

Published: 2025-04-21
Author: World Trade Organization


This paper explores the application of Natural Language Processing (NLP) techniques to automate Harmonized System (HS) tariff line transposition, employing a three-stage process: unique 1:1 tariff code matching (Round 1), exact description matching (Round 2), and “smart” description matching (Round 3) using Artificial Intelligence (AI) and lexical similarity methods paired with harmonized 6-digit concordance and cosine similarity. Similarity is calculated using either Term Frequency-Inverse Document Frequency (TF-IDF) vectors or Sentence-BERT (SBERT) embeddings, comparing two scenarios: a straightforward case (Economy A) with standardized descriptions, and a complex case (Economy B) with more detailed technical descriptions. Results indicate that automated HS transposition can significantly augment the efficiency of traditionally manual methods, reducing processing time from two to three weeks to approximately half a day (up to 30 times faster). The overall accuracy rate is 99.6% for the simpler scenario and 98.8% for the complex one, for a standard set of approximately 10,000 HS codes. While non-AI techniques cover most of the accurate matches, AI-based Round 3 techniques address cases requiring the most manual effort. SBERT generally outperforms TF-IDF; however, including subheadings tends to reduce its accuracy. In certain cases, particularly for highly technical tariffs, TF-IDF's straightforward approach provides an advantage over SBERT. Overall, NLP techniques hold significant potential for improving HS transposition methods and facilitating the development of richer tariffs and trade datasets to enable more in-depth analyses. Future research should focus on refining these techniques across diverse datasets to optimize their broader application in tariff and trade data analysis.
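The three rounds described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `transpose`), the tariff codes, the descriptions, and the concordance table are all hypothetical, and the exact rules for each round are assumptions made for illustration. In particular, the sketch assumes that Round 1 accepts a match when the 6-digit concordance leaves exactly one candidate line, and that the TF-IDF weights in Round 3 are fitted only on the candidate group. An SBERT variant would replace the TF-IDF vectors with sentence embeddings and keep the same cosine comparison; it is omitted here because it needs an external model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a description and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict) per document, using smoothed IDF."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def transpose(old_lines, new_lines, concordance):
    """Map each old tariff line to a (new_code, round_label) pair."""
    matches = {}
    for code, desc in old_lines.items():
        # Restrict candidates to new lines under the concorded 6-digit subheadings.
        targets = concordance.get(code[:6], set())
        candidates = {c: d for c, d in new_lines.items() if c[:6] in targets}
        if not candidates:
            matches[code] = (None, "unmatched")
            continue
        if len(candidates) == 1:
            # Round 1 (assumed rule): exactly one candidate remains.
            matches[code] = (next(iter(candidates)), "round1")
            continue
        exact = [c for c, d in candidates.items() if tokenize(d) == tokenize(desc)]
        if len(exact) == 1:
            # Round 2: descriptions are identical after normalization.
            matches[code] = (exact[0], "round2")
            continue
        # Round 3: pick the candidate with the highest TF-IDF cosine similarity.
        codes = list(candidates)
        vectors = tfidf_vectors([desc] + [candidates[c] for c in codes])
        scores = [cosine(vectors[0], vec) for vec in vectors[1:]]
        best = max(range(len(codes)), key=lambda i: scores[i])
        matches[code] = (codes[best], "round3")
    return matches


# Hypothetical sample data; codes and descriptions are invented for illustration.
OLD_LINES = {
    "01012100": "Pure-bred breeding horses",
    "02013010": "Boneless bovine meat, fresh",
    "84713010": "Portable computers weighing not more than 10 kg",
}
NEW_LINES = {
    "01012100": "Horses, pure-bred breeding animals",
    "02013020": "Boneless bovine meat, fresh",
    "02013090": "Other boneless bovine meat",
    "84713011": "Laptop and notebook portable computers, weight not exceeding 10 kg",
    "84713019": "Tablet devices",
}
CONCORDANCE = {"010121": {"010121"}, "020130": {"020130"}, "847130": {"847130"}}

matches = transpose(OLD_LINES, NEW_LINES, CONCORDANCE)
for old_code, (new_code, round_label) in matches.items():
    print(old_code, "->", new_code, f"({round_label})")
```

Filtering by the 6-digit concordance first keeps each similarity comparison small, so only lines that survive Rounds 1 and 2 unmatched reach the similarity step. A production version would likely add a minimum similarity threshold and route low-scoring lines to manual review.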


https://www.wto.org/english/res_e/reser_e/ersd202504_e.htm
