Abstract: Named Entity Recognition (NER) in the medical field aims to extract the names of diseases, surgeries, and organ locations from medical texts, and is considered fundamental to medical robots and intelligent diagnosis systems. Recognizing named entities in Chinese medical texts is very challenging because (a) a single Chinese medical named entity is usually expressed with more characters/words than in other languages, on average 3.2 words and 7.3 characters; and (b) different types of medical named entities are often nested together. To address these issues, this paper presents a neural framework built from two modules: a pre-trained module that distinguishes each individual entity within nested expressions, and a modified Bi-LSTM module that effectively identifies long entities. In experiments on the CCKS 2019 dataset, the proposed method identifies Chinese medical entities, especially nested entities embedded in long expressions, achieving an F1-score of 88.84%, a 15.09% improvement over the baseline models.
Authors: Zhengyi Zhao, Weichuan Xing, Junlin Wu, Xurui Sun, Yuan Chang and Binyang Li (University of International Relations, China)