Abstract: With the popularity of social media, platforms such as Twitter, Facebook, and Weibo have become an indispensable part of people's lives, where users can freely publish and spread information. However, the credibility of this information cannot be guaranteed, and a large number of rumors circulate on social media. Such information usually has a negative impact and can even affect the real world. To address this problem, some prior work has constructed rumor corpora for automatic rumor detection. However, existing work has focused on the political domain, and most of it is limited to English texts. As a result, these corpora cannot be readily applied to other domains in resource-poor languages. This paper proposes a Chinese rumor detection corpus, named CRDC. The corpus consists of 10,000 rumors and 14,472 non-rumors from Weibo. In addition, other information is collected, including language-independent features such as the retweet and like information of rumors, which can help rumor detection and rumor propagation research in other languages. To better demonstrate the corpus, we also conducted some initial experiments to show its details and statistics.
Authors: Bo Yan (The 54th Research Institute of China Electronics Technology Group Corporation, China); Yan Gao, Shubo Zhang, Yi Zhang, Yan Du and Binyang Li (University of International Relations, China)
Email: byli.uir@gmail.com, beyondlee1982@163.com, zoey_bo@163.com, zhang_yi_0203@163.com, DYAngel22@163.com, byli@uir.edu.cn