Natural Language Processing and Information Retrieval Sharing Platform

                  Research and Implementation of Chinese Text Automatic Proofreading System

                  NLPIR SEMINAR Y2019#3

                  INTRO

                         In the new semester, our lab, the Web Search Mining and Security Lab, plans to hold an academic seminar every Wednesday, at which a keynote speaker will share their understanding of papers published in recent years.

                  Arrangement

                  This week’s seminar is organized as follows:
                  1. The seminar will be held at 1 p.m. on Wednesday at Zhongguancun Technology Park, Building 5, Room 1306.
                  2. The lecturer is Jinjing Wan, and the paper is titled "Research and Implementation of Chinese Text Automatic Proofreading System".
                  3. The seminar will be hosted by WangGang.
                  4. The paper for this seminar is attached; please download it in advance.

                  Anyone interested in this topic is welcome to join us. The following is the abstract of this week's paper.

                  Research and Implementation of Chinese Text Automatic Proofreading System

                  Yonggang Gong, Junying Fu, Xiaoqin Lian and Yuying Li

                  Abstract

                         News media platforms publish a huge volume of original news every day, which makes manual review for typos impractical. This paper presents the design and implementation of a Chinese text automatic proofreading system for large-scale text content and high-speed processing. The content to be proofread is first analyzed and classified into two categories: typos and sensitive information. First, the system uses an n-gram model to statistically analyze the segmented corpus, forming a 2-gram model library and a context library; second, it builds a typo confusion set, and then calculates the probability of the target word in the knowledge base to achieve automatic error detection and correction of Chinese text. The system has been successfully applied to error checking of content on many government news media platforms, and each server can process one million articles per day. The results show a recall rate of 78.9% and an accuracy rate of 85.1%. The system meets the demand for high-speed, accurate error processing of massive amounts of text and has important practical significance and broad application prospects.
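The abstract only outlines the typo-detection step: a 2-gram model built from a segmented corpus, a typo confusion set, and a probability comparison for each target word. The sketch below illustrates one plausible way these pieces could fit together; it is not the authors' implementation and does not cover the sensitive-information category. Assumptions (none of them from the paper): tokens are words from an already-segmented corpus, bigram probabilities use add-one smoothing, a word is scored by the product of its left and right bigram probabilities, and a hypothetical `threshold` ratio decides when a confusable candidate replaces the original word. The function names, the toy corpus, and the confusion set are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

BOS, EOS = "<s>", "</s>"  # sentence boundary markers (an assumption of this sketch)


def build_bigram_model(corpus: List[List[str]]) -> Tuple[Counter, Counter]:
    """Count unigrams and adjacent-word bigrams over a pre-segmented corpus."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for sentence in corpus:
        tokens = [BOS] + sentence + [EOS]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev: str, word: str, unigrams: Counter, bigrams: Counter) -> float:
    """P(word | prev) with add-one (Laplace) smoothing, so unseen pairs are never zero."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def context_score(tokens: List[str], i: int, word: str,
                  unigrams: Counter, bigrams: Counter) -> float:
    """Score `word` placed at position i by its left and right bigram probabilities."""
    padded = [BOS] + tokens + [EOS]
    left, right = padded[i], padded[i + 2]  # padded[i + 1] is position i itself
    return (bigram_prob(left, word, unigrams, bigrams)
            * bigram_prob(word, right, unigrams, bigrams))


def proofread(tokens: List[str], confusion: Dict[str, Set[str]],
              unigrams: Counter, bigrams: Counter,
              threshold: float = 5.0) -> Tuple[List[str], List[Tuple[int, str, str]]]:
    """Replace a token when a confusable candidate fits its context at least
    `threshold` times better (hypothetical decision rule)."""
    corrected = list(tokens)
    corrections = []
    for i, word in enumerate(tokens):
        candidates = sorted(confusion.get(word, set()))  # sorted for deterministic ties
        if not candidates:
            continue
        # Scores use the original context, so one fix does not influence the next.
        original = context_score(tokens, i, word, unigrams, bigrams)
        best = max(candidates,
                   key=lambda c: context_score(tokens, i, c, unigrams, bigrams))
        if context_score(tokens, i, best, unigrams, bigrams) >= threshold * original:
            corrected[i] = best
            corrections.append((i, word, best))
    return corrected, corrections


# Toy usage: a tiny segmented corpus and a confusion set for 的/得/地.
corpus = [
    ["他", "跑", "得", "很", "快"],
    ["她", "跑", "得", "很", "快"],
    ["他", "写", "得", "很", "好"],
    ["我们", "的", "工作"],
    ["他", "的", "书"],
]
confusion = {"的": {"得", "地"}, "得": {"的", "地"}, "地": {"的", "得"}}
unigrams, bigrams = build_bigram_model(corpus)
print(proofread(["他", "跑", "的", "很", "快"], confusion, unigrams, bigrams))
# (['他', '跑', '得', '很', '快'], [(2, '的', '得')])
print(proofread(["他", "的", "书"], confusion, unigrams, bigrams))
# (['他', '的', '书'], [])
```

In the first sentence, "跑 得 很" is attested in the corpus while "跑 的 很" is not, so 得 scores more than ten times higher than 的 and is substituted; in the second, the original 的 already fits its context best and is left alone. Requiring a large ratio rather than any improvement is one simple way to trade recall for fewer false corrections.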

                  About the Author: nlpvv
