Automatic document summarisation uses software and its underlying algorithms to create a representative summary or abstract of an entire document. By extracting a document’s highlights and key messages, it can greatly reduce processing time and cost, whether the task is processing financial documents or analysing social conversations.
Built upon artificial intelligence (AI) and natural language processing (NLP) technologies, Wisers’ proprietary document summarisation technology supports multilingual analysis in both Chinese and English. Because it uses unsupervised learning, the engine does not rely on training data and can automatically summarise documents from any domain on the fly.
It comprises two major components. The first performs content analysis based on a document’s topical, statistical and linguistic features; the results are used to identify the important sentences that most concisely cover the document’s subject matter.
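Wisers does not disclose which topical, statistical and linguistic features the engine uses or how they are weighted. As a minimal sketch of the general idea of unsupervised extractive sentence scoring, the Python below assumes two simple stand-in features: the average document-level frequency of a sentence’s content words (statistical) and a small bonus for earlier sentences (positional). The stop-word list, the scoring formula and all function names are illustrative assumptions, not Wisers’ method.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
              "it", "that", "for", "on", "with"}

def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]

def tokenize(sentence: str) -> list[str]:
    # Lowercase alphanumeric tokens with stop words removed.
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower())
            if w not in STOP_WORDS]

def score_sentences(sentences: list[str]) -> list[float]:
    # Document-level word frequencies, computed without any training data.
    freq = Counter(w for s in sentences for w in tokenize(s))
    scores = []
    for i, s in enumerate(sentences):
        words = tokenize(s)
        # Statistical feature: mean frequency of the sentence's content words.
        stat = sum(freq[w] for w in words) / len(words) if words else 0.0
        # Positional feature (assumed heuristic): earlier sentences score higher.
        position = 1.0 / (i + 1)
        scores.append(stat + position)
    return scores

def extract_summary(text: str, k: int = 2) -> list[str]:
    sentences = split_sentences(text)
    scores = score_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Restore original document order so the extract reads naturally.
    return [sentences[i] for i in sorted(top)]
```

Because nothing here is trained, the same code runs on text from any domain. A production system would need a proper sentence splitter and tokeniser, particularly for Chinese, which is not whitespace-delimited.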
The second component performs summary compilation. It is carefully designed to address the poor coherence commonly found in extractive summarisation methods, which simply extract sentences and assemble them into a summary. This improves the logic and understandability of the summaries, while also avoiding the poor readability of sentences generated by abstractive summarisation methods.
Our winning formula for this engine comprises:
- Content analysis and identification, which detects and rectifies dangling referents (e.g., pronouns) and sentence fragments (e.g., incomplete numbered items) in important sentences.
- Sentence denoising and compression, which ensure the summary is fluent and readable for humans rather than just a list of assembled sentences.
- Spam and advertisement identification and topic segmentation functions, based on deep learning and semantic embeddings, built into this specific algorithm.
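The list above names dangling-referent repair without saying how it works. One simple heuristic, sketched here under the assumption that a selected sentence opening with a pronoun depends on the sentence before it, is to pull that preceding sentence into the summary as well, following the chain back if the antecedent also opens with a pronoun. The pronoun list and function names are hypothetical, and this rule is only a stand-in for whatever detection and rectification the engine actually performs.

```python
# Hypothetical list of sentence openers that usually refer back to an earlier sentence.
DANGLING_STARTERS = {"it", "this", "these", "they", "he", "she", "its", "their", "such"}

def starts_with_referent(sentence: str) -> bool:
    # True if the first word (ignoring trailing punctuation) is a listed pronoun.
    words = sentence.split()
    return bool(words) and words[0].strip(",.;:").lower() in DANGLING_STARTERS

def repair_referents(sentences: list[str], selected: set[int]) -> list[int]:
    """Add antecedent sentences for any selected sentence that opens with a pronoun."""
    repaired = set(selected)
    for i in sorted(selected):
        j = i
        # Walk backwards until reaching a sentence that does not open with a referent.
        while j > 0 and starts_with_referent(sentences[j]):
            j -= 1
            repaired.add(j)
    # Return indices in document order for summary compilation.
    return sorted(repaired)
```

Including the antecedent is the bluntest fix; replacing the pronoun with its resolved noun phrase would keep the summary shorter but requires coreference resolution.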
Since January 2018, we have deployed this award-winning technology in our daily mission-critical data curation workflow.
Wisers AI Lab’s topic tagging technology also ranked 4th among 16 participating teams in the NLPCC 2018 competition “Automatic Tagging of Zhihu Questions”, which aims to automatically assign relevant topic tags to an unseen question on Zhihu (a Chinese social knowledge-sharing website) from a predefined set of more than 25,000 topic tags. Wisers’ automatic topic classification technology applies ensemble learning techniques based on deep learning and semantic embedding models, and supports fast, flexible addition of new topics via an efficient proprietary customisation tool.
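The text says only that the tagger applies ensemble learning over deep learning and semantic embedding models. A minimal sketch of one common ensembling scheme, assuming each model outputs a score per tag, is to average the scores across models and keep the top-k tags above a threshold. The input format, the `k` and `threshold` parameters and the function name are assumptions for illustration, not Wisers’ implementation.

```python
def ensemble_tags(model_scores: list[dict[str, float]],
                  k: int = 3, threshold: float = 0.0) -> list[str]:
    """Average per-tag scores across models and return the top-k tags above a threshold."""
    if not model_scores:
        return []
    totals: dict[str, float] = {}
    for scores in model_scores:
        for tag, score in scores.items():
            totals[tag] = totals.get(tag, 0.0) + score
    n = len(model_scores)
    # A tag missing from a model's output contributes 0 to that model's share.
    averaged = {tag: total / n for tag, total in totals.items()}
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(averaged.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, score in ranked if score > threshold][:k]
```

Simple score averaging lets individual models be added or retrained independently; weighted averaging or a learned meta-classifier are common alternatives when some models are more reliable than others.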
NLPCC is a leading international conference series specialising in Natural Language Processing (NLP) and Chinese Computing (CC), organised by the CCF TCCI (Technical Committee of Chinese Information, China Computer Federation). NLPCC 2018, the 7th annual meeting of the series, will be held in Hohhot, China, from 26 to 30 August 2018. Participants include researchers and practitioners in NLP and CC from around the world.
The NLPCC competitions (shared tasks) cover both classic and emerging problems important to the natural language processing and Chinese computing communities. Participants from both academia and industry are welcome. Participants in each task are given a training data set for algorithm development, and the task organisers evaluate and rank their solutions using rigorously designed test data sets and performance metrics.
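ROUGE is a widely used family of metrics for scoring system summaries against human-written references. As an illustration of what such an automatic performance metric computes, the sketch below implements ROUGE-1 (unigram overlap with clipped counts) over whitespace-separated tokens. Whether NLPCC 2018 used exactly this variant is an assumption not stated here; the task overview paper in the conference proceedings gives the actual metrics. For Chinese text, tokens would need to come from characters or a word segmenter rather than whitespace.

```python
from collections import Counter

def rouge_1(candidate: str, reference: str) -> dict[str, float]:
    """ROUGE-1 recall, precision and F1 over lowercase whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter '&' keeps the minimum count per token, i.e. clipped overlap.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}
```

For example, comparing "the cat sat on the mat" with the reference "the cat lay on the mat" shares 5 of 6 unigrams, giving recall, precision and F1 of 5/6.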
About Wisers and Wisers AI Lab
Wisers is one of the world’s leading providers of Chinese media and business intelligence. Over 2,500 companies rely on our data and analytics daily for mission-critical decisions and research.
Our leading-edge solutions are enhanced by the skills and experience of our business research and industry analysts who train the engines and refine our capabilities.
Our team of more than 900 people in 9 offices across Hong Kong, Macau, Taiwan and Mainland China continuously innovates and collaborates with top-notch academic institutions to push the boundaries of what’s possible with AI, Natural Language Processing (NLP), language taxonomy, image recognition and emerging data technologies.
Wisers Analytics focuses on industry analytics, benchmarking and rankings, using tested methodologies and best practices for deriving insight from big media data.
Established in Hong Kong in July 2014, Wisers AI Lab is dedicated to the research and development of innovative AI technologies that address the challenges of automatically analysing and mining diverse, real-world, open-domain, cross-media big data.
The Lab was awarded HK$8.53 million from the Innovation and Technology Fund of the HKSAR Government in April 2016 to pursue this path. The team is composed of engineers holding PhD or Master’s degrees in AI or computational linguistics from major international universities.
Our advanced proprietary technologies cover the full spectrum of automated media intelligence, ranging from text-based entity recognition, relation extraction, topic classification, sentiment analysis and critical event detection to image-based logo and face recognition. Live technical demos and information on our past awards are available on Wisers AI Lab’s official website: www.wisers.ai.
For detailed evaluation results, please refer to the NLPCC 2018 conference proceedings: Li L., Wan X. (2018) Overview of the NLPCC 2018 Shared Task: Single Document Summarization. In: Zhang M., Ng V., Zhao D., Li S., Zan H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2018. Lecture Notes in Computer Science, vol 11109. Springer, Cham.