- NLP Foundational Technologies
NLP foundational technologies are the foundation on which more advanced NLP systems are built. The team works on topics such as multilingual tokenization, part-of-speech tagging, named entity recognition, information extraction, spell check/grammatical error correction, syntax and semantic analysis, deep learning for language modeling, semantic representation and similarity, and text summarization. By leveraging the AliNLP platform, the NLP foundational technologies team provides solutions to thousands of business scenarios within the Alibaba ecosystem. Coupled with technologies such as search, recommendation, question answering, and knowledge graphs, NLP foundational technologies are gaining influence through Alibaba Cloud in sectors such as address identification, security, healthcare, energy, and customs. Furthermore, the application of these technologies promotes business value and expands the commercial availability of NLP technologies.
- Translation Technology
The translation technology team is responsible for building the translation infrastructure in multiple languages to facilitate the globalization strategy of Alibaba. The systems and services developed by the team have been widely used in the cross-border business of Alibaba, such as AliExpress.com, Alibaba.com, and Lazada.com. The team aims to conduct research on innovative multilingual processing technologies powered by cutting-edge AI technologies. Its translation platform allows users to tailor services to their scenarios, which delivers quick and cost-efficient language solutions and helps break down the language barrier for sectors such as e-commerce, education, and healthcare.
- Multilingual Technology (Singapore)
The multilingual technology team focuses on multilingual and cross-lingual technologies, such as basic NLP technologies for Southeast and South Asian languages, cross-domain and cross-lingual adaptation, self-supervised learning, and low-resource NLP problems. Multilingual technologies, such as multilingual named entity recognition, tokenization and word segmentation of Thai and Vietnamese, multilingual sentiment analysis, and multilingual address processing, empower multiple globalization businesses within Alibaba, such as Lazada, Daraz, cloud telecom, and DingTalk. These technologies also empower regional Alibaba Cloud teams and provide AI-based value-added capabilities.
- Conversational AI
The Conversational AI team is dedicated to the innovative research and broad application of human-machine dialogue technologies. It has designed and developed the Dialog Studio platform, the KBQA Engine, FAQ answering, and the TableQA Engine. Dialog Studio is a dialog development platform for developers. The team also developed intelligent question answering technologies such as knowledge-based question answering (KBQA), TableQA, and FAQ answering. The team has made significant headway in technologies such as natural language understanding, multi-turn dialog management, transfer learning, and knowledge graph-based question answering. Dialog Studio and the intelligent customer service developed by the team are widely used not only in business on Taobao and Tmall, but also in enterprises across many fields. The application of these technologies has helped Alibaba Cloud become a leader in the intelligent customer service market.
- Applied Algorithm
The applied algorithm team empowers key businesses inside and outside Alibaba with core technologies in information extraction, text classification, text summarization, text generation, semantic understanding, active learning, sentiment analysis, and anti-spam text. These technologies are used in important sectors (e.g., justice, telecommunication, government affairs, education, and finance) and scenarios (e.g., contract signing, telemarketing, public opinion monitoring, and reviewing). The team has developed its own NLP self-learning platform and is constantly making technological breakthroughs, unlocking business value, and empowering commercialization.
- Semantic Analysis and Matching
The team provides services to business platforms inside and outside of the Alibaba economy through semantic analysis and matching technologies. Technologies such as multimodal image and text recognition, pricing recommendations, title optimization, and communication assistance are typically used in Xianyu, a secondhand goods marketplace. These technologies are also used in AE and AIRec. The team focuses on technologies such as dialog generation, text generation, deep learning for language modeling, multimodal content understanding, and search recommendation.
- Document Translation
Document translation is built upon the translation model developed by DAMO Academy, which provides more accurate translations based on tags. The service can deliver translations with a high degree of consistency even with complex layouts, where text contained in tables and figures is accurately identified, translated, and formatted. The service can parse over 50 document formats, including DOC, PPT, XLS, PDF, and HTML.
- Multimodal Translation
For translation of multimodal information such as text, speech, images, and videos, DAMO Academy combines cutting-edge algorithms and technologies such as speech recognition, OCR, NLP, machine translation, computer vision, and smart layout and image synthesis, implementing cross-lingual and cross-modal conversion of multimodal content from multiple sources. The multimodal translation capabilities have been used in scenarios such as cross-border e-commerce, multilingual conferences, multilingual video subtitles, cross-border trips, and translation of documents and certificates. The Language Technology Lab also developed the world's first e-commerce liveshow translation system, which has been deployed in AliExpress with multiple language directions.
- Intelligent Justice
Built on legal knowledge graphs, Intelligent Justice is an open platform for litigation and non-litigation scenarios. The platform uses NLP as the core technology. It is designed to help legal practitioners such as judges, prosecutors, lawyers, and corporate law professionals familiarize themselves with international law and supplement their legal knowledge. The platform provides intelligent case-handling assistance in litigation scenarios that involve case filing, trial, and execution. It can also be used for litigation risk assessment, search and recommendation of relevant cases and regulations, assistance and prediction in conviction and sentencing, reasoning and generation of dispute focus, and parsing and generation of judgment documents. In non-litigation scenarios, the platform provides intelligent contract management, such as contract information extraction, contract review, contract comparison, contract summary, and risk analysis of relevant parties. The Intelligent Justice platform has been used in tertiary courts, medium-sized enterprises, and large enterprises. The platform has significantly improved the case handling efficiency of these entities by delivering more accurate, standardized, and intelligent legal knowledge services. It also contributes to the improvement of judicial justice and the business environment.
- Cloud Customer Service System Based on Conversational AI
The Language Technology Lab turns its advanced research results in NLP and human-machine dialogs, as well as the experience of Alibaba in customer service, into the Cloud Customer Service System. The Cloud Customer Service System is an industrial human-machine solution and an intelligent service matrix that serves enterprise customers. The system can help enterprises build and run their intelligent customer service systems that offer cost-efficient interaction capabilities in natural language. By using the system, enterprises and their customers can communicate around the clock. The core capabilities developed by the Language Technology Lab include Dialog Studio, TableQA, FAQ, and KBQA. Dialog Studio achieved two significant breakthroughs, namely, from shallow understanding to deep understanding and from rule-based finite state machines to data-driven deep dialog management models. The TableQA question answering engine ranks first in the cross-domain Semantic Parsing in Context (SParC) and Conversational Text-to-SQL (CoSQL) challenges jointly organized by Yale University and Salesforce. The Cloud Customer Service System is used in sectors such as government affairs, telecommunications, banking, and insurance. For example, the Interactive Voice Response (IVR) robot services 150 million calls for China Mobile on an annual basis, which greatly reduces labor costs and provides better services for customers. Furthermore, during the COVID-19 pandemic, the robot made 20 million calls to aid in pandemic control, which helped take the pressure off first responders.

Luo Si is one of the first AI scientists who have shifted from academia to the industrial community. Prior to his post at Alibaba Cloud, he was a tenured professor of the Department of Computer and Information Technology at Purdue University. He managed 20 research projects, all of which received funds from the U.S. government or the industrial community. He is a recipient of multiple awards from institutions such as the National Science Foundation, Yahoo, and Google. He is the author of more than 150 academic publications that are extensively cited. He worked as an associate editor for Transactions on Information Systems (TOIS) and Transactions on Interactive Intelligent Systems (TIIS) of ACM as well as the Information Processing & Management journal. He assumed important roles in international academic conferences, such as Research Track Program Chair of ACM Conference on Information and Knowledge Management (CIKM) 2016. He holds a master's degree and a bachelor's degree in computer science from Tsinghua University, and holds a PhD from Carnegie Mellon University. He joined Alibaba Cloud as an AI scientist in 2014 and has led the NLP team to achieve significant technological milestones.
- Hai Ye, Qingyu Tan, Ruidan He, Juntao Li, Hwee Tou Ng, Lidong Bing. 2020. Feature Adaptation of Pre-Trained Language Models across Languages and Domains with Robust Self-Training [PDF]. The Conference on Empirical Methods in Natural Language Processing (EMNLP'20), 2020.
- Ning Ding, Dingkun Long, Guangwei Xu, Muhua Zhu, Pengjun Xie, Xiaobin Wang and Haitao Zheng. 2020. Coupling Distant Annotation and Adversarial Training for Cross-Domain Chinese Word Segmentation. ACL 2020. (regular long paper)
- Jie Zhou, Chunping Ma, Dingkun Long, Guangwei Xu, Ning Ding, Haoyu Zhang, Pengjun Xie and Gongshen Liu. 2020. Hierarchy-Aware Global Model for Hierarchical Text Classification. ACL 2020. (regular long paper)
- Haoyu Zhang, Dingkun Long, Guangwei Xu, Muhua Zhu, Pengjun Xie, Fei Huang, Ji Wang. 2020. Learning with Noise: Improving Distantly-Supervised Fine-grained Entity Typing via Automatic Relabeling. IJCAI 2020. (regular long paper)
- Chuanqi Tan, Wei Qiu, Mosha Chen, Rui Wang, Fei Huang. 2020. Boundary Enhanced Neu
- Bo Zhang, Yue Zhang, Rui Wang, Zhenghua Li, Min Zhang. 2020. Syntax-Aware Opinion Role Labeling with Dependency Graph Convolutional Networks. ACL 2020 (long)
- Kai Wang, Weizhou Shen, Yunyi Yang, Xiaojun Quan, Rui Wang. 2020. Relational Graph Attention Network for Aspect-based Sentiment Analysis. ACL 2020 (long)
- Kai Wang, Junfeng Tian, Rui Wang, Xiaojun Quan, Jianxing Yu. 2020. Multi-Domain Dialogue Acts and Response Co-Generation. ACL 2020 (long)
- Haojie Pan, Rongqin Yang, Xin Zhou, Rui Wang, Deng Cai, Xiaozhong Liu. 2020. Large Scale Abstractive Multi-Review Summarization (LSARS) via Aspect Alignment. SIGIR 2020 (long)
- Zuyi Bao, Chen Li, Rui Wang. 2020. Chunk-based Chinese Spelling Check with Global Optimization. EMNLP Findings 2020 (long)
- Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Fei Huang and Kewei Tu. 2020. Structure-Level Knowledge Distillation For Multilingual Sequence Labeling. ACL 2020 (long)
- Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang and Kewei Tu. 2020. AIN: Fast and Accurate Sequence Labeling with Approximate Inference Network. EMNLP 2020 (short)
- Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang and Kewei Tu. 2020. More Embeddings, Better Sequence Labelers? EMNLP Findings 2020 (short)
- Zechuan Hu, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang and Kewei Tu. 2020. An Investigation of Potential Function Designs for Neural CRF. EMNLP Findings 2020 (short)
- Wei Wang, Bin Bi, Ming Yan, Chen Wu, Zuyi Bao, Jiangnan Xia, Liwei Peng, Luo Si. 2020. StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding. ICLR 2020
- Bin Bi, Chenliang Li, Chen Wu, Ming Yan, Wei Wang, Songfang Huang, Fei Huang and Luo Si. 2020. PALM: Pre-training an Autoencoding&autoregressive Language Model for Context-conditioned Generation. EMNLP 2020 (regular long paper)
- Qiao Jin, Chuanqi Tan, Mosha Chen, Xiaozhong Liu and Songfang Huang. 2020. Predicting Clinical Trial Results by Implicit Evidence Integration. EMNLP 2020 (regular long paper)
- Yao Fu, Chuanqi Tan, Bin Bi, Mosha Chen, Yansong Feng, Alexander Rush. 2020. Latent Template Induction with Gumbel-CRFs. NeurIPS 2020
- Tianyi Wang, Yating Zhang, Xiaozhong Liu, Changlong Sun, Qiong Zhang. 2020. Masking Orchestration: Multi-task Pretraining for Multi-role Dialogue Representation Learning. AAAI, 2020. (regular long paper)
- Changlong Sun, Yating Zhang, Qiong Zhang, Xiaozhong Liu. 2020. Legal Artificial Intelligence - Have You Lost a Piece from Jigsaw Puzzle? MAKE@AAAI, 2020. (short paper)
- Jiancheng Wang, Jingjing Wang, Changlong Sun, Shoushan Li, Xiaozhong Liu, Luo Si, Min Zhang, Guodong Zhou. 2020. Sentiment Classification in Customer Service Dialogue with Topic-Aware Multi-Task Learning. AAAI, 2020. (regular long paper)
- Zhe Gao, Zhuoren Jiang, Yu Duan, Yangyang Kang, Changlong Sun, Qiong Zhang, Xiaozhong Liu. 2020. Camouflaged Chinese Spam Content Detection with Semi-supervised Generative Active Learning. ACL, 2020. (short paper)
- Yu Duan, Canwen Xu, Jiaxin Pei, Jialong Han, Chenliang Li. 2020. Pre-train and Plug-in: Flexible Conditional Text Generation with Variational Auto-Encoders. ACL, 2020. (regular long paper)
- Xiao Chen, Changlong Sun, Jingjing Wang, Shoushan Li, Luo Si, Min Zhang, Guodong Zhou. 2020. Aspect Sentiment Classification with Document-level Sentiment Preference Modeling. ACL, 2020. (regular long paper)
- Zheng Gao, Hongsong Li, Zhuoren Jiang, Xiaozhong Liu. 2020. Detecting User Community in Sparse Domain via Cross-Graph Pairwise Learning. SIGIR, 2020. (regular long paper)
- Yongzhen Wang, Jian Wang, Heng Huang, Hongsong Li, Xiaozhong Liu. 2020. Evolutionary Product Description Generation: A Dynamic Fine-Tuning Approach Leveraging User Click Behavior. SIGIR, 2020. (regular long paper)
- Guoxiu He, Yangyang Kang, Zhuoren Jiang, Jiawei Liu, Changlong Sun, Xiaozhong Liu, Wei Lu. 2020. Creating a Children-Friendly Reading Environment via Joint Learning of Content and Human Attention. SIGIR, 2020. (regular long paper)
- Mengxi Wei, Yifan He, and Qiong Zhang. 2020. Robust Layout-aware IE for Visually Rich Documents with Pre-trained Language Models. SIGIR, 2020. (regular long paper)
- Shifeng Li, Shi Feng, Daling Wang, Kaisong Song, Yifei Zhang, Weichao Wang. 2020. EmoElicitor: An Open Domain Response Generation Model with User Emotional Reaction Awareness. IJCAI, 2020. (regular long paper)
- Quanzhi Li, Qiong Zhang. 2020. A Unified Model for Financial Event Classification, Detection and Summarization. IJCAI, 2020. (regular long paper)
- Quanzhi Li, Qiong Zhang. 2020. Abstractive Event Summarization on Twitter. WWW (Companion Volume), 2020. (short paper)
- Quanzhi Li, Satish Avadhanam, Qiong Zhang. 2020. An End-to-End Tool for News Processing and Semantic Search. WWW (Companion Volume), 2020. (short paper)
- Changzhen Ji, Xin Zhou, Yating Zhang, Xiaozhong Liu, Changlong Sun, Conghui Zhu, Tiejun Zhao. 2020. Cross Copy Network for Dialogue Generation. EMNLP, 2020. (regular long paper)
- Yiquan Wu, Kun Kuang, Yating Zhang, Xiaozhong Liu, Changlong Sun, Jun Xiao, Yueting Zhuang, Luo Si, Fei Wu. 2020. De-biased Court’s View Generation with Causality. EMNLP, 2020. (regular long paper)
- WeiSheng Zhang, Kaisong Song, Yangyang Kang, Zhongqing Wang, Changlong Sun, Xiaozhong Liu, Shoushan Li, Min Zhang, Luo Si. 2020. Multi-Turn Dialogue Generation in E-Commerce Platform with the Context of Historical Dialogue. EMNLP, 2020. (EMNLP findings)
- Minlong Peng, Ruotian Ma, Qi Zhang, Lujun Zhao, Mengxi Wei, Changlong Sun, Xuanjing Huang. 2020. Toward Recognizing More Entity Types in NER: An Efficient Implementation using Only Entity Lexicons. EMNLP, 2020. (EMNLP findings)
- Liying Cheng, Lidong Bing, Qian Yu, Wei Lu, Luo Si. 2020. APE: Argument Pair Extraction from Peer Review and Rebuttal via Multi-task Learning. The Conference on Empirical Methods in Natural Language Processing (EMNLP'20), 2020.
- Lu Xu, Hao Li, Wei Lu, Lidong Bing. 2020. Position-Aware Tagging for Aspect Sentiment Triplet Extraction. The Conference on Empirical Methods in Natural Language Processing (EMNLP'20), 2020.
- Bosheng Ding, Linlin Liu, Lidong Bing, Canasai Kruengkrai, Thien Hai Nguyen, Shafiq Joty, Luo Si, Chunyan Miao. 2020. DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks. The Conference on Empirical Methods in Natural Language Processing (EMNLP'20), 2020.
- Liying Cheng, Dekun Wu, Lidong Bing, Yan Zhang, Zhanming Jie, Wei Lu, Luo Si. 2020. ENT-DESC: Entity Description Generation by Exploring Knowledge Graph [PDF]. The Conference on Empirical Methods in Natural Language Processing (EMNLP'20), 2020.
- Yan Zhang, Zhijiang Guo, Zhiyang Teng, Wei Lu, Shay B. Cohen, Zuozhu Liu, Lidong Bing. 2020. Lightweight, Dynamic Graph Convolutional Networks for AMR-to-Text Generation. The Conference on Empirical Methods in Natural Language Processing (EMNLP'20), 2020.
- Yan Zhang, Ruidan He, Zuozhu Liu, Kwan Hui Lim, Lidong Bing. 2020. An Unsupervised Sentence Embedding Method by Mutual Information Maximization [PDF]. The Conference on Empirical Methods in Natural Language Processing (EMNLP'20), 2020.
- Lu Xu, Lidong Bing, Wei Lu, Fei Huang. 2020. Aspect Based Sentiment Analysis with Aspect-Specific Opinion Spans. The Conference on Empirical Methods in Natural Language Processing (EMNLP'20), 2020.
- Zihao Fu, Bei Shi, Wai Lam, Lidong Bing and Zhiyuan Liu. 2020. Partially-Aligned Data-to-Text Generation with Distant Supervision. The Conference on Empirical Methods in Natural Language Processing (EMNLP'20), 2020.
- Lu Xu, Zhanming Jie, Wei Lu, Lidong Bing. 2020. Fusing Structured Information into LSTM for Named Entity Recognition. The Conference on Empirical Methods in Natural Language Processing (EMNLP'20), 2020. (findings)
- Juntao Li, Ruidan He, Hai Ye, Hwee Tou Ng, Lidong Bing, Rui Yan. 2020. Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model [PDF]. The 29th International Joint Conference on Artificial Intelligence and the 17th Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI'20), 2020.
- Qian Yu, Lidong Bing, Qiong Zhang, Wai Lam, Luo Si. 2020. Review-based Question Generation with Adaptive Instance Transfer and Augmentation [PDF]. The 58th Annual Meeting of the Association for Computational Linguistics (ACL'20), 2020.
- Canasai Kruengkrai, Thien Hai Nguyen, Sharifah Mahani Aljunied, Lidong Bing. 2020. Improving Low-Resource Named Entity Recognition using Joint Sentence and Token Labeling [PDF]. The 58th Annual Meeting of the Association for Computational Linguistics (ACL'20), 2020.
- Haiyun Peng, Lu Xu, Lidong Bing, Wei Lu, Fei Huang, Luo Si. 2020. Knowing What, How and Why: A Near Complete Solution for Aspect-based Sentiment Analysis [PDF]. The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI'20), 2020.
- Zihao Fu, Lidong Bing, Wai Lam. 2020. Open Domain Event Text Generation [PDF]. The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI'20), 2020.
- Juntao Li, Chang Liu, Lidong Bing, Xiaozhong Liu, Hongsong Li, Jian Wang, Dongyan Zhao, Rui Yan. 2020. Cross-Lingual Low-Resource Set-to-Description Retrieval for Global E-Commerce [PDF]. The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI'20), 2020.
- Rongxiang Weng, Haoran Wei, Shujian Huang, Heng Yu, Lidong Bing, Weihua Luo, Jiajun Chen. 2020. GRET: Global Representation Enhanced Transformer [PDF]. The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI'20), 2020.
- Tianshu Lyu, Lidong Bing, Zhao Zhang, and Yan Zhang. 2020. FOX: Fast Overlapping Community Detection Algorithm in Big Weighted Networks. [PDF]. ACM Transactions on Social Computing, 2020.
- Baijun Ji, Zhirui Zhang, Xiangyu Duan, Min Zhang, Boxing Chen, Weihua Luo. 2020. Cross-lingual Pre-training Based Transfer for Zero-shot Neural Machine Translation. AAAI, 2020.
- Pengcheng Yang, Boxing Chen, Pei Zhang, Xu Sun. 2020. Visual Agreement Regularized Training for Multi-Modal Machine Translation. AAAI, 2020.
- Kai Song, Kun Wang, Heng Yu, Yue Zhang, Zhongqiang Huang, Weihua Luo, Xiangyu Duan, Min Zhang. 2020. Alignment-Enhanced Transformer for Constraining NMT with Pre-Specified Translations. AAAI, 2020.
- Rongxiang Weng, Heng Yu, Shujian Huang, Shanbo Cheng, Weihua Luo. 2020. Acquiring Knowledge from Pre-trained Model to Neural Machine Translation. AAAI, 2020.
- Kai Song, Xiaoqing Zhou, Heng Yu, Zhongqiang Huang, Yue Zhang, Weihua Luo, Xiangyu Duan, Min Zhang. 2020. Towards Better Word Alignment in Transformer. IEEE-TASLP, 2020.
- Xiangyu Duan, Baijun Ji, Hao Jia, Min Tan, Min Zhang, Boxing Chen, Weihua Luo, Yue Zhang. 2020. Bilingual Dictionary Based Neural Machine Translation without Using Parallel Sentences. ACL, 2020. (Regular Long Paper)
- Xiangpeng Wei, Heng Yu, Yue Hu, Yue Zhang, Rongxiang Weng, Weihua Luo. 2020. Multiscale Collaborative Deep Models for Neural Machine Translation. ACL, 2020. (Regular Long Paper)
- Changfeng Zhu, Heng Yu, Shanbo Cheng, Weihua Luo. 2020. Language-aware Interlingua for Multilingual Neural Machine Translation. ACL, 2020. (Regular Short Paper)
- Kai Fan, Bo Li, Jiayi Wang, Shiliang Zhang, Boxing Chen, Niyu Ge and Zhi-Jie Yan. 2020. Neural Zero-Inflated Quality Estimation Model For Automatic Speech Recognition System. InterSpeech 2020.
- Hao-Ran Wei, Zhirui Zhang, Boxing Chen, Weihua Luo. 2020. Iterative Domain-Repaired Back-Translation. EMNLP, 2020. (Regular Long Paper)
- Rongxiang Weng, Heng Yu, Xiangpeng Wei and Weihua Luo. 2020. Towards Enhancing Faithfulness for Neural Machine Translation. EMNLP, 2020. (Regular Long Paper)
- Xiangpeng Wei, Heng Yu, Yue Hu, Rongxiang Weng, Luxi Xing and Weihua Luo. 2020. Uncertainty-Aware Semantic Augmentation for Neural Machine Translation. EMNLP, 2020. (Regular Long Paper)
- Pei Zhang, Boxing Chen, Niyu Ge, Kai Fan. 2020. Long-Short Term Masking Transformer: A Simple but Effective Baseline for Document-level Neural Machine Translation. EMNLP, 2020. (Regular Short Paper)
- Yu Wan, Baosong Yang, Derek F. Wong, Yikai Zhou, Lidia S. Chao, Haibo Zhang, Boxing Chen. 2020. Self-Paced Learning for Neural Machine Translation. EMNLP, 2020. (Regular Short Paper)
- Yongchao Deng, Hongfei Yu, Heng Yu, Xiangyu Duan and Weihua Luo. 2020. Factorized Transformer for Multi-Domain Neural Machine Translation. EMNLP, 2020. (Findings of EMNLP)
- Ke Wang, Jiayi Wang, Niyu Ge, Yangbing Shi, Yu Zhao, Kai Fan. 2020. Computer Assisted Translation with Neural Quality Estimation and Automatic Post-Editing. EMNLP, 2020. (Findings of EMNLP)
- Junliang Guo, Zhirui Zhang, Linli Xu, Hao-Ran Wei, Boxing Chen, Enhong Chen. 2020. Incorporating BERT into Parallel Sequence Decoding with Adapters. NeurIPS, 2020.
- Ebrahim Ansari, Amittai Axelrod, Nguyen Bach, Ondřej Bojar, Roldano Cattoni, Fahim Dalvi, Nadir Durrani, Marcello Federico, Christian Federmann, Jiatao Gu, Fei Huang, Kevin Knight, Xutai Ma, Ajay Nagesh, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Xing Shi, Sebastian Stüker, Marco Turchi, Alexander Waibel, Changhan Wang. 2020. Findings Of The IWSLT 2020 Evaluation Campaign, IWSLT-2020 (organizer)
- Qian Chen, Mengzhe Chen, Bo Li, Wen Wang. 2020. Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection. IEEE ICASSP 2020.
- Ruiying Geng, Binghua Li, Yongbin Li, Jian Sun, Xiaodan Zhu. 2020. Dynamic Memory Induction Networks for Few-Shot Text Classification. The 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020). Seattle, USA.
- Yinpei Dai, Hangyu Li, Chengguang Tang, Yongbin Li, Jian Sun, Xiaodan Zhu. 2020. Learning Low-Resource End-To-End Goal-Oriented Dialog for Fast and Reliable System Deployment. The 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020). Seattle, USA.
- Jinghan Zhang, Yuxiao Ye, Yue Zhang, Likun Qiu, Jian Sun. 2020. Multi-Point Semantic Representation for Intent Classification, Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI2020). New York City, NY, USA
- Yinpei Dai, Huihua Yu, Yixuan Jiang, Chengguang Tang, Yongbin Li, Jian Sun. 2020. A Survey on Dialog Management: Recent Advances and Challenges, arXiv: 2005.02233
- Xiaobin Wang, Deng Cai, Guangwei Xu, Hai Zhao, Linlin Li and Luo Si. 2019. Unsupervised Learning helps Supervised Neural Word Segmentation. AAAI 2019. (regular long paper)
- Zhanming Jie, Pengjun Xie, Wei Lu, Ruixue Ding and Linlin Li. 2019. Better Modeling of Incomplete Annotations for Named Entity Recognition. NAACL 2019. (short paper)
- Hao Li, Wei Lu, Pengjun Xie and Linlin Li. 2019. Neural Chinese Address Parsing. NAACL 2019. (regular long paper) [PDF]
- Ruixue Ding, Pengjun Xie, Xiaoyan Zhang, Wei Lu, Linlin Li, Luo Si. 2019. A Neural Multi-digraph Model for Chinese NER with Gazetteers. ACL 2019. (short paper)
- Ying Li, Zhenghua Li, Min Zhang, Rui Wang, Sheng Li, Luo Si. 2019. Self-attentive Biaffine Dependency Parsing. IJCAI 2019 (long)
- Qingrong Xia, Zhenghua Li, Min Zhang, Meishan Zhang, Guohong Fu, Rui Wang, Luo Si. 2019. Syntax-aware Neural Semantic Role Labeling. AAAI 2019 (long)
- Ming Yan, Jiangnan Xia, Chen Wu, Bin Bi, Zhongzhou Zhao, Ji Zhang, Luo Si, Rui Wang, Wei Wang, Haiqing Chen. 2019. A Deep Cascade Model for Multi-Document Reading Comprehension. AAAI 2019 (long)
- Zhenghua Li, Xue Peng, Min Zhang, Rui Wang, Luo Si. 2019. Semi-supervised Domain Adaptation for Dependency Parsing. ACL 2019 (long)
- Kai Wang, Xiaojun Quan, Rui Wang. 2019. BiSET: Bi-directional Selective Encoding with Template for Abstractive Summarization. ACL 2019 (long)
- Yue Zhang, Rui Wang, Luo Si. 2019. Syntax-Enhanced Self-Attention-Based Semantic Role Labeling. EMNLP 2019 (long)
- Zuyi Bao, Rui Huang, Chen Li, Kenny Zhu. 2019. Low-Resource Sequence Labeling via Unsupervised Multilingual Contextualized Representations. EMNLP 2019 (long)
- Min Gui, Junfeng Tian, Rui Wang, Zhenglu Yang. 2019. Attention Optimization for Abstractive Document Summarization. EMNLP 2019 (short)
- Mingdong Ou, Nan Li, Cheng Yang, Shenghuo Zhu, Rong Jin. 2019. Semi-parametric sampling for stochastic bandits with many arms, AAAI, 2019. (regular long paper)
- Lujun Zhao, Kaisong Song, Changlong Sun, Qi Zhang, Xuanjing Huang, Xiaozhong Liu. 2019. Review Response Generation in E-Commerce Platforms with External Product Information. WWW, 2019. (regular long paper)
- Yongzhen Wang, Heng Huang, Yuliang Yan, Xiaozhong Liu. 2019. User-Centric Quality-Sensitive Training! Social Advertisement Generation by Leveraging User Click Behavior. WWW, 2019. (regular long paper)
- Kaisong Song, Wei Gao, Lujun Zhao, Jun Lin, Changlong Sun, Xiaozhong Liu. 2019. Cold-Start Aware Deep Memory Network for Multi-Entity Aspect-Based Sentiment Analysis. IJCAI, 2019. (regular long paper)
- Dong Zhang, Liangqing Wu, Changlong Sun, Shoushan Li, Qiaoming Zhu, Guodong Zhou. 2019. Modeling both Context- and Speaker-Sensitive Dependence for Emotion Detection in Multi-speaker Conversations. IJCAI, 2019. (regular long paper)
- Xin Zhou, Yating Zhang, Xiaozhong Liu, Changlong Sun, Luo Si. 2019. Legal Intelligence for E-commerce: Multi-task Learning by Leveraging Multiview Dispute Representation. SIGIR, 2019. (regular long paper)
- Zhang, Luo Si. 2019. Finding Camouflaged Needle in a Haystack?: Pornographic Products Detection via Berrypicking Tree Model. SIGIR, 2019. (regular long paper)
- Jingjing Wang, Changlong Sun, Shoushan Li, Xiaozhong Liu, Luo Si, Min Zhang, Guodong Zhou. 2019. Aspect Sentiment Classification Towards Question-Answering with Reinforced Bidirectional Attention Network. ACL, 2019. (regular long paper)
- Quanzhi Li, Qiong Zhang, Luo Si. 2019. Rumor Detection by Exploiting User Credibility Information, Attention and Multi-task Learning. ACL, 2019. (short paper)
- Xinyu Duan, Yating Zhang, Lin Yuan, Xin Zhou, Xiaozhong Liu, Tianyi Wang, Ruocheng Wang, Qiong Zhang, Changlong Sun, Fei Wu. 2019. Legal Summarization for Multi-role Debate Dialogue via Controversy Focus Mining and Multi-task Learning. CIKM, 2019. (regular long paper)
- Zhuoren Jiang, Jian Wang, Lujun Zhao, Changlong Sun, Yao Lu, and Xiaozhong Liu. 2019. Cross-domain Aspect Category Transfer and Detection via Traceable Heterogeneous Graph Representation Learning. CIKM, 2019. (regular long paper)
- Kaisong Song, Lidong Bing, Wei Gao, Jun Lin, Lujun Zhao, Jiancheng Wang, Changlong Sun, Xiaozhong Liu, Qiong Zhang. 2019. Using Customer Service Dialogues for Satisfaction Analysis with Context-Assisted Multiple Instance Learning. EMNLP, 2019. (regular long paper)
- Zhuoren Jiang, Zhe Gao, Guoxiu He, Yangyang Kang, Changlong Sun, Qiong Zhang, Luo Si, Xiaozhong Liu. 2019. Detect Camouflaged Spam Content via StoneSkipping: Graph and Text Joint Embedding for Chinese Character Variation Representation. EMNLP, 2019. (regular long paper)
- Jingjing Wang, Changlong Sun, Shoushan Li, Jiancheng Wang, Luo Si, Min Zhang, Xiaozhong Liu, Guodong Zhou. 2019. Human-Like Decision Making: Document-level Aspect Sentiment Classification via Hierarchical Reinforcement Learning. EMNLP, 2019. (regular long paper)
- Yingchi Liu, Quanzhi Li, Xiaozhong Liu, Qiong Zhang, Luo Si. 2019. Sexual Harassment Story Classification and Key Information Identification. CIKM, 2019. (short paper)
- Yingchi Liu, Quanzhi Li, Marika Cifor, Xiaozhong Liu, Qiong Zhang, Luo Si. 2019. Uncover Sexual Harassment Patterns from Personal Stories by Joint Key Element Extraction and Categorization. EMNLP, 2019. (regular long paper)
- Quanzhi Li, Qiong Zhang, Luo Si. 2019. eventAI at SemEval-2019 Task 7: Rumor Detection on Social Media by Exploiting Content, User Credibility and Propagation Information. SemEval@NAACL-HLT, 2019. (short paper)
- Quanzhi Li, Qiong Zhang, Luo Si. 2019. TweetSenti: Target-dependent Tweet Sentiment Analysis. WWW, 2019. (short paper)
- Jingjing Li, Yifan Gao, Lidong Bing, Irwin King, Michael R. Lyu. 2019. Improving Question Generation With to the Point Context. The 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP'19), 2019.
- Linlin Liu, Xiang Lin, Shafiq Joty, Simeng Han, Lidong Bing. 2019. Hierarchical Pointer Net Parsing. The 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP'19), 2019.
- Mingyue Shang, Piji Li, Zhenxin Fu, Lidong Bing, Dongyan Zhao, Shuming Shi, Rui Yan. 2019. Semi-supervised Text Style Transfer: Cross Projection in Latent Space. The 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP'19), 2019.
- Zihao Wang, Kwunping Lai, Piji Li, Lidong Bing, Wai Lam. 2019. Tackling Long-Tailed Relations and Uncommon Entities in Knowledge Graph Completion. The 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP'19), 2019.
- Zheng Li, Xin Li, Ying Wei, Lidong Bing, Yu Zhang, Qiang Yang. 2019. Transferable End-to-End Aspect-based Sentiment Analysis with Selective Adversarial Learning. The 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP'19), 2019.
- Chuang Fan, Hongyu Yan, Jiachen Du, Lin Gui, Lidong Bing, Min Yang, Ruifeng Xu, Ruibin Mao. 2019. A Knowledge Regularized Hierarchical Approach for Emotion Cause Analysis. The 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP'19), 2019.
- Ran Le, Wenpeng Hu, Mingyue Shang, Zhenjun You, Lidong Bing, Dongyan Zhao, Rui Yan. 2019. Who Is Speaking to Whom? Learning to Identify Utterance Addressee in Multi-Party Conversations. The 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP'19), 2019.
- Xin Li, Lidong Bing, Wenxuan Zhang, Wai Lam. 2019. Exploiting BERT for End-to-End Aspect-Based Sentiment Analysis. EMNLP Workshop W-NUT, 2019.
- Yifan Gao, Lidong Bing, Wang Chen, Irwin King, Michael R. Lyu. 2019. Difficulty Controllable Generation of Reading Comprehension Questions. The 28th International Joint Conference on Artificial Intelligence (IJCAI'19), 2019.
- Wang Chen, Hou Pong Chan, Piji Li, Lidong Bing, Irwin King. 2019. An Integrated Approach for Keyphrase Generation via Exploring the Power of Retrieval and Extraction. The 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT'19), 2019.
- Piji Li, Zihao Wang, Lidong Bing, Wai Lam. 2019. Persona-Aware Tips Generation. The Web Conference (WWW'19), 2019.
- Yifan Gao, Lidong Bing, Piji Li, Irwin King, Michael R. Lyu. Generating Distractors for Reading Comprehension Questions from Real Examinations. The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI'19), 2019.
- Xin Li, Lidong Bing, Piji Li, Wai Lam. 2019. A Unified Model for Opinion Target Extraction and Target Sentiment Prediction. The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI'19), 2019.
- Juntao Li, Lidong Bing, Lisong Qiu, Min Chen, Dongyan Zhao, Rui Yan. 2019. Learning to Write Stories with Thematic Consistency and Wording Novelty. The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI'19), 2019.
- Shen Gao, Xiuying Chen, Piji Li, Zhaochun Ren, Lidong Bing, Dongyan Zhao, Rui Yan. 2019. Abstractive Text Summarization by Incorporating Reader Comments. The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI'19), 2019.
- Yan Fan, Chengyu Wang, Boxing Chen, Zhongkai Hu, Xiaofeng He. 2019. SPMM: A Soft Piecewise Mapping Model for Bilingual Lexicon Induction. SDM, 2019.
- Kai Song, Yue Zhang, Heng Yu, Weihua Luo, Kun Wang, Min Zhang. 2019. Code-Switching for Enhancing NMT with Pre-Specified Translation. NAACL, 2019. (Regular Long Paper)
- Xiangyu Duan, Mingming Yin, Min Zhang, Boxing Chen, Weihua Luo. 2019. Zero-Shot Cross-Lingual Abstractive Sentence Summarization through Teaching Generation and Attention. ACL, 2019. (Regular Long Paper)
- Long Zhou, Jiajun Zhang, Chengqing Zong, Heng Yu. 2019. Sequence Generation: From Both Sides to the Middle. IJCAI, 2019.
- Nguyen Bach and Fei Huang. 2019. Noisy BiLSTM-Based Models for Disfluency Detection. Interspeech, 2019.
- Yuanhang Su, Kai Fan, Nguyen Bach, C.-C. Jay Kuo, Fei Huang. 2019. Unsupervised Multi-modal Neural Machine Translation. CVPR, 2019. (long)
- Kai Fan, Jiayi Wang, Bo Li, Fengming Zhou, Boxing Chen, Luo Si. 2019. "Bilingual Expert" Can Find Translation Errors. In Proceedings of AAAI. Hawaii. Jan. 2019. (Long)
- Shanchan Wu, Kai Fan, Qiong Zhang. 2019. Improving distantly supervised relation extraction with neural noise converter and conditional optimal selector. In Proceedings of AAAI. Hawaii. Jan. 2019. (Long)
- Pei Zhang, Boxing Chen, Niyu Ge, Kai Fan. 2019. Lattice Transformer for Speech Translation. In Proceedings of ACL. Florence, Italy. July 2019. (Long)
- Ruiying Geng, Binhua Li, Yongbin Li, Xiaodan Zhu, Ping Jian and Jian Sun. 2019. Induction Networks for Few-Shot Text Classification. International Conference on Empirical Methods in Natural Language Processing (EMNLP 2019), Hong Kong, China.
- Yuxiao Ye, Weikang Li, Yue Zhang, Likun Qiu, Jian Sun. 2019. Improving Cross-Domain Chinese Word Segmentation with Word Embeddings. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2019), Volume 1.
- Ruiying Geng, Binhua Li, Yongbin Li, Yuxiao Ye, Ping Jian, Jian Sun. 2019. Few-Shot Text Classification with Induction Network. arXiv:1902.10482
- Mingdong Ou, Nan Li, Shenghuo Zhu, Rong Jin. 2018. Multinomial logit bandit with linear utility functions. IJCAI, 2018.
- Luo Si, Min Zhang, Guodong Zhou. 2018. Sentiment Classification towards Question-Answering with Hierarchical Matching Network. EMNLP, 2018.
- Yongzhen Wang, Xiaozhong Liu, Zheng Gao. 2018. Neural Related Work Summarization with a Joint Context-driven Attention Mechanism. EMNLP, 2018.
- Xiangju Li, Kaisong Song, Shi Feng, Daling Wang, Yifei Zhang. 2018. A Co-attention Neural Network Model for Emotion Cause Analysis with Emotional Context Awareness. EMNLP, 2018.
- Yingchi Liu, Quanzhi Li, Luo Si. 2018. NAI-SEA at SemEval-2018 Task 5: An Event Search System. SemEval@NAACL-HLT, 2018.
- Yingchi Liu, Quanzhi Li, Xiaozhong Liu, Luo Si. 2018. Document Information Assisted Event Trigger Detection. BigData, 2018.
- Kai Song, Yue Zhang, Min Zhang, Weihua Luo. 2018. Improved English to Russian Translation by Neural Suffix Prediction. AAAI, 2018.
- Shaohui Kuang, Junhui Li, Antonio Branco, Weihua Luo, Deyi Xiong. 2018. Attention Focusing for Neural Machine Translation by Bridging Source and Target Embeddings. ACL, 2018.
- Nguyen Bach, Hongjie Chen, Kai Fan, Cheung-Chi Leung, Bo Li, Chongjia Ni, Rong Tong, Pei Zhang, Boxing Chen, Bin Ma, Fei Huang. 2018. Alibaba Speech Translation Systems. IWSLT 2018.
- Jiayi Wang, Kai Fan, Bo Li, Fengming Zhou, Boxing Chen, Yangbin Shi & Luo Si. 2018. Alibaba Submission for WMT18 Quality Estimation Task. In: Proceedings of the Third Conference on Machine Translation (WMT). Brussels, Belgium. 2018
- Jingang Wang, Junfeng Tian, Long Qiu, Sheng Li, Jun Lang, Luo Si, Man Lan. A Multi-task Learning Approach for Improving Product Title Compression with User Search Log Data. AAAI, 2018.
- Xinzhou Jiang, Zhenghua Li, Bo Zhang, Min Zhang, Sheng Li and Luo Si. Supervised Treebank Conversion: Data and Approaches. ACL, 2018.
- Wei Wang, Ming Yan and Chen Wu. Multi-Granularity Hierarchical Attention Fusion Networks for Reading Comprehension and Question Answering. ACL, 2018.
- YaoBo Ni, Dan Ou, Shichen Liu, Xiang Li, Wenwu Ou, Luo Si. Perceive Your Users in Depth: Learning Universal User Representations from Multiple E-commerce Tasks. KDD, 2018.
- Jingjing Wang, Jie Li, Shoushan Li, Yangyang Kang, Min Zhang, Luo Si, Guodong Zhou. Aspect Sentiment Classification with both Word-level and Clause-level Attention Networks. IJCAI, 2018.
- Lu Wang, Shoushan Li, Changlong Sun, Xiaozhong Liu, Luo Si, Min Zhang and Guodong Zhou. One vs. Many QA Matching with both Word-level and Sentence-level Attention Network. COLING, 2018.
- Zhuoren Jiang, Yue Yin, Liangcai Gao, Yao Lu and Xiaozhong Liu. Cross-language Citation Recommendation via Hierarchical Representation Learning on Heterogeneous Graph. SIGIR, 2018.
- Chen Wu, Ming Yan, Luo Si. Session-aware Information Embedding for E-commerce Product Recommendation (Short). ACM CIKM, 2017.
- Shichen Liu, Fei Xiao, Wenwu Ou, Luo Si. Cascade Ranking for Operational E-commerce Search. KDD, 2017.
- In 2018, the Language Technology Lab won five first-place awards for automatic machine translation evaluation at the Workshop on Machine Translation (WMT). The lab also won first place in six translation quality evaluation sub-tasks.
- At the 2018 International Workshop on Semantic Evaluation (SemEval), the Language Technology Lab won contests in event extraction, semantic extraction, and hypernym-hyponym discovery.
- In 2018, the Language Technology Lab ranked first in the TriviaQA Web question answering challenge hosted by the University of Washington.
- In 2018, Alibaba's machine reading technology surpassed human results for the first time in the SQuAD machine reading comprehension competition organized by Stanford University.
- In 2017, the Language Technology Lab took first place at all three levels of the Chinese Grammatical Error Diagnosis Contest.
- In 2017, the Language Technology Lab took first place in the English Division of the Categorized Competition for Information Extraction, organized by the US National Institute of Standards and Technology.