- NLP Foundational Technologies
NLP foundational technologies are the basis on which more advanced NLP systems are built. The team works on topics such as multilingual tokenization, part-of-speech tagging, named entity recognition, information extraction, spell check/grammatical error correction, syntax and semantic analysis, deep learning for language modeling, semantic representation and similarity, and text summarization. Through the AliNLP platform, the NLP foundational technologies team provides solutions to thousands of business scenarios within the Alibaba ecosystem. Coupled with technologies such as search, recommendation, question answering, and knowledge graphs, NLP foundational technologies have growing influence in sectors such as address identification, security, healthcare, energy, and customs through Alibaba Cloud. The application of these technologies also promotes business value and expands the commercial availability of NLP technologies.
- Translation Technology
The translation technology team is responsible for building the translation infrastructure in multiple languages to facilitate the globalization strategy of Alibaba. The systems and services developed by the team have been widely used in the cross-border business of Alibaba, such as Aliexpress.com, Alibaba.com, and Lazada.com. The team aims to conduct research on innovative multilingual processing technologies powered by cutting-edge AI technologies. The platform allows users to tailor services to their scenarios, which delivers quick and cost-efficient language solutions and helps break down the language barrier for sectors such as e-commerce, education, and healthcare.
- Multilingual Technology (Singapore)
The multilingual technology team focuses on multilingual and cross-lingual technologies, such as basic NLP technologies for Southeast/South Asian languages, cross-domain/lingual adaptation, self-supervised learning, and low-resource NLP problems. Multilingual technologies, such as multilingual named entity recognition, tokenization and word segmentation of Thai and Vietnamese, multilingual sentiment analysis, and multilingual address processing, empower multiple globalization businesses within Alibaba, such as Lazada, Daraz, cloud telecom, and DingTalk. These technologies also empower regional Alibaba Cloud teams and provide AI-based value-added capabilities.
- Conversational AI
The Conversational AI team is dedicated to the innovative research and broad application of human-machine dialog technologies. It has designed and developed the Dialog Studio platform, a dialog development platform for developers, along with intelligent question answering technologies such as the knowledge-based question answering (KBQA) engine, the TableQA engine, and FAQ answering. The team has made significant headway in technologies such as natural language understanding, multi-turn dialog management, transfer learning, and knowledge graph-based question answering. Dialog Studio and the intelligent customer service developed by the team are widely used not only in business on Taobao and Tmall, but also in enterprises from many fields. The application of these technologies has helped Alibaba Cloud become a leader in the intelligent customer service market.
- Applied Algorithm
The applied algorithm team empowers key businesses inside and outside Alibaba with core technologies in information extraction, text classification, text summarization, text generation, semantic understanding, active learning, sentiment analysis, and anti-spam text. These technologies are used in important sectors (e.g., justice, telecommunication, government affairs, education, and finance) and scenarios (e.g., contract signing, telemarketing, public opinion monitoring, and reviewing). The team has developed its own NLP self-learning platform and is constantly making technological breakthroughs, unlocking business value, and empowering commercialization.
- Semantic Analysis and Matching
The team provides services to business platforms inside and outside of the Alibaba economy through semantic analysis and matching technologies. Technologies such as multimodal image and text recognition, pricing recommendations, title optimization, and communication assistance are typically used in Xianyu, a secondhand goods marketplace. These technologies are also used in AE and AIRec. The team focuses on technologies such as dialog generation, text generation, deep learning for language modeling, multimodal content understanding, and search recommendation.
- Document Translation
Document translation is built upon the translation model developed by DAMO Academy, which provides more accurate translations based on tags. The service can deliver translations with a high degree of consistency even with complex layouts, where text contained in tables and figures is accurately identified, translated, and formatted. The service can parse over 50 document formats, including DOC, PPT, XLS, PDF, and HTML.
- Multimodal Translation
To translate multimodal information such as text, speech, images, and videos, DAMO Academy combines cutting-edge algorithms and technologies such as speech recognition, OCR, NLP, machine translation, computer vision, and smart layout and image synthesis to implement cross-lingual and cross-modal conversion of multimodal content from multiple sources. The multimodal translation capabilities have been used in scenarios such as cross-border e-commerce, multilingual conferences, multilingual video subtitles, cross-border trips, and translation of documents and certificates. The Language Technology Lab also developed the world's first e-commerce livestream translation system, which has been deployed on AliExpress for multiple language directions.
- Intelligent Justice
Built on legal knowledge graphs, Intelligent Justice is an open platform for litigation and non-litigation scenarios. The platform uses NLP as the core technology. It is designed to help legal practitioners such as judges, prosecutors, lawyers, and corporate law professionals familiarize themselves with international law and supplement their legal knowledge. The platform provides intelligent case-handling assistance in litigation scenarios that involve case filing, trial, and execution. It can also be used for litigation risk assessment, search and recommendation of relevant cases and regulations, assistance and prediction in conviction and sentencing, reasoning and generation of dispute focus, and parsing and generation of judgment documents. In non-litigation scenarios, the platform provides intelligent contract management, such as contract information extraction, contract review, contract comparison, contract summary, and risk analysis of relevant parties. The Intelligent Justice platform has been used in tertiary courts, medium-sized enterprises, and large enterprises. The platform has significantly improved the case-handling efficiency of these entities by delivering more accurate, standardized, and intelligent legal knowledge services. It also contributes to the improvement of judicial justice and the business environment.
- Cloud Customer Service System Based on Conversational AI
The Language Technology Lab turns its advanced research results in NLP and human-machine dialog, as well as Alibaba's experience in customer service, into the Cloud Customer Service System. The Cloud Customer Service System is an industrial human-machine solution and an intelligent service matrix that serves enterprise customers. The system helps enterprises build and run intelligent customer service systems that offer cost-efficient interaction capabilities in natural language. By using the system, enterprises and their customers can communicate around the clock. The core capabilities developed by the Language Technology Lab include Dialog Studio, TableQA, FAQ, and KBQA. Dialog Studio achieved two significant breakthroughs: from shallow understanding to deep understanding, and from rule-based finite state machines to data-driven deep dialog management models. The TableQA question answering engine ranks first in the cross-domain Semantic Parsing in Context (SParC) and Conversational Text-to-SQL (CoSQL) challenges jointly organized by Yale University and Salesforce. The Cloud Customer Service System is used in sectors such as government affairs, telecommunications, banking, and insurance. For example, the Interactive Voice Response (IVR) robot handles 150 million calls for China Mobile on an annual basis, which greatly reduces labor costs and provides better service for customers. Furthermore, during the COVID-19 pandemic, the robot made 20 million calls to aid in pandemic control, which helped take the pressure off first responders.
Luo Si is one of the first AI scientists who have shifted from academia to the industrial community. Prior to his post at Alibaba Cloud, he was a tenured professor of the Department of Computer and Information Technology at Purdue University. He managed 20 research projects, all of which received funds from the U.S. government or the industrial community. He is a recipient of multiple awards from institutions such as the National Science Foundation, Yahoo, and Google. He is the author of more than 150 academic publications that are extensively cited. He worked as an associate editor for Transactions on Information Systems (TOIS) and Transactions on Interactive Intelligent Systems (TIIS) of ACM as well as the Information Processing & Management journal. He assumed important roles in international academic conferences, such as Research Track Program Chair of ACM Conference on Information and Knowledge Management (CIKM) 2016. He holds a master's degree and a bachelor's degree in computer science from Tsinghua University, and holds a PhD from Carnegie Mellon University. He joined Alibaba Cloud as an AI scientist in 2014 and has led the NLP team to achieve significant technological milestones.
Fei Huang is a research scientist and senior director at the Language Technology Lab, leading the NLP foundational technologies, dialog technologies, and innovative translation teams. He holds a PhD from the School of Computer Science at Carnegie Mellon University. Before joining Alibaba, he worked on NLP R&D and technical management at IBM and Facebook. He has published more than 40 papers in top NLP- and AI-related conferences and journals, and has been awarded dozens of U.S. patents. He has served as an area chair and reviewer for several international conferences and journals on NLP.
Niyu Ge holds a PhD in computational linguistics from Brown University. Her fields of research include mathematical models for syntax, semantics, and pragmatics, as well as machine translation for a number of languages such as Arabic, Chinese, English, French, Spanish, German, Italian, Portuguese, and Russian. She previously worked on natural language processing and machine translation at IBM Research.
Weihua Luo is the director of the translation platform team at the Language Technology Lab, and is responsible for the R&D of intelligent translation technologies that facilitate the international business of Alibaba. He has headed over a dozen research projects funded by the National Natural Science Foundation and has published more than 30 papers in top international conferences and journals. He is a recipient of the Zhejiang Province Science and Technology Award and the Beijing Science and Technology Award. He has served on the program committees or organization committees of conferences such as SIGIR, ACL, NAACL, NLPCC, and CWMT. Currently, he serves as a committee member of multiple institutions such as the Chinese Information Processing Society of China and the Chinese Association for Artificial Intelligence.
Songfang Huang is responsible for the R&D of large-scale pre-trained language models and their applications in sectors such as healthcare and electricity. He holds a PhD from the University of Edinburgh. Before joining Alibaba, he worked at IBM T.J. Watson Research Center and IBM China Research Laboratory for over a decade. His research focuses on speech and language signal processing. In his previous roles, he was involved in research projects on speech-to-speech machine translation, language models in speech recognition, natural language understanding, and question answering systems. He also implemented such technologies in sectors such as healthcare, finance, and media. He has published dozens of papers in international conferences and journals and is a recipient of the Best Paper Award at ICASSP 2010. From August 2017 to November 2018, he served as an assistant to the director of IBM China Research Laboratory, focusing on the strategy formulation and routine management of the laboratory.
Changlong Sun joined Alibaba in 2011 and made significant contributions to the development of the algorithm architecture for Search Navigation and Mobile Taobao Search Recommendations. He currently leads the development of applied algorithms for NLP. He holds multiple patents, has published more than 20 papers in top academic conferences, and heads two National Key R&D Projects of China. His research fields include sentiment analysis, information extraction, dialog understanding, and text generation. In an effort to apply technologies to business, he has worked extensively on intelligent applications in fields such as justice, contracts, and education. The first intelligent trial system he invented is already in use in multiple courts. The first human-computer competition on intelligent contract review, which he initiated among Chinese universities, drew widespread attention. He also built an NLP self-learning platform to empower more industries and scenarios.
Jian Sun holds a PhD in signal and information processing from Beijing University of Posts and Telecommunications. He has been the head of the human-machine dialog system at Alibaba Cloud since 2014. From 2014 to 2017, he designed the intelligent voice assistant for the YunOS mobile operating system, which powered many kinds of terminals such as mobile phones, TVs, automobiles, and smart speakers. He set up the Beijing AlimeBot technical team in July 2017 and led the team to develop the technical architecture of question answering systems such as KBQA, FAQ, and TableQA, as well as Dialog Studio, a goal-oriented developer system for human-machine dialogs. He is now a senior expert and technical director for conversational AI at the Language Technology Lab. Outside Alibaba, he serves in the Chinese Association for Artificial Intelligence and the Chinese Information Processing Society of China, and is a reviewer for top international conferences such as ACL 2020, COLING 2020, and EMNLP 2020.
Yongbin Li holds an M.S. degree in automation from Tsinghua University. His research interests include NLP and conversational AI. He led the development of the Alibaba Word Segmenter (AliWS), which received the Alibaba Top 10 Algorithm Award in 2015. In recent years, he has devoted himself to research on conversational AI and developed Dialog Studio, a platform for developers to build intelligent dialog agents quickly and easily from the ground up. Dialog Studio has provided human-machine dialog services for a wide range of large-scale businesses such as Ali Cloud Intelligent Robot and DingTalk, and for internal business units of the Alibaba economy such as Taobao Mobile. During the COVID-19 outbreak, he built China's largest intelligent outbound call platform based on Dialog Studio, and was awarded the first prize by The People's Daily for his contribution and efforts in controlling the pandemic. Recently, he has been exploring table-based question answering technologies and has won first place in many international challenges such as WikiSQL, SParC, and CoSQL. He has published several papers in top international conferences, largely in the fields of natural language understanding, dialog management, and question answering.
Yu Zhao joined Alibaba in 2009. He was an architect and one of the founders of Alimama. He participated in the development of multiple important product lines such as Taobao Alliance, P4P, CPM, and Wireless. He leads the data team of translation and natural language projects. The team works on the technical foundation and application R&D for translation and NLP to build the basic data architecture for the projects.
Zhongqiang Huang holds a PhD in computer science from the University of Maryland and is responsible for the R&D of technologies in machine translation for communication scenarios, multilingual natural language processing, and multilingual search relevance. He was a senior scientist at Raytheon BBN Technologies and participated in government-funded natural language research programs such as GALE, BOLT, and LORELEI sponsored by DARPA and IARPA. His research areas in artificial intelligence include machine translation and the broader subjects of natural language processing. He has co-authored dozens of papers in academic conferences such as ACL, EMNLP, and NAACL.
Jun Xie holds a PhD from the Institute of Computing Technology of the Chinese Academy of Sciences. His research focuses on NLP, machine translation, and dialog systems. He has published more than 20 papers in top academic conferences such as ACL, EMNLP, COLING, and AAAI, and was involved in multiple research projects, including those funded by the National High-tech R&D Program (863 Program) and the National Natural Science Foundation. Earlier in his career, he worked for the Institute of Computing Technology of the Chinese Academy of Sciences, Samsung R&D Institute China, and Tencent, where he served as a technical director for the R&D of commercial dialog systems and machine translation systems.
Boxing Chen holds a PhD from the Chinese Academy of Sciences. Prior to his role at Alibaba, he worked as a researcher at the Institute for Infocomm Research, Singapore, and at the National Research Council of Canada. He has published more than 60 papers in academic conferences and journals. He is a recipient of the Best Paper Award at MT Summit 2013 and was nominated for the Best Paper Award at ACL 2013. He has served as an area chair for ACL and EMNLP. His research fields include machine translation, NLP, and machine learning.
- Nguyen Bach, Hongjie Chen, Kai Fan, Cheung-Chi Leung, Bo Li, Chongjia Ni, Rong Tong, Pei Zhang, Boxing Chen, Bin Ma, Fei Huang. 2018. Alibaba Speech Translation Systems. IWSLT 2018.
- Jiayi Wang, Kai Fan, Bo Li, Fengming Zhou, Boxing Chen, Yangbin Shi and Luo Si. 2018. Alibaba Submission for WMT18 Quality Estimation Task. In: Proceedings of the Third Conference on Machine Translation (WMT). Brussels, Belgium.
- Jingang Wang, Junfeng Tian, Long Qiu, Sheng Li, Jun Lang, Luo Si, Man Lan. A Multi-task Learning Approach for Improving Product Title Compression with User Search Log Data. AAAI, 2018.
- Kai Song, Yue Zhang, Min Zhang, Weihua Luo. Improved English to Russian Translation by Neural Suffix Prediction. AAAI, 2018.
- Xinzhou Jiang, Zhenghua Li, Bo Zhang, Min Zhang, Sheng Li and Luo Si. Supervised Treebank Conversion: Data and Approaches. ACL, 2018.
- Shaohui Kuang, Junhui Li, António Branco, Weihua Luo and Deyi Xiong. Attention Focusing for Neural Machine Translation by Bridging Source and Target Embeddings. ACL, 2018.
- Wei Wang, Ming Yan and Chen Wu. Multi-Granularity Hierarchical Attention Fusion Networks for Reading Comprehension and Question Answering. ACL, 2018.
- YaoBo Ni, Dan Ou, Shichen Liu, Xiang Li, Wenwu Ou, Luo Si. Perceive Your Users in Depth: Learning Universal User Representations from Multiple E-commerce Tasks. KDD, 2018.
- Jingjing Wang, Jie Li, Shoushan Li, Yangyang Kang, Min Zhang, Luo Si, Guodong Zhou. Aspect Sentiment Classification with both Word-level and Clause-level Attention Networks. IJCAI, 2018.
- Lu Wang, Shoushan Li, Changlong Sun, Xiaozhong Liu, Luo Si, Min Zhang and Guodong Zhou. One vs. Many QA Matching with both Word-level and Sentence-level Attention Network. COLING, 2018.
- Zhuoren Jiang, Yue Yin, Liangcai Gao, Yao Lu and Xiaozhong Liu. Cross-language Citation Recommendation via Hierarchical Representation Learning on Heterogeneous Graph. SIGIR, 2018.
- Chen Wu, Ming Yan, Luo Si. Session-aware Information Embedding for E-commerce Product Recommendation (Short). ACM CIKM, 2017.
- Shichen Liu, Fei Xiao, Wenwu Ou, Luo Si. Cascade Ranking for Operational E-commerce Search. KDD, 2017.
- In 2018, the Language Technology Lab won five first-place awards for automatic machine translation evaluation at the Workshop on Machine Translation (WMT). They also won first prize for six sub-task translation quality evaluations.
- At the 2018 International Semantic Comprehension Evaluation Conference, the Language Technology Lab won contests in event extraction, semantic extraction, and hypernym and hyponym mining.
- In 2018, the Language Technology Lab ranked No. 1 in the question answering evaluation on TriviaQA Web, hosted by the University of Washington.
- In 2018, for the first time in history, Alibaba's machine reading technology surpassed human reading results in the well-known SQuAD Machine Reading Comprehension Competition organized by Stanford University.
- In 2017, the Language Technology Lab won all three levels of the Chinese Grammatical Error Diagnosis Contest.
- In 2017, the Language Technology Lab received first place in the English Division of the Categorized Competition for Information Extraction, organized by the US National Institute of Standards and Technology.