Research Focus
  • Next Generation Multi-scenario and Multi-modal Heterogeneous Computing Engines

Our research focuses on fusing and unifying various computing paradigms such as batch processing, streaming, and interactive analysis, by developing new technologies such as approximate and progressive query processing. This will be utilized to empower next-generation computing engines used by real-world applications at Alibaba, and to facilitate efficient execution of hybrid workloads including traditional data analytics, graph computing, and machine learning.

Meanwhile, the lab studies how to leverage and integrate high-performance heterogeneous hardware, such as GPU, FPGA, and ASIC, with general-purpose computing engines to enhance their power. This will be utilized to better meet the requirements of data-intensive computation such as artificial intelligence and large-scale data analytics.

  • Algorithms and Applications of Large-scale Diverse Data Mining and Machine Learning

Our research focuses on efficient data mining and machine learning algorithms for large-scale heterogeneous data, such as structured data, graph data, and information networks. We also explore and fuse new techniques in the fields of large-scale graph representation learning and knowledge bases, and apply them to scenarios such as online fraud prevention, recommendation systems, and search.

  • Smart and Autonomous Systems

Our research focuses on powering data management and processing systems with artificial intelligence technology: artificial-intelligence techniques will be used in data-warehouse management, resource scheduling, and engine optimization to strengthen the systems and make them smarter, safer, and more reliable. Moreover, system technology will be used to assist model selection and hyperparameter search in artificial intelligence and automate machine learning.

  • Data Security and Privacy Protection

Our research focuses on providing data security and protecting users’ privacy more effectively during different stages of the data pipeline. These stages include data collection, sharing, processing, and analytics, which may leak personal and sensitive information. We aim to provide acceptable data utility under strong security/privacy guarantees.

  • Hyper-scale Graph Computing

Our research focuses on large-scale graph representation learning and graph-based knowledge base technologies, as well as the underlying hyper-scale graph computing engines and hyper-scale knowledge base inference systems, with the purpose of contributing to the fields of information retrieval, distributed computing, large-scale system design, machine learning, artificial intelligence, and natural language processing.


Products and Applications
  • Hyper-scale Graph Inference Engines

    The data in the Alibaba ecosystem is extremely rich and varied, covering everything from shopping and travel to entertainment and payment. Graph inference combined with deep learning has already delivered promising initial results in many of Alibaba's business scenarios. Large-scale graph representations are emerging as auxiliary information that makes effective use of cross-domain data, giving us a better understanding of consumers' needs in different business scenarios. We are developing a new generation of graph learning platform that can efficiently perform inference analysis on billions of nodes and trillions of edges.

     

  • Solutions to E-commerce Fraud Detection

    Internet fraud schemes emerge in an endless stream. Alibaba is a frequent target of such attacks, and anti-fraud has become one of its most important tasks. Every day, Alibaba identifies tens of millions of highly suspicious devices along with their traffic. Fraud detection can be roughly divided into two directions: channel-device anti-fraud and traffic anti-fraud. The main task of channel-device anti-fraud is to identify suspicious emulators, device farms, and similar sources. We extract sparse and dense device features from various logs and have adapted Google's Wide & Deep model to our business scenarios, which lets us identify millions of fraudulent devices with high confidence every day. Traffic anti-fraud is more closely tied to business scenarios. By modelling suspicious traffic as a whole, enriched with additional information, we can improve model capability and identify suspicious devices effectively. To this end, we have proposed a series of graph models that detect and intercept millions of cheating cookies with high accuracy from hyper-scale traffic logs every day.


Research Team
Jingren Zhou, Head of Data Analytics and Intelligence Lab

He holds a Ph.D. in Computer Science from Columbia University. From 2004 to 2015, he served as a researcher at Microsoft Research and an R&D partner at Microsoft. Dr. Zhou has published dozens of papers in top conferences and journals in the fields of large-scale distributed systems, query processing and the optimization of distributed databases, and holds several patented inventions of key technologies in the industry. His current research directions are data processing methods based on large-scale distributed systems and machine learning algorithm platforms.

Bolin Ding, Senior Staff Engineer of Data Analytics and Intelligence Lab

Dr. Bolin Ding completed his Ph.D. in Computer Science at the University of Illinois at Urbana-Champaign. His research focuses on the management and analytics of large-scale data, including real-time approximate query algorithms and systems, data privacy protection, query processing and optimization algorithms, and algorithms and applications of data mining and machine learning. Prior to joining Alibaba, he worked as a researcher at Microsoft Research. He holds more than 10 US patents. He received the 2017 Technical Excellence Award from Microsoft Privacy for his contributions to the research and deployment of data privacy techniques. He has published more than 50 papers in top conferences and journals in related areas, including SIGMOD, VLDB, ICDE, KDD, CHI, AAAI, and NIPS.

Zhengping Qian, Senior Staff Engineer of Data Analytics and Intelligence Lab

He is a Director in the Computing Platform Team at Alibaba, where he is responsible for driving the development of new systems and business solutions for emerging applications from both inside and outside Alibaba, such as low-latency graph analytics and machine learning. Before joining Alibaba in 2015, he was a Lead Researcher at Microsoft Research. His research interests are in distributed and data-parallel computing. He has published papers in top systems conferences (including OSDI, NSDI, EuroSys, and VLDB) and received the Best Paper Award from EuroSys 2012. Dr. Qian received his Ph.D. in Computer Science from South China University of Technology in 2009.

Hongxia Yang, Senior Staff Data Scientist of Data Analytics and Intelligence Lab

She received her Ph.D. in Statistics from Duke University in 2010. Her interests span Bayesian statistics, time series analysis, spatial-temporal modeling, survival analysis, machine learning, and data mining, together with their applications to problems in business analytics and big data. She previously worked as a Principal Data Scientist at Yahoo! Inc. and as a Research Staff Member at IBM T.J. Watson Research Center. She has published over 40 papers in top conferences and journals and serves as an associate editor for Applied Stochastic Models in Business and Industry. She was named an Elected Member of the International Statistical Institute (ISI) in 2017.

Kai Zeng, Staff Engineer of Data Analytics and Intelligence Lab

Dr. Zeng received his Ph.D. in Computer Science from the University of California, Los Angeles. Before joining Alibaba, he was a Senior Scientist at Microsoft's Cloud and Information Services Lab, and before that a postdoctoral researcher at AMPLab, University of California, Berkeley. His research centers on large-scale distributed systems and database systems. He has published papers in top database journals and conferences, including SIGMOD, VLDB, ICDE, and TODS. He received the SIGMOD Best Paper Award in 2012 and the SIGMOD Best Demonstration Award in 2014, and was nominated for the Best Demonstration Award in 2010.


Academic Achievements
Papers
  • Zemin Liu, Vincent W. Zheng, Zhou Zhao, Hongxia Yang, Kevin Chen-Chuan Chang, Minghui Wu, Jing Ying. Subgraph-augmented Path Embedding for Semantic User Search on Heterogeneous Social Network. WWW, 2018.
  • Zhen Zhang, Hongxia Yang, Jiajun Bu, Sheng Zhou, Pinggang Yu, Jianwei Zhang, Martin Ester, Can Wang. ANRL: Attributed Network Representation Learning via Deep Neural Networks. IJCAI, 2018.
  • Ninghao Liu, Hongxia Yang, Xia Hu. Adversarial Detection with Model Interpretation. KDD, 2018.
  • Dawei Zhou, Jingrui He, Hongxia Yang, Wei Fan. SPARC: Self-Paced Network Representation for Few-Shot Rare Category Characterization. KDD, 2018.
  • Shen Xin, Weizhao Xian, Martin Ester, Hongxia Yang, Zhongyao Wang, Jiajun Bu, Can Wang. Mobile Access Record Resolution on Large-scale Identifier-linkage Graphs. KDD, 2018.
  • Zemin Liu, Vincent W. Zheng, Zhou Zhao, Zhao Li, Hongxia Yang, Minghui Wu, Jing Ying. Interactive Paths Embedding for Semantic Proximity Search on Heterogeneous Graphs. KDD, 2018.
  • Xiafei Qiu, Wubin Cen, Zhengping Qian, You Peng, Ying Zhang, Xuemin Lin, Jingren Zhou. Real-time Constrained Cycle Detection in Large Dynamic Graphs. 44th International Conference on Very Large Data Bases (VLDB), 2018.
  • Sheng Zhou, Hongxia Yang, Martin Ester, Jiajun Bu, Pinggang Yu, Can Wang, Jianwei Zhang, Xin Wang. PRRE: Personalized Relation Ranking Embedding for Attributed Network. 27th ACM International Conference on Information and Knowledge Management (CIKM), 2018.
  • Hongxia Yang, Yada Zhu, Jingrui He. Local Algorithm for User Action Prediction Towards Display Ads. 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2017.
  • Chenglong Wang, Feijun Jiang, Hongxia Yang. Hybrid Framework for Text Modeling with Convolutional RNN. 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2017.
  • Hongxia Yang. Bayesian Heteroscedastic Matrix Factorization for Conversion Rate Prediction. 26th ACM International Conference on Information and Knowledge Management (CIKM), 2017.
  • Hong Huang, Yuxiao Dong, Jie Tang, Hongxia Yang, Nitesh V. Chawla, Xiaoming Fu. Will Triadic Closure Strengthen Ties in Social Networks? ACM Transactions on Knowledge Discovery from Data (TKDD), 2017.