Research Focus
  • Image Understanding and Analysis

Research in this area focuses on technologies such as image classification, object detection, feature representation learning, keypoint extraction, and large-scale vector search engines. The goal is to address challenges in the search, recognition, and analysis of product images, general images, human faces and bodies, and images containing text.

  • Video Understanding and Mining

Research in this area focuses on technologies such as visual tracking, video tagging, and video generation. The goal is to address challenges in efficient, stable video monitoring and in searching and editing large-scale video collections.

  • Image and Text Analysis

Research in this area focuses on cross-media content analysis technologies, such as cross-media search, joint image-text multimedia search, and price prediction, to address a range of industry challenges.

  • 3D Vision

Research in this area focuses on technologies such as 3D scanning, point-cloud processing, texture mapping, and 3D classification, detection, and feature representation. The goal is to address challenges in generating, recognizing, and searching 3D models of small objects, the human body and body parts, and scenes.

  • Offline Intelligence

Research in this area focuses on technologies such as camera networks, sensor fusion, pedestrian tracking and identification, human pose estimation, and object detection and recognition. The goal is to address challenges in identity recognition, global people tracking, motion sequence analysis, product detection and recognition, and people-product binding.

Products and Applications
  • Pailitao and Image Search Cloud Product

    The Vision Lab develops industry-leading image search and recognition technologies and applies them to a variety of scenarios. Every day, more than 17 million people use Pailitao's search-by-image function through Taobao and Tmall. The lab also offers an image search product on Alibaba Cloud that provides complete search-by-image solutions for clients such as e-commerce platforms, photo databases, and image gallery websites. This cloud product has attracted many domestic and international customers, including THE ICONIC, a leading fashion and sports retailer in Australia and New Zealand.

  • 3D Smart Manufacturing

    The Vision Lab uses 3D vision technologies to provide customized, industry-specific solutions that create synergy among consumers, brands, and manufacturers. In the footwear industry, the lab's efficient and accurate 3D scanning and matching algorithms enable personalized shoe recommendations and support precision marketing. Other technologies, such as automatic shoe-last generation for manufacturers and intelligent data analysis, have helped clients lower the cost of customization and achieve precision manufacturing.

  • AI Solutions for Media

    The lab's multimodal artificial intelligence technologies for media help the traditional media industry improve efficiency and reduce costs by performing functions such as media monitoring, labeling, content generation, and copyright protection for multimedia data. The lab currently works with major domestic media organizations such as CCTV, Dongfang TV, and Xinhua News.

  • Digitization of Consumers, Products and Scenes in New Retail

    This solution uses camera-based sensors and vision technologies to renovate existing stores or build new unmanned stores for New Retail. It tracks and locates customers, identifies products at the SKU level, checks on-shelf product displays for compliance, and links consumers with the products they pick up. The overall goal is to digitize consumers, products, and scenes in stores, supermarkets, and hotels and to provide business analysis based on the digitized data. The solution is currently in use at Alibaba's Hema stores.

Research Team
Xiaofeng Ren, Head of Vision Lab

He is a visiting professor in the Department of Computer Science and Engineering at the University of Washington and holds a Ph.D. from the University of California, Berkeley. Before joining Alibaba, he was a Senior Director and scientist at Amazon, responsible for the research and development of vision algorithms for Amazon Go. His academic papers have been cited more than 10,000 times. He is also an area chair for the CVPR and ICCV conferences.

Pan Pan, Senior Staff Algorithm Engineer, Vision Lab

He holds a Ph.D. in Electrical and Computer Engineering from the University of Illinois at Chicago. He co-founded Pailitao, Alibaba's visual search product, and is its algorithm lead. His research interests include deep learning, image and video analysis, and 3D vision. He previously worked on vision technology research and development at Mitsubishi Electric Research Laboratories in the US and the Fujitsu Beijing R&D Center. He has published more than 20 papers.

Zhu Liu, Senior Staff Algorithm Engineer

He received his Ph.D. from New York University. His research interests include video content understanding and analysis, 3D vision, and machine learning. Before joining Alibaba, he was a Principal Scientist at AT&T Research Labs. He has served as an adjunct professor at Columbia University and New York University. He holds more than 140 U.S. patents and has published more than 70 papers and book chapters. He received the AT&T Science & Technology Medal. He is a senior member of IEEE and serves as an associate editor for IEEE Transactions on Multimedia (TMM) and IEEE Signal Processing Letters (SPL).

Hao Li

Yongpan Wang

Academic Achievements
  • L. Cheng, X. Zhou, L. Zhao, D. Li, H. Shang, Y. Zheng, P. Pan, Y. Xu. Weakly Supervised Learning with Side Information for Noisy Labeled Images. ECCV 2020.
  • L. Song, P. Pan, K. Zhao, H. Yang, Y. Chen, Y. Zhang, Y. Xu, R. Jin. Large-Scale Training System for 100-Million Classification at Alibaba. KDD 2020.
  • X. Zhou, P. Pan, Y. Zheng, Y. Xu, R. Jin. Large Scale Long-Tailed Product Recognition System at Alibaba. CIKM 2020.
  • J. Dong, Z. Cao, T. Zhang, J. Ye, S. Wang, F. Feng, L. Zhao, X. Liu, L. Song, L. Peng, Y. Guo, X. Jiang, L. Tang, Y. Du, Y. Zhang, P. Pan, Y. Xie. EFLOPS: Algorithm and System Co-Design for a High Performance Distributed Training Platform. HPCA 2020.
  • Q. Qian, L. Chen, H. Li, R. Jin. DR Loss: Improving Object Detection by Distributional Ranking. CVPR 2020.
  • L. Han, P. Wang, Z. Yin, F. Wang, H. Li. Exploiting Better Feature Aggregation for Video Object Detection. ACM MM 2020.
  • Q. Qian, J. Hu, H. Li. Hierarchically Robust Representation Learning. CVPR 2020.
  • C. Luo, Y. Zhu, L. Jin, Y. Wang. Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition. CVPR 2020.
  • Y. Huang, M. He, Y. Wang, L. Jin. RD-GAN: Chinese Character Font Transfer via Radical Decomposition and Rendering. ECCV 2020.
  • L. Li, F. Gao, J. Bu, Y. Wang, Z. Yu, Q. Zheng. An End-to-End OCR Text Re-organization Sequence Learning for Rich-text Detail Image Comprehension. ECCV 2020.
  • M. Zhou, Z. Niu. Adversarial Ranking Attack and Defense. ECCV 2020.
  • W. Wang, X. Liu, X. Ji, E. Xie, D. Liang, Z. Yang, T. Lu, C. Shen, P. Luo. AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting. ECCV 2020.
  • H. Wang, P. Lu, H. Zhang, M. Yang, X. Bai, Y. Xu, M. He, Y. Wang, W. Liu. All You Need Is Boundary: Toward Arbitrary-Shaped Text Spotting. AAAI 2020.
  • C. Liu, Y. Liu, L. Jin, S. Zhang, C. Luo, Y. Wang. EraseNet: End-to-End Text Removal in the Wild. TIP 2020.
  • Y. Zhang, P. Pan, Y. Zheng, K. Zhao, J. Wu, Y. Xu, R. Jin. Virtual ID Discovery from E-commerce Media at Alibaba: Exploiting Richness of User Click Behavior for Visual Search Relevance. CIKM 2019.
  • K. Zhao, P. Pan, Y. Zheng, Y. Zhang, C. Wang, Y. Zhang, Y. Xu, R. Jin. Large-Scale Visual Search with Binary Distributed Graph at Alibaba. CIKM 2019.
  • Q. Qian, L. Shang, B. Sun, J. Hu, H. Li, R. Jin. SoftTriple Loss: Deep Metric Learning without Triplet Sampling. ICCV 2019.
  • Z. Tan, X. Nie, Q. Qian, N. Li, H. Li. Learning to Rank Proposals for Object Detection. ICCV 2019.
  • Q. Qian, S. Zhu, J. Tang, B. Sun, H. Li, R. Jin. Robust Optimization over Multiple Domains. AAAI 2019.
  • Y. Xu, Y. Wang, W. Zhou, Y. Wang, Z. Yang, X. Bai. TextField: Learning a Deep Direction Field for Irregular Scene Text Detection. TIP 2019.
  • M. Zhou, Z. Niu. Ladder Loss for Coherent Visual-Semantic Embedding. AAAI 2019.
  • Z. Gao, Z. Niu. Video Imprint Segmentation for Temporal Action Detection in Untrimmed Videos. AAAI 2019.
  • Z. Liu, Z. Niu. Weakly Supervised Temporal Action Localization through Contrast Based Evaluation Networks. ICCV 2019.
  • M. Lin, S. Qiu, J. Ye, X. Song, Q. Qian, L. Sun, S. Zhu, R. Jin. Which Factorization Machine Modeling Is Better: A Theoretical Answer with Optimal Guarantee. AAAI 2019.
  • M. Lin, X. Song, Q. Qian, H. Li, L. Sun, S. Zhu, R. Jin. Robust Gaussian Process Regression for Real-Time High Precision GPS Signal Enhancement. KDD 2019.
  • Q. Qian, J. Tang, H. Li, S. Zhu, R. Jin. Large-Scale Distance Metric Learning with Uncertainty. CVPR 2018.
  • Z. Gao, Z. Niu. Video Imprint. TPAMI 2018.
  • Z. Liu, Z. Niu. Joint Video Object Discovery and Segmentation by Coupled Dynamic Markov Networks. TIP 2018.
  • Y. Zhang, P. Pan, Y. Zheng, K. Zhao, Y. Zhang, X. Ren, R. Jin. Visual Search at Alibaba. KDD 2018.
  • B. Wang, P. Pan, Q. Xiao, L. Luo, X. Ren, R. Jin, X. Jin. Seamless Color Mapping for 3D Reconstruction with Consumer-Grade Scanning Devices. ECCV Workshops 2018 (4th International Workshop on Recovering 6D Object Pose).
  • C. Leng, Z. Dou, H. Li, S. Zhu, R. Jin. Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM. AAAI 2018.
  • J. Song, C. Shen, Y. Yang, Y. Liu, M. Song. Transductive Unbiased Embedding for Zero-Shot Learning. CVPR 2018.
  • P. Wang, Q. Hu, Y. Zhang, C. Zhang, Y. Liu, J. Cheng. Two-Step Quantization for Low-Bit Neural Networks. CVPR 2018.
  • L. Cheng, Z. Liao, X. Zhao, Y. Liu. Exploiting Non-Local Action Relationships for Dense Video Captioning. BMVC 2018.
  • Z. Cheng, X. Wu, Y. Liu, X. Hua. Video2Shop: Exact Matching Clothes in Videos to Online Shopping Images. CVPR 2017.
  • C. Chen, X. Zhao, Y. Liu. Multi-modal Aggregation for Video Classification. ACM MM Workshops 2017.
  • The Vision Lab team won first place in three road-scene segmentation tasks on the KITTI benchmark in 2018.
  • The Vision Lab team won the Large-Scale Video Classification (LSVC) Challenge at ACM Multimedia 2017.
