Research Focus
  • Image Understanding and Analysis

Research in this area focuses on developing technologies such as image classification, object detection, feature representation learning, keypoint extraction, and large-scale vector search engines to address challenges in the search, recognition, and analysis of product images, general images, human faces and bodies, and text images.

  • Video Understanding and Mining

Research in this area focuses on developing technologies such as visual tracking, video tagging, and video generation to address challenges in efficient and stable video monitoring, as well as search and editing across large-scale video collections.

  • Image and Text Analysis

Research in this area focuses on developing technologies for cross-media content analysis, such as cross-media search, joint image-text multimedia search, and price prediction, to address a range of industry challenges.

  • 3D Vision

Research in this area focuses on developing technologies such as scanning, point-cloud processing, texture mapping, and 3D classification, detection, and feature representation to address challenges in the generation, recognition, and search of 3D models of small objects, the human body and body parts, and scenes.

  • Offline Intelligence

Research in this area focuses on developing technologies such as camera networks, sensor fusion, pedestrian tracking and identification, human pose estimation, and object detection and recognition to address challenges in identity recognition, global people tracking, motion sequence analysis, product detection and recognition, and people-product binding.

Products and Applications
  • Pailitao and Image Search Cloud Product

    The Vision Lab develops industry-leading image search and recognition technologies and applies them to a variety of scenarios. Every day, more than 17 million people use Pailitao's search-by-image function through Taobao and Tmall. The lab has also built an image search product on the Alibaba Cloud platform that provides complete search-by-image solutions for clients such as e-commerce businesses, photo databases, and image gallery websites. This cloud product has attracted numerous domestic and overseas users, including THE ICONIC, a leading fashion and sports retailer in Australia and New Zealand.

  • 3D Smart Manufacturing

    The Vision Lab uses 3D vision technologies to provide customized, industry-specific solutions that create synergy between consumers, brands, and manufacturers. In the footwear industry, the lab's efficient and accurate 3D scanning and matching algorithms have enabled accurate personalized shoe recommendations and supported precision marketing. Other technologies, such as automatic shoe-last generation for manufacturers and intelligent data analysis, have allowed our clients to lower the cost of customization and achieve precision manufacturing.

  • AI Solutions for Media

    The lab's multimodal artificial intelligence technologies for media improve efficiency and reduce costs for the traditional media industry by performing functions such as media monitoring, labeling, content generation, and copyright protection of multimedia data. The lab is currently cooperating with major domestic media organizations such as CCTV, Dongfang TV, and Xinhua News.

  • Digitization of Consumers, Products and Scenes in New Retail

    This solution renovates existing stores or builds new unmanned stores for New Retail, using camera-based sensors and vision technologies to track and locate customers, identify product SKUs, check the compliance of on-shelf product displays, and connect consumers with the products they pick up. The overall goal is to digitize consumers, products, and scenes in stores, supermarkets, and hotels, and to provide business analysis based on the digitized data. The solution is currently in use in Alibaba's Hema stores.

Research Team
Xiaofeng Ren, Head of Vision Lab

He is a visiting professor in the Department of Computer Science and Engineering at the University of Washington and holds a Ph.D. from the University of California, Berkeley. Before joining Alibaba, he was a Senior Director and scientist at Amazon, responsible for the research and development of vision algorithms for Amazon Go. His academic papers have been cited more than 10,000 times. He has also served as an area chair for the CVPR and ICCV conferences.

Lihi Zelnik, Head of DAMO Israel Machine Intelligence Lab

Lihi Zelnik joined Alibaba in 2018. Before joining Alibaba, she was an Associate Professor in the Faculty of Electrical Engineering at the Technion, Israel Institute of Technology, and a visiting professor at Cornell Tech in New York. Prof. Zelnik holds a Ph.D. in Computer Science from the Weizmann Institute of Science. Her main area of expertise is computer vision. Prof. Zelnik was a Program Chair of CVPR'16 and an Associate Editor of TPAMI, has served multiple times as an Area Chair at CVPR and ECCV, and was on the award committees of ACCV'18 and CVPR'19. She will serve as General Chair of CVPR'21 and ECCV'22.

Pan Pan, Senior Staff Algorithm Engineer, Vision Lab

He holds a Ph.D. in Electrical and Computer Engineering from the University of Illinois at Chicago. He is a co-founder of Pailitao, Alibaba's visual search product, and its algorithm lead. His research interests include deep learning, image and video analysis, and 3D vision. He previously worked on the research and development of vision technologies at Mitsubishi Electric US Research Labs and the Fujitsu Beijing R&D Center. He has published more than 20 papers.

Zhu Liu, Senior Staff Algorithm Engineer

He received his Ph.D. from New York University. His research interests include video content understanding and analysis, 3D vision, and machine learning. Before joining Alibaba, he was a Principal Scientist at AT&T Research Labs. He was an adjunct professor at Columbia University and New York University. He holds more than 140 U.S. patents and has published more than 70 papers and book chapters. He received the AT&T Science & Technology Medal. He is a senior member of IEEE and serves as an associate editor for IEEE TMM and SPL.

Itamar Friedman, Senior Staff Engineer

He holds an MSc and a BSc (summa cum laude) in Computer Vision and Machine Learning from the Faculty of Electrical Engineering at the Technion (Israel Institute of Technology). His research interests include efficient video and image analysis, with a focus on automated deep learning. He was the CTO of Visualead, a startup developing offline-to-online machine vision technologies that was acquired by Alibaba. He is a serial entrepreneur who has founded startups in robotics and web development. He was a mentor at Microsoft Accelerator TLV, mentoring leading Israeli AI startups in the medical and drone fields. He holds several patents, and his research has been integrated into many Alibaba Group products.

Academic Achievements
  • Bin Wang, Pan Pan, Qinjie Xiao, Likang Luo, Xiaofeng Ren, Rong Jin, and Xiaogang Jin. Seamless Color Mapping for 3D Reconstruction with Consumer-Grade Scanning Devices. In: Proceedings of the 4th International Workshop on Recovering 6D Object Pose, organized at ECCV 2018, Munich, Germany, 2018.
  • Yanhao Zhang, Pan Pan, Yun Zheng, Kang Zhao, Yingya Zhang, Xiaofeng Ren and Rong Jin. Visual Search at Alibaba. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD'18), London, UK, 2018.
  • Jie Song, Chengchao Shen, Yezhou Yang, Yang Liu, and Mingli Song. Transductive Unbiased Embedding for Zero-shot Learning. In: Proceedings of the 31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR'18), Salt Lake City, UT, 2018.
  • Peisong Wang, Qinghao Hu, Yifan Zhang, Chunjie Zhang, Yang Liu, and Jian Cheng. Two-step Quantization for Low-bit Neural Networks. In: Proceedings of the 31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR'18), Salt Lake City, UT, 2018.
  • Lechao Cheng, Zicheng Liao, Xiaowei Zhao, and Yang Liu. Exploiting Non-Local Action Relationships for Dense Video Captioning. In: Proceedings of the 29th British Machine Vision Conference (BMVC'18), Newcastle, UK, 2018.
  • Zhiqi Cheng, Xiao Wu, Yang Liu, and Xiansheng Hua. Video2Shop: Exact Matching Clothes in Videos to Online Shopping Images. In: Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR'17), Honolulu, Hawaii, 2017.
  • Chen Chen, Xiaowei Zhao, and Yang Liu. Multi-modal Aggregation for Video Classification. In: Proceedings of the 25th ACM Multimedia Workshop 2017 (ACM MM'17), Mountain View, CA, 2017.
  • The Vision Lab team won first place in three road-scene segmentation tasks on the KITTI benchmark in 2018.
  • The Vision Lab team won the championship of the Large-Scale Video Competition (LSVC) at the 2017 ACM Multimedia Conference.
