Research Focus
  • Visual Understanding and Interactive Vision

Research in this area focuses on the development of computer vision technologies. These technologies include object classification, object detection and tracking, object segmentation, representation learning, key point extraction, human pose estimation, gesture recognition, image captioning, and large-scale distributed training engines. The research aims to resolve issues in e-commerce and general-purpose visual computing scenarios, such as identifying commodities and people and learning about the behavior and interaction of these entities.

  • Video Content Understanding and Video Data Mining

Research in this area focuses on the development of technologies such as video annotation, video search, video object detection, and video generation. The research aims to improve the efficiency and accuracy of moderating, searching, and editing large amounts of video data.

  • 3D Vision

Research in this area focuses on the development of 3D modeling, 3D perception, 3D understanding, and 3D interaction. The research aims to resolve modeling and measuring issues in onboard computer vision and improve the overall experience of AR and VR.

  • Text Identification

Research in this area focuses on the development of text detection, text identification, and understanding the structure of data for images and videos. The research aims to improve the identification of text and extraction of information from complex visual data, including scanned documents, photographs, and images with multiple languages and objects.

  • Image-text Understanding

Research in this area focuses on the development of core technologies in multimedia content understanding, including image-text matching, image-text joint search, and price estimation.

  • Offline Intelligence

Research in this area focuses on how to remotely analyze images and X-ray image data, perform remote sensing, detect changes, and classify object-oriented land cover. The research also provides cost-effective optimization for deep neural networks, such as model compression, inference acceleration, and network structure search. This work is based on studies of device-side and edge-side vision processing and structural solutions, including object detection, object segmentation, multi-object tracking, object identification (pedestrian, vehicle, and facial recognition), object attribute extraction, and behavior analytics algorithms.

  • Low-level Vision

Research in this area focuses on the development of diverse vision computing technologies to solve challenges in low-level vision, including technologies that are used to preprocess images and videos for visual content analytics and understanding. The research is applied in real-world applications such as image/video restoration, image/video enhancement, and denoising. Research in this area also focuses on the development of image editing and generation to improve user experience and optimize human-machine interaction.

Products and Applications
  • Pailitao and Image Search

    The Vision Lab researches and develops cutting-edge image search and recognition technologies that can be used in a wide variety of scenarios. Pailitao is a smart image search feature integrated in Alibaba's e-commerce platforms (Taobao and Tmall). Over 20 million users use Pailitao on a daily basis to perform reverse image searches for products they want to purchase. Image Search is a cloud service provided by Alibaba Cloud. It provides a comprehensive search-by-image solution for customers who want to search for similar images on e-commerce platforms, photo galleries, and image-sharing websites. This service is well received by users all over the world, such as THE ICONIC, a leading online fashion and footwear store in Australia and New Zealand.

  • 3D Smart Manufacturing

    The Vision Lab studies 3D vision and computer graphics technologies to provide industry-specific solutions for digitalization and intelligentization. These solutions help create synergy among customers, brands, and manufacturers. For example, when adopted in the footwear industry, these technologies can increase the efficiency and precision of footwear manufacturing, marketing, and recommendations by using 3D scanning and matching algorithms. These technologies can also be applied in the real estate industry, where realistic and immersive experiences can be created at low cost and with high efficiency. Furthermore, in the e-commerce industry, these technologies can deliver immersive shopping experiences (see-now-buy-now) to customers through AR and VR technologies, which can increase sales efficiency and the conversion rate.

  • Virtual Avatars

    The Vision Lab integrates graphics, image, and speech technologies. 2D and 3D technologies are now making their way to people's homes, such as virtual influencers on Taobao and virtual tutors on online education platforms. The Vision Lab utilizes advanced technologies to generate, operate, and control these realistic virtual avatars. The Vision Lab has developed industry-leading technologies in high-precision reconstruction of human faces and bodies, photo2avatar, video2avatar, speech2action, and dialogue interaction with virtual avatars. These advanced technologies empower industries such as interactive entertainment, intelligent education, new retail, AR, VR, and XR.

  • AI Solutions in Media

    The rise of AI technologies brings remarkable changes to the digital media industry. AI technologies are used in video and audio processing. These AI technologies include video and audio structuring, facial recognition, video and audio fingerprinting, content generation, smart content moderation, and multi-modal search. The Vision Lab provides AI solutions to support copyright protection, media cataloging, media editing, media moderation, and multi-modal search. This way, digital media enterprises can improve work efficiency and reduce costs. The Vision Lab has established partnerships with digital media giants such as CCTV, People's Daily, and Xinhua News Agency.

  • Analytical Insight of Earth (AIEARTH)

    The Vision Lab uses computer vision technologies to gain analytical insights from multisource earth observation data, extract earth surface information, and monitor dynamic changes. Compared with traditional approaches, computer vision technologies are game-changing solutions that allow high efficiency and high precision in natural resource monitoring, ecological environment monitoring, crop yield estimation, and disaster prevention.

Research Team
Yinghui Xu, Director of Vision Lab

Yinghui Xu holds a PhD in Computer Science from Toyohashi University of Technology and is a former researcher at Ricoh Software Research Center. He is the director of search sorting and basic algorithms for Alibaba Search Department and director of Cainiao Artificial Intelligence Department. He is the recipient of the Minori Award from Ricoh Company. He published the thesis of the year 2005 for the Japan Natural Language Institute and received a Best Thesis nomination at SIGIR 2017. In addition, Yinghui is a co-founder of the Alibaba personalized search system. His research fields include information retrieval, machine learning, computer vision, and natural language processing. He is also the director of the Chinese Information Processing Society of China and director of the Vision Lab of Alibaba DAMO Academy.

Lihi Zelnik, Head of DAMO Israel Machine Intelligence Lab

Lihi Zelnik joined Alibaba in 2018. Before joining Alibaba, she was an Associate Professor in the Faculty of Electrical Engineering at the Technion, Israel Institute of Technology, and a visiting professor at Cornell Tech in New York. Prof. Zelnik holds a PhD in Computer Science from the Weizmann Institute of Science. Her main area of expertise is computer vision. Prof. Zelnik was a Program Chair of CVPR'16 and an Associate Editor at TPAMI, served multiple times as Area Chair at CVPR and ECCV, and was on the award committees of ACCV'18 and CVPR'19. In 2021/22 she will serve as General Chair of CVPR'21 and ECCV'22.

Itamar Friedman, Senior Staff Engineer

Itamar Friedman holds an MSc and a BSc (summa cum laude) in Computer Vision and Machine Learning from the Faculty of Electrical Engineering at the Technion (Israel Institute of Technology). His research interests include efficient video and image analysis, with a focus on automated deep learning. He was the CTO of Visualead, a startup developing machine-vision offline-to-online technologies that was acquired by Alibaba. He is a serial entrepreneur who has founded startups in the fields of robotics and web development. He was a mentor in Microsoft Accelerator TLV, mentoring leading Israeli AI startups in the medical and drone fields. He holds several patents, and his research has been integrated into many products in Alibaba Group.

Pan Pan, Senior Staff Algorithm Engineer of Vision Lab

Pan Pan holds a PhD from the University of Illinois at Chicago and is in charge of research and development of visual technology in the e-commerce field. He is also a co-founder of Pailitao. His research fields include deep learning, visual search, and 3D vision. He is a former vision researcher at Mitsubishi Electric Research Laboratories and Fujitsu Beijing Research Center. In addition, Pan has published more than 20 papers and owns more than 10 licensed patents.

Zhu Liu, Senior Staff Algorithm Engineer

Zhu Liu holds a PhD from New York University. His research fields include video content understanding and analytics, 3D vision, and machine learning. He is the former chief scientist of AT&T Science Laboratory and a former visiting assistant professor at Columbia University and New York University. He has also published more than 70 papers and owns more than 170 US patents. In addition, Zhu is a recipient of the AT&T Technology Award, a senior member of the Institute of Electrical and Electronics Engineers (IEEE), and an associate editor of IEEE Transactions on Multimedia and IEEE Signal Processing Letters.

Hao Li, Senior Algorithm Expert at Vision Lab

Hao Li holds a PhD from the Chinese Academy of Sciences and is in charge of real-scene visual understanding technologies. His research fields include smart interpretation of remote sensing images, X-ray object identification, facial recognition-based clock-in systems, new retail, and smart campuses. Related technologies include deep learning model compression, facial recognition, person re-identification, and image search. He has published more than 20 papers and owns more than 20 licensed patents.

Yongpan Wang, Senior Algorithm Expert at Alibaba DAMO Academy

Yongpan Wang holds a master's degree from Zhejiang University and is in charge of optical character recognition (OCR) research. OCR technologies include image and text detection, text detection, structural understanding, edge OCR, and video OCR. Yongpan has established a complete and extensive OCR technology system, published multiple papers, owns multiple licensed patents, hosts OCR competitions, and is prominent in the OCR field. She is also in charge of Duguang OCR services and is dedicated to public welfare projects such as "Visual Accessibility - Purchase through Listening to Images" and "Identification of Old Books - Information Source". Duguang OCR services are widely used in enterprise administration, advertising, and finance and customs services in the cloud.

Academic Achievements
  • ECCV 2020 VIPriors Semantic Segmentation Challenge: First Place
  • ECCV 2020 Tracking Any Object Challenge: First Place
  • ECCV 2020 Visual Domain Adaptation Challenge: First Place
  • ECCV 2020 Large Vocabulary Instance Segmentation: Second Place
  • LPIRC 2019 Classification: First Place
  • ACM MM 2017 Large-scale Video Classification: First Place
  • CVPR 2019/WebVision Challenge on Visual Understanding by Learning from Web Data 2019: First Place
  • ICCV 2019/COCO Detection and Segmentation Challenge 2019: First Place
  • CVPR 2020/DAVIS Challenge on Video Object Segmentation 2020: First Place
  • CVPR 2019/iNaturalist Fine-grained Image Classification 2019: Second Place
  • CVPR 2020/BMTT Multiple Object Tracking and Segmentation 2020: Second Place
  • CVPR 2020 ActivityNet Temporal Action Localization: First Place
  • CVPR 2020 HACS Temporal Action Localization: First Place
  • The Vision Lab team won first place in three road-scene segmentation tasks at KITTI in 2018.
  • ICCV 2019 Light Weight Face Recognition Challenge: Third Place
  • The Vision Lab team won the large-scale video competition (LSVC) championship at the 2017 ACM Multimedia Conference.

