- Visual Understanding and Interactive Vision
Research in this area focuses on the development of computer vision technologies. These technologies include object classification, object detection and tracking, object segmentation, representation learning, key point extraction, human pose estimation, gesture recognition, image captioning, and large-scale distributed training engines. The research aims to resolve issues in e-commerce and general-purpose visual computing scenarios, such as identifying commodities and people and learning about the behavior and interaction of these entities.
- Video Content Understanding and Video Data Mining
Research in this area focuses on the development of technologies such as video annotation, video search, video object detection, and video generation. The research aims to tackle challenges in terms of efficiency and accuracy of moderating, searching, and editing large amounts of video data.
- 3D Vision
Research in this area focuses on the development of 3D modeling, 3D perception, 3D understanding, and 3D interaction. The research aims to resolve modeling and measuring issues in onboard computer vision and improve the overall experience of AR and VR.
- Text Identification
Research in this area focuses on the development of text detection, text identification, and understanding the structure of data for images and videos. The research aims to improve the identification of text and extraction of information from complex visual data, including scanned documents, photographs, and images with multiple languages and objects.
- Image-text Understanding
Research in this area focuses on the development of core technologies in multimedia content understanding, including image-text matching, image-text joint search, and price estimation.
- Offline Intelligence
Research in this area focuses on remote sensing image and X-ray image analysis, change detection, and object-oriented land cover classification. The research also provides cost-effective optimization for deep neural networks, such as model compression, inference acceleration, and network structure search. This work is based on studies of device-side and edge-side vision processing and structural solutions, including object detection, object segmentation, multi-object tracking, object identification (pedestrian, vehicle, and facial recognition), object attribute extraction, and behavior analytics algorithms.
- Low-level Vision
Research in this area focuses on the development of diverse vision computing technologies to solve challenges in low-level vision, including technologies that are used to preprocess images and videos for visual content analytics and understanding. The research is applied to real-world tasks such as image/video restoration, image/video enhancement, and denoising. Research in this area also focuses on the development of image editing and generation to improve user experience and optimize human-machine interaction.
- Pailitao and Image Search
The Vision Lab researches and develops cutting-edge image search and recognition technologies that can be used in a wide variety of scenarios. Pailitao is a smart image search feature integrated in Alibaba's e-commerce platforms (Taobao and Tmall). Over 20 million users use Pailitao on a daily basis to perform reverse image searches for products they want to purchase. Image Search is a cloud service provided by Alibaba Cloud. It provides a comprehensive search-by-image solution for customers who want to search for similar images on e-commerce platforms, photo galleries, and image-sharing websites. This service is well received by users all over the world, such as THE ICONIC, a leading online fashion and footwear store in Australia and New Zealand.
- 3D Smart Manufacturing
The Vision Lab studies 3D vision and computer graphics technologies to provide industry-specific solutions for digitalization and intelligentization. These solutions help create synergy among customers, brands, and manufacturers. For example, when adopted in the footwear industry, these technologies can increase the efficiency and precision of footwear manufacturing, marketing, and recommendations by using 3D scanning and matching algorithms. These technologies can also be applied in the real estate industry, where realistic and immersive experiences can be created at low cost and with high efficiency. Furthermore, in the e-commerce industry, these technologies can deliver immersive shopping experiences (see-now-buy-now) to customers through AR and VR technologies, which can increase sales efficiency and the conversion rate.
- Virtual Avatars
The Vision Lab integrates graphics, image, and speech technologies. 2D and 3D technologies are now making their way to people's homes, such as virtual influencers on Taobao and virtual tutors on online education platforms. The Vision Lab utilizes advanced technologies to generate, operate, and control these realistic virtual avatars. The Vision Lab has developed industry-leading technologies in high-precision reconstruction of human faces and bodies, photo2avatar, video2avatar, speech2action, and dialogue interaction with virtual avatars. These advanced technologies empower industries such as interactive entertainment, intelligent education, new retail, AR, VR, and XR.
- AI Solutions in Media
The rise of AI technologies brings remarkable changes to the digital media industry. AI technologies are used in video and audio processing. These AI technologies include video and audio structuring, facial recognition, video and audio fingerprinting, content generation, smart content moderation, and multi-modal search. The Vision Lab provides AI solutions to support copyright protection, media cataloging, media editing, media moderation, and multi-modal search. This way, digital media enterprises can improve work efficiency and reduce costs. The Vision Lab has established partnerships with digital media giants such as CCTV, People's Daily, and Xinhua News Agency.
- Analytical Insight of Earth (AIEARTH)
The Vision Lab uses computer vision technologies to gain analytical insights from multisource earth observation data, extract earth surface information and monitor dynamic changes. Compared with traditional approaches, computer vision technologies are game-changing solutions that allow high efficiency and high precision in natural resource monitoring, ecological environment monitoring, crop yield estimation, and disaster prevention.
Yinghui Xu holds a PhD in Computer Science from Toyohashi University of Technology and is a former researcher at Ricoh Software Research Center. He is the director of search sorting and basic algorithms for Alibaba Search Department and director of Cainiao Artificial Intelligence Department. He is the recipient of the Minori Award from Ricoh Company. He published the thesis of the year 2005 for the Japan Natural Language Institute and received the Best Thesis nomination for SIGIR 2017. In addition, Yinghui is a co-founder of the Alibaba personalized search system. His research fields include information retrieval, machine learning, computer vision, and natural language processing. He is also the director of the Chinese Information Processing Society of China and director of the Vision Lab of Alibaba DAMO Academy.
Lihi Zelnik joined Alibaba in 2018. Before joining Alibaba, she was an Associate Professor in the Faculty of Electrical Engineering at the Technion, Israel Institute of Technology, and a visiting professor at Cornell Tech in New York. Prof. Zelnik holds a PhD in Computer Science from the Weizmann Institute of Science. Her main area of expertise is computer vision. Prof. Zelnik was a Program Chair of CVPR'16 and an Associate Editor at TPAMI, served multiple times as Area Chair at CVPR and ECCV, and was on the award committees of ACCV'18 and CVPR'19. In 2021/22 she will serve as General Chair of CVPR'21 and ECCV'22.
He holds an MSc and a BSc (summa cum laude) in Computer Vision and Machine Learning from the Faculty of Electrical Engineering at the Technion (Israel Institute of Technology). His research interests include efficient video and image analysis, with a focus on automated deep learning. He was the CTO of Visualead, a startup developing machine-vision offline-to-online technologies that was acquired by Alibaba. He is a serial entrepreneur who has founded startups in the fields of robotics and web development. He was a mentor in Microsoft Accelerator TLV, mentoring leading Israeli AI startups in the medical and drone fields. He holds several patents, and his research has been integrated into many products in Alibaba Group.
Pan Pan holds a PhD from University of Illinois at Chicago and is in charge of research and development of visual technology in the e-commerce field. He is also a co-founder of Pailitao. His research fields include deep learning, visual search, and 3D vision. He is a former vision researcher at Mitsubishi Electric Research Laboratories and Fujitsu Beijing Research Center. In addition, Pan published more than 20 papers and owns more than 10 licensed patents.
Zhu Liu holds a PhD from New York University. His research fields include video content understanding and analytics, 3D vision, and machine learning. He is the former chief scientist of AT&T Science Laboratory and a former visiting assistant professor at Columbia University and New York University. He has published more than 70 papers and owns more than 170 US patents. In addition, Zhu is a recipient of the AT&T Technology Award, a senior member of the Institute of Electrical and Electronics Engineers (IEEE), and an associate editor of IEEE Transactions on Multimedia and IEEE Signal Processing Letters.
Hao Li holds a PhD from the Chinese Academy of Sciences and is in charge of real-scene visual understanding technologies. His research fields include smart interpretation of remote sensing images, X-ray object identification, facial recognition-based clock-in systems, new retail, and smart campuses. Related technologies include deep learning model compression, facial recognition, person re-identification, and image search. He has published more than 20 papers and owns more than 20 licensed patents.
Yongpan Wang holds a master's degree from Zhejiang University and is in charge of optical character recognition (OCR) research. OCR technologies include image and text detection, text detection, structural understanding, edge OCR, and video OCR. Yongpan has established a complete, extensive OCR technology system, published multiple papers, owns multiple licensed patents, hosts OCR competitions, and is prominent in the OCR field. She is also in charge of Duguang OCR services and is dedicated to public welfare projects such as "Visual Accessibility - Purchase through Listening to Images" and "Identification of Old Books - Information Source". Duguang OCR services are widely used in enterprise administration, advertising, and finance and customs services in the cloud.
- Bin Wang, Pan Pan, Qinjie Xiao, Likang Luo, Xiaofeng Ren, Rong Jin, and Xiaogang Jin. Seamless Color Mapping for 3D Reconstruction with Consumer-Grade Scanning Devices. In: Proceedings of the 4th International Workshop on Recovering 6D Object Pose Organized at ECCV 2018, Munich, Germany, 2018.
- Yanhao Zhang, Pan Pan, Yun Zheng, Kang Zhao, Yingya Zhang, Xiaofeng Ren and Rong Jin. Visual Search at Alibaba. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD'18), London, UK, 2018.
- Jie Song, Chengchao Shen, Yezhou Yang, Yang Liu, and Mingli Song. Transductive Unbiased Embedding for Zero-shot Learning. In: Proceedings of the 31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR'18), Salt Lake City, UT, 2018.
- Peisong Wang, Qinghao Hu, Yifan Zhang, Chunjie Zhang, Yang Liu and Jian Cheng. Two-step Quantization for Low-bit Neural Networks. In: Proceedings of the 31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR'18), Salt Lake City, UT, 2018.
- Lechao Cheng, Zicheng Liao, Xiaowei Zhao and Yang Liu. Exploiting Non-Local Action Relationships for Dense Video Captioning. In: Proceedings of the 29th British Machine Vision Conference (BMVC'18), Newcastle, UK, 2018.
- Zhiqi Cheng, Xiao Wu, Yang Liu and Xiansheng Hua. Video2Shop: Exact Matching Clothes in Videos to Online Shopping Images. In: Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR'17), Honolulu, Hawaii, 2017.
- Chen Chen, Xiaowei Zhao and Yang Liu. Multi-modal Aggregation for Video Classification. In: Proceedings of the 25th ACM Multimedia Workshop 2017 (ACM MM'17), Mountain View, CA, 2017.
- L. Cheng, X. Zhou, L. Zhao, D. Li, H. Shang, Y. Zheng, P. Pan, Y. Xu: Weakly Supervised Learning with Side Information for Noisy Labeled Images. ECCV 2020.
- L. Song, P. Pan, K. Zhao, H. Yang, Y. Chen, Y. Zhang, Y. Xu, R. Jin: Large-Scale Training System for 100-Million Classification at Alibaba. KDD 2020.
- X. Zhou, P. Pan, Y. Zheng, Y. Xu, R. Jin: Large scale long-tailed product recognition system at Alibaba. CIKM 2020.
- J. Dong, Z. Cao, T. Zhang, J. Ye, S. Wang, F. Feng, L. Zhao, X. Liu, L. Song, L. Peng, Y. Guo, X. Jiang, L. Tang, Y. Du, Y. Zhang, P. Pan, Y. Xie: EFLOPS: Algorithm and System Co-Design for a High Performance Distributed Training Platform. HPCA 2020.
- Q. Qian, L. Chen, H. Li, R. Jin. DR Loss: Improving Object Detection by Distributional Ranking. CVPR 2020.
- L. Han, P. Wang, Z. Yin, F. Wang, H. Li. Exploiting Better Feature Aggregation for Video Object Detection. ACM MM 2020.
- Q. Qian, J. Hu, H. Li. Hierarchically Robust Representation Learning. CVPR 2020.
- C. Luo, Y. Zhu, L. Jin, Y. Wang. Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition. CVPR 2020.
- Y. Huang, M. He, Y. Wang, L. Jin. RD-GAN: Chinese Character Font Transfer via Radical Decomposition and Rendering. ECCV 2020.
- L. Li, F. Gao, J. Bu, Y. Wang, Z. Yu, Q. Zheng. An End-to-End OCR Text Re-organization Sequence Learning for Rich-text Detail Image Comprehension. ECCV 2020.
- M. Zhou, Z. Niu. Adversarial Ranking Attack and Defense. ECCV 2020.
- W. Wang, X. Liu, X. Ji, E. Xie, D. Liang, Z. Yang, T. Lu, C. Shen, P. Luo. AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting. ECCV 2020.
- H. Wang, P. Lu, H. Zhang, M. Yang, X. Bai, Y. Xu, M. He, Y. Wang, W. Liu. All You Need is Boundary: Toward Arbitrary-Shaped Text Spotting. AAAI 2020.
- C. Liu, Y. Liu, L. Jin, S. Zhang, C. Luo, Y. Wang. EraseNet: End-to-End Text Removal in the Wild. TIP 2020.
- Y. Zhang, P. Pan, Y. Zheng, K. Zhao, J. Wu, Y. Xu, R. Jin: Virtual ID Discovery from E-commerce Media at Alibaba: Exploiting Richness of User Click Behavior for Visual Search Relevance. CIKM 2019.
- K. Zhao, P. Pan, Y. Zheng, Y. Zhang, C. Wang, Y. Zhang, Y. Xu, R. Jin: Large-Scale Visual Search with Binary Distributed Graph at Alibaba. CIKM 2019.
- Q. Qian, L. Shang, B. Sun, J. Hu, H. Li, R. Jin. SoftTriple Loss: Deep Metric Learning without Triplet Sampling. ICCV 2019.
- Z. Tan, X. Nie, Q. Qian, N. Li, H. Li. Learning to rank proposals for object detection. ICCV 2019.
- Q. Qian, S. Zhu, J. Tang, B. Sun, H. Li, R. Jin. Robust Optimization over Multiple Domains. AAAI 2019.
- Y. Xu, Y. Wang, W. Zhou, Y. Wang, Z. Yang, X. Bai. TextField: Learning A Deep Direction Field for Irregular Scene Text Detection. TIP 2019.
- M. Zhou, Z. Niu. Ladder Loss for Coherent Visual-Semantic Embedding. AAAI 2019.
- Z. Gao, Z. Niu. Video imprint segmentation for temporal action detection in untrimmed videos. AAAI 2019.
- Z. Liu, Z. Niu. Weakly supervised temporal action localization through contrast based evaluation networks. ICCV 2019.
- M. Lin, S. Qiu, J. Ye, X. Song, Q. Qian, L. Sun, S. Zhu, R. Jin. Which Factorization Machine Modeling is Better: A Theoretical Answer with Optimal Guarantee. AAAI 2019.
- M. Lin, X. Song, Q. Qian, H. Li, L. Sun, S. Zhu, R. Jin. Robust Gaussian Process Regression for Real-Time High Precision GPS Signal Enhancement. KDD 2019.
- Q. Qian, J. Tang, H. Li, S. Zhu, R. Jin. Large-scale Distance Metric Learning with Uncertainty. CVPR 2018.
- Z. Gao, Z. Niu. Video imprint. TPAMI 2018.
- Z. Liu, Z. Niu. Joint video object discovery and segmentation by coupled dynamic markov networks. TIP 2018.
- Y. Zhang, P. Pan, Y. Zheng, K. Zhao, Y. Zhang, X. Ren, R. Jin: Visual Search at Alibaba. KDD 2018.
- B. Wang, P. Pan, Q. Xiao, L. Luo, X. Ren, R. Jin, X. Jin: Seamless Color Mapping for 3D Reconstruction with Consumer-Grade Scanning Devices. ECCV Workshops 2018.
- C. Leng, Z. Dou, H. Li, S. Zhu, R. Jin. Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM. AAAI 2018.
- ECCV 2020 VIPriors Semantic Segmentation Challenge: First Place
- ECCV 2020 Tracking Any Objects Challenge: First Place
- ECCV 2020 Visual Domain Adaptation Challenge: First Place
- ECCV 2020 Large Vocabulary Instance Segmentation: Second Place
- LPIRC 2019 Classification: First Place
- ACM MM 2017 Large-scale Video Classification: First Place
- CVPR2019/WebVision Challenge on Visual Understanding by Learning from Web Data 2019: First Place
- ICCV2019/COCO Detection and Segmentation Challenge 2019: First Place
- CVPR2020/DAVIS Challenge on Video Object Segmentation 2020: First Place
- CVPR2019/iNaturalist Fine-grained Image Classification 2019: Second Place
- CVPR2020/BMTT Multiple Object Tracking and Segmentation 2020: Second Place
- CVPR 2020 ActivityNet Temporal Action Localization: First Place
- CVPR 2020 HACS Temporal Action Localization: First Place
- The Vision Lab team won first place in three road-scene segmentation tasks at KITTI in 2018.
- ICCV 2019 Light Weight Face Recognition Challenge: Third Place
- The Vision Lab team won the large-scale video competition (LSVC) championship at the 2017 ACM Multimedia Conference.