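Research Directions
  • Visual Understanding & Interactive Vision

We develop foundational image- and video-based technologies, including classification, object detection and tracking, segmentation, feature representation learning, keypoint extraction, human pose estimation, gesture recognition, image captioning, and a large-scale distributed training engine. These address the understanding of, and interaction with, products and people in e-commerce and general vision applications.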

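  • Video Understanding and Mining

We develop foundational technologies such as video tagging, video search, video object detection, and video generation, enabling efficient and stable video review, search, and editing across massive video collections.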

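  • 3D Vision

We develop foundational technologies for 3D modeling, 3D perception, 3D understanding, and 3D interaction, to solve on-device modeling and measurement problems and to improve AR/VR experiences.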

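  • Text Recognition

We develop core technologies for text detection, text recognition, and structure understanding in images and videos, addressing text recognition and information extraction in complex scenarios such as scanned documents, real-world photos, multilingual text, and mixed pasted layouts.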

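  • Image-Text Understanding

We develop core technologies for cross-media content understanding, such as cross-search between images and text, joint image-text search, and price estimation, to address cross-media content understanding and analysis.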

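  • Offline Intelligence

We study visual processing and structuring solutions for on-device and edge deployment. This includes algorithms for object detection, object segmentation, multi-object tracking, object recognition (including pedestrian and vehicle re-identification and face recognition), object attribute extraction, and behavior and action analysis; data processing, change detection, and land-cover classification for remote sensing and X-ray imagery; and optimization methods for low-power, high-efficiency deep networks, such as model compression, inference acceleration, and neural architecture search.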

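  • Low-Level Vision

We develop the full range of low-level vision technologies, including image and video restoration, enhancement, and denoising, which serve as preprocessing for downstream visual analysis and understanding. We also develop image editing and generation technologies to give users a better experience and richer interaction.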


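Products and Applications
  • Pailitao and the Image Search Cloud Product

    We have developed industry-leading image search and recognition technology and applied it to a variety of scenarios. More than 17 million people use Pailitao's search-by-image feature through Taobao and Tmall every day. Built on Alibaba Cloud, the Image Search cloud product offers a complete search-by-image solution for customers with large-scale image search needs, such as e-commerce, photo album, and stock image websites. It already serves a number of domestic and international customers, including THE ICONIC, a leading fashion and sports retailer in Australia and New Zealand.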

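  • 3D Vision Device-Cloud Products

    Using 3D vision and computer graphics, we provide digitalization and intelligence solutions for industry and build cloud+device technology products together with ecosystem partners. In the footwear industry, efficient and accurate 3D scanning and search-and-matching algorithms enable precise shoe recommendation, precision marketing, and precision manufacturing. In the real-estate market, we provide low-cost, easy-to-use, highly automated 3D reconstruction of indoor and outdoor scenes with realistic textures, along with panoramic tours. On e-commerce platforms, AR/VR technology gives consumers an immersive try-and-buy shopping experience, improving sales efficiency and conversion rates.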

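  • Virtual Humans

    By integrating graphics, image, and speech technologies, we currently offer 2D realistic-human and 3D virtual-human products that support business scenarios such as virtual hosts for Taobao Live and virtual lecturers. The technology covers virtual human generation, animation driving, and interaction. We have industry-leading expertise in high-precision face and body reconstruction, cartoon avatar creation (photo2avatar), real-person replication (video2avatar), text- and speech-driven animation (speech2action), and interactive dialogue with virtual humans. These capabilities serve industries such as interactive entertainment, smart education, new retail, and AR/VR/XR.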

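  • Multimedia AI Solutions

    Using media AI technologies applied to multimedia audio and video data, including structuring, face recognition, audio/video fingerprinting, content generation, intelligent review, and multimodal search, we provide the digital media industry with copyright protection, media cataloging, media editing, media review, and multimodal search, effectively improving efficiency and reducing costs. We have established partnerships with well-known Chinese digital media organizations such as CCTV, People's Daily, and Xinhua News Agency.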

  • Analytical Insight of Earth (AI EARTH)

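    By combining computer vision analysis techniques, AI EARTH performs intelligent interpretation of multi-source Earth observation data, extracting land-cover status and dynamic change information. It overcomes the low efficiency and poor accuracy of traditional data processing and provides efficient solutions for natural resource supervision, water conservancy and river protection, ecological and environmental monitoring, crop yield estimation, and emergency disaster prevention and mitigation.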


Academic Achievements
Papers and Academic Talks
  • L. Cheng, X. Zhou, L. Zhao, D. Li, H. Shang, Y. Zheng, P. Pan, Y. Xu: Weakly Supervised Learning with Side Information for Noisy Labeled Images. ECCV 2020.
  • L. Song, P. Pan, K. Zhao, H. Yang, Y. Chen, Y. Zhang, Y. Xu, R. Jin: Large-Scale Training System for 100-Million Classification at Alibaba. KDD 2020.
  • X. Zhou, P. Pan, Y. Zheng, Y. Xu, R. Jin: Large scale long-tailed product recognition system at Alibaba. CIKM 2020.
  • J. Dong, Z. Cao, T. Zhang, J. Ye, S. Wang, F. Feng, L. Zhao, X. Liu, L. Song, L. Peng, Y. Guo, X. Jiang, L. Tang, Y. Du, Y. Zhang, P. Pan, Y. Xie: EFLOPS: Algorithm and System Co-Design for a High Performance Distributed Training Platform. HPCA 2020.
  • Q. Qian, L. Chen, H. Li, R. Jin. DR Loss: Improving Object Detection by Distributional Ranking. CVPR 2020.
  • L. Han, P. Wang, Z. Yin, F. Wang, H. Li. Exploiting Better Feature Aggregation for Video Object Detection. ACM MM 2020.
  • Q. Qian, J. Hu, H. Li. Hierarchically Robust Representation Learning. CVPR 2020.
  • C. Luo, Y. Zhu, L. Jin, Y. Wang. Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition. CVPR 2020.
  • Y. Huang, M. He, Y. Wang, L. Jin. RD-GAN: Chinese Character Font Transfer via Radical Decomposition and Rendering. ECCV 2020.
  • L. Li, F. Gao, J. Bu, Y. Wang, Z. Yu, Q. Zheng. An End-to-End OCR Text Re-organization Sequence Learning for Rich-text Detail Image Comprehension. ECCV 2020.
  • M. Zhou, Z. Niu. Adversarial Ranking Attack and Defense. ECCV 2020.
  • W. Wang, X. Liu, X. Ji, E. Xie, D. Liang, Z. Yang, T. Lu, C. Shen, P. Luo. AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting. ECCV 2020.
  • H. Wang, P. Lu, H. Zhang, M. Yang, X. Bai, Y. Xu, M. He, Y. Wang, W. Liu. All You Need is Boundary: Toward Arbitrary-Shaped Text Spotting. AAAI 2020.
  • C. Liu, Y. Liu, L. Jin, S. Zhang, C. Luo, Y. Wang. EraseNet: End-to-End Text Removal in the Wild. TIP 2020.
  • Y. Zhang, P. Pan, Y. Zheng, K. Zhao, J. Wu, Y. Xu, R. Jin: Virtual ID Discovery from E-commerce Media at Alibaba: Exploiting Richness of User Click Behavior for Visual Search Relevance. CIKM 2019.
  • K. Zhao, P. Pan, Y. Zheng, Y. Zhang, C. Wang, Y. Zhang, Y. Xu, R. Jin: Large-Scale Visual Search with Binary Distributed Graph at Alibaba. CIKM 2019.
  • Q. Qian, L. Shang, B. Sun, J. Hu, H. Li, R. Jin. SoftTriple Loss: Deep Metric Learning without Triplet Sampling. ICCV 2019.
  • Z. Tan, X. Nie, Q. Qian, N. Li, H. Li. Learning to rank proposals for object detection. ICCV 2019.
  • Q. Qian, S. Zhu, J. Tang, B. Sun, H. Li, R. Jin. Robust Optimization over Multiple Domains. AAAI 2019.
  • Y. Xu, Y. Wang, W. Zhou, Y. Wang, Z. Yang, X. Bai. TextField: Learning A Deep Direction Field for Irregular Scene Text Detection. TIP 2019.
  • M. Zhou, Z. Niu. Ladder Loss for Coherent Visual-Semantic Embedding. AAAI 2019.
  • Z. Gao, Z. Niu. Video imprint segmentation for temporal action detection in untrimmed videos. AAAI 2019.
  • Z. Liu, Z. Niu. Weakly supervised temporal action localization through contrast based evaluation networks. ICCV 2019.
  • M. Lin, S. Qiu, J. Ye, X. Song, Q. Qian, L. Sun, S. Zhu, R. Jin. Which Factorization Machine Modeling is Better: A Theoretical Answer with Optimal Guarantee. AAAI 2019.
  • M. Lin, X. Song, Q. Qian, H. Li, L. Sun, S. Zhu, R. Jin. Robust Gaussian Process Regression for Real-Time High Precision GPS Signal Enhancement. KDD 2019.
  • Q. Qian, J. Tang, H. Li, S. Zhu, R. Jin. Large-scale Distance Metric Learning with Uncertainty. CVPR 2018.
  • Z. Gao, Z. Niu. Video imprint. TPAMI 2018.
  • Z. Liu, Z. Niu. Joint video object discovery and segmentation by coupled dynamic markov networks. TIP 2018.
  • Y. Zhang, P. Pan, Y. Zheng, K. Zhao, Y. Zhang, X. Ren, R. Jin: Visual Search at Alibaba. KDD 2018.
  • B. Wang, P. Pan, Q. Xiao, L. Luo, X. Ren, R. Jin, X. Jin: Seamless Color Mapping for 3D Reconstruction with Consumer-Grade Scanning Devices. ECCV Workshops 2018.
  • C. Leng, Z. Dou, H. Li, S. Zhu, R. Jin. Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM. AAAI 2018.
  • Bin Wang, Pan Pan, Qinjie Xiao, Likang Luo, Xiaofeng Ren, Rong Jin, and Xiaogang Jin. Seamless Color Mapping for 3D Reconstruction with Consumer-Grade Scanning Devices. In: Proceedings of the 4th International Workshop on Recovering 6D Object Pose Organized at ECCV 2018, Munich, Germany, 2018.
  • Yanhao Zhang, Pan Pan, Yun Zheng, Kang Zhao, Yingya Zhang, Xiaofeng Ren and Rong Jin. Visual Search at Alibaba. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD'18), London, UK, 2018.
  • Jie Song, Chengchao Shen, Yezhou Yang, Yang Liu, and Mingli Song. Transductive Unbiased Embedding for Zero-shot Learning. In: Proceedings of the 31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR'18), Salt Lake City, UT, 2018.
  • Peisong Wang, Qinghao Hu, Yifan Zhang, Chunjie Zhang, Yang Liu and Jian Cheng. Two-step Quantization for Low-bit Neural Networks. In: Proceedings of the 31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR'18), Salt Lake City, UT, 2018.
  • Lechao Cheng, Zicheng Liao, Xiaowei Zhao and Yang Liu. Exploiting Non-Local Action Relationships for Dense Video Captioning. In: Proceedings of the 29th British Machine Vision Conference (BMVC'18), Newcastle, UK, 2018.
  • Zhiqi Cheng, Xiao Wu, Yang Liu and Xiansheng Hua. Video2Shop: Exact Matching Clothes in Videos to Online Shopping Images. In: Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR'17), Honolulu, Hawaii, 2017.
  • Chen Chen, Xiaowei Zhao and Yang Liu. Multi-modal Aggregation for Video Classification. In: Proceedings of the 25th ACM Multimedia Workshop 2017 (ACM MM'17), Mountain View, CA, 2017.
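Competitions
  • 2020: 1st place, ECCV VIPriors Semantic Segmentation Challenge
  • 2020: 1st place, ECCV Tracking Any Objects Challenge
  • 2020: 1st place, ECCV Visual Domain Adaptation Challenge
  • 2020: 2nd place, ECCV LVIS Challenge
  • 2019: 1st place, LPIRC classification task
  • 2019: 1st place, CVPR/WebVision large-scale classification challenge (visual understanding by learning from web data)
  • 2019: 1st place, ICCV/COCO Detection and Segmentation Challenge
  • 2020: 1st place, CVPR/DAVIS Video Object Segmentation Challenge
  • 2020: 2nd place, CVPR/iNaturalist FGVC fine-grained classification challenge
  • 2020: 2nd place, CVPR/BMTT MOT Challenge (multi-object tracking and segmentation)
  • 2020: 1st place, CVPR ActivityNet Temporal Action Localization
  • 2020: 1st place, CVPR HACS Temporal Action Localization
  • 2019: 3rd place, ICCV Lightweight Face Recognition Challenge
  • 2018: 1st place in all three KITTI road scene segmentation tasks
  • 2017: Winner of the Large-Scale Video Classification (LSVC) challenge at ACM Multimedia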
