[CCF-AIR Youth Fund] Few-Shot Learning Based on Large-scale Pre-trained Multimodal Models

Research Themes

Research on Key Technologies for Large-scale Pretraining Models


Information in the digital world is presented in a mix of modalities such as text, speech, images, and videos. Multimodal translation technology can help reduce language barriers in globalization. Current machine translation methods based on deep neural networks often require large amounts of labeled training data, which are very expensive to collect, and they face various challenges in low-resource scenarios. Yet most practical applications fall into low-resource scenarios.


Large-scale pre-training models, both single-modal and multi-modal models for text, speech, and images, have developed rapidly in recent years and have achieved remarkable results on many low-resource tasks. We therefore propose to study few-shot learning based on multi-modal pre-trained models, with the aim of achieving high-quality multi-modal translation in low-resource scenarios.


  • Propose an advanced and effective few-shot learning algorithm that uses pre-trained models to perform multi-modal machine translation; publish 2 to 3 CCF-A or CCF-B papers and submit 2 to 3 patent applications.
  • A prototype system for "Few-Shot Learning for Multimodal Pre-trained Large Models".
  • Experiment reports on in-house real business data; the BLEU score of the few-shot system should be no lower than 90% of the BLEU score of the system trained with labeled training data.
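
The BLEU requirement above is a relative threshold, so it can be checked with simple arithmetic once both systems have been scored on the same test set. The following is a minimal sketch, not part of the call itself: the function names `relative_bleu` and `meets_bleu_target` and the example scores are hypothetical, and it assumes both BLEU scores are computed with the same tool and settings.

```python
def relative_bleu(few_shot_bleu: float, supervised_bleu: float) -> float:
    """Return the few-shot BLEU as a fraction of the supervised BLEU."""
    if supervised_bleu <= 0:
        raise ValueError("supervised_bleu must be positive")
    return few_shot_bleu / supervised_bleu


def meets_bleu_target(few_shot_bleu: float,
                      supervised_bleu: float,
                      min_ratio: float = 0.9) -> bool:
    """Check the 'no lower than 90% of the supervised system' criterion."""
    return relative_bleu(few_shot_bleu, supervised_bleu) >= min_ratio


if __name__ == "__main__":
    # Hypothetical scores, for illustration only (not reported results).
    supervised = 30.0
    few_shot = 27.5
    print(f"ratio = {relative_bleu(few_shot, supervised):.3f}")
    print("target met:", meets_bleu_target(few_shot, supervised))
```

With a supervised baseline of 30.0 BLEU, for instance, the few-shot system would need at least 27.0 BLEU to satisfy the criterion.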

Related Research Topics

  • Few-shot learning for machine translation
  • Transfer learning in low-resource settings
  • Multi-modal pre-training models
  • Prompt learning for machine translation
  • Cross-modal alignment for speech translation
  • Image translation based on large pre-trained models
