Research on Key Technologies for Large-scale Pretrained Models
In recent years, large models have improved translation quality significantly. However, for some specific domains, a general-purpose large model often does not work well enough. Fine-tuning the model with domain data consumes computing, storage, and human effort, especially across many domains in commercial applications. Furthermore, customers usually have their own small collected datasets, from hundreds to thousands of sentence pairs. These data represent the customers' needs well, but are not large enough to fine-tune the big model.
This research looks for better ways to apply the small data from customers. It could include the following work:
1. What is the best way to use small data, from hundreds to thousands of sentence pairs and more?
Plug a small model, trained or built from the small data, into the large general model; in other words, apply the small data only at inference time, so the large model does not need to be trained again. The targets are:
1. Data of hundreds to thousands of sentences can improve translation quality by 0.5-2 BLEU points, while hurting other domains by no more than 0.5 BLEU points.
2. Compared to fine-tuning, the gap of the collaboration of large and small models at inference time is less than 1 BLEU point.
3. The collaboration of large and small models outperforms KNN-MT in translation quality.
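One candidate realization of the inference-time collaboration, offered here only as a hedged sketch and not as the committed design, is shallow fusion: at each decoding step, interpolate the next-token distribution of the large general model with that of a small domain model, so the large model's parameters are never touched. The function name `fuse_distributions` and the interpolation weight `lam` are illustrative assumptions.

```python
import numpy as np

def fuse_distributions(p_general, p_domain, lam=0.5):
    """Shallow-fusion sketch: mix next-token distributions from a large
    general model and a small domain model at inference time. `lam`
    weights the domain model; the large model is never retrained."""
    fused = (1.0 - lam) * np.asarray(p_general, dtype=float) \
            + lam * np.asarray(p_domain, dtype=float)
    return fused / fused.sum()  # renormalize to a valid distribution

# Toy 4-token vocabulary: the general model prefers token 0,
# while the small domain model prefers token 2.
p_big = [0.5, 0.3, 0.1, 0.1]
p_small = [0.1, 0.1, 0.7, 0.1]
fused = fuse_distributions(p_big, p_small, lam=0.5)
# fused = [0.30, 0.20, 0.40, 0.10] -> the domain-preferred token wins
```

Tuning `lam` per customer would be one way to trade off in-domain gains against regression on other domains, which is exactly the balance targeted above.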
The outcome of the project includes:
- 1-2 papers in CCF-A conferences or journals.
- The code and documentation of the work above.