【CCF-AIR Youth Fund】Knowledge Probing and Leveraging for Large-scale Pre-trained Models

Research Themes

Research on Key Technologies for Large-scale Pre-training Models

Background

In recent years, large-scale pre-trained models such as BERT and GPT have opened a new phase in artificial intelligence and set milestones in natural language processing. These models have gradually evolved from being 'purely data-driven' to 'data- and knowledge-driven', and knowledge learning and exploitation are key steps toward next-generation AI [1]. Recent works have begun to mine the rich knowledge inherent in large-scale pre-trained models: [2] found that linguistic knowledge, e.g., syntax, can be probed from the models; [3] detected a large amount of world knowledge and common sense in the models. More interestingly, [4] went a step further and exploited the linguistic knowledge extracted from the models in downstream models, yielding better performance than linguistic knowledge annotated by human experts.

 

In real-world applications, it is a critical challenge to transfer the knowledge learned by these pre-trained models to downstream models, especially for tasks that strongly rely on factual knowledge and common sense, such as CommonsenseQA, KBQA and TabFact. The dominant way to leverage a pre-trained model is the fine-tuning paradigm, which uses a single dense vector representation as the interface between the pre-trained model and the downstream model. This paradigm is simple and easy to use, but because one vector serves as the common carrier of both the text representation and the knowledge representation, it risks knowledge degradation. To tackle this insufficiency, prompt-based approaches, which modify the construction (templates) of the inputs, have recently attracted attention, but they still suffer from the following challenges.

(1) Templates often rely on manual design and strongly influence the final performance, leading to poor model stability.

(2) It is difficult to design templates for complex tasks, such as semantic parsing. Therefore, in both the fine-tuning and prompt paradigms, a gap in knowledge leveraging remains between the pre-trained model and the downstream task model. (A minimal sketch contrasting the two paradigms is given below.)
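
As a rough illustration only, the following sketch contrasts the single-vector interface of the fine-tuning paradigm with a hand-crafted prompt template. It assumes the HuggingFace transformers library and the bert-base-uncased checkpoint purely for convenience; neither is prescribed by this proposal.

```python
# Minimal sketch contrasting the fine-tuning and prompt paradigms.
# Assumptions: HuggingFace `transformers` and `bert-base-uncased`, chosen only for illustration.
from transformers import AutoTokenizer, AutoModel, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# (a) Fine-tuning: the downstream task sees only one dense vector (here the [CLS]
#     state), which must carry the text representation and the knowledge at once.
encoder = AutoModel.from_pretrained("bert-base-uncased")
enc = tokenizer("Beijing is a large city.", return_tensors="pt")
cls_vec = encoder(**enc).last_hidden_state[:, 0]      # shape: (1, hidden_size)
# a task head, e.g. torch.nn.Linear(cls_vec.size(-1), num_labels), is trained on top

# (b) Prompting: the input is rewritten with a manually designed template and the
#     prediction is read off the [MASK] position, exposing knowledge more directly.
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
prompt = f"Beijing is the {tokenizer.mask_token} of China."
p = tokenizer(prompt, return_tensors="pt")
mask_pos = (p.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
token_id = mlm(**p).logits[0, mask_pos].argmax().item()
print(tokenizer.decode([token_id]))                   # ideally prints "capital"
```

The template sensitivity noted in challenge (1) is visible here: rewording the prompt can change the prediction, which is exactly why template design matters.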

 

To explore a new paradigm for leveraging pre-trained models, we plan to probe the empirical knowledge inside them, so as to decouple the language representation from the knowledge representation and let downstream tasks benefit from that knowledge more directly.

 

One example we would like to share here is table semantic parsing, i.e., translating natural language questions into the corresponding SQL queries. The essential problem in this task is to link natural language questions to table schemas, which is typically called schema linking. Conventionally, schema linking is implemented with ad-hoc rules, which suffer from poor expressiveness and generalization. We found that schema-linking knowledge extracted from large-scale pre-trained models is more reliable and robust: a dramatic improvement is observed on downstream table semantic parsing tasks when such knowledge is explicitly employed. A minimal probing sketch is given below.
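
As an illustration only, the sketch below scores schema-linking candidates by comparing encoder representations of question tokens and column names instead of applying string-matching rules. The checkpoint, the cosine-similarity probe and the threshold are assumptions made for this example, not the probing method to be investigated in this project.

```python
# Illustrative schema-linking probe: rank table columns for each question token by
# encoder-embedding similarity rather than by hand-written matching rules.
# Assumptions: HuggingFace `transformers`, `bert-base-uncased`, cosine similarity and
# an arbitrary threshold -- all placeholders, not the project's actual technique.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(text: str) -> torch.Tensor:
    """Mean-pooled encoder embedding of a short phrase ([CLS]/[SEP] excluded)."""
    enc = tokenizer(text, return_tensors="pt")
    hidden = encoder(**enc).last_hidden_state[0]
    return hidden[1:-1].mean(dim=0)

question_tokens = ["Which", "city", "is", "the", "capital", "of", "France"]
columns = ["city_name", "country", "population", "is_capital"]

# Rules usually fire only on (fuzzy) string overlap ("city" -> "city_name"); the
# embedding probe can also surface implicit links such as "capital" -> "is_capital".
for tok in question_tokens:
    q = embed(tok)
    scores = {c: torch.cosine_similarity(q, embed(c.replace("_", " ")), dim=0).item()
              for c in columns}
    best, score = max(scores.items(), key=lambda kv: kv[1])
    if score > 0.85:                                   # illustrative threshold
        print(f"schema link: {tok} -> {best}  (score={score:.2f})")
```

In a real system the probed links would be filtered and then fed to the semantic parser as additional features; the point of the sketch is only that the linking signal comes from the pre-trained model rather than from rules.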

 

Is there a better mechanism for exploiting pre-trained models? We have made some attempts from the angle of decoupling the language representation from the knowledge representation, taking table semantic parsing as a test bed. When understanding an example requires world knowledge or common sense, for instance linking 「北京」 (Beijing) to 「首都」 (capital), rule-based matching is powerless and extra manual configuration is the only recourse; we found that even when the downstream model uses a pre-trained model as its backbone, the fine-tuning paradigm alone still cannot solve such cases. With a knowledge probe, however, a strong response between 「北京」 and 「首都」 can be detected directly in the pre-trained model. Such discrete tuples are exactly the knowledge that the model has induced from large-scale data and stored in its parameters. After extracting this knowledge and feeding it to the downstream model as an independent input, we observed a large improvement (10%+) on the downstream task. Combining existing studies with our preliminary exploration, knowledge probing and leveraging for large-scale pre-trained models is a promising direction: on the one hand, explicit knowledge probing and utilization can improve downstream tasks that depend on complex knowledge, such as commonsense QA, KBQA, TabFact, VQA and summarization; on the other hand, explicitly probed knowledge can better guide knowledge transfer from large models to small models. As a research direction, the following technical challenges remain (a pairwise probing sketch follows the list):

(1) Knowledge probing: more accurate and efficient probing techniques need to be designed. On the one hand, the probing technique must fit the model architecture and the pre-training tasks so that the probed knowledge is accurate; on the other hand, the probing process must be efficient enough for online deployment. Popular probing techniques require O(n²) time, which is costly and leaves large room for optimization.

(2) Knowledge flow: for the probed knowledge, better ways of passing it into downstream models need to be explored, including the form the knowledge takes (discrete or continuous) and how it is consumed (e.g., whether an extra adapter is needed); there are many research points worth digging into.

(3) Knowledge fusion: for tasks where human knowledge already exists, the knowledge probed from the model needs to be fused with that existing knowledge. Human knowledge and model knowledge are partly consistent and partly complementary, and may also conflict; how to balance the two kinds of knowledge is a largely unexplored direction.
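
The sketch below is meant only to make challenges (1) and (2) concrete: it derives a pairwise "response" score from the encoder's attention for every question term and schema term (hence the O(n²) cost) and packages the surviving pairs as discrete tuples that a downstream parser could consume as an extra input. The attention-based score, the checkpoint and the threshold are assumptions for illustration, not the techniques to be developed.

```python
# Illustrative pairwise probe: score the "response" between two phrases via the
# encoder's attention, loop over all question/schema pairs (O(n^2)), and emit the
# surviving pairs as discrete knowledge tuples for a downstream model.
# Assumptions: HuggingFace `transformers`, `bert-base-uncased`, layer/head-averaged
# attention as the response score, and an arbitrary threshold.
import itertools
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

def response(a: str, b: str) -> float:
    """Average attention mass between the tokens of `a` and the tokens of `b`."""
    enc = tokenizer(a, b, return_tensors="pt")                       # [CLS] a [SEP] b [SEP]
    att = torch.stack(model(**enc).attentions).mean(dim=(0, 1, 2))   # (seq, seq)
    sep = (enc.input_ids[0] == tokenizer.sep_token_id).nonzero()[0, 0].item()
    return att[1:sep, sep + 1:-1].mean().item()                      # cross-segment block

question_terms = ["Beijing", "capital", "population"]
schema_terms = ["capital_city", "country", "num_people"]

# Naive probing is quadratic: every question term is paired with every schema term.
tuples = []
for q, s in itertools.product(question_terms, schema_terms):
    score = response(q, s)
    if score > 0.02:                                    # illustrative threshold
        tuples.append((q, s, round(score, 3)))

# The probed knowledge stays discrete and interpretable, so it can be handed to the
# downstream model as an extra input instead of being squeezed into one dense vector.
print(tuples)
```

Reducing this quadratic probing cost, and deciding whether the tuples should stay discrete or be re-encoded as continuous features, are exactly the open questions listed above.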

 

References

[1] Bo Zhang, Jun Zhu, Hang Su. Toward the Third Generation of Artificial Intelligence. 中国科学 (SCIENTIA SINICA), 2020.

[2] John Hewitt et al. A Structural Probe for Finding Syntax in Word Representations. NAACL 2019.

[3] Petroni et al. Language Models as Knowledge Bases? EMNLP 2019.

[4] Junqi Dai et al. Does Syntax Matter? A Strong Baseline for Aspect-based Sentiment Analysis with RoBERTa. NAACL 2021.

Target

  • On tasks where knowledge/common sense is critically needed but not explicitly provided, such as CommonsenseQA 2.0, knowledge probing and leveraging with large-scale pre-trained models is expected to rank first on the leaderboard, with a 2% improvement over the current SOTA.
  • On tasks where knowledge/common sense is critically needed and such knowledge is explicitly provided, such as TabFact and KBQA, fusing human and model knowledge is expected to achieve SOTA results on 10+ public benchmarks.



Related Research Topics

  • Knowledge probing of large-scale pre-trained models: how to efficiently and accurately probe empirical knowledge from large-scale pre-trained models.
  • Knowledge leveraging of large-scale pre-trained models: how to leverage the knowledge in large-scale pre-trained models to transfer knowledge from large models to downstream models.
  • Human and model knowledge fusion: how the knowledge learned by models can collaborate with and complement the knowledge of human experts.

 

