Cross-lingual Knowledge Base Construction and Application in Global E-Commerce Scene


Machine Learning (algorithm)




AliExpress is a global e-commerce app and website operated by Alibaba. It sells goods to more than 200 countries in 18 languages. Because language usage is complex and word meanings drift across languages, we are building a multilingual e-commerce knowledge base, which is currently used in product search and shopping guidance. Our research covers two topics:

  • Knowledge Base Construction

Traditional entity and relation extraction works on well-formed documents. E-commerce search queries and item titles are different: they are short, noisy, and loosely ordered, so document- or sentence-based models do not work well in our setting. Fortunately, user behavior gives us an abundant corpus of interactions between queries and items, so we are trying to extract product relations and properties from query sessions and query-item pairs. To our knowledge, we are the first team to perform multilingual knowledge extraction from interaction data.
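One way to picture extraction from query-item pairs is distant supervision: the structured attributes of a clicked item serve as weak labels for the query terms that led to the click. The sketch below is a minimal illustration under that assumption, not the team's method. The function `mine_term_attributes`, the click-record format, the whitespace tokenizer, and both thresholds are hypothetical. Item attribute values are language-independent, so terms from different languages (here `rojo` and `red`) can end up linked to the same value. This hints at how interaction data can support cross-lingual alignment.

```python
from collections import Counter, defaultdict

# Hypothetical click record: (query text, structured attributes of the clicked item).
ClickPair = tuple[str, dict[str, str]]


def mine_term_attributes(
    pairs: list[ClickPair], min_support: int = 2, min_precision: float = 0.6
) -> dict[str, list[tuple[str, str]]]:
    """Link query terms to item attribute values they co-occur with in clicks.

    A term is linked to (attribute, value) when the pair co-occurs in at least
    `min_support` clicks and in at least `min_precision` of all clicks whose
    query contains the term. Both thresholds are illustrative assumptions.
    """
    term_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for query, attrs in pairs:
        terms = set(query.lower().split())  # naive whitespace tokenization
        term_counts.update(terms)
        for term in terms:
            for attr, value in attrs.items():
                pair_counts[(term, attr, value)] += 1

    links: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for (term, attr, value), count in pair_counts.items():
        if count >= min_support and count / term_counts[term] >= min_precision:
            links[term].append((attr, value))
    return dict(links)


# Toy mixed-language click log (made-up data for illustration only).
clicks: list[ClickPair] = [
    ("vestido rojo", {"category": "dress", "color": "red"}),
    ("vestido azul", {"category": "dress", "color": "blue"}),
    ("red dress", {"category": "dress", "color": "red"}),
    ("bolso rojo", {"category": "bag", "color": "red"}),
    ("red bag", {"category": "bag", "color": "red"}),
]
links = mine_term_attributes(clicks)
print(links)
```

On this toy log, `vestido` is linked to the dress category. Both `rojo` and `red` are linked to the color value `red`. Terms seen only once, such as `azul`, fall below the support threshold. A real system would need proper tokenization, session context, and far more robust statistics.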

  • Knowledge Base Application

While building the knowledge base, we are also applying it to query and item understanding. This lets us represent queries and titles as product-centered graphs and integrate symbolic rules and product embeddings into a single model. It also lets us recast query-item matching as graph matching, improving the search experience.
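A toy version of query-item matching as graph matching might look like the following. It assumes each query and title has already been parsed into a product-centered graph (a product node plus attribute edges). It combines a hard symbolic rule with a soft embedding similarity, in the spirit of the paragraph above. `ProductGraph`, `match_score`, the two-dimensional embeddings, and the scoring formula are all hypothetical illustrations, not the production model.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ProductGraph:
    """Hypothetical product-centered graph: a product node plus attribute edges."""
    product: str
    attributes: dict[str, str] = field(default_factory=dict)


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_score(
    query: ProductGraph, item: ProductGraph, embeddings: dict[str, list[float]]
) -> float:
    # Symbolic rule: an attribute fixed by the query must not conflict with the item.
    for attr, value in query.attributes.items():
        if attr in item.attributes and item.attributes[attr] != value:
            return 0.0
    # Soft match between product nodes via (hypothetical) product-type embeddings.
    product_sim = cosine(embeddings[query.product], embeddings[item.product])
    if not query.attributes:
        return product_sim
    # Scale by the fraction of query attribute edges the item graph contains.
    covered = sum(item.attributes.get(a) == v for a, v in query.attributes.items())
    return product_sim * covered / len(query.attributes)


# Made-up embeddings and graphs for illustration only.
embeddings = {"dress": [1.0, 0.0], "gown": [0.8, 0.6], "shoes": [0.0, 1.0]}
query = ProductGraph("dress", {"color": "red"})
candidates = {
    "red silk gown": ProductGraph("gown", {"color": "red", "material": "silk"}),
    "blue dress": ProductGraph("dress", {"color": "blue"}),
    "red shoes": ProductGraph("shoes", {"color": "red"}),
}
for title, graph in candidates.items():
    print(title, round(match_score(query, graph, embeddings), 3))
```

Returning zero on an attribute conflict models a symbolic rule such as "a query for red must not retrieve blue items." The embedding term still lets a query for a dress score a gown highly. A learned graph-matching model would presumably replace both hand-written pieces with trainable components.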

For researchers, we offer a rich domain corpus drawn from e-commerce search logs and a real-world experimental environment. We hope our collaboration will yield well-defined problems and innovative algorithms.


  • A well-defined e-commerce knowledge extraction framework and an innovative cross-lingual mining algorithm
  • A graph-based semantic matching model built on the multilingual e-commerce knowledge base
  • Submit 1-2 information extraction (IE) or information retrieval (IR) papers to top-tier conferences (CCF-A)
  • Release part of the multilingual knowledge base datasets

Related Research Topics

  • Weakly supervised/Semi-supervised Entity/Relation Extraction
  • Product Knowledge Graph (Amazon)
  • Deep semantic matching models (DSSM-based)
  • Deep Graph Matching
  • Knowledge-augmented language models


Suggested Collaboration Method

AIR (Alibaba Innovative Research), one-year collaboration project. 


