About the role
Your role is to carry out postdoctoral research on the injection of synonymy constraints into semantic representations.
- Global context and problem statement
Several word and sentence embedding approaches have been proposed in the literature to measure similarity between words or sentences. However, most of them capture similarity in terms of context of use, inferred from word co-occurrence. This kind of similarity does not reliably reflect semantic relations such as synonymy and antonymy: antonyms, for instance, tend to occur in similar contexts and therefore end up close to each other. Moreover, a model learned on a single corpus may fail to capture the relation between semantically similar words if they rarely co-occur.
To address this problem, some state-of-the-art approaches (Liu et al.; Mrkšić et al.) inject semantic constraints, during or after training, in order to refine the vector representations of words. These semantic or conceptual constraints (e.g., pairs of synonyms or antonyms) are usually derived from a knowledge base.
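As an illustration of constraint injection, the sketch below implements retrofitting (Faruqui et al., 2015), a simpler post-processing technique than the counter-fitting and specialization methods cited in this offer, and not the method expected from this postdoc. It assumes embeddings stored as plain Python lists and synonym constraints given as a word-to-synonyms mapping; the function name `retrofit` and the toy French vectors are hypothetical. Each update moves a word vector toward the average of its synonyms, while the `alpha` term keeps it anchored to its original distributional vector.

```python
from typing import Dict, List, Set

Vector = List[float]


def retrofit(
    embeddings: Dict[str, Vector],
    synonyms: Dict[str, Set[str]],
    iterations: int = 10,
    alpha: float = 1.0,
) -> Dict[str, Vector]:
    """Hypothetical retrofitting sketch: pull each word vector toward its
    synonyms while keeping it close to its original distributional vector."""
    # Work on copies so the input embeddings are left untouched.
    new = {w: list(v) for w, v in embeddings.items()}
    for _ in range(iterations):
        for word, neighbours in synonyms.items():
            # Only words that have a vector and at least one synonym
            # with a vector are updated.
            nbrs = [n for n in neighbours if n in new and n != word]
            if word not in new or not nbrs:
                continue
            beta = 1.0 / len(nbrs)  # each synonym edge weighted by 1/degree
            updated = []
            for k in range(len(new[word])):
                # Weighted average of the original vector and current synonym vectors.
                total = alpha * embeddings[word][k] + beta * sum(new[n][k] for n in nbrs)
                updated.append(total / (alpha + beta * len(nbrs)))
            new[word] = updated
    return new
```

For example, with `{"voiture": [1.0, 0.0], "automobile": [0.0, 1.0]}` and the symmetric constraint `voiture`/`automobile`, the two vectors converge toward `[2/3, 1/3]` and `[1/3, 2/3]`, while words without constraints keep their original vectors. Unlike counter-fitting, this sketch has no antonym-repel term, so it cannot push antonyms apart.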
However, most existing work relies on English-language knowledge bases, which cannot be used directly for other languages such as French. To build an equivalent French resource, some approaches translate an English knowledge base word by word (for example, the WONEF dictionary, obtained by translating WordNet from English to French). This significantly degrades the performance of the learned models compared with the English version, because the semantic particularities of each language are ignored.
- Scientific objective – results and challenges
The objective of this postdoc is to propose a method for bringing French words and sentences closer together along two dimensions: semantics (synonymy, paraphrase, etc.) and context of use. This requires designing an approach to build a knowledge base adapted to the French language, relying on NLP and NLU techniques to exploit linguistic resources (e.g., dictionaries) available in different formats (e.g., PDF). Two main technical issues must be addressed:
1- Automate the construction of a knowledge base from existing resources.
2- Inject semantic constraints during model training so that closeness in terms of synonymy is taken into account.
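For the first issue, one possible starting point is sketched below under strong assumptions: the dictionary has already been converted from PDF to plain text, and each entry follows a hypothetical `headword : synonym1, synonym2` layout. The function name `build_synonym_graph`, the regular expression, and the sample lines are illustrative only; real dictionaries will need format-specific parsing. The output is a symmetric word-to-synonyms mapping from which synonymy constraints could be drawn.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Hypothetical entry layout, e.g. "voiture : automobile, auto"
ENTRY = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def build_synonym_graph(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Turn 'headword : syn1, syn2' lines into a symmetric synonym graph."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        match = ENTRY.match(line)
        if match is None:
            continue  # skip titles, page numbers and other non-entry lines
        head = match.group(1).strip().lower()
        if not head:
            continue  # malformed entry with an empty headword
        for syn in match.group(2).split(","):
            syn = syn.strip().lower()
            if syn and syn != head:
                # Synonymy is treated as symmetric: add the edge both ways.
                graph[head].add(syn)
                graph[syn].add(head)
    return dict(graph)
```

Treating synonymy as symmetric and sense-independent is itself a simplification: dictionary synonymy often depends on the word sense, which is one of the language-specific particularities such a knowledge base would need to handle.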
Liu, Quan, et al. “Learning semantic word embeddings based on ordinal knowledge constraints.” Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP), 2015.
Mrkšić, Nikola, et al. “Counter-fitting word vectors to linguistic constraints.” Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2016.
Mrkšić, Nikola, et al. “Semantic specialization of distributional word vector spaces using monolingual and cross-lingual constraints.” Transactions of the Association for Computational Linguistics, 2017.
- Skills (scientific and technical) and personal qualities
- Knowledge of word/sentence embedding techniques: BERT, Word2vec, etc.
- Knowledge of NLP (Natural Language Processing) and NLU (Natural Language Understanding) techniques
- Knowledge of deep learning techniques
- Programming and algorithms: a high level of expertise in a programming language commonly used for NLP and deep learning (e.g., Python, C++) is required.
- Initiative, scientific curiosity, autonomy.
- Education required
PhD in Computer Science or Data Science, in the field of semantic analysis
- Desired experience
Experience in NLP, NLU, or deep learning
The proposed method is expected to improve the performance of existing (and future) work within Orange on extracting useful information from French textual data. Applications include:
- discovering business processes from unstructured data such as emails;
- identifying the reasons for customer satisfaction or dissatisfaction with a product or service, as expressed in free-text customer comments;
- matching CVs with job offers, etc.
Orange Innovation brings together the research and innovation activities and expertise of the Group’s entities and countries. We work every day to ensure that Orange is recognized as an innovative operator by its customers and we create value for the Group and the Brand in each of our projects. With 740 researchers, thousands of marketers, developers, designers and data analysts, it is the expertise of our 6,000 employees that fuels this ambition every day.
Orange Innovation anticipates technological breakthroughs and supports the Group’s countries and entities in making the best technological choices to meet the needs of our consumer and business customers.
Within Orange Innovation, you will join a research team at the forefront of innovation and expertise on a wide range of topics related to Customer Relationship Management and Business Process Management. The team designs innovative call center and CRM solutions based on new technologies such as HTML5/WebRTC, IoT, and blockchain. It is also active in enterprise digitalization through its participation in various research projects on big data and artificial intelligence.