Assistant Professor Thien Nguyen has won the Faculty Early Career Development (CAREER) award from the National Science Foundation (NSF). The CAREER award is the NSF's most prestigious award in support of early-career faculty who have the potential to serve as academic role models in research and education and to lead advances in the mission of their department or organization. The five-year grant of $582k will support Dr. Nguyen’s project titled “Multilingual Learning for Event Structures from Text.”
Understanding events and their inherent structures in text is important for natural language processing (NLP) and artificial intelligence technologies. For example, as a core component of knowledge base construction and text summarization systems, event structure recognition can identify events such as natural disasters, disease outbreaks, protests, cyber attacks, or chemical interactions in news, social media, or scientific papers. Recognizing causal, temporal, or coreferential relations between these events provides further valuable knowledge for event understanding.
Despite extensive research, current methods for event structure understanding have been developed for only a handful of popular languages (e.g., English, Chinese, and Arabic), raising important questions about their generalization and applicability to the majority of the world's 7,000+ languages. To address this gap, the overall goal of Dr. Nguyen’s project is to devise learning methods that build event structure models for many more of the world's languages. Because annotating training data for every possible language is prohibitively expensive, the project will develop novel multilingual learning approaches that leverage existing training data for high-resource languages to build models for other languages with little or no training data. The project will focus on typologically diverse languages that are unexplored, understudied, or low-resource in NLP.
Recent advances in multilingual learning in NLP have introduced large-scale pre-trained language models that induce language-general representations for model development. The project will address three fundamental limitations of existing multilingual learning research for event structures: (i) the lack of multilingual datasets annotated for enough languages to properly evaluate how well models generalize across different language families, (ii) the limitations of current multilingual representation learning methods in aligning representations across languages to induce language-general features, and (iii) the scarcity of labeled training data in target languages, which hinders the cross-lingual performance and generalization of multilingual models.
This project will contribute directly to democratizing artificial intelligence for a broader society by providing multilingual datasets and technologies for building effective event structure systems in multiple languages. The project will also include internship programs, new course development, and outreach activities.
About Thien
Thien Nguyen is an assistant professor in the Department of Computer Science at the University of Oregon. He received his Ph.D. in Computer Science from New York University and completed a postdoc at the University of Montreal. Thien's research areas include information extraction, language grounding, natural language processing, machine learning, and deep learning. In information extraction, he developed some of the first deep learning models for entity recognition, relation extraction, and event extraction. His current research explores multi-domain and multilingual natural language processing, aiming to learn transferable representations that support information extraction across different domains and languages.
Thien has published more than 110 papers in natural language processing and artificial intelligence, which have attracted more than 4,100 citations according to Google Scholar. He received an AI 2000 Most Influential Scholar Honorable Mention (2022) for outstanding and vibrant contributions to natural language processing between 2012 and 2021. His 2016 paper on joint event extraction with recurrent neural networks was honored as one of the most influential papers of the North American Chapter of the Association for Computational Linguistics (NAACL), one of the world's top natural language processing conferences.
Thien’s research has been supported by NSF, IARPA, Army Research Office, Adobe Research, and IBM Research.