Research
My research interests include the robustness, generalizability, and interpretability of deep neural models; understanding the capabilities of LLMs through knowledge probing; and the mechanisms behind in-context learning and constrained text generation. I aim to explore LLMs' generalization on out-of-distribution data, with a focus on longer and compositional proof reasoning and on understanding the in-context learning mechanism. Despite significant progress in LLMs, challenges persist in multimodal language modelling, knowledge probing, and trustworthiness, particularly for multimodal data. My ongoing work also addresses emotional response and counter-response generation against misinformation, propaganda-based misinformation generation, and adversarial training for early-detection scenarios.
CV  / 
Google Scholar  / 
Semantic Scholar  / 
Github
AR-BERT: Aspect-relation enhanced Aspect-level Sentiment Classification with Multi-modal Explanations
Sk Mainul Islam, Sourangshu Bhattacharya
The Web Conference (WWW), 2022
Aspect level sentiment classification (ALSC) is a difficult problem with state-of-the-art models showing less than 80% macro-F1 score on benchmark datasets. Existing models do not incorporate information on aspect-aspect relations in knowledge graphs (KGs), e.g. DBpedia. Two main challenges stem from inaccurate disambiguation of aspects to KG entities, and the inability to learn aspect representations from the large KGs in joint training with ALSC models. We propose AR-BERT, a novel two-level global-local entity embedding scheme that allows efficient joint training of KG-based aspect embeddings and ALSC models. A novel incorrect disambiguation detection technique addresses the problem of inaccuracy in aspect disambiguation. We also introduce the problem of determining mode significance in multi-modal explanation generation, and propose a two-step solution. The proposed methods show a consistent improvement of 2.5-4.1 percentage points over recent BERT-based baselines on benchmark datasets.
Fair Data Generation using Language Models with Hard Constraints
Sk Mainul Islam, Abhinav Nagpal, Balaji Ganesan, Pranay Kumar Lohia
CtrlGen Workshop at NeurIPS (NeurIPS-W), 2021
Natural language text generation has seen significant improvements with the advent of pre-trained language models. Using such language models to predict personal data entities, in place of redacted spans in text, could help generate synthetic datasets. In order to address privacy and ethical concerns with such datasets, we need to ensure that the masked entity predictions are also fair and controlled by application-specific constraints. We introduce new ways to inject hard constraints and knowledge into the language models that address such concerns and also improve performance on this task.
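To make the idea of hard-constrained masked entity prediction concrete, here is a minimal, hypothetical sketch: a masked language model's candidates for a redacted span are restricted to an application-specific allow-list before decoding. The model name, example sentence, and entity list are illustrative assumptions, not the setup or method from the paper.

```python
# Sketch: hard-constrained masked entity prediction with an off-the-shelf masked LM.
# Assumes bert-base-uncased and a toy allow-list; not the paper's actual method.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

text = "The loan application of [MASK] was approved."
allowed_entities = ["alice", "bob", "maria", "wei"]  # hypothetical constraint set

inputs = tokenizer(text, return_tensors="pt")
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]  # scores over the full vocabulary

# Hard constraint: mask out every token outside the allow-list before picking.
allowed_ids = tokenizer.convert_tokens_to_ids(allowed_entities)
constrained = torch.full_like(logits, float("-inf"))
constrained[:, allowed_ids] = logits[:, allowed_ids]

prediction = tokenizer.decode(constrained.argmax(dim=-1))
print(prediction)  # always one of the allowed entities
```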