The CL-Aff HappyDB Dataset: In Pursuit of Happiness

It has long been known that human affect is context-driven, and that labeled datasets should account for context when they are used to build predictive models of affect. This motivates our Shared Task, which is organized in collaboration with researchers at Megagon Labs and is built upon the HappyDB dataset, comprising human accounts of 'happy moments'. I'll add... Continue Reading →

Guide to choosing research methods & writing dissertations

I didn't have much help and jumped into the deep end when I started my PhD. Since then, I've had to unlearn a lot of practices, and pick up strategies from experienced seniors and colleagues. I'm going to use this post to collect some helpful links for anyone who needs a fresh start, or simply... Continue Reading →

Guide to designing online surveys

Note that most research agencies may provide a nationally representative, probability-based, online panel for the US alone. Online panels in other countries are almost entirely opt-in (nonprobability) panels and aren’t designed for following the same respondents longitudinally. Furthermore, it is recommended to design a longitudinal study which has a fresh sample in each wave, or at... Continue Reading →

Language modeling on social media: the pros and cons

A number of studies have used social media language to (a) profile individuals, (b) profile linguistic styles, (c) profile communities, and (d) extrapolate the results to other domains, individuals, and communities. However, my work shows that pre-trained models may not scale well to other platforms, or even on the same platform to measure aggregated groups of... Continue Reading →

Online Information Behavior and Self-Disclosure

The focus of my work is on understanding the role of platform affordances in facilitating information disclosure, information seeking, and information-sharing. These affordances have implications for self-disclosure and user trait prediction, for example, on Facebook vs Twitter. At the community level too, the affordances of different platforms, such as Twitter vs. Google Search, imply that... Continue Reading →

CL-SciSumm Shared Task 2018

The 4th CL-SciSumm Shared Task (2018) was sponsored by Microsoft Research Asia. The Shared Task on relationship mining and scientific summarization of computational linguistics research papers was organized at SIGIR from 2017 to 2019. Scientific summarization can play an important role in developing methods to index, represent, retrieve, browse and visualize information in large scholarly databases. More... Continue Reading →

Data Science Projects @ Adobe

Collaboration with Adobe Research India - Affect mining and optimization for marketing content, and behavioral profiling of users. When I was at Adobe, my research mostly focused on automated content analysis for analytics and performance reporting in digital marketing scenarios. My background in natural language processing and linguistics makes me very familiar with the kind of... Continue Reading →
