The WikiTalkEdit dataset

This paper has had quite an adventure, and it finally has a home. The WikiTalkEdit dataset/paper tests whether the predictors of emotion change in a two-party conversation are also predictors of behavioral change. The paper tests different language models and provides linguistic insights about the predictors of each kind of change. Inspired by a conversation... Continue Reading →

The CLAff Diplomacy dataset

This was one of the most interesting and challenging datasets I explored. The original dataset was collected by a brilliant team of researchers over at CMU. If you want to see how awesome they are, check out their video describing the data: Here's the description provided by Jordan Boyd-Graber: Machine learning techniques to detect deception... Continue Reading →

CL-SciSumm Shared Task 2018

The 4th CL-SciSumm 2018 Shared Task, sponsored by Microsoft Research Asia. The Shared Task on the relationship mining and scientific summarization of computational linguistics research papers was organized at SIGIR from 2017 to 2019. Scientific summarization can play an important role in developing methods to index, represent, retrieve, browse and visualize information in large scholarly databases. More... Continue Reading →

The WKWSCI Sentiment Lexicon

The WKWSCI Sentiment Lexicon by Christopher S. G. Khoo, Sathik Basha Johnkhan and Jin-Cheon Na is based on the 6of12dict lexicon, and currently covers adjectives, adverbs and verbs. The words were manually coded with a value on a 7-point sentiment strength scale. The effectiveness of the four sentiment lexicons for sentiment categorization at the document-level and sentence-level was evaluated using an Amazon product review dataset.... Continue Reading →
