Data Science Projects @ Adobe

Collaboration with Adobe Research India: affect mining and optimization for marketing content, and behavioral profiling of users. When I was at Adobe, my research focused mostly on automated content analysis for analytics and performance reporting in digital marketing. My background in natural language processing and linguistics makes me very familiar with the kinds of features that can be mined from unstructured data. In one project on the performance of marketing emails, I analyzed databases containing millions of customer responses to emails, averaging 3 million responses per email sent. The data comprised several response metrics, such as opens, clicks, hovers and purchases. My work involved engineering new features from the textual and visual properties of the emails and combining them with customer-profile information to build an algorithm that predicts email performance.
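The modeling step above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the model used in the project: the source does not name the learning algorithm, so logistic regression trained by batch gradient descent is swapped in. The `Email` and `Customer` classes, the individual features in `features`, and the click label are all hypothetical stand-ins for the textual, visual and profile features described.

```python
import math
from dataclasses import dataclass


@dataclass
class Email:
    subject: str
    n_images: int  # hypothetical stand-in for a visual property


@dataclass
class Customer:
    past_open_rate: float  # hypothetical stand-in for a profile attribute


def features(email: Email, customer: Customer) -> list:
    """Hypothetical textual, visual and profile features for one email-customer pair."""
    return [
        1.0,                                   # bias term
        len(email.subject.split()) / 10.0,     # subject length in words (scaled)
        1.0 if "?" in email.subject else 0.0,  # subject is phrased as a question
        email.n_images / 5.0,                  # visual density (scaled)
        customer.past_open_rate,               # profile signal
    ]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, lr=0.5, epochs=500):
    """Fit logistic-regression weights with batch gradient descent on log-loss."""
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def predict(w, x) -> float:
    """Predicted probability of the response (here, a click)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


if __name__ == "__main__":
    email = Email("Last chance: 20% off?", 3)
    rows = [features(email, Customer(r)) for r in (0.9, 0.7, 0.2, 0.1)]
    weights = train(rows, [1, 1, 0, 0])  # toy labels: did the customer click?
    print(round(predict(weights, features(email, Customer(0.6))), 3))
```

The point of the sketch is only that content features and profile features end up in one feature vector per email-customer pair; at millions of responses per email, a vectorized library would replace these pure-Python loops.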

Past Projects:

  1. Identifying purchase motivations in online shoppers. We aimed to identify which consumers are likely to purchase novelty products, based on psychographic traits elicited from their public social media posts or from their responses to a questionnaire. Our pilot study was run on 75 participants and indicated high reliability; the confirmatory study was then conducted on over 500 participants. Marketers can apply our findings to filter their demographic target segments for a higher conversion rate, and to send out emails best suited to each consumer's informational and cognitive needs.
  2. Topic Detection and Tracking in Streaming Social Media. We developed a framework for incrementally clustering streaming social media posts into stories within a broader trending topic. The framework compares each incoming post with the existing stories, where each story is represented by its cluster centroid: a vector of all the entities from the posts in that cluster. The framework was implemented in Python and tested on three heterogeneous Twitter datasets. Our algorithm clusters 80% of all tweets into story-based clusters, which are 86% pure. It also detects trending stories earlier than manual reports do, and is far more accurate than baseline systems at identifying fine-grained stories within subtopics.
  3. Summarizing Customer Reviews through Aspects and Contexts. This study leveraged the syntactic, semantic and contextual features of online hotel and restaurant reviews from a dataset provided by, to extract information aspects and summarize them into meaningful feature groups. We designed a set of syntactic rules to extract aspects and their descriptors, and developed an algorithm to cluster aspects into closely related feature groups. Our method builds on and outperforms two state-of-the-art approaches, and successfully generates thematic aspect groups about food quality, décor and service quality.
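The incremental clustering in the Topic Detection and Tracking project can be sketched as a single-pass algorithm. This is a minimal sketch under assumptions, not the original framework: each post is assumed to arrive with its entities already extracted as a list of strings, similarity is assumed to be cosine similarity over entity counts, and the `StoryClusterer` class and its `threshold` value are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b) -> float:
    """Cosine similarity between two sparse entity-count vectors (dict-like)."""
    dot = sum(w * b.get(e, 0) for e, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class StoryClusterer:
    """Single-pass clustering of a post stream into stories.

    Each story's centroid is the summed entity counts of every post
    assigned to it, so it grows as the story develops.
    """

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold  # hypothetical similarity cut-off
        self.centroids = []         # one Counter of entities per story
        self.members = []           # post ids assigned to each story

    def add(self, post_id, entities) -> int:
        """Assign a post to the most similar story, or open a new one."""
        vec = Counter(entities)
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= self.threshold:
            self.centroids[best].update(vec)  # fold the post into the centroid
            self.members[best].append(post_id)
            return best
        self.centroids.append(Counter(vec))
        self.members.append([post_id])
        return len(self.centroids) - 1


if __name__ == "__main__":
    clusterer = StoryClusterer(threshold=0.3)
    stream = [
        (1, ["#election", "senate", "vote"]),
        (2, ["senate", "vote", "recount"]),
        (3, ["storm", "florida"]),
        (4, ["florida", "storm", "evacuation"]),
    ]
    for pid, ents in stream:
        clusterer.add(pid, ents)
    print(clusterer.members)  # prints [[1, 2], [3, 4]]
```

The threshold sets the trade-off: lower values merge more posts into existing stories, while higher values open new stories more readily and keep each cluster purer.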
