Language modeling on social media: the pros and cons

A number of studies have used social media language to (a) profile individuals, (b) profile linguistic styles, (c) profile communities, and (d) extrapolate the results to other domains, individuals, and communities. However, my work shows that pre-trained models may not transfer well to other platforms, or even within the same platform when they are applied to aggregated groups of people rather than to individuals. The same words can occur at vastly different frequencies because they are used in different contexts [1]. Models trained on one domain can also fail in a new domain because of quantitative differences in vocabulary use [2]. I’m interested in exploring these problems in language modeling and in proposing solutions to de-bias models and adapt them to new test datasets.
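To make the frequency-shift point concrete, here is a minimal sketch of how one might compare word usage across two platforms. It is not the method from the papers cited here; it simply computes a smoothed log-ratio of relative word frequencies. The function names, the whitespace tokenizer, the add-one smoothing, and the toy posts are all assumptions made for illustration.

```python
import math
from collections import Counter


def count_tokens(texts):
    """Lowercase, split on whitespace, and return (token counts, total tokens)."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    return counts, sum(counts.values())


def log_frequency_ratio(source_texts, target_texts, smoothing=1.0):
    """Return log(p_target(w) / p_source(w)) for every word seen in either corpus.

    Positive values mean the word is relatively more frequent in the target
    domain; negative values mean it is more frequent in the source domain.
    Additive smoothing keeps words missing from one corpus from producing
    a division by zero.
    """
    src_counts, src_total = count_tokens(source_texts)
    tgt_counts, tgt_total = count_tokens(target_texts)
    vocab = set(src_counts) | set(tgt_counts)
    vocab_size = len(vocab)

    ratios = {}
    for word in vocab:
        p_src = (src_counts[word] + smoothing) / (src_total + smoothing * vocab_size)
        p_tgt = (tgt_counts[word] + smoothing) / (tgt_total + smoothing * vocab_size)
        ratios[word] = math.log(p_tgt / p_src)
    return ratios


# Made-up example posts, purely illustrative (not real data).
facebook_posts = ["happy birthday to my family", "family dinner tonight"]
twitter_posts = ["breaking news tonight", "news about the game"]

ratios = log_frequency_ratio(facebook_posts, twitter_posts)

# Words sorted from "most Facebook-like" to "most Twitter-like" in this toy data.
for word, value in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{word:10s} {value:+.2f}")
```

Words with large absolute log-ratios are the ones whose frequencies differ most between the two corpora, so in a sketch like this they would be the first candidates to inspect when a model trained on one platform is applied to the other.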

Relevant published work:

  1. Rieman, D., Schwartz, A., Jaidka, K., Ungar, L. (2017). Domain Adaptation from User-level Facebook Models to County-level Twitter Predictions. In Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP).
  2. Jaidka, K., Guntuku, S. C., Ungar, L. (2018). Facebook vs. Twitter: Cross-platform differences in self-disclosure and trait prediction. In Proceedings of the 12th International Conference on Web and Social Media (ICWSM 2018). AAAI.
