A number of studies have used social media language to (a) profile individuals, (b) profile linguistic styles, (c) profile communities, and (d) extrapolate the results to other domains, individuals, and communities. However, my work shows that pre-trained models may not transfer well to other platforms, or even, on the same platform, to measurements of aggregated groups of people. The same words can occur with vastly different frequencies because they are used in different contexts, and models trained on one domain can fail in a new domain because of these quantitative differences in vocabulary. I am interested in exploring these problems in language modeling and in proposing solutions to de-bias and adapt models to new test datasets.
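As an illustrative sketch (not code from the cited papers), one simple way to see the cross-domain vocabulary shift described above is to compare word-frequency distributions from two platforms and quantify their divergence. The toy corpora and the use of Jensen-Shannon divergence here are my own assumptions for illustration:

```python
# Sketch: quantify how word frequencies shift between two domains,
# one symptom of why a model trained on one platform may transfer
# poorly to another. Corpora below are tiny, made-up examples.
from collections import Counter
import math

def rel_freqs(tokens):
    """Relative frequency of each word in a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two frequency dicts.

    Ranges from 0 (identical distributions) to 1 (disjoint support).
    """
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl(a):
        return sum(a[w] * math.log2(a[w] / m[w])
                   for w in vocab if a.get(w, 0.0) > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

# Hypothetical, platform-flavored token samples.
facebook_like = "happy birthday family friends love photo love".split()
twitter_like = "breaking news politics thread follow love".split()

shift = js_divergence(rel_freqs(facebook_like), rel_freqs(twitter_like))
print(f"vocabulary shift (JSD): {shift:.3f}")
```

A large divergence signals that per-word features learned in one domain (e.g., user-level Facebook posts) may carry different weight, or be absent entirely, in the target domain (e.g., county-level Twitter aggregates).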
Relevant published work:
- Rieman, D., Schwartz, A., Jaidka, K., Ungar, L. (2017). Domain Adaptation from User-level Facebook Models to County-level Twitter Predictions. In Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP).
- Jaidka, K., Guntuku, S. C., Ungar, L. (2018). Facebook vs. Twitter: Cross-platform differences in self-disclosure and trait prediction. In Proceedings of the 12th International Conference on Web and Social Media (ICWSM 2018). AAAI.