Since its inception, the European Union has been divided along national, geographic, cultural, and linguistic borders that hinder the spread of knowledge, shared points of view, and understanding. Research concerned with European integration suggests that the convergence of national public communication spheres towards a more unified coverage of “issues of common concern to all citizens of the Union” and the materialization of a “European public sphere” are of crucial importance for developing a common European identity among Europe’s citizens.
Our project therefore sets out to empirically measure and explain which societal issues receive the highest priority (i) among media organizations, policy makers, and the general public and (ii) across the different nations and languages of the European Union, as well as (iii) how these agendas converge or diverge over time.
To achieve this goal, we employ and further develop agenda-setting theory. Although core agenda-setting assumptions have been replicated in many countries across the world, studies have generally considered only a single country at a time. Further, although the role of online media has been investigated, it has not yet been studied comprehensively or through large-scale empirical research. To address these gaps, extend agenda-setting theory accordingly, and answer our overarching research question about the convergence of national issue agendas, we conduct a comparative analysis. This analysis draws on large amounts of data from multiple countries and languages, including traditional mass media articles and political speeches in digital form as well as social media (mainly, but not exclusively, Twitter and Wikipedia).
With these data, we simultaneously analyze both within-country/within-language dynamics and cross-country/cross-language dynamics. This will provide an unprecedented view of how media, political, and public agendas in different countries and languages cover current affairs differently, and it will reveal the dynamics of influence among these agendas. It will offer unique empirical insights into the forces shaping the pathways of information across geolinguistic divides in the European Union, and it will quantify how issue agendas differ for actors in different geolinguistic contexts and how they evolve over time.
Identifying and tracking societal issues across European countries and languages in this way requires novel computational methods for handling heterogeneous, large-scale datasets. Towards this goal, we develop state-of-the-art methods, including multilingual news story detection, latent attribute inference, and the identification of probabilistic information pathways. We also complement traditional methods for measuring public and media agendas and develop new techniques to assess how closely agendas measured with social media correspond to those measured with traditional surveys. These methods serve not only our particular research interest in agenda setting and convergence in the European Union, but also a wide range of scholars across computational social science. We will release our code and data publicly, further extending the impact of our research.
Our project is highly data-intensive, but our team brings extensive experience in handling and analyzing large-scale data about human behavior. Our expertise spans the social and computational sciences, and our outputs and impacts will reach across the field of computational social science. We will publish our core findings on agenda setting in interdisciplinary open-access journals and in targeted social science journals. We also plan to disseminate our methodological contributions through publications at top-tier computer science conferences.