Projects - Schools in Computational Social Science and Data Science

During our summer schools, participants and experts work together through the entire research process on projects addressing a specific research topic, using computational social science methods such as:

Innovative data analysis
Internet-based experiments
Data-driven agent-based modeling and simulation
Social network analysis
Natural language processing
Social media analysis

Current list of projects:

Cultural swarms of social cohesion (Cultural swarms)
Computational institutional analysis (Institutional analysis)
Organizational citizenship behavior: Dynamical reciprocity in giving and receiving help (Dynamical reciprocity)
Predicting the trust radius with machine learning: To what extent are those “Most People” out-groups? (Trust radius)
Using Twitter to measure social cohesion and attitudes towards immigration (Twitter attitudes)
Political Echo Chambers (Twitter echo chambers)
Simulating an empirically-calibrated, society-wide network and its intergroup cohesiveness (Society-wide networks)
Residential segregation and political polarization (Segregation polarization)
Down the rabbit hole: Polarization and information segregation on YouTube (YouTube rabbit holes)
From survey data to coherence agents (Coherence agents)

1. Cultural swarms of social cohesion

Project leaders: Gert Jan Hofstede & Yara Khaluf

Short title: Cultural swarms

Research question:

How can we define and model group belonging (othering and gathering) in social contexts using the notion of cultural swarms?

Sub-questions:

- How can models of cultural swarms be used to understand and analyse both symmetry-breaking and consensus in best-of-N decisions?
- How can notions of culture be integrated into swarm dynamics?
- How does status-power theory translate into swarm systems?

Method: Two domains are much studied when modelling group belonging. One is social identity: the group(s) one wishes to belong to, or is classified into by others. The other is discrete choice in opinion/behaviour, for instance in voting, surveys, or choosing among alternative options. Other domains are possible, but we wish to focus on these two. The causality could go both ways. For joining, the logic is: when an individual adopts the opinion that the majority in its network displays, it joins the group; alternatively, when it joins the group for a status-power reason of which it is not aware, it adopts the group's opinion as a sign of belonging. Conversely, for splitting, the logic is: when the individual rejects the majority opinion and goes for another alternative, it splits from the group; alternatively, if it leaves the group for status-power reasons, it will change its opinion.
In our project, we consider these joining (gathering) and splitting (othering) processes in the context of discrete decision-making, where individuals have a set of options to choose from (e.g., a list of restaurants, see figure 1). The decision of an individual is derived from (i) the influence of its social network, which we refer to as social feedback, and (ii) other factors, such as intrinsic motivation or environmental or cultural factors, that make the individual decide against the opinion of its social network, which we refer to as noise. Findings from work by Yara Khaluf show how such social feedback can be balanced with individual noise to achieve high cohesion at the societal level. The population will consist of computational agents whose sociality is driven by Gert Jan Hofstede's GRASP. GRASP will add 'Kemperian' reference groups that may not be physically present but play a role in the decision.
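To make the balance between social feedback and noise concrete, here is a minimal sketch of a noisy best-of-N decision model. It is not Yara Khaluf's model or GRASP: it assumes a well-mixed population instead of a social network, "adopt the most common option in a small random sample" as the social feedback, and a uniformly random choice as the noise. All function and parameter names are hypothetical.

```python
import random
from collections import Counter

def simulate_best_of_n(n_agents=100, n_options=3, noise=0.05,
                       steps=20000, sample_size=3, seed=1):
    """Toy best-of-N opinion dynamics in a well-mixed population (assumption).

    At each step one random agent updates. With probability `noise` it picks
    an option uniformly at random (individual noise); otherwise it adopts the
    most common option among `sample_size` randomly drawn agents (social
    feedback), breaking ties at random. The sample may include the agent itself.
    """
    rng = random.Random(seed)
    opinions = [rng.randrange(n_options) for _ in range(n_agents)]
    for _ in range(steps):
        i = rng.randrange(n_agents)
        if rng.random() < noise:
            opinions[i] = rng.randrange(n_options)
        else:
            peers = rng.sample(range(n_agents), sample_size)
            counts = Counter(opinions[j] for j in peers)
            top = max(counts.values())
            opinions[i] = rng.choice([o for o, c in counts.items() if c == top])
    return opinions

def cohesion(opinions):
    """Share of agents holding the most common option (1.0 = full consensus)."""
    return Counter(opinions).most_common(1)[0][1] / len(opinions)

for eta in (0.0, 0.1, 0.5, 1.0):
    print(eta, cohesion(simulate_best_of_n(noise=eta)))
```

Sweeping `noise` from 0 to 1 is one simple way to explore the trade-off described above: with little noise the population tends to break symmetry and converge on one option, while with a lot of noise cohesion drifts towards the share expected from purely random choices.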

Data:

Depending on the cases, we may use the following data sources:

World Values Survey, http://www.worldvaluessurvey.org/wvs.jsp. This is an ongoing research collaboration between countries all around the world, now in its 7th wave of data collection. It can be mined for longitudinal trends or cross-national comparisons.
Hofstede 6D model, www.geerthofstede.com (> Research and VSM > Dimension Data Matrix). This is the coherent database of Hofstede dimension scores, shown to have continued validity for understanding cross-national differences in all domains of life, from suicide rates, school performance, economic success, to political and governance systems.
World Bank open data, https://data.worldbank.org/, with e.g. GINI coefficients and world development indexes.

Description:

In our proposal, humans live in cultural swarms of social cohesion, in the following sense:

Cultural. We act according to “unwritten rules of the social game” that our children learn when young and that give social life persistence across generations. These are the value systems shared within a society.
Swarms. Swarms are distinguished by the absence of a central point of control. This is true for humans as well, even in hierarchical societies. Our social world is much more bottom-up than most of us realise. Culture shapes rituals and institutions and makes them vary across societies. Authoritarian leaders cannot exist without obedient followers.
Social. We are intensely social. Our sociality consists of manifold groups that give meaning to our lives. Searching for the “self”, we meet the reference groups that shaped us. We live to be seen by others and to be meaningful to them. The othering/gathering question is: who are those others, and why do we draw and redraw the line?
Cohesion. We survive as communities, morally bound to one another, and this necessarily involves creating boundaries. Othering is a precondition for gathering. Gathering is a precondition for othering.

One of the puzzles of the social sciences is how value systems can be so persistent across decades and centuries in the face of so much disturbance and change. The notion of cultural swarms of social cohesion allows us to unpack processes of cultural change, or the lack of it, and model their dynamics. Why does society take a certain turn, and can we influence this?

Additional Material: CulturalSwarms


2. Computational institutional analysis

Project leaders: Seth Frey & Mako Hill

Short title: Institutional analysis

Research question:

What types of rule systems do small self-governing communities develop?
What institutional features characterize successful self-governance?
How do these rule systems change over time?

Method:
Web scraping and web science
Data science and statistical learning
Computational text analysis
Large-n comparative analysis with the social system as the unit of analysis

Data:
The rules of all subreddits on Reddit that have posted rules (~70K, about 10% of all subreddits), or of some other system of online communities, coded by rule type, together with measures from the same system of rule changes, community effectiveness, and relevant covariates.

Description:

Institutional structure and governance style are major drivers of social outcomes, and at a small scale they are a route to decentralization and personal empowerment. But quantitative frameworks for representing governance structure in general remain a frontier. Participants will be introduced to the Institutional Analysis and Development (IAD) framework of Nobel Laureate Elinor Ostrom and its various taxonomies of rule types in small-scale governance systems. In the process, we introduce the problem of peer production, an arena of collective action that embodies many of the exciting opportunities of the online realm for social change and is uniquely suited to CSS approaches. We will learn to extract data from the Internet's rich source of largely independent small-scale governance systems. This will permit us to perform large-n studies not over individuals but over governance systems, and to determine more rigorously the extent of variation in communities' governance inputs and “societal” outcomes, as well as the correlations between these inputs and outputs.
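The step of coding community rules by rule type can be illustrated with a deliberately crude keyword-based coder. The labels follow five of the seven rule types in Ostrom's IAD framework (boundary, position, choice, payoff, information; aggregation and scope are left out), but the regular expressions are hypothetical placeholders. In practice the coding scheme would be built and validated by hand or with a trained classifier.

```python
import re
from collections import Counter

# Hypothetical keyword patterns for five of Ostrom's seven rule types;
# illustrative only, not a validated coding scheme.
RULE_PATTERNS = {
    "boundary": r"\b(account age|karma|new accounts?|members only)\b",
    "position": r"\b(moderators?|mods?|admins?)\b",
    "choice": r"\b(do not|don't|no|must|allowed)\b",
    "payoff": r"\b(bans?|banned|removed|warnings?)\b",
    "information": r"\b(flair|tags?|titles?|sources?)\b",
}

def code_rule(text):
    """Return the set of rule types whose pattern matches one rule's text."""
    lowered = text.lower()
    return {rtype for rtype, pattern in RULE_PATTERNS.items()
            if re.search(pattern, lowered)}

def rule_profile(rules):
    """Count how many of a community's rules fall under each rule type."""
    counts = Counter()
    for rule in rules:
        counts.update(code_rule(rule))
    return counts

print(rule_profile(["Posts must have flair.", "Spam will be removed by the mods."]))
```

Per-community profiles like this could then be joined with outcome measures (e.g., community effectiveness) for the large-n comparison over governance systems.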

The project leaders, Seth Frey and Mako Hill, are interdisciplinary social scientists who use large datasets and computational methods to isolate the decision processes behind complex social phenomena such as collective action and peer production. Seth specializes in studying the self-organization of governance and cultural systems at a whole-system scale by using “designed societies” like sports matches, theme parks, multiplayer video games, open-source software projects, and online communities. Mako also studies collective action in online communities, seeking to understand how the design of communication and information technologies shapes fundamental social outcomes, such as whether people contribute to a public good. By working at the intersection of many methods and disciplines, they have introduced approaches that provide new insights into governance processes in small-scale online communities and into the science of social system design generally.


3. Organizational citizenship behavior: Dynamical reciprocity in giving and receiving help

Project leader: Corinne Coen

Short title: Dynamical reciprocity

Research question:

What patterns emerge from the interaction of workers giving, receiving, and withholding help?
What frequencies and ratios of productive and counterproductive behavior identify contributing individuals?
What structures (teams, departments) support the highest levels of positive reciprocity?

Method: We will review multiple models of reciprocity in the research on cooperation to deepen our understanding of theoretical and empirical options. As we develop our repertoire of design choices, we will map a variety of approaches to understanding emergent outcomes for individuals, groups, and organizations. Together, we will design and build agent-based models.

Data: Findings from the substantial existing literature.

Description: Within organizations, people work on tasks in the technical core and offer social and psychological support that catalyzes task activities and processes (Borman and Motowidlo, 1997). These non-task behaviors, called organizational citizenship behavior (OCB), and the antecedents and contexts for motivating them have been well researched. Yet, despite the volume of studies, research has focused almost exclusively on individuals. The most studied antecedents for offering help include the internal factors of satisfaction, commitment, disposition, and perceived fairness. Moreover, certain external factors have been found to correlate with helping, including individuals' perceptions of organizational commitment, organizational justice, role stressors, work engagement, role overload, and interpersonal conflict. While less studied, research indicates that the acceptance of help by potential beneficiaries revolves around concerns about image, obligation to reciprocate, self-reliance, lack of trust in coworkers, or reservations about coworker competence. Missing from these studies is a focus on the dynamics between helpers and their potential beneficiaries, among helpers, and on the cascade of processes they set off in organizations.
Reciprocity, particularly the informal exchange of goods and favors among people for mutual benefit, maintains social cohesion. The idea of reciprocity as the process underlying social exchange appears in discussions of OCB, yet remains underdeveloped. The time is ripe for extending findings about focal individuals into models of people interacting: administering, receiving, or rejecting help. The proximal context of individuals making decisions to give or take help comprises specific other individuals engaged in similar decisions and experiences. Drawing the key insights about individuals (helpers, damagers, beneficiaries) from this substantial literature, we will build models of reciprocity and its emerging consequences.
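As a possible starting point for the agent-based models, the sketch below implements one simple form of direct reciprocity among coworkers: tit-for-tat with occasional forgiveness. This is an illustrative assumption, not a model taken from the OCB literature; it ignores teams, the rejection of help, and the quality of help, and every parameter name is hypothetical.

```python
import random

def simulate_reciprocity(n_agents=20, rounds=5000, p_first=0.5,
                         forgiveness=0.1, seed=0):
    """Toy direct-reciprocity ABM among coworkers (hypothetical model).

    Each round a random requester asks a random coworker for help. The helper
    helps if the requester helped them the last time the helper asked
    (tit-for-tat), helps with probability `p_first` if the helper has never
    asked the requester, and otherwise helps with probability `forgiveness`.
    Returns the share of requests that were met.
    """
    rng = random.Random(seed)
    # last[a][b]: did b help a the last time a asked b? (None = a never asked b)
    last = [[None] * n_agents for _ in range(n_agents)]
    helped = 0
    for _ in range(rounds):
        requester, helper = rng.sample(range(n_agents), 2)
        memory = last[helper][requester]  # how the requester treated the helper
        if memory is None:
            gives = rng.random() < p_first
        elif memory:
            gives = True
        else:
            gives = rng.random() < forgiveness
        last[requester][helper] = gives  # the requester remembers this response
        helped += gives
    return helped / rounds

for f in (0.0, 0.1, 0.5):
    print(f, simulate_reciprocity(forgiveness=f))
```

Varying `p_first` and `forgiveness` shows how small differences in individual dispositions can cascade into very different organization-wide levels of helping; richer versions would add teams, refusals of help, and heterogeneous agent types.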


4. Predicting the trust radius with machine learning: To what extent are those “Most People” out-groups?

Project leader: Wahideh Achbari

Short title: Trust radius

Research question:

To what extent can Generalized Trust be predicted by prejudice?
Can we cross-validate the findings with data that have less information on implicit prejudice and intergroup attitudes?

Method:
Supervised Machine Learning Ensembles

Data:
Project Implicit | World Values Survey | European Social Survey | LISS Panel | …

Description:
Generalized (social) trust (hereafter GT), or trust in “most people”, is a conspicuous indicator in social survey research on intergroup relations and social cohesion. The survey question is as follows: “would you say that most people can be trusted, or would you be careful in dealing with them?” Recent meta-analyses show that researchers often assume GT taps into an evaluation of the trustworthiness of unknown people or even ethnic out-groups. While this prior research has debated the negative link between ethnic diversity and GT as a proxy for social cohesion, relatively few studies, with some notable exceptions, have so far focused on systematic response bias in answers to GT. The mixed findings may well be due to the operationalization of social cohesion by GT. For example, a study using think-aloud protocols has demonstrated that the majority of respondents high in GT think “most people” refers to people they know, whereas a high proportion of those low in GT think of strangers. Following the homophily principle (the tendency of individuals to associate and bond with similar others), we can expect that people known to the respondents are ethnic in-group members and that strangers are more likely to be ethnic out-group members, although formally we do not know this. A faceless stranger one hypothetically meets for the first time could well be an ethnic in-group member in homogeneous settings, just as a person known to the respondent (friend, neighbor, family member) can be an ethnic out-group member in diverse settings. In this project, we therefore return to some of the overlooked basics in this literature: the measurement and conceptualization of GT, which is allegedly declining as a result of ethnic diversity.
The proposed project aims (a) to examine the validity of the Generalized Trust (GT) question as a measure of out-group attitudes and implicit race bias in an unprecedentedly broad manner using Supervised Machine Learning Ensembles, and (b) to cross-validate the results with three existing large-scale social surveys (WVS, ESS, LISS).

By employing Machine Learning, we propose to quantify complexity instead of relying on ex-ante model sparsity and favoring a set of variables of interest over others. Other advantages of ML over conventional statistical analyses are its flexibility in selecting variables when many potential predictors are available, its ability to model nonlinear relationships, and the fact that there are fewer limits to the number of datasets, observed cases, interactions between variables, and hence modeling strategies. Finally, the results are less tainted by researcher degrees of freedom in preferring one scale or measure over another. Our goal with prediction, however, remains in line with what social science generally attempts to do: to get good out-of-sample predictions and to avoid overfitting. Therefore, we propose to train, validate, and test the model repeatedly (holdout) on differently sized random subsamples of the data. In addition, we then cross-validate the results using social surveys that contain a limited set of measures of intergroup attitudes.
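The holdout idea can be sketched with a small bagging ensemble of decision stumps written in plain Python. This is only a stand-in for the ensemble methods the project would actually use (e.g., random forests or boosting from an ML library): the data are synthetic, the three "prejudice-like" predictors are hypothetical, and the sketch uses a simple train/test split without a separate validation set.

```python
import random

def fit_stump(X, y):
    """Single-feature threshold classifier with the lowest training error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for sign in (1, -1):  # predict 1 at/above t (sign=1) or at/below t (sign=-1)
                err = sum((1 if sign * (row[f] - t) >= 0 else 0) != label
                          for row, label in zip(X, y))
                if best is None or err < best[0]:
                    best = (err, f, t, sign)
    _, f, t, sign = best
    return lambda row: 1 if sign * (row[f] - t) >= 0 else 0

def fit_bagged_stumps(X, y, n_estimators=11, rng=None):
    """Bagging ensemble: stumps fit on bootstrap samples, combined by majority vote."""
    rng = rng or random.Random(0)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: int(2 * sum(m(row) for m in models) > len(models))

def holdout_accuracy(X, y, train_frac, seed=0):
    """Fit on a random `train_frac` subsample and score on the held-out rest."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    cut = int(train_frac * len(idx))
    train, test = idx[:cut], idx[cut:]
    model = fit_bagged_stumps([X[i] for i in train], [y[i] for i in train], rng=rng)
    return sum(model(X[i]) == y[i] for i in test) / len(test)

# Synthetic stand-in data (hypothetical): three prejudice-like scores per
# respondent; "trusts most people" (1) when the first score is low, 10% label noise.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(150)]
y = [int((row[0] < 0.5) != (data_rng.random() < 0.1)) for row in X]

for frac in (0.5, 0.7, 0.9):
    print(frac, holdout_accuracy(X, y, frac))
```

Comparing accuracies across training fractions mirrors the idea of re-estimating the model on differently sized random subsamples and checking that out-of-sample performance is stable.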


5. Using Twitter to measure social cohesion and attitudes towards immigration

Project leader: Francisco Rowe (& Eduardo Graells-Garrido remotely)

Short title: Twitter attitudes

Research question:

What shapes public views of immigration?
How do they differ across socio-demographic characteristics (gender, age, education) and geographical location?
How can we measure attitudes towards immigration and social cohesion using social media data? How do local attitudes towards immigration relate to local levels of social cohesion?

Method:
We will leverage Twitter data to capture social cohesion and attitudes towards immigration using machine learning, spatial modeling, text mining, and information visualization techniques.

Data:
We will run a workshop on crawling and processing data from Twitter, and we will make available four years of political discussion in a Latin-American country (2015-2019, Chile). Also, we will make available intra-city mobility data for Santiago, the capital of Chile.

Description:
Traditional measures of social cohesion (segregation) are 'static'. They implicitly assume that people do not come into contact with people living in other neighborhoods. Yet people within cities often move for work, shopping, and leisure activities, creating opportunities for interaction with individuals from other neighborhoods. We propose two indices to measure social cohesion based on human mobility patterns: (1) an index, labeled equitable mobility, based on the degree to which visits are spread equally across other neighborhoods; and (2) an index, labeled concentrated mobility, based on the extent to which trips are concentrated in a handful of neighborhoods.
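The text does not specify how the two mobility indices are computed. One plausible operationalization, offered here purely as an assumption, is the normalised Shannon entropy of visit shares for equitable mobility and the share of visits going to the k most-visited destinations for concentrated mobility. The input is a hypothetical mapping from destination neighbourhood to trip counts for residents of one origin neighbourhood.

```python
import math

def equitable_mobility(visits):
    """Normalised Shannon entropy of the shares of visits to other
    neighbourhoods: 1.0 when visits are spread perfectly evenly,
    0.0 when they all go to a single neighbourhood."""
    total = sum(visits.values())
    if len(visits) < 2 or total == 0:
        return 0.0
    entropy = -sum((v / total) * math.log(v / total)
                   for v in visits.values() if v > 0)
    return entropy / math.log(len(visits))

def concentrated_mobility(visits, k=3):
    """Share of all visits that go to the k most-visited neighbourhoods."""
    total = sum(visits.values())
    return sum(sorted(visits.values(), reverse=True)[:k]) / total
```

For example, `concentrated_mobility({"A": 5, "B": 3, "C": 1, "D": 1}, k=2)` gives 0.8, since 8 of the 10 trips go to the two most-visited neighbourhoods. The choice of `k` is itself a modelling decision.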

Then, we will use Twitter data to derive indices characterizing users' attitudes towards immigration. First, we seek to apply topic models to identify the underlying semantic structure of tweets by quantifying the importance of representative themes. Second, we seek to characterize the overall profile of attitudes towards immigration using two commonly used metrics of sentiment analysis: tendency and polarity. Third, we characterize individual groups of tweets belonging to specific profiles according to their tendency and polarity. Fourth, we seek to understand the diffusion of information about immigration through the Twitter network. We will analyze the structure of two networks, (1) the mention network and (2) the retweet network, by estimating the assortativity coefficient of numerical attributes between pairs of linked nodes. Finally, we aim to model the relationship between our measures of social cohesion and immigration sentiment in a Bayesian multilevel modeling framework to determine whether local levels of immigration sentiment are positively related to local levels of social cohesion.
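For a numerical node attribute (for instance, a user-level sentiment or polarity score), the assortativity coefficient is the Pearson correlation between the attribute values at the two ends of each edge. The sketch below computes it for a directed mention or retweet network given as (source, target) pairs; the attribute dictionary is hypothetical, and a production analysis would more likely rely on a network library.

```python
import math

def numeric_assortativity(edges, value):
    """Pearson correlation between the attribute values of the source and
    target of every directed edge; ranges from -1 to 1."""
    xs = [value[u] for u, _ in edges]
    ys = [value[v] for _, v in edges]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("attribute has no variance across edge endpoints")
    return cov / (sx * sy)
```

Positive values mean that users tend to mention or retweet users with similar scores; negative values mean they tend to link to users with dissimilar scores. For an undirected network, each edge would be included in both directions.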

6. Political Echo Chambers

Project leaders: Hilke Brockmann & Wiebke Drews

Short title: Twitter echo chambers

Research question: When do Elite Politicians End Up in Echo Chambers?

Sub-questions:

Do social media platforms lure elite politicians into echo chambers?
Do we find a robust difference between male and female politicians?
Do we find age, political orientation and wealth differences?

Method: Natural language processing and network analysis, machine learning models

Data: Twitter account data from European members of parliament

Description: Societies around the globe suffer from growing social inequalities that undermine social cohesion. The rise of populist politics is one clear indicator; the public demand for diversity is another. This project focuses on political leaders of rich democracies, still a blind spot in the social cohesion literature, and asks (a) whether elite politicians form a class for themselves, detached from the people, trapped in a conversational, ego-boosting bubble and surrounded by a network of claqueurs; (b) whether female political leaders, who have generally benefited from diversity quotas, are different; and (c) whether there are further dividing lines between age groups, political orientations, and levels of personal wealth.
In order to answer these questions, we use Twitter data from European MPs, contrast their discourse with that of ordinary people, check the homogeneity and heterogeneity of their discourse networks, compare male and female leaders' discourses and networks, and finally test whether age, political orientation (left/center/nationalist), and personal wealth reveal further dividing lines.
The findings will shed new light on the hidden influence of social media technology on democratic political discourse, particularly among political leaders. The project will not only search for echo chambers but also specify the conditions under which they appear. The specific focus on female politicians may further provide insights into whether diversity is changing a self-referential political discourse.
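One simple indicator of echo-chamber structure in MPs' mention or retweet networks is Krackhardt and Stern's E-I index, which compares ties that cross group boundaries with ties that stay within them. The sketch assumes that edges are (source, target) account pairs and that each account carries a group label such as party or gender; both inputs are hypothetical.

```python
def ei_index(edges, group):
    """Krackhardt and Stern's E-I index for a set of ties.

    edges: iterable of (source, target) pairs, e.g. retweets between MPs.
    group: dict mapping each account to a category (party, gender, ...).
    Returns (external - internal) / (external + internal): -1 when every tie
    stays inside a group, +1 when every tie crosses group lines.
    """
    edges = list(edges)
    external = sum(group[u] != group[v] for u, v in edges)
    internal = len(edges) - external
    return (external - internal) / len(edges)
```

Computing the index separately for subgroups (for example, female versus male MPs, or by age band) is one way to compare how inward-looking their networks are.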


7. Simulating an empirically-calibrated, society-wide network and its intergroup cohesiveness

Project leaders: Miranda Lubbers & Michał Bojanowski

Short title: Society-wide networks

Research question:

How cohesive is the society-wide network of acquaintanceship relations in Spain across categorical boundaries of social class and nationality?
What structural features influence macro-level cohesion?
Do the simulated networks differ structurally from those in the literature based on behavioral traces?

Method: Statistical analysis, simulation, network analysis

Data: Survey data based on a new network instrument, collected in 2021 with a nationally representative sample of the population residing in Spain (N=1,500) and population statistics for Spain. Modeling assumptions can be further guided by theories and scientific evidence of acquaintanceship networks.

Description:

Social cohesion is fundamentally relational. Sociologists have long argued that complex societies are held together by broad, diffuse networks of interpersonal relationships. Such networks contain intimate or “strong” ties, but also hundreds of more superficial or “weak” ties that, as Peter Blau wrote, “extend beyond intimate circles (…) and establish the intergroup connections on which macrosocial integration rests” (1974: 623).
It is thus surprising that social cohesion research almost entirely ignores social relations and uses individual indicators instead (e.g., social trust, civic participation). This knowledge gap is undoubtedly due to the technical complexity of measuring society-wide networks. We can easily imagine how an extensive, invisible social network connects the members of an entire society, keeping them together as if it were “social glue”. But how can we observe this network or estimate its properties? How can we measure the fractures this network presents along categorical boundaries of, for instance, social class or nationality?
Miranda has recently developed a novel survey instrument to collect data about individual acquaintanceship networks that allow us to estimate their size, the extent to which they expose individuals to other social groups, and their structure (as well as relationship-level data, e.g., tie strength and cognition). This instrument was implemented in a survey administered to a random sample of Spain’s population in 2021 (N=1,500).
In this project, we will work with these new data and use them to simulate a scaled-down society-wide network of personal relationships within and across categorical boundaries of social class and nationality. We will construct a population representing the adult national population of Spain (on a smaller scale) and assign attributes to the nodes (nationality, social class) in the proportions and with the degree of intersectionality present in the population. We then specify a graph generating model to generate a network structure consistent with survey evidence, i.e., based on the degree distributions, the network heterogeneity, and structural parameters found in the survey. With this model, the network structure can be simulated. By running this simulation many times, we can explore the variation in macro-level structures consistent with the parameters to evaluate them in more detail: Are some structures more conducive to macro-level cohesion, e.g., based on how hubs are distributed across the network? Can we suggest any improvements to the survey instrument for future research to narrow down the variation of macro-level patterns consistent with the evidence?
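The simulation step can be illustrated with a much simpler graph-generating model than the one the project will calibrate: nodes receive group labels in given population shares, and ties are formed with a tunable preference for the same group. This is an assumption-laden toy, not the survey-calibrated model; it yields roughly Poisson-distributed degrees rather than the empirical degree distribution, and the group labels and shares are hypothetical.

```python
import random

def simulate_network(group_shares, n=1000, mean_degree=20, homophily=0.7, seed=0):
    """Toy group-structured random graph (not the project's calibrated model).

    group_shares: dict mapping a group label, e.g. a (nationality, class)
    tuple, to its population share. Each new tie starts at a random node and,
    with probability `homophily`, goes to a random member of the same group,
    otherwise to a random node of any group. Ties are added until the mean
    degree reaches `mean_degree`.
    """
    rng = random.Random(seed)
    labels = list(group_shares)
    attr = rng.choices(labels, weights=[group_shares[g] for g in labels], k=n)
    members = {g: [i for i in range(n) if attr[i] == g] for g in labels}
    edges = set()
    while len(edges) < n * mean_degree // 2:
        i = rng.randrange(n)
        pool = members[attr[i]] if rng.random() < homophily else range(n)
        j = rng.choice(pool)
        if i != j:
            edges.add((min(i, j), max(i, j)))  # undirected, no duplicates
    return attr, edges

def cross_group_share(attr, edges):
    """Share of ties that connect people in different groups."""
    return sum(attr[i] != attr[j] for i, j in edges) / len(edges)

shares = {("national", "middle"): 0.5, ("foreign", "working"): 0.5}  # hypothetical
for h in (0.0, 0.5, 0.9):
    attr, edges = simulate_network(shares, homophily=h)
    print(h, cross_group_share(attr, edges))
```

Re-running with different seeds and `homophily` values gives a first feel for how much cross-group exposure varies under the same marginal constraints; a calibrated version would replace the tie-formation rule with one matched to the survey's degree distributions and mixing parameters.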

We can compare the resulting structures with previous models of society-wide networks (e.g., theoretical models or models based on behavioral traces). The simulated network can also serve to calibrate an ABM. The question and model will be further developed with the group, guided by the participants’ interests and backgrounds.


8. Residential segregation and political polarization

Project leader: Jochem Tolsma

Short title: Segregation polarization

Research question:

How to define, identify and visualize different micro- and macro-segregation patterns?
To what extent are different patterns of residential segregation related to polarized voting behavior?
To what extent are different patterns of residential segregation related to individual voting behavior?

Method: spatial (regression) analysis, network analysis

Data: Participants are encouraged to bring their own data to the summer school. Ideally, these data should contain information on the socio-economic background (e.g., age, ethnicity, income), exact location of the residential address (e.g., a pair of coordinates), and political attitudes and behavior of all residents of a specific geographic unit (e.g., a country). Naturally, such data are hard to come by. For the Netherlands, we will make use of the following data sources:

Fine-grained spatial register data of Statistics Netherlands
Voting outcomes per ballot station (for some Dutch cities at the political candidate level)
Netherlands Lifecourse Study wave 3 (NELLS 2022)

Description: The literature on how the ethnic composition of our living environments impacts social cohesion between and within ethnic groups is vast. With larger ethnic outgroup sizes, ethnic competition over scarce economic resources would become more intense and this would subsequently lead to ethnic hostility. Increased diversity of our living environment would reduce the predictability of the behaviour and opinions of our fellow neighbourhood residents and hence would fuel feelings of anomie and ultimately cause the deterioration of generalized trust. Segregation would not only hamper inter-ethnic contact but would, at the same time, also increase the visibility of outgroups, and thereby feelings of cultural threat. But notwithstanding the long-lasting scholarly attention to this research area, up till now, it has not been possible to distil a ‘social law’ from these studies on how the ethnic composition of our surroundings impacts social cohesion.
One of the challenges the field faces is that the ethnic composition of our living environments can be described in a myriad of ways. Is it group sizes that matter, levels of diversity, segregation or a combination thereof? And at which geographic scale, which ethnic (and other) categories to use, and which measurement instrument would best fit our theoretical needs? Concurrently, there are many indicators of social cohesion. While the literature has recently moved away from purely individual-level indicators of social cohesion (e.g., undertaking voluntary work) and has recognized that social cohesion is by definition a relational concept (e.g., trust of a specific ego in a specific alter/target), studies on the consequences for macro-level indicators of cohesion have been relatively scarce.
To move the field forward, this project will take a (theoretically informed) data-driven approach. First, we will discuss how to define, identify, and visualize different patterns of segregation. We will pay special attention to issues of scale (and, by extension, to spatial measures of segregation), acknowledging that both patterns of segregation and their consequences may be scale-dependent. We will use techniques of spatial analysis and social network analysis and take full advantage of the fine-grained spatial data available from Statistics Netherlands. Second, we will investigate how these (many different) identified micro- and macro-segregation patterns are related to a true macro-level indicator of social cohesion, namely polarized voting behavior. Our data source for the latter will be election results at the polling-station level. The final step in this project will be to assess whether the ‘discovered’ patterns of segregation that are related to macro-level polarized voting behavior also predict political attitudes and voting behavior at the individual level. The aim is to use the third wave of the Netherlands Lifecourse Study (NELLS), to be collected in early summer 2022 among approximately 5,000 native Dutch and 5,000 Moroccan-Dutch and Turkish-Dutch citizens. To improve our theoretical and methodological understanding of neighbourhood and spatial effects, we will compare results from traditional multilevel models with spatial regression models.
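Scale dependence can be made concrete with the index of dissimilarity, a standard segregation measure, D = 0.5 × Σ|a_i/A - b_i/B|, where a_i and b_i are the counts of two groups in unit i and A and B their totals. The sketch computes D on fine-grained units and again after merging them into coarser units; the toy counts and the "merge consecutive units" aggregation are assumptions, whereas real analyses would aggregate by actual geography.

```python
def dissimilarity(units):
    """Index of dissimilarity D = 0.5 * sum_i |a_i / A - b_i / B|.

    units: list of (a_i, b_i) counts of two groups in each spatial unit.
    0 means both groups are spread identically; 1 means complete separation.
    """
    total_a = sum(a for a, _ in units)
    total_b = sum(b for _, b in units)
    return 0.5 * sum(abs(a / total_a - b / total_b) for a, b in units)

def aggregate(units, size):
    """Merge consecutive blocks of `size` fine units into coarser units
    (a hypothetical stand-in for aggregating by real geography)."""
    return [
        (sum(a for a, _ in units[k:k + size]), sum(b for _, b in units[k:k + size]))
        for k in range(0, len(units), size)
    ]

fine = [(10, 0), (0, 10), (10, 0), (0, 10)]
print(dissimilarity(fine), dissimilarity(aggregate(fine, 2)))
```

Here D is 1.0 at the fine scale but 0.0 after merging pairs, since each coarse unit then holds 10 members of each group: the same residential pattern looks fully segregated or fully mixed depending on the scale of measurement.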


9. Down the rabbit hole: Polarization and information segregation on YouTube

Project leader: Marijn Keijzer

Short title: YouTube rabbit holes

Research question:

Are there ideological rabbit holes on YouTube?
What is the structure of YouTube's recommendation network?
Does YouTube's recommendation algorithm support ideological segregation and opinion polarization?

Method: Social Network Analysis

Data: Crawled from YouTube

Description: The recommendation algorithm on YouTube is hotly debated, as it may lure users of the platform into informational ‘rabbit holes’ where beliefs become increasingly extreme and misinformation thrives. Whether such rabbit holes exist, however, is an open question. It is notoriously hard to quantify what an informational rabbit hole is, and difficult to separate algorithmic bias from user and creator behavior. One approach to capturing rabbit holes of political extremism is to compare the structure of video networks on different topics. If rabbit holes exist, we might expect videos on fiercely debated, deeply polarized issues to show much more pronounced clustering in the recommendations between them. In this project, we analyze the structures of video-recommendation networks around different topics and explain their differences through characteristics of the topics.
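One way to compare how clustered recommendation networks are across topics is the global clustering coefficient (transitivity) of the undirected version of each network. The sketch assumes edges are (video, recommended video) pairs obtained by crawling; the crawling itself is not shown, and transitivity is only one of several structural measures that could be compared.

```python
from itertools import combinations

def global_clustering(edges):
    """Transitivity: closed connected triples / all connected triples
    (= 3 x triangles / triples), on the undirected version of a directed
    recommendation network. Self-loops are ignored."""
    nbrs = {}
    for u, v in edges:
        if u == v:
            continue
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    closed = triples = 0
    for ns in nbrs.values():
        k = len(ns)
        triples += k * (k - 1) // 2  # pairs of neighbours centred on this node
        closed += sum(1 for a, b in combinations(ns, 2) if b in nbrs[a])
    return closed / triples if triples else 0.0
```

Computing this for recommendation networks around, say, a polarized political topic and a neutral hobby topic, and comparing the values against suitable null models, is one way to operationalize "more pronounced clustered recommendation structure".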


10. From survey data to coherence agents

Project leader: Bruce Edmonds

Short title: Coherence agents

Research question:

How might we infer cognitive models of belief coherence from survey data on beliefs?
How can such cognitive models be integrated within simulations of interacting agents?
What impact do these methods have upon the social coherence between agents in the simulations?

Method: Apply statistical and clustering algorithms to survey data to infer which beliefs/dimensions seem to be more cognitively compatible with other beliefs/dimensions. Then use the curated data and/or analyses to implement a cognitive algorithm for agents that determines the ease with which new beliefs are accepted or existing beliefs dropped. Finally, explore the consequences of this agent cognition for social cohesion within simulations.

Data: Detailed surveys of attitudes on issues, such as the Austrian National Election Study (AUTNES) or the American National Election Studies (ANES)

Description: Coherentist models of cognition (such as that of Paul Thagard) present an alternative picture to that of logical inference. These basically say that, whilst suggestions for new beliefs come from a variety of sources (internal inference, observation, social suggestion, etc.), whether they are adopted depends on their overall coherence with the existing network of beliefs held. Similarly, beliefs are more likely to be dropped if they have become incoherent with existing beliefs (cognitive dissonance). In each case, any belief change tends to increase overall belief coherence. Whilst there are abstract models that take this approach, this project aims to take a more data-driven approach, namely to infer (a) whether the data indicate different clusters/types of agents, (b) for each type, the coherence between beliefs/issues/dimensions, and (c) an algorithm that would drive belief acceptance/dropping for such an agent. The idea is then to put such agents into a simulation with a social network where: (1) some beliefs arise randomly in the population, (2) agents interact over the social network, suggesting beliefs to each other, (3) agents accept beliefs or drop existing beliefs based on the coherence algorithm for their type, and (4) agents change their links to other agents, making some new links (randomly, via friend-of-a-friend, or on some other basis) and, using some algorithm, dropping links with agents whose beliefs seem incompatible with their own (homophily). Finally, the results of such simulations will be analysed for the emergent social cohesion/polarisation/exclusion between the agents.
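The data-driven coherence idea can be sketched as follows: estimate pairwise "coherence" weights between survey items from their correlations, score a belief vector by how well its stances fit those weights, and accept a suggested stance with a probability that rises with the coherence gain. This is a hypothetical simplification (a single agent type, beliefs coded +1/-1, a logistic acceptance rule), not Thagard's model or the project's eventual algorithm.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation; 0.0 if either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def coherence_weights(responses):
    """Pairwise item correlations from survey rows (one list of item scores
    per respondent), used here as hypothetical coherence weights."""
    items = list(zip(*responses))
    k = len(items)
    return [[pearson(items[a], items[b]) if a != b else 0.0 for b in range(k)]
            for a in range(k)]

def coherence(beliefs, w):
    """Sum of w[a][b] * s_a * s_b over item pairs, with stances s = +1 or -1."""
    k = len(beliefs)
    return sum(w[a][b] * beliefs[a] * beliefs[b]
               for a in range(k) for b in range(a + 1, k))

def consider(beliefs, item, stance, w, temperature=0.5, rng=random):
    """Adopt a suggested stance on `item` with logistic probability in the
    resulting coherence gain; otherwise keep the current beliefs."""
    proposal = list(beliefs)
    proposal[item] = stance
    gain = coherence(proposal, w) - coherence(beliefs, w)
    z = max(min(-gain / temperature, 700.0), -700.0)  # clamp to avoid exp overflow
    p_accept = 1.0 / (1.0 + math.exp(z))
    return proposal if rng.random() < p_accept else list(beliefs)
```

In a full simulation, `consider` would be called whenever a network neighbour suggests a stance, with one weight matrix per inferred agent type; the link-rewiring step (homophily) would be a separate rule on top of this.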

Key References: AJS paper analysing ANES data for belief correlation https://www.journals.uchicago.edu/doi/pdfplus/10.1086/691274.
Example abstract simulation that uses belief coherence agents to look at social coherence https://link.springer.com/article/10.1007/s11135-019-00891-9
