Defining Cultural Analytics

Lev Manovich

June 10-14, 2016.

Context: During March-June 2016, the Culture Analytics Institute at UCLA brought together 115 speakers, 40 long-term (3-month) researchers in residence, and dozens of visitors. Drawing on work that had already been going on for 10 years, the institute articulated new goals for the computational study of culture. The following text is my contribution to our discussions of Culture Analytics. I am grateful to all participants for many stimulating conversations and very interesting presentations. (In particular, Tim Tangherlini’s talks made me see the importance of cultural anthropology’s perspectives on digital cultures.) In the following, I use my original (2005) term “cultural analytics” rather than “culture analytics” to emphasize that the presented views and opinions are mine, and that I am not trying to represent a group consensus.


Cultural Analytics refers to the use of mathematical, computational, and data visualization methods for the study of cultures. While these methods can be applied to the analysis of historical cultural artifacts and records, they are particularly appropriate for contemporary global digital culture because of its massive scale. Digital cultural expressions are born as digital files (.txt, .pdf, HTML, JavaScript, .css, .jpg, .mov, .aiff, etc.). The traces of online behavior are also recorded in digital form. This allows researchers to start analyzing these expressions and traces right away using algorithms, bypassing the digitization stage required for non-digital artifacts.

Cultural Analytics on a large scale became possible only in the mid-2000s. Its development was motivated by the phenomenal growth of the web and social media, with hundreds of millions of people, institutions, and companies sharing cultural content online. Because of this scale, it is no longer possible to use “qualitative” methods to comprehensively analyze, or even explore, this content. At the same time, the continued increase in the capacities of PCs (processing speed, memory, and storage) and progress in computational methods for media analysis made possible quantitative analysis of “big cultural data” on a single computer. With billions of people creating and/or interacting with trillions of digital content items daily, the scale of human culture approaches that of natural physical or biological systems. (For example, 4.5 billion Likes and 300 million images were added on Facebook daily as of 4/2016; 4.75 billion pieces of content were added to Facebook daily as of 5/2013; 500 million photos were shared by WhatsApp users daily as of 4/2014.) This suggests that mathematical and computational techniques that have already been successfully used to study the universe, the material world, nature, and living organisms can also be used to study cultures. Examples of such methods are descriptive and inferential statistics, data mining, machine learning, simulation, network science, web science, and complex dynamic systems.
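As a minimal illustration of the descriptive-statistics end of this toolkit, the sketch below computes summary measures over a small sample of per-image “like” counts. The numbers are invented stand-ins for real platform data, and the code uses only the Python standard library.

```python
import statistics

# Hypothetical "like" counts for a small sample of shared images
# (invented values; real data would come from a platform at far larger scale).
likes = [3, 7, 7, 12, 25, 40, 40, 41, 90, 1500]

mean = statistics.mean(likes)      # pulled upward by the single viral item
median = statistics.median(likes)  # robust summary of the typical item
stdev = statistics.stdev(likes)    # spread of the sample

print(f"mean={mean:.1f} median={median} stdev={stdev:.1f}")
```

Even this toy sample shows why multiple descriptive measures matter for cultural data: the mean is dominated by one “viral” outlier, while the median describes the typical item.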

The introduction of mathematics and computation does not change the aim of the human sciences (Geisteswissenschaften) – the understanding and interpretation of human expressions (Wilhelm Dilthey). On the contrary, it helps us better see the unique and particular that is characteristic of culture – as opposed to reducing human expressions to general laws, as we do in science. Why? When we quantitatively analyze massive numbers of human cultural expressions, the unique ones become visible. We start understanding patterns of similarities and differences between all these expressions, and their variability. Rather than seeing numerous cultural expressions as stereotypical – instances of a small number of prototypes – we reveal the unique DNA of each expression, even if some of them may be 99% the same.
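The idea that quantitative analysis makes “99% the same” precise can be sketched with a simple pairwise comparison. The feature vectors below are hypothetical (standing in, say, for color or texture features extracted from images); the point is only that a similarity measure lets near-identical expressions be distinguished from more distinctive ones.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical feature vectors for three cultural artifacts:
# two are nearly identical, one is more distinctive.
a = [1.0, 0.5, 0.2]
b = [1.0, 0.5, 0.21]   # almost the same as a
c = [0.2, 0.9, 0.8]

sim_ab = cosine_similarity(a, b)
sim_ac = cosine_similarity(a, c)
print(f"a~b: {sim_ab:.4f}  a~c: {sim_ac:.4f}")
```

At scale, the same comparison applied to millions of expressions yields the patterns of similarity, difference, and variability the paragraph describes.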

However, we can’t simply subject “cultural data” to computational analysis or data visualization and expect interesting insights to come out. In order to produce useful results, as opposed to reinventing old research questions and answers, Cultural Analytics research should be grounded in the paradigms and theories developed for the study of cultures over the last 150 years. The relevant disciplines include cultural anthropology, sociology of culture, the academic humanities (history and theory of literature, art, visual culture, music, cinema, media, games, architecture, and design), cultural studies, media studies, communication, and software studies.

This does not imply that Cultural Analytics is only interested in testing, validating, or rejecting hypotheses previously developed in these disciplines. The development of new concepts and theories necessary for the quantitative analysis of culture in general, and of global digital culture in particular, is an important part of Cultural Analytics. Similarly, as opposed to only using existing mathematical, computational, and data visualization methods, Cultural Analytics stimulates the development of new methods appropriate for the analysis of its particular subjects.

Modern approaches to the study of culture can be broadly grouped into two categories. The first focuses on norms, conventions, values, communications, and behaviors. Cultural anthropology best exemplifies this paradigm. The second focuses on cultural artifacts and messages. The academic humanities, German critical theory, and media theory belong to this category. Communication and cultural studies are perhaps in between – here researchers analyze cultural messages, the ways they influence people, the ways people may appropriate them to construct their own meanings, and the activities of producers.

Cultural Analytics combines both views of culture: as behaviors, conventions, and values, and as artifacts and messages. The study of both types of data at scale equally requires computational methods and new technologies. To study online behaviors and messages, we need to collect and quantitatively analyze links between people in social networks, their messages, comments, re-shares, and other activity. The study of offline behaviors also benefits from new technologies that capture records and traces of these behaviors at scale: satellite photography, GPS, sensors, mobile apps, etc. However, the use of such technologies for cultural research also brings with it questions of privacy and access to commercial data. And the necessary anonymization of data prior to analysis makes the interpretation of data more difficult. (For example, we can understand any Instagram photo much better if we visit and explore the Instagram account and the full gallery of its creator.) This is just one of the reasons why the use of smaller-scale methods such as observation, surveys, “thick description,” and “close reading” is required to make sense of the patterns computer analysis can identify in automatically captured cultural and behavioral data.
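The kind of link analysis mentioned above can be sketched in a few lines: representing re-share events as a network and counting each user’s connections. The users and events here are invented; real data would come from a platform API or a crawl.

```python
from collections import defaultdict

# Hypothetical re-share events: (sharer, original poster).
reshares = [
    ("ana", "ben"), ("ana", "cam"), ("ben", "cam"),
    ("dee", "cam"), ("dee", "ben"),
]

# Build an undirected adjacency structure and count each user's connections.
neighbors = defaultdict(set)
for sharer, poster in reshares:
    neighbors[sharer].add(poster)
    neighbors[poster].add(sharer)

degree = {user: len(links) for user, links in neighbors.items()}
most_connected = max(degree, key=degree.get)
print(degree, most_connected)
```

Degree counting is the simplest of the network-science measures listed earlier; the same graph representation supports richer analyses (centrality, community detection) at platform scale.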

If we place Cultural Analytics within the history of modern studies of culture, it represents a possible new “third stage.” At the end of the 19th century, the cultural anthropologists Edward Tylor and Franz Boas articulated the modern concept of culture as symbols, expressive forms, and beliefs present in all human societies and groups. This led to comparative studies of the then still existing small-scale traditional societies around the world (in practice, they were often located in the colonies of Western countries). This perspective also emphasized that all cultures are distinct, and that the evolutionary perspective according to which some cultures are superior to others is incorrect. In particular, Boas used this idea to fight discrimination in America against immigrants, blacks, and Indigenous people.

In the 20th century, new disciplines and paradigms emerged to study contemporary mass cultures in Western countries, and later worldwide. The prominent examples are German critical theory (1920s-), communication (1950s-), British cultural studies (1960s-), American cultural studies (1980s-), film and TV studies (1970s-), and game studies (1990s-). Scholars in these fields study the societies in which they live, with a particular focus on media industries and mass communication. The processes of globalization after the end of the Cold War (1991) and the development of global computer networks such as the World Wide Web (1990) created a new global digital culture at the turn of the 21st century. Cultural Analytics is an approach to the study of culture that corresponds to this new stage. If people today use software tools to create digital artifacts (and sometimes write their own code), Cultural Analytics similarly uses software to analyze large samples of such artifacts. If the technology companies responsible for cultural digital platforms and interfaces (Amazon, Netflix, Spotify, YouTube, Facebook, Google, etc.) use data science to analyze, display, and recommend content, Cultural Analytics also relies on data science to study patterns in cultural artifacts and people’s engagement with these artifacts.

When 20th-century cultural disciplines studied the products of culture industries such as pop music, films, and TV programs, they were not directly dependent on these industries. Their objects of study were often singular programs and products available via radio, records, television, movie theaters, bookstores, and newsstands – the same products consumed by individuals. To collect and analyze multiple programs, academics used the same consumer technology as ordinary viewers and listeners. For instance, in the 1980s-1990s each scholar in TV studies would use a VCR to record her or his own collection of programs. When I was teaching “new media art” in the 1990s and the first part of the 2000s, I relied on my own collection of about 100 videotapes and dozens of CD-ROMs accumulated over the years. (This was my most valuable possession at the time.)

However, Cultural Analytics depends on access to large numbers of digital artifacts and data about their creators and users and their engagements with each other and with these artifacts. Most large digital platforms provide this data via APIs. APIs became common at the end of the 2000s because companies wanted to stimulate the development of third-party apps for interacting with content shared on their platforms. APIs also allow embedding content shared on one platform in another platform, and the automatic exchange of user info between platforms. Although their original purpose was automatic use in third-party apps and other social media platforms, APIs also allowed downloading large volumes of shared content to a person’s computer for study. Since these APIs were open for everybody to use, researchers in Cultural Analytics, Computational Social Science, and Digital Sociology took advantage of this. If these free APIs did not exist, these fields would either not develop at all or develop at a much slower rate. This means that if companies decide to significantly limit access to their APIs (as Instagram did on June 1, 2016), or eliminate them completely, this will have a very significant effect on these fields. Of course, other methods of obtaining large volumes of online data remain – such as scraping the content of web pages. (This method was used by the emerging Network Science in the second part of the 1990s.) In fact, when I conceived of Cultural Analytics in 2005, this is also how I imagined getting “big cultural data” from the web.
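A minimal sketch of the scraping approach mentioned above, using only Python’s standard library: a parser that collects image links from a page’s HTML. The page here is an inline sample, since a real crawler would fetch pages over the network and would need to respect robots.txt and platform terms of service.

```python
from html.parser import HTMLParser

class ImageLinkCollector(HTMLParser):
    """Collect the src attribute of every <img> tag on a page."""
    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

# Inline sample page standing in for a fetched web page.
sample_html = """
<html><body>
  <img src="photo1.jpg" alt="street scene">
  <p>Some text</p>
  <img src="photo2.jpg">
</body></html>
"""

collector = ImageLinkCollector()
collector.feed(sample_html)
print(collector.images)
```

Run at scale over many pages, the collected media links become the raw input for the kinds of quantitative analysis described earlier; an API, by contrast, returns the same information in structured form without parsing HTML.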

Cultural Analytics has parallels with two other disciplines that emerged at the same time and similarly take advantage of the availability of either online data or digitized historical artifacts. The first is the already mentioned Computational Social Science, and the second is Digital Humanities. Computational Social Science uses large-scale data from social networks to address questions formulated in sociology, economics, and other social sciences. Here the domain of study is contemporary society as well as “digital societies,” i.e., the behaviors of people on particular social media platforms. Digital Humanities works on the quantitative analysis of collections of digitized standard objects of the academic humanities – cultural artifacts created by professional authors, such as novelists. Thus, if the former exclusively studies the contemporary online behavior of millions of ordinary people, the latter studies artifacts created in the past by small numbers of professional creators that typically are already included in humanities canons.

Cultural Analytics does not have such limitations. It is equally interested in the present and in history, in cultural behaviors and in cultural artifacts, and in the activities of both casual and professional creators. Its attention to contemporary digital culture created by millions of non-professionals separates it from Digital Humanities. Another key difference is that Cultural Analytics borrows its concept of culture from anthropology rather than from the humanities tradition. Culture is seen as the many elements and practices (expressive forms, symbols, habits, etc.) common to all human societies in all historical periods, as opposed to the “best of humanity” created by elites only. (Although in recent decades the humanities have tried to move beyond this narrow view formulated by Matthew Arnold in 1869, in practice this often means only replacing some texts and authors in the canon with others, while still preserving the small size of the canon.)

The focus on the study of symbolic and expressive human forms separates Cultural Analytics from Computational Social Science. Visual art, music, dance, architecture, myth, language, decoration, symbols, and the use of rhythm emerged much earlier in human evolution than economic markets, bureaucratic organizations, prisons, hospitals, professional careers, and other phenomena studied by modern sociologists. In fact, only when human settlements reach a particular size and an individual becomes fully separated (or “alienated”) from others because of the weakening of religion and other bonds (I am talking about the end of the 19th century) does “society” become possible as a separate object of study. Therefore, if we use the “big data” of global networked digital cultures to study only the social rather than the cultural, we will deny ourselves the chance to make our existence more meaningful. We will deny ourselves an understanding of what makes us human today as opposed to yesterday – the specific forms and patterns of human culture as mediated and shaped by digital platforms, networks, and interfaces.

Ground Zero

(excerpt from a working document by James Abello, Lev Manovich, Jianbo Gao, Katy Börner, and Tina Eliassi-Rad)

The Arrowhead Problems in Culture Analytics

  1. Metrics for the study of culture(s)
  2. Identifying, defining and measuring cultural complexities
  3. Is culture automatically the result of the evolution of groups? (see Hilbert Problem #5)
  4. What are the fundamental mechanisms for cultural network formation?
  5. Find algorithms to detect the culturally meaningful topical structure of heterogeneous cultural data (see Hilbert Problem #10)
  6. Find algorithms to detect culturally meaningful phase transitions in heterogeneous cultural data (see Hilbert Problem #10)
  7. Identify invariance of offline and online culture(s) to understand their co-evolution
  8. Measure the impact of culture on health (e.g., there are different narratives/reasons why groups of people do not vaccinate their kids), social conflict, inequality, and the environment (social, env, public goods)
  9. Identify the densities and velocities of changing areas in culture(s) both online and offline
  10. Scaling algorithms to all heterogeneous cultural data
  11. Can one develop a calculus of culture?
  12. Are there axioms of culture and can one develop a mathematical treatment of these? (see Hilbert Problem #6)
  13. How do we measure the cultural impact of globalization?

 

Grand Challenges

Theoretical Challenges

    • Properties of cultural systems (what correlations, statistical models, laws exist?)
    • Phase transitions (e.g., perception of tattoos, viral spread)
    • How to measure, model, and promote cultural diversity

Empirical Challenge

    • Hypothesis testing of cultural assumptions (e.g., the acceleration of culture = the perception that time is compressing as time progresses; needs listing), validate humanistic approaches for understanding culture
    • Validate cultural paradoxes
    • A/B testing for XXX
    • Creation of an open-source software package that is accessible, modular, and adaptable, and that allows for reproducibility.

 

Engineering/CS Challenges

    • Scalability (to trillions of records/PB of data)
    • Usability
    • Reproducibility
    • Long tail?

Practical/Societal Challenges

  • Improve health: Use social media data to predict and prevent episodes of depression
  • Reduce substance abuse: Contextualize people’s experiences and behaviors in overall contexts
  • Promote global peace by analyzing media reported events and identifying bifurcation points
  • Promote stability by decreasing (education, economic, and health) inequalities
  • Reduce education inequality: Teach diverse cultures
  • Privacy
  • Ethics
  • Access