The Big Data of Africa and the Diaspora: The Need for New Corpora

By Ama Bemma Adwetewa-Badu

When thinking about the ways in which we can capture and engage with the “big data” of literary and cultural study, Franco Moretti’s statement that reading in and of itself will not do the job (I would add, by itself) continues to ring true. Attempting to work with a large swath of Afro-diasporic poets and their literary productions raises an interesting issue: what “unit” of analysis can be used to bring together a large corpus of texts and authors? Will the corpus be linguistically bound? Geographically determined? Based on aesthetics and form? Undoubtedly, any of these “units” would ground a method of comparison on rigorous but still (rightfully) limited terms.

The process of developing a corpus that is expansive and inclusive is a necessary task in the preservation of cultural productions as well as their inclusion in future literary criticism. Again, the question must be asked: what is the “unit” of comparison that enables the development of a corpus? What warrants inclusion in a dataset? This question, more than being rooted in the digital humanities, remains a point of fruitful contention in the field of comparative literature. David Damrosch’s engagement with the Dutch comparatist Joep Leerssen highlights this question regarding “the unit of comparison[.]” Likewise, a point made by Richard Bjornson in a 1981 publication in Research in African Literatures troubles my thinking about comparison’s relationship to the development of a corpus. As he writes, the “mere fact” that two things can be compared doesn’t warrant their comparison. Yet what strikes me as interesting about the work of comparison enabled by digital methodologies is that the development of an expansive and inclusive corpus allows for new “units” of comparison, built on the collection of a large number of literary or cultural products. Distant reading, Moretti’s term for the analysis of a large grouping of literary or cultural texts, enables the capture of varying trends in language, genre, style, topics, and more. In the process of bringing together the literatures of a continent, new ways of measuring relationality might emerge.

As Teresa Duarte Martinho noted in the article “Researching Culture through Big Data,” the methods and approaches afforded to researchers through distant reading and cultural analytics enable “analyses of extensive corpora based on computing [and] may point to clues and trends which are significant for research into culture, on condition that room is left for contextual knowledge, the ability to situate objects of study historically and sociologically, and discussion of the symbolic meanings of large amounts of artifacts and discourses.” The study of the specificities of Africa, and more broadly the Afro-diaspora, through the development of a corpus might make possible a new way of examining trends, relationships, and concepts that have emerged in the longue durée of African history.

I believe that we are at a moment where the development of corpora that researchers can turn to is of the utmost importance.

At this very moment, if I need to find a dataset on any number of literary topics, it is often a short Google search away. For Afro-diasporic literature and culture, however, such databases are few and far between. Even so, a number of scholars are hard at work creating digital humanities projects that engage Africa. I, for one, have been working on The Global Poetics Project for a number of years, working with a team to slowly clean a raw data backlog of roughly 20,000 to 30,000 poets. James Yeku, a professor of African digital humanities, has developed a project on Nollywood; Kọ́lá Túbọ̀sún has a fantastic database of Yoruba names; and Ainehi Edoro-Glines’ prolific Brittle Paper is an example of a visual database. Chao Tayiana, more so than building databases, develops archives and visualizations of information that can be included in future databases and, more importantly, presents histories that might otherwise be occluded from digital memory. Shola Adenekan’s new book, African Literature in the Digital Age, makes a critical case for thinking about African literature alongside the digital question, pushing African literature towards the digital turn emerging in literary study. These projects on Africa, the diaspora, and the world all work with the big data of Africa, creating corpora that future scholars can draw on and thus presenting new “units” of comparison for close reading and close engagement. The continued development of projects such as these will be essential to our understanding of the continent and diaspora in the digital age.
