An interdisciplinary team of computer scientists, physicists, and political scientists from the Warsaw University of Technology and the University of Warsaw has published a paper in the Journal of Informetrics analysing how the affiliation of authors with major technology and IT corporations (commonly known as Big Tech) affects the transfer of ideas in science.
The researchers used two comprehensive databases of scientific publications, S2ORC and OpenAlex, to extract 160,000 AI-related papers for which complete information on both citations and author affiliations was available. From these data they built a network in which publications are the nodes and citations between papers are the links.
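To make the network construction concrete, here is a minimal Python sketch, assuming each paper record simply lists the IDs of the papers it cites. The field name `references`, the toy IDs and the helper functions are hypothetical; real S2ORC and OpenAlex records have their own schemas. Links point from the citing paper to the cited one, and references to papers outside the filtered set are dropped.

```python
# Hypothetical toy records: each paper lists the IDs of the papers it cites.
papers = {
    "P1": {"references": []},
    "P2": {"references": ["P1"]},
    "P3": {"references": ["P1", "P2"]},
    "P4": {"references": ["P2", "P9"]},  # P9 is outside the filtered set
}

def build_citation_network(papers):
    """Return an adjacency list mapping each paper to the papers it cites.

    Nodes are publications; a directed link citing -> cited is kept only if
    the cited paper is also in the dataset. Self-citations of the same
    record and duplicate references are ignored.
    """
    out_edges = {pid: [] for pid in papers}
    for pid, record in papers.items():
        for ref in record["references"]:
            if ref in papers and ref != pid and ref not in out_edges[pid]:
                out_edges[pid].append(ref)
    return out_edges

def citation_counts(out_edges):
    """Number of citations each paper receives within the network."""
    counts = {pid: 0 for pid in out_edges}
    for targets in out_edges.values():
        for t in targets:
            counts[t] += 1
    return counts

network = build_citation_network(papers)
print(network)                   # {'P1': [], 'P2': ['P1'], 'P3': ['P1', 'P2'], 'P4': ['P2']}
print(citation_counts(network))  # {'P1': 2, 'P2': 2, 'P3': 0, 'P4': 0}
```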
The impact of affiliation
Until now, the standard procedure has been to label an entire publication as Big Tech if at least one of its authors had such an affiliation. To obtain a more accurate picture, the authors introduced a continuous parameter ranging from 0 to 1 that quantifies the contribution of Big Tech affiliations to a publication. Their statistical analyses show that a three-way classification can be used without loss of generality: besides the purely academic and purely Big Tech categories, it includes a mixed category (a publication with at least one academic and at least one Big Tech affiliation).
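The sketch below illustrates one way such a continuous parameter could be computed and mapped onto the three categories. It assumes, purely for illustration, that the parameter is the fraction of a paper's affiliations labelled as Big Tech and that every affiliation is labelled either `"academia"` or `"bigtech"`; the paper's exact definition may differ. The hypothetical `binary_label` function reproduces the older any-Big-Tech-author rule for comparison.

```python
VALID_LABELS = {"academia", "bigtech"}  # assumed two-way labelling of affiliations

def big_tech_share(affiliations):
    """Fraction of a paper's affiliations that are Big Tech, from 0.0 to 1.0."""
    if not affiliations:
        raise ValueError("paper has no affiliations")
    unknown = set(affiliations) - VALID_LABELS
    if unknown:
        raise ValueError(f"unknown affiliation labels: {unknown}")
    return sum(a == "bigtech" for a in affiliations) / len(affiliations)

def classify(affiliations):
    """Three-way classification derived from the continuous share."""
    share = big_tech_share(affiliations)
    if share == 0.0:
        return "academic"
    if share == 1.0:
        return "big_tech"
    return "mixed"  # at least one academic and at least one Big Tech affiliation

def binary_label(affiliations):
    """Older rule: any Big Tech affiliation makes the whole paper Big Tech."""
    return "big_tech" if big_tech_share(affiliations) > 0 else "academic"

paper = ["academia", "bigtech", "bigtech", "academia"]
print(big_tech_share(paper))  # 0.5
print(classify(paper))        # mixed
print(binary_label(paper))    # big_tech
```

The mixed case is the point of the example: the binary rule folds it into Big Tech, while the continuous share keeps it as a separate category.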
Although the analysed network contains about 50 times more purely academic papers than papers in the other categories, it is the mixed-category papers that perform best, both in the number of citations they receive and in their importance within the network, as measured by PageRank, the algorithm originally developed to rank pages in Google search. Interestingly, when publications without citations are excluded, the difference between the Big Tech and mixed categories disappears.
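For readers unfamiliar with PageRank, here is a minimal power-iteration sketch on a toy citation network. The damping factor of 0.85 is the conventional default, not necessarily the value used in the study, and the network and function names are hypothetical. Rank flows from citing to cited papers, so a paper scores highly when it is cited by papers that are themselves highly ranked.

```python
def pagerank(out_edges, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on a directed citation network.

    out_edges maps each paper to the papers it cites (every cited paper must
    also be a key), so rank flows from citing to cited papers. Papers that
    cite nothing inside the network (dangling nodes) spread their rank
    uniformly over all papers.
    """
    nodes = list(out_edges)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling_mass = sum(rank[v] for v in nodes if not out_edges[v])
        # Teleportation term plus the uniformly redistributed dangling mass.
        base = (1.0 - damping) / n + damping * dangling_mass / n
        new_rank = {v: base for v in nodes}
        for v in nodes:
            targets = out_edges[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
        converged = sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank

# Hypothetical toy network: each paper maps to the papers it cites.
citations = {
    "P1": [],
    "P2": ["P1"],
    "P3": ["P1", "P2"],
    "P4": ["P2"],
}
ranks = pagerank(citations)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['P1', 'P2', 'P3', 'P4']
```

Note that P1 and P2 have the same number of citations here, yet P1 ranks higher because it is cited by P2, which is itself cited; this is the sense in which PageRank goes beyond raw citation counts.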
Memetic analysis is essential to understanding the development of science
However, as the authors demonstrate, the picture changes when, instead of simple network measures, one focuses on the propagation of specific scientific concepts, i.e. memes, which can be single words, abbreviations or entire phrases (e.g. ‘malware’, ‘cnn’, ‘facial expression recognition’). For this purpose, the authors used the so-called sticking factor (or transmission factor), understood as the probability that a meme is transmitted from one article to another. When the transmission factors of all memes are analysed in this way, the only significant difference appears between the Big Tech and mixed categories, exactly the opposite of what the network measures showed. However, if the analysis is restricted to specific memes occurring in at least two categories, Big Tech affiliation itself turns out to be the most ‘contagious’.
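A minimal sketch of how a sticking factor could be estimated, assuming one common formulation: among papers that cite at least one paper containing a given meme, the fraction that also contain the meme themselves. Whether the study used exactly this estimator is an assumption, as are the toy data and the optional `cited_pool` parameter, which restricts the ‘sources’ to cited papers from one category so that transmission from, say, Big Tech papers can be compared with transmission from academic ones. Memes are assumed to be already extracted into a set per paper.

```python
def sticking_factor(meme, out_edges, memes_of, cited_pool=None):
    """Estimate how often a meme 'sticks' when a paper cites a carrier of it.

    Among papers that cite at least one paper containing `meme` (counting
    only cited papers in `cited_pool`, if given), return the fraction that
    contain `meme` themselves. Returns None when no paper is exposed.
    """
    exposed = 0  # papers citing at least one carrier of the meme
    adopted = 0  # exposed papers that also carry the meme
    for paper, cited in out_edges.items():
        sources = [c for c in cited if cited_pool is None or c in cited_pool]
        if any(meme in memes_of[c] for c in sources):
            exposed += 1
            if meme in memes_of[paper]:
                adopted += 1
    return adopted / exposed if exposed else None

# Hypothetical toy data: citation lists and pre-extracted memes per paper.
citations = {"P1": [], "P2": ["P1"], "P3": ["P1", "P2"], "P4": ["P2"], "P5": ["P3"]}
memes_of = {
    "P1": {"cnn", "malware"},
    "P2": {"cnn"},
    "P3": {"cnn", "facial expression recognition"},
    "P4": {"malware"},
    "P5": {"facial expression recognition"},
}

print(sticking_factor("cnn", citations, memes_of))          # 0.5
print(sticking_factor("malware", citations, memes_of))      # 0.0
# Transmission of 'cnn' counted only from (hypothetically academic) paper P1:
print(sticking_factor("cnn", citations, memes_of, {"P1"}))  # 1.0
```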
The findings suggest that a simple academia/Big Tech divide is an oversimplification and that papers whose authors have mixed affiliations may play a significant role. The paper thus contributes to the ongoing, heated debate on the role and risks of partnerships with Big Tech, suggesting that, for the development of the field, maintaining collaboration between academia in the broad sense and Big Tech may be mutually beneficial.
***
The findings are the result of the project ‘DARLING - deep analysis of artificial intelligence regulation using language models, network analysis and institutional grammar’, headed by Julian Sienkiewicz from our faculty and carried out within the CyberiADa-3 competition of the Cybersecurity and Data Science Priority Research Area. The team of authors also included students from the Faculty of MIM UW (Stanisław Giziński, Paulina Kaczyńska, Emilia Wiśnios) and the Faculty of MINI PW (Hubert Ruczyński), as well as Przemysław Biecek from the Faculty of MINI PW and Bartosz Pieliński from the Faculty of Political Science and International Studies UW.
The text was prepared by: Julian Sienkiewicz, Przemysław Biecek