Results
In this chapter, the analyses conducted on both corpora will be presented and discussed, in light of the historical and political context already explored in chapter 3.
Analysing the YouTube corpus
As discussed in the previous chapter, the present corpus contains relatively low-quality data. The unstructured, colloquial communicative style of YouTube videos, together with wavering sound quality and intelligibility, greatly limits the range of viable analytical tools. Nevertheless, one must never underestimate “how much data science you can do with just counts and a little basic arithmetic” (Wickham, Çetinkaya-Rundel, and Grolemund 2023). As the next few paragraphs show, there is more than enough meaning to extract from this dataset.
Salience
The first question to be answered is: how much do people talk about environmental policy in Umbria? Is it salient at all? And if so, when was it most salient? To answer these questions, counting the number of documents in which terms connected to environmental policy occur is the simplest, and probably the most effective, strategy.
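The counting strategy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword list, the four example documents and the function names are hypothetical, and the tooling actually used for the analysis may differ.

```python
import re
from collections import Counter

# Hypothetical keyword list and documents, for illustration only
# (these are not the queries or the data used in the analysis).
KEYWORDS = {"ambiente", "inquinamento", "rifiuti"}

documents = [
    {"year": 2019, "text": "Protesta per l'ambiente e contro l'inquinamento"},
    {"year": 2019, "text": "Nuovo piano per i rifiuti urbani"},
    {"year": 2020, "text": "Inaugurata la stagione teatrale"},
    {"year": 2020, "text": "Chiuso l'inceneritore: meno inquinamento"},
]


def mentions_keyword(text, keywords):
    """Return True if at least one keyword occurs as a whole token."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return not keywords.isdisjoint(tokens)


def salience_by_year(docs, keywords):
    """Map each year to (matching documents, total documents)."""
    matches, totals = Counter(), Counter()
    for doc in docs:
        totals[doc["year"]] += 1
        if mentions_keyword(doc["text"], keywords):
            matches[doc["year"]] += 1
    return {year: (matches[year], totals[year]) for year in sorted(totals)}


salience = salience_by_year(documents, KEYWORDS)
for year, (hits, total) in salience.items():
    print(year, hits, total)
```

Counting documents rather than individual keyword occurrences keeps a single long text from dominating the yearly totals.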
The number of videos retrieved per city varies drastically from one query category to another. As one would expect, numerous videos related to heavy industry can be found about the city of Terni. Their number peaks in 2015, when the protests climaxed, then drops immediately the next year, only to peak again in 2022, when the Arvedi group bought the steel factory from Thyssen-Krupp (Terni in Rete 2022). The shift from foreign to Italian ownership was a highly contentious topic.
Similarly, transportation is a very hot topic in Perugia. The videos in this category tend to be very positive short documentaries about good practices in sustainable transportation, especially around 2023. A considerable peak can be detected in 2021, the year in which the Bus Rapid Transit (BRT)¹ service was established, requiring considerable investment and urban rearrangement. Sustainable mobility has recently become more salient in Terni, as new cycling infrastructure is under study. A peak in 2017 was caused by the news of a considerable investment by the municipality in bus transportation.
Waste management tends to be generally more relevant in Terni, home of three incinerators. The discourse around them became particularly heated in 2020, when one of them was finally shut down (Tuttoggi 2020).
Finally, the salience of environment- and nature-related posts in Terni was apparently influenced by the relatively crowded Fridays for Future protests in 2019. The movement’s popularity prompted local research centres to publish Sentieri and Mal’aria (Zona et al. 2019; Legambiente 2019), two very influential studies on air quality in the area, and local politicians to take up the problem. The same cannot be said for Perugia, where the results of environment-related queries consist mainly of videos aimed at tourists.
At first glance, environmental policy seems to be more politicised in Terni than in Perugia, and its salience seems linked to the success of social movements.
Tone and polarisation
At a superficial level, sentiment values in each category seem to be fairly balanced between Perugia and Terni. Looking at individual keywords, however, gives a more accurate picture. The videos retrieved with the “Industry” query show an interesting discrepancy: those connected to more general terms, such as industria (industry) or even acciaieria (steel factory), are markedly positive. It is only when one searches for the exact name of the steel works in Terni (Acciai Speciali Terni, AST) that one gets polarised results. This is because the more general keywords tend to match political press releases, which are usually characterised by either an artificially neutral or a markedly positive tone. The only exceptions are the few videos taken during protests, which display very negative sentiment values. The transportation category in Perugia offers some evidence in the opposite direction: more general queries about buses retrieve mostly negative videos about the current state of public transport, whereas querying a specific project (BRT) retrieves enthusiastic political speeches about new public investment. Waste management in Terni is especially interesting: the numbers of videos about the incinerator with positive and with negative sentiment are almost exactly the same.
Co-occurring topics
In general, we can confirm that environmental policy is more polarised and politicised in Terni: even the sentiment computed on the basic query “environment” is extremely polarised there, whereas in Perugia the videos collected with the same query are clearly positive. The problem of air quality seems to drive negative sentiment on environment-related topics in Terni.
The issue of poor air quality in Terni is clearly connected to the noxiousness of its industrial activity. However, when videos talk about industry, they almost never mention environmental policy: someone searching for videos about industrial activity in Terni will hear next to nothing about its effects on the environment. Conversely, a significant portion of the videos discussing the environmental conditions of the same town mention industrial activity at least once. This asymmetry speaks volumes about the population’s priorities: the relationship between heavy industry and air pollution is clearly recognised, but it is simply deemed irrelevant in a discussion on industrial planning.
Analysing the UmbriaPress corpus
Due to the nature of the data source, the quality of the text contained in UmbriaPress is considerably higher than that of the YouTube corpus. Together with its much larger size, this makes more sophisticated modelling a viable option. The analysis still begins with keyword-based retrieval and counting, to investigate salience. How environmental policy is defined in the corpus was then explored by treating co-occurring tokens as nodes of a network and by training a word-embedding model. Finally, the tone of public discourse around environmental policy was analysed through dictionary-based sentiment analysis.
Salience
Consistent with what was found in the previous analyses, the salience of transportation and industry follows very different paths in Terni and Perugia. Keyword-based retrieval returns fairly stable results in Perugia, where environmental policy comes under the spotlight in 2019, following the success of Fridays for Future, only to yield its spot back to transportation, the main topic of discussion in the city. The situation in Terni is more erratic, with each topic falling in and out of fashion far more easily.
Interestingly, the co-occurrence pattern detected in the YouTube corpus is partially confirmed by this second part of the analysis. In Terni, 22% of the articles containing terms connected to environmental policy also include Industry-related terms. This relationship is not symmetrical: the articles in which Industry- and Environment-related terms co-occur represent merely 15% of those in which Industry-related tokens appear. This connection is far weaker in Perugia, where the percentages never reach 10%. Transportation and Environment are also less often connected in Perugia than in Terni. This might offer a hint about how environmental policy is discussed in the two cities: Perugia shows a more compartmentalised situation, where each topic is treated individually, whereas in Terni environmental policy is often connected to other topics.
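The asymmetry arises because the same set of co-occurring articles is divided by two different totals. A minimal sketch, assuming each article has already been tagged with the query categories whose keywords it contains; the nine toy articles and the function name are hypothetical and do not reproduce the percentages reported above.

```python
def conditional_share(articles, given, other):
    """Share of the articles tagged `given` that are also tagged `other`."""
    with_given = [topics for topics in articles if given in topics]
    if not with_given:
        return 0.0
    both = sum(1 for topics in with_given if other in topics)
    return both / len(with_given)


# Hypothetical tagged articles: each set lists the categories matched.
articles = [
    {"environment", "industry"},
    {"environment", "industry"},
    {"environment"},
    {"environment", "transport"},
    {"industry"},
    {"industry"},
    {"industry", "transport"},
    {"industry"},
    {"transport"},
]

# Two co-occurring articles out of four environment articles ...
env_to_ind = conditional_share(articles, "environment", "industry")
# ... but the same two out of six industry articles.
ind_to_env = conditional_share(articles, "industry", "environment")
print(f"{env_to_ind:.0%} vs {ind_to_env:.0%}")
```

The numerator is identical in both directions; only the denominator changes, so the more frequent category always shows the smaller share.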
From salience to meaning
The relationships between words may help us discover both the definitions and the connotations of certain words in our corpus. Representing the co-occurrences of significant terms and their immediate neighbours as relational data is a very valuable technique in this sense.
Bigram networks
The terms with the most connections, both in Terni and Perugia, are those relating to waste management. Interestingly enough, the word “waste” (rifiuti) is connected to “pollution” (inquinamento) in Perugia, but not in Terni.
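One way to obtain such a network is to count adjacent word pairs and treat each pair as an edge. The sketch below is illustrative only: the stopword list, the three example phrases and the function names are assumptions, and stopwords are removed before pairing, so that words separated only by a stopword count as neighbours.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword list and example phrases, for illustration only.
STOPWORDS = {"i", "e", "di", "la", "il", "per", "dei", "da"}

sentences = [
    "la raccolta dei rifiuti urbani",
    "rifiuti urbani e raccolta differenziata",
    "inquinamento da rifiuti",
]


def bigram_edges(texts, stopwords):
    """Count adjacent token pairs after dropping stopwords."""
    edges = Counter()
    for text in texts:
        tokens = [t for t in re.findall(r"\w+", text.lower())
                  if t not in stopwords]
        for left, right in zip(tokens, tokens[1:]):
            edges[(left, right)] += 1
    return edges


def degrees(edges):
    """Number of distinct neighbours of each word, ignoring direction."""
    neighbours = defaultdict(set)
    for left, right in edges:
        neighbours[left].add(right)
        neighbours[right].add(left)
    return {word: len(adjacent) for word, adjacent in neighbours.items()}


edges = bigram_edges(sentences, STOPWORDS)
degree = degrees(edges)
print(sorted(degree.items(), key=lambda item: -item[1]))
```

In this toy example "rifiuti" is linked to "inquinamento" simply because the two words appear next to each other once; on a real corpus a minimum edge count would normally be applied before reading such links as meaningful.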
Word embeddings
After splitting the data into two sub-corpora, one for each city, a word-embedding model was used to represent words as points in two multi-dimensional spaces, one for each sub-corpus. Performing Principal Component Analysis (PCA) on the vectors associated with relevant terms in each space lets us read them as a map, in which words that lie closer together are more closely related semantically.
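The projection step can be illustrated without any external library. The sketch below is not the model used in the analysis: the three-dimensional vectors are made up and stand in for trained embeddings, and the principal components are approximated by power iteration with deflation, where in practice a library PCA routine would normally be used.

```python
import math

# Made-up vectors standing in for trained word embeddings (assumption).
vectors = {
    "acciaieria": [0.9, 0.1, 0.0],
    "emissioni": [0.8, 0.2, 0.1],
    "inceneritore": [0.1, 0.9, 0.2],
    "riuso": [0.0, 0.8, 0.3],
}


def centre(rows):
    """Subtract the column means so that each dimension has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(row[i] * row[j] for row in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def leading_eigenpair(matrix, iterations=500):
    """Approximate the dominant eigenvector and eigenvalue by power iteration."""
    size = len(matrix)
    vec = [1.0 / (i + 1) for i in range(size)]  # fixed, non-uniform start
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(size))
                   for i in range(size)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        vec = [x / norm for x in product]
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(size))
                for i in range(size))
    return vec, value


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    centred = centre(rows)
    cov = covariance(centred)
    components = []
    for _ in range(2):
        vec, value = leading_eigenpair(cov)
        components.append(vec)
        # Deflation: remove the component just found from the matrix.
        cov = [[cov[i][j] - value * vec[i] * vec[j]
                for j in range(len(vec))] for i in range(len(vec))]
    return [[sum(row[j] * comp[j] for j in range(len(comp)))
             for comp in components] for row in centred]


words = list(vectors)
coords = dict(zip(words, pca_2d([vectors[w] for w in words])))
for word, (x, y) in coords.items():
    print(f"{word:>13} {x:+.2f} {y:+.2f}")
```

Only the relative distances between points carry meaning in such a map: the sign and orientation of each axis are arbitrary, and coordinates from the two sub-corpora come from separate spaces and cannot be compared directly.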
A first cluster could be identified in the Terni subcorpus around industry. (Heavy) industry in Terni is connected to polluting emissions and toxic waste. “Waste incinerator” (inceneritore) is instead found at the top of the representation, together with the words “ecosystem”, “reuse”, and “pollution”. This might explain the asymmetry in co-occurrences between the two queries “Environment” and “Industry”: while some areas of environmental policy (namely air pollution) are connected to industrial activity, waste management is treated separately.
The same cannot be said for Perugia, where “pollution”, “emissions”, “reuse”, and “waste” all coexist in the otherwise lonely top-right corner, while waste management is closely related to transportation, and industry-related terms seem to float without an apparent order.
These results help explain the differences in salience and politicisation of environmental policy between the two cities. While the perugini see the environment as a coherent, unified topic of discussion, the ternani clearly mark a difference between free-standing policy areas, such as civilian waste management, and those connected to the noxiousness of economic activity, such as air quality and toxic waste.
Sentiment analysis
Dictionary-based sentiment analysis makes it possible to explore the tone of each of the three main topics analysed. At first glance, no real difference can be identified between the two cities, although there is some variability between topics.
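In its simplest form, a dictionary-based approach looks up each token in a polarity lexicon and averages the values found. The sketch below assumes a hypothetical six-word lexicon with values of +1 and -1, three invented sentences and one possible scoring rule; it is not the lexicon or the rule used in the analysis.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon: +1 for positive, -1 for negative words.
LEXICON = {
    "innovativo": 1, "sostenibile": 1, "ottimo": 1,
    "abusiva": -1, "tossici": -1, "inquinamento": -1,
}

# Invented (topic, text) pairs, for illustration only.
articles = [
    ("transport", "Un servizio innovativo e sostenibile"),
    ("waste", "Discarica abusiva con rifiuti tossici"),
    ("waste", "Ottimo risultato nella raccolta, ma resta l'inquinamento"),
]


def sentiment_score(text, lexicon):
    """Mean polarity of the lexicon words found in the text (0.0 if none)."""
    tokens = re.findall(r"\w+", text.lower())
    hits = [lexicon[token] for token in tokens if token in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def mean_score_by_topic(items, lexicon):
    """Average the document scores within each topic."""
    scores = defaultdict(list)
    for topic, text in items:
        scores[topic].append(sentiment_score(text, lexicon))
    return {topic: sum(vals) / len(vals) for topic, vals in scores.items()}


by_topic = mean_score_by_topic(articles, LEXICON)
print(by_topic)
```

Texts with no lexicon matches score 0.0 under this rule, which is one way in which very short articles can end up looking neutral.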
Restricting the corpus
A first element to be taken into account is the difference in tone across newspapers: the fast-journalism outlets Terninrete and PerugiaToday tend to have a very neutral tone, due to their very short format and the consequent shallowness of their content.
By restricting the corpus to the articles published in Corriere dell’Umbria, one can easily see how sentiment values become more erratic: articles are far more negative in Terni than in Perugia in general, but especially when they deal with environmental policy. However, the articles driving the negative tail of the distribution are, for the most part, related to single episodes or to individuals’ behaviour (e.g. illegal dumps in the countryside).
Due to the extremely small size of the resulting sample, however, these last results must be taken with a grain of salt. The only takeaway from this last part of the analysis is that future research should take the type of journalism into account when compiling a corpus.
Contextual sentiment
By retrieving the tokens found immediately before or after terms connected to environmental policy, one can get a feel for the context in which these keywords appear in the overall text. By performing sentiment analysis on these contextual tokens, we can infer how environmental policy is conceived by the public in each city.
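The retrieval of contextual tokens can be sketched as a window of one token on either side of each keyword. Everything in the example is hypothetical: the sentence, the two keywords, the two-word lexicon and the function names are assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical keywords and polarity lexicon, for illustration only.
KEYWORDS = {"aria", "natura"}
LEXICON = {"irrespirabile": -1, "splendida": 1}

text = "Aria irrespirabile a Terni, ma natura splendida in Valnerina"


def contexts_by_keyword(tokens, keywords, size=1):
    """Collect the tokens within `size` positions of each keyword."""
    contexts = defaultdict(list)
    for i, token in enumerate(tokens):
        if token in keywords:
            contexts[token].extend(tokens[max(0, i - size):i])
            contexts[token].extend(tokens[i + 1:i + 1 + size])
    return dict(contexts)


tokens = re.findall(r"\w+", text.lower())
contexts = contexts_by_keyword(tokens, KEYWORDS)

# Sum the polarity of the neighbouring tokens for each keyword.
context_scores = {
    keyword: sum(LEXICON.get(token, 0) for token in neighbours)
    for keyword, neighbours in contexts.items()
}
print(contexts)
print(context_scores)
```

A window of one token is the narrowest possible choice; widening `size` captures more context at the cost of including words that are unrelated to the keyword.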
The sentiment appears generally more negative in Terni, which is hardly surprising considering that it has the regional record for the worst air quality, and overly complicated waste management conditions. Terms connected to ecology and nature, however, are markedly positive in Terni. A possible explanation could be an idealisation of nature as opposed to the noxious urban environment.
¹ The BRT was defined by the municipality as an “innovative electric transport system, based on an advanced road transport concept, with particularly high standards, characterised by low emissions and high transport capacity” (Comune di Perugia, n.d.). In plain English, electric buses.






