The Sustainable Future: How NLP is Driving Change

By Vinura Dhananjaya, Machine Learning Engineer at IronOne Technologies LLC, and fellow at Pi School of AI Session 11.

Did you know that climate change and global warming are among the most pressing problems the world faces today? They cause severe damage to the environment and significant financial losses to the world economy. Climate-related events have caused losses of 145 billion euros in the European Union (EU) alone in the past decade.

The urgency of mitigating climate change impact has led the EU to introduce the “Green Deal,” a comprehensive plan to transform the EU into a sustainable and carbon-neutral economy. One of the critical components of the Green Deal is the “EU Sustainable Taxonomy,” a classification system that provides a clear definition of sustainable economic activities in the EU.

But what exactly is the EU Sustainable Taxonomy, and why is it so crucial in the fight against climate change? Simply put, it is a system that helps investors and companies identify economic activities that contribute to environmental objectives. By setting clear and transparent criteria for sustainable investments, the EU aims to accelerate the transition to a low-carbon economy and prevent greenwashing, the practice of misleading consumers and investors about the environmental benefits of a product or service.

The EU Sustainable Taxonomy covers various economic activities, from renewable energy and sustainable transportation to circular economy and water management. A unified framework for sustainable investment creates a level playing field for companies and ensures that sustainable activities receive the financial support they need to thrive.

Overall, the EU Sustainable Taxonomy is a critical tool in the fight against climate change. Its adoption by investors and companies is essential for achieving the EU’s ambitious climate targets. Aligning economic activities with environmental objectives helps create a more sustainable and resilient world for us all.

The taxonomy introduces six categories:

Climate change mitigation
Climate change adaptation
Sustainable use and protection of water and marine resources
Transition to a circular economy
Pollution prevention and control
Conservation and restoration of biodiversity and ecosystems.

Economic activities and business entities in the EU must be assessed against these six categories. This helps ensure that investments go to truly sustainable businesses, helps prevent “greenwashing”, and helps businesses become climate-friendly.

Usually, these assessments are done manually with the help of domain experts. The process requires analyzing lengthy textual reports, which is inefficient and costly. Pi School fellows and Briink, a company focused on building intelligent systems and services that support a rapid transition to a sustainable economy, took up the challenge of solving this issue.

Vinura Dhananjaya and Srishti Gureja from Pi School of AI Session 11, coached by Cristiano De Nobili, leveraged state-of-the-art Transformer-based NLP models to solve this task. The main challenges were that the problem domain was new and the available data was scarce. Hence, the fellows relied on pre-trained, Transformer-based models adapted to sustainability-related and scientific text, such as:

Climate-BERT
ESG-BERT
SciBERT

Additionally, the fellows tested simpler models, such as XGBoost and random forest classifiers on TF-IDF vectors, but the Transformer-based models proved the most effective. Furthermore, the fellows reformulated the problem beyond standard multi-class (6-class) classification so that the system can also filter out generic text that belongs to none of the six categories, which makes the task more fine-grained. The team experimented with this in two ways: a sentence-pair binary classification task, and a 7-class classification task in which the 7th class is a “generic” one. They also experimented with sentence-transformer versions of the models, as these provide better sentence-level representations that suit the nature of the data. For the 7-class classification task, the team achieved an F1 score of 73%.
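To make the two reformulations more concrete, here is a minimal sketch of how labeled sentences could be converted into each format. It is an illustration under assumptions, not the team's actual pipeline: the post does not say how the sentence pairs were built, so pairing each text with the category name is a guess, and the names Example, to_seven_class, and to_sentence_pairs, as well as the sample sentences, are made up for this example.

```python
from typing import NamedTuple

# The six taxonomy categories listed in the post.
CATEGORIES = [
    "Climate change mitigation",
    "Climate change adaptation",
    "Sustainable use and protection of water and marine resources",
    "Transition to a circular economy",
    "Pollution prevention and control",
    "Conservation and restoration of biodiversity and ecosystems",
]
GENERIC = "generic"  # assumed name for the extra 7th class


class Example(NamedTuple):
    """Hypothetical container for one labeled sentence."""
    text: str
    label: str  # one of CATEGORIES, or GENERIC


def to_seven_class(examples):
    """Map each example to (text, class_index); index 6 is the generic class."""
    label_to_id = {name: i for i, name in enumerate(CATEGORIES + [GENERIC])}
    return [(ex.text, label_to_id[ex.label]) for ex in examples]


def to_sentence_pairs(examples):
    """Pair every text with every category name; the target is 1 only on a match.

    A generic text produces six negative pairs, so it can be filtered out
    when no category is predicted as a match.
    """
    pairs = []
    for ex in examples:
        for name in CATEGORIES:
            pairs.append((ex.text, name, int(ex.label == name)))
    return pairs


# Made-up sentences, for illustration only.
examples = [
    Example("The plant cut CO2 emissions by switching to wind power.", CATEGORIES[0]),
    Example("The annual meeting was held in Rome.", GENERIC),
]

seven_class = to_seven_class(examples)
pairs = to_sentence_pairs(examples)
```

In this sketch the 7-class format keeps one row per sentence, while the sentence-pair format expands each sentence into six binary decisions, one per category.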

The fellows also used unsupervised approaches such as topic modelling to explore and understand the underlying patterns of sustainability-related text. One persisting challenge for all these methods is data imbalance, and the models still need to be evaluated on more unseen data.
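The effect of data imbalance on evaluation can be illustrated with a small, self-contained sketch. It computes per-class F1 and its unweighted (macro) average on toy labels. The post does not state which F1 averaging the team used, so the macro average here is an assumption, and the labels and helper names are invented for the example.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[t] += 1  # true label t was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy imbalanced data: 8 "mitigation" sentences and 2 "water" sentences.
y_true = ["mitigation"] * 8 + ["water"] * 2
# A degenerate model that always predicts the majority class.
y_pred = ["mitigation"] * 10

scores = per_class_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro = macro_f1(y_true, y_pred)
```

On this toy data the accuracy is 0.8, yet the minority class gets an F1 of 0, which pulls the macro average below 0.5. This is why per-class scores and more unseen data matter when classes are imbalanced.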

As a preliminary step, the fellows have released pipelines of the fine-tuned models on GitHub that can be used right away for inference on new text and for further fine-tuning on new corpora. In the future, the models could be further fine-tuned, made more robust to generic text, and improved with other learning methods, such as semi-supervised learning. These publicly available tools could help speed up progress toward sustainable finance goals.