Master's Theses

We invite motivated master’s students (ideally from ETH Zurich or EPFL) to join an interdisciplinary applied research project focused on CleaRIS, the Cleantech Recommendation and Intelligence System. The project leverages Natural Language Processing (NLP), Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG) to extract, map, and recommend clean technology innovations across sectors.

Thesis / Project Title

Developing a Cleantech Recommendation and Intelligence System Using Large Language Models

Start Date

ASAP (Spring/Summer 2025)

Project Context

This project is carried out in close collaboration with ETH Zurich, the Swiss Data Science Center (SDSC), HSLU, and Anacode GmbH. It also builds on existing contributions to the SwissText 2025 Cleantech shared task. We will provide you with existing codebases for speedy onboarding. The project pipeline is available at https://polybox.ethz.ch/index.php/s/f8Xeq8dz8zL8ojx

Thesis Objectives

  • Contribute to developing a RAG system for cleantech recommendations.
  • Fine-tune or adapt LLMs for NER and entity linking in the cleantech innovation space.
  • Contribute to developing tools for an agentic system based on the project's needs.
  • Evaluate and iterate on system performance through real-world use cases and feedback loops.
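To make the RAG objective concrete, here is a minimal sketch of the retrieve-then-prompt loop in plain Python. It is not the project's pipeline: TF-IDF cosine similarity stands in for the dense embeddings and vector store a real system would likely use, and the function names (`build_index`, `retrieve`, `build_prompt`) and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Return per-document TF-IDF vectors and the smoothed IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, docs, vectors, idf, k=2):
    """Rank documents by cosine similarity to the query; unseen terms get weight 0."""
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    return [docs[i] for i in ranked[:k]]


def build_prompt(query, passages):
    # Hypothetical prompt template: the LLM is asked to answer from retrieved context only.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}\nAnswer:"
```

The prompt string would then be sent to whichever LLM the project settles on; swapping TF-IDF for an embedding model changes only `build_index` and `retrieve`.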

Data

  • Anacode media dataset (2022-2024), available at https://www.kaggle.com/datasets/jannalipenkova/cleantech-media-dataset
  • Google patent dataset (2022-2024), available at https://www.kaggle.com/datasets/prakharbhandari20/cleantech-google-patent-dataset?resource=download
  • OpenAlex academic dataset (2022-2024), downloadable from the OpenAlex API by specifying the publication-date window: https://docs.openalex.org/how-to-use-the-api/api-overview
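For the OpenAlex part, a sketch of how the 2022-2024 window might be requested with only the standard library is shown below. It assumes the public `/works` endpoint with the `from_publication_date` and `to_publication_date` filters and cursor paging (`cursor=*`, then `meta.next_cursor`); the search term, page limit, and helper names are illustrative choices, not part of the project code.

```python
import json
from typing import Iterator, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

OPENALEX_WORKS = "https://api.openalex.org/works"


def build_openalex_query(search: str, start: str = "2022-01-01", end: str = "2024-12-31",
                         per_page: int = 200, cursor: str = "*",
                         mailto: Optional[str] = None) -> str:
    """Build a /works URL restricted to a publication-date window."""
    params = {
        "search": search,
        "filter": f"from_publication_date:{start},to_publication_date:{end}",
        "per-page": per_page,
        "cursor": cursor,  # "*" starts cursor paging
    }
    if mailto:
        params["mailto"] = mailto  # optional contact address for OpenAlex's polite pool
    # Keep ':' ',' '*' readable; spaces in the search term become '+'.
    return f"{OPENALEX_WORKS}?{urlencode(params, safe=':,*')}"


def iter_openalex_works(search: str, start: str, end: str,
                        max_pages: int = 5) -> Iterator[dict]:
    """Yield work records page by page, following meta.next_cursor."""
    cursor: Optional[str] = "*"
    for _ in range(max_pages):
        with urlopen(build_openalex_query(search, start, end, cursor=cursor)) as resp:
            page = json.load(resp)
        yield from page["results"]
        cursor = page["meta"].get("next_cursor")
        if not cursor:  # None once the last page has been returned
            break
```

For example, `iter_openalex_works("green hydrogen", "2022-01-01", "2024-12-31", max_pages=1)` would yield up to 200 work records, each carrying fields such as `id` and `display_name`. The `max_pages` cap is a deliberate safeguard while prototyping.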

Expected Deliverables

  • Cleantech NER ontologies and entity linkage across the three datasets.
  • RAG system design with the help of an SDSC specialist (see this post for inspiration: https://www.datascience.ch/case-studies/enhancing-parliamentary-services-with-generative-ai).
  • Project code documentation and thesis as a technical report.
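As an illustration of what "linkage across the three datasets" could mean at its simplest, the sketch below maps entity mentions from the media, patent, and OpenAlex sources to shared canonical IDs through normalized alias lookup. The alias table, the `CT:` ID scheme, and the source labels are hypothetical, and exact matching after normalization is only a baseline that a learned entity linker would replace.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(mention):
    # Unicode-normalize, lowercase, and collapse punctuation/whitespace to single spaces.
    text = unicodedata.normalize("NFKC", mention).lower()
    return re.sub(r"[^a-z0-9]+", " ", text).strip()


# Hypothetical ontology fragment: canonical ID -> known surface forms.
ALIASES = {
    "CT:green_hydrogen": ["green hydrogen", "renewable hydrogen"],
    "CT:pv": ["photovoltaics", "photovoltaic", "solar PV"],
}


def build_alias_index(aliases):
    """Map each normalized surface form to its canonical ID."""
    return {normalize(a): cid for cid, forms in aliases.items() for a in forms}


def link_mentions(mentions_by_source, alias_index):
    """Return canonical ID -> set of sources in which the entity was mentioned."""
    linked = defaultdict(set)
    for source, mentions in mentions_by_source.items():
        for m in mentions:
            cid = alias_index.get(normalize(m))
            if cid is not None:  # unmatched mentions are simply skipped here
                linked[cid].add(source)
    return dict(linked)
```

Entities that appear in more than one source are the natural starting point for cross-dataset analysis, for instance tracing a technology from publications to patents to media coverage.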

Student Profile

  • Master’s student in Data Science, Computer Science, Electrical Engineering, or related fields.
  • Proficient in Python and familiar with Transformers, Hugging Face, NER pipelines, and RAG architectures.
  • Interest in sustainability, AI applications, and semantic search.
  • Prior experience with LLMs, information retrieval, or knowledge graphs is a strong plus.

What We Offer

  • Access to prototyping codebases and datasets from SwissText 2025 shared tasks.
  • Active engineering support and mentoring from experts at ETHZ and SDSC.
  • Opportunity to work on a high-impact project contributing to Switzerland’s sustainability and innovation ecosystem.
  • Potential pathways for publication and open-source contributions.

Next Steps

Interested students should reach out with a short motivation, CV, and relevant project/course experience. We will prioritize applicants who can start early and contribute iteratively to the development of the system.

Contact Persons

Dr. Susie Xi Rao and Dr. Snežana Nektarijvic.