Find the bug – what AI can really do in software engineering

Can language models such as GPT really understand what a ‘bug’ is? This is one of the questions being investigated by a research team at the University of Passau in a project funded by the DFG.

Language models such as GPT and BERT have developed impressive capabilities in recent years. They translate, summarise, write code and even compose poetry. But do these models actually understand what they are saying, or are they simply repeating patterns from huge amounts of text? Large AI models tend to struggle in specialist areas such as software engineering, where many terms have multiple meanings.

‘In our research, we were able to show that the systems have problems with ambiguous terms. “Bug” and “root”, for example, have completely different meanings in computer science than in botany,’ explains Professor Steffen Herbold, Chair of AI Engineering at the University of Passau. In the DFG project ‘SENLP – Knowledge about Software Engineering in NLP Models’, researchers led by Professor Herbold are taking a closer look at how reliably language models handle specialist knowledge from software development and how their limitations can be better understood.

How AI models deal with nonsense

To this end, the researchers are putting the knowledge of large language models to the test – literally. ‘We treat the AI models like exam candidates and test whether they can answer technical questions,’ explains Professor Herbold. For example, can they recognise a correct definition in a multiple-choice test? Can they correctly distinguish between similar concepts and explain the differences?
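The exam metaphor can be sketched in code. The snippet below is a minimal, hypothetical illustration of multiple-choice probing, not the project's actual test harness: `toy_scorer` is a stand-in for a real model's answer-scoring function (for instance, the log-likelihood a language model assigns to each option), and the sample question is invented for demonstration.

```python
# Hypothetical sketch: treating a language model like an exam candidate
# on multiple-choice questions. `score_option` stands in for a real
# model call; `toy_scorer` is a self-contained stub.
from typing import Callable

MCQuestion = dict  # {"question": str, "options": list[str], "answer": int}

def pick_answer(q: MCQuestion, score_option: Callable[[str, str], float]) -> int:
    """Return the index of the option the model scores highest."""
    scores = [score_option(q["question"], opt) for opt in q["options"]]
    return max(range(len(scores)), key=scores.__getitem__)

def accuracy(questions: list, score_option) -> float:
    """Fraction of questions where the top-scored option is the correct one."""
    correct = sum(pick_answer(q, score_option) == q["answer"] for q in questions)
    return correct / len(questions)

def toy_scorer(question: str, option: str) -> float:
    # Stub: score an option by overlap with software-engineering keywords.
    keywords = {"defect", "fault", "software"}
    return float(sum(word in option.lower() for word in keywords))

exam = [
    {
        "question": "In software engineering, what is a 'bug'?",
        "options": [
            "An insect of the order Hemiptera",
            "A defect or fault in a software system",
            "A plant root",
        ],
        "answer": 1,
    },
]

print(accuracy(exam, toy_scorer))  # 1.0 on this toy exam
```

With a real model plugged in as `score_option`, the same loop yields an accuracy figure per model and per question category, which is the kind of comparable measurement an exam-style evaluation is after.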

Another topic is dealing with nonsense, which is often referred to as model hallucinations. ‘It is well known that large language models generate nonsense. We are looking at how pronounced the problem is in software development.’ To this end, the researchers are not only investigating the extent to which the models themselves generate nonsense. They are also testing how the systems react when they receive a nonsensical input. ‘We want to know whether the models can recognise nonsense.’
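The second test described above can also be sketched: corrupt a sensible question into a nonsensical one and check whether the model pushes back on the premise. Everything below is a hypothetical illustration; `model_reply` is a stub in place of a real LLM call, and the keyword check is a drastic simplification of how a reply would actually be judged.

```python
# Hypothetical sketch: does a model recognise a nonsensical input?
def make_nonsense(question: str) -> str:
    """Corrupt a sensible question by swapping in an out-of-domain term."""
    return question.replace("bug", "chlorophyll")

def flags_nonsense(reply: str) -> bool:
    """Very rough check: does the reply push back on the premise?"""
    markers = ("does not make sense", "nonsensical", "not a meaningful")
    return any(m in reply.lower() for m in markers)

def model_reply(prompt: str) -> str:
    # Stub in place of a real model call.
    if "chlorophyll" in prompt:
        return "This question does not make sense in software engineering."
    return "A bug is a defect in a software system."

sensible = "How do you fix a bug in a program?"
nonsense = make_nonsense(sensible)

print(flags_nonsense(model_reply(sensible)))  # False
print(flags_nonsense(model_reply(nonsense)))  # True
```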

Language models in comparison

The researchers systematically evaluate the responses and compare them across model architectures: How do smaller, specialised models with a so-called encoder-only architecture, such as BERT, compare with large decoder-only models such as GPT?
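The architectural difference shapes how the same knowledge probe has to be posed. As a hedged illustration (the probe texts are invented, not taken from the project): an encoder-only model such as BERT is typically queried with a cloze-style masked slot, while a decoder-only model such as GPT is given a prompt to continue.

```python
# Hypothetical sketch: the same knowledge probe framed for the two
# architecture families the text compares.
def cloze_probe(term: str, definition_stub: str) -> str:
    """Masked-token probe, as used with encoder-only models like BERT."""
    return f"In software engineering, a {term} is {definition_stub} [MASK]."

def completion_probe(term: str) -> str:
    """Next-token probe, as used with decoder-only models like GPT."""
    return f"In software engineering, a {term} is"

print(cloze_probe("bug", "a defect in a"))
# In software engineering, a bug is a defect in a [MASK].
print(completion_probe("bug"))
# In software engineering, a bug is
```

Keeping the probe content identical while only the framing changes is what makes scores from the two model families comparable in the first place.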

The Passau team wants to find out whether models trained on general text corpora can still internalise specialist knowledge from software engineering, or whether domain-specific pre-training is absolutely necessary. Based on these findings, the researchers are developing methodological foundations for testing and improving specialist knowledge in large AI models in a more targeted manner in the future.

The Deutsche Forschungsgemeinschaft (German Research Foundation, DFG) is funding the project for a period of three years.

This text was machine-translated from German.

Principal Investigator(s) at the University: Prof. Dr. Steffen Herbold (Chair of AI Engineering)
Project period: 01.04.2024 - 31.03.2027
Source of funding: DFG - Deutsche Forschungsgemeinschaft > DFG - Sachbeihilfe (research grant)
Project number: 524228075
Funding acknowledgement:

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation). 