In brief, our results show that #GPT and Co. can identify usability issues with fairly good precision (~61%). However, they cannot (yet?) replace usability testing and expert reviews.
Still, we observed that #GenAI can serve as a ✳️ valuable supplement ✳️, particularly for small teams with limited resources and expertise. Because it can also consider the source code, it can help identify issues in less common user paths.
Most of the credit for this work should go to Ali, who conducted most of the research during his Master's thesis project at the University of Hamburg. It was a pleasure to supervise and mentor him and now to co-engage in a “PhD adventure”.
Congratulations Ali! Twice: for the paper and for the award. Super proud of you.
You can download the preprint from arXiv. The source code and the research data are also available in the replication package (link in the paper).
This work is part of a bigger initiative in which my team and I are studying the opportunities and risks of using Foundation Models and GenAI in Software Engineering & Design, with a focus on human-AI teaming.
Please consider joining the talk at the conference in Ottawa, Canada, in April 2025.