Test-based domain model generation via large language models: a comparative analysis of advanced prompt engineering techniques

SILVA, Pedro Henrique de Oliveira

Use this identifier to cite or link to this item: https://repositorio.ufpe.br/handle/123456789/67689

Title:	Test-based domain model generation via large language models: a comparative analysis of advanced prompt engineering techniques
Author(s):	SILVA, Pedro Henrique de Oliveira
Keywords:	Domain Models; LLM; Software Testing; Prompt Engineering; Semantic Validation
Issue date:	18-Aug-2025
Citation:	SILVA, Pedro Henrique de Oliveira. Test-based domain model generation via large language models: a comparative analysis of advanced prompt engineering techniques. 2025. Undergraduate thesis (Computer Engineering) - Universidade Federal de Pernambuco, Recife, 2025.
Abstract:	The automated generation of domain models from test cases is a fundamental challenge in software engineering, particularly in the context of mobile device testing. This work extends the research by Silva (2025), who proposed a framework based on Large Language Models (LLMs) to automate this process. We present three main contributions: (1) replacement of SBERT, a BERT-based model that generates sentence embeddings to measure semantic similarity, with an LLM for semantic validation, which eliminates the limitations of a fixed threshold and provides contextual analysis capabilities; (2) implementation and comparative evaluation of five advanced prompt engineering techniques: Few-Shot, Chain-of-Thought, Universal Self-Consistency, Tree of Thoughts, and Prompt Chaining; and (3) systematic analysis of the impact of the temperature parameter on the quality of the generated models. Our experiments use Gemini 2.5 Flash (instead of the Gemini 2 adopted in the previous work) but reuse the same dataset as the original work to ensure comparability, and focus on evaluating the effectiveness of different prompting strategies. Among the techniques evaluated, Chain-of-Thought demonstrated the best overall performance, with a median recall of 0.87 and low variance (σ = 0.06), while remaining computationally efficient. The temperature analysis identified 0.3 as the optimal value for structured modelling tasks, balancing determinism and flexibility. These results not only validate the effectiveness of the proposed techniques but also provide practical guidelines for applying LLMs to software engineering tasks that require structural precision and semantic understanding. In particular, we demonstrate significant improvements over the baseline work, with increases of up to 23% in the correct identification of implicit atoms and 15% in the detection of complex associations.
URI:	https://repositorio.ufpe.br/handle/123456789/67689
Appears in collections:	(TCC) - Engenharia da Computação

Files associated with this item:

File	Description	Size	Format
TCC Pedro Henrique de Oliveira Silva.pdf		2.25 MB	Adobe PDF

This file is protected by copyright

This item is licensed under a Creative Commons License