Providing Projective and Affine Invariance for Recognition by Multi-Angle-Scale Vision Transformer

CHARAMBA, Luiz Gustavo da Rocha

Please use this identifier to cite or link to this item: https://repositorio.ufpe.br/handle/123456789/67307


Title :	Providing Projective and Affine Invariance for Recognition by Multi-Angle-Scale Vision Transformer
Author :	CHARAMBA, Luiz Gustavo da Rocha
Keywords :	Affine Invariance; Projective Invariance; Geometric Deep Learning; Vision Transformer; Computer Vision
Publication date :	28-Aug-2025
Publisher :	Universidade Federal de Pernambuco
Citation :	CHARAMBA, Luiz Gustavo da Rocha. Providing Projective and Affine Invariance for Recognition by Multi-Angle-Scale Vision Transformer. 2025. Tese (Doutorado em Ciência da Computação) - Universidade Federal de Pernambuco, Recife, 2025.
Abstract :	The recognition of deformed planar shapes finds applications in many unrelated areas, such as marketing, OCR, and autonomous vehicles. Considerable effort has been devoted to this problem in the literature through direct geometric approaches, although with limited results or performance. More recently, many machine learning approaches have been proposed, with satisfactory results only when the deformation is, at best, a weak affine one. This thesis introduces the Multi-Angle-Scale Vision Transformer (MASViT), a deep-learning-based solution that outperforms state-of-the-art methods in the recognition of affinely and projectively deformed images. A crucial point in our setting is the absence of deformed images during the training phase. Our approach employs 1D convolutional filters corresponding to straight lines crossing the shape in the polar domain, preserving collinearity, a basic projective invariant. Angular sequences derived from the polar domain integrate well with the Vision Transformer (ViT) architecture, as these patch embeddings are geometrically coherent, enhancing suitability for the transformer encoder. We also introduce several regularization techniques to boost the generalizability of the model. To validate the approach, we curated new test datasets derived from the German Traffic Sign Recognition Benchmark (GTSRB). Through extensive experiments, we demonstrate that this approach surpasses state-of-the-art models, particularly when dealing with images subjected to severe affine and projective deformations.
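The abstract relies on collinearity being a projective invariant: points on a straight line remain on a straight line after any homography. The short sketch below only illustrates that geometric fact and is not taken from the thesis. The matrix `H`, the sample points, and the helper names `apply_homography` and `collinearity_residual` are all hypothetical choices made for this illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def apply_homography(h: Matrix, p: Point) -> Point:
    """Map a 2D point through a 3x3 homography using homogeneous coordinates."""
    x, y = p
    xh = h[0][0] * x + h[0][1] * y + h[0][2]
    yh = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (xh / w, yh / w)


def collinearity_residual(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle abc; zero when the points are collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


# Hypothetical projective transformation (the non-zero last row makes it non-affine).
H = [
    [1.2, 0.3, 5.0],
    [-0.4, 0.9, 2.0],
    [0.001, 0.002, 1.0],
]

# Three points on the line y = x / 2.
line_points = [(0.0, 0.0), (10.0, 5.0), (20.0, 10.0)]
warped = [apply_homography(H, p) for p in line_points]

residual_before = collinearity_residual(*line_points)
residual_after = collinearity_residual(*warped)
```

Both residuals are zero up to floating-point rounding, whereas the spacing between the warped points is no longer uniform. This is consistent with the abstract's choice of building features along straight lines rather than relying on distances or angles, which projective deformations do not preserve.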
URI :	https://repositorio.ufpe.br/handle/123456789/67307
Appears in collections:	Teses de Doutorado - Ciência da Computação

Files in this item:

File	Description	Size	Format
TESE Luiz Gustavo da Rocha Charamba.pdf		10.46 MB	Adobe PDF

This item is protected by original copyright



This item is licensed under a Creative Commons License