Multi-Label Deep Visual-Semantic Embedding with Visual Transformer
dc.contributor | 葉梅珍 | zh_TW |
dc.contributor | Yeh, Mei-Chen | en_US |
dc.contributor.author | 來毓庭 | zh_TW |
dc.contributor.author | Lai, Yu-Ting | en_US |
dc.date.accessioned | 2022-06-08T02:43:30Z | |
dc.date.available | 9999-12-31 | |
dc.date.available | 2022-06-08T02:43:30Z | |
dc.date.issued | 2021 | |
dc.description.abstract | Multi-label image classification is a challenging task whose goal is to locate objects of different sizes and recognize their correct labels simultaneously. However, the common practice of extracting features from the whole image may dilute the information of small objects or turn it into noise, making recognition difficult. Prior studies have shown that attention mechanisms and label relations can respectively improve feature extraction and capture co-occurrence, yielding more robust information that helps the multi-label classification task. In this work, we adopt a Transformer architecture to attend visual region features to the global feature while also considering the co-occurrence relations among labels. The attention-weighted features are then used to generate a dynamic semantic classifier that produces the predicted labels by classifying in the semantic space. Experiments show that our model achieves strong results. | zh_TW |
dc.description.abstract | Multi-label classification is a challenging task since we must identify many kinds of objects at different scales. Using only the global features of an image may discard small-object information. Many studies have shown that an attention mechanism improves feature extraction and that label relations reveal label co-occurrence, both of which benefit a multi-label classification task. In this work, we extract attended features from an image with a Transformer while simultaneously considering label co-occurrence. Then, we use the attended features to generate a classifier applied in the semantic space to predict the labels. Experiments validate the proposed method. | en_US |
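The abstract describes a Transformer that relates region features, the global image feature, and label embeddings, then turns the attended label embeddings into a dynamic classifier in the semantic space. Below is a minimal sketch of how such a design could be wired up, assuming PyTorch; the class name, dimensions, and token layout are illustrative assumptions, not the thesis's actual implementation.

import torch
import torch.nn as nn

class VisualSemanticTransformer(nn.Module):
    """Hypothetical sketch: self-attention over region, global, and label tokens."""
    def __init__(self, feat_dim=2048, embed_dim=512, num_labels=80):
        super().__init__()
        self.region_proj = nn.Linear(feat_dim, embed_dim)       # project CNN region features
        self.global_proj = nn.Linear(feat_dim, embed_dim)       # project the global image feature
        self.label_embed = nn.Embedding(num_labels, embed_dim)  # label embeddings (semantic space)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, region_feats, global_feat):
        # region_feats: (B, R, feat_dim); global_feat: (B, feat_dim)
        B = region_feats.size(0)
        regions = self.region_proj(region_feats)                 # (B, R, D)
        g = self.global_proj(global_feat).unsqueeze(1)           # (B, 1, D)
        labels = self.label_embed.weight.unsqueeze(0).expand(B, -1, -1)  # (B, L, D)
        # Self-attention over all tokens lets region features attend to the
        # global feature, and lets label tokens attend both to visual evidence
        # and to each other (a soft model of label co-occurrence).
        out = self.encoder(torch.cat([g, regions, labels], dim=1))
        attended_labels = out[:, -self.label_embed.num_embeddings:, :]  # dynamic classifier, (B, L, D)
        img_repr = out[:, 0, :]                                  # attended global feature, (B, D)
        # Score each label by the similarity between the image representation
        # and its dynamically generated classifier in the semantic space.
        return torch.einsum('bd,bld->bl', img_repr, attended_labels)

# Usage sketch: 36 region features per image is a common choice for
# detector-based features; the numbers here are placeholders.
model = VisualSemanticTransformer()
logits = model(torch.randn(4, 36, 2048), torch.randn(4, 2048))
probs = torch.sigmoid(logits)  # independent per-label probabilities

For multi-label training, per-label sigmoid outputs with a binary cross-entropy loss would be the usual choice, since labels are not mutually exclusive.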
dc.description.sponsorship | Department of Computer Science and Information Engineering | zh_TW |
dc.identifier | 60847029S-40527 | |
dc.identifier.uri | https://etds.lib.ntnu.edu.tw/thesis/detail/7bb0f9321ecc32df3057b4e5f01722d4/ | |
dc.identifier.uri | http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/117318 | |
dc.language | Chinese | |
dc.subject | multi-label classification | zh_TW |
dc.subject | visual-semantic embedding model | zh_TW |
dc.subject | attention mechanism | zh_TW |
dc.subject | multi-label classification | en_US |
dc.subject | visual-semantic embedding | en_US |
dc.subject | Transformer | en_US |
dc.title | Multi-Label Deep Visual-Semantic Embedding with Visual Transformer | zh_TW |
dc.title | Multi-Label Deep Visual-Semantic Embedding with Visual Transformer | en_US |
dc.type | Academic thesis |