Source-free Adaptive Gaze Estimation by Uncertainty Reduction
Hyo__ni · 2025. 3. 25. 20:16 · Paper Review
(Accepted by CVPR 2023)
https://ieeexplore.ieee.org/document/10204815
Abstract
We present an unsupervised source-free domain adaptation approach for gaze estimation, which adapts a source-trained gaze estimator to unlabeled target domains without source data.
(As an approach to the domain-gap problem, this method can adapt to the target domain even without the source data.)
We propose the Uncertainty Reduction Gaze Adaptation (UnReGA) framework, which achieves adaptation by reducing both sample and model uncertainty.
Sample uncertainty is mitigated by enhancing image quality and making images gaze-estimation friendly,
whereas model uncertainty is reduced by minimizing prediction variance on the same inputs.
Results show that our model outperforms other state-of-the-art cross-domain gaze estimation methods under both protocols, with and without source data.
Introduction
Notwithstanding the previous methods, appearance-based gaze estimators face their most challenging problem: performance drops significantly when they are trained and tested on different domains.
(e.g., domains with different subjects, image quality, background environments, or illumination.)
The reason is that
gaze estimators are usually trained on data collected under controlled conditions, where the true gaze can be measured and recorded by the deployed devices. These gaze estimators are then applied in much different, uncontrolled environments. (The conditions of the training data and the environments where the model is actually used differ greatly.)
To address this, prior work has tried to reduce the domain gap by adapting the trained model to target data. These approaches, however, have two problems.
First, most gaze models are trained with face images, which might not be accessible due to privacy or bandwidth issues. (The source data may be inaccessible for privacy or bandwidth reasons.)
Second, processing source data might not be computationally practical for real-time gaze estimation on the target domain. (Processing the source data is inefficient in real time.)
The paper therefore proposes the following solution.
Therefore, we formulate gaze estimation as an unsupervised source-free domain adaptation problem, where we cannot access the source data when fitting the model to the target. (Adaptation to a new domain without using any source data!!)
We propose to adapt the source-trained gaze estimators to the target domain by reducing both the sample uncertainty and the model uncertainty on the unlabeled target data. (This is how the model is adapted to the new domain.)
Sample uncertainty:
noise inherent in the input images, such as sensor noise and motion blur (uncertainty caused by noise in the input images)
Reducing the sample uncertainty pulls the source and target data together, and accordingly reduces the estimator's model uncertainty on the target data. By doing this, we reduce the image-quality discrepancy.
- Reducing sample uncertainty shrinks the gap between the source and target data.
- For example, improving images via image enhancement reduces the image-quality difference between datasets, which reduces the mismatch with the target data.
- In other words, with better image quality the model makes more consistent predictions, so uncertainty decreases.
Model uncertainty:
inconsistency of predictions under model perturbations (uncertainty from inconsistent or fluctuating model predictions)
Model uncertainty empirically shows a positive correlation with gaze estimation error in cross-domain scenarios.
(That is, the more uncertain the model's predictions, the larger the gaze estimation error! / see Figure 1-b)
In short, reducing sample uncertainty narrows the gap between the source and target data, which in turn reduces model uncertainty and ultimately the gaze estimation error. (The paper presents empirical evidence of a positive correlation between model uncertainty and gaze estimation error.)
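To make the notion concrete, here is a minimal sketch of model uncertainty as the prediction variance of an ensemble. PyTorch is an assumption (the paper's code is not quoted here), and the tensor shapes and the (pitch, yaw) output convention are illustrative choices:

```python
import torch

def model_uncertainty(models, images):
    """Model uncertainty as the variance of ensemble predictions.

    models: a list of K gaze estimators sharing one architecture
    images: a batch of face images, shape (B, 3, H, W)
    returns: per-sample uncertainty, shape (B,)
    """
    with torch.no_grad():
        # Stack the K predictions: (K, B, 2) for (pitch, yaw) angles
        preds = torch.stack([m(images) for m in models], dim=0)
    # Variance across the K models, averaged over the two gaze angles
    return preds.var(dim=0, unbiased=False).mean(dim=-1)
```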
Proposed Approach - UnReGA framework:
we first transfer the input images into a gaze-estimation-friendly domain by introducing a face enhancer that enhances input images without changing the gaze. (Enhanced images carry more detail around the eyes, so sample uncertainty decreases → better generalization / see Figure 1-c)
Next, we update an ensemble of source gaze estimators by minimizing the variance of their predictions on the unlabeled target data. Finally, we merge the updated estimators into a single model during inference. (Updating the ensemble of gaze estimators by minimizing prediction variance → predictions on the target data become consistent! / see Figure 1-c)
We propose the variance minimization and pseudo-label supervision mechanisms in UnReGA to address the adaptation issue without source data for regression tasks, while most existing source-free adaptation methods are designed for classification tasks.
(Most existing methods approach this as a classification task, whereas UnReGA treats it as a regression task; it uses variance minimization and pseudo-label supervision to adapt to the target domain without source data.)
Related Work
Cross-domain Gaze Estimation:
With the development of deep learning, many efforts have been made in appearance-based gaze estimation to reduce prediction errors on public gaze datasets. (e.g., MPIIGaze, ETH-XGaze, and GazeCapture)
However, training data for these estimators are often collected under controlled conditions, limiting their applicability in diverse real-world scenarios.
→ Hence, recent studies have explored gaze estimation methods across domains.
We review the cross-domain gaze estimation methods under three settings:
- domain generalization (building up generalization ability during training so that the model adapts well to any target domain!)
Park et al. proposed to learn a rotation-aware latent representation of gaze
Cheng et al. proposed to extract domain-agnostic gaze features
- unsupervised domain adaptation with source data (adapting to the target domain while minimizing the training error on the source data, thereby minimizing the gap between the two domains)
e.g., adversarial learning, outlier guidance, and contrastive regression
- unsupervised domain adaptation without source data (optimizing the model's performance on the target domain without source data is hard to do in practice; existing methods can adapt to the target domain, but their performance degrades without supervision information from the source domain)
Bao et al. proposed a self-training strategy by keeping rotation consistency on augmented target images for adaptation without source data.
Source-free Domain Adaptation:
The domain adaptation problem without source data has also been explored in other computer vision tasks;
e.g., image classification, semantic segmentation, object detection...
To solve this problem,
existing works leverage the knowledge hidden in the source model via pseudo-labeling, feature alignment, self-supervised learning, or batch normalization adaptation.
→ most of these methods are designed for classification problems but might fail in regression. (They are designed for classification problems, so their performance on regression problems suffers.)
Our proposed method addresses this issue in gaze estimation, which is a regression problem.
We are inspired by work that computes uncertainty with an ensemble of models to measure the domain shift, and we reduce cross-domain gaze errors by reducing uncertainty. (Just as that work computes uncertainty via a model ensemble to measure domain shift, this paper proposes to reduce cross-domain errors by reducing uncertainty.)
Uncertainty Reduction Gaze Adaptation
3.1. Problem Definition
Let Ds = {(I, g)} be the source domain data, where 'I' and 'g' represent the i-th image and its true gaze label, respectively.
(collected under controlled conditions, with ground-truth labels!)
Let Dt = {I} denote the target domain images, captured under different conditions in real-world scenarios.
(images collected under various conditions in real-world scenarios)
The goal is to estimate the gaze of the target images when we cannot access the source and target data simultaneously. Thus, we train the source models on the source data without knowledge of the target data, and then adapt these models to the unlabeled target data in the absence of the source data.
(No information about the target data is used when training the source models → during target-domain adaptation, only the unlabeled target data are used.)
3.2. UnReGA Framework
The Uncertainty Reduction Gaze Adaptation framework comprises three stages:
(1) Source model training
- We train the face enhancer and the gaze estimator with the enhanced images as input.
- The face enhancer reduces the sample uncertainty by improving the input images' quality and makes them more suitable for gaze estimation across domains.
- We keep a set of trained gaze estimators at different iterations during the training process for the next adaptation stage.
(2) Source-free adaptation
- The set of gaze estimators is updated by the variance minimization mechanism and pseudo-label mechanism.
(3) Inference on target data
- By taking the mean parameters of the updated estimators, the set of models is merged into a single one, which is used to predict the gaze for target images.
The three stages are examined in detail below.
3.3. Training on Source Data
We first pretrain the gaze estimator and the face enhancer respectively, and then finetune the face enhancer and the gaze estimator sequentially.
Pre-train the gaze estimator:
We employ a ResNet18 as the gaze estimator 'g'.
It is pretrained on the annotated source data 'Ds' by minimizing the discrepancy between the prediction and the true gaze.
Pre-train the face enhancer:
We employ a general image super-resolution model, Real-ESRGAN, as the face enhancer 'F', removing the last up-sample module of Real-ESRGAN so that the input and output have the same resolution. (The last up-sample module is removed, and the input and output are guaranteed to have the same resolution.)
Given a high-quality image 'Ih', we degrade its quality through degradation methods. Then, we feed the degraded image 'I' into the face enhancer and obtain the enhanced result. To train the face enhancer, we force the generated image and the original 'Ih' to be consistent and indistinguishable by minimizing a reconstruction loss and adopting an adversarial mechanism.
(Degrade a real image → use the face enhancer to generate an enhanced image from the degraded one → train the face enhancer by comparing the real image with the enhanced image! / see Figure 3!)
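A minimal sketch of what this pretraining loop might look like, assuming an L1 reconstruction loss and a non-saturating GAN loss; `enhancer`, `discriminator`, and the `degrade` callable are placeholders, and the paper's actual degradation pipeline (Real-ESRGAN's) is considerably more elaborate than a single callable:

```python
import torch
import torch.nn.functional as F

def pretrain_enhancer_step(enhancer, discriminator, opt_g, opt_d,
                           high_quality, degrade):
    """One step: degrade a clean face, enhance it back, and match the
    original with reconstruction + adversarial losses."""
    degraded = degrade(high_quality)      # synthetic low-quality input
    enhanced = enhancer(degraded)         # same resolution as the input

    # Discriminator: tell real high-quality faces from enhanced ones
    opt_d.zero_grad()
    d_loss = (F.softplus(-discriminator(high_quality)).mean()
              + F.softplus(discriminator(enhanced.detach())).mean())
    d_loss.backward()
    opt_d.step()

    # Enhancer: reconstruct the original and fool the discriminator
    opt_g.zero_grad()
    rec_loss = F.l1_loss(enhanced, high_quality)
    adv_loss = F.softplus(-discriminator(enhanced)).mean()
    (rec_loss + adv_loss).backward()
    opt_g.step()
```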
Finetune the face enhancer:
To keep the gaze unchanged, we finetune the face enhancer by forcing the enhanced images to have the same gaze as the label, or as the one predicted from the original image.
(Gaze information can be altered or lost during face enhancement, so the gaze of the enhanced image must be kept identical to that of the original image or its label!)
- For an image 'Is' in the source domain, we enhance it into 'enhanced Is' with the face enhancer and then predict its gaze using the gaze estimator 'g'.
(Enhance the source data with the trained face enhancer, then predict the gaze from the enhanced image.)
- We require the predictions to be consistent with the true gaze by minimizing the gaze estimation loss 'lg' over all the source data.
(By driving the loss down, the predicted and true gaze values are made to match.)
- For high-quality images, we optimize the reconstruction loss and adversarial loss as we do in the pretraining stage. Additionally, we force the original high-quality image 'Ih' and the enhanced low-quality image to have the same prediction by minimizing the gaze consistency loss 'lgc'.
(So that the high-quality image and the enhanced low-quality image produce the same prediction.)
→ We finetune the parameters of the face enhancer by freezing the gaze estimator and minimizing the sum of 'lg', 'lgc', and the losses 'lpre' used in pretraining the face enhancer (thus, minimizing 'lg' + 'lgc' + 'lpre').
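A hedged sketch of this finetuning objective with the estimator frozen; the L1 form of 'lg' and 'lgc', the batch keys, and the `pretrain_losses` helper are assumptions for illustration, and the paper's loss weights are not reproduced here:

```python
import torch.nn.functional as F

def finetune_enhancer_loss(enhancer, estimator, batch, pretrain_losses):
    """Total finetuning loss 'lg' + 'lgc' + 'lpre' with the gaze
    estimator frozen; 'batch' holds labeled source images plus
    (degraded, original) high-quality pairs."""
    for p in estimator.parameters():
        p.requires_grad_(False)           # freeze the gaze estimator

    # lg: enhanced source images must keep the labeled gaze
    l_g = F.l1_loss(estimator(enhancer(batch["source_img"])),
                    batch["gaze_label"])

    # lgc: the enhanced degraded image and its high-quality original
    # must yield the same gaze prediction
    l_gc = F.l1_loss(estimator(enhancer(batch["degraded_hq"])),
                     estimator(batch["hq_img"]))

    # lpre: the reconstruction/adversarial terms from pretraining
    l_pre = pretrain_losses(enhancer, batch)

    return l_g + l_gc + l_pre
```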
Finetune the gaze estimator:
Freezing the parameters of the face enhancer, we update the parameters of the gaze estimator to boost its performance on the enhanced images.
After training on the source data, we obtain a gaze-preserving face enhancer and a gaze estimator that predicts gaze for the enhanced images.
We keep a set of gaze estimators from different iterations during the finetuning of the gaze estimator, which will be used in the next stage of source-free adaptation.
(Multiple gaze estimator checkpoints are saved during finetuning and applied in the source-free adaptation stage!)
3.4. Source-free Adaptation
During the adaptation stage, we only have access to the unlabeled target data and the source model, without the source data. We update the estimators' parameters in an unsupervised manner by minimizing the model uncertainty and preserving the models' gaze estimation ability via pseudo-labels.
(While adapting the source models to the target data, the models are updated via pseudo-labels, reducing uncertainty while preserving gaze estimation performance.)
Variance Minimization:
We formulate model uncertainty as the variance of the predictions by the set of models on the same input images.
(Uncertainty is defined as the variance of the values that multiple models predict for the same input → minimize the variance to reduce uncertainty.)
The K models (K = the number of models) are checkpoints saved at K different training iterations while finetuning the gaze estimator on the source data. They share the same architecture but have different parameter values.
(The parameters of these K models are adjusted so that their predictions become more consistent / the models' mean prediction is used!)
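A minimal sketch of one variance-minimization update, assuming the K checkpoints are ordinary PyTorch modules whose parameters are gathered into a single optimizer:

```python
import torch

def variance_minimization_step(models, optimizer, target_images):
    """Update all K estimators so their predictions agree on the same
    unlabeled target batch, i.e. reduce the model uncertainty."""
    # (K, B, 2): K models, a batch of B images, (pitch, yaw) per image
    preds = torch.stack([m(target_images) for m in models], dim=0)
    l_vm = preds.var(dim=0, unbiased=False).mean()  # variance across models
    optimizer.zero_grad()
    l_vm.backward()
    optimizer.step()
```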
Pseudo-Label Supervision:
We introduce pseudo-labels to supervise the gaze predictions on the target.
(The values predicted by the gaze estimators on the target data are used as labels for training → since the target domain has no labels, the predicted values serve as pseudo-labels so that training can continue.)
Since directly using the output of the gaze estimators in the variance minimization branch as the pseudo-label may accumulate errors, we generate pseudo-labels by employing the temporal average of the models to reduce the accumulated errors.
(Using the gaze estimators' predictions directly can accumulate errors when the predictions are wrong, reducing accuracy → so the temporal average of the models' predictions is used to reduce errors.)
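One common reading of "temporal average of the models" is an exponential moving average (EMA) teacher in the mean-teacher style; the sketch below assumes that reading, and the decay rate and L1 loss are arbitrary illustrative choices:

```python
import copy
import torch
import torch.nn.functional as F

def make_teacher(model):
    """A frozen temporal-average copy used to generate pseudo-labels."""
    teacher = copy.deepcopy(model)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher

@torch.no_grad()
def update_teacher(teacher, student, decay=0.99):
    """EMA update: teacher <- decay * teacher + (1 - decay) * student."""
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(decay).add_(ps, alpha=1.0 - decay)

def pseudo_label_loss(student, teacher, target_images):
    """Supervise the student with the slowly moving teacher's predictions."""
    with torch.no_grad():
        pseudo = teacher(target_images)
    return F.l1_loss(student(target_images), pseudo)
```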
3.5. Inference on Target Data
Given a new image in the target domain, we predict the gaze by sequentially passing it through the face enhancer trained from source data and the mean estimator of the K gaze estimators updated on target data.
Using the mean parameters has less computation cost than using the mean predictions and leads to better generalization than a single model.
- Enhance the image with the face enhancer trained on the source data.
- Pass it through the mean estimator of the K gaze estimators updated on the target data.
- Here, the average of the K models' parameters is used rather than the average of their predictions, which costs less computation and generalizes better.
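Merging by parameter averaging (rather than averaging K forward passes) might look like the following sketch; it assumes the K checkpoints share an identical architecture, as stated above:

```python
import copy
import torch

@torch.no_grad()
def average_parameters(models):
    """Merge K same-architecture estimators into one by averaging their
    parameters, so inference costs a single forward pass."""
    merged = copy.deepcopy(models[0])
    state = merged.state_dict()
    for key in state:
        if state[key].is_floating_point():
            state[key] = torch.stack(
                [m.state_dict()[key] for m in models], dim=0).mean(dim=0)
    merged.load_state_dict(state)
    return merged
```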
Experiments
- We use the Real-ESRGAN model as the face enhancer and ResNet18 as the backbone of the gaze estimators.
- During the training on source data, we pretrain the face enhancer on FFHQ with the same settings as in the cited work and finetune it for 20,000 iterations with a batch size of 16.
- We train the gaze estimator using the Adam optimizer with a learning rate of 10^-4 for 40 epochs.
- We choose the K=10 gaze estimators from the last 10 epochs. (Performance gradually improves during training, so the 10 models from the last 10 epochs are selected.)
- During the source-free adaptation, we use the Adam optimizer.
Baseline:
We train a ResNet18 as the gaze estimator with the source data for 40 epochs.
ModelAvg:
We average the parameters of gaze estimators from the last 10 epochs during the training of baseline.
The mean estimator is evaluated on different target domains.
→ The reduced errors indicate that averaging the parameters is effective and contributes to better generalization ability.
EnhanceFace:
This method omits the source-free adaptation component of UnReGA.
It applies the face enhancer to both the source and target data,
trains the gaze estimator with the enhanced source data, (i.e., here the model is trained on the enhanced images!)
and then employs the mean estimator of the last 10 epochs on the enhanced target images. (same procedure as above, but using the enhanced target images)
→ This indicates that reducing the sample uncertainty with a face enhancer helps reduce the domain gap and improves performance on cross-domain tasks considerably. (Enhancement alone already reduces the domain gap and improves performance!!)
UnReGA-:
This method omits the face enhancement component of UnReGA.
It significantly improves the performance of the baseline and outperforms EnhanceFace.
→ This means that the source-free adaptation mechanism is more effective than face enhancement and is crucial in the proposed UnReGA framework. (Source-free adaptation helps performance even more than enhancement does!)
UnReGA:
This method integrates all three components (mean estimator with averaged parameters, face enhancement, and source-free adaptation).
→ best performance!!
Comparison with Cross-Domain Gaze Estimation Methods
Source-free adaptation
- PureGaze: a domain generalization method using gaze feature purification
- CSA: a source-free domain adaptation method using contrastive regression
- PnP-GA (oma): an unsupervised domain adaptation method using outlier-guided model adaptation (using only the outlier loss, without source data)
- RUDA: unsupervised gaze adaptation with rotation consistency
Unsupervised domain adaptation (*adaptation methods with source data)
- GazeAdv: a gaze adaptation method using adversarial learning
- Gaze360: a gaze adaptation method using adversarial learning and the pinball loss
- PnP-GA: a collaborative adaptation method with outlier guidance
- CRGA: a gaze adaptation method using contrastive regression (using 100 target and source samples)
Ablation Study
We investigate the effectiveness of each loss item during the stages of training on source data and source-free adaptation in the UnReGA framework.
Loss Terms for Training on Source Data
We propose the gaze loss 'Lg' and the gaze consistency loss 'Lgc' to finetune the face enhancer so that the gaze is kept unchanged in the enhanced images.
The comparison results demonstrate that both losses improve the baseline on four cross-domain tasks.
Loss Terms for Source-free Adaptation
We investigate the variance minimization 'Lvm' and pseudo-label supervision 'Lwpl' mechanisms in source-free adaptation, under both the UnReGA- and UnReGA settings.
The results demonstrate that adaptation with 'Lvm' or 'Lwpl' individually achieves a performance improvement over the baseline, and adaptation with 'Lvm' + 'Lwpl' achieves the best performance. (Using both together works better!)
However, each loss function has its advantages and disadvantages.
- Solely employing 'Lvm' can significantly reduce gaze errors, but the errors may increase after a certain iteration, which is challenging to identify without access to labeled validation data. (Errors can increase after a certain point → this is hard to notice without access to labeled validation data.)
- Utilizing 'Lwpl' alone maintains stable gaze errors after convergence, but its performance is not as good as the best iteration of 'Lvm' alone. (Errors stabilize after convergence, but it does not reach the best performance of 'Lvm' used alone!)
⇒ Combining 'Lvm' and 'Lwpl' during adaptation leverages the strengths of both loss functions, resulting in satisfactory performance and stable results during optimization. (So the two must be used together!! / see Figure 4!!)
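Putting the two objectives together, one combined adaptation step might look like the sketch below; the equal weighting of 'Lvm' and 'Lwpl' and the EMA-style teacher update are assumptions, not the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

def adaptation_step(models, teacher, optimizer, target_images, decay=0.99):
    """One combined step: variance minimization ('Lvm') plus
    pseudo-label supervision ('Lwpl') from a temporal-average teacher."""
    preds = torch.stack([m(target_images) for m in models], dim=0)  # (K, B, 2)
    l_vm = preds.var(dim=0, unbiased=False).mean()   # agreement across the K models
    with torch.no_grad():
        pseudo = teacher(target_images)              # temporal-average pseudo-labels
    l_wpl = F.l1_loss(preds.mean(dim=0), pseudo)
    optimizer.zero_grad()
    (l_vm + l_wpl).backward()
    optimizer.step()
    # Slowly move the teacher toward the mean of the K students (EMA)
    with torch.no_grad():
        for pt, *ps in zip(teacher.parameters(), *[m.parameters() for m in models]):
            pt.mul_(decay).add_(torch.stack(ps).mean(dim=0), alpha=1 - decay)
```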
Discussion about Uncertainty Reduction
To discuss how image quality influences model uncertainty and gaze errors, we visualize some high-quality and low-quality examples and their enhanced pairs.
Compared with high-quality images, low-quality samples tend to have higher model uncertainty and higher gaze errors. After face enhancement on samples, the model uncertainty of both high-quality and low-quality images decreases and so do the gaze errors.
Moreover, enhancing low-quality images brings more performance gain than enhancing high-quality images.
To understand the correlation between reducing model uncertainty and reducing gaze errors, we illustrate the model uncertainty and gaze errors of applying EnhanceFace and UnReGA on different samples grouped by model uncertainty. (see Figure 6!)
The results indicate that both EnhanceFace and UnReGA consistently reduce model uncertainty and gaze errors over the baseline for groups with different uncertainty. Moreover, the higher the uncertainty of the samples, the more the uncertainty and errors are reduced by EnhanceFace and UnReGA.
(EnhanceFace and UnReGA show the correlation between model uncertainty and error, and errors drop the most on samples with high model uncertainty!)
Conclusion
- UnReGA improves gaze estimation performance on the target data by reducing uncertainty on the target.
- Our source-free adaptation method shows significant performance improvements over the baseline and also outperforms the SOTA gaze adaptation methods that use source data during adaptation. (It even outperforms the latest methods that do use the source data!)
In the future,
- The connection between face enhancement and the minimization of sample uncertainty could be discussed by formulating sample uncertainty mathematically. (Mathematically formalizing the relationship between the two.)
- The proposed uncertainty reduction method can be explored on other cross-domain regression problems.