TY - JOUR
T1 - ISP Distillation
AU - Schwartz, Eli
AU - Bronstein, Alex M.
AU - Giryes, Raja
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2023
Y1 - 2023
N2 - Nowadays, many of the images captured are 'observed' by machines only and not by humans, e.g., in autonomous systems. High-level machine vision models, such as object recognition or semantic segmentation, assume images are transformed into some canonical image space by the camera Image Signal Processor (ISP). However, the camera ISP is optimized for producing visually pleasing images for human observers and not for machines. Therefore, one may spare the ISP compute time and apply vision models directly to RAW images. Yet, it has been shown that training such models directly on RAW images results in a performance drop. To mitigate this drop, we use a RAW and RGB image pairs dataset, which can be easily acquired with no human labeling. We then train a model that is applied directly to the RAW data by using knowledge distillation such that the model predictions for RAW images will be aligned with the predictions of an off-the-shelf pre-trained model for processed RGB images. Our experiments show that our performance on RAW images for object classification and semantic segmentation is significantly better than models trained on labeled RAW images. It also reasonably matches the predictions of a pre-trained model on processed RGB images, while saving the ISP compute overhead.
KW - ISP
KW - RAW image
KW - computational photography
KW - knowledge distillation
KW - object recognition
UR - http://www.scopus.com/inward/record.url?scp=85147303604&partnerID=8YFLogxK
DO - 10.1109/OJSP.2023.3239819
M3 - Article
AN - SCOPUS:85147303604
SN - 2644-1322
VL - 4
SP - 12
EP - 20
JO - IEEE Open Journal of Signal Processing
JF - IEEE Open Journal of Signal Processing
ER -