Exploiting Multiple Similarity Spaces for Deduplication of Encrypted Container Images

Publications
ACM Transactions on Storage

The growing popularity of encrypted container images in registries poses unique storage-management challenges, because rising image volumes make deduplication necessary. Traditional deduplication struggles with encrypted content: randomized encryption makes identical data appear distinct, hiding duplicates. State-of-the-art methods tackle this issue by decompressing images and applying message-locked encryption (MLE). However, these techniques face considerable challenges. Minor content changes can impair deduplication effectiveness, and decompressing layers increases storage requirements. Furthermore, this process degrades both the speed at which users access images and overall system throughput. We propose SimEnc, a high-performance and secure deduplication system for encrypted container images that exploits multiple similarity spaces. SimEnc is the first to integrate semantic hashing with MLE, allowing it to capture semantic relationships across layers and thereby improve deduplication efficacy. It incorporates a fast selection mechanism for similarity spaces, offering greater flexibility than previous approaches that relied on full decompression. By using Huffman decoding to explore new similarity spaces, SimEnc improves both deduplication ratios and overall performance. Our experimental results show that SimEnc reduces storage requirements by up to 261.7% compared with encrypted serverless platforms and by 54.2% compared with plaintext registries, while also delivering lower pull latency.

李博睿

李博睿 (Borui Li), Lecturer, School of Computer Science, Southeast University

吕嘉美
Distinguished Research Fellow

吕嘉美 (Jiamei Lv), Distinguished Research Fellow, School of Software, Zhejiang University

高艺
Professor

高艺 (Yi Gao), Professor and doctoral supervisor, College of Computer Science, Zhejiang University

董玮
Professor

董玮 (Wei Dong), Professor and doctoral supervisor, College of Computer Science, Zhejiang University