Efficient Compression of Long Arbitrary Sequences With No Reference at the Encoder

DOI: 10.1109/TIT.2020.3023945; zbMATH Open: 1465.94043; arXiv: 2002.09893; OpenAlex: W3086976386; MaRDI QID: Q5151688

Publication date: 22 February 2021

Published in: IEEE Transactions on Information Theory

Abstract: In a distributed information application, an encoder compresses an arbitrary vector while a similar reference vector is available to the decoder as side information. For the Hamming-distance similarity measure, and when guaranteed perfect reconstruction is required, we present two contributions to the solution of this problem. One result shows that when a set of potential reference vectors is available to the encoder, lower compression rates can be achieved when the set satisfies a certain clustering property. Another result reduces the best known decoding complexity from exponential in the vector length $n$ to $O(n^{1.5})$ by generalized concatenation of inner coset codes and outer error-correcting codes. One potential application of the results is the compression of DNA sequences, where similar (but not identical) reference vectors are shared among senders and receivers.
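As a rough illustration of the coset-code idea mentioned in the abstract, the following minimal sketch compresses a 7-bit vector to its 3-bit syndrome under the [7,4] Hamming code; the decoder recovers the vector exactly from any reference within Hamming distance 1. This is a textbook toy example and not the paper's generalized concatenated construction; the names `H`, `syndrome`, `encode`, and `decode` are hypothetical and chosen here only for illustration.

```python
# Toy syndrome (coset) compression with decoder-only side information.
# Assumption: the reference y differs from x in at most one of 7 positions.

# Parity-check matrix of the [7,4] Hamming code:
# column j (1-indexed) is the binary expansion of j, least significant bit first.
H = [[(j >> b) & 1 for j in range(1, 8)] for b in range(3)]


def syndrome(v):
    """Return the 3-bit syndrome H * v over GF(2)."""
    return tuple(sum(h * x for h, x in zip(row, v)) % 2 for row in H)


def encode(x):
    """Encoder: sees only x and sends its coset index (3 bits instead of 7)."""
    return syndrome(x)


def decode(s, y):
    """Decoder: combines the received syndrome s with the reference y."""
    # By linearity, s XOR syndrome(y) is the syndrome of the difference x XOR y.
    d = tuple(a ^ b for a, b in zip(s, syndrome(y)))
    # A weight-1 difference at position j has syndrome equal to the bits of j;
    # the all-zero syndrome means x and y are identical.
    pos = d[0] + 2 * d[1] + 4 * d[2]
    x_hat = list(y)
    if pos:
        x_hat[pos - 1] ^= 1
    return x_hat


x = [1, 0, 1, 1, 0, 0, 1]
y = list(x)
y[3] ^= 1  # the reference differs from x in one position
recovered = decode(encode(x), y)
```

The encoder never sees `y`, yet reconstruction is guaranteed whenever the distance assumption holds, because each coset of the Hamming code contains at most one vector within distance 1 of any given reference.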

Full work available at URL: https://arxiv.org/abs/2002.09893

zbMATH Keywords

decoding complexity; compression of DNA sequences; Hamming-distance similarity measure

Mathematics Subject Classification ID

Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science) (68P30); Source coding (94A29)