Random mappings designed for commercial search engines
From MaRDI portal
Publication:6263803
arXiv1507.05929MaRDI QIDQ6263803
Author name not available (Why is that?)
Publication date: 21 July 2015
Abstract: We give a practical random mapping that takes any set of documents represented as vectors in Euclidean space and then maps them to a sparse subset of the Hamming cube while retaining ordering of inter-vector inner products. Once represented in the sparse space, it is natural to index documents using commercial text-based search engines which are specialized to take advantage of this sparse and discrete structure for large-scale document retrieval. We give a theoretical analysis of the mapping scheme, characterizing exact asymptotic behavior and also giving non-asymptotic bounds which we verify through numerical simulations. We balance the theoretical treatment with several practical considerations; these allow substantial speed up of the method. We further illustrate the use of this method on search over two real data sets: a corpus of images represented by their color histograms, and a corpus of daily stock market index values.
Has companion code repository: https://gitlab.com/dgpr-sparse-search/code
This page was built for publication: Random mappings designed for commercial search engines
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q6263803)