Data fusion in information retrieval (Q2429152)

The development of the Internet over the last 20 years has enabled applications to query multiple, widely distributed, heterogeneous information and data sources, and to combine or integrate the results obtained where necessary or appropriate. A major challenge in such a combination is to detect data objects that are represented in distinct sources in different, sometimes even overlapping ways, but which actually represent identical real-world objects. Data fusion is the process of putting such pieces together and of combining multiple records representing the same object into a single consistent one. Data fusion has many applications, in particular in Web search and information retrieval (IR), and is most often placed within the wider context of data integration, where it is seen as one step in a more comprehensive process. Fusion needs to overcome two major challenges, namely conflicting, inconsistent, or missing data values, and uncertainty about how to fill in and fuse missing values. According to its author, this book is the result of a 10-year engagement with data fusion in the context of various research projects. It covers a wide range of fusion approaches, such as score normalization, linear combination, ranking-based fusion, and fusion from overlapping databases. It also provides the theoretical background needed to understand and assess these approaches; a particular highlight is the geometric framework for data fusion in Chapter 5, which presents interesting methods for assigning weights to the components under fusion. The book is written in a very concise and dense manner, which makes it unsuited for the novice but quite readable for the expert, in particular one with a good mathematical background. It contains many evaluation results comparing the various fusion methods presented, which is helpful for the practitioner.
The final chapter also gives a good overview of applications of data fusion. Although the list of references appears comprehensive, it is surprising that some well-known fusion articles, such as the survey by \textit{J. Bleiholder} and \textit{F. Naumann} [``Data fusion'', ACM Comput. Surv. 41, No. 1, Article No. 1 (2008; \url{doi:10.1145/1456650.1456651})], are not cited. This omission suggests that the book is less suited to an introductory course on data fusion than to serving as an aid for the fusion researcher.
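To make some of the approaches named in this review concrete, the following is a minimal Python sketch, not taken from the book, of three textbook techniques: min-max score normalization, a weighted linear combination of normalized scores (with equal weights this ranks documents the same way as the classic CombSUM), and the Borda count as one example of ranking-based fusion. The function names, the input format (one dictionary per retrieval system mapping document identifiers to scores), the weights, and the convention that a document missing from a run contributes 0 are all illustrative assumptions; the book's own formulations, including the weight-assignment methods of its geometric framework, may differ.

```python
from typing import Dict, List, Sequence, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Map one system's raw scores linearly onto [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumed convention: identical scores carry no ranking information,
        # so every document gets 1.0.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def linear_combination(
    runs: Sequence[Dict[str, float]], weights: Sequence[float]
) -> List[Tuple[str, float]]:
    """Weighted sum of min-max normalized scores, best document first.

    A document absent from a run simply contributes nothing for that run
    (an assumption; other treatments of missing scores exist).
    """
    if len(runs) != len(weights):
        raise ValueError("need exactly one weight per run")
    fused: Dict[str, float] = {}
    for run, w in zip(runs, weights):
        for doc, s in min_max_normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    # Sort by descending fused score; break ties by document id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


def borda_fusion(rankings: Sequence[List[str]]) -> List[Tuple[str, int]]:
    """Borda count: in a ranking of length n, the document at 0-based
    position i earns n - i points; points are summed over all rankings."""
    points: Dict[str, int] = {}
    for ranking in rankings:
        n = len(ranking)
        for i, doc in enumerate(ranking):
            points[doc] = points.get(doc, 0) + (n - i)
    return sorted(points.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    run_a = {"d1": 12.0, "d2": 8.0, "d3": 4.0}   # raw scores on one system's scale
    run_b = {"d2": 0.75, "d3": 0.5, "d4": 0.25}  # raw scores on another system's scale
    print(linear_combination([run_a, run_b], [0.5, 0.5]))
    # [('d2', 0.75), ('d1', 0.5), ('d3', 0.25), ('d4', 0.0)]
    print(borda_fusion([["d1", "d2", "d3"], ["d2", "d3", "d4"]]))
    # [('d2', 5), ('d1', 3), ('d3', 3), ('d4', 1)]
```

Normalization matters here because the two runs score on very different scales: without it, the raw values of `run_a` would dominate the sum. After normalization, d2 comes first because it scores well in both runs, even though d1 tops the first run on its own.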

Reviewed by Gottfried Vossen

zbMATH Keywords: data fusion, information retrieval, data integration, Web search