Mining the World Wide Web. An information search approach - MaRDI portal

Mining the World Wide Web. An information search approach (Q2734774)

scientific article; zbMATH DE number 1637021
Language: English
Label: Mining the World Wide Web. An information search approach
Description: scientific article; zbMATH DE number 1637021

    Statements

    23 August 2001
    World Wide Web
    Web crawling agents
    Mining the World Wide Web. An information search approach (English)
    The book deals with the challenges of information search in the World Wide Web (WWW). In the first part of the book, the authors take an information retrieval perspective on the WWW, with a focus on information encoded in textual data (written documents). They start with an overview of the techniques underlying standard, keyword-based Web search engines for unstructured textual data (including Web crawlers and meta-search engines). Then they discuss query-based search systems and Web query languages for structured data with reference to database management system methods. Finally, the authors turn to mediator, data warehouse and wrapper architectures, which integrate structured databases with the mostly non-structured or semi-structured data available in the WWW. Since the WWW also contains a wide variety of non-textual data (image, video and audio data), the authors briefly survey some methods underlying multimedia search engines.

    In the second part of the book, the focus shifts to data mining on the Web. First, a survey of basic concepts and methods underlying data mining is given. Since the focus of data mining is on the extraction of information from structured data, this view has to be complemented by methods which deal with unstructured data in the WWW as stored in documents. This leads to an overview of methods dealing with text mining, i.e., knowledge discovery in documents in terms of association, trend and event discovery. With the growth of online data on the Web, the application of data mining techniques to pattern discovery in Web data has surfaced in terms of so-called Web mining. Three trends are discussed: Web content mining (the automatic discovery of Web document content patterns, i.e., text mining), Web usage mining (the automatic discovery of Web user behavior patterns), and Web structure mining (the automatic discovery of hypertext and linking structure patterns by connectivity and link topology analysis). Combining the notions of Web crawlers and agent technology has recently led to the concept of autonomous and intelligent Web crawling agents which gather information from the WWW. A case study of the architecture underlying Envirodaemon, an information search engine operating on the WWW which deals with the environmental domain, concludes the book.

    Identifiers