
Improved feature weight algorithm and its application to text classification (Q1793568)

From MaRDI portal





scientific article; zbMATH DE number 6953568

    Statements

    Improved feature weight algorithm and its application to text classification (English)
    12 October 2018
    Summary: Text preprocessing is a key problem in pattern recognition and plays an important role in text classification. It has two pivotal steps: feature selection and feature weighting. Because the preprocessing results directly affect a classifier's accuracy and performance, choosing appropriate algorithms for feature selection and feature weighting can greatly improve classifier performance. Based on Gini index theory, this paper proposes an Improved Gini Index algorithm, which constructs a new function for feature selection and feature weighting. Experimental results show that the algorithm effectively improves classifier performance. The algorithm has also been applied to a sensitive information identification system with good results: its precision and recall are higher than those of traditional algorithms, and it identifies sensitive information on the Internet effectively.
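    The summary does not state the paper's Improved Gini Index function. As a hedged illustration of how one Gini-index-style score can drive both preprocessing steps, the sketch below assumes a variant commonly used for text features, Gini(t) = Σ_c P(t|c)² · P(c|t)², estimated from document frequencies. It keeps the top-k terms and then weights each kept term as tf × Gini with L2 normalization. The function names (`gini_scores`, `select_features`, `weight_document`), the toy corpus, and the tf × Gini weighting are illustrative assumptions, not the authors' method.

    ```python
    from collections import Counter
    from math import sqrt


    def gini_scores(docs, labels):
        """Score each term with Gini(t) = sum_c P(t|c)^2 * P(c|t)^2 (assumed variant).

        docs   -- list of documents, each a list of tokens
        labels -- class label for each document (same order as docs)
        Probabilities are estimated from document frequencies:
          P(t|c) = (#class-c docs containing t) / (#class-c docs)
          P(c|t) = (#class-c docs containing t) / (#docs containing t)
        """
        class_sizes = Counter(labels)
        term_class_df = {}  # term -> Counter{class: #docs of that class containing term}
        for tokens, label in zip(docs, labels):
            for term in set(tokens):  # count each term once per document
                term_class_df.setdefault(term, Counter())[label] += 1

        scores = {}
        for term, per_class in term_class_df.items():
            df_t = sum(per_class.values())
            score = 0.0
            for cls, n_cls in class_sizes.items():
                n_tc = per_class[cls]  # Counter returns 0 for classes without the term
                score += (n_tc / n_cls) ** 2 * (n_tc / df_t) ** 2
            scores[term] = score
        return scores


    def select_features(scores, k):
        """Feature selection: keep the k highest-scoring terms (ties broken alphabetically)."""
        return sorted(scores, key=lambda t: (-scores[t], t))[:k]


    def weight_document(tokens, selected, scores):
        """Hypothetical feature weighting: tf(t, d) * Gini(t), then L2-normalized."""
        selected = set(selected)
        tf = Counter(t for t in tokens if t in selected)
        raw = {t: n * scores[t] for t, n in tf.items()}
        norm = sqrt(sum(w * w for w in raw.values()))
        if norm == 0:
            return {}  # document contains no selected features
        return {t: w / norm for t, w in raw.items()}


    # Toy usage with a made-up two-class corpus.
    docs = [["free", "prize", "win"], ["free", "money"],
            ["meeting", "agenda"], ["meeting", "free"]]
    labels = ["spam", "spam", "ham", "ham"]
    scores = gini_scores(docs, labels)
    top = select_features(scores, 2)  # top == ["meeting", "free"]
    vec = weight_document(["free", "free", "meeting", "win"], top, scores)
    ```

    Squaring both conditional probabilities favours terms that are frequent within one class and concentrated in it, so "meeting" (only in ham documents) outranks "free" (spread across both classes). Reusing the selection score as a weighting factor is one simple way to tie the two steps together; the paper's actual weighting function may differ.
    
    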

    Identifiers