collins - MaRDI portal

Dataset:6035391



OpenML ID: 40971
MaRDI QID: Q6035391

OpenML dataset with id 40971

No author found.

Full work available at URL: https://api.openml.org/data/v1/download/17953251/collins.arff

Upload date: 10 November 2017



Dataset Characteristics

Number of classes: 30
Number of features: 24 (numeric: 20, symbolic: 4; binary in total: 1)
Number of instances: 1,000
Number of instances with missing values: 0
Number of missing values: 0

Author: Jeff Collins
Source: StatLib
Please cite: None

Data used in an analysis of the Brown and Frown corpora for my doctoral dissertation, "Variations in Written English: Characterizing Authors' Rhetorical Language Choices Across Corpora of Published Texts" (completed at Carnegie Mellon University, 2003). The source of the corpora was the ICAME CD-ROM (get info at <http>).

The data were generated from the texts using the tagging and visualization software Docuscope.

The first row contains the variable names. The genre of each text (assigned by the Brown corpus compilers) is in the 'Genre' column, and the corpus is listed in the 'corpus' column, with 1 = Brown and 2 = Frown.

The dataset may be freely used and distributed for non-commercial purposes.

Note: The Genre and Corpus values together make up the target, and the Countr just counts documents within each counter, so they should probably be ignored.
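
One way to act on the note above is sketched here in Python. It is a minimal illustration, not the dataset's official preprocessing. The helper name `split_target` is hypothetical, and the column names `Genre`, `Corpus`, and `Counter` are assumptions that should be checked against the header of the ARFF file. The sample rows and the `feature_a` column are made up purely to show the shape of the data.

```python
def split_target(rows, genre_col="Genre", corpus_col="Corpus", counter_col="Counter"):
    """Separate input features from the target and bookkeeping columns.

    `rows` is a list of dicts, one per document. The column names are
    assumptions; adjust them to match the actual ARFF header.
    """
    excluded = {genre_col, corpus_col, counter_col}
    features, targets = [], []
    for row in rows:
        # Genre and corpus together make up the target label.
        targets.append((row[genre_col], row[corpus_col]))
        # Keep everything else as input features; the counter is dropped too.
        features.append({k: v for k, v in row.items() if k not in excluded})
    return features, targets


# Made-up rows, only to demonstrate the call (corpus: 1 = Brown, 2 = Frown).
sample = [
    {"Genre": "A", "Corpus": 1, "Counter": 1, "feature_a": 0.5},
    {"Genre": "A", "Corpus": 2, "Counter": 1, "feature_a": 0.7},
]
X, y = split_target(sample)
```

Keeping the target as a (genre, corpus) pair leaves it open whether to predict the combined label or only one of its parts.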




