cnae-9
OpenML dataset with id 1468
Full work available at URL: https://api.openml.org/data/v1/download/1586233/cnae-9.arff
Upload date: 21 May 2015
Dataset Characteristics
Number of classes: 9
Number of features: 857 (numeric: 856, symbolic: 1, binary: 0)
Number of instances: 1,080
Number of instances with missing values: 0
Number of missing values: 0
Author: Patrick Marques Ciarelli, Elias Oliveira
Source: UCI, 2010
Description
This data set contains 1080 documents of free-text business descriptions of Brazilian companies, categorized into a subset of 9 categories.
Source
```
Patrick Marques Ciarelli, pciarelli '@' lcad.inf.ufes.br, Department of Electrical Engineering, Federal University of Espirito Santo
Elias Oliveira, elias '@' lcad.inf.ufes.br, Department of Information Science, Federal University of Espirito Santo
```
Data Set Information
This data set contains 1080 documents of free-text business descriptions of Brazilian companies, categorized into a subset of 9 categories cataloged in a table called the National Classification of Economic Activities (Classificação Nacional de Atividades Econômicas - CNAE). The original texts were preprocessed to obtain the current data set: first, only letters were kept, and then prepositions were removed from the texts. Next, the words were transformed to their canonical form. Finally, each document was represented as a vector in which the weight of each word is its frequency in the document. The data set is highly sparse (99.22% of the matrix is filled with zeros).
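The preprocessing steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the preposition list, the sample documents, and the function names `tokenize`, `to_frequency_vector`, and `sparsity` are all hypothetical, and the canonical-form (lemmatization) step is omitted because the source does not say how it was done.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny preposition list; the list actually
# used to build the data set is not given in the source.
PREPOSITIONS = {"de", "da", "do", "em", "para", "com", "por"}


def tokenize(text):
    """Keep only runs of letters, lowercase them, and drop prepositions."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if w not in PREPOSITIONS]


def to_frequency_vector(tokens, vocabulary):
    """Represent a document as word counts, one entry per vocabulary word."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def sparsity(matrix):
    """Fraction of cells in the document-term matrix that are zero."""
    cells = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / cells


# Invented sample descriptions, for illustration only.
docs = [
    "Comércio varejista de roupas e calçados",
    "Fabricação de móveis de madeira",
    "Comércio de móveis",
]
tokenized = [tokenize(d) for d in docs]
vocabulary = sorted({w for toks in tokenized for w in toks})
matrix = [to_frequency_vector(toks, vocabulary) for toks in tokenized]

print(f"{len(vocabulary)} words, sparsity = {sparsity(matrix):.2%}")
```

Even this three-document toy matrix is mostly zeros. If the 99.22% figure is taken at face value, about 0.78% of the 1080 × 856 = 924,480 cells, roughly 7,200 entries, would be non-zero.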
Attribute Information
The dataset has 857 attributes: 1 attribute holding the class of the instance and 856 holding word frequencies:
```
1. category: range 1 - 9 (integer)
2 - 857. word frequency: (integer)
```
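As a minimal sketch of how a data row with this layout could be split and sanity-checked, assuming comma-separated integer values: the helper names `split_row` and `validate` are hypothetical, and the position of the class column in the ARFF file may differ from the listing above (it could be last rather than first), so the index is left as a parameter.

```python
def split_row(line, class_index=0):
    """Split one comma-separated data row into (category, word_counts).

    class_index is an assumption: 0 follows the attribute listing,
    -1 would apply if the class column is stored last.
    """
    values = [int(v) for v in line.strip().split(",")]
    idx = class_index % len(values)  # normalize negative indices
    return values[idx], values[:idx] + values[idx + 1:]


def validate(category, counts, n_words=856):
    """Check a parsed row against the documented attribute ranges."""
    if not 1 <= category <= 9:
        raise ValueError(f"category {category} outside range 1 - 9")
    if len(counts) != n_words:
        raise ValueError(f"expected {n_words} word counts, got {len(counts)}")
    if any(c < 0 for c in counts):
        raise ValueError("word frequencies must be non-negative")


# Toy row with 4 word counts instead of 856, for illustration only.
category, counts = split_row("3,0,2,0,1")
validate(category, counts, n_words=4)
print(category, counts)
```

For the real file, `validate` would be called with its default `n_words=856`.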
Relevant Papers
Patrick Marques Ciarelli, Elias Oliveira, 'Agglomeration and Elimination of Terms for Dimensionality Reduction', Ninth International Conference on Intelligent Systems Design and Applications, pp.547-552, 2009
Patrick Marques Ciarelli, Elias Oliveira, Evandro O. T. Salles, 'An Evolving System Based on Probabilistic Neural Network', Brazilian Symposium on Artificial Neural Network, 2010