
KDDCup09_appetency

From MaRDI portal
Dataset:6033831



OpenML ID: 1111
MaRDI QID: Q6033831

OpenML dataset with id 1111

No author found.

Full work available at URL: https://api.openml.org/data/v1/download/53994/KDDCup09_appetency.arff

Upload date: 7 October 2014



Dataset Characteristics

Number of classes: 2
Number of features: 231 (numeric: 192, symbolic: 39; binary in total: 5)
Number of instances: 50,000
Number of instances with missing values: 50,000
Number of missing values: 8,024,152
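
The counts above imply that most cells in the table are empty. As a rough sketch of that arithmetic, the snippet below divides the number of missing values by the table size. It assumes that the 231 features include the target column; the function name `missing_fraction` is introduced here for illustration only.

```python
# Figures copied from the dataset characteristics listed above.
N_INSTANCES = 50_000
N_FEATURES = 231        # assumption: this count includes the target column
N_MISSING = 8_024_152


def missing_fraction(n_missing: int, n_rows: int, n_cols: int) -> float:
    """Return the share of cells in an n_rows x n_cols table that are missing."""
    return n_missing / (n_rows * n_cols)


overall = missing_fraction(N_MISSING, N_INSTANCES, N_FEATURES)
print(f"{overall:.1%} of all cells are missing")  # prints "69.5% of all cells are missing"
```

Under that assumption roughly seven cells in ten are missing, so any model fitted to this data probably needs an explicit strategy for missing values.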

Author: not given
Source: Unknown, date unknown
Please cite: not given

Datasets from ACM KDD Cup (http://www.sigkdd.org/kddcup/index.php)

KDD Cup 2009 http://www.kddcup-orange.com

Converted to ARFF format by TunedIT

Customer Relationship Management (CRM) is a key element of modern marketing strategies. The KDD Cup 2009 offers the opportunity to work on large marketing databases from the French telecom company Orange to predict the propensity of customers to switch provider (churn), buy new products or services (appetency), or buy upgrades or add-ons proposed to them to make the sale more profitable (up-selling).

The most practical way to build knowledge about customers in a CRM system is to produce scores. A score (the output of a model) is an evaluation, for all instances, of a target variable to explain (i.e. churn, appetency or up-selling). Tools that produce scores make it possible to project quantifiable information onto a given population. The score is computed using input variables which describe instances. Scores are then used by the information system (IS), for example, to personalize the customer relationship.

Orange Labs has developed an industrial customer analysis platform able to build prediction models with a very large number of input variables. This platform implements several processing methods for instance and variable selection, prediction and indexation, based on an efficient model combined with variable selection regularization and a model averaging method. The main characteristic of this platform is its ability to scale to very large datasets with hundreds of thousands of instances and thousands of variables. The rapid and robust detection of the variables that have contributed most to the output prediction can be a key factor in a marketing application.

Appetency: In our context, appetency is the propensity to buy a service or a product.

The training set contains 50,000 examples. The first 190 predictive variables are numerical and the last 40 predictive variables are categorical. The last variable is the binary target, with values in {-1, 1}.
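
The column layout described above (190 numerical predictors, then 40 categorical predictors, then the target) can be sketched as a positional split. This is a minimal illustration, not code from the dataset authors: the helper `split_columns` and the column names `Var1` to `Var230` and `APPETENCY` are hypothetical placeholders. The type counts reported in the dataset characteristics (192 numeric, 39 symbolic) differ from this positional layout, so the actual column types should be checked after loading.

```python
from typing import List, Tuple

N_NUMERIC = 190      # first predictive variables, per the description
N_CATEGORICAL = 40   # last predictive variables, per the description


def split_columns(columns: List[str]) -> Tuple[List[str], List[str], str]:
    """Split column names by position into (numeric, categorical, target)."""
    expected = N_NUMERIC + N_CATEGORICAL + 1  # predictors plus one target column
    if len(columns) != expected:
        raise ValueError(f"expected {expected} columns, got {len(columns)}")
    numeric = columns[:N_NUMERIC]
    categorical = columns[N_NUMERIC:N_NUMERIC + N_CATEGORICAL]
    target = columns[-1]
    return numeric, categorical, target


# Hypothetical column names, used only for illustration.
columns = [f"Var{i}" for i in range(1, 231)] + ["APPETENCY"]
numeric, categorical, target = split_columns(columns)
print(len(numeric), len(categorical), target)
```

The length check makes the sketch fail loudly if the file does not have the 231 columns that the description implies.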



