credit_seed_0_nrows_2000_nclasses_10_ncols_100_stratify_True - MaRDI portal
From MaRDI portal
Dataset:6037351



OpenML: 44357, MaRDI QID: Q6037351

OpenML dataset with id 44357

Author name not available

Full work available at URL: https://api.openml.org/data/v1/download/22111119/credit_seed_0_nrows_2000_nclasses_10_ncols_100_stratify_True.arff

Upload date: 17 November 2022



Dataset Characteristics

Number of classes: 2
Number of features: 11 (numeric: 10, symbolic: 1, binary in total: 1)
Number of instances: 2,000
Number of instances with missing values: 0
Number of missing values: 0
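
The characteristics listed above can be recomputed from a loaded copy of the data. The following is a minimal sketch, assuming the data sits in a pandas DataFrame and that the feature count includes the target column (which would be consistent with 10 numeric columns plus 1 symbolic column here). The helper `summarize`, the column names, and the toy frame are hypothetical and not part of the portal or of OpenML.

```python
import pandas as pd


def summarize(df: pd.DataFrame, target: str) -> dict:
    """Recompute summary statistics of the kind listed on the dataset page."""
    numeric = list(df.select_dtypes(include="number").columns)
    symbolic = [c for c in df.columns if c not in numeric]
    # A column counts as binary when it takes exactly two distinct values.
    binary = [c for c in df.columns if df[c].nunique(dropna=True) == 2]
    return {
        "classes": df[target].nunique(),
        "features": df.shape[1],  # assumption: the target column is counted
        "numeric": len(numeric),
        "symbolic": len(symbolic),
        "binary": len(binary),
        "instances": len(df),
        "instances_with_missing": int(df.isna().any(axis=1).sum()),
        "missing_values": int(df.isna().sum().sum()),
    }


# Toy data standing in for the real table (hypothetical values).
toy = pd.DataFrame(
    {
        "income": [1.0, 2.5, 3.0, 4.5],
        "debt": [0.2, 0.1, 0.4, 0.3],
        "target": pd.Categorical(["0", "1", "0", "1"]),
    }
)
stats = summarize(toy, target="target")
print(stats)
```

Applied to the real table, a summary like this should reproduce the figures above if the assumptions about the target column hold.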

Subsampling of the dataset credit (44089) with seed=0, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:


```python
# Imports used by the method excerpt below
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

   def subsample(
       self,
       seed: int,
       nrows_max: int = 2_000,
       ncols_max: int = 100,
       nclasses_max: int = 10,
       stratified: bool = True,
   ) -> Dataset:
       rng = np.random.default_rng(seed)
       x = self.x
       y = self.y
       # Sample classes without replacement, weighted by class frequency
       classes = y.unique()
       if len(classes) > nclasses_max:
           # Reindex the counts so the probabilities line up with `classes`
           vcs = y.value_counts().loc[classes]
           selected_classes = rng.choice(
               classes,
               size=nclasses_max,
               replace=False,
               p=(vcs / vcs.sum()).to_numpy(),
           )
           # Keep only the rows whose label is one of the selected classes
           mask = y.isin(selected_classes).to_numpy()
           x = x.iloc[mask]
           y = y.iloc[mask]
       # Uniformly sample columns if required
       if len(x.columns) > ncols_max:
           columns_idxs = rng.choice(
               list(range(len(x.columns))), size=ncols_max, replace=False
           )
           sorted_column_idxs = sorted(columns_idxs)
           selected_columns = list(x.columns[sorted_column_idxs])
           x = x[selected_columns]
       else:
           sorted_column_idxs = list(range(len(x.columns)))
       if len(x) > nrows_max:
           # Draw exactly nrows_max rows, stratified by the target if requested
           target_name = y.name
           data = pd.concat((x, y), axis="columns")
           _, subset = train_test_split(
               data,
               test_size=nrows_max,
               stratify=data[target_name] if stratified else None,
               shuffle=True,
               random_state=seed,
           )
           x = subset.drop(target_name, axis="columns")
           y = subset[target_name]
       categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
       columns = list(x.columns)
       return Dataset(
           # Technically this is not the same but it's where it was derived from
           dataset=self.dataset,
           x=x,
           y=y,
           categorical_mask=categorical_mask,
           columns=columns,
       )

```
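
The row-subsampling step in the listing can be tried in isolation. The following is a minimal sketch, assuming pandas inputs with a named target Series. The standalone function `subsample_rows` and the synthetic data are hypothetical and only mirror the `train_test_split` call from the listing; they are not the code that produced this dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def subsample_rows(x, y, nrows_max, seed, stratified=True):
    """Return at most nrows_max rows, optionally preserving class ratios."""
    if len(x) <= nrows_max:
        return x, y
    data = pd.concat((x, y), axis="columns")
    _, subset = train_test_split(
        data,
        test_size=nrows_max,  # an int is read as an absolute row count
        stratify=data[y.name] if stratified else None,
        shuffle=True,
        random_state=seed,
    )
    return subset.drop(columns=y.name), subset[y.name]


# Synthetic stand-in data: 10,000 rows with a 70/30 class split.
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.normal(size=(10_000, 3)), columns=["a", "b", "c"])
y = pd.Series(np.repeat([0, 1], [7_000, 3_000]), name="target")

x_small, y_small = subsample_rows(x, y, nrows_max=2_000, seed=0)
print(len(x_small), y_small.value_counts().to_dict())
```

Passing the row budget as `test_size` and keeping only the "test" part is what turns a train/test splitter into a subsampler: the stratified split keeps the class proportions of the full table in the retained rows.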





