Deprecated: $wgMWOAuthSharedUserIDs=false is deprecated, set $wgMWOAuthSharedUserIDs=true, $wgMWOAuthSharedUserSource='local' instead [Called from MediaWiki\HookContainer\HookContainer::run in /var/www/html/w/includes/HookContainer/HookContainer.php at line 135] in /var/www/html/w/includes/Debug/MWDebug.php on line 372
Are-Two-Sentences-of-the-Same-Topic - MaRDI portal

Are-Two-Sentences-of-the-Same-Topic

From MaRDI portal
Dataset:6036786



OpenML43691MaRDI QIDQ6036786

OpenML dataset with id 43691

No author found.

Full work available at URL: https://api.openml.org/data/v1/download/22102516/Are-Two-Sentences-of-the-Same-Topic.arff

Upload date: 24 March 2022
Copyright license: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International



Dataset Characteristics

Number of features: 3 (numeric: 1, symbolic: 0 and in total binary: 0 )
Number of instances: 129,156
Number of instances with missing values: 0
Number of missing values: 0

Do two sentences come from the same article? We randomly sampled sentences from across Wikipedia. Some sentences came from the same articles, others do not. Sentences from the Same Article These two sentences are from the same article.

There were 2,788 housing units at an average density of 4 per squaremile (2/km). It is also home to the Oklahoma State Reformatory, located in Granite.

So are these:

Monument of the Judiciary Citadel of Salerno, near the Colle Bellara. The La Carnale Castle got his name from a medieval battle against the Arabs and is part of a sport complex (with pool, tennis courts and hockey).

As are these:

The idea of Haar measure is to take a sort of limit of as becomes smaller to make it additive on all pairs of disjoint compact sets, though it first has to be normalized so that the limit is not just infinity. When left and right Haar measures differ, the right measure is usually preferred as a prior distribution.

Sentences from Different Articles These two sentences are from different articles:

US Open womens doubles champion France Ranked world No. The average household size was 2.72 and the average family size was 3.19.

As are these:

The initial goal of the WordNet project was to build a lexical database that would be consistent with theories of human semantic memory developed in the late 1960s. Males had a median income of 25,625 versus 20,515 for females.

These are also different:

Meanwhile, Western foods which are rich in fat, salt, sugar, and refined starches are also imported into countries. According to the United States Census Bureau, the CDP has a total area of , of which is land and (3.61) is water.

Disclaimer Please note, we attempted to remove any data sampled that includes controversial words or hate speech. However, such language is present in Wikipedia, so some such material may be present in this dataset. Due to the size of this dataset, it was not possible to have a human being audit each sentence.




This page was built for dataset: Are-Two-Sentences-of-the-Same-Topic