Top Jet W-Momentum Reconstruction Dataset
DOI: 10.5281/zenodo.8197723; Zenodo: 8197723; MaRDI QID: Q6723997
Dataset published at Zenodo repository.
Author name not available
Publication date: 30 July 2023
Copyright license: No records found.
Overview

A set of Monte Carlo simulated events for evaluating the momentum reconstruction of top quarks (and their child particles), produced using the HEPData4ML package [1]. Specifically, the entries in this dataset correspond to top quark jets and the momenta of the jets' constituent particles. This is a newer version of the "Top Quark Momentum Reconstruction Dataset" [2], but with sufficiently large changes to warrant this separate posting.

The dataset is saved in HDF5 format, as sets of arrays with keys (as detailed below). There are ~1.5M events, approximately broken down into the following sets:

- Training: 700k events (files with "_train" suffix)
- Validation: 200k events (files with "_valid" suffix)
- Testing (small): 100k events (files with "_test" suffix)
- Testing (large): 500k events (files with "_test_large" suffix)

The two types of testing files (small and large) are independent of one another; the former is for conveniently running quicker tests, the latter for testing with a larger sample.

There are four versions of the dataset, indicated by the filenames. The versions differ in whether or not fast detector simulation was performed (versus truth-level jets), and whether or not the W-boson mass was modified: one version uses the nominal value \(m_W = 80.385 \text{ GeV}\) as used by Pythia8 [3], whereas another uses a variable \(m_W\) taking on 101 evenly spaced values in \(m_W \in [64.308, 96.462] \text{ GeV}\).

The dataset naming scheme is as follows:

- train.h5: jets clustered from truth-level, nominal mW
- train_mW.h5: jets clustered from truth-level, variable mW
- train_delphes.h5: jets clustered from Delphes outputs, nominal mW
- train_delphes_mW.h5: jets clustered from Delphes outputs, variable mW

Description

- 13 TeV center-of-mass energy, fully hadronic top quark decays, simulated with Pythia8.
  (\(t \rightarrow W \, b, \; W \rightarrow q \, q'\))
- Events are generated with leading top quark pT in [550, 650] GeV (set via Pythia8's \(\hat{p}_{T,\text{min}}\) and \(\hat{p}_{T,\text{max}}\) variables).
- No initial- or final-state radiation (ISR/FSR), nor multi-parton interactions (MPI).
- Where applicable, detector simulation is done using DELPHES [4], with the ATLAS detector card.
- Clustering of particles/objects is done via FastJet [5], using the anti-kT algorithm with \(R = 0.8\).
  - For the truth-level data, inputs to jet clustering are truth-level, final-state particles (i.e. clustering "truth jets").
  - For the data with detector simulation, the inputs are calorimeter towers from DELPHES (`Tower` objects; not E-flow objects, no tracking information).
- Each entry in the dataset corresponds to a single top quark jet, extracted from a \(t\bar{t}\) event.
  - All jets are matched to a parton-level top quark within \(\Delta R < 0.8\). We choose the jet nearest the parton-level top quark.
  - Jets are required to have \(|\eta| < 2\) and \(p_{T} > 15 \text{ GeV}\).
  - The 200 leading (highest-pT) jet constituent four-momenta are stored in Cartesian coordinates (E, px, py, pz), sorted by decreasing pT, with zero-padding.
  - The jet four-momentum is stored in Cartesian coordinates (E, px, py, pz), as well as in cylindrical coordinates \((p_T, \eta, \phi, m)\).
  - The truth (parton-level) four-momenta of the top quark, the bottom quark, the W-boson, and the quarks to which the W-boson decays are stored in Cartesian coordinates.
  - In addition, the momenta of the 120 leading stable daughter particles of the W-boson are stored in Cartesian coordinates.

Description of data fields and metadata

Below is a brief description of the various fields in the dataset. The dataset also contains metadata fields, stored using HDF5's "attributes".
Attributes are used for fields that are common across many events, and store information such as generator-level configurations (in principle, enough information is stored to recreate the dataset with the HEPData4ML tool).

Note that fields whose keys have the prefix "jh_" correspond to output from the Johns Hopkins top tagger [6], as implemented in FastJet. Also note that for the keys corresponding to four-momenta in Cartesian coordinates, there are rotated versions of these fields: the data has been rotated so that the W-boson is at \((\theta=0, \phi=0)\), and the b-quark is in the \((\theta=0, \phi 0)\) plane. This rotation is potentially useful for visualizations of the events.

- Nobj: The number of constituents in the jet.
- Pmu: The four-momenta of the jet constituents, in (E, px, py, pz). Sorted by decreasing pT and zero-padded to a length of 200.
- Pmu_rot: Rotated version.
- contained_daughter_sum_Pmu: Four-momentum sum of the stable daughter particles of the W-boson that fall within \(\Delta R < 0.8\) of the jet centroid.
- contained_daughter_sum_Pmu_rot: Rotated version.
- cross_section: Cross-section for the corresponding process, reported by Pythia8.
- cross_section_uncertainty: Cross-section uncertainty for the corresponding process, reported by Pythia8.
- energy_ratio_smeared: Ratio of the true energy of W-boson daughter particles contributing to this calorimeter tower, divided by the total smeared energy in this calorimeter tower. Only relevant for the DELPHES datasets.
- energy_ratio_truth: Ratio of the true energy of W-boson daughter particles contributing to this calorimeter tower, divided by the total true energy of particles contributing to this calorimeter tower. This definition is relevant only for the DELPHES datasets. For the truth-level datasets, this field is repurposed to store a value (0 or 1) indicating whether or not the given particle (whose momentum is in the `Pmu` field) is a W-boson daughter.
- event_idx: Redundant; used for event indexing during the event generation process.
- is_signal: Redundant; indicates whether an event is signal or background, but this is a fully signal dataset. Potentially useful if combining with other datasets produced with HEPData4ML.
- jet_Pmu: Four-momentum of the jet, in (E, px, py, pz).
- jet_Pmu_rot: Rotated version.
- jet_Pmu_cyl: Four-momentum of the jet, in \((p_T, \eta, \phi, m)\).
- jet_bqq_contained_dR06: Boolean flag indicating whether or not the truth-level b and the two quarks from W decay are contained within \(\Delta R < 0.6\) of the jet centroid.
- jet_bqq_contained_dR08: Boolean flag indicating whether or not the truth-level b and the two quarks from W decay are contained within \(\Delta R < 0.8\) of the jet centroid.
- jet_bqq_dr_max: Maximum of \(\big\lbrace \Delta R \left( \text{jet},b \right), \; \Delta R \left( \text{jet},q \right), \; \Delta R \left( \text{jet},q' \right) \big\rbrace\).
- jet_qq_contained_dR06: Boolean flag indicating whether or not the two quarks from W decay are contained within \(\Delta R < 0.6\) of the jet centroid.
- jet_qq_contained_dR08: Boolean flag indicating whether or not the two quarks from W decay are contained within \(\Delta R < 0.8\) of the jet centroid.
- jet_qq_dr_max: Maximum of \(\big\lbrace \Delta R \left( \text{jet},q \right), \; \Delta R \left( \text{jet},q' \right) \big\rbrace\).
- jet_top_daughters_contained_dR08: Boolean flag indicating whether the final-state daughters of the top quark are within \(\Delta R < 0.8\) of the jet centroid. Specifically, the algorithm for this flag checks that the jet contains the stable daughters of both the b quark and the W boson. For each of the b and W, daughter particles are allowed to be uncontained as long as the \(p_T\) of the sum of that particle's uncontained daughters is below \(2.5 \text{ GeV}\).
- jh_W_Nobj: Number of constituents in the W-boson candidate identified by the JH tagger.
- jh_W_Pmu: Four-momentum of the JH tagger W-boson candidate, in (E, px, py, pz).
- jh_W_Pmu_rot: Rotated version.
- jh_W_constituent_Pmu: Four-momentum of the constituents of the JH tagger W-boson candidate, in (E, px, py, pz).
- jh_W_constituent_Pmu_rot: Rotated version.
- jh_m: Mass of the JH W-boson candidate.
- jh_m_resolution: Ratio of the JH W-boson candidate mass to the true W-boson mass.
- jh_pt: \(p_T\) of the JH W-boson candidate.
- jh_pt_resolution: Ratio of the JH W-boson candidate \(p_T\) to the true W-boson \(p_T\).
- jh_tag: Whether or not a jet was tagged by the JH tagger.
- mc_weight: Monte Carlo weight for this event, reported by Pythia8.
- process_code: Process code reported by Pythia8.
- rotation_matrix: Rotation matrix for rotating the events' 3-momenta to produce the rotated copies stored in the dataset.
- truth_Nobj: Number of truth-level particles (saved in truth_Pmu).
- truth_Pdg: PDG codes of the truth-level particles.
- truth_Pmu: Truth-level particles: the top quark, bottom quark, W boson, q, q', and 120 leading, stable W-boson daughter particles, in (E, px, py, pz). A few of these are also stored in separate keys:
  - truth_Pmu_0: Top quark.
  - truth_Pmu_0_rot: Rotated version.
  - truth_Pmu_1: Bottom quark.
  - truth_Pmu_1_rot: Rotated version.
  - truth_Pmu_2: W-boson.
  - truth_Pmu_2_rot: Rotated version.
  - truth_Pmu_3: q from W decay.
  - truth_Pmu_3_rot: Rotated version.
  - truth_Pmu_4: q' from W decay.
  - truth_Pmu_4_rot: Rotated version.
- truth_Pmu_rot: Rotated version of `truth_Pmu`.

The following fields correspond to metadata; they provide the index of the corresponding metadata entry for each event:

- command_line_arguments: The command-line arguments passed to HEPData4ML's `run.py` script.
- config_file: The contents of the Python configuration file used for HEPData4ML. This, together with the command-line arguments, defines how the tool was run, what processes, jet clustering and post-processing were done, etc.
- git_hash: Git hash for HEPData4ML.
- timestamp: Timestamp for when the dataset was created (local).
- timestamp_string_utc: Timestamp for when the dataset was created (in UTC).
- pythia_config: Configuration file passed to Pythia8 by HEPData4ML. Defines the process.
- pythia_random_seed: Random seed passed to Pythia8, for initializing its random number generator.
- unique_id: A unique string for identifying the generated set of events (one run of the HEPData4ML tool).
- unique_id_short: Similar to `unique_id`, but shortened.

Citations

[1]: J. T. Offermann, X. Liu, and T. Hoffman, HEPData4ML (2023).
[2]: J. T. Offermann, A. Bogatskiy, and T. Hoffman, Top Quark Momentum Reconstruction Dataset (2022).
[3]: C. Bierlich and others, A Comprehensive Guide to the Physics and Usage of PYTHIA 8.3 (2022).
[4]: J. de Favereau, C. Delaere, P. Demin, A. Giammanco, V. Lemaître, A. Mertens, and M. Selvaggi, DELPHES 3, A Modular Framework for Fast Simulation of a Generic Collider Experiment, JHEP 02, 057 (2014).
[5]: M. Cacciari, G. P. Salam, and G. Soyez, FastJet User Manual, Eur. Phys. J. C 72, 1896 (2012).
[6]: D. E. Kaplan, K. Rehermann, M. D. Schwartz, and B. Tweedie, Top Tagging: A Method for Identifying Boosted Hadronically Decaying Top Quarks, Phys. Rev. Lett. 101, 142001 (2008).
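Example: the variable-mW mass grid

The variable-mW files are described as using 101 evenly spaced W-boson mass values between 64.308 and 96.462 GeV. The following is a minimal sketch of how such a grid could be rebuilt for binning or labelling events; the function name `mw_grid` is hypothetical and is not part of HEPData4ML, and the endpoints are simply copied from the description. Since the endpoints equal 0.8 and 1.2 times the nominal 80.385 GeV, the nominal mass is the central grid point.

```python
def mw_grid(lo=64.308, hi=96.462, n=101):
    """Return n evenly spaced W-boson mass values (GeV) from lo to hi, inclusive.

    Hypothetical helper: the endpoints and count are taken from the
    dataset description, not read from the files themselves.
    """
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


grid = mw_grid()
spacing = grid[1] - grid[0]   # about 0.32154 GeV between neighbouring values
central = grid[len(grid) // 2]  # index 50, the nominal mass up to rounding
```

The actual mass assigned to each event is not specified in the description; it would have to be taken from the truth-level W-boson four-momentum or from the stored Pythia8 configuration.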
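Example: Cartesian to cylindrical coordinates

Jet four-momenta are stored both as (E, px, py, pz) in `jet_Pmu` and as \((p_T, \eta, \phi, m)\) in `jet_Pmu_cyl`. The sketch below shows the standard conversion, which could be used to cross-check the two fields or to convert constituent momenta from `Pmu`. The function name is hypothetical, and two choices here are assumptions rather than documented conventions of the dataset: \(\phi\) is returned in \((-\pi, \pi]\), and a slightly negative \(m^2\) (from rounding) is clamped to zero.

```python
import math


def cartesian_to_cyl(E, px, py, pz):
    """Convert (E, px, py, pz) to (pT, eta, phi, m). Hypothetical helper."""
    pt = math.hypot(px, py)
    phi = math.atan2(py, px)  # assumed range (-pi, pi]
    if pt > 0.0:
        eta = math.asinh(pz / pt)  # pseudorapidity
    elif pz == 0.0:
        eta = 0.0  # all-zero entry, e.g. zero-padding
    else:
        eta = math.copysign(math.inf, pz)  # momentum exactly along the beam
    m2 = E * E - px * px - py * py - pz * pz
    m = math.sqrt(m2) if m2 > 0.0 else 0.0  # assumption: clamp negative m^2
    return pt, eta, phi, m


pt, eta, phi, m = cartesian_to_cyl(5.0, 3.0, 0.0, 4.0)
padding = cartesian_to_cyl(0.0, 0.0, 0.0, 0.0)
```

For the massless test vector above, \(p_T = 3\), \(\phi = 0\) and \(\eta = \operatorname{asinh}(4/3) = \ln 3\). Zero-padded constituent entries map to all zeros, so they remain easy to mask after conversion.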
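Example: \(\Delta R\) containment flags

Several fields (`jet_bqq_contained_dR06`, `jet_bqq_contained_dR08`, `jet_bqq_dr_max` and their `qq` counterparts) are defined through the angular distance \(\Delta R\) between the jet and truth-level quarks. The following sketch illustrates that kind of check using \(\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}\) with \(\Delta\phi\) wrapped into \([-\pi, \pi)\). It is not the HEPData4ML implementation: the function names are hypothetical, and the use of pseudorapidity (rather than rapidity) and of a strict inequality are assumptions.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance between two directions, with phi wrap-around."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def dr_max(jet, partons):
    """Largest Delta R between the jet axis and any parton.

    jet is an (eta, phi) pair; partons is a list of (eta, phi) pairs.
    """
    return max(delta_r(jet[0], jet[1], eta, phi) for eta, phi in partons)


def contained(jet, partons, radius=0.8):
    """True if every parton lies within `radius` of the jet axis."""
    return dr_max(jet, partons) < radius


jet = (0.0, 0.0)
bqq = [(0.3, 0.4), (-0.2, 0.1), (0.7, 0.0)]  # illustrative values only
flag_08 = contained(jet, bqq, radius=0.8)
flag_06 = contained(jet, bqq, radius=0.6)
wrapped = delta_r(0.0, 3.1, 0.0, -3.1)  # directions on either side of phi = pi
```

With these illustrative values the farthest parton sits at \(\Delta R = 0.7\), so the 0.8 flag is true and the 0.6 flag is false. The wrap-around case gives \(2\pi - 6.2 \approx 0.083\) rather than 6.2.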