Normalized Difference Urban Index (NDUI)
NOAA U.S. Climate Normals
Human Cancer Models Initiative (HCMI) Cancer Model Development Center
NOAA National Bathymetric Source Data
NOAA Global Surface Summary of Day
Image classification - fast.ai datasets
National Cancer Institute Center for Cancer Research - Diffuse Large B Cell Lymphoma (DLBCL) Genomics and Expression
Multiview Extended Video with Activities (MEVA)
District of Columbia - Classified Point Cloud LiDAR 2018
District of Columbia - Classified Point Cloud LiDAR 2015
Japanese Tokenizer Dictionaries
NOAA Integrated Surface Database (ISD)
NOAA Terrestrial Climate Data Records
NOAA Global Mosaic of Geostationary Satellite Imagery (GMGSI)
NOAA Global Hydro Estimator (GHE)
InRad COVID-19 X-Ray and CT Scans
Central Weather Bureau OpenData
Voices Obscured in Complex Environmental Settings (VOiCES)
COVID-19 Molecular Structure and Therapeutics Hub
Daylight Map Distribution of OpenStreetMap
Cancer Genome Characterization Initiatives - Burkitt Lymphoma, HIV+ Cervical Cancer
Basic Local Alignment Sequences Tool (BLAST) Databases
New Jersey Statewide Digital Aerial Imagery Catalog
New Jersey Statewide LiDAR
NA-CORDEX - North American component of the Coordinated Regional Downscaling Experiment
A2D2: Audi Autonomous Driving Dataset
Longitudinal Nutrient Deficiency
Nanopore Reference Human Genome
Genome Aggregation Database (gnomAD) - Data Lakehouse Ready
KITTI Vision Benchmark Suite
Global Seasonal Sentinel-1 Interferometric Coherence and Backscatter Data Set
EPA Risk-Screening Environmental Indicators
Atmospheric Models from Meteo-France
2021 Amazon Last Mile Routing Research Challenge Dataset
DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue
Clinical Proteomic Tumor Analysis Consortium 2 (CPTAC-2)
Finnish Meteorological Institute Weather Radar Data
Earth Observation Data Cubes for Brazil
1000 Genomes Phase 3 Reanalysis with DRAGEN 3.5 and 3.7
The Massively Multilingual Image Dataset (MMID)
Allen Ivy Glioblastoma Atlas
1940 Census Population Schedules, Enumeration District Maps, and Enumeration District Descriptions
1950 Census Population Schedules, Enumeration District Maps, and Enumeration District Descriptions
Google Brain Genomics Sequencing Dataset for Benchmarking and Development
GATK Structural Variation (SV) Data
COCO - Common Objects in Context - fast.ai datasets
The Genome Modeling System
Ford Multi-AV Seasonal Dataset
Cancer Cell Line Encyclopedia (CCLE)
Allen Brain Observatory - Visual Coding AWS Public Data Set
Cloud to Street - Microsoft Flood and Clouds Dataset
NIH NCBI PMC Article Datasets - Full-Text Biomedical and Life Sciences Journal Articles on AWS
Common Crawl February/March 2021
Common Crawl January 2022
CoMMpass from the Multiple Myeloma Research Foundation
Common Crawl January 2021
Common Crawl June/July 2022
Cell Painting Image Collection
CAM6 Data Assimilation Research Testbed (DART) Reanalysis: Cloud-Optimized Dataset
Sloan Digital Sky Survey Release 14
Copernicus Digital Elevation Model (DEM)
Boreas Autonomous Driving Dataset
The Human Microbiome Project
NOAA Emergency Response Imagery
IBL Neuropixels Brainwide Map on AWS
Multi-Scale Ultra High Resolution (MUR) Sea Surface Temperature (SST)
Legal Entity Identifier (LEI) and Legal Entity Reference Data (LE-RD)
Cloud Indexes for Bowtie, Kraken, HISAT, Centrifuge, and SPUMONI
NOAA Atmospheric Climate Data Records
IDEAM - Colombian Radar Network
LOFAR ELAIS-N1 cycle 2 observations on AWS
Genome Aggregation Database (gnomAD)
GeoNet Aotearoa New Zealand Data
Amazon Berkeley Objects Dataset
A Realistic Cyber Defense Dataset (CSE-CIC-IDS2018)
Community Earth System Model Large Ensemble (CESM LENS)
The Klarna Product-Page Dataset
NOAA Global Extratropical Surge and Tide Operational Forecast System (Global ESTOFS)
Genome in a Bottle on AWS
Community Earth System Model v2 Large Ensemble (CESM2 LENS)
Galaxy Evolution Explorer Satellite (GALEX)
Distributed Archives for Neurophysiology Data Integration (DANDI)
Airborne Object Tracking Dataset
COVID-19 Genome Sequence Dataset
NOAA High-Resolution Rapid Refresh (HRRR) Model
NOAA Global Ensemble Forecast System (GEFS)
NIH NCBI Sequence Read Archive (SRA) on AWS
Image localization - fast.ai datasets
Defense Meteorological Satellite Program (DMSP) Auroral Particle Flux
ARPA-E PERFORM Forecast data
NOAA Oceanic Climate Data Records
Sloan Digital Sky Survey Release 15
Sloan Digital Sky Survey Release 16
Low Altitude Disaster Imagery (LADI) Dataset
Global Database of Events, Language and Tone (GDELT)
Human PanGenomics Project
UK Met Office Global and Regional Weather Forecasts
Cell Organelle Segmentation in Electron Microscopy (COSEM) on AWS
Sounds of Central African landscapes
NOAA Fundamental Climate Data Records (FCDR)
1000 Genomes Phase 3 Reanalysis with DRAGEN 3.5 - Data Lakehouse Ready
NOAA Climate Forecast System (CFS)
International Neuroimaging Data-Sharing Initiative (INDI)
Transiting Exoplanet Survey Satellite (TESS)
UK Biobank Pan-Ancestry Summary Statistics
Fly Brain Anatomy: FlyLight Gen1 and Split-GAL4 Imagery
DNAStack COVID19 SRA Data
NOAA Global Forecast System (GFS)
NOAA Geostationary Operational Environmental Satellites (GOES) 16 & 17
Hubble Space Telescope Public Data
Sloan Digital Sky Survey Release 17
Folding@home COVID-19 Datasets
Encyclopedia of DNA Elements (ENCODE)
Coupled Model Intercomparison Project 6
Variant Effect Predictor (VEP) and the Loss-Of-Function Transcript Effect Estimator (LOFTEE) Plugin
Common Crawl September 2020
Common Crawl November/December 2020
Common Crawl October 2020
NOAA Global Historical Climatology Network Daily (GHCN-D)
Medical Segmentation Decathlon
Common Crawl November/December 2021
NapierOne Mixed File Dataset
ChEMBL - Data Lakehouse Ready
Clinical Proteomic Tumor Analysis Consortium 3 (CPTAC-3)
High Resolution Population Density Maps + Demographic Estimates by CIESIN and Meta
NASA Prediction of Worldwide Energy Resources (POWER)
Common Crawl July/August 2021
Common Crawl September 2021
Common Crawl October 2021
Africa Soil Information Service (AfSIS) Soil Chemistry
Deutsche Boerse Public Dataset
Beat Acute Myeloid Leukemia (AML) 1.0
NOAA Water-Column Sonar Data Archive
NOAA Joint Polar Satellite System (JPSS)
NOAA North American Mesoscale Forecast System (NAM)
3000 Rice Genomes Project
NOAA Global Ensemble Forecast System (GEFS) Re-forecast
Epoch of Reionization Dataset
Allen Cell Imaging Collections
National Archives Catalog
Common Crawl May/June 2020
Common Crawl March/April 2020
Common Crawl February 2020