Data availability

 
The AURORA manuscript data allow replication of the main analyses in the Aftimos et al. article. They comprise data from 617 samples from 375 patients, with the molecular data distributed as follows:
 
Data type                     Patients with primary data only   Patients with metastasis data only   Patients with both
Ion Torrent TGS (441 genes)   19                                114                                  242
Illumina RNA-Seq              53                                109                                  152
Affymetrix OncoScan CNV       110                               67                                   67
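For TGS, the three patient counts add up to the full cohort of 375 patients, which suggests the columns are mutually exclusive (primary only, metastasis only, both). The short Python sketch below makes that arithmetic explicit. It assumes this interpretation of the columns and that each patient contributes at most one sample per tissue type; the names `COUNTS` and `totals` are hypothetical.

```python
# Patient counts per data type, transcribed from the table above.
# Assumption: columns are mutually exclusive (primary only, metastasis only,
# both) and each patient contributes at most one sample per tissue type.
COUNTS = {
    "Ion Torrent TGS (441 genes)": (19, 114, 242),
    "Illumina RNA-Seq": (53, 109, 152),
    "Affymetrix OncoScan CNV": (110, 67, 67),
}


def totals(primary_only, meta_only, both):
    """Return (patients, samples) for one data type."""
    patients = primary_only + meta_only + both
    # Patients with both data contribute one primary and one metastatic sample.
    samples = primary_only + meta_only + 2 * both
    return patients, samples


for data_type, counts in COUNTS.items():
    patients, samples = totals(*counts)
    print(f"{data_type}: {patients} patients, {samples} samples")
```

Under these assumptions the TGS row gives 375 patients and 617 samples, consistent with the totals quoted above. The RNA-Seq and OncoScan rows give 314 patients (466 samples) and 244 patients (311 samples).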
 

Data description

 
The processed data are distributed in five files. Please refer to the Methods section of the article for additional details on the computed variables.
 
File 1 Aurora_clinical_data.csv: Clinical data
 
PAM50_primary
PAM50 subtype of the primary tumor
PAM50_meta
PAM50 subtype of the biopsied metastatic lesion
IHC_primary
Biological subtype (HR+/HER2-, HER2+, TNBC) of the primary tumor
IHC_meta
Biological subtype (HR+/HER2-, HER2+, TNBC) of the biopsied metastatic lesion
metastatic_biopsy_site
Site of the metastatic biopsy
is_de_novo
Was the patient diagnosed with de novo metastatic breast cancer?
adjuvant
Did the patient receive an adjuvant treatment?
neodajuvant
Did the patient receive a neoadjuvant treatment?
block_before_or_after_neo_treatement
If the patient received a neoadjuvant treatment, was the sample collected before or after it?
metastatic_tx_lines_before_aurora
Number of metastatic treatment lines (0 or 1) before enrolment
num_metastatic_sites_at_inclusion
Number of metastatic sites at inclusion in the study
primary_grade
Grade of the primary tumor
primary_patho_node_status
Pathological lymph node status of the primary tumor
primary_size_t1_or_t2
Size (T1 or T2) of the primary tumor
overall_survival_days
Overall survival in days
death_events
Did the patient die during the study (1) or not (0)?
time_to_metatstatic_relapse_days
Time from primary diagnosis to metastatic relapse, in days
patient_in_oncoplot
Is the patient included in the Oncoplot (1) or not (0)?
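The `overall_survival_days` and `death_events` columns are the usual inputs for a survival curve. The Python sketch below computes a Kaplan-Meier estimate from them. It is an illustration only, not the survival analysis reported in the article. It assumes the file is comma-separated with the column names listed above and that missing values are empty or `NA`; the function names `load_survival` and `kaplan_meier` are hypothetical.

```python
import csv
from collections import Counter


def load_survival(path):
    """Read (time, event) pairs from the clinical CSV.

    Assumes the documented column names and that missing values are
    empty strings or "NA" (an assumption about the file's encoding).
    """
    times, events = [], []
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            t = (row.get("overall_survival_days") or "").strip()
            e = (row.get("death_events") or "").strip()
            if t in ("", "NA") or e in ("", "NA"):
                continue  # skip patients without follow-up data
            times.append(float(t))
            events.append(int(float(e)))  # 1 = died, 0 = censored
    return times, events


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) at each death time."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # deaths plus censored patients at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        if deaths[t]:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1.0 - deaths[t] / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # remove deaths and censored patients
    return curve
```

For example, `kaplan_meier(*load_survival("Aurora_clinical_data.csv"))` returns the curve for all patients with follow-up data. Subgroup curves (for example by `PAM50_meta`) would need the loader to filter rows first.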
 
File 2 AURORA_table_mutations.csv: Sequence variants detected by targeted gene sequencing (TGS)
 
sample_type
Type of sample (primary or meta)
chrom
Chromosome
pos
Genomic position (hg19 reference genome)
ref
Reference allele
alt
Alternative allele detected in the tumor
type
Type of variant (INS, DEL, SNV)
symbol
Gene symbol
VAF
Variant allele frequency
coverage
Sequencing coverage at the variant position
effect
Consequence of the mutation
AA
Variant at the amino acid level
CN
Copy number estimate
purity_affy
Purity estimate based on the Affymetrix OncoScan data
purity_facets
Purity estimate based on the TGS data with FACETS
CCF
Estimate of the cancer cell fraction
cellular_prevalence
Estimate of the cellular prevalence
cDNA
Variant at the cDNA level
census_role_in_cancer
Role of the gene in the COSMIC Cancer Gene Census
cosmic_mutation_significance_tier
COSMIC mutation significance tier of the variant
Protein_position
Amino acid position in the protein
vep_consequence
Consequence predicted by the Variant Effect Predictor (VEP)
SIFT
Pathogenicity prediction from SIFT
PolyPhen
Pathogenicity prediction from PolyPhen
FATHMM_pred
Pathogenicity prediction from FATHMM
aurora_driver_call
Aggregated pathogenicity call
aurora_driver_call_source
Source of the aggregated pathogenicity call
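The `VAF`, `CN` and purity columns relate to `CCF` through a standard purity and copy-number correction. The Python sketch below shows that textbook approximation. It assumes a diploid normal genome and a mutation carried on a single copy (`multiplicity=1`), and it is not necessarily how the `CCF` or `cellular_prevalence` columns were derived (see the article's Methods). The function name `cancer_cell_fraction` is hypothetical.

```python
def cancer_cell_fraction(vaf, purity, tumor_cn, multiplicity=1, normal_cn=2):
    """Approximate the cancer cell fraction (CCF) of a somatic variant.

    The expected VAF of a clonal mutation on `multiplicity` copies is
        purity * multiplicity / (purity * tumor_cn + (1 - purity) * normal_cn),
    so CCF = observed VAF / expected clonal VAF, capped at 1.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    # Average number of copies of the locus per cell in the sequenced sample.
    total_copies = purity * tumor_cn + (1 - purity) * normal_cn
    ccf = vaf * total_copies / (purity * multiplicity)
    return min(ccf, 1.0)
```

For example, a variant with VAF 0.2 in a sample with purity 0.5 and copy number 2 gives a CCF of 0.8. Either `purity_affy` or `purity_facets` could be passed as `purity`.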
 
File 3 OncoScan_CNA.csv: Affymetrix OncoScan CNA regions
 
sample
Type of sample (primary or meta)
chrom
Chromosome
start
Start coordinate of the CNA region (hg19 reference genome)
end
End coordinate of the CNA region (hg19 reference genome)
nMajor
Copy number of the major haplotype in the sample
nMinor
Copy number of the minor haplotype in the sample
nAraw
Unadjusted copy number of the major haplotype in the sample
nBraw
Unadjusted copy number of the minor haplotype in the sample
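The `nMajor` and `nMinor` columns can be summarized per segment. The Python sketch below labels each segment as gain, loss or neutral relative to a diploid baseline, flags loss of heterozygosity (LOH) when `nMinor` is 0, and computes the fraction of the segmented length that is altered. The diploid baseline, the use of `end - start` as the segment length and the function names are assumptions for illustration. Rows should first be restricted to a single value of the `sample` column, and whole-genome-doubled samples would need a ploidy-aware baseline.

```python
def classify_segment(n_major, n_minor, baseline=2):
    """Return (copy-number state, LOH flag) for one segment.

    baseline=2 assumes a diploid genome (an assumption, see lead-in).
    """
    total = n_major + n_minor
    if total > baseline:
        state = "gain"
    elif total < baseline:
        state = "loss"
    else:
        state = "neutral"
    # LOH: the minor haplotype is lost while the major one is retained
    # (a homozygous deletion, 0/0, is reported as a loss without LOH).
    return state, n_minor == 0 and n_major > 0


def fraction_genome_altered(segments, baseline=2):
    """Fraction of segmented length that is gained, lost or LOH.

    `segments` are dicts keyed by the documented column names; values may be
    strings, as produced by csv.DictReader.
    """
    total_len = altered_len = 0
    for seg in segments:
        length = int(float(seg["end"])) - int(float(seg["start"]))
        state, loh = classify_segment(
            int(float(seg["nMajor"])), int(float(seg["nMinor"])), baseline
        )
        total_len += length
        if state != "neutral" or loh:
            altered_len += length
    return altered_len / total_len if total_len else 0.0
```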
 
File 4 TGS_CNA.csv: Gene-level copy number estimates from TGS with FACETS
 
File 5 RNA.zip: Raw gene counts from RNA-Seq
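Raw counts are usually normalized for library size before samples are compared. The Python sketch below converts counts to log2(CPM + 1), a simple transform chosen for illustration and not necessarily the normalization used in the article. The layout of the files inside RNA.zip is not documented here, so the in-memory `{sample: {gene: count}}` structure and the function name `log_cpm` are hypothetical.

```python
import math


def log_cpm(counts, prior=1.0):
    """Convert raw counts to log2(counts-per-million + prior).

    `counts` maps sample -> {gene: raw count} (a hypothetical layout).
    """
    result = {}
    for sample, genes in counts.items():
        library_size = sum(genes.values())
        if library_size == 0:
            raise ValueError(f"sample {sample!r} has no counts")
        # Scale each count by the sample's library size, then log-transform.
        result[sample] = {
            gene: math.log2(count / library_size * 1e6 + prior)
            for gene, count in genes.items()
        }
    return result
```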
 
In addition to the processed data, the corresponding raw data (BAM and FASTQ files for the sequencing data, OSCHP files for the CNV data) are available on request in the context of a research proposal.