This is a really niche post intended for geneticists who are doing de novo genome assemblies of 10X linked-reads sequencing data using Supernova (see… niche), but it also might be interesting to people who want to know a little bit about what the heck it is I do for a living.
A good amount of bioinformatics is Googling.
Supernova is an assembly program specifically for use with linked-reads sequencing data generated by 10X Genomics sequencing technology. Explaining exactly what linked-reads are could be a whole other post, so if you want those details, go here. But for now, all you need to know is that it’s the raw data I am using to generate reference genomes from scratch (a.k.a. de novo).
The end product of a de novo assembly is something called a FASTA. This is a large file that contains all the As, Gs, Cs, and Ts in order for that genome. This file is what’s used as a guide for doing future genomic analyses of the species.
In Supernova, there are 4 options for this output. The online documentation didn’t go into enough depth on how the 4 options work for me to know which would be best for what we’re going to be using them for, and Googling didn’t get me any closer to an answer. So, I generated all 4 types of output, ran assembly stats (QUAST and BUSCO) on all of them, and that gave me enough info to figure out which option would be best.
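For a sense of what those assembly stats measure, here is a minimal Python sketch that computes N50, one of the contiguity numbers QUAST reports, from FASTA-formatted text. It is only an illustration of the idea, not how QUAST itself is implemented; the function names and the toy scaffold records are made up for the example.

```python
import io


def read_fasta_lengths(handle):
    """Return the length of each sequence in a FASTA-formatted file handle."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record, so store the previous one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Largest length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # no sequences at all


# Toy data (hypothetical): three scaffolds of 10, 8, and 4 bases.
toy = io.StringIO(">scaffold_1\nACGTACGTAC\n>scaffold_2\nACGTAC\nGT\n>scaffold_3\nACGT\n")
toy_lengths = read_fasta_lengths(toy)
print(toy_lengths, n50(toy_lengths))
```

To try it on a real assembly, pass an open file instead of the toy string, for example `read_fasta_lengths(open("my_assembly.fasta"))`, where the filename is a placeholder.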
So, here's my very simplified takeaway from this endeavor:
Supernova represents sequences in the raw assembly as “microbubbles” and “megabubbles”.
It looks like a “bubble” is when there's more than one sequence assembled to the same part of a contig, with bubbles separated by “gaps” (single sequences or runs of Ns). Collections of microbubbles create megabubbles. These megabubbles then have to be flattened into a single sequence. This is where the output options come into play.
So, here’s the full breakdown that sounds like a riddle but isn’t a riddle:
raw is ALL the bubbles, even microbubbles within megabubbles, in one FASTA
megabubbles flattens the microbubbles within megabubble arms, in one FASTA
pseudohap flattens all the megabubbles into one FASTA
pseudohap2 puts each flattened megabubble arm in a separate FASTA
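If you want to generate all 4 outputs for comparison, a small Python sketch like the one below can build the commands. The four style names come from the breakdown above. The flag names (`--style`, `--asmdir`, `--outprefix`) are what I'd expect from the Supernova documentation, so treat them as assumptions and check `supernova mkoutput --help` for your version. The assembly directory and output prefix are placeholders. The script only prints the commands; it doesn't run anything.

```python
import shlex

# The four output styles described above.
STYLES = ["raw", "megabubbles", "pseudohap", "pseudohap2"]


def mkoutput_commands(asmdir, out_root):
    """Build one `supernova mkoutput` command per style (flag names are assumptions)."""
    commands = []
    for style in STYLES:
        commands.append([
            "supernova", "mkoutput",
            f"--style={style}",
            f"--asmdir={asmdir}",
            f"--outprefix={out_root}_{style}",  # keep each style's output separate
        ])
    return commands


# Placeholder paths: swap in your own assembly directory and prefix.
for cmd in mkoutput_commands("my_run/outs/assembly", "my_genome"):
    print(shlex.join(cmd))
```

Once the commands look right, they can be pasted into a shell or handed to `subprocess.run(cmd, check=True)`.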
I hope this helps some of my fellow Googling bioinfomagicians out there and didn’t make everyone else just super confused.