
Beware of mis-assembled genomes: still valid today!

by L. Collado-Torres · January 26, 2012

This article was originally published at https://lcolladotor.github.io/

I’ve recently been impressed by Steven Salzberg’s talk, as you might have noticed, and while browsing his home page I stumbled upon his opinion piece (co-authored with James Yorke): Beware of mis-assembled genomes.

It’s a short note published in 2005, but damn, can anyone deny that it perfectly fits today’s state of the art in de novo genome assembly? I bet no one can. For instance, it’s a solid statement to say:

> The source of most mis-assemblies is, as it has always been, repeats.

He didn’t add a “will always be” or a “will be for at least 7 more years”, now that we are in 2012, but it feels like this will be the case until we can get accurate (and cheap) reads that span even the longest repeat. Well, maybe we don’t need such huge reads, since people have already been able to find large genome duplications.

And as I said in my previous post, I’m still surprised by how carelessly the human genome assembly was carried out, given that they didn’t track their own steps. I was hoping that wasn’t the case, but clearly it is:

> Indeed, many of the original assemblies of parts of the human genome were done in the mid- and late-1990s, and are now lost.

I’m also impressed by how accurate Steven and James’ prediction was when they foresaw that people would be misled into judging assembly quality by contig size without taking mis-assemblies into account.

They also called upon the bioinformatics community to take action in evaluating genome assemblies. Due to the amount of data nowadays, it feels like an inhuman task (well, “incluster”, as in infeasible even for high-power clusters :P; incomputable is the correct term). But with some funding, I bet Salzberg and colleagues could find a way to do so, at least partially. Yet, as with anything, you need motivation, and I’m not sure they are motivated to clean up the mess.


