Lefteris asks science - Edition 25 - DNA Sequencing and Genomes (w/ Krithika Arumugam)
Episode 25 • 18th May 2021 • Lefteris asks science • Lefteris Statharas


Shownotes

For the first time in more than a year, I met a researcher in three-dimensional space: Ms. Krithika Arumugam from the Singapore Centre for Environmental Life Sciences Engineering. She helped me understand DNA sequencing and genomes.

Krithika Arumugam SCELSE page: https://www.scelse.sg/People/Detail/770a9e99-97c2-4c13-b470-a1631db35409

Link to the work discussed in the podcast: https://www.nature.com/articles/s41522-021-00196-6

Subscribe to the podcast: https://www.lefterisasks.com/listen

Subscribe to the weekly newsletter: lefterisasks.com/newsletter

Buy me a coffee: https://www.buymeacoffee.com/LefterisAsks

Join the Facebook Group: https://www.facebook.com/groups/544815233347328/

Transcripts

Krithika Arumugam:

If you're sampling from an environment, there are going to be, like, hundreds of bacteria in it. So just imagine there are going to be, like, millions of jigsaw pieces and you don't know the original picture. It's going to be very tricky to assemble them and picture what's exactly there in the community.

Lefteris:

Greetings, my good humans, and welcome to Lefteris asks science, edition number 25. I am Lefteris, the annoying guy who calls academics and scientists and asks them questions until I understand what, how and why they do what they do. This week, for the first time in more than a year, I met a person in three-dimensional space. I traveled all the way to Nanyang Technological University and met Ms. Krithika Arumugam from the Singapore Centre for Environmental Life Sciences Engineering, and she helped me understand what genomes are and what DNA sequencing is.

Before we go on with the show, as always, we have some housekeeping. If you enjoy the show, please subscribe to it and share it with your friends who might like it too. Follow me on Twitter at Lefteris_asks. We also now have an Instagram page under the same name, which is very exciting. Also, I just made a new Facebook group called Lefteris asks science; come join the group to ask questions, find out about updates and much more. I also have a weekly newsletter where I share my favorite news from the world of science and academia, with short explanations and links to the research for anyone who wants to find out more. If you like that, go to the show notes and click the link to subscribe to the newsletter. Lastly, in the show notes you'll find links where you can support me in doing this by donating. Let's now meet Ms. Arumugam.

Krithika Arumugam:

I'm Krithika Arumugam. I'm a bioinformatician at the Singapore Centre for Environmental Life Sciences Engineering, which is a Research Centre of Excellence located at Nanyang Technological University here in Singapore. I finished my undergraduate degree in computer science and engineering in India, and then I moved here to Singapore for my master's in bioinformatics, to be at the center of an interdisciplinary field.

Lefteris:

Do not adjust your podcast sets. I did say at the beginning of the show that we're going to talk about genomes and DNA sequencing. Ms. Arumugam moved from a computer science background to bioinformatics. If you want to be part of a multidisciplinary study, you'll probably have to juggle a lot of information from different fields. How was that experience for Ms. Arumugam?

Krithika Arumugam:

Yeah, I would say it was challenging initially. I was completely new to all the biology concepts. I had some biology modules in high school, and that was the last time I studied biology. So after coming here, it was a bit challenging jumping into a new topic, but you get to know a lot of things, and it was quite interesting. Using your informatics skills to understand complex things in biology was very exciting.

Lefteris:

It was always exciting for me to see unfamiliar terms, even in the title of the paper. While I might have seen the word genome before, the word replicon did not sound familiar. Luckily enough, I have someone who works with both genomes and replicons to explain to me what is what.

Krithika Arumugam:

So, I guess "genome" must be a very familiar term. If not, it's basically the genetic material found in any living organism, which is the DNA, deoxyribonucleic acid. You know there is a Helix Bridge in Singapore, near Marina Bay Sands? That's a DNA-inspired structure, so that's what DNA looks like. Once you decode the DNA of a particular organism, that's what we call its genome. It's responsible for all the functional aspects of a living organism, or, in the case of humans, how you actually look, like what your hair colour is. Everything is encoded in the DNA. The genome consists of the chromosome, which is the primary genetic material, and then there is other genetic material as well, which we call non-chromosomal replicons. They are smaller in size than the chromosome, but they are still useful, and they have different functional aspects compared to the main chromosome.
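To make the chromosome/replicon distinction concrete, here is a minimal Python sketch that models a genome as one chromosome plus zero or more smaller non-chromosomal replicons (plasmids are a familiar example). The class names and the lengths are hypothetical, chosen only to illustrate the structure; real genomes are stored as sequence files, not as objects like these.

```python
from dataclasses import dataclass, field

# Hypothetical data model, for illustration only.
@dataclass
class Replicon:
    name: str
    length_bp: int            # length in base pairs
    chromosomal: bool = False

@dataclass
class Genome:
    organism: str
    chromosome: Replicon
    extra_replicons: list[Replicon] = field(default_factory=list)

    def total_length_bp(self) -> int:
        # The genome is the chromosome plus every non-chromosomal replicon.
        return self.chromosome.length_bp + sum(r.length_bp for r in self.extra_replicons)

# Hypothetical bacterium: a ~5 Mb chromosome and two small plasmids.
genome = Genome(
    organism="example bacterium",
    chromosome=Replicon("chromosome", 5_000_000, chromosomal=True),
    extra_replicons=[Replicon("plasmid_1", 60_000), Replicon("plasmid_2", 8_000)],
)
print(genome.total_length_bp())  # 5068000
```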

Lefteris:

As we learned, genomes are the instructions and rules an organism follows to grow and develop, stored in its DNA, or in RNA for some viruses. One thing you might have heard about when it comes to DNA and genomes is a term called sequencing. DNA consists of four small, different compounds: cytosine, guanine, adenine and thymine. The long helical structure of DNA consists of different sequences of these four components. So is sequencing finding out the structure of the DNA, that is, the order of these four components?
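Because a DNA sequence can be written as a string over the four letters A, C, G and T, a first computational step is often simply counting them. This short sketch validates a made-up sequence and reports its base composition plus its GC content; GC content is a standard summary statistic added here for illustration, not something discussed in the episode.

```python
from collections import Counter

VALID_BASES = set("ACGT")

def base_composition(seq: str) -> dict[str, int]:
    """Count A, C, G and T in a DNA sequence, rejecting any other character."""
    seq = seq.upper()
    invalid = set(seq) - VALID_BASES
    if invalid:
        raise ValueError(f"not a DNA sequence, found: {sorted(invalid)}")
    counts = Counter(seq)
    return {base: counts.get(base, 0) for base in "ACGT"}

def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    comp = base_composition(seq)
    return (comp["G"] + comp["C"]) / sum(comp.values())

example = "ATGCGTACGTTAGC"  # hypothetical fragment
print(base_composition(example))  # {'A': 3, 'C': 3, 'G': 4, 'T': 4}
print(gc_content(example))        # 0.5
```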

Krithika Arumugam:

Sequencing is basically the process of reading the DNA and decoding what's actually in it. That's the technical term; we call it sequencing. What people normally do is collect samples, or, if it's a single bacterium, they try to grow it in the lab and then sequence it, meaning decode what's actually present in the DNA, to understand the function of that particular organism, in this case a bacterium. There are different methods that do this. Once you collect the sample, you extract the DNA out of it; this is all done in the lab. The process starts with shearing the DNA: you cut the DNA into multiple fragments of different lengths, and then the machine tries to read each of the fragments.
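A minimal sketch of what shearing does to a sequence, assuming fragment lengths drawn uniformly at random (real mechanical or enzymatic shearing has its own length distribution). The function name and the length bounds are hypothetical. Cutting at random points yields pieces that, in the right order, concatenate back to the original; after shuffling, that order is exactly the information lost.

```python
import random

def shear(dna: str, min_len: int, max_len: int, rng: random.Random) -> list[str]:
    """Cut a DNA string into consecutive fragments of random length.

    Assumption for illustration: lengths are uniform in [min_len, max_len];
    the final fragment is whatever remains and may be shorter.
    """
    fragments = []
    pos = 0
    while pos < len(dna):
        size = rng.randint(min_len, max_len)  # inclusive on both ends
        fragments.append(dna[pos:pos + size])
        pos += size
    return fragments

rng = random.Random(42)  # fixed seed so the example is reproducible
genome = "".join(rng.choice("ACGT") for _ in range(1_000))  # made-up 1 kb "genome"
pieces = shear(genome, min_len=50, max_len=150, rng=rng)
rng.shuffle(pieces)  # the fragments are read in no particular order
print(len(pieces), "fragments; total length", sum(len(p) for p in pieces))
```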

Krithika Arumugam:

The machine gives you an output in the form of a text file. But it's not a text file you can open with a normal text editor: it's going to be quite big, gigabytes or even terabytes. That's when we, the computational people, step in, because as the data gets bigger, you need specialized people with different expertise to look at it. The file is a text file made up of characters that stand for the compounds encoded in the DNA. DNA is made up of four compounds, which we call A, T, G and C: adenine, thymine, guanine and cytosine. The short form is ATGC. So you have a text file with permutations and combinations of ATGC, which is the DNA, decoded.
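The episode does not name the file format, but short-read sequencers commonly write FASTQ, where each read takes four lines: an `@` header, the ATGC sequence, a `+` separator and a line of per-base quality characters. Assuming that format, this sketch reads records lazily with a generator, so a multi-gigabyte file never has to fit in memory; the tiny in-memory input stands in for a real file, and `reads.fastq` is a hypothetical filename.

```python
import io
from typing import Iterator, NamedTuple, TextIO

class Read(NamedTuple):
    name: str
    sequence: str
    quality: str

def parse_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield one read at a time from a four-line-per-record FASTQ stream (assumed format)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield Read(header[1:], seq, qual)

# Two made-up reads; for a real file you would pass open("reads.fastq") instead.
example = io.StringIO("@read1\nATGCGT\n+\nIIIIII\n@read2\nTTAGCA\n+\nIIIHH#\n")
for read in parse_fastq(example):
    print(read.name, read.sequence)
```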

Krithika Arumugam:

Since you sheared the DNA into multiple fragments beforehand, the length of each DNA fragment is quite small. But the original size of the chromosome of a living organism, say a microbe, is going to be something like five megabases. The technology only allows you to shear the DNA and read it out in smaller fragments, and since the fragment size is small, it becomes difficult to reconstruct the original chromosome of the bacterium.

Krithika Arumugam:

An easier way to visualize this, as people usually do, is to compare it to a jigsaw puzzle; I'm sure you must have played one as a kid. In a jigsaw puzzle, the picture is cut into smaller pieces, and you try to piece them together to form the bigger picture, right? In this case it's a bit trickier, because if the jigsaw pieces are smaller, there are going to be many pieces, and if you don't know the original picture, it's going to be difficult to put those pieces together and reconstruct it. Here we have small genome fragments, and there can be millions of them, so putting them together and trying to find the original chromosome is always challenging. And that's even if it's a single bacterium. If you're sampling from an environment, there are going to be hundreds of bacteria in it. So just imagine there are going to be millions of jigsaw pieces and you don't know the original picture. It's going to be very tricky to assemble them and picture what's exactly there in the community.

Lefteris:

I really love that puzzle example. So when scientists want to find out, for example, what types of organisms live in a lake, they take a sample and try to piece all of the fragments together to figure out the complete picture they make. It's as if we took ten 1,000-piece jigsaw puzzles, threw all of the pieces into the same box and tried to recreate the ten different images. It would take a lot of effort and time to achieve that.
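A quick back-of-the-envelope calculation shows why mixing puzzles hurts. If, naively, every piece had to be compared with every other piece, the number of comparisons for n pieces is n(n-1)/2, which grows with the square of the piece count. This brute-force strategy is an assumption purely for illustration; real assemblers index fragments to avoid comparing every pair. Under it, one box of 10,000 pieces costs about a hundred times a single 1,000-piece puzzle, and about ten times solving the ten puzzles separately.

```python
def pairwise_comparisons(n_pieces: int) -> int:
    """Number of distinct pairs among n pieces: n * (n - 1) / 2."""
    return n_pieces * (n_pieces - 1) // 2

one_puzzle = pairwise_comparisons(1_000)       # a single 1,000-piece puzzle
ten_in_one_box = pairwise_comparisons(10_000)  # ten puzzles mixed together
print(one_puzzle)      # 499500
print(ten_in_one_box)  # 49995000
```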

Lefteris:

Now, one thing that would make things easier would be if the pieces were bigger, and this is where the terms short-read and long-read sequencing come in.

Krithika Arumugam:

Technology has advanced to the extent that we can increase the size of the pieces, that is, of the DNA fragments being read. Normally what we do is short-read sequencing. In short-read sequencing the error rate is negligible, which is why people still use it today. But one of the limitations is the length of the DNA fragment the machine can read, which we call a read in technical terms. It can go up to 300 base pairs, or 300 characters if you want to call it that. The technical term is base pair: each A is a base, each T is a base. So you can read up to 300 base pairs.
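To connect the 300-base-pair read length with the roughly five-megabase bacterial chromosome mentioned earlier, here is the standard coverage arithmetic: average coverage C = N * L / G, where N is the number of reads, L the read length and G the genome size. The 30x target below is an assumption for illustration, not a figure from the episode.

```python
import math

def reads_needed(genome_size_bp: int, read_len_bp: int, target_coverage: float) -> int:
    """Reads required so that, on average, each base is covered target_coverage times."""
    return math.ceil(target_coverage * genome_size_bp / read_len_bp)

def mean_coverage(n_reads: int, read_len_bp: int, genome_size_bp: int) -> float:
    """Average depth C = N * L / G."""
    return n_reads * read_len_bp / genome_size_bp

genome = 5_000_000  # ~5 megabases, a typical bacterial chromosome
print(reads_needed(genome, 300, 1))   # 16667 reads to cover the genome once on average
print(reads_needed(genome, 300, 30))  # 500000 reads for an assumed 30x target
```

Even at 1x on average, reads land at random positions, so some bases are missed; that is one reason real projects sequence far deeper.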

Krithika Arumugam:

So the process of trying to put the genome fragments together is going to be tricky. It becomes easier if the fragments are a bit longer, if the sequencing machine can read each DNA fragment up to a greater length. When you have a bigger jigsaw piece, it gets easier, right? That's what long-read sequencing actually means here.

Lefteris:

Now, how do they actually assemble the genome, and how do they know that they got the correct picture?

Krithika Arumugam:

What the assembler does is take the reads and try to put them together to produce longer, contiguous sections of sequence. If reads overlap, they are merged. Like I said, a read is composed of ATGC, right? So if there's an overlap with another fragment, those two are merged, and the algorithm tries to produce extended fragments, making the reads a bit longer. Once the algorithm has processed the reads, what we get out of it are called contigs, which are extended fragments built from the reads. It's just a different terminology: we call them contigs because they are contiguous sequences; it's a short form.
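The merging step can be illustrated with a toy greedy assembler: repeatedly find the pair of fragments with the longest suffix-prefix overlap, merge them, and stop when no overlap of at least a minimum length remains; whatever is left are the contigs. This is a teaching sketch under strong assumptions (error-free reads, a single strand, no repeats, no read contained inside another); production assemblers use overlap or de Bruijn graphs and handle errors, reverse complements and repeats. The reads are made up.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair until nothing overlaps; return the contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, -1, -1)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]

reads = ["ATGCGTAC", "GTACCTTA", "CTTAGGCA", "TTTTCCCC"]
print(greedy_assemble(reads))  # ['TTTTCCCC', 'ATGCGTACCTTAGGCA']
```

The unrelated read `TTTTCCCC` stays a separate contig, which is the situation described next: an assembly made of several contigs whose origin still has to be worked out.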

Krithika Arumugam:

Then there are different techniques to try and combine the contigs based on their characteristics. For example, you can take into account the abundance of the contigs, that is, how many reads were used to make each contig: the higher the number, the more reliable the contig is going to be. Then we can group contigs with similar abundances together, assuming that they come from the same bacterium. That's one kind of characteristic, and there are multiple others as well. Sometimes we combine all those characteristics to group the contigs together and see which of them come from which bacterium.
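Grouping contigs by abundance can be sketched with a single characteristic, the average read coverage of each contig: sort the contigs by coverage and start a new bin whenever coverage jumps by more than a relative threshold. The contig names, coverage values and the 30% threshold are made up; real binning tools combine coverage across samples with sequence composition, in line with the multiple characteristics mentioned above.

```python
def bin_by_coverage(contig_cov: dict[str, float], max_rel_jump: float = 0.3) -> list[list[str]]:
    """Group contigs with similar coverage, assuming similar coverage => same genome.

    Assumes all coverages are positive.
    """
    ordered = sorted(contig_cov, key=contig_cov.get)
    bins: list[list[str]] = []
    prev_cov = None
    for name in ordered:
        cov = contig_cov[name]
        # Start a new bin for the first contig or when coverage jumps too much.
        if prev_cov is None or (cov - prev_cov) / prev_cov > max_rel_jump:
            bins.append([])
        bins[-1].append(name)
        prev_cov = cov
    return bins

# Hypothetical contigs from two organisms at roughly 10x and 50x coverage.
coverage = {"c1": 10.2, "c2": 49.0, "c3": 9.7, "c4": 51.5, "c5": 11.0}
print(bin_by_coverage(coverage))  # [['c3', 'c1', 'c5'], ['c2', 'c4']]
```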

Krithika Arumugam:

In this case, the genome of the bacterium is still made up of multiple contigs; it's not one complete contig. That's what we are trying to achieve with long-read sequencing: one single continuous contig of, say, five megabases in length, which is approximately the size of a bacterial genome. With short reads you can still get five megabases, but as fragments, contigs that together sum up to five megabases. They can be a hundred kilobases or a megabase each, but everything together adds up. Since they are in multiple fragments, we might not know the ordering of the contigs, which contig is located at position one and so on, so it's difficult to piece these contigs together in the order in which they actually occur.

Krithika Arumugam:

So those are the limitations associated with short-read sequencing. It can be difficult to sequence or place complex regions, and there can be complex regions in the genome of any organism. If the reads are shorter, we might not have sequenced across those complex regions, so it's going to be difficult to piece them together. That's why we use long-read sequencing: to check whether we can get a continuous genome sequence instead of multiple fragments.
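Repeated sequences are a common example of such complex regions; that reading is our assumption, since the episode does not define the term. The toy check below shows the intuition: if a stretch occurs twice in a genome, a read lying entirely inside it cannot tell you which copy it came from, while a read that extends past the repeat into unique sequence on both sides can be placed. The genome and repeat are invented for illustration.

```python
def is_unique_in(read: str, genome: str) -> bool:
    """True if the read occurs exactly once in the genome (so its position is known)."""
    first = genome.find(read)
    return first != -1 and genome.find(read, first + 1) == -1

repeat = "ACGTACGGTT"  # hypothetical repeated region
genome = "TTGACC" + repeat + "GGATCA" + repeat + "CCTAGA"

short_read = repeat[2:8]              # lies entirely inside the repeat
long_read = "GACC" + repeat + "GGAT"  # spans the first copy plus unique flanks

print(is_unique_in(short_read, genome))  # False: matches both copies
print(is_unique_in(long_read, genome))   # True: matches once
```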

Krithika Arumugam:

That's how, in this paper, we were able to extract around twenty-two genomes that were complete, closed genomes, meaning each one is a single piece rather than multiple fragments.

Lefteris:

Right. That's one of the benefits of a face-to-face interview: I was able to see both the machines doing the sequencing and, most importantly, the files themselves. It is astonishing to see a text file that is gigabytes in size. In my complete ignorance of how Ms. Arumugam works, I asked whether life would be simpler for her if, instead of sequences of letters, she found a different way to visualize the data.

Krithika Arumugam:

It's difficult to visualize the data at the read level. But there are different ways to look at it, depending on what you're looking for in the data and on your research question. If you want to look at the taxonomy, the composition of species in the data, then we map the reads to existing databases. Depending on how many reads map to a certain species or a certain gene of a bacterium, if a large number of reads map to a certain bacterium, you can say a certain percentage of that bacterium is found in the sample. Something like that. That's how we were analyzing the data initially.
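A heavily simplified version of "map the reads to a database and count" is sketched below. Instead of a real aligner, it assigns each read to the reference sharing the most k-mers (substrings of length k) with it, leaves reads with no shared k-mer as unclassified, and reports the percentage of reads per reference. The two reference sequences, the names `species_A` and `species_B`, and k = 4 are invented; real tools use curated databases with longer k-mers or full alignment. Reads that match nothing, like the last one here, are the kind of data behind the possibility of new species mentioned at the end of the episode.

```python
from collections import Counter

def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def classify_reads(reads: list[str], references: dict[str, str], k: int = 4) -> Counter:
    """Assign each read to the reference sharing the most k-mers, else 'unclassified'."""
    ref_kmers = {name: kmers(seq, k) for name, seq in references.items()}
    counts: Counter = Counter()
    for read in reads:
        read_kmers = kmers(read, k)
        best_name, best_hits = "unclassified", 0
        for name, ks in ref_kmers.items():
            hits = len(read_kmers & ks)
            if hits > best_hits:
                best_name, best_hits = name, hits
        counts[best_name] += 1
    return counts

# Hypothetical "database" of two references and four reads.
references = {"species_A": "ATGCGTACGTTAGC", "species_B": "GGGCCCAAATTTGG"}
reads = ["ATGCGTAC", "GTTAGC", "GCCCAAAT", "CCCCCCCC"]
counts = classify_reads(reads, references)
for name, n in counts.most_common():
    print(f"{name}: {100 * n / len(reads):.0f}% of reads")
```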

Krithika Arumugam:

But as assembly algorithms developed, it became much more reliable to piece the reads together instead of just mapping them to a global database. We try to reconstruct the genomes, and that gets more interesting: we can actually know what kind of bacteria are in there and what they are doing, their functions. If you recover the genome, it's easier to understand how that particular bacterium works and how it's responsible for certain processes, that sort of thing.

Lefteris:

Puzzle solving of this magnitude doesn't happen on a local computer. These algorithms require a lot of computational power to give results in a relatively short time. But even then, the time is not as short as you might think.

Krithika Arumugam:

Sequencing, running the sequencing machine, depends on the throughput of the data; it can take a day or two. But how long processing the data takes depends on what you actually want to do. If you're doing a taxonomy analysis, it can take a few days. If you're running an assembly algorithm, there's also a limitation of the existing assembly algorithms: the data size keeps increasing, and it's difficult to keep up with that computationally. For example, we had been trying to assemble, I can't remember the exact number, but maybe around a billion reads. With our existing computational capacity, the metagenome assembly would take two or three months, I guess. It doesn't make sense to wait that long, so instead we try to compress the data, or subsample it randomly, in a way that lets you answer your questions sooner. So it depends on what you're actually looking for, or what kind of questions you're looking to answer.
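Random subsampling can be done in a single streaming pass by keeping each read independently with probability p. This sketch assumes the reads arrive as an iterable of records (for example from a FASTQ parser) and uses a fixed seed so the same subset can be regenerated; the function name, the 10% fraction and the one million stand-in records are illustrative choices, not the pipeline used in the episode.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def subsample(records: Iterable[T], fraction: float, seed: int = 0) -> Iterator[T]:
    """Keep each record independently with probability `fraction` (one pass, O(1) memory)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed => reproducible subset
    for record in records:
        if rng.random() < fraction:
            yield record

# Stand-in for a huge read set: one million read IDs, keeping roughly 10%.
kept = sum(1 for _ in subsample(range(1_000_000), fraction=0.10, seed=7))
print(kept)
```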

Krithika Arumugam:

In general, say you're generating one run of HiSeq. HiSeq is the kind of sequencing machine we use; short-read sequencing is mostly done on Illumina machines, and Illumina is the company name. One run of HiSeq generates approximately 600 million reads. If your community is complex, that is, if you think there are going to be many bacteria, hundreds of microbes in it, you have to sequence more: the sequencing depth has to be high. You don't know what kind of microbes are in there, and if certain microbes have a low abundance in the community and the sequencing depth is low, it's going to be difficult to recover the genomes of those low-abundance microbes. So the higher the sequencing depth, the better your chances of recovery.
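The link between sequencing depth and low-abundance microbes is easy to put into numbers. Assuming a genome's share of the reads equals its relative abundance in the community, its expected coverage is reads * abundance * read length / genome size. The sketch uses the roughly 600 million reads per HiSeq run and the 60 million reads per sample (ten samples per run) mentioned in this conversation; the 150 bp read length and 5 Mb genome size are assumptions. The results are about 180x, 18x and 1.8x; at 1.8x, under the usual random-placement model, roughly one base in six (e^(-1.8) ≈ 0.17) would not be covered at all.

```python
def expected_coverage(total_reads: int, rel_abundance: float,
                      read_len_bp: int, genome_size_bp: int) -> float:
    """Expected depth for one genome if its share of reads equals its relative abundance."""
    return total_reads * rel_abundance * read_len_bp / genome_size_bp

reads_per_sample = 600_000_000 // 10   # one run split across ten samples
for abundance in (0.10, 0.01, 0.001):  # 10%, 1% and 0.1% of the community
    # Assumed read length (150 bp) and genome size (5 Mb), for illustration only.
    cov = expected_coverage(reads_per_sample, abundance, read_len_bp=150, genome_size_bp=5_000_000)
    print(f"{abundance:.1%} abundance -> ~{cov:.1f}x coverage")
```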

Lefteris:

So

Krithika Arumugam:

Basically, right. So for example, one run of HiSeq generates around 600 million reads. You do initial quality checks of the raw data as well. Then, if you're doing a metagenome assembly, we usually split the data. There can be multiple samples sequenced in one run, so if there are ten samples, there might be around 60 million reads per sample, and you can assemble them sample by sample. That's going to be much faster. If you put all the samples together and assemble 600 million reads, you need more RAM, and that's going to take a few days, even a few weeks.
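The "initial quality checks" are not described in detail. One common check, assumed here, drops reads that are too short or whose average base quality is low. In FASTQ files from current Illumina instruments each quality character encodes a Phred score as its ASCII code minus 33, and a Phred score Q corresponds to an error probability of 10^(-Q/10). The thresholds (mean quality 20, i.e. 1% error per base, and 50 bases minimum) and the example reads are illustrative choices.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality]

def mean_quality(quality: str) -> float:
    scores = phred_scores(quality)
    return sum(scores) / len(scores)

def passes_qc(sequence: str, quality: str, min_mean_q: float = 20.0, min_len: int = 50) -> bool:
    """Keep a read only if it is long enough and its average quality is high enough.

    Thresholds are illustrative assumptions, not values from the episode.
    """
    return len(sequence) >= min_len and mean_quality(quality) >= min_mean_q

good = ("ACGT" * 20, "I" * 80)  # 'I' is Phred 40 (1 error in 10,000 bases)
bad = ("ACGT" * 20, "#" * 80)   # '#' is Phred 2 (very unreliable)
short = ("ACGT", "IIII")        # high quality but too short
print([passes_qc(seq, qual) for seq, qual in (good, bad, short)])  # [True, False, False]
```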

Krithika Arumugam:

Once you get the results from the assembly, if it's short-read sequencing, you're not going to get the complete sequence of the bacterium; it's going to be in fragments, even though you assembled them. So we use another technique called metagenomic binning. That's what I explained earlier: we group the contigs together based on their characteristics, and that process is called binning. Then there are other downstream analyses to evaluate whether the bin, all the contigs belonging to a particular genome, a particular bacterium, is complete. We have to analyze that as well, so there is further downstream processing too.
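The episode does not say how bin completeness is checked. A widely used approach, implemented for example in the tool CheckM, relies on marker genes expected to occur exactly once in nearly every genome of a lineage: the fraction of those markers found in a bin estimates completeness, and extra copies hint at contamination from another organism. The sketch is a simplified version of that idea; the marker names and counts are invented, and real tools organize markers into lineage-specific, co-located sets.

```python
def completeness_and_contamination(found: dict[str, int], expected: list[str]) -> tuple[float, float]:
    """Simplified estimate of bin quality from counts of single-copy marker genes.

    completeness  = % of expected markers seen at least once
    contamination = extra copies beyond the first, as % of the marker count
    """
    present = sum(1 for m in expected if found.get(m, 0) >= 1)
    extra_copies = sum(max(found.get(m, 0) - 1, 0) for m in expected)
    return 100 * present / len(expected), 100 * extra_copies / len(expected)

# Hypothetical set of 10 single-copy markers and the copies found in one bin.
expected_markers = [f"marker_{i}" for i in range(1, 11)]
bin_counts = {"marker_1": 1, "marker_2": 1, "marker_3": 2, "marker_4": 1,
              "marker_5": 1, "marker_6": 1, "marker_7": 1, "marker_8": 1, "marker_9": 0}
completeness, contamination = completeness_and_contamination(bin_counts, expected_markers)
print(f"completeness {completeness:.0f}%, contamination {contamination:.0f}%")
```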

Krithika Arumugam:

Yeah, I would say one to two months or so for one HiSeq run. Then you can interpret: okay, these are the kinds of bacteria in there, and we can check their functions, et cetera. So it depends on your research question. In terms of time taken, the initial processing can take a month.

Lefteris:

And that's it for another edition of Lefteris asks science. DNA sequencing is a big puzzle-solving exercise, which sounds really, really exciting. And imagine: sometimes, if you have enough sequence data that you can't match to anything in the database, you might discover a new kind of species, which is always exciting. I'd like to thank Krithika Arumugam for her time. In the description of the episode you'll find links to her bio and the work we were talking about. And thank you for sticking around until the end. In the show notes you will find ways you can support me in doing this. One especially good way to support me is just sharing the episode with a friend. I really appreciate it. Until we meet again, take care.
