Lefteris asks science - Edition 25 - DNA Sequencing and Genomes (w/ Krithika Arumugam)

Krithika Aramugam: 00:00:00

If you're sampling from an environment, there's going to be

Krithika Aramugam: 00:00:03

like, uh, hundreds of bacteria in it.

Krithika Aramugam: 00:00:05

So just imagine, uh, there's going to be like millions of jigsaw pieces and

Krithika Aramugam: 00:00:10

you don't know the original picture.

Krithika Aramugam: 00:00:12

Uh, it's going to be very tricky to sort of assemble them and picture

Krithika Aramugam: 00:00:17

what's exactly there in the community.

Lefteris: 00:00:23

Greetings, my good humans, and welcome to Lefteris Asks Science, edition

Lefteris: 00:00:26

number 25.

Lefteris: 00:00:28

I am

Lefteris: 00:00:28

Lefteris, the annoying guy that calls academics and scientists and asks them

Lefteris: 00:00:32

questions until I understand what, how and why they do what they do. This week,

Lefteris: 00:00:37

for the first time in more than a year, I met a person in three-dimensional space.

Lefteris: 00:00:42

I traveled all the way to Nanyang Technological University and met Ms.

Lefteris: 00:00:47

Krithika Arumugam from the Singapore Centre for Environmental Life

Lefteris: 00:00:50

Sciences Engineering, and she helped me understand what genomes

Lefteris: 00:00:54

are and what DNA sequencing is. Before we go on with the show, as always,

Lefteris: 00:00:59

we have some housekeeping.

Lefteris: 00:01:01

If you enjoy the show, please subscribe to it and share it with

Lefteris: 00:01:04

your friends that might like it too.

Lefteris: 00:01:07

Follow me on Twitter at Lefteris_asks.

Lefteris: 00:01:09

And we also now have an Instagram page under the same

Lefteris: 00:01:12

name, which is very exciting.

Lefteris: 00:01:15

Also, I just made a new Facebook group called Lefteris Asks Science.

Lefteris: 00:01:19

Come join the group to ask questions, find out about updates and much more.

Lefteris: 00:01:25

Lastly, I have a weekly newsletter where I share my favorite news from the

Lefteris: 00:01:29

world of science and academia, with small explanations and links to the research

Lefteris: 00:01:33

for anyone who wants to find out more.

Lefteris: 00:01:36

If you like that, go to the show notes and click that link

Lefteris: 00:01:38

to subscribe to the newsletter.

Lefteris: 00:01:40

Lastly, in the show notes, you'll find links through which you can support

Lefteris: 00:01:44

me in doing this by donating.

Lefteris: 00:01:46

Let's now meet Ms. Arumugam.

Krithika Aramugam: 00:01:48

I'm Krithika Arumugam.

Krithika Aramugam: 00:01:50

I'm a bioinformatician at the Singapore Centre for Environmental

Krithika Aramugam: 00:01:54

Life Sciences Engineering, which is a research centre of excellence

Krithika Aramugam: 00:01:59

located at Nanyang Technological University here in Singapore.

Krithika Aramugam: 00:02:03

So, um, I finished my undergraduate in computer science

Krithika Aramugam: 00:02:06

and engineering in India.

Krithika Aramugam: 00:02:08

And then, um, I moved to do my master's here in Singapore in

Krithika Aramugam: 00:02:12

bioinformatics, just to be at the center of an interdisciplinary field.

Lefteris: 00:02:19

Do not adjust your podcast sets.

Lefteris: 00:02:21

I did say in the beginning of the show, we're going to talk

Lefteris: 00:02:23

about genomes and DNA sequencing.

Lefteris: 00:02:26

Ms.

Lefteris: 00:02:26

Arumugam moved from a computer science background to bioinformatics.

Lefteris: 00:02:30

If you want to be a part of a multidisciplinary study, then you'll

Lefteris: 00:02:33

probably have to juggle a lot of different information about different fields.

Lefteris: 00:02:38

How was that experience for Ms.

Lefteris: 00:02:39

Arumugam?

Krithika Aramugam: 00:02:42

Yeah, I would say it was challenging initially.

Krithika Aramugam: 00:02:45

Um, well, um, I was completely new to all the biology concepts.

Krithika Aramugam: 00:02:52

I had some modules in biology in my high school.

Krithika Aramugam: 00:02:56

That was the last time I, you know, studied biology, but then after coming here, yeah,

Krithika Aramugam: 00:03:02

like, uh, yeah, initially it was a bit challenging jumping into a new topic.

Krithika Aramugam: 00:03:07

Uh, you'll get to know a lot of things.

Krithika Aramugam: 00:03:09

It was quite interesting.

Krithika Aramugam: 00:03:10

That's right.

Krithika Aramugam: 00:03:10

But, uh, like using your informatics skills to understand, you know, complex

Krithika Aramugam: 00:03:16

things in biology was very exciting.

Krithika Aramugam: 00:03:19

Yeah.

Lefteris: 00:03:22

It was always exciting for me to see unfamiliar terms, even from

Lefteris: 00:03:26

the title of the paper, while I might have seen the word genome before the

Lefteris: 00:03:31

word replicon did not sound familiar.

Lefteris: 00:03:34

Luckily enough,

Lefteris: 00:03:35

I have someone that works with both genomes and replicons to

Lefteris: 00:03:38

explain to me what is what.

Krithika Aramugam: 00:03:41

So, I guess, uh, genome must be a very familiar term.

Krithika Aramugam: 00:03:45

If not, it's, uh, it's basically the genetic material found in any living organism,

Krithika Aramugam: 00:03:51

like, uh, so which is actually the DNA, the DNA, deoxyribonucleic acid, I guess.

Krithika Aramugam: 00:03:56

Uh, you know, there is a sort of, uh, uh, Helix Bridge in Singapore,

Krithika Aramugam: 00:04:02

near Marina Bay Sands.

Krithika Aramugam: 00:04:03

So that's a DNA-inspired structure.

Krithika Aramugam: 00:04:06

So that's what DNA looks like.

Krithika Aramugam: 00:04:08

And, uh, so once you decode that for any particular organism, then,

Krithika Aramugam: 00:04:14

uh, that's actually called the genome.

Krithika Aramugam: 00:04:17

So, uh, which is actually responsible for, uh, all the functional aspects

Krithika Aramugam: 00:04:22

of a living organism, or how a living organism actually looks. In the case of humans,

Krithika Aramugam: 00:04:27

like how you look, uh, what your hair colour is.

Krithika Aramugam: 00:04:30

So everything is encoded in the DNA.

Krithika Aramugam: 00:04:33

Okay.

Krithika Aramugam: 00:04:33

So the genome consists of, you know, the chromosome, which is

Krithika Aramugam: 00:04:36

the primary genetic material.

Krithika Aramugam: 00:04:38

Um, and then there are other, uh, genetic materials as well, which

Krithika Aramugam: 00:04:44

we call non-chromosomal replicons, which are smaller in

Krithika Aramugam: 00:04:49

size compared to the chromosome, but, uh, they are still useful.

Krithika Aramugam: 00:04:54

And, uh, they have other, different functional aspects compared to the main

Krithika Aramugam: 00:04:58

chromosome, which is the primary replicon.

Lefteris: 00:05:03

As we learned, genomes are the instructions and rules that an organism

Lefteris: 00:05:06

follows to grow and develop.

Lefteris: 00:05:09

It's your DNA, or the RNA if you are a virus. One thing you might have heard about

Lefteris: 00:05:14

when it comes to DNA and its genomes is a term called sequencing. DNA consists

Lefteris: 00:05:21

of four small, different compounds: cytosine, guanine, adenine, and thymine.

Lefteris: 00:05:28

The long helical structure of DNA consists of different

Lefteris: 00:05:32

sequences of these four components.

Lefteris: 00:05:35

So sequencing is finding out what the structure of the DNA is when

Lefteris: 00:05:40

it comes to these four components.

Krithika Aramugam: 00:05:43

Sequencing is basically the process of,

Krithika Aramugam: 00:05:46

uh, reading the DNA and decoding

Krithika Aramugam: 00:05:49

what's actually in the DNA.

Krithika Aramugam: 00:05:50

So that's a technical term.

Krithika Aramugam: 00:05:52

We call it sequencing.

Krithika Aramugam: 00:05:54

What people normally do is, uh, they collect samples or, uh, if it's a single

Krithika Aramugam: 00:06:00

bacterium, they try to grow it in the lab and then, uh, sequence it,

Krithika Aramugam: 00:06:05

as in, meaning, uh, decoding what's actually present in the DNA to

Krithika Aramugam: 00:06:11

understand the function of that particular organism, in this case, bacteria.

Krithika Aramugam: 00:06:15

So there are different methods that do it.

Krithika Aramugam: 00:06:17

So once you collect the sample, you extract the DNA out of it.

Krithika Aramugam: 00:06:22

So this is all done in the lab.

Krithika Aramugam: 00:06:24

So the DNA is actually sheared. The process is,

Krithika Aramugam: 00:06:27

you start with shearing the DNA.

Krithika Aramugam: 00:06:28

That is, you cut the DNA down into fragments of different lengths, and then the machine

Krithika Aramugam: 00:06:34

tries to read, uh, each of the fragments.

Krithika Aramugam: 00:06:38

So.

Krithika Aramugam: 00:06:40

And the machine actually gives you an output in the form of a text file.

Krithika Aramugam: 00:06:44

So, okay.

Krithika Aramugam: 00:06:46

Text file,

Krithika Aramugam: 00:06:46

as in, you cannot open it with a normal text editor. It's going to be quite big;

Krithika Aramugam: 00:06:50

in terms of size, it can be as big as gigabytes or terabytes as well.

Krithika Aramugam: 00:06:55

So, I mean, that's when we step in, the computational people step in, because as

Krithika Aramugam: 00:06:59

the data gets bigger, you need like specialized people with different

Krithika Aramugam: 00:07:02

expertise to look at the data.

Krithika Aramugam: 00:07:04

Right.

Krithika Aramugam: 00:07:04

So, okay.

Krithika Aramugam: 00:07:05

So actually, the file,

Krithika Aramugam: 00:07:07

it's actually a text file, uh, made up of, uh, different characters encoded

Krithika Aramugam: 00:07:13

in the DNA, which actually translate to the compounds encoded in the DNA.

Krithika Aramugam: 00:07:18

So the DNA is actually made up of, uh, four compounds.

Krithika Aramugam: 00:07:22

We call them, uh, A, T, G and C, which are actually adenine,

Krithika Aramugam: 00:07:28

thymine, guanine and cytosine.

Krithika Aramugam: 00:07:30

So

Krithika Aramugam: 00:07:31

the short form is ATGC.

Krithika Aramugam: 00:07:33

So you have a text file with, you know, permutations and combinations

Krithika Aramugam: 00:07:37

of ATGC, which is actually the DNA being encoded and decoded.
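
As an illustration of the point that a read is just text made of the letters A, T, G and C, here is a minimal Python sketch. The function name `base_counts` and the example read are hypothetical and invented for this note; they are not taken from the work discussed in the episode.

```python
from collections import Counter

DNA_BASES = "ATGC"  # adenine, thymine, guanine, cytosine


def base_counts(read: str) -> dict[str, int]:
    """Count each of the four bases in one read and reject any other character."""
    read = read.upper()
    unknown = set(read) - set(DNA_BASES)
    if unknown:
        raise ValueError(f"unexpected characters: {sorted(unknown)}")
    counts = Counter(read)
    return {base: counts[base] for base in DNA_BASES}


example_read = "ATGCGGATTACA"  # made-up sequence, not real data
print(base_counts(example_read))
```

Real sequencer output carries extra information per read, so this only shows the idea of treating DNA as plain text.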

Krithika Aramugam: 00:07:42

Uh, uh, so since you have sheared the DNA before

Krithika Aramugam: 00:07:48

into multiple fragments, so

Krithika Aramugam: 00:07:52

the length of each DNA fragment is quite small, right?

Krithika Aramugam: 00:07:54

But the original size of the DNA, of the chromosome in the living

Krithika Aramugam: 00:07:58

organism, say, for example, a microbe, it's going to be like five megabases,

Krithika Aramugam: 00:08:03

but, uh, the technology only allows you to shear the DNA and then

Krithika Aramugam: 00:08:08

read it out in smaller fragments.

Krithika Aramugam: 00:08:11

So, okay.

Krithika Aramugam: 00:08:13

So since the fragment size is smaller, it becomes difficult to reconstruct the

Krithika Aramugam: 00:08:19

original chromosome of the bacteria.
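
The shearing step described above can be imitated on a toy scale. This is a minimal sketch, assuming reads are fixed-length substrings taken at random positions; the function `shear`, the tiny genome string and the parameter values are all hypothetical and chosen only for illustration.

```python
import random


def shear(genome: str, read_length: int, n_reads: int, seed: int = 0) -> list[str]:
    """Return n_reads substrings of read_length taken at random start positions."""
    if read_length > len(genome):
        raise ValueError("read_length must not exceed the genome length")
    rng = random.Random(seed)  # fixed seed so the toy example is repeatable
    last_start = len(genome) - read_length
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [genome[s:s + read_length] for s in starts]


toy_genome = "ATGCGTACGATTGCAAGCTT"  # made-up 20-base "chromosome"
for read in shear(toy_genome, read_length=5, n_reads=8):
    print(read)
```

The reads come out in no particular order and without their positions, which is why putting them back together resembles a jigsaw puzzle without the picture on the box.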

Krithika Aramugam: 00:08:24

Uh, okay.

Krithika Aramugam: 00:08:25

So an easier way to visualize this is, people usually compare it to a

Krithika Aramugam: 00:08:29

jigsaw puzzle, I'm sure you must have played it as a kid or, yeah.

Krithika Aramugam: 00:08:34

So, so you.

Krithika Aramugam: 00:08:36

Okay.

Krithika Aramugam: 00:08:36

In a jigsaw puzzle, you don't know what the picture is, but

Krithika Aramugam: 00:08:40

the picture is actually cut down into smaller fragments.

Krithika Aramugam: 00:08:43

So you try and piece them together to form the bigger picture.

Krithika Aramugam: 00:08:48

Right?

Krithika Aramugam: 00:08:49

So, okay.

Krithika Aramugam: 00:08:51

In this case, it's a bit tricky because if the jigsaw pieces are going to be

Krithika Aramugam: 00:08:56

smaller, there's going to be many pieces.

Krithika Aramugam: 00:08:59

So it's going to,

Krithika Aramugam: 00:09:00

If you don't know the original picture, it's going to be difficult to, you

Krithika Aramugam: 00:09:04

know, uh, put those pieces together and reconstruct the original picture.

Krithika Aramugam: 00:09:08

So in this case, we, uh, uh, you know, uh, are joining the small genome fragments,

Krithika Aramugam: 00:09:17

and there can be like millions of fragments.

Krithika Aramugam: 00:09:19

So putting them together and trying to find the original

Krithika Aramugam: 00:09:23

chromosome is always challenging.

Krithika Aramugam: 00:09:25

Right.

Krithika Aramugam: 00:09:25

So it's challenging even if it's a single bacterium.

Krithika Aramugam: 00:09:29

So for example, if you're sampling from an environment, there's going to

Krithika Aramugam: 00:09:33

be like hundreds of bacteria in it.

Krithika Aramugam: 00:09:35

So just imagine, uh, there's going to be like millions of jigsaw pieces and

Krithika Aramugam: 00:09:41

you don't know the original picture.

Krithika Aramugam: 00:09:43

Uh, it's going to be very tricky to sort of assemble them and picture

Krithika Aramugam: 00:09:48

what's exactly there, uh, in the community.

Lefteris: 00:09:53

I really love that

Lefteris: 00:09:53

puzzle example.

Lefteris: 00:09:54

So when scientists want to find out, for example, what types of

Lefteris: 00:09:58

organisms live in a lake, they take a sample and try to piece all of the

Lefteris: 00:10:03

fragments together to figure out what complete picture they make.

Lefteris: 00:10:07

It's like if we take

Lefteris: 00:10:09

ten 1,000-piece jigsaw puzzles, throw all of the pieces in the same box and

Lefteris: 00:10:14

try to create the 10 different images.

Lefteris: 00:10:16

It would take a lot of effort and time to achieve that.

Lefteris: 00:10:20

Now, one thing that would make things easier would be if the pieces

Lefteris: 00:10:23

were actually bigger and here is where the terms short read and

Lefteris: 00:10:28

long read sequencing will be used.

Krithika Aramugam: 00:10:31

Technology has advanced, uh, to an extent that, uh, we

Krithika Aramugam: 00:10:36

can increase the size of the pieces as in the DNA fragments, which are being read.

Krithika Aramugam: 00:10:41

So normally what we do is we do short read sequencing.

Krithika Aramugam: 00:10:45

So, uh, in short read sequencing, um, uh, the error rate is very negligible.

Krithika Aramugam: 00:10:52

So, uh, to date we still use short-read sequencing, but, uh, one of

Krithika Aramugam: 00:10:57

the limitations is the length of the read, or length of the DNA fragment;

Krithika Aramugam: 00:11:02

we call it a read in technical terms.

Krithika Aramugam: 00:11:04

So the length of the DNA fragment it can read,

Krithika Aramugam: 00:11:07

it can go up to 300 base pairs, or 300 characters if you could call it that way.

Krithika Aramugam: 00:11:13

So the technical term is base pair.

Krithika Aramugam: 00:11:15

So each A is a base and T is a base.

Krithika Aramugam: 00:11:18

So you can read up to 300 base pairs.

Krithika Aramugam: 00:11:23

Okay.

Krithika Aramugam: 00:11:23

Uh, the process of, uh, trying to put the genome fragments

Krithika Aramugam: 00:11:27

together is going to be tricky.

Krithika Aramugam: 00:11:29

So it's.

Krithika Aramugam: 00:11:30

It becomes easier if the genome fragment, you know, it can be, uh, if it's a

Krithika Aramugam: 00:11:35

bit longer. If the sequencing machine, you know, can, uh, read the

Krithika Aramugam: 00:11:40

DNA fragment, uh, up to a bigger length,

Krithika Aramugam: 00:11:43

it's going to be easier comparatively.

Krithika Aramugam: 00:11:46

When you have a bigger jigsaw piece, you know, it's going to get easier, right?

Krithika Aramugam: 00:11:51

So that's what long-read sequencing here actually means.

Lefteris: 00:11:56

Now.

Lefteris: 00:11:57

How do they actually do the assembly of the genome and how do they know

Lefteris: 00:12:00

that they got the correct picture?

Krithika Aramugam: 00:12:04

So, what the algorithm does is, uh, it takes the reads and tries to

Krithika Aramugam: 00:12:08

overlap them to produce a contiguous, uh, section of the genome.

Krithika Aramugam: 00:12:13

So if there's an overlap,

Krithika Aramugam: 00:12:15

the reads are going to be merged.

Krithika Aramugam: 00:12:16

So like I said, the read is composed of ATGC,

Krithika Aramugam: 00:12:19

right?

Krithika Aramugam: 00:12:20

So if there's an overlap with another fragment, those two are

Krithika Aramugam: 00:12:23

going to be merged, and the algorithm tries to

Krithika Aramugam: 00:12:26

produce extended fragments to try and make the reads a bit longer.
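
The overlap-and-merge idea can be sketched with a deliberately naive greedy approach. This is an illustration only, assuming error-free reads and no read contained inside another; production assemblers use far more elaborate methods, and the function names and toy reads here are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


toy_reads = ["ATGCGT", "GCGTAC", "GTACGA"]  # made-up overlapping reads
print(greedy_assemble(toy_reads))
```

With these three toy reads the merges chain into a single longer sequence. With repeats, sequencing errors or millions of reads, the simple pairwise comparison used here becomes both unreliable and far too slow.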

Krithika Aramugam: 00:12:31

So, so once, uh, the algorithm has, uh, processed the reads,

Krithika Aramugam: 00:12:39

what we get out of it is called contigs, uh, which are actually

Krithika Aramugam: 00:12:43

extended fragments of the reads.

Krithika Aramugam: 00:12:45

So it's just a different terminology.

Krithika Aramugam: 00:12:47

Sure.

Krithika Aramugam: 00:12:47

Uh, so we call them contigs because they are contiguous

Krithika Aramugam: 00:12:51

sequences. I guess it's a short form.

Krithika Aramugam: 00:12:53

Yeah.

Krithika Aramugam: 00:12:54

So.

Krithika Aramugam: 00:12:55

Okay.

Krithika Aramugam: 00:12:57

And then there are different techniques, uh, uh, to try and combine the contigs

Krithika Aramugam: 00:13:05

based on the, uh, uh, based on the characteristics of the contigs.

Krithika Aramugam: 00:13:13

As in, um, uh, how, uh, uh, there are different characteristics of the contigs.

Krithika Aramugam: 00:13:18

Like, you can take into account the abundance of the contigs, like how many

Krithika Aramugam: 00:13:22

reads were used to make that contig.

Krithika Aramugam: 00:13:25

So the higher the number, uh, the more reliable the contig is going to be.

Krithika Aramugam: 00:13:30

And then we can group contigs together with, uh, similar

Krithika Aramugam: 00:13:35

abundances, assuming that they are coming from the same bacteria.
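
Grouping contigs by similar abundance can be sketched as follows. This is a simplified illustration, assuming one abundance value per contig is already known and using a made-up tolerance rule; the function `bin_by_abundance`, the contig names and the numbers are hypothetical, and real binning tools combine several characteristics.

```python
def bin_by_abundance(coverage: dict[str, float], tolerance: float = 0.2) -> list[list[str]]:
    """Group contigs whose abundance lies within `tolerance` of the lowest value in a group."""
    bins: list[dict] = []
    for name, cov in sorted(coverage.items(), key=lambda item: item[1]):
        if bins and cov <= bins[-1]["anchor"] * (1 + tolerance):
            bins[-1]["members"].append(name)  # close enough: same group
        else:
            bins.append({"anchor": cov, "members": [name]})  # start a new group
    return [b["members"] for b in bins]


toy_coverage = {  # made-up abundance values per contig
    "contig_1": 50.0,
    "contig_2": 12.0,
    "contig_3": 48.0,
    "contig_4": 11.5,
    "contig_5": 52.0,
}
print(bin_by_abundance(toy_coverage))
```

In this toy case the contigs around 12 and the contigs around 50 end up in two separate groups, under the assumption stated in the interview that similar abundance suggests the same organism.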

Krithika Aramugam: 00:13:39

I should.

Krithika Aramugam: 00:13:39

Okay.

Krithika Aramugam: 00:13:40

So, I mean, this is one kind of characteristic.

Krithika Aramugam: 00:13:43

There are multiple characteristics as well.

Krithika Aramugam: 00:13:44

So sometimes we combine all those characteristics to try and group those

Krithika Aramugam: 00:13:48

contigs together to see, uh, you know, uh, which of these contigs belong

Krithika Aramugam: 00:13:55

to, or, uh, come from which bacteria.

Krithika Aramugam: 00:13:59

So, okay.

Krithika Aramugam: 00:14:00

In this case, uh, uh, the bacterial genome is still made up of multiple contigs.

Krithika Aramugam: 00:14:09

It's not one complete contig.

Krithika Aramugam: 00:14:11

So that's what we are trying to achieve with long-read sequencing.

Krithika Aramugam: 00:14:16

We are trying to see if we can acquire one single contig, like, uh,

Krithika Aramugam: 00:14:21

a continuous, uh, contig, uh, of say five megabase pairs in length, which

Krithika Aramugam: 00:14:27

is approximately the size of a bacterial genome.

Krithika Aramugam: 00:14:31

Yeah.

Krithika Aramugam: 00:14:31

So in the case of short reads, you can still get five megabase pairs, but they're

Krithika Aramugam: 00:14:35

going to be fragments, contigs, which are going to sum up to five.

Krithika Aramugam: 00:14:41

So they can be like a hundred kb, a hundred kb, or an Mb,

Krithika Aramugam: 00:14:46

so everything together.

Krithika Aramugam: 00:14:46

So they are still in multiple fragments, uh, because there are,

Krithika Aramugam: 00:14:50

uh, since they are in multiple fragments, uh, uh, we might not

Krithika Aramugam: 00:14:56

know the ordering of the contigs.

Krithika Aramugam: 00:14:59

You know, if a particular contig is going to, you know, be located

Krithika Aramugam: 00:15:04

at position one, and it's going to, uh, be difficult to piece these contigs

Krithika Aramugam: 00:15:08

together in the order in which they

Krithika Aramugam: 00:15:12

actually come.

Krithika Aramugam: 00:15:14

Yeah.

Krithika Aramugam: 00:15:14

So, so those are the limitations associated with short-read

Krithika Aramugam: 00:15:19

sequencing, and it can be difficult to sequence or, uh, place the complex regions.

Krithika Aramugam: 00:15:23

There can be complex regions in the genome of any organism.

Krithika Aramugam: 00:15:28

So

Krithika Aramugam: 00:15:29

if the reads are shorter, we might not have sequenced those complex regions.

Krithika Aramugam: 00:15:33

So it's going to be difficult to piece them together.

Krithika Aramugam: 00:15:36

So that's why we use long-read sequencing, uh, to actually, um, check

Krithika Aramugam: 00:15:42

if we can, uh, uh, get a continuous

Krithika Aramugam: 00:15:47

genome sequence instead of multiple fragments.

Krithika Aramugam: 00:15:50

Yeah.

Krithika Aramugam: 00:15:50

So that's why we were able to actually, uh, uh, see in this paper, like,

Krithika Aramugam: 00:15:56

uh, we were able to extract around [unclear] genomes, which were,

Krithika Aramugam: 00:16:01

um, like complete, closed genomes.

Krithika Aramugam: 00:16:04

Okay.

Krithika Aramugam: 00:16:05

Meaning it's in a single fragment, not fragmented, meaning it's

Krithika Aramugam: 00:16:10

a single contiguous sequence.

Lefteris: 00:16:12

Right.

Lefteris: 00:16:14

The benefits of actually having a face-to-face interview:

Lefteris: 00:16:17

I was able to see both the machines that were doing the sequencing, but

Lefteris: 00:16:20

also, most importantly, the files themselves. It is astonishing to see a text file that

Lefteris: 00:16:26

is gigabytes in size. In my complete ignorance of how Ms. Arumugam works,

Lefteris: 00:16:31

I asked if life would be simpler for her

Lefteris: 00:16:34

if, instead of sequences of letters, she would find a different

Lefteris: 00:16:37

way to visualize the data.

Krithika Aramugam: 00:16:40

So it's difficult to visualize

Krithika Aramugam: 00:16:43

the data at the read level.

Krithika Aramugam: 00:16:44

Yeah, but we, uh, well, uh, at the read level, but, um, there are different ways you

Krithika Aramugam: 00:16:51

can, depending upon what you are looking for in the data,

Krithika Aramugam: 00:16:56

depending upon your research question.

Krithika Aramugam: 00:16:58

So if you want to look at,

Krithika Aramugam: 00:17:02

uh, the taxonomy, the taxonomic content, the species in the data,

Krithika Aramugam: 00:17:07

then, uh, uh, we try and map the reads to, uh, existing databases.

Krithika Aramugam: 00:17:15

So, uh, you, uh, know how many reads mapped to a selected species or

Krithika Aramugam: 00:17:23

a certain genus of bacteria.

Krithika Aramugam: 00:17:25

And if the number of reads mapping to a certain bacterium is higher, then we can,

Krithika Aramugam: 00:17:31

you know, say a certain, certain percentage of that bacterium

Krithika Aramugam: 00:17:34

is found in that sample.
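
The read-counting idea described here can be sketched as follows. This assumes the mapping step has already been done by some other tool and that each read comes with a single label; the function `relative_abundance` and the labels such as `species_a` are hypothetical placeholders.

```python
from collections import Counter


def relative_abundance(assignments: list[str]) -> dict[str, float]:
    """Turn one label per mapped read into a percentage per label."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


# Made-up labels: 6 reads hit species_a, 3 hit species_b, 1 matched nothing.
toy_assignments = ["species_a"] * 6 + ["species_b"] * 3 + ["unmapped"]
for label, percent in relative_abundance(toy_assignments).items():
    print(f"{label}: {percent:.1f}%")
```

Reads that match nothing in the database stay in the `unmapped` group, which is one reason the assembly approach mentioned next can reveal more than database mapping alone.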

Krithika Aramugam: 00:17:37

So something like that, but, um, uh, that, that's how we were

Krithika Aramugam: 00:17:42

analyzing the data initially.

Krithika Aramugam: 00:17:44

But then as the, uh, you know, uh, assembly algorithms started

Krithika Aramugam: 00:17:50

developing, it's, uh, it becomes much more reliable when you try and piece

Krithika Aramugam: 00:17:55

those things together instead of just mapping to a global database.

Krithika Aramugam: 00:17:59

We try and reconstruct the genome.

Krithika Aramugam: 00:18:02

So that gets more interesting.

Krithika Aramugam: 00:18:05

And we can actually know what kind of bacteria are actually in

Krithika Aramugam: 00:18:08

there and what they are doing.

Krithika Aramugam: 00:18:10

Yeah, yeah, yeah.

Krithika Aramugam: 00:18:13

The functions of it.

Krithika Aramugam: 00:18:14

So if you try and, like, recover the genome, then it's easier to understand

Krithika Aramugam: 00:18:18

how that particular bacterium works,

Krithika Aramugam: 00:18:23

uh, how it's responsible for certain things, certain

Krithika Aramugam: 00:18:26

processes and that sort of stuff.

Lefteris: 00:18:33

Puzzle solving in this magnitude doesn't happen on a local computer level.

Lefteris: 00:18:37

These algorithms require a lot of computational power in order to give

Lefteris: 00:18:41

results in a relatively short time.

Lefteris: 00:18:44

But even then the time is not as short as you think.

Krithika Aramugam: 00:18:49

Sequencing, processing on the sequencing machine, depends

Krithika Aramugam: 00:18:52

upon, uh, the throughput of the data.

Krithika Aramugam: 00:18:55

It can take a day or two or so to sequence it, but when you're processing it,

Krithika Aramugam: 00:19:01

processing the data, uh, it depends on

Krithika Aramugam: 00:19:05

what you actually want to do.

Krithika Aramugam: 00:19:07

So if you're doing a taxonomy analysis, uh, uh, it can take a few

Krithika Aramugam: 00:19:16

days, uh, but if you're doing an assembly algorithm, so, uh, I mean,

Krithika Aramugam: 00:19:22

there's also a sort of a limitation to the existing assembly algorithms.

Krithika Aramugam: 00:19:26

So, uh, so the data size keeps increasing, but, uh, you know, it's difficult to

Krithika Aramugam: 00:19:32

catch up computationally with the development

Krithika Aramugam: 00:19:35

as well.

Krithika Aramugam: 00:19:35

So, so for example, we had been trying to assemble, uh, um, uh, I can't remember

Krithika Aramugam: 00:19:47

the exact number, but maybe around

Krithika Aramugam: 00:19:50

a billion reads.

Krithika Aramugam: 00:19:53

So, um, with the existing capacity, the computational capacity we have,

Krithika Aramugam: 00:19:58

the metagenome assembly, uh, uh, took two or three months, I guess.

Krithika Aramugam: 00:20:07

So, I mean, it doesn't make sense to wait for that long.

Krithika Aramugam: 00:20:11

So instead we try and

Krithika Aramugam: 00:20:13

sort of compress the data or subsample it randomly in a way that, you know,

Krithika Aramugam: 00:20:18

you could answer your questions sooner.
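
Random subsampling as mentioned above can be sketched in a few lines. This is a minimal illustration that assumes the reads fit in memory as a Python list, which would not hold for files of gigabytes; the function `subsample` and its parameters are hypothetical.

```python
import random


def subsample(reads: list[str], fraction: float, seed: int = 42) -> list[str]:
    """Return a random subset containing roughly `fraction` of the reads."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    if not reads:
        return []
    k = max(1, round(len(reads) * fraction))
    rng = random.Random(seed)  # fixed seed so the same subset can be reproduced
    return rng.sample(reads, k)


toy_reads = [f"read_{i}" for i in range(1000)]  # stand-ins for real sequences
subset = subsample(toy_reads, fraction=0.1)
print(len(subset))
```

Keeping the seed fixed is a design choice for reproducibility: anyone rerunning the analysis works with the same subset.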

Krithika Aramugam: 00:20:20

Yeah.

Krithika Aramugam: 00:20:21

So, yeah.

Krithika Aramugam: 00:20:22

So it depends on what you actually are looking for or what kind of

Krithika Aramugam: 00:20:25

questions you're looking to answer.

Krithika Aramugam: 00:20:28

So.

Krithika Aramugam: 00:20:29

Uh, so usually, uh, the, in general, we, if you're generating say around,

Krithika Aramugam: 00:20:37

uh, one run of HiSeq, HiSeq is the sequencing machine, the kind of

Krithika Aramugam: 00:20:43

sequencing machine. Short-read sequencing is mostly done on

Krithika Aramugam: 00:20:47

Illumina; Illumina is the company name.

Krithika Aramugam: 00:20:50

So the kind of machine we use is HiSeq.

Krithika Aramugam: 00:20:53

So HiSeq generates around, one run of HiSeq generates around, um, uh,

Krithika Aramugam: 00:20:58

600, approximately 600 million reads.

Krithika Aramugam: 00:21:02

So if your community is going to be complex, that is, if

Krithika Aramugam: 00:21:09

you think there's going to be like hundreds of bacteria or

Krithika Aramugam: 00:21:11

hundreds of microbes in it,

Krithika Aramugam: 00:21:14

you have to sequence more.

Krithika Aramugam: 00:21:15

Yeah.

Krithika Aramugam: 00:21:16

Only then, you know, the sequencing depth has to be high.

Krithika Aramugam: 00:21:19

You do need that

Krithika Aramugam: 00:21:19

to know what kind of microbes are in there.

Krithika Aramugam: 00:21:23

And if, uh, if the abundance of certain microbes is

Krithika Aramugam: 00:21:27

low in the community,

Krithika Aramugam: 00:21:30

it's going to be difficult. If the sequencing depth is low,

Krithika Aramugam: 00:21:34

it's going to be difficult to recover genomes of microbes of lower abundance.

Krithika Aramugam: 00:21:41

So the higher the sequencing depth, the better your chances of recovery.
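
The link between sequencing depth and rare microbes can be made concrete with a back-of-the-envelope estimate. This sketch assumes reads are shared out in proportion to each organism's abundance; the 150-base read length and the abundance values are assumptions for illustration, while the 600 million reads and the five-megabase genome echo figures mentioned in the interview.

```python
def expected_depth(total_reads: int, read_length: int,
                   relative_abundance: float, genome_size: int) -> float:
    """Average number of times each base of one genome is expected to be read."""
    bases_from_organism = total_reads * read_length * relative_abundance
    return bases_from_organism / genome_size


run_reads = 600_000_000   # roughly one run, as quoted in the interview
read_length = 150         # assumed read length in bases
genome_size = 5_000_000   # about five megabases, a typical bacterial genome

for abundance in (0.01, 0.001, 0.00001):  # hypothetical shares of the community
    depth = expected_depth(run_reads, read_length, abundance, genome_size)
    print(f"abundance {abundance:.5f}: about {depth:.2f}x depth")
```

Under these assumptions an organism making up 0.1% of the community is read about 18 times over, while one at 0.001% is covered less than once on average, which is too little to reconstruct its genome.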

Krithika Aramugam: 00:21:47

Yeah.

Lefteris: 00:21:48

So

Krithika Aramugam: 00:21:49

basically, right.

Krithika Aramugam: 00:21:51

So yeah, so for, uh, for example, one run of HiSeq generates

Krithika Aramugam: 00:21:56

around 600 million reads.

Krithika Aramugam: 00:21:59

So I, uh, yeah, so, so you do initial quality checks as well on the raw data.

Krithika Aramugam: 00:22:07

And then if you're doing a metagenome assembly, um, we

Krithika Aramugam: 00:22:12

usually, uh, split the data.

Krithika Aramugam: 00:22:15

I mean, so for example, you can, okay.

Krithika Aramugam: 00:22:17

So

Krithika Aramugam: 00:22:19

uh, there can be multiple samples sequenced in one run.

Krithika Aramugam: 00:22:25

So each of those, uh, samples might have like, say, uh, if there

Krithika Aramugam: 00:22:31

are 10 samples, then there might be around 60 million reads per sample.

Krithika Aramugam: 00:22:37

So you can assemble them sample wise as well.

Krithika Aramugam: 00:22:40

So that's going to be much faster in terms of time.

Krithika Aramugam: 00:22:44

Sure.

Krithika Aramugam: 00:22:44

So if you're putting all the samples together and assembling

Krithika Aramugam: 00:22:47

600 million reads, that's going to

Krithika Aramugam: 00:22:50

need more RAM.

Krithika Aramugam: 00:22:53

That's going to take like a few days, a few weeks.

Krithika Aramugam: 00:22:56

Sure.

Krithika Aramugam: 00:22:56

So, and then once you get the results from the assembly, uh, so, I mean, if

Krithika Aramugam: 00:23:04

it's short-read sequencing, you're not going to get the complete sequence of

Krithika Aramugam: 00:23:08

the bacteria; it's going to be in fragments,

Krithika Aramugam: 00:23:09

right,

Krithika Aramugam: 00:23:09

even though you assembled them.

Krithika Aramugam: 00:23:11

So we use other techniques called metagenome binning.

Krithika Aramugam: 00:23:15

Uh, that's what, uh, I explained earlier: we try and group those contigs

Krithika Aramugam: 00:23:19

together based on their characteristics.

Krithika Aramugam: 00:23:21

So that process is actually called metagenome binning.

Krithika Aramugam: 00:23:24

So there are other downstream analyses as well to evaluate, uh, if the bin, or all

Krithika Aramugam: 00:23:31

those contigs belonging to a particular genome, a particular bacterium, if they are

Krithika Aramugam: 00:23:36

complete. We have to analyze that as well.

Krithika Aramugam: 00:23:39

So there is other downstream processing as well.

Krithika Aramugam: 00:23:41

So yeah.

Krithika Aramugam: 00:23:43

Yeah, I could say one to two months or something for, yeah,

Krithika Aramugam: 00:23:51

for one run of HiSeq, and then, and then you can interpret, so, okay,

Krithika Aramugam: 00:23:56

these are the kinds of bacteria in there. So

Krithika Aramugam: 00:23:59

we can then check the functions of it, et cetera.

Krithika Aramugam: 00:24:01

So it depends on your research question.

Krithika Aramugam: 00:24:03

Yeah.

Krithika Aramugam: 00:24:03

Time taken.

Krithika Aramugam: 00:24:04

So the initial processing can take a month.

Krithika Aramugam: 00:24:07

Yeah, yeah, yeah.

Lefteris: 00:24:14

And that's it for another edition of Lefteris Asks

Lefteris: 00:24:17

Science. DNA sequencing is a big puzzle-solving exercise, which

Lefteris: 00:24:21

sounds really, really exciting.

Lefteris: 00:24:23

And imagine that sometimes if you have enough results of a sequence that

Lefteris: 00:24:27

you can't match to anything in the database, you might discover a new kind

Lefteris: 00:24:30

of species, which is always exciting.

Lefteris: 00:24:34

I'd like to thank Krithika Arumugam for her time. In the description of the

Lefteris: 00:24:37

episode, you'll find links for her bio and the work we were talking about.

Lefteris: 00:24:42

And thank you for sticking around until the end. In the show notes,

Lefteris: 00:24:45

you will find ways that you can support me in doing this.

Lefteris: 00:24:48

One easy way you can support me is just sharing

Lefteris: 00:24:50

the episode with a friend.

Lefteris: 00:24:52

I really appreciate it. Until we meet again, take care.
