Today, we discuss the advancements in optical storage technology and its implications for data archival. Our guest, Greg Kittilson, VP of Engineering at Folio Photonics, shares insights on how their innovative approach to optical storage offers multiple layers of data storage, significantly enhancing capacity. Unlike traditional methods, this technology can provide long-lasting and environmentally friendly solutions for data archiving. We explore the importance of understanding data accessibility and the cost implications of different storage solutions. Join us as we delve into the intersection of technology, sustainability, and software architecture.
Software Architecture Insights presents a compelling dialogue with Greg Kittilson, the VP of Engineering at Folio Photonics. We delve into the evolution of data archiving and the pivotal role of optical storage technology in modern software architecture. Unlike traditional methods, Folio Photonics is advancing optical storage to offer sustainable and durable solutions for data archiving. Greg explains how their innovative approach overcomes the limitations of conventional optical storage, such as Blu-ray technology, by enabling significantly more data layers on a single disk through advanced polymer co-extrusion techniques. This advancement allows for larger storage capacities while maintaining lower costs, making it a viable option for enterprise-scale archival systems.
As the conversation unfolds, we explore the implications of data storage in the context of environmental sustainability. Greg emphasizes the importance of reducing energy consumption in data centers and how optical storage can alleviate some of the environmental impacts associated with traditional storage solutions. We discuss the staggering amount of data generated annually, estimated at around 200 zettabytes, and how much of this data remains archived rather than actively used. This leads us to examine the different classes of data—active, near-archive, and cold archive—and how software architects can make informed decisions about data storage strategies based on access speed and cost considerations.
The episode further highlights the potential for optical storage to contribute to a greener future. With minimal power requirements and fewer environmental controls needed compared to hard drives and tapes, optical storage emerges as a strong contender in the quest for sustainable data solutions. We conclude with insights into how software architects can leverage these advancements in optical storage technology, ensuring that they not only meet current data demands but also anticipate future needs in an increasingly data-driven world.
Mentioned in this episode:
How do you operate a modern organization at scale?
Read more in my O'Reilly Media book "Architecting for Scale", now in its second edition. http://architectingforscale.com
What do 160,000 of your peers have in common?
They've all boosted their skills and career prospects by taking one of my courses. Go to atchisonacademy.com.
Speaker A:Hello and welcome to Software Architecture Insights, your go-to resource for empowering software architects and aspiring professionals with the knowledge and tools they require to navigate the complex landscape of modern software design.
Speaker A:My guest today is Greg Kittilson.
Speaker A:Greg is VP of Engineering at Folio Photonics, a company building better optical storage technology in order to create enterprise-scale, immutable active archive systems that provide sustainability at their core.
Speaker A:Greg, welcome to Software Architecture Insights.
Speaker B:Thanks Lee.
Speaker B:Happy to be here.
Speaker A:So data archive, it's not a new concept. Obviously it's been around really since data's been around, right?
Speaker A:And optical storage isn't new either, but what you're doing is new.
Speaker A:So what's different about what you're doing compared to traditional archival storage and traditional optical storage?
Speaker B:I recently was asking myself what's the impact of going to optical storage and why would anyone really be interested in it?
Speaker B:And the big thing with optical storage I think is that it's kind of a forgotten technology.
Speaker B:We've had CD ROMs that have worked for us for many, many years.
Speaker B:Our music has been preserved for many, many years.
Speaker B:It's a technology that's been very solid for us for many years and it actually had been, I would say, expanded into enterprise solutions by Sony a few years back with their Blu Ray and archive technology.
Speaker B:The difference that really happened was that we continued to make areal density gains with hard drives and with tape, but we saw a flattening out of Blu-ray technology, and the optical technology kind of, I would say, plateaued at around, oh, three layers of data that could really be stored on that optical disk.
Speaker B:Well, this is where Folio Photonics comes in.
Speaker B:Born out of Case Western, we had a really clever idea and this idea has been around for a while.
Speaker B:But productizing is a challenge because of the nuances that are necessary in order to maintain certain tolerances for manufacturing and productization.
Speaker B:But what's neat about it is that it overcomes the number of layers that the Blu Ray had.
Speaker B:Blu Ray has only got up to three because of their manufacturing tolerances.
Speaker B:Well, Folio approaches this with film.
Speaker B:We do it with film that's stacked in a co-extrusion technology using polymers.
Speaker B:We put dye into one of the polymers as an active layer where we store the data.
Speaker B:And the technology has been demonstrated across the industry to provide multiple layers of films up to hundreds of layers.
Speaker B:Now there's tricks to making it work for storage, but the potential to keep a low cost storage media is there.
Speaker B:And so that's what makes it really enticing I think, for the industry.
Speaker A:So basically for the non physicists who are listening in here, basically what this means is more layers on the same disk and more layers means more capacity, Correct?
Speaker B:Exactly.
Speaker B:And since it's optical, it has the beauty of being able to be removed and stored separately, and a power reduction for you, if you will.
Speaker B:So it's a great media for that.
Speaker A:Yeah, we're going to spend a fair amount of time talking about sustainability and climate control, those sorts of things.
Speaker A:So we're going to get into that.
Speaker A:But I absolutely love that we've had some great conversations, you and I, about this in the past.
Speaker A:So I want to definitely make sure we talk about that in the podcast.
Speaker A:But let's, let's stay on just the data archival industry for a little bit before we get too deep into that.
Speaker A:How much data is there in the world?
Speaker B:Oh boy, that's a great question.
Speaker B:I think right now about 181 zettabytes of data per year.
Speaker A:Pretty near 200 a year.
Speaker B:Yeah, I think so.
Speaker B:That's probably, by this year, 200 zettabytes.
Speaker A:Yeah, we're talking probably a thousand, ten thousand zettabytes.
Speaker A:What's above a zettabyte, by the way?
Speaker A:I'm trying to... do you know offhand? I don't know. Let's say 10,000 zettabytes.
Speaker A:We'll just leave it at that.
Speaker A:Probably 10, maybe 100,000 zettabytes total data in the world is what we're probably talking about.
Speaker A:Okay, how much, what percent of that?
Speaker A:And this is.
Speaker A:I know, I'm just asking for a guess from your standpoint now, but what percentage of that is archival versus actively engaged data?
Speaker B:You know the estimate, and I've seen a number of different studies on this, but it's probably around 80%, I would say, is archival or cold storage.
Speaker B:And when we refer to that, that's, that's data.
Speaker B:Let's say, let's say, Lee, you took a picture, I don't know, three months ago, you send it to your friends.
Speaker B:Well, now you put it away, it's away and you don't go look at it again until maybe, maybe you access it a year later.
Speaker B:You know, maybe you access it in 10 years. You have them, it's archived, and you know it's safe.
Speaker B:Same with some of our older medical records.
Speaker B:They don't change, they stay the same.
Speaker B:So that's when we talk about that kind of storage.
Speaker B:And most of us, most of our data is like that.
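To put the figures from this exchange side by side, here is a minimal arithmetic sketch. It assumes the roughly 80% archival share and the 181 to 200 zettabytes-per-year figures quoted above, all of which are conversational estimates rather than measured values; the function name is made up for illustration.

```python
def archival_zettabytes(total_zettabytes: float, archival_fraction: float = 0.80) -> float:
    """Portion of a yearly data volume that ends up as archival or cold data.

    The default fraction of 0.80 is the rough estimate quoted in the
    conversation, not a measured figure.
    """
    if not 0.0 <= archival_fraction <= 1.0:
        raise ValueError("archival_fraction must be between 0 and 1")
    return total_zettabytes * archival_fraction


if __name__ == "__main__":
    for yearly_total in (181, 200):
        archived = archival_zettabytes(yearly_total)
        print(f"{yearly_total} ZB/year -> about {archived:.0f} ZB archival")
```

Under those estimates, something on the order of 145 to 160 zettabytes of each year's data would fall into the archival category.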
Speaker A:So that's Actually an interesting aspect because there's what I would consider three types of data.
Speaker A:And maybe you've got a better division here than this, but let's just go with this first and you can correct me or make it better later.
Speaker A:But there's active data.
Speaker A:This is typical database data that you're actively changing, doing something with.
Speaker A:You're writing documents, you're doing all that sort of stuff.
Speaker A:That's on one end of the extreme.
Speaker A:At the other end of the extreme is what I would call true archival.
Speaker A:This is like we made a backup six years ago and we don't want to delete it, we want to leave it around.
Speaker A:No one's ever going to look at it.
Speaker A:It's sitting there.
Speaker A:We need it for compliance reasons or just to let us sleep reasons or whatever it is.
Speaker A:We've got that line there.
Speaker A:Then there's this middle ground, which is what you were alluding to, which is data that we're not actively using.
Speaker A:And as such it can be a little bit slower to access, but we want to be able to get to it, and so we want it to be available to us.
Speaker A:So you want that.
Speaker A:You want your wedding pictures available, quote unquote, on your phone at a moment's notice.
Speaker A:But that doesn't mean they have to be on your phone.
Speaker A:They can be in the cloud somewhere, on a drive.
Speaker A:As long as it's available and you can get to it within a few seconds, that's probably good enough.
Speaker A:Those are kind of like three different classes of data.
Speaker A:Is that a good separation?
Speaker A:And am I missing anything in that categorization?
Speaker B:You know, there's probably.
Speaker B:There's probably one that's above the one that you said.
Speaker B:That's the database.
Speaker B:There's one very fast access, as we're seeing with AI, perhaps, you know, something that's got to be immediately there for compute to turn around as quickly as it can.
Speaker B:But I agree with you, there's that active archive and then there's the cold, cold archive, where you can, you know, you're going to put it away in, like, older government records, as an example, property records, things like that.
Speaker B:You have a lot of that, but you may need to access that again.
Speaker B:And so you want to have that in either that, you know, kind of near, I guess it's this active archive tier or the really cold tier.
Speaker B:And sometimes that's a little difficult to make the decision of exactly where it goes.
Speaker B:But I think certainly with, like I said, medical records, government records, those kind of things, you would identify cold.
Speaker A:So let's talk about the disk technology itself.
Speaker A:So optical disc technology in general, and specifically your products.
Speaker A:I can easily see how it's good for cold storage because it's a write once, read many for data that never changes, and you can take it out of a drive and set it on a rack.
Speaker A:I guess I'm thinking in my mind like the Blu Ray player I have in my home, which is not what we're talking about here, but it's literally offline.
Speaker A:It's available, but if I want to play a Blu Ray, I have to go get it, put it in my drive and do something with it.
Speaker A:Now what you're talking about though, isn't that, is that correct?
Speaker A:It's more like a hard drive?
Speaker A:No, it is from that standpoint.
Speaker B:Yeah.
Speaker B:Yeah.
Speaker B:So obviously, you know, tape has large libraries.
Speaker B:And Sony many, many years ago created an optical system that was actually similar to tape, where we would sit there and shuffle cartridges of optical disks online for people to access in data centers.
Speaker A:A robot would.
Speaker B:So that technology is disks.
Speaker B:Exactly.
Speaker B:Just like tape.
Speaker B:Yep, exactly.
Speaker B:So they would go seek it.
Speaker B:And, and we have, in our, in our company, we actually have one of those boxes, not a Sony, but a similar system where we are working with that to make sure that our technology, our drives will match and mate into that system.
Speaker B:So we call it a library system.
Speaker B:And you can get racks of these library systems just like you can do a rack of hard drives, just like a rack of tapes.
Speaker B:So the technology sits there.
Speaker B:And the nice thing about this technology versus the hard drive is that it is, you know, the power is not being pulled by the drive to keep them spinning.
Speaker B:I know people could say they spin down hard drives, spin them up, but there's a lot of data that's being stored on those hard drives. Here you can go access the disk.
Speaker B:You get the random access benefits with those optical disks that you do with the hard drives.
Speaker B:That gives it a better advantage over tape, where tape has a much slower access time.
Speaker B:If you're going to look for random data, and a lot of our data nowadays, you know, like I mentioned the picture, there's probably no relationship to another picture that you took or another picture that you took.
Speaker B:It's small, more random-access-type data than what you would get from a tape, where you would have probably streams and streams of backup data like you were mentioning.
Speaker B:You know, if you're going to take your entire company's data set and just say, I'm going to back it up at this point, you'd want to do that with a tape.
Speaker B:You could also do it on optical.
Speaker B:The nice thing about optical is again, tape is magnetic.
Speaker B:Optical has been demonstrated to be stored for 50 years.
Speaker B:So there is that capability that you get in optical versus what you get from tape and hard drives.
Speaker A:That starts getting into the environmental, which maybe this is a good time to start talking about that.
Speaker A:The environmental needs of hard drives versus tape versus optical are very different.
Speaker A:Hard drives need active data centers in order to operate, right?
Speaker A:You have to have them cooled, powered just like you do in the rest of your data center.
Speaker A:It's maybe not as much of a cooling requirement as it is for your CPUs, but it still is a cooling requirement necessary just to keep them up and operating.
Speaker A:Tape doesn't really have as much of that.
Speaker A:I mean, again, the library system allows you to take the tape and store it off and separate, and there's no energy consumption that's going on with the tape.
Speaker A:Yet there is still environmental concerns with the tape.
Speaker A:If you store them too hot or too cold, the tape degrades quicker.
Speaker A:You have to climate control the environment in order to keep the tapes to allow them to last as long as you want them to last.
Speaker A:For optical, does it have the same sort of restrictions?
Speaker B:No, that's a good point, Lee.
Speaker B:The optical, you know, as far as the storage capabilities, back to your Blu Ray comment, I'm sure you have abused your Blu rays occasionally and left them in spots.
Speaker B:I know I do with my Sony PlayStation games.
Speaker B:I leave them in spots.
Speaker B:I probably shouldn't leave them, but they have been demonstrated even with scratches.
Speaker B:And this gets back to the error correction coding schemes that have been employed to deal with the noise and the defects.
Speaker B:Optical is much different than that.
Speaker B:You can take it offline and you can store it and put it into a different environment.
Speaker B:You wouldn't want to, you know, it's probably not the best, but you can certainly put it even into a wet environment, come back later, wipe it off, clear it off and put it right back in.
Speaker B:But from a data center perspective, if you start thinking about it, you certainly have to climate control. Now, I think a few years ago we talked about these air-cooled environments where we're trying to keep massive heat and cooling systems out of the data center.
Speaker B:I think it was Google who was experimenting with this and we also heard about the Microsoft one where they took the data center and put it in the ocean to keep it cool.
Speaker B:But again, the cooling isn't necessary, I would say.
Speaker B:The environmental demands, as you mentioned, are not as stringent with the optical as they are with the tape or with the hard drives.
Speaker A:So what are some of the benefits of the optical technology's ability to be in a more diverse environmental area?
Speaker A:Besides the cost benefit of not having to run cooling in a data center environment, what other benefits can you get from the environmental constraints with optical?
Speaker B:Well, I think the other thing you can think about it is you could actually take the, you could take it offline and store it differently.
Speaker B:You don't have to leave it online in your data center.
Speaker B:This is back to, you know, kind of your conversation on the active archive and the cold archive.
Speaker B:You know, you can certainly take it and put it into an environment where, you know, maybe it's not sitting in your, you know, hot, cold rows.
Speaker B:You can put it offline, take it out of that environment and store it in an extra secure area, if you will.
Speaker B:It does give you kind of a nice air gap.
Speaker B:So it does give you additional security because you're not going to be able to hack it if it's not online.
Speaker B:So there is that potential sitting there.
Speaker A:It saves some money and also increased security.
Speaker A:I mean, we talk about loss of a data center from a fire or other catastrophe and the ability to recover from a catastrophe.
Speaker A:Those sorts of backup requirements require that your data be off site.
Speaker A:And off site means out of your data center.
Speaker A:You can do off site into another data center.
Speaker A:But, but with optical, off site can literally be anywhere.
Speaker B:Yes, very good point.
Speaker B:I do remember many years ago when we were doing development on, I forget which company it was, but I was developing on hard drives.
Speaker B:We had to make three copies of our code and one of them was to go off site into a fireproof vault on a 3 1/2 inch floppy.
Speaker B:So you would take that and you put it off site somewhere to make sure that the code was preserved in case you did have a problem.
Speaker B:Even though the code existed on the server, existed on a floppy, we had it offsite and secure.
Speaker A:And there's nothing environmental about a fireproof vault.
Speaker B:Yeah, because if it got a little too hot on the fireproof vault, you know, fire, it would probably melt.
Speaker B:Right.
Speaker A:So, so that brings us into the longevity aspect.
Speaker A:So, from a longevity standpoint, optical is much better than tape and it's much better than drives as well.
Speaker A:Is that correct?
Speaker B:Yeah, it's about, you know, we, we talk about CD ROMs.
Speaker B:It's always, it's always interesting.
Speaker B:Our founder of our company would say, when he asked, was asked that question, he says, well, I still have my CDs from the late 70s or early 80s, and they still play.
Speaker B:And it's.
Speaker B:They are designed to last for many, many years.
Speaker B:And longevity tests show they go 50, 100 years, whereas if you start taking a look at, you know, a hard drive, with moving parts, it's going to wear, and you just can't pop a disk out of the hard drive and put it into another hard drive.
Speaker B:And so maybe that's.
Speaker B:I don't know what the latest MTBF numbers are, but let's say three to five years and then you got tape, maybe 15 to 30 years.
Speaker B:So, you know, it is ultimately a much more durable media, especially for...
Speaker A:Now we're talking the cold storage, not the active storage as much now.
Speaker B:Correct.
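One way to see what those lifetimes mean for a long retention period is to count how often the data would have to be copied onto fresh media. The sketch below is a rough illustration only: it uses the ranges mentioned in the conversation (3 to 5 years for hard drives, 15 to 30 for tape, 50 to 100 for optical), assumes media is replaced exactly at end of life, and uses made-up function names.

```python
import math

# Rough media lifetimes in years, taken from the ranges quoted in the
# conversation. These are conversational estimates, not vendor specifications.
MEDIA_LIFETIME_YEARS = {
    "hard drive": (3, 5),
    "tape": (15, 30),
    "optical": (50, 100),
}


def migrations_needed(retention_years: int, lifetime_years: int) -> int:
    """Number of copies to fresh media needed during the retention period.

    Assumes media is replaced exactly at end of life; the initial write
    is not counted as a migration.
    """
    if retention_years <= 0 or lifetime_years <= 0:
        raise ValueError("years must be positive")
    return max(0, math.ceil(retention_years / lifetime_years) - 1)


def migration_table(retention_years: int) -> dict[str, tuple[int, int]]:
    """Map each medium to (best case, worst case) migration counts."""
    table = {}
    for medium, (short_life, long_life) in MEDIA_LIFETIME_YEARS.items():
        table[medium] = (
            migrations_needed(retention_years, long_life),
            migrations_needed(retention_years, short_life),
        )
    return table


if __name__ == "__main__":
    for medium, (best, worst) in migration_table(100).items():
        print(f"{medium}: {best} to {worst} migrations over 100 years")
```

For a 100-year retention period this simple model gives 19 to 33 migrations for hard drives, 3 to 6 for tape, and 0 to 1 for optical, which is the durability gap being described here.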
Speaker A:There was a project.
Speaker A:I wish I could remember what it was and I didn't write it down.
Speaker A:I think we were talking about it earlier that was involving storing humankind in DNA.
Speaker A:Arctic.
Speaker A:Is that where it was?
Speaker B:Oh, sorry, yeah, There's.
Speaker B:There's an Arctic World Archive.
Speaker A:That's it.
Speaker A:That's it.
Speaker B:It's up in Norway.
Speaker B:And what they, what they do is they actually are saving off DNA, saving off seeds, saving off data.
Speaker B:And I think one of the companies dealing with that is Piql, and they have been doing optical storage film, and they've been storing it in that archive.
Speaker B:People are looking at it and saying it's necessary to kind of preserve a civilization.
Speaker B:There's a place near me, it's an old missile silo.
Speaker B:One of the local county seats has all of their records in the basement down there.
Speaker B:And I look at.
Speaker B:I went on a tour and I'm looking at it and they're all books.
Speaker B:And, you know, it's.
Speaker B:It's dry, fairly dry, but paper.
Speaker B:Again, here's another opportunity for optical storage.
Speaker B:You look at it and you say, okay, well, this, this is not good.
Speaker B:Probably all these books can fit onto one of our folio disks with all the data it has.
Speaker B:Preserve it.
Speaker A:One gets fired.
Speaker A:We hear that.
Speaker B:Oh, yes, yes.
Speaker B:But it's incredible that we've got this technology that we can use to be able to store this kind of stuff and preserve, well, humankind or our history.
Speaker A:So let's get back to the active archive a little bit more and away from the cold storage in the active archive.
Speaker A:So data that's stored that you don't need now, but you need reasonable performance access to it at some point in time can be stored on this technology because it's A cheaper way to store it, it's a more reliable way to store it, a longer term way to store it.
Speaker A:But still you can, through robots and other mechanisms, you can get to the data rather quickly.
Speaker A:So define the word "rather" in that statement.
Speaker B:Rather quickly in an active archive, in the storage world, you could call it time to first byte.
Speaker B:So how quickly you can get there. And, you know, with tape, from a random-access perspective, it would take a significant amount of time.
Speaker B:You'd have to go pull the tape.
Speaker B:So the robots are going to do the same thing for an optical cartridge, if you will, versus tape cartridges.
Speaker B:They're going to both pull them at the same time, load them into a drive.
Speaker B:Now the difference between the optical and the tape is going to be how quickly it's going to get to that random data.
Speaker B:Because you're able to seek much faster and you have this dispersion or breakup of the data across these different layers of the disk.
Speaker B:You're able to find that data much faster.
Speaker B:And it's a much shorter space.
Speaker B:Whereas the tape itself is spooled.
Speaker B:So you're going to have to wind it up, find it and then unwind it.
Speaker B:And so the access time itself is much faster.
Speaker B:And because it's similar to a hard drive, its seek time may not be quite as fast.
Speaker B:I think we're probably maybe three or four times slower than a hard drive, but it's much faster than a tape.
Speaker B:So back to that comment about relative and how fast you can be.
Speaker B:It's very capable of getting your data quickly and at a reasonable cost, wait cost, if you will, to get your data.
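The comparison above can be sketched as a small time-to-first-byte model: the robot load step is the same for optical and tape cartridges, and the difference comes from how long it takes to reach the requested data once the media is in a drive. Every number below is a placeholder chosen for illustration, not a measurement or a product specification; the only figure taken from the conversation is that optical seeks are roughly three to four times slower than a hard drive's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Medium:
    name: str
    load_seconds: float      # robot fetches the cartridge and mounts it in a drive
    position_seconds: float  # seek (disk) or wind (tape) to reach the requested data

    def time_to_first_byte(self, already_mounted: bool = False) -> float:
        """Seconds until the first requested byte can be read."""
        load = 0.0 if already_mounted else self.load_seconds
        return load + self.position_seconds


# Placeholder values for illustration only (assumptions, not measurements).
HDD_SEEK_SECONDS = 0.010
ROBOT_LOAD_SECONDS = 10.0
TAPE_WIND_SECONDS = 45.0

hard_drive = Medium("hard drive", load_seconds=0.0, position_seconds=HDD_SEEK_SECONDS)
# "Three or four times slower than a hard drive": use the upper figure.
optical = Medium("optical", ROBOT_LOAD_SECONDS, position_seconds=4 * HDD_SEEK_SECONDS)
tape = Medium("tape", ROBOT_LOAD_SECONDS, position_seconds=TAPE_WIND_SECONDS)

if __name__ == "__main__":
    for medium in (hard_drive, optical, tape):
        print(f"{medium.name}: {medium.time_to_first_byte():.2f} s to first byte")
```

With these placeholder values the robot load dominates the optical case, while the winding time dominates the tape case, which is the distinction being drawn for random access.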
Speaker A:So let's bring it down to a consumer level here for a second.
Speaker A: he last, well, at least since: Speaker A:But, but all my photos from way back when and I get, just got gigabytes of photos and videos on my quote unquote on my phone.
Speaker A:Now not all of them are on the phone, they're all up in the cloud.
Speaker A:Can it be an active archive?
Speaker A:And what would be the ramification to me as a consumer of that?
Speaker B:You know, it's kind of funny.
Speaker B:I was actually, I was just performing this with my wife the other day.
Speaker B:She says, do you remember that picture?
Speaker B:And so what do I do?
Speaker B:Well, I grab my iPhone and it has some nice search capabilities.
Speaker B:Right.
Speaker B:But you certainly see when you click a search, even with a Google search, or you try to search your local computer, you're sitting there, Sometimes it takes a few minutes.
Speaker B:You know, that's kind of what we're talking about here.
Speaker B:We're not talking about something that's for an active archive.
Speaker B:We're talking about something that you're waiting maybe a few minutes for as opposed to, you know, it's instantaneous.
Speaker B:You know, it's right there in the cache, ready to go.
Speaker B:Certainly when you're, when you're scrolling again on your phone, looking for that photo, you're just scrolling and scrolling and scrolling and scrolling.
Speaker B:Going back years.
Speaker B: If I found it back in: Speaker B:Maybe you could type it in, get back there fast.
Speaker B:But your human actions are actually probably slower than how fast the archive is going to actually be able to access that material, if you think about it, because we're dealing in the seconds world, that's dealing in the milliseconds world.
Speaker A:Right.
Speaker B:So it's still much faster than human perception.
Speaker A:Yeah, I'll date myself here a little bit.
Speaker A:My first computer, I read programs off of an audio tape cassette device.
Speaker A:It did audio signals to store onto the audio cassette and then you could read it back, and it read them back in the computer.
Speaker A:Very, very, very slow, very small, low capacity, obviously.
Speaker A:And, you know, random access wasn't even a term that was available, right.
Speaker A:I know there was.
Speaker A:I had 100 programs on this one cassette.
Speaker A:Now I wanted to find program number 37.
Speaker A:That was.
Speaker A:Let me look at my sheet of paper here.
Speaker A:That's at 10 minutes, 15 seconds onto the desk.
Speaker A:So that's right around here.
Speaker A:Let me try here.
Speaker A:Nope, that's in the middle of number 46.
Speaker A:I can try there.
Speaker A:Nope, nope, that's right too far back.
Speaker A:Let me find the right spot.
Speaker A:Okay.
Speaker A:You know, that's it.
Speaker A:That's it.
Speaker A:That's where I needed to be.
Speaker A:Well, my index is wrong.
Speaker A:I need to rewind to the beginning and then fast forward in order to get the index right.
Speaker A:You know, and, and, and you did that.
Speaker A:And eventually, you know, an hour later, you could load this 10K program and, and you have something running.
Speaker A:You know, obviously, when I got my first hard drive, I was amazed, right.
Speaker A:That, you know, now you look, I can do.
Speaker A:I can type the equivalent of LS and get a list of all my files.
Speaker A:Right.
Speaker A:So expectations have changed a lot for availability of data.
Speaker A:And now the problem, with most data, isn't that getting to the data takes a while.
Speaker A:The biggest problem now is figuring out which data you want and then getting to it.
Speaker A:Like what you're saying.
Speaker A:Your problem wasn't that you were saying, I need that picture [...].
Speaker A:If you knew that, you could find it really fast and get to it right away. Your problem was you didn't know what it was, so you had to search for it.
Speaker A:So search is now the biggest problem.
Speaker A:It isn't the indexing, it isn't the random access nature of it, it's the searching capabilities.
Speaker A:Now, I know optical doesn't really change that equation any, but how do most customers using backup systems like optical storage deal with the indexing issue and deal with search capabilities?
Speaker B:That's again, another really good question, as I'm the one who's trying to sell you a solution of the optical storage versus what's out there.
Speaker B:And trying to really understand that use case is very important.
Speaker B:A lot of customers will sit there and it offers a unique, I would say, tier, if you will, in that storage stack.
Speaker B:And you could come in and you can kind of say, well, you could certainly play into the spot where there's the tape and start working with the tape first and saying, okay, here's something.
Speaker B:But that's kind of where the advantage of, you know, let's go back a few years and say when we first introduced SSDs versus hard drives in the data center, now the SSDs everybody knew was going to be faster, but how are we going to use them?
Speaker B:You know, so we came out.
Speaker B:Back when I was working at Dot Hill, we were one of the first to offer this hybrid storage array that would detect, you know, hot and cold data.
Speaker B:And going back to my days at Seagate, we also did a hybrid drive where we had some, you know, NAND on the drive and where we would determine the hot and cold data to kind of help make that decision.
Speaker B:And so that's where you get into, you know, why I was interested in talking to you on this podcast is because software architects, and that's where you start thinking about how you're tiering your data.
Speaker B:What's your data availability, how fast do you need it, what's your design in your program and what's the user really going to go after?
Speaker B:You can't always predict.
Speaker B:[...] from a particular event back in [...], right?
Speaker B:She's not going to be able to.
Speaker B:We can't predict that, but we can start thinking about how we can improve those search algorithms or we can talk about, you know, what type of data needs to be stored, where and how fast can I access it?
Speaker B:Because all of this is coming back down to feeding the CPUs, if you will, for data, to feeding, you know, the needs and quick access time that people are looking for.
Speaker B:So maybe I'm kind of dancing around the question a little bit because I'm not the guy that's doing the design at the data center level to say what tiers they should be.
Speaker B:But I certainly see the vision for how they fit into that structure and know that as the user of it or the designer of the databases, as a software architect, I always sit there and start thinking about, you know, where do I need to place this?
Speaker B:It goes back to, you think about the ETL or ELT.
Speaker B:Extract, transform, load or extract, load, transform.
Speaker B:How do you want to deal with that data?
Speaker B:What's the best fit for it and how does this fit into that structure?
Speaker B:And, and what does your data really look like?
Speaker B:It's really easy to sit there and say, I'm just going to do a little program for data sorting and data analysis or machine learning of trying to come up with some kind of trend or something.
Speaker B:But knowing where the data is placed and knowing how quick you have to access it, knowing how much data you're going to process comes down to really understanding the placement of it and what options you have.
Speaker B:So long answer.
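The hot and cold detection mentioned in this answer can be illustrated with a minimal sketch that places an item in a tier based on how recently it was last accessed. This is an assumption-laden toy, not how any particular hybrid array or drive works: the thresholds and the `classify` function are hypothetical and chosen only to make the idea concrete.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds; a real system would tune these per workload.
HOT_WINDOW = timedelta(days=30)
ACTIVE_ARCHIVE_WINDOW = timedelta(days=365 * 2)


def classify(last_accessed: datetime, now: datetime) -> str:
    """Pick a storage tier from the time elapsed since the last access."""
    age = now - last_accessed
    if age < timedelta(0):
        raise ValueError("last_accessed is in the future")
    if age <= HOT_WINDOW:
        return "hot"
    if age <= ACTIVE_ARCHIVE_WINDOW:
        return "active archive"
    return "cold archive"


if __name__ == "__main__":
    now = datetime(2025, 1, 1)
    examples = {
        "document edited last week": now - timedelta(days=7),
        "photo taken three months ago": now - timedelta(days=90),
        "medical record from ten years ago": now - timedelta(days=3650),
    }
    for label, last_accessed in examples.items():
        print(f"{label}: {classify(last_accessed, now)}")
```

Recency alone cannot predict which old item a user will suddenly want, which is the limitation raised above; it only gives a default placement that search and access patterns can refine.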
Speaker A:And that was actually a great answer.
Speaker A:And I know it was probably an unfair question, but it got the conversation going in the direction I wanted it to go, which is basically, what does this mean to software architects?
Speaker A:And the thing I would like to add for software architects, which I think is important, is that data is central to everything we do as software architects.
Speaker A:And that's absolutely true.
Speaker A:Everyone knows that.
Speaker A:If you don't know that, you will.
Speaker A:If you've spent any time as a software architect, data is everything.
Speaker A:And the amount of data that we use is skyrocketing.
Speaker A:I mean, AI is very, very, very data hungry.
Speaker A:Huge quantities of data have always been needed, are needed now, and even more will be needed in the future.
Speaker A:And the amount of data your applications can be involved with is going to skyrocket.
Speaker A:Whatever your application is, the amount of data is going to continue to skyrocket.
Speaker A:So the most important thing for you to keep in mind is what are you doing with that data?
Speaker A:And the decisions you make with that data have huge ramifications to your application.
Speaker A:You know, if you want your data available at a millisecond's notice, it's going to be in your database, easily accessible and easily indexed and all that sort of stuff.
Speaker A:If not, if it's going to be something you can get to in maybe half a second or so, then maybe you can store it in something equivalent to S3 or a secondary storage drive of some sort.
Speaker A:And if it's something that you don't need very often at all, and the time to first byte when accessing it can be a minute or two, then something like optical storage is a great solution for that.
Speaker A:And the fact of the matter is you're going to have data that fits in all those categories.
Speaker A:And the important thing is that cost is significantly different between a byte stored in the database versus a byte stored in S3 versus a byte stored on an optical disk.
Speaker A:And understanding that is critical for software architects today, and certainly for software architects of tomorrow as well.
Speaker A:So I think the important lesson, and I think you would agree with that lesson as well, is that the accessibility of data is important to understand.
Speaker A:Not all data has to be millisecond access, and there's a cost to having millisecond access data.
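The three access classes described above can be sketched in code. The following is a minimal Python illustration, not anything prescribed in the conversation: the tier names, the `Tier` class, the `choose_tier` function, and the exact latency thresholds are all hypothetical, loosely based on the figures mentioned (about a millisecond for a database, about half a second for S3-like object storage, a minute or two for optical). It picks the slowest tier, assumed here to be the cheapest per byte, that still meets a required time to first byte.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    # Slowest time to first byte (in seconds) this tier is assumed to deliver.
    max_first_byte_seconds: float


# Ordered from fastest (assumed most expensive per byte) to slowest
# (assumed cheapest per byte). All thresholds are illustrative assumptions.
TIERS = [
    Tier("database", 0.001),
    Tier("object_storage", 0.5),
    Tier("optical_archive", 120.0),
]


def choose_tier(required_first_byte_seconds: float) -> Tier:
    """Return the slowest tier that still meets the required time to first byte."""
    candidates = [
        tier for tier in TIERS
        if tier.max_first_byte_seconds <= required_first_byte_seconds
    ]
    if not candidates:
        raise ValueError("no tier is fast enough for this requirement")
    # TIERS is ordered fastest to slowest, so the last candidate is the slowest one.
    return candidates[-1]
```

Under these assumed thresholds, `choose_tier(300).name` returns `"optical_archive"`, while `choose_tier(0.001).name` returns `"database"`. A real decision would also weigh per-byte cost, retrieval fees, and access frequency, none of which are modeled here.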
Speaker B:Yeah, I think you bring up a good point, and that is to really think about where we're headed with big data, lots of data, AI data. Think about a measurement, and here's a wild one: let's say you want to take a look at temperature data for a particular area of the country, you're trying to understand if there's a trend, and you want to see what factors are associated with it.
Speaker B:So you're going to develop a regression model.
Speaker B:So you're going to go look at other factors, other data sets that are out there.
Speaker B:Well, you can think about it this way: you want that data to be available for you, stored for you, but how quickly you are able to access that data is going to determine how fast you can answer the regression question.
Speaker B:And if you think of it from the perspective of designing that extract, transform, load or extract, load, transform step, you've got to think about a few things.
Speaker B:First off, is there a lot, a lot, a lot of data that I need to pull, data that maybe sits there on tape?
Speaker B:It depends on how big a model you're developing and what the sources are. Or maybe you need to say, I've got a lot of random data, and I don't need data from years and years and years ago.
Speaker B:But maybe that's something that you should be considering putting on an optical disk, as an example.
Speaker B:It gets back to exactly what you said: how much time is it going to take to get to that data?
Speaker B:And you need to understand that, because that's ultimately going to cost you in how long it takes to process and run on expensive CPUs and GPUs.
Speaker B:How long is it going to take you to develop that model and come back with an answer?
Speaker B:So it's a very important question, Lee.
Speaker B:And ultimately it comes down to the software architect saying, this is what we're trying to get to.
Speaker B:How am I going to solve my problem, how quickly can I solve it, and at what cost?
Speaker B:Because not all of us can afford a big GPU farm to solve our problems, and that does force some constraints on your considerations, things that you hadn't considered in the past.
Speaker B:Because I think we all used to think compute was free.
Speaker B:Well, AI has changed that.
Speaker B:You know, AI is now consuming a lot of that compute, and it's made it very expensive, with demand for electricity going up and up and up.
Speaker B:So it's getting more costly.
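The point about data access time driving compute cost can be made concrete with a little arithmetic. The sketch below assumes a simple model in which the compute is reserved and billed while it waits for data to arrive; the function name `job_cost` and every figure in it are hypothetical and chosen only for illustration.

```python
def job_cost(data_wait_hours: float, processing_hours: float, rate_per_hour: float) -> float:
    """Cost of one job, assuming compute is billed while it waits for data."""
    return (data_wait_hours + processing_hours) * rate_per_hour


# Hypothetical figures: a GPU cluster billed at 40.0 per hour and a model
# that needs 10 hours of processing once the data has arrived.
RATE_PER_HOUR = 40.0
PROCESSING_HOURS = 10.0

fast_storage_cost = job_cost(0.5, PROCESSING_HOURS, RATE_PER_HOUR)  # data staged in 30 minutes
slow_storage_cost = job_cost(6.0, PROCESSING_HOURS, RATE_PER_HOUR)  # data staged in 6 hours

print(fast_storage_cost, slow_storage_cost, slow_storage_cost - fast_storage_cost)
```

With these assumed numbers the job costs 420.0 from the faster tier and 640.0 from the slower one, a difference of 220.0. In practice, staging data before the compute is reserved would shrink that gap, which is exactly the kind of placement decision being described.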
Speaker A:Yeah, I heard just the other day that Meta announced they're building a data center that's planned to be roughly the size of Manhattan.
Speaker A:Did I read that correctly?
Speaker A:And I don't remember the details.
Speaker A:It was something I heard on the side.
Speaker A:So I don't know all the details.
Speaker A:I may have it very wrong, but it's something like that.
Speaker A:And we already know Microsoft is working in Pennsylvania now to reactivate Three Mile Island, and they've bought all the power from the Three Mile Island nuclear power plant that is being restarted.
Speaker A:All of that power is going to power Microsoft data centers.
Speaker A:So it's amazing the types of power, the types of space we need in order for AI to work and storage.
Speaker A:Compute's getting all the glory right now, but it's only because storage works.
Speaker A:And storage is going to continue to be a problem in the future as we get more and more data and try to deal with it.
Speaker A:Now, one of the things we haven't talked about with data storage and longevity is the compliance aspect.
Speaker A:A lot of data has to be stored for a particular period of time and then intentionally not stored anymore and deleted and destroyed.
Speaker A:It's easy to imagine how to do that with databases, it's easy to understand how to do that with disk drives, it's even easy to understand how to do that with a tape drive.
Speaker A:But how do you do that with optical?
Speaker A:Optical is literally permanent storage, a write once, read forever, destroy never sort of technology.
Speaker B:Yeah, that's a great question.
Speaker B:One thing you can easily do, of course, is destroy the disk, you know, physically.
Speaker B:And I would also say, even if you think about it from the perspective of the optical disc itself, you could go back and write over it again and just delete everything.
Speaker B:You could destroy, in our case, the dye.
Speaker B:You can go wipe it out completely.
Speaker B:There are no transitions left from which to recover any data whatsoever.
Speaker B:So there is that aspect.
Speaker B:You could build that into the drive itself and just say delete the data and destroy it.
Speaker B:That's a possibility.
Speaker B:Or of course, physically destroying the disk.
Speaker A:Yeah, but again, say I'm using a service that has this library system with zettabytes of optical storage there for the taking, and I just write stuff to it on a regular basis.
Speaker A:There isn't an easy way for me to say write it and store it for 25 years and then delete it.
Speaker A:There's no real easy way to do that reliably.
Speaker A:Is that correct?
Speaker B:I think I'd probably argue that you could do that.
Speaker B:Depends on what your agreement would be with the service provider.
Speaker B:If that's a service design that they could offer, I don't see why you couldn't do it with the technology we have, Lee.
Speaker B:Speaking of the Folio technology, I certainly know that it's possible to go, you know, destroy the dye so that you have nothing.
Speaker B:Okay, so that is a possibility.
Speaker A:So you can do a form of delete.
Speaker A:But you would need a policy and a process and a system to make that happen.
Speaker B:But it's absolutely.
Speaker A:It's a possible thing to architect into a system, to have that.
Speaker B:It is, it is.
Speaker B:And it's always good to talk to you because you come up with these, you know, different perspectives on things.
Speaker B:And that's a really interesting approach to it because I always thought it'd be much easier just to go off and smash your disk.
Speaker A:Well, smashed disks can be recovered, though.
Speaker A:If you're talking about forensic data recovery, forensics can do some amazing things.
Speaker A:Simply wiping a hard disk is not enough to remove the data.
Speaker A:You have to do multiple wipes and all that sort of stuff, and it's the same sort of thing with optical, right?
Speaker A:If you just crush a disk, potentially there are bytes still accessible.
Speaker A:Even if you've indexed it so that you've deleted those files, the files are physically still on the disk.
Speaker A:So depending on how thorough your destroy process is, it's still possible to get some of the information out.
Speaker B:It's funny you mentioned this.
Speaker B:So there's a company that used to exist, Ontrack. I don't know if you remember the hard drive company.
Speaker B:When I graduated, some of my fellow students ended up going to that company, and I heard a story about how they would take the time to figure out how to recover the data on some pretty broken hard drives.
Speaker B:And I remember one of the guys told me about a job after Iraq invaded Kuwait.
Speaker B:He said, yeah, we had some drives with bullet holes in them that we had to recover some data from.
Speaker B:But you're right, I mean, if someone's very ingenious, they could do that, but they would also have to then figure out the encryption that would sit on top of the data.
Speaker A:But again, if you're talking compliance, in certain industries those sorts of requirements do exist, and software architects do have to think about those sorts of things.
Speaker A:So the answer is it's something you have to think about, but it doesn't preclude the use of optical storage is what it sounds like.
Speaker B:Right, right.
Speaker B:It does not.
Speaker B:I think clearly some agencies do require immutability, so that's clear.
Speaker B:But when you start talking about the destruction of data, it is possible with the optical, just like you would do with a hard drive or like you would do with tape.
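The retention requirement discussed above (keep data for a fixed period such as 25 years, then destroy it) needs a policy, a process, and a system, and the policy part can be sketched as a small catalog check. This is a minimal Python illustration under stated assumptions: `ArchivedObject`, `destroy_on`, and `due_for_destruction` are hypothetical names, not part of any Folio Photonics product or service API. The actual destruction step (overwriting the dye or physically destroying the disk) would be carried out by the drive or the service provider and is not modeled here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArchivedObject:
    object_id: str
    written_on: date
    retention_years: int  # how long the data must be kept before destruction


def destroy_on(obj: ArchivedObject) -> date:
    """First date on which the object is eligible for destruction."""
    target_year = obj.written_on.year + obj.retention_years
    try:
        return obj.written_on.replace(year=target_year)
    except ValueError:
        # Written on Feb 29 and the target year is not a leap year:
        # fall forward to Mar 1 so the data is never destroyed early.
        return date(target_year, 3, 1)


def due_for_destruction(objects: list[ArchivedObject], today: date) -> list[str]:
    """Return the IDs of objects whose retention period has ended."""
    return [obj.object_id for obj in objects if today >= destroy_on(obj)]
```

For example, given one object written on 2000-01-01 and another written on 2010-01-01, both with a 25-year retention, a check run on 2025-06-01 would return only the first object's ID. A scheduled job could feed that list into whatever destruction procedure the provider agreement specifies.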
Speaker A:Now, this is kind of changing direction a little bit here.
Speaker A:One thing we've alluded to, but never really talked about in depth, is the green aspect of archival data.
Speaker A:Again, we started talking about Manhattan-sized data centers and Three Mile Island power plants.
Speaker A:And so green compute is kind of an important topic and will continue to be an important topic as time goes on.
Speaker A:We've already talked about one of the advantages of optical, which is that you don't need an environmentally controlled environment, which saves energy and power, and that therefore means it's a greener solution as well.
Speaker A:Are there other ways that optical is greener than, let's say, tape, or are there other aspects besides the storage requirements that make it greener?
Speaker B:Yeah, well, the big thing right now is power.
Speaker B:We're only spinning the disk during its access to data, whereas hard drives are always online.
Speaker B:And then tape, again, is the same kind of thing.
Speaker B:It only has to be online during access, similar to optical.
Speaker B:So we're probably closer to tape.
Speaker B:But in comparison to a hard drive, we certainly use significantly less power, probably up to around 98% less power than a hard drive.
Speaker B:And that's probably the biggest play that we would have.
Speaker B:Seeing the growth in electricity demand in data centers, it goes once again back to your choices in the data center.
Speaker B:If you go after an optical archive, if you're going to archive and you're taking a look at the different options and you need random access, you take a look and say, well, random access you got to get from the hard drives.
Speaker B:But gosh, standing up a rack of hard drives in my data center is going to cost me this much power.
Speaker B:And you take a look at the optical and say, oh well, here's an opportunity to save a significant amount of power, particularly when it's at a premium in the data center because, you know, the GPUs are going to get most of the data and if not, you know, most of the power budget.
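To put the "up to around 98% less power" figure in perspective, here is a short worked calculation in Python. The 5,000 W baseline for a rack of hard drives is a hypothetical number picked only for illustration, the helper names are made up for this sketch, and the 98% reduction is the best-case figure quoted above rather than a measured result.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_kwh(average_watts: float) -> float:
    """Energy used in a year by equipment drawing a constant average power."""
    return average_watts * HOURS_PER_YEAR / 1000


def reduced_watts(baseline_watts: float, reduction_percent: float) -> float:
    """Average power after applying a percentage reduction to a baseline."""
    return baseline_watts * (100 - reduction_percent) / 100


# Hypothetical baseline: a rack of hard drives averaging 5,000 W.
hdd_watts = 5000.0
# Best-case reduction quoted in the conversation.
optical_watts = reduced_watts(hdd_watts, 98)

print(annual_kwh(hdd_watts), annual_kwh(optical_watts))
```

Under these assumptions the hard drive rack uses 43,800 kWh per year and the optical equivalent 876 kWh per year. The calculation covers drive power only and leaves out cooling, which is discussed separately in the conversation.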
Speaker A:Right, right.
Speaker A:And you're talking about power from the standpoint of literally powering the drives.
Speaker A:But there's also the power for cooling or environmental control.
Speaker B:True.
Speaker A:And actually optical helps in a couple of areas there, as I think we talked about earlier too.
Speaker A:The environmental storage requirements are a lot looser, so you don't need as strict an environmental control for the storage of the disks themselves.
Speaker A:You don't need to keep them in a certain temperature range so the tape doesn't melt or the drives don't break.
Speaker A:Not only are they a lot more versatile, but they also take physically less space, which means you have a smaller area that you need to environmentally control.
Speaker B:I would say today the case would probably be that tape is more dense.
Speaker B:As we come out with our first technology, they're probably at about 10 to 25 petabytes in a 42U rack.
Speaker B:Sony, I think, maybe achieved 3 petabytes in that same space.
Speaker B:But then again, the cooling needed for optical is not as, you know, aggressive as what you need for tape.
Speaker B:But our roadmap has us looking at disks that go from about 400 to 500 gigabytes in the first generation to 1 terabyte and then 2 terabytes.
Speaker B:And when we reach the end of our roadmap and we're out at 2 terabytes, we are very space competitive with the tape that would be there today, probably around 20 petabytes per 42U rack.
Speaker A:So within a few years that'll be a true statement.
Speaker A:But right now it really is not quite true.
Speaker B:Right now it's not quite true yet.
Speaker B:It's our dream.
Speaker B:Yes, a big part of it is trying to make sure that we're giving options again to the data centers, to become more green and to save space, and to make sure that they've got another offering that suits the customer's needs.
Speaker A:So before we close, is there anything else that Optical offers to the software architect?
Speaker B:I think the big deal is getting back to what you said: as a software architect, what do you need to consider?
Speaker B:And it's very important, with all the data that's out there, to know how you're going to have to access it, to take a look at the options, and to make sure you understand where you're pulling your data from and what kind of data it is. I think that's probably the most important thing.
Speaker B:And then taking a look and saying optical is coming back, it's going to fit into this stack, and there is a definite need for it given the environmental impact and power draw that we're looking at for the future.
Speaker A:Great.
Speaker A:Thank you.
Speaker A:So my guest today has been Greg Kittelson.
Speaker A:Greg is the VP of Engineering at Folio Photonics, a company revolutionizing the sustainability and durability of archival storage.
Speaker A:Greg, thank you so much for joining me on Software Architecture Insights.
Speaker B:Thank you, Lee.
Speaker B:This is great.
Speaker A:Thank you for joining us on Software Architecture Insights.
Speaker A:If you found this episode interesting, please tell your friends and colleagues. You can listen to Software Architecture Insights on all of the major podcast platforms.
Speaker A:And if you want more from me, take a look at some of my many articles at softwarearchitectureinsights.com, and while you're there, join the 2,000 people who have subscribed to my newsletter so you always get my latest content as soon as it is available.
Speaker A:Thank you for listening to Software Architecture Insights.