Thomas de Marchin & Milana Filatenkova: Data Science on Blockchain - NFT Analysis
Episode 6 • 3rd April 2022 • Ocean Missions Campfire • Scott Milat

Shownotes

Thomas & Milana are both data scientists who have been analysing blockchain data for interesting insights about NFTs. We discuss their process and what they recommend for others conducting this type of research for the very first time.

You can read more about their research below.

NFT Analysis Articles:

Data Science on Blockchain with R. Part I: Reading the blockchain

Data Science on Blockchain with R. Part II: Tracking the NFTs

Helium Article:

Data Science on Blockchain with R, Part III: Helium-based IoT is taking over the world

You can follow the guys here for upcoming articles and research

LinkedIn - https://www.linkedin.com/in/tdemarchin/ https://www.linkedin.com/in/mfilatenkova/

Twitter - https://twitter.com/tdemarchin

Medium - https://tdemarchin.medium.com/

Transcripts

[:

So thank you for joining us on the podcast today, guys.

[:

[:

[:

[:

Yes. So we are both data scientists, and we work as consultants for the pharmaceutical industry. I personally discovered the world of blockchain about five or six years ago. My mother-in-law invested in a strange thing called Bitcoin. I looked at that and I found it really cool. Then I invested a bit.

I think I bought my first Bitcoin when they were about $500, and it was 20, something like that. And then I quickly got obsessed by the technology itself, the blockchain technology. I mean, I read a lot about it, trying to understand how it works. And to be honest, it was not immediately clear to me that blockchain offered a vast field of discovery for data scientists like us.

I initially just saw it as a ledger for financial transactions. And I think it's only two or three years ago that I realized that blockchain is in fact much more than that, and that there is a massive amount of interesting data just waiting to be discovered. So I think that's it for me.

[:

[:

So, to help them make the most informed decisions from a strategic viewpoint. And a few years ago I found out about blockchain and I got absolutely obsessed with it. I think it is a revolutionary technology that opens the door into a fairer new world because it liberates us. It has the potential to liberate us from these parasitic centralized intermediaries.

And it is only recently, though, thanks to Thomas, that I had an opportunity to get hands-on experience analyzing actual on-chain data.

[:

So I came across an article that was published in the Towards Data Science publication on Medium. That was basically a summary of the work that the two of you were doing. So I was just wondering if you could help those listening get a quick overview of what that project was and what it was looking at.

[:

It can get quite expensive too, since you would need high-performance hard drives to store the blockchain database. But fortunately for us, we have discovered that there exist services out there that can provide us with a shortcut. Those are Etherscan, for example, or one that appeared more recently, called The Graph.

What they offer is API functionality to ease the process of retrieving the data in the right format, so that it can be easily processed by statistical software such as R. So in the first article, we describe how to use an API to download NFT transaction data from the NFT market.

In our case, we used OpenSea as the NFT market. We also show some basic exploratory analysis that can be performed on the on-chain data. In our case, we made a few simple graphs containing a summary of the transaction prices. So that's it for the first article, and Thomas is going to talk a little bit about the second one.

[:

or money-related tokens. I mean, NFTs are unique and you can actually track each of them individually. And that's what we did. So we decided to track the Weird Whales collection. The Weird Whales are a bit like the CryptoPunks, the famous CryptoPunks, but with little tiny whales, very cute. So there are a total of a bit more than 3 thousand whales, and each of them has a different combination of attributes: a different colour, some wear a hat, some smoke cigarettes.

So we chose that collection because the story behind it is nice. It was actually created by a 12-year-old boy who started programming when he was five. That little boy made the buzz with his NFTs, as well as half a million dollars in two months.

Yes. It was quite profitable. But coming back to the article, the first part is about retrieving the data. Again, you will use Etherscan, so using the API. But I have to say it was way more technical than for the first article, because it involved doing a bit of reverse engineering

to understand the OpenSea smart contract. So you can read the smart contract, which is stored on the chain, but you need to understand it to be able to know which data you need to extract and how to read them. So then, once you have the data, how do you visualize it?

have so many NFTs, more than:

So, if you try to plot all the transaction data, you would just get noisy, you know, unreadable plots. So that's why, when you analyze blockchain data, you want to work on a subset of the data, or to find effective ways to summarize it.
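One way to summarize transaction data instead of plotting every point is to collapse it to one value per day. The following is a minimal sketch in Python, one of the two languages mentioned in the episode, and not the authors' R code. The record fields `timestamp` and `price_eth`, the function name `daily_median_price`, and the sample values are all hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import median


def daily_median_price(transactions):
    """Group sale prices by UTC calendar day and return {"YYYY-MM-DD": median price}."""
    prices_by_day = defaultdict(list)
    for tx in transactions:
        # Convert the Unix timestamp (seconds) to a UTC date string.
        day = datetime.fromtimestamp(tx["timestamp"], tz=timezone.utc).strftime("%Y-%m-%d")
        prices_by_day[day].append(tx["price_eth"])
    # One summary value per day, in chronological order.
    return {day: median(prices) for day, prices in sorted(prices_by_day.items())}


# Made-up records for illustration; these are not real Weird Whales sales.
sample = [
    {"timestamp": 1626825600, "price_eth": 0.5},   # 2021-07-21 00:00 UTC
    {"timestamp": 1626829200, "price_eth": 0.25},  # 2021-07-21 01:00 UTC
    {"timestamp": 1626832800, "price_eth": 2.0},   # 2021-07-21 02:00 UTC
    {"timestamp": 1626912000, "price_eth": 1.0},   # 2021-07-22 00:00 UTC
    {"timestamp": 1626912100, "price_eth": 2.0},   # 2021-07-22 00:01 UTC
]

summary = daily_median_price(sample)
for day, price in summary.items():
    print(day, price)
```

The median is used here rather than the mean so that a single very expensive sale does not dominate the daily figure; that choice is an assumption of this sketch, not something stated in the episode.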

[:

Or is that functionality more built into some of those services?

[:

So that can be quite complicated. But with services like Etherscan or The Graph, and there are many others, it's easy. You just query the API and, for example, say: I want to list the transactions from this address between that date and that date, and the server will send you back the data in a nicely formatted way.
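The kind of request described here can be sketched in Python as follows. This is a hedged illustration, not the code from the articles, which use R. The endpoint URL, the `module`, `action`, `address`, `sort` and `apikey` parameters, and the `timeStamp` response field are assumptions based on Etherscan's public account API and may differ in current versions, so check the official documentation before relying on them. The function names, the placeholder address and the sample records are hypothetical. The sketch only builds the request URL and filters already-downloaded records by date on the client side; it makes no network call.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed endpoint; verify against the current Etherscan documentation.
ETHERSCAN_URL = "https://api.etherscan.io/api"


def build_nft_transfer_query(address, api_key):
    """Build a URL asking for the NFT (ERC-721) transfers of one address."""
    params = {
        "module": "account",      # assumed parameter names and values
        "action": "tokennfttx",
        "address": address,
        "sort": "asc",
        "apikey": api_key,
    }
    return f"{ETHERSCAN_URL}?{urlencode(params)}"


def filter_by_date(transfers, start, end):
    """Keep transfers whose Unix 'timeStamp' falls in the interval [start, end)."""
    lower, upper = start.timestamp(), end.timestamp()
    return [t for t in transfers if lower <= int(t["timeStamp"]) < upper]


# Placeholder address and key, for illustration only.
url = build_nft_transfer_query("0x0000000000000000000000000000000000000000", "YourApiKeyToken")

# Made-up records shaped like the assumed response format.
records = [
    {"timeStamp": "1626825600", "tokenID": "1"},  # 21 July 2021
    {"timeStamp": "1630000000", "tokenID": "2"},  # 26 August 2021
]
july = filter_by_date(
    records,
    datetime(2021, 7, 1, tzinfo=timezone.utc),
    datetime(2021, 8, 1, tzinfo=timezone.utc),
)
print(url)
print(july)
```

In practice the URL would be fetched with an HTTP client and the JSON response parsed before filtering; the date filter is done locally here because this sketch does not assume the service accepts date bounds directly.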

[:

I was just wondering if you could rewind the clock to when you were first thinking about this project, and some of the things that you were experiencing at the time, or what led you to begin working on it.

[:

We use R; you can also use Python. There are others, but R and Python are the main programming languages for data science. So it's not that difficult, but there is a learning curve if you have no prior knowledge of programming. Now, regarding what led me to work on this project, the one we described on following the NFT transactions:

One day I simply got curious about how on earth one gets to read the blockchain data, and it just kicked off from there.

[:

[:

And for instance, I remember spending hours trying to add little directional arrows to a transaction graph, only to finally realize that the whole plot was of no use. That is why my recommendation to our listeners would be to always have a final goal and an approach towards achieving it, maybe even written down somewhere, because that would help you avoid getting stuck in unimportant details and focus on the goal that you are trying to achieve with your project.

[:

[:

It's common for all data science projects, I

[:

And we will get to that later.

[:

[:

[:

So we still have a lot to learn there.

[:

[:

[:

[:

[:

[:

So once you have done it once, it will maybe be easier for the next. So once you have the data, of course, the second part is to perform the analysis, and the last part is to write a report or an article to describe what you did, just as you would with any other data science project. Now, regarding the tools we use:

I think we mentioned it briefly, but for data scientists there are two main languages, R and Python. We use only R because we are used to it. That's the language we use on a daily basis. And for those who are not familiar with it, it's a functional programming language extensively used by statisticians and data scientists, as I mentioned.

And when it comes to data analysis, it offers endless possibilities. You can do everything: data management, plotting, modeling, web publication, e-book writing. It's open source and easy to learn. And there is a big, huge community which has developed thousands and thousands of packages to enhance the functionality.

So whatever analysis you would want to perform, it's very likely that someone has already developed a package for it. So, as I mentioned, we did everything with R: downloading the data using the API, doing the data management, the analysis, of course, and even the write-up. We wrote the article using R Markdown to produce a nice HTML report. So finally, once we were done, we just pushed everything to GitHub.

So it's saved somewhere in the cloud and it can be used by the community.

[:

[:

Minting is the event of creating a new NFT on the blockchain. There is a period of intense price growth. Then a quieter period follows, when the prices drop, followed again by a price increase. And the latter is clearly associated with intense activity on social media around the Weird Whales collection. So this is what is very interesting.

There are some cycles in the data. So you see ups and downs, and yes, those can be linked back to what is happening behind the scenes on social media. But of course this pattern is very specific to the Weird Whales collection, and other collections may show a different behavior.

We don't know, because we haven't looked at other collections, but regarding the Weird Whales one, yes, this is the pattern that we observed: the ups and downs are linked to the social media attention to the collection. And also, I was surprised by the number of transactions that we retrieved from the blockchain.

We only looked at one single NFT collection, which was a recent one, by the way. So, as Thomas mentioned, it is very challenging to visualize so much data, and there is a field of discovery here. There are lots of opportunities in terms of developing new visualization tools that are tailored to on-chain data.

[:

[:

We see many questions and positive feedback. There are also new followers every day, which is, I think, a good sign. And yeah, just to give a comparison, we had written a few articles on pharmaceutical processes. That's the domain in which we do our day-to-day work. But I have to say that the number of views for the blockchain-related articles is orders of magnitude above that for the ones about pharmaceutical processes.

So that's interesting.

[:

[:

[:

So much attention, it's incredible.

[:

[:

[:

[:

Between wireless devices that we often call IoT devices, for Internet of Things. These devices can be environmental sensors to monitor the quality or for our vehicle to help pick poles. They can also be localization sensors, to track bike fleets. So there are many applications for these little devices.

So Helium is interesting because people are incentivized to install hotspots by earning Helium tokens. They buy this hotspot, they install it, and then they earn money. And this is what allows the network to increase its coverage. So that's a very successful project. I think, yeah, one or two months ago we were at about 500,000 hotspots in the world.

And it's just growing exponentially. So I personally like this project because it has a practical use. It's not just about finance, like many other blockchain projects. It has real-world applications. So I have one or two hotspots two meters away from me. So we are,

[:

[:

So, yeah. But that's quite profitable, I have to say. But, coming back to it, I would say that looking at Helium data is a whole new step. It's not simple curiosity. I believe people need metrics and statistics to make good decisions. I hope they will find our work useful. But to come back to your question, our goal in this project is to visualize the growth of the network and to quantify its usage.

So this time, it's a bit harder to get the data. Because to visualize the growth of the network you need, of course, the data from time zero up to now. So you need all the historical data for all the hotspots in the world. That means that basically you need to get the full blockchain, and this is, once it's loaded in a database and running,

several terabytes of data. So you cannot just rely on an API, send a query to an API and receive a small data table. It's just an order of magnitude, to be clear, the size of this

[:

[:

[:

[:

And these nodes are all the database from the big.

[:

So essentially, when you have the historical data, can you start to build a map of all the usage on there? Have you had a chance to get an idea of the level of granularity of the data that you can get from Helium?

[:

So the blockchain records all the transactions. So when a device connects to your hotspot, it will transfer some data. I don't think this data, what is transferred by the device, is saved on the blockchain. But for sure, what is saved on the blockchain is that there was a transfer.

And of course this transfer generates a reward in terms of Helium tokens, and that is saved on the blockchain. And you also have another layer, which is the coverage of the network. To ensure that your hotspot is useful, it will communicate with other hotspots. And if it can communicate with other hotspots, it means that your hotspot is set up in a good place.

And then you'll be rewarded for having a hotspot in a good place. And so all these transaction rewards are saved on the blockchain. So you can really filter the rewards by whether they were due to good coverage or because you relayed traffic from IoT devices.

[:

[:

And then they go to, just to get some money. And so I do. Yeah. And it's supposed to be, but to detect these hotspots. So there are a lot of nice things to do with it. Nice data science projects there.

[:

[:

And it sounds like you've got some pretty interesting stuff to come. If people want to learn more about your work and potentially see future projects, where should they look?

[:

And the address is https://medium.com/@tdemarchin. And tdemarchin is T D E M A R C H I N,

[:

[:

[:
