Dark web research tips for the OSINT-curious
Episode 16 • 24th May 2022 • NeedleStack • Authentic8

Shownotes

Michael James of the OSINT Curious Project joins the podcast to give expert tips on conducting dark web research. From GitHub tools to Discord resources, Michael has practical guidance on finding information in the dark.

Transcripts

MICHAEL JAMES:

It has been, in the last ten years, with a lot of people getting braver and putting these hacker forums online in regards to clear net sites and even advertising for certain illicit, illegal tradecraft things that are just being posted on Craigslist or on Telegram or other things. So, to get to a lot of the malicious actors and to go through and start investigations, you don't necessarily need to go through and start your investigation in a Tor browser or in a deep web kind of sense.

MATT ASHBURN:

Welcome to Needlestack, the podcast for professional online research. I'm your host Matt Ashburn, and I'm more than just a little bit curious about OSINT.

JEFF PHILLIPS:

And I'm Jeff Phillips, tech industry veteran and curious to a fault. Today we're continuing our exploration of the dark web and its role as an information resource for analysts, for researchers, as well as for investigators. And we're doing so today by chatting with the OSINT Curious Project's Michael James. Now, Michael started out investigating risk and fraud in the finance sector. In his day job he's now a threat intelligence researcher, but he's also one of the founders of the OSINT Curious Project, which is how we ended up having him on the show. Welcome to the show, Michael!

MICHAEL JAMES:

Thank you so much. It's really an honor to be out here, I really appreciate it, but yeah, it's good times, yeah.

JEFF PHILLIPS:

Excellent. Well, I mentioned the OSINT Curious Project a couple of times in that intro, let's start off there. Can you tell us a little bit about it and its mission?

MICHAEL JAMES:

Sure. So OSINT Curious is - the OSINT Curious Project is a nonprofit organization. It's focused on expanding kind of what open source intelligence can do for your day-to-day. It doesn't necessarily have to go through and do a cyber intelligence or even information security, but it kind of lives in that space, right? You hit it right on the head, I used to go through and do a lot with financial sector work, so we would pull publicly available information and we would use social media accounts, even though there's some debate in regards to open source intelligence versus SOCMINT authentication, stuff like that. But the goal of OSINT Curious is to provide free, tangible information to people so that they can go through and use it in their day-to-day jobs to go through and to help defend kind of their exterior perimeter for a lot of stuff. And just, again, the knowledge base, of what can be done with open source intelligence and for, like I said, clicking all the links, searching all the things and making sure that you understand what the safety protocols for that are. We're very big in regards to operational security because sometimes you look at things and they look back, and that's what we want to go through and make sure that you at least have the ability to defend yourself against.

JEFF PHILLIPS:

I'm sorry, Matt. I was just going to say I encourage people to check that out. It's a great resource that even I look at, from the website to... you guys have a podcast also.

MICHAEL JAMES:

We have ten minute tips on YouTube and I still go through and look at those ten minute tips. It's very very good. It's very helpful.

MATT ASHBURN:

Yeah, I was gonna say the same exact thing, Jeff. It's a great resource, a great set of resources, I should say. Not just the podcast, but YouTube clips and everything. And the OSINT Curious Project has done tremendous work with getting the word out about OSINT as a craft and as a profession, and really also, I think, helping to break down people's fears and misperceptions about OSINT, too. There's a lot of research people have been doing. You guys are really helping to improve the tradecraft. So good job on that stuff.

MICHAEL JAMES:

We really want to go through and open it up to more than just the traditional norms as well, right? Like, everyone knows Google, you know, if you have a problem, you Google it. And that's fantastic, but there's a lot of stuff that you're leaving on the table by not checking Bing, by not checking Yandex, by not checking other geospecific search engines or social media search engines that are specific to a country or region. There's a lot of in-depth information you can get from that. And so we're there to go through and expand upon that and then also maybe knock down some of the other things that are just information gathering versus actual intelligence, you know? The thing with open source intelligence is that you gather the information, which is a large part of it, but you actually have to go through and consider and consult that information and make sure that it's actually tangible and it's answering a question for whatever your stakeholder, your client or your mission set actually is. So it's very important to go through and lead with "why is it important?" as opposed to "what tool can I use?" Because that's not OSINT. But yeah, absolutely.
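
As a small illustration of checking more than one engine, the sketch below builds the same query for Google, Bing, and Yandex. The URL patterns are the engines' public search pages as commonly used, not anything named in the episode. The function name build_search_urls is hypothetical, and country- or region-specific engines would need their own entries.

```python
from urllib.parse import urlencode

# Public search-page URL patterns (assumed stable); note that each engine
# uses its own name for the query parameter.
ENGINES = {
    "google": ("https://www.google.com/search", "q"),
    "bing": ("https://www.bing.com/search", "q"),
    "yandex": ("https://yandex.com/search/", "text"),
}


def build_search_urls(query):
    """Return one ready-to-open search URL per engine for the same query."""
    return {
        name: f"{base}?{urlencode({param: query})}"
        for name, (base, param) in ENGINES.items()
    }


if __name__ == "__main__":
    for engine, url in build_search_urls("osint curious").items():
        print(engine, url)
```

Keeping the engines in one table makes it easy to add a regional engine later without touching the query logic.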

MATT ASHBURN:

That's very true and something I've said many times on this podcast and in real life as well - huge difference between information and intelligence, or data and intelligence, right? You have to apply that analytic rigor to get from information and data to a finished intel product, so good point there. Speaking of some of the misperceptions and things, we've actually, in the past few episodes, been doing a little bit of myth busting and education on the dark web and as it pertains to cryptocurrency and investigations. You mentioned also that there are many things out there aside from Google, and this series that we're focusing on is mostly related to the dark web and resources out there. What are some of the misconceptions that you find folks have around conducting OSINT on the dark web?

MICHAEL JAMES:

So I'll start with the classic deep web versus dark web, right? There's the myth that everything is the dark web in regards to anything that's on a Tor browser, I2P, anything else, right? The deep web is just anything that is not cached, that is not searchable by a Google search query. You know, it could be an intranet system inside your internal company's network that is protected by a firewall and so other people can't get to it because it is resource dependent on internal needs. It does obviously encompass, like, IPFS, which is the InterPlanetary File System, which is its own deep web. Telegram is technically its own deep web, even though it's a social media, but it's not cached, so you can get the invites, but the actual membership, you have to log in for that stuff. Dark web really speaks to the malicious, right? It really is where the markets are in regards to who's selling what illicit illegal substances, or data brokering or initial access brokering for ransomware actors, where ransomware actors host their sites to go through and to name and shame their "clients" or victims. And so there's a lot of misconception in regards to everything is dark web, everything is evil and malicious, and as soon as you log on, they're gonna own you and your computer's going to be a slave, and it's obviously not true. Tor was actually created by the Navy, I think it was back in the 1950s-60s, and it was a means for them to go through and communicate with officers that were out, that didn't have independent, reliable clearance communication networks. And so when it fell off there, the EFF and a couple of other privacy-focused individuals took up the mantle to go through and continue to go through and develop the Tor network, The Onion Router, as it's called, and it allows a level of anonymity that helps with a lot of people that are in countries that don't have access to free speech or free communication. 
So it actually was based as a privacy tool and now it's been maliciously distorted to go through and kind of serve as this platform for selling illicit, illegal materials as well. So there's a lot of good that is on the dark web. CNN is on the dark web, Facebook is on the dark web - or the deep web, excuse me. The CIA right now currently hosts a deep web site to go through and to communicate with people who are in Russia and in Ukraine to go through and get the information from those countries that are behind borders to freedom-loving Americans and other people that can help with causes that may need them to go through and to be helped, right? So there's a lot of misconception in regards to that, but there's - there's a fair bit of bad stuff on there as well, so tread lightly, right?

JEFF PHILLIPS:

You know, I was looking at something, it might have been a tweet of yours, on the dark web. You were talking about how a fair amount of information is often mirrored between - you want to call it the light web, the surface - and the dark web.

MICHAEL JAMES:

Clear web or whatever you want to call it, yeah, that's absolutely true. There's a lot that used to be kind of the domain of the deep web, right? You needed to go through and have that barrier of entry. You needed to know how to get to the Tor browser and install that. You needed to know what domain you were going to. It was kind of like BBS boards back in the day. You had that specific address to get there, otherwise you were out of luck, because there is no Google for the dark web or the deep web specifically. There are keyword searches and indicators you can go through and do with search engines like Ahmia and Haystak and Stronghold - or not Stronghold, sorry, that's a pastebin thing. But there are a lot of things like Hunchly's data set that you can go through and pull down daily to tell you where to go for that stuff. But it has been, in the last ten years, with a lot of people getting braver and putting these hacker forums online in regards to clear net sites and even advertising for certain illicit, illegal tradecraft things that are just being posted on Craigslist or on Telegram or other things. So, to get to a lot of the malicious actors and to go through and start investigations, you don't necessarily need to go through and start your investigation in a Tor browser or in a deep web kind of sense. There are other things you can do while you have a current browser, and there are Tor proxies, which we don't recommend at OSINT Curious, but there are Tor proxies where you can add a .ly or a .cab to the end of an Onion link, and you can actually cruise the Tor browser link in your current browser, be it Chrome or Firefox or anything else like that. So there's a lot that you can go through and do that allows you to go through and stay on the clear web, stay in your comfort zone as it were, to get to that information that's accessible.

JEFF PHILLIPS:

Let's go down that path a little bit about what kinds of things, when you do go to the dark web and you're passing - and you're conducting research, it would probably depend on your use case, but what kind of things are you looking for on the dark web?

MICHAEL JAMES:

Sure, a lot of the things that people will contract us to go through and do is privacy- and security-minded work or data leak in regards to corporate or non-governmental, or even governmental organizations looking to go through and see - maybe if it's a law enforcement case, what sort of trend analysis you can go through and find in regards to specifically like USA or Europe, in regards to illicit drugs or running of trafficking, things like that. A lot of times I'm on there looking for data breach analysis information. There are a lot of databases that have been hacked and then sold either for, like I said, initial access brokers that sell to ransomware groups to go through and get their initial foothold and then carry out a bigger plot, or just to go through and say, hey, I was able to go through and scrape all of Facebook and put it all on a database here, or whatever, and now you have this link in the dark web where you can research people's individual PII, so personally identifiable information. You have their phone number, their email address, their name, what city they live in, all of the things you would put on a normal Facebook profile that hopefully you have locked down and put privacy settings on. But these people went above and beyond and went through and scraped all the information and put it publicly available on a Tor site there. So there are times when we'll do a high-value target enumeration or we're looking at the privacy of specific political candidates, we need to go through and see what information is out there about them so that we know how malicious actors are going to go through and pivot from that information into the real world. That's one of the real benefits of OSINT for me, is taking a digital artifact and moving it into a physical world. It's the connection between the cyber and the real that really plays a big role in regards to personal security and privacy.

MATT ASHBURN:

Yeah, that's really really rewarding. Can you give us an example or tell us a little bit about one of the more successful investigations that you've conducted?

MICHAEL JAMES:

Sure, I think so. So, we have a lot of scrapers in regards to pastebin-type sites and things like that where people are constantly advertising. One of the ones that we went to and we were launching investigations from for the National Child Protection Task Force was trafficking. There were a lot of ads out there for human trafficking, and, you know, some of them are bogus and that's the reason why you look at them, right? You want to go through and vet for legitimacy in regards to other links you can go through and provide and make sure that if there is something to be found, that it's found and then disseminated to law enforcement or the appropriate legal organization, right? So when we start scraping these kind of platforms and they start leaving identifying markers like WhatsApp numbers, Proton accounts for email, usernames where they post on several different sites, or even the different pastebins, we're able to go through and take that information and enumerate that in regards to anything you would do with classic OSINT tradework, right? So username analysis, pivoting from social media site to social media site, backtracking in regards to historical views from the Wayback Machine. The thing about dark web and deep web analysis is that it seems very abrupt and scary, but the thing that you have to understand - this is with any technology - as long as you're focused on your core tenets, as long as you know what your tradecraft is and how to pivot from one piece of information into another, that's all you're really doing. And the technology is just a means of delivery, right? So as long as you stick to your standard tradecraft, you're able to go through and take a username and run it through the WhatsMyName app that Micah Hoffman developed a long time ago. 
Or you take an email address and you're able to go through and put it through the Epieos tool, and you're able to go through and find out other platforms it's been registered on or the services that maybe they registered for with it. So that's all going to go through and get you that much further into defining what verified information you can get. In that case that we were talking about with the trafficking, we were able to go through and link it to a LinkedIn profile that was actually selling themselves as immigration services when it really was a human trafficking situation and they were maliciously advertising to people to go through and get out of these war-torn countries in bad situations and then literally enslaving them in regards to debt that they didn't know they were taking on, all while having this shiny front on LinkedIn to go through and advertise and get endorsements from people and really kind of make it a very bad situation for these people. You know, legitimacy and then backing it with the actual actors on the deep web, dark web stuff is a really good way for us to go through and to expose some of the things that have actually even come out from the stuff.
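
The pivoting Michael describes, where the same contact detail shows up across several scraped posts, can be sketched roughly as follows. This is a minimal illustration and not his team's tooling: the regular expressions are deliberately loose, the function names extract_selectors and link_posts are hypothetical, and the sample posts are invented.

```python
import re
from collections import defaultdict

# Loose patterns for two kinds of "identifying markers": email addresses and
# international-format phone numbers (such as a WhatsApp contact number).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+\d[\d\s().-]{6,}\d")


def extract_selectors(text):
    """Return the normalized emails and phone numbers found in one post."""
    emails = {m.lower() for m in EMAIL_RE.findall(text)}
    # Strip spaces and punctuation so differently formatted numbers compare equal.
    phones = {re.sub(r"[^\d+]", "", m) for m in PHONE_RE.findall(text)}
    return emails | phones


def link_posts(posts):
    """Map each selector seen in more than one post to the sorted post IDs."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for selector in extract_selectors(text):
            index[selector].add(post_id)
    return {s: sorted(ids) for s, ids in index.items() if len(ids) > 1}


if __name__ == "__main__":
    # Invented sample data, standing in for scraped paste-site posts.
    sample = {
        "paste-1": "Contact Seller01@proton.me or WhatsApp +1 (555) 010-0199",
        "paste-2": "visa help, message +1 555 010 0199 today",
        "forum-7": "email seller01@proton.me for prices",
    }
    print(link_posts(sample))
```

Selectors that appear only once are dropped, so what remains is the short list of overlaps worth verifying by hand before anything goes to law enforcement.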

MATT ASHBURN:

That's incredible.

JEFF PHILLIPS:

That is super interesting. So it sounds like, despite when people think of the dark web or think of Tor, the idea of anonymity, that there's quite a bit of information that's going to flow back and forth between the dark or deep web and the surface web.

MICHAEL JAMES:

Sure. So that's just the overshared information, right? These people, especially if you think about darknet markets, people who are selling illicit drugs and things like that, it's kind of unfair to them, not because they're drug dealers or anything, but because they have to go through and build up these reputations on these darknet platforms. No one wants to trust each other on these platforms and they're all worried about rug pulls and getting their money stolen, so they have to build up this repository of good notifications and they get the most stars for whatever they do, right? So that allows us to go through and take that link in that market, that stall, that vendor, that username ID and profile an entity out of it, right? So that allows us to go through and say, okay, this person does these type of drug sales or this type of information brokering and they're on this site so then we can pattern them from other sites that they may be accessible on as well. And then at that point, if there are other pages that they put in regards to social media or if they have their own unique website, that they will brandish their goods, as it is, that allows us to go through and do technical analysis. When you go to those pages, they may be setting up kind of an off-the-shelf market or off-the-shelf kind of WordPress site or an Nginx site or something in Apache, and maybe they don't put all the privacy restrictions on there, maybe they leave the info status for the server, what else is running on there, and so we can go to that through URL jacking. You can go through and look for sitemaps through XML. At the end of it, just tag XML - or sitemap.xml. That can lead to some very interesting information because if they fill out the authorship of that XML sitemap, then you know who that is. 
We actually found one individual - more than one individual - but we found one individual who had a sitemap who listed himself as the author, gave himself the username, which we know from a previous case, and actually were able to go through and trace it back to the very beginning of that WordPress site, which gave us the creation date, which gave us an additional piece of information in an email and then everything else that was already listed that they thought was a secmail.pro, so it's a super secret Squirrel email address, but this other one was a ProtonMail account, and with ProtonMail we were able to go through and enumerate when they created that. So it's not the smoking gun all the time, but if you can layer more and more of these informational artifacts, then you can go through and continue to build a case and the longer term that you're able to go through and to investigate this specific area, subject, research kind of grant, the more that you're able to go through and pull out and contextualize. So it's really nice to go through and have that historical look in regards to some of that stuff.
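
A minimal sketch of the sitemap check described above, parsing a sitemap that has already been retrieved. One assumption is labeled here: the standard sitemap format has no author field, so this sketch looks for author-archive URLs of the kind WordPress-style sites expose, which may or may not be exactly what Michael's team found. The sample XML, the example.onion host, and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, urlparse

# Namespace used by the standard sitemap protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Invented sample standing in for a fetched sitemap.xml.
SAMPLE = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.onion/</loc><lastmod>2021-03-02</lastmod></url>
  <url><loc>http://example.onion/author/exampleuser/</loc></url>
</urlset>
"""


def sitemap_url(base_url):
    """'Just tag sitemap.xml on the end': build the sitemap URL for a site."""
    return urljoin(base_url, "/sitemap.xml")


def parse_sitemap(xml_text):
    """Return (loc, lastmod) pairs; lastmod is None when the entry omits it."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        entries.append((loc, lastmod))
    return entries


def author_slugs(entries):
    """Pull usernames out of author-archive URLs such as /author/<name>/."""
    slugs = []
    for loc, _ in entries:
        parts = [p for p in urlparse(loc).path.split("/") if p]
        if "author" in parts[:-1]:
            slugs.append(parts[parts.index("author") + 1])
    return slugs


if __name__ == "__main__":
    entries = parse_sitemap(SAMPLE)
    print(sitemap_url("http://example.onion/shop/item"), entries, author_slugs(entries))
```

The lastmod dates and author slugs are only leads. As in the case above, they matter when they line up with a username or email already known from somewhere else.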

MATT ASHBURN:

That's very cool. One thing that folks have heard mentioned is there's a switch from version two to version three. There's some change there in URLs and things like that. How does that affect the practice of OSINT research on the dark web?

MICHAEL JAMES:

Sure, yeah. Back in October they switched - Tor decided that - The Onion Router Project, excuse me, they decided they wanted to go through and move from a version two, which was a smaller character set in regards to the URLs, to a larger character set, which is more space for encryption and more anonymity. Also, it gives - just like with the problem that we're having with IPv4 and IPv6, they were running out of random characterizations in regards to sites for URLs. This allows them to go through and kind of almost infinitely be able to have The Onion domains, or whatever, and have a random character string so that it's affordable for everyone to go through and have their own domain. So what it does for us is it allows us to go through and delineate who is keeping current on their projects and who is not, right? It made all of us go through and update all of our mirror sites and make sure that we know what actors are actually following the trend. There is another way to go through and kind of look at this as well, which is that you have a deprecated system like v2 where there are people who are going to use that as, for lack of a better term, security through obscurity - if you don't upgrade to this version three, you can stay kind of hidden under the radar - and there is a way that you can deprecate your Tor browser and you can still search for those version two links there, so that's something we did figure out and we had to go through and enumerate. But that is something that allows us to go through and kind of keep, in a metadata sense, who is keeping current with this, who is making money off this, who is ashamed of downtime in regards to this stuff. So it puts the most serious actors kind of upfront because they're making that Patch Tuesday update, right? They're getting to that spot where they can be accessible and there's redundancies in place. 
So that puts a higher target on those people because they are more in line with what their business structure is for either sales or distribution or trafficking, depending upon what it is. So there's some contextual information you can get around that stuff, but there's a lot of sites that we used to use that unfortunately have kind of gone to the wayside and we can't go through and get those in our standard browsers now and sometimes that's not great, but social media sites and things like that, they were quick to jump on the version three and they knew exactly when it was coming, so they obviously are in it to win it for that stuff.
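
The version two versus version three distinction is visible in the address itself: v2 onion names were 16 base32 characters and v3 names are 56. A minimal sketch for sorting a list of saved links by version follows. The function name onion_version is hypothetical, and the check looks only at length and alphabet, without validating the v3 checksum.

```python
import re

# Onion names use the lowercase base32 alphabet: a-z and 2-7.
BASE32_LABEL = re.compile(r"[a-z2-7]+")


def onion_version(hostname):
    """Classify a hostname as 'v2', 'v3', 'invalid', or 'not-onion'."""
    labels = hostname.strip().lower().rstrip(".").split(".")
    if len(labels) < 2 or labels[-1] != "onion":
        return "not-onion"
    name = labels[-2]  # the label right before .onion; subdomains are ignored
    if not BASE32_LABEL.fullmatch(name):
        return "invalid"
    # 16 characters means a deprecated v2 name, 56 a current v3 name.
    return {16: "v2", 56: "v3"}.get(len(name), "invalid")


if __name__ == "__main__":
    # Placeholder names built from repeated letters, not real services.
    for host in ["a" * 16 + ".onion", "b" * 56 + ".onion", "example.com"]:
        print(host[:20], onion_version(host))
```

Run over an old bookmark list, a check like this separates links that stopped resolving with the v2 retirement from actors who migrated to v3.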

JEFF PHILLIPS:

Well, Michael, one of the things we like to help the audience with is any kind of resources - is resources. You've dropped a bunch as we've gone through here, but maybe as a last question, if you're not doing dark web research all the time or if you're not familiar with it at all, you know, any suggestions on resources on how to get started and learn - how to learn - how to find the information that you might be looking for?

MICHAEL JAMES:

Sure. I know you guys have done a couple of different shows in regards to this, so I won't beat everyone over the head about how to go through and download the Tor browser. What I will say, and I say this to veterans and I say this to people who are just getting started, operational security is your number one concern, right? So you don't want to go through and go to the Tor browser and then look up your bank sites, right? It's important to go through and understand that there are JavaScript modules in some sites that will try to trigger and will try to go through and do malicious things to your device, right? There are some myths in regards to blowing it up full size versus keeping it compact in regards to fingerprinting, and fingerprinting is a real thing, whatever. So people can go through and look at your browser and your machine and see what your fingerprint provides them by going to their site. So operational security is first and foremost the number one thing you should consider. So Chris Hadnagy and a couple of other people used to go through and say what you do before matters, right? So think about your operational mindset and if you're just going and you're just doing research as a hobbyist, that's totally fine. Maybe you want to pick up something like Tails OS, which is an operating system that is designed to go through and to help with anonymity. If you're doing this as a daily researcher and you have a VM, which I recommend, that way you can go through and you can implement Tor and you can do snapshots, you can revert back, you can clear all the cache and all that stuff - that's a repeatable process. There are a lot of things you can do as a third party site if you have the money to go through and pay, or if you're law enforcement or military and you don't have the ability to go through and put code on your computer, you can't get to Sudo in regards to your Linux station. Echosec is a very good platform, that's a paid tool. 
There are a lot of other sites like ahmia.fi that will allow you to go through and do strings to go through and search for certain keywords and things like that. There's a lot that you can do as a hobbyist or as a professional. One thing I will do for you guys is I actually have a GitHub repository and I will link some of the better tools and some of the more advanced things to go through and do, like Google dorking searches through Katana and stuff like that, or for how to get started in regards to that sort of stuff, because it's very daunting if you don't know. And if you just walk into it, you really can go through and either see something you don't want to see and you'll never be able to unsee or you can actually get hurt and revert to some of the stuff. So it's very important to tread lightly. I know that I've interacted with Trace Labs, which is a group that does missing persons cases and I know that we've done some - something through OSINT Curious to go through and just talk about operational security. It is one of those things that you really want to go through and understand what you're doing on the deep web before you go through and start interacting with it. Because like I said, if you start - as with anything, if you start clicking links, you could unfortunately get to a CAPTCHA that allows them to go through and manipulate your browser or download something, or if there's a PDF or a malicious doc, you know - it's just, it's basic cybersecurity at this point that you just want to understand what you're doing and not click all the links unless you're in a sandbox or a good environment, right?

MATT ASHBURN:

That's always good advice, and as a cybersecurity guy, I appreciate you mentioning that. That's always an important topic.

MICHAEL JAMES:

It's not the sexiest, but it is one of those things that will keep you safe, right?

MATT ASHBURN:

It is. That paired with OPSEC as well, you mentioned that a few times today, right? Those are incredibly important concepts, regardless of whatever you're doing, but especially important for OSINT, so I appreciate that. Michael James, thank you so much for joining us, we learned a lot. It was great having you on the show this week, and thanks again for being here. Thanks to everyone else at home for tuning into this week's show. To find out more about the OSINT Curious Project, you can visit their website osintcurio.us - that's OSINT Curious, but you break off the .us at the end - or you can join their Discord community. Keep in mind they're also a nonprofit organization and you can donate to the project if you'd like via the Patreon link on their website. You can find the link to the website and more information about OSINT Curious Project in our show notes. If you liked what you heard today, you can subscribe to our show wherever you get your podcasts. You can also watch episodes on YouTube and view transcripts and other show information on our website. That's at authentic8.com/needlestack. That's authentic with the number eight dot com slash needlestack. We'll be back next week with more on the dark web, talking to other folks who specialize in dark web research. We hope to see you then.
