AI-Powered Dark Web Intelligence & Threat Classification
A full-stack framework combining automated .onion scraping, transformer-based NLP classification, and interactive BI dashboards to transform unstructured darknet data into actionable cybersecurity intelligence.
The dark web hosts thousands of anonymous .onion marketplaces facilitating drug trafficking, stolen credentials, hacking services, counterfeit goods, and cybercrime-as-a-service — shielded by Tor anonymity, CAPTCHAs, and constantly shifting infrastructure.
NeoSilk is an AI-powered threat intelligence framework built to penetrate this environment. It automates collection from live darknet markets, applies transformer-based NLP to classify and interpret listings, and surfaces findings through dashboards for cybersecurity analysts.
The system bridges the critical gap between the dark web's operational opacity and the real-time visibility required for proactive cyber defense.
Python scrapers over Tor extract product listings, vendor profiles, pricing, and review text from live .onion marketplaces with CAPTCHA resolution support.
Fine-tuned transformer models classify products into threat categories: Drugs, Digital, Fraud, Guides. Covers stimulants, benzos, psychedelics, carding, and hacking services.
SHAP-based explainability surfaces exact token-level attributions driving each classification — essential for analyst trust and operational deployment in high-stakes security contexts.
Retrieval-Augmented Generation powered by DistilGPT-2 enables natural language analyst queries directly against indexed darknet content without structured query syntax.
RoBERTa-based sentiment extraction on buyer and seller review text. Produces vendor trust scores and surfaces reputation patterns and emerging threat signals from marketplace community data.
Power BI dashboards for cybersecurity analysts — KPIs across 26K+ listings, category distribution, vendor rankings, geographic shipping analysis, conversion funnels, and stock intelligence.
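The vendor trust scores mentioned above could be derived from per-review sentiment labels in several ways, and the source does not specify a formula. The following is a minimal sketch, assuming each review has already been labeled `positive`, `neutral`, or `negative` by the sentiment model. The function name `vendor_trust_scores`, the label set, and the `prior` / `prior_weight` smoothing are illustrative assumptions, not part of NeoSilk.

```python
from collections import defaultdict

def vendor_trust_scores(reviews, prior=0.5, prior_weight=5):
    """Aggregate per-review sentiment labels into a 0..1 score per vendor.

    reviews: iterable of (vendor, label) pairs, where label is one of
             "positive", "neutral", "negative" (hypothetical label set).
    The score is a smoothed mean, so vendors with few reviews stay
    close to `prior` instead of jumping to 0 or 1.
    """
    value = {"positive": 1.0, "neutral": 0.5, "negative": 0.0}
    totals = defaultdict(float)
    counts = defaultdict(int)
    for vendor, label in reviews:
        totals[vendor] += value[label]
        counts[vendor] += 1
    return {
        v: (totals[v] + prior * prior_weight) / (counts[v] + prior_weight)
        for v in counts
    }

# Hypothetical sample data for illustration only.
sample = [
    ("vendor_a", "positive"), ("vendor_a", "positive"), ("vendor_a", "negative"),
    ("vendor_b", "negative"),
]
scores = vendor_trust_scores(sample)
```

Smoothing toward a neutral prior is one possible design choice: it keeps a vendor with a single negative review from ranking below a vendor with many mixed reviews purely because of sample size.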
Route all traffic through Tor. Anonymize connections to .onion addresses.
Extract listings, vendors, prices, reviews from Hidden Market & MGM Grand.
Manual + ML-assisted resolution. 5K image dataset collected live from MGM Grand.
BERT / DarkBERT / RoBERTa classify threats. Sentiment on reviews. RAG Q&A.
Token-level attribution for every prediction. Full analyst interpretability.
Power BI surfaces KPIs, trends, vendor maps, and actionable threat intel.
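The token-level attribution step can be illustrated without the full model. The sketch below uses leave-one-out occlusion, a simpler stand-in for SHAP rather than SHAP itself: each token's attribution is the drop in the class score when that token is removed. The `toy_score` function and its keyword weights are hypothetical placeholders for a fine-tuned classifier's class probability.

```python
def toy_score(tokens):
    """Hypothetical stand-in for a classifier's score for one class."""
    weights = {"cvv": 0.5, "fullz": 0.3, "fresh": 0.1}
    return sum(weights.get(t.lower(), 0.0) for t in tokens)

def occlusion_attributions(tokens, score_fn):
    """Leave-one-out attribution: score drop when each token is removed."""
    base = score_fn(tokens)
    return [
        (tok, base - score_fn(tokens[:i] + tokens[i + 1:]))
        for i, tok in enumerate(tokens)
    ]

# Hypothetical listing title for illustration only.
listing = "Fresh CVV fullz with balance".split()
attributions = occlusion_attributions(listing, toy_score)

# Token whose removal lowers the score the most.
top_token = max(attributions, key=lambda pair: pair[1])[0]
```

With a real model, `score_fn` would wrap the classifier's predicted probability for the class of interest. SHAP averages such contributions over many token subsets, which occlusion only approximates.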
DarkBERT: a RoBERTa-based model further pre-trained on a dark web corpus. Natively understands darknet jargon, coded language, and marketplace-specific terminology, and is the highest-performing model for this domain.
Bidirectional Encoder Representations from Transformers (BERT). Fine-tuned on scraped darknet data for multi-class product threat categorization across Drugs, Digital, Fraud, and Tutorials.
Robustly Optimized BERT Pretraining Approach (RoBERTa). Applied to vendor review sentiment extraction across both marketplaces, producing trust scores and flagging negative vendor-product patterns from community feedback.
Lightweight distilled GPT-2 powers the Retrieval-Augmented Generation layer. Enables natural language analyst queries against the indexed darknet corpus without structured query syntax.
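The retrieval half of a RAG layer can be sketched independently of the generator. The example below is a minimal illustration, assuming a bag-of-words TF-IDF index with cosine similarity; the source does not state which retriever or embedding NeoSilk uses. The sample documents and the helpers `build_index`, `retrieve`, and `build_prompt` are hypothetical. The resulting prompt would then be passed to the generation model.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def build_index(docs):
    """Return per-document TF-IDF vectors and the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(len(docs) / n) + 1.0 for t, n in df.items()}
    vectors = [
        {t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized
    ]
    return vectors, idf

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, vectors, idf, k=2):
    """Rank documents by cosine similarity to the query, return the top k."""
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    ranked = sorted(
        range(len(docs)), key=lambda i: cosine(q, vectors[i]), reverse=True
    )
    return [docs[i] for i in ranked[:k]]

def build_prompt(query, passages):
    """Assemble retrieved passages and the question into a generator prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Hypothetical indexed listings for illustration only.
docs = [
    "vendor ships stimulants from germany with escrow",
    "stolen credit card dumps sold in bulk",
    "guide to phishing kits for beginners",
]
vectors, idf = build_index(docs)
hits = retrieve("credit card dumps", docs, vectors, idf, k=1)
prompt = build_prompt("credit card dumps", hits)
```

Keeping retrieval separate from generation means the small generator only sees a few relevant passages, which matters for a model with a short context window.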
5,000 alphanumeric CAPTCHA images collected during live scraping of MGM Grand darknet marketplace. Released publicly to support CAPTCHA-solving model research and the security community.
View on Kaggle: 5K CAPTCHA Images

| Vendor | Units |
|---|---|
| danielvitor61 | 3,700 |
| thepirateisland | 3,400 |
| heartkidnapper | 2,600 |
| kingaccount | 2,100 |
| drunkdragon | 2,000 |

| Metric | Value |
|---|---|
| Conversion Rate | 2% |
| Avg Units Sold | 5.89 |
| Avg Views/Product | 329 |
| Escrow Coverage | ~90% |

| Vendor | Qty |
|---|---|
| 4free | 21,000 |
| sexman66 | 13,000 |
| greenleafde | 12,000 |
| LucySkyDiam… | 11,000 |
| Vanny3 | 11,000 |

| Product | Qty |
|---|---|
| 20300 XANAX | 6,500 |
| 20200 S903 Green | 5,900 |
| LSD 125ug Bulk | 4,900 |
| FREE SHIP B9 | 3,900 |
| 1 Active Listing | 3,900 |
This project was developed exclusively for educational and cybersecurity research purposes under academic supervision at Cairo University's Faculty of Computers and Artificial Intelligence. All data collection followed established ethical guidelines for security research. NeoSilk is a research instrument designed to help cybersecurity professionals gain proactive threat intelligence — its sole purpose is improving defensive capabilities. The CAPTCHA dataset and framework are released to advance the community's ability to study and counter darknet threats.