DARK WEB THREAT INTELLIGENCE FRAMEWORK

NeoSilk

AI-Powered Dark Web Intelligence & Threat Classification

A full-stack framework combining automated .onion scraping, transformer-based NLP classification, and interactive BI dashboards to transform unstructured darknet data into actionable cybersecurity intelligence.

26K+ Products Indexed
310K Units Purchased
5K CAPTCHA Images
1,212 Vendors Tracked
Framework Overview

What is NeoSilk?

The dark web hosts thousands of anonymous .onion marketplaces facilitating drug trafficking, stolen credentials, hacking services, counterfeit goods, and cybercrime-as-a-service — shielded by Tor anonymity, CAPTCHAs, and constantly shifting infrastructure.

NeoSilk is an AI-powered threat intelligence framework built to penetrate this environment. It automates collection from live darknet markets, classifies threats with fine-tuned transformer models, and surfaces the findings in dashboards for cybersecurity analysts.

The system bridges the critical gap between the dark web's operational opacity and the real-time visibility required for proactive cyber defense.

Python, Tor Network, BERT, DarkBERT, RoBERTa, SHAP / XAI, Power BI, RAG / QA, DistilGPT-2, .onion Scraping
Core Capabilities

What NeoSilk Does

01 — COLLECTION

Secure Darknet Scraping

Python scrapers over Tor extract product listings, vendor profiles, pricing, and review text from live .onion marketplaces with CAPTCHA resolution support.

Tor Network, Python Scraper, CAPTCHA Bypass, .onion Sites
02 — CLASSIFICATION

NLP Threat Classification

Fine-tuned transformer models classify products into threat categories: Drugs, Digital, Fraud, Guides. Covers stimulants, benzos, psychedelics, carding, and hacking services.

BERT, DarkBERT, RoBERTa, Multi-class
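The last step of a multi-class classifier like the one described above can be sketched as follows: the model emits one raw score (logit) per category, and a softmax turns those scores into probabilities. This is a minimal illustration, not NeoSilk's code; the category order and the example logits are assumptions.

```python
import math

# Category names follow the list in the text; their order here is an assumption.
CATEGORIES = ["Drugs", "Digital", "Fraud", "Guides"]

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_category(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CATEGORIES[best], probs[best]

# Hypothetical logits for one listing, not real model output.
label, confidence = predict_category([2.1, 0.3, -0.5, -1.2])
print(label, round(confidence, 3))
```

Keeping the probability alongside the label lets an analyst filter out low-confidence predictions before they reach a dashboard.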
03 — EXPLAINABILITY

Explainable AI via SHAP

SHAP-based explainability surfaces the token-level attributions driving each classification, which is essential for analyst trust and operational deployment in high-stakes security contexts.

SHAP Values, Feature Attribution, XAI
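To make the idea of token-level attribution concrete, the sketch below computes exact Shapley values for a toy scoring function by enumerating every token subset. The SHAP library approximates these values for real models; the keyword weights and the `score` function here are hypothetical stand-ins for a trained classifier, and the enumeration only scales to a handful of distinct tokens.

```python
from itertools import combinations
from math import factorial

# Hypothetical keyword weights standing in for a trained classifier.
WEIGHTS = {"bulk": 0.5, "shipping": 0.1, "escrow": 0.2}

def score(tokens):
    """Toy model: additive keyword score plus one interaction term."""
    total = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    if "bulk" in tokens and "escrow" in tokens:
        total += 0.2  # interaction bonus shared by the two tokens
    return total

def shapley_values(tokens, value_fn):
    """Exact Shapley value per token via subset enumeration (distinct tokens only)."""
    n = len(tokens)
    result = {}
    for i, token in enumerate(tokens):
        others = tokens[:i] + tokens[i + 1:]
        phi = 0.0
        for size in range(n):
            # Probability weight of a coalition of this size preceding the token.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_token = value_fn(list(subset) + [token])
                without_token = value_fn(list(subset))
                phi += weight * (with_token - without_token)
        result[token] = phi
    return result

tokens = ["bulk", "shipping", "escrow"]
attributions = shapley_values(tokens, score)
for token, value in attributions.items():
    print(f"{token}: {value:.2f}")
```

The attributions sum to the difference between the full score and the empty-input score, which is the property that makes them readable as a per-token breakdown of one prediction.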
04 — UNDERSTANDING

RAG-Based Q&A Intelligence

Retrieval-Augmented Generation powered by DistilGPT-2 enables natural language analyst queries directly against indexed darknet content without structured query syntax.

RAG, DistilGPT-2, Semantic Search
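The retrieval half of a RAG pipeline can be sketched in a few lines: rank indexed passages against the analyst's question, then place the best matches in the prompt handed to the generator. The example below swaps in plain bag-of-words cosine similarity for whatever semantic search NeoSilk actually uses, and the passages are placeholders rather than scraped data.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Assemble retrieved context and the question into one generator prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Placeholder passages, not scraped data.
docs = [
    "Vendor alpha ships from Germany and accepts escrow.",
    "Listing beta is a digital guide with instant delivery.",
    "Vendor gamma has mixed reviews about shipping delays.",
]
question = "Which vendor accepts escrow?"
top = retrieve(question, docs, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a full pipeline the resulting prompt would be passed to the generative model, which is omitted here to keep the sketch dependency-free.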
05 — SENTIMENT

Review Sentiment Analysis

RoBERTa-based sentiment extraction on buyer and seller review text. Produces vendor trust scores and surfaces reputation patterns and emerging threat signals from marketplace community data.

Sentiment NLP, Vendor Profiling, Trust Scoring
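One plausible way to turn per-review sentiment labels into a vendor trust score is a smoothed share of positive reviews. The source does not state NeoSilk's formula, so the function, the smoothing weight, and the sample labels below are all assumptions for illustration.

```python
from collections import defaultdict

def trust_scores(reviews, prior_weight=2):
    """Smoothed share of positive reviews per vendor, in [0, 1].

    `reviews` is an iterable of (vendor, label) pairs with label in
    {"positive", "negative"}. The smoothing pulls vendors with few
    reviews toward a neutral 0.5.
    """
    positive = defaultdict(int)
    total = defaultdict(int)
    for vendor, label in reviews:
        total[vendor] += 1
        if label == "positive":
            positive[vendor] += 1
    return {
        v: (positive[v] + prior_weight * 0.5) / (total[v] + prior_weight)
        for v in total
    }

# Hypothetical labels, as a sentiment model might emit them.
sample = [
    ("vendor_a", "positive"), ("vendor_a", "positive"), ("vendor_a", "negative"),
    ("vendor_b", "negative"),
]
scores = trust_scores(sample)
print(scores)
```

The smoothing term keeps a vendor with a single negative review from being scored as harshly as one with hundreds.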
06 — VISUALIZATION

Interactive BI Dashboards

Power BI dashboards for cybersecurity analysts — KPIs across 26K+ listings, category distribution, vendor rankings, geographic shipping analysis, conversion funnels, and stock intelligence.

Power BI, KPI Monitoring, Geo Intel, Vendor Analytics
System Architecture

Intelligence Pipeline

01
Tor Access

Route all traffic through Tor. Anonymize connections to .onion addresses.

02
Scrape Markets

Extract listings, vendors, prices, reviews from Hidden Market & MGM Grand.

03
CAPTCHA Solve

Manual + ML-assisted resolution. 5K image dataset collected live from MGM Grand.

04
NLP Classification

BERT / DarkBERT / RoBERTa classify threats. Sentiment on reviews. RAG Q&A.

05
SHAP XAI

Token-level attribution for every prediction. Full analyst interpretability.

06
Intel Dashboard

Power BI surfaces KPIs, trends, vendor maps, and actionable threat intel.

AI Model Stack

NLP Models

DarkBERT
PRIMARY THREAT CLASSIFIER
Domain-Trained

Pre-trained on a dark web corpus, so it handles darknet jargon, coded language, and marketplace-specific terminology natively. It is the highest-scoring classifier in this stack (93% accuracy, against 88% for the fine-tuned BERT).

Classification Accuracy: 93%
Domain Relevance: 97%
BERT
FINE-TUNED CLASSIFIER
Transformer

Bidirectional Encoder Representations from Transformers. Fine-tuned on scraped darknet data for multi-class product threat categorization across Drugs, Digital, Fraud, and Tutorials.

Classification Accuracy: 88%
Weighted F1 Score: 86%
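For readers unfamiliar with the metric, weighted F1 is the per-class F1 averaged with each class weighted by its number of true examples. The sketch below computes it from a confusion matrix; the two-class counts are hypothetical and are not NeoSilk's evaluation data.

```python
def weighted_f1(confusion):
    """Support-weighted mean of per-class F1 from a square confusion matrix.

    confusion[i][j] counts items whose true class is i and predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    score = 0.0
    for i in range(n):
        tp = confusion[i][i]
        support = sum(confusion[i])                          # true members of class i
        predicted = sum(confusion[r][i] for r in range(n))   # items predicted as class i
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * support / total
    return score

# Hypothetical counts for two classes (rows: true, columns: predicted).
example = [
    [8, 2],
    [1, 9],
]
print(round(weighted_f1(example), 4))
```

Because the average is weighted by class size, a model can score well here while still doing poorly on a rare category, which is worth checking when categories are as unbalanced as marketplace listings tend to be.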
RoBERTa
SENTIMENT ANALYSIS
Fine-Tuned

Robustly Optimized BERT Pretraining Approach. Applied to vendor review sentiment extraction across both marketplaces, producing trust scores and flagging negative vendor-product patterns from community feedback.

Sentiment F1: 91%
Review Coverage: 94%
DistilGPT-2
RAG-BASED Q&A
Generative

Lightweight distilled GPT-2 powers the Retrieval-Augmented Generation layer. Enables natural language analyst queries against the indexed darknet corpus without structured query syntax.

Q&A Relevance Score: 79%
Retrieval Precision: 83%
Data Sources

Scraped Marketplaces

Hidden Market
*.onion — No CAPTCHA — Full Scrape Access
8.2K Products
331 Vendors
203K Units Sold
MGM Grand
*.onion — CAPTCHA Protected — Manual Resolution
18.2K Products
881 Sellers
107K Units Purchased
Public CAPTCHA Dataset — Kaggle

5,000 alphanumeric CAPTCHA images collected during live scraping of the MGM Grand darknet marketplace. Released publicly to support CAPTCHA-solving model research and the security community.

Intelligence Dashboards

Threat Intel Visualization

NEOSILK — MGM GRAND THREAT DASHBOARD — POWER BI
KPI cards:
Total Products: 18,233 active listings
Total Sellers: 881 vendor profiles
Units Purchased: 107K (transaction volume)
Also shown: Total Views (marketplace traffic), Avg Price in USD (per-product average), In Stock Rate (availability)
Category Distribution — Products by Type
Threat Category Split (chart total: 16.3K)
Stimulants        3.1K
Cannabis          2.8K
Psychedelics      1.6K
Benzos            1.3K
Digital / Porn    2.3K
Other             5.1K
Top Vendors by Volume
Vendor             Units
danielvitor61      3,700
thepirateisland    3,400
heartkidnapper     2,600
kingaccount        2,100
drunkdragon        2,000
Ships To Distribution
Key Intel Metrics
Metric               Value
Conversion Rate      2%
Avg Units Sold       5.89
Avg Views/Product    329
Escrow Coverage      ~90%
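The MGM Grand metrics above can be cross-checked against each other with simple arithmetic. The sketch assumes that the conversion rate means units sold per product view, which the dashboard does not state explicitly.

```python
# Figures quoted on the MGM Grand dashboard above.
active_listings = 18_233
avg_units_sold = 5.89
avg_views_per_product = 329

# Assumption: conversion rate = units sold per product view.
conversion_rate = avg_units_sold / avg_views_per_product

# Average units per listing times the listing count should approximate total units.
estimated_units = avg_units_sold * active_listings

print(f"conversion rate ~ {conversion_rate:.1%}")
print(f"estimated units purchased ~ {estimated_units / 1000:.0f}K")
```

Under that assumption the rate comes out at about 1.8%, which rounds to the reported 2%, and the estimated volume of roughly 107K matches the Units Purchased figure listed for this marketplace.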
NEOSILK — HIDDEN MARKET THREAT DASHBOARD — POWER BI
KPI cards:
Total Products: 8,161 active listings
Total Vendors: 331 vendor profiles
Total Purchased: 203K (transaction volume)
Also shown: Avg Price in USD (per-product average), In Stock Rate (availability), Escrow Rate (secure transactions)
Main Category Distribution
Threat Category Split (chart total: 8.2K)
Drugs            4.87K    59%
Fraud            1.26K    15%
Digital Goods    1.12K    14%
Other            0.96K    12%
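The percentage shares in the Hidden Market split follow directly from the listing counts. A short check, using only the figures quoted above:

```python
# Listing counts (in thousands) from the Hidden Market category split.
counts = {"Drugs": 4.87, "Fraud": 1.26, "Digital Goods": 1.12, "Other": 0.96}

total = sum(counts.values())

# Share of each category, rounded to the nearest whole percent.
shares = {name: round(100 * value / total) for name, value in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The counts sum to about 8.21K, in line with the 8.2K chart total, and the rounded shares reproduce the 59%, 15%, 14%, and 12% shown on the dashboard.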
Top Vendors by Volume Sold
Vendor          Qty
4free           21,000
sexman66        13,000
greenleafde     12,000
LucySkyDiam…    11,000
Vanny3          11,000
Ships From Distribution
Most Purchased Products
Product             Qty
20300 XANAX         6,500
20200 S903 Green    5,900
LSD 125ug Bulk      4,900
FREE SHIP B9        3,900
1 Active Listing    3,900
Ethical Framework

Research Ethics

Ethical Use Statement

This project was developed exclusively for educational and cybersecurity research purposes under academic supervision at Cairo University's Faculty of Computers and Artificial Intelligence. All data collection followed established ethical guidelines for security research. NeoSilk is a research instrument designed to help cybersecurity professionals gain proactive threat intelligence — its sole purpose is improving defensive capabilities. The CAPTCHA dataset and framework are released to advance the community's ability to study and counter darknet threats.