2026-04-21 · 11 min read · by tgadsspy research

How tgadsspy Works: Technical Deep Dive into the Classifier and Ingest Pipeline

Technical documentation of the tgadsspy data pipeline — gramesh API integration, niche classification architecture (regex + weights), geo classifier 3-step pipeline, media mirror SHA256 content-addressed storage, and aggregation caching. For developers, researchers and compliance teams.

#methodology #technical #pipeline #classifier #osint

Contents

  • Purpose and audience
  • 1. Data source: gramesh API
  • 2. Channel pool and tiering
  • 3. Creative deduplication
  • 4. Niche classification
  • 5. Geo classification
  • 6. Media mirror
  • 7. Advertiser extraction
  • 8. Aggregation caching
  • 9. Discover-similar BFS spider
  • 10. Data completeness and known gaps
  • 11. Data licensing and citation
  • Related documentation

Purpose and audience

This document is a technical deep-dive into how Telegram Ads Spy collects, classifies, and serves Telegram ad data. It supplements the shorter methodology overview at /about with implementation-level detail: API integration, classification logic, storage architecture, and caching. Primary audience: developers building on the public API, researchers who need to understand data provenance for citation, and compliance teams assessing the system's OSINT methodology.


1. Data source: gramesh API

All ad data in Telegram Ads Spy originates from a single source: the gramesh HTTP API at api.wall.systems/gramesh. gramesh is a proxy/aggregation layer over Telegram's MTProto protocol, exposing REST endpoints that return structured JSON. Telegram Ads Spy uses gramesh exclusively — no direct MTProto implementation, no scraping.

Key endpoints used

POST /channels.getSponsored

  • Input: { channel_id: <int>, dc_id: <int> }
  • Output: array of sponsored message objects for the given channel, in the given Telegram data-centre region
  • Includes: title, text, ctaUrl, ctaLabel, accentColor, mediaType, mediaUrl, ctaTargetUsername
  • Media URLs: signed, 1-hour TTL (/files/photo/<id>?sig=&exp=)
  • Rate limit: 10 RPS against gramesh; Telegram Ads Spy throttles ingest cron to stay within this

POST /channels.getInfo

  • Input: { username: <string> | id: <int> }
  • Output: channel metadata — id, title, username, description, memberCount, avatarUrl
  • Used by the resolver-cron to hydrate Channel placeholders

POST /contacts.search

  • Input: { q: <string> }
  • Output: array of channel objects matching the query
  • Used by the discover-cron with 47 rotating seed queries
  • Used by /api/v1/submit when a user submits a t.me URL

POST /channels.getSimilar

  • Input: { channel_id: <int> }
  • Output: similar channels as recommended by Telegram's internal similarity model
  • Used by the discover-similar in-process BFS spider

Fan-out by region

Telegram has multiple data-centre clusters (DC1–DC5). Sponsored messages are region-specific: a channel in DC2 may show different sponsored messages than the same channel viewed from DC4. Telegram Ads Spy performs multi-region ingest for high-tier channels: the same channel is queried against multiple dc_id values to increase creative coverage. This is the mechanism behind "seen in multiple geos" — different DCs serve different advertiser targeting.
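
The multi-region fan-out and the 10 RPS limit can be pictured with a small sketch. The helper names `fanOut` and `schedule`, the channel id, and the choice of data centres are illustrative assumptions, not the production ingest code; only the request shape `{ channel_id, dc_id }` comes from the endpoint description above.

```typescript
// Request body for POST /channels.getSponsored, as described above.
interface SponsoredRequest {
  channel_id: number;
  dc_id: number;
}

// Hypothetical helper: one request per (channel, data centre) pair.
function fanOut(channelId: number, dcIds: number[]): SponsoredRequest[] {
  return dcIds.map((dcId) => ({ channel_id: channelId, dc_id: dcId }));
}

// Hypothetical helper: space requests evenly, 1000 / maxRps milliseconds apart,
// and return the planned start offset for each one.
function schedule<T>(requests: T[], maxRps: number): { atMs: number; request: T }[] {
  const gapMs = 1000 / maxRps;
  return requests.map((request, i) => ({ atMs: Math.round(i * gapMs), request }));
}

// Illustrative values: channel 12345 queried against three data centres.
const ingestPlan = schedule(fanOut(12345, [1, 2, 4]), 10);
console.log(ingestPlan);
```

Even spacing is the simplest way to respect a per-second budget; a token bucket would allow short bursts, but the article does not say which approach the ingest cron uses.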


2. Channel pool and tiering

The ingest pipeline operates on a pool of more than 9,000 channels. Channels are tiered by member count, which determines ingest frequency:

Tier        | Member count | Ingest interval | Rationale
S (Super)   | 1M+          | 30 minutes      | High ad density, fast creative turnover
A           | 100k–1M      | 2 hours         | Active market, moderate turnover
B           | 10k–100k     | 8 hours         | Adequate for daily coverage
C           | 1k–10k       | 72 hours        | Low ad density, spot-checking sufficient
Placeholder | Unknown      | Not ingested    | Awaiting resolver to hydrate

The resolver-cron (every 15 min) picks up placeholder channels — those submitted via seed batches, user submission, or discovery — and calls /channels.getInfo to populate memberCount, title, and avatarUrl. Once resolved, the channel is tiered and added to the ingest queue.
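
As a rough illustration of the tier table, the sketch below maps a member count to a tier and decides whether a channel is due for ingest. The function names are hypothetical, and treating each lower bound as inclusive is an assumption, since the table's ranges share their boundary values.

```typescript
type Tier = "S" | "A" | "B" | "C";

const HOUR_MS = 60 * 60 * 1000;

// Ingest interval per tier, taken from the table above.
const INGEST_INTERVAL_MS: Record<Tier, number> = {
  S: 0.5 * HOUR_MS,
  A: 2 * HOUR_MS,
  B: 8 * HOUR_MS,
  C: 72 * HOUR_MS,
};

// Returns null for placeholders (unknown member count) and for channels below
// the smallest tier. Assumption: lower bounds are inclusive.
function tierFor(memberCount: number | null): Tier | null {
  if (memberCount === null) return null;
  if (memberCount >= 1000000) return "S";
  if (memberCount >= 100000) return "A";
  if (memberCount >= 10000) return "B";
  if (memberCount >= 1000) return "C";
  return null;
}

// A channel that has never been ingested is always due.
function isDue(tier: Tier, lastIngestAtMs: number | null, nowMs: number): boolean {
  if (lastIngestAtMs === null) return true;
  return nowMs - lastIngestAtMs >= INGEST_INTERVAL_MS[tier];
}
```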


3. Creative deduplication

When the ingest cron calls /channels.getSponsored and receives creatives, deduplication happens before any storage:

creative fingerprint = sha256(title + text + ctaUrl + ctaLabel + accentColor)

A creative is considered new only if its fingerprint hasn't been seen before. This means:

  • The same ad running in 100 channels produces one AdCreative record (not 100)
  • Each appearance in a channel produces one SponsoredImpression record pointing to the AdCreative
  • Minor variations (different accentColor with same text) are treated as different creatives — intentional, as colour variants are used in A/B testing

Creative lifecycle: A creative is first seen when its fingerprint first appears. It's considered "active" while it continues appearing in new ingest ticks. AdCreative.lastSeenAt is updated on each new impression. When a creative stops appearing, it transitions to inactive naturally — no explicit deactivation signal from gramesh.
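
A minimal sketch of the fingerprint and the new-creative check, assuming all five fields are strings and using an in-memory `Set` as a stand-in for the AdCreative lookup. The function names are hypothetical; the concatenation order follows the formula above.

```typescript
import { createHash } from "node:crypto";

interface CreativeFields {
  title: string;
  text: string;
  ctaUrl: string;
  ctaLabel: string;
  accentColor: string;
}

// sha256(title + text + ctaUrl + ctaLabel + accentColor), hex-encoded.
function creativeFingerprint(c: CreativeFields): string {
  const input = c.title + c.text + c.ctaUrl + c.ctaLabel + c.accentColor;
  return createHash("sha256").update(input, "utf8").digest("hex");
}

// Hypothetical in-memory stand-in for the AdCreative table.
const seenFingerprints = new Set<string>();

// True only the first time a given fingerprint is observed.
function isNewCreative(c: CreativeFields): boolean {
  const fp = creativeFingerprint(c);
  if (seenFingerprints.has(fp)) return false;
  seenFingerprints.add(fp);
  return true;
}
```

One design note: plain concatenation cannot distinguish field boundaries (for example, "ab" + "c" and "a" + "bc" hash identically). A delimiter or length prefix between fields would remove that ambiguity, though the formula above shows no separator.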


4. Niche classification

Every AdCreative is classified into one or more niches. The classifier is a weighted keyword-plus-brand-detection system implemented in lib/niche.ts.

Architecture

The classifier operates on the concatenated text of title + text + ctaUrl. It runs two passes:

Pass 1: Brand detection. A lookup against a dictionary of ~400 known advertiser brands, mapped to their primary niche. Example entries:

  • binance → crypto
  • 1xbet → gambling
  • nordvpn → vpn
  • exness → forex
  • dream11 → betting

Brand matches carry high weight (w=10) and dominate the classification when present. The brand dictionary is maintained in lib/niche-brands.ts and updated as new advertisers are identified.

Pass 2: Keyword scoring. For each niche, a set of regex patterns is evaluated against the creative text. Each match adds a positive weight to that niche's score. Patterns are designed to avoid false positives through:

  • Specificity: "USDT P2P exchange" is a crypto signal; "exchange" alone is too generic
  • Negation rules: Some patterns carry negative weight to suppress false positives (e.g., "slot" appearing in a tech context)
  • Language variants: Patterns include Arabic, Russian, Indonesian, Thai and other language variants for the major niches

Score threshold: A niche is assigned if its score exceeds a minimum threshold. Multiple niches can be assigned — a creative can be gambling + crypto if it's a crypto-casino (e.g., BC.Game).
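
The two passes can be sketched roughly as follows. Only the brand entries and the brand weight of 10 come from the text above; the keyword patterns, their weights, and the threshold value are illustrative assumptions, and this is not the code in lib/niche.ts.

```typescript
// Excerpt of the brand dictionary, using the example entries above.
const BRANDS: Record<string, string> = {
  binance: "crypto",
  "1xbet": "gambling",
  nordvpn: "vpn",
  exness: "forex",
  dream11: "betting",
};

const BRAND_WEIGHT = 10; // stated in the article
const THRESHOLD = 2; // assumed value; the article does not state it

// Illustrative patterns and weights, not the production rules.
const KEYWORDS: { niche: string; pattern: RegExp; weight: number }[] = [
  { niche: "crypto", pattern: /usdt\s+p2p\s+exchange/i, weight: 4 },
  { niche: "gambling", pattern: /\bslots?\b/i, weight: 3 },
  // Negation rule: "slot" in a tech context cancels the gambling signal.
  { niche: "gambling", pattern: /\b(time|memory|pci)\s+slots?\b/i, weight: -3 },
];

function classifyNiches(title: string, text: string, ctaUrl: string): string[] {
  const haystack = `${title} ${text} ${ctaUrl}`.toLowerCase();
  const scores: Record<string, number> = {};
  const add = (niche: string, w: number): void => {
    scores[niche] = (scores[niche] ?? 0) + w;
  };

  // Pass 1: brand detection (simple substring match in this sketch).
  for (const brand of Object.keys(BRANDS)) {
    if (haystack.includes(brand)) add(BRANDS[brand], BRAND_WEIGHT);
  }
  // Pass 2: keyword scoring, including negative weights.
  for (const { niche, pattern, weight } of KEYWORDS) {
    if (pattern.test(haystack)) add(niche, weight);
  }
  // A niche is assigned when its score exceeds the threshold.
  return Object.keys(scores).filter((n) => scores[n] > THRESHOLD).sort();
}
```

Because brand matches add 10 while individual keywords add far less, a single brand hit clears the threshold on its own, which mirrors the "brand dominates" behaviour described above.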

Niche taxonomy

Current top-level niches (as of April 2026):

crypto, trading, forex, fintech, gambling (casino), betting (sports), vpn, dating, news, education, gaming, retail, tech, bots, adult, signals, remittance, ai, other

Sub-niches are assigned within the niche-meta.ts taxonomy for display grouping. The classification schema is append-only — niches are never removed, only new ones added.

Classification accuracy

We validate accuracy through:

  1. Spot-check sampling: periodic manual review of 50 random creatives per niche
  2. Brand-miss audit: if a known brand is misclassified, the brand dictionary is updated
  3. False positive rate: estimated at ~4% based on last sampling round (January 2026)

Known limitations:

  • Short text-only creatives with no brand signal have a ~12% misclassification rate
  • New brands not yet in the dictionary are initially classified by keyword only
  • Multilingual edge cases (mixed script creatives) occasionally confuse the keyword scorer

5. Geo classification

Every creative receives a geo assignment (ISO 3166-1 alpha-2 country code or regional code). The geo classifier is a 3-step cascade:

Step 1: CTA URL TLD analysis. The CTA URL's top-level domain is parsed:

  • .ru → RU
  • .com.br, .br → BR
  • .pk → PK
  • .sa → SA
  • .eu alone is ambiguous (treated as EU rather than a specific country)

Country-code TLDs provide high-confidence geo signals. If Step 1 produces a non-ambiguous result, the cascade stops.

Step 2: Language detection on creative text. If Step 1 is ambiguous (e.g., .com domain), the creative text language is detected using Unicode block analysis and a fastText-family language identifier:

  • Arabic script → AR (regional)
  • Cyrillic → RU/CIS (default RU unless Step 3 disambiguates)
  • Devanagari → HI (India likely)
  • Hangul → KR
  • Hiragana/Katakana → JP
  • Thai → TH
  • Bengali → BD
  • Urdu (Arabic script + language model) → PK

Step 3: Channel-level geo aggregation. The channel in which the creative appeared has its own geo signal (from language, name, description, and prior ingest history). When a creative appears in channels with consistent geo signals, the creative inherits that geo. For example, a .com domain English creative that appears predominantly in Russian-language channels is classified as RU.

Multiple geo assignment: A creative can have multiple geo codes when it demonstrably targets multiple markets (common for Binance global campaigns). In the UI, multi-geo creatives appear in all relevant geo filter segments.
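
The cascade can be sketched as a function that returns at the first step producing a signal. This is a simplified sketch under stated assumptions: Step 2 uses Unicode ranges only (the fastText-style language model, needed for example to separate Urdu from Arabic, is omitted), Step 3 receives a precomputed channel geo as a parameter, `.eu` is assumed to return EU directly, and only a single geo is returned. All names are hypothetical.

```typescript
// Step 1: country-code TLDs listed above (".eu" handling is an assumption).
const TLD_GEO: Record<string, string> = { ru: "RU", br: "BR", pk: "PK", sa: "SA", eu: "EU" };

// Step 2: Unicode ranges mapped to the geo labels used in the list above.
const SCRIPT_GEO: { range: RegExp; geo: string }[] = [
  { range: /[\u0400-\u04FF]/, geo: "RU" }, // Cyrillic
  { range: /[\u0900-\u097F]/, geo: "HI" }, // Devanagari
  { range: /[\uAC00-\uD7AF]/, geo: "KR" }, // Hangul syllables
  { range: /[\u3040-\u30FF]/, geo: "JP" }, // Hiragana and Katakana
  { range: /[\u0E00-\u0E7F]/, geo: "TH" }, // Thai
  { range: /[\u0980-\u09FF]/, geo: "BD" }, // Bengali
  { range: /[\u0600-\u06FF]/, geo: "AR" }, // Arabic script (regional label)
];

// Last label of the hostname, or null if the URL does not parse.
function tldOf(ctaUrl: string): string | null {
  try {
    const labels = new URL(ctaUrl).hostname.split(".");
    return labels.pop() ?? null;
  } catch {
    return null;
  }
}

function classifyGeo(ctaUrl: string, text: string, channelGeo: string | null): string | null {
  // Step 1: a known country-code TLD stops the cascade.
  const tld = tldOf(ctaUrl);
  if (tld !== null && Object.prototype.hasOwnProperty.call(TLD_GEO, tld)) {
    return TLD_GEO[tld];
  }
  // Step 2: first script range found in the creative text.
  for (const { range, geo } of SCRIPT_GEO) {
    if (range.test(text)) return geo;
  }
  // Step 3: fall back to the geo signal of the channels the creative ran in.
  return channelGeo;
}
```

Taking the last hostname label handles compound suffixes such as .com.br without a separate table entry, since the country code is always the final label.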


6. Media mirror

gramesh provides signed URLs to Telegram's media CDN with a 1-hour TTL. These URLs expire and become inaccessible, making them unsuitable for permanent archiving.

The Telegram Ads Spy-media-mirror cron (runs every 5 minutes) processes newly ingested creatives with unmirrored media:

  1. Fetch: HTTP GET to the gramesh-signed media URL
  2. Hash: SHA-256 of the raw binary content
  3. Store: Write to /var/www/tgadsspy-media/<prefix>/<sha256-hex>.<ext> on the server
    • prefix = first 2 hex characters of the SHA256 (256-bucket directory sharding)
    • ext = inferred from Content-Type header
  4. Update: AdCreative.mediaUrl is updated from the gramesh URL to /m/<prefix>/<sha256-hex>.<ext>

The nginx alias serves /m/ paths from the media storage directory with:

Cache-Control: public, immutable, max-age=31536000

One-year immutable cache: because each path is derived from the SHA-256 of the file's content, the bytes served at a given URL never change.

Deduplication: Two different ads using the same banner image produce a single stored file (same hash). The file is stored once; both creatives reference the same /m/... URL.
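
The hash, shard, and path steps can be sketched as a pure function. The Content-Type to extension map and the function name are assumptions for illustration; the directory layout and the /m/ URL shape follow the steps listed above.

```typescript
import { createHash } from "node:crypto";

// Hypothetical Content-Type to extension map; the article only says the
// extension is inferred from the header.
const EXT_BY_TYPE: Record<string, string> = {
  "image/jpeg": "jpg",
  "image/png": "png",
  "image/webp": "webp",
  "video/mp4": "mp4",
};

interface MirrorTarget {
  diskPath: string;
  publicUrl: string;
}

// Derives the storage path and public URL from the raw bytes.
// Returns null when the Content-Type is not in the map.
function mirrorTarget(content: Uint8Array, contentType: string): MirrorTarget | null {
  const mime = contentType.split(";")[0].trim().toLowerCase();
  const ext = EXT_BY_TYPE[mime];
  if (ext === undefined) return null;
  const hash = createHash("sha256").update(content).digest("hex");
  const prefix = hash.slice(0, 2); // 256-bucket directory sharding
  return {
    diskPath: `/var/www/tgadsspy-media/${prefix}/${hash}.${ext}`,
    publicUrl: `/m/${prefix}/${hash}.${ext}`,
  };
}
```

Since the path depends only on the bytes and the extension, two creatives that share a banner resolve to the same file, which is the deduplication behaviour described above.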

Fallback: For creatives where gramesh doesn't return a banner (text-only or channel-pic format), a secondary nightly cron (Telegram Ads Spy-creative-media) fetches og:image from the CTA URL domain as a fallback thumbnail. A third fallback uses the target channel's avatar URL.


7. Advertiser extraction

Advertiser identity is derived from the CTA URL, not from any Telegram-provided field:

Domain advertiser: if ctaUrl is an external URL, the registered domain (e.g., binance.com from https://www.binance.com/en/referral?...) becomes an Advertiser record with type: domain. The full URL (including UTM parameters and referral codes) is preserved on the AdCreative record for competitive analysis.

Telegram advertiser: if ctaUrl is a t.me/<username> URL, the username becomes an Advertiser record with type: telegram. The channel is also added to the discovery queue if not already tracked.

Advertiser slug: a normalized version of the domain or username — lowercase, special characters stripped, used in /advertisers/<slug> URLs. Slugs are stable once assigned.

Alias merging: The same entity may advertise from multiple domains (e.g., binance.com and binance.cc). Manual alias merges are supported in the admin interface, consolidating creative counts under a canonical advertiser record.
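
A simplified sketch of the domain and Telegram branches follows. The function name is hypothetical, and taking the last two hostname labels as the registered domain is a deliberate simplification: multi-part suffixes such as .co.uk or .com.br need the Public Suffix List, which this sketch does not consult. Slug generation and alias merging are left out.

```typescript
interface Advertiser {
  type: "domain" | "telegram";
  name: string;
}

function extractAdvertiser(ctaUrl: string): Advertiser | null {
  let url: URL;
  try {
    url = new URL(ctaUrl);
  } catch {
    return null; // not a parseable URL
  }
  const host = url.hostname.replace(/^www\./, "");

  // Telegram advertiser: t.me/<username>
  if (host === "t.me") {
    const username = url.pathname.split("/").filter(Boolean)[0];
    return username ? { type: "telegram", name: username } : null;
  }

  // Domain advertiser. Simplification: last two labels only.
  const labels = host.split(".");
  return { type: "domain", name: labels.slice(-2).join(".") };
}
```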


8. Aggregation caching

Two Redis keys are pre-warmed every minute by the Telegram Ads Spy-warm-cache cron:

Telegram Ads Spy:home:agg (TTL 120s). Contains: total creative count, total advertiser count, total channel count, top niches (name + count), recent 20 creatives (thumbnail + title), today's new creative count, today's new advertiser count. Used by the home page dashboard and the /api/v1/stats endpoint. A cold miss on this key would send the home page straight to the database; the warm-cache cron refreshes the key every minute, well inside the 120s TTL, so this should not happen in production.

Telegram Ads Spy:pool:stats (TTL 600s). Contains: channel count by tier, total sponsored-eligible count, countries represented. Used by the OG image generator: the home page's dynamic Open Graph image includes live stats and must be served within the 100ms og:image timeout.

Per-entity caches: Individual channel, advertiser, and niche pages cache their aggregated stats at /api/v1/channels/<id>, etc., with TTL of 60–300s depending on update frequency expectations.
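
The relationship between the one-minute warm interval and the TTL can be illustrated with an in-memory stand-in for Redis. The class, the `warm` helper, and the key name used in the usage note are hypothetical; the point of the sketch is only that a key rewritten every 60 seconds with a 120-second TTL is replaced before it can expire.

```typescript
interface CacheEntry<T> {
  value: T;
  expiresAtMs: number;
}

// In-memory stand-in for a Redis key with an expiry.
class TtlCache<T> {
  private entries = new Map<string, CacheEntry<T>>();
  private now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  set(key: string, value: T, ttlSeconds: number): void {
    this.entries.set(key, { value, expiresAtMs: this.now() + ttlSeconds * 1000 });
  }

  // Returns undefined on a miss or once the TTL has elapsed.
  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined || entry.expiresAtMs <= this.now()) return undefined;
    return entry.value;
  }
}

// Hypothetical warm step: recompute the aggregate and overwrite the key.
function warm<T>(cache: TtlCache<T>, key: string, ttlSeconds: number, compute: () => T): void {
  cache.set(key, compute(), ttlSeconds);
}
```

For example, calling `warm(cache, "home:agg", 120, computeHomeAggregate)` once a minute (where `computeHomeAggregate` is whatever query builds the aggregate) keeps readers on the cached value; they only see a miss if two consecutive warm runs fail.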


9. Discover-similar BFS spider

In addition to the seed-based discovery cron, Telegram Ads Spy runs a continuous BFS (breadth-first search) spider using Telegram's "similar channels" graph:

  • Anchor selection: Channel.lastSimilarCheckAt IS NULL OR < NOW()-1h — channels that haven't been checked for similarities in the past hour, ordered ascending (oldest check first, new channels prioritised by default NULL value)
  • Fan-out: 30 channels per tick (every minute in-process)
  • gramesh call: POST /channels.getSimilar { channel_id } → returns 10–20 similar channels
  • New channel handling: similar channels not in the pool are added as placeholders → resolver picks them up in the next 15-minute cycle
  • Bot filter: channels with memberCount < 100 or names matching bot-name patterns are discarded
  • Rate limiting: anchor cooldown of 1 hour prevents the same channel from being re-spidered more than once per hour, regardless of how many other channels reference it as similar
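
The anchor selection and new-channel handling can be sketched as two pure functions. The type and function names are hypothetical, timestamps are assumed to be epoch milliseconds, and the sketch assumes the similar-channel results carry a member count. The bot-name pattern check is omitted because the patterns are not specified.

```typescript
interface Channel {
  id: number;
  memberCount: number | null; // null until the resolver hydrates it
  lastSimilarCheckAt: number | null; // epoch ms; null = never checked
}

const COOLDOWN_MS = 60 * 60 * 1000; // 1-hour anchor cooldown
const FAN_OUT = 30; // channels per tick

// Channels never checked, or last checked more than an hour ago,
// never-checked first, then oldest check first.
function selectAnchors(pool: Channel[], nowMs: number): Channel[] {
  const key = (c: Channel): number => c.lastSimilarCheckAt ?? -1;
  return pool
    .filter((c) => c.lastSimilarCheckAt === null || c.lastSimilarCheckAt < nowMs - COOLDOWN_MS)
    .sort((a, b) => key(a) - key(b))
    .slice(0, FAN_OUT);
}

// Turns similar-channel results into placeholders, dropping channels already
// in the pool and those under 100 members.
function admitSimilar(pool: Channel[], similar: { id: number; memberCount: number }[]): Channel[] {
  const known = new Set(pool.map((c) => c.id));
  return similar
    .filter((s) => s.memberCount >= 100 && !known.has(s.id))
    .map((s) => ({ id: s.id, memberCount: null, lastSimilarCheckAt: null }));
}
```

Placeholders are created with a null `lastSimilarCheckAt`, so once resolved they sort to the front of the next anchor selection, which is what gives the crawl its breadth-first character.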

This recursive BFS, combined with the multi-query discover-cron, is how the channel pool has grown from seed lists of a few hundred to 9,000+ channels organically.


10. Data completeness and known gaps

What we capture well:

  • EUR-cabinet sponsored messages on the Telegram Ads Platform (high coverage via multi-region ingest)
  • TON-paid owner placements when they appear as channel posts in channels we track (partial coverage — not all channels are tracked)

Known gaps:

  • Group-level advertising: Telegram groups and supergroups are not indexed (sponsored messages only run in channels; TON-paid posts in groups are outside our scope)
  • Bot-to-user messages: Advertisers who send promotional messages directly to users via bots are not captured — we only see channel-level placements
  • Inline bot results: Telegram's inline query ads (rare) are not captured
  • Very new channels: Channels created recently that haven't been discovered by any pipeline path may be missed for days or weeks
  • Low-tier channels (< 1k subscribers): Not eligible for EUR-cabinet sponsored messages; TON placements in very small channels are not in our scope

Coverage estimate: For EUR-cabinet sponsored messages, coverage is estimated at 65–75% of all unique creatives that ran in the period. The gap represents channels not yet in our pool that received sponsored message deliveries. This estimate is based on cross-referencing our creative counts against gramesh's own aggregate stats for channels we do track.


11. Data licensing and citation

All data in Telegram Ads Spy's archive is released under CC-BY-4.0. You may use, republish, and analyse it for any purpose with attribution:

Source: tgadsspy.com · tgadsspy.com/blog/tgadsspy-classifier-pipeline-technical-deep-dive · CC-BY-4.0

For programmatic access: public API documentation · bulk CSV export.

For bug reports or data correction requests: open an issue at the GitHub repo or email [email protected].


Related documentation

  • /about — non-technical methodology overview
  • Public API docs — endpoint reference for developers
  • State of Telegram Ads 2026 — what the pipeline has collected
  • Regulation Guide 2026 — how regulators can use this data


Cite this article

tgadsspy research (2026). How tgadsspy Works: Technical Deep Dive into the Classifier and Ingest Pipeline. tgadsspy.com. Retrieved from https://tgadsspy.com/blog/tgadsspy-classifier-pipeline-technical-deep-dive

Licensed CC-BY-4.0 — reuse allowed including commercial, attribution required.
