How tgadsspy Works: Technical Deep Dive into the Classifier and Ingest Pipeline

Technical documentation of the tgadsspy data pipeline — gramesh API integration, niche classification architecture (regex + weights), geo classifier 3-step pipeline, media mirror SHA256 content-addressed storage, and aggregation caching. For developers, researchers and compliance teams.

#methodology #technical #pipeline #classifier #osint

by Telegram Ads Spy research

2026-04-21 · 11 min read


Contents

Purpose and audience
1. Data source: gramesh API
2. Channel pool and tiering
3. Creative deduplication
4. Niche classification
5. Geo classification
6. Media mirror
7. Advertiser extraction
8. Aggregation caching
9. Discover-similar BFS spider
10. Data completeness and known gaps
11. Data licensing and citation
Related documentation

Purpose and audience

This document is a technical deep dive into how Telegram Ads Spy collects, classifies, and serves Telegram ad data. It supplements the shorter methodology overview at /about with implementation-level detail on API integration, classification logic, storage architecture, and caching. The primary audience is developers building on the public API, researchers who need to understand data provenance for citation, and compliance teams assessing the system's OSINT methodology.

1. Data source: gramesh API

All ad data in Telegram Ads Spy originates from a single source: the gramesh HTTP API at api.wall.systems/gramesh. gramesh is a proxy/aggregation layer over Telegram's MTProto protocol, exposing REST endpoints that return structured JSON. Telegram Ads Spy uses gramesh exclusively — no direct MTProto implementation, no scraping.

Key endpoints used

POST /channels.getSponsored

Input: { channel_id: <int>, dc_id: <int> }
Output: array of sponsored message objects for the given channel, in the given Telegram data-centre region
Includes: title, text, ctaUrl, ctaLabel, accentColor, mediaType, mediaUrl, ctaTargetUsername
Media URLs: signed, 1-hour TTL (/files/photo/<id>?sig=&exp=)
Rate limit: 10 RPS against gramesh; Telegram Ads Spy throttles ingest cron to stay within this

POST /channels.getInfo

Input: { username: <string> | id: <int> }
Output: channel metadata — id, title, username, description, memberCount, avatarUrl
Used by the resolver-cron to hydrate Channel placeholders

POST /contacts.search

Input: { q: <string> }
Output: array of channel objects matching the query
Used by the discover-cron with 47 rotating seed queries
Used by /api/v1/submit when a user submits a t.me URL

POST /channels.getSimilar

Input: { channel_id: <int> }
Output: similar channels as recommended by Telegram's internal similarity model
Used by the discover-similar in-process BFS spider

Fan-out by region

Telegram has multiple data-centre clusters (DC1–DC5). Sponsored messages are region-specific: a channel in DC2 may show different sponsored messages than the same channel viewed from DC4. Telegram Ads Spy performs multi-region ingest for high-tier channels: the same channel is queried against multiple dc_id values to increase creative coverage. This is the mechanism behind "seen in multiple geos" — different DCs serve different advertiser targeting.
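As a rough illustration of the fan-out, the sketch below builds one /channels.getSponsored request body per data centre for a single channel. The field names channel_id and dc_id come from the endpoint description above; the function name, the example channel id, and the choice of DCs are assumptions made for the example, since the source does not say which dc_id values are queried for which tiers.

```typescript
// Request body shape taken from the endpoint description: { channel_id, dc_id }.
interface SponsoredRequest {
  channel_id: number;
  dc_id: number;
}

// Hypothetical helper: one request body per distinct data centre.
// Which DCs are queried per tier is NOT stated in the source.
function buildFanOut(channelId: number, dcIds: number[]): SponsoredRequest[] {
  // De-duplicate so the same region is never queried twice in one tick.
  const unique = Array.from(new Set(dcIds));
  return unique.map((dc_id) => ({ channel_id: channelId, dc_id }));
}

// Example: channel 12345 queried against DC2 and DC4 (duplicate 4 is dropped).
const requests = buildFanOut(12345, [2, 4, 4]);
```

Each body would then be sent as a separate POST, subject to the 10 RPS limit described earlier.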

2. Channel pool and tiering

The ingest pipeline operates on a pool of more than 9,000 channels. Channels are tiered by member count, which determines ingest frequency:

Tier	Member count	Ingest interval	Rationale
S (Super)	1M+	30 minutes	High ad density, fast creative turnover
A	100k–1M	2 hours	Active market, moderate turnover
B	10k–100k	8 hours	Adequate for daily coverage
C	1k–10k	72 hours	Low ad density, spot-checking sufficient
Placeholder	Unknown	Not ingested	Awaiting resolver to hydrate

The resolver-cron (every 15 min) picks up placeholder channels — those submitted via seed batches, user submission, or discovery — and calls /channels.getInfo to populate memberCount, title, and avatarUrl. Once resolved, the channel is tiered and added to the ingest queue.
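The tier table can be read as a simple threshold function. The sketch below is one way to express it; the intervals are taken from the table, while the function name, the inclusive lower bounds, and the "Untiered" label for channels under 1k members (which the table does not cover) are assumptions.

```typescript
type Tier = "S" | "A" | "B" | "C" | "Placeholder" | "Untiered";

interface TierPolicy {
  tier: Tier;
  // Ingest interval in minutes; null means the channel is not ingested.
  intervalMinutes: number | null;
}

// memberCount is null until the resolver has hydrated the placeholder.
function tierFor(memberCount: number | null): TierPolicy {
  if (memberCount === null) return { tier: "Placeholder", intervalMinutes: null };
  if (memberCount >= 1_000_000) return { tier: "S", intervalMinutes: 30 };
  if (memberCount >= 100_000) return { tier: "A", intervalMinutes: 2 * 60 };
  if (memberCount >= 10_000) return { tier: "B", intervalMinutes: 8 * 60 };
  if (memberCount >= 1_000) return { tier: "C", intervalMinutes: 72 * 60 };
  // Assumption: the table defines no tier below 1k members, so nothing is scheduled.
  return { tier: "Untiered", intervalMinutes: null };
}
```

For example, a resolved channel with 250,000 members would land in tier A and be ingested every 120 minutes.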

3. Creative deduplication

When the ingest cron calls /channels.getSponsored and receives creatives, deduplication happens before any storage:

creative fingerprint = sha256(title + text + ctaUrl + ctaLabel + accentColor)

A creative is considered new only if its fingerprint hasn't been seen before. This means:

The same ad running in 100 channels produces one AdCreative record (not 100)
Each appearance in a channel produces one SponsoredImpression record pointing to the AdCreative
Minor variations (different accentColor with same text) are treated as different creatives — intentional, as colour variants are used in A/B testing
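A minimal sketch of the fingerprint and the one-creative, many-impressions behaviour follows. It uses Node's built-in crypto module and concatenates the five fields exactly as the formula above is written; whether the production code inserts a separator between fields is not stated, so plain concatenation is an assumption. The in-memory Map and array stand in for the AdCreative and SponsoredImpression tables, and all function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

interface CreativeFields {
  title: string;
  text: string;
  ctaUrl: string;
  ctaLabel: string;
  accentColor: string;
}

// sha256(title + text + ctaUrl + ctaLabel + accentColor), hex-encoded.
function creativeFingerprint(c: CreativeFields): string {
  const material = c.title + c.text + c.ctaUrl + c.ctaLabel + c.accentColor;
  return createHash("sha256").update(material, "utf8").digest("hex");
}

// In-memory stand-ins for the AdCreative and SponsoredImpression tables.
const creatives = new Map<string, CreativeFields>();
const impressions: Array<{ fingerprint: string; channelId: number }> = [];

// Returns true when the creative was not seen before.
function recordSighting(c: CreativeFields, channelId: number): boolean {
  const fp = creativeFingerprint(c);
  const isNew = !creatives.has(fp);
  if (isNew) creatives.set(fp, c);
  // Every appearance is recorded, whether or not the creative is new.
  impressions.push({ fingerprint: fp, channelId });
  return isNew;
}

const ad: CreativeFields = {
  title: "Trade now",
  text: "Low fees",
  ctaUrl: "https://example.com/ref",
  ctaLabel: "Open",
  accentColor: "blue",
};
const firstIsNew = recordSighting(ad, 1);
const secondIsNew = recordSighting(ad, 2); // same ad, different channel
const variantIsNew = recordSighting({ ...ad, accentColor: "red" }, 1); // colour variant
```

After these three sightings the stand-in tables hold two creatives (the original and the colour variant) and three impressions.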

Creative lifecycle: A creative is first seen when its fingerprint first appears. It's considered "active" while it continues appearing in new ingest ticks. AdCreative.lastSeenAt is updated on each new impression. When a creative stops appearing, it transitions to inactive naturally — no explicit deactivation signal from gramesh.

4. Niche classification

Every AdCreative is classified into one or more niches. The classifier is a weighted keyword-plus-brand-detection system implemented in lib/niche.ts.

Architecture

The classifier operates on the concatenated text of title + text + ctaUrl. It runs two passes:

Pass 1: Brand detection. A lookup against a dictionary of ~400 known advertiser brands, mapped to their primary niche. Example entries:

binance → crypto
1xbet → gambling
nordvpn → vpn
exness → forex
dream11 → betting

Brand matches carry high weight (w=10) and dominate the classification when present. The brand dictionary is maintained in lib/niche-brands.ts and updated as new advertisers are identified.

Pass 2: Keyword scoring. For each niche, a set of regex patterns is evaluated against the creative text. Each match adds the pattern's weight to that niche's score. Patterns are designed to avoid false positives through:

Specificity: "USDT P2P exchange" is a crypto signal; "exchange" alone is too generic
Negation rules: Some patterns carry negative weight to suppress false positives (e.g., "slot" appearing in a tech context)
Language variants: Patterns include Arabic, Russian, Indonesian, Thai and other language variants for the major niches

Score threshold: A niche is assigned if its score exceeds a minimum threshold. Multiple niches can be assigned — a creative can be gambling + crypto if it's a crypto-casino (e.g., BC.Game).
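The two passes and the threshold can be sketched as follows. This is not the contents of lib/niche.ts: only the five brand entries and the brand weight of 10 come from the text above. The regex patterns, their weights, the threshold value, and the choice to join the fields with spaces are all assumptions made for illustration.

```typescript
interface NicheRule {
  niche: string;
  pattern: RegExp;
  weight: number; // negative weights suppress false positives
}

// Brand entries quoted from the text; the real dictionary has ~400.
const BRANDS: Array<[string, string]> = [
  ["binance", "crypto"],
  ["1xbet", "gambling"],
  ["nordvpn", "vpn"],
  ["exness", "forex"],
  ["dream11", "betting"],
];
const BRAND_WEIGHT = 10; // from the text

// Hypothetical keyword rules and weights.
const RULES: NicheRule[] = [
  { niche: "crypto", pattern: /usdt\s+p2p\s+exchange/i, weight: 3 },
  { niche: "gambling", pattern: /\bslots?\b/i, weight: 3 },
  { niche: "gambling", pattern: /\b(pci|memory|time)\s+slots?\b/i, weight: -3 },
];
const THRESHOLD = 2; // hypothetical; a niche is assigned when score > THRESHOLD

function classifyNiches(title: string, text: string, ctaUrl: string): string[] {
  // Assumption: fields are joined with spaces so words do not run together.
  const haystack = `${title} ${text} ${ctaUrl}`.toLowerCase();
  const scores = new Map<string, number>();
  const bump = (niche: string, w: number): void => {
    scores.set(niche, (scores.get(niche) ?? 0) + w);
  };

  // Pass 1: brand detection.
  for (const [brand, niche] of BRANDS) {
    if (haystack.includes(brand)) bump(niche, BRAND_WEIGHT);
  }
  // Pass 2: keyword scoring.
  for (const rule of RULES) {
    if (rule.pattern.test(haystack)) bump(rule.niche, rule.weight);
  }

  return Array.from(scores.entries())
    .filter(([, score]) => score > THRESHOLD)
    .map(([niche]) => niche)
    .sort();
}

const brandHit = classifyNiches("Trade on Binance", "USDT P2P exchange", "https://www.binance.com");
const keywordHit = classifyNiches("Spin the slots", "win big", "https://example.com");
const suppressed = classifyNiches("New PCI slot card", "", "https://example.com");
```

In the third call the positive "slot" match and the negative tech-context rule cancel out, so no niche clears the threshold.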

Niche taxonomy

Current top-level niches (as of April 2026):

crypto, trading, forex, fintech, gambling (casino), betting (sports), vpn, dating, news, education, gaming, retail, tech, bots, adult, signals, remittance, ai, other

Sub-niches are assigned within the niche-meta.ts taxonomy for display grouping. The classification schema is append-only — niches are never removed, only new ones added.

Classification accuracy

We validate accuracy through:

Spot-check sampling: periodic manual review of 50 random creatives per niche
Brand-miss audit: if a known brand is misclassified, the brand dictionary is updated
False positive rate: estimated at ~4% based on last sampling round (January 2026)

Known limitations:

Short text-only creatives with no brand signal have ~12% misclassification rate
New brands not yet in the dictionary are initially classified by keyword only
Multilingual edge cases (mixed script creatives) occasionally confuse the keyword scorer

5. Geo classification

Every creative receives a geo assignment (ISO 3166-1 alpha-2 country code or regional code). The geo classifier is a 3-step cascade:

Step 1: CTA URL TLD analysis. The CTA URL's top-level domain is parsed:

.ru → RU
.com.br, .br → BR
.pk → PK
.sa → SA
.eu alone is ambiguous (treated as EU rather than a specific country)

Country-code TLDs provide high-confidence geo signals. If Step 1 produces a non-ambiguous result, the cascade stops.

Step 2: Language detection on creative text. If Step 1 is ambiguous (e.g., .com domain), the creative text language is detected using Unicode block analysis and a fastText-family language identifier:

Arabic script → AR (regional)
Cyrillic → RU/CIS (default RU unless Step 3 disambiguates)
Devanagari → HI (India likely)
Hangul → KR
Hiragana/Katakana → JP
Thai → TH
Bengali → BD
Urdu (Arabic script + language model) → PK

Step 3: Channel-level geo aggregation. The channel in which the creative appeared has its own geo signal (from language, name, description, and prior ingest history). When a creative appears in channels with consistent geo signals, the creative inherits that geo. For example, an English-language creative with a .com domain that appears predominantly in Russian-language channels is classified as RU.

Multiple geo assignment: A creative can have multiple geo codes when it demonstrably targets multiple markets (common for Binance global campaigns). In the UI, multi-geo creatives appear in all relevant geo filter segments.
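The cascade can be sketched as three functions tried in order, each returning null when it has no confident answer. The TLD and script mappings below are the examples listed above; everything else is a simplification and should be read as an assumption: the Unicode ranges stand in for the fastText-family identifier, Urdu detection is omitted because it needs a language model, Cyrillic is not re-checked against Step 3, and the channel-level signal is reduced to a majority vote over per-channel geo codes.

```typescript
// Step 1 table: examples from the text (".eu" maps to the regional code EU).
const TLD_GEO = new Map<string, string>([
  ["ru", "RU"], ["br", "BR"], ["pk", "PK"], ["sa", "SA"], ["eu", "EU"],
]);

// Step 2 table: Unicode block ranges paired with the codes listed in the text.
const SCRIPT_GEO: Array<[RegExp, string]> = [
  [/[\u0E00-\u0E7F]/, "TH"], // Thai
  [/[\uAC00-\uD7AF]/, "KR"], // Hangul syllables
  [/[\u3040-\u30FF]/, "JP"], // Hiragana and Katakana
  [/[\u0900-\u097F]/, "HI"], // Devanagari
  [/[\u0980-\u09FF]/, "BD"], // Bengali
  [/[\u0400-\u04FF]/, "RU"], // Cyrillic (default RU)
  [/[\u0600-\u06FF]/, "AR"], // Arabic script (regional)
];

function geoFromTld(ctaUrl: string): string | null {
  let host: string;
  try {
    host = new URL(ctaUrl).hostname;
  } catch {
    return null; // not a parseable URL
  }
  const tld = host.split(".").pop() ?? "";
  return TLD_GEO.get(tld) ?? null;
}

function geoFromScript(text: string): string | null {
  for (const [pattern, geo] of SCRIPT_GEO) {
    if (pattern.test(text)) return geo;
  }
  return null;
}

// Simplified Step 3: the most frequent geo among the channels that showed the creative.
function majorityGeo(channelGeos: string[]): string | null {
  const counts = new Map<string, number>();
  let best: string | null = null;
  for (const geo of channelGeos) {
    const n = (counts.get(geo) ?? 0) + 1;
    counts.set(geo, n);
    if (best === null || n > (counts.get(best) ?? 0)) best = geo;
  }
  return best;
}

function classifyGeo(ctaUrl: string, text: string, channelGeos: string[]): string | null {
  return geoFromTld(ctaUrl) ?? geoFromScript(text) ?? majorityGeo(channelGeos);
}

const byTld = classifyGeo("https://loja.example.com.br/promo", "", []);
const byScript = classifyGeo("https://example.com", "Привет", []);
const byChannels = classifyGeo("https://example.com", "Hello", ["RU", "RU", "TH"]);
```

The third call mirrors the example in Step 3: an English .com creative seen mostly in Russian-language channels falls through the first two steps and inherits RU from the channels.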

6. Media mirror

gramesh provides signed URLs to Telegram's media CDN with a 1-hour TTL. These URLs expire and become inaccessible, making them unsuitable for permanent archiving.

The Telegram Ads Spy-media-mirror cron (runs every 5 minutes) processes newly ingested creatives with unmirrored media:

Fetch: HTTP GET to the gramesh-signed media URL
Hash: SHA-256 of the raw binary content
Store: Write to /var/www/tgadsspy-media/<prefix>/<sha256-hex>.<ext> on the server
- prefix = first 2 hex characters of the SHA256 (256-bucket directory sharding)
- ext = inferred from Content-Type header
Update: AdCreative.mediaUrl is updated from the gramesh URL to /m/<prefix>/<sha256-hex>.<ext>

The nginx alias serves /m/ paths from the media storage directory with:

Cache-Control: public, immutable, max-age=31536000

The one-year immutable cache is safe because storage is content-addressed: the file name is the hash of the bytes, so a given /m/ URL always maps to the same content.

Deduplication: Two different ads using the same banner image produce a single stored file (same hash). The file is stored once; both creatives reference the same /m/... URL.
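The hash, sharding, and path steps can be sketched with Node's crypto module as below. The storage root and the /m/ URL layout are quoted from the steps above; the Content-Type to extension table, the "bin" fallback, and the function name are assumptions, since the source only says the extension is inferred from the header.

```typescript
import { createHash } from "node:crypto";

// Hypothetical Content-Type to extension table.
const EXT_BY_TYPE = new Map<string, string>([
  ["image/jpeg", "jpg"],
  ["image/png", "png"],
  ["image/webp", "webp"],
  ["video/mp4", "mp4"],
]);

interface MirrorPaths {
  diskPath: string;  // where the bytes are written
  publicUrl: string; // value stored in AdCreative.mediaUrl
}

function mirrorPaths(content: Uint8Array, contentType: string): MirrorPaths {
  // Hash the raw bytes; identical banners always yield the same name.
  const sha = createHash("sha256").update(content).digest("hex");
  // First two hex characters give 256 shard directories.
  const prefix = sha.slice(0, 2);
  // Drop parameters such as "; charset=..." before the lookup.
  const mime = contentType.split(";")[0].trim().toLowerCase();
  const ext = EXT_BY_TYPE.get(mime) ?? "bin"; // assumption: fallback extension
  return {
    diskPath: `/var/www/tgadsspy-media/${prefix}/${sha}.${ext}`,
    publicUrl: `/m/${prefix}/${sha}.${ext}`,
  };
}

const banner = new Uint8Array([1, 2, 3, 4]);
const first = mirrorPaths(banner, "image/jpeg");
const second = mirrorPaths(new Uint8Array([1, 2, 3, 4]), "image/jpeg");
```

Because both calls hash the same bytes, they resolve to the same path, which is what lets two creatives share one stored file.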

Fallback: For creatives where gramesh doesn't return a banner (text-only or channel-pic format), a secondary nightly cron (Telegram Ads Spy-creative-media) fetches og:image from the CTA URL domain as a fallback thumbnail. A third fallback uses the target channel's avatar URL.

7. Advertiser extraction

Advertiser identity is derived from the CTA URL, not from any Telegram-provided field:

Domain advertiser: if ctaUrl is an external URL, the registered domain (e.g., binance.com from https://www.binance.com/en/referral?...) becomes an Advertiser record with type: domain. The full URL (including UTM parameters and referral codes) is preserved on the AdCreative record for competitive analysis.

Telegram advertiser: if ctaUrl is a t.me/<username> URL, the username becomes an Advertiser record with type: telegram. The channel is also added to the discovery queue if not already tracked.

Advertiser slug: a normalized version of the domain or username — lowercase, special characters stripped, used in /advertisers/<slug> URLs. Slugs are stable once assigned.
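A rough sketch of the domain/telegram split and slug step is shown below. It is deliberately naive: the registered domain is approximated by the last two host labels, which is wrong for suffixes such as .com.br, so a real implementation would need a Public Suffix List lookup. The slug rule (lowercase, runs of non-alphanumeric characters collapsed to a hyphen) is also an assumption; the source only says special characters are stripped. All names are hypothetical.

```typescript
interface Advertiser {
  type: "domain" | "telegram";
  name: string;
  slug: string;
}

// Assumed slug rule: lowercase, non-alphanumeric runs become a single hyphen.
function slugify(value: string): string {
  return value
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

function extractAdvertiser(ctaUrl: string): Advertiser | null {
  let url: URL;
  try {
    url = new URL(ctaUrl);
  } catch {
    return null; // unparseable CTA URL
  }
  const host = url.hostname.toLowerCase();

  if (host === "t.me") {
    // t.me/<username>: the first path segment is the username.
    const username = url.pathname.split("/").filter(Boolean)[0];
    if (!username) return null;
    return { type: "telegram", name: username, slug: slugify(username) };
  }

  // Naive registered-domain guess: last two labels (see caveat in the lead-in).
  const domain = host.split(".").slice(-2).join(".");
  return { type: "domain", name: domain, slug: slugify(domain) };
}

const fromDomain = extractAdvertiser("https://www.binance.com/en/referral?ref=abc");
const fromTelegram = extractAdvertiser("https://t.me/some_channel");
```

The full CTA URL, including referral parameters, would still be kept on the creative record; only the advertiser key is normalised here.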

Alias merging: The same entity may advertise from multiple domains (e.g., binance.com and binance.cc). Manual alias merges are supported in the admin interface, consolidating creative counts under a canonical advertiser record.

8. Aggregation caching

Two Redis keys are pre-warmed every minute by the Telegram Ads Spy-warm-cache cron:

Telegram Ads Spy:home:agg (TTL 120s). Contains total creative count, total advertiser count, total channel count, top niches (name + count), the 20 most recent creatives (thumbnail + title), today's new creative count, and today's new advertiser count. Used by the home page dashboard and the /api/v1/stats endpoint. A cold miss on this key would send the home page to the database directly; the warm-cache cron re-populates the key every minute, well inside the 120-second TTL, so this should not happen in production.

Telegram Ads Spy:pool:stats (TTL 600s). Contains channel count by tier, total sponsored-eligible count, and countries represented. Used by the OG image generator: the home page's dynamic Open Graph image includes live stats and must be served within the 100 ms og:image timeout.

Per-entity caches: Individual channel, advertiser, and niche pages cache their aggregated stats at /api/v1/channels/<id>, etc., with TTL of 60–300s depending on update frequency expectations.
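The relationship between the warm interval and the TTL can be illustrated with a small in-memory cache. This is only a stand-in: the production system uses Redis, and the class, the HomeAgg shape, and the injected clock are hypothetical. The key name and the 120-second TTL are taken from the text.

```typescript
// Minimal TTL cache standing in for Redis; the clock is injectable for testing.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  set(key: string, value: V, ttlSeconds: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: behave like a cold miss
      return undefined;
    }
    return entry.value;
  }
}

// Hypothetical subset of the home aggregate.
interface HomeAgg {
  totalCreatives: number;
  totalAdvertisers: number;
  totalChannels: number;
}

const HOME_KEY = "Telegram Ads Spy:home:agg";
const HOME_TTL_SECONDS = 120;

// What the warm-cache cron does once per minute, in miniature.
function warmHome(cache: TtlCache<HomeAgg>, compute: () => HomeAgg): void {
  cache.set(HOME_KEY, compute(), HOME_TTL_SECONDS);
}

let clock = 0; // fake clock in milliseconds
const cache = new TtlCache<HomeAgg>(() => clock);
warmHome(cache, () => ({ totalCreatives: 10, totalAdvertisers: 3, totalChannels: 5 }));

clock = 60_000; // one warm interval later: still a hit
const hitAfterOneMinute = cache.get(HOME_KEY);

clock = 121_000; // only if warming stopped for two minutes would readers miss
const missAfterTtl = cache.get(HOME_KEY);
```

Because the key is rewritten every 60 seconds and lives for 120, a single skipped warm cycle still leaves a valid entry for readers.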

9. Discover-similar BFS spider

In addition to the seed-based discovery cron, Telegram Ads Spy runs a continuous BFS (breadth-first search) spider using Telegram's "similar channels" graph:

Anchor selection: Channel.lastSimilarCheckAt IS NULL OR < NOW()-1h — channels that haven't been checked for similarities in the past hour, ordered ascending (oldest check first, new channels prioritised by default NULL value)
Fan-out: 30 channels per tick (every minute in-process)
gramesh call: POST /channels.getSimilar { channel_id } → returns 10–20 similar channels
New channel handling: similar channels not in the pool are added as placeholders → resolver picks them up in the next 15-minute cycle
Bot filter: channels with memberCount < 100 or names matching bot-name patterns are discarded
Rate limiting: anchor cooldown of 1 hour prevents the same channel from being re-spidered more than once per hour, regardless of how many other channels reference it as similar
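The steps above can be sketched as one spider tick over an in-memory pool. The one-hour cooldown, the fan-out of 30, and the memberCount < 100 filter come from the list; the data shapes, the injected getSimilar function (standing in for the gramesh call), and the bot-name regex are assumptions.

```typescript
interface PoolChannel {
  id: number;
  lastSimilarCheckAt: number | null; // epoch ms; null = never checked
}

interface SimilarChannel {
  id: number;
  username: string;
  memberCount: number;
}

const COOLDOWN_MS = 60 * 60 * 1000; // 1-hour anchor cooldown
const FAN_OUT = 30;                 // anchors per tick
const BOT_NAME = /bot$/i;           // hypothetical bot-name pattern

// Never-checked channels first, then oldest check first, capped at the fan-out.
function selectAnchors(pool: PoolChannel[], now: number): PoolChannel[] {
  return pool
    .filter((c) => c.lastSimilarCheckAt === null || c.lastSimilarCheckAt < now - COOLDOWN_MS)
    .sort((a, b) => (a.lastSimilarCheckAt ?? 0) - (b.lastSimilarCheckAt ?? 0))
    .slice(0, FAN_OUT);
}

// One tick: expand each anchor, add unseen channels as placeholders.
function spiderTick(
  pool: Map<number, PoolChannel>,
  now: number,
  getSimilar: (channelId: number) => SimilarChannel[],
): number[] {
  const added: number[] = [];
  // Snapshot the pool so channels added during this tick are not expanded yet.
  for (const anchor of selectAnchors(Array.from(pool.values()), now)) {
    for (const candidate of getSimilar(anchor.id)) {
      if (pool.has(candidate.id)) continue; // already tracked
      if (candidate.memberCount < 100 || BOT_NAME.test(candidate.username)) continue;
      pool.set(candidate.id, { id: candidate.id, lastSimilarCheckAt: null });
      added.push(candidate.id);
    }
    anchor.lastSimilarCheckAt = now; // start the cooldown
  }
  return added;
}

const now = 10 * COOLDOWN_MS;
const pool = new Map<number, PoolChannel>([
  [1, { id: 1, lastSimilarCheckAt: null }],
  [2, { id: 2, lastSimilarCheckAt: now - 10 * 60 * 1000 }], // checked 10 min ago
]);
const fakeSimilar = (id: number): SimilarChannel[] =>
  id === 1
    ? [
        { id: 3, username: "news_daily", memberCount: 5000 },
        { id: 4, username: "spam_bot", memberCount: 5000 },
        { id: 5, username: "tiny", memberCount: 50 },
        { id: 2, username: "known", memberCount: 9000 },
      ]
    : [];
const addedFirstTick = spiderTick(pool, now, fakeSimilar);
```

In this run only channel 3 is added: channel 4 matches the bot pattern, channel 5 is too small, and channel 2 is already in the pool. On the next tick channel 3 becomes an anchor while channel 1 waits out its cooldown.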

This breadth-first expansion, combined with the multi-query discover-cron, is how the channel pool grew organically from seed lists of a few hundred channels to more than 9,000.

10. Data completeness and known gaps

What we capture well:

EUR-cabinet sponsored messages on the Telegram Ads Platform (high coverage via multi-region ingest)
TON-paid owner placements when they appear as channel posts in channels we track (partial coverage — not all channels are tracked)

Known gaps:

Group-level advertising: Telegram groups and supergroups are not indexed (sponsored messages only run in channels; TON-paid posts in groups are outside our scope)
Bot-to-user messages: Advertisers who send promotional messages directly to users via bots are not captured — we only see channel-level placements
Inline bot results: Telegram's inline query ads (rare) are not captured
Very new channels: Channels created recently that haven't been discovered by any pipeline path may be missed for days or weeks
Low-tier channels (< 1k subscribers): Not eligible for EUR-cabinet sponsored messages; TON placements in very small channels are not in our scope

Coverage estimate: For EUR-cabinet sponsored messages, coverage is estimated at 65–75% of all unique creatives that ran in the period. The gap represents channels not yet in our pool that received sponsored message deliveries. This estimate is based on cross-referencing our creative counts against gramesh's own aggregate stats for channels we do track.

11. Data licensing and citation

All data in Telegram Ads Spy's archive is released under CC-BY-4.0. You may use, republish, and analyse it for any purpose with attribution:

Source: tgadsspy.com · tgadsspy.com/blog/tgadsspy-classifier-pipeline-technical-deep-dive · CC-BY-4.0

For programmatic access: public API documentation · bulk CSV export.

For bug reports or data correction requests: open an issue at the GitHub repo or email [email protected].

Related documentation

/about — non-technical methodology overview
Public API docs — endpoint reference for developers
State of Telegram Ads 2026 — what the pipeline has collected
Regulation Guide 2026 — how regulators can use this data


Cite this article

tgadsspy research (2026). How tgadsspy Works: Technical Deep Dive into the Classifier and Ingest Pipeline. tgadsspy.com. Retrieved from https://tgadsspy.com/blog/tgadsspy-classifier-pipeline-technical-deep-dive

Licensed CC-BY-4.0 — reuse allowed including commercial, attribution required.
