Methodology
Transparent rules for source leanings, Story clustering, spectrum bars, and Blindspot detection.
சுதேசமித்திரன் (Swadesamitran) is a civic transparency project. Our ratings describe sources, not individual journalists or readers. They are editorial aids, not legal or moral verdicts.
Source leanings
Each publisher in the registry carries a Tamil-adapted leaning label (for example DMK-leaning, AIADMK-leaning, BJP-leaning, centrist/independent, corporate ownership, regional/community). Classifications combine public bias databases, ownership data, Tamil fact-checking references, and manual curation. Every source also has public methodology notes on the Sources page.
New sources carry an unknown leaning until they are reviewed. Disputes can be filed via the grievance form (bias / methodology).
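A minimal sketch of how a registry lookup could honor the "unknown until reviewed" rule. The `Leaning` bucket keys, the `Source` fields, and `leaning_for` are hypothetical names for illustration; the real registry schema may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Leaning(str, Enum):
    # Hypothetical bucket keys mirroring the labels described above.
    DMK = "dmk_leaning"
    AIADMK = "aiadmk_leaning"
    BJP = "bjp_leaning"
    CENTRIST = "centrist_independent"
    CORPORATE = "corporate_ownership"
    REGIONAL = "regional_community"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Source:
    domain: str
    leaning: Leaning = Leaning.UNKNOWN  # new sources default to unknown
    reviewed: bool = False              # hypothetical review flag


def leaning_for(registry: dict[str, Source], domain: str) -> Leaning:
    """Return the reviewed leaning for a domain, or UNKNOWN if absent or unreviewed."""
    source = registry.get(domain.lower())
    if source is None or not source.reviewed:
        return Leaning.UNKNOWN
    return source.leaning
```

Treating an unreviewed entry as unknown, even if a provisional label exists, keeps draft classifications from leaking into spectrum bars.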
Stories and clustering
- Articles from many outlets are normalized (URL decoding, fingerprints, deduplication).
- Related coverage is grouped into a single Story with a canonical title.
- Clustering is deterministic in the Python pipeline; the Worker serves the latest batch from D1.
- We do not store full publisher article text — only titles, URLs, dates, and derived metadata.
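The normalization and clustering steps above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's actual code: dropping `utm_*` parameters, the SHA-256 URL fingerprint, whitespace title tokens, the Jaccard threshold, and "earliest article supplies the canonical title" are all hypothetical choices. Sorting before grouping is what makes the output deterministic.

```python
import hashlib
import string
from urllib.parse import unquote, urlsplit, urlunsplit

JACCARD_THRESHOLD = 0.5  # hypothetical similarity cut-off


def normalize_url(url: str) -> str:
    """Lowercase scheme/host, percent-decode the path, drop utm_* params and fragments."""
    parts = urlsplit(url.strip())
    query = "&".join(
        p for p in parts.query.split("&") if p and not p.startswith("utm_")
    )
    path = unquote(parts.path).rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))


def fingerprint(url: str) -> str:
    """Short stable hash of the normalized URL, used as a dedup key."""
    return hashlib.sha256(normalize_url(url).encode("utf-8")).hexdigest()[:16]


def tokens(title: str) -> frozenset:
    """Whitespace tokens with ASCII punctuation stripped (works for Tamil and English)."""
    words = (w.strip(string.punctuation).lower() for w in title.split())
    return frozenset(w for w in words if w)


def jaccard(a: frozenset, b: frozenset) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(articles: list[dict]) -> list[dict]:
    """Deduplicate by fingerprint, then greedily group by title overlap.

    Articles are sorted by (published, url) first, so the same batch
    always yields the same Stories regardless of input order.
    """
    seen = set()
    stories = []
    for art in sorted(articles, key=lambda a: (a["published"], a["url"])):
        fp = fingerprint(art["url"])
        if fp in seen:
            continue  # duplicate URL after normalization
        seen.add(fp)
        toks = tokens(art["title"])
        for story in stories:
            if jaccard(toks, story["tokens"]) >= JACCARD_THRESHOLD:
                story["articles"].append(art)
                break
        else:
            # The earliest article of a new group supplies the canonical title.
            stories.append({"title": art["title"], "tokens": toks, "articles": [art]})
    return stories
```

Only titles, URLs, and dates are needed here, which matches the rule that full article text is not stored.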
Bias spectrum
For each Story, we count how many linked articles come from each leaning bucket in the registry. Percentages appear in the spectrum bar on cards and detail views. In the MVP, the bar reflects who published, not automated sentiment analysis of article bodies.
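The spectrum computation described above is a simple count-and-divide over article leanings. A minimal sketch, assuming each linked article has already been mapped to a bucket key (the key strings are hypothetical):

```python
from collections import Counter


def spectrum(article_leanings: list[str]) -> dict[str, float]:
    """Percentage of a Story's linked articles per leaning bucket.

    Counts articles, not sources, and does not inspect article bodies.
    Values are rounded to one decimal, so they may not sum to exactly 100.
    """
    counts = Counter(article_leanings)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bucket: round(100 * n / total, 1) for bucket, n in counts.most_common()}
```

For example, two DMK-leaning articles, one AIADMK-leaning, and one BJP-leaning article give 50% / 25% / 25%.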
Blindspots
A Blindspot highlights when a Story has meaningful coverage from some leanings but silence from others — for example strong DMK-leaning and AIADMK-leaning coverage with no BJP-leaning articles. Severity considers source count, momentum, and category. The dedicated Blindspots feed ranks these for review.
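One way to express the Blindspot rule in code, as a sketch only: the watched buckets, the `MIN_ARTICLES` threshold for "meaningful" coverage, the category weights, and the severity formula are hypothetical stand-ins for whatever the pipeline actually uses. The structure follows the description: some watched leanings cover the Story, at least one is silent, and severity scales with source count, momentum, and category.

```python
from dataclasses import dataclass
from typing import Optional

WATCHED = ("dmk_leaning", "aiadmk_leaning", "bjp_leaning")  # hypothetical bucket keys
MIN_ARTICLES = 2  # hypothetical: minimum articles that count as "meaningful" coverage
CATEGORY_WEIGHT = {"politics": 1.5, "economy": 1.2}  # hypothetical; others default to 1.0


@dataclass
class Blindspot:
    missing: list[str]
    severity: float


def detect_blindspot(
    counts: dict[str, int], source_count: int, momentum: float, category: str
) -> Optional[Blindspot]:
    """Flag a Story covered by some watched leanings but silent in others."""
    covered = [b for b in WATCHED if counts.get(b, 0) >= MIN_ARTICLES]
    missing = [b for b in WATCHED if counts.get(b, 0) == 0]
    if not covered or not missing:
        return None
    # Hypothetical severity: more sources and faster momentum raise priority,
    # scaled by a per-category weight.
    severity = source_count * (1.0 + momentum) * CATEGORY_WEIGHT.get(category, 1.0)
    return Blindspot(missing, round(severity, 2))
```

With strong DMK-leaning and AIADMK-leaning counts and no BJP-leaning articles, this returns a Blindspot listing `bjp_leaning` as missing; a Story covered by all three watched buckets returns `None`.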
Summaries and NLP
Short Story summaries use extractive methods (TextRank via sumy) on available titles/snippets. Optional LLM enrichment is gated by cost caps and Tamil validation. Quotes and categories are heuristic until a stronger Tamil model path is approved.
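The pipeline uses sumy's `TextRankSummarizer`; the sketch below is a standard-library illustration of the same TextRank idea, not sumy's code. Each title or snippet is a graph node, edge weights use the original TextRank overlap measure (shared words divided by the sum of log sentence lengths), and a damped PageRank iteration ranks the nodes. The function names, `top_n`, and iteration count are hypothetical.

```python
import math


def _words(sentence: str) -> set:
    return {w.lower() for w in sentence.split()}


def similarity(a: set, b: set) -> float:
    """TextRank overlap: shared words normalized by log sentence lengths."""
    denom = math.log(len(a)) + math.log(len(b)) if a and b else 0.0
    return len(a & b) / denom if denom > 0 else 0.0


def textrank(sentences: list[str], top_n: int = 2, damping: float = 0.85, iters: int = 50) -> list[str]:
    """Return the top_n highest-ranked sentences, in their original order."""
    bags = [_words(s) for s in sentences]
    n = len(sentences)
    w = [[similarity(bags[i], bags[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iters):
        # Each node receives weight from neighbors in proportion to edge strength.
        scores = [
            (1 - damping)
            + damping * sum(w[j][i] / out_sum[j] * scores[j] for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    ranked = sorted(range(n), key=lambda i: (-scores[i], i))[:top_n]
    return [sentences[i] for i in sorted(ranked)]
```

Whitespace tokenization keeps the sketch language-neutral; a Tamil-aware tokenizer would change the overlap counts.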
Personalization (beta)
The “For You” tab uses first-party signals only: recency, momentum, category affinity, and leaning diversity. Signed-in users can merge their session history into their account. Hybrid ranking (Workers AI embeddings) may reorder results when embedding vectors exist; the API's /metrics endpoint reports embedding coverage.
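A sketch of how the four first-party signals might combine into one ranking score, with an optional embedding blend when vectors exist. The weights, half-life, blend share, and every name below are hypothetical; the actual ranking function is not described beyond the list of signals.

```python
import math
from dataclasses import dataclass
from typing import Optional

HALF_LIFE_HOURS = 12.0  # hypothetical recency half-life
WEIGHTS = {"recency": 0.35, "momentum": 0.25, "affinity": 0.25, "diversity": 0.15}  # hypothetical
EMBED_BLEND = 0.3  # hypothetical share for embedding similarity when available


@dataclass
class Candidate:
    age_hours: float
    momentum: float  # assumed pre-scaled to [0, 1]
    category: str
    dominant_leaning: str
    embedding: Optional[list[float]] = None


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def score(c: Candidate, category_counts: dict[str, int], leaning_counts: dict[str, int],
          user_vec: Optional[list[float]] = None) -> float:
    recency = 0.5 ** (c.age_hours / HALF_LIFE_HOURS)  # exponential decay
    total_cat = sum(category_counts.values())
    affinity = category_counts.get(c.category, 0) / total_cat if total_cat else 0.0
    total_lean = sum(leaning_counts.values())
    # Diversity rewards leanings the user has read less of recently.
    diversity = 1.0 - leaning_counts.get(c.dominant_leaning, 0) / total_lean if total_lean else 1.0
    base = (WEIGHTS["recency"] * recency + WEIGHTS["momentum"] * c.momentum
            + WEIGHTS["affinity"] * affinity + WEIGHTS["diversity"] * diversity)
    if user_vec is not None and c.embedding is not None:
        return (1 - EMBED_BLEND) * base + EMBED_BLEND * cosine(user_vec, c.embedding)
    return base  # no vector: fall back to first-party signals only
```

Falling back to the base score when either vector is missing means Stories without embeddings are still ranked, just without the hybrid reordering.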
Mixed media
YouTube Shorts are embedded via official players (no re-hosting). First-party images and video are stored in Cloudflare R2 when enabled. See the embed policy.
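Embedding through the official player means the page only points at YouTube's `/embed/` URL rather than copying the video. A minimal sketch of converting a Shorts link into that embed URL; the helper name and the accepted host list are assumptions for illustration.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")  # YouTube video IDs are 11 URL-safe characters
_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}  # assumed accepted hosts


def shorts_embed_url(url: str) -> Optional[str]:
    """Map https://www.youtube.com/shorts/<id> to the official embed player URL."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in _HOSTS:
        return None
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 2 or segments[0] != "shorts" or not _VIDEO_ID.fullmatch(segments[1]):
        return None
    return f"https://www.youtube.com/embed/{segments[1]}"
```

The returned URL would be used as the `src` of an `<iframe>`, so playback and hosting stay with YouTube.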
Updates and versioning
Taxonomy version 1.0 (May 2026). Changes are logged in the repository (docs/BIAS_TAXONOMY.md, data/sources.seed.json) and reflected in the public registry.