Aegis Pulse

How PyPI download metrics are calculated

Last updated: 2026-06-01

This page explains where the numbers on the PyPI tab come from, what each metric measures, why the raw total is larger than you might expect, and how fresh the data is.

  • Downloads: 1.28M (▲ 12%)
  • Avg / Day: 91.8K (▲ 12%)
  • Avg / Week: 642K (▲ 12%)
  • Avg / Month: 2.75M (▲ 12%)
  • Bot %: 63% (▼ 5%)

Example figures. Avg / Week = Avg / Day × 7; Avg / Month = Avg / Day × 30.

Downloads

A download is a single HTTP request to PyPI for a package file - a wheel, sdist, or egg. The number is not deduplicated by user or IP address; every request counts, even if the same machine downloaded the same version twice.

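Because the public PyPI dataset records raw request counts, the total includes traffic from mirror clients, CI pipelines, scrapers, and automated tooling alongside real human installs. The Filtered and Human series remove part of that automated traffic: mirror clients in both, plus other known automated installers in Human. Neither removes CI pipelines that run pip install - see How traffic is filtered.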

The Downloads card shows the sum for the selected date range. Changing the range (7d / 1m / 3m / etc.) changes both the current total and the prior period used for the percentage chip.

Daily, weekly, and monthly averages

Avg / Day is total downloads divided by the number of days in the selected range that had any recorded data. Days with no data row (e.g. the very latest day if the upstream hasn't updated yet) are not counted in the denominator, so a partial day at the end of the window doesn't drag the average down.

Avg / Week and Avg / Month are simple projections of that daily rate:

  • Avg / Week = Avg / Day × 7
  • Avg / Month = Avg / Day × 30

These are rates, not calendar sums. A 7-day window that contains one busy day (a release spike) produces a high weekly rate, and an equally inflated monthly projection, even though only one week of data was observed. Use them for "if this pace holds, what would a week/month look like" - not as an assertion that you had that many downloads in a specific calendar week or month.
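The averaging rule above can be sketched in a few lines. This is a minimal illustration, not the product's implementation: the function names and the dict-of-days input shape are assumptions, and the figures are made up.

```python
from datetime import date


def daily_average(rows: dict[date, int]) -> float:
    """Total downloads divided by the number of days that have a data row."""
    if not rows:
        return 0.0
    return sum(rows.values()) / len(rows)


def projected_rates(rows: dict[date, int]) -> dict[str, float]:
    """Avg / Week and Avg / Month are the daily rate scaled by 7 and 30."""
    avg_day = daily_average(rows)
    return {"avg_day": avg_day, "avg_week": avg_day * 7, "avg_month": avg_day * 30}


# A 7-day window (May 25-31) where the latest day has no row yet,
# so only six days enter the denominator. May 28 is a release spike.
window = {
    date(2026, 5, 25): 1_000,
    date(2026, 5, 26): 1_200,
    date(2026, 5, 27): 900,
    date(2026, 5, 28): 7_000,
    date(2026, 5, 29): 1_100,
    date(2026, 5, 30): 800,
}

rates = projected_rates(window)
print(rates)
```

Here the six recorded days sum to 12,000 downloads, so the daily rate is 2,000 and the weekly projection is 14,000 - more than the window actually recorded, which is exactly why these figures should be read as rates.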

The percentage chips on Avg / Day, Avg / Week, and Avg / Month all compare the same thing: this period's daily average against the prior period's daily average. Because the weekly and monthly figures are fixed multiples of the daily average, all three chips show the same percentage.

Bot share

Bot % is the share of downloads that came from mirrors and automated installers rather than people:

Bot % = (1 - human downloads / total downloads) × 100

A lower number is better - it means more of your downloads are real installs. The percentage chip uses inverted color logic: a drop in Bot % is shown in green (improvement), a rise is shown in red.

"Human" downloads subtract the known automated installer categories listed in the filtering section. The definition is stricter than PyPI's public "without mirrors" figure - it removes mirror clients and additional automated user-agents (empty user-agents, requests, OS-level tools, etc.).

How traffic is filtered

The Downloads chart shows three series, each progressively more filtered:

  • Total - every recorded request. Includes mirror clients, CI pipelines, scrapers, and all automated tooling.
  • Filtered - excludes the four canonical public mirror clients (bandersnatch, z3c.pypimirror, Artifactory, devpi). This matches the "without mirrors" figure published by pypistats.org. CI and other automation still appear in this series.
  • Human - the strictest cut. Removes the same mirror clients as Filtered plus additional automated installer signatures: empty user-agents, requests library downloads, OS-level package managers, and similar tooling. Note: CI pipelines running pip install use a normal pip user-agent and still count here. "Human" means "not a known mirror or scraper," not "a person at a keyboard."

Illustrative example. The gap between Total (teal) and Filtered (cyan) is mirror traffic. The gap between Filtered and Human (indigo) is other known automated installers.

Even the Human series is dominated by CI pipelines on popular packages. A package with 100K "human" downloads per day almost certainly has far fewer than 100K distinct people installing it - the bulk is build servers running pip install on every push.

The filtering sets are applied at read time, not stored. Updating the excluded-installer list instantly reshapes every chart and metric for every project with no migration.
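The three series and the read-time filtering can be sketched as below. This assumes per-installer request counts are stored unfiltered and the exclusion sets are plain configuration; `EXTRA_AUTOMATED`, its labels, `series_totals`, and the `some-scraper` installer are hypothetical names chosen for illustration. Only the four mirror client names come from the text above.

```python
# The four canonical mirror clients named in this section.
MIRROR_CLIENTS = frozenset({"bandersnatch", "z3c.pypimirror", "Artifactory", "devpi"})

# Hypothetical labels for the additional automated installers
# (empty user-agent, the requests library, OS-level tools).
EXTRA_AUTOMATED = frozenset({"", "requests", "OS"})


def series_totals(
    counts_by_installer: dict[str, int],
    mirrors: frozenset[str] = MIRROR_CLIENTS,
    extra: frozenset[str] = EXTRA_AUTOMATED,
) -> dict[str, int]:
    """Derive Total, Filtered, and Human from unfiltered per-installer counts."""
    total = sum(counts_by_installer.values())
    filtered = sum(n for name, n in counts_by_installer.items() if name not in mirrors)
    human = sum(
        n
        for name, n in counts_by_installer.items()
        if name not in mirrors and name not in extra
    )
    return {"total": total, "filtered": filtered, "human": human}


# Made-up counts for one package on one day.
stored = {"pip": 500, "bandersnatch": 300, "requests": 100, "": 50, "some-scraper": 50}

before = series_totals(stored)
# Extending the exclusion list changes the result without touching stored data.
after = series_totals(stored, extra=EXTRA_AUTOMATED | {"some-scraper"})
print(before, after)
```

Note that pip traffic stays in every series, which matches the caveat that CI pipelines running pip install still count as "human". Because the stored counts are never rewritten, widening the exclusion set only changes what the next read returns.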

Mirrors and syncing (bandersnatch)

bandersnatch is the official PyPI mirroring tool. Mirror operators run it on a schedule to keep a local copy of the full package index in sync. Each sync re-downloads package files in bulk, generating a large number of requests that have nothing to do with users installing your package.

Illustrative breakdown. Mirror traffic (bandersnatch) dominates on less popular packages.

Mirror clients sync every package on a schedule, regardless of popularity. For a niche package with few real installs, mirror traffic often represents the majority of the raw total - which is why Bot % tends to be higher for less popular packages and lower for widely-used ones.
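A small worked example of that effect, under a deliberately simplified assumption: mirror traffic is a fixed number of requests per day and is the only automated traffic. The figures and the function name are hypothetical.

```python
import math


def bot_share_with_fixed_mirror_load(human_per_day: int, mirror_per_day: int) -> float:
    """Bot % when a constant mirror load is added to the human downloads."""
    total = human_per_day + mirror_per_day
    return (1 - human_per_day / total) * 100


MIRROR_REQUESTS_PER_DAY = 200  # assumed constant, independent of popularity

niche = bot_share_with_fixed_mirror_load(50, MIRROR_REQUESTS_PER_DAY)
popular = bot_share_with_fixed_mirror_load(50_000, MIRROR_REQUESTS_PER_DAY)
print(round(niche, 1), round(popular, 1))
```

The same 200 mirror requests make up roughly 80% of the niche package's total but well under 1% of the popular one's.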

Filtered removes bandersnatch and the three other canonical mirror clients. Human removes those plus additional automated user-agents. Neither series is "correct" - they answer different questions: Filtered approximates what PyPI itself reports publicly; Human subtracts more known automation on top of that.

Data freshness

PyPI download data is sourced from the PyPI BigQuery public dataset, with fallback to a secondary mirror when BigQuery is unavailable. The dataset is typically one day behind - yesterday's numbers arrive today. Occasional upstream delays can push the lag to a few days; this is temporary, not a sign of a problem.

The freshness indicator (a small colored dot near the date range picker on the package search page) shows how current the data is:

  • Teal dot - data is 0-1 days old (normal)
  • Amber dot - data is 2-3 days old (minor lag)
  • Red dot - data is 4+ days old (upstream delay)
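The thresholds above map onto a simple classification. This is a sketch only: the function name is an assumption, and it measures age as whole days between the newest data day and the current UTC date.

```python
from datetime import date


def freshness_dot(latest_data_day: date, today_utc: date) -> str:
    """Map data age in days to the indicator colour described above."""
    age_days = (today_utc - latest_data_day).days
    if age_days <= 1:
        return "teal"   # 0-1 days old: normal
    if age_days <= 3:
        return "amber"  # 2-3 days old: minor lag
    return "red"        # 4+ days old: upstream delay


today = date(2026, 6, 1)
print(freshness_dot(date(2026, 5, 31), today))
print(freshness_dot(date(2026, 5, 29), today))
print(freshness_dot(date(2026, 5, 27), today))
```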

Because the source records UTC-day totals, all dates shown in charts and tooltips are UTC. The day boundary is midnight UTC, not the user's local midnight.
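To illustrate the UTC day boundary, the sketch below assigns a timestamp to its UTC day. The helper name is hypothetical, and the example timestamp is invented; it only shows how a download made late in the evening local time can land on the next day in the charts.

```python
from datetime import date, datetime, timedelta, timezone


def utc_day(ts: datetime) -> date:
    """Return the UTC calendar day a timezone-aware timestamp falls on."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone.utc).date()


# 20:30 on 31 May in a UTC-5 timezone is 01:30 on 1 June in UTC.
local_evening = datetime(2026, 5, 31, 20, 30, tzinfo=timezone(timedelta(hours=-5)))
bucket = utc_day(local_evening)
print(local_evening.date(), "->", bucket)
```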