Monthly cron schedule

When every tier should fire across a 30-day cycle — and what’s actually wired up today.

The Spec

How the pipeline is supposed to be scheduled.

Two cron entries cover the entire pipeline. The first fires the full pipeline once every 30 days — intake plus all enrichment tiers in one chain. The second fires the lighter enrichment tiers every Monday and Thursday in between.

The two cron lines (spec)

30-day intake
0 10 1 * * python orchestrator.py --cadence all
Fires 10:00 UTC on the 1st of every month. Runs everything (intake + monthly + weekly + biweekly) as one atomic chain.
Mon + Thu enrichment
30 10 * * 1,4 python orchestrator.py
Fires 10:30 UTC every Monday and Thursday. Orchestrator picks the tiers from the calendar (weekly + biweekly on Mon, biweekly only on Thu). Skips intake.
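
The calendar-based tier pick can be sketched as below. This is a minimal illustration, not the contents of the real orchestrator.py or cadence.py: the names TIERS_BY_WEEKDAY and tiers_for are hypothetical, and the Monday/Thursday mapping is taken from the day-by-day table in this document (Mondays run weekly plus biweekly, Thursdays run biweekly only). Note that cron numbers Monday as 1 and Thursday as 4, while Python's weekday() uses 0 and 3.

```python
from datetime import date

# Hypothetical tier map mirroring the spec text; the real cadence.py may differ.
# Python weekday(): Monday = 0, Thursday = 3 (cron uses 1 and 4).
TIERS_BY_WEEKDAY = {
    0: ["weekly", "biweekly"],  # Monday
    3: ["biweekly"],            # Thursday
}


def tiers_for(day: date) -> list[str]:
    """Return the enrichment tiers the Mon + Thu cron would run on `day`."""
    return TIERS_BY_WEEKDAY.get(day.weekday(), [])


print(tiers_for(date(2026, 5, 11)))  # a Monday
print(tiers_for(date(2026, 5, 14)))  # a Thursday
print(tiers_for(date(2026, 5, 12)))  # a Tuesday: nothing scheduled
```

Intake is deliberately absent from the map, since the Mon + Thu line skips it.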

What each calendar day looks like

For a representative 30-day cycle, where Day 1 = the 1st of the month:

Each fire-day row below lists the external API endpoints that step calls on that day.

Day
DOW
What fires
Cost
Day 1
1st
FULL INTAKE — intake + monthly + weekly + biweekly (every step)
~$200–300
Intake — 4 paid API calls
  • 001a
    Google Maps SERP fetch
    DFS SERP · maps
    DataForSEO
  • 002a
    Domain HTTP verification
    DFS OnPage
    DataForSEO
  • 003c
    Classify with Haiku — “is this a PI firm?”
    Anthropic · claude-haiku
    Anthropic
  • 004a
    Detect practice areas with Haiku
    Anthropic · claude-haiku
    Anthropic
Monthly enrichment — 13 paid API calls
  • 005a
    Deep page crawl (every URL of every firm)
    DFS OnPage · task_post / task_get
    DataForSEO
  • 005h
    Lighthouse audit (perf / accessibility / SEO)
    DFS OnPage · lighthouse
    DataForSEO
  • 005i
    PageSpeed Insights (real-user Core Web Vitals)
    Google · pagespeedonline v5
    Google PSI
  • 005j
    Haiku attorney count (per firm's Our-Team page)
    Anthropic · claude-haiku
    Anthropic
  • 006a
    WHOIS lookup (registration date / registrar)
    DFS Domain Analytics · whois
    DataForSEO
  • 006b
    Tech-stack detection (WordPress, React, GA, etc.)
    DFS Domain Analytics · technologies
    DataForSEO
  • 008a
    Maps SERP grid (firm + 12 GPS neighbors)
    DFS SERP · maps task_post
    DataForSEO
  • 008b
    Organic SERP — who outranks the firm
    DFS SERP · organic task_post
    DataForSEO
  • 009a
    Backlinks summary (totals, ref domains, rank)
    DFS Backlinks · summary
    DataForSEO
  • 009b
    Backlinks live — every URL pointing at the firm
    DFS Backlinks · backlinks
    DataForSEO
  • 010a
    Ranked keywords (every keyword each firm ranks for)
    DFS Labs · ranked_keywords
    DataForSEO
  • 010c
    Domain rank overview (organic visibility score)
    DFS Labs · domain_rank_overview
    DataForSEO
  • 010h
    Topical categories per domain
    DFS Labs · categories_for_domain
    DataForSEO
Weekly (also fires on Day 1) — 6 paid API calls
  • 007a
    GBP info (address, hours, rating, categories)
    DFS Business Data · my_business_info
    DataForSEO
  • 007b
    Pull every Google review per firm
    DFS Business Data · reviews
    DataForSEO
  • 007c
    Pull every Google Post / GBP update
    DFS Business Data · my_business_updates
    DataForSEO
  • 008c
    Local Finder (expanded local pack)
    DFS SERP · local_finder
    DataForSEO
  • 008d
    Autocomplete suggestions (firm name / category)
    DFS SERP · autocomplete
    DataForSEO
  • 010b
    Bulk organic traffic estimate
    DFS Labs · bulk_traffic_estimation
    DataForSEO
Biweekly (also fires on Day 1) — 11 paid API calls
  • 009c
    Bulk Domain Rank (DFS authority score)
    DFS Backlinks · bulk_ranks
    DataForSEO
  • 009d
    Bulk total backlinks count
    DFS Backlinks · bulk_backlinks
    DataForSEO
  • 009e
    Bulk spam score
    DFS Backlinks · bulk_spam_score
    DataForSEO
  • 009f
    Bulk referring-domains count
    DFS Backlinks · bulk_referring_domains
    DataForSEO
  • 009g
    Bulk new / lost referring domains
    DFS Backlinks · bulk_new_lost_referring_domains
    DataForSEO
  • 009h
    Bulk indexed pages count
    DFS Backlinks · bulk_pages_summary
    DataForSEO
  • 010d
    Bulk keyword difficulty scores
    DFS Labs · bulk_keyword_difficulty
    DataForSEO
  • 010e
    Related keywords
    DFS Labs · related_keywords
    DataForSEO
  • 010f
    Keyword suggestions (autocomplete-style)
    DFS Labs · keyword_suggestions
    DataForSEO
  • 010g
    Long-tail keyword ideas
    DFS Labs · keyword_ideas
    DataForSEO
  • 010i
    Keyword overview (per-firm performance summary)
    DFS Labs · keyword_overview
    DataForSEO
Day 4
Thu
biweekly — bulk backlinks + bulk keywords
~$2.30
11 paid API calls
  • 009c
    Bulk Domain Rank
    DFS Backlinks · bulk_ranks
    DataForSEO
  • 009d
    Bulk total backlinks count
    DFS Backlinks · bulk_backlinks
    DataForSEO
  • 009e
    Bulk spam score
    DFS Backlinks · bulk_spam_score
    DataForSEO
  • 009f
    Bulk referring-domains count
    DFS Backlinks · bulk_referring_domains
    DataForSEO
  • 009g
    Bulk new / lost referring domains
    DFS Backlinks · bulk_new_lost_referring_domains
    DataForSEO
  • 009h
    Bulk indexed pages count
    DFS Backlinks · bulk_pages_summary
    DataForSEO
  • 010d
    Bulk keyword difficulty scores
    DFS Labs · bulk_keyword_difficulty
    DataForSEO
  • 010e
    Related keywords
    DFS Labs · related_keywords
    DataForSEO
  • 010f
    Keyword suggestions (autocomplete-style)
    DFS Labs · keyword_suggestions
    DataForSEO
  • 010g
    Long-tail keyword ideas
    DFS Labs · keyword_ideas
    DataForSEO
  • 010i
    Keyword overview (per-firm performance summary)
    DFS Labs · keyword_overview
    DataForSEO
Day 8
Mon
weekly + biweekly — GBP + local + traffic + bulk backlinks + bulk keywords
~$25
Weekly — 6 paid API calls
  • 007a
    GBP info (address, hours, rating, categories)
    DFS Business Data · my_business_info
    DataForSEO
  • 007b
    Pull every Google review per firm
    DFS Business Data · reviews
    DataForSEO
  • 007c
    Pull every Google Post / GBP update
    DFS Business Data · my_business_updates
    DataForSEO
  • 008c
    Local Finder (expanded local pack)
    DFS SERP · local_finder
    DataForSEO
  • 008d
    Autocomplete suggestions (firm name / category)
    DFS SERP · autocomplete
    DataForSEO
  • 010b
    Bulk organic traffic estimate
    DFS Labs · bulk_traffic_estimation
    DataForSEO
Biweekly (also fires every Monday) — 11 paid API calls
  • 009c
    Bulk Domain Rank
    DFS Backlinks · bulk_ranks
    DataForSEO
  • 009d
    Bulk total backlinks count
    DFS Backlinks · bulk_backlinks
    DataForSEO
  • 009e
    Bulk spam score
    DFS Backlinks · bulk_spam_score
    DataForSEO
  • 009f
    Bulk referring-domains count
    DFS Backlinks · bulk_referring_domains
    DataForSEO
  • 009g
    Bulk new / lost referring domains
    DFS Backlinks · bulk_new_lost_referring_domains
    DataForSEO
  • 009h
    Bulk indexed pages count
    DFS Backlinks · bulk_pages_summary
    DataForSEO
  • 010d
    Bulk keyword difficulty scores
    DFS Labs · bulk_keyword_difficulty
    DataForSEO
  • 010e
    Related keywords
    DFS Labs · related_keywords
    DataForSEO
  • 010f
    Keyword suggestions
    DFS Labs · keyword_suggestions
    DataForSEO
  • 010g
    Long-tail keyword ideas
    DFS Labs · keyword_ideas
    DataForSEO
  • 010i
    Keyword overview
    DFS Labs · keyword_overview
    DataForSEO
Day 11
Thu
biweekly — same 11 endpoints as Day 4
~$2.30

Same endpoint set as Day 4; see that row above for the list.

Day 15
Mon
weekly + biweekly — same 17 endpoints as Day 8
~$25

Same endpoint set as Day 8; see that row above for the list.

Day 18
Thu
biweekly — same 11 endpoints as Day 4
~$2.30

Same endpoint set as Day 4.

Day 22
Mon
weekly + biweekly — same 17 endpoints as Day 8
~$25

Same endpoint set as Day 8.

Day 25
Thu
biweekly — same 11 endpoints as Day 4
~$2.30

Same endpoint set as Day 4.

Day 29
Mon
weekly + biweekly — same 17 endpoints as Day 8
~$25

Same endpoint set as Day 8.

other
idle — nothing fires on Sun/Tue/Wed/Fri/Sat
$0
Total
1 full intake + 4 Mondays + 4 Thursdays = 9 cron fires per 30 days
~$359

Day 1 of every cycle is the heavy day. Exact day-of-week and exact non-Day-1 fire days shift each month, but the rhythm is always: 1 intake + ~4 weekly Mondays + ~4 biweekly Thursdays.
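
To see how the fire days shift from month to month, the two spec cron lines can be expanded for any given month. The sketch below is illustrative only: fire_days is a hypothetical helper, and it assumes that when the 1st lands on a Monday or Thursday the day is counted once as the full intake, which is how the table above treats Day 1.

```python
import calendar
from datetime import date


def fire_days(year: int, month: int) -> list[tuple[date, str]]:
    """List (date, label) for each spec cron fire in the given month."""
    fires = []
    last_day = calendar.monthrange(year, month)[1]
    for dom in range(1, last_day + 1):
        d = date(year, month, dom)
        if dom == 1:
            # Assumption: Day 1 counts once, as the full intake chain.
            fires.append((d, "full intake"))
        elif d.weekday() == 0:  # Monday
            fires.append((d, "weekly + biweekly"))
        elif d.weekday() == 3:  # Thursday
            fires.append((d, "biweekly"))
    return fires


# June 2026 starts on a Monday, so it matches the representative cycle above.
for d, label in fire_days(2026, 6):
    print(d.isoformat(), d.strftime("%a"), label)
```

Running it for a month that starts mid-week (May 2026 starts on a Friday) gives the same rhythm on different dates.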

What each tier actually does

Intake tier
Fires once / 30 days. Phases 001–004: SERP fetch, dedupe, geofence, verify, classify, specialty-tag. Rebuilds gold_domains from scratch. New firms in, lost firms out.
Monthly tier
Fires once / 30 days (with intake). Deep OnPage crawl, WHOIS, tech stack, Maps SERP grid, organic SERP, deep backlinks, ranked keywords, domain rank, categories. 18 enrichment steps.
Weekly tier
Fires every Monday. GBP info + reviews + updates, Local Finder, autocomplete, bulk traffic. 6 enrichment steps. Things that change week-to-week.
Biweekly tier
Fires every Monday and Thursday. Bulk backlinks endpoints + bulk keyword endpoints. 12 enrichment steps. The cheap, fast-refresh layer.

Monthly cost spec

Cron fire type | Per fire | Fires / month | Subtotal
Full intake (Day 1) | ~$250 | 1 | ~$250
Weekly Mondays (Mon, not first-of-month) | ~$25 | ~4 | ~$100
Biweekly Thursdays | ~$2.30 | ~4 | ~$9
Total per month | | ~9 fires | ~$359
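
The roll-up in the cost table above is simple arithmetic, sketched here so the total can be rechecked when a per-fire estimate changes. The figures are copied from this document; the $200 and $300 bounds come from the Day 1 row of the calendar table, and the variable names are illustrative.

```python
# (label, estimated cost per fire in USD, fires per month), from the cost table.
FIRES = [
    ("Full intake (Day 1)", 250.00, 1),
    ("Weekly Mondays", 25.00, 4),
    ("Biweekly Thursdays", 2.30, 4),
]


def monthly_total(fires, intake_cost=None):
    """Sum cost * count; optionally override the intake per-fire estimate."""
    total = 0.0
    for label, cost, count in fires:
        if intake_cost is not None and label.startswith("Full intake"):
            cost = intake_cost
        total += cost * count
    return total


total = monthly_total(FIRES)
low = monthly_total(FIRES, intake_cost=200.00)
high = monthly_total(FIRES, intake_cost=300.00)
fire_count = sum(count for _, _, count in FIRES)

print(f"~${total:.0f} across {fire_count} fires (range ~${low:.0f} to ~${high:.0f})")
```

The intake run dominates the total, so the monthly figure moves almost one-for-one with the Day 1 estimate.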
What's Fucked

The schedule isn’t running the spec. Here’s exactly how.

Finding 1 — The 30-day intake cron doesn’t exist.

Run crontab -l on the VM right now. You will find no cron entry that triggers intake: nothing passes --cadence all (the spec's first-of-month line) or --cadence intake. There is no first-of-month intake job. There is no 30-day cycle anywhere in the actual schedule.

The intake tier in cadence.py is defined with runs_on: [] — meaning “never auto-fires.” That part is internally consistent: no cron fires it, the cadence map says it never auto-fires. The problem is the spec says it should fire every 30 days, and nothing makes that happen.

Finding 2 — CRON_TZ=America/Los_Angeles is being ignored.

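The crontab on the VM contains the line CRON_TZ=America/Los_Angeles immediately before the orchestrator entry. The cron daemon on this Ubuntu installation is not honoring it. That is likely expected behavior rather than a misconfiguration: the crontab(5) man page for stock Debian/Ubuntu cron states that per-user timezones are not supported, and CRON_TZ is a cronie feature.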

Evidence: /var/log/syslog shows the orchestrator cron firing at 10:30 UTC, which is 3:30 AM Pacific, not 10:30 AM Pacific as the comment above the cron line claims. Two firings on record so far:

2026-05-11
CRON fired at 10:30:01 UTC = 3:30:01 AM PT (intended: 10:30 AM PT)
2026-05-14
CRON fired at 10:30:01 UTC = 3:30:01 AM PT (intended: 10:30 AM PT)

The pipeline runs at 3:30 AM Pacific. Reports / pitches / dashboards built off this data are looking at numbers refreshed 7 hours earlier than anyone reading them expects.
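
The seven-hour gap can be reproduced with Python's standard zoneinfo module. This is only a check of the arithmetic, using the 2026-05-11 timestamp from the syslog evidence above; the variable names are illustrative.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")

# Observed firing from syslog, interpreted as UTC.
fired_utc = datetime(2026, 5, 11, 10, 30, 1, tzinfo=timezone.utc)
fired_local = fired_utc.astimezone(PACIFIC)
print("actual run, Pacific time:", fired_local)

# Intended firing: 10:30 AM Pacific on the same date, expressed in UTC.
intended_local = datetime(2026, 5, 11, 10, 30, tzinfo=PACIFIC)
intended_utc = intended_local.astimezone(timezone.utc)
print("intended run, UTC:", intended_utc)

# Gap between intended and actual firing (ignoring the 1-second cron jitter).
gap = intended_utc - fired_utc.replace(second=0)
print("gap:", gap)
```

Because Pacific time observes daylight saving, the UTC hour that corresponds to 10:30 AM Pacific is 17:30 in summer and 18:30 in winter, so a hard-coded UTC hour in the crontab would drift by an hour twice a year.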

Finding 3 — The cron itself didn’t exist until 2026-05-13.

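stat /var/spool/cron/crontabs/localaccount reports the file was created on 2026-05-13 20:29 UTC. Before that date, there was no cron-driven pipeline at all. Everything ran by hand. One caveat: Finding 2 logs a cron firing on 2026-05-11, two days earlier. crontab replaces the spool file each time an edit is installed, so this timestamp may mark the latest edit rather than the first install.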

Finding 4 — The 2026-05-04 first-Monday monthly run had no cron to fall back on.

May 1 fell on a Friday; the first Monday of May 2026 was 2026-05-04. Per the spec, this should have been a heavy first-of-month full-intake run. The spec cron didn’t exist yet (Finding 3). The operator fired the pipeline by hand over SSH that day, and the run cascaded from 2026-05-04 through 2026-05-06; 55 step log files on disk confirm it executed. The cost of that manual cascade never made it into BigQuery’s cost ledger because the standalone scripts bypass the orchestrator’s logging.

Finding 5 — What the cron line actually says today

Exact contents of the VM crontab right now:

CRON_TZ=America/Los_Angeles ← ignored
30 10 * * 1,4 cd /mnt/workspace/amicus && /mnt/workspace/venv/bin/python pipeline/steps/orchestrator.py >> /mnt/workspace/amicus/000_log_files/pipeline_orchestrator_cron.log 2>&1

That’s it. One cron line covers the whole pipeline: Mon and Thu only, at the wrong hour, and it never fires intake. The spec calls for two cron lines (intake + enrichment) with timezone-correct hours.
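
A small audit along these lines could catch the gap automatically. The sketch below is hypothetical (audit_crontab is not part of the pipeline) and works on crontab text passed in as a string; the sample is the crontab quoted above, minus the "ignored" annotation. It only checks for the spec's two lines and for a CRON_TZ assignment; it cannot tell whether the daemon honors CRON_TZ.

```python
def audit_crontab(text: str) -> dict:
    """Check crontab text against the two-line spec described in this document."""
    entries = [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    # Job lines that invoke the orchestrator, split into whitespace fields.
    jobs = [e.split() for e in entries if "orchestrator.py" in e]
    return {
        # Spec line 1: full chain via --cadence all.
        "has_intake_job": any(
            "--cadence" in f and f[f.index("--cadence") + 1 : f.index("--cadence") + 2] == ["all"]
            for f in jobs
        ),
        # Spec line 2: day-of-week field (5th field) is Mon + Thu.
        "has_mon_thu_job": any(len(f) > 5 and f[4] == "1,4" for f in jobs),
        "sets_cron_tz": any(e.startswith("CRON_TZ=") for e in entries),
    }


SAMPLE = """\
CRON_TZ=America/Los_Angeles
30 10 * * 1,4 cd /mnt/workspace/amicus && /mnt/workspace/venv/bin/python pipeline/steps/orchestrator.py >> /mnt/workspace/amicus/000_log_files/pipeline_orchestrator_cron.log 2>&1
"""

report = audit_crontab(SAMPLE)
print(report)
```

On the VM, the same function could be fed the output of crontab -l instead of the sample string.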

What’s missing in one sentence

The Schedule Gap

Of the spec’s 9 cron fires per 30 days (1 intake + 4 weekly + 4 biweekly), the VM is currently honoring 8 — everything except the 30-day intake. That one missing cron is the most expensive run of the cycle (~$250) and the one that keeps gold_domains current. Until it’s installed, every Mon+Thu run is enriching a stale firm list.