SCAHA Schedule Cache: Ensuring HockeyGoTime Resilience
Hey everyone! Let's dive into a plan to make HockeyGoTime (HGT) more resilient when the SCAHA website (scaha.net) experiences those frustrating outages. We've all been there, especially on Monday mornings after a busy weekend of games. The goal? To keep HGT running smoothly even when SCAHA.net isn't cooperating. Here's the plan.
The Problem: Frequent SCAHA.net Outages
- SCAHA website downtime is a recurring issue, particularly on Mondays following busy game weekends.
- These extended outages (lasting hours) often coincide with bulk stats updates.
- The impact on HockeyGoTime is significant: users can't access schedules or stats, leading to a poor user experience.
- Currently, there's no fallback mechanism in place, and error messages don't clearly explain the problem.
The Solution: Nightly Backup Cache
- Implement a nightly backup cache of SCAHA schedules and stats data.
- Fallback mechanism: When SCAHA.net is unreachable, use the cached data.
- User notification: Clearly inform users when they're viewing cached data.
- Cache expiration: Set an age threshold (24-48 hours) for cached data.
So, the idea is to set up a nightly job that grabs and saves the important SCAHA data. Then, if SCAHA.net goes offline, we'll automatically switch to using this cached data. We'll also make sure to let users know they're seeing older data with a message like this:
🔴 SCAHA.net is currently unavailable. Showing cached data (last updated: 2025-10-13 11:00 PM)
Core Requirements:
- A nightly scheduled job to fetch and cache the important SCAHA data.
- A fallback mechanism in the MCP server to use cached data when the live site is down.
- A clear user notification when we're serving stale data.
- An age threshold for the cache (maybe 24-48 hours?).
Rabbit Holes to Avoid
To keep things manageable, let's agree on what not to do:
- Don't try to cache everything; focus on high-value data like schedules and team standings.
- Don't build a complex versioning system; a simple "last good snapshot" is enough.
- Don't cache individual player stats initially β focus on team-level data.
- Avoid real-time sync complexity; nightly snapshots are adequate.
- Don't replicate SCAHA's entire database; cache only what HGT actively queries.
No-Gos
Here are some things we should not do:
- Changing the MCP SDK or core architecture.
- Implementing real-time change detection (too complex).
- Caching data that changes intra-day (live scores, in-progress games).
- Creating a full SCAHA.net mirror/scraper.
- Building a UI for cache management (use Vercel cron + logs).
Design Questions to Consider
1. What Data Should We Cache?
High Priority (actively queried by HGT):
- Team schedules (by season, division, team).
- Team standings (by division).
- Venue information (already handled separately).
Medium Priority (nice-to-have):
- Team rosters
- Game results (past games)
Low Priority (defer):
- Individual player stats
- League-wide analytics
2. Where Should We Store the Cache?
Option A: Vercel KV (Redis) (Recommended)
- ✅ Fast reads (<1ms)
- ✅ Built-in TTL support
- ✅ No separate infrastructure
- ✅ Simple key-value model
- ❌ Costs $ after free tier (10,000 commands/day)
Option B: Supabase (PostgreSQL)
- ✅ Already used for venue data
- ✅ Free tier is generous
- ✅ Relational queries possible
- ❌ Slower than KV store
- ❌ More complex schema
Option C: Vercel Blob Storage
- ✅ Good for large JSON snapshots
- ✅ Cheap storage
- ❌ Not optimized for frequent small reads
- ❌ No built-in TTL
Option D: File system cache in scaha-mcp repo
- ✅ Zero external dependencies
- ✅ Works for local development
- ❌ Doesn't work in Vercel serverless (no persistent file system)
- ❌ Not accessible to HGT directly
Recommendation: Start with Vercel KV for simplicity. If cost becomes an issue, migrate to Supabase (we already have it set up).
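To make the KV-vs-Supabase decision cheap to revisit, the cache utilities could be written against a tiny store interface. A minimal sketch, assuming a hypothetical `CacheStore` abstraction (an in-memory Map locally; in production, a thin wrapper over `@vercel/kv` would implement the same interface). Names here are illustrative, not actual HGT code:

```typescript
// Hypothetical store interface so the cache logic is backend-agnostic.
interface CacheStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// In-memory implementation for local development and tests.
class MemoryStore implements CacheStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();
  async get(key: string): Promise<string | null> {
    const e = this.entries.get(key);
    if (!e || Date.now() > e.expiresAt) return null;
    return e.value;
  }
  async set(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.entries.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }
}

const TTL_48H = 48 * 60 * 60; // entries expire at the 48-hour staleness threshold

// Write a snapshot with a timestamp so readers can compute cache age.
async function cacheSnapshot(store: CacheStore, key: string, data: unknown): Promise<void> {
  await store.set(key, JSON.stringify({ cachedAt: new Date().toISOString(), data }), TTL_48H);
}

async function readSnapshot<T>(
  store: CacheStore,
  key: string,
): Promise<{ cachedAt: string; data: T } | null> {
  const raw = await store.get(key);
  return raw ? (JSON.parse(raw) as { cachedAt: string; data: T }) : null;
}
```

Storing the `cachedAt` timestamp alongside the payload (rather than relying on the store's TTL alone) keeps the staleness notification possible even if we later switch backends.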
3. How Do We Schedule the Nightly Backup?
Option A: Vercel Cron Jobs (Recommended)
- Create a `/api/cron/cache-scaha` endpoint
- Configure the schedule in `vercel.json`:

```json
{
  "crons": [{
    "path": "/api/cron/cache-scaha",
    "schedule": "0 23 * * *"
  }]
}
```

The intent is 11 PM daily, after games are finalized. One caveat: Vercel evaluates cron expressions in UTC, so `0 23 * * *` fires at 11 PM UTC; for 11 PM Pacific we'd want `0 6 * * *` (PDT) or `0 7 * * *` (PST).
- ✅ Native Vercel feature
- ✅ Simple setup
- ❌ Only works in production
Option B: GitHub Actions
- Workflow triggers nightly
- Calls HGT API endpoint or directly scrapes SCAHA
- ✅ Works for any deployment
- ✅ Independent of hosting platform
- ❌ More moving parts
Recommendation: Vercel Cron for simplicity.
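The cron endpoint itself would be small: verify the request came from Vercel's cron (which sends `Authorization: Bearer ${CRON_SECRET}`), then run the refresh. A sketch with the refresh function injected so the handler can be exercised without hitting SCAHA; all names are illustrative:

```typescript
// Hypothetical refresh function: scrapes SCAHA and writes snapshots,
// returning how many were written.
type RefreshFn = () => Promise<number>;

async function handleCronRequest(
  req: Request,
  refresh: RefreshFn,
  secret: string,
): Promise<Response> {
  // Vercel cron invocations carry the project's CRON_SECRET as a bearer token.
  if (req.headers.get("authorization") !== `Bearer ${secret}`) {
    return new Response("Unauthorized", { status: 401 });
  }
  try {
    const written = await refresh();
    return Response.json({ ok: true, written });
  } catch (err) {
    // Surface refresh failures so Vercel logs/alerts can pick them up.
    return Response.json({ ok: false, error: String(err) }, { status: 500 });
  }
}
```

In the real route file this would be wrapped as `export async function GET(req: Request)` with `refresh` and `process.env.CRON_SECRET` bound in.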
4. When Should We Use Cached Data?
Strategy: Progressive fallback
1. Try live SCAHA.net (primary)
   - Timeout: 10 seconds
   - Retry: once, with a short backoff
2. Fall back to cache (if the live fetch fails)
   - Check cache age < 48 hours
   - Return cached data with a staleness notification
3. Return an error (if the cache is too old or missing)
   - Clear error message: "SCAHA.net is down and cached data is too old (last updated: X)"
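The progressive fallback above can be sketched as a small policy function. The live and cached fetchers are injected so the policy is testable in isolation; the 10-second timeout and 48-hour threshold match the plan, but everything else (names, return shape) is illustrative:

```typescript
const MAX_CACHE_AGE_HOURS = 48;

interface Snapshot<T> { cachedAt: string; data: T }

// Race a promise against a timeout.
async function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((_, reject) => setTimeout(() => reject(new Error("timeout")), ms)),
  ]);
}

async function getScheduleWithFallback<T>(
  fetchLive: () => Promise<T>,
  fetchCached: () => Promise<Snapshot<T> | null>,
  now: Date = new Date(),
  timeoutMs = 10_000,
): Promise<{ data: T; cached: boolean; cachedAt?: string }> {
  // Step 1: try live, with one retry after a short backoff.
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      return { data: await withTimeout(fetchLive(), timeoutMs), cached: false };
    } catch {
      if (attempt === 0) await new Promise((r) => setTimeout(r, 250));
    }
  }
  // Step 2: fall back to cache if it's fresh enough.
  const snap = await fetchCached();
  if (snap) {
    const ageHours = (now.getTime() - new Date(snap.cachedAt).getTime()) / 3_600_000;
    if (ageHours < MAX_CACHE_AGE_HOURS) {
      return { data: snap.data, cached: true, cachedAt: snap.cachedAt };
    }
  }
  // Step 3: clear error when both live and cache fail us.
  throw new Error(
    `SCAHA.net is down and cached data is too old (last updated: ${snap?.cachedAt ?? "never"})`,
  );
}
```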
Cache Key Structure:
```
scaha:schedule:{season}:{division}:{team}
scaha:standings:{season}:{division}
scaha:last_updated
```
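Key construction is worth centralizing so the cron writer and the fallback reader can never disagree. A sketch of pure helper functions matching the structure above; the normalization rule (lowercase, whitespace to dashes) is an assumption, not an established SCAHA convention:

```typescript
// Normalize a key segment: trim, lowercase, collapse whitespace to dashes.
// This normalization scheme is an assumption for illustration.
function slug(part: string): string {
  return part.trim().toLowerCase().replace(/\s+/g, "-");
}

function scheduleKey(season: string, division: string, team: string): string {
  return `scaha:schedule:${slug(season)}:${slug(division)}:${slug(team)}`;
}

function standingsKey(season: string, division: string): string {
  return `scaha:standings:${slug(season)}:${slug(division)}`;
}
```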
5. How Do We Notify Users About Stale Data?
In Chat Response:
```
⚠️ **SCAHA.net is currently unavailable**

I'm showing you cached schedule data from **Sunday, Oct 13 at 11:00 PM**.
This data may not reflect recent changes or cancellations.

---

[Rest of response with schedule data]
```
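Rendering that banner is mostly a date-formatting problem. A sketch, with the function name and Pacific-time formatting chosen for illustration:

```typescript
// Render the stale-data warning banner for chat responses.
// Timezone hard-coded to Pacific since SCAHA is a Southern California league.
function staleDataBanner(cachedAt: Date): string {
  const when = cachedAt.toLocaleString("en-US", {
    weekday: "long",
    month: "short",
    day: "numeric",
    hour: "numeric",
    minute: "2-digit",
    timeZone: "America/Los_Angeles",
  });
  return [
    "⚠️ **SCAHA.net is currently unavailable**",
    "",
    `I'm showing you cached schedule data from **${when}**.`,
    "This data may not reflect recent changes or cancellations.",
    "",
    "---",
  ].join("\n");
}
```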
In API Response (for MCP tool):
Add metadata field:
```json
{
  "data": { ... },
  "meta": {
    "cached": true,
    "cached_at": "2025-10-13T23:00:00Z",
    "age_hours": 14,
    "warning": "SCAHA.net unavailable, using cached data"
  }
}
```
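Attaching that `meta` block could be one shared wrapper so every tool reports staleness the same way. A sketch deriving `age_hours` from the snapshot timestamp; field names follow the JSON above, everything else is illustrative:

```typescript
interface CacheMeta {
  cached: boolean;
  cached_at?: string;
  age_hours?: number;
  warning?: string;
}

// Wrap tool output with cache metadata. Pass cachedAt = null for live data.
function withCacheMeta<T>(
  data: T,
  cachedAt: string | null,
  now: Date = new Date(),
): { data: T; meta: CacheMeta } {
  if (!cachedAt) return { data, meta: { cached: false } };
  const age_hours = Math.round((now.getTime() - new Date(cachedAt).getTime()) / 3_600_000);
  return {
    data,
    meta: {
      cached: true,
      cached_at: cachedAt,
      age_hours,
      warning: "SCAHA.net unavailable, using cached data",
    },
  };
}
```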
6. How Do We Test This?
Manual Testing:
- Simulate SCAHA.net downtime by blocking requests in the test environment.
- Verify fallback to cached data.
- Verify staleness notification appears.
Monitoring:
- Log cache hits vs live hits.
- Alert if the cache is used for >6 hours (indicates a prolonged outage).
- Track cache refresh failures.
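The ">6 hours on cache" alert reduces to a pure check, which keeps it easy to unit-test alongside the fallback logic. A sketch, assuming a hypothetical `cacheOnlySince` timestamp recorded the first time a request falls back to cache and cleared on the next successful live fetch:

```typescript
const ALERT_AFTER_HOURS = 6;

// Returns true when we've been serving only cached data long enough
// to indicate a prolonged SCAHA outage.
function shouldAlertProlongedOutage(cacheOnlySince: Date | null, now: Date = new Date()): boolean {
  if (!cacheOnlySince) return false; // live fetches are still succeeding
  const hours = (now.getTime() - cacheOnlySince.getTime()) / 3_600_000;
  return hours > ALERT_AFTER_HOURS;
}
```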
Implementation Plan (High-Level)
Phase 1: Cache Infrastructure (2-3 hours)
- Add Vercel KV to the project
- Create cache utility functions (set, get, check age)
- Create the `/api/cron/cache-scaha` endpoint
- Configure the `vercel.json` cron schedule
Phase 2: SCAHA MCP Updates (3-4 hours)
- Update the `get_schedule` tool to support cache fallback
- Add cache-checking logic with progressive fallback
- Include staleness metadata in responses
- Add cache age validation (reject >48hr old data)
Phase 3: HGT Chat Agent Updates (1-2 hours)
- Update system prompt to handle cache metadata
- Display staleness warnings to users
- Test chat responses with cached vs live data
Phase 4: Testing & Monitoring (2 hours)
- Test cache refresh job
- Simulate SCAHA outage and verify fallback
- Add Vercel Analytics events for cache usage
- Document cache behavior in README
Total Estimate: 8-11 hours
Success Criteria
- Nightly cache job runs successfully in production
- When SCAHA.net is down, users get cached data within 2 seconds
- Staleness warnings are clearly displayed in chat responses
- Cache data is never older than 48 hours
- Zero user impact during SCAHA.net outages (besides staleness notice)
- Cache refresh failures are logged and monitorable
Future Enhancements (Post-MVP)
- Cache invalidation on demand (manual refresh button)
- Smart cache warming (pre-fetch frequently queried teams)
- Multi-tier cache (hot/warm/cold based on query frequency)
- Webhook from SCAHA when data updates (if they provide one)
Reference
- Related Issue: Venue resolution system (feature 005) - shows precedent for caching strategy
- SCAHA MCP: https://github.com/joerawr/scaha-mcp (where MCP server lives)
- Vercel Cron: https://vercel.com/docs/cron-jobs
- Vercel KV: https://vercel.com/docs/storage/vercel-kv
Priority: P2 (Enhancement - improves resilience but not blocking)
Complexity: Medium (requires cross-repo changes, cron setup, caching strategy)
User Impact: High (eliminates frustration during SCAHA outages)