Monitor SCAHA Selectors With Daily GitHub Actions

by SLV Team

Hey guys, let's dive into how we can proactively monitor the health of our SCAHA element selectors using a nifty GitHub Action. This is super important because when SCAHA updates its configuration, the auto-generated element IDs change. This can cause our scrapers to break silently, which is not ideal. We want to catch these issues before our users even notice. This article will break down the problem, the solution, and the benefits of implementing this automated monitoring system.

The Problem: Silent Scraper Failures and Manual Detection

So, here's the deal: SCAHA.net periodically updates its JavaServer Faces (JSF) configuration, and when it does, the auto-generated element IDs on its pages change. These IDs are like the addresses our scrapers use to find specific parts of a page. When the addresses change, the scrapers can't find what they're looking for, and they break. The problem? These breaks often happen silently. Users only realize something is wrong when they start seeing errors or missing data, as in Issue #7.

Currently, our selectors are hardcoded in src/lib/browser-scrapers.ts and src/lib/scrapers.ts, and nothing automatically detects when SCAHA changes the element IDs. Whenever an issue pops up, we have to inspect the pages by hand and work out what changed. That process is time-consuming and reactive. We need a monitoring system that detects selector changes soon after they happen, before they affect our users. This is where our solution comes in!

The Solution: Automated Selector Validation with GitHub Actions

Our solution is a daily GitHub Action that checks whether the SCAHA element selectors are still valid. It runs a small set of tests to confirm that the elements our scrapers depend on are still present on the pages. Let's break down the key components of this solution, shall we?

1. The Test File: tests/integration/selector-health.test.ts

This file is the heart of our monitoring system. It uses Puppeteer, a Node.js library for controlling a headless Chrome browser, to load the scoreboard.xhtml and statscentral.xhtml pages. Once the pages are loaded, the test file does the following:

  • Validates Selectors: Checks if all the expected selectors exist on the page. These selectors are the HTML element IDs, classes, and other attributes that our scrapers use to find the data. If a selector is missing, the test will fail.
  • Checks Season Availability: Validates that the current season (for instance, 2025/26) is available in the season dropdown menus. This helps confirm that the pages are up to date and functioning correctly.
  • Efficiency: The entire test runs in approximately 30 seconds using headless Chrome, so it doesn't slow down the main test suite.
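The two checks above can be sketched with a couple of small helpers. This is a minimal sketch, not the project's actual test code: the function names toIdSelector and hasSeason and the sample option labels are made up for illustration. One detail worth knowing is that JSF IDs contain a colon, which would have to be escaped in a "#id" CSS selector, so the sketch builds an attribute selector instead.

```typescript
// Hypothetical helpers; the names are illustrative, not taken from the project.

/** Build a CSS selector for an element ID that may contain ':' (common in JSF). */
function toIdSelector(id: string): string {
  // An attribute selector avoids escaping ':' as '\:' in the '#id' form.
  const escaped = id.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  return `[id="${escaped}"]`;
}

/** Return true when at least one dropdown option label mentions the season. */
function hasSeason(optionLabels: string[], season: string): boolean {
  return optionLabels.some((label) => label.trim().includes(season));
}

// Sample data: these option labels are assumptions, not real SCAHA output.
console.log(toIdSelector('j_id_4d:parts')); // [id="j_id_4d:parts"]
console.log(hasSeason(['2024/25 Season', ' 2025/26 Season '], '2025/26')); // true
```

In a Puppeteer test, the string returned by toIdSelector could be passed to page.$(), which resolves to null when no element matches, so a null result would mean the selector is missing.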

2. The GitHub Action: .github/workflows/scaha-health-check.yml

This is where the magic happens. The GitHub Action automates the process of running our tests and reporting any issues. Here's what it does:

  • Scheduled Execution: Runs daily at 12:00 UTC (Coordinated Universal Time), which corresponds to the cron expression 0 12 * * *. This ensures that we regularly check the selectors for any changes.
  • Manual Trigger: Can be manually triggered via workflow_dispatch. This provides flexibility to run the tests on demand, such as after a code update or when you suspect a change in SCAHA.
  • Chrome Installation: Installs Chrome to provide a browser environment for the test.
  • Selector Validation Test: Runs the selector validation test we discussed earlier.
  • Automatic Issue Creation: Automatically creates a GitHub issue if the selectors fail validation. This is a critical feature: we find out about a problem from the next scheduled run rather than from a user report, and the issue includes details about the failure.
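To make the "detailed information" part concrete, here is one way the failure details could be turned into issue text. This is only a sketch under assumptions: the MissingSelector shape, the buildIssueBody function, and the wording of the body are all hypothetical, and the real workflow may format its issues differently.

```typescript
// Hypothetical shape for one failed check; the field names are assumptions.
interface MissingSelector {
  page: string; // e.g. 'scoreboard'
  name: string; // key in EXPECTED_SELECTORS, e.g. 'standingsTable'
  expectedId: string; // the element ID that was not found
}

/** Render the missing selectors as Markdown text for a GitHub issue body. */
function buildIssueBody(missing: MissingSelector[], checkedAt: Date): string {
  const rows = missing.map(
    (m) => `- \`${m.page}.${m.name}\`: expected element ID \`${m.expectedId}\` was not found`,
  );
  return [
    `Selector health check failed at ${checkedAt.toISOString()}.`,
    '',
    `Missing selectors (${missing.length}):`,
    ...rows,
  ].join('\n');
}

// Example with one failed selector (the failure itself is invented for illustration).
const body = buildIssueBody(
  [{ page: 'scoreboard', name: 'standingsTable', expectedId: 'j_id_4d:parts' }],
  new Date('2025-01-01T12:00:00Z'),
);
console.log(body);
```

Keeping the formatting in a plain function like this makes it easy to unit test without a browser or a network call.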

3. Expected Selectors: The Single Source of Truth

In our test file, we have a section that lists the EXPECTED_SELECTORS. This is the single source of truth for our selectors. It looks like this:

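// Element IDs the scrapers depend on, grouped by page.
// The j_id_* values are auto-generated by JSF and change when SCAHA updates its configuration.
const EXPECTED_SELECTORS = {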
  scoreboard: {
    seasonDropdown: 'j_id_4d:j_id_4kInner',
    scheduleDropdown: 'j_id_4d:j_id_4nInner',
    teamDropdown: 'j_id_4d:j_id_4qInner',
    standingsTable: 'j_id_4d:parts',
  },
  statscentral: {
    seasonDropdown: 'j_id_4d:j_id_4kInner',
    scheduleDropdown: 'j_id_4d:schedulelistInner',
    playerTable: 'j_id_4d:playertotals',
    goalieTable: 'j_id_4d:goalietotals',
  },
};

This object contains all the selectors we expect to find on the scoreboard and statscentral pages. If any of these selectors are missing, the test will fail, and an issue will be created.
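The comparison described here can be sketched as a pure function that takes the expected map and the set of IDs actually found on each page. This is a minimal sketch, not the project's implementation: findMissingSelectors, SelectorMap, and the sample "found" IDs (including the changed ID j_id_5a:parts) are hypothetical, and it assumes the IDs on each page have already been collected by the browser test.

```typescript
type SelectorMap = Record<string, Record<string, string>>;

interface MissingSelector {
  page: string;
  name: string;
  expectedId: string;
}

/** List every expected element ID that is absent from the IDs found on its page. */
function findMissingSelectors(
  expected: SelectorMap,
  presentIdsByPage: Record<string, ReadonlySet<string>>,
): MissingSelector[] {
  const missing: MissingSelector[] = [];
  for (const [page, selectors] of Object.entries(expected)) {
    // A page with no collected IDs is treated as having none of its selectors.
    const present = presentIdsByPage[page] ?? new Set<string>();
    for (const [name, expectedId] of Object.entries(selectors)) {
      if (!present.has(expectedId)) {
        missing.push({ page, name, expectedId });
      }
    }
  }
  return missing;
}

// Expected IDs copied from the scoreboard entry of EXPECTED_SELECTORS.
const expected: SelectorMap = {
  scoreboard: {
    seasonDropdown: 'j_id_4d:j_id_4kInner',
    standingsTable: 'j_id_4d:parts',
  },
};

// Invented scenario: the standings table ID changed after a SCAHA update.
const found = {
  scoreboard: new Set(['j_id_4d:j_id_4kInner', 'j_id_5a:parts']),
};

const missing = findMissingSelectors(expected, found);
console.log(missing);
```

A non-empty result is what would make the test fail and trigger the issue.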

Benefits: Why This Matters

So, what are the benefits of this system? Let's take a look:

  • ✅ Early detection (within 24 hours of SCAHA changes): Because the GitHub Action runs daily, a selector change is caught by the next scheduled check, about a day later at most. This is a significant improvement over manual detection, which can take much longer.
  • ✅ Automatic issue creation with details: When the selectors fail, the GitHub Action automatically creates a detailed issue with the information needed to resolve the problem. This saves us time and effort.
  • ✅ Historical record of selector changes: The GitHub issues provide a historical record of selector changes, making it easier to track how the website evolves.
  • ✅ Reduced MTTR (mean time to resolution): By catching issues quickly and providing detailed information, we can get our scrapers back up and running faster, minimizing the impact on our users.
  • ✅ No infrastructure costs (uses GitHub Actions free tier): The workflow runs on the free tier of GitHub Actions, and a roughly 30-second daily run uses very few minutes, so the monitoring system adds no extra cost.

Issue Creation Template: The Details You Need

When the selectors fail, the system automatically creates a GitHub issue to alert us of the problem. This issue includes valuable information to help us resolve the problem quickly:

  • Title: