PUDL Archiver: October 28th, 2025 Archives Review
Let's dive into the October 28th, 2025 archives for the PUDL (Public Utility Data Liberation) project! This post summarizes the results of the archive runs, focusing on what needs to be reviewed, published, or further investigated. You can see the job run logs and results here.
Review and Publish Archives
Alright, team, let's get these archives sorted! For each archive listed below, check its run status in the GitHub Archiver run. If the validation tests passed with flying colors, it's time for a manual review to make sure everything looks good before publishing. If no changes were detected during the run, simply delete the draft. If changes were detected, we've got a bit more work to do: follow the guidelines in step 3 of README.md to meticulously review the archive, and once you're satisfied, publish the new version. Finally, confirm the publication status by adding a note (e.g., "v1 published", "no changes detected, draft deleted"). If anything seems off or requires further action, don't hesitate to create a follow-up sub-issue. This process keeps our archives accurate and up to date.
Detailed Steps for Review and Publication
- Check Run Status: Begin by verifying the run status in the GitHub Archiver run. Ensure that the archive process completed successfully without any initial errors.
- Validation Test Review: If the validation tests have passed, proceed to the next step. A passing validation suite indicates that the basic integrity and format of the data are correct, but a manual review is still necessary.
- Manual Review: Download the archive and thoroughly inspect the data. Look for any anomalies, inconsistencies, or unexpected changes. This step is crucial to catch any errors that automated tests might miss.
- Change Detection: If no changes are detected, confirm that the draft archive is deleted to avoid confusion and ensure that only current and relevant versions are available.
- Review Guidelines: If changes are detected, meticulously follow the guidelines in step 3 of the README.md file. This document provides detailed instructions on how to handle updates and modifications to the archive.
- Publication: After a successful review, publish the new version of the archive. Ensure that the publication process is completed without any errors.
- Confirmation: Confirm the publication status by adding a note, such as "v1 published" or "no changes detected, draft deleted." This helps keep track of the archive's status and any actions taken.
- Sub-Issues: If any issues arise or further action is needed, create a follow-up sub-issue. This ensures that all problems are addressed and resolved promptly.
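The decision flow in the steps above can be sketched in Python. The `success` and `file_changes` fields below are assumptions about what the run-summary JSON might contain, not the archiver's actual schema:

```python
import json

# Hypothetical run-summary fields; the real JSON produced by the
# archiver's "Upload run summaries" step may differ.
def triage(summary: dict) -> str:
    """Map a run summary onto the review actions described above."""
    if not summary.get("success", False):
        return "create follow-up sub-issue"
    if not summary.get("file_changes"):  # run detected no changes
        return "delete draft"
    return "manual review, then publish"

summary = json.loads('{"success": true, "file_changes": []}')
print(triage(summary))  # -> delete draft
```

The point of centralizing the decision in one function is that every archive gets routed the same way, so the confirmation note you leave in the task list always matches one of the three known outcomes.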
By following these steps, we can maintain the accuracy and reliability of our archives, providing valuable data for public utility research and analysis.
Changed Archives
The following archives have successfully run and have new data. Review each archive prior to publication.
(None listed for this run.)
Validation Failures
Okay, Houston, we have a problem! Some runs failed due to validation test failures. You'll find these in the GHA logs. The first step is to add each failed run to the task list. Then, download the run summary JSON by heading to the "Upload run summaries" tab in the GHA run for each dataset and clicking the link. Time to put on our detective hats and investigate the validation failure! If, after a manual review, the failure seems okay (e.g., Q2 2024 data doubles the file size compared to Q1, but the new data itself looks correct), go ahead and approve the archive. Just make sure to leave a note explaining your reasoning in the task list. However, if the validation failure is a major roadblock (e.g., incorrect file format, the entire dataset changes size by 200%), we need to create an issue to resolve it before proceeding. Addressing these failures diligently ensures that the data we're archiving is accurate and reliable.
Investigating Validation Failures
When dealing with validation failures, it's important to approach the issue systematically to determine the root cause and appropriate action. Here’s a detailed guide to help you through the process:
- Add to Task List: For each run that failed due to validation test failures, add it to the task list. This ensures that no failed run is overlooked and that all issues are tracked.
- Download Run Summary JSON: Navigate to the "Upload run summaries" tab in the GHA (GitHub Actions) run for each dataset and follow the link to download the run summary JSON. This file contains detailed information about the validation failure, including error messages and relevant data.
- Investigate Validation Failure: Analyze the JSON file to understand the specific validation tests that failed. Look for error messages, unexpected data patterns, or any other anomalies that might indicate the cause of the failure.
- Manual Review: After understanding the error, perform a manual review of the data. This involves examining the actual data files to see if the failure is due to a legitimate issue or a false positive. For example, a file size increase might be due to the inclusion of additional data, which is perfectly acceptable.
- Approve with Notes: If, after the manual review, the validation failure is deemed acceptable (e.g., new data is valid despite causing a size increase), approve the archive and leave a detailed note explaining your decision in the task list. This ensures transparency and helps others understand why the failure was accepted.
- Create an Issue: If the validation failure is significant and indicates a real problem (e.g., incorrect file format, massive data size change), create an issue to resolve it. Provide as much detail as possible in the issue, including the error messages, steps to reproduce the failure, and any relevant context.
By following these steps, you can effectively address validation failures, ensuring that the archived data is accurate and reliable. This meticulous approach helps maintain the integrity of the PUDL project and the quality of its data.
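As a rough sketch of step 3 above, the snippet below pulls the failing tests out of a downloaded run summary. The field names (`validation_tests`, `name`, `success`, `notes`) are assumptions for illustration, not the archiver's actual schema:

```python
import json

def failed_tests(summary: dict) -> list[str]:
    """Return the names of validation tests that did not pass."""
    return [
        t["name"]
        for t in summary.get("validation_tests", [])
        if not t.get("success", True)
    ]

# Example payload shaped like the hypothetical schema above:
raw = """
{"validation_tests": [
    {"name": "file_size_change", "success": false,
     "notes": "archive grew 110% vs previous version"},
    {"name": "file_format", "success": true}
]}
"""
for name in failed_tests(json.loads(raw)):
    print(name)  # -> file_size_change
```

From here, each failing test name tells you which part of the data to inspect manually before deciding between "approve with notes" and "create an issue."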
Other Failures
If a run failed for any other reason besides validation issues – maybe the underlying data changed unexpectedly, or we ran into some code hiccups – we need to create an issue outlining the problem. Then, we'll need to figure out the necessary steps to fix it. This might involve updating our code, adjusting our data sources, or something else entirely. The key is to document the failure and develop a plan to resolve it so we can get the archive back on track!
Addressing Other Failures
When a run fails for reasons other than validation issues, it's crucial to take a systematic approach to identify and resolve the problem. Here’s a step-by-step guide to help you through the process:
- Identify the Failure Type: Determine the specific reason for the failure. This could be due to changes in the underlying data, code errors, network issues, or other unexpected problems. Review the error logs and any available documentation to understand the nature of the failure.
- Create an Issue: Create a detailed issue outlining the problem. Include information such as the error message, the steps leading up to the failure, and any relevant context. The more information you provide, the easier it will be for others to understand and help resolve the issue.
- Document the Problem: In the issue, thoroughly document the problem. Explain what went wrong, why you think it went wrong, and any initial steps you’ve taken to investigate the issue.
- Develop a Resolution Plan: Based on your understanding of the failure, develop a plan to resolve it. This might involve updating code, adjusting data sources, or implementing new error-handling procedures. Consult with other team members if necessary to ensure the plan is comprehensive and effective.
- Implement the Solution: Implement the solution according to the plan. This might involve writing new code, modifying existing code, or making changes to the data sources. Test the solution thoroughly to ensure that it resolves the issue and doesn’t introduce any new problems.
- Test Thoroughly: After implementing the solution, test it thoroughly to ensure that it resolves the issue and doesn’t introduce any new problems. Use a variety of test cases to cover different scenarios and ensure that the system is functioning correctly.
- Update Documentation: Once the issue is resolved, update the documentation to reflect the changes. This will help others understand the solution and prevent similar issues from occurring in the future.
By following these steps, you can effectively address other failures, ensuring the PUDL archiver remains robust and reliable. Documenting each issue and its resolution helps build a knowledge base that can be used to prevent similar problems in the future.
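A wrapper like the one below (all names hypothetical, not part of the archiver) shows one way to capture the context an issue report needs when a job fails for a non-validation reason:

```python
import logging
import traceback

logging.basicConfig(level=logging.INFO)

def run_with_issue_context(dataset: str, job):
    """Run one dataset's archive job; on failure, return the details
    worth pasting into a follow-up issue."""
    try:
        job()
        return None
    except Exception as err:
        return {
            "dataset": dataset,
            "error": repr(err),
            "traceback": traceback.format_exc(),
        }

# Simulated failure, e.g. an upstream data source that moved:
def flaky_job():
    raise ConnectionError("upstream source moved")

report = run_with_issue_context("eia860", flaky_job)
if report:
    logging.error("File an issue for %s: %s",
                  report["dataset"], report["error"])
```

Capturing the error message and traceback at the point of failure saves a round trip later, when whoever picks up the issue tries to reproduce the problem.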
Specifically, we have this error:

```
...
) pudl_archiver.depositors.zenodo.depositor.ZenodoClientError: ZenodoClientError(status=400, message=Empty files are not accepted., errors=None)
```
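Since Zenodo returns a 400 for zero-byte files, one plausible guard (a sketch, not the archiver's actual upload path) is to check file sizes before attempting a deposit:

```python
import tempfile
from pathlib import Path

def partition_uploads(paths):
    """Split candidate upload paths into (non-empty, empty) lists, so
    empty files can be flagged instead of triggering a Zenodo 400."""
    ok, empty = [], []
    for p in paths:
        (empty if p.stat().st_size == 0 else ok).append(p)
    return ok, empty

# Demo with throwaway files:
d = Path(tempfile.mkdtemp())
(d / "data.csv").write_text("id,value\n1,42\n")
(d / "empty.csv").write_text("")
ok, empty = partition_uploads(sorted(d.glob("*.csv")))
print([p.name for p in ok], [p.name for p in empty])
```

An empty file usually means the upstream download or extraction failed silently, so surfacing it before upload points directly at the real problem.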
Unchanged Archives
These archives didn't show any changes during the run. That means the data is likely the same as the last time we archived it.
No changes.
Other Issues
And finally, any other lingering issues that popped up during the archiving process that need some attention:
- [ ] issue