Telemetry Logs Update For Artifact Upload Flow
Hey guys! We're diving deep into improving our telemetry logging for the Artifacts Upload flow. This is super important because it helps us understand how you're using the feature, identify potential issues, and ultimately make the experience smoother and more efficient. Think of it as giving us a peek under the hood so we can fine-tune everything for optimal performance. In this article, we'll break down the updates we're making to the logUsage calls, ensuring we capture a coherent story for each upload attempt. We're talking about tracking everything from the initial upload call to the final success or failure, and even those little steps in between. Let's get into the details and see how these changes will help us build a better product for you!
Why Update Telemetry Logs?
Okay, so why are we even bothering with these updates? Well, in the world of software development, telemetry is our secret weapon. It's like having a detective on the case, gathering clues about how our features are being used. By updating our telemetry logs, we're essentially giving our detective better tools to solve mysteries. We want to know exactly what's happening during the artifact upload process: where things are going smoothly, and where they might be hitting a snag. This is crucial for a few key reasons:
- Understanding User Behavior: We can see how users interact with the upload flow, which helps us tailor the experience to your needs.
- Identifying Pain Points: If there's a step where users frequently encounter issues, we'll spot it in the logs and can jump in to fix it.
- Improving Reliability: By tracking successes and failures, we can pinpoint areas for optimization and make the entire process more robust.
- Data-Driven Decisions: Ultimately, these logs provide the data we need to make informed decisions about future development and improvements.
Think of it this way: without good telemetry, we're flying blind. These updates are like installing a high-tech navigation system, guiding us to build the best possible experience for you. And believe me, we're all about that!
Where We're Updating the logUsage Calls
Alright, let's get down to the nitty-gritty. We're focusing our attention on specific locations within the Artifacts Upload flow where we make those all-important logUsage calls. These calls are the breadcrumbs we're leaving behind, allowing us to trace the path of each upload attempt. Here's a breakdown of the key areas we're targeting:
- Upload Called: This is the starting point. We want to know the moment a user initiates the upload process. It's like the opening scene of our story, setting the stage for everything that follows.
- User Entered Form to Specify Params: Next up, we're tracking when a user engages with the form where they input the necessary parameters for the upload. This tells us they're actively moving forward with the process.
- Each Form Field Changed: This is where things get granular. We're logging every change a user makes to the form fields. This level of detail helps us understand how users are configuring their uploads and identify any fields that might be causing confusion or friction.
- Form Completed or Exited w/o Completion: Did the user finish filling out the form and proceed, or did they bail out halfway through? This is crucial information for understanding user intent and identifying potential roadblocks. Maybe the form is too long, too confusing, or asks for information that's not readily available. Knowing this helps us streamline the process.
- Upload to Cloud Started: We're tracking the moment the upload process actually kicks off and data starts transferring to the cloud. This is a critical milestone in the flow.
- Process Steps (Consolidated Logging): This is a bit different. Instead of logging each individual step (like getting a pre-signed URL, uploading to the provider, and adding to Confluent), we're grouping them together. Why? Because logging each step individually would create a huge amount of noise in the logs. Instead, we'll focus on logging the overall outcome: success or failure. If there's an error during one of these steps, we'll capture the error message in our failure log, giving us the details we need without overwhelming the system.
- Upload Final State - Success or Failure: The grand finale! We're logging whether the upload was ultimately successful or if it failed. If it failed, we're including the reason or error message so we can diagnose the problem and prevent it from happening again. This is the most critical piece of information, as it tells us whether the entire process worked as expected.
By focusing on these specific locations, we're ensuring that we capture a comprehensive picture of the upload flow without getting bogged down in unnecessary details. It's all about being efficient and effective with our telemetry data.
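To make those call sites a bit more concrete, here's a minimal sketch of how the calls might line up with the flow above. The logUsage signature, the stand-in cloud/region values, and the "artifactName" form step are assumptions for illustration, not the extension's actual code.

```typescript
// Stand-in for the real telemetry helper; the actual logUsage signature may differ.
function logUsage(eventName: string, properties: Record<string, unknown>): void {
  console.log(eventName, properties);
}

// Properties shared by every event in one upload attempt (values are made up here).
const base = { action: "upload", cloud: "AWS", region: "us-east-1" };

// Upload called: the user kicked off the flow.
logUsage("Flink Artifact Action", { ...base, step: "started" });

// A form field changed: step carries the form step the user touched.
logUsage("Flink Artifact Action", { ...base, step: "artifactName" });

// Form exited without completion.
logUsage("Flink Artifact Action", {
  ...base,
  step: "cancelled",
  message: "user closed the form before submitting",
});

// Upload final state: success here, or "failed" plus an error message.
logUsage("Flink Artifact Action", { ...base, step: "succeeded", artifactId: "artifact-123" });
```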
Updated Telemetry Schema: What We're Tracking
Now, let's talk about the blueprint for our telemetry data: the schema. This is the structure we're using to organize the information we collect, ensuring that it's consistent and easy to analyze. We've updated the schema to capture the specific details we need to understand the Artifacts Upload flow. Here's the breakdown:
We're using a consistent event name: "Flink Artifact Action". This helps us easily identify these specific telemetry events in our data.
Each event will include the following properties:
- action: "upload". This clearly identifies the event as related to the upload action.
- step: "started" | $formStep | "cancelled" | "failed" | "succeeded". This is the crucial step identifier. It tells us where the user is in the upload flow. Possible values include:
  - "started": The upload process has just begun.
  - $formStep: Represents the specific form step the user is on (e.g., "field1", "field2"). This allows us to track progress within the form itself.
  - "cancelled": The user exited the form or cancelled the upload before completion.
  - "failed": The upload failed at some point.
  - "succeeded": The upload was completed successfully.
- message?: An optional field that we'll use to store error messages or reasons for cancellation when the upload fails or is cancelled. This gives us valuable context for troubleshooting.
- cloud: The cloud provider being used (e.g., AWS, Azure, GCP).
- region: The specific region within the cloud provider.
- artifactId: The ID of the artifact being uploaded (when available). This helps us track specific artifacts and their upload history.
By using this schema, we're ensuring that we capture all the key information we need in a structured and consistent way. This makes it much easier to analyze the data and draw meaningful conclusions. We're talking about having a clear and organized dataset, rather than a jumbled mess of information. Think of it as having a well-organized filing system, rather than a pile of papers on your desk. Trust me, it makes a huge difference!
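If it helps to see the schema as a type, here's a rough sketch of what it could look like in TypeScript. The type names are invented for illustration; only the property names and values come from the schema above.

```typescript
// Sketch of the event schema as a TypeScript type; type names are illustrative.
type ArtifactUploadStep =
  | "started"
  | "cancelled"
  | "failed"
  | "succeeded"
  | (string & {}); // any $formStep value, e.g. "field1" or "field2"

interface FlinkArtifactActionEvent {
  action: "upload";          // always "upload" for this flow
  step: ArtifactUploadStep;  // where the user is in the upload flow
  message?: string;          // error message or early-exit reason when not successful
  cloud: string;             // e.g. "AWS", "Azure", "GCP"
  region: string;            // region within the cloud provider
  artifactId?: string;       // included once the artifact ID is known
}

// Example of a failure event conforming to the schema:
const failedUpload: FlinkArtifactActionEvent = {
  action: "upload",
  step: "failed",
  message: "request for a pre-signed URL returned 403",
  cloud: "AWS",
  region: "us-west-2",
};
```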
Focus on User Events for a Coherent Story
One of the main goals of these updates is to tell a coherent story for each upload attempt. We want to be able to trace the user's journey from start to finish, understanding exactly what happened along the way. To achieve this, we're focusing on logging user events. What does that mean? It means we're primarily interested in actions that the user takes, such as initiating an upload, filling out a form, or cancelling the process. These events provide the narrative thread that ties the entire upload flow together. By focusing on these key moments, we can build a clear and concise picture of the user's experience.
Each logUsage call we make will add a "Flink Artifact Action" User Event in Segment. Segment is our telemetry platform, and these user events will be the building blocks of our analysis. By having a consistent stream of user events, we can:
- Track Conversion Rates: How many users who start the upload process actually complete it successfully?
- Identify Drop-off Points: Where are users most likely to abandon the upload process?
- Analyze Error Patterns: What types of errors are users encountering, and how frequently?
- Measure Feature Adoption: How many users are actively using the Artifacts Upload feature?
By understanding these key metrics, we can make data-driven decisions about how to improve the user experience. It's all about using the power of telemetry to guide our development efforts.
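As a toy example of the kind of arithmetic this enables, here's a small sketch that computes a conversion rate from a batch of events. The event shape mirrors the step property above; pulling events into an array like this is purely hypothetical and not how our Segment analysis actually runs.

```typescript
// Minimal event shape for this sketch: only the step property matters here.
interface UploadEvent {
  step: string; // "started", "succeeded", "failed", "cancelled", or a form step
}

// Conversion rate = attempts that succeeded / attempts that started.
function conversionRate(events: UploadEvent[]): number {
  const started = events.filter((e) => e.step === "started").length;
  const succeeded = events.filter((e) => e.step === "succeeded").length;
  return started === 0 ? 0 : succeeded / started;
}

// Three attempts started, two succeeded, one was cancelled part-way through.
const sample: UploadEvent[] = [
  { step: "started" }, { step: "succeeded" },
  { step: "started" }, { step: "cancelled" },
  { step: "started" }, { step: "succeeded" },
];
console.log(conversionRate(sample).toFixed(2)); // "0.67"
```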
What's Not Being Logged (And Why)
It's just as important to understand what we're not logging as it is to understand what we are. As mentioned earlier, we're not logging each individual step within the "Process Steps" category (get pre-signed URL, upload to provider, add to Confluent). This is a deliberate decision, and here's why:
- Reducing Log Noise: Logging every single step would create a massive amount of data, making it difficult to analyze the truly important events. We want to focus on the big picture, not get lost in the weeds.
- Maintaining Performance: Excessive logging can impact the performance of the application. We need to strike a balance between capturing enough information and keeping things running smoothly.
- Focusing on Outcomes: Ultimately, we're most interested in the final outcome of these process steps: did they succeed or fail? If they failed, we'll capture the error message, giving us the information we need to troubleshoot without logging every intermediate step.
Think of it like watching a movie. You don't need to see every single frame to understand the story. You just need the key scenes. Similarly, we're focusing on logging the key user events and outcomes, giving us the essential information without overwhelming the system.
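For the curious, here's a rough sketch of what that consolidation could look like in code: the intermediate steps run inside a single try/catch, and only the final outcome (plus the error message, if any) becomes a telemetry event. The step functions and the logUsage stub are placeholders, not our actual internals.

```typescript
// Placeholder implementations standing in for the real process steps.
async function getPresignedUrl(): Promise<string> { return "https://example.com/upload"; }
async function uploadToProvider(url: string): Promise<void> { /* PUT the artifact to url */ }
async function addToConfluent(): Promise<string> { return "artifact-123"; }

// Stand-in for the real telemetry helper.
function logUsage(eventName: string, properties: Record<string, unknown>): void {
  console.log(eventName, properties);
}

async function runUploadSteps(): Promise<void> {
  const base = { action: "upload", cloud: "AWS", region: "us-east-1" };
  try {
    // No per-step logUsage calls in here; that per-step detail is the noise we're avoiding.
    const url = await getPresignedUrl();
    await uploadToProvider(url);
    const artifactId = await addToConfluent();
    logUsage("Flink Artifact Action", { ...base, step: "succeeded", artifactId });
  } catch (err) {
    // A single failure event carries the error message from whichever step threw.
    const message = err instanceof Error ? err.message : String(err);
    logUsage("Flink Artifact Action", { ...base, step: "failed", message });
  }
}

runUploadSteps();
```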
In Conclusion: A Clearer View of Artifact Uploads
So, there you have it! We've walked through the updates we're making to the logUsage calls in the Artifacts Upload flow. By focusing on key user events, using an updated schema, and streamlining our logging process, we're building a much clearer picture of how users are interacting with this feature. This will allow us to identify areas for improvement, fix issues more quickly, and ultimately create a better experience for everyone. We're excited about these changes and the insights they will provide. Thanks for taking the time to learn more about what we're up to. Stay tuned for more updates as we continue to refine and improve our telemetry practices!