DCNM VRF Error: Creating a VRF Without vrf_id on ND 4.1
Hey guys! Today, we're diving into a specific issue encountered while using the dcnm_vrf module with Ansible and Cisco DCNM (Data Center Network Manager). Specifically, the problem arises when trying to create a VRF (Virtual Routing and Forwarding) instance without explicitly specifying a vrf_id on a Cisco Nexus Dashboard (ND) deployment running version 4.1. Let's break down the problem, explore the context, and understand how to tackle it.
The Problem
So, the main issue is that when you attempt to create a VRF using the cisco.dcnm.dcnm_vrf Ansible module without including the vrf_id parameter in your playbook, the process fails with a cryptic error message. This contrasts with the behavior observed on older releases (such as ND 3.2), where the system would automatically assign the next available vrf_id if one wasn't provided in the configuration. This can be a real head-scratcher, especially if you're used to the older behavior or if you're trying to automate VRF creation across different releases.
The error message you'll likely encounter looks something like this:
fatal: [ND]: FAILED! => {"changed": false, "module_stderr": "'NoneType' object is not subscriptable", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error"}
This error isn't super informative at first glance, but it hints at a problem within the dcnm_vrf module's code, specifically in how it handles the absence of a vrf_id during VRF creation. It suggests that the module is expecting a value that isn't there, leading to a NoneType error when it tries to access it.
Root Cause Analysis
The investigation points to a specific function within the dcnm_vrf.py module called diff_merge_create. The code responsible for querying the next available vrf_id seems to be working fine. However, the subsequent logic that handles VRF creation when a vrf_id is not explicitly provided differs from the path taken when a vrf_id is present in the playbook. This discrepancy leads to a 500 error being returned from DCNM, ultimately causing the Ansible task to fail.
In essence, the module's logic for automatically assigning a vrf_id in the absence of one being specified isn't functioning correctly against ND 4.1. This could be due to changes in the DCNM API, modifications in the expected request format, or simply a bug in the module's code that wasn't present in earlier versions.
Reproducing the Issue
To reproduce this issue, you'll need the following:
- An Ansible environment with the cisco.dcnm collection installed (version 3.9.1 in this case).
- A Cisco Nexus Dashboard (ND) environment running version 4.1.1g.
- The following Ansible playbook:

- name: NDFC Playbook Add VRF
  hosts: ND
  gather_facts: no
  tasks:
    - name: Add VRF
      cisco.dcnm.dcnm_vrf:
        fabric: pod-1
        state: merged
        config:
          - vrf_name: test
            vrf_template: Default_VRF_Universal
            vrf_extension_template: Default_VRF_Extension_Universal

Note: Make sure to replace pod-1 with the actual name of your fabric in DCNM.
When you run this playbook, you should observe the aforementioned error, indicating that the VRF creation failed because the vrf_id was not provided.
Workarounds and Solutions
So, what can you do to get around this issue? Here are a few options:
1. Explicitly Provide vrf_id
The simplest workaround is to explicitly include the vrf_id parameter in your Ansible playbook. This ensures that the module takes the code path that does work correctly on ND 4.1.
First, you'll need to determine the next available vrf_id in your DCNM environment. You can do this by either:
- Checking the DCNM GUI.
- Using the DCNM API to query existing VRFs and find the highest vrf_id in use.
Once you have the next available vrf_id, modify your playbook to include it:
- name: NDFC Playbook Add VRF
  hosts: ND
  gather_facts: no
  tasks:
    - name: Add VRF
      cisco.dcnm.dcnm_vrf:
        fabric: pod-1
        state: merged
        config:
          - vrf_name: test
            vrf_id: <your_next_available_vrf_id>
            vrf_template: Default_VRF_Universal
            vrf_extension_template: Default_VRF_Extension_Universal
Replace <your_next_available_vrf_id> with the actual value you obtained.
While this workaround solves the immediate problem, it's not ideal for fully automated deployments, as it requires you to manually determine the next available vrf_id. However, you can automate that step too by using Ansible's uri module to query the DCNM API, find the highest vrf_id in use, and increment it in your playbook, as sketched below.
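Here's a minimal sketch of that approach. The REST path (an NDFC-style top-down VRF endpoint), the nd_api_token variable, and the vrfId field name are assumptions for illustration, so adjust them to match your controller release, its API documentation, and however you handle authentication in your inventory:

- name: Determine the next available vrf_id and create the VRF
  hosts: ND
  gather_facts: no
  vars:
    fabric: pod-1
  tasks:
    - name: Query existing VRFs (endpoint path is an assumption; adjust for your release)
      ansible.builtin.uri:
        url: "https://{{ ansible_host }}/appcenter/cisco/ndfc/api/v1/lan-fabric/rest/top-down/fabrics/{{ fabric }}/vrfs"
        method: GET
        headers:
          Authorization: "Bearer {{ nd_api_token }}"  # hypothetical variable holding your API token
        validate_certs: no
        return_content: yes
      register: vrf_response

    - name: Compute the next vrf_id (49999 is a floor so the first VRF gets 50000)
      ansible.builtin.set_fact:
        next_vrf_id: "{{ (((vrf_response.json | map(attribute='vrfId') | map('int') | list) + [49999]) | max) + 1 }}"

    - name: Add VRF with the computed vrf_id
      cisco.dcnm.dcnm_vrf:
        fabric: "{{ fabric }}"
        state: merged
        config:
          - vrf_name: test
            vrf_id: "{{ next_vrf_id | int }}"
            vrf_template: Default_VRF_Universal
            vrf_extension_template: Default_VRF_Extension_Universal

The idea is simply to read back whatever VRFs already exist, take the highest vrfId, and feed highest + 1 into dcnm_vrf so the module stays on the code path that works.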
2. Modify the dcnm_vrf Module (Advanced)
Disclaimer: This approach involves modifying the Ansible collection code, which is generally not recommended unless you're comfortable with Python and have a good understanding of the module's inner workings. Always back up your original files before making any changes.
If you're feeling adventurous, you can attempt to modify the dcnm_vrf.py module to correctly handle the case where vrf_id is not provided. This would involve:
- Identifying the exact location in the diff_merge_create function where the error occurs.
- Examining the code path taken when vrf_id is present and understanding how it differs from the path taken when it's absent.
- Modifying the code to ensure that the correct API calls are made to DCNM to automatically assign a vrf_id when one is not provided.
This approach requires a solid understanding of the DCNM API and the dcnm_vrf module's code. It's also important to test your changes thoroughly to ensure that they don't introduce any new issues.
3. Contribute to the cisco.dcnm Collection
The best long-term solution is to contribute a fix to the cisco.dcnm collection on Ansible Galaxy. This ensures that the issue is resolved for everyone and that your changes are properly tested and maintained.
To do this, you can:
- Fork the cisco.dcnm collection repository on GitHub.
- Implement the fix in your forked repository.
- Submit a pull request to the main cisco.dcnm repository.
This allows the maintainers of the collection to review your changes and incorporate them into a future release.
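While your pull request is under review, you can also consume the fix directly from your fork by installing the collection from Git with a requirements.yml. The URL and branch below are placeholders for your own fork (assuming it keeps the upstream ansible-dcnm repository name):

# requirements.yml
collections:
  - name: https://github.com/<your-github-user>/ansible-dcnm.git
    type: git
    version: <your-fix-branch>

Then run ansible-galaxy collection install -r requirements.yml --force so your patched copy replaces the version installed from Galaxy.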
Conclusion
So there you have it! The issue of VRF creation failing without a vrf_id on ND 4.1 when using the dcnm_vrf module can be a frustrating one. However, by understanding the root cause and implementing one of the workarounds or solutions described above, you can overcome this hurdle and continue automating your network deployments. Remember to always test your changes thoroughly and consider contributing back to the community to help others who may encounter the same problem. Happy automating!