The ibnetdiscover utility performs InfiniBand subnet discovery and outputs a human-readable topology file. It displays GUIDs, node types, and port numbers, as well as port LIDs and node descriptions. All nodes and links are displayed, providing a full topology. The utility can also list the currently connected nodes. Output is printed to standard output unless a topology file is specified.
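To make the description of the output concrete, here is a minimal parsing sketch. It runs against a small, hand-written excerpt modeled on the topology file layout: a header line for each `Switch` or `Ca` node, followed by one `[port]` line per connected port. The GUIDs, node names, and the exact spacing in `SAMPLE` are invented and simplified for illustration. Real output carries extra fields (vendor and device IDs, LIDs, link width and speed) that this parser ignores, and formatting may differ between infiniband-diags versions.

```python
import re

# Hypothetical, simplified excerpt in the style of an ibnetdiscover topology file.
# GUIDs and descriptions are invented for illustration.
SAMPLE = '''
Switch  36 "S-0002c90300a1b2c3"    # "leaf01" enhanced port 0 lid 3 lmc 0
[1]     "H-0002c9030012ab34"[1](2c9030012ab35)    # "node01 HCA-1" lid 5 4xEDR
[2]     "H-0002c9030012cd56"[1](2c9030012cd57)    # "node02 HCA-1" lid 6 4xEDR

Ca      2 "H-0002c9030012ab34"    # "node01 HCA-1"
[1](2c9030012ab35)     "S-0002c90300a1b2c3"[1]    # lid 5 lmc 0 "leaf01" lid 3 4xEDR

Ca      2 "H-0002c9030012cd56"    # "node02 HCA-1"
[1](2c9030012cd57)     "S-0002c90300a1b2c3"[2]    # lid 6 lmc 0 "leaf01" lid 3 4xEDR
'''

# Node header: node type, port count, quoted node ID, then the quoted description after '#'.
NODE_RE = re.compile(r'^(Switch|Ca|Rt)\s+(\d+)\s+"([^"]+)"\s+#\s+"([^"]*)"')
# Link line: local port in brackets (a CA port may be followed by its port GUID),
# then the quoted remote node ID and the remote port in brackets.
LINK_RE = re.compile(r'^\[(\d+)\]\S*\s+"([^"]+)"\[(\d+)\]')


def parse_topology(text):
    """Return (nodes, links) parsed from topology text.

    nodes: dict mapping node ID -> {"type", "ports", "desc"}
    links: list of (local_id, local_port, remote_id, remote_port); each physical
           link appears twice, once from each end.
    """
    nodes, links = {}, []
    current = None  # node whose port lines are currently being read
    for line in text.splitlines():
        line = line.strip()
        m = NODE_RE.match(line)
        if m:
            ntype, ports, node_id, desc = m.groups()
            nodes[node_id] = {"type": ntype, "ports": int(ports), "desc": desc}
            current = node_id
            continue
        m = LINK_RE.match(line)
        if m and current is not None:
            lport, remote, rport = m.groups()
            links.append((current, int(lport), remote, int(rport)))
    return nodes, links


if __name__ == "__main__":
    nodes, links = parse_topology(SAMPLE)
    for node_id, info in nodes.items():
        print(info["type"], node_id, info["desc"])
```

Listing only the node headers gives the same kind of summary as a list of connected nodes, while the link tuples preserve the port-to-port wiring needed for a full topology view.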
InfiniBand is a high-performance, low-latency interconnect technology used in AI and HPC data centers, particularly for east-west traffic between compute nodes in large-scale fabrics. Ensuring seamless data flow requires tools to troubleshoot and monitor the network, including the ability to trace and discover network paths between nodes. The question asks for the specific tool used to trace and discover paths in an InfiniBand fabric, which is a key task in InfiniBand troubleshooting.
According to NVIDIA’s official InfiniBand documentation, the ibnetdiscover tool discovers and maps the topology of an InfiniBand fabric, including the links that connect nodes. It walks the fabric by sending subnet management queries (directed-route SMPs) to each reachable device, so it builds its view independently of the subnet manager. From these queries it generates a topology map detailing the connections between switches, Host Channel Adapters (HCAs), and other devices. This makes it useful for verifying connectivity, checking which physical paths exist between nodes, and troubleshooting problems such as miscabled, missing, or failed links in large-scale InfiniBand fabrics.
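Because the topology map is a graph of nodes and links, finding a path between two endpoints reduces to a graph search. The sketch below runs a breadth-first search over a hand-written, hypothetical link list (the names `node01`, `leaf01`, `spine01`, and so on are invented) and returns a minimum-hop path with the egress port used at each hop. It illustrates a path that physically exists in a discovered topology. It does not necessarily match the route the subnet manager has programmed into the switches' forwarding tables, which can differ when several equal-cost paths exist.

```python
from collections import deque

# Hypothetical two-level fabric: two leaf switches joined through one spine switch.
# Each tuple is (node_a, port_on_a, node_b, port_on_b) for one physical link.
LINKS = [
    ("node01", 1, "leaf01", 1),
    ("node02", 1, "leaf01", 2),
    ("node03", 1, "leaf02", 1),
    ("leaf01", 19, "spine01", 1),
    ("leaf02", 19, "spine01", 2),
]


def build_adjacency(links):
    """Map each node to a list of (neighbor, local_port) pairs; links are bidirectional."""
    adj = {}
    for a, pa, b, pb in links:
        adj.setdefault(a, []).append((b, pa))
        adj.setdefault(b, []).append((a, pb))
    return adj


def shortest_hop_path(adj, src, dst):
    """Breadth-first search for a minimum-hop path from src to dst.

    Returns a list of (node, egress_port) hops ending with (dst, None),
    or None if dst is unreachable. This is a physical path in the graph,
    not the subnet manager's programmed route.
    """
    prev = {src: None}  # node -> (previous node, egress port on previous node)
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for neighbor, port in adj.get(node, []):
            if neighbor not in prev:
                prev[neighbor] = (node, port)
                queue.append(neighbor)
    if dst not in prev:
        return None
    # Walk back from dst to src, recording the egress port used at each hop.
    path = [(dst, None)]
    node = dst
    while prev[node] is not None:
        parent, port = prev[node]
        path.append((parent, port))
        node = parent
    return list(reversed(path))


if __name__ == "__main__":
    adj = build_adjacency(LINKS)
    print(shortest_hop_path(adj, "node01", "node03"))
```

For `node01` to `node03`, the search climbs from the first leaf switch through the spine and down to the second leaf, which is the hop sequence an administrator would check when tracing connectivity between compute nodes on different leaves.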
Exact Extract from NVIDIA Documentation:
“The ibnetdiscover tool is used to discover the InfiniBand fabric topology and generate a map of the network. It queries the subnet manager to retrieve information about all nodes, switches, and links in the fabric, providing a detailed view of the paths between nodes. This tool is critical for troubleshooting connectivity issues and ensuring proper routing in InfiniBand networks.”
— NVIDIA InfiniBand Networking Guide
This extract confirms that ibnetdiscover is the correct tool for discovering network paths in an InfiniBand east-west fabric. It provides a comprehensive view of the fabric’s topology, enabling administrators to trace paths between compute nodes and ensure seamless data flow.