Ryan Grant of Queen’s University has been awarded $2M from NSERC and Mitacs for research, in partnership with Rockport Networks, aimed at significantly increasing the speed at which computers share information.
The initiative is focused on using multipath networks to optimize large-scale open infrastructure for artificial intelligence (AI) and high-performance computing (HPC).
AI and other forms of advanced computing are transforming every industry, and the volume of data they generate is putting extreme pressure on the infrastructure that must move that data efficiently. According to Grant, even in a very fast traditional network, data movement can still become congested.
“Imagine if the highway from Kingston to Montreal only had a single lane,” he explains. “Presently, we have this high-speed route, but there’s still only one way to get from Point A to Point B. A network that runs like this may have high-bandwidth (payload) traffic, like big trucks, and low-latency traffic, like sports cars, but they are all moving in the same lane. There’s no way for the sports cars to pass the trucks.”
Grant’s work in extreme-scale computing made him a natural fit to help improve traffic flow on Rockport’s new open infrastructure system. At its foundation is the multipath, high-speed network fabric developed by Rockport, which is akin to multiple, parallel multilane highways with room for the fast-moving sports-car traffic to get around the trucks. “We call this latency-sensitive traffic,” he says. “We make more routes that don’t overlap, so that the low-latency sports cars can take a different path than the high-latency trucks.”
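To make the highway analogy concrete, here is a minimal sketch of the idea of non-overlapping routes: it searches a small, made-up network for paths that share no links, then sends latency-sensitive traffic down the shortest one and bulk traffic down another, so the sports cars never queue behind the trucks. The topology and the names `LINKS`, `disjoint_routes` and `assign` are hypothetical. The greedy search is a simple illustration (a max-flow method would guarantee the largest possible set of disjoint routes), and it is not Rockport’s routing method or the algorithms Grant’s team is developing.

```python
from collections import deque

# Hypothetical toy fabric: each node links directly to several neighbours.
LINKS = {
    "A": ["B", "C", "D"],
    "B": ["A", "E"],
    "C": ["A", "E"],
    "D": ["A", "F"],
    "E": ["B", "C", "F"],
    "F": ["D", "E"],
}


def shortest_path(links, src, dst, used):
    """Breadth-first search that skips links already claimed by another route."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in links[node]:
            edge = frozenset((node, nxt))
            if nxt not in prev and edge not in used:
                prev[nxt] = node
                queue.append(nxt)
    if dst not in prev:
        return None  # no route left that avoids the claimed links
    path = [dst]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]


def disjoint_routes(links, src, dst, k):
    """Greedily collect up to k routes that share no links (heuristic, not guaranteed maximal)."""
    used, routes = set(), []
    for _ in range(k):
        path = shortest_path(links, src, dst, used)
        if path is None:
            break
        routes.append(path)
        # Claim every link on this route so later routes cannot overlap it.
        used.update(frozenset(hop) for hop in zip(path, path[1:]))
    return routes


def assign(routes, traffic_class):
    """Latency-sensitive flows get the shortest route; bulk flows get the next one if it exists."""
    if not routes:
        raise ValueError("no route available")
    ordered = sorted(routes, key=len)
    if traffic_class == "latency" or len(ordered) == 1:
        return ordered[0]
    return ordered[1]
```

For example, `disjoint_routes(LINKS, "A", "F", 3)` finds two link-disjoint routes, `A-D-F` and `A-B-E-F`. `assign` then puts latency-sensitive flows on the first and bulk flows on the second, so the two classes of traffic never compete for the same link.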
Working from Rockport’s multipath design simplifies the process of re-envisioning data flows for AI and makes accelerated computer systems more flexible and easier to design. “Instead of packing Graphics Processing Units (GPUs) and other specialized accelerators into traditional servers, you can disaggregate them into open infrastructure systems where they can be shared and used more efficiently,” says Matthew Williams, Chief Technology Officer at Rockport Networks. “The work we’re doing with Dr. Grant and his team will help us calibrate the per-workload optimizations that will make traffic flows highly responsive for complex AI, machine learning and deep learning applications.”
“Many nodes in the Rockport system connect to multiple other nodes,” Grant explains. “You can use a system, called a chassis, to wire a server to operate as though it has 16 or 32 GPUs plugged into it, something that would be physically impossible [in a conventional server].”
In aggregate, the open infrastructure model is not only more flexible and efficient, but also more sustainable. “We now generate more data in one day than we did in the entire 20th century,” Williams says. “Data centres are effectively wasting hundreds of millions of litres of water a day and using the equivalent power of a large city. We can either keep building bigger, fatter server farms and data centres that use more water for cooling and more power to operate, or we can find a way to stop the sprawl at the source.”
At Queen’s, Grant is focusing on creating algorithms that optimize this data movement for new computing system architectures. “How do you use all these multipaths? What’s the right direction to send traffic in?” he says. “We’re going to build new capabilities into the software that lets an app talk to the network, the software between the app and the network hardware. It’s that software that will let us use these new multipath networks more efficiently than our current ones and is already in use with major AI software and the vast majority of scientific software in use on the world’s largest computing systems.”
This work continues to put Grant — and Queen’s Engineering — on the map as a global leader in this area. “We’re on track to be one of the largest system software laboratories in the world, the largest in this region in Canada for sure,” he says. “There’s a lot of industry support, and we’re working on cutting-edge stuff. We’re attracting interest from industry, academics, and from graduate students who want to join us to get training in an area with tremendous academic and career potential.”
“It’s not very often that you can work on something that’s really going to change things,” adds Williams. “We’re thrilled to be part of this research collaboration because of the impact it will have on data centre infrastructure sustainability and the opportunity to put Canadian innovation at the forefront of those discoveries.”