In the context of MLOps, the benefits of using a multi-tenant system are manifold. Machine learning engineers, data scientists, analysts, modelers, and other practitioners contributing to MLOps processes often need to perform similar activities with equally similar software stacks. It is hugely beneficial for a company to maintain only one instance of the stack or its capabilities: it cuts costs, saves time, and enhances collaboration. In essence, MLOps teams on multi-tenant systems can be significantly more efficient because they aren't wasting time switching between separate stacks and systems.
Growing demand for multi-tenancy
Adoption of multi-tenant systems is growing, and for good reason. These systems help unify compute environments, discouraging scenarios where individual groups set up their own bespoke systems. Fractured compute environments like these are highly duplicative and drive up the total cost of ownership, because each group likely needs a dedicated team to keep its local system operational. They also breed inconsistency. In a large company, you might have some groups running software on version 7 and others on version 8. You may have groups that use certain pieces of technology but not others. The list goes on. These inconsistencies erode any common understanding of what's happening across the system, which in turn exposes the organization to risk.
Ultimately, multi-tenancy is not a feature of a platform: it's a baseline security capability. It's not sufficient to simply plaster on security as an afterthought; it needs to be part of a system's fundamental architecture. One of the greatest benefits for teams that endeavor to build multi-tenant systems is the implicit architectural commitment to security, because isolating tenants from one another forces security into the core design rather than leaving it as an add-on.
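To make that architectural point concrete, here is a minimal, hypothetical sketch in Python. The names (ModelRegistry, TenantScopedRegistry, for_tenant) are invented for illustration, not taken from any real platform; the idea is simply that every read and write path is scoped to a tenant by construction, rather than checked as an afterthought.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelRegistry:
    """Toy registry in which tenant isolation is structural, not bolted on.

    Every record carries a tenant_id, and every read/write path goes through
    a tenant-scoped handle, so cross-tenant access is impossible by
    construction rather than by convention.
    """

    _models: Dict[str, List[dict]] = field(default_factory=dict)

    def for_tenant(self, tenant_id: str) -> "TenantScopedRegistry":
        """The only entry point: callers never touch the raw store."""
        return TenantScopedRegistry(self, tenant_id)

    # Internal, tenant-keyed primitives.
    def _register(self, tenant_id: str, model: dict) -> None:
        self._models.setdefault(tenant_id, []).append({**model, "tenant_id": tenant_id})

    def _list(self, tenant_id: str) -> List[dict]:
        return list(self._models.get(tenant_id, []))


@dataclass
class TenantScopedRegistry:
    """A handle that can only ever operate on one tenant's data."""

    _registry: ModelRegistry
    tenant_id: str

    def register(self, model: dict) -> None:
        self._registry._register(self.tenant_id, model)

    def list_models(self) -> List[dict]:
        return self._registry._list(self.tenant_id)


if __name__ == "__main__":
    registry = ModelRegistry()
    team_a = registry.for_tenant("team-a")
    team_b = registry.for_tenant("team-b")

    team_a.register({"name": "churn-model", "version": 7})
    team_b.register({"name": "churn-model", "version": 8})

    # Each team sees only its own models; there is no API for reaching
    # across tenants, so isolation holds even if callers misbehave.
    assert all(m["tenant_id"] == "team-a" for m in team_a.list_models())
    assert all(m["tenant_id"] == "team-b" for m in team_b.list_models())
    print(team_a.list_models())
    print(team_b.list_models())
```

A real MLOps platform would back this with authentication, per-tenant storage, and quota enforcement, but the design choice is the same: the tenant boundary lives in the type of handle you hand out, not in scattered permission checks.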
Challenges and best practices
For all their benefits, multi-tenant systems don't come without challenges. One of the main hurdles for these systems, regardless of discipline, is scale. Whenever any scaling effort kicks off, patterns emerge that likely weren't apparent before.
As you begin to scale, you encounter more diverse user experiences and expectations. Suddenly, you find yourself in a world where users interact with whatever is being scaled and use the tool in ways you hadn't anticipated. The bigger and more fundamental challenge is that you have to manage far more complexity.
When you're building something multi-tenant, you're likely building a common operating platform that many teams are going to rely on. This is an important consideration: because it represents such a meaningful investment, a multi-tenant system is also likely to become a fundamental part of your business.
To successfully build multi-tenant systems, strong product management is crucial, especially if the system is built by and for machine learning experts. It's important that the people designing and building a domain-specific system have deep fluency in the field, enabling them to work backward from their end users' requirements and capabilities while anticipating future business and technology trends. This need is only underscored in evolving domains like machine learning, as demonstrated by the proliferation and growth of MLOps systems.
Aside from these best practices, make sure to obsessively test each component of the system and the interactions and workflows they enable (we're talking hundreds of times), and bring in users to exercise each element and each emergent piece of functionality. Sometimes you'll find that you need to implement things in a particular way because of the business or the technology. But you really want to be true to your users and how they're using the system to solve a problem. You never want to misinterpret a user's needs. A user may come to you and say, "Hey, I need a faster horse." You may then spend all your time training a faster horse, when what they actually needed was a more reliable and rapid means of conveyance that isn't necessarily powered by hay.
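As a rough illustration of what that kind of repetitive, cross-workflow testing can look like, the sketch below uses Python's standard unittest module to drive the same workflow across every combination of tenant, framework, and dataset. The submit_training_job function and the parameter lists are hypothetical stand-ins, not any real platform API; in practice the lists would be far longer and the assertions far richer.

```python
import unittest
from itertools import product


def submit_training_job(tenant_id: str, framework: str, dataset: str) -> dict:
    """Hypothetical stand-in for a multi-tenant job-submission API."""
    if not tenant_id:
        raise ValueError("tenant_id is required")
    return {
        "tenant_id": tenant_id,
        "framework": framework,
        "dataset": dataset,
        "status": "queued",
    }


class TestCrossTenantWorkflows(unittest.TestCase):
    """Exercise one workflow across many tenant/framework/dataset combinations.

    The value is in the coverage: interaction problems tend to surface only
    when different tenants drive the platform in different ways.
    """

    TENANTS = ["fraud-team", "recsys-team", "analytics-team"]
    FRAMEWORKS = ["sklearn", "xgboost", "pytorch"]
    DATASETS = ["daily", "backfill"]

    def test_every_tenant_can_submit_every_workflow(self):
        for tenant, framework, dataset in product(self.TENANTS, self.FRAMEWORKS, self.DATASETS):
            with self.subTest(tenant=tenant, framework=framework, dataset=dataset):
                job = submit_training_job(tenant, framework, dataset)
                # Results must stay scoped to the submitting tenant.
                self.assertEqual(job["tenant_id"], tenant)
                self.assertEqual(job["status"], "queued")


if __name__ == "__main__":
    unittest.main()
```

Pairing this kind of automated sweep with hands-on sessions where real users walk through their own workflows is how you catch the "faster horse" misreadings before they're baked into the platform.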