Introduction
The last decade or so has seen a paradigm shift in the way that network security is achieved in the modern organisation. This shift can be summarised as a move away from assuming trust for machines within a private network, towards a model where all participants on the network continually prove that they have authorisation for access. This so-called zero-trust model leads to such significant security benefits that it is now recommended by the NSA and will be required for all US federal agencies by 2024 [1, 2]. In this blog post, I’ll describe some thoughts on how this new security model can be leveraged for federated learning and analytics.
The VPN model
The traditional security model for many organisations depended on the concept of a private network. Originally, the private network was a physical network, and the assumption was that all machines on the network could be trusted to communicate with one another. Anything outside of the network could be compromised and so most of the security team’s effort would be spent ensuring that no-one outside of the network could get in.
As time has progressed, the private network concept has evolved to allow a virtual form. This Virtual Private Network (VPN) allows machines to be physically spread across the world, but for networking purposes they all appear to be on the same subnet. This provides an opportunity for federated data science, because it allows the participants in a federated analysis to restrict who can participate.
The security concern with the VPN approach is that if any individual machine is compromised, the entire network is put at risk. Other machines in the VPN trust that machine and apply fewer protections against it as a result. This has proven catastrophic in situations where attackers identify one machine as a weak point and then use it to infiltrate the rest of the network.
Using the VPN approach also has other operational issues in the context of federated analytics:
- If a VPN is to be used to connect machines from different organisations, then the organisations need to all connect a machine to a VPN managed by someone else. This can cause governance problems since many organisations require machines to reside within their own network.
- If the same machine is to be used in multiple federated analyses, the machine would have to change which VPN it is connected to, which is difficult to achieve.
Zero-Trust
In the zero-trust model, every request made to a service is checked for authorisation. The fact of a machine being on the same network is considered irrelevant for deciding whether to handle requests. This approach has proven to be much more secure because it assumes that breaches of a network’s security are “inevitable or have already occurred” [2].
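To make the contrast with the private-network model concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular product implements zero-trust: the shared secret, the permission table, the subnet and all function names are hypothetical, and an HMAC signature stands in for whatever credential a real system would use.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical values, for illustration only.
SECRET_KEY = b"example-secret"
PERMISSIONS = {"alice": {"read_stats"}}
TRUSTED_SUBNET = ipaddress.ip_network("10.0.0.0/24")


def sign(user: str, action: str) -> str:
    """Produce the credential a legitimate caller attaches to a request."""
    message = f"{user}:{action}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def perimeter_allows(source_ip: str) -> bool:
    """Private-network model: trust anything that is inside the subnet."""
    return ipaddress.ip_address(source_ip) in TRUSTED_SUBNET


def zero_trust_allows(user: str, action: str, signature: str) -> bool:
    """Zero-trust model: check credential and permission on every request.

    The source address is deliberately not an input to this decision.
    """
    expected = sign(user, action)
    if not hmac.compare_digest(expected, signature):
        return False  # the caller could not prove who they are
    return action in PERMISSIONS.get(user, set())
```

In this sketch a compromised machine at 10.0.0.7 passes `perimeter_allows` with no further questions, whereas `zero_trust_allows` rejects it unless it also holds a valid credential for a permitted action.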
Applying this approach to federated analytics requires that each processor of data (a Pod in Bitfount terminology) authenticates and authorises the person or service behind every request it receives. Federated authentication is most commonly implemented with standards such as OIDC and SAML. In the Bitfount architecture these have been adapted so that the authentication checks are done over a message service instead of requiring direct HTTP calls (you can read more about our choice to use a messaging architecture here).
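As a rough illustration of checking every request when requests arrive as messages rather than HTTP calls, the sketch below has a processor drain an inbox and validate OIDC-style claims on each message before accepting its task. This is not Bitfount's implementation: the `Message` type, the issuer and audience values and the function names are all hypothetical. It also assumes the token's signature has already been verified against the identity provider's keys, which a real deployment must do, and it treats `aud` as a single string for simplicity.

```python
import queue
import time
from dataclasses import dataclass
from typing import Optional

# Hypothetical identity provider and processor identifiers.
EXPECTED_ISSUER = "https://idp.example.com"
EXPECTED_AUDIENCE = "pod-123"


@dataclass
class Message:
    claims: dict  # claims from a token whose signature is assumed verified
    task: str     # the analysis the sender is asking for


def claims_are_valid(claims: dict, now: float) -> bool:
    """Check issuer, audience and expiry, as an OIDC relying party would."""
    return (
        claims.get("iss") == EXPECTED_ISSUER
        and claims.get("aud") == EXPECTED_AUDIENCE
        and claims.get("exp", 0) > now
    )


def process_inbox(inbox: "queue.Queue[Message]", now: Optional[float] = None) -> list:
    """Accept only tasks whose message carries valid claims."""
    now = time.time() if now is None else now
    accepted = []
    while not inbox.empty():
        message = inbox.get()
        if claims_are_valid(message.claims, now):
            accepted.append(message.task)
        # Invalid messages are dropped; no earlier message grants later trust.
    return accepted
```

The point of the design is that validity is established per message: a sender that was authorised a moment ago gains nothing once its token expires or is addressed to a different audience.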
Zero-Trust + Subnet
Using a zero-trust architecture doesn’t mean, of course, that you should give up the benefits that a secure network can bring. The ideal situation is to keep your data behind a secure network firewall, with the machines that process it placed in a limited subnet that can only reach the appropriate data. Then, add zero-trust as an additional layer of protection.
This approach can be achieved with Bitfount by configuring the firewall for the subnet that hosts the processor of data (Pod) so that it allows only outgoing connections, and only to the relevant Bitfount URLs. The subnet’s access within the internal network can also be restricted to the data that is relevant to potential collaborations. Federated authentication methods like OIDC and SAML then provide the final layer of zero-trust security for third parties to run analyses.
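The firewall policy described above can be expressed as a small decision function. This is only a sketch of the rule, not firewall configuration: the host names are placeholders rather than real Bitfount endpoints, the function names are hypothetical, and the HTTPS-only requirement is an added assumption.

```python
from urllib.parse import urlsplit

# Placeholder host names; substitute the endpoints your deployment needs.
ALLOWED_HOSTS = {"hub.example.com", "messaging.example.com"}


def egress_allowed(url: str) -> bool:
    """Allow outgoing connections only to allowlisted hosts over HTTPS."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


def ingress_allowed() -> bool:
    """The subnet accepts no incoming connections at all."""
    return False
```

Because the processor only ever dials out, there is no listening port for an attacker to probe, and the per-request authorisation checks still apply to whatever arrives over the permitted outgoing connections.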
Conclusion
Zero-trust models bring significant security benefits to organisations, so it’s important that federated data science be made to work within this new paradigm. As we’ve explained above, zero-trust and federated data science don’t have to conflict. Federated approaches can build on the same standards, such as OIDC and SAML, to create a world where collaboration is both secure and simple.