Claude Mandy is the chief evangelist at Symmetry Systems and a former Gartner analyst and CISO.
Not since 2016 has there been so much excitement around data protection. The GDPR was adopted in April 2016, and organizations have certainly focused on data protection since, given the massive increase in financial consequences for failing to protect personal data.
Today, regulators, startups and organizations are refocusing their efforts on data protection and AI. The explosion of AI approaches, including generative AI tools and especially large language models (LLMs) such as ChatGPT, Bard and others, has certainly caught everyone’s attention. These tools have rapidly democratized access to artificial intelligence.
Adoption of generative AI has been so rapid that “asking ChatGPT or Bard” is as much a part of our everyday speech as Googling something. The benefits are huge, including for cybersecurity teams. These tools let individuals and organizations leverage models trained on massive amounts of data, achieve their business goals and get near real-time results that would otherwise require significant time and effort. This technology will revolutionize fields from healthcare to finance, helping organizations make informed decisions and improve outcomes for their customers.
But the use of artificial intelligence in its many forms comes with challenges. The GDPR and similar modern privacy laws have expanded the reach of data protection obligations. These tools democratize access to insights from massive data sets, letting individuals and organizations find knowledge, solve problems and generate content (whether code, art or text) in near real time. Yet the legal requirements to enforce data protection principles such as purpose limitation, data minimization, special treatment of “sensitive data” and limits on automated decision-making are unchanged. Is it any surprise that ChatGPT and other LLMs have already faced the wrath of regulators such as Garante, the Italian data protection authority responsible for enforcing the GDPR in Italy?
CISOs are concerned about the risks of allowing these tools to be used without proper control over what users input into them. At the same time, they are trying to securely meet business demands for new AI tools that develop, refine and monitor models across all of the organizational data they are responsible for.
The challenge for privacy regulators and CISOs is that it is not easy to understand exactly what data an LLM has been trained on or what sensitive data it retains after training, nor how to realistically exercise any control over what users enter as inputs. ChatGPT itself states that it was “trained on a vast amount of textual data from various sources on the Internet, such as books, articles, websites, and other text-based sources.” In response to inquiries about personal information, it quickly reminds users that it does not have access to personal or confidential information and that answers are generated solely from its training data and algorithms. Yet it will just as quickly reveal a celebrity’s date of birth, even though dates of birth are personal information.
Only when an organization has performed thoughtful analysis and preparation of the training dataset used by an LLM is it possible for a human to understand the data on which the model is based, let alone the model itself. This is a challenge for organizations that believe they can simply point AI at their data: they often don’t know what data they have or where it came from, and they struggle to maintain an auditable data pipeline that would let them fully trust that data.
With their existing cloud security tools, it is nearly impossible for organizations to state definitively that a training dataset contains no personal data, or to be warned when sensitive data is accidentally included. Sensitive data can exist in or be infused into training datasets, and the AI may inadvertently reveal it to any user of the model without verifying that they are authorized to access the underlying sensitive data.
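To make the difficulty concrete, consider a minimal sketch of the kind of pre-training screening an organization might attempt. This is a hypothetical Python illustration, not a vendor tool or a complete solution: it runs a handful of regular expressions over candidate training records and quarantines anything containing obvious personal-data patterns. Even this basic pass shows the limits of pattern matching; a celebrity’s date of birth written out in prose, for example, would sail straight through.

import re

# Minimal, hypothetical screening pass over candidate training records.
# The patterns below catch only highly structured personal data (emails,
# phone-like numbers, numeric dates); they are illustrative, not exhaustive.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}

def flag_pii(record: str) -> list[str]:
    """Return the names of any personal-data patterns found in a record."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(record)]

def split_training_data(records: list[str]):
    """Separate clean records from ones quarantined for human review."""
    clean, quarantined = [], []
    for record in records:
        hits = flag_pii(record)
        if hits:
            quarantined.append((record, hits))
        else:
            clean.append(record)
    return clean, quarantined

if __name__ == "__main__":
    sample = [
        "The quarterly report highlighted regional sales trends.",
        "Contact Jane at jane.doe@example.com or +1 (555) 010-9999.",
    ]
    clean, quarantined = split_training_data(sample)
    print(f"{len(clean)} clean record(s), {len(quarantined)} quarantined for review")

Anything beyond this toy example requires dedicated data discovery and classification, which is precisely why “point the AI at our data” is not a strategy.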
To meet these challenges, we must take a multifaceted approach to data security that keeps pace with generative artificial intelligence. First and foremost, we cannot ignore or block the innovations knocking on our doors. We need to accept them and find ways to protect the data they use in line with the data protection principles set out in the GDPR. It is up to us to work out how these principles apply to these new technologies in a way that balances protection against the beneficial use of these models.
Undoubtedly, this means organizations need to become more aware of the data they hold and how it is being used. “I don’t know” just isn’t good enough. Second, organizations must be able to manage their data more effectively. To do all this at scale and speed, we need to invest in new technologies and approaches to data security. Your enterprise DLP is focused on data being shared outside your organization, rather than on controlling and monitoring data where it resides.
Data-centric tools such as data security posture management can be more effective at understanding data, tracking access to it and observing changes to it. Forward-thinking organizations should assess the risks arising from exposure to generative AI tools such as ChatGPT and mitigate them without cutting off access to those tools, so their organizations can gain a competitive edge over other market players.
Ultimately, the future of data security depends on our ability to become more data-centric in our security approaches and thereby adapt and innovate in response to these new challenges. In this way, we can ensure that our personal data remains safe and secure, while unlocking the full potential of data-driven innovation.