
Stephanie Otorepec
Director, Health and Government

Published:  

This year, we have seen Australians express increasing concern about the use of their personal information to train generative artificial intelligence (AI) products. The message from the public is clear: organisations need to protect privacy.

Today the OAIC took an important step in articulating how the Australian Privacy Act 1988 applies to the development of generative AI models by releasing new guidance to help organisations protect privacy. The guidance sets the boundaries of what is and is not an appropriate use of personal information and highlights an approach that is respectful of privacy rights. It also clarifies how the Privacy Act applies to several practices involved in developing and fine-tuning generative AI models.

Although the guidance primarily targets generative AI model development, a number of the principles are relevant to developers using personal information to train other forms of AI.

Privacy and GenAI – why and when developers should care

We are publishing this guidance at a time when there is a lot of work around AI being done across government. The Department of Industry, Science and Resources (DISR) recently released the Voluntary AI Safety Standard and concluded consultation on proposed mandatory guardrails for high-risk AI. However, developers still need to comply with their existing privacy obligations. This is acknowledged in DISR’s work, which recognises existing legislation such as the Privacy Act will not be replaced by AI-specific standards and guardrails.

There is also a reason beyond legal compliance to consider privacy. Providing transparency and choice by following privacy best practice will help to build trust and avoid fuelling community concern over the use of personal information to train AI models. This guidance walks developers through some of the key privacy considerations when training a model.

We are conscious that not all generative AI models will be trained using personal information. Where a model's development and use involve no personal information, this guidance will not apply. But if you are a developer, we have 2 messages:

  • You should carefully consider whether your AI model will involve the collection, storage, use or disclosure of personal information, either by design or through an overly broad collection of data for training. Do this early in the process so you can mitigate any privacy risks.
  • Personal information is a broad category, and the risk of data re-identification needs to be considered.

Placing our guidance in the global framework

In publishing this guidance, we join a number of data protection authorities who have provided comment on the application of their respective data protection frameworks to either AI generally or generative AI specifically.

We commenced this task with an appreciation that generative AI models are often developed for use across borders, meaning organisations developing these models need to engage with differing frameworks. We have considered guidance provided overseas, including the UK Information Commissioner’s Office’s consultation series on generative AI, the French Commission nationale de l'informatique et des libertés’ ‘AI how to sheets’ and the joint Canadian privacy regulators’ principles for responsible development and use of generative AI.

To reduce the burden on businesses that operate globally, where possible we have sought to align our work with international guidance on privacy obligations in the context of AI, for example on the kinds of information to provide individuals to ensure they are adequately informed about an entity's data practices around generative AI, and on approaches to accuracy.

However, ultimately our guidance interprets the Australian Privacy Act as it currently stands, and there are a few features of our law that are important to note:

  • The Privacy Act does not have a legitimate interests basis for data processing or a business improvement exception.
  • Sensitive information can only be collected in very limited circumstances, and generally consent is required.

These features of Australian law have implications for when personal information, especially sensitive information, can be used to train generative AI models.

Privacy compliance as technology develops

This guidance is drafted based on the current state of technology and practices in the market. Some technological aspects that are highly relevant in the context of privacy include:

  • Model unlearning does not yet appear to be a robust solution, which elevates privacy risks by making harms more difficult to remediate.
  • De-identified information is increasingly able to be re-identified, creating privacy risks for an organisation where information is taken outside its control.

We appreciate that as technology changes, the risks and mitigations available may also evolve. For this reason, each section of the guidance includes a high-level statement of the law in addition to detailed tips and examples. Developers should continue to apply these high-level principles as technology and market practices develop.

Developers wishing to commercialise AI systems in Australia should also consider our Guidance on privacy and the use of commercially available AI products so they are aware of the privacy considerations for potential customers.

Our ultimate goal is to make privacy compliance easier when it comes to AI. Privacy compliance is good for consumers, who feel more confident participating in the digital economy, but also for organisations, which can innovate knowing guardrails are in place that will help to earn the trust of the community.