Researchers from the University of California, Berkeley have found that OpenAI's ChatGPT model has memorized a large number of copyrighted works. This can introduce bias into analyses conducted with OpenAI models.

Transparency and Unseen Biases

The researchers’ primary interest is in transparency and the potential for unseen biases when those relying on OpenAI models remain in the dark about which sources were included in or excluded from the training data. They have reported their findings on the arXiv preprint server.

Built-In Bias

Science fiction and fantasy books dominate the list of memorized books, which builds a bias into the kinds of responses ChatGPT may provide. The accuracy of such models depends strongly on how often a model has seen a piece of information in its training data, calling into question their ability to generalize.

Open Models

The researchers said their findings make the case for using open models that disclose their training data. Knowing which books a model has been trained on is critical for assessing sources of bias.

Legal Challenges

Major legal challenges are likely in the near future. What are the limits of “fair use” when copying text? Who owns the copyright on text generated in full or in part by ChatGPT? Who prevails when multiple parties seek copyright protection for similar or identical outputs? And perhaps a more interesting question: Is machine-generated text copyrightable at all?

The researchers’ findings raise questions of propriety and copyright protection. Their work shows that OpenAI models know about books in proportion to the books' popularity on the web. While ChatGPT was found to be quite knowledgeable about works in the public domain, it knew little about lesser-known works. The researchers suggest that language models should be transparent and free from bias.
