Other techniques involve using synthetic data sets. For example, Runway, a startup that makes generative models for video production, has trained a version of the popular image-making model Stable Diffusion on synthetic data such as AI-generated images of people who vary in ethnicity, gender, profession, and age. The company reports that models trained on this data set generate more images of people with darker skin and more images of women. Request an image of a businessperson, and outputs now include women in headscarves; images of doctors will depict people who are diverse in skin color and gender; and so on.
Critics dismiss these solutions as Band-Aids on broken base models, hiding rather than fixing the problem. But Geoff Schaefer, a colleague of Smith’s at Booz Allen Hamilton who is head of responsible AI at the firm, argues that such algorithmic biases can expose societal biases in a way that’s useful in the long run.
As an example, he notes that even when explicit information about race is removed from a data set, racial bias can still skew data-driven decision-making because race can be inferred from people’s addresses—revealing patterns of segregation and housing discrimination. “We got a bunch of data together in one place, and that correlation became really clear,” he says.
Schaefer thinks something similar could happen with this generation of AI: “These biases across society are going to pop out.” And that will lead to more targeted policymaking, he says.
But many would balk at such optimism. Bringing a problem out into the open doesn’t guarantee that it will get fixed. Policymakers are still trying to address social biases that were exposed years ago in housing, hiring, loans, policing, and more. In the meantime, individuals live with the consequences.
Prediction: Bias will continue to be an inherent feature of most generative AI models. But workarounds and rising awareness could help policymakers address the most obvious examples.
How will AI change the way we apply copyright?
Outraged that tech companies should profit from their work without consent, artists and writers (and coders) have launched class action lawsuits against OpenAI, Microsoft, and others, claiming copyright infringement. Getty is suing Stability AI, the firm behind the image maker Stable Diffusion.
These cases are a big deal. Celebrity claimants such as Sarah Silverman and George R.R. Martin have drawn media attention. And the cases are set to rewrite the rules around what does and does not count as fair use of another’s work, at least in the US.
But don’t hold your breath. It will be years before the courts make their final decisions, says Katie Gardner, a partner specializing in intellectual-property licensing at the law firm Gunderson Dettmer, which represents more than 280 AI companies. By that point, she says, “the technology will be so entrenched in the economy that it’s not going to be undone.”
In the meantime, the tech industry is building on these alleged infringements at breakneck pace. “I don’t expect companies will wait and see,” says Gardner. “There may be some legal risks, but there are so many other risks with not keeping up.”
Some companies have taken steps to limit the possibility of infringement. OpenAI and Meta claim to have introduced ways for creators to remove their work from future data sets. OpenAI now prevents users of DALL-E from requesting images in the style of living artists. But, Gardner says, “these are all actions to bolster their arguments in the litigation.”
Google, Microsoft, and OpenAI now offer to protect users of their models from potential legal action. Microsoft’s indemnification policy covers its generative coding assistant GitHub Copilot, which is the subject of a class action lawsuit on behalf of software developers whose code it was trained on. In principle, the policy would protect those who use Copilot while the courts shake things out. “We’ll take that burden on so the users of our products don’t have to worry about it,” Microsoft CEO Satya Nadella told MIT Technology Review.