Is DALL-E’s Art Stolen or Borrowed?

Under a false name, Marcel Duchamp submitted a sculpture to the Society of Independent Artists in 1917. Fountain was a urinal purchased from a toilet supplier, signed “R. Mutt” in black paint on the side. Duchamp was curious to see whether the society would follow through on its promise to accept submissions without censorship or favour. (It did not.) But Duchamp was also looking to broaden the definition of art, claiming that a ready-made object in the right context would qualify. Andy Warhol defied convention in kind in 1962 with Campbell’s Soup Cans, a series of 32 paintings of soup cans, one for each flavour. The debate then, as before, raged over whether something mechanically produced – a urinal, or a soup can (albeit hand-painted by Warhol) – counted as art, and what that meant.

Credits: Midjourney

The debate has now been turned on its head, as machines can mass-produce one-of-a-kind works of art on their own. Generative Artificial Intelligences (GAIs) are systems that create works that can compete with the old masters in technique, if not intent. But there is a catch: these systems are trained on existing material, frequently pulled from the internet – from us. Is it right, then, that future AIs will create something magical on the backs of our labour, potentially without our consent or compensation?

The most well-known GAI right now is DALL-E 2, OpenAI’s system for creating “realistic images and art from a natural language description.” If a user types in “teddy bears shopping for groceries in the style of Ukiyo-e,” the model will generate images in that style. Similarly, ask that the bears go shopping in Ancient Egypt, and the images will resemble museum dioramas depicting life under the Pharaohs. To the untrained eye, some of these images appear to have been drawn in 17th-century Japan or photographed in a 1980s museum. And this is happening while the technology is still in its early stages.

DALL-E 2 will be made available to up to one million users as part of a large-scale beta test, according to OpenAI. Each user will be able to create 50 generations for free during their first month of use, followed by 15 each month thereafter. (A generation is the creation of four images from a single prompt, or three more if you choose to edit or vary something that has already been created.) Additional 115-credit packages can be purchased for $15, and more detailed pricing is expected as the product evolves. Importantly, users have the right to commercialise the images created by DALL-E, allowing them to print, sell, or otherwise license the images born from their prompts.

Two images of bears in different styles, generated by DALL-E 2.

These systems, however, did not develop an eye for a good image in a vacuum; each GAI must be trained. After all, artificial intelligence is a fancy term for what is essentially a method of teaching software to recognise patterns. “You allow an algorithm to develop that can be improved through experience,” explained Ben Hagag, head of research at Darrow, an AI startup aimed at improving access to justice. “By experience, I mean looking for and analysing patterns in data. We tell the [system] to look at this dataset and find patterns,” which leads to the formation of a coherent view of the data at hand. “The model learns as a baby learns,” he explained: if a baby saw 1,000 pictures of landscapes, it would quickly understand that the sky, which normally sits across the top of the image, is blue and the land is green.

Hagag described how Google built its language model by training a system on several gigabytes of text, ranging from the dictionary to written word examples. “The model comprehended the patterns, how the language is constructed, the syntax, and even the hidden structure that even linguists find difficult to define,” Hagag said. That model has advanced to the point where “once you give it a few words, it can predict the next few words you’re going to write.” Google’s Ajit Varma told The Wall Street Journal in 2018 that its smart reply feature had been trained on “billions of Gmail messages,” adding that in early tests, options like ‘I Love You’ and ‘Sent from my iPhone’ were offered up because they were so common in communications.
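Hagag’s description of next-word prediction can be sketched with a toy model. The snippet below is purely illustrative – a bigram counter, nothing like Google’s actual system – and the four-line “corpus” is invented to stand in for the billions of messages mentioned above. It tallies which word tends to follow which, then predicts the most common follower.

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Count, for each word, which words tend to follow it."""
    model = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            model[current][nxt] += 1
    return model

def predict_next(model, word):
    """Return the most frequent follower of `word`, if any."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# A tiny, made-up "training set" standing in for billions of messages.
corpus = [
    "i love you",
    "i love coffee",
    "sent from my iphone",
    "sent from my laptop",
]

model = train_bigram_model(corpus)
print(predict_next(model, "sent"))  # "from" follows "sent" most often
print(predict_next(model, "i"))     # "love" follows "i" most often
```

Real language models replace these raw counts with learned statistical representations, but the principle – frequent continuations in the training data become likely predictions – is why phrases like ‘Sent from my iPhone’ surfaced so readily.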

Developers who do not have access to a data set as large as Google’s must seek data elsewhere. “Every researcher working on a language model downloads Wikipedia first, then adds more,” Hagag explained, adding that they are likely to collect any and all available data they can find. Someone’s language model may have been trained on a sassy tweet you sent a few years ago, or a sincere Facebook post you made. Even OpenAI uses WebText, a dataset that pulls text from outbound Reddit links with at least three karma, albeit with Wikipedia references removed.
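The WebText criterion described above – keep outbound Reddit links with at least three karma, drop Wikipedia – can be sketched as a simple filter. The records below are invented for illustration, and the real pipeline involves scraping and deduplication well beyond this; only the karma threshold and Wikipedia exclusion come from the text.

```python
# Hypothetical records standing in for a dump of Reddit submissions.
submissions = [
    {"url": "https://example.com/essay", "karma": 12},
    {"url": "https://en.wikipedia.org/wiki/Art", "karma": 50},
    {"url": "https://example.org/spam", "karma": 1},
    {"url": "https://example.net/story", "karma": 3},
]

def webtext_style_filter(records, min_karma=3):
    """Keep outbound links with enough karma, dropping Wikipedia links
    (which OpenAI excluded from WebText)."""
    kept = []
    for rec in records:
        if rec["karma"] < min_karma:
            continue  # not enough upvotes to signal quality
        if "wikipedia.org" in rec["url"]:
            continue  # excluded, per WebText's construction
        kept.append(rec["url"])
    return kept

print(webtext_style_filter(submissions))
# ['https://example.com/essay', 'https://example.net/story']
```

The karma threshold is a cheap proxy for human curation: rather than rating pages directly, the dataset borrows Reddit users’ judgement about which links were worth sharing.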

According to Guan Wang, CTO of Huski, data extraction is “very common.” “The majority of AI model training nowadays is done with open internet data,” he said, adding that most researchers’ policy is to collect as much data as possible. “When we look for speech data, we’ll get whatever speech we can get,” he continued. This more-is-more policy is known to produce less-than-ideal results; Ben Hagag cited Riley Newman, former head of data science at Airbnb, who said that “better data beats more data,” but Hagag notes that “it’s often easier to get more data than it is to clean it.”

Grid of images generated by CRAIYON’s generative AI depicting the King visiting the Aztec capital.

DALL-E may now have a million users, but it’s likely that most people’s first encounter with a GAI will be with its less-fancy sibling. Boris Dayma, a French developer, created Craiyon, formerly known as DALL-E Mini, after reading OpenAI’s original DALL-E paper. Not long after, Google and the HuggingFace AI development community hosted a hackathon for people to build quick-and-dirty machine learning models. “I suggested, ‘Hey, let’s replicate DALL-E. I have no idea how to do that, but let’s do it,’” Dayma said. The team would eventually win the competition, albeit with a crude, rough-around-the-edges version of the system. “The image [it generated] was clear. It wasn’t great, but it wasn’t bad either,” he added. Unlike the full-fat DALL-E, Dayma’s team focused on slimming down the model so that it could run on comparatively low-powered hardware.

Dayma’s original model was fairly indiscriminate about which image sets it would draw from, often with disastrous results. “In early models, and still in some models,” he explained, “you ask for a picture – for example, mountains under snow – and then you get the Shutterstock or Alamy watermark on top of it.” Many AI researchers have run into this problem, with GAIs being trained on public-facing image catalogues that are protected by anti-piracy watermarks.

Dayma explained that the model had mistakenly learned from one of those public photo libraries that high-quality landscape images typically carry a watermark, so he removed those images from his training set. He added that some early results produced not-safe-for-work responses, forcing him to refine his initial training set further. Dayma went on to say that he had to do much of the data sorting himself, noting that “a lot of the images on the internet are bad.”
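A cleanup pass in the spirit of Dayma’s fix might look like the sketch below: drop training records whose images come from stock-photo catalogues, so the model stops learning to reproduce their watermarks. The domain list, record format, and example data are all invented for illustration – nothing here is Craiyon’s actual pipeline.

```python
from urllib.parse import urlparse

# Illustrative blocklist; a real pipeline would be far more thorough.
STOCK_PHOTO_DOMAINS = {"shutterstock.com", "alamy.com"}

def drop_watermarked_sources(records):
    """Remove records whose image URL points at a known stock-photo site."""
    cleaned = []
    for rec in records:
        host = urlparse(rec["image_url"]).netloc.lower()
        # Match the domain itself or any subdomain (e.g. www.shutterstock.com).
        if any(host == d or host.endswith("." + d) for d in STOCK_PHOTO_DOMAINS):
            continue
        cleaned.append(rec)
    return cleaned

dataset = [
    {"caption": "mountains under snow",
     "image_url": "https://www.shutterstock.com/image/123.jpg"},
    {"caption": "mountains under snow",
     "image_url": "https://photos.example.com/456.jpg"},
]

print([r["image_url"] for r in drop_watermarked_sources(dataset)])
# ['https://photos.example.com/456.jpg']
```

Source-based filtering like this is crude – it throws away good images along with watermarked ones – which is part of why, as Dayma found, so much of the cleaning still ends up being manual.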
