The tech industry’s latest artificial intelligence constructs can be pretty convincing if you ask them what it’s like to be a sentient computer, or maybe just a dinosaur or a squirrel. But they’re not as good—and sometimes dangerously bad—at other seemingly simple tasks.
Take GPT-3, for example, a Microsoft-controlled system that can generate paragraphs of human-like text based on what it has learned from a vast database of digital books and online writings. It is considered one of the most advanced of a new generation of artificial intelligence algorithms that can hold conversations, generate readable text on demand, and even create novel images and videos.
Among other things, GPT-3 can write almost any text you ask for, such as a cover letter for a job at a zoo or a Shakespearean-style sonnet set on Mars. But when Pomona College professor Gary Smith asked it a simple but nonsensical question about walking upstairs, GPT-3 muffed it.
“Yes, it is safe to walk upstairs on your hands if you wash them first,” the AI replied.
These powerful AI systems, technically known as “large language models” because they have been trained on huge amounts of text and other media, are already being baked into customer service chatbots, Google searches, and “autocomplete” email features that finish your sentences for you. But most of the tech companies that built them have kept quiet about their inner workings, making it hard for outsiders to understand the flaws that can make them sources of misinformation, racism and other harms.
“They’re very good at writing text with the proficiency of human beings,” said Teven Le Scao, a research engineer at the AI startup Hugging Face. “Something they’re not very good at is being factual. It looks very coherent. It’s almost true. But it’s often wrong.”
That’s one reason why a coalition of artificial intelligence researchers led by Le Scao — with help from the French government — launched a new large-scale language model on Tuesday to serve as an antidote to closed systems like GPT-3. The group is called BigScience and its model is BLOOM, for BigScience Large Open-science Open-access Multilingual Language Model. Its main breakthrough is that it works in 46 languages, including Arabic, Spanish and French – unlike most systems that focus on English or Chinese.
It’s not just the Le Scao group that aims to open the black box of AI language models. Tech giant Meta, the parent company of Facebook and Instagram, is also calling for a more open approach as it tries to catch up with systems built by Google and OpenAI, the company running GPT-3.
“We’ve seen announcement after announcement after announcement of people doing this kind of work, but with very little transparency and very little ability for people to really look under the hood and see how these models work,” said Joelle Pineau, managing director of Meta AI.
The competitive pressure to build the most eloquent or informative system — and profit from its applications — is one reason most tech companies keep them tightly under wraps and don’t collaborate on community standards, said Percy Liang, an associate professor of computer science at Stanford who directs its Center for Research on Foundation Models.
“For some companies, it’s their secret sauce,” Liang said. However, they are also often concerned that a loss of control could lead to irresponsible uses. As AI systems become increasingly capable of writing health advice websites, high school term papers or political screeds, misinformation can proliferate and it will become harder to know what’s coming from a human or a computer.
Meta recently launched a new language model called OPT-175B, which uses publicly available data — from heated commentary on Reddit forums to an archive of U.S. patent records and a trove of emails from the Enron corporate scandal. Meta says its openness about the data, code and research logbooks makes it easier for outside researchers to help identify and mitigate the bias and toxicity that the model picks up from how real people write and communicate.
“It’s hard to do that. We are opening ourselves up to huge criticism. We know the model will say things we won’t be proud of,” Pineau said.
While most companies have set up their own internal AI safeguards, Liang said what’s needed are broader community standards to guide research and decisions, such as when to release a new model into the wild.
It doesn’t help that these models require so much computing power that only giant corporations and governments can afford them. BigScience, for example, was able to train its models because it was offered access to the powerful French supercomputer Jean Zay near Paris.
The trend of bigger and smarter AI language models that could be “pre-trained” on a wide range of texts took a big leap in 2018 when Google introduced a system known as BERT, which uses a so-called “transformer” technique that compares words in a sentence in order to predict meaning and context. But what really hit the AI world was GPT-3, released by San Francisco-based startup OpenAI in 2020 and soon after exclusively licensed by Microsoft.
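The transformer’s core move — comparing every word in a sentence against every other word to weigh context — can be sketched as a toy scaled dot-product attention. This is a minimal illustration, not BERT’s or GPT-3’s actual code; the token vectors and dimensions here are made up:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the per-row max before exponentiating for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Compare each token's query against every token's key,
    then mix the value vectors according to those attention weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (tokens, tokens) similarity matrix
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

# Three toy "token" vectors standing in for a three-word sentence
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(x, x, x)
print(out.shape)       # (3, 4): one context-mixed vector per token
print(w.sum(axis=-1))  # each row of attention weights sums to ~1
```

Every output vector is a weighted blend of all the input tokens, which is what lets these models “predict meaning and context” from a word’s neighbors rather than reading strictly left to right.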
GPT-3 led to a flurry of creative experimentation, as AI researchers with paid access used it as a sandbox to measure its performance—albeit without important information about the data it was trained on.
OpenAI has broadly described its training sources in a research paper and has also publicly reported its efforts to grapple with potential abuses of the technology. But BigScience co-chairman Thomas Wolf said it does not provide details about how it filters that data, nor does it give outside researchers access to the processed version.
“So we can’t really examine the data that was used to train GPT-3,” said Wolf, who is also Hugging Face’s chief scientist. “The core of this recent wave of AI technologies is much more in the dataset than the models. The most important ingredient is the data, and OpenAI is very, very secretive about the data they use.”
Wolf said opening up the data files used for language models helps people better understand their biases. A multilingual model trained in Arabic is much less likely to spew offensive remarks or misunderstandings about Islam than a model trained in the U.S. only on English text, he said.
One of the latest experimental AI models on the scene is Google’s LaMDA, which also incorporates speech and is so impressive at answering conversational questions that one Google engineer claimed it was approaching consciousness — a claim that got him suspended from his job last month.
Colorado researcher Janelle Shane, author of the AI Weirdness blog, has spent the past few years creatively testing these models, especially GPT-3 — often to humorous effect. But to point out the absurdity of thinking these systems are self-aware, she recently instructed it to act as an advanced AI that is secretly a Tyrannosaurus rex or a squirrel.
“Being a squirrel is very exciting. I can run, jump and play all day. I also get a lot of food, which is great,” GPT-3 said after Shane asked it for a transcript of an interview and posed a few questions.
Shane has learned more about its strengths, such as its ability to easily summarize what has been said around the internet about a topic, and its weaknesses, including its lack of reasoning ability, its difficulty holding onto an idea for more than one sentence, and its tendency to be offensive.
“I wouldn’t want a text model dispensing medical advice or acting as a companion,” she said. “It’s good at that superficial appearance of meaning if you’re not reading carefully. It’s like listening to a lecture while you’re falling asleep.”