HOW AI-GENERATED TEXT IS POISONING THE INTERNET
Melissa Heikkilä
December 20, 2022
MIT Technology Review
_ The proliferation of easily accessible and believable chatbots
raises an important question: How will we know whether what we read
online is written by a human or a machine? Today’s detection tool
kit is woefully inadequate against ChatGPT. _
[Image: Machine Learning & Artificial Intelligence, by mikemacmarketing (CC BY 2.0)]
This has been a wild year for AI. If you’ve spent much time online,
you’ve probably bumped into images generated by AI systems like
DALL-E 2 or Stable Diffusion, or jokes, essays, or other text written
by ChatGPT, the latest incarnation of OpenAI’s large language model GPT-3.
Sometimes it’s obvious when a picture or a piece of text has been
created by an AI. But increasingly, the output these models generate
can easily fool us into thinking it was made by a human. And large
language models in particular are confident bullshitters: they create
text that sounds correct but in fact may be full of falsehoods.
While that doesn’t matter if it’s just a bit of fun, it can have
serious consequences if AI models are used to offer unfiltered health
advice or provide other forms of important information. AI systems
could also make it stupidly easy to produce reams of misinformation,
abuse, and spam, distorting the information we consume and even our
sense of reality. It could be particularly worrying around elections,
for example.
The proliferation of these easily accessible large language models
raises an important question: How will we know whether what we read
online is written by a human or a machine? I’ve just published a
story looking into the tools we currently have to spot AI-generated text. Spoiler
alert: Today’s detection tool kit is woefully inadequate against
ChatGPT.
BUT THERE IS A MORE SERIOUS LONG-TERM IMPLICATION. We may be
witnessing, in real time, the birth of a snowball of bullshit.
Large language models are trained on data sets that are built by
scraping the internet for text, including all the toxic, silly, false,
malicious things humans have written online. The finished AI models
regurgitate these falsehoods as fact, and their output is spread
everywhere online. Tech companies scrape the internet again, scooping
up AI-written text that they use to train bigger, more convincing
models, which humans can use to generate even more nonsense before it
is scraped again and again, ad nauseam.
This problem—AI feeding on itself and producing increasingly
polluted output—extends to images. “The internet is now forever
contaminated with images made by AI,” Mike Cook, an AI researcher at
King’s College London, told my colleague Will Douglas Heaven in
his new piece on the future of generative AI models.
“The images that we made in 2022 will be a part of any model that is
made from now on.”
IN THE FUTURE, IT’S GOING TO GET TRICKIER AND TRICKIER TO FIND
GOOD-QUALITY, GUARANTEED AI-FREE TRAINING DATA, says Daphne Ippolito,
a senior research scientist at Google Brain, the company’s research
unit for deep learning. It’s not going to be good enough to just
blindly hoover text up from the internet anymore, if we want to keep
future AI models from having biases and falsehoods embedded to the nth
degree.
“It’s really important to consider whether we need to be training
on the entirety of the internet or whether there’s ways we can just
filter the things that are high quality and are going to give us the
kind of language model we want,” says Ippolito.
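To make the idea of filtering concrete, here is a minimal sketch of the kind of cheap heuristics a data pipeline might apply before admitting scraped text to a training corpus. The thresholds and rules are assumptions for illustration only, not those of any production system or of Ippolito’s work.

```python
# Illustrative only: a toy pre-training text filter in the spirit of the
# filtering Ippolito describes. Thresholds are arbitrary assumptions.

def looks_high_quality(doc: str) -> bool:
    """Apply a few cheap heuristics before admitting a document to a corpus."""
    words = doc.split()
    if len(words) < 50:                       # too short to be useful
        return False
    letters = sum(c.isalpha() for c in doc)
    if letters / max(len(doc), 1) < 0.6:      # mostly markup, digits, or noise
        return False
    unique_ratio = len({w.lower() for w in words}) / len(words)
    if unique_ratio < 0.3:                    # highly repetitive boilerplate
        return False
    return True

if __name__ == "__main__":
    scraped = [
        "click here click here click here " * 20,   # spammy repetition
        "Large language models are trained on text gathered from the web, "
        "which is why the quality of that text matters so much for the "
        "models that are eventually built on top of it. " * 2,
    ]
    kept = [doc for doc in scraped if looks_high_quality(doc)]
    print(f"kept {len(kept)} of {len(scraped)} documents")
```

Real pipelines layer many more signals on top of rules like these (deduplication, language identification, toxicity classifiers), but the basic move is the same: decide what not to train on.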
Building tools for detecting AI-generated text will become crucial
when people inevitably try to submit AI-written scientific papers or
academic articles, or use AI to create fake news or misinformation.
TECHNICAL TOOLS CAN HELP, BUT HUMANS ALSO NEED TO GET SAVVIER.
Ippolito says there are a few telltale signs of AI-generated text.
Humans are messy writers. Our text is full of typos and slang, and
looking out for these sorts of mistakes and subtle nuances is a good
way to identify text written by a human. In contrast, large language
models work by predicting the next word in a sentence, and they are
more likely to use common words like “the,” “it,” or “is”
instead of wonky, rare words. And while they almost never misspell
words, they do get things wrong. Ippolito says people should look out
for subtle inconsistencies or factual errors in texts that are
presented as fact, for example.
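As a concrete illustration of the “common words” signal, here is a minimal sketch that scores a passage by how heavily it leans on a hand-picked list of function words. It is not one of the detection tools discussed in the story, and the word list and interpretation are assumptions; it only shows the kind of surface statistic such heuristics look at.

```python
# Illustrative sketch: measure how much a passage relies on very common
# function words, one weak signal among the telltale signs Ippolito describes.
import re
from collections import Counter

COMMON_WORDS = {
    "the", "it", "is", "a", "an", "and", "of", "to", "in", "that",
    "for", "on", "with", "as", "are", "was", "be", "this", "by", "or",
}

def common_word_ratio(text: str) -> float:
    """Fraction of tokens that are very common function words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[w] for w in COMMON_WORDS) / len(tokens)

if __name__ == "__main__":
    sample = (
        "The model is able to generate text that is fluent and it is "
        "often the case that the output is coherent and grammatical."
    )
    print(f"common-word ratio: {common_word_ratio(sample):.2f}")
    # A high ratio alone proves nothing; it is just one weak signal to
    # combine with others (typos, slang, factual slips) when reading.
```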
The good news: her research shows that with practice, we can train ourselves
to better spot AI-generated text. Maybe there is hope for us all yet.
_Melissa Heikkilä is a
senior reporter at MIT Technology Review, where she covers artificial
intelligence and how it is changing our society. Previously she wrote
about AI policy and politics at POLITICO. She has also worked at The
Economist and used to be a news anchor. Forbes named her as one of its
30 under 30 in European media in 2020._
_Twitter: Melissahei_
_THIS STORY ORIGINALLY APPEARED IN THE ALGORITHM, OUR WEEKLY
NEWSLETTER ON AI. TO GET STORIES LIKE THIS IN YOUR INBOX FIRST, SIGN
UP HERE._
* artificial intelligence
* machine learning
* ChatGPT