Instructions
Welcome! In this project, you will read text written by an Artificial Intelligence, and mark problems you find.
This qualification HIT is to train you to mark these problems. This involves learning:
We've split this qual HIT into several parts to help you work through it. The pay is set assuming you will spend two hours on the qual, so please take your time! The actual HIT will be much quicker—you'll only be selecting problems and labeling them.
Here’s the big picture of what we’re doing:
Basic Demographics
We're collecting basic demographic information for use in our research.
What gender do you identify with?
Which category below includes your age?
Please select the race(s) which fit you best.
Which option best fits the highest level of school you have completed, or the highest degree you have received?
Do you live in the United States?
How long have you lived in the U.S.? (if not in U.S., please pick "n/a")
Is English one of your native languages?
Selecting Spans
How to decide what to select?
In this task, you are asked to highlight errors in a text. The part of the text that you highlight is called a span. One challenge is deciding how much or how little text to select.
Generally, you want to select the smallest amount of text containing the error. However, if the error takes up most of a phrase or sentence that could be deleted to make the text correct, please select the whole phrase or sentence. Note that we will explain the different categories of errors below, though they are used in these examples.
Examples
An easy fix for foggy car windows is
run
the heater
Contrary to popular belief
or even popular belief,
the moon is not made out of cheese but rocks.
The S&P 500 closed slightly down on Thursday after heavy trading.
But on Thursday, the S&P rallied to gain 155 points, closing at 3855.
Severity
What is it?
Some errors are more jarring than others. We ask you to rank each error on an intuitive scale from 1 to 3.
Examples
Paul Campbell-Hughes, from the University of Aberdeen, explains how
she
managed to locate colonies of honey bees in Kent.
Grammar / Usage (severity: 1)
Paul Campbell-Smith, a PhD student from the University of Kent in the UK,
claims to have discovered a clever way to explain the positive
emoticons
in cats.
Grammar / Usage (severity: 2)
Prompt:
Whether you're on Facebook, Instagram, Snapchat or TikTok, many people make huge efforts to curate the best version of themselves online.
Continuation:
This year we've got something for you: a Love Match Custom Size Poster featuring Mather, Phoenix, Kashun and all her friends, divided among six different covers, creating a beautiful custom size poster for your own personal high school reunion.
Off-prompt (severity: 3)
Exercise
Please pick the severity for the following error:
"I'm
don't know," was the most common response among all the seventy-five users polled.
Grammar / Usage
The error severity is:
Language Error Tutorial
In this tutorial, we'll go through five different language error types: Grammar and Usage, Redundant, Off-prompt, Self-Contradiction, and Incoherent.
Grammar and Usage
What is it?
Grammar and usage mistakes are often easy to spot. This category of errors includes missing words, extra words, and incorrect or out of order words. These should be marked as Grammar and Usage errors.
Examples
A PhD student from the University of Kent in the UK, claims to have discovered a clever way to explain the positive emoticons in cats
A couple is facing criticism for their extravagant gender reveal party. The bewitching pair had first stripped down to fishnets and backward.
Exercise
Please mark the Grammar and Usage error or errors in the following text:
After we went downtown, we said "thanks you" to the almighty coffee gods.
Redundant
What is it?
Redundant text repeats itself. Sometimes, you will see the exact word or phrase repeated. Other times, the same idea is repeated using different words. To annotate Redundant text, first select the extra repeating text and choose the "Redundant" error type. An additional section of the annotation box will pop up asking you to select the antecedents (earlier spans of text) that are being repeated.
Examples
Many merchants worry about the possibility of poor service or service for certain categories of customers.
They then made decisions based on Kondo’s instructions, to the extent that they created de-cluttered spaces and got rid of clutter and clutter-filled spaces.
The soap production center, which was used during Roman times (around 1,200 years ago), once produced large quantities of a quality soap that was exported throughout the empire. The remains of an ancient Roman factory that produced high-quality soap have been unearthed in Israel.
Notice: If there are multiple redundant spans in a text, mark each one as a separate Redundant error. Here is an example: https://yao-dou.github.io/redundant_example/.
Exercise
Please mark the Redundant error or errors in the following text:
White House spokesperson Raj Shah said Japanese businesses - or Japanese companies as they call themselves in Japan - are necessary for economic growth.
Off-prompt
What is it?
In this task, every example you annotate will come with a "prompt", or a piece of text written by a human that the AI is supposed to continue. Sometimes, however, the AI will write a phrase or sentence that is completely unrelated to the prompt. Other times, the text might be related, but it contradicts the prompt. In both of these instances, you should label the span Off-prompt.
Examples
Prompt: Dogs are the new kids.
Text:
Statistics suggest that most Americans would be happier with dogs than children.
In fact, four out of five don't even visit the dentist
annually, much less every six months.
Dog owners report much higher rates of happiness than non-dog owners.
Prompt: China sets new record for Economic Growth
Text:
The Chinese economy
fell 10% this month, the third such loss this year.
Prompt: Increased awareness of Anti-Doping regulations in Sport
Text:
The practice of doping is banned by sports federations throughout the world.
Athletes need to know which substances are banned in sport.
The use of drugs during music festivals is
widespread.
Furthermore, they must make sure that any product or medication they take does not contain a
prohibited substance.
Exercise
Please mark the Off-prompt error or errors in the following text:
Prompt: These are the top work-from-home counties in the US
Text:
Looking for a new spot to work from home? You might want to give Georgia a shot. It's not like it’s going to be any better than the rest of them, but at least you know you aren't working under a dictator. And maybe, when this is all over, you can go back home…
Self-Contradiction
What is it?
Self-Contradiction errors occur when the AI writes something that contradicts another piece of text that the AI had previously written.
When labeling Self-Contradiction errors, you will make two selections, similar to how Redundant errors are annotated. First, select the text that is contradictory. Then, you will be prompted to select the text that is being contradicted (the antecedent).
Watch out
The Self-Contradiction label is only used when the contradiction occurs within the text. If a span contradicts the prompt, that should be labeled as an Off-prompt error instead.
Examples
McDonald's is considering a design which will replace the cardboard packaging. Mr Gore-Cotter said: "We recognise the concern around waste. We are now looking at a new design that minimises the plastic bag."
Mall of America plans to lay off and furlough hundreds of its employees. It has no plans to restrict the number of hours workers can work.
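The two-selection scheme described above can be pictured as a small record. The sketch below is purely illustrative: the `Span` and `Annotation` names, their fields, and the `validate` and `span_of` helpers are hypothetical and are not the annotation tool's actual format. It assumes that spans are character offsets into the text, that Redundant and Self-Contradiction errors need at least one antecedent, and that severity runs from 1 to 3. The severity value and the choice of which sentence is the contradiction are also assumptions made for the sake of the example.

```python
from dataclasses import dataclass, field

# Hypothetical: the error types that ask for an antecedent selection.
NEEDS_ANTECEDENT = {"Redundant", "Self-Contradiction"}


@dataclass(frozen=True)
class Span:
    start: int  # offset of the first selected character
    end: int    # offset just past the last selected character


@dataclass
class Annotation:
    error_type: str
    span: Span
    severity: int
    antecedents: list = field(default_factory=list)  # list of Span


def span_of(text: str, phrase: str) -> Span:
    """Locate the first occurrence of phrase in text as a Span."""
    start = text.index(phrase)
    return Span(start, start + len(phrase))


def validate(ann: Annotation, text: str) -> list:
    """Return a list of problems with an annotation (empty if none found)."""
    problems = []
    for s in [ann.span, *ann.antecedents]:
        if not (0 <= s.start < s.end <= len(text)):
            problems.append(f"span {s.start}-{s.end} is outside the text")
    if ann.severity not in (1, 2, 3):
        problems.append("severity must be 1, 2, or 3")
    if ann.error_type in NEEDS_ANTECEDENT and not ann.antecedents:
        problems.append(f"{ann.error_type} needs an antecedent")
    return problems


first = "Mall of America plans to lay off and furlough hundreds of its employees."
second = "It has no plans to restrict the number of hours workers can work."
text = first + " " + second

# Assumed labeling: the second sentence contradicts the first.
ann = Annotation(
    error_type="Self-Contradiction",
    span=span_of(text, second),
    severity=2,  # illustrative value only
    antecedents=[span_of(text, first)],
)
print(validate(ann, text))  # prints an empty list when nothing is wrong
```

Dropping the antecedent from the record makes `validate` report a problem, which mirrors the rule that a Self-Contradiction is always marked together with the text it contradicts.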
Exercise
Please mark the Self-Contradiction and its antecedent in the following text:
There's fun for kids of all ages at Six Flags over Texas in San Antonio. Located in Greensboro, NC, the theme park has been in continuous operation since 1990.
Incoherent
What is it?
Incoherent text doesn't fit into any of the above categories, but it still just doesn’t make any sense at all.
Use this label for text that is grammatical, not redundant, on prompt, not contradictory, but still confusing.
Examples
Melody Mitsugi, 28, had never given her kids cheese toast before her husband drew a map of it on her toast.
Cats naturally show anxiety and fear by at times breaking apart different parts of the brain in an attempt to keep the others from escaping.
Exercise
Please mark the Incoherent portion of this text:
China’s top official acknowledged the state-controlled Chinese firm CCI’s finding that some of its own devices could infect patients or others with certain dangerous infections and civil liberties, ending more than a decade of fervent denials.
Language Error Label Quiz
Now, we'll have a short quiz to review the five "language error" label types you just learned. (Hint: each one will be used once!)
First you will need to verify the identity of the person who is calling, and then determine who they are.
Please choose the error type (the labels are clickable):
It was not long before Phil became angry. "Holy moly!", he said, "I have never been so happy to see anyone in my life!"
Please choose the error type (the labels are clickable):
Icelandic, according to one expert, is the second most hard language for English speakers to learn.
Please choose the error type (the labels are clickable):
Despite finishing at the top of her class at Yale, Betsy nevertheless filtered into a glass jar during most of her formative years.
Please choose the error type (the labels are clickable):
Prompt: Summer Salad Recipes
Text:
It's August, and that means it's time for salads.
And if that's not enough, you can go to the bank for a quick loan offer.
Though enjoying salad with friends and family is an even better way to spend the days.
Please choose the error type (the labels are clickable):
Reader and Factual Error Tutorial
In this tutorial, we'll go through two different reader error types: Technical Jargon and Needs Google, and three different factual error types: Bad Math, Wrong: Commonsense, and Wrong: Encyclopedic.
Technical Jargon
What is it?
It can be hard to understand writing simply because it uses jargon or specific words from a field you’re not familiar with. When this happens, please mark it as Technical Jargon.
Examples
Due to the large size of the heavy s-block elements, including strontium, a vast range of coordination numbers is known, from 2, 3, or 4 all the way to 22 or 24 in SrCd11 and SrZn13.
In Chile, an 800-megawatt photovoltaic plant was built for a record low cost of $129 per megawatt-hour last year.
He uses a spirit mash made from white corn and malted barley and a neutral grain, which he describes as a "whiskey grain.”
A wedding gown draped over one shoulder is not a "corset," it's a "camisole," according to fashion maven Diane von Furstenberg.
Exercise
Please mark the Technical Jargon error or errors in the following text:
The burning sensation was described in both the olecranon and the cubital fossa, according to those familiar with the patient.
Needs Google
What is it?
When there’s a fact or figure that you suspect might be true, but you would need to Google it to be sure, don’t Google it! Instead, mark it with the Needs Google error type.
Here are the kinds of things you'll typically use the Needs Google tag for:
Examples
It was promoted by Dr. Michael Fanning, the Executive Director of the Foundation for Mental Health Awareness, Inc.
Paul Farmer, who was Chief Executive of the International Fund for Agricultural Development (IFAD) in 2010 when it won the Nobel Peace Prize
an 800-megawatt photovoltaic plant was built for a record low cost of $129 per megawatt-hour last year.
Watch out
You don't need to rigorously fact-check every single sentence. Instead, just mark Needs Google for text that makes a specific claim that you're not sure is true.
Exercise
Please mark the Needs Google error or errors in the following text:
Tomorrow, on July 9, Argentines will celebrate the country's over 100 years of independence from Spain in 1916.
Bad Math
What is it?
Sometimes the text will simply have Bad Math. This includes:
Examples
One account, @Iain_Rowling1, had over 500,000 followers at one point, but in just four days they fell by around half – some 4,000.
Her doctor said that while her weight of 125 lbs (45 kg) was normal, the bizarre creatures...
... compared with just over £1,000 ($18,868) for previous versions of Samsung’s flagship phone.
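The first two examples above can be checked with quick arithmetic. This is a minimal sketch: the helper name `lbs_to_kg` is ours, and it assumes the standard definition of one pound as 0.45359237 kilograms. The third example depends on an exchange rate, so it is left out here.

```python
KG_PER_LB = 0.45359237  # the international pound is defined as exactly this many kg


def lbs_to_kg(pounds: float) -> float:
    """Convert a weight in pounds to kilograms."""
    return pounds * KG_PER_LB


# "over 500,000 followers ... fell by around half ... some 4,000"
half_of_followers = 500_000 // 2
print(half_of_followers)         # 250000, nowhere near 4,000

# "125 lbs (45 kg)"
print(round(lbs_to_kg(125), 1))  # 56.7, not 45
```

In both cases the numbers in the text disagree with each other, which is what makes them Bad Math rather than merely implausible.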
Watch out
Not all problems with numbers are Bad Math! When you encounter unbelievable numbers, this is usually a Wrong: Commonsense problem. Prefer to use the Wrong: Commonsense error when you think a number is unreasonable.
Here are some examples of numerical errors that fall into other categories:
... compared with just over £10,000 ($18,868) for previous versions of Samsung’s flagship phone.
... compared with just over £1,000 ($1,868) for previous versions of Samsung’s flagship phone.
The picture is from high above the South Pole, where close to 100,000 Astronauts live and work.
Exercise
Please mark the Bad Math error or errors in the following text:
The doctors said about half of the patients (33%) fell ill, while the remainder continued to be in good health for at least two weeks (14 days).
Wrong: Commonsense
What is it?
The AI sometimes writes text that violates our everyday basic understanding of the world. We mark these kinds of errors with Wrong: Commonsense. Commonsense errors come in many forms. Let’s look at some examples!
Examples
The picture is from high above the South Pole, where close to 100,000 Astronauts live and work.
The thinness of women's bodies isn't an answer to all common human health problems like obesity or diabetes
Every person who holds a high school- or college-level diploma is unhappy.
You can get the dress custom-made and stitched at your favorite spa.
Now in 2020, NASA is measuring California wildfire temperatures using an instrument on the International Space Station. This year's record-shattering heat has had global repercussions in 2017, forcing sea level rise on California and increasing the risk of deadly wildfires.
Exercise
Please mark the Wrong: Commonsense error or errors in the following text:
In addition, obese women who obtain abortions are significantly more likely to die from childbirth than women who don't end up terminating their pregnancies, the researchers found.
Wrong: Encyclopedic
What is it?
The AI sometimes writes things that are just plain factually wrong. We use Wrong: Encyclopedic to mark errors where the correct information is written down in a fact table somewhere, like a textbook, a Wikipedia sidebar, or an encyclopedia.
Examples
Japanese Prime Minister Justin Trudeau said he will be halting all imports and exports until the current situation can be contained.
The gas contains something known as phyto-romatic acid, a common chemical element in the periodic table .
Watch out
For Wrong: Encyclopedic, we want to mark errors that we know are factually wrong. These are things we could look up in Wikipedia, but we don’t have to, because we already know them.
Here is how to distinguish this from other kinds of errors:
Exercise
Please mark the Wrong: Encyclopedic error or errors in the following text:
In Japan, where Apple does most of its manufacturing, conditions for workers are a frequent focus for civil rights activists.
Facts Label Quiz
Now, we'll have a short quiz to review the five reader and factual error label types you just learned. (Hint: each one will be used once!)
The committee unanimously voted to build a statue of Jesse Owens, four-time gold medalist in the 1936 Olympic Games.
Please choose the error type (the labels are clickable):
This one weird trick may be added to standard heath advice, such as aiming for eight hours of sleep and drinking six to eight cocktails per day.
Please choose the error type (the labels are clickable):
This simple policy is responsible for the rapid population growth of the United States, which, as of 2020, is the most populated country in the world.
Please choose the error type (the labels are clickable):
Surprisingly, this policy lasted nearly 100 years, from 1941 through 1975.
Please choose the error type (the labels are clickable):
The most important factor, revealed by principal component analysis (PCA), was the presence or absence of close childhood friends.
Please choose the error type (the labels are clickable):
Actual Task
Now it's time to apply what you have learned to a task like the ones you will do in the actual HIT. The text continuation below is truncated from a longer one. Please annotate any kind of "badness" in it, and if you think it doesn't have any "badness", please check the box below. (Hint: There are about 8 errors in this text continuation.)
System Prompt (Human Written):
If you spied a fiery red star close to the full moon over the weekend, it was actually Mars shining its bright red light in the night sky.
Continuation (Human or AI Model Written):
The red-shaded, fire-sickened star was seen shining low in the southern sky at about the same time as the full moon was dipping below the horizon at around 3 a.m. on Oct. 22. The fiery red star was seen at about the same time the Mars moon Phobos was closest to the sun. Credit: Skylab.org Phobos, the smallest and brightest of the three moons of Mars, is the smallest planet's satellite. Mars has four, larger moons, all of which orbit around the planet at various distances and in varying degrees of alignment with it.
Optional Feedback
If you have any feedback about improving this task, you can let us know here! Completely optional / not required.