It is no longer effective to solely use a written essay to measure how deeply a student comprehends a subject.
AI is here to stay; new methods should be used to assess student performance.
I remember being told at school that we weren't allowed to use calculators in exams. The line provided by teachers was that we could never rely on having a calculator when we needed it most—obviously there's irony in having 'calculators' in our pockets 24/7 now.
We need to accept that the world has changed; I only hope that we get to decide how society responds to that change together... rather than have it forced upon us.
Was this ever effective? There was a lot of essay copy/pasting when I was in school, and this was when essays had to be handwritten (in cursive, of course, using a fountain pen!).
Same with homework. If everyone has to solve the same 10 problems, divide and conquer saves everyone a lot of time.
Of course, you're only screwing yourself because you'll negatively impact your learning, but that's not something you can easily convince kids of.
In-person oral exams (once you get over the fear factor) work best, with or without (proctored!) prep time.
Maybe it doesn't scale as well, but education is important enough not to always require maximal efficiency.
> I only hope that we get to decide how society responds to that change together... rather than have it forced upon us.
That basically never happens and the outcome is the result of some sort of struggle. Usually just a peaceful one in the courts and legislatures and markets, but a struggle nonetheless.
> new methods should be used to assess student performance.
Such as? We need an answer now because students are being assessed now.
Return to the old "viva voce" exam? Still used for PhDs. But that doesn't scale at all. Perhaps we're going to have to accept that and aggressively ration higher education by the limited amount of time available for human-to-human evaluations.
Personally I think all this is unpredictable and destabilizing. If the AI advocates are right, which I don't think they are, they're going to eradicate most of the white collar jobs and academic specialties for which those people are being trained and evaluated.
> Such as? We need an answer now because students are being assessed now.
When I was in university (Humanities degree), we had to do lots of mandatory essays throughout the year but they counted little towards your overall mark, maybe 10% iirc.
The majority of marks came from mid-year & end-of-year exams.
A simple change to negate AI is to not award any points for work outside exams — make it an optional chance to get feedback from lecturers. If students want to turn in work by AI, it's up to them.
> Such as? We need an answer now because students are being assessed now. Return to the old "viva voce" exam? Still used for PhDs. But that doesn't scale at all.
For a solution "now" to the cheating problem, regular exam conditions (on-site or remote proctoring) should still work more or less the same as they always have. I'd claim that the methods affected by LLMs are those that could already be circumvented by those with money or a smart relative to do the work for them.
Longer-term, I think higher-level courses/exams may benefit from focusing on what humans can do when permitted to use AI tools.
> Return to the old "viva voce" exam? Still used for PhDs. But that doesn't scale at all.
What? Sure it does. Every extra full-time student at Central Methodist University (from the article) means an extra $27,480 per year in tuition.
It's absolutely, entirely scalable to provide a student with ten 15-minute conversations with professors when that student is paying twenty-seven thousand dollars.
> Such as? We need an answer now because students are being assessed now.
My current best guess is to hand the student stuff that was written by an LLM, and challenge them to find and correct its mistakes.
That's going to be what they do in their careers, unless the LLMs get so good they don't need to, in which case https://xkcd.com/810/ applies.
> Personally I think all this is unpredictable and destabilizing. If the AI advocates are right, which I don't think they are, they're going to eradicate most of the white collar jobs and academic specialties for which those people are being trained and evaluated.
Yup.
I hope the e/acc types are wrong, we're not ready.
> My current best guess is to hand the student stuff that was written by an LLM, and challenge them to find and correct its mistakes.
Finding errors in a text is a useful exercise, but clearly a huge step down in terms of cognitive challenge from producing a high quality text from scratch. This isn't so much an alternative as it is just giving up on giving students intellectually challenging work.
> That's going to be what they do in their careers
I think this objection is not relevant. Calculators made pen-and-paper arithmetic on large numbers obsolete, but it turns out that the skills you build as a child doing pen-and-paper arithmetic are useful once you move on to more complex mathematics (that is, you learn the skill of executing a procedure on abstract symbols). Pen-and-paper arithmetic may be obsolete as a tool, but learning it is still useful. It's not easy to identify which "useless" skills are still useful as to learn as cognitive training, but I feel pretty confident that writing is one of them.
> The grade was ultimately changed, but not before she received a strict warning: If her work was flagged again, the teacher would treat it the same way they would with plagiarism.
> But that doesn't scale at all.
I realize that the level of effort for oral exam is greater for both parties involved. However, the fact it does not scale is largely irrelevant in my view. Either it evaluates something well or it does not.
And, since use of AI makes written exams almost impossible, this genuinely seems to be the only real test left.
> And, since use of AI makes written exams almost impossible
Isn't it easy to prevent students from using an AI if they are doing the exams in a big room? I mean, when I was a student, most of my exams were written with access to notes but no computers. Not that many resources are needed to control that...
Good point. I agree, but it goes back to some level of unwillingness to do this the 'old way'.
That is not to say there won't be cheaters (there always are), but that is what proctors are for. And no, I absolutely hated the online proctor version. I swore I would never touch that thing again. And this may be the answer: people need to exercise their free will a little more forcefully.
An essay written under examination conditions is fine. We don't need new assessment techniques. We have known how to verify that a student, and that student alone, did the work for centuries.
My ability to write an essay under exam conditions is... poor. Thankfully there were fewer than a handful of essays I had to write as part of my undergraduate CS degree, and I only remember one under exam conditions.
I think it's probably more concerning that spitting out the most generic, mechanically formulaic bullshit on a subject is likely to get a decent mark. In that case, what are we actually testing for?
Yeah we always did that in high school for essays that were actually graded, otherwise there's always the option of having someone else write it for you, human or now machine. The only thing that's changed is the convenience of it.
The problem is more with teachers lazily slapping an essay on a topic as a goto homework to eat even more of the already limited students' time with busywork.
This is mostly true, but it is also important to recognize that “hey just invent a new evaluation methodology” is a rough thing to ask people to do immediately. People are trying to figure it out in a way that works.
Sadly, this is not what is happening. Based on the article ( and personal experience ), it is clear that we tend to happily accept computer output as a pronouncement from the oracle itself.
It is new tech, but people do not treat it as such. They are not figuring it out. Its results are already being imposed. It is sheer luck that the individual in question chose to fight back. And even then it was only a partial victory:
"The grade was ultimately changed, but not before she received a strict warning: If her work was flagged again, the teacher would treat it the same way they would with plagiarism."
The best method for assessing performance when learning is as old as the world: assess the effort, not how well the result complies with some requirements.
If the level of effort made is high, but the outcome does not comply in some way, praise is due. If the outcome complies, but the level of effort is low, there is no reason for praise (what are you praising? mere compliance?) and you must have set a wrong bar.
Not doing this fosters people with mental issues such as rejection anxiety, perfectionism, narcissism, defeatism, etc. If you got good grades at school with little actual effort and the constant praise for that formed your identity, you may be in for a bad time in adulthood.
A teacher’s job is to determine the appropriate bar, estimate the level of effort, and help shape the effort applied in a way that improves the skill in question and the more general meta-skill of learning.
The issue of judging by the outcome is prevalent in some (or all) school systems, so we can say LLMs are mostly orthogonal to that.
However, even if that issue were addressed, for a number of skills the mere availability of ML-based generative tools makes it impossible to estimate the level of actual effort and to set the appropriate bar, and I do not see how that can be worked around. It’s yet another negative consequence of making the sacred process of producing an amalgamation of other people’s work—something we all do all the time; passing it through the lens of our consciousness is perhaps one of the core activities that make us human—available as a service.
Little Johnny who tried really hard but still can barely write a for loop doesn't deserve a place in a comp sci course ahead of little Timmy who for some reason thinks in computer code. Timmy might be a lazy arse but he's good at what he does and for minimal effort the outcomes are amazing. Johnny unfortunately just doesn't get it. He's wanted to be a programmer ever since he saw the movie Hackers but his brain just doesn't work that way. How to evaluate this situation? Ability or effort?
> The best method for assessing performance when learning is as old as the world: assess the effort, not how well the result complies with some requirements.
I am really quite confused about what you think the point of education is.
In general, the world (either the physical world or the employment world) does not care about effort, it cares about results. Someone laboriously filling their kettle with a teaspoon might be putting in a ton of effort, but I'd much rather someone else make the tea who can use a tap.
Why do we care about grades? Because universities and employers use them to quickly assess how useful someone is likely to be. Few people love biochemistry enough that they'd spend huge sums of money and time at university if it didn't help get them a job.
> I remember being told at school that we weren't allowed to use calculators in exams
I remember being told the same thing, but I happen to believe that it was a fantastic policy with a lackluster explanation. The idea that you wouldn't have a calculator was obviously silly, even at the time, but the underlying observation, that relying on the calculator would rob you of the mental exercise the whole ordeal was supposed to provide, was accurate. The problem is that you can't explain to a room full of 12-year-olds that math is actually beautiful and that the systems principles it imparts fundamentally shape how you view the world.
The same goes for essays. I hated writing essays, and I told myself all sorts of weird copes about how I would never need to write an essay. The truth, which I observed much later, is that structured thinking is exactly what the essay forced me to do. The essay was not a tool to assess my ability in a subject. It was a tool for me to learn. Writing the essay was part of the learning.
I think that's what a lot of this "kids don't need to calculate in their heads" misses. Being able to do the calculation was only ever part of the idea. Learning that you could learn how to do the calculation was at least as important.
Very well put. I would actually suggest not using calculators in high school anymore. They add very little value, and if it is still the same as when I was in high school, it was a lot of remembering weird key combinations on a TI calculator. Simply make the arithmetic simple enough that a calculator isn't needed.
The part that annoys me is that students apparently have no right to be told why the AI flagged their work. For any process where a computer is allowed to judge people, there should be a rule in place demanding that the algorithm be able to explain EXACTLY why it flagged this person.
Now this would effectively kill off the current AI-powered solutions, because they have no way of explaining, or even understanding, why a paper may be plagiarized or not, but I'm okay with that.
It's a similar problem to people being banned from Google (insert big company name) because of an automated fraud detection system that doesn't give any reason behind the ban.
I also think that there should be laws requiring a clear explanation whenever that happens.
> For any process where a computer is allowed to judge people, there should be a rule in place demanding that the algorithm be able to explain EXACTLY why it flagged this person.
For a human who deals with student work or reads job applications, spotting AI-generated work quickly becomes trivially easy. Text seems to use the same general framework (although words are swapped around), and we also see what I call 'word of the week', where whichever 'AI' engine gets hung up on a particular English word, often an unusual one, and uses it at every opportunity. It isn't long before you realise that the adage that this is just autocomplete on steroids is true.
However, programming a computer to do this isn't easy. In a previous job I dealt with plagiarism detectors and soon realised how garbage they were (and also how easily fooled they are, but that is another story). The staff soon realised what garbage these tools are, so if a student accused of plagiarism decided to argue back, the accusation would be quietly dropped.
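The 'word of the week' observation can be sketched as a crude frequency check: flag words a text uses far more often than some reference table predicts. This is a toy illustration only, not any real detector; `baseline_freq` and the thresholds are assumptions for the sake of the example.

```python
from collections import Counter

def overused_words(text, baseline_freq, min_count=3, ratio=5.0):
    """Flag words used far more often than a baseline frequency table
    predicts. Toy sketch of the 'word of the week' idea; real stylometry
    is much more involved."""
    words = [w.lower() for w in text.split() if w.isalpha()]
    counts = Counter(words)
    total = len(words) or 1
    flagged = []
    for word, n in counts.items():
        if n < min_count:
            continue
        observed = n / total
        expected = baseline_freq.get(word, 1e-6)  # unseen words treated as rare
        if observed / expected >= ratio:
            flagged.append(word)
    return flagged
```

With an (assumed) baseline that lacks an entry for "delve", a text leaning on it repeatedly gets flagged, while words used only once or twice pass.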
I did engineering at university; one of the mandatory courses was technical communication. The prof understood that the type of person who went into engineering was not necessarily going to appreciate the subtleties of great literature, so their coursework was extremely rote. It was like: "Write about a technical subject, doesn't matter what, 1500 words, here's the exact score card." And the score card was like: "Uses a sentence to introduce the topic of the paragraph." The result was that you wrote extremely formulaic prose. Now, I'm not sure that was going to teach people to be great communicators, but I think it worked extremely well to bring someone who communicated very badly up to some basic minimum standard. It could be extremely effective applied to the (few) other pieces of coursework that required prose too, partly because by being so formulaic you appealed to the overworked PhD student who was likely marking it.
It seems likely that a suitably disciplined student could look a lot like ChatGPT and the cost of a false accusation is extremely high.
> also we see what I call 'word of the week' where whichever 'AI' engine seems to get hung up on a particular English word which is often an unusual one and uses it at every opportunity
So do humans. Many people have pet phrases or words that they use unusually often compared to others.
In the mid 90s (yes I’m dating myself here. :P) I had a classmate who was such a big NIN fan that she worked the phrase “downward spiral” into every single essay she wrote for the entire year.
> The staff soon realised what garbage these tools are so if a student accused of plagiarism decided to argue back then the accusation would be quietly dropped.
I ask myself when the time will come that some student accuses the staff of libel or slander because of false AI plagiarism accusations.
> For a human who deals with student work or reads job applications spotting AI generated work quickly becomes trivially easy. Text seems to use the same general framework (although words are swapped around) also we see what I call 'word of the week'
Easy to catch people that aren't trying in the slightest not to get caught, right? I could instead feed a corpus of my own writing to ChatGPT and ask it to write in my style.
I don't believe it's possible at all if any effort is made beyond prompting chat-like interfaces to "generate X". Given a hand-crafted corpus of text, even current LLMs could produce perfect style transfer for a generated continuation. If someone believes it's trivially easy to detect, then they absolutely have no idea what they are dealing with.
I assume most people would make the least effort and simply prompt a chat interface to produce some text; such text is rather detectable. I would like to see some experiments even for this type of detection, though.
Are you then plagiarising if the LLM is just regurgitating stuff you’d personally written?
The point of these detectors is to spot stuff the students didn’t research and write themselves. But if the corpus is your own written material then you’ve already done the work yourself.
Oh I agree, producing text by LLMs which is expected to be produced by a human is at least deceptive and probably plagiarism. It's also skipping some important work, if we're talking about someone trying to detect it at all, usually in an education context.
Students don't have to perform research or study for the given task; they just need to acquire an example text suitable for reproducing their style and structure, to create the impression of work produced by hand, so the original task can be avoided. You have to have at least one corpus of your own work for this, or an adequate substitute. And you could still reject works based on their content, but we are specifically talking about LLM smell here.
I was talking about the task of detecting LLM-generated text, which is incredibly hard if any effort is made, while some people have the impression that it's trivially easy. That leads to unfair outcomes while giving teachers, for example, false confidence that LLMs are adequately accounted for.
An LLM is just regurgitating stuff as a matter of principle. You can request someone else's style. People who are easy to detect simply don't do that. But they will learn quickly.
My other half is a non-native English speaker. She's fluent, but since ChatGPT came out she's found it very helpful having somewhere to paste a paragraph and get a better version back, rather than asking me to rewrite things. That said, she'll often message me with some text, and I've got a 100% hit rate for guessing whether she's put it through AI first. Once you're used to how they structure sentences, it's very easy to spot. I guess the hardest part is being able to prove it if you're in a position of authority like a teacher.
One course I took actually provided students with the output of the plagiarism detector. It was great at correctly identifying where I had directly quoted (and attributed) a source.
It would also identify random 5-6 word phrases and attribute them to different random texts on completely different topics where those same 5 words happened to appear.
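That false-match behaviour is consistent with detectors that compare short word n-grams ("shingles"): 5-6 word runs are short enough to collide between unrelated texts by chance. A toy sketch (illustrative only, not any specific vendor's algorithm):

```python
def shared_shingles(doc_a, doc_b, n=5):
    """Report every n-word phrase two texts have in common.
    Real detectors hash shingles against huge corpora, but the core
    idea (and the false-positive mode) is the same: common 5-word
    runs appear in texts on completely different topics."""
    def shingles(text):
        words = text.lower().split()
        return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
    return shingles(doc_a) & shingles(doc_b)
```

Two unrelated sentences that both happen to contain a stock phrase like "as a result of the" will match, which is exactly the kind of spurious attribution described above.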
> For a human who deals with student work or reads job applications spotting AI generated work quickly becomes trivially easy
So far. Unless there is a new generation of teachers who are no longer able to learn from non-AI-generated texts, because everything they get has been grammatically corrected by AI, for example...
Even I am using Grammarly here (being a non-native speaker), but I usually tend to ignore it, because it removes all my "spoken" style, or at least what I think is a "spoken" style.
the students are too lazy and dumb to do their own thinking and resort to ai. the teachers are also too lazy and dumb to assess the students' work and resort to ai. ain't it funny?
It's a race to the bottom, though. Why should the humans waste their time reading through AI-generated slop that took 11ms to generate, when it can take an hour or more to manually review it?
To be fair, using humans to spend time sifting through AI slop determining what is and isn't AI generated is not a fight that the humans are going to win.
I suppose we all get from school what we put into it.
I forget the name of the guy who said it, but he was some big philosophy lecturer at Harvard, and his view on the matter (a heavy reading course where one student left the review "not reading the assigned reading did not hurt me at all") was (paraphrased):
"This guy is an idiot if he thinks the point of paying $60k a semester of his parents' money is to sit here and learn nothing."
Not saying where, but well before transformers were invented, I saw an iOS project that had huge chunks of uncompiled Symbian code in the project "for reference", an entire pantheon of God classes, entire files duplicated rather than changing access modifiers, 1000 lines inside an always true if block, and 20% of the 120,000 lines were:
//
And no, those were not generally followed by a real comment.
The education system has not even really adapted to the constant availability of the Internet, and now it has to face LLMs.
If I could short higher education, I would. Literally all its foundational principles are bordering on obviously useless in the modern world, and they keep doubling down on the same fundamentals (a strict set of classes and curriculum, almost complete separation of education from working experience, etc.), only adapting their implementation somewhat.
My kids’ school added a new weapons scanner as kids walk in the door. It’s powered by “AI.” They trust the AI quite a bit.
However, the AI identifies the school issued Lenovo laptops as weapons. So every kid was flagged. Rather than stopping using such a stupid tool, they just have the kids remove their laptops before going through the scanner.
I expect that people who aren't smart enough are buying “AI” products and trusting them to do the things they want, even though they don't work.
Rather than trying to diminish something that's completely preventable and abhorrent, maybe we could discuss ways to actually prevent it. Because this isn't a problem anywhere else, so it's clearly preventable.
If AI can be part of a solution here this is a reasonable place to discuss it.
Nobody said we shouldn't try to solve the problem. But the first step is accurately describing the problem to be solved. Something that occurs once a year across the entire country has very different solutions than something which occurs once a week in every county.
Any at all? I don't think Americans realize how much of a US-only problem it is, and how some of the non-US mass shootings are explicitly inspired by US media and discourse.
Or it is accepted that said purchase will cover their ass, or even better, that refusing said purchase can be held against them in the future if things happen, even if said purchase would have made 0 difference.
I wonder if it's batteries, they look quite close to explosives on a variety of scanning tools. In fact, both chemically store and release energy but on extremely different timescales.
The point was that if the laptop is taken out and doesn’t go through the scanner, but the rest of the student has to go through the scanner, then the laptop is a great hiding place. Presumably that scanner can at least beep at a pocket knife.
It's stupid to bring yourself into a position where scanning kids for weapons is necessary. In this case we're already past that, so the stupidity is that the device isn't updated to not identify laptops as weapons. If that's not possible, then the device is a mislabeled laptop detector.
Not the OP, but obviously it wasn't a metal detector, otherwise it would've detected all brands of laptops as weapons. It's probably an image based detector.
The problem is, if it has been that badly tested that it detects Lenovo laptops as weapons, there is a good chance that it doesn't properly detect actual weapons either.
A high school I worked at had a similar system in place called Evolv. It’s not a metal detector, but it did successfully catch a student with a loaded gun in his backpack. Granted, he didn’t mean to bring the gun to school. I think it’s stupid to believe that kids who want to bring a gun to school will arrive on time to school. They often arrive late when security procedures like bag scanning are not in place.
Guns are legal in almost every country - I think your problem is with countries that have almost no restriction on gun ownership. e.g. Here in the UK you can legally own a properly licensed rifle or shotgun and even a handgun in some places outside of Great Britain (e.g. Northern Ireland).
Just because something is technically legal, doesn't mean it's in any way common or part of UK culture to own a gun.
There hasn't been a school shooting in the UK for nearly 30 years. Handguns were banned after the last school shooting and there hasn't been one since.
There's already a UK ban on carrying knives in public unless you have an occupational need and they're wrapped up or at least not just sitting in your pocket.
Licensing wouldn't be worthwhile as almost every household would want knives for food preparation.
Exactly. It's not the legality of weapons, but the easy availability of them that causes the issues.
It seems to me like victim blaming for U.S. schools to have active shooter drills - it makes more sense to have much better training and screening of gun owners than trying to train the victims. However, given that the NRA is excessively powerful in U.S. politics, I can see why they are necessary, but it just seems easier to me to stop kids from being able to get hold of guns (e.g. have some rudimentary screening for gun purchases and require owners to keep them in locked cabinets when they are not in use).
Why do people say such unsubstantiated nonsense? Places with guns have more deaths. And it's obvious to see why: guns are a tool for killing, and they're pretty effective.
FWIW, I'm a consultant for a large University hospital, and Dutch. My PhD thesis, years ago, got the remark: "Should have checked with a native speaker."
So now I use ChatGPT to check my English. I just write what I want to write, then ask it to make my text "more concise, business-like and not so American" (yeah, the thing is by default as ultra-enthusiastic as an American waiter). And 9 times out of 10 it says what I want to say, but better than I wrote it myself, in far fewer words and better English.
I don't think it took less time to write my report, but it is much, much better than I could have made alone.
An AI detector may go off (or does it go on? or is it off? Idk, perhaps I should ask Chat ;)), but it is about as useful as a spell-check detector.
It's a Large Language Model, so use it as exactly that; it is not a Large Fact Model. But if you're a teacher you should be a good bullshit detector, right?
If I'm ever checking some student's report, you may get this feedback: for God's sake, check the language with ChatGPT, but for God's sake check the facts some other way.
My daughter was accused of turning in an essay written by AI because the school software at her online school said so. Her mom watched her write the essay. I thought it was common knowledge that it was impossible to tell whether text was generated by AI. Evidently, the software vendors are either ignorant or are lying, and school administrators are believing them.
I expect there will be some legal disputes over this kind of thing pretty soon. As another comment pointed out: run the AI-detection software on essays from before ChatGPT was a thing to see how accurate these are. There's also the problem of autists having their essays flagged disproportionately, so you're potentially looking at some sort of civil rights violation.
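One way to see the scale of the problem: even a modest per-essay false-positive rate compounds quickly across a class. The 1% figure below is an assumption for illustration, not a measured rate for any real detector.

```python
# Chance that a class of 30 honest students produces at least one
# false accusation, assuming (hypothetically) a 1% per-essay
# false-positive rate and independent errors.
fp_rate = 0.01
essays = 30
p_at_least_one = 1 - (1 - fp_rate) ** essays
print(f"{p_at_least_one:.0%}")  # roughly 26%
```

Repeated over every assignment in a semester, a wrongful flag becomes nearly certain somewhere in the class, which is exactly the kind of situation that invites legal disputes.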
Imagine how little common knowledge there will be one or two generations down the road after people decide they no longer need general thinking skills, just as they've already decided calculators free them from having to care about arithmetic skills.
At least the first 2 are far more accurate than humans ever could be. The third, i.e. trusting others to vet and find the correct information, is the problem.
GPS is great at knowing where you are, but directions are much much harder, and the extra difficulty is why the first version of Apple Maps was widely ridiculed.
Even now, I find it's a mistake to just assume Google Maps can direct me around Berlin public transport better than my own local knowledge — sometimes it can, sometimes it can't.
(But yes, a single original Pi Zero beats all humans combined at arithmetic even if all of us were at the level of the world record holder).
Why? We've done it for ages, most trust in Wikipedia, and before most trusted in encyclopedias. Books written by others have been used forever. We just shift where we place the trust over time.
I just googled ‘do I need a license to drive a power boat in UK’
I got AI answer saying ‘no’, but actually you do.
If I use a calculator it will be correct. If I open an encyclopaedia it will mostly be correct, because someone with a brain did at least 5 minutes of thinking.
We are not talking about some minor detail, AI makes colossal errors with great confidence and conviction.
And yet, this fear is timeless; back when book printing was big, people were fearmongering that people would no longer memorize things but rely too much on books. But in hindsight it ended up becoming a force multiplier.
I mean I'm skeptical about AI as well and don't like it, but I can see it becoming a force multiplier itself.
> people were fearmongering that people would no longer memorize things but rely too much on books...
Posters here love to bring out this argument, but I think a major weakness is that those people wound up being right. People don't memorize things any more! I don't think it's fair to hold out as an example of fears which didn't come to pass, as they very much did come to pass.
It's more insidious than that. AI will be used as a liability shield/scapegoat, so will become more prevalent in the workplace. So in order to not be homeless, more people will be forced to turn their brains off.
AI does have things it does consistently wrong. Especially if you don't narrow down what it's allowed to grab from.
The easiest for someone here to see is probably code generation. You can point at parts of it and go "this part is from a high-school level tutorial", "this looks like it was grabbed from college assignments", and "this is following 'clean code' rules in silly places"(like assuming a vector might need to be Nd, instead of just 3D).
The education system in the US is broadly staffed by the dumbest people from every walk of life.
If they could make it elsewhere, they would.
I don’t expect this to be a popular take here, and most replies will be NAXALT fallacies, but in aggregate it’s the truth. Sorry, your retired CEO physics teacher who you loved was not a representative sample.
It's not just the USA; it's pretty much universal, as far as I've seen. People like to pretend it's some sort of noble profession, but I vividly remember a conversation with recently graduated ex-classmates, where one of them was complaining that she had failed to get in at every department she applied to, so she had no choice but to apply to the department of education (I guess? I don't know the name of the American equivalent: a bachelor-level program for people who are going to be teachers). At that moment I felt suddenly validated in all my complaints about the system we had just passed through.
I went to public schools in middle class neighborhoods in California from the late sixties to the early eighties. My teachers were largely excellent. I think that was due to cultural and economic factors - teaching was considered a profession for idealistic folks to go into at the time and the spread between rich and poor was less dramatic in the 50s and 60s (when my teachers were deciding their professions). So the culture made it attractive and economics made it possible. Another critical thing we seem to have lost.
For hundreds of years, women could have amazing opportunities by pursuing a religious vocation, get fantastic education in their religious order, and then enjoy a fulfilling life-long ministry in health care, education, social services, etc. All her material and spiritual needs would be provided by her community. For life. Not merely until retirement. Until she died.
Furthermore, young lay women could start out as teachers, which is a fantastic way to learn how to care for young children, and when such a seasoned teacher would eventually marry and begin her childbearing years, she was quite well-prepared to care for children of her own.
Nowadays, fewer and fewer women know how to be homemakers, mothers, or wives, and so they just want to go straight into STEM and/or "girlboss" type stuff. Any woman who actually wishes to care for children, or educate them, is perceived as weak and reactionary.
What I take from this is that you don't like reading much history, with the clear exception of overly optimistic religious texts. A religious vocation frequently put you in a pretty abusive situation, and the #1 expectation was obedience. That was what you were supposed to do, primarily. Not exactly what the person you are responding to is writing about.
Moreover, women never needed to start out as teachers to "be ready for childcare". Childcare expectations were much lower at the time, but the amount of chores at home was massively higher.
Sounds like a self-fulfilling prophecy. We educate everyone to be the smartest person in the class, and then we don't have jobs for them. And then we complain that education is not good enough. Shouldn't we conclude that education is already a bit too good?
Let's test your skills as a plagiarism detector. Below are two paragraphs. One of them was written by an LLM, one by a human. I have only altered whitespace in order to make them scan the same. Can you tell which is which? How much would you bet that you are correct?
A.
The Pathfinder and The Deerslayer stand at the head of Cooper's novels
as artistic creations. There are others of his works which contain
parts as perfect as are to be found in these, and scenes even more
thrilling. Not one can be compared with either of them as a finished
whole. The defects in both of these tales are comparatively slight.
They were pure works of art.
B.
The Pathfinder and The Deerslayer stand at the head of Cooper's novels
as artistic creations. There are others of his works which contain
parts as perfect as are to be found in these, and scenes even more
thrilling. Not one can be compared with either of them as a finished
whole. The defects in both of these tales are comparatively slight.
They were pure works of art.
It's kinda nuts how quickly adults learned to trust some random algorithm. They don't know how it works, they cannot explain it, they don't care, it just works. It's magic. If it says you cheated, you cheated. You cannot do anything about it.
I want to emphasize that this isn't really about trusting magic; it's about people nonchalantly doing ridiculous stuff nowadays and apparently not being held accountable for it. For example, there was a time back at school when I was "accused" of cheating, because it was the one time I actually liked the homework in some class and took it seriously, and it was kind of insulting to hear that there was absolutely no way I did it. But I still got my mark, because it doesn't matter what she thinks if she cannot prove it, so please just sign it and fuck off; it's the last time I'm doing my homework for your class anyway.
By contrast, if this article is to be believed, these teachers don't have to prove anything; the fact that a coin flipped heads is considered enough proof. And everyone supposedly treats this as if it's OK. "Well, they have this system at school, what can we do!" It's crazy.
> They don't know how it works, they cannot explain it, they don't care, it just works. It's magic. If it says you cheated, you cheated. You cannot do anything about it.
People trust a system because other people trust a system.
It does not matter if the system is the Inquisition looking for witches, a machine, or the Gulag in the USSR.
The system said you are guilty. The system can’t be wrong.
That's how you can mold society as you like at your level: this student's older sibling was a menace? Let's fuck them over, being shitty must run in the family. You don't like the race / gender / sexuality of a student? Now "chatGPT" can give you an easy way to make their school life harder.
Frankly, this study is not very good. Before ChatGPT there were Davinci and the other model families on which ChatGPT (what became GPT-3.5) was ultimately based, and they are the predecessors of today's most capable models. They should test on work that is at least 10 to 15 years old to avoid this problem.
Turns out we spent way too long thinking about how machines could beat the Turing test, and not long enough thinking about how we could build better Turing tests.
My perspective after talking to a few colleagues in the CS education sector, and based on my own pre-GPT experience:
Classifiers sometimes produce false positives and false negatives. This is not news to anyone who has taken a ML module. We already required students back then to be able to interpret the results they were getting to some extent, as part of the class assignment.
Even before AI detectors, when Turnitin "classic" was the main tool along with JPlag and the like, if you were doing your job properly you would double-check any claims the tool produced before writing someone up for misconduct. AI detectors are no different.
That said, you already catch more students than you would think just by going for the fruit hanging so low it's practically touching the ground:
- Writing or code that's identical for a large section (half a page at least) with material that already exists on the internet. This includes the classic copy-paste from wikipedia, sometimes with the square brackets for references still included.
- You still have to check that the student hasn't just made their _own_ git repo public by accident, but that's a rare edge case. But it shows that you always need a human brain in the loop before pushing results from automated tools to the misconduct panel.
- Hundreds of lines of code that are structurally identical (up to tabs/spaces, variable naming, sometimes comments) with code that can already be found on the internet ("I have seen this code before" from the grader flags this up as least as often as the tools).
- Writing that includes "I am an AI and cannot make this judgement" or similar.
- Lots of hallucinated references.
That's more than enough to make the administration groan under the number of misconduct panels we convene every year.
The future in this corner of the world seems to be a mix of
- invigilated exams with no electronic devices present
- complementing full-term coding assignments with the occasional invigilated test in the school's coding lab
- students required to do their work in a repo owned by the school's github org, and assessing the commit history (is everything in one big commit the night before the deadline?). This lets you grade for good working practices/time management, sensible use of branching etc. in team projects, as well as catching the more obvious cases of contract cheating.
- viva voce exams on the larger assignments, which apart from catching people who have no idea of their own code or the language it was written in, allows you to grade their understanding ("Why did you use a linked list here?" type of questions) especially for the top students.
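The commit-history check mentioned above is easy to prototype. Here's a toy sketch in Python, assuming the commit timestamps have already been extracted (e.g. via `git log`); the 12-hour window and the function name are arbitrary choices for illustration:

```python
from datetime import datetime, timedelta

def last_minute_share(commit_times, deadline, window_hours=12):
    """Fraction of commits landing within `window_hours` of the deadline."""
    if not commit_times:
        return 1.0  # no history at all is itself a red flag
    cutoff = deadline - timedelta(hours=window_hours)
    late = sum(1 for t in commit_times if t >= cutoff)
    return late / len(commit_times)

deadline = datetime(2024, 5, 1, 23, 59)
# Steady work: one commit a day at 18:00 through most of April
steady = [datetime(2024, 4, d, 18, 0) for d in range(10, 30)]
# Rushed work: three commits on the evening of the deadline
rushed = [datetime(2024, 5, 1, 22, m) for m in (0, 15, 30)]

print(last_minute_share(steady, deadline))  # 0.0 - work spread over weeks
print(last_minute_share(rushed, deadline))  # 1.0 - everything at the last minute
```

A real grading pipeline would of course also weight commits by size and look at branching patterns, but even this crude ratio separates the two histories above cleanly.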
Seems like the easy fix here is move all evaluation in-class. Are schools really that reliant on internet/computer based assignments? Actually, this could be a great opportunity to dial back unnecessary and wasteful edu-tech creep.
Moving everything in class seems like a good idea in theory. But in practice, kids need more time than 50 minutes of class time (assuming no lecture) to work on problems. Sometimes you will get stuck on 1 homework question for hours. If a student is actively working on something, yanking them away from their curiosity seems like the wrong thing to do.
On the other hand, kids do blindly use the hell out of ChatGPT. It's a hard call: teach to the cheaters or teach to the good kids?
I've landed on making take-home assignments worth little and making exams worth most of their grade. I'm considering making homework worth nothing and having their grade be only 2 in-class exams. Hopefully that removes the incentive to cheat. If you don't do homework, then you don't get practice, and you fail the two exams.
(Even with homework worth little, I still get copy-pasted ChatGPT answers on homework by some students... the ones that did poorly on the exams...)
> If you don't do homework, then you don't get practice, and you fail the two exams.
I'd be cautious about that, because it means the kids with undiagnosed ADHD who are functionally incapable of studying without enforced assignments will just completely crash and burn without absorbing any of the material at all.
Or, at least, that's what happened to me in the one and only pre-college class I ever had where "all work is self-study and only the tests count" was the rule.
There are more students than ever, and lots of schools now offer remote programs, or just remote options in general for students, to accommodate for the increased demand.
There's little political will to revert to the old ways, as it would drive up the costs. You need more space and you need more workers.
That overall would be the right thing. Homework is such a weird concept when you think about it. Especially if you get graded on the correctness. There is no step between the teacher explaining and you validating whether you understood the material.
Teacher explains material, you get homework about the material and are graded on it.
It shouldn't be like that. If the work (i.e. the exercises) are important to grasp the material, they should be done in class.
> If the work (i.e. the exercises) are important to grasp the material, they should be done in class.
I'd like to offer what I've come to realize about the concept of homework. There are two main benefits to it: [1] it could help drill in what you learned during the lecture and [2] it could be the "boring" prep work that would allow teachers to deliver maximum value in the classroom experience.
Learning simply can't be confined in the classroom. GP suggestion would be, in my view, detrimental for students.
[1] can be done in class but I don't think it should be. A lot of students already lack the motivation to learn the material by themselves and hence need the space to make mistakes and wrap their heads around the concept. A good instructor can explain any topic (calculus, loops and recursion, human anatomy) well and make the demonstration look effortless. It doesn't mean, however, that the students have fully mastered the concept after watching someone do it really well. You only start to learn it once you've fluffed through all the pitfalls at least mostly on your own.
[2] can't be done in class, obviously. You want your piano teacher to teach you rhythm and musical phrasing, hence you better come to class already having mastered notation and the keyboard and with the requisite digital dexterity to perform. You want your coach to focus on the technical aspects of your game, focus on drilling you tactics; you don't want him having to pace you through conditioning exercises---that would be a waste of his expertise. We can better discuss Hamlet if we've all read the material and have a basic idea of the plot and the characters' motivations.
That said, it might make sense simply not to grade homework. After all, it's the space for students to fail. Unfortunately, if it weren't graded, a lot of students would just skip it.
Ultimately, it's a question of behavior, motivation, and incentives. I agree that the current system, even pre-AI, could only barely live up to ideals [1] and [2] but I don't have any better system in mind either, unfortunately.
> you don't want him having to pace you through conditioning exercises---that would be a waste of his expertise
I fundamentally disagree. I vividly remember, many times during maths homework for example, realising that I was stuck, that I hadn't understood something explained earlier, and that I needed to ask someone. For me, my parents were able to help. But later in high school, when you get to differential equations, they no longer could. And obviously if your parents are poorly educated, they can't help at all.
Second point: there is no feedback loop this way. A teacher should see how difficult their homework is, how much time students spend on it, and why they are struggling. Marking a piece of paper does not do that. There was wild inconsistency between teachers in how much homework they would set and how long they thought it would take students.
Lastly, school plus homework should fit all the required learning within one working day. It is, after all, also a form of childcare while parents work.
Out-of-class evaluations don't have to mean electronic. They could be problem sets, essays, or longer-form things like projects. All of these are difficult to do in a limited time window.
These limited time-window assessments are also (a) artificial (don't always reflect how the person might use their knowledge later) (b) stressful (some people work better/worse with a clock ticking) and (c) subject to more variability due to the time pressure (what if you're a bit sick, or have had a bad day or are just tired during the time window?).
It could also be hybrid, with an out-of-class and an in-class components. There could even be multiple steps, with in-class components aimed at both verifying authorship and providing feedback in an iterative process.
AI makes it impossible to rely on out-of-class assignments to evaluate the kids' knowledge. How we respond to that is unclear, but relying on cheating detectors is not going to work.
Yep. The solutions which actually benefit education are never expensive, but require higher quality teachers with less centralized control:
- placing less emphasis on numerical grades to disincentivize cheating (hard to measure success)
- open response written questions (harder to teach, harder to grade)
- reading books (hard to determine if students actually did it)
- proof based math (hard to teach)
Instead we keep imagining more absurd surveillance systems “what if we can track student eyes to make sure they actually read the paragraph”
Totally agree. More time spent questioning students about their work would make AI detection unnecessary...
But somehow, we don't trust teachers anymore. Those in power want to check that the teacher is actually doing their job, so they want some written, reviewable proof... So the grades are there to control both the student and the teacher. WWW (What a wonderful world).
The only long-term solution that makes sense is to allow students to use AI tools and to require that the tool provide a log of the interaction. Adjust the assignment accordingly and use custom system prompts so that students are both learning the underlying subject and learning how to use AI tools effectively.
I'd expect smart people to be able to use tools to make their work easier. Including AI. The bigger picture here is that the current generation of students are going to be using and relying on AI the rest of their careers anyway. Making them do things the old fashioned way is not a productive way to educate them. The availability of these tools is actually an opportunity to raise the ambition level quite a bit.
Universities and teachers will need to adjust to the reality that this stuff is here to stay. There's some value in learning how to write properly, of course. But there are other ways of doing that. And some of those ways actually involve using LLMs to criticize and correct people's work instead of having poor teachers do that.
I did some teaching while I was doing a post doc twenty years ago. Reviewing poorly written student reports isn't exactly fun and I did a fair bit of that. But it strikes me how I could use LLMs to do the reviewing for me these days. And how I could force my students to up their standards of writing.
These were computer science students. Most of them were barely able to write a coherent sentence. The bar for acceptable was depressingly low. Failing 90% of the class was not a popular option with either students or staff. And it's actually hard work reviewing poorly written garbage. And having supported a few students with their master thesis work, many of them don't really progress much during their studies.
If I were to teach that class now, I would encourage students to use all the tools available to them. Especially AI. I'd set the bar pretty high.
We may well need to invent new mechanisms for teaching, but I don't expect that to appear overnight.
The point of essays is not to have essays written. The teacher already knows. The point is to practice putting together a coherent thought. The process, not the product, is the goal.
Eventually we'll come up with a way to demonstrate that along with, rather than despite, AI. But for the moment we have machines that can do the assignment much better than students can, and the students won't get any better if they let the machine do all of the work.
> We may well need to invent new mechanisms for teaching,
For additional context: the short essay as an evaluation tool is very much an Anglo-Saxon university form factor.
There are several other traditions in the world, in particular the Latin/Francophone school of thought with its old 'cathedra'-style universities, where students are subjected to written exams only, or historically (less so nowadays) also oral exams (oratory, not dental).
Aren't they making it precisely because their customers don't check whether it works and still buy it, probably for very decent money? And always remember the buyers are not the end users (the teachers or students) but the administrators. And for them, being seen to do something about the risk of AI is more important than actually doing anything about it.
The companies selling these aren’t “spending so much developing the technology”. They’re following the same playbook as snake oil salesmen and people huckstering supplements online do: minimum effort into the product, maximum effort into marketing it.
I don't know what these 'students' are doing, but it's not very hard to prompt a system into not using the easily detectable 'ai generated' language at all. Also adding in some spelling errors and uncapping some words (like ai above here) makes it more realistic. But just adding an example of how you write and telling it to keep your vocabulary and writing some python to post process it makes it impossible to detect ai for humans or ai detectors. You can also ask multiple ais to rewrite it. Getting an nsfw one to add in some 'aggressive' contrary position also helps as gpt/claude would not do that unless jailbroken (which is whack-a-mole).
That sounds like almost the same level of effort as just writing it yourself. Or getting the AI to write a draft and then quickly rewriting it. Humans are lazy, students especially so.
When I look around in the shared open workspace I am in currently for a meeting, everyone (programmers, PR, marketing) has Claude/GPT/Perplexity on their screen. 100% of the people here. So I guess this will not be limited to students.
As an engineering major who was forced to take an English class, I will say that on many occasions I purposely made my writing worse, in order to prevent suspicion of AI use.
If teachers can't tell and need AI to detect it, why does it matter? If their skill and knowledge in a field can't tell them when someone is faking it, are we perhaps putting too much weight on their abilities at all?
I'm surprised at the number of comments that give up and say that "AI" is here to stay.
I'm also surprised that academics rely on snake oil software to deal with the issue.
Instead, academics should unite and push for outlawing "AI" or make it difficult to get like cigarettes. Sometimes politicians still listen to academics.
It is probably not going to happen though since the level of political apathy among academics is unprecedented. Everyone is just following orders.
I can't think of a single time that we've ever willingly put down a technology that a single person could deploy and appear to be highly productive. You may as well try to ban fire.
Looking at some of the most successful historical pushbacks against technology, taxes and compensation for displaced workers is about as much as we can expect.
Even trying to put restrictions on AI is going to be very practically challenging. But I think the most basic of restrictions like mandating watermarks or tracing material of some kind in it might be possible and really that might do a lot to mitigate the worst problems.
We had a time when CGI took off, where everything was too polished and shiny and everyone found it uncanny. That started a whole industry to produce virtual wear, tear, dust, grit and dirt.
I wager we will soon see the same for text. Automatic insertion of the right amount of believable mistakes will become a thing.
You can already do that easily with ChatGPT. Just tell it to rate the text it generated on a scale from 0-10 in authenticity. Then tell it to crank out similar text at a higher authenticity scale. Try it.
The challenging thing is, cheating students also say they're being falsely accused. Tough times in academia right now. Cheating became free, simple, and ubiquitous overnight. Cheating services built on top of ChatGPT advertise to college students; Chrome extensions exist that just solve your homework for you.
I don’t know how to break this to you, but cheating was always free, simple, and ubiquitous. Sure, ChatGPT wouldn’t write your paper; but your buddy who needed his math problem solved would. Or find a paper on countless sites on the Internet.
That's just not so. Most profs were in school years before the internet was ubiquitous. And asking a friend to do your work for you is simple, but far from free.
Rather than flagging whether it's AI, why don't we flag whether it's good?
I work with people in their 30s who cannot write their way out of a paper bag. Who cares if the work is AI-assisted or not? Most AI writing is super dry, formulaic, and bad. If the student doesn't recognize this, give them a poor mark for terrible style.
Traditional school work has rewarded exactly the formulaic dry ChatGPT language, while the free thinking, explorative and creative writing that humans excel at is at best ignored, more commonly marked down for irrelevant typos and lack of the expected structure and too much personality showing through.
Because judging the quality of "free thinking" outside of STEM is incredibly subjective, biased by whoever is doing the judging, and can even get you in trouble for wrongthink (try debating the Israel vs. Palestine issue and see). That's why many school systems have converged on standardized boilerplate slop that is easy to judge by people of average intellect and training and, most importantly, easy to game by students, so that it's less discriminatory by race, religion, or socioeconomic background.
Because sometimes an exercise is supposed to be done under conditions that don't represent the real world. If an exam is calculator-free, you can't just use a calculator anyway on the grounds that you'll have one at work. If the assignment is "write a text about XYZ, without using AI assistance", then using an AI is cheating. Cheating should have worse consequences than writing bad stuff yourself, so detecting AI (or simply not having unsupervised assignments) is still important.
Because often the goal of assessing a student is not to show that they can generate output. It is to ensure they have retained a sufficient amount of the knowledge they are supposed to take from the course and can reproduce it in a sufficiently readable format.
Being able to generate good text is an entirely separate evaluation. And AI might have a place there.
This is not something that reveals how bad AI is or how dumb administration is. It's revealing how fundamentally dumb our educational system is. It's incredibly easy to subvert. And kids don't find value in it.
Helping kids find value in education is the only important concern here and adding an AI checker doesn't help with that.
> Helping kids find value in education is the only important concern here and adding an AI checker doesn't help with that.
Exactly. It also does the complete opposite. It teaches kids from fairly early on that their falsely flagged texts might as well be just written with AI, further discouraging them from improving their writing skills. Which are still just as useful with AI or not.
We should have some sort of time constrained form of assessment in a controlled environment, free from access to machines, so we can put these students under some kind of thorough examination.
(“Thorough examination” as a term is too long though — let’s just call them “thors”.)
—
In seriousness the above only really applies at University level, where you have adults who are there with the intention to learn and then receive a final certification that they did indeed learn. Who cares if some of them cheat on their homework? They’ll fail their finals and more fool them.
With children though, there’s a much bigger responsibility on teachers to raise them as moral beings who will achieve their full potential. I can see why high schools get very anxious about raising kids to be something other than prompt engineers.
>there’s a much bigger responsibility on teachers to raise them as moral beings who will achieve their full potential.
There's nothing moral about busywork for busywork's sake. If their entire adult life they'll have access to AI, then school will prepare them much better for life if it lets them use AI and teaches them how to use it best and how to do the things AI can't do.
Are any students coming up with a process to prove their innocence when they get falsely accused?
If I were still in school, I would write my documents in a Google Doc, which preserves the edit history. I could also record video of myself typing the entire document, or screen-record the session.
> After her work was flagged, Olmsted says she became obsessive about avoiding another accusation. She screen-recorded herself on her laptop doing writing assignments. She worked in Google Docs to track her changes and create a digital paper trail. She even tried to tweak her vocabulary and syntax. “I am very nervous that I would get this far and run into another AI accusation,” says Olmsted, who is on target to graduate in the spring. “I have so much to lose.”
I don't think there's any real way around the fundamental flaw of such systems assuming there's an accurate way to detect generated text, since even motivated cheaters could use their phone to generate the text and just iterate edits from there, using identical CYA techniques.
That said, I'd imagine if someone resorts to using generative text their edits would contain anomalies that someone legitimately writing wouldn't have in terms of building out the structure/drafts. Perhaps that in itself could be auto detected more reliably.
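As a sketch of what such an edit-history check might look like: genuine drafting tends to show many small incremental revisions, while pasted generated text arrives in one or two large insertions. The thresholds and function name below are invented for illustration, not taken from any real detector:

```python
def looks_pasted(revision_sizes, big_jump=500):
    """Toy heuristic: flag a document history where most of the text
    arrived in a single large insertion rather than incremental edits.
    revision_sizes: characters added in each saved revision."""
    total = sum(revision_sizes)
    if total == 0:
        return False
    biggest = max(revision_sizes)
    # Flag only if one revision was both large in absolute terms
    # and accounts for the overwhelming majority of the text.
    return biggest >= big_jump and biggest / total > 0.8

print(looks_pasted([40, 35, 60, 20, 55, 30]))  # False - steady drafting
print(looks_pasted([10, 2400, 15]))            # True - one giant paste
```

A determined cheater who retypes generated text by hand would still evade this, which is exactly the fundamental flaw described above.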
All of that still wouldn't prove that you didn't use any sorta LLM to get it done. The professor could just claim you used ChatGPT on your phone and typed the thing in, then changed it up a bit.
The article demonstrates that good, simple prose is being flagged as AI-generated. Reminds me of a misguided junior high English teacher that half-heartedly claimed I was a plagiarist for including the word "masterfully" in an essay, when she knew I was too stupid to use a word like that. These tools are industrializing that attitude and rolling it to teachers that otherwise wouldn't feel that way.
That would be a pretty sad outcome. In my high school we did both in-class essays and homework essays. The former were always more poorly developed and more poorly written. IMO students still deserve practice doing something that takes more than 45 minutes.
I've heard some students are concerned that any text submitted to an AI detector is automatically added to training sets and will therefore eventually be flagged as AI.
The problem is that professors want a test with high sensitivity and students want a test with high specificity and only one of them is in charge of choosing and administering the test. It's a moral hazard.
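The tension is easy to make concrete with Bayes' rule. The sensitivity, specificity, and prevalence figures below are illustrative, not taken from any real product:

```python
def ppv(sensitivity, specificity, prevalence):
    """Probability that a flagged essay is actually AI-written
    (positive predictive value, via Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# A detector claiming 90% sensitivity and 95% specificity,
# in a class where 10% of submissions are AI-written:
print(round(ppv(0.90, 0.95, 0.10), 2))  # 0.67
```

Even with those generous assumed numbers, a third of all accusations would hit innocent students, which is the specificity side of the trade-off that students care about and the test's administrators don't directly bear.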
No. Professors want students that don’t cheat so they never have to worry about it.
This is an ethics problem (people willing to cheat), a multicultural problem (different expectations of what constitutes cheating), and an incentive problem (credentialism makes cheating worth it).
Those are hard problems. So a little tech that might scare students and give the professor a feeling of control is a band aid.
The article mentions 'responsible' Grammarly usage, which I think is an oxymoron in an undergraduate or high-school setting. Undergrad and high school are where you learn to write coherently. Grammarly actively works against that goal because it doesn't train students to fix their grammatical mistakes; it just fixes them, and they become steadily worse (and less detail-oriented) writers.
I have absolutely no problem using it in a more advanced field where the basics are already done and the focus is on research, for example, but at lower levels I'd likely consider it dishonest.
An alternative idea could be to use speech-to-text software. I'm not sure there are any easy-to-set-up local options. I tried one a while ago without investing much time in it, unlike some people who program using such a setup. The result was very underwhelming: punctuation worked badly and capitalization of words was non-existent, which would of course be a no-go for writing research papers.
So if anyone knows a good tool that is flexible enough to support proper writing and able to run locally, hints are appreciated.
>It doesn’t cause her to be any less attentive to her writing; it just makes it possible to write.
I was not really referring to accommodations under the ADA. For people that do not require accommodations, the use of them is unfair to their classmates and can be detrimental to their ability to perform without them in the future, as there is no requirement to have the accommodations available to them. This is not the case for someone with dyslexia.
Fair, I can see why it looks like I confused them. I was solely using her as an example; my point is that Grammarly hasn't made her knowledge of grammar worse, only better. It has taught her over time.
AI detectors do not work. I have spoken with many people who think that the particular writing style of commercial LLMs (ChatGPT, Gemini, Claude) is the result of some intrinsic characteristic of LLMs - either the data or the architecture.
The belief is that this particular tone of 'voice' (chirpy sycophant), textual structure (bullet lists and verbosity), and vocabulary ('delve', et al.) serves and will continue to serve as an easy identifier of generated content.
Unfortunately, this is not the case. You can detect only the most obvious cases of the output from these tools. The distinctive presentation of these tools is a very intentional design choice - partly by the construction of the RLHF process, partly through the incentives given to and selection of human feedback agents, and in the case of Claude, partly through direct steering via SA (sparse autoencoder activation manipulation). This is done for mostly obvious reasons: it's inoffensive, 'seems' to be truth-y and informative (qualities selected for in the RLHF process), and doesn't ask much of the user. The models are also steered to avoid having a clear 'point of view', agenda, point-to-make, and so on, characteristics which tend to identify a human writer. They are steered away from highly persuasive behaviour, although there is evidence that they are extremely effective at writing this way (https://www.anthropic.com/news/measuring-model-persuasivenes...). The same arguments apply to spelling and grammar errors, and so on. These are design choices for public-facing, commercial products with no particular audience.
An AI detector may be able to identify that a text has some of these properties in cases where they are exceptionally obvious, but fails in the general case. Worse still, students will begin to naturally write like these tools because they are continually exposed to text produced by them!
You can easily get an LLM to produce text in a variety of styles, some of which are dissimilar to normal human writing entirely, such as unique ones which are the amalgamation of many different and discordant styles. You can get the models to produce highly coherent text which is indistinguishable from that of any individual person with any particular agenda and tone of voice that you want. You can get the models to produce text with varying cadence, with incredible cleverness of diction and structure, with intermittent errors and backtracking, and anything else you can imagine. It's not super easy to get the commercial products to do this, but trivial to get an open source model to behave this way. So you can guarantee that there are a million open source solutions for students and working professionals that will pop up to produce 'undetectable' AI output. This battle is lost, and there is no closing Pandora's box. My earlier point about students slowly adopting the style of the commercial LLMs really frightens me in particular, because it is a shallow, pointless way of writing which demands little to no interaction with the text, tends to be devoid of questions or rhetorical devices, and in my opinion, makes us worse at thinking.
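To see how shallow this kind of 'detection' really is, here is a toy stylometric scorer: entirely illustrative, with arbitrary feature weights and a word list that is just my guess at the usual 'tells'. It flags only the most obvious default-voice output; a single style-transfer prompt defeats it completely:

```python
import re

# Words often claimed to be tells of commercial LLM output.
# This list is a guess for illustration, not a validated feature set.
TELL_WORDS = {"delve", "tapestry", "multifaceted", "furthermore", "moreover"}

def naive_ai_score(text: str) -> float:
    """Return a 0..1 'AI-ness' score from crude surface features.

    Demonstrates the approach real 'detectors' take; the weights are
    arbitrary, and the score is meaningless against adversarial input."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    tell_hits = sum(1 for w in words if w in TELL_WORDS)
    bullets = text.count("\n- ") + text.count("\n* ")
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg_len = len(words) / max(len(sentences), 1)
    # Arbitrary weighting of tell-words, bullet lists, and sentence length.
    return min(1.0, 0.1 * tell_hits + 0.05 * bullets + 0.01 * avg_len)
```

Anything built on features like these fails in the general case for exactly the reasons above: the features are products of a default configuration, not of the underlying model.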
We need to search for new solutions and new approaches for education.
> We need to search for new solutions and new approaches for education.
Thank you for that and for everything you wrote above it. I completely agree, and you put it much better than I could have.
I teach at a university in Japan. We started struggling with such issues in 2017, soon after Google Translate suddenly got better and nonnative writers became able to use it to produce okay writing in English or another second language. Discussions about how to respond continued among educators—with no consensus being reached—until the release of ChatGPT, which kicked the problem into overdrive. As you say, new approaches to education are absolutely necessary, but finding them and getting stakeholders to agree to them is proving to be very, very difficult.
I'm returning to complete a single class: the writing requirement. It's not that bad. You just run your paper through a 3rd party AI checker beforehand and then cross your fingers and hit submit. You're probably at lower risk than people who don't check. You don't have to outrun the bear, just your fellow students.
A student I know texted me: the AI detector kept falsely flagging his work. "This is how I write!" I gave him some tips to sound less like AI, which is funny because we train AI with RLHF to sound more and more like humans.
Most of the ChatGPT type systems have a rather blah default style. That's what you learn as a non-native speaker of the language. Thus the problem for people who learned English from textbooks.
Amusingly, you can push ChatGPT type systems into other styles of writing.
I put in the preamble to the US constitution and asked for different styles:
Modern:
We, the people of the United States, come together to build a stronger, more united country. We want to create fairness for everyone, keep peace at home, ensure our safety, support each other’s well-being, and protect our freedoms for ourselves and future generations. This is why we are establishing this Constitution for the United States.
Gun nut:
We the People of the United States, rallying together to create a stronger, more unified nation, are all about upholding justice, keeping the peace at home, and defending our rights. We’re here to support one another, ensure our freedoms, and safeguard the blessings of liberty for ourselves and future generations. That’s why we’re putting this Constitution in place for the United States of America—because we believe in freedom, security, and our right to bear arms!
Woke:
We, the people of the United States, are coming together to create a more inclusive and equitable society. Our goal is to establish justice for all, promote peace within our communities, ensure safety for everyone, and uplift the well-being of every individual. We are committed to protecting our freedoms and rights, not just for ourselves, but for future generations. This is why we are enacting this Constitution for the United States of America, to foster a nation that values diversity, equality, and the dignity of all.
Christian:
We, the people of the United States, gather together in faith and unity to build a more perfect nation under God. Our purpose is to establish justice and promote peace in our homes and communities. We strive to protect our common security and care for one another’s well-being, guided by the principles of love and compassion. In doing so, we seek to uphold the blessings of liberty that God has granted us, not just for ourselves, but for future generations. This is why we commit to this Constitution for the United States of America, as we seek to honor His will in all that we do.
If you're having trouble with essays being detected as AI, you can run them through an LLM to rewrite them in a different tone.
We, the people of the United States, unite to forge a more harmonious society, guided by the principles of collective strength and shared responsibility. Our aim is to establish justice and maintain social stability, ensuring the well-being of all citizens through solidarity and mutual support. By safeguarding our national defense and promoting the common welfare, we embody the spirit of unity and progress. This commitment to our foundational values is why we establish this Constitution for the United States of America, as we work toward a prosperous future for all.
MAGA:
We, the great people of the United States, are coming together to create a stronger, more united nation. Our mission is to establish true justice, ensure peace in our communities, and defend our country with pride. We’re all about promoting the well-being of hard-working Americans and securing our freedoms for ourselves and future generations. That’s why we’re affirming this Constitution for the United States of America—because we believe in making our nation great and preserving our rights for all!
Martin Luther King:
We, the people of the United States, come together in a spirit of unity and purpose, striving to create a more perfect union. It is our sacred duty to establish justice and ensure tranquility in our communities, uplifting one another in love and understanding. We must provide for the common defense and promote the welfare of all, recognizing that true freedom is a collective journey. In this endeavor, we seek to secure the blessings of liberty for ourselves and future generations. Therefore, we affirm this Constitution for the United States of America as a testament to our commitment to equality, justice, and the enduring hope for a brighter tomorrow.
Trump:
We, the amazing people of the United States, are coming together to build a tremendous nation, folks. Our goal? To create a more perfect union—believe me, it’s going to be fantastic! We’re establishing real justice, keeping our communities safe, and defending our great country. We’re promoting the welfare of all hardworking Americans and securing our incredible freedoms for ourselves and our future generations. That’s why we’re putting this Constitution in place for the United States of America—because we’re making America great again, and nobody does it better!
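The style-rewriting trick above needs nothing more than a system prompt. A hedged sketch of the request payload follows; the model name is a placeholder, and the payload shape is the common chat-completions format rather than any one vendor's required schema:

```python
def style_rewrite_request(text: str, style: str) -> dict:
    """Build a chat-completion payload asking an LLM to rewrite `text`
    in the given `style`. Sending it is left to whatever client you use."""
    return {
        "model": "local-llm",  # placeholder; any chat-capable model works
        "messages": [
            {
                "role": "system",
                "content": (
                    f"Rewrite the user's text in the following style: {style}. "
                    "Preserve the meaning; change only tone and diction."
                ),
            },
            {"role": "user", "content": text},
        ],
        # Higher temperature encourages more stylistic variety.
        "temperature": 0.9,
    }
```

Swapping `style` for "Gun nut", "Woke", or "MAGA" reproduces the experiment above with any endpoint that accepts this message format.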
You mean model collapse, because schoolchildren will soon base their writing on the awful AI slop they have read online? That's fearsome, actually.
We are seeing this with Grammarly already: where a human would pick a nuanced word, Grammarly picks the beige alternative. The forerunner was the Plain English Campaign, which succeeded in getting official documents published in imprecise language at a primary-school reading level. It's awful.
This has nothing to do with AI, but rather with proof. If a teacher said to a student "you cheated," the student disputed it, and the teacher could produce no proof in front of the dean, of course the student would be absolved. Why is some random tool (AI or not) saying they cheated without proof suddenly taken as truth?
The AI tool's report shown to the dean with "85% match" will be used as "proof".
If you want more proof, then you can take the essay, give it to ChatGPT and say, "Please give me a report showing how this essay was written by AI."
I think what you pointed out is exactly the problem. Administrators apparently don’t understand statistics and therefore can’t be trusted to utilize the outputs of statistical tools correctly.
For an assignment completed at home, on a student's device, using software of the student's choosing, there can essentially be no proof. If the situation you describe becomes common, it might make sense for a school to invest in a web-based text editor that captures keystrokes and editor state, and to require that students use it for at-home text-based assignments.
That or eliminating take-home writing assignments--we had plenty of in-class writing when I went to school.
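If a school did capture keystroke logs, even a trivial server-side sanity check could distinguish typing from wholesale pasting. Here is a heuristic sketch with a made-up event format (timestamp, characters added) and an arbitrary paste threshold; a real system would need far more nuance (autocorrect, dictation, legitimate pastes of one's own notes):

```python
def plausible_typing(events: list[tuple[float, int]]) -> bool:
    """Sanity-check a captured keystroke log.

    `events` is a list of (timestamp_seconds, chars_added) pairs.
    Returns False when the log is internally inconsistent or when large
    blocks of text appear at once, which looks like pasting. Heuristic only."""
    last_t = None
    for t, chars in events:
        if last_t is not None and t < last_t:
            return False  # timestamps must be monotonically non-decreasing
        if chars > 50:
            return False  # >50 chars in a single event looks like a paste
        last_t = t
    return True
```

The point is not that this catches cheaters, but that captured process data gives a teacher something concrete to discuss, unlike a detector's opaque percentage.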
>For an assignment completed at home, on a student's device using software of a student's choosing, there can essentially be no proof
According to an undergraduate student who babysits for our child, some students are literally screen recording the entire writing process, or even recording themselves writing at their computers as a defense against claims of using AI. I don't know how effective that defense is in practice.
Police enforce the law. We aren’t discussing police; we are discussing universities. Some have their own police departments, but even those are beholden to the law, which is not the university’s to define.
A kid living in a wealthy Boston suburb used AI for his essay (that much is not in doubt) and the family is now suing the district because the school objected and his chances of getting into a good finishing school have dropped.
On the other hand you have students attending abusive online universities who are flagged by their plagiarism detector and they wouldn't ever think of availing themselves of the law. US law is for the rich, the purpose of a system is what it does.
I’m not sure what “used AI” means here, and the article is unclear, but it sure does sound like he did have it write it for him, and his parents are trying to “save his college admissions” by trying to say “it doesn’t say anywhere that having AI write it is bad, just having other people write it,” which is a specious argument at best. But again: gleaned from a crappy article.
You don’t need to be rich to change the law. You do need to be determined, and most people don’t have or want to spend the time.
Literally none of that changes the fact that the Universities are not, themselves, the law.
The law is unevenly enforced. My wife is currently dealing with a disruptive student from a wealthy family background. It's a chemistry class, you can't endanger your fellow students. Ordinarily, one would throw the kid out of the course, but there would be pushback from the family, and so she is cautious, let's deduct a handful of points, maybe she gets it, and thus it continues.
That could take months of nervous waiting and who-knows how many wasted hours researching, talking and writing letters. The same reason most people don't return a broken $11 pot, it's cheaper and easier to just adapt and move around the problem (get a new pot) rather than fixing it by returning and "fighting" for a refund.
Source? I was accused of a couple things (not plagiarism) at my university and was absolutely allowed to present a case, and due to a lack of evidence it was tossed and never spoken of again.
So no, you don’t exactly get a trial by a jury of your peers, but it isn’t like they are averse to evidence being presented.
This evidence would be fairly trivial to refute, but I agree it is a burden no student needs or wants.
It is no longer effective to solely use a written essay to measure how deeply a student comprehends a subject.
AI is here to stay; new methods should be used to assess student performance.
I remember being told at school that we weren't allowed to use calculators in exams. The line provided by teachers was that we could never rely on having a calculator when we need it most—obviously there's irony associated with having 'calculators' in our pockets 24/7 now.
We need to accept that the world has changed; I only hope that we get to decide how society responds to that change together .. rather than have it forced upon us.
Was this ever effective? There was a lot of essay copy/pasting when I was in school, and this was when essays had to be hand written (in cursive, of course, using a fountain pen!).
Same with homework. If everyone has to solve the same 10 problems, divide and conquer saves everyone a lot of time.
Of course, you're only screwing yourself because you'll negatively impact your learning, but that's not something you can easily convince kids of.
In person oral exams (once you get over the fear factor) work best, with or without (proctored!) prep time.
Maybe it doesn't scale as well, but education is important enough not to always require maximal efficiency.
> I only hope that we get to decide how society responds to that change together .. rather than have it forced upon us.
That basically never happens and the outcome is the result of some sort of struggle. Usually just a peaceful one in the courts and legislatures and markets, but a struggle nonetheless.
> new methods should be used to assess student performance.
Such as? We need an answer now because students are being assessed now.
Return to the old "viva voce" exam? Still used for PhDs. But that doesn't scale at all. Perhaps we're going to have to accept that and aggressively ration higher education by the limited amount of time available for human-to-human evaluations.
Personally I think all this is unpredictable and destabilizing. If the AI advocates are right, which I don't think they are, they're going to eradicate most of the white collar jobs and academic specialties for which those people are being trained and evaluated.
> Such as? We need an answer now because students are being assessed now.
When I was in university (Humanities degree), we had to do lots of mandatory essays throughout the year but they counted little towards your overall mark, maybe 10% iirc.
The majority of marks came from mid-year & end-of-year exams.
A simple change to negate AI is to not award any points for work outside exams — make it an optional chance to get feedback from lecturers. If students want to turn in work by AI, it's up to them.
> Such as? We need an answer now because students are being assessed now. Return to the old "viva voce" exam? Still used for PhDs. But that doesn't scale at all.
For a solution "now" to the cheating problem, regular exam conditions (on-site or remote proctoring) should still work more or less the same as they always have. I'd claim that the methods affected by LLMs are those that could already be circumvented by those with money or a smart relative to do the work for them.
Longer-term, I think higher-level courses/exams may benefit from focusing on what humans can do when permitted to use AI tools.
> Return to the old "viva voce" exam? Still used for PhDs. But that doesn't scale at all.
What? Sure it does. Every extra full-time student at Central Methodist University (from the article) means an extra $27,480 per year in tuition.
It's absolutely, entirely scalable to provide a student with ten 15-minute conversations with professors when that student is paying twenty-seven thousand dollars.
> Such as? We need an answer now because students are being assessed now.
My current best guess is to hand the student stuff that was written by an LLM and challenge them to find and correct its mistakes.
That's going to be what they do in their careers, unless the LLMs get so good they don't need to, in which case https://xkcd.com/810/ applies.
> Personally I think all this is unpredictable and destabilizing. If the AI advocates are right, which I don't think they are, they're going to eradicate most of the white collar jobs and academic specialties for which those people are being trained and evaluated.
Yup.
I hope the e/acc types are wrong, we're not ready.
> My current best guess is to hand the student stuff that was written by an LLM and challenge them to find and correct its mistakes.
Finding errors in a text is a useful exercise, but clearly a huge step down in terms of cognitive challenge from producing a high quality text from scratch. This isn't so much an alternative as it is just giving up on giving students intellectually challenging work.
> That's going to be what they do in their careers
I think this objection is not relevant. Calculators made pen-and-paper arithmetic on large numbers obsolete, but it turns out that the skills you build as a child doing pen-and-paper arithmetic are useful once you move on to more complex mathematics (that is, you learn the skill of executing a procedure on abstract symbols). Pen-and-paper arithmetic may be obsolete as a tool, but learning it is still useful. It's not easy to identify which "useless" skills are still useful as to learn as cognitive training, but I feel pretty confident that writing is one of them.
<< The grade was ultimately changed, but not before she received a strict warning: If her work was flagged again, the teacher would treat it the same way they would with plagiarism.
<< But that doesn't scale at all.
I realize that the level of effort for an oral exam is greater for both parties involved. However, the fact that it does not scale is largely irrelevant in my view. Either it evaluates something well or it does not.
And, since use of AI makes written exams almost impossible, this genuinely seems to be the only real test left.
> And, since use of AI makes written exams almost impossible
Isn't it easy to prevent students from using an AI if they are doing the exams in a big room? I mean, when I was a student, most of my exams were written with access to notes but no computers. Not that many resources are needed to control that...
Good point. I agree, but it goes back to some level of unwillingness to do this the 'old way'.
That is not to say there won't be cheaters (there always are), but that is what a proctor is for. And no, I absolutely hated the online proctor version. I swore I will never touch that thing again. And this may be the answer: people need to exercise their free will a little more forcefully.
An essay written under examination conditions is fine. We don't need new assessment techniques; we have known how to assess a student, and that student alone, for centuries.
In most cases that only tests a student's memory and handwriting ability while under pressure in a limited time.
You can't perform any research, compare conflicting sources, or engage in self-reflection.
That depends on the questions. There are also open-book exams. A viva is a type of exam, so I don't see why they are incompatible with assessing research.
My ability to write an essay under exam conditions is...poor. Thankfully there were less than a handful of essays I had to write as part of my undergraduate CS degree and I only remember one under exam conditions.
I think it's probably more concerning that spitting out the most generic mathematically formulaic bullshit on a subject is likely to get a decent mark. In that case what are we actually testing for?
Yeah we always did that in high school for essays that were actually graded, otherwise there's always the option of having someone else write it for you, human or now machine. The only thing that's changed is the convenience of it.
The problem is more with teachers lazily slapping an essay on a topic as a goto homework to eat even more of the already limited students' time with busywork.
This is mostly true, but it is also important to recognize that “hey just invent a new evaluation methodology” is a rough thing to ask people to do immediately. People are trying to figure it out in a way that works.
Sadly, this is not what is happening. Based on the article ( and personal experience ), it is clear that we tend to happily accept computer output as a pronouncement from the oracle itself.
It is new tech, but people do not treat it as such. They are not figuring it out; its results are already being imposed. It is sheer luck that the individual in question chose to fight back. And even then it was only a partial victory:
"The grade was ultimately changed, but not before she received a strict warning: If her work was flagged again, the teacher would treat it the same way they would with plagiarism."
The best method for assessing performance when learning is as old as the world: assess the effort, not how well the result complies with some requirements.
If the level of effort made is high, but the outcome does not comply in some way, praise is due. If the outcome complies, but the level of effort is low, there is no reason for praise (what are you praising? mere compliance?) and you must have set a wrong bar.
Not doing this fosters people with mental issues such as rejection anxiety, perfectionism, narcissism, defeatism, etc. If you got good grades at school with little actual effort and the constant praise for that formed your identity, you may be in for a bad time in adulthood.
A teacher's job is to determine the appropriate bar, estimate the level of effort, and help shape the effort applied in a way that improves the skill in question and the more general meta-skill of learning.
The issue of judging by the outcome is prevalent in some (or all) school systems, so we can say LLMs are mostly orthogonal to that.
However, even if that issue were addressed, for a number of skills the mere availability of ML-based generative tools makes it impossible to estimate the level of actual effort and to set the appropriate bar, and I do not see how that can be worked around. It's yet another negative consequence of making the sacred process of producing an amalgamation of other people's work (something we all do all the time; passing it through the lens of our consciousness is perhaps one of the core activities that make us human) available as a service.
Little Johnny who tried really hard but still can barely write a for loop doesn't deserve a place in a comp sci course ahead of little Timmy who for some reason thinks in computer code. Timmy might be a lazy arse but he's good at what he does and for minimal effort the outcomes are amazing. Johnny unfortunately just doesn't get it. He's wanted to be a programmer ever since he saw the movie Hackers but his brain just doesn't work that way. How to evaluate this situation? Ability or effort?
> The best method for assessing performance when learning is as old as the world: assess the effort, not how well the result complies with some requirements.
I am really quite confused about what you think the point of education is.
In general, the world (either the physical world or the employment world) does not care about effort, it cares about results. Someone laboriously filling their kettle with a teaspoon might be putting in a ton of effort, but I'd much rather someone else make the tea who can use a tap.
Why do we care about grades? Because universities and employers use them to quickly assess how useful someone is likely to be. Few people love biochemistry enough that they'd spend huge sums of money and time at university if it didn't help get them a job.
> I remember being told at school, that we weren't allowed to use calculators in exams
I remember being told the same thing, but I happen to believe that it was a fantastic policy with a lackluster explanation. The idea that you wouldn't have a calculator was obviously silly, even at the time, but the underlying observation, that relying on the calculator would rob you of the mental exercise the whole ordeal was supposed to be, was accurate. The problem is that you can't explain to a room full of 12-year-olds that math is actually beautiful and that the systems principles it imparts fundamentally shape how you view the world.
The same goes for essays. I hated writing essays, and I told myself all sorts of weird copes about how I would never need to write an essay. The truth, which I observed much later, is that structured thinking is exactly what the essay forced me to do. The essay was not a tool to assess my ability in a subject. It was a tool for me to learn. Writing the essay was part of the learning.
I think that's what a lot of this "kids don't need to calculate in their heads" misses. Being able to do the calculation was only ever part of the idea. Learning that you could learn how to do the calculation was at least as important.
Very well put. I would actually suggest not using calculators in high school anymore. They add very little value, and if it is still the same as when I was in high school, it was a lot of remembering weird key combinations on a TI calculator. Simply make the arithmetic simple enough that a calculator isn't needed.
> AI is here to stay; new methods should be used to assess student performance.
This is overdue - we should be using interactive technology and not boring kids to death with whiteboards.
Bureaucracy works to protect itself and to preserve ease of administration. Even organising hands-on practical lessons is harder.
[dead]
The part that annoys me is that students apparently have no right to be told why the AI flagged their work. For any process where a computer is allowed to judge people, there should be a rule in place that demands the algorithm be able to explain EXACTLY why it flagged this person.
Now this would effectively kill off the current AI powered solution, because they have no way of explaining, or even understanding, why a paper may be plagiarized or not, but I'm okay with that.
It's a similar problem to people being banned from Google (insert big company name) because of an automated fraud detection system that doesn't give any reason behind the ban.
I also think that there should be laws requiring a clear explanation whenever that happens.
What about tipping off? Banks can't tell you that they've closed your account because of fraud or money laundering.
> For any process where a computer is allowed to judge people, there should be a rule in place that demands the algorithm be able to explain EXACTLY why it flagged this person.
This is a big part of GDPR.
[delayed]
Indeed. Quoting article 22 [1]:
> The data subject shall have the right not to be subject to a decision based solely on automated processing [...]
[1]: https://gdpr.eu/article-22-automated-individual-decision-mak...
For a human who deals with student work or reads job applications, spotting AI-generated work quickly becomes trivially easy. Text seems to use the same general framework (although words are swapped around); we also see what I call 'word of the week', where whichever 'AI' engine seems to get hung up on a particular English word, often an unusual one, and uses it at every opportunity. It isn't long before you realise that the adage that this is just autocomplete on steroids is true.
However programming a computer to do this isn't easy. In a previous job I had dealing with plagiarism detectors and soon realised how garbage they were (and also how easily fooled they are - but that is another story). The staff soon realised what garbage these tools are so if a student accused of plagiarism decided to argue back then the accusation would be quietly dropped.
I did engineering at a university; one of the mandatory courses was technical communication. The prof understood that the type of person who went into engineering was not necessarily going to appreciate the subtleties of great literature, so their coursework was extremely rote. It was like "Write about a technical subject, doesn't matter what, 1500 words, here's the exact score card". And the score card was like "Uses a sentence to introduce the topic of the paragraph". The result was that you wrote extremely formulaic prose. Now, I'm not sure that was going to teach people to ever be great communicators, but I think it worked extremely well to bring someone who communicated very badly up to some basic minimum standard. It could be extremely effective applied to the (few) other courseworks that required prose too, partly because by being so formulaic you appealed to the overworked PhD student who was likely marking it.
It seems likely that a suitably disciplined student could look a lot like ChatGPT and the cost of a false accusation is extremely high.
> also we see what I call 'word of the week' where whichever 'AI' engine seems to get hung up on a particular English word which is often an unusual one and uses it at every opportunity
So do humans. Many people have pet phrases or words that they use unusually often compared to others.
In the mid 90s (yes I’m dating myself here. :P) I had a classmate who was such a big NIN fan that she worked the phrase “downward spiral” into every single essay she wrote for the entire year.
No cap.
> The staff soon realised what garbage these tools are so if a student accused of plagiarism decided to argue back then the accusation would be quietly dropped.
I ask myself when the time will come that some student accuses the staff of libel or slander because of false AI plagiarism accusations.
> For a human who deals with student work or reads job applications spotting AI generated work quickly becomes trivially easy. Text seems to use the same general framework (although words are swapped around) also we see what I call 'word of the week'
Easy to catch people that aren't trying in the slightest not to get caught, right? I could instead feed a corpus of my own writing to ChatGPT and ask it to write in my style.
I don't believe it's possible at all if any effort is made beyond prompting chat-like interfaces to "generate X". Given a hand-crafted corpus of text, even current LLMs could produce perfect style transfer for a generated continuation. If someone believes it's trivially easy to detect, then they absolutely have no idea what they are dealing with.
I assume most people would make the least amount of effort and simply prompt a chat interface to produce some text; such text is rather detectable. I would like to see some experiments even for this type of detection, though.
Are you then plagiarising if the LLM is just regurgitating stuff you’d personally written?
The point of these detectors is to spot stuff the students didn’t research and write themselves. But if the corpus is your own written material then you’ve already done the work yourself.
Oh, I agree: producing text with LLMs that is expected to be produced by a human is at least deceiving and probably plagiarising. It's also skipping some important work, if we're talking about a person trying to detect it at all, usually in an education context.
Students don't have to perform research or study for the given task; they need to acquire an example text suitable for reproducing their style and structure, to create the impression of work produced by hand, so the original task can be avoided. You have to have at least one corpus of your own work for this, or an adequate substitute. And you can still reject works by their content, but we are specifically talking about LLM smell.
I was talking about the task of detecting LLM-generated text, which is incredibly hard if any effort is made, while some people have the impression that it's trivially easy. That leads to unfair outcomes while giving false confidence to, e.g., teachers that LLMs are adequately accounted for.
An LLM is just regurgitating stuff by design. You can request someone else's style. People who are easy to detect simply don't do that, but they will learn quickly.
Yep, some with fun results. I occasionally amuse myself now by asking for X in the style of writing of fictional figure Y. It does have moments.
My other half is a non-native English speaker. She's fluent, but since ChatGPT came out she's found it very helpful having somewhere to paste a paragraph and get a better version back rather than asking me to rewrite things. That said, she'll often message me with some text, and I've got a 100% hit rate for guessing if she's put it through AI first. Once you're used to how they structure sentences it's very easy to spot. I guess the hardest part is being able to prove it if you're in a position of authority like a teacher.
My partner and I are both native English speakers in Germany; if I use ChatGPT to make a sentence in German, he also spots it 100% of the time.
(Makes me worry I'm not paying enough attention, that I can't).
One course I took actually provided students with the output of the plagiarism detector. It was great at correctly identifying where I had directly quoted (and attributed) a source.
It would also identify random 5-6 word phrases and attribute them to different random texts on completely different topics where those same 5 words happened to appear.
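A minimal sketch of why that happens, assuming the detector uses crude word n-gram ("shingle") matching; the function name and the example texts are mine, not from any real product:

```python
def shared_shingles(a, b, n=5):
    """Return the word n-grams appearing in both texts.

    This is the naive matching strategy that produces the false positives
    described above: any common 5-word turn of phrase counts as a 'match',
    even across completely unrelated topics.
    """
    def shingles(text):
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    return shingles(a) & shingles(b)

essay = "as a result of the change in policy the results improved"
unrelated = "as a result of the drought the harvest failed entirely"
# A stock phrase is flagged even though the texts share no subject matter:
print(shared_shingles(essay, unrelated))  # {('as', 'a', 'result', 'of', 'the')}
```

Real detectors layer heuristics on top of this, but the underlying mechanism is similar, which is why quoted-and-attributed passages and common idioms both light up.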
> For a human who deals with student work or reads job applications spotting AI generated work quickly becomes trivially easy
So far. Unless there is a new generation of teachers who are no longer able to learn on non-AI generated texts because all they get is grammatically corrected by AI for example...
Even I am using Grammarly here (being a non-native speaker), but I usually tend to ignore it, because it removes all my "spoken" style, or at least what I think is a "spoken" style.
the students are too lazy and dumb to do their own thinking and resort to ai. the teachers are also too lazy and dumb to assess the students' work and resort to ai. ain't it funny?
It's a race to the bottom, though. Why should the humans waste their time reading through AI-generated slop that took 11ms to generate, when it can take an hour or more to manually review it?
To be fair, using humans to spend time sifting through AI slop determining what is and isn't AI generated is not a fight that the humans are going to win.
It's truly a race to the bottom.
I suppose we all get from school what we put into it.
I forget the name of the guy who said it, but he was some big philosophy lecturer at Harvard. His view on the matter (a heavy reading course where one student left a course review saying "not doing the assigned reading did not hurt me at all") was, paraphrased:
"This guy is an idiot if he thinks the point of paying $60k a semester of his parents' money is to sit here and learn nothing."
The ones that are easy to spot are easy to spot. You have no idea how much AI-generated work you didn't spot, because you didn't spot it.
How are you verifying you're correct? How do you know you're not finding false positives?
Have you tried reading AI-generated code? Most of the time it's painfully obvious, so long as the snippet isn't short and trivial.
To me it is not obvious. I work with junior level devs and have seen a lot of non-AI junior level code.
You mean, you work with devs who are using AI to generate their code.
Not saying where, but well before transformers were invented, I saw an iOS project that had huge chunks of uncompiled Symbian code in the project "for reference", an entire pantheon of God classes, entire files duplicated rather than changing access modifiers, 1000 lines inside an always true if block, and 20% of the 120,000 lines were:
//
And no, those were not generally followed by a real comment.
I'm a professional writer and test AI and AI detectors every other month.
Plagiarism detectors kinda work, but you can always use one to locate plagiarized sections and fix them yourself.
I have a plagiarism rate under 5%, usually coming from the use of well known phrases.
An AI usually has over 10%.
Obviously that doesn't help in an academic context when people mark their citations.
The perplexity checks don't work, as humans seem to vary highly in that regard. Some of my own text has lower perplexity than comparable AI text.
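For the curious, perplexity here means roughly "how surprised a language model is by the text": detectors assume AI output is low-perplexity. A toy sketch with a unigram model and Laplace smoothing (real detectors use large neural LMs; the function and the sample texts are illustrative assumptions of mine):

```python
import math
from collections import Counter

def perplexity(text, reference, alpha=1.0):
    """Per-word perplexity of `text` under a unigram model fit on `reference`,
    with add-alpha (Laplace) smoothing so unseen words get nonzero probability.
    Lower perplexity = the model finds the text more predictable."""
    ref_counts = Counter(reference.lower().split())
    total = sum(ref_counts.values())
    vocab = len(ref_counts) + 1  # +1 reserves probability mass for unseen words
    words = text.lower().split()
    log_prob = 0.0
    for w in words:
        p = (ref_counts.get(w, 0) + alpha) / (total + alpha * vocab)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(words))

reference = "the cat sat on the mat and the dog sat on the rug"
print(perplexity("the cat sat on the mat", reference))         # low: predictable
print(perplexity("quantum flux oscillates wildly", reference)) # high: all unseen
```

The commenter's point is that this signal overlaps badly between humans and models: a human writing plain, conventional prose can score "more AI-like" than an AI asked to write floridly.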
The education system has not even really adapted to the constant availability of the Internet, and now it has to face LLMs.
If I could short higher education, I would. Nearly all of its foundational principles are bordering on obviously useless in the modern world, and it keeps doubling down on the same fundamentals (a strict set of classes and curriculum, almost complete separation of education from working experience, etc.), only adapting their implementation somewhat.
My kids’ school added a new weapons scanner as kids walk in the door. It’s powered by “AI.” They trust the AI quite a bit.
However, the AI identifies the school issued Lenovo laptops as weapons. So every kid was flagged. Rather than stopping using such a stupid tool, they just have the kids remove their laptops before going through the scanner.
I expect people who aren't smart enough are buying "AI" products and trusting them to do the things they want them to do, even though they don't work.
Reading this comment, it sounds to me that you live in a dystopian nightmare.
Many schools are prisons, same as ever.
It's called the USA. School kids regularly commit mass murders at school, hence the security.
Clearly the answer is airport grade security at schools and militarizing police, instead of fixing the root causes.
“Regularly” is not a particularly accurate word.
50 million K12 students in the U.S. — how many mass murders are “regular?”
More than once a week, not quite once a day.
Rather than trying to diminish something that's completely preventable and abhorrent maybe we could discuss ways to actually prevent it. Because this isn't a problem anywhere else so clearly it's preventable.
If AI can be part of a solution here this is a reasonable place to discuss it.
Nobody said we shouldn't try to solve the problem. But the first step is accurately describing the problem to be solved. Something that occurs once a year across the entire country has very different solutions than something which occurs once a week in every county.
I wonder if you realise how much of a dystopian nightmare you have to be living in to write a comment like this and consider it reasonable.
Any at all? I don't think Americans realize how much of a US-only problem it is, and how some of the non-US mass shootings are explicitly inspired by US media and discourse.
Sometimes suboptimal tools are used to deflect litigation.
> I expect people who aren't smart enough are buying "AI" products and trusting them to do the things they want them to do, even though they don't work.
People are willing to believe almost anything as long as it makes their lives a little more convenient.
Or it is accepted that said purchase will cover their ass, or even better, that refusing said purchase can be held against them in the future if things happen, even if said purchase would have made 0 difference.
Were they Evolv? https://www.theverge.com/2024/4/2/24119275/evolv-technologie...
I wonder if it's batteries, they look quite close to explosives on a variety of scanning tools. In fact, both chemically store and release energy but on extremely different timescales.
I could see a student hollowing out the laptop and hiding a weapon inside to sneak it in, if that's the case.
That is beyond silly. Unless students go naked they can have a weapon in a pocket.
The point was that if the laptop is taken out and doesn’t go through the scanner, but the rest of the student has to go through the scanner, then the laptop is a great hiding place. Presumably that scanner can at least beep at a pocket knife.
Oh, indeed!
But if they are not otherwise checked it would be quite useless.
don't forget... nature's pocket.
And they trust them more than people.
Do you think it stupid to scan kids for weapons, or stupid to think that a metal detector will find weapons?
It's stupid to bring yourself into a position where scanning kids for weapons is necessary. In this case we're already past that, so the stupidity is that the device isn't updated to stop identifying laptops as weapons. If that's not possible, then the device is a mislabeled laptop detector.
Not the OP, but obviously it wasn't a metal detector, otherwise it would've detected all brands of laptops as weapons. It's probably an image based detector.
The problem is, if it has been that badly tested that it detects Lenovo laptops as weapons, there is a good chance that it doesn't properly detect actual weapons either.
A high school I worked at had a similar system in place called Evolv. It’s not a metal detector, but it did successfully catch a student with a loaded gun in his backpack. Granted, he didn’t mean to bring the gun to school. I think it’s stupid to believe that kids who want to bring a gun to school will arrive on time to school. They often arrive late when security procedures like bag scanning are not in place.
I think it's overboard to scan for weapons at all schools but very important to scan at some schools.
I think it's stupid to have a country where guns are legal.
Guns are legal in almost every country - I think your problem is with countries that have almost no restriction on gun ownership. e.g. Here in the UK you can legally own a properly licensed rifle or shotgun and even a handgun in some places outside of Great Britain (e.g. Northern Ireland).
Just because something is technically legal, doesn't mean it's in any way common or part of UK culture to own a gun.
There hasn't been a school shooting in the UK for nearly 30 years. Handguns were banned after the last school shooting and there hasn't been one since.
https://en.wikipedia.org/wiki/Category:School_shootings_in_t...
Although that fact is sometimes forgotten by schools that copy the US in having "active shooter drills". Modern schools sound utterly miserable.
let us ban knives then...
got a license for that mate?
There's already a UK ban on carrying knives in public unless you have an occupational need and they're wrapped up or at least not just sitting in your pocket.
Licensing wouldn't be worthwhile as almost every household would want knives for food preparation.
This is a tired stereotype.
The US has more stabbings per-capita than the UK does, even on top of the shootings.
Exactly. It's not the legality of weapons, but the easy availability of them that causes the issues.
It seems to me like victim blaming for U.S. schools to have active shooter drills - it makes more sense to have much better training and screening of gun owners than trying to train the victims. However, given that the NRA is excessively powerful in U.S. politics, I can see why they are necessary, but it just seems easier to me to stop kids from being able to get hold of guns (e.g. have some rudimentary screening for gun purchases and require owners to keep them in locked cabinets when they are not in use).
Yet the murder rate was unaffected by the gun ban.
Why do people say such unsubstantiated nonsense? Places with guns have more deaths. And it's obvious to see why: guns are a tool for killing, and they're pretty effective.
FWIW, I'm a consultant for a large University hospital, and Dutch. My PhD thesis, years ago, got the remark: "Should have checked with a native speaker."
So, now I use ChatGPT to check my English. I just write what I want to write, then ask it to make my text "more concise, business-like and not so American" (yeah, the thing is by default as ultra-enthusiastic as an American waiter). And 9 out of 10 times it says what I want to say, but better than I wrote it myself, in far fewer words and better English.
I don't think it took less time to write my report, but it is much much better than I could have made alone.
An AI detector may go off (or does it go on? or is it off? Idk, perhaps I should ask Chat ;)), but it is about as useful as a spell-check detector.
It's a Large Language Model, so you should use it like that; it is not a Large Fact Model. But if you're a teacher you should be a good bullshit detector, right?
If I'm ever checking some student's report, you may get this feedback: for God's sake, check the language with ChatGPT, but for God's sake check the facts some other way.
My daughter was accused of turning in an essay written by AI because the school software at her online school said so. Her mom watched her write the essay. I thought it was common knowledge that it was impossible to tell whether text was generated by AI. Evidently, the software vendors are either ignorant or are lying, and school administrators are believing them.
I expect there will be some legal disputes over this kind of thing pretty soon. As another comment pointed out: run the AI-detection software on essays from before ChatGPT was a thing to see how accurate these are. There's also the problem of autists having their essays flagged disproportionately, so you're potentially looking at some sort of civil rights violation.
> Evidently, the software vendors are either ignorant or are lying
I’ll give you a hint: they’re not ignorant.
Imagine how little common knowledge there will be one or two generations down the road after people decide they no longer need general thinking skills, just as they've already decided calculators free them from having to care about arithmetic skills.
Maybe not having to learn to write "properly" means more bandwidth for more general thinking?
At least not having to care about arithmetic leaves more time to care about mathematics.
We don't learn directions now: we use GPS.
We don't do calculations: computers do it for us.
We don't accumulate knowledge: we trust Google to give us the information when needed.
Everything in a small package everyone can wear all day long. We're at the second step of transhumanism.
At least the first 2 are far more accurate than humans ever could be. The third, i.e. trusting others to vet and find the correct information, is the problem.
Almost.
GPS is great at knowing where you are, but directions are much much harder, and the extra difficulty is why the first version of Apple Maps was widely ridiculed.
Even now, I find it's a mistake to just assume Google Maps can direct me around Berlin public transport better than my own local knowledge — sometimes it can, sometimes it can't.
(But yes, a single original Pi Zero beats all humans combined at arithmetic even if all of us were at the level of the world record holder).
Why? We've done it for ages, most trust in Wikipedia, and before most trusted in encyclopedias. Books written by others have been used forever. We just shift where we place the trust over time.
Agreed, but Google hardly gives you those results. Sponsored ads and AI-generated SEO crap are hardly an encyclopedia.
I just googled ‘do I need a license to drive a power boat in UK’
I got AI answer saying ‘no’, but actually you do.
If I use a calculator it will be correct. If I open an encyclopaedia it will mostly be correct, because someone with a brain did at least 5 minutes of thinking.
We are not talking about some minor detail, AI makes colossal errors with great confidence and conviction.
And yet, this fear is timeless; back when book printing was big, people were fearmongering that people would no longer memorize things but rely too much on books. But in hindsight it ended up becoming a force multiplier.
I mean I'm skeptical about AI as well and don't like it, but I can see it becoming a force multiplier itself.
> people were fearmongering that people would no longer memorize things but rely too much on books...
Posters here love to bring out this argument, but I think a major weakness is that those people wound up being right. People don't memorize things any more! I don't think it's fair to hold out as an example of fears which didn't come to pass, as they very much did come to pass.
It's more insidious than that. AI will be used as a liability shield/scapegoat, so will become more prevalent in the workplace. So in order to not be homeless, more people will be forced to turn their brains off.
AI does have things it does consistently wrong. Especially if you don't narrow down what it's allowed to grab from.
The easiest for someone here to see is probably code generation. You can point at parts of it and go "this part is from a high-school level tutorial", "this looks like it was grabbed from college assignments", and "this is following 'clean code' rules in silly places"(like assuming a vector might need to be Nd, instead of just 3D).
The education system in the US is broadly staffed by the dumbest people from every walk of life.
If they could make it elsewhere, they would.
I don’t expect this to be a popular take here, and most replies will be NAXALT fallacies, but in aggregate it’s the truth. Sorry, your retired CEO physics teacher who you loved was not a representative sample.
In Germany, you have to do the equivalent of a master's degree (and then a bunch) to teach in normal public schools
This selects for people willing to do 8 years of schooling to earn 60k EUR.
It's not just the USA; it's pretty much universal, as far as I've seen. People like to pretend it's some sort of noble profession, but I vividly remember a conversation with recently graduated ex-classmates, where one of them was complaining that she had failed to get into every department she applied to, so she had no choice but to apply to the department of education (I guess? I don't know the name of the American equivalent: the bachelor-level program for people who are going to be teachers). At that moment I felt suddenly validated in all my complaints about the system we had just passed through.
I went to public schools in middle class neighborhoods in California from the late sixties to the early eighties. My teachers were largely excellent. I think that was due to cultural and economic factors - teaching was considered a profession for idealistic folks to go into at the time and the spread between rich and poor was less dramatic in the 50s and 60s (when my teachers were deciding their professions). So the culture made it attractive and economics made it possible. Another critical thing we seem to have lost.
It was the tail end of when smart women had few intellectually stimulating options and teacher was a decent choice.
For hundreds of years, women could have amazing opportunities by pursuing a religious vocation, get fantastic education in their religious order, and then enjoy a fulfilling life-long ministry in health care, education, social services, etc. All her material and spiritual needs would be provided by her community. For life. Not merely until retirement. Until she died.
Furthermore, young lay women could start out as teachers, which is a fantastic way to learn how to care for young children, and when such a seasoned teacher would eventually marry and begin her childbearing years, she was quite well-prepared to care for children of her own.
Nowadays, fewer and fewer women know how to be homemakers, mothers, or wives, and so they just want to go straight into STEM and/or "girlboss" type stuff. Any woman who actually wishes to care for children, or educate them, is perceived as weak and reactionary.
What I take from this is that you don't like reading about history much, with the clear exception of overly optimistic religious texts. A religious vocation frequently got you into a pretty abusive situation, and the #1 expectation was obedience. That was what you were supposed to do, primarily. Not exactly what the person you are responding to is writing about.
Moreover, women never needed to start out as teachers to "be ready for childcare". Childcare expectations were much lower at the time, but the amount of chores at home was massively higher.
It appears you think that giving women the same opportunities as men is a bad thing.
AStonesThrow has, err, strong opinions on this kind of thing: https://news.ycombinator.com/item?id=41885547
I would question the utility of engaging.
In some countries teaching is a highly respected profession.
Switzerland and Finland come to mind.
You can't eat respect.
They are well compensated.
https://www.swissinfo.ch/eng/society/swiss-salaries-teachers...
That article, after a very pushy, GDPR-violating consent banner, says pay is stagnant and hours are long.
Hours are long for everyone in Switzerland.
110k in Switzerland is a good pay today. The article is from 2017.
In those places salary (and good public services) follows respect
Having lived in "one of those places": no, salary does not.
Sounds like a self-fulfilling prophecy. We educate everyone to be the smartest person in the class, and then we don't have jobs for them. And then we complain that education is not good enough. Shouldn't we conclude that education is already a bit too good?
> your retired CEO physics teacher who you loved was not a representative sample
Hey, he was Microsoft’s patent attorney who retired to teach calculus!
>I thought it was common knowledge that it was impossible to tell whether text was generated by AI.
Anyone who's been around AI generated content for more than five minutes can tell you what's legitimate and what isn't.
For example this: https://www.maersk.com/logistics-explained/transportation-an... is obviously an AI article.
>Anyone who's been around AI generated content for more than five minutes can tell you what's legitimate and what isn't.
to some degree of accuracy.
It’s impossible to tell AI apart with 100% accuracy
Obviously false, as LLMs parrot what they're trained on. Not that hard to get them to regurgitate Shakespeare or what have you.
Sounds like a skill issue on your part
Let's test your skills as a plagiarism detector. Below are two paragraphs. One of them was written by an LLM, one by a human. I have only altered whitespace in order to make them scan the same. Can you tell which is which? How much would you bet that you are correct?
A. The Pathfinder and The Deerslayer stand at the head of Cooper's novels as artistic creations. There are others of his works which contain parts as perfect as are to be found in these, and scenes even more thrilling. Not one can be compared with either of them as a finished whole. The defects in both of these tales are comparatively slight. They were pure works of art.
B. The Pathfinder and The Deerslayer stand at the head of Cooper's novels as artistic creations. There are others of his works which contain parts as perfect as are to be found in these, and scenes even more thrilling. Not one can be compared with either of them as a finished whole. The defects in both of these tales are comparatively slight. They were pure works of art.
That's kinda nuts how adult people learned to trust some random algorithms in a year or two. They don't know how it works, they cannot explain it, they don't care, it just works. It's magic. If it says you cheated, you cheated. You cannot do anything about it.
I want to emphasize that this isn't really about trusting magic; it's about people nonchalantly doing ridiculous stuff nowadays and apparently not being held accountable for it. For example, back at school I was once "accused" of cheating, because it was the only time I liked the homework in a class and took it seriously. It was kinda insulting to hear that there's absolutely no way I did it, but I still got my mark, because it doesn't matter what she thinks if she cannot prove it, so please just sign it and fuck off; it's the last time I'm doing my homework for your class anyway.
On the contrary, if this article is to be believed, these teachers don't have to prove anything; the fact that a coin flipped heads is considered proof enough. And everyone supposedly treats it as if it's ok. "Well, they have this system at school, what can we do!" It's crazy.
See HyperNormalisation.
> They don't know how it works, they cannot explain it, they don't care, it just works. It's magic. If it says you cheated, you cheated. You cannot do anything about it.
People trust a system because other people trust a system.
It does not matter if the system is the inquisition looking for witches, a machine, or the Gulag in the USSR.
The system said you are guilty. The system can’t be wrong.
Kafka is rolling in his grave.
It is not a bug, it is a feature.
That's how you can mold society as you like at your level: this student's older sibling was a menace? Let's fuck them over, being shitty must run in the family. You don't like the race / gender / sexuality of a student? Now "chatGPT" can give you an easy way to make their school life harder.
This is not about ChatGPT. The same happens in HR departments and governments.
Just introduce an incomprehensible process, like applying for a visa or planning permission, and then use it to your advantage.
From the victim's perspective, there is no difference between bureaucracy and AI.
I'd be really interested to run AI detectors on essays from years before the ChatGPT era, just to see if anything gets flagged.
Yes, 3 out of 500 essays were flagged as 100% AI generated. There is a paragraph in the linked article about it.
This study is not very good frankly. Before ChatGPT there was Davinci and other model families which ChatGPT (what became GPT 3.5) was ultimately based on and they are the predecessors of today's most capable models. They should test it on work that is at least 10 to 15 years old to avoid this problem.
What? 10 years ago we wouldn’t dream of what’s happening now.
Models before 2017-2018 (first gpt/bert) didn’t produce any decent text, and before gpt2/gpt3 (2020) you wouldn’t get an essay-grade text.
So you need to go back only 4-5 years to be certain an essay didn’t use AI.
And another 9 flagged as partially AI.
[dead]
Turns out we spent way too long thinking about how machines could beat the Turing test, and not long enough thinking about how we could build better Turing tests.
My perspective after talking to a few colleagues in the CS education sector, and based on my own pre-GPT experience:
Classifiers sometimes produce false positives and false negatives. This is not news to anyone who has taken a ML module. We already required students back then to be able to interpret the results they were getting to some extent, as part of the class assignment.
Even before AI detectors, when Turnitin "classic" was the main tool along with JPlag and the like, if you were doing your job properly you would double-check any claims the tool produced before writing someone up for misconduct. AI detectors are no different.
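Double-checking matters because of base rates: when actual misconduct is rare, even a seemingly accurate classifier produces mostly false accusations among its flags. A hedged sketch via Bayes' rule; all the parameter values below are illustrative assumptions, not measurements of any real detector:

```python
def flag_precision(sensitivity, false_positive_rate, prevalence):
    """P(actually cheated | flagged), by Bayes' rule.

    sensitivity:         P(flagged | cheated)
    false_positive_rate: P(flagged | honest)
    prevalence:          P(cheated) among all submissions
    """
    true_flags = sensitivity * prevalence
    false_flags = false_positive_rate * (1 - prevalence)
    return true_flags / (true_flags + false_flags)

# Suppose 10% of submissions are AI-written, and the detector catches 90%
# of them while wrongly flagging 5% of honest work:
p = flag_precision(sensitivity=0.9, false_positive_rate=0.05, prevalence=0.10)
print(round(p, 2))  # 0.67 -> one flagged student in three is innocent
```

Which is exactly why a flag should start an investigation, not end one.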
That said, you already catch more students than you would think just by going for the fruit hanging so low it's practically touching the ground already.
That's more than enough to make the administration groan under the number of misconduct panels we convene every year. The future in this corner of the world seems to be a mix of
Seems like the easy fix here is move all evaluation in-class. Are schools really that reliant on internet/computer based assignments? Actually, this could be a great opportunity to dial back unnecessary and wasteful edu-tech creep.
Moving everything in class seems like a good idea in theory. But in practice, kids need more time than 50 minutes of class time (assuming no lecture) to work on problems. Sometimes you will get stuck on 1 homework question for hours. If a student is actively working on something, yanking them away from their curiosity seems like the wrong thing to do.
On the other hand, kids do blindly use the hell out of ChatGPT. It's a hard call: teach to the cheaters or teach to the good kids?
I've landed on making take-home assignments worth little and making exams worth most of their grade. I'm considering making homework worth nothing and having their grade be only 2 in-class exams. Hopefully that removes the incentive to cheat. If you don't do homework, then you don't get practice, and you fail the two exams.
(Even with homework worth little, I still get copy-pasted ChatGPT answers on homework by some students... the ones that did poorly on the exams...)
> If you don't do homework, then you don't get practice, and you fail the two exams.
I'd be cautious about that, because it means the kids with undiagnosed ADHD who are functionally incapable of studying without enforced assignments will just completely crash and burn without absorbing any of the material at all.
Or, at least, that's what happened to me in the one and only pre-college class I ever had where "all work is self-study and only the tests count" was the rule.
That's a non-starter for most schools.
There are more students than ever, and lots of schools now offer remote programs, or just remote options in general for students, to accommodate for the increased demand.
There's little political will to revert to the old ways, as it would drive up the costs. You need more space and you need more workers.
That overall would be the right thing. Homework is such a weird concept when you think about it. Especially if you get graded on the correctness. There is no step between the teacher explaining and you validating whether you understood the material.
Teacher explains material, you get homework about the material and are graded on it.
It shouldn't be like that. If the work (i.e. the exercises) are important to grasp the material, they should be done in class.
Also removes the need of hiring tutors.
> If the work (i.e. the exercises) are important to grasp the material, they should be done in class.
I'd like to offer what I've come to realize about the concept of homework. There are two main benefits to it: [1] it could help drill in what you learned during the lecture and [2] it could be the "boring" prep work that would allow teachers to deliver maximum value in the classroom experience.
Learning simply can't be confined in the classroom. GP suggestion would be, in my view, detrimental for students.
[1] can be done in class but I don't think it should be. A lot of students already lack the motivation to learn the material by themselves and hence need the space to make mistakes and wrap their heads around the concept. A good instructor can explain any topic (calculus, loops and recursion, human anatomy) well and make the demonstration look effortless. It doesn't mean, however, that the students have fully mastered the concept after watching someone do it really well. You only start to learn it once you've fluffed through all the pitfalls at least mostly on your own.
[2] can't be done in class, obviously. You want your piano teacher to teach you rhythm and musical phrasing, hence you better come to class already having mastered notation and the keyboard and with the requisite digital dexterity to perform. You want your coach to focus on the technical aspects of your game, focus on drilling you tactics; you don't want him having to pace you through conditioning exercises---that would be a waste of his expertise. We can better discuss Hamlet if we've all read the material and have a basic idea of the plot and the characters' motivations.
That said, it might make sense simply not to grade homework. After all, it's the space for students to fail. Unfortunately, if it weren't graded, a lot of students would just skip it.
Ultimately, it's a question of behavior, motivation, and incentives. I agree that the current system, even pre-AI, could only barely live up to ideals [1] and [2] but I don't have any better system in mind either, unfortunately.
> you don't want him having to pace you through conditioning exercises---that would be a waste of his expertise
I fundamentally disagree. I vividly remember, many times during maths homework for example, realising that I was stuck and so didn't understand something explained earlier, and needed to ask someone. For me, my parents were able to help. But later in high school, when you get to differential equations, they no longer can. And obviously if your parents are poorly educated, they can't help at all.
Second point: there is no feedback loop this way. A teacher should see how difficult his homework is, how much time students spend on it, and why they are struggling. Marking a piece of paper does not do it. There was wild inconsistency between teachers in how much homework they would set and how long they thought it would take students.
Lastly, school plus homework should fit all the required learning within one working day. It is, after all, also a form of childcare while parents work.
Out-of-class evaluations don't have to mean electronic. They could be problem sets, essays, or longer-form things like projects. All of these are difficult to do in a limited time window.
These limited time-window assessments are also (a) artificial (don't always reflect how the person might use their knowledge later) (b) stressful (some people work better/worse with a clock ticking) and (c) subject to more variability due to the time pressure (what if you're a bit sick, or have had a bad day or are just tired during the time window?).
It could also be hybrid, with out-of-class and in-class components. There could even be multiple steps, with in-class components aimed both at verifying authorship and at providing feedback in an iterative process.
AI makes it impossible to rely on out-of-class assignments to evaluate the kids' knowledge. How we respond to that is unclear, but relying on cheating detectors is not going to work.
Yep. The solutions which actually benefit education are never expensive, but require higher quality teachers with less centralized control:
- placing less emphasis on numerical grades to disincentivize cheating (hard to measure success)
- open-response written questions (harder to teach, harder to grade)
- reading books (hard to determine if students actually did it)
- proof-based math (hard to teach)
Instead we keep imagining ever more absurd surveillance systems: “what if we can track students' eyes to make sure they actually read the paragraph?”
Totally agree. More time spent questioning the students about their work would make AI detection unnecessary...
But somehow, we don't trust teachers anymore. Those in power want to check that the teacher actually does his job, so they want to see some written, reviewable proof... So the grades are there to control both the student and the teacher. WWW (What a wonderful world).
The only long-term solution that makes sense is to allow students to use AI tools and to require that a log from the AI tool be submitted. Adjust the assignment accordingly and use custom system prompts so that students are both learning the underlying subject and learning how to use AI tools effectively.
I'd expect smart people to be able to use tools to make their work easier. Including AI. The bigger picture here is that the current generation of students are going to be using and relying on AI the rest of their careers anyway. Making them do things the old fashioned way is not a productive way to educate them. The availability of these tools is actually an opportunity to raise the ambition level quite a bit.
Universities and teachers will need to adjust to the reality that this stuff is here to stay. There's some value in learning how to write properly, of course. But there are other ways of doing that. And some of those ways actually involve using LLMs to criticize and correct people's work instead of having poor teachers do that.
I did some teaching while I was doing a post doc twenty years ago. Reviewing poorly written student reports isn't exactly fun and I did a fair bit of that. But it strikes me how I could use LLMs to do the reviewing for me these days. And how I could force my students to up their standards of writing.
These were computer science students. Most of them were barely able to write a coherent sentence. The bar for acceptable was depressingly low. Failing 90% of the class was not a popular option with either students or staff. And it's actually hard work reviewing poorly written garbage. And having supported a few students with their master thesis work, many of them don't really progress much during their studies.
If I were to teach that class now, I would encourage students to use all the tools available to them. Especially AI. I'd set the bar pretty high.
The problem is, you sound like you were educated without relying on "AI". Thus you know enough to use an LLM as a tool.
There are studies showing up already that students educated with LLMs end up retaining nothing.
We may well need to invent new mechanisms for teaching, but I don't expect that to appear overnight.
The point of essays is not to have essays written. The teacher already knows the material. The point is to practice putting together a coherent thought. The process, not the product, is the goal.
Eventually we'll come up with a way to demonstrate that along with, rather than despite, AI. But for the moment we have machines that can do the assignment much better than students can, and the students won't get any better if they let the machine do all of the work.
> We may well need to invent new mechanisms for teaching,
For additional context, the short-essay format as an evaluation tool is very much an Anglo-Saxon university form factor.
There are several other cultures in the world, in particular those stemming from the Latin/Francophone school of thought, with the old 'cathedra'-style university, where students are subjected to written exams only, or even historically (less so nowadays) to 'oral' exams (oratory, not dental).
In some cases students have fought such accusations by showing their professor that the tool flags the professor's own work.
Don't know why these companies are spending so much developing this technology, when their customers clearly aren't checking how well it works.
Aren't they making it exactly because their customers don't check it and still buy it, probably for very decent money? And always remember: the buyers are not the end users (the teachers or students) but the administrators. And for them, appearing to do something about the risk of AI is more important than actually doing anything about it.
The companies selling these aren’t “spending so much developing the technology”. They’re following the same playbook as snake oil salesmen and people huckstering supplements online do: minimum effort into the product, maximum effort into marketing it.
I don't know what these 'students' are doing, but it's not very hard to prompt a system into not using the easily detectable 'ai generated' language at all. Also adding in some spelling errors and uncapping some words (like ai above here) makes it more realistic. But just adding an example of how you write and telling it to keep your vocabulary and writing some python to post process it makes it impossible to detect ai for humans or ai detectors. You can also ask multiple ais to rewrite it. Getting an nsfw one to add in some 'aggressive' contrary position also helps as gpt/claude would not do that unless jailbroken (which is whack-a-mole).
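For illustration, a minimal sketch of the kind of post-processing step described above (function name, rates, and seed are hypothetical; it merely perturbs surface features, which is exactly why style-based detectors are fragile):

```python
import random

def roughen(text: str, typo_rate: float = 0.02, seed: int = 42) -> str:
    """Perturb surface features of text: occasionally transpose two
    letters inside longer words and uncapitalize some sentence starts."""
    rng = random.Random(seed)
    out = []
    for w in text.split(" "):
        # occasionally swap two adjacent inner letters of longer words
        if len(w) > 4 and rng.random() < typo_rate:
            i = rng.randrange(1, len(w) - 2)
            w = w[:i] + w[i + 1] + w[i] + w[i + 2:]
        out.append(w)
    # uncapitalize some sentence starts
    sentences = " ".join(out).split(". ")
    sentences = [s[0].lower() + s[1:] if s and rng.random() < 0.3 else s
                 for s in sentences]
    return ". ".join(sentences)
```

The point is not that this trivial script defeats any particular detector, only that the surface statistics such detectors lean on are cheap to perturb.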
Sounds like almost the same level of effort as actually writing it yourself. Or getting the AI to write a draft and then just rewriting it quickly. Humans are lazy, students especially so.
When I look around in the shared open workspace I am in currently for a meeting, everyone (programmers, PR, marketing) has Claude/GPT/Perplexity on their screen. 100% of the people here. So I guess this will not be limited to students.
As an engineering major who was forced to take an English class, I will say that on many occasions I purposely made my writing worse, in order to prevent suspicion of AI use.
If teachers can’t tell and need AI to detect it, why does it matter? If their skill and knowledge in a field can’t tell when someone is faking it, are we perhaps putting too much weight on their abilities at all?
I'm surprised at the number of comments that give up and say that "AI" is here to stay.
I'm also surprised that academics rely on snake oil software to deal with the issue.
Instead, academics should unite and push for outlawing "AI" or make it difficult to get like cigarettes. Sometimes politicians still listen to academics.
It is probably not going to happen though since the level of political apathy among academics is unprecedented. Everyone is just following orders.
I can't think of a single time that we've ever willingly put down a technology that a single person could deploy and appear to be highly productive. You may as well try to ban fire.
Looking at some of the most successful historical pushbacks against technology, taxes and compensation for displaced workers is about as much as we can expect.
Even trying to put restrictions on AI is going to be very practically challenging. But I think the most basic of restrictions like mandating watermarks or tracing material of some kind in it might be possible and really that might do a lot to mitigate the worst problems.
We had a time when CGI took off, where everything was too polished and shiny and everyone found it uncanny. That started a whole industry to produce virtual wear, tear, dust, grit and dirt.
I wager we will soon see the same for text. Automatic insertion of the right amount of believable mistakes will become a thing.
Without some form of watermarking, I do not believe there is any way to differentiate. What that watermarking would look like, I have no clue.
Pandora's box has been opened with regard to large language models.
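Since the question of what watermarking could look like comes up: one published line of research seeds a pseudorandom "green" half of the vocabulary from each preceding token and biases generation toward it. A toy sketch of the idea (function names and the tiny vocabulary are mine, purely illustrative); a verifier who knows the seeding scheme recomputes the lists and counts hits, with no access to the model required:

```python
import hashlib
import random

def greenlist(prev_token: str, vocab: list[str], frac: float = 0.5) -> set[str]:
    """Deterministically partition the vocabulary, seeded by the previous
    token. A watermarking sampler would bias generation toward this set."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return set(rng.sample(vocab, int(len(vocab) * frac)))

def green_fraction(tokens: list[str], vocab: list[str]) -> float:
    """Verifier side: recompute each green list and count hits. Human text
    lands near `frac` on average; watermarked text runs noticeably higher."""
    hits = sum(tok in greenlist(prev, vocab)
               for prev, tok in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)
```

A statistical test on the green fraction then decides whether a passage is watermarked; the obvious weakness is that paraphrasing or re-tokenizing the text erodes the signal.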
You can already do that easily with ChatGPT. Just tell it to rate the text it generated on a scale from 0-10 in authenticity. Then tell it to crank out similar text at a higher authenticity scale. Try it.
The challenging thing is, cheating students also say they're being falsely accused. Tough times in academia right now. Cheating became free, simple, and ubiquitous overnight. Cheating services built on top of ChatGPT advertise to college students; Chrome extensions exist that just solve your homework for you.
I don’t know how to break this to you, but cheating was always free, simple, and ubiquitous. Sure, ChatGPT wouldn’t write your paper; but your buddy who needed his math problem solved would. Or find a paper on countless sites on the Internet.
That's just not so. Most profs were in school years before the internet was ubiquitous. And asking a friend to do your work for you is simple, but far from free.
Free -> You would owe them a favour, or some "excessive flattery". Maybe money (never had to do that myself)
That wasn't free; people would charge money to write essays, and essays found online would be detected as such.
It wasn't always free. Look at Chegg's revenue trend since ChatGPT came out.
I'm looking forward to the dystopian sci-fi film "Minority Book Report"
We should make an AI model called Fahrenheit 451B to detect unauthorized books.
Open Fahrenheit 451B will be in charge of detecting unauthorized books and streaming media, as well as unauthorized popcorn or bread.
I guess in a few years everyone will stop using this garbage, or get used to living in garbage data and won't care. Heads or tails?
Rather than flagging it as AI why don’t we flag if it’s good or not?
I work with people in their 30s who cannot write their way out of a hat. Who cares if the work is AI-assisted or not? Most AI writing is super dry, formulaic, and bad. If the student doesn't recognize this, then give them a poor mark for having terrible style.
Traditional school work has rewarded exactly the formulaic dry ChatGPT language, while the free thinking, explorative and creative writing that humans excel at is at best ignored, more commonly marked down for irrelevant typos and lack of the expected structure and too much personality showing through.
Because judging the quality of "free thinking" outside of STEM is incredibly biased and subjective, depending on the person doing the judging, and could even get you in trouble for wrongthink (try debating the Israel vs Palestine issue and see). That is why many school systems have converged on standardized boilerplate slop that's easy to judge by people with average intellect and training and, most importantly, easy to game by students, so that it's less discriminatory by race, religion, and socioeconomic background.
Because sometimes an exercise is supposed to be done under conditions that don't represent the real world. If an exam is calculator-free, you can't just use a calculator anyway on the grounds that you'll have one at work, too. If the assignment is "write a text about XYZ, without using AI assistance", using an AI is cheating. Cheating should have worse consequences than writing bad stuff yourself, so detecting AI (or just not having assignments done unsupervised) is still important.
Because often the goal of assessing a student is not to show that they can generate output. It is to ensure they have retained a sufficient amount of the knowledge they are supposed to retain from the course, and can regurgitate it in a sufficiently readable format.
Actually being able to generate good text is an entirely separate evaluation. And AI might have a place there.
> Most AI writing is super dry, formulaic and bad.
An LLM can generate text that is as entertaining and whimsical as its training dataset gets, with no effort on your part.
This is not something that reveals how bad AI is or how dumb administration is. It's revealing how fundamentally dumb our educational system is. It's incredibly easy to subvert. And kids don't find value in it.
Helping kids find value in education is the only important concern here and adding an AI checker doesn't help with that.
> Helping kids find value in education is the only important concern here and adding an AI checker doesn't help with that.
Exactly. It also does the complete opposite. It teaches kids from fairly early on that their falsely flagged texts might as well just be written with AI, further discouraging them from improving their writing skills. Which are just as useful with AI as without.
I never understood why we don't allow using machine assistance for essays anyway...
Easy to solve. Just use oral examinations.
We should have some sort of time constrained form of assessment in a controlled environment, free from access to machines, so we can put these students under some kind of thorough examination.
(“Thorough examination” as a term is too long though — let’s just call them “thors”.)
—
In seriousness the above only really applies at University level, where you have adults who are there with the intention to learn and then receive a final certification that they did indeed learn. Who cares if some of them cheat on their homework? They’ll fail their finals and more fool them.
With children though, there’s a much bigger responsibility on teachers to raise them as moral beings who will achieve their full potential. I can see why high schools get very anxious about raising kids to be something other than prompt engineers.
>there’s a much bigger responsibility on teachers to raise them as moral beings who will achieve their full potential.
There's nothing moral about busywork for busywork's sake. If their entire adult life they'll have access to AI, then school will prepare them much better for life if it lets them use AI and teaches them how to use it best and how to do the things AI can't do.
Are any students coming up with a process to prove their innocence when they get falsely accused?
If I were still in school, I would write my docs in a Google Doc, which provides the edit history. I could potentially also record video of me typing the entire document, or just record my screen.
That’s what the person in the article did:
“After her work was flagged, Olmsted says she became obsessive about avoiding another accusation. She screen-recorded herself on her laptop doing writing assignments. She worked in Google Docs to track her changes and create a digital paper trail. She even tried to tweak her vocabulary and syntax. “I am very nervous that I would get this far and run into another AI accusation,” says Olmsted, who is on target to graduate in the spring. “I have so much to lose.”
”
I don't think there's any real way around the fundamental flaw of such systems assuming there's an accurate way to detect generated text, since even motivated cheaters could use their phone to generate the text and just iterate edits from there, using identical CYA techniques.
That said, I'd imagine that if someone resorts to generated text, their edits would contain anomalies, in how the structure and drafts get built out, that someone legitimately writing wouldn't have. Perhaps that in itself could be auto-detected more reliably.
All of that still wouldn't prove that you didn't use some sort of LLM to get it done. The professor could just claim you used ChatGPT on your phone and typed it in, then changed it up a bit.
Sounds like it's time to stream and record the actual writing of papers that might be checked by an AI.
My daughter’s 7th-grade work is 80% flagged as AI. She is a very good writer; it’s interesting to see how poorly this will go.
Obviously we will go back to in class writing.
The article demonstrates that good, simple prose is being flagged as AI-generated. Reminds me of a misguided junior high English teacher that half-heartedly claimed I was a plagiarist for including the word "masterfully" in an essay, when she knew I was too stupid to use a word like that. These tools are industrializing that attitude and rolling it to teachers that otherwise wouldn't feel that way.
I'd encourage you to examine the grading policies of the high schools in your area.
What may seem obvious based on earlier-era measures of student comprehension and success is not the case in many schools anymore.
Look up evidence based grading, equitable grading, test retake policies, etc.
> Obviously we will go back to in class writing.
That would be a pretty sad outcome. In my high school we did both in-class essays and homework essays. The former were always more poorly developed and more poorly written. IMO students still deserve practice doing something that takes more than 45 minutes.
She should run it through ai to rewrite in a way so another ai doesn't detect it was written by ai.
I've heard some students are concerned that any text submitted to an AI detector is automatically added to training sets and therefore will eventually be flagged as AI.
Well, that is how AI works.
Right, I thought this was just an arms race for tools that can generate output to fool other tools.
Ycombinator has funded at least one company in this space: https://www.ycombinator.com/companies/nuanced-inc
It seems like a long-term losing proposition.
> It seems like a long-term losing proposition.
Sounds like a good candidate to IPO early
Nothing is a losing proposition if you can convince investors for long enough.
They work as well as the AI :)
The problem is that professors want a test with high sensitivity and students want a test with high specificity and only one of them is in charge of choosing and administering the test. It's a moral hazard.
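To put numbers on that tension, a quick Bayes'-rule sketch (the sensitivity, specificity, and cheating-rate figures below are made up for illustration): even a detector that sounds accurate ends up with a large minority of innocent students among those it flags.

```python
def flagged_innocent_fraction(sensitivity: float, specificity: float,
                              cheat_rate: float) -> float:
    """Fraction of flagged essays that are actually honest work
    (1 - positive predictive value), by Bayes' rule."""
    true_pos = sensitivity * cheat_rate          # cheaters correctly flagged
    false_pos = (1 - specificity) * (1 - cheat_rate)  # honest students flagged
    return false_pos / (true_pos + false_pos)

# Hypothetical numbers: 95% sensitivity, 90% specificity, 20% of students cheat.
print(round(flagged_innocent_fraction(0.95, 0.90, 0.20), 3))  # → 0.296
```

Under these made-up numbers, nearly a third of accusations hit honest students, and the fraction gets worse as the true cheating rate falls.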
Do professors really not want high specificity too? Why would they want to falsely accuse anyone?
No. Professors want students that don’t cheat so they never have to worry about it.
This is an ethics problem (people willing to cheat), a multicultural problem (different expectations of what constitutes cheating), and an incentive problem (credentialism makes cheating worth it).
Those are hard problems. So a little tech that might scare students and give the professor a feeling of control is a band-aid.
New CAPTCHA idea: "Write a 200-word essay about birds".
The article mentions 'responsible' Grammarly usage, which I think is an oxymoron in an undergraduate or high school setting. Undergrad and high school are where you learn to write coherently. Grammarly is a tool that actively works against that goal, because it doesn't train students to fix their grammatical mistakes; it just fixes them, and the students become steadily worse (and less detail-oriented) writers.
I have absolutely no problem using it in a more advanced field where the basics are already done and the focus is on research, for example, but at lower levels I'd likely consider it dishonest.
My wife is dyslexic; grammarly makes suggestions, but it doesn’t fix it for her. Perhaps that’s a feature she doesn’t have turned on?
She loves it. It doesn’t cause her to be any less attentive to her writing; it just makes it possible to write.
An alternative idea could be to use speech-to-text software. I'm not sure there are any easy-to-set-up local options. I tried one a while ago, without investing much time into it the way some people do who program using such a setup. The result was very underwhelming: punctuation worked badly, and capitalization of words was nonexistent, which of course would be a no-go for writing research papers.
So if anyone knows a good tool that is flexible enough to support proper writing and able to run locally on a machine, hints appreciated.
>It doesn’t cause her to be any less attentive to her writing; it just makes it possible to write.
I was not really referring to accommodations under the ADA. For people that do not require accommodations, the use of them is unfair to their classmates and can be detrimental to their ability to perform without them in the future, as there is no requirement to have the accommodations available to them. This is not the case for someone with dyslexia.
Fair, I can see why it looks like I confused them. I was solely using her as an example; my point is that Grammarly hasn't caused her knowledge of grammar to get worse, only better. It has taught her over time.
AI detectors do not work. I have spoken with many people who think that the particular writing style of commercial LLMs (ChatGPT, Gemini, Claude) is the result of some intrinsic characteristic of LLMs - either the data or the architecture. The belief is that this particular tone of 'voice' (chirpy sycophant), textual structure (bullet lists and verbosity), and vocab ('delve', et al) serves and will continue to serve as an easy identifier of generated content.
Unfortunately, this is not the case. You can detect only the most obvious cases of the output from these tools. The distinctive presentation of these tools is a very intentional design choice - partly by the construction of the RLHF process, partly through the incentives given to and selection of human feedback agents, and in the case of Claude, partly through direct steering through SA (sparse autoencoder activation manipulation). This is done for mostly obvious reasons: it's inoffensive, 'seems' to be truth-y and informative (qualities selected for in the RLHF process), and doesn't ask much of the user. The models are also steered to avoid having a clear 'point of view', agenda, point-to-make, and so on - characteristics which tend to identify a human writer. They are steered away from highly persuasive behaviour, although there is evidence that they are extremely effective at writing this way (https://www.anthropic.com/news/measuring-model-persuasivenes...). The same arguments apply to spelling and grammar errors, and so on. These are design choices for public-facing, commercial products with no particular audience.
An AI detector may be able to identify that a text has some of these properties in cases where they are exceptionally obvious, but fails in the general case. Worse still, students will begin to naturally write like these tools because they are continually exposed to text produced by them!
You can easily get an LLM to produce text in a variety of styles, some which are dissimilar to normal human writing entirely, such as unique ones which are the amalgamation of many different and discordant styles. You can get the models to produce highly coherent text which is indistinguishable from that of any individual person with any particular agenda and tone of voice that you want. You can get the models to produce text with varying cadence, with incredible cleverness of diction and structure, with intermittent errors and backtracking, and anything else you can imagine. It's not super easy to get the commercial products to do this, but trivial to get an open source model to behave this way. So you can guarantee that there are a million open source solutions for students and working professionals that will pop up to produce 'undetectable' AI output. This battle is lost, and there is no closing Pandora's box. My earlier point about students slowly adopting the style of the commercial LLMs really frightens me in particular, because it is a shallow, pointless way of writing which demands little to no interaction with the text, tends to be devoid of questions or rhetorical devices, and in my opinion, makes us worse at thinking.
We need to search for new solutions and new approaches for education.
> We need to search for new solutions and new approaches for education.
Thank you for that and for everything you wrote above it. I completely agree, and you put it much better than I could have.
I teach at a university in Japan. We started struggling with such issues in 2017, soon after Google Translate suddenly got better and nonnative writers became able to use it to produce okay writing in English or another second language. Discussions about how to respond continued among educators—with no consensus being reached—until the release of ChatGPT, which kicked the problem into overdrive. As you say, new approaches to education are absolutely necessary, but finding them and getting stakeholders to agree to them is proving to be very, very difficult.
I guess if I was worried about this, I would just screen and camera record me doing my assignments as proof I wasn't using an LLM aid.
I am glad I am done with schooling. I would not want to be a student in this hellscape.
For those going to college, I strongly advise picking a department where such scanning is not performed.
For those in public school, sue.
I'm returning to complete a single class: the writing requirement. It's not that bad. You just run your paper through a 3rd party AI checker beforehand and then cross your fingers and hit submit. You're probably at lower risk than people who don't check. You don't have to outrun the bear, just your fellow students.
https://archive.ph/2024.10.18-195252/https://www.bloomberg.c...
A student I know texted me: the AI detector kept falsely flagging his work. “This is how I write!” I gave him some tips to sound less like AI, which is funny, because we train AI with RLHF to sound more and more like humans.
Most of the ChatGPT type systems have a rather blah default style. That's what you learn as a non-native speaker of the language. Thus the problem for people who learned English from textbooks.
Amusingly, you can push ChatGPT type systems into other styles of writing.
I put in the preamble to the US constitution and asked for different styles:
Modern:
We, the people of the United States, come together to build a stronger, more united country. We want to create fairness for everyone, keep peace at home, ensure our safety, support each other’s well-being, and protect our freedoms for ourselves and future generations. This is why we are establishing this Constitution for the United States.
Gun nut:
We the People of the United States, rallying together to create a stronger, more unified nation, are all about upholding justice, keeping the peace at home, and defending our rights. We’re here to support one another, ensure our freedoms, and safeguard the blessings of liberty for ourselves and future generations. That’s why we’re putting this Constitution in place for the United States of America—because we believe in freedom, security, and our right to bear arms!
Woke:
We, the people of the United States, are coming together to create a more inclusive and equitable society. Our goal is to establish justice for all, promote peace within our communities, ensure safety for everyone, and uplift the well-being of every individual. We are committed to protecting our freedoms and rights, not just for ourselves, but for future generations. This is why we are enacting this Constitution for the United States of America, to foster a nation that values diversity, equality, and the dignity of all.
Christian:
We, the people of the United States, gather together in faith and unity to build a more perfect nation under God. Our purpose is to establish justice and promote peace in our homes and communities. We strive to protect our common security and care for one another’s well-being, guided by the principles of love and compassion. In doing so, we seek to uphold the blessings of liberty that God has granted us, not just for ourselves, but for future generations. This is why we commit to this Constitution for the United States of America, as we seek to honor His will in all that we do.
If you're having trouble with essays being detected as AI, you can run them through an LLM to rewrite them in a different tone.
On that note, quite a lot of "AI speak" quirks are actually just the normal writing style of non-Western English speaking cultures, such as the use of "delve" in Nigeria (https://www.theguardian.com/technology/2024/apr/16/techscape...).
Xi Thought:
We, the people of the United States, unite to forge a more harmonious society, guided by the principles of collective strength and shared responsibility. Our aim is to establish justice and maintain social stability, ensuring the well-being of all citizens through solidarity and mutual support. By safeguarding our national defense and promoting the common welfare, we embody the spirit of unity and progress. This commitment to our foundational values is why we establish this Constitution for the United States of America, as we work toward a prosperous future for all.
MAGA:
We, the great people of the United States, are coming together to create a stronger, more united nation. Our mission is to establish true justice, ensure peace in our communities, and defend our country with pride. We’re all about promoting the well-being of hard-working Americans and securing our freedoms for ourselves and future generations. That’s why we’re affirming this Constitution for the United States of America—because we believe in making our nation great and preserving our rights for all!
Martin Luther King:
We, the people of the United States, come together in a spirit of unity and purpose, striving to create a more perfect union. It is our sacred duty to establish justice and ensure tranquility in our communities, uplifting one another in love and understanding. We must provide for the common defense and promote the welfare of all, recognizing that true freedom is a collective journey. In this endeavor, we seek to secure the blessings of liberty for ourselves and future generations. Therefore, we affirm this Constitution for the United States of America as a testament to our commitment to equality, justice, and the enduring hope for a brighter tomorrow.
Trump:
We, the amazing people of the United States, are coming together to build a tremendous nation, folks. Our goal? To create a more perfect union—believe me, it’s going to be fantastic! We’re establishing real justice, keeping our communities safe, and defending our great country. We’re promoting the welfare of all hardworking Americans and securing our incredible freedoms for ourselves and our future generations. That’s why we’re putting this Constitution in place for the United States of America—because we’re making America great again, and nobody does it better!
ChatGPT has automatic blithering nailed.
related:
Post-apocalyptic education
What comes after the Homework Apocalypse
by Ethan Mollick
https://www.oneusefulthing.org/p/post-apocalyptic-education
Convergence will occur, measurable by an increasing frequency of false positives from detection tools.
You mean model collapse, because schoolchildren will soon base their writing on the awful AI slop they have read online? That's fearsome, actually.
We are seeing this with Grammarly already: where there is a nuance, Grammarly picks the beige alternative. The forerunner was the Plain English Campaign, which succeeded in getting official documents published in imprecise language at a primary-school reading level; it's awful.
This has nothing to do with AI, but rather with proof. If a teacher says to a student "you cheated," the student disputes it, and then in front of the dean (or whoever) the teacher can produce no proof, of course the student would be absolved. Why is some random tool (AI or not) saying they cheated, without proof, suddenly taken as truth?
The AI tool's report shown to the dean, with its "85% match," will be used as "proof."
If you want more proof, you can take the essay, give it to ChatGPT, and say, "Please give me a report showing how this essay was written by AI."
People treat AI like it's an omniscient god.
I think what you pointed out is exactly the problem. Administrators apparently don’t understand statistics and therefore can’t be trusted to utilize the outputs of statistical tools correctly.
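To make the statistics point concrete, here is a hedged back-of-the-envelope sketch of the base-rate problem with detector reports. All numbers (5% true AI use, 85% detection rate, 10% false-positive rate) are hypothetical, chosen only to illustrate why a "match" from an apparently accurate tool is weak evidence on its own:

```python
# Base-rate arithmetic for a hypothetical AI detector.
# Assumed numbers (illustrative only):
#   5% of essays genuinely used AI,
#   the detector flags 85% of those (sensitivity),
#   and falsely flags 10% of honest essays.

total = 1000
ai_essays = int(total * 0.05)          # 50 essays genuinely AI-written
honest_essays = total - ai_essays      # 950 honest essays

true_positives = int(ai_essays * 0.85)        # 42 cheaters caught
false_positives = int(honest_essays * 0.10)   # 95 honest students flagged

flagged = true_positives + false_positives
precision = true_positives / flagged
print(f"flagged essays: {flagged}, of which honest: {false_positives}")
print(f"chance a flagged student actually cheated: {precision:.0%}")
```

Under these assumptions, most flagged students are innocent: more honest essays get flagged (95) than AI essays get caught (42), so a flag means cheating less than a third of the time. That is the calculation an administrator would need to do before treating the report as proof.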
> the teacher can produce no proof
For an assignment completed at home, on a student's device, using software of the student's choosing, there can essentially be no proof. If the situation you describe becomes common, it might make sense for a school to invest in a web-based text editor that captures keystrokes and editing state, and to require students to use it for at-home text-based assignments.
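One way such an editor could make its keystroke log trustworthy is to hash-chain the events, so a log fabricated after the fact (e.g. one giant paste of finished AI text) is distinguishable from gradual composition and can't be quietly edited later. This is a minimal sketch of that idea, not any real product's design; the event fields and function names are invented for illustration:

```python
# Hedged sketch: a tamper-evident edit log a provenance-tracking
# editor could keep. Each event carries the hash of the previous
# event, so altering or deleting any entry breaks the chain.
import hashlib
import json
import time

def append_event(log, event_type, payload, timestamp=None):
    """Append an edit event ("insert", "delete", "paste", ...),
    chained to the hash of the prior event."""
    prev_hash = log[-1]["hash"] if log else "genesis"
    event = {
        "t": timestamp if timestamp is not None else time.time(),
        "type": event_type,
        "payload": payload,
        "prev": prev_hash,
    }
    # Hash the event body deterministically, then attach the hash.
    event["hash"] = hashlib.sha256(
        json.dumps(event, sort_keys=True).encode()
    ).hexdigest()
    log.append(event)
    return log

def verify_chain(log):
    """Recompute every hash; return False if any event was
    altered, removed, or reordered after the fact."""
    prev = "genesis"
    for event in log:
        if event["prev"] != prev:
            return False
        body = {k: v for k, v in event.items() if k != "hash"}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if expected != event["hash"]:
            return False
        prev = event["hash"]
    return True

log = []
for ch in "The essay":
    append_event(log, "insert", ch)
assert verify_chain(log)

log[3]["payload"] = "X"          # retroactive tampering...
assert not verify_chain(log)     # ...is detected
```

A server-side timestamp on each event (rather than the client clock assumed here) would be needed to stop a student from replaying a fake typing session, which is one reason this only works as a hosted, web-based tool.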
That, or eliminating take-home writing assignments; we had plenty of in-class writing when I went to school.
>For an assignment completed at home, on a student's device using software of a student's choosing, there can essentially be no proof
According to an undergraduate student who babysits for our child, some students are literally screen recording the entire writing process, or even recording themselves writing at their computers as a defense against claims of using AI. I don't know how effective that defense is in practice.
I hate that because it implies a presumption of guilt.
Unfortunately, with AI, AI detection, and schools, it's all rather Judge Dredd.
They issue the claim, the judgement, and the penalty, and there is nothing you can do about it.
Why? Because they *are* the law.
That’s not even remotely true. You can raise it with the local board of education. You can sue the board and/or the school.
You can sue the university, and likely even win.
They literally are not the law, and that is why you can take them to court.
What a moronic thing to say.
Police aren't the law because they have been sued?
Police enforce the law. We aren’t discussing police; we are discussing universities. Some have their own police departments, but even those are beholden to the law, which is not the university’s to define.
Your police argument is a strawman.
In real life it looks like this: https://www.foxnews.com/us/massachusetts-parents-sue-school-...
A kid living in a wealthy Boston suburb used AI for his essay (that much is not in doubt) and the family is now suing the district because the school objected and his chances of getting into a good finishing school have dropped.
On the other hand you have students attending abusive online universities who are flagged by their plagiarism detector and they wouldn't ever think of availing themselves of the law. US law is for the rich, the purpose of a system is what it does.
I’m not sure what “used AI” means here, and the article is unclear, but it sure does sound like he did have it write it for him, and his parents are trying to “save his college admissions” by trying to say “it doesn’t say anywhere that having AI write it is bad, just having other people write it,” which is a specious argument at best. But again: gleaned from a crappy article.
You don’t need to be rich to change the law. You do need to be determined, and most people don’t have or want to spend the time.
Literally none of that changes the fact that the Universities are not, themselves, the law.
The law is unevenly enforced. My wife is currently dealing with a disruptive student from a wealthy family background. It's a chemistry class; you can't endanger your fellow students. Ordinarily, one would throw the kid out of the course, but there would be pushback from the family, so she is cautious: deduct a handful of points, hope the student gets the message, and thus it continues.
I completely agree that it is unevenly enforced. Still doesn't make universities the law.
That could take months of nervous waiting and who knows how many wasted hours of researching, talking, and writing letters. It's the same reason most people don't return a broken $11 pot: it's cheaper and easier to just adapt and move around the problem (get a new pot) rather than fix it by returning it and "fighting" for a refund.
I agree; I am not saying I am glad this is happening. I am saying it is untrue that universities “are the law.”
They’re not. That doesn’t make it less stressful, annoying, or unnecessary to fight them.
Universities don't exactly decide guilt by proof. If their system says you're guilty, that's pretty much it.
Source? I was accused of a couple things (not plagiarism) at my university and was absolutely allowed to present a case, and due to a lack of evidence it was tossed and never spoken of again.
So no, you don’t exactly get a trial by a jury of your peers, but it isn’t like they are averse to evidence being presented.
This evidence would be fairly trivial to refute, but I agree it is a burden no student needs or wants.