AI More Reliable than Lie Detectors to Trace NYT Op-Ed

Machines can hunt down authors by their writing styles

As a senior systems engineer with decades of IT experience in the private sector, it’s not often that I professionally recommend looking to academia for a real-world solution to anything. That goes doubly since Rolling Stone called me “the hacker who cared too much” for allegedly outsmarting all the PhDs at Harvard and single-handedly knocking their entire smug university off the Internet after its $2 billion primary pediatric teaching hospital got Justina Pelletier’s diagnosis FUBAR and nearly killed her.

But when it comes to finding out who in the White House, if anyone, wrote the recent anonymous op-ed in the New York Times, the best answer does appear to originate in America’s universities, and it’s not based on polygraph examinations, a.k.a. “lie detector tests.”


The problems with polygraph

First and foremost, “lie detector tests” have never passed scientific muster, and for this reason they are inadmissible in court. Second, polygraph examinations are time-consuming, especially when dozens of subjects must each be tested. They also require the voluntary cooperation of the test subject, and in this case the author has already been encouraged to resign from the White House, as Vice President Pence has indeed asserted. And as Penn and Teller demonstrated on their Showtime series Penn & Teller: Bullshit!, people can easily be trained to fool “lie detectors” anyway.

Thankfully though, modern computer science has provided a better way, one that isn’t burdened by any of these limitations, and that solution is found in how the field is already tackling a related challenge.


The rise of the Internet and the problem of attribution

For college professors, the arrival of the Internet was a double-edged sword. On the one hand, never before had such a diverse cornucopia of information been available to academicians in such an easily searchable way. Largely gone were card catalogs and the Dewey Decimal System, which had dominated information retrieval for decades, and in came LexisNexis and Google.

On the other hand though, the ease of copy/paste soon led plagiarism and other forms of cheating to run rampant. Occasionally, by sheer chance of recognizing the original work, a grader would detect that a student had turned in somebody else's paper as their own. But it was widely accepted that the problem was far worse than had been documented and that most plagiarists were getting away with it.

It was obvious that the higher education system was littered with cracks through which cheaters could easily slip. For example, one professor probably wouldn’t recognize a paper that had previously been turned in to one of his colleagues in the same department, or for that matter at a different university altogether. Thus, the age-old industry of paid paper writing started leveraging electronic communications, allowing the same work-for-hire to be monetized multiple times.

To respond, academia as a whole needed a way to trace a written work across institutions and to know to a high degree of accuracy where and when it had first appeared, if not its original author, even if some parts of the writing had been altered to evade detection.


The solution: Bayesian algorithms, heuristics, and machine learning

Ironically, colleges and universities would have to deploy artificial intelligence in order to combat the dropping level of human intelligence in their own student bodies.

Bayesian algorithms, the same type that detect spam emails in your inbox, were useful, but could only go so far. Thankfully though, heuristics and machine learning were starting to come of age. What all of these have in common is their ability to detect patterns.

And what writers consider to be idiosyncrasies can be quantified and then detected by machines, just as human beings might learn to recognize the unique styles of Twain or Hemingway or Thoreau.

But unlike humans, machines have perfect memories, and in fractions of a second computers can compare a writing sample to the complete works of every published author who ever lived. Also, unlike humans, a computer’s pattern matching gets more accurate as its reference library grows larger.
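To see how a writer’s idiosyncrasies can be quantified, here is a minimal sketch in Python. The feature set (rates of common “function words,” which authors tend to use at stable, personal frequencies) and the cosine-similarity comparison are illustrative assumptions on my part, not the method of any particular commercial or academic detector.

```python
from collections import Counter
import math
import re

# A handful of "function words" whose usage rates tend to vary by author.
# This particular list is illustrative, not taken from any real system.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is",
                  "was", "for", "it", "with", "as", "but", "on"]

def style_vector(text):
    """Quantify a writing sample as function-word rates per 1,000 words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)
    return [1000.0 * counts[w] / total for w in FUNCTION_WORDS]

def cosine_similarity(u, v):
    """Compare two style vectors; 1.0 means identical proportions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

In practice one would compute a vector for each known author’s writing and one for the questioned document, then treat the closest match as the statistical lead; real systems use far richer features, but the principle of turning style into numbers is the same.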

According to Wikipedia, in order to prevail on Jeopardy! and thoroughly best 74-time human champion Ken Jennings, IBM’s Watson, which is probably the world’s most widely known machine learning system, needed a collection comprising “millions of documents, including dictionaries, encyclopedias, and other reference material.”

So all that the White House needs to do now in order to find the person who wrote the NYT op-ed is prime one of the far more primitive plagiarism-detection systems used in academia with the emails and other writings of White House staff, as well as with similar compositions by NYT reporters and other suspects, and then feed in the text of the op-ed.
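That prime-and-feed workflow amounts to ranking candidates by how well their known writing predicts the questioned text. A toy sketch of one Bayesian way to do that follows; the word-frequency model with add-one smoothing is my own simplified assumption, in the spirit of the spam filters mentioned above, not the algorithm of any real plagiarism detector.

```python
from collections import Counter
import math
import re

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def rank_candidates(known_writings, questioned_text):
    """Score each candidate by the log-likelihood of the questioned text
    under that candidate's word-frequency model (naive Bayes with
    add-one smoothing). Higher score means a closer stylistic match."""
    q_words = tokenize(questioned_text)
    vocab = set(q_words)
    for text in known_writings.values():
        vocab.update(tokenize(text))
    scores = {}
    for name, text in known_writings.items():
        counts = Counter(tokenize(text))
        total = sum(counts.values()) + len(vocab)
        scores[name] = sum(
            math.log((counts[w] + 1) / total) for w in q_words
        )
    # Best statistical match first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Here `known_writings` would map each suspect to their collected emails and memos, and `questioned_text` would be the op-ed; the top-ranked name is a lead for further questioning, not courtroom proof.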

Then the administration can question the likely author to find out whether the piece was submitted to the paper as their own work or whether, ironically, they ghostwrote it for someone of higher stature whom the New York Times found more compelling.

Thus, whoever sent that op-ed to the Times made a big miscalculation if they thought they were just going up against the President. Maybe they would have thought twice if they had bothered to look into how they could be tracked. And perhaps the Times should think about whether an op-ed attributed to “a senior official in the Trump administration” could ever remain anonymous in the 21st century.

The author, Marty Gottesfeld, is an Obama-era political prisoner. To learn more about his case and/or support him go to