Classified documents are obscuring America's past, Matthew Connelly says. Historian Matthew Connelly says government records are marked as classified three times every second — and many of them will never be declassified. His new book is The Declassification Engine.

Is the U.S. government designating too many documents as 'classified'?

DAVE DAVIES, HOST:

This is FRESH AIR. I'm Dave Davies, in today for Terry Gross. We've heard a lot lately about presidential mishandling of classified documents, and no doubt we'll hear more in the coming months. Our guest today, Matthew Connelly, says one reason confidential records get where they shouldn't is that there are way, way too many of them, and that has some serious consequences. Connelly has studied the subject for years, and he says nowadays the U.S. government classifies so many records that it's become almost impossible to preserve them all, much less review and declassify them for future study by researchers and historians. He writes that there are enough government files locked up in record centers across the country to fill 26 Washington Monuments.

Connelly says records are often classified simply to protect the reputations of officials involved. His new book is a history of the cynical use of government secrecy and an account of his own effort to address the problem. He and some data scientists have used artificial intelligence and machine learning to develop a system to analyze huge troves of records to determine which should be truly classified and which can be made public. He'll tell us in a bit what happened when he tried to interest government officials in that idea. Matthew Connelly is a professor of international and global history at Columbia University and principal investigator at History Lab, a project that's been funded by the National Science Foundation but now gets support from the National Endowment for the Humanities and others. His new book is "The Declassification Engine: What History Reveals About America's Top Secrets."

Matthew Connelly, welcome to FRESH AIR.

MATTHEW CONNELLY: Thank you, Dave - happy to be here.

DAVIES: Give us a sense of how many classified records we have in the United States and how it's grown in recent years.

CONNELLY: Well, Dave, you know, the real answer is no one really knows. One way of measuring this is, you know, how often government officials decide to classify something. And that is a number - in fact, it's a number that was published every year by the government. When I began my work back in 2012, it was 95 million times. So three times every second, some government official somewhere decided to classify something. But more recently, the government has given up on even trying to estimate how many times officials are classifying information. And, of course, you know, they're not just stamping documents as secret or top secret. More and more of what's classified are things like PowerPoint presentations and spreadsheets and text messages and, you know, video conferences and whatnot. So the sheer volume is something we can't even measure anymore in paper, as much as we might like to. More and more, you know, what it is that we don't know is what's stored in the cloud or, in some cases, deleted and just destroyed completely so that no one will ever know.

DAVIES: Wow. Now, I assume part of the reason for all this is that digital technology simply produces so many records that never existed, you know, three decades ago - emails, text messages, PowerPoint presentations and the like.

CONNELLY: Yeah, that's right. I mean, classified information, in that way, is like other information, right? And we know that information of all kinds has been growing exponentially. But what's different about secret information produced by the government is that nobody's ever going to be allowed to see any of this unless some other official decides, you know, that information is safe to release to the public. And the method for reviewing these records and releasing them to the public hasn't changed. It hasn't changed in 80 years. And so even now, even though we're living in this age of big data, it's still a requirement that officials review every one of these pages, page by page. And in some cases, you'll have multiple officials going through the same records - in some cases, redacting different things. And the whole process is incredibly laborious, and there's almost no effort to try to apply technology to try to accelerate it.

DAVIES: You know, I've worked as a reporter for a lot of years. And I did, you know, records requests from local governments and state governments. And, you know, I would get - it might take weeks or months or days. When I did a federal Freedom of Information request, it would sometimes take years. I guess that's what's going on here.

CONNELLY: That's right. I submitted myself - I submitted FOIAs to try to get records from presidential libraries, several of them. That was several years ago. And in all this time, I haven't received one page.

DAVIES: Do you know that, for a fact, that more people in the government have the ability to classify material than was true, I don't know, 10 or 20 years ago? Or is it that the same number of people have the ability to mark stuff as classified, but they're using it more broadly?

CONNELLY: You know, that is a - that determination, you know, as to who has the power to classify information, that's one of the ways in which different presidents have tried to retain control of secrecy - or regain control, I should say. You know, because even presidents, you know, worry that there are too many officials out there who have this power, you know, to create new secrets to the point where even the White House may not know, you know, what it is that's out there - right? - these unknown special access programs and so on. And so one way they do that is they will limit the number of officials who have what's called original classification authority. And the number is actually relatively low. It's, you know, some two to three thousand officials, many of them appointed by the president. And in this way, presidents, you know, try to decide, you know, who it is who's allowed to create secrets on their behalf. And they try to keep that power from the, you know, parts of the government that might want to classify secrets for their own reasons.

DAVIES: You know, I guess one way of getting a sense of whether material is being classified that really doesn't need to be, that really doesn't provide confidential information, is to look at a huge batch of classified records. And, of course, that's not easily done. But sometimes they get leaked, and I'm thinking of when Chelsea Manning leaked about a quarter-million State Department cables. How sensitive was the information in that material? Do we know?

CONNELLY: Well, some of it was incredibly sensitive. I mean, there were the names of people who were confidential informants for American diplomats. You know, there were people who, for example, spoke to foreign service officers about human rights abuses in China. There were people who spoke out about things that were happening in their own country in the Middle East or what have you. And, you know, in cases like that, some of these people had to flee for their lives, I mean, quite literally.

But in fact, you know, if you remember that time when, you know, WikiLeaks dumped, you know, some quarter of a million of these diplomatic cables, for a few weeks, there were many stories published in the media, dozens of them. Some of them, you know, were quite entertaining. You know, we definitely learned some things about American diplomacy that we didn't know previously. But for the most part, what we found was that the secrets contained in those documents were the secrets of other governments, right? And some of it was embarrassing, right? We found out, for example, how, you know, the Gulf states were trying to encourage the United States to go to war with Iran. You know, we found out, too, you know, how does - that China was trying to hack Google and so on.

But for all the hype, you know, about how it is that these documents were going to reveal, you know, high crimes committed by the U.S. government, for the most part, what we found was rather banal, right? I mean, a lot of what's secret is actually not that secret. And so, you know, when, for example, stories were published in The New York Times and other places, in many cases, the same information was already available, only now they're able to quote from these once-secret documents. For the most part, it only confirmed, you know, that old joke. You know, a lot of secret intelligence is actually not that secret, and what is secret is, in some cases, not very intelligent.

DAVIES: I want to get to that in a minute. What is the criteria by which documents are supposed to be classified, as opposed to made available to the public?

CONNELLY: So documents are not supposed to be classified as national security information unless it poses a threat to American national security. And over the years, you know, practically every president has issued an executive order where they try to define, you know, what it means for something to be top secret or secret or confidential or what have you. Now, the problem is, of course, you know, is that there are two or three thousand officials who have the power, you know, to classify, you know, something new - a new program, a new technology. They have the power to decide that everything related to that program is going to be classified at a certain level. Let's say it's top secret. But once they've done that, you know, every official who's involved - and there are literally millions of people who have security clearances - every one of them then can and, in fact, is required to stamp anything related to that as being classified at the same level. And so what happens is, you know - maybe to start with, you know, some of this stuff is sensitive. But over time, you know, so much information starts to be classified that it can be ludicrous.

And some good examples of that include when - if you recall during the - you know, the so-called scandal over Hillary Clinton's emails, a lot of what ended up getting classified was, for example, newspaper reporting, you know, about drone strikes, right? So even information that's, you know, out there in the public, even things, you know, in some cases, that were headline news at the time, once they're exchanged among senior officials, then even that information ends up getting classified.

DAVIES: Why would a newspaper article be designated as classified?

CONNELLY: Now, the reason given is that it's one thing, you know, for, say, The New York Times to disclose that the United States is operating drones over Pakistan - right? - that are killing Pakistani citizens. You know, but if it appears - if only because an official, you know, is exchanging a report about this program with some other official, if they do so in a way that's then subsequently disclosed in public, it can appear to be an acknowledgement - right? - that there's - the U.S. government has knowledge of this program, in fact, that the U.S. government is conducting this program to the point where, you know, the government of Pakistan might then be obliged to respond - right? - and answer questions as to why this is happening, why they're allowing this to happen on their own soil.

And so these are examples where, you know, the information is not actually secret, but what they try to conceal is plausible deniability. You know, it's the idea, as implausible as it may seem, that if officials are not acknowledging drone strikes, that they're killing Pakistani citizens, then the Pakistani government doesn't have to answer for it. It helps to imagine yourself in the position of these officials - right? - the ones who are making these kinds of decisions. And the people who are doing this typically are lifers - right? - as opposed to, say, the political appointees, you know, who might be on the National Security Council for a few years. In most cases, the people who are deciding, you know, whether the information is going to be reviewed, whether it's going to be redacted and released, these are lifers, people who devoted, in some cases, decades of their lives to serving in the national security establishment, right?

So they've gone through this long process where they've gotten their security clearance. They've been read into different programs. And they've dedicated their lives, you know, decades, you know, to working within this community. And this community, in some cases, resembles a cult, right? It has its code words. It has its initiation rituals.

And so within this world, you find people who, in fact, in some cases, don't really know much about the world outside that world. And they get to be very jealous of their prerogatives. And they are very protective of the things that they decide are national security information to the point where they might even resent a little - in some cases, a lot - the people who they would view as interlopers, these political appointees who come and go and, from their point of view, don't always honor and respect, you know, the things that they think are secret and have to remain so.

DAVIES: We need to take a break here. Let me reintroduce you. We're speaking with Matthew Connelly. He's a professor of international and global history at Columbia University. His book is "The Declassification Engine: What History Reveals About America's Top Secrets." We'll continue our conversation in just a moment. This is FRESH AIR.

(SOUNDBITE OF AHMAD JAMAL'S "THE LINE")

DAVIES: This is FRESH AIR, and we're speaking with Matthew Connelly. He's a professor of international and global history at Columbia University. His new book is about the overclassification of records by the United States government. It's called "The Declassification Engine: What History Reveals About America's Top Secrets."

You write that all of the post-World War II presidents have promised openness and accountability, but have increasingly used secrecy, classifying more and more material. And you write that Barack Obama really outdid his predecessors in this. In what way?

CONNELLY: Well, in the sense that, you know, to the extent we can actually measure, like, how many times government officials create new secrets, that reached its all-time peak under the Obama administration, right? And there are other ways you could look at this as well. Under the Obama administration, the Justice Department indicted more people, you know, under the Espionage Act for disclosing classified information than all previous administrations combined. And in fact, the amount of spending on government secrecy, it reached new heights - right? - to the point where by the end of his administration, the U.S. government was spending some $18 billion on protecting national security information.

DAVIES: What does that figure come from? What does that mean? Is it guards at the Archives? What is it (laughter)?

CONNELLY: Oh, it means - it can mean a lot of things. It can mean the kinds of physical protections, you know, that you see in spy movies - right? - those retinal scanners - right? - the barbed wire fencing. It can also mean, like, training programs - right? - the way that they have to train new employees and how to recognize and protect secrets. It could also mean, you know - and more and more, this is what it actually means. It can mean all the systems in the software to try to prevent, you know, the Edward Snowdens of this world from releasing some of these secrets.

DAVIES: What about Donald Trump? Was he different from his predecessors?

CONNELLY: Well, you know, here we have another example. Here's a president who said that when he became president, he was going to, for example, release all the secrets that have been held up to that point about the Kennedy assassination. But somehow, in the end, he decided not to, right? And, in fact, you know, by the end of his administration, he was saying that, you know, everything he touched - you know, everything was classified at the highest level.

As much as Donald Trump, you know, seemed to represent a dramatic, you know, departure from traditions of the American presidency, in one way, he was entirely consistent with every president who came before him. And that's because, you know, he decided that he didn't even need to write a new executive order for secrecy. Every president going all the way back to FDR had written an executive order that was supposed to define and control national security information. But Donald Trump decided he was happy with the executive order written by Barack Obama. And so he just kept everything in place.

DAVIES: When he said that the material that he had at Mar-a-Lago he had declassified personally just by, I guess, declaring it declassified - I don't think his lawyers have argued that in court. But he has said that. Is that possibly true? Can a president do that?

CONNELLY: He could have done it if he wanted to. I mean, presidents, you know, do have this sovereign power over secrecy. But there's no evidence that he actually did. I mean, he suggested even that he could have done it in his own mind. But actually, you know, this system is incredibly complex, and it's - above all, it's about paper, right? So every decision like this about whether something gets declassified or how it's classified - all this has to be recorded in paper, and so there's no evidence that Trump declassified any of these records.

DAVIES: You know, as an example of what it's like to declassify a particular batch of records, you cite Hillary Clinton's emails, and that actually began with a Freedom of Information request from a reporter. Is that right? Do I have this right?

CONNELLY: Yeah. That's right. This was something that began because a reporter filed a FOIA and in this case, at least, a judge decided that, in fact, the government had the responsibility to review these records to decide what could be released. Now, until then, you know, as we know, Hillary Clinton was treating these as personal records, but once those records were disclosed and everyone began to realize that these were actually public records and they belonged to the people, then the government was legally obliged to review them, and it was then that a whole - you know, it was - apparently about 50 different people began going through these records page by page and deciding what could be released and what had to be redacted.

DAVIES: And what did they find? I mean, how long did it take to go through how many emails?

CONNELLY: You know, they're - are still releasing some of those emails. I mean, if you go back and check every now and then, a few more have gotten out, but the bulk of them, it took about eight months. And if you think about 50 different people having to go through these records, there are some 30,000 of these emails, and having to read them, you know, page by page, it may strike some of us as, you know, an inordinately long time. But the fact is those 50 different people represented many different departments and agencies. And many times, you know, the CIA will consider something as classified, even top secret, when, say, the State Department decides that, in fact, this is already public knowledge. And so oftentimes, you know, much of that time, these officials are arguing with one another, and they're quibbling over what has to be redacted and what has to be withheld.

DAVIES: And just so that we're fair about this, in the end, how serious a breach, if any, was it that Hillary Clinton had these emails on her private server?

CONNELLY: So some of those records, you know, were classified at the highest level. You know, some of them involve what's called sensitive compartmentalized information, right? That might have been information related to signals intelligence. It will likely be a long time before we know, you know, what actually was in there. And certainly, you know, there's good reason to believe that some of what was in there was actually pretty innocuous, and it was being withheld, you know, largely for political reasons, diplomatic reasons, in the sense that these were things that the U.S. government didn't want to disclose officially even if they were already known or largely known to the public.

What bothers me more, you know, as a historian, even as a citizen, is how many of those emails were destroyed, because before Clinton, you know, handed over those 30,000 or so emails that she decided were public records, there were tens of thousands of other records that she and her lawyers decided were private. They deleted them so that no one will ever know, you know, what it is that she decided no one else was going to be allowed to see.

DAVIES: It was recently reported that President Biden - his lawyers discovered records - some classified records of his. What's your take on that?

CONNELLY: Yeah. I think you could look at it two ways. You know, one way of looking at it is that this is just more evidence about how state secrecy is out of control, right? I mean, they just can't keep track of all the secrets that they're generating - right? - because there are just too many of them. And so, you know, even when - you know, if you credit Joseph Biden and the people around him and you think that they were responsible, then you still have to ask yourself, you know, how is it that they, you know, lost track of records that apparently, at least in some cases, were classified as top secret?

But the other way to look at this is how it is that, you know, time and again - here's another example. You see that the people - whether it was the president or in this case, the vice president, the people who are most invested in state secrecy, the ones who are most determined, you know, to cling to these secrets - it turns out these are the people, you know, who have the power over these secrets because it's, you know, above all, the president and in this case, the vice president, who don't want to give up that secrecy. And, you know, in the case of Trump and Mar-a-Lago, I think that was, in a way, like, the reductio ad absurdum because in this case, you literally, you know, had a president clinging to secret documents, refusing to give them up, right? But if his addiction to that power, you know, that powerful feeling of having control over secrets - right? - and not having to reveal them to anyone and being able to keep them all to himself - if Trump is, like, an extreme case, then it could be that Biden had a somewhat more mild affliction.

DAVIES: And I guess in a discussion of Trump and his treatment of records, you have to note the number of cases where it's been reported that he, you know, tore some documents up into tiny pieces or, in some cases, flushed them down the toilet.

CONNELLY: Yeah. And, I mean, to me, it still boggles the mind. I mean, this is the president who apparently made it a practice to tear up the papers after he was done working on them. And there are actually two people who work full-time in the White House just trying to tape these papers back together, right? Right? So, to me, this is the ultimate example - right? - where, you know, the American people, we would think - right? - that in electing our presidents, that the least that they can do is keep a record of their actions - right? - because after all, if they don't leave a record, how in the end can we keep them - how in the end can we hold them responsible, right? If we - and now I'm talking about historians - you know, if even historians, even decades after the fact, if we don't have the documents to reconstruct the decisions made by powerful people, then in the end of the day, you know, even the court of history is closed, right? In the end, these people, ultimately, are accountable to no one.

DAVIES: Let me reintroduce you. We're going to take another break here. We are speaking with Matthew Connelly. He's a professor of international and global history at Columbia University. His book is "The Declassification Engine: What History Reveals About America's Top Secrets." He'll be back to talk more after this short break. I'm Dave Davies, and this is FRESH AIR.

(SOUNDBITE OF JOHNNY KLIMEK AND TOM TYKWER'S "HETZJAGD")

DAVIES: This is FRESH AIR. I am Dave Davies, in for Terry Gross. We are speaking with Columbia University professor of international and global history Matthew Connelly. His new book is about the increasing trend of public officials to designate government records as classified often for cynical reasons. He says there are now more classified records than the government can realistically store and review for declassification, effectively hiding historical details from future researchers and denying the public knowledge of their leaders' actions. Connelly and some data scientists have crafted a method using artificial intelligence to try and address the problem. His book is "The Declassification Engine: What History Reveals About America's Top Secrets."

You know, I think we'd all agree that, you know, there are some details that need to be kept from the public. But, you know, your book kind of looks at the history of this and cites some truly appalling examples of leaders keeping records hidden for really cynical reasons. Do you want to give us an example?

CONNELLY: Oh, where to begin, right? I think that we don't need the details of every covert operation. You know, we don't need to understand, like, the minutiae when it comes to NSA exploits and so on. But we should know something - right? - about the fact of NSA surveillance. We need to know when our government goes to war. We need to know about that. And in some cases, like, the kinds of things that are kept from the public, they're more than just details. I mean, there are famous examples. And in some cases, like, in my book, we find ways in which the full dimensions of these things were unsuspected, right? And now, you know, by reconstructing it, using data science, we can begin to learn more. But to me, like, what's revealing is how, you know, even some very familiar stories, even the story of Pearl Harbor, for example, we can learn things not just about what happened, but what it is the government didn't want us to know about them.

DAVIES: You describe the crash of an Air Force bomber that had, I think, three civilians on it, who were killed. And the families wanted information about this. You want to describe this?

CONNELLY: Yeah, absolutely. This is an example where the Air Force was conducting research in electronics. They had an American bomber that was fitted out and equipped with RCA technicians - right? - who were onboard monitoring the equipment. They ran into trouble. And eventually, the plane crashed, and the crew - most of them were able to bail out. But these technicians died in the wreckage.

And afterwards, their widows tried to find out what happened. They tried to get the Air Force, you know, to release the accident report. And the secretary of the Air Force at the time insisted that this was national security information that would endanger the government's security if it was released to them. And in the end, the Supreme Court agreed with the Air Force, and they set an important precedent where for decades thereafter, judges will not even examine records that government officials claim represent national security information.

And this is a great example, you know, how it is the judiciary has deferred to the executive branch. They've allowed, you know, the White House and the Pentagon to decide what's national security information. And they won't even second-guess, you know, what it is that they've claimed is information that would endanger national security.

In this case, when the report was finally released, it was found that, in fact, there really was no national security information. But there was a lot of information in that report of the negligence of the Air Force, you know, the fact that these technicians had never been briefed about emergency procedures, you know, how it is that, you know, the plane itself was not airworthy. And this was the secret, in fact, that the Air Force was trying to keep from the public. And in this case, it led them to defraud these widows.

DAVIES: You know, you just described a case of an abuse which eventually became public. And I guess what that says is that, you know, in the '50s and '60s and '70s, a lot of classified records were retained and some of them eventually declassified and revealed really embarrassing stuff. We're now in a position where there are so many records that they - that it seems they may never be declassified. Are records being destroyed as a result of this?

CONNELLY: Oh, absolutely. My favorite example is the Joint Chiefs of Staff. So back in 1971, the Joint Chiefs of Staff began to realize that some of the records that they had preserved over the years could eventually be released because of the Freedom of Information Act. And it was then that they decided that they were going to destroy all of the records of all their meetings going back decades. And henceforth, the Joint Chiefs would never again, you know, keep records of their meetings, right? So just imagine, you know, it's as if the Pentagon was being run like a numbers racket, and the people at the top decided they would commit nothing to paper. So that's an example, you know, how it is - in some cases, it's not just about destroying records. It's also choosing not to create records lest those records have to be released.

DAVIES: Wow. I mean, how can you run a board as important as the Joint Chiefs of Staff without some records?

CONNELLY: You would think. And the few records that have been released - like, for example, the most infamous may be Operation Northwoods. I mean, this was an example where the Pentagon proposed to the Kennedy White House that they carry out attacks on American citizens, that they stage bombings in American cities, that they sink refugee boats - all this so that they could create cause for going to war with Cuba, right? So if this is the kind of information that did trickle out, you just have to ask yourself, how much more is there that was destroyed?

DAVIES: So there's a real loss of accountability when important records are never declassified or destroyed. I mean, the other thing - and you've made this point - is that it makes it harder for historians to mine primary documents of recent decades and get, you know, those insightful details about what really happened behind closed doors. Is there evidence that historical research has been affected by this?

CONNELLY: Oh, absolutely. You know, when I started out, when I was a graduate student in the 1990s, I and a lot of my colleagues, we were doing research about the 1950s because typically, you know, back then anyway, it took about 30 years or so before a lot of these records about American wars and diplomacy would be released. And over the years, you know, when, 10 years later, you know, people began to look back, they began working on the 1960s, right? And then, you would expect a decade later, people would start working on the 1970s. But that's not happening now.

There are actually fewer records available, you know, for studying the history of the 1970s than we have for the 1940s. If we take it up to the present - you know, if historians now are working on the history as recent as the history we were working on 30 years ago, you'd expect that we'd have new revelations all the time about the history of, say, the end of the Cold War - right? - the 1990s. This history is incredibly relevant - right? - when you think about the way Putin has accused the U.S. and others of reneging on our commitments - right? - when we decided to expand NATO into Eastern Europe and tried to expand it even to Ukraine. This history is super-relevant, too, when we think about, you know, the history of the 1990s and humanitarian emergencies and all the rest. But we can't actually study that history because we just don't have the records. The review and release of records, whether we're talking about CIA documents or State Department volumes, has almost ground to a halt.

DAVIES: So I want to talk about this project, which you and some data scientists have engaged in, that's an effort to address this problem. Let's just walk through this. I mean, you and these data scientists from Columbia have assembled this huge store of declassified government documents. I mean, roughly, how many, what kind of documents?

CONNELLY: Well, we're closing in on 5 million. And we're also, you know, beginning to bring in records from international organizations and alliances like NATO, the United Nations, the World Bank and so on.

DAVIES: And they've been declassified. Does that mean they have redactions, you know, those big, heavy, blotted lines where names and parts are redacted?

CONNELLY: Yeah, absolutely. In fact, you know, for me, that's the fun part because in many cases, what you find is duplicate versions of the same document. You know, you might have a document that was found in a presidential library. But you'll find one version that was reviewed and released by, say, the CIA. And you find another version that was reviewed and released by the Pentagon. And different reviewers redacted different things. And so you could begin - by amassing them, aggregating them, you could begin to make out patterns and anomalies and the kinds of things the government still doesn't want us to know.
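
(EDITOR'S ILLUSTRATION: The comparison of duplicate versions described above can be sketched in a few lines of Python. This is a minimal sketch, not History Lab's actual method. It assumes each blacked-out span has already been transcribed as the placeholder `[REDACTED]`, and the two sample sentences and the function name `revealed_by_other` are invented for the example.)

```python
import difflib

REDACTION = "[REDACTED]"  # assumed placeholder for a blacked-out span


def revealed_by_other(redacted: str, other: str) -> list[str]:
    """Return text visible in `other` where `redacted` has a redaction mark."""
    # Give each version its own sentinel so two redaction marks never
    # align with each other; only genuinely shared words can match.
    a = ["\x00A" if w == REDACTION else w for w in redacted.split()]
    b = ["\x00B" if w == REDACTION else w for w in other.split()]
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    revealed = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" span that contains a redaction in the first version
        # lines up with whatever the second version shows in its place.
        if tag == "replace" and "\x00A" in a[i1:i2]:
            text = " ".join(w for w in b[j1:j2] if w != "\x00B")
            if text:
                revealed.append(text)
    return revealed


# Invented sample text: two reviews of the same sentence.
version_one = "The source met [REDACTED] in Havana on 3 May"
version_two = "The source met the attache in [REDACTED] on 3 May"

print(revealed_by_other(version_one, version_two))  # ['the attache']
print(revealed_by_other(version_two, version_one))  # ['Havana']
```

(Run over many document pairs, counts of what each reviewer hides and the other reveals would give the kind of patterns and anomalies mentioned above. Real scans would also need text extraction and fuzzier alignment than word-for-word matching.)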

DAVIES: Right. Now, I can imagine curling up by the fire with two versions of the same document and having some fun with that. I mean, with 5 million documents, you're not going to do it that way. You use artificial intelligence, machine learning, I mean, you - to develop algorithms to analyze these documents. I mean, these are words - I don't really get what all this means. But how do you analyze it? And what do you hope to discover?

CONNELLY: So one thing I love about this is that, you know, as a historian, typically, when I go to an archive, I have to know what I'm looking for, right? And, you know, it could be that I'm interested in, say, Pentagon war plans, you know, or covert operations or what have you. But, you know, let's say, instead, I'm approaching not a physical archive, right? But I'm approaching, in this case, a database. It has millions of documents. And maybe I don't necessarily know what I'm looking for because what I really want to know is, what information does the government consider most sensitive, right? I don't know the answer to that.

But by using algorithms, you know, I can, in effect, you know, make out the patterns and the practices of reviewing and releasing materials. And I can figure out what it is the government is least likely to reveal. And I can also find out things like, what are they most likely to redact? And the fun part, again, for me is, I don't know what the answer is when I start out. I may have my suspicions, but oftentimes I'm surprised.

DAVIES: And so by use of all these algorithms, going through all these millions of documents, are you able to develop hypotheses or conclusions about what should not have been classified?

CONNELLY: Absolutely, Dave. So we ran an experiment where we wanted to see whether you could train an algorithm to predict the classification of one of these diplomatic cables, right? And when you do this kind of research, it's, in a way, a little bit like, say, Amazon - right? - when Amazon wants to predict, you know, what it is that you might want to purchase, or Netflix wants to predict what kind of movie you might like to see. And the way you do this is by using a lot of training data, right? So, you know, Amazon - they know all the things you purchase. And Netflix knows all the movies that you've watched.

And in this case, what we know is we know all of those millions of cables that have been classified at different levels and what sort of topics, you know, what sort of, you know, other information we can find - like who's sending them, who's receiving them, what kind of language did we find in these communications? And by training algorithms to do this, you can actually get pretty good accuracy. Sometimes - you know, 90% of the time, you can predict beforehand whether a cable will be classified. And you'll be able to determine, you know, whether it was classified as secret or unclassified.

But to me, what's, in some cases, even more interesting is you can find those cases where the algorithm is wrong. And so we found hundreds of examples, you know, of things that were completely innocuous, you know, like hotel reservations - right? - that were classified at the time as secret. On the other hand, you know, we found things that the algorithm predicted should have been classified as secret but weren't. And there we found things, you know, that - confidential informants, you know, told CIA case officers, you know, about, you know, criminal activity that was ongoing at the time and, in some cases, leading to diplomatic crises. You know, we found things that almost certainly should have been classified.

Now, it wasn't just me. I actually, you know, had the good fortune to have colleagues who, you know, had security clearances - or at least did in the past - and had a lot of experience in this area. And I gave them, you know, a blind test, right? I asked them to look at those documents that were classified and those documents that weren't classified. And I asked them to tell us, you know, which of those records should have been classified and at what level. And time and again, you know, they were astonished, you know? They were more likely to agree with the algorithm than to agree with the officials who originally classified this information or didn't classify it.

So the conclusion I took away from this is that there is a lot of human error in official secrecy, right? There are many examples. We found many hundreds of examples where officials either were over-classifying things or examples where they were under-classifying things, things that could actually have been potentially harmful for national security.
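
(EDITOR'S ILLUSTRATION: The experiment described above, training on cables with known classification levels and then examining where the prediction disagrees with the official marking, might look roughly like the following. The interview does not say which algorithm was used, so this sketch swaps in a simple word-count naive Bayes classifier as an assumption. The cable texts, labels and function names are all invented for the example.)

```python
import math
from collections import Counter, defaultdict


def train(cables):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in cables:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def predict(model, text):
    """Pick the label with the highest naive Bayes log-probability."""
    word_counts, label_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        # Add-one smoothing so unseen words do not zero out a label.
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for w in text.lower().split():
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def disagreements(model, cables):
    """Return (text, official_label, predicted_label) where the two differ."""
    flagged = []
    for text, label in cables:
        guess = predict(model, text)
        if guess != label:
            flagged.append((text, label, guess))
    return flagged


# Invented toy data standing in for millions of real cables.
training = [
    ("informant reports arms shipment to rebels", "secret"),
    ("source reports covert meeting with minister", "secret"),
    ("informant warns of planned coup", "secret"),
    ("hotel reservation for visiting delegation", "unclassified"),
    ("flight schedule and hotel booking confirmed", "unclassified"),
    ("press release on cultural exchange", "unclassified"),
]
to_review = [
    ("hotel reservation confirmed for delegation", "secret"),
    ("informant reports covert arms shipment", "unclassified"),
    ("press release on delegation visit", "unclassified"),
]

model = train(training)
for text, official, predicted in disagreements(model, to_review):
    print(f"official={official:<12} predicted={predicted:<12} {text}")
```

(In this toy run the first two review items are flagged: a hotel reservation marked secret and an informant report left unclassified. Those are the two kinds of human error described above. A flagged item is only a candidate for a second look by a person, as in the blind test, and not a verdict.)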

DAVIES: Let me reintroduce you. We'll take another break here. We are speaking with Matthew Connelly. He's a professor of international and global history at Columbia University. His new book is "The Declassification Engine: What History Reveals About America's Top Secrets." We'll talk more after a moment. This is FRESH AIR.

(SOUNDBITE OF CUONG VU AND PAT METHENY'S "SEEDS OF DOUBT")

DAVIES: This is FRESH AIR. And we're speaking with Matthew Connelly. He's a professor of international and global history at Columbia University. His new book deals with the overclassification of documents in the United States, leading to the prospect that many will never be declassified. And some will be destroyed. His new book is called "The Declassification Engine: What History Reveals About America's Top Secrets."

So if you took this method of developing algorithms with machine learning to analyze, you know, hundreds of thousands, maybe millions of government documents, would this help the government make more expeditious and wise decisions about declassifying material? Could it help solve the problem of having way, way too much classified material?

CONNELLY: Oh, absolutely. For more than 10 years now, you can find government officials, government committees and commissions all coming to the same conclusion. They would say that at least part of the reason why we are seeing such a crisis in official secrecy where we can't even protect the information that really ought to be protected - it's because of the way in which, you know, with computers, we're generating more information all the time, including more and more classified information. And so part of the solution has to be technology. We have to devise technology that's going to allow us to prioritize, you know, that information that really does require protection and then accelerate the release of everything else. Now, the problem is everybody agrees - right? - that this research should happen and the government should put resources behind it. But after 10 years, it's still not happening. The government is still not conducting research in this area. All the money is still going to finding ways, you know, to create even more secrets and finding new ways of protecting them.

DAVIES: You know, I have to say, one of the more entertaining parts of the book is when you describe how you and some of these data scientists you have go to Washington with this method of analyzing documents through these algorithms. And you meet with several different federal government agencies to make this pitch that, you know, we can help. We can help with this problem here. You want to just kind of summarize the responses you got?

CONNELLY: Yeah. So my colleagues and I - we went to Washington, and we had a whole series of meetings. We went to the National Archives. We had a meeting with officials from the CIA, from the office of the Director of National Intelligence. We talked to people from the State Department. And over and over again, they all said the same thing. They said, this is great. We love this technology. We absolutely need to develop this and begin applying it. But unfortunately, we just don't have the funding. We don't have the budget to pay for it. And every time, what they said is, really, what you ought to do is talk to the people at IARPA, right?

And so we'd heard about IARPA. IARPA - you know, it's a little bit like DARPA, if you've ever heard of the Defense Advanced Research Projects Agency. You know, this is this outfit - DARPA, that is - that has a multibillion-dollar budget to develop, you know, new weapons, like secret weapons. So IARPA does the same thing for the intelligence community. And we also thought it made sense. And not only that, but in fact, the Obama White House had directed both DARPA and IARPA to begin supporting this mission, to begin, you know, deploying, you know, advanced technology to accelerate declassification. But unfortunately, when we finally got to IARPA and we had our meeting and we made our pitch, we found that the official there had no interest in developing technology for declassification.

DAVIES: Right. The reason in that case was kind of interesting. It was dollars and cents - right...

CONNELLY: Yeah.

DAVIES: ...But not a lack of funding. It was how much they wouldn't save because - well, you explain.

CONNELLY: Sure. So it turns out that IARPA, or at least the official we spoke with - they were interested in machine learning algorithms for managing national security information. But what they wanted to do was to develop algorithms that could automatically classify information, right? In a way, this is just the other side of the coin, right? The same technology could be used, you know, to identify information that didn't need to be classified or information that could be declassified and released. But the same official had zero interest in supporting that. And when we asked her - because, again, it was puzzling, right? - we asked her, well, why is that? I mean, after all, the White House has directed you to carry out this research. It's very much related to research that you're already doing or would like to do.

And the answer we got was that there would be insufficient return on investment. And I was a little puzzled by that because, you know, to my mind, the kinds of things we're talking about are priceless, right? We're talking about government accountability. We're talking about, you know, information the public needs to have to understand what the government does in our name. But what she was thinking about was how little the government spends on declassification, right?

So the government spends about $100 million a year on reviewing records to release them. That may sound like a lot, but the government spends $18 billion a year on classifying information and trying to protect it. So even if she developed perfect technology and the government had its own declassification engine so they could zero out that budget for reviewing and releasing documents to the public, all they would ever save was about $100 million. And for this official, that was chump change, and it wasn't worth her time.
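
(EDITOR'S ILLUSTRATION: The return-on-investment argument above comes down to one ratio. The snippet below works it through using only the two round figures quoted in the interview. The variable names are just labels for the example.)

```python
# Round figures as quoted in the interview (US dollars per year).
DECLASSIFICATION_REVIEW = 100_000_000
CLASSIFICATION_AND_PROTECTION = 18_000_000_000

# How many times larger secrecy spending is than the review budget.
ratio = CLASSIFICATION_AND_PROTECTION // DECLASSIFICATION_REVIEW
# The review budget as a share of secrecy spending.
share = DECLASSIFICATION_REVIEW / CLASSIFICATION_AND_PROTECTION

print(f"Secrecy spending is {ratio}x the declassification budget")
print(f"Declassification is {share:.2%} of secrecy spending")
```

(On these numbers, even eliminating the entire review budget would save well under 1% of what is spent on classifying and protecting information, which is the sense in which the official saw too little to gain.)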

DAVIES: So I guess the hope is that at some point, the government will take the methods and attack all of these classified records that are just building and building.

CONNELLY: I guess - I would say the hope is, you know, the government, you know, finally comes around and realizes that it has to develop a risk management approach to managing national security information. Everybody realizes - any engineer, that is - you know, realizes that you have to develop a technology that is going to, you know, work most of the time, right? If you try to develop a technology that is foolproof, then that technology will never be practical, right? So similarly, in this case, you have to develop methods for declassification that are going to allow us to manage the risk of information being released to the public that shouldn't be because the current policy we have now and the lack of technology being applied to these problems means that lots and lots of information is getting out. Some of it could be dangerous, right? And loads more information, orders of magnitude more, is not being released to the public, important information that we need to do our job as citizens, right? So the rational approach to this is to try to balance these risks and manage them.

DAVIES: Well, Matthew Connelly, thanks so much for speaking with us.

CONNELLY: Thank you, Dave.

DAVIES: Matthew Connelly is a professor of international and global history at Columbia University. His book, "The Declassification Engine: What History Reveals About America's Top Secrets," will be published February 14. Coming up, John Powers reviews "Saint Omer," the gripping new courtroom drama that's France's submission to this year's Oscars. This is FRESH AIR.

(SOUNDBITE OF LOUIS SCLAVIS' "FETE FORAINE")

Copyright © 2023 NPR. All rights reserved. Visit our website terms of use and permissions pages at www.npr.org for further information.

NPR transcripts are created on a rush deadline by an NPR contractor. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.