Too much digital dust obscures real threats

This is an archived article that was published on sltrib.com in 2013, and information in the article may be outdated. It is provided only for personal research purposes and may not be reprinted.

I have been following the recent news about the supposedly secret monitoring of phone calls and Internet use in the United States with amusement.

Back in the '70s, I helped implement a big data calibration system for the U.S. military. Because the data that we asked for was very carefully selected and minimized, we were able to successfully manage the data and significantly reduce costs.

But in many recent cases, the volume of information seems to have grown so great that the U.S. government can't use it until after an incident has occurred. A good example is the information that was collected before the 9/11 terrorist attacks. Some of it seems to have pointed to the attacks, but the mass of data was so large that the hints were missed.

That is the problem with too much information. It is called dust in the data world. It obscures the important data. Software programmers with experience in so-called "big data" are important in developing algorithms that can sift and sort the data and provide relevant information that supports better decisions.

The problem now is that there appears to be so much data collected that the dust probably is hiding the important information. If only one or two people in the U.S. talk about or email hints about crashing a plane into a building, and 200 million other emails or phone calls are being monitored, then the chance of stopping the plan is incredibly small.
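
A rough back-of-the-envelope sketch of that arithmetic, in Python, may help. The two real messages and the 200 million monitored messages come from the paragraph above. The false-alarm rate of 0.1 percent, and the assumption that the filter catches both real messages, are purely hypothetical numbers chosen for illustration; the function name is made up as well.

```python
def flagged_breakdown(real_messages, total_messages, false_alarm_rate, hit_rate=1.0):
    """Return (true_hits, false_alarms, share_of_flags_that_are_real).

    false_alarm_rate: assumed fraction of innocent messages wrongly flagged.
    hit_rate: assumed fraction of real hints that the filter catches.
    """
    innocent = total_messages - real_messages
    true_hits = real_messages * hit_rate
    false_alarms = innocent * false_alarm_rate
    flagged = true_hits + false_alarms
    share_real = true_hits / flagged if flagged else 0.0
    return true_hits, false_alarms, share_real


# Figures from the text: 2 real hints among 200 million monitored messages.
# The 0.001 false-alarm rate is an illustrative assumption, not a measured value.
true_hits, false_alarms, share_real = flagged_breakdown(2, 200_000_000, 0.001)
print(f"real hints flagged: {true_hits:.0f}")
print(f"false alarms: {false_alarms:,.0f}")
print(f"share of flags that are real: {share_real:.6%}")
```

Under those assumed numbers, even a filter that is wrong only once in a thousand innocent messages would raise roughly 200,000 false alarms around the two real hints. Collecting fewer messages shrinks that pile in direct proportion.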

It would make more sense for the U.S. government to significantly limit the data that it tries to collect so that the important information can be recognized. Instead of keeping big caches of data that allow a government to backtrack and find everyone who contacted a supposed terrorist by email in the past year, the request should be (with a court order) to start monitoring anyone whom that suspect contacts.

If I do a Google search for an explosive or a gas pipeline route, I shouldn't trigger a backtracking investigation of everyone I contact or everyone who contacts me. (I have a degree in chemistry and I am active in municipal pipeline safety issues.)

Big companies that provide a lot of services on the Internet collect a lot of data. In many cases, the data collected can be minimized. But the U.S. government data collection effort seems to have no limit.

That is too close to a 1984 scenario where the thought police can zero in and arrest someone who does not agree with the government in power. I don't think that it will happen here, but it is a constant threat in other countries.

Instead of collecting all of the digital dust to try to prevent terrorist attacks against the U.S., we would be safer if the U.S. limited its collection of data to narrowly defined individuals who make up less than 1 percent of telephone and Internet users. Otherwise we won't be able to see the plot until it has already happened and, in the meantime, Big Brother government could become a big bully government.

George Chapman started working with big data collection and analysis in the 1970s. He also worked in other engineering fields including military, nuclear and cell phone engineering.