The National Security Agency is funding a “top secret” $60 million data analysis lab at North Carolina State University that will scrutinize information collected from private emails, phone calls and Google searches.
“The Laboratory for Analytic Sciences will be launched in a Centennial Campus building that will be renovated with money from the federal agency, but details about the facility are top secret. Those who work in the lab will be required to have security clearance from the U.S. government,” reports the News & Observer.
The project was initially supposed to be revealed in June, but the scandal surrounding the NSA’s PRISM surveillance program prompted the university to delay the announcement, with faculty and staff citing “that bit out of The Guardian (newspaper) on NSA collecting phone records of Verizon customers.”
According to NCSU Chancellor Randy Woodson, the program will revolve around “making sense out of the deluge of data that we’re all swimming in every day,” although the university denies that it will be involved in “mass surveillance”.
However, according to an Associated Press report, the data lab will analyze information collected by the NSA’s new $2 billion data center in Bluffdale, Utah, which is set to collect “complete contents of private emails, cell phone calls, and Google searches, as well as all sorts of personal data trails: parking receipts, travel itineraries, bookstore purchases, and other digital ‘pocket litter.’”
According to the AP report, the new data lab will help perfect technology that will “analyze that data for patterns identifying terrorists and other security threats.”
The announcement of the new data lab coincides with a Washington Post report revealing that the NSA “has broken privacy rules or overstepped its legal authority thousands of times each year since Congress granted the agency broad new powers in 2008.”
During a press conference last week, President Obama claimed that the agency was not “actually abusing these programs and, you know, listening in on people’s phone calls or inappropriately reading people’s e-mails.”
Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems.
The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.
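As a rough illustration of the steps listed above, the following Python sketch runs a toy version of the process: pre-processing of raw records, a discovery step that finds frequently co-occurring item pairs, and an interestingness metric (lift) used to rank them. The function names, the sample shopping-basket data, and the choice of pair support and lift are assumptions made for this example only; they do not describe any system mentioned in this article.

```python
from collections import Counter
from itertools import combinations


def preprocess(raw_records):
    """Pre-processing: normalize case, trim whitespace, drop empty records."""
    cleaned = []
    for record in raw_records:
        items = {item.strip().lower() for item in record if item.strip()}
        if items:
            cleaned.append(frozenset(items))
    return cleaned


def frequent_pairs(transactions, min_support=0.5):
    """Discovery: find item pairs whose support meets the threshold.

    Support is the fraction of transactions containing both items.
    Lift above 1 means the pair co-occurs more often than it would
    if the two items were independent.
    """
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    results = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
            results.append({"pair": (a, b), "support": support, "lift": lift})
    # Post-processing: rank the discovered patterns by interestingness.
    return sorted(results, key=lambda r: (-r["lift"], r["pair"]))


# Hypothetical sample data, invented for this sketch.
raw = [
    ["Bread", "milk "],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "milk", "jam"],
    ["", "  "],  # an empty record that pre-processing discards
]

patterns = frequent_pairs(preprocess(raw), min_support=0.5)
for p in patterns:
    print(p["pair"], round(p["support"], 2), round(p["lift"], 2))
```

With this sample data only the pair of bread and milk clears the support threshold. Real data mining systems use far more scalable algorithms, but the shape of the pipeline (clean, discover, score, rank) is the same.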
The term is a buzzword, and is frequently misused to mean any form of large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics) but is also generalized to any kind of computer decision support system, including artificial intelligence, machine learning, and business intelligence. In the proper use of the word, the key term is discovery, commonly defined as "detecting something new".
Even the popular book "Data mining: Practical machine learning tools and techniques with Java" (which covers mostly machine learning material) was originally to be named just "Practical machine learning", and the term "data mining" was only added for marketing reasons. Often the more general terms "(large scale) data analysis" or "analytics" are more appropriate, as are "artificial intelligence" and "machine learning" when referring to actual methods.