Data Mining for National Security: U.S. Government Programs by Michael Erbschloe

Introduction

This book provides an overview of the data mining activities of the U.S. Government that focus on national security. Its purpose is to preserve and disseminate public documentation of those activities. The editor is not attempting to copyright public documents.

Data mining enables corporations and government agencies to analyze massive volumes of data quickly and relatively inexpensively. The use of this type of information retrieval has been driven by the exponential growth in the volumes and availability of information collected by the public and private sectors, as well as by advances in computing and data storage capabilities. In response to these trends, generic data mining tools are increasingly available for—or built into—major commercial database applications.

There is no universally agreed-upon definition for the term “data mining.” Some definitions of the term are quite broad. For example, the Technology and Privacy Advisory Committee (“TAPAC”) of the Department of Defense defined data mining as: Searches of one or more electronic databases of information concerning U.S. persons, by or on behalf of an agency or employee of the government.

Authors of other reports use narrower definitions. For example, the Congressional Research Service (“CRS”) defines data mining as follows: Data mining involves the use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. These tools can include statistical models, mathematical algorithms, and machine learning methods (algorithms that improve their performance automatically through experience, such as neural networks or decision trees). Consequently, data mining consists of more than collecting and managing data, it also includes analysis and prediction.

The Government Accountability Office (GAO) defines data mining as the application of database technology and techniques (such as statistical analysis and modeling) to uncover hidden patterns and subtle relationships in data and to infer rules that allow for the prediction of future results. GAO based this definition on the most commonly used terms found in a survey of the technical literature. In GAO's initial survey of chief information officers, these officials found the definition sufficient to identify agency data mining efforts.
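To make the phrase "infer rules that allow for the prediction of future results" concrete, the following is a minimal sketch, not a description of any agency's system. The records, the field meanings (a transaction amount and an auditor's flag), and the function names `infer_threshold_rule` and `predict` are all hypothetical and chosen only for illustration. The sketch searches for the single threshold that best separates flagged from unflagged records, which is the simplest form of the decision-tree methods named in the CRS definition.

```python
from typing import List, Tuple

# Hypothetical toy records: (transaction amount, flagged as improper by an auditor).
# These values are invented for illustration and come from no real data set.
RECORDS: List[Tuple[float, bool]] = [
    (120.0, False),
    (250.0, False),
    (310.0, False),
    (980.0, True),
    (1500.0, True),
    (2200.0, True),
    (400.0, False),
    (1100.0, True),
]


def infer_threshold_rule(records: List[Tuple[float, bool]]) -> Tuple[float, int]:
    """Find the rule 'amount > threshold implies flagged' with the fewest errors.

    Returns the chosen threshold and the number of records it misclassifies.
    """
    amounts = sorted({amount for amount, _ in records})
    if len(amounts) < 2:
        raise ValueError("need at least two distinct amounts to infer a rule")

    # Candidate thresholds are the midpoints between consecutive distinct amounts.
    candidates = [(low + high) / 2 for low, high in zip(amounts, amounts[1:])]

    best_threshold, best_errors = candidates[0], len(records) + 1
    for threshold in candidates:
        # Count records where the rule's prediction disagrees with the label.
        errors = sum((amount > threshold) != flagged for amount, flagged in records)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


def predict(amount: float, threshold: float) -> bool:
    """Apply the inferred rule to a new, unseen amount."""
    return amount > threshold


threshold, errors = infer_threshold_rule(RECORDS)
print(f"inferred rule: flag when amount > {threshold} ({errors} errors on known records)")
print("prediction for a new amount of 1000.0:", predict(1000.0, threshold))
```

The rule is inferred from records whose outcome is already known and is then applied to a new record. Real data mining efforts involve many more variables and far larger data sets, but the two stages, discovery and prediction, are the same ones the definitions above describe.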

Federal agencies are using data mining for a variety of purposes, ranging from improving service or performance to analyzing and detecting terrorist patterns and activities. The GAO survey of 128 federal departments and agencies on their use of data mining shows that 52 agencies are using or are planning to use data mining. These departments and agencies reported 199 data mining efforts, of which 68 were planned and 131 were operational. The most common uses of data mining, as described by agencies, were

  • improving service or performance;
  • detecting fraud, waste, and abuse;
  • analyzing scientific and research information;
  • managing human resources;
  • detecting criminal activities or patterns; and
  • analyzing intelligence and detecting terrorist activities.

The Department of Defense reported having the largest number of data mining efforts aimed at improving service or performance and at managing human resources. Defense was also the most frequent user of efforts aimed at analyzing intelligence and detecting terrorist activities, followed by the Departments of Homeland Security, Justice, and Education.

In addition, out of all 199 data mining efforts identified, 122 used personal information. For these efforts, the primary purposes were detecting fraud, waste, and abuse; detecting criminal activities or patterns; analyzing intelligence and detecting terrorist activities; and increasing tax compliance.

(Link http://www.gao.gov/new.items/d04548.pdf)
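The survey figures quoted above are easier to compare as proportions. The short calculation below is only a reader's aid: the counts are taken from the text, while the constant names and the `share` helper are assumptions introduced for this sketch.

```python
# Counts as quoted in the text from the GAO survey.
AGENCIES_SURVEYED = 128
AGENCIES_MINING = 52        # using or planning to use data mining
EFFORTS_TOTAL = 199
EFFORTS_PLANNED = 68
EFFORTS_OPERATIONAL = 131
EFFORTS_PERSONAL_INFO = 122  # efforts that used personal information


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Consistency check: planned plus operational efforts should equal the total.
assert EFFORTS_PLANNED + EFFORTS_OPERATIONAL == EFFORTS_TOTAL

summary = {
    "agencies using or planning data mining": share(AGENCIES_MINING, AGENCIES_SURVEYED),
    "efforts already operational": share(EFFORTS_OPERATIONAL, EFFORTS_TOTAL),
    "efforts still planned": share(EFFORTS_PLANNED, EFFORTS_TOTAL),
    "efforts using personal information": share(EFFORTS_PERSONAL_INFO, EFFORTS_TOTAL),
}

for label, percent in summary.items():
    print(f"{label}: {percent}%")
```

On these figures, roughly two in five surveyed agencies reported using or planning data mining, about two thirds of the reported efforts were already operational, and about three in five used personal information.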