Fixing faulty data is challenging. Even if the data are made consistent among the data sets (meaning that, under standardized conditions, the same numerical value of a chemical property is found no matter which database is consulted), it is difficult to ensure that each data entry is correct.
Consistency Not Sufficient
To understand the nature of the data inconsistency problem and the data error problem, imagine that one database lists the price of a quart of milk as $10 million while a second database lists it as $5. Officials responsible for establishing consistency between the two databases meet and subsequently revise both, so that each now lists the price of a quart of milk as $15,000. There is certainly consistency, since both databases yield the same figure, but the result is wrong: a quart of milk certainly does not cost $15,000.
Once the problem of internal consistency is resolved, it is the accuracy of the data entries in disseminated databases and models that will ultimately need to be addressed. Many, if not most, of those entries are not well established.
Consequences Are Serious
In the real world, the use of poor-quality data makes a significant difference. For example, the U.S. Chamber of Commerce asked Cambridge Environmental to estimate how the cost of cleaning up PCBs at a contaminated Puget Sound site varied with the database consulted. Using different databases produced two vastly different cleanup cost estimates: $7.5 million and $55 million.
Addressing the problem requires developing an agreed-upon standard methodology for critical data review and then applying it uniformly to all data. Assembling a federal interagency group to examine the problem would be appropriate, as the relevant intellectual expertise is spread across numerous government agencies, not concentrated within the Environmental Protection Agency.
— William L. Kovacs