- Ensuring that the stored data always conforms to its definition.
- Validating the stored data and the input data.
- Controlling the execution of update processes: ensuring proper authorization, controlling concurrent update, and synchronizing the update of multiple copies.
Database quality can be threatened by erroneous input or improper update
actions. Even if good quality control procedures exist for input data,
undetected errors can propagate, gradually degrading the quality of
the stored data. Every DBMS retains some information regarding the structure
and format of the stored data. The system uses this information to properly
interpret and process the stored data.
Users also use this information to properly interpret the data and to
establish what to expect from the system. It may seem obvious, but a system should ensure that the stored data always conforms to its definition, that is, the definition as understood by the users. It is a grievous mistake for a system to permit the definition of certain data characteristics, such as alpha or numeric type data fields, but not ensure that all stored values conform to the declared type. Such a failure will undoubtedly lead to a loss of user confidence in the system, or to unwilling tolerance of its shortcomings.
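For illustration, the following minimal sketch shows how a system might check stored values against a declared definition; the field names and declared types are assumptions, not drawn from any particular DBMS.

```python
# Minimal sketch of checking stored values against a declared definition.
# The field names and declared types are hypothetical, for illustration only.

FIELD_DEFINITIONS = {
    "customer_id": int,    # numeric field
    "customer_name": str,  # alpha field
    "balance": float,      # numeric field
}

def conformance_violations(record: dict) -> list:
    """Return descriptions of every way the record violates its definition."""
    violations = []
    for field, declared_type in FIELD_DEFINITIONS.items():
        if field not in record:
            violations.append(f"missing field: {field}")
        elif not isinstance(record[field], declared_type):
            violations.append(
                f"{field}: expected {declared_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return violations

# An alpha value stored in a numeric field is reported as a violation.
print(conformance_violations(
    {"customer_id": "A17", "customer_name": "Smith", "balance": 25.0}
))
```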
Whether a system stores a sketchy database definition or a comprehensive one is immaterial here: the stored data must conform to whatever definition exists. A skeleton definition makes it easy for the system to check for conformance, and leaves users responsible for testing any additional conditions which must be satisfied for the data to be valid. A comprehensive definition means more work for the system, but it also means higher quality data and can instill greater user confidence in the database.
Data validation means comparing data to an expression of what the data
should look like. For stored data, the definition represents validity
conditions. It can also include explicit validation criteria beyond
the normal size and type declarations. A more comprehensive definition
of the stored data provides a basis for better quality control. In addition
to testing data for conformance to its definition, validation of input
data before it is used to update the database can increase the quality
of the database.
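As a sketch of what such criteria might look like (the rules and field names are assumed for illustration), validation criteria beyond simple size and type declarations can be expressed as predicates applied to stored records and to incoming transactions alike:

```python
# Sketch of explicit validation criteria beyond size and type declarations.
# The rules and field names are hypothetical, for illustration only.

VALIDATION_RULES = [
    ("balance must not be negative",
     lambda r: r["balance"] >= 0),
    ("quantity must be between 1 and 999",
     lambda r: 1 <= r["quantity"] <= 999),
    ("ship_date must not precede order_date",
     lambda r: r["ship_date"] >= r["order_date"]),
]

def failed_rules(record: dict) -> list:
    """Apply every criterion and return the descriptions of those that fail."""
    return [description for description, holds in VALIDATION_RULES
            if not holds(record)]

# The same checks can be applied to an input transaction before it is
# allowed to update the database, and periodically to the stored data itself.
transaction = {"balance": -5.0, "quantity": 3,
               "order_date": "2024-01-10", "ship_date": "2024-01-12"}
print(failed_rules(transaction))   # ['balance must not be negative']
```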
With a reduced database definition capability, input transaction validation
becomes relatively more important for database integrity. It is generally
easier and more efficient to validate input transactions than to continuously
monitor the database against a comprehensive database definition. This
may account for the greater emphasis placed on transaction validation
in practice. Nevertheless, it would be wrong to conclude that input
validation can be a substitute for monitoring against the database definition.
The database must still conform to its definition.
Update processes which change the database can disrupt the information system by destroying
the quality of the database. Threats may result from multiple processes
attempting to update the same data concurrently, a runaway update process,
an incompletely debugged program, or an update initiated by an unauthorized
user. These threats suggest the need to control the development, cataloguing,
initiation and execution of update processes.
Various levels of update may demand different levels of control. Merely
adding data to a database is not generally as disruptive as changing
the existing data. Tighter controls may be needed on processes which
delete data, particularly whole records or files. Not everyone in an
organization should be permitted to freely update the database. Some
responsible authority must tell the system who is permitted to initiate which update operations, and the system must check every requested
update action to ensure that it is properly authorized.
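A minimal sketch of such an authorization check follows; the roles, relation name, and operation levels are assumptions for illustration, not features of any particular system.

```python
# Sketch of update authorization checking. The roles, relations, and
# operation names are hypothetical; a real DBMS would keep such a matrix
# in its catalogue and consult it for every requested update action.

AUTHORIZATIONS = {
    ("clerk",   "ORDERS"): {"add"},                      # may only add data
    ("manager", "ORDERS"): {"add", "change"},            # may add and change
    ("dba",     "ORDERS"): {"add", "change", "delete"},  # full update rights
}

def is_authorized(role: str, relation: str, operation: str) -> bool:
    """Check one requested update action against the authorization matrix."""
    return operation in AUTHORIZATIONS.get((role, relation), set())

assert is_authorized("manager", "ORDERS", "change")
assert not is_authorized("clerk", "ORDERS", "delete")  # deletion is tightly held
```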
The independent and uncontrolled execution of concurrent update processes can threaten the quality of the database. The solution of allowing a process to lock out concurrent update processes can itself lead to deadlock. Every multi-user DBMS must therefore have some solution to the potential for deadlock. Update synchronization is required when data is stored
redundantly, in multiple copies. Besides the obvious cost of additional
storage space, the major cost of data redundancy is in synchronizing
updates. These costs must be weighed against the benefits of increased
availability of data, faster response to requests for data, and better recovery with the redundant data.
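As a sketch of one common approach (the accounts, the transfer operation, and the fixed lock ordering are assumptions for illustration), having every update process acquire locks in a single agreed order prevents two concurrent processes from each holding a lock the other needs:

```python
# Sketch of controlling concurrent update with locks. Acquiring locks in a
# fixed global order (here, alphabetically by record key) is one simple way
# to avoid the deadlock that arises when two processes each hold a lock the
# other is waiting for. The accounts and transfer operation are hypothetical.

import threading

accounts = {"A": 100, "B": 100}
locks = {key: threading.Lock() for key in accounts}

def transfer(src: str, dst: str, amount: int) -> None:
    first, second = sorted((src, dst))   # fixed lock order prevents deadlock
    with locks[first], locks[second]:
        accounts[src] -= amount
        accounts[dst] += amount

t1 = threading.Thread(target=transfer, args=("A", "B", 10))
t2 = threading.Thread(target=transfer, args=("B", "A", 5))
t1.start(); t2.start()
t1.join(); t2.join()
print(accounts)  # the total is preserved: {'A': 95, 'B': 105}
```

Lock ordering is only one possible discipline; the point of the sketch is that some such solution to concurrent update and deadlock must exist in any multi-user DBMS.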