FOREST ECOSYSTEM: GOODS AND SERVICES
Energy & Wetlands Research Group, Centre for Ecological Sciences, Indian Institute of Science, Bangalore, Karnataka, 560 012, India.
E Mail: tvr@iisc.ac.in; Tel: 91-080-22933099, 2293 3503 extn 101, 107, 113
Housekeeping tools
 

From the user's viewpoint, the ideal GIS should include enough functions to perform all conceivable manipulations of data. In practice, user needs comprise various tasks, supported by the following housekeeping tools:

  • Database management system (DBMS)

  • Query language (QL)

  • User interface

  • Application function and programs

Database management system
Data is the name given to basic facts and entities such as names and numbers. Data consists of a series of facts or statements that may have been collected, stored, processed and/or manipulated, but have not been organised or placed into context. When data is organised, it becomes information. Information can be processed and used to draw generalised conclusions, or knowledge.

A database can be defined as ‘a collection of structured data’. The structure of the data is independent of any particular application. A database is a file of data structured in such a way that it may serve a number of applications without its structure being dictated by any one of those applications. A Database Management System is a computerised record-keeping system that stores, maintains and provides access to information. A database system involves four major components: data, hardware, software and the users.

The database models that have been used to organise and represent the data are

  • Hierarchical Database Model,

  • Network Database Model and

  • Relational Database Model.

The Hierarchical Database Model uses one-to-many relationships, expressed as parent-child relations. This model is easy to understand, update and expand. Its disadvantages are that a large amount of memory is required and that certain attribute values must at times be repeated, which results in redundancy and higher storage and access costs. This software uses this database model only.
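The parent-child structure described above can be sketched as follows. This is only an illustration: the `Node` class and the forest, stand and plot names are hypothetical and do not come from any particular GIS.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """One record in a hierarchy: a single parent, any number of children."""
    name: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add_child(self, name: str) -> "Node":
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        """Walk up the single chain of parents to the root."""
        parts = []
        node: Optional[Node] = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


# Hypothetical data: one forest, two stands, one plot under each stand.
forest = Node("forest")
stand_a = forest.add_child("stand_a")
stand_b = forest.add_child("stand_b")
plot_a1 = stand_a.add_child("plot_1")
plot_b1 = stand_b.add_child("plot_1")  # the same value is stored twice

print(plot_a1.path())  # forest/stand_a/plot_1
print(plot_b1.path())  # forest/stand_b/plot_1
```

Because a child can be reached only through its one parent, a value shared by two branches (here `plot_1`) has to be stored in both, which is the redundancy mentioned above.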

The Network Database Model uses many-to-many relationships. Attributes are linked from one place to another, so that records are interlinked and an attribute can be retrieved by more than one route. An entity can have more than one parent, and one member can belong to more than one relationship. The Hierarchical and Network models are conceptually simple, but in implementation the interrelationships become complicated to specify.
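A minimal sketch of the many-to-many linking described above, assuming links are simply kept in two dictionaries. The `link` helper and the record names are hypothetical.

```python
from collections import defaultdict

# Links are stored in both directions, so a member can be reached
# from any of its owners and an owner from any of its members.
members_of = defaultdict(set)  # owner  -> members
owners_of = defaultdict(set)   # member -> owners


def link(owner: str, member: str) -> None:
    """Record one owner-member relationship."""
    members_of[owner].add(member)
    owners_of[member].add(owner)


# Hypothetical data: a line that is part of both a road and a property.
link("road_12", "line_7")
link("property_A", "line_7")
link("property_A", "line_8")

print(sorted(owners_of["line_7"]))       # ['property_A', 'road_12']
print(sorted(members_of["property_A"]))  # ['line_7', 'line_8']
```

Here `line_7` has two parents, which a strict hierarchy could represent only by storing the line twice.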

The Relational Database Model uses relations to store the data. A relational database is a collection of tabular relations, each having a set of attributes. The data in a relation are structured as a set of rows, called ‘tuples’, each consisting of a list of values, one for each attribute. Each attribute has an associated domain from which its values are drawn.
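As a small illustration of these terms, a relation can be modelled as a set of tuples whose fields are the attributes. The `Supplier` type and its field names are assumptions made for this sketch.

```python
from typing import NamedTuple


class Supplier(NamedTuple):
    """One tuple of a hypothetical SUPPLIER relation."""
    s_no: str    # domain: supplier identifiers such as "S1"
    status: int  # domain: integers
    city: str    # domain: city names


# A relation is a set of tuples, so a repeated row is stored only once.
suppliers = {
    Supplier("S1", 20, "London"),
    Supplier("S2", 10, "Paris"),
    Supplier("S2", 10, "Paris"),  # duplicate of the previous tuple
}

print(Supplier._fields)  # ('s_no', 'status', 'city')
print(len(suppliers))    # 2
```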

A Database Management System  (DBMS) is a program that allows users to define, manipulate and process the data in a database, in order to produce meaningful information. DBMS is a collection of programs that enables one to store, modify and extract information from a database. There are many types of DBMSs, ranging from small systems that run on personal computers to huge systems that run on mainframes.

Functions of a DBMS:

  • To store data

  • To organise data         

  • To control access to data

  • To protect data

Advantages of DBMS:

  • DBMS is not only effective for generating and maintaining a wide variety of routine management and operating reports, but also adaptable to meeting the new and emerging requirements of management to answer a myriad of “What if?” questions.

  • Data elements can be structured in a manner more suitable to their application, allowing their retrieval with a minimum effort.

  • DBMS keeps redundancy of data elements to a minimum.

  • Application programs are independent of the changes in the database, so that their maintenance is kept to a bare minimum.

  • It gives a clear picture of logical organisation of data set.

  • It provides data protection not only for accessing one database record at a time, but also for preventing database access by unauthorised personnel.

  • It provides centralisation for multi-users.

  • It provides data independence.

  • It monitors database performance.

  • Centralised data reduces management problems.

  • Data redundancy and consistency are controllable.

  • Program-data interdependency is diminished.

  • Flexibility of data is increased.

  • Reduction in data redundancy.

  • Maintenance of data integrity and quality.

  • Data are self-documented or self-descriptive.

  • Avoidance of inconsistency.

  • Reduced cost of software development.

  • Security restrictions.

  • Application programs are independent of structure of DB.

  • Application programs share the same data.

  • New programs are easier and cost less to implement.

Normalization: Normalization is a process that eliminates problems by decomposing a relation into two or more relations without loss of information. It is the procedure by which a relation in one normal form is replaced by a set of relations in some more desirable form, applied successively to a given collection of relations. This process is reversible and information preserving. The Normal Forms are the sets of rules to be followed when decomposing relations.

 

1NF: A relation is said to be in 1NF if and only if all underlying domains contain atomic values only. Generally, every relation is in 1NF.
For example, consider the following relation, in which the supplier number (S#), status, city, part number (P#) and quantity of parts supplied (qty) are given.

S#    status   city      P#    qty
S1    20       London    P1    300
S1    20       London    P2    200
S1    20       London    P3    400
S1    20       London    P4    200
S1    20       London    P5    100
S1    20       London    P6    100
S2    10       Paris     P1    300
S2    10       Paris     P2    400
S3    10       Paris     P2    200
S4    20       London    P2    300
S4    20       London    P4    300
S4    20       London    P5    400

Though the above relation is in 1NF, handling it raises certain problems known as anomalies. To overcome them, we decompose the relation into two relations without loss of information.
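Two of these anomalies can be seen directly in the data. The sketch below holds the 1NF relation from the text in a Python list (the variable names are illustrative only) and counts the repeated supplier details, then shows what is lost when a supplier's only row is deleted.

```python
# The 1NF relation from the text as (s_no, status, city, p_no, qty) tuples.
FIRST = [
    ("S1", 20, "London", "P1", 300), ("S1", 20, "London", "P2", 200),
    ("S1", 20, "London", "P3", 400), ("S1", 20, "London", "P4", 200),
    ("S1", 20, "London", "P5", 100), ("S1", 20, "London", "P6", 100),
    ("S2", 10, "Paris", "P1", 300), ("S2", 10, "Paris", "P2", 400),
    ("S3", 10, "Paris", "P2", 200),
    ("S4", 20, "London", "P2", 300), ("S4", 20, "London", "P4", 300),
    ("S4", 20, "London", "P5", 400),
]

# Update anomaly: the city of S1 is stored once per shipment, so a change
# of city must be applied to every one of these rows.
s1_city_copies = sum(1 for row in FIRST if row[0] == "S1")

# Deletion anomaly: removing the only shipment of S3 also removes the
# fact that S3 is located in Paris.
after_delete = [row for row in FIRST if not (row[0] == "S3" and row[3] == "P2")]
s3_still_known = any(row[0] == "S3" for row in after_delete)

print(s1_city_copies)  # 6
print(s3_still_known)  # False
```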

2NF: A relation is said to be in 2NF if and only if it is in 1NF and every non-key attribute is fully dependent on the primary key.

For example, consider the following relations, which are obtained after decomposing the first relation. These are in 2NF.

S#    status   city
S1    20       London
S2    10       Paris
S3    10       Paris
S4    20       London

S#    P#    qty
S1    P1    300
S1    P2    200
S1    P3    400
S1    P4    200
S1    P5    100
S1    P6    100
S2    P1    300
S2    P2    400
S3    P2    200
S4    P2    300
S4    P4    300
S4    P5    400
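That this decomposition loses no information can be checked by projecting the original relation onto the two attribute sets and joining the results again. The sketch below does this with Python sets; the names `SUPPLIER` and `SHIPMENT` are labels chosen here, not names used by the source.

```python
# The original 1NF relation as (s_no, status, city, p_no, qty) tuples.
FIRST = {
    ("S1", 20, "London", "P1", 300), ("S1", 20, "London", "P2", 200),
    ("S1", 20, "London", "P3", 400), ("S1", 20, "London", "P4", 200),
    ("S1", 20, "London", "P5", 100), ("S1", 20, "London", "P6", 100),
    ("S2", 10, "Paris", "P1", 300), ("S2", 10, "Paris", "P2", 400),
    ("S3", 10, "Paris", "P2", 200),
    ("S4", 20, "London", "P2", 300), ("S4", 20, "London", "P4", 300),
    ("S4", 20, "London", "P5", 400),
}

# Projections onto (s_no, status, city) and (s_no, p_no, qty).
SUPPLIER = {(s, status, city) for (s, status, city, p, qty) in FIRST}
SHIPMENT = {(s, p, qty) for (s, status, city, p, qty) in FIRST}

# A natural join on s_no rebuilds the original relation.
rejoined = {
    (s, status, city, p, qty)
    for (s, status, city) in SUPPLIER
    for (s2, p, qty) in SHIPMENT
    if s == s2
}

print(len(SUPPLIER), len(SHIPMENT))  # 4 12
print(rejoined == FIRST)             # True
```

The supplier details are now stored once per supplier instead of once per shipment.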

Though some of the anomalies are rectified by this decomposition, others remain. To resolve them, we further decompose the supplier relation (S#, status, city), in which status depends on S# only through city.

3NF: A relation is said to be in 3NF if and only if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.

For example, consider the following relations, which are in 3NF and are obtained by decomposing the supplier relation (S#, status, city).

S#    city
S1    London
S2    Paris
S3    Paris
S4    London
S5    Athens

city      status
Athens    30
London    20
Paris     10

These relations are obtained by decomposing the second relation.  These relations are in 1NF, 2NF and 3NF and free of all anomalies.
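The transitive dependency that 3NF removes can be tested mechanically. The sketch below uses a small helper, `fd_holds`, which is a name invented for this example, and the four supplier rows from the 2NF relation (S5 and Athens appear only in the decomposed tables, so they are left out here).

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the attributes in lhs functionally determine rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key must match every later one.
        if seen.setdefault(key, value) != value:
            return False
    return True


supplier = [
    {"s_no": "S1", "status": 20, "city": "London"},
    {"s_no": "S2", "status": 10, "city": "Paris"},
    {"s_no": "S3", "status": 10, "city": "Paris"},
    {"s_no": "S4", "status": 20, "city": "London"},
]

print(fd_holds(supplier, ["s_no"], ["city"]))    # True
print(fd_holds(supplier, ["city"], ["status"]))  # True: s_no -> city -> status
print(fd_holds(supplier, ["city"], ["s_no"]))    # False: city is not a key

# Projecting onto (s_no, city) and (city, status) removes the transitive step.
s_city = {(r["s_no"], r["city"]) for r in supplier}
city_status = {(r["city"], r["status"]) for r in supplier}
print(len(s_city), len(city_status))  # 4 2
```

A check of this kind only shows that a dependency holds in the sample rows; whether it holds in general is a design decision about the data.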
Consider the following relation, which is in 3NF.

S#     major         fname
100    Maths         Cauchy
150    Psychology    Jung
200    Maths         Rieman
250    Maths         Cauchy
300    Psychology    Pearls

BCNF: A relation is said to be in BCNF if and only if it is in 3NF and every determinant is a candidate key. The relations designed in this software are normalized to this level.

Though the above relation is in 3NF, it still has some anomalies. One faculty member can teach only one major and, at the same time, one student studies only one major. If we delete a student's information, the faculty member's information may also be deleted.
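The problem can be made concrete with the rows of the relation above. In this sketch (variable names are illustrative), fname determines major, yet fname is not unique across rows and so cannot be a candidate key; deleting one student then loses a faculty member's major.

```python
# The relation from the text as (s_no, major, fname) tuples.
rows = [
    (100, "Maths", "Cauchy"),
    (150, "Psychology", "Jung"),
    (200, "Maths", "Rieman"),
    (250, "Maths", "Cauchy"),
    (300, "Psychology", "Pearls"),
]

# Does fname determine major? Each name must map to a single major.
major_of = {}
fname_determines_major = True
for _, major, fname in rows:
    if major_of.setdefault(fname, major) != major:
        fname_determines_major = False

# Is fname unique across rows? If not, it cannot be a candidate key.
fnames = [fname for _, _, fname in rows]
fname_is_unique = len(set(fnames)) == len(fnames)

# Deletion anomaly: removing student 150 also removes Jung's major.
remaining = [r for r in rows if r[0] != 150]
jung_still_known = any(fname == "Jung" for _, _, fname in remaining)

print(fname_determines_major)  # True
print(fname_is_unique)         # False
print(jung_still_known)        # False
```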

To resolve the anomalies, we decompose the relation into two relations. For example, consider the following relations.

S#     Adviser
100    Cauchy
150    Jung
200    Rieman
250    Cauchy
300    Pearls

Fname     Major
Cauchy    Maths
Jung      Psychology
Rieman    Maths
Cauchy    Maths
Pearls    Psychology

These two relations are in both 3NF and BCNF. Now suppose that one student may have more than one major. The following relation is in both 3NF and BCNF.

 

Sid    major         activity
100    Music         Swimming
100    Accounting    Swimming
100    Music         Tennis
100    Accounting    Tennis
100    Maths         Jogging

It is in BCNF as it is an all-key relation.

4NF: A relation is said to be in 4NF if and only if it is in BCNF and all of its multivalued dependencies are in fact functional dependencies.

For example, consider the above relation. It is in BCNF but not in 4NF. We decompose the above relation to reduce the anomalies and to bring it to 4NF.

Sid    Major
100    Music
100    Accounting
100    Maths

Sid    Activity
100    Swimming
100    Tennis
100    Jogging

These relations are in 4NF and BCNF.
5NF: A relation is said to be in 5NF if and only if it is in 4NF and every join dependency in it is implied by its candidate keys.
For example, consider the following relation. It is in 4NF but not in 5NF because of a join dependency.

Emp number    Item number    Customer number
17            4014           1002
17            4019           1003
19            4014           1003
19            4014           1003

This relation records the number of the employee who sold an item with a particular item number to a customer with a particular customer number.
This relation is decomposed into three independent relations to bring the relations into 5NF. The relations are as follows.

EmpNo    ItemNo
17       4014
17       4019
19       4014

ItemNo    CustNo
4014      1002
4019      1003
4014      1003

EmpNo    CustNo
17       1002
17       1003
19       1002

These relations are in 5NF.
Database management systems specialise in the storage and management of all types of data, including geographic data, as discussed in the introduction. DBMSs are optimised to store and retrieve data, and many GISs rely on them for this purpose. By using simple storage structures in a standard DBMS, the basic data model and the applications become less dependent on each other.

Distributed database: Distributed databases are a specialised decentralised solution. A system with a distributed database comprises several databases on different computers, closely integrated over a network and treated as one unit. Users experience this as if they were working with a single database.

 

Database for map data: A database for digital map data should be able to manipulate records of varying length efficiently. For example, line lengths may vary considerably, resulting in a corresponding variation in the number of coordinates entered. A database system should reflect geographic reality by such means as requiring that data on objects of the same type, such as the lines forming a property boundary, be stored in close proximity in the database to speed up the response.

Partitioning and Indexing: We have ascertained that spaghetti data require a long search time since the data are stored in a relatively casual and unconnected sequence in this file. The time used to search for and retrieve topological data is also governed by the way in which the data are structured for storage. A rational data structure will reduce the storage volume. Special techniques have therefore been developed for dividing and structuring data.

 Generally, map data are stored in map sheets or other geographical units, but storing map sheet data in single sequential files lengthens the response time. This has resulted in some GISs employing indexing to speed up the searching process, and enabling current map sheets to appear on screen almost immediately. Indexing specifies locations, so map sheets are divided into sections which are distributed in such a manner as to accelerate the search. For example, zooming focuses on data in those sections relevant to a selected area and ignores the remainder of the map sheet.
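One simple way to picture this kind of indexing is a regular grid laid over the map sheet. The sketch below is an assumption-laden illustration, not the scheme used by any particular GIS: the `GridIndex` class, the cell size and the sample points are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


class GridIndex:
    """Divide a map sheet into square sections and file each point under its section."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: Dict[Cell, List[Point]] = defaultdict(list)

    def _cell(self, x: float, y: float) -> Cell:
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, x: float, y: float) -> None:
        self.cells[self._cell(x, y)].append((x, y))

    def query(self, xmin: float, ymin: float, xmax: float, ymax: float) -> List[Point]:
        """Return the points inside the window, visiting only the sections it touches."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for (x, y) in self.cells.get((cx, cy), []):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append((x, y))
        return hits


index = GridIndex(cell_size=100.0)
for p in [(10.0, 20.0), (150.0, 40.0), (420.0, 380.0), (95.0, 99.0)]:
    index.insert(*p)

# Zooming to a window reads four sections and ignores the rest of the sheet.
print(sorted(index.query(0.0, 0.0, 120.0, 120.0)))  # [(10.0, 20.0), (95.0, 99.0)]
```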

The use of traditional hashing techniques and trees makes it very difficult to handle divided areas that overlap. However, routines have been developed that can handle overlapping data relatively efficiently. These have also been implemented for object-oriented solutions. In recent years, more powerful and rapid hardware has made it easier to use simple data structures for storage, but many GISs still use different “smart” solutions to obtain rapid access to data stored on the disk.

No current database system or structure completely fulfils the needs of database applications. There are grounds for suspecting that the excessively complex and voluminous data collections of many GISs may be ascribed to the databases employed. It goes without saying, then, that further database development is in order. One goal might be to develop better object-oriented database systems.
Structured Query Language: The simple structures of relational database systems have permitted the development of standard query languages, one of which is Structured Query Language (SQL). SQL gives users access to data in relational DBMSs by describing the data they may wish to see. SQL also allows users to define data in a database and to manipulate those data. Additional functions that SQL supplies to relational databases are very useful for many GIS applications.
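The three uses of SQL mentioned above (defining, manipulating and querying data) can be tried with the `sqlite3` module from the Python standard library. The table and column names below are hypothetical, chosen only for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Define data: a hypothetical table of land parcels.
conn.execute(
    "CREATE TABLE parcel (id INTEGER PRIMARY KEY, land_use TEXT, area_ha REAL)"
)

# Manipulate data: insert rows, then change one of them.
conn.executemany(
    "INSERT INTO parcel VALUES (?, ?, ?)",
    [(1, "forest", 12.5), (2, "agriculture", 4.0), (3, "forest", 30.0)],
)
conn.execute("UPDATE parcel SET land_use = 'plantation' WHERE id = 2")

# Query data: describe the rows wanted rather than how to find them.
rows = conn.execute(
    "SELECT id, area_ha FROM parcel WHERE land_use = 'forest' ORDER BY id"
).fetchall()
conn.close()

print(rows)  # [(1, 12.5), (3, 30.0)]
```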

Relational algebra may be performed using two classes of storage and retrieval operations. The set operations include union, intersection, difference, and product. The relational operations include selection (accessing rows), projection (accessing columns), joining, and dividing. Relational joining links tables and creates a new table from data retrieved from various tables. The new table need not be stored physically in the database.
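The operations named above can be imitated on small in-memory tables. This is a sketch in plain Python, with helper names (`select`, `project`, `join`) chosen for the example; division is omitted for brevity.

```python
# Set operations on two relations with the same attributes.
a = {("S1", "London"), ("S2", "Paris")}
b = {("S2", "Paris"), ("S3", "Paris")}
union = a | b
intersection = a & b
difference = a - b
product = {x + y for x in a for y in b}  # every pairing of rows


def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, attrs):
    """Projection: keep only the named columns and drop duplicate rows."""
    result = []
    for row in rows:
        reduced = {a: row[a] for a in attrs}
        if reduced not in result:
            result.append(reduced)
    return result


def join(left, right, attr):
    """Join on one shared attribute; the new table exists only in memory."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]


suppliers = [
    {"s_no": "S1", "city": "London"},
    {"s_no": "S2", "city": "Paris"},
    {"s_no": "S3", "city": "Paris"},
]
shipments = [
    {"s_no": "S1", "p_no": "P1"},
    {"s_no": "S2", "p_no": "P1"},
    {"s_no": "S2", "p_no": "P2"},
]

paris = select(suppliers, lambda r: r["city"] == "Paris")
cities = project(suppliers, ["city"])
joined = join(suppliers, shipments, "s_no")

print(len(union), len(intersection), len(difference), len(product))  # 3 1 1 4
print(cities)       # [{'city': 'London'}, {'city': 'Paris'}]
print(len(joined))  # 3
```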

There are six comparison (logical) operators in SQL:

  • =  Equal

  • <> Not equal

  • < Less than

  • > Greater than

  • <= Less than or equal

  • >= Greater than or equal

There are five aggregate functions:

  1. SUM: the total of the given column over all rows satisfying any conditions, where the given column is numerical

  2. AVG: the average of the given column

  3. MAX: the largest figure in the given column

  4. MIN: the smallest figure in the given column

  5. COUNT: the number of rows satisfying the conditions
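The five aggregates, combined with a comparison operator in the condition, can be tried with Python's built-in `sqlite3` module. The `shipment` table is a hypothetical one, loaded with a few rows from the supplier example earlier in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipment (s_no TEXT, p_no TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO shipment VALUES (?, ?, ?)",
    [
        ("S1", "P1", 300),
        ("S1", "P2", 200),
        ("S1", "P3", 400),
        ("S2", "P1", 300),
        ("S2", "P2", 400),
    ],
)

# Aggregate only the rows that satisfy the condition qty >= 300.
total, average, largest, smallest, count = conn.execute(
    "SELECT SUM(qty), AVG(qty), MAX(qty), MIN(qty), COUNT(*) "
    "FROM shipment WHERE qty >= 300"
).fetchone()
conn.close()

print(total, average, largest, smallest, count)  # 1400 350.0 400 300 4
```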

     Most GIS users have developed application programs with various human-machine interfaces. SQL is used most frequently in searching, although other query procedures are also used. Complex GIS functions such as data search within specified rectangles or circles, creation of buffer zones, and overlay require operations that are not implemented in standard SQL. However, several suppliers of GIS software have developed special SQL dialects. This applies in particular to systems that use relational databases for storage of both geometry and attributes.
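To show what a search within a rectangle or circle involves, here is a minimal sketch in plain Python rather than in any vendor's SQL dialect. The function names and the sample points are hypothetical.

```python
import math

# Hypothetical point objects: name -> (x, y) in map units.
points = {"well_1": (2.0, 3.0), "well_2": (8.0, 1.0), "well_3": (5.0, 5.0)}


def within_rectangle(pts, xmin, ymin, xmax, ymax):
    """Names of the points that fall inside an axis-aligned rectangle."""
    return sorted(
        name for name, (x, y) in pts.items()
        if xmin <= x <= xmax and ymin <= y <= ymax
    )


def within_circle(pts, cx, cy, radius):
    """Names of the points no farther than radius from the centre (cx, cy)."""
    return sorted(
        name for name, (x, y) in pts.items()
        if math.hypot(x - cx, y - cy) <= radius
    )


print(within_rectangle(points, 0, 0, 6, 6))  # ['well_1', 'well_3']
print(within_circle(points, 5, 5, 3.0))      # ['well_3']
```

The circle test is also the simplest form of a buffer query around a point: every object within a given distance of the centre is selected.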

Organization of Data Storage Operations

 

Software systems often organize data to ensure effective use. Such organization may involve various logical paradigms concerning the grouping of object types and the divisions of geographical areas. The physical limitations of system file capacities may also be a practical reason for thematic and geographic divisions. A list of all maps in a system, organized by location and theme, forms a map library, from which the user can select the map he or she needs and store it in the workspace of the computer.

Thematic layers: Data in most GIS are organized in layers (levels), much like the overlays of conventional mapmaking. Similarly, individual data layers are stored in individual data files. These layers may contain object types intended to be processed together, such as points in one layer, lines in another, and polygons in a third. Alternatively, the individual data layers may be organized by theme, perhaps one layer for topography, another for property boundaries, others for roads or types of land use, and so on. Furthermore, each layer may contain subsidiary layers in a hierarchy. Thus a layer for roads might encompass subsidiary layers for national, county, urban, and private roads.

     Collecting logically similar objects can reduce the amount of data required to describe an individual object. Objects that represent several themes, such as lines that are simultaneously roads and property or land-use area boundaries, may be collected in one layer. The line geometry of that layer may be transferred to other layers as needed. Objects that are updated frequently or from the same source of information may also be collected in a single layer to facilitate updating work. The cartographic effects of plotting are frequently dependent on the sequential plotting of layers containing like objects.

     The separation of data into layers may seem analogous to the traditional separation of map information into overlays, and therefore not always a realistic data model of reality. One of the reasons for this layered storage is that many earlier systems have not been able to store overlapping polygons in the same layer. Today, however, there are systems that can handle this problem. These GISs circumvent “map overlay thinking” by being more object oriented; that is, each object is manipulated as an independent entity with regard to both its geometry and its attributes.

Partitioning the area: Many GISs have facilities that will divide surfaces to promote efficient storage, use, and updating of data. Individual surface segments are then stored in individual files, division by map sheets being the most common. Some GISs support divisions of data structures into projects, each of which may then be further divided into subprojects.

The manipulation of data for a larger area often involves combining data for its constituent segments. This is done either manually by the user or automatically by the system. Many GISs are seamless (i.e., data need not be regarded as belonging to fixed map sheets), although stored data may be divided into map sheets, which in turn may be divided into grid cells.

Users must select the most suitable elements for storing data, such as the map sheet sizes and area divisions. The choice is vital for two reasons. First, the organization of data storage elements can have a considerable influence on the efficiency with which data are used. Second, once the storage elements are chosen and data have been stored, restructuring to other storage elements is extremely complicated.

 

Editing Attribute Data: Like digital map data, attribute data must be edited and corrected. These operations include error correction, updating, and amending. The editing tasks may be carried out by using standard editing tools, such as those available in word processing, or specific GIS commands. Some GISs use SQL (Structured Query Language) to manipulate attribute data in relational databases. Specific GIS commands include commands for changing object thematic codes and switching codes between objects, as well as for editing thematic codes containing texts.

     The guidelines for entering data may change with time, mandating changes in the codes of older data. Usually, common mathematical signs such as +, -, * and / are used for this purpose. The currencies in which prices in attributes are expressed may be changed [e.g., from U.S. dollars ($) to Indian rupees (Rs.)] by entering an exchange rate. Relational and other databases used to store attribute data usually incorporate effective editing tools. These permit a variety of operations, including searching for members of a prescribed class and then editing one by one, or assigning new values to an entire class using a single command.
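Both kinds of edit, recoding a whole class in one command and converting prices with an exchange rate, can be sketched with SQL through Python's `sqlite3` module. The table, the codes and the rate below are placeholders; the rate in particular is not a real quotation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcel (id INTEGER PRIMARY KEY, code TEXT, price REAL)")
conn.executemany(
    "INSERT INTO parcel VALUES (?, ?, ?)",
    [(1, "F1", 100.0), (2, "F1", 250.0), (3, "A2", 80.0)],
)

# Assign a new thematic code to an entire class with a single command.
conn.execute("UPDATE parcel SET code = 'FOR' WHERE code = 'F1'")

# Convert every price using an exchange rate entered by the user.
rate = 80.0  # placeholder value for illustration only
conn.execute("UPDATE parcel SET price = price * ?", (rate,))

rows = conn.execute("SELECT id, code, price FROM parcel ORDER BY id").fetchall()
conn.close()

print(rows)  # [(1, 'FOR', 8000.0), (2, 'FOR', 20000.0), (3, 'A2', 6400.0)]
```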