Subject:  Information warehousing and the role of Librarian

Dear LIS-Forum members,

I think the news item appended below may be of interest
in the context of Mr. Abdul Jaleel's mail. This appeared
in the Hindu Business Line, a couple of days back.

While Mr. Jaleel mentioned information warehousing mainly
in corporate/business environments, here is an example
in the context of govt. information. I suppose we can
talk about information warehousing in academic and
research environments, entertainment, art, and other areas
too. We already have terms like digital libraries,
electronic libraries, Internet-based libraries, etc. It will be
useful to differentiate between these and an information warehouse,
so that we are clear about the context of our discussion.

Further inputs from forum participants will be useful.

Regards

- T.B. Rajashekar
  raja@ncsi.iisc.ernet.in
-----------------------------------------------


Warehousing Public data
-----------------------

By Praveen Purushotham

C-DAC (Centre for Development of Advanced Computing) will be
developing and deploying a multipurpose, multilingual, multimedia,
scalable information warehouse based on the PARAM Open-Frame architecture
for State-level governance of Andhra Pradesh. The project,
expected to take two years to become operational and costing Rs. 4.8 crores,
is part of the larger electronic governance package planned by Mr.
Chandrababu Naidu for the State.

The immediate objective of the project is to develop and deploy an
Information Warehouse for assisting State officials in their decision-making
and facilitating greater public access to information.

Raw data in the form of the multi-purpose household survey (MPHS)
conducted by the AP Government in association with NIC (National
Informatics Centre), Government land records, and so on is
already available. The MPHS data will be stored on four OS platforms
(NetWare, NT, SCO UNIX and OS/2) under three databases (Oracle, SQL Server
and DB2), working over a switched and segmented network. This is to be
handled by a network file server with fast and high capacity 
backup hardware which can also act as one of the platforms for the 
RDBMS server.

The warehouse requires large scalable servers and huge storage space 
along with intelligent tools and technologies to manage
them. In this scenario, the PARAM Open-Frame, with its cluster of
commodity multiprocessing servers with shared disk space, would probably
provide the only viable alternative for creating a completely scalable
and high performance information warehouse in terms of both data as
well as processing power.

C-DAC will provide a PARAM server of configuration scalable up to four 
RISC processors. The exact configuration will be worked out
based on the tentative deployment schedule and data volumes.

The Database server will be configured to handle about 20 GB of data 
initially. Scalable backup performance well beyond 10 gigabytes/hour
and capacities up to 40 GB on each cartridge are achievable with
backup hardware like Exabyte's Mammoth or DLT along with an Oracle Backup
and Restore tool on PARAM.
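
As a rough check on the figures above, the arithmetic can be sketched in a few lines of Python. The function name and the idea of a single full backup are assumptions made for illustration; only the 20 GB, 10 GB/hour and 40 GB figures come from the article.

```python
import math


def backup_estimate(data_gb, rate_gb_per_hour, cartridge_gb):
    """Return (hours needed, cartridges needed) for one full backup."""
    hours = data_gb / rate_gb_per_hour
    cartridges = math.ceil(data_gb / cartridge_gb)
    return hours, cartridges


# Initial data volume, quoted backup rate, quoted cartridge capacity.
hours, cartridges = backup_estimate(20, 10, 40)
print(f"{hours:.1f} hours, {cartridges} cartridge(s)")
```

Since the quoted rate is described as "well beyond" 10 gigabytes/hour, the two hours computed here would be an upper bound for the initial 20 GB.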

C-DAC and APTS (Andhra Pradesh Technology Services) will work toward
designing a core object database framework with a village object and a
person object. The attributes of the village object are a superset of 
the land records data with management information of a village panchayat,
while the attributes of the person object are the superset of the MPHS
data with information specified by the persons. Both objects will have a 
uniform set of methods for operating on their attributes.

The framework will also provide standard interfaces for accessing the 
various attributes of the village object and the person object
through simple attribute names.
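
One way to picture such a framework is the minimal Python sketch below. It is not C-DAC's or APTS's design: the class names, method names and attribute names are all hypothetical, chosen only to show two object types sharing one uniform set of methods and being read through simple attribute names.

```python
class WarehouseObject:
    """Hypothetical base class: one uniform set of methods for every object."""

    def __init__(self, key, **attributes):
        self.key = key
        self._attributes = dict(attributes)

    def get(self, name):
        """Read an attribute by its simple name."""
        return self._attributes[name]

    def set(self, name, value):
        """Write an attribute by its simple name."""
        self._attributes[name] = value

    def names(self):
        """List the attribute names this object carries."""
        return sorted(self._attributes)


class Village(WarehouseObject):
    """Land records data plus panchayat management information."""


class Person(WarehouseObject):
    """MPHS data plus information specified by the person."""


# Illustrative values only; the attribute names are assumptions.
village = Village("V001", survey_area_acres=1250.0, panchayat_head="(not recorded)")
person = Person("P001", village_key="V001", occupation="farmer")

print(village.names())
print(person.get("occupation"))
```

Because both classes inherit the same methods, an application that knows how to read a person can read a village in exactly the same way.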

The application development will follow an object-oriented paradigm with
an emphasis on distributed Web-based usability. All applications
will provide standard messaging calls to the person object or the village
object and declare the database schema in a specified manner to be plugged
into the warehouse architecture and enable extraction of data.

C-DAC and APTS will be cooperating in developing and freezing the standards
for application development in the Government. They will also invite other 
IT companies and academia to contribute to this effort. The idea
is to drive the application development into following the 'end-to-end' 
solutions philosophy.

The MPHS suite of applications which are currently developed/proposed by 
APTS will be jointly evaluated for consistency and compliance
with standards. One of the milestones of the project is the creation of an
NII (National Information Infrastructure) centre.

C-DAC will also be conducting a risk analysis to classify the applications
as mission-critical and non-critical. A security plan for various
aspects of deployment of hardware, system software, application software
and database for the former is envisaged, including clear documentation
of sensitive fields, authentication methodologies and physical security. The
mission-critical applications will also be the focus of the disaster recovery plan.

Information is one of the most valuable assets of any Government. When used
properly, it can help planning and informed decision making, leading to a
positive impact on the targeted group of citizens.

The AP Information Warehouse will provide tools for the planners based on
the MPHS online transaction database and MPHS data mart. Planners can
design/develop new programs for specific target groups using tools
available for data warehousing provided in the information
distribution system, village information system, revenue collection system
etc. Data generated by these systems, when integrated into
the data warehouse, would offer a powerful tool for forecasting and modeling
situations. This will allow the planners to leverage the results
for planning newer infrastructure and focussed development schemes.

The warehouse forms the topmost layer of the software architecture, 
providing the foundation for information processing by creating a
unified repository of subject-oriented, integrated, historical information
for analysis. Data warehousing is accomplished in an evolutionary,
step-at-a-time fashion, in response to the changing demands of planners
and decision makers.

Warehouse data is integrated through the use of consistent naming 
conventions, measurement of attributes, encoding structures,
physical attributes of data, and so on, to enable decision makers to
utilise the power of the information warehouse without being bothered
by its complexity.
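
The kind of integration described above can be sketched in Python as follows. The field names, gender codes and choice of acres as the warehouse unit are assumptions made purely for illustration; the article does not say how the actual sources encode their data.

```python
# Hypothetical source encodings; real systems would define their own.
GENDER_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}
ACRES_PER_HECTARE = 2.47105


def integrate(record):
    """Map one source record onto warehouse naming, units and encodings."""
    out = {}
    # Consistent naming: several source field names map to one warehouse name.
    out["person_name"] = record.get("name") or record.get("pname")
    # Consistent encoding: gender is always stored as "M" or "F".
    out["gender"] = GENDER_CODES[str(record["sex"]).strip().lower()]
    # Consistent measurement: land area is always stored in acres.
    if "land_ha" in record:
        out["land_acres"] = round(record["land_ha"] * ACRES_PER_HECTARE, 2)
    else:
        out["land_acres"] = float(record["land_acres"])
    return out


# Two records as they might arrive from differently designed source systems.
print(integrate({"pname": "A", "sex": 1, "land_ha": 2}))
print(integrate({"name": "B", "sex": "Female", "land_acres": "3"}))
```

Once every source passes through a step like this, a query against the warehouse does not need to know which system a record came from.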

Several applications can be Web-enabled by integrating them with advanced 
Web solutions. This enables easy access to information
from the Information Warehouse. For instance, information for a village
can be extracted from the warehouse and Web pages for the village can
be created out of this.

The Web pages for each village will be hosted at the district as well as
the State level, enabling access to information about any village.
People can not only access their own records through a secure password
but can also search for meaningful information in the Information
Warehouse.

For instance, a farmer can use a search engine for identifying the trends
across the district in the land records metadata (storing crop
details) before planting a particular crop to maximise his profits.
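
A trend query of this sort might look like the Python sketch below. The record layout and the sample rows are invented for illustration and are not real land records data; the point is only to show crop area being totalled per year across the villages of a district.

```python
from collections import defaultdict


def crop_area_by_year(records, crop):
    """Total area under `crop` per year, summed across all villages."""
    totals = defaultdict(float)
    for rec in records:
        if rec["crop"] == crop:
            totals[rec["year"]] += rec["acres"]
    return dict(sorted(totals.items()))


# Made-up rows purely for illustration; not real land records data.
records = [
    {"village": "A", "year": 1997, "crop": "cotton", "acres": 100.0},
    {"village": "B", "year": 1997, "crop": "cotton", "acres": 50.0},
    {"village": "A", "year": 1998, "crop": "cotton", "acres": 80.0},
    {"village": "B", "year": 1998, "crop": "paddy", "acres": 70.0},
]

print(crop_area_by_year(records, "cotton"))
```

A falling or rising total from one year to the next is the sort of district-wide signal a farmer could weigh before choosing a crop.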

The benefits derived - in terms of improved governance - would also act 
as an incentive to other States to follow the lead and extend
the concept of electronic governance throughout the nation.