Subject: Information warehousing and the role of Librarian

Dear LIS-Forum members,

I think the news item appended below may be of interest in the context of Mr. Abdul Jaleel's mail. This appeared in the Hindu Business Line a couple of days back. While Mr. Jaleel mentioned information warehousing mainly in corporate/business environments, here is an example in the context of govt. information. I suppose we can talk about information warehousing in academic and research environments, entertainment, art, etc. too. We already have terms like digital libraries, electronic libraries, Internet-based libraries, etc. It will be useful to differentiate between these and an information warehouse, so that we are clear about the context of our discussion. Further inputs from forum participants will be useful.

Regards
- T.B. Rajashekar
raja@ncsi.iisc.ernet.in

-----------------------------------------------

Warehousing Public data
-----------------------
By Praveen Purushotham

C-DAC (Centre for Development of Advanced Computing) will be developing and deploying a multipurpose, multilingual, multimedia, scalable information warehouse based on the PARAM Open-Frame architecture for State-level governance of Andhra Pradesh. The project, expected to take two years to become operational and costing Rs. 4.8 crores, is part of the larger electronic governance package planned by Mr. Chandrababu Naidu for the State.

The immediate objective of the project is to develop and deploy an Information Warehouse for assisting State officials in their decision making and facilitating greater public access to information. Raw data in the form of the multi-purpose household survey (MPHS) conducted by the AP Government in association with NIC (National Informatics Centre), Government land records, and so on is already available.

The MPHS data will be stored on four OS platforms (NetWare, NT, SCO UNIX and OS/2) under three databases (Oracle, SQL Server and DB-2), working over a switched and segmented network.
This is to be handled by a network file server with fast, high-capacity backup hardware, which can also act as one of the platforms for the RDBMS server. The warehouse requires large scalable servers and huge storage space along with intelligent tools and technologies to manage them. In this scenario, the PARAM Open-Frame, with its cluster of commodity multiprocessing servers with shared disk space, would probably provide the only viable alternative for creating a completely scalable and high-performance information warehouse in terms of both data and processing power.

C-DAC will provide a PARAM server with a configuration scalable up to four RISC processors. The exact configuration will be worked out based on the tentative deployment schedule and data volumes. The database server will be configured to handle about 20 GB of data initially. Scalable backup performance well beyond 10 GB/hour and capacities up to 40 GB on each cartridge are achievable with backup hardware like Exabyte's Mammoth or DLT along with an Oracle Backup and Restore tool on PARAM.

C-DAC and APTS (Andhra Pradesh Technology Services) will work toward designing a core object database framework with a village object and a person object. The attributes of the village object are a superset of the land records data with management information of a village panchayat, while the attributes of the person object are a superset of the MPHS data with information specified by the persons. Both objects will have a uniform set of methods for operating on their attributes. The framework will also provide standard interfaces for accessing the various attributes of the village object and the person object through simple attribute names. The application development will follow an object-oriented paradigm with an emphasis on distributed Web-based usability.
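
As a rough illustration of the framework described above, here is a minimal Python sketch of a village object and a person object that share one set of methods and expose their attributes through simple names. The article does not give the actual design, so the class names, method names (get, set, names) and sample attributes below are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class WarehouseObject:
    """Hypothetical base class: one uniform set of methods for all objects."""
    key: str
    attributes: Dict[str, Any] = field(default_factory=dict)

    def get(self, name: str, default: Any = None) -> Any:
        # Read an attribute by its simple name.
        return self.attributes.get(name, default)

    def set(self, name: str, value: Any) -> None:
        # Create or update an attribute by its simple name.
        self.attributes[name] = value

    def names(self) -> List[str]:
        # List the attribute names currently held, in sorted order.
        return sorted(self.attributes)


class Village(WarehouseObject):
    """Would hold land records data plus panchayat management information."""


class Person(WarehouseObject):
    """Would hold MPHS data plus information specified by the person."""


# Illustrative usage with made-up attribute names and values.
village = Village("V-001")
village.set("principal_crop", "paddy")
person = Person("P-001", {"household_size": 5})
```

Because both objects inherit the same methods, an application needs only the object's key and an attribute name to read or update a value, whichever object it is talking to.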
All applications will provide standard messaging calls to the person object or the village object and declare their database schema in a specified manner, so that they can be plugged into the warehouse architecture and enable extraction of data.

C-DAC and APTS will be cooperating in developing and freezing the standards for application development in the Government. They will also invite other IT companies and academia to contribute to this effort. The idea is to drive application development toward an 'end-to-end' solutions philosophy. The MPHS suite of applications currently developed or proposed by APTS will be jointly evaluated for consistency and compliance with standards.

One of the milestones of the project is the creation of an NII (National Information Infrastructure) centre. C-DAC will also be conducting a risk analysis to classify the applications as mission-critical and non-critical. A security plan covering the deployment of hardware, system software, application software and databases is envisaged for the mission-critical applications, including clear documentation of sensitive fields, authentication methodologies and physical security. The mission-critical applications will be the focus of the disaster recovery plan.

Information is one of the most valuable assets of any Government. When used properly, it can help in planning and informed decision making, leading to a positive impact on the targeted group of citizens. The AP Information Warehouse will provide tools for the planners based on the MPHS online transaction database and the MPHS data mart. Planners can design and develop new programs for specific target groups using the data warehousing tools provided in the information distribution system, village information system, revenue collection system, etc. Data generated by these systems, when integrated into the data warehouse, would offer a powerful tool for forecasting and modeling situations.
This will allow the planners to leverage the results for planning newer infrastructure and focussed development schemes.

The warehouse forms the topmost layer of the software architecture, providing the foundation for information processing by creating a unified repository of subject-oriented, integrated, historical information for analysis. Data warehousing is accomplished in an evolutionary, step-at-a-time fashion, in response to the changing demands of planners and decision makers. Warehouse data is integrated through the use of consistent naming conventions, measurement of attributes, encoding structures, physical attributes of data, and so on, to enable decision makers to utilise the power of the information warehouse without being bothered by its complexity.

Several applications can be Web-enabled by integrating them with advanced Web solutions. This enables easy access to information from the Information Warehouse. For instance, information for a village can be extracted from the warehouse and Web pages for the village can be created from it. The Web pages for each village will be hosted at the district as well as the State level, enabling access to information about any village. People can not only access their own records through a secure password but can also search for meaningful information in the Information Warehouse. For instance, a farmer can use a search engine to identify trends across the district in the land records metadata (storing crop details) before planting a particular crop to maximise his profits.

The benefits derived - in terms of improved governance - would also act as an incentive to other States to follow the lead and extend the concept of electronic governance throughout the nation.