INFORMATION RETRIEVAL SYSTEMS
Post 470 by Gautam Shah
Information is stored either sequentially or randomly. The majority of digital storage devices operate in the latter mode. Non-digital media such as books and ledgers, and traditional storage systems such as tapes and films, work on the sequential storage modality. In write-once or non-erasable systems the data is never rearranged, so it remains stored in sequence. The manner of retrieval is largely determined by the method of storage.
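The difference between the two storage modes can be sketched in a few lines of Python. This is only an illustration: the fixed record size, the in-memory byte stream standing in for a tape or disk, and the function names are all assumptions made for the example.

```python
import io

RECORD_SIZE = 8  # assumed fixed record length, in bytes


def build_store(records):
    """Pack records into one byte stream, padding each to RECORD_SIZE."""
    buf = io.BytesIO()
    for rec in records:
        buf.write(rec.encode("ascii").ljust(RECORD_SIZE))
    return buf


def read_sequential(store, index):
    """Tape-like access: read every record from the start up to `index`."""
    store.seek(0)
    data = b""
    reads = 0
    for _ in range(index + 1):
        data = store.read(RECORD_SIZE)
        reads += 1
    return data.decode("ascii").strip(), reads


def read_random(store, index):
    """Disk-like access: jump straight to the record's byte offset."""
    store.seek(index * RECORD_SIZE)
    return store.read(RECORD_SIZE).decode("ascii").strip(), 1


store = build_store(["ledger", "tape", "film", "book"])
seq_value, seq_reads = read_sequential(store, 3)
rand_value, rand_reads = read_random(store, 3)
print(seq_value, seq_reads, rand_value, rand_reads)
```

Both calls return the same record, but the sequential reader has to pass over every earlier record first, while the random reader performs a single read.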
Some of the techniques of information retrieval are:
Reference retrieval systems reference a document rather than physically or digitally storing the actual document. Such systems provide all ‘pre-tagged’ information about the documents, including their physical location and availability. The port of Singapore uses such a system to manage the location, arrival, dispatch, etc. of shipping containers. Courier companies let one check the status of a document online. Companies organize their procurement strategies to minimize the cost of storage (warehousing).
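A reference retrieval system can be pictured as an index that holds only tags about an item, never the item itself. The sketch below assumes a hypothetical container-tracking scenario; the class names, fields and identifiers are invented for illustration and do not describe any real port or courier system.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    """Pre-tagged information about an item; the item itself is not stored."""
    item_id: str
    location: str
    status: str


class ReferenceIndex:
    def __init__(self):
        self._refs = {}

    def register(self, ref):
        self._refs[ref.item_id] = ref

    def update(self, item_id, location, status):
        """Record a new location and status for an already registered item."""
        ref = self._refs[item_id]
        ref.location = location
        ref.status = status

    def lookup(self, item_id):
        """Return the reference, or None if the item is unknown."""
        return self._refs.get(item_id)


index = ReferenceIndex()
index.register(Reference("CONT-001", "berth 4", "arrived"))   # hypothetical ID
index.update("CONT-001", "yard block C", "awaiting dispatch")
found = index.lookup("CONT-001")
print(found.location, "/", found.status)
```

The lookup answers "where is it and is it available?" without ever touching the container or document being tracked.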
Database retrieval systems treat the components of a document as a database. Such components could be characters, words, sentences, etc. This approach is suitable where data is structured in various categories. Word-processing programs treat text documents (prose or poetry) as a database. This allows spelling and grammar checks, and the replacement of characters, words or strings of words. Such programs also check the quality of language, word count, etc.
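As a rough illustration of treating text as a collection of word records, the following Python sketch counts words, tallies their frequency and replaces whole words. The tokenizing rule (runs of letters and apostrophes) is a simplifying assumption, not how any particular word processor works.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lower-case words (letters and apostrophes only)."""
    return re.findall(r"[A-Za-z']+", text.lower())


def word_count(text):
    return len(tokenize(text))


def word_frequencies(text):
    return Counter(tokenize(text))


def replace_word(text, old, new):
    """Replace whole-word occurrences of `old`, leaving longer words intact."""
    return re.sub(rf"\b{re.escape(old)}\b", new, text)


sample = "The cat sat on the mat. The cat slept."
print(word_count(sample))
print(word_frequencies(sample).most_common(2))
print(replace_word(sample, "cat", "dog"))
```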
Hypertext retrieval systems: In this method, documents that are related by concept, sequence, hierarchy, experience, motive, or other characteristics are connected by establishing a relationship or through embedded ‘hyperlinks’. A variety of documents, such as text, numeric data, audio-video recordings, graphics and images, can be interlinked. From one document one can access other documents, as is done in a digital encyclopaedia such as Britannica or in internet navigation.
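Interlinked documents form a graph, and navigation amounts to following its edges. The sketch below assumes a small, made-up set of documents and links, and uses a breadth-first walk to list everything reachable from a starting document.

```python
from collections import deque

# Hypothetical documents and the hyperlinks embedded in each one.
links = {
    "encyclopaedia": ["architecture", "music"],
    "architecture": ["building-materials", "encyclopaedia"],
    "music": ["acoustics"],
    "building-materials": [],
    "acoustics": ["music"],
}


def reachable(start, links):
    """Return the documents reachable from `start`, in breadth-first order."""
    seen = [start]
    queue = deque([start])
    while queue:
        doc = queue.popleft()
        for target in links.get(doc, []):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


print(reachable("encyclopaedia", links))
print(reachable("music", links))
```

Links that point back to an already visited document (as "acoustics" does to "music") are skipped, so the walk terminates even when the links form a loop.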
SGML (Standard Generalized Markup Language) is a system for encoding electronic texts so that they can be displayed on any desired system and in any format. It takes advantage of the standard text markers used by editors to pinpoint the location and other characteristics of document elements (paragraphs, tables, etc.). It draws semantic relationships (relating to meaning in language or logic) from a body of text. SGML is often supplemented by other syntactic techniques (concerning the arrangement of words and phrases into well-formed sentences) to increase precision.
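To show how markup pinpoints document elements, here is a small Python sketch. Python 3 has no SGML parser in its standard library, so the example substitutes the standard `html.parser` module (HTML up to version 4 was itself defined as an SGML application). The tag names in the sample document are invented for illustration.

```python
from html.parser import HTMLParser


class ElementLocator(HTMLParser):
    """Record each opening tag with its line, column and attributes."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        line, col = self.getpos()
        self.elements.append((tag, line, col, dict(attrs)))


# Hypothetical SGML-style document; the tag names are assumptions.
doc = """<article>
<heading>Retrieval</heading>
<para id="p1">Information is stored sequentially or randomly.</para>
<table id="t1"><row><cell>tape</cell></row></table>
</article>"""

locator = ElementLocator()
locator.feed(doc)
locator.close()

tags = [element[0] for element in locator.elements]
for tag, line, col, attrs in locator.elements:
    print(f"{tag:8} line {line}, column {col}, attributes {attrs}")
```

The parser knows nothing about what a "para" or a "table" means; it reports only where each marked element begins, which is the locating role the markup plays.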
Indexing spatial data: In indexing spatial data such as maps and astronomical images, the textual index specifies the search areas, each of which defines a territory or a spatial entity such as a triangle, rectangle, irregular polygon, circle, cuboid or sphere. These spatial attributes are then used to collate or extract and present the image. Often other external attributes such as orientation, colour (normal, infrared, night vision), angle of view (perspective), etc. are applied to enhance the image or to reduce its detail. Spatial data indexes are often layered, and the layers can be collated as desired, as in BIM (Building Information Modelling) files, and can be linked to word-processing, database or spreadsheet-like formats.
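A layered spatial index can be sketched as a list of named rectangles, each tagged with a layer, that is searched by a rectangular query area. The region names, layers and coordinates below are hypothetical, and the linear scan is a simplification: practical systems usually rely on tree structures such as R-trees or quadtrees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """An axis-aligned rectangle belonging to one layer of the index."""
    name: str
    layer: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def search(regions, xmin, ymin, xmax, ymax, layers=None):
    """Return names of regions overlapping the query rectangle.

    If `layers` is given, only regions on those layers are considered.
    """
    hits = []
    for r in regions:
        if layers is not None and r.layer not in layers:
            continue
        overlaps = (r.xmin <= xmax and r.xmax >= xmin
                    and r.ymin <= ymax and r.ymax >= ymin)
        if overlaps:
            hits.append(r.name)
    return hits


regions = [
    Region("lobby", "architecture", 0, 0, 10, 8),
    Region("stair", "architecture", 10, 0, 14, 8),
    Region("duct-A", "services", 2, 6, 12, 7),
    Region("column-C3", "structure", 20, 20, 21, 21),
]

print(search(regions, 1, 1, 5, 5))
print(search(regions, 9, 5, 11, 7))
print(search(regions, 9, 5, 11, 7, layers={"services"}))
```

Passing a `layers` filter collates only the chosen layers, much as one switches layers on and off in a drawing.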
Image analysis and retrieval: The content analysis of images is accomplished by two primary methods: image processing and pattern recognition. Image processing is a set of computational techniques for analyzing, enhancing, compressing, and reconstructing images. Pattern recognition is an information-reduction process: the assignment of visual or logical patterns to classes based on the features of these patterns and their relationships. The process of pattern recognition involves measuring the object to identify the image, distinguishing its attributes, extracting the features that define those attributes, and assigning the object to a class based on these features. Both image processing and pattern recognition have extensive applications in various areas, including astronomy, medicine, radiography, 3G and 4G communications, forensic identification, industrial robotics, genetics and remote sensing by satellites.
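The stages named above (measurement, feature extraction, assignment to a class) can be illustrated with a toy example. The sketch assumes a tiny greyscale image held as a list of lists, a fixed brightness threshold and three made-up shape classes; real image analysis is far more elaborate.

```python
def threshold(image, level):
    """Image processing step: turn a greyscale grid into a binary one."""
    return [[1 if px >= level else 0 for px in row] for row in image]


def extract_features(binary):
    """Measure the bright object: its area and bounding-box size."""
    coords = [(r, c) for r, row in enumerate(binary)
              for c, value in enumerate(row) if value]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return {
        "area": len(coords),
        "width": max(cols) - min(cols) + 1,
        "height": max(rows) - min(rows) + 1,
    }


def classify(features):
    """Pattern recognition step: assign the object to a class."""
    if features is None:
        return "empty"
    if features["width"] >= 2 * features["height"]:
        return "horizontal bar"
    if features["height"] >= 2 * features["width"]:
        return "vertical bar"
    return "blob"


image = [
    [10, 12, 11, 9, 10],
    [11, 200, 210, 205, 12],
    [10, 12, 9, 11, 10],
]
features = extract_features(threshold(image, 128))
label = classify(features)
print(features, label)
```

The whole image is reduced to three numbers and then to one label, which is the sense in which pattern recognition is an information-reduction process.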
Pattern recognition is a field in which observations are classified and described. It is one of the applications of artificial intelligence. If the information is in sets amenable to mathematical formulation, it is analyzed as statistical information, in what is known as statistical pattern recognition. This is sub-classified into disciplines such as feature extraction, discriminant analysis, cluster analysis and error estimation. Syntactic pattern recognition carries out grammatical analysis and inference. Pattern recognition methods are used to identify data that is very complicated, so this identification system can be placed in the group of ‘algorithmic modelling’. The bar code is one of the most basic methods, but far more complex models are used to record the genome, etc.
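A very small instance of statistical pattern recognition is the nearest-centroid classifier: each class is summarized by the mean of its training observations, and a new observation is assigned to the class whose mean is closest. The sketch below uses invented two-feature data and class labels, and includes a simple error estimate on labelled test points. It is one elementary technique chosen for illustration, not a survey of the field.

```python
import math


def centroid(points):
    """Mean of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def train(samples):
    """Summarize each class by the centroid of its training points."""
    return {label: centroid(points) for label, points in samples.items()}


def classify(model, x):
    """Assign x to the class with the nearest centroid (Euclidean distance)."""
    return min(model, key=lambda label: math.dist(model[label], x))


def error_rate(model, labelled):
    """Fraction of labelled test points that are misclassified."""
    wrong = sum(1 for x, label in labelled if classify(model, x) != label)
    return wrong / len(labelled)


# Hypothetical observations: (width, height) measurements of two classes.
training = {
    "narrow": [(1.0, 4.0), (1.2, 3.8), (0.8, 4.2)],
    "wide": [(4.0, 1.0), (3.8, 1.2), (4.2, 0.8)],
}
model = train(training)
test_points = [((1.1, 3.9), "narrow"), ((3.9, 1.1), "wide")]
print(classify(model, (1.1, 3.9)), error_rate(model, test_points))
```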
Speech analysis and retrieval: Here discrete sound elements are converted into alphanumeric equivalents. The alphanumeric data is then subjected to content analysis like any other text. Sound data, though, contains many personal characteristics as well as acoustic features, and some of these are not clearly distinct from one another. The sound spectrum, converted to digital spectrographs, is matched with sample data and with pre-stored patterns (which is why speech recognition, dictation, or phonetic order-taking devices need to be ‘taught’ first). Often larger strings or ‘passages’ are checked to search for and match a pattern. ‘The reverse process of digital-to-analog conversion is comparatively simple, but the quality of the synthetic speech is not yet satisfactory’. Sound analysis (speech or music processing) is complex, and requires high computational power and storage capacity. But someday it will offer instant translations, synthetic songs and new techniques of machine (robotics) interaction.
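Matching a sound against pre-stored spectral patterns can be shown, in a much reduced form, with pure tones instead of speech. The sketch below assumes an 8 kHz sample rate, 64-sample frames and two "taught" tones; it computes a plain discrete Fourier transform and picks the stored pattern whose magnitude spectrum is most similar. Real speech recognizers are far more elaborate, and all names here are illustrative.

```python
import cmath
import math

SAMPLE_RATE = 8000  # assumed samples per second
N = 64              # assumed frame length, in samples


def tone(freq, amplitude=1.0, phase=0.0):
    """Generate one frame of a pure sine tone."""
    return [amplitude * math.sin(2 * math.pi * freq * n / SAMPLE_RATE + phase)
            for n in range(N)]


def spectrum(signal):
    """Magnitude of the discrete Fourier transform (lower half only)."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal)))
            for k in range(n // 2)]


def similarity(a, b):
    """Cosine similarity between two magnitude spectra."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def teach(samples):
    """Store the spectrum of each labelled sample as a pattern."""
    return {label: spectrum(signal) for label, signal in samples.items()}


def recognize(patterns, signal):
    """Return the label of the stored pattern closest to the signal."""
    spec = spectrum(signal)
    return max(patterns, key=lambda label: similarity(patterns[label], spec))


patterns = teach({"low": tone(500), "high": tone(1000)})
guess = recognize(patterns, tone(1000, amplitude=0.3, phase=1.0))
print(guess)
```

The unknown tone is quieter and shifted in phase relative to the taught sample, yet it still matches, because only the shape of the magnitude spectrum is compared.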