Post 613 – by Gautam Shah


A document is a self-sufficient but unitized set of information. It is a meaningful entity because its contents have a logical order and interrelationship. The word document originates from the Latin documentum, meaning a lesson or teaching. A document serves preservation (recording or storage) and representation. Documents become reliable primarily through their date identification and secondarily through their content. They offer evidence of intentions and reports of activities.


CERN datacenter with World Wide Web www and mail servers 2010 > Wikipedia image by Hugovanmeijeren

Documents have many forms: tablets (clay, stone, wood, etc.), inscriptions, scrolls, books, articles, reports, records, letters, movies, photograph albums, cassettes, disk drives and solid-state devices. Documents are created for immediate communication or stored for future access.

Documents are formatted information. Here in-forming implies that a form is impressed onto a medium. A formatted expression (words, symbols, representational graphics or doodles) placed on a medium for communication or storage is less likely to be lost with time. The forming mediums are physical, such as paper or magnetic tape, and the formatting tools are languages, images, graphics, metaphors, etc.


Storage of IBM Punched cards 1959 > Wikipedia image

The medium, as estate or space for storage, is costly or rare, and the effort required is extraordinary, so information meant for recording or communication is abridged through processing. With every process of expression, perception, recording, retrieval, etc., the content of a document may get corrupted. An information originator who accesses their own records at some other time-space level cannot revert to the original physical and mental state and re-experience or re-establish the original. The communicated information manifests slightly differently, yet this remains a reliable ‘knowledge transmission process’.

Traditional documents have a linear or sequential arrangement of information. Access is generally sequential, or through preset strategies such as keywords, summaries, content lists and indices. A card catalogue is a pre-sorted listing. Another method of facilitating access was to place sub-sections of a document on loose sheets held together by a thread (French: fil), wire or metal rod as a folder. Documents were identified by projecting tags, coloured edges or notched pages, as employed in telephone or address books and account ledgers.


Film Archive storage Flickr image by DR-Byen DRs Kulturarvsprojekt

Very large data sets such as police records, telephone directories and library records are, however, difficult to access quickly through cards. Mechanical punched card reader systems were used to read the information and reposition (sort) the cards accordingly. The language of punched and non-punched locations made information transmission not only faster and less error-prone, but also repeatable. Later, such systems allowed commands to be executed from information on punched cards.
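How a deck might be repositioned by reading punched locations can be shown with a small sketch. It assumes each card carries a numeric key of fixed width and that the deck is passed through the sorter once per column, starting from the rightmost column. The function name `sort_cards` and the sample keys are made up for illustration; this is one plausible scheme, not a description of any particular machine.

```python
def sort_cards(cards, width):
    """Sort equal-length digit strings, one column per pass, rightmost first."""
    deck = list(cards)
    for column in range(width - 1, -1, -1):
        # Ten pockets, one for each digit that can be punched in a column.
        pockets = [[] for _ in range(10)]
        for card in deck:
            pockets[int(card[column])].append(card)
        # Restack pockets 0..9 in order, keeping each pocket's internal order.
        deck = [card for pocket in pockets for card in pocket]
    return deck


# Illustrative keys only; they stand in for the punched columns of real cards.
sample = ["329", "457", "657", "839", "436", "720", "355"]
deck = sort_cards(sample, width=3)
```

Because each pass keeps the earlier order within a pocket, the deck ends up fully ordered after the leftmost column has been read.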


Card catalogue can contain such information, but with online processes these are replaced by digitally searchable databases > Wikipedia image by Tomwsulcer

Documents are stored at a place and in a manner where these can be accessed. Reports or documents are stored with many other similar documents. All storage arrangements have some degree of classification system.

FIRST, the basic classification, is the order of arrival. By itself this carries little meaning, but for administrative handling it shows what is new (and so the latest) and what is old (possibly redundant). For this purpose documents are either time-date stamped or given a sequential identifier (a chronological number: numeric, alphanumeric or alphabetical).

SECOND, for administrative relevance, are the size and nature of the document (book size, number of pages, bytes or MBs of data).
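Arrival-order classification can be sketched in a few lines. This is only an illustration under assumptions: the class name `AccessionRegister`, the `DOC` prefix and the five-digit padding are invented here, and a real registry would also persist its entries.

```python
from datetime import datetime, timezone
from itertools import count


class AccessionRegister:
    """Hands out arrival-order identifiers and time stamps (hypothetical helper)."""

    def __init__(self, prefix="DOC"):
        self.prefix = prefix
        self._numbers = count(1)   # sequential, chronological numbering
        self.entries = []          # (identifier, time stamp, title) in arrival order

    def receive(self, title, arrived_at=None):
        # Stamp with the current UTC time unless an arrival time is supplied.
        stamp = arrived_at or datetime.now(timezone.utc)
        accession_id = f"{self.prefix}-{next(self._numbers):05d}"
        self.entries.append((accession_id, stamp, title))
        return accession_id


register = AccessionRegister()
first_id = register.receive("Study of lighting in interiors")
second_id = register.receive("Study of lighting in interiors")
```

The two documents share a title, yet their identifiers differ and sort in the order of arrival, which is all that this first level of classification promises.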

Document storage

THIRD relates to the name of the document. Documents have a primary title provided by the author (or the publisher), which may have only personal relevance, and so may additionally carry a ‘technical title’ meant to explain the content or theme of the document. These additional titles can be longer. Digital documents such as computer files or internet file protocols have abridged (or expanded) titles which include search characters, numbers, words or keys.

Many documents have identical titles, and so are distinguished by appendages such as the author’s name, the publisher’s name, or the date of publication or of arrival in the storage system. Computer file systems and internet site addresses use extension codes for the same purpose.

FOURTH classification concerns the titles provided by the author, librarian or storage handler. These are usually of two to three types or tiers. The main title broadly describes the contents and sometimes the purpose of the report. It is usually more than one word long, and often runs to two or three lines or sentences. The main title distinguishes the document from other reports dealing with similar or parallel subjects. It is specific and should never be a general one.


Markham Stouffville Hospital Library > Wikipedia image by Raysonho @ Open Grid Scheduler / Grid Engine

For example, ‘Study of lighting in interiors’ is a non-specific title, because lighting in interiors could be natural, artificial, mixed, direct, reflected, borrowed, even, spot, day, night, evening, purpose-related or general illumination. Interiors could be residential, public, commercial or industrial. Unless the report covers all of these, a more specific title would be ‘Study of daytime artificial lighting needs in industrial interiors’ or ‘Study of lighting in terms of its effect on the perception of heights in interior spaces’.

FIFTH classification range is the identity of the author (or editor or compiler). If the author is well known, a certain level of content and quality can be presumed. For this reason a brief note on the author, or reference links to other works, is included.

SIXTH classification presents the document’s relevance to other fields of knowledge. The contents of documents often refer to two or more distinct branches of knowledge, and authors often fail to mention such inclusions in the main or other tiers of the title. These classifications may include an abstract, a brief description, an excerpt or a summary. Such short descriptions are also used for primary dissemination of information, and function as a mini document.

SEVENTH classification range derives from the parts of the document. An index and a table of contents show the sequence, size and placement of the sub-parts of the document. The section, chapter and paragraph headings, and other media presentations (photographs, illustrations, audio-video clips, links to other chapters, references to other documents, internet links to other resources) provide some idea of the contents.

Topics that are dealt with at lower levels, i.e. at sentence or paragraph level, may not be adequately covered by these. A glossary of key words or terms provides an ideal reference for such sub-topics. Internet search engines and research institutions draw out such keywords and add them to their master database of terms. The database provides reference not only to the location of the terms but also to their context.
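A database of terms that records both location and context might look roughly like the sketch below. It assumes the document is already split into paragraphs and that the keywords of interest are known in advance; the function name `build_keyword_index` and the `context_words` parameter are illustrative choices, not an account of how any search engine actually works.

```python
import re
from collections import defaultdict


def build_keyword_index(paragraphs, keywords, context_words=3):
    """Map each keyword to (paragraph number, surrounding words) pairs."""
    wanted = {k.lower() for k in keywords}
    index = defaultdict(list)
    for para_no, text in enumerate(paragraphs, start=1):
        words = re.findall(r"[A-Za-z']+", text)
        for pos, word in enumerate(words):
            if word.lower() in wanted:
                # Location is the paragraph number; context is a window of
                # neighbouring words around the term.
                start = max(0, pos - context_words)
                snippet = " ".join(words[start:pos + context_words + 1])
                index[word.lower()].append((para_no, snippet))
    return dict(index)


paragraphs = [
    "Hypertext has become a tool for interactive access.",
    "A glossary of key words provides a reference for the sub topics.",
]
index = build_keyword_index(paragraphs, ["glossary", "hypertext"])
```

Looking up `index["glossary"]` then gives the paragraph in which the term occurs along with a few neighbouring words.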


The format of a document has changed completely with modern electronic multitasking and multimedia-capable systems. Terms like index, glossary, list and appendix were indicative of the physical placement of various categories of information. These physical locations were once difficult to access. Digital media allow interactive presentation formats in audio, video, virtual reality, etc. Hypertext has become a tool for interactive access. Documents held in storage devices at different geographical locations are now accessible.




by Gautam Shah

Library, Alexandria

A document is a self-sufficient or unique collection of specific information or structured data, which can be stored, retrieved or communicated. Any such lot, when referred to, provides the intended information.

Hammurabi Documents

Documents have many different formats depending on what they carry, and how these are to be stored, retrieved or communicated. Traditional documents, in written, pictorial or graphical form, are the most common units of communication.


Paperwork Bureaucracy Work Aktenordner Office

A document serves many different purposes: it describes, defines, specifies, reports or certifies activities, lists the requirements for performing activities, or mentions the results of activities.


Letters, reports, drawings, specifications, procedures, instructions, records, radiographs, computer tapes and disks, purchase orders, invoices, process control charts, films, microfilms, photographs, etc. are all examples of documents.

Document storage system

When information manifests as data, a document comes into being. A document carries many identifiers such as:

  • time (of origin)
  • size (of storage, transmission time & effort)
  • content
  • place of origin
  • place of a destination
  • affiliations
  • embedded codes
  • signs, symbols
  • language
  • style
  • mode of communication
  • extent of exposure
  • limits and conditions of relevance

It is through these types of identities that a document begins to be relevant or worthy of access.
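Several of the identifiers listed above can be pictured as fields of a single record attached to the document. The sketch below is a minimal illustration: the class name `DocumentRecord`, the choice of fields and every sample value are assumptions made for the example, not a standard metadata scheme.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class DocumentRecord:
    """A handful of document identifiers gathered into one record."""
    title: str
    created_at: datetime            # time (of origin)
    size_bytes: int                 # size (of storage)
    origin: str                     # place of origin
    destination: str = ""           # place of destination
    language: str = ""
    affiliations: list = field(default_factory=list)

    def describe(self):
        language = self.language or "unknown language"
        return f"{self.title} ({language}, {self.size_bytes} bytes, from {self.origin})"


# All values below are made-up sample data.
record = DocumentRecord(
    title="Study of lighting in interiors",
    created_at=datetime(2016, 1, 15),
    size_bytes=20480,
    origin="Office A",
    language="English",
)
summary = record.describe()
```

A storage system could then filter or sort on any of these fields, which is one way the identifiers make a document "worthy of access".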


In modern terminology, information lots or documents are called files, because filing helps in identifying the contents. A filed information or data lot has a title, a list of contents, a description of contents and the mass of contents. Additionally, it occupies space, so it has a size, and it has a birth context (date, time, location and other circumstances of origin). Beyond these primary endowments, a file may be given many levels of attachments (references).

Disc Storage

A simple data file may contain several sub-entities, each of which may be allocated a specific physical space. A complex file may have pre-set space allocations of standard or varied size. Alternatively, a start and/or end marker (fixed or floating) may separate file partitions. Filters decide which of the entities are to be allocated a free or variable space. Data entities are invariably accompanied by their titles or identifiers.
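The difference between pre-set space allocation and marker-separated partitions can be sketched as follows. The layout is an assumption chosen for illustration: 20 bytes for a title and 4 bytes for a page count in the fixed form, and the ASCII unit separator as the end marker in the variable form. All function names are hypothetical.

```python
import struct

# Fixed allocation: every record gets 20 bytes for a title and 4 for a count.
FIXED_LAYOUT = struct.Struct("<20sI")


def pack_fixed(title, pages):
    # struct pads short titles with NUL bytes and truncates long ones.
    return FIXED_LAYOUT.pack(title.encode("ascii"), pages)


def unpack_fixed(blob):
    raw_title, pages = FIXED_LAYOUT.unpack(blob)
    return raw_title.rstrip(b"\x00").decode("ascii"), pages


# Marker-based allocation: each entity takes as much space as it needs.
FIELD_END = "\x1f"  # ASCII unit separator, used here as the end marker


def pack_marked(title, pages):
    # Every entity travels with its identifier ("title", "pages").
    return f"title={title}{FIELD_END}pages={pages}{FIELD_END}"


def unpack_marked(text):
    fields = {}
    for chunk in text.split(FIELD_END):
        if chunk:
            name, _, value = chunk.partition("=")
            fields[name] = value
    return fields


fixed = pack_fixed("Ledger", 12)
marked = pack_marked("Ledger", 12)
```

The fixed record always occupies 24 bytes whatever the title, while the marked record grows and shrinks with its contents; that is the trade-off between the two schemes.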

Neural Network


Data entities in a file either remain static or are changeable. The conditions that cause data to remain static or to vary may be external or internal. The internal conditioners, titles and filters are inseparable parts of information files. In static files, the structure remains unaltered even while data entities are changed; the meaning derived from the file, however, may change. In dynamic files, the structure of the file is altered along with the change in data entities. Static files are easy to process, but cannot provide qualitative information; they usually contain data that is mathematical or substantially logical. Dynamic files are difficult to process and provide little quantitative information; they contain data that is generally textual or metaphoric.